Pipelines without macros #639
move generate_name! to utilities and add tests
update doc string and correct for new usage in pipelines.jl
tweak to accomodate functions
fix bug in generate_only!
more fixes
add new pipeline types, public constructor and property access
rm extra end in tests
Codecov Report
@@ Coverage Diff @@
## dev #639 +/- ##
==========================================
+ Coverage 85.42% 85.66% +0.24%
==========================================
Files 39 40 +1
Lines 3416 3572 +156
==========================================
+ Hits 2918 3060 +142
- Misses 498 512 +14
I'm fine with the arrow syntax being dropped in favour of something that people consider more standard and readable :)
Awesome!!! I'm busy this weekend, but I'll try to take a look at this as soon as possible.
@ablaom Not sure how urgent this is. I'm pretty swamped right now, so I won't be able to look at this this week or this weekend. But maybe next week...
@CameronBieganek Thanks for the update! I think it's important to have a review and you are still my first choice. If you think it's likely more than a month away, let me know.
@CameronBieganek Do you see yourself getting time for this in the next week and a bit?
@ablaom I think I can actually take a look at this tomorrow evening. I'm putting it on my calendar. :)
Looks pretty good! Just a few minor comments here and there.
Bunch of minor suggestions of reviewer combined into one commit. Co-authored-by: Cameron Bieganek <8310743+CameronBieganek@users.noreply.github.com>
Add forgotten `@test` before boolean Co-authored-by: Cameron Bieganek <8310743+CameronBieganek@users.noreply.github.com>
Closing in favour of #664
@CameronBieganek Thanks very much indeed for wading through all that code. 🙏🏾
This PR is a step towards replacement of `@pipeline` with non-macro alternatives (JuliaAI/MLJ.jl#594). It implements new parametric types for pipelines and a new constructor `Pipeline`, following the suggestions of @CameronBieganek in the cited issue.

The new pipelines offer all the features of `@pipeline` with the exception of target transformations, which are to be provided with a separate wrapper. Additional features are:

- `inverse_transform` is implemented, where that makes sense (there was an issue raised, which I cannot find)
- `predict` for unsupervised pipelines ending in a model that implements `predict` (clustering) (Cannot call predict on a @pipeline when the last component is a transformer MLJClusteringInterface.jl#10)
- `cache=false` to suppress data caching at internal nodal machines
- `|>` is overloaded for building pipelines. When two pipelines are concatenated, the result is flat (does not introduce an extra layer of hyper-parameter nesting).

The new doc-string is copied below under "details".
Conflict with existing feature
The overloading of `|>` conflicts with the "arrow syntax" described here but never actually documented in the manual. Its purpose is to reduce the code needed to build learning networks. I have not seen it used anywhere outside of the cited tutorial, and would suggest that linear pipelines, being the more common use case, are a better use of the operator. A non-breaking alternative would be to overload the generic associative operator `*`.

@tlienart may wish to comment.
What do others think?
Edit: The actual doc-string has minor changes.
Create an instance of composite model type which sequentially composes the specified components in order. This means `component1` receives inputs, whose output is passed to `component2`, and so forth. A "component" is either a `Model` instance, a model type (converted immediately to its default instance) or any callable object.

At most one of the components may be a supervised model, but this model can appear in any position.

The `@pipeline` macro accepts key-word `options` discussed further below.

Ordinary functions (and other callables) may be inserted in the pipeline as shown in the following example:
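The code that followed this sentence did not survive in this copy of the doc-string. What follows is a minimal sketch of the idea rather than the original example. It assumes MLJ is installed and exports `Pipeline`, `coerce`, `OneHotEncoder` and `ConstantClassifier`; the `:age` column is hypothetical, and components are given as instances to stay on the safe side.

```julia
using MLJ

# An anonymous function is an ordinary callable component. Here it coerces a
# hypothetical `:age` column to the `Continuous` scientific type before the
# remaining components see the data.
pipe = Pipeline(
    X -> coerce(X, :age => Continuous),
    OneHotEncoder(),
    ConstantClassifier(),  # the single supervised component
)
```

Nothing is trained at this point: `pipe` is a model like any other and would still need to be bound to data in a machine.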
Syntactic sugar
The `|>` operator is overloaded to construct pipelines out of models, functions, and existing pipelines:
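The doc-string's own example is missing from this copy. As a hedged sketch of the syntax, assuming MLJ is installed and exports the built-in models named below (the particular choice of models is illustrative only):

```julia
using MLJ

# Two short pipelines built from model instances with `|>`.
pipe1 = ContinuousEncoder() |> Standardizer()
pipe2 = FillImputer() |> ConstantRegressor()

# Concatenating two pipelines is intended to give a flat pipeline: the four
# components become top-level properties of `pipe`, with no extra layer of
# hyper-parameter nesting.
pipe = pipe1 |> pipe2
```

The property names used to reach the components (for example `pipe.standardizer`) are assumed here to be generated automatically from the model type names.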
Special operations
If all the `components` are invertible unsupervised models (transformers), then `inverse_transform` is implemented for the pipeline. If there are no supervised models, then `predict` is nevertheless implemented, assuming the last model (such as `KMeans` clustering) also implements it. Similarly, calling `transform` on a supervised pipeline calls `transform` on the supervised component.

Optional key-word arguments

- `prediction_type` - prediction type of the pipeline; possible values: `:deterministic`, `:probabilistic`, `:interval` (default=`:deterministic` if not inferable)
- `operation` - operation applied to the supervised component model, when present; possible values: `predict`, `predict_mean`, `predict_median`, `predict_mode` (default=`predict`)
- `cache` - whether the internal machines created for component models should cache model-specific representations of data. See `machine`.

!!! warning "Set `cache=false` to guarantee data anonymization"

To build more complicated non-branching pipelines, refer to the MLJ manual sections on composing models.
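The special operations and the `cache` option described in the doc-string can be tried out roughly as follows. This is a sketch under assumptions, not an example taken from the PR: it assumes MLJ is installed and exports `UnivariateBoxCoxTransformer` and `UnivariateStandardizer`, both of which support `inverse_transform`, and the data vector is made up.

```julia
using MLJ

# Both components are invertible transformers, so the pipeline is expected
# to implement `inverse_transform`. Box-Cox needs strictly positive input.
pipe = Pipeline(
    UnivariateBoxCoxTransformer(),
    UnivariateStandardizer();
    cache = false,  # do not cache data at the internal machines
)

x = [1.0, 2.0, 4.0, 8.0, 16.0]  # hypothetical data
mach = machine(pipe, x)
fit!(mach, verbosity = 0)

z = transform(mach, x)
x_back = inverse_transform(mach, z)  # should approximately recover `x`
```

For a pipeline with a supervised component, `prediction_type` and `operation` would be passed as key-word arguments in the same way as `cache`.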
@CameronBieganek Any depth of review here would be greatly appreciated.