Pipelines without macros #639

ablaom · 2021-09-17T05:10:08Z

This PR is a step towards replacement of @pipeline with non-macro alternatives (JuliaAI/MLJ.jl#594). It implements new parametric types for pipelines and a new constructor Pipeline, following the suggestions of @CameronBieganek in the cited issue.

The new pipelines offer all the features of @pipeline with the exception of target transformations, which are to be provided with a separate wrapper. Additional features are:

- inverse_transform is implemented, where that makes sense (there was an issue raised, which I cannot find)
- predict is implemented for unsupervised pipelines ending in a model that itself implements predict, such as a clustering model (see "Cannot call predict on a @pipeline when the last component is a transformer", MLJClusteringInterface.jl#10)
- there is an option to specify cache=false, which suppresses data caching at internal nodal machines
- there is an alternative constructor using |>. When two pipelines are concatenated, the result is flat (it does not introduce an extra layer of hyper-parameter nesting).
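
As a rough illustration of what "flat" means for concatenation, the sketch below uses a hypothetical `ToyPipeline` type, which is not one of the types added in this PR. Its `|>` methods splice component lists together instead of nesting one pipeline inside another. Every name in the block is an assumption made for illustration only.

```julia
# Hypothetical toy type for illustration; not the MLJBase implementation.
struct ToyPipeline
    components::Vector{Any}
end

# Anything that is not already a ToyPipeline counts as a single component.
components(p::ToyPipeline) = p.components
components(x) = Any[x]

# Concatenation splices the component lists, so no nesting is introduced.
concat(a, b) = ToyPipeline(vcat(components(a), components(b)))

# Overload |> whenever at least one operand is a ToyPipeline.
# The third method resolves the ambiguity between the first two.
Base.:(|>)(a::ToyPipeline, b) = concat(a, b)
Base.:(|>)(a, b::ToyPipeline) = concat(a, b)
Base.:(|>)(a::ToyPipeline, b::ToyPipeline) = concat(a, b)

pipe1 = ToyPipeline(Any[:encoder, :standardizer])
pipe2 = ToyPipeline(Any[:pca, :regressor])

# The result has four top-level components rather than two nested pipelines.
flat = pipe1 |> pipe2
```

Here `flat.components` holds the four symbols in order, so a hyper-parameter of any component sits one level below the pipeline, however the pipeline was assembled.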

The new doc-string is copied below under "details".

Conflict with existing feature

The overloading of |> conflicts with the "arrow syntax" described here but never actually documented in the manual. Its purpose is to reduce the code needed to build learning networks. I have not seen it used anywhere outside of the cited tutorial, and would suggest that linear pipelines, as the more common use-case, are a better use of the operator. A non-breaking alternative would be to overload the generic associative operator *.

@tlienart may wish to comment.

What do others think?

Edit: The actual doc-string has minor changes.

Pipeline(component1, component2, ... , componentk; options...)
Pipeline(name1=component1, name2=component2, ..., namek=componentk; options...)
component1 |> component2 |> ... |> componentk

Create an instance of composite model type which sequentially composes
the specified components in order. This means component1 receives
inputs, whose output is passed to component2, and so forth. A
"component" is either a Model instance, a model type (converted
immediately to its default instance) or any callable object.

At most one of the components may be a supervised model, but this
model can appear in any position.

The Pipeline constructor accepts key-word options discussed further
below.

Ordinary functions (and other callables) may be inserted in the
pipeline as shown in the following example:

Pipeline(X->coerce(X, :age=>Continuous), OneHotEncoder, ConstantClassifier)

Syntactic sugar

The |> operator is overloaded to construct pipelines out of models,
functions, and existing pipelines:

LinearRegressor = @load LinearRegressor pkg=MLJLinearModels add=true
PCA = @load PCA pkg=MultivariateStats add=true

pipe1 = MLJBase.table |> ContinuousEncoder |> Standardizer
pipe2 = PCA |> LinearRegressor
pipe1 |> pipe2

Special operations

If all the components are invertible unsupervised models
(transformers), then inverse_transform is implemented for the
pipeline. If there are no supervised models, then predict is
nevertheless implemented, assuming the last model (such as KMeans
clustering) also implements it. Similarly, calling transform on a
supervised pipeline calls transform on the supervised component.
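
The inverse of a sequential composition applies the component inverses in reverse order. The sketch below shows that idea in isolation, assuming a hypothetical `ToyTransformer` type that pairs a forward map with its inverse. It does not use MLJBase, and the helper names `toy_transform` and `toy_inverse_transform` are invented for illustration.

```julia
# Hypothetical stand-in for an invertible transformer (not an MLJ model):
# a forward map together with its inverse.
struct ToyTransformer
    forward::Function
    inverse::Function
end

# Apply the forward maps in order: the first component sees the raw input.
toy_transform(steps, x) =
    foldl((acc, s) -> s.forward(acc), steps; init=x)

# Undo the composition by applying the inverses in reverse order.
toy_inverse_transform(steps, y) =
    foldl((acc, s) -> s.inverse(acc), reverse(steps); init=y)

steps = [
    ToyTransformer(x -> x .- 3.0, y -> y .+ 3.0),   # shift
    ToyTransformer(x -> 2.0 .* x, y -> y ./ 2.0),   # scale
]

x = [1.0, 2.0, 5.0]
y = toy_transform(steps, x)
x_recovered = toy_inverse_transform(steps, y)
```

With these particular maps `x_recovered` equals `x` exactly. If any component lacked an inverse, the composition would have none either, which is why the doc-string requires every component to be invertible.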

Optional key-word arguments

- prediction_type - prediction type of the pipeline; possible values: :deterministic, :probabilistic, :interval (default=:deterministic if not inferable)
- operation - operation applied to the supervised component model, when present; possible values: predict, predict_mean, predict_median, predict_mode (default=predict)
- cache - whether the internal machines created for component models should cache model-specific representations of data. See machine.

!!! warning "Set cache=false to guarantee data anonymization"

This precaution applies to composite models, and only to those
implemented using learning networks.

To build more complicated non-branching pipelines, refer to the MLJ
manual sections on composing models.

@CameronBieganek Any depth of review here would be greatly appreciated.

move generate_name! to utilities and add tests
update doc string and correct for new usage in pipelines.jl
tweak to accommodate functions
fix bug in generate_only!
more fixes
add new pipeline types, public constructor and property access
rm extra end in tests

oops

codecov-commenter · 2021-09-17T05:26:12Z

Codecov Report

Merging #639 (fb87a71) into dev (239e73f) will increase coverage by 0.24%.
The diff coverage is 94.79%.

@@            Coverage Diff             @@
##              dev     #639      +/-   ##
==========================================
+ Coverage   85.42%   85.66%   +0.24%     
==========================================
  Files          39       40       +1     
  Lines        3416     3572     +156     
==========================================
+ Hits         2918     3060     +142     
- Misses        498      512      +14

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/MLJBase.jl | `100.00% <ø> (ø)` | |
| src/composition/learning_networks/nodes.jl | `67.80% <0.00%> (-1.16%)` | ⬇️ |
| src/composition/models/pipelines2.jl | `94.55% <94.55%> (ø)` | |
| src/composition/learning_networks/inspection.jl | `86.53% <100.00%> (+0.26%)` | ⬆️ |
| src/composition/models/pipelines.jl | `98.98% <100.00%> (+0.40%)` | ⬆️ |
| src/sources.jl | `88.00% <100.00%> (+3.55%)` | ⬆️ |
| src/utilities.jl | `83.82% <100.00%> (+2.15%)` | ⬆️ |
| src/composition/learning_networks/arrows.jl | `0.00% <0.00%> (-73.34%)` | ⬇️ |
| src/machines.jl | `84.45% <0.00%> (+1.03%)` | ⬆️ |

... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 239e73f...fb87a71.

tlienart · 2021-09-17T05:27:03Z

I'm fine with the arrow syntax being dropped in favour of something that people consider more standard and readable :)

CameronBieganek · 2021-09-17T15:27:09Z

Awesome!!! I'm busy this weekend, but I'll try to take a look at this as soon as possible.

CameronBieganek · 2021-09-28T14:49:24Z

@ablaom Not sure how urgent this is. I'm pretty swamped right now, so I won't be able to look at this this week or this weekend. But maybe next week...

ablaom · 2021-09-28T19:34:29Z

@CameronBieganek Thanks for the update!

I think it's important to have a review and you are still my first choice. If you think it's likely more than a month away, let me know.

ablaom · 2021-10-19T20:27:48Z

@CameronBieganek Do you see yourself getting time for this in the next week and a bit?

CameronBieganek · 2021-10-20T01:57:40Z

@ablaom I think I can actually take a look at this tomorrow evening. I'm putting it on my calendar. :)

CameronBieganek

Looks pretty good! Just a few minor comments here and there.

src/composition/models/pipelines2.jl

src/utilities.jl

test/composition/models/pipelines2.jl

ablaom · 2021-10-29T03:29:29Z

Closing in favour of #664

ablaom · 2021-10-29T03:30:01Z

@CameronBieganek Thanks very much indeed for wading through all that code. 🙏🏾

ablaom added 16 commits September 16, 2021 08:55

some decoupling from old pipelines.jl

b21afd0

add special ErrorNode that throws an exception when called

893a56a

add method to create learning network machine for a pipeline

1dd7d17

add operation as type parameter to pipeline structs

e905f3c

add forgotten methods for ErrorNode

183c6b8

add fit method and integration tests

fce89b9

make cache hyperparameter work

c39d7ab

add doc-string

a3c03c8

make a further split of pipeline constructor

59c2074

add syntactic sugar pipe1 |> pipe2 etc

c2ce8cb

oops

add test for calling predict on unsupervised pipeline

c570975

rm conflicting "arrow" syntax for learning networks

162b0f7

fix typo in tests

0a7a3c9

fix doc-string formatting

df770d7

change test to accommodate earlier julia versions

db31dc2

ablaom mentioned this pull request Sep 17, 2021

@from_network does more strange eval stuff JuliaAI/MLJ.jl#703

Closed

ablaom added 7 commits September 19, 2021 10:50

update doc-string

96c5eaf

typo

7cb8d68

get rid of @pipeline reference in doc-string

752021c

further doc-string update

e2612a8

doc-string tweak

f038592

doc-string tweak

84f4337

fix generate_name! doc-string

bc63276

ablaom mentioned this pull request Sep 22, 2021

Add a supervised model wrapper to implement target transformations #642

Closed

ablaom mentioned this pull request Oct 3, 2021

Improvements for unsupervised models that make probabilistic predictions #656

Closed

2 tasks

ablaom added 2 commits October 20, 2021 08:58

Merge branch 'dev' into pipelines2

081a90c

oops

1da951b

CameronBieganek reviewed Oct 21, 2021

ablaom and others added 7 commits October 27, 2021 15:51

Apply suggestions from code review

32ead80

Bunch of minor suggestions of reviewer combined into one commit. Co-authored-by: Cameron Bieganek <8310743+CameronBieganek@users.noreply.github.com>

Apply suggestions from code review

bef001d

Add forgotten `@test` before boolean Co-authored-by: Cameron Bieganek <8310743+CameronBieganek@users.noreply.github.com>

docstring change: MyHugeInteger -> MyEvenInteger

fb87a71

"maximize" code coverage in pipelines2

9bd33f2

typo

d73657f

correct a code comment

c7ae7c4

doc-string tweak

7072faa

ablaom closed this Oct 27, 2021

ablaom reopened this Oct 27, 2021

ablaom mentioned this pull request Oct 29, 2021

Macro free pipelines (PR onto for-0-point-19-release) #664

Merged

Merge branch 'for-0-point-19-release' into pipelines2

8fb3091

ablaom closed this Oct 29, 2021

DilumAluthge mentioned this pull request Nov 10, 2021

@pipeline throws LoadError/UndefVarError in Pluto notebook JuliaAI/MLJ.jl#865

Closed

ablaom mentioned this pull request Nov 24, 2021

Pipelines should pass through training losses when appropriate #672

Closed
