Add an optional pipeline attribute to the Learner object #474

desilinguist · 2019-02-28T22:43:15Z

This PR addresses 1 major and 2 minor issues.

Issue #451

Add a new `pipeline` attribute to the Learner object. It is set only when enabled, either via a keyword argument to the Learner() call or via a config option in the Output section of a config file. When enabled, the attribute holds a scikit-learn Pipeline object containing all of the components that were used (vectorizer, selector, scaler, sampler, and the final estimator), already fit as part of the SKLL training process. In some cases, a custom pipeline stage is needed to convert sparse feature vectors to dense ones (see documentation).
Add detailed documentation about this attribute along with an example and explanatory note.
Add a comprehensive test.
Update other tests that are affected by the additional pipeline config option.
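As a rough illustration of what such an attribute holds, the sketch below builds and fits a plain scikit-learn `Pipeline` with a vectorizer, selector, scaler, and estimator. It does not use SKLL itself: the step names, the toy data, and the choice of `LogisticRegression` are assumptions made for the example, and the sampler stage is left out for brevity.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data: one feature dictionary per example (illustrative only).
train_features = [
    {"f1": 1.0, "f2": 0.0},
    {"f1": 0.9, "f2": 0.2},
    {"f1": 0.1, "f2": 1.0},
    {"f1": 0.0, "f2": 0.8},
]
train_labels = ["a", "a", "b", "b"]

# Step names are assumptions for this sketch, not necessarily the ones SKLL uses.
pipeline = Pipeline([
    ("vectorizer", DictVectorizer(sparse=False)),  # dicts -> dense feature matrix
    ("selector", SelectKBest(k="all")),            # keep every feature in this toy case
    ("scaler", StandardScaler()),                  # zero mean, unit variance
    ("estimator", LogisticRegression()),           # final estimator
])

# Fitting the pipeline fits every stage in order.
pipeline.fit(train_features, train_labels)

# Raw feature dictionaries go straight into predict(); each fitted stage is
# applied in sequence before the estimator sees the data.
predictions = pipeline.predict([
    {"f1": 0.95, "f2": 0.05},
    {"f1": 0.05, "f2": 0.9},
])
```

Because the whole chain is a single fitted object, new examples can be passed in as raw feature dictionaries without repeating the vectorization, selection, or scaling steps by hand.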

Issue #472

Fix the bug in Learner.predict() where sampling was being applied before the scaling.
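To see why the order matters, here is a minimal sketch using only the standard library. The `scale` and `sample` functions are hypothetical stand-ins (a standardization step and a nonlinear cosine map), not SKLL code, and the sketch assumes that training scales first and samples second, as the fix implies.

```python
import math


def scale(values, mean, std):
    """Standardize values using statistics learned at training time."""
    return [(v - mean) / std for v in values]


def sample(values):
    """Toy nonlinear map standing in for a kernel approximation sampler."""
    return [math.cos(v) for v in values]


features = [10.0, 12.0, 14.0]
mean, std = 12.0, 2.0  # assumed to have been estimated on the training data

# Order assumed to match training: scale first, then sample.
scaled_then_sampled = sample(scale(features, mean, std))

# The buggy order at prediction time: sample first, then scale.
sampled_then_scaled = scale(sample(features), mean, std)

# The two orders give different feature values, so a model trained on one
# ordering receives mismatched inputs if prediction uses the other.
orders_differ = scaled_then_sampled != sampled_then_scaled
```

Since the sampler is nonlinear, the two steps do not commute, which is why prediction has to apply them in the same order as training.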

Issue #473

Disallow sampling for MultinomialNB learner since it cannot handle negative feature values.
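The sketch below illustrates the reasoning with the standard library only. `random_fourier_features` is a toy version of the cosine-based feature map used by samplers such as `RBFSampler`, and `check_sampler_allowed` is a hypothetical guard written for this example; neither is the actual SKLL implementation.

```python
import math
import random


def random_fourier_features(x, n_components=64, seed=0):
    """Toy random Fourier feature map: sqrt(2 / D) * cos(w . x + b)."""
    rng = random.Random(seed)
    mapped = []
    for _ in range(n_components):
        weights = [rng.gauss(0.0, 1.0) for _ in x]
        offset = rng.uniform(0.0, 2.0 * math.pi)
        projection = sum(w * v for w, v in zip(weights, x)) + offset
        mapped.append(math.sqrt(2.0 / n_components) * math.cos(projection))
    return mapped


def check_sampler_allowed(learner_name, sampler_name):
    """Hypothetical guard: reject any sampler when the learner is MultinomialNB."""
    if learner_name == "MultinomialNB" and sampler_name is not None:
        raise ValueError(
            "Feature sampling cannot be used with MultinomialNB because "
            "sampled features may be negative."
        )


# Even for strictly non-negative inputs, the cosine map yields negative values.
mapped = random_fourier_features([1.0, 2.0, 3.0])
has_negative = any(value < 0 for value in mapped)

# The guard rejects the unsupported combination and lets others through.
try:
    check_sampler_allowed("MultinomialNB", "RBFSampler")
    rejected = False
except ValueError:
    rejected = True

check_sampler_allowed("LogisticRegression", "RBFSampler")  # no error raised
```

Failing early at configuration time gives a clearer message than letting the estimator raise on negative inputs partway through training.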

desilinguist · 2019-03-05T21:09:08Z

Okay, this PR is now ready for review @jbiggsets @mulhod @bndgyawali @Lguyogiro.

coveralls · 2019-03-05T21:10:45Z

Coverage decreased (-0.2%) to 93.951% when pulling a31e830 on add-pipeline-attribute into 5f1a7f6 on master.

desilinguist · 2019-03-05T21:17:48Z

Some of the coverage decrease is expected, but the rest doesn't make any sense to me. The report says that these 2 lines are newly uncovered, but we have never had tests for them.

desilinguist · 2019-03-07T13:32:58Z

Okay, that's more like it! This drop is indeed expected - I added a whole bunch of new tests to cover things we had never covered before (e.g., SkewedChi2Sampler) and discovered some bugs in the process so I guess it was a good thing :)

Now, this is really ready for review! @jbiggsets @mulhod @bndgyawali!

mulhod

I have a couple questions and comments. Looks really good.

doc/run_experiment.rst

skll/learner.py

jbiggsets

This looks really good!

desilinguist added 15 commits February 22, 2019 17:02

Add and propagate pipeline output config option.

d4026bf

Propagate pipeline argument to learner

c1b102a

Store pipeline when pipeline option is set

f6a743b

Turn off sparsification in vectorizer stored in pipeline

ca0abf0

Add documentation for pipeline output option

0d4d43f

Tweak docstring

45a55cb

Remove unneeded import and fix typo

3ef42dc

Disallow sampling with MultinomialNB.

a19afce

Move sampling to the right place in predict().

2b613af

Fix pipeline steps

7a7c4ad

- Add a custom densification stage for when we are using feature hashing and doing scaling.
- Move sampler to the right place in the pipeline steps.

Add comprehensive test for pipeline attribute.

62af812

Add tests for MultinomialNB

ff2207d

Add tests for pipeline option.

62d2526

Fix all config tests to have the pipeline output.

f0dccbe

Add note in pipeline documentation.

53fa313

desilinguist self-assigned this Feb 28, 2019

desilinguist marked this pull request as ready for review March 5, 2019 19:59

Merge branch 'master' into add-pipeline-attribute

cd5bf73

# Conflicts:
#	skll/learner.py

desilinguist requested review from mulhod, Lguyogiro and jbiggsets March 5, 2019 20:07

desilinguist added 4 commits March 5, 2019 16:43

Add another test.

6cd62eb

Fix bug in SkewedChi2Sampler handling

1c289f2

- No need to try to convert already dense arrays to dense.

Add a Densifier to the pipeline if we are using SkewedChi2Sampler

b5e2d6c

Add SkewedChi2Sampler to the pipeline test

13e5ea1

mulhod requested changes Mar 11, 2019

doc/run_experiment.rst (outdated)

doc/run_experiment.rst

skll/learner.py (outdated)

desilinguist added 2 commits March 11, 2019 17:33

Fix typo and add warnings.

fdecbb4

Simplify code, add non-pipeline code, and update note

a31e830

- Simplify code to use `Reader.for_path(...)` everywhere.
- Update note to be much more detailed about the densifier vs. turning `sparse` off.
- Add the code that would be required if there were no pipeline attribute, for contrast.

desilinguist requested a review from mulhod March 11, 2019 21:35

mulhod approved these changes Mar 12, 2019

desilinguist added this to In progress in SKLL Release v2.5 via automation Mar 12, 2019

desilinguist added this to the 2.0 milestone Mar 12, 2019

jbiggsets approved these changes Mar 13, 2019

desilinguist merged commit d28116f into master Mar 13, 2019

SKLL Release v2.5 automation moved this from In progress to Done Mar 13, 2019

desilinguist deleted the add-pipeline-attribute branch March 13, 2019 19:42

This was referenced Mar 13, 2019

MultinomialNB doesn't work with feature sampling #473

Closed

Feature sampling doesn't work when predicting #472

Closed

Add optional pipeline attribute to Learner objects #451

Closed

desilinguist removed this from Done in SKLL Release v2.5 Sep 12, 2019
