Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an optional pipeline attribute to the Learner object #474

Merged
merged 22 commits into from Mar 13, 2019

Conversation

desilinguist
Copy link
Member

This PR addresses 1 major and 2 minor issues.

Issue #451

  • Add a new pipeline attribute to the Learner object if it was enabled either via a keyword argument to the Learner() call or as a config option in the Output section of a config file. If enabled, this attribute contains a scikit-learn Pipeline object containing all of the components that were used (vectorizer, selector, scaler, sampler, and the final estimator) that have already been fit as part of the SKLL training process. In some cases, a custom pipeline stage to force sparse feature vectors to dense is needed (see documentation).
  • Add detailed documentation about this attribute along with an example and explanatory note.
  • Add a comprehensive test.
  • Update other tests that are affected by the additional pipeline config option.

Issue #472

  • Fix the bug in Learner.predict() where sampling was being applied before the scaling.

Issue #473

  • Disallow sampling for MultinomialNB learner since it cannot handle negative feature values.

@desilinguist desilinguist self-assigned this Feb 28, 2019
@desilinguist desilinguist marked this pull request as ready for review March 5, 2019 19:59
@desilinguist
Copy link
Member Author

Okay, this PR is now ready for review @jbiggsets @mulhod @bndgyawali @Lguyogiro.

@coveralls
Copy link

coveralls commented Mar 5, 2019

Coverage Status

Coverage decreased (-0.2%) to 93.951% when pulling a31e830 on add-pipeline-attribute into 5f1a7f6 on master.

@desilinguist
Copy link
Member Author

Some of the coverage decrease is expected but the rest doesn't make any sense to me. It says that these 2 lines are newly uncovered but we have never had tests for that.

@desilinguist
Copy link
Member Author

Okay, that's more like it! This drop is indeed expected - I added a whole bunch of new tests to cover things we had never covered before (e.g., SkewedChi2Sampler) and discovered some bugs in the process so I guess it was a good thing :)

Now, this is really ready for review! @jbiggsets @mulhod @bndgyawali!

Copy link
Contributor

@mulhod mulhod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple questions and comments. Looks really good.

doc/run_experiment.rst Outdated Show resolved Hide resolved
doc/run_experiment.rst Show resolved Hide resolved
skll/learner.py Outdated Show resolved Hide resolved
- Simplify code to use `Reader.for_path(...)` everywhere.
- Update note to be much more detailed abiout the densifier vs. turning `sparse` off.
- Add code that would be required if there was no pipeline attribute for contrast.
@desilinguist desilinguist added this to In progress in SKLL Release v2.5 via automation Mar 12, 2019
@desilinguist desilinguist added this to the 2.0 milestone Mar 12, 2019
Copy link
Collaborator

@jbiggsets jbiggsets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good!

@desilinguist desilinguist merged commit d28116f into master Mar 13, 2019
SKLL Release v2.5 automation moved this from In progress to Done Mar 13, 2019
@desilinguist desilinguist deleted the add-pipeline-attribute branch March 13, 2019 19:42
@desilinguist desilinguist removed this from Done in SKLL Release v2.5 Sep 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants