Add an optional pipeline attribute to the Learner object #474
- Add a custom densification stage for when we are using feature hashing and doing scaling.
- Move sampler to the right place in the pipeline steps.
# Conflicts: # skll/learner.py
Okay, this PR is now ready for review @jbiggsets @mulhod @bndgyawali @Lguyogiro.
Some of the coverage decrease is expected but the rest doesn't make any sense to me. It says that these 2 lines are newly uncovered but we have never had tests for that.
- No need to try to convert already dense arrays to dense.
Okay, that's more like it! This drop is indeed expected - I added a whole bunch of new tests to cover things we had never covered before (e.g., Now, this is really ready for review! @jbiggsets @mulhod @bndgyawali!
I have a couple questions and comments. Looks really good.
- Simplify code to use `Reader.for_path(...)` everywhere.
- Update note to be much more detailed about the densifier vs. turning `sparse` off.
- Add code that would be required if there was no pipeline attribute for contrast.
This looks really good!
This PR addresses 1 major and 2 minor issues.

Issue #451
- Add an optional `pipeline` attribute to the Learner object if it was enabled either via a keyword argument to the `Learner()` call or as a config option in the Output section of a config file. If enabled, this attribute contains a scikit-learn Pipeline object containing all of the components that were used (vectorizer, selector, scaler, sampler, and the final estimator) that have already been fit as part of the SKLL training process. In some cases, a custom pipeline stage to force sparse feature vectors to dense is needed (see documentation).
- `pipeline` config option.

Issue #472
- `Learner.predict()` where sampling was being applied before the scaling.

Issue #473
- `MultinomialNB` learner since it cannot handle negative feature values.
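
To illustrate why a densification stage can be needed when feature hashing is combined with scaling, here is a minimal sketch in plain scikit-learn. It is not SKLL's actual code: the `Densifier` class name, the toy data, and the choice of `LogisticRegression` as the final estimator are all assumptions made for illustration, and the selector and sampler stages are left out for brevity. `FeatureHasher` produces sparse output, while `StandardScaler` with its default mean-centering refuses sparse input, so a stage in between converts the matrix to a dense array.

```python
from scipy.sparse import issparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class Densifier(BaseEstimator, TransformerMixin):
    """Hypothetical stage that converts sparse matrices to dense arrays."""

    def fit(self, X, y=None):
        # Nothing to learn; this stage is stateless.
        return self

    def transform(self, X):
        # Only convert if the input is actually sparse.
        return X.toarray() if issparse(X) else X


# Toy data (assumed): one dict of feature name -> value per example.
features = [
    {"f1": 1.0, "f2": 0.5},
    {"f1": 0.9, "f2": 0.7},
    {"f1": -1.0, "f3": 2.0},
    {"f1": -1.2, "f3": 1.8},
]
labels = ["a", "a", "b", "b"]

# Order matters: hash the features, densify, then scale, then estimate.
pipeline = Pipeline([
    ("vectorizer", FeatureHasher(n_features=8)),
    ("densifier", Densifier()),
    ("scaler", StandardScaler()),
    ("estimator", LogisticRegression()),
])

pipeline.fit(features, labels)
predictions = pipeline.predict(features)
print(list(predictions))
```

Once fitted, such a pipeline can be applied to raw feature dictionaries directly, which is the convenience the `pipeline` attribute is meant to provide.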