-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Break up pipeline.run into smaller methods #93
Milestone
Comments
thcrock
changed the title
Refactor based on codeclimate issues
Break up pipeline.run into smaller methods
Apr 18, 2017
thcrock
added a commit
that referenced
this issue
Apr 18, 2017
…113, #114] The original goal here was to implement a consistent 'replace' kwarg throughout the pipeline, so it can be restarted later more easily. A few other things made it in here. In general they made this change easier, so they aren't totally unrelated, but are different enough that they are helped by being called out. And some are simple performance updates. Directly related [#99]: - Create pipeline test that makes sure skip-if-present functionality works, rename existing generic pipeline test for clarity - Add retrieve from database functionality to Predictor class that ensures order is the same as the passed-in matrix, call if replace kwarg is False [#112] - Add replace kwarg to FeatureGenerator constructor that will skip creating table tasks if the feature table exists - Flip ModelTrainer replace default kwarg to True for conformity - Add replace kwarg to PipelineBase to pass through to all components Tangentially or not related: - Add sqlalchemy-postgres-copy, use in predictions writing for speed upgrade [#111] - Remove model.predict() [#45] - Move much functionality in pipelines to smaller methods, to allow both code reuse between them and inspection before running lengthy tasks [Resolves #93] - Use new matrix_store.empty instead of matrix_store.matrix.empty to allow the empty check without loading the entire matrix into memory [#113] - Use matrix_store internally in Predictor to reduce number of items that need to be passed around - Create trained models directory if it isn't there - Close DB sessions properly [#114]
jesteria
pushed a commit
that referenced
this issue
Nov 28, 2017
* Update sqlalchemy from 1.1.9 to 1.1.11 * Update sqlalchemy from 1.1.9 to 1.1.11 * Update sphinx from 1.5.6 to 1.6.3 * Update cryptography from 1.8.1 to 1.9 * Update pytest from 3.0.7 to 3.1.3
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Codeclimate has identified some real issues with duplication and complexity. The two pipeline.run methods are very big offenders here. Creating some new methods here to hide details and remove duplication should help.
https://codeclimate.com/github/dssg/triage/issues
This also helps with more granular usage, and inspection before any intensive methods are called.
The text was updated successfully, but these errors were encountered: