Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break up pipeline.run into smaller methods #93

Closed
thcrock opened this issue Apr 6, 2017 · 0 comments
Closed

Break up pipeline.run into smaller methods #93

thcrock opened this issue Apr 6, 2017 · 0 comments
Milestone

Comments

@thcrock
Copy link
Contributor

thcrock commented Apr 6, 2017

Codeclimate has identified some real issues with duplication and complexity. The two pipeline.run methods are very big offenders here. Creating some new methods here to hide details and remove duplication should help.

https://codeclimate.com/github/dssg/triage/issues

This also helps with more granular usage, and inspection before any intensive methods are called.

@thcrock thcrock added this to the v0.4 milestone Apr 18, 2017
@thcrock thcrock changed the title Refactor based on codeclimate issues Break up pipeline.run into smaller methods Apr 18, 2017
thcrock added a commit that referenced this issue Apr 18, 2017
…113, #114]

The original goal here was to implement a consistent 'replace' kwarg throughout the pipeline, so it can be restarted later more easily. A few other things made it in here. In general they made this change easier, so they aren't totally unrelated, but are different enough that they are helped by being called out. And some are simple performance updates.

Directly related [#99]:
- Create pipeline test that makes sure skip-if-present functionality works, rename existing generic pipeline test for clarity
- Add retrieve from database functionality to Predictor class that ensures order is the same as the passed-in matrix, call if replace kwarg is False [#112]
- Add replace kwarg to FeatureGenerator constructor that will skip creating table tasks if the feature table exists
- Flip ModelTrainer replace default kwarg to True for conformity
- Add replace kwarg to PipelineBase to pass through to all components

Tangentially or not related:
- Add sqlalchemy-postgres-copy, use in predictions writing for speed upgrade [#111]
- Remove model.predict() [#45]
- Move much functionality in pipelines to smaller methods, to allow both code reuse between them and inspection before running lengthy tasks [Resolves #93]
- Use new matrix_store.empty instead of matrix_store.matrix.empty to allow the empty check without loading the entire matrix into memory [#113]
- Use matrix_store internally in Predictor to reduce number of items that need to be passed around
- Create trained models directory if it isn't there
- Close DB sessions properly [#114]
ecsalomon added a commit that referenced this issue Apr 18, 2017
Workflow and performance updates [Resolves #45, #93, #99, #111, #112,…
jesteria pushed a commit that referenced this issue Nov 28, 2017
* Update sqlalchemy from 1.1.9 to 1.1.11

* Update sqlalchemy from 1.1.9 to 1.1.11

* Update sphinx from 1.5.6 to 1.6.3

* Update cryptography from 1.8.1 to 1.9

* Update pytest from 3.0.7 to 3.1.3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant