
Adds util functions needed for dynamic preprocessing pipelines #852

Merged: 44 commits merged into master on Jun 18, 2020

Conversation

angela97lin
Contributor

@angela97lin angela97lin commented Jun 15, 2020

Closes #843

Questions:

  • We could do a lot more checks, such as checking target values and determining which pipeline to generate automatically, or at the very least verifying that everything makes sense. That said, this would mirror our checks elsewhere; should we duplicate them here too?
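The target-value check floated above could look something like the sketch below. `infer_problem_type` and its thresholds are hypothetical illustrations, not EvalML's actual API:

```python
import pandas as pd

def infer_problem_type(y):
    """Hypothetical sketch: inspect target values to guess which
    pipeline type to generate. The 10-unique-value cutoff is an
    arbitrary illustration, not an EvalML rule."""
    y = pd.Series(y)
    # Many distinct numeric values suggests a continuous target
    if pd.api.types.is_numeric_dtype(y) and y.nunique() > 10:
        return "regression"
    if y.nunique() == 2:
        return "binary"
    return "multiclass"
```

For example, `infer_problem_type([0, 1, 0, 1])` returns `"binary"`, while a target with 100 distinct numeric values would be treated as `"regression"`.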

@codecov

codecov bot commented Jun 15, 2020

Codecov Report

Merging #852 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff            @@
##           master     #852    +/-   ##
========================================
  Coverage   99.70%   99.70%            
========================================
  Files         195      195            
  Lines        8000     8181   +181     
========================================
+ Hits         7976     8157   +181     
  Misses         24       24            
| Impacted Files | Coverage Δ |
|---|---|
| evalml/pipelines/__init__.py | 100.00% <ø> (ø) |
| evalml/pipelines/utils.py | 100.00% <100.00%> (ø) |
| evalml/tests/component_tests/test_components.py | 100.00% <100.00%> (ø) |
| evalml/tests/pipeline_tests/test_pipelines.py | 99.80% <100.00%> (+0.05%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d220dca...e306216.

@angela97lin angela97lin self-assigned this Jun 16, 2020
@angela97lin angela97lin marked this pull request as ready for review June 16, 2020 18:59
Contributor

@dsherry dsherry left a comment


This is exciting! 🎊 great stuff. Left some suggestions and docs tweaks, but nothing blocking merge IMO!

We should resolve the convo about adding problem_type to get_preprocessing_pipeline sometime soon though (before the release).

Ideas for more unit test coverage:

  • Check input and output dtypes: does make_pipeline work with np.array as input? With a pd.DataFrame that has no column names?
  • Validate that we throw if the provided estimator isn't an Estimator subclass (I think you already have this)

That's all I could think of right now!

Food for thought for the future: we will likely end up supporting different versions of the preprocessing functions. Some may even be in external modules. We don't need to worry about this in this PR, but we should consider: what could we do to make backwards compatibility easy? One idea: add a "preprocessing_version" string input, which we could use to map to the impl. Another idea: keeping get_preprocessing_pipeline private could help us have more flexibility.
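The "preprocessing_version" idea above could be sketched as a map from version string to implementation, so that pipelines saved under an older default keep reconstructing the same components. All names below are illustrative, not EvalML's API:

```python
# Hypothetical sketch of the "preprocessing_version" idea: map a version
# string to a pipeline-builder implementation so older pipelines remain
# reproducible after the default preprocessing changes.

def _preprocessing_v1(column_names):
    # v1: impute, then one-hot encode
    return ["Simple Imputer", "One Hot Encoder"]

def _preprocessing_v2(column_names):
    # v2 (hypothetical): adds a datetime featurizer before encoding
    return ["Simple Imputer", "DateTime Featurizer", "One Hot Encoder"]

_PREPROCESSING_IMPLS = {
    "v1": _preprocessing_v1,
    "v2": _preprocessing_v2,
}

def get_preprocessing_components(column_names, preprocessing_version="v2"):
    """Resolve a version string to a concrete preprocessing implementation."""
    try:
        impl = _PREPROCESSING_IMPLS[preprocessing_version]
    except KeyError:
        raise ValueError(
            f"Unknown preprocessing_version: {preprocessing_version!r}")
    return impl(column_names)
```

Keeping the resolver private would preserve the flexibility to change the mapping (or the default version) without a breaking API change.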

Review threads on evalml/pipelines/utils.py and evalml/tests/pipeline_tests/test_pipelines.py (resolved).
Contributor

@kmax12 kmax12 left a comment


looks solid to me

Review threads on evalml/pipelines/utils.py (resolved).
@kmax12
Contributor

kmax12 commented Jun 17, 2020

> Another idea: keeping get_preprocessing_pipeline private could help us have more flexibility.

I agree with @dsherry here. We can make this public later if we choose, especially since we have make_pipeline(X, y, estimator, problem_type), which is cool to expose.
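As a rough illustration of what a make_pipeline(X, y, estimator, problem_type)-style helper does, here is a self-contained sketch; the Estimator stand-in, component names, and dtype logic are assumptions for illustration, not EvalML's implementation:

```python
import numpy as np
import pandas as pd

class Estimator:
    """Stand-in for EvalML's Estimator base class (illustrative only)."""

class RandomForestClassifier(Estimator):
    pass

def make_pipeline_components(X, y, estimator, problem_type):
    """Sketch: validate the estimator, coerce array input to a DataFrame,
    and pick preprocessing components from the column dtypes.
    problem_type would select the pipeline base class; it is unused here."""
    if not (isinstance(estimator, type) and issubclass(estimator, Estimator)):
        raise ValueError("estimator must be an Estimator subclass")
    if not isinstance(X, pd.DataFrame):
        X = pd.DataFrame(X)  # np.array input gets integer column names
    components = ["Simple Imputer"]
    categorical = X.select_dtypes(include=["object", "category"]).columns
    if len(categorical):
        components.append("One Hot Encoder")
    components.append(estimator.__name__)
    return components
```

This also shows the two test ideas from the review: np.array input is accepted via the DataFrame coercion, and a non-Estimator argument raises a ValueError.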

Contributor

@kmax12 kmax12 left a comment


LGTM

Contributor

@dsherry dsherry left a comment


@angela97lin thanks for re-requesting. I left a comment in response on all_estimators. Once that's resolved, LGTM.

Review threads on evalml/pipelines/__init__.py, evalml/pipelines/utils.py, and evalml/tests/pipeline_tests/test_pipelines.py (resolved).
@angela97lin angela97lin merged commit 2a56753 into master Jun 18, 2020
@angela97lin angela97lin mentioned this pull request Jun 30, 2020
@angela97lin angela97lin deleted the 843_preprocessing_utils branch September 24, 2020 15:01
Linked issue (closed by this PR): Add new utility functions necessary for generating dynamic preprocessing pipelines
3 participants