Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update pipelines and make_pipelines to accept Woodwork DataTables #1393

Merged
merged 26 commits into from
Nov 9, 2020

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Nov 2, 2020

Addresses of #1288 and #1367 to update pipeline classes and make_pipelines to handle Woodwork data types. Components will be handled in a separate PR.

@angela97lin angela97lin changed the title Update pipelines and components to use Woodwork DataTables Update pipelines and components to accept Woodwork DataTables Nov 2, 2020
@codecov
Copy link

codecov bot commented Nov 2, 2020

Codecov Report

Merging #1393 (af6bc93) into main (554102c) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1393     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         214      214             
  Lines       14040    14107     +67     
=========================================
+ Hits        14033    14100     +67     
  Misses          7        7             
Impacted Files Coverage Δ
evalml/pipelines/binary_classification_pipeline.py 100.0% <ø> (ø)
evalml/automl/automl_search.py 99.7% <100.0%> (-<0.1%) ⬇️
evalml/pipelines/classification_pipeline.py 100.0% <100.0%> (ø)
evalml/pipelines/pipeline_base.py 100.0% <100.0%> (ø)
evalml/pipelines/regression_pipeline.py 100.0% <100.0%> (ø)
evalml/pipelines/utils.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py 100.0% <100.0%> (ø)
...assification_pipeline_tests/test_classification.py 100.0% <100.0%> (ø)
...tests/regression_pipeline_tests/test_regression.py 100.0% <100.0%> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 100.0% <100.0%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 554102c...af6bc93. Read the comment docs.

@angela97lin angela97lin self-assigned this Nov 2, 2020
@angela97lin angela97lin added this to the November 2020 milestone Nov 2, 2020
@angela97lin angela97lin changed the title Update pipelines and components to accept Woodwork DataTables Update pipelines and make_pipelines to accept Woodwork DataTables Nov 3, 2020
@angela97lin angela97lin marked this pull request as ready for review November 3, 2020 20:43
@angela97lin angela97lin requested review from dsherry, freddyaboulton, eccabay and bchen1116 and removed request for dsherry November 3, 2020 20:43
evalml/pipelines/pipeline_base.py Outdated Show resolved Hide resolved
evalml/pipelines/utils.py Outdated Show resolved Hide resolved
@angela97lin
Copy link
Contributor Author

angela97lin commented Nov 4, 2020

Note: we want warnings for users who pass in non-Woodwork data structures. Will either update this PR or put up one shortly after (depending on where reviews are) to address this.

EDIT: Per discussion with @dsherry For now, it is okay to just warn when users use AutoML, and not for individual pipelines/components. This aligns with our methodology that AutoML is smart, pipelines/components are not. Later, when we tackle #1289 (passing Woodwork data structures directly to pipelines), we could think about adding in warnings for pipelines/components as we will trigger the warning only once in AutoMLSearch, but there needs to be some work in Woodwork to make that a viable thing for EvalML.

@angela97lin angela97lin marked this pull request as draft November 4, 2020 22:27
@angela97lin angela97lin marked this pull request as ready for review November 4, 2020 23:14
Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angela97lin I think this looks good! I am interested in your thoughts on whether we should convert back to pandas in make_pipeline!

def test_invalid_targets_regression_pipeline(target_type, dummy_regression_pipeline_class):
X, y = load_wine()
if target_type == "categorical":
y = pd.Categorical(y)
if target_type == "category":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we need this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@freddyaboulton "category" is the woodwork "semantic tag" applied to indicate a feature is categorical. So, this change is necessary because we're using woodwork's feature types here instead of pandas data types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its confusing because "category" is also the physical type used by pandas, hence the usage on line 11 below.

Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angela97lin amazing!

Blocking:

  • delete pdb import from pipeline base
  • Resolve comment in make_pipelines about not needing _convert_woodwork_types_wrapper
  • Resolve @freddyaboulton comment about test_woodwork_classification_pipeline

docs/source/user_guide/objectives.ipynb Show resolved Hide resolved
evalml/automl/automl_search.py Show resolved Hide resolved
evalml/pipelines/classification_pipeline.py Show resolved Hide resolved
evalml/pipelines/classification_pipeline.py Show resolved Hide resolved
def test_invalid_targets_regression_pipeline(target_type, dummy_regression_pipeline_class):
X, y = load_wine()
if target_type == "categorical":
y = pd.Categorical(y)
if target_type == "category":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its confusing because "category" is also the physical type used by pandas, hence the usage on line 11 below.

evalml/tests/pipeline_tests/test_pipelines.py Show resolved Hide resolved
evalml/tests/pipeline_tests/test_pipelines.py Show resolved Hide resolved
evalml/tests/pipeline_tests/test_pipelines.py Show resolved Hide resolved
evalml/utils/gen_utils.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants