Update pipelines and `make_pipelines` to accept Woodwork DataTables #1393

angela97lin · 2020-11-02T18:08:05Z

Addresses part of #1288 and #1367 by updating pipeline classes and `make_pipelines` to handle Woodwork data types. Components will be handled in a separate PR.

codecov · 2020-11-02T18:13:39Z

Codecov Report

Merging #1393 (af6bc93) into main (554102c) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1393     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         214      214             
  Lines       14040    14107     +67     
=========================================
+ Hits        14033    14100     +67     
  Misses          7        7

Impacted Files	Coverage Δ
evalml/pipelines/binary_classification_pipeline.py	`100.0% <ø> (ø)`
evalml/automl/automl_search.py	`99.7% <100.0%> (-<0.1%)`	⬇️
evalml/pipelines/classification_pipeline.py	`100.0% <100.0%> (ø)`
evalml/pipelines/pipeline_base.py	`100.0% <100.0%> (ø)`
evalml/pipelines/regression_pipeline.py	`100.0% <100.0%> (ø)`
evalml/pipelines/utils.py	`100.0% <100.0%> (ø)`
evalml/tests/automl_tests/test_automl.py	`100.0% <100.0%> (ø)`
...assification_pipeline_tests/test_classification.py	`100.0% <100.0%> (ø)`
...tests/regression_pipeline_tests/test_regression.py	`100.0% <100.0%> (ø)`
evalml/tests/pipeline_tests/test_pipelines.py	`100.0% <100.0%> (ø)`
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 554102c...af6bc93.

angela97lin · 2020-11-04T17:22:10Z

Note: we want warnings for users who pass in non-Woodwork data structures. Will either update this PR or put up one shortly after (depending on where reviews are) to address this.

EDIT: Per discussion with @dsherry, for now it is okay to warn only when users use AutoML, and not for individual pipelines/components. This aligns with our methodology that AutoML is smart and pipelines/components are not. Later, when we tackle #1289 (passing Woodwork data structures directly to pipelines), we could consider adding warnings for pipelines/components, since the warning would still trigger only once, in AutoMLSearch. However, Woodwork needs some work before that is viable for EvalML.
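For anyone skimming the thread, here is a rough sketch of the "warn in AutoML only" idea, not the actual EvalML code. `TypedTable`, `SketchPipeline`, and `SketchAutoMLSearch` are hypothetical stand-ins invented for illustration; the real check would test for Woodwork data structures.

```python
import warnings


class TypedTable:
    """Hypothetical stand-in for a Woodwork data structure."""

    def __init__(self, rows):
        self.rows = rows


class SketchPipeline:
    """Stand-in for a pipeline: accepts any input and never warns."""

    def fit(self, X, y):
        self.fitted = True
        return self


class SketchAutoMLSearch:
    """Stand-in for AutoML: warns once at the entry point, then fits pipelines."""

    def __init__(self, pipelines):
        self.pipelines = pipelines

    def search(self, X, y):
        # The check lives here only, so the user sees a single warning per
        # search no matter how many pipelines are fit afterwards.
        if not isinstance(X, TypedTable) or not isinstance(y, TypedTable):
            warnings.warn("Input is not a typed table; types will be inferred.", UserWarning)
        for pipeline in self.pipelines:
            pipeline.fit(X, y)
        return self
```

Calling `SketchPipeline().fit(...)` directly with plain lists stays silent, while `SketchAutoMLSearch(...).search(...)` with the same inputs emits one `UserWarning` regardless of how many pipelines it holds.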

freddyaboulton

@angela97lin I think this looks good! I am interested in your thoughts on whether we should convert back to pandas in make_pipeline!

freddyaboulton · 2020-11-05T00:06:04Z

evalml/tests/pipeline_tests/regression_pipeline_tests/test_regression.py

 def test_invalid_targets_regression_pipeline(target_type, dummy_regression_pipeline_class):
    X, y = load_wine()
-    if target_type == "categorical":
-        y = pd.Categorical(y)
+    if target_type == "category":


How come we need this change?

@freddyaboulton "category" is the Woodwork semantic tag that indicates a feature is categorical. This change is necessary because we're using Woodwork's feature types here instead of pandas data types.

It's confusing because "category" is also the physical type used by pandas, hence the usage on line 11 below.
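To illustrate the pandas half of that overlap, here is a minimal sketch assuming only pandas is installed. The toy target values are made up, and the Woodwork side is deliberately not shown.

```python
import pandas as pd

# A small integer target, standing in for the labels used in the test.
y = pd.Series([0, 1, 2, 1, 0])

# pandas' physical "category" dtype, reached by casting ...
y_cast = y.astype("category")

# ... or by wrapping the values in pd.Categorical, as the old test branch did.
y_wrapped = pd.Series(pd.Categorical(y))

print(y_cast.dtype)     # category
print(y_wrapped.dtype)  # category
```

In both cases the string form of the dtype is "category", the same word the comment above describes as the Woodwork tag, which is why the two are easy to mix up when reading the test.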

dsherry

@angela97lin amazing!

Blocking:

- Delete the pdb import from pipeline base
- Resolve comment in make_pipelines about not needing _convert_woodwork_types_wrapper
- Resolve @freddyaboulton comment about test_woodwork_classification_pipeline

dsherry · 2020-11-06T21:56:15Z

angela97lin added 2 commits October 31, 2020 17:05

init

168ef21

Merge branch 'main' into 1288_ww_pipelines_components

5becd73

angela97lin changed the title ~~Update pipelines and components to use Woodwork DataTables~~ Update pipelines and components to accept Woodwork DataTables Nov 2, 2020

angela97lin added 2 commits November 2, 2020 13:51

release notes and move to gen_utils

a08c772

add and fix tests for pipelines

d14404f

angela97lin self-assigned this Nov 2, 2020

angela97lin added this to the November 2020 milestone Nov 2, 2020

angela97lin added 6 commits November 2, 2020 15:58

update impl and tests

f1f258e

update notebook to add datetime featurization

ff017ac

update more notebooks

0a99beb

remove unnecessary line'

095eece

cleanup

7fe62a7

Merge branch 'main' into 1288_ww_pipelines_components

59a556a

angela97lin changed the title ~~Update pipelines and components to accept Woodwork DataTables~~ Update pipelines and make_pipelines to accept Woodwork DataTables Nov 3, 2020

angela97lin added 8 commits November 3, 2020 13:46

clean up and update make_pipeline_tests

e31610b

Merge branch '1288_ww_pipelines_components' of github.com:alteryx/evalml into 1288_ww_pipelines_components

b01fff8

remove unnecessary line

80c42bb

remove more unnecessary lines

ab956d2

linting

96ab214

fix notebook metadata

4a58589

combine two conditionals

e3b0b3c

cleanup

f4c2cc2

angela97lin marked this pull request as ready for review November 3, 2020 20:43

angela97lin requested review from dsherry, freddyaboulton, eccabay and bchen1116 and removed request for dsherry November 3, 2020 20:43

angela97lin requested review from christopherbunn and jeremyliweishih November 3, 2020 20:44

Merge branch 'main' into 1288_ww_pipelines_components

32aea1e

eccabay reviewed Nov 4, 2020

evalml/pipelines/pipeline_base.py (outdated, resolved)

evalml/pipelines/utils.py (outdated, resolved)

angela97lin mentioned this pull request Nov 4, 2020

Update pipeline and components to return Woodwork data structures #1406

Closed

Merge branch 'main' into 1288_ww_pipelines_components

867b361

angela97lin marked this pull request as draft November 4, 2020 22:27

angela97lin marked this pull request as ready for review November 4, 2020 23:14

Merge branch 'main' into 1288_ww_pipelines_components

c03e299

freddyaboulton approved these changes Nov 5, 2020

dsherry approved these changes Nov 6, 2020

angela97lin added 5 commits November 9, 2020 00:20

Merge branch 'main' into 1288_ww_pipelines_components

9fc8e58

clean up from PR comments

1fda734

mergine

b4dda5c

clean up some docstrings

3e09564

Merge branch 'main' into 1288_ww_pipelines_components

af6bc93

angela97lin merged commit d78d1f2 into main Nov 9, 2020

angela97lin deleted the 1288_ww_pipelines_components branch November 9, 2020 17:37

dsherry mentioned this pull request Nov 10, 2020

Timeseries regression pipeline #1418

Merged

angela97lin mentioned this pull request Nov 11, 2020

Automl fails with custom pipelines if data contains datetime feature(s) #1367

Closed

dsherry mentioned this pull request Nov 24, 2020

Release v0.16.0 #1468

Merged
