Update AutoML to pass Woodwork DataTables to every pipeline/component (instead of pandas DataFrames) #1450

Merged 22 commits on Nov 23, 2020

angela97lin
Contributor

Closes #1289

@angela97lin angela97lin added this to the November 2020 milestone Nov 20, 2020
@angela97lin angela97lin self-assigned this Nov 20, 2020
codecov bot commented Nov 20, 2020

Codecov Report

Merging #1450 (788d7b9) into main (e0f65c1) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1450     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         222      222             
  Lines       14813    14891     +78     
=========================================
+ Hits        14806    14884     +78     
  Misses          7        7             
| Impacted Files | Coverage Δ |
|---|---|
| evalml/automl/automl_search.py | 99.7% <100.0%> (+0.1%) ⬆️ |
| evalml/objectives/fraud_cost.py | 100.0% <100.0%> (ø) |
| evalml/objectives/lead_scoring.py | 100.0% <100.0%> (ø) |
| evalml/objectives/objective_base.py | 100.0% <100.0%> (ø) |
| evalml/pipelines/utils.py | 100.0% <100.0%> (ø) |
| evalml/tests/automl_tests/test_automl.py | 100.0% <100.0%> (ø) |
| ...alml/tests/objective_tests/test_fraud_detection.py | 100.0% <100.0%> (ø) |
| evalml/tests/objective_tests/test_lead_scoring.py | 100.0% <100.0%> (ø) |
| ...lml/tests/objective_tests/test_standard_metrics.py | 100.0% <100.0%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

  self._set_data_split(X)

  data_checks = self._validate_data_checks(data_checks)
- self._data_check_results = data_checks.validate(X, y)
+ self._data_check_results = data_checks.validate(_convert_woodwork_types_wrapper(X.to_dataframe()), _convert_woodwork_types_wrapper(y.to_series()))
Contributor Author

Data checks don't support Woodwork types yet (to be done in #1292 or a later PR), so we convert to pandas before passing the data to the data checks.
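The boundary conversion described in this comment can be sketched roughly as follows. This is a minimal, library-free illustration and not EvalML's implementation: `TableStandIn`, `ColumnStandIn`, `validate_plain`, and `run_data_checks` are hypothetical names, plain dicts and lists stand in for pandas objects, and the `_convert_woodwork_types_wrapper` step from the diff is left out.

```python
class TableStandIn:
    """Hypothetical stand-in for a Woodwork DataTable (only to_dataframe() is modeled)."""

    def __init__(self, columns):
        self._columns = {name: list(values) for name, values in columns.items()}

    def to_dataframe(self):
        # A dict of lists plays the role of a pandas DataFrame in this sketch.
        return {name: list(values) for name, values in self._columns.items()}


class ColumnStandIn:
    """Hypothetical stand-in for a Woodwork DataColumn (only to_series() is modeled)."""

    def __init__(self, values):
        self._values = list(values)

    def to_series(self):
        # A list plays the role of a pandas Series in this sketch.
        return list(self._values)


def validate_plain(X, y):
    """Hypothetical data check that only understands the unwrapped containers."""
    if not isinstance(X, dict) or not isinstance(y, list):
        raise TypeError("data checks expect unwrapped inputs")
    return [
        f"column '{name}' has {len(values)} rows, expected {len(y)}"
        for name, values in X.items()
        if len(values) != len(y)
    ]


def run_data_checks(X, y):
    # Unwrap once at the boundary so the checks never see the wrapper types.
    return validate_plain(X.to_dataframe(), y.to_series())


X = TableStandIn({"a": [1, 2, 3], "b": [4, 5]})
y = ColumnStandIn([0, 1, 0])
messages = run_data_checks(X, y)
```

Keeping the unwrapping in one place means that, once data checks accept Woodwork inputs directly, only `run_data_checks` in this sketch would need to change.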


- if not isinstance(y_true, pd.Series):
-     y_true = pd.Series(y_true)
+ y_true = self._standardize_input_type(y_true)
Contributor Author

Updating to reuse our helper method, which also handles DataColumns.
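As a rough illustration of why a single standardizing helper is preferable to per-call `isinstance` checks, here is a small sketch. It is not the actual `_standardize_input_type` method: `ColumnStandIn`, `standardize_input`, and `count_matches` are hypothetical names, and plain lists stand in for pandas Series so the example runs without any third-party library.

```python
class ColumnStandIn:
    """Hypothetical stand-in for a Woodwork DataColumn (only to_series() is modeled)."""

    def __init__(self, values):
        self._values = list(values)

    def to_series(self):
        # A list plays the role of a pandas Series in this sketch.
        return list(self._values)


def standardize_input(data):
    """Hypothetical helper: return a plain list for any supported input."""
    if hasattr(data, "to_series"):
        # Wrapped column: unwrap it first.
        data = data.to_series()
    if isinstance(data, (str, bytes)):
        raise TypeError("expected a sequence of labels, not a string")
    return list(data)


def count_matches(y_true, y_predicted):
    # Normalize both inputs in one place instead of repeating isinstance checks.
    y_true = standardize_input(y_true)
    y_predicted = standardize_input(y_predicted)
    return sum(t == p for t, p in zip(y_true, y_predicted))


# A wrapped column and a tuple are both accepted by the same code path.
score = count_matches(ColumnStandIn([1, 0, 1]), (1, 1, 1))
```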

Contributor

@freddyaboulton freddyaboulton left a comment

@angela97lin I think this is great! I noticed none of our automl tests use ww inputs. Not sure if the right thing to do is update all of our tests to use ww, add some ww-specific unit tests, or both, but I think we should do at least one before merge lol. In particular, I think we need to be careful that the user-defined types are preserved throughout.
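One possible shape for a "types are preserved" check is sketched below. It is only an illustration under stated assumptions: `TypedTableStandIn`, `scaling_component`, `lossy_component`, and `types_preserved` are hypothetical names, and the string `"Categorical"` is a placeholder for whatever type the user assigned, not a real Woodwork object.

```python
from dataclasses import dataclass, field


@dataclass
class TypedTableStandIn:
    """Hypothetical stand-in for a typed table: column data plus user-chosen types."""

    data: dict
    logical_types: dict = field(default_factory=dict)


def scaling_component(table):
    """Hypothetical component that changes values but carries the types forward."""
    scaled = {name: [v * 2 for v in values] for name, values in table.data.items()}
    return TypedTableStandIn(scaled, dict(table.logical_types))


def lossy_component(table):
    """Hypothetical component that rebuilds the table and drops the types."""
    return TypedTableStandIn(dict(table.data))


def types_preserved(component, table):
    # Snapshot the declared types, run the component, then compare.
    expected = dict(table.logical_types)
    return component(table).logical_types == expected


table = TypedTableStandIn({"zip_code": [2134, 2135]}, {"zip_code": "Categorical"})
```

A real test would run the same comparison against each pipeline or component that AutoML touches, using actual Woodwork inputs in place of the stand-in.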

Contributor

@dsherry dsherry left a comment

@angela97lin exciting! Great stuff :)

I left some questions/comments, and one implementation suggestion in ObjectiveBase.

@angela97lin angela97lin merged commit 2ddca58 into main Nov 23, 2020
@angela97lin angela97lin deleted the 1289_ww_automl branch November 23, 2020 21:22
@dsherry dsherry mentioned this pull request Nov 24, 2020