Update AutoML to pass Woodwork DataTables to every pipeline/component (instead of pandas DataFrames) #1450

angela97lin · 2020-11-20T04:51:53Z

codecov · 2020-11-20T05:13:28Z

Codecov Report

Merging #1450 (788d7b9) into main (e0f65c1) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1450     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         222      222             
  Lines       14813    14891     +78     
=========================================
+ Hits        14806    14884     +78     
  Misses          7        7

Impacted Files	Coverage Δ
evalml/automl/automl_search.py	`99.7% <100.0%> (+0.1%)`	⬆️
evalml/objectives/fraud_cost.py	`100.0% <100.0%> (ø)`
evalml/objectives/lead_scoring.py	`100.0% <100.0%> (ø)`
evalml/objectives/objective_base.py	`100.0% <100.0%> (ø)`
evalml/pipelines/utils.py	`100.0% <100.0%> (ø)`
evalml/tests/automl_tests/test_automl.py	`100.0% <100.0%> (ø)`
...alml/tests/objective_tests/test_fraud_detection.py	`100.0% <100.0%> (ø)`
evalml/tests/objective_tests/test_lead_scoring.py	`100.0% <100.0%> (ø)`
...lml/tests/objective_tests/test_standard_metrics.py	`100.0% <100.0%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

angela97lin · 2020-11-20T20:28:44Z

evalml/automl/automl_search.py

        self._set_data_split(X)

        data_checks = self._validate_data_checks(data_checks)
-        self._data_check_results = data_checks.validate(X, y)
+        self._data_check_results = data_checks.validate(_convert_woodwork_types_wrapper(X.to_dataframe()), _convert_woodwork_types_wrapper(y.to_series()))


Data checks don't support Woodwork types yet (to be done in #1292 or a later PR), so for now this converts to pandas before passing the data to the data checks.
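For readers skimming the thread, here is a minimal sketch of the unwrap-before-validate idea described in this comment. It is not the evalml or Woodwork implementation: `FakeDataTable`, `validate_plain`, and `run_data_checks` are hypothetical stand-ins, and plain lists of dicts stand in for a pandas DataFrame so the snippet runs with the standard library alone.

```python
class FakeDataTable:
    """Hypothetical stand-in for a typed table: plain rows plus type metadata."""

    def __init__(self, rows, logical_types):
        self._rows = rows
        self.logical_types = logical_types

    def to_dataframe(self):
        # Return a copy of the plain underlying data, without the type metadata.
        # (A list of dicts stands in for a pandas DataFrame here.)
        return [dict(row) for row in self._rows]


def validate_plain(rows):
    """Hypothetical data check that only understands plain lists of dicts."""
    if not isinstance(rows, list):
        raise TypeError("data checks expect plain rows, not a typed table")
    return [
        f"row {i} has a missing value"
        for i, row in enumerate(rows)
        if None in row.values()
    ]


def run_data_checks(table):
    # Unwrap the typed container first, since the check cannot consume it directly.
    return validate_plain(table.to_dataframe())


table = FakeDataTable(
    rows=[{"a": 1, "b": None}, {"a": 2, "b": 3}],
    logical_types={"a": "Integer", "b": "Double"},
)
messages = run_data_checks(table)
```

The point of the design is that the conversion lives at the single call site that needs it, so the rest of the search can keep working with the typed table until the data checks gain native support.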

evalml/automl/automl_search.py

evalml/tests/objective_tests/test_fraud_detection.py

evalml/tests/automl_tests/test_automl.py

evalml/objectives/objective_base.py

angela97lin · 2020-11-20T21:43:29Z

evalml/objectives/lead_scoring.py

-
-        if not isinstance(y_true, pd.Series):
-            y_true = pd.Series(y_true)
+        y_true = self._standardize_input_type(y_true)


Updating to reuse our helper method, which also handles DataColumns.
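A rough sketch of what a shared input-standardizing helper buys over per-objective `isinstance` checks. The names below (`FakeDataColumn`, `standardize_input`) are hypothetical and do not reproduce evalml's `_standardize_input_type`; a plain list stands in for `pd.Series` so the example has no third-party dependencies.

```python
class FakeDataColumn:
    """Hypothetical stand-in for a typed column wrapper."""

    def __init__(self, values):
        self._values = list(values)

    def to_series(self):
        # Return a copy of the plain underlying values.
        # (A list stands in for a pandas Series here.)
        return list(self._values)


def standardize_input(y):
    """Coerce any supported input to one plain representation (a list)."""
    if hasattr(y, "to_series"):
        # Typed column wrapper: unwrap it.
        return y.to_series()
    # Lists, tuples, and other iterables: copy into a list.
    return list(y)


from_column = standardize_input(FakeDataColumn([1, 0, 1]))
from_tuple = standardize_input((1, 0, 1))
from_list = standardize_input([1, 0, 1])
```

With one helper, each objective can call it once at the top of its scoring method and then assume a single input type, rather than repeating the conversion logic in every objective.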

freddyaboulton

@angela97lin I think this is great! I noticed none of our automl tests use ww inputs. Not sure if the right thing to do is update all of our tests to use ww, add some ww-specific unit tests, or both, but I think we should do at least one before merge lol. In particular, I think we need to be careful that the user-defined types are preserved throughout.

evalml/automl/automl_search.py

evalml/tests/automl_tests/test_automl.py

evalml/objectives/fraud_cost.py

evalml/tests/automl_tests/test_automl.py

dsherry

@angela97lin exciting! Great stuff :)

I left some questions/comments, and one impl suggestion in ObjectiveBase

evalml/objectives/fraud_cost.py

evalml/tests/objective_tests/test_fraud_detection.py

evalml/objectives/objective_base.py

evalml/objectives/fraud_cost.py

evalml/objectives/objective_base.py

evalml/tests/automl_tests/test_automl.py

evalml/tests/objective_tests/test_standard_metrics.py

angela97lin added 5 commits November 19, 2020 18:39

init

8fe30ac

fix tests

8e205fb

oops, remove raise

6cf142b

comment out validate that expects only 1d data

e416c0e

remove raise again

df28c6f

angela97lin added this to the November 2020 milestone Nov 20, 2020

angela97lin self-assigned this Nov 20, 2020

angela97lin added 2 commits November 19, 2020 23:53

Merge branch 'main' into 1289_ww_automl

acff717

fix tests

e7ea594

angela97lin added 6 commits November 20, 2020 12:27

clean up fraud

1ce2b85

clean up fraud

8de87ed

remove all unnecessary code

16195d6

fix check

c8fab6a

fix metrics codecov

65e3540

more codecov

5a48cd9