Update evalml from selecting using pandas dtypes to selecting using Woodwork logical types#1551
Conversation
…o ange_stacked_circleci
Codecov Report
@@ Coverage Diff @@
## main #1551 +/- ##
=======================================
Coverage 100.0% 100.0%
=======================================
Files 240 240
Lines 18270 18271 +1
=======================================
+ Hits 18262 18263 +1
Misses 8 8
Continue to review full report at Codecov.
|
dsherry
left a comment
There was a problem hiding this comment.
@angela97lin looks great! I left a few questions/suggestions
|
@dsherry I went through and addressed your comments! I deleted the categorical_ww_type and others in place of just selecting the correct logical type/semantic tag, but I kept |
dsherry
left a comment
There was a problem hiding this comment.
@angela97lin looks great! Sorry I didn't get to this before merge. I left some questions.
| X = _convert_woodwork_types_wrapper(X.to_dataframe()) | ||
| if y.logical_type not in numeric_and_boolean_ww: | ||
| return messages | ||
| X_num = X.select(include=numeric_and_boolean_ww) |
There was a problem hiding this comment.
Is there a reason to include boolean too? Perhaps this should just be X.select('numeric')
There was a problem hiding this comment.
I guess this should just be based on whether or not we want to / it makes sense to check for label leakage if the target is boolean :)
| top_n = self.parameters['top_n'] | ||
| X = _convert_to_woodwork_structure(X) | ||
| if self.features_to_encode is None: | ||
| self.features_to_encode = self._get_cat_cols(X) |
There was a problem hiding this comment.
Did moving this code change any behavior? Or was it just cleaner this way, but functionally equivalent?
There was a problem hiding this comment.
It shouldn't change behavior; the reason for moving this is because we want to take advantage of Woodwork so we want to pass in X as a Woodwork structure before converting it later :)
| X = _convert_to_woodwork_structure(X) | ||
| X = _convert_woodwork_types_wrapper(X.to_dataframe()) | ||
| self._date_time_col_names = X.select_dtypes(include=datetime_dtypes).columns | ||
| self._date_time_col_names = X.select(include=["datetime"]).columns |
There was a problem hiding this comment.
Can't do X.select('datetime').columns ? Just curious because we're doing that elsewhere for numeric
| """ | ||
| X = _convert_to_woodwork_structure(X) | ||
| y = _convert_to_woodwork_structure(y) | ||
| if "numeric" not in y.semantic_tags: |
There was a problem hiding this comment.
Is there an is_numeric util method in woodwork which we can use? Or is this the recommended way to do this check? @gsheni
There was a problem hiding this comment.
Yes, there is (thought it wasn't meant to be user facing).
https://github.com/alteryx/woodwork/blob/main/woodwork/datacolumn.py#L318
* First round * Removed RMSLE, MSLE, and MAPE from non_core_objectives * Add objective parameter to all data_check subclasses * Add new data_check_message_code for a target that is incompatible with an objective * Add check to invalid_target_data_check for RMSLE, MSLE, and MAPE * Invalid Target Data Check update * Fix test case to not include invalid objectives in regression_core_objectives * Test updates * Invalid target data check update * Lint * Release notes * Tests and updates to code * Release notes * Latest Dependencies Version update * Lint fix * Dependency check * Fixing minor issues * test none objective update * Update to reflect changes in #1597 * Make objective mandatory for DefaultDataChecks and InvalidTargetDataCheck * Mock data check to see if objective is being passed from AutoML * Fix breaking tests due to mandatory objective param in DefaultDataChecks, and check error output from DefaultDataChecks * Change label to all positive so RMSLE, MSLE, and MAPE can be included in regression_core_objectives * Raise error for None objective in InvalidTargetDataCheck * Docstring example fix * Lint and docstring example * Jupyter notebook update for build_docs * Jupyter notebook update * remove print statement * Added test cases and docstring, updated error message * test case fixes * Test update * lint error * Add positive_only class attribute for ObjectiveBase * Adding @classproperty to ObjectiveBase for positive_only check * Lint error * Fix conflicts with #1551 * Lint
Closes #1290