-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update evalml from selecting using pandas dtypes to selecting using Woodwork logical types #1551
Conversation
…o ange_stacked_circleci
Codecov Report
@@ Coverage Diff @@
## main #1551 +/- ##
=======================================
Coverage 100.0% 100.0%
=======================================
Files 240 240
Lines 18270 18271 +1
=======================================
+ Hits 18262 18263 +1
Misses 8 8
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin looks great! I left a few questions/suggestions
evalml/pipelines/components/transformers/imputers/simple_imputer.py
Outdated
Show resolved
Hide resolved
@dsherry I went through and addressed your comments! I deleted the categorical_ww_type and others in place of just selecting the correct logical type/semantic tag, but I kept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin looks great! Sorry I didn't get to this before merge. I left some questions.
X = _convert_woodwork_types_wrapper(X.to_dataframe()) | ||
if y.logical_type not in numeric_and_boolean_ww: | ||
return messages | ||
X_num = X.select(include=numeric_and_boolean_ww) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to include boolean too? Perhaps this should just be X.select('numeric')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this should just be based on whether or not we want to / it makes sense to check for label leakage if the target is boolean :)
|
||
def fit(self, X, y=None): | ||
top_n = self.parameters['top_n'] | ||
X = _convert_to_woodwork_structure(X) | ||
if self.features_to_encode is None: | ||
self.features_to_encode = self._get_cat_cols(X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did moving this code change any behavior? Or was it just cleaner this way, but functionally equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't change behavior; the reason for moving this is because we want to take advantage of Woodwork so we want to pass in X as a Woodwork structure before converting it later :)
@@ -77,8 +76,7 @@ def __init__(self, features_to_extract=None, encode_as_categories=False, random_ | |||
|
|||
def fit(self, X, y=None): | |||
X = _convert_to_woodwork_structure(X) | |||
X = _convert_woodwork_types_wrapper(X.to_dataframe()) | |||
self._date_time_col_names = X.select_dtypes(include=datetime_dtypes).columns | |||
self._date_time_col_names = X.select(include=["datetime"]).columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't do X.select('datetime').columns
? Just curious because we're doing that elsewhere for numeric
@@ -26,9 +25,9 @@ def fit(self, X, y): | |||
""" | |||
X = _convert_to_woodwork_structure(X) | |||
y = _convert_to_woodwork_structure(y) | |||
if "numeric" not in y.semantic_tags: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an is_numeric
util method in woodwork which we can use? Or is this the recommended way to do this check? @gsheni
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is (thought it wasn't meant to be user facing).
https://github.com/alteryx/woodwork/blob/main/woodwork/datacolumn.py#L318
* First round * Removed RMSLE, MSLE, and MAPE from non_core_objectives * Add objective parameter to all data_check subclasses * Add new data_check_message_code for a target that is incompatible with an objective * Add check to invalid_target_data_check for RMSLE, MSLE, and MAPE * Invalid Target Data Check update * Fix test case to not include invalid objectives in regression_core_objectives * Test updates * Invalid target data check update * Lint * Release notes * Tests and updates to code * Release notes * Latest Dependencies Version update * Lint fix * Dependency check * Fixing minor issues * test none objective update * Update to reflect changes in #1597 * Make objective mandatory for DefaultDataChecks and InvalidTargetDataCheck * Mock data check to see if objective is being passed from AutoML * Fix breaking tests due to mandatory objective param in DefaultDataChecks, and check error output from DefaultDataChecks * Change label to all positive so RMSLE, MSLE, and MAPE can be included in regression_core_objectives * Raise error for None objective in InvalidTargetDataCheck * Docstring example fix * Lint and docstring example * Jupyter notebook update for build_docs * Jupyter notebook update * remove print statement * Added test cases and docstring, updated error message * test case fixes * Test update * lint error * Add positive_only class attribute for ObjectiveBase * Adding @classproperty to ObjectiveBase for positive_only check * Lint error * Fix conflicts with #1551 * Lint
Closes #1290