Update pipelines and make_pipelines to accept Woodwork DataTables #1393
Codecov Report
@@           Coverage Diff            @@
##            main    #1393    +/-    ##
=========================================
+ Coverage   100.0%   100.0%    +0.1%
=========================================
  Files         214      214
  Lines       14040    14107      +67
=========================================
+ Hits        14033    14100      +67
  Misses          7        7
Continue to review full report at Codecov.
Note: we want to warn users who pass in non-Woodwork data structures. I will either update this PR or put up a follow-up shortly after (depending on where reviews are) to address this.

EDIT: Per discussion with @dsherry, for now it is okay to warn only when users use AutoML, and not for individual pipelines/components. This aligns with our methodology that AutoML is smart and pipelines/components are not. Later, when we tackle #1289 (passing Woodwork data structures directly to pipelines), we could think about adding warnings for pipelines/components, since we will trigger the warning only once in AutoMLSearch. However, some work is needed in Woodwork to make that viable for EvalML.
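The warning policy above (warn at the AutoML level, stay silent in pipelines and components) could look roughly like the sketch below. It is only an illustration, not EvalML's code: `SketchAutoMLSearch`, `SketchPipeline` and `_is_woodwork` are hypothetical names, and the duck-typed check assumes Woodwork objects expose `to_dataframe()`/`to_series()` while plain pandas or list inputs do not.

```python
import warnings


def _is_woodwork(data):
    # Hypothetical duck-typed check: Woodwork DataTable/DataColumn objects are
    # assumed to expose to_dataframe()/to_series(); pandas objects do not.
    return hasattr(data, "to_dataframe") or hasattr(data, "to_series")


class SketchPipeline:
    """Hypothetical pipeline: accepts any input and never warns."""

    def fit(self, X, y):
        self.n_rows_ = len(X)
        return self


class SketchAutoMLSearch:
    """Hypothetical stand-in for AutoMLSearch: the "smart" entry point warns."""

    def search(self, X, y):
        if not (_is_woodwork(X) and _is_woodwork(y)):
            warnings.warn(
                "X and y are not Woodwork data structures; "
                "consider passing Woodwork DataTable/DataColumn objects.",
                UserWarning,
            )
        # Pipelines are fit silently, so the warning fires once per search.
        return SketchPipeline().fit(X, y)
```

Keeping the check at the search entry point means a user sees one warning per search rather than one per pipeline fit.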
@angela97lin I think this looks good! I am interested in your thoughts on whether we should convert back to pandas in make_pipeline!
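One way to frame the "convert back to pandas in make_pipeline" question is sketched below. The function converts the input to pandas once, then chooses components by inspecting pandas dtypes. This is a hypothetical illustration. `sketch_make_pipeline`, `_to_pandas`, and the component-name strings are placeholders, not EvalML's API, and the helper assumes a Woodwork DataTable exposes `to_dataframe()`.

```python
import pandas as pd


def _to_pandas(X):
    # Hypothetical helper: unwrap objects exposing to_dataframe() (as Woodwork
    # DataTables are assumed to); otherwise coerce to a pandas DataFrame.
    if hasattr(X, "to_dataframe"):
        return X.to_dataframe()
    return pd.DataFrame(X)


def sketch_make_pipeline(X, problem_type):
    """Hypothetical make_pipeline: pick component names from pandas dtypes."""
    X_pd = _to_pandas(X)
    components = []
    # Add an imputer only if some column has missing values.
    if X_pd.isnull().any().any():
        components.append("Imputer")
    # Add an encoder only if some column is categorical or object-typed.
    if len(X_pd.select_dtypes(include=["category", "object"]).columns) > 0:
        components.append("One Hot Encoder")
    if problem_type in ("binary", "multiclass"):
        components.append("Random Forest Classifier")
    else:
        components.append("Random Forest Regressor")
    return components
```

Converting once at the top keeps the dtype logic pandas-only, so Woodwork and pandas inputs go through the same code path. The alternative is to read Woodwork's logical types directly, which this sketch does not attempt.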
evalml/tests/pipeline_tests/classification_pipeline_tests/test_classification.py
def test_invalid_targets_regression_pipeline(target_type, dummy_regression_pipeline_class):
    X, y = load_wine()
    if target_type == "categorical":
        y = pd.Categorical(y)
    if target_type == "category":
How come we need this change?
@freddyaboulton "category" is the Woodwork "semantic tag" applied to indicate that a feature is categorical. This change is necessary because we're using Woodwork's feature types here instead of pandas data types.
It's confusing because "category" is also the physical type used by pandas, hence the usage on line 11 below.
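The pandas side of the overloaded name can be shown directly. `pd.Categorical` builds a `Categorical` array, and the dtype it and a `category`-cast Series carry is literally named `"category"`. This is a pandas-only illustration, and the Woodwork semantic tag is not shown here.

```python
import pandas as pd

y = [0, 1, 2, 1]

# Old test branch: pd.Categorical builds a Categorical array, not a Series.
y_categorical = pd.Categorical(y)

# pandas' physical dtype for categorical data is named "category".
y_series = pd.Series(y).astype("category")

print(type(y_categorical).__name__)  # Categorical
print(y_series.dtype)                # category
print(y_series.dtype == "category")  # True
```

So a string like `"category"` in a test can refer either to the pandas dtype or to the Woodwork semantic tag, depending on which API consumes it.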
@angela97lin amazing!
Blocking:
- Delete the pdb import from pipeline base
- Resolve the comment in make_pipelines about not needing _convert_woodwork_types_wrapper
- Resolve @freddyaboulton's comment about test_woodwork_classification_pipeline
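Some context on the `_convert_woodwork_types_wrapper` item: a helper of that kind is assumed to map pandas nullable extension dtypes, which Woodwork can produce, back to NumPy-backed dtypes that scikit-learn-style components expect. The sketch below reflects that assumption and is not EvalML's implementation. The name `convert_nullable_dtypes` and the exact mapping are hypothetical.

```python
import pandas as pd

# Hypothetical mapping from pandas nullable extension dtypes to NumPy-backed dtypes.
_NULLABLE_TO_NUMPY = {
    "Int64": "int64",
    "boolean": "bool",
    "string": "object",
}


def convert_nullable_dtypes(df):
    """Return a copy of df with nullable extension dtypes mapped to NumPy dtypes.

    To keep the sketch simple, columns that still contain missing values are
    left untouched (int64 and bool cannot hold pd.NA).
    """
    out = df.copy()
    for col in out.columns:
        target = _NULLABLE_TO_NUMPY.get(str(out[col].dtype))
        if target is not None and not out[col].isna().any():
            out[col] = out[col].astype(target)
    return out
```

If make_pipelines only inspects column types and never hands data to a component, a conversion step like this is arguably unnecessary there, which matches the spirit of the review comment.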
Addresses #1288 and #1367 by updating pipeline classes and make_pipelines to handle Woodwork data types. Components will be handled in a separate PR.
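The split described here, where pipelines accept Woodwork inputs while component support comes later, can be pictured as a pipeline that converts once at its public boundary and hands plain pandas objects to its components. This is a minimal sketch under that assumption. `BoundaryConvertingPipeline` and the two helpers are hypothetical, and Woodwork objects are assumed to expose `to_dataframe()`/`to_series()`.

```python
import pandas as pd


def _to_pandas_frame(X):
    # Hypothetical: unwrap objects exposing to_dataframe() (as Woodwork
    # DataTables are assumed to), else coerce to a pandas DataFrame.
    return X.to_dataframe() if hasattr(X, "to_dataframe") else pd.DataFrame(X)


def _to_pandas_series(y):
    # Same idea for targets: Woodwork DataColumns are assumed to expose to_series().
    return y.to_series() if hasattr(y, "to_series") else pd.Series(y)


class BoundaryConvertingPipeline:
    """Hypothetical pipeline that converts inputs once, at fit/predict time.

    Transformers are plain callables that only ever see pandas DataFrames.
    """

    def __init__(self, transformers):
        self.transformers = transformers

    def _transform(self, X_pd):
        for transform in self.transformers:
            X_pd = transform(X_pd)
        return X_pd

    def fit(self, X, y):
        X_pd = self._transform(_to_pandas_frame(X))
        y_pd = _to_pandas_series(y)
        self.n_features_ = X_pd.shape[1]
        # Placeholder "estimator": remember the training-set target mean.
        self.mean_target_ = float(y_pd.mean())
        return self

    def predict(self, X):
        X_pd = self._transform(_to_pandas_frame(X))
        return pd.Series([self.mean_target_] * len(X_pd), index=X_pd.index)
```

Because the conversion lives in the pipeline, components can be migrated to Woodwork later without changing how callers pass data in.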