New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow users to set feature types without having to learn about woodwork directly #1555
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1555 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 240 240
Lines 18092 18120 +28
=========================================
+ Hits 18084 18112 +28
Misses 8 8
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good! I like the example Dylan gave in the issue and it should be good for user experience to keep it to one method. Can you add something in the docs to show how to use this? Might also want to add this to the API reference.
@angela97lin I'm excited to review this! Could we please also add this to the docs? I think it should go on the start page, and in the automl guide :) I had left some thoughts in the issue. |
For others reading, here's the example usage from the issue: X, y = infer_feature_types(X, y, feature_types={...})
automl.search(X, y)
pipeline = automl.get_pipeline(42)
pipeline.fit(X, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I do think it would be nice if the types (string vs dict) were the same for 1D and 2D datatables for the sake of consistency, but in terms of ease, I think this is easier for the user. I'm good with this implementation!
@dsherry I see that you were considering adding this util method to the start page as well as the automl guide. I've added it to the automl guide, but when trying to add to the start page, wondered if it would make it too clunky / if it's necessary since we want the start page to be the most minimal example possible. Our current data set for the start page / automl guide used the breast cancer data set, which is all numeric, so I changed it to the fraud data set. That being said, I would love your thoughts! |
Going to merge this in, since it seems like we're okay with the API, so that it's available in the next release. If there are any further comments about it / any improvements we want to make to the documents, I'd be happy to put up another PR |
Closes #1545
I currently have a function that will accept different inputs for
feature_types
depending on the input. If the input is 2d, we expect a dictionary mapping col name to woodwork logical type string. If the input is 1d, we expect a woodwork logical type or string equivalent.This seems the easiest to users, but I'm not sure this is the cleanest impl. Should this be two separate methods instead? Should we enforce a dictionary for 1d too, and force users to pass the name of the 1d input?
Would love thoughts before pushing forward on adding more tests :d
Docs here:https://evalml.alteryx.com/en/1545_infer_feature_types/user_guide/automl.html#AutoML-in-EvalML