Allow users to set feature types without having to learn about woodwork directly #1555
angela97lin merged 20 commits into main
Codecov Report
@@ Coverage Diff @@
## main #1555 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 240 240
Lines 18092 18120 +28
=========================================
+ Hits 18084 18112 +28
Misses 8 8
jeremyliweishih left a comment:
I think this looks good! I like the example Dylan gave in the issue and it should be good for user experience to keep it to one method. Can you add something in the docs to show how to use this? Might also want to add this to the API reference.
@angela97lin I'm excited to review this! Could we please also add this to the docs? I think it should go on the start page, and in the automl guide :) I had left some thoughts in the issue.
For others reading, here's the example usage from the issue:

    X, y = infer_feature_types(X, y, feature_types={...})
    automl.search(X, y)
    pipeline = automl.get_pipeline(42)
    pipeline.fit(X, y)
bchen1116 left a comment:
LGTM! I do think it would be nice if the types (string vs dict) were the same for 1D and 2D datatables for the sake of consistency, but in terms of ease, I think this is easier for the user. I'm good with this implementation!
@dsherry I see that you were considering adding this util method to the start page as well as the automl guide. I've added it to the automl guide, but when trying to add it to the start page, I wondered whether it would make the page too clunky, and whether it's necessary at all, since we want the start page to be the most minimal example possible. Our current data set for the start page / automl guide was the breast cancer data set, which is all numeric, so I changed it to the fraud data set. That being said, I would love your thoughts!
Going to merge this in, since it seems like we're okay with the API, so that it's available in the next release. If there are any further comments about it / any improvements we want to make to the documents, I'd be happy to put up another PR 😁
Closes #1545
I currently have a function that will accept different inputs for feature_types depending on the input. If the input is 2d, we expect a dictionary mapping col name to woodwork logical type string. If the input is 1d, we expect a woodwork logical type or string equivalent.

This seems the easiest for users, but I'm not sure this is the cleanest impl. Should this be two separate methods instead? Should we enforce a dictionary for 1d too, and force users to pass the name of the 1d input?
Would love thoughts before pushing forward on adding more tests :d
Docs here: https://evalml.alteryx.com/en/1545_infer_feature_types/user_guide/automl.html#AutoML-in-EvalML
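For readers following the 1d vs 2d discussion above, here is a rough sketch of the dispatch rule the description lays out. It is not the code in this PR and does not call evalml or woodwork: `resolve_feature_types` is a hypothetical helper, a plain dict of column name to values stands in for 2d input, and a plain list stands in for 1d input. The logical type strings are only illustrative values.

```python
from typing import Dict, List, Optional, Union

# Assumption: these aliases are stand-ins for real 2d / 1d inputs
# (e.g. a DataFrame and a Series); they are not evalml or woodwork types.
Table = Dict[str, List[object]]
Column = List[object]
FeatureTypes = Union[str, Dict[str, str]]


def resolve_feature_types(
    data: Union[Table, Column],
    feature_types: Optional[FeatureTypes] = None,
) -> Optional[FeatureTypes]:
    """Check that feature_types has the shape expected for the input's dimensionality.

    Hypothetical helper for illustration only.
    """
    if feature_types is None:
        # Nothing specified: types would be left to inference.
        return None

    if isinstance(data, dict):
        # 2d input: expect a dict mapping column name -> logical type string.
        if not isinstance(feature_types, dict):
            raise TypeError("2d input expects a dict of column name to logical type")
        unknown = set(feature_types) - set(data)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        return dict(feature_types)

    # 1d input: expect a single logical type (string), not a dict.
    if isinstance(feature_types, dict):
        raise TypeError("1d input expects a single logical type, not a dict")
    return feature_types


X = {"amount": [10.0, 250.5], "country": ["US", "FR"]}
y = [0, 1]

X_types = resolve_feature_types(X, {"country": "Categorical"})
y_type = resolve_feature_types(y, "Boolean")
```

The alternative raised in the description (always requiring a dict, even for 1d input) would remove the second branch at the cost of making users name the 1d input.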