Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow users to set feature types without having to learn about woodwork directly #1555

Merged
merged 20 commits into from Dec 29, 2020

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Dec 14, 2020

Closes #1545

I currently have a function that will accept different inputs for feature_types depending on the input. If the input is 2d, we expect a dictionary mapping col name to woodwork logical type string. If the input is 1d, we expect a woodwork logical type or string equivalent.

This seems the easiest to users, but I'm not sure this is the cleanest impl. Should this be two separate methods instead? Should we enforce a dictionary for 1d too, and force users to pass the name of the 1d input?

Would love thoughts before pushing forward on adding more tests :d

Docs here:https://evalml.alteryx.com/en/1545_infer_feature_types/user_guide/automl.html#AutoML-in-EvalML

@angela97lin angela97lin added this to the December 2020 milestone Dec 14, 2020
@angela97lin angela97lin self-assigned this Dec 14, 2020
@codecov
Copy link

codecov bot commented Dec 14, 2020

Codecov Report

Merging #1555 (6c30734) into main (095b053) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1555     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         240      240             
  Lines       18092    18120     +28     
=========================================
+ Hits        18084    18112     +28     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/utils/__init__.py 100.0% <ø> (ø)
evalml/tests/utils_tests/test_gen_utils.py 100.0% <100.0%> (ø)
evalml/utils/gen_utils.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 095b053...6c30734. Read the comment docs.

Copy link
Contributor

@jeremyliweishih jeremyliweishih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good! I like the example Dylan gave in the issue and it should be good for user experience to keep it to one method. Can you add something in the docs to show how to use this? Might also want to add this to the API reference.

@dsherry
Copy link
Collaborator

dsherry commented Dec 17, 2020

@angela97lin I'm excited to review this! Could we please also add this to the docs? I think it should go on the start page, and in the automl guide :) I had left some thoughts in the issue.

@dsherry
Copy link
Collaborator

dsherry commented Dec 17, 2020

For others reading, here's the example usage from the issue:

X, y = infer_feature_types(X, y, feature_types={...})
automl.search(X, y)
pipeline = automl.get_pipeline(42)
pipeline.fit(X, y)

Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I do think it would be nice if the types (string vs dict) were the same for 1D and 2D datatables for the sake of consistency, but in terms of ease, I think this is easier for the user. I'm good with this implementation!

@angela97lin
Copy link
Contributor Author

@dsherry I see that you were considering adding this util method to the start page as well as the automl guide. I've added it to the automl guide, but when trying to add to the start page, wondered if it would make it too clunky / if it's necessary since we want the start page to be the most minimal example possible. Our current data set for the start page / automl guide used the breast cancer data set, which is all numeric, so I changed it to the fraud data set. That being said, I would love your thoughts!

@angela97lin
Copy link
Contributor Author

Going to merge this in, since it seems like we're okay with the API, so that it's available in the next release. If there are any further comments about it / any improvements we want to make to the documents, I'd be happy to put up another PR 😁

@angela97lin angela97lin merged commit c871f3b into main Dec 29, 2020
1 check passed
@angela97lin angela97lin deleted the 1545_infer_feature_types branch December 29, 2020 20:49
@dsherry dsherry mentioned this pull request Dec 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow users to set feature types without having to learn about woodwork directly
4 participants