Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataTable.select warns if no columns match #322

Closed
dsherry opened this issue Oct 27, 2020 · 3 comments · Fixed by #325
Closed

DataTable.select warns if no columns match #322

dsherry opened this issue Oct 27, 2020 · 3 comments · Fixed by #325
Assignees
Labels
bug Something isn't working evalml EvalML request

Comments

@dsherry
Copy link

dsherry commented Oct 27, 2020

The following code

import evalml
import woodwork as ww
X, y = evalml.demos.load_breast_cancer()
dt = ww.DataTable(X)
dt.select('natural_language')

currently emits this warning on the first run:

The following selectors were not present in your DataTable: natural_language

I would expect this to emit no warning and to return an DataTable containing 0 columns. That's pandas' behavior:

X.select_dtypes('int')

emits a dataframe with the same index as the original (569 rows), but with 0 columns.

Encountered in https://github.com/alteryx/evalml/pull/1062/files#r512896715

@dsherry dsherry added the bug Something isn't working label Oct 27, 2020
@thehomebrewnerd
Copy link
Contributor

@dsherry Thinking of this generally and ignoring what pandas does for the time being, why would you not want this warning as a user?

I was originally thinking that it might be nice to explicitly tell the user that they tried to select something that wasn't in the data, rather than just silently giving them nothing back, but maybe it isn't useful and goes against the behavior most users would expect to see.

@dsherry
Copy link
Author

dsherry commented Oct 27, 2020

@thehomebrewnerd that's a good question, and yes I agree we shouldn't simply mirror pandas behavior for the sake of mirroring pandas behavior, lol.

Consider the use-case in evalml which prompted me to file this: we want to select all natural_language features and pass those down the stack to the pipeline creation code etc. If there are no text features, that's fine for us, we just won't add a text featurizer component to the pipeline. So the warning doesn't add value in our current use-case; in fact it adds confusion when users run automl search, because they don't have the right context to understand the warning.

That use-case aside, my opinion is that our APIs should allow people to get into "bad" situations like having an empty set of columns selected. I'm sure there are some situations where a warning here would be helpful, but we don't know which type of situation the user is in, and I'd rather trust the user to handle this case.

What do you think?

@thehomebrewnerd
Copy link
Contributor

@dsherry Thanks for your thoughts. As I think about this more, I'm convincing myself that your approach of allowing people to get into "bad" situations is a reasonable approach. We can't anticipate all the "bad" situations anyway, and trying to handle them makes the code base bigger and harder to maintain. It's probably best to not implement these checks unless we really are certain they are needed.

With that, I'm in favor of removing this particular warning and returning no columns in this case.

@gsheni gsheni self-assigned this Oct 28, 2020
@gsheni gsheni added the evalml EvalML request label Feb 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working evalml EvalML request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants