DataTable.select warns if no columns match #322

dsherry · 2020-10-27T18:21:05Z

The following code

import evalml
import woodwork as ww
X, y = evalml.demos.load_breast_cancer()
dt = ww.DataTable(X)
dt.select('natural_language')

currently emits this warning on the first run:

The following selectors were not present in your DataTable: natural_language

I would expect this to emit no warning and to return a DataTable containing 0 columns. That's how pandas behaves:

X.select_dtypes('int')

returns a DataFrame with the same index as the original (569 rows), but with 0 columns.
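For anyone who wants to check the pandas side without loading the evalml demo data, here is a minimal sketch. The small all-float frame and its column names are made up for illustration; they are not the breast cancer dataset.

```python
import warnings

import pandas as pd

# Made-up frame with only float columns, so an 'int' selection matches nothing.
df = pd.DataFrame({"radius": [14.1, 20.6, 12.4], "texture": [19.3, 17.8, 15.7]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty = df.select_dtypes("int")

# The index is preserved, no columns are returned, and nothing is warned about.
print(empty.shape)
print(len(caught))
```

The result keeps all three rows of the index and simply has no columns, which is the behavior requested here for DataTable.select.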

Encountered in https://github.com/alteryx/evalml/pull/1062/files#r512896715

thehomebrewnerd · 2020-10-27T18:38:46Z

@dsherry Thinking of this generally and ignoring what pandas does for the time being, why would you not want this warning as a user?

I was originally thinking that it might be nice to explicitly tell the user that they tried to select something that wasn't in the data, rather than just silently giving them nothing back, but maybe it isn't useful and goes against the behavior most users would expect to see.

dsherry · 2020-10-27T18:59:17Z

@thehomebrewnerd that's a good question, and yes I agree we shouldn't simply mirror pandas behavior for the sake of mirroring pandas behavior, lol.

Consider the use-case in evalml which prompted me to file this: we want to select all natural_language features and pass those down the stack to the pipeline creation code etc. If there are no text features, that's fine for us, we just won't add a text featurizer component to the pipeline. So the warning doesn't add value in our current use-case; in fact it adds confusion when users run automl search, because they don't have the right context to understand the warning.

That use-case aside, my opinion is that our APIs should allow people to get into "bad" situations like having an empty set of columns selected. I'm sure there are some situations where a warning here would be helpful, but we don't know which type of situation the user is in, and I'd rather trust the user to handle this case.

What do you think?

thehomebrewnerd · 2020-10-27T19:38:27Z

@dsherry Thanks for your thoughts. As I think about this more, I'm convincing myself that allowing people to get into "bad" situations is a reasonable approach. We can't anticipate all the "bad" situations anyway, and trying to handle them makes the code base bigger and harder to maintain. It's probably best not to implement these checks unless we are really certain they are needed.

With that, I'm in favor of removing this particular warning and returning no columns in this case.
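A rough sketch of the behavior agreed on above, and of how a caller like the evalml use-case could handle an empty result itself. Everything here is hypothetical: `select_columns`, the `logical_types` mapping, and the component names are stand-ins for illustration, not the woodwork or evalml APIs.

```python
import pandas as pd


def select_columns(df, logical_types, wanted):
    """Hypothetical helper: keep the columns whose logical type is in `wanted`.

    Returns a frame with 0 columns, and no warning, when nothing matches.
    """
    wanted = {wanted} if isinstance(wanted, str) else set(wanted)
    keep = [col for col in df.columns if logical_types.get(col) in wanted]
    return df[keep]


# Made-up data with no text columns.
df = pd.DataFrame({"age": [31, 45], "income": [52000.0, 61000.0]})
logical_types = {"age": "integer", "income": "double"}

text_cols = select_columns(df, logical_types, "natural_language")

# The caller decides what an empty selection means; here it just skips a step.
components = ["imputer"]
if len(text_cols.columns) > 0:
    components.append("text_featurizer")
components.append("estimator")

print(text_cols.shape)
print(components)
```

The empty selection is treated as ordinary data rather than as a problem, so the caller that knows the context decides whether it matters.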

dsherry added the bug label Oct 27, 2020

dsherry mentioned this issue Oct 27, 2020

Integrate TextFeaturizer to automl alteryx/evalml#1062

Merged

gsheni mentioned this issue Oct 28, 2020

Remove no selected columns warning #325

Merged

gsheni self-assigned this Oct 28, 2020

gsheni closed this as completed in #325 Oct 28, 2020

gsheni added the evalml label Feb 12, 2021
