Add Datacheck to check for single target value #889

Closed
ctduffy opened this issue Jun 26, 2020 · 3 comments · Fixed by #893
Comments

@ctduffy
Contributor

ctduffy commented Jun 26, 2020

Currently, the data checks pass when the target labels passed to the AutoML search function contain only one unique value, even though this causes an error later on. For example, if the target labels are [30, 30, 30, 30, 30, 30], the default data checks pass, but pandas later raises IndexError: single positional indexer is out-of-bounds (the last evalml frame in the stack trace is in the pipeline class). I have attached an image of the stack trace for this error.

Error Stacktrace

We should create a data check for this case, so that the user gets a helpful, relevant error message up front rather than a cryptic and seemingly unrelated one later on. The main decision is whether to add a new data check to the set of default data checks or to add this specific case to an existing data check. There is already a target checker in place, so if we go with the second option, it would likely make the most sense to add it there.
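To make the proposal concrete, here is a minimal sketch of the kind of validation being asked for, written in plain pandas. It is not evalml's data check API: the function name check_target_has_multiple_values and the message wording are hypothetical, chosen only for illustration.

```python
import pandas as pd


def check_target_has_multiple_values(y):
    """Hypothetical helper (not part of evalml).

    Returns a list of error messages. The list is empty when the target
    contains at least two distinct non-null values.
    """
    y = pd.Series(y)
    # Count distinct values, ignoring missing entries.
    n_unique = y.nunique(dropna=True)
    if n_unique < 2:
        return [
            f"Target has {n_unique} unique value(s); at least 2 are required."
        ]
    return []


errors = check_target_has_multiple_values([30, 30, 30, 30, 30, 30])
print(errors)
```

Running a check like this before the search starts would let the failure surface with a message about the target itself, instead of an indexing error from deep inside a pipeline.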

@ctduffy ctduffy added the bug Issues tracking problems with existing features. label Jun 26, 2020
@ctduffy ctduffy added this to the June 2020 milestone Jun 26, 2020
@dsherry
Contributor

dsherry commented Jun 26, 2020

Thanks @ctduffy for filing!

@freddyaboulton is gonna grab this and we'll aim to have it in for the release Tues. NBD if it has to slip.

@freddyaboulton
Contributor

freddyaboulton commented Jun 26, 2020

@dsherry @ctduffy I prefer the first solution (making a new data check) because I think we would want to also check if any feature has only one unique value and not just the labels. I think this way we can also knock out #197. What do you guys think?
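As a rough illustration of extending the same idea to features, the sketch below flags every column of a feature table that has at most one distinct value. It uses only pandas; the helper name find_single_value_columns is hypothetical and is not an existing evalml function.

```python
import pandas as pd


def find_single_value_columns(X):
    """Hypothetical helper (not part of evalml).

    Returns the names of columns with at most one distinct non-null value.
    """
    X = pd.DataFrame(X)
    # DataFrame.nunique gives the per-column count of distinct values.
    counts = X.nunique(dropna=True)
    return [col for col, n in counts.items() if n <= 1]


X = pd.DataFrame({"constant": [1, 1, 1], "varied": [1, 2, 3]})
print(find_single_value_columns(X))
```

The target could be passed through the same logic as a one-column frame, which is one argument for a single new check covering both features and labels.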

@dsherry
Contributor

dsherry commented Jun 26, 2020

I think you meant to tag @ctduffy

Ah damn, of course we already had this filed 🤦 😄 great find @freddyaboulton !!

I agree we should add this to the data checks. Whether we add a new check or upgrade InvalidTargetDataCheck, fine either way. Perhaps a new check is more clear.

Let's leave this open to track the bug, but when we merge a fix for #197 we should close this too. I'll update #197 with this.
