Currently, data checks pass when the target labels passed to the AutoML search function contain only one unique value, even though this causes an error later on. For example, if the target labels are [30, 30, 30, 30, 30, 30], the default data checks pass, but pandas later raises IndexError: single positional indexer is out-of-bounds (the last evalml frame in the stack trace is in the pipeline class). I have attached an image of the stack trace for this error.
We should create a data check for this case so that the user gets a helpful, relevant error message instead of a cryptic and seemingly unrelated one raised later on. The main decision is whether to add a new data check to the set of default data checks or to add this case to an existing data check. There is already a target checker in place, so if we go with the second option, it would likely make the most sense to add the case there.
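To make the proposal concrete, here is a minimal sketch of the kind of validation being asked for. It is plain Python and does not use evalml's data check classes; the function name `check_target_not_constant` and the message wording are hypothetical, chosen only for illustration.

```python
from typing import Hashable, Iterable, List


def check_target_not_constant(y: Iterable[Hashable]) -> List[str]:
    """Return error messages if the target has fewer than two unique values.

    Hypothetical helper for illustration; not part of evalml.
    """
    unique_values = set(y)
    if len(unique_values) < 2:
        return [
            f"Target has {len(unique_values)} unique value(s); "
            "at least 2 are needed to fit a model."
        ]
    return []


# The example target from this issue fails the check up front.
errors = check_target_not_constant([30, 30, 30, 30, 30, 30])
# A target with two distinct labels passes.
ok = check_target_not_constant([30, 31, 30, 30, 31, 30])
```

Running a check like this before the search starts would surface the problem with a message about the target itself, rather than an IndexError from deep inside pandas.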
@dsherry @ctduffy I prefer the first solution (making a new data check) because I think we would also want to check whether any feature has only one unique value, not just the labels. That way we can also knock out #197. What do you guys think?
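A rough sketch of that broader check, covering features as well as the target, might look like the following. It assumes the data is available as a mapping from column name to values; `find_constant_columns` is a hypothetical name and this is not evalml's API.

```python
from typing import Dict, Hashable, List, Sequence


def find_constant_columns(columns: Dict[str, Sequence[Hashable]]) -> List[str]:
    """Return the names of columns that hold at most one unique value.

    Hypothetical helper for illustration; not part of evalml.
    """
    return [name for name, values in columns.items() if len(set(values)) <= 1]


data = {
    "target": [30, 30, 30, 30, 30, 30],
    "feature_a": [1, 2, 3, 4, 5, 6],
    "feature_b": ["x", "x", "x", "x", "x", "x"],
}
# Columns are reported in the order they appear in the mapping.
constant = find_constant_columns(data)
```

Treating the target as just another column is one way a single new check could cover both this issue and #197.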
Ah damn, of course we already had this filed 🤦 😄 great find @freddyaboulton !!
I agree we should add this to the data checks. Whether we add a new check or upgrade InvalidTargetDataCheck, I'm fine either way. Perhaps a new check is clearer.
Let's leave this open to track the bug, but when we merge a fix for #197 we should close this too. I'll update #197 with this.