Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update documentation to include all data checks and usage of data checks in AutoML #1412

Merged
merged 23 commits into from
Nov 11, 2020

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Nov 5, 2020

Closes #1395

Updated docs: Data checks
Data checks in AutoML

@angela97lin angela97lin self-assigned this Nov 5, 2020
@codecov
Copy link

codecov bot commented Nov 5, 2020

Codecov Report

Merging #1412 (f34f3ae) into main (78137f4) will not change coverage.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1412   +/-   ##
=======================================
  Coverage   100.0%   100.0%           
=======================================
  Files         216      216           
  Lines       14228    14228           
=======================================
  Hits        14221    14221           
  Misses          7        7           
Impacted Files Coverage Δ
evalml/data_checks/default_data_checks.py 100.0% <ø> (ø)
evalml/data_checks/high_variance_cv_data_check.py 100.0% <ø> (ø)
evalml/data_checks/class_imbalance_data_check.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78137f4...f34f3ae. Read the comment docs.

@angela97lin angela97lin marked this pull request as ready for review November 9, 2020 18:57
@angela97lin angela97lin added this to the November 2020 milestone Nov 9, 2020
@angela97lin angela97lin added the documentation Improvements or additions to documentation label Nov 9, 2020
Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs look great! Left a comment on OutlierDataCheck but everything else LGTM

docs/source/user_guide/data_checks.ipynb Show resolved Hide resolved
Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Left a few suggested tweaks

docs/source/user_guide/automl.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/automl.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/data_checks.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/data_checks.ipynb Outdated Show resolved Hide resolved
"\n",
"* if any of the target values are missing, an error is returned\n",
"* if the specified problem type is a binary classification problem but there is more or less than two unique values in the target, an error is returned\n",
"* if binary classification target classes are numeric values not equal to {0, 1}, an error is returned because it can cause unpredictable behavior when passed to pipelines"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great

Is the last point still true? Now that we have the label encoder in the classification pipeline, I think this use case will work fine. If I'm right, can we file something to either delete this from InvalidTargetDataCheck, or change it to a warning instead of an error? Certainly, providing something which looks suspicious could indicate the user has provided incorrect data for some reason, which makes me think showing a warning is a good idea rather than just deleting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, good point! I just filed #1422 to track investigating and updating the docs accordingly. For now, I'll merge as is with the documentation since that's the current behavior.

docs/source/user_guide/data_checks.ipynb Show resolved Hide resolved
docs/source/user_guide/data_checks.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/data_checks.ipynb Show resolved Hide resolved
docs/source/user_guide/data_checks.ipynb Outdated Show resolved Hide resolved
Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angela97lin Nice! I left some minor comments.

docs/source/user_guide/data_checks.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/data_checks.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/data_checks.ipynb Show resolved Hide resolved
docs/source/user_guide/data_checks.ipynb Outdated Show resolved Hide resolved
@angela97lin angela97lin merged commit 2998895 into main Nov 11, 2020
@angela97lin angela97lin deleted the 1395_update_data_checks_docs branch November 11, 2020 01:13
@dsherry dsherry mentioned this pull request Nov 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data checks: update docs
4 participants