Update outliers data check implementation #1855

Merged
merged 17 commits into main from 1745_outliers
Feb 22, 2021

angela97lin
Contributor

Closes #1745


codecov bot commented Feb 17, 2021

Codecov Report

Merging #1855 (861da5e) into main (9886fa6) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1855     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         260      260             
  Lines       20838    20892     +54     
=========================================
+ Hits        20832    20886     +54     
  Misses          6        6             
Impacted Files                                            Coverage Δ
evalml/data_checks/data_check.py                          100.0% <ø> (ø)
evalml/data_checks/outliers_data_check.py                 100.0% <100.0%> (ø)
...ests/data_checks_tests/test_outliers_data_check.py     100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9886fa6...861da5e.

@@ -86,3 +86,39 @@ def test_outliers_data_check_string_cols():
details={"columns": ["d"]}).to_dict()],
"errors": []
}


def test_outlier_score():
Contributor Author

Private, but I added some tests anyway for sanity's sake / code coverage :)

Contributor

freddyaboulton left a comment

@angela97lin Looks great! I have some small comments about cleaning up the APIs of the private static methods.

evalml/data_checks/outliers_data_check.py (resolved)
evalml/data_checks/outliers_data_check.py (resolved)
evalml/data_checks/outliers_data_check.py (outdated, resolved)

Contributor

chukarsten left a comment

Nice, way to knock out this initially poorly defined PR ;). I think we might be able to remove the NaN check stuff, but I'll leave that up to you.

evalml/data_checks/outliers_data_check.py (resolved)
evalml/data_checks/outliers_data_check.py (outdated, resolved)
evalml/data_checks/outliers_data_check.py (resolved)

Contributor

bchen1116 left a comment
The reason will be displayed to describe this comment to others. Learn more.

LGTM! Left one question, but nothing blocking

evalml/data_checks/outliers_data_check.py (resolved)
angela97lin merged commit 542b828 into main on Feb 22, 2021
angela97lin deleted the 1745_outliers branch on February 22, 2021 18:36
chukarsten mentioned this pull request on Feb 23, 2021
dsherry mentioned this pull request on Mar 10, 2021
Development

Successfully merging this pull request may close these issues.

Data Health/Checks: Probability of Outliers
4 participants