Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding basic outlier detection guardrail #151

Merged
merged 17 commits into from Nov 6, 2019
Merged

adding basic outlier detection guardrail #151

merged 17 commits into from Nov 6, 2019

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Oct 23, 2019

Closes #138

@codecov
Copy link

codecov bot commented Oct 23, 2019

Codecov Report

Merging #151 into master will increase coverage by 0.03%.
The diff coverage is 98.24%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #151      +/-   ##
==========================================
+ Coverage   96.65%   96.69%   +0.03%     
==========================================
  Files          90       91       +1     
  Lines        2271     2328      +57     
==========================================
+ Hits         2195     2251      +56     
- Misses         76       77       +1
Impacted Files Coverage Δ
evalml/models/auto_regressor.py 90.9% <ø> (ø) ⬆️
evalml/models/auto_classifier.py 100% <ø> (ø) ⬆️
evalml/models/auto_base.py 93.36% <100%> (+0.16%) ⬆️
...alml/tests/guardrail_tests/test_detect_outliers.py 100% <100%> (ø)
evalml/tests/automl_tests/test_autoclassifier.py 100% <100%> (ø) ⬆️
evalml/guardrails/utils.py 96% <95.45%> (-0.43%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8fbbb9...b173747. Read the comment docs.

@angela97lin
Copy link
Contributor Author

angela97lin commented Oct 23, 2019

Notes:
After doing a bit of research, I combined IsolationForest from sklearn with IQR as per this article. Reason for doing so was that IsolationForest looked promising but required the user to give an estimate of contamination based on their data. With IQR, I instead will look at the scores returned from IsolationForest's decision_function and return the indices corresponding to any extreme score. Is this overkill for just a guardrail?

Also looked into just using IQR but thought that it may not be so great as we increase the number of features.

Definitely open to other suggestions!

@angela97lin angela97lin self-assigned this Oct 24, 2019
Copy link
Contributor

@jeremyliweishih jeremyliweishih left a comment

Do you think we can use IQR as a very basic outlier detection tool but also keep isolation forests as another?

evalml/guardrails/utils.py Outdated Show resolved Hide resolved
evalml/guardrails/utils.py Show resolved Hide resolved
kmax12
kmax12 previously approved these changes Nov 5, 2019
Copy link
Contributor

@kmax12 kmax12 left a comment

LGTM

@angela97lin
Copy link
Contributor Author

angela97lin commented Nov 5, 2019

@kmax12 per our brief discussion, should I start moving the guardrails out of Auto(*) classes? Or leave them in there for now?

Edit: per discussion, will push in for now and make one PR to remove all guardrails from Auto(*)

@angela97lin angela97lin requested a review from jeremyliweishih Nov 5, 2019
jeremyliweishih
jeremyliweishih previously approved these changes Nov 5, 2019
@angela97lin angela97lin requested a review from jeremyliweishih Nov 5, 2019
jeremyliweishih
jeremyliweishih previously approved these changes Nov 6, 2019
@angela97lin angela97lin merged commit a9d079d into master Nov 6, 2019
@angela97lin angela97lin mentioned this pull request Nov 15, 2019
@angela97lin angela97lin deleted the gr_outliers branch Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Guardrail: detect outilers
3 participants