Adding basic detect id columns guardrail #135

angela97lin · 2019-10-16T20:08:16Z

Fixes #115

codecov · 2019-10-16T20:23:46Z

Codecov Report

Merging #135 into master will increase coverage by 0.01%.
The diff coverage is 97.36%.

@@            Coverage Diff             @@
##           master     #135      +/-   ##
==========================================
+ Coverage   96.64%   96.65%   +0.01%     
==========================================
  Files          89       90       +1     
  Lines        2233     2271      +38     
==========================================
+ Hits         2158     2195      +37     
- Misses         75       76       +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| evalml/models/auto_regressor.py | `90.9% <ø> (ø)` | ⬆️ |
| evalml/models/auto_classifier.py | `100% <ø> (ø)` | ⬆️ |
| ...ests/preprocessing_tests/test_detect_id_columns.py | `100% <100%> (ø)` | |
| evalml/guardrails/utils.py | `96.42% <100%> (+3.09%)` | ⬆️ |
| evalml/models/auto_base.py | `93.19% <80%> (-0.29%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4a58a11...6933c6b.

evalml/models/auto_base.py

evalml/guardrails/utils.py

evalml/models/auto_classifier.py

evalml/models/auto_regressor.py

evalml/tests/preprocessing_tests/test_detect_id_columns.py

kmax12 · 2019-10-22T16:33:37Z

evalml/guardrails/utils.py

+        A dictionary of features with column name or index and their probability of being ID columns
+    """
+    id_cols = {}
+    col_names = [str(col) for col in X.columns.tolist()]


I find the internal logic here a bit hard to follow.

it seems to me that a column gets

.95 if any one of the 3 cases is true

or

1.0 if cases 1 and 2 are true or cases 2 and 3 are true (cases 1 and 3 both being true isn't possible, but that's not immediately obvious from reading the code).

maybe we can take another stab at refactoring? happy to discuss more if needed
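For readers following along, here is a minimal sketch of the scoring behaviour described in the comment above. It is not the code from this PR. The three checks are not spelled out in this thread, so the ones used here (name equals "id", all values unique, name ends with "_id") are assumptions chosen so that cases 1 and 3 cannot both hold. The function name `score_id_columns` and the plain dict-of-lists input (standing in for a DataFrame) are hypothetical.

```python
def score_id_columns(columns):
    """Hypothetical restatement of the scoring described above; not the PR's code.

    `columns` maps a column name to its list of values (a stand-in for a DataFrame).
    """
    scores = {}
    for col, values in columns.items():
        name = str(col).lower()
        # The three checks below are assumptions made for illustration only.
        checks = [
            name == "id",                                         # case 1: column is named "id"
            len(values) > 0 and len(set(values)) == len(values),  # case 2: all values are unique
            name.endswith("_id"),                                 # case 3: name ends with "_id"
        ]
        passed = sum(checks)
        if passed >= 2:
            scores[str(col)] = 1.0   # two checks agree
        elif passed == 1:
            scores[str(col)] = 0.95  # exactly one check fired
    return scores
```

Counting how many checks passed makes the 0.95 versus 1.0 split explicit, which may be easier to read than updating the dictionary check by check.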

Related to my comment on the parameters about being more generous: since we're just issuing warnings, would it be better to just set the probability to 1.0 if any of the cases is true?

Given the current checks, it may make sense to do as you suggested @jeremyliweishih, as each of the checks is a decent indication of an ID column... Otherwise, I could give each check a "confidence percentage" and sum up a column's percentage across the three current checks. Thoughts?
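The two options floated here could look roughly like the sketch below. Everything in it is hypothetical: the function names, the check names, and especially the weights are placeholders, and capping the sum at 1.0 is an assumption that the thread does not discuss.

```python
def flag_if_any(checks):
    """Option A: report 1.0 as soon as any check passes, otherwise 0.0."""
    return 1.0 if any(checks.values()) else 0.0


def summed_confidence(checks, weights):
    """Option B: add up a per-check confidence, capped at 1.0 (the cap is an assumption)."""
    total = sum(weights[name] for name, passed in checks.items() if passed)
    return min(total, 1.0)


# Placeholder weights for illustration; no values were proposed in the discussion.
EXAMPLE_WEIGHTS = {"named_id": 0.5, "all_unique": 0.25, "ends_with_id": 0.5}
```

Option A keeps the API simple while the guardrail only issues warnings; option B leaves room for tuning each check later without changing the return type.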

i think for now, let's not worry about the implementation much. as long as we're happy with API, we can change implementation in the future

jeremyliweishih

Regardless of whether this is meant to be used as a separate tool or as part of AutoBase: if our process is clear from the documentation and we're not actively removing columns, I think it would be best to mark a column as an ID column if it passes any of the checks.
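As a rough illustration of "warn, don't remove", a caller could do something like the following with the returned dictionary. This is a sketch under assumptions: `warn_on_id_columns` and its `threshold` parameter are hypothetical names, not part of this PR or of the evalml API.

```python
import warnings


def warn_on_id_columns(id_scores, threshold=1.0):
    """Emit one warning per suspected ID column; no column is dropped.

    `id_scores` maps column name to a score, like the dictionary the guardrail returns.
    """
    flagged = sorted(col for col, score in id_scores.items() if score >= threshold)
    for col in flagged:
        warnings.warn(f"Column '{col}' looks like an ID column", UserWarning)
    return flagged
```

Since the data itself is left untouched, a false positive costs the user only a warning message.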

kmax12

LGTM

kmax12 · 2019-11-05T17:20:46Z

evalml/guardrails/utils.py

+        A dictionary of features with column name or index and their probability of being ID columns
+    """
+    id_cols = {}
+    col_names = [str(col) for col in X.columns.tolist()]


i think for now, let's not worry about the implementation much. as long as we're happy with API, we can change implementation in the future

adding basic id detection guardrail

326d965

angela97lin closed this Oct 16, 2019

angela97lin reopened this Oct 16, 2019

Merge branch 'master' into gr_id

0b0e39c

angela97lin added 6 commits October 16, 2019 17:55

updating to address nonstring col names; still need to address floating point uniqueness

e0ab6d0

merging

156fd92

accidentally commented out tests

1236da1

adding basic id detection guardrail

e4d95b6

Merge remote-tracking branch 'origin/gr_id' into gr_id

a54b71b

linting

5ca8aed

angela97lin self-assigned this Oct 17, 2019

angela97lin requested a review from kmax12 October 17, 2019 15:54

Merge branch 'master' into gr_id

4a49c7c

angela97lin removed the request for review from kmax12 October 18, 2019 19:55

angela97lin added 4 commits October 18, 2019 16:09

Merge branch 'master' into gr_id

ab49af8

fixing merge issues

0984af4

linting

a620e7a

adding to api ref

861c0b7

angela97lin requested review from kmax12 and jeremyliweishih October 21, 2019 14:34

angela97lin commented Oct 21, 2019

evalml/models/auto_base.py (resolved)

angela97lin commented Oct 21, 2019

evalml/guardrails/utils.py (outdated, resolved)

kmax12 suggested changes Oct 22, 2019

angela97lin added 6 commits October 23, 2019 12:37

docstrings + str test case

cd3c619

Merge branch 'master' into gr_id

edcbc19

Merge branch 'master' into gr_id

4805417

Merge branch 'master' into gr_id

cd38321

Merge branch 'master' into gr_id

0ab2718

Merge branch 'master' into gr_id

cedfaad

changelog

19d4e46

jeremyliweishih requested changes Oct 30, 2019

Merge branch 'master' into gr_id

3ba21bb

kmax12 previously approved these changes Nov 5, 2019

Merge branch 'master' into gr_id

6933c6b

angela97lin dismissed kmax12’s stale review via 6933c6b November 5, 2019 18:54

angela97lin requested a review from jeremyliweishih November 5, 2019 21:14

jeremyliweishih approved these changes Nov 5, 2019

angela97lin merged commit 9525b3f into master Nov 5, 2019

angela97lin mentioned this pull request Nov 15, 2019

v0.5.1 #216

Merged

angela97lin deleted the gr_id branch April 17, 2020 18:44
