Add pct_null_rows to HighlyNullDataCheck #2211

Merged
jeremyliweishih merged 5 commits into main from js_2201_null_rows on May 3, 2021

Conversation

jeremyliweishih
Contributor

Fixes #2201.

@@ -45,7 +45,7 @@ def validate(self, X, y=None):
 "data_check_name": "HighlyNullDataCheck",\
 "level": "warning",\
 "code": "HIGHLY_NULL",\
-"details": {"column": "lots_of_null"}}],\
+"details": {"column": "lots_of_null", "pct_null_rows": 0.8}}],\
Contributor Author

Open to suggestions: keep as 0.8, or display as a percentage (80%)?

Contributor

Maybe display it as a percentage, since the field name has the word "percent" in it? I don't feel super strongly about it, though.

Contributor

I think 0.8 is better for now. I believe we use [0, 1] notation for the percent baseline, class imbalance ratios in params, and other fields. I think it's nice having our codebase be consistent in how we represent percentages.
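
To make the [0, 1] convention concrete, here is a minimal sketch of a check that reports the null fraction in `details` and converts it to a percentage only for display. It is not evalml's implementation: the helper `highly_null_warnings`, its `columns` input format, and the `pct_null_threshold` parameter name are assumptions made for illustration. Only the shape of the warning dict is taken from the diff above.

```python
def highly_null_warnings(columns, pct_null_threshold=0.5):
    """Return warning dicts for columns whose null fraction meets the threshold.

    Hypothetical helper for illustration; not evalml's implementation.
    `columns` maps a column name to a list of values, with None marking a null.
    """
    warnings = []
    for name, values in columns.items():
        if not values:
            continue  # skip empty columns to avoid dividing by zero
        # Fraction of null rows, kept in [0, 1] rather than scaled to 0-100.
        pct_null_rows = sum(v is None for v in values) / len(values)
        if pct_null_rows >= pct_null_threshold:
            warnings.append({
                "message": f"Column '{name}' is {pct_null_threshold * 100}% or more null",
                "data_check_name": "HighlyNullDataCheck",
                "level": "warning",
                "code": "HIGHLY_NULL",
                "details": {"column": name, "pct_null_rows": pct_null_rows},
            })
    return warnings


data = {
    "lots_of_null": [None, None, None, None, 5],  # 4 of 5 rows are null
    "no_null": [1, 2, 3, 4, 5],
}
result = highly_null_warnings(data)
print(result[0]["details"])  # {'column': 'lots_of_null', 'pct_null_rows': 0.8}

# Convert to a percentage only at display time.
print(f"{result[0]['details']['pct_null_rows']:.0%}")  # 80%
```

Storing the fraction and formatting it at the edge keeps the field comparable with other [0, 1] values, while a caller who prefers "80%" can still get it with the `%` format specifier.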

codecov bot commented Apr 30, 2021

Codecov Report

Merging #2211 (a12a53e) into main (2f7f653) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #2211   +/-   ##
=======================================
  Coverage   100.0%   100.0%           
=======================================
  Files         287      287           
  Lines       24451    24451           
=======================================
  Hits        24433    24433           
  Misses         18       18           
| Impacted Files | Coverage Δ |
|---|---|
| evalml/data_checks/highly_null_data_check.py | 100.0% <ø> (ø) |
| evalml/tests/data_checks_tests/test_data_checks.py | 100.0% <ø> (ø) |
| ...s/data_checks_tests/test_highly_null_data_check.py | 100.0% <ø> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2f7f653...a12a53e.

@jeremyliweishih jeremyliweishih marked this pull request as ready for review April 30, 2021 20:02
Contributor

@freddyaboulton freddyaboulton left a comment

Looks good @jeremyliweishih !

Contributor

@angela97lin angela97lin left a comment

LGTM! Don't feel too strongly about your suggestion, looks fine as is.

Contributor

@dsherry dsherry left a comment

⛴ !

@@ -57,11 +57,11 @@ def test_highly_null_data_check_warnings():
 "warnings": [DataCheckWarning(message="Column 'lots_of_null' is 50.0% or more null",
 data_check_name=highly_null_data_check_name,
 message_code=DataCheckMessageCode.HIGHLY_NULL,
-details={"column": "lots_of_null"}).to_dict(),
+details={"column": "lots_of_null", "pct_null_rows": 0.8}).to_dict(),

Ah great, I was looking for a test case where this wasn't 1.0 :)

@jeremyliweishih jeremyliweishih merged commit c77b6a1 into main May 3, 2021
@freddyaboulton freddyaboulton deleted the js_2201_null_rows branch May 13, 2022 15:02