Skip to content

Patch pre-Release v0.30.1#2626

Merged
chukarsten merged 3 commits into
mainfrom
pre_release_patch
Aug 13, 2021
Merged

Patch pre-Release v0.30.1#2626
chukarsten merged 3 commits into
mainfrom
pre_release_patch

Conversation

@chukarsten
Copy link
Copy Markdown
Contributor

A small update to do a better job of detecting all-pd.NA values. I guess.

def convert_all_nan_unknown_to_double(data):
def is_column_pd_na(data, col):
return all([isinstance(x, type(pd.NA)) for x in data[col]])
return all(data[col].isna())
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@freddyaboulton and @bchen1116 I think you recommended both this and all(data[col] == pd.NA). What happened was I tried all(data[col] == pd.NA), which created weird behavior in as much as a column full of pd.NA returns a column full of pd.NA when compared to pd.NA. I found a little bit more about this here, where they discuss some of the design decisions about how arithmetic and comparisons using pd.NA result in pd.NA to "propagate pd.NA". I ended up just going with what I had and it was super slow - as you foresaw. I'm now trying to atone for my sins.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch.

assert invalid_targets_check.validate(X, y=pd.Series([np.nan, np.nan, np.nan])) == {
"warnings": [],
"errors": [
DataCheckError(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slightly different logic catches fully null Unknown columns and changes them to np.nan Double columns both in dataframes and in series now.

@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 13, 2021

Codecov Report

Merging #2626 (7832e08) into main (f19591e) will not change coverage.
The diff coverage is 100.0%.

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #2626   +/-   ##
=====================================
  Coverage   99.9%   99.9%           
=====================================
  Files        297     297           
  Lines      27071   27071           
=====================================
  Hits       27027   27027           
  Misses        44      44           
Impacted Files Coverage Δ
...ta_checks_tests/test_invalid_targets_data_check.py 100.0% <ø> (ø)
evalml/utils/woodwork_utils.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f19591e...7832e08. Read the comment docs.

Copy link
Copy Markdown
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for making the changes. Do we need to update the release note with this PR # (i think you can just tack it on to the end of the ww update PR)

Copy link
Copy Markdown
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done @chukarsten , I'm so glad we caught this pre-release 👏

@chukarsten chukarsten merged commit 948e3b4 into main Aug 13, 2021
@freddyaboulton freddyaboulton deleted the pre_release_patch branch May 13, 2022 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants