
Update MultiplyNumericBoolean to work with more inputs #2393

Merged
thehomebrewnerd merged 8 commits into main from issue2390-multiply-numeric-boolean-bug
Dec 8, 2022

Conversation

@thehomebrewnerd

Contributor

Update MultiplyNumericBoolean to work with more inputs

Certain combinations of input dtypes resulted in errors with newer pandas versions. This PR refactors the calculation in MultiplyNumericBoolean to resolve this issue.

Fixes #2390

[2.0, nan, nan]
>>> multiply_numeric_boolean([2, 1, 2], [True, True, None]).tolist()
[2.0, 1.0, nan]
>>> multiply_numeric_boolean([2, 1, 2], pd.Series([True, True, pd.NA], dtype="boolean")).tolist()

Contributor Author

With the new code, the old boolean series had a dtype of object, and pandas wasn't converting it to Int64. Updated here to use the boolean dtype, since during normal CFM operations we should always get WW-initialized series as inputs, and those will have a dtype of either bool or boolean.
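
A minimal sketch of the dtype difference described in this comment, using plain pandas rather than the primitive itself. The variable names are illustrative and not taken from the PR, and the sketch only shows the dtypes involved, not the original error.

```python
import pandas as pd

# A plain list containing None is inferred as object dtype, not a boolean dtype.
as_object = pd.Series([True, True, None])

# Declaring the nullable "boolean" dtype up front mirrors what a
# Woodwork-initialized column is expected to carry (bool or boolean).
as_boolean = pd.Series([True, True, pd.NA], dtype="boolean")

# Cast the nullable booleans to Int64 and multiply, in the spirit of the
# refactored calculation shown in the diff.
vals = pd.Series([2, 1, 2])
result = vals * as_boolean.astype("Int64")

print(as_object.dtype)   # object
print(as_boolean.dtype)  # boolean
print(result.dtype)      # Int64
```

The missing boolean propagates to the product as a missing value, so the last element of `result` is `pd.NA`.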

@codecov

codecov Bot commented Dec 8, 2022


Codecov Report

Merging #2393 (d68d93e) into main (f804d4b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2393   +/-   ##
=======================================
  Coverage   99.50%   99.50%           
=======================================
  Files         322      322           
  Lines       20146    20161   +15     
=======================================
+ Hits        20046    20061   +15     
  Misses        100      100           
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...ndard/transform/binary/multiply_numeric_boolean.py | 100.00% <100.00%> (ø) | |
| ...s/tests/primitive_tests/test_transform_features.py | 99.87% <100.00%> (+<0.01%) | ⬆️ |

 multiply_numeric_boolean = MultiplyNumericBoolean()
 for input in test_cases:
-    vals = pd.Series(input["val"])
+    vals = pd.Series(input["val"]).astype("Int64")

Contributor Author

If we don't convert here, the input with a single pd.NA value ends up with a dtype of object, which would never happen in real usage. An object input to the primitive also causes an error.
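
As a rough illustration of the inference behavior this comment refers to, the sketch below builds a series from a list of integers with one `pd.NA`. The values are hypothetical and are not the PR's actual test cases.

```python
import pandas as pd

# Hypothetical test-case values: integers with one missing entry.
raw = [1, 2, pd.NA]

# Without an explicit dtype, pandas falls back to object for this input.
inferred = pd.Series(raw)

# Casting to the nullable integer dtype gives the kind of input the
# primitive would see from a typed column.
converted = pd.Series(raw).astype("Int64")

print(inferred.dtype)   # object
print(converted.dtype)  # Int64
```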

@gsheni gsheni requested a review from a team December 8, 2022 17:20
Comment thread pyproject.toml
     "holidays >= 0.13",
     "numpy >= 1.21.0",
-    "packaging >= 20.0",
+    "packaging >= 20.0, <22.0",

Contributor Author

Temporary for testing - can remove here or in PR #2389 depending on timing of merges.

Comment thread featuretools/tests/primitive_tests/test_transform_features.py Outdated
Comment thread docs/source/release_notes.rst Outdated
result = vals.where(mask, mask.replace({False: 0}))
result = vals * bools.astype("Int64")
# Replace all pd.NA with np.nan to avoid WW init error
result = result.replace({pd.NA: np.nan})

@sbadithe sbadithe Dec 8, 2022

Contributor

Do we need this line? If so, is it for a particular case? In the doctest, the result has a pd.NA.

Contributor Author

That's a good question. On the surface it seems like it is no longer necessary, but let me look a bit closer.

One thing I'm concerned about: the doctest doesn't actually initialize Woodwork. That happens later in the CFM process, and the code comment says the replace was added to avoid a WW init error, so the doctest wouldn't cover that. Also, the doctest example doesn't contain an np.nan because the return type is Int64, which cannot hold np.nan, so the replace call does nothing in that case.
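
The point about Int64 not holding np.nan can be checked with a small pandas sketch. It does not reproduce the `replace` call or the Woodwork initialization discussed here; it only shows how the nullable integer dtype stores a missing value.

```python
import numpy as np
import pandas as pd

# Build an Int64 series from data that contains np.nan. The missing
# entry is stored as pd.NA rather than as a float NaN.
s = pd.Series([2, 1, np.nan], dtype="Int64")

print(s.dtype)             # Int64
print(s.iloc[2] is pd.NA)  # True
```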

Contributor Author

I ran DFS/CFM using a dataframe that had all the column dtype combinations that we should run into, and it worked successfully. That leads me to believe this replace call is no longer required. I'll try removing it and see whether our CI passes for the min package versions as well.

Here are the features that were generated and computed successfully:

[<Feature: int>,
 <Feature: Int_no_null>,
 <Feature: Int_null>,
 <Feature: float>,
 <Feature: float_null>,
 <Feature: bool>,
 <Feature: boolean_no_null>,
 <Feature: boolean_null>,
 <Feature: bool * Int_no_null>,
 <Feature: bool * Int_null>,
 <Feature: bool * float>,
 <Feature: bool * float_null>,
 <Feature: bool * int>,
 <Feature: boolean_no_null * Int_no_null>,
 <Feature: boolean_no_null * Int_null>,
 <Feature: boolean_no_null * float>,
 <Feature: boolean_no_null * float_null>,
 <Feature: boolean_no_null * int>,
 <Feature: boolean_null * Int_no_null>,
 <Feature: boolean_null * Int_null>,
 <Feature: boolean_null * float>,
 <Feature: boolean_null * float_null>,
 <Feature: boolean_null * int>]
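
For readers who want to try a similar check without featuretools, here is a hedged sketch in plain pandas. The data values are made up, the column names simply mirror the feature names listed above, and the loop multiplies the columns directly instead of running DFS/CFM or initializing Woodwork, so it is not the test that was actually run.

```python
import numpy as np
import pandas as pd

# Made-up data; column names mirror the feature names in the list above.
df = pd.DataFrame(
    {
        "int": pd.Series([1, 2, 3], dtype="int64"),
        "Int_no_null": pd.Series([1, 2, 3], dtype="Int64"),
        "Int_null": pd.Series([1, pd.NA, 3], dtype="Int64"),
        "float": pd.Series([1.5, 2.5, 3.5], dtype="float64"),
        "float_null": pd.Series([1.5, np.nan, 3.5], dtype="float64"),
        "bool": pd.Series([True, False, True], dtype="bool"),
        "boolean_no_null": pd.Series([True, False, True], dtype="boolean"),
        "boolean_null": pd.Series([True, pd.NA, False], dtype="boolean"),
    }
)

numeric_cols = ["int", "Int_no_null", "Int_null", "float", "float_null"]
boolean_cols = ["bool", "boolean_no_null", "boolean_null"]

# Multiply every boolean column by every numeric column using the
# cast-to-Int64 approach. This is plain pandas, not a DFS/CFM run.
products = {
    f"{b} * {n}": df[n] * df[b].astype("Int64")
    for b in boolean_cols
    for n in numeric_cols
}

for name, series in products.items():
    print(name, series.dtype)
```

This yields 15 products, one for each boolean and numeric pairing, matching the number of multiplication features in the list.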

@thehomebrewnerd thehomebrewnerd enabled auto-merge (squash) December 8, 2022 19:28
@thehomebrewnerd thehomebrewnerd merged commit 722fbe8 into main Dec 8, 2022
@thehomebrewnerd thehomebrewnerd deleted the issue2390-multiply-numeric-boolean-bug branch December 8, 2022 19:28
@thehomebrewnerd thehomebrewnerd mentioned this pull request Dec 9, 2022

Development

Successfully merging this pull request may close these issues.

MultiplyNumericBoolean primitive doesn't work properly with BooleanNullable inputs on recent pandas versions

2 participants