Updates for pandas 1.5.0 compatibility #2291

Merged on Sep 21, 2022 (16 commits)

thehomebrewnerd
Contributor

Various changes needed for compatibility with pandas 1.5.0.

thehomebrewnerd marked this pull request as draft September 19, 2022 19:05

codecov bot commented Sep 20, 2022

Codecov Report

Merging #2291 (89bd9b8) into main (9d6ec35) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #2291      +/-   ##
==========================================
- Coverage   99.32%   99.31%   -0.02%     
==========================================
  Files         148      148              
  Lines       17706    17706              
==========================================
- Hits        17586    17584       -2     
- Misses        120      122       +2     
Impacted Files Coverage Δ
...retools/primitives/standard/transform_primitive.py 100.00% <ø> (ø)
...ools/primitives/standard/aggregation_primitives.py 99.68% <100.00%> (ø)
...utational_backend/test_calculate_feature_matrix.py 100.00% <100.00%> (ø)
...ools/tests/primitive_tests/test_dask_primitives.py 65.11% <100.00%> (ø)
featuretools/tests/synthesis/test_dfs_method.py 100.00% <100.00%> (ø)
featuretools/computational_backends/utils.py 95.65% <0.00%> (-0.97%) ⬇️

thehomebrewnerd marked this pull request as ready for review September 20, 2022 16:20
Contributor Author

thehomebrewnerd commented Sep 20, 2022

From what I can tell, the change in coverage is due to a change in the error raised by pandas. What was previously an OverflowError is now a ValueError (technically an OutOfBoundsDatetime error, which is a subclass of ValueError). We have logic in place for handling both, but because coverage runs against the new version of pandas, the branch that handles the OverflowError is no longer hit. It would still be hit with older versions of pandas.
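
A rough sketch of the kind of handling described here, for illustration only. It is not the featuretools code: `add_or_fallback`, `raises_new_style`, and `OutOfBoundsStandIn` are hypothetical names, and the stand-in exception only mimics the fact that the newer pandas error is a ValueError subclass. The stdlib `datetime` overflow plays the role of the older OverflowError path.

```python
from datetime import datetime, timedelta


class OutOfBoundsStandIn(ValueError):
    """Hypothetical stand-in for the ValueError subclass raised by newer pandas."""


def add_or_fallback(add, fallback=None):
    """Run `add()` and return `fallback` if the result is out of range.

    Both exception types share one handler, so the same lines run
    whichever of the two errors is raised.
    """
    try:
        return add()
    except (OverflowError, ValueError):
        return fallback


def raises_new_style():
    # Mimics the newer behavior: a ValueError subclass is raised.
    raise OutOfBoundsStandIn("out of bounds")


# Older-style failure: the stdlib raises OverflowError past datetime.max.
old_result = add_or_fallback(lambda: datetime.max + timedelta(days=1))
# Newer-style failure: handled by the same except clause.
new_result = add_or_fallback(raises_new_style)
# In-range arithmetic passes through unchanged.
ok_result = add_or_fallback(lambda: datetime(2020, 1, 1) + timedelta(days=1))
```

If the two errors were instead handled in separate `except` clauses, only one of them would run under any single pandas version, which matches the small coverage drop reported above.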

@@ -653,8 +653,8 @@ class Lag(TransformPrimitive):
You can specify the number of periods to shift the values

>>> lag_periods = Lag(periods=3)
>>> lag_periods(["hello", "world", "test", "foo", "bar"], pd.Series(pd.date_range(start="2020-01-01", periods=5, freq='D'))).tolist()
[nan, nan, nan, 'hello', 'world']
>>> lag_periods([True, False, False, True, True], pd.Series(pd.date_range(start="2020-01-01", periods=5, freq='D'))).tolist()
Contributor

Can we have a string lag and a boolean lag in the example?

Contributor Author

We could, but I'm not sure it's so simple: #2291 (comment)

If we added a string lag, the doctest wouldn't pass for all of the pandas versions we currently support because of the None vs. nan output difference, unless we added some extra conversion to the output to make sure we always get the same missing value representation.
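
One possible form of that extra conversion, as a minimal sketch. The helper name `normalize_missing` and the two sample lists are assumptions made for illustration; they are not part of featuretools or of the doctest in this PR.

```python
import math


def normalize_missing(values):
    """Map both None and float NaN to None so outputs compare equal."""
    normalized = []
    for value in values:
        is_nan = isinstance(value, float) and math.isnan(value)
        normalized.append(None if value is None or is_nan else value)
    return normalized


# Two hypothetical outputs of the same string lag, differing only in
# how the missing values are represented.
with_nan = [float("nan"), float("nan"), float("nan"), "hello", "world"]
with_none = [None, None, None, "hello", "world"]

# After normalization the two lists are identical.
same = normalize_missing(with_nan) == normalize_missing(with_none)
```

A plain comparison of the two raw lists would fail, since NaN never compares equal to None, so a doctest would need a step like this to produce stable output.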

Contributor Author

Just want to point out that once Woodwork is initialized on the feature matrix, we will always end up with the same missing value representation. Since that doesn't happen at the primitive function level, the difference in output will be present when calling the primitive directly, as we do in the doctest examples.

Contributor

gsheni commented Sep 20, 2022

Ah, I see. Let's put a note in the docstring about this, or a comment in the code.

Contributor Author

Added comment: b8e5cfe

thehomebrewnerd merged commit f6b98e3 into main on Sep 21, 2022
thehomebrewnerd deleted the pandas-1.5.0 branch September 21, 2022 15:41
rwedge mentioned this pull request Oct 6, 2022
3 participants