Fixes for Pandas 1.1.0 #1079

Merged
merged 52 commits into from Jul 30, 2020

thehomebrewnerd
Collaborator

@thehomebrewnerd thehomebrewnerd commented Jul 21, 2020

Fixes #1072

  • Update wrangle.py to change handling for special date offsets to support both older and newer versions of pandas
  • Use dask single-threaded scheduler for compute calls to avoid issue with Dask tests failing with threaded scheduler - see Dask Issue 6454
  • Bump pandas minimum version to 1.0.0, as the changes needed in wrangle.py to work with 1.1.0 cause failures with older versions.
  • Bump dask minimum version to 2.12.0 to avoid error that happens with version 2.11.0 which does not implement dd.to_numeric()

@codecov

codecov bot commented Jul 21, 2020

Codecov Report

Merging #1079 into main will decrease coverage by 0.01%.
The diff coverage is 94.44%.

@@            Coverage Diff             @@
##             main    #1079      +/-   ##
==========================================
- Coverage   98.36%   98.35%   -0.02%     
==========================================
  Files         126      126              
  Lines       13158    13161       +3     
==========================================
+ Hits        12943    12944       +1     
- Misses        215      217       +2     
Impacted Files                                          Coverage Δ
featuretools/utils/wrangle.py                           75.25% <85.71%> (-0.79%) ⬇️
...mputational_backend/test_feature_set_calculator.py   97.93% <100.00%> (ø)
featuretools/tests/conftest.py                          100.00% <100.00%> (ø)
...ools/primitives/standard/aggregation_primitives.py   96.92% <0.00%> (-0.31%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3569038...2a557e0.

@thehomebrewnerd
Collaborator Author

The fixes implemented in this PR were verified to work with pandas versions as old as v1.0.0, but they fail on versions older than that.

@thehomebrewnerd
Collaborator Author

Based on limited testing, the line chunk_sum = chunk_sum.astype('int64'), which Codecov flagged as uncovered, appears to be hit with pandas 1.0.0 or 1.0.5, but not with pandas 1.1.0.

@thehomebrewnerd thehomebrewnerd changed the title from "Fixes for Pandas release candidate 1.1.0rc0" to "Fixes for Pandas 1.1.0" on Jul 29, 2020
@rwedge
Collaborator

rwedge commented Jul 29, 2020

For all these scheduler="single-threaded" options, could we set that via an environment variable or something similar, so we don't need to declare it in each compute call?

If not, I think we should make a SCHEDULER_TYPE variable, set it to "single-threaded", and then use SCHEDULER_TYPE everywhere instead.

requirements.txt Outdated
@@ -1,6 +1,6 @@
scipy>=0.13.3
numpy>=1.13.3
-pandas>=0.24.1,<1.1.0
+pandas>=1.0.0
Collaborator

are there any changes that require using 1.0 or higher?

Collaborator Author

Yes - the changes in wrangle.py don't work with older versions because of the way the date offsets are handled. There might be a way to come up with something that would work with both, but I'd have to spend some more time looking into it.

Collaborator Author

Might be able to use a different if-else construct to check for the two scenarios that were present in the previous code. I'll check.
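
The actual wrangle.py change is not shown in this thread, so the following is only a hedged sketch of what a two-branch check on offsets could look like. It assumes the two scenarios are a plain DateOffset versus a special offset such as BDay, and it relies only on the public kwds and n attributes. The function name describe_offset is hypothetical.

```python
import pandas as pd


def describe_offset(offset):
    """Return a {unit: value} dict for a pandas offset object (illustrative)."""
    if type(offset).__name__ == "DateOffset":
        # Plain DateOffset: units are given as keyword arguments, e.g. days=2.
        return dict(offset.kwds)
    # Special offsets such as BDay: use the class name as the unit and the
    # public `n` attribute as the multiple.
    return {type(offset).__name__: offset.n}


plain = describe_offset(pd.DateOffset(days=2))
special = describe_offset(pd.offsets.BDay(3))
```

Reading public attributes rather than internal state is one way to reduce sensitivity to changes in how pandas implements offset classes. Whether it covers the older versions discussed here would still need testing.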

Collaborator

@rwedge rwedge left a comment

Looks good

@thehomebrewnerd thehomebrewnerd merged commit 1acf31a into main Jul 30, 2020
1 of 3 checks passed
@thehomebrewnerd thehomebrewnerd deleted the pandas-1.1.0rc0 branch July 30, 2020 16:19
@rwedge rwedge mentioned this pull request Jul 31, 2020
Development

Successfully merging this pull request may close these issues.

Test featuretools compatibility with Pandas 1.1.0rc0