Fixes for Pandas 1.1.0 #1079

Merged
merged 52 commits into from
Jul 30, 2020


thehomebrewnerd
Contributor

@thehomebrewnerd thehomebrewnerd commented Jul 21, 2020

Fixes #1072

  • Update wrangle.py to change handling for special date offsets to support both older and newer versions of pandas
  • Use dask single-threaded scheduler for compute calls to avoid issue with Dask tests failing with threaded scheduler - see Dask Issue 6454
  • Bump pandas minimum version to 1.0.0, as the changes needed in wrangle.py to work with 1.1.0 will cause failures with older versions.
  • Bump dask minimum version to 2.12.0 to avoid an error with version 2.11.0, which does not implement dd.to_numeric()

@codecov

codecov bot commented Jul 21, 2020

Codecov Report

Merging #1079 into main will decrease coverage by 0.01%.
The diff coverage is 94.44%.

@@            Coverage Diff             @@
##             main    #1079      +/-   ##
==========================================
- Coverage   98.36%   98.35%   -0.02%     
==========================================
  Files         126      126              
  Lines       13158    13161       +3     
==========================================
+ Hits        12943    12944       +1     
- Misses        215      217       +2     
Impacted Files Coverage Δ
featuretools/utils/wrangle.py 75.25% <85.71%> (-0.79%) ⬇️
...mputational_backend/test_feature_set_calculator.py 97.93% <100.00%> (ø)
featuretools/tests/conftest.py 100.00% <100.00%> (ø)
...ools/primitives/standard/aggregation_primitives.py 96.92% <0.00%> (-0.31%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3569038...2a557e0.

@thehomebrewnerd
Contributor Author

The fixes implemented in this PR were verified to work with pandas versions as old as v1.0.0, but they fail on versions older than that.
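
The minimum-version boundary described above can be checked programmatically. The following is only a rough sketch, not code from this PR: the helper names `release_tuple` and `meets_minimum` are hypothetical, and the parsing deliberately ignores pre-release suffixes, so a release candidate such as 1.1.0rc0 compares equal to 1.1.0. For strict pre-release ordering, `packaging.version.Version` would be the more robust choice.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release from a version string.

    Pre-release suffixes such as "rc0" are ignored, so "1.1.0rc0"
    yields (1, 1, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum`."""
    a = release_tuple(installed)
    b = release_tuple(minimum)
    # Pad the shorter tuple with zeros so "1.0" compares equal to "1.0.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

With pandas installed, a call such as `meets_minimum(pd.__version__, "1.0.0")` would report whether the environment satisfies the new lower bound.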

@thehomebrewnerd
Contributor Author

Based on limited testing, the line chunk_sum = chunk_sum.astype('int64'), which Codecov flagged, appears to be hit with pandas 1.0.0 or 1.0.5, but not with pandas 1.1.0.

@thehomebrewnerd thehomebrewnerd changed the title from "Fixes for Pandas release candidate 1.1.0rc0" to "Fixes for Pandas 1.1.0" Jul 29, 2020
@thehomebrewnerd thehomebrewnerd requested a review from rwedge July 29, 2020 19:06
@rwedge
Contributor

rwedge commented Jul 29, 2020

For all these scheduler="single-threaded" options, could we set that via an environment variable or something, so we don't need to declare it in each compute call?

If not, I think we should make a SCHEDULER_TYPE variable, set it to "single-threaded", and then use SCHEDULER_TYPE everywhere instead.
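
One way the suggestion above might look is sketched below. This is an illustration under stated assumptions, not the code merged in this PR: the environment variable name `FT_TEST_DASK_SCHEDULER` and the helper `get_scheduler_type` are hypothetical, and only `SCHEDULER_TYPE` comes from the comment itself.

```python
import os

# Hypothetical names for illustration; neither is defined by featuretools or dask.
SCHEDULER_ENV_VAR = "FT_TEST_DASK_SCHEDULER"
DEFAULT_SCHEDULER = "single-threaded"


def get_scheduler_type(environ=None):
    """Return the scheduler name to pass to dask compute calls.

    Reads the (hypothetical) environment variable and falls back to the
    single-threaded scheduler when it is unset or empty.
    """
    if environ is None:
        environ = os.environ
    return environ.get(SCHEDULER_ENV_VAR) or DEFAULT_SCHEDULER


# Resolved once at import time so every test module shares one value.
SCHEDULER_TYPE = get_scheduler_type()

# Intended use in a test, assuming `ddf` is a dask DataFrame:
#     result = ddf.compute(scheduler=SCHEDULER_TYPE)
```

Dask also lets the default scheduler be set globally with `dask.config.set(scheduler="single-threaded")`, which would remove the need to pass the argument at each call; whether that avoids the failures referenced in this PR was not checked here.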

requirements.txt Outdated
@@ -1,6 +1,6 @@
 scipy>=0.13.3
 numpy>=1.13.3
-pandas>=0.24.1,<1.1.0
+pandas>=1.0.0
Contributor

are there any changes that require using 1.0 or higher?

Contributor Author

Yes - the changes in wrangle.py don't work with older versions because of the way the date offsets are handled. There might be a way to come up with something that would work with both, but I'd have to spend some more time looking into it.

Contributor Author

Might be able to use a different if-else construct to check for the two scenarios that were present in the previous code. I'll check.

Contributor

@rwedge rwedge left a comment

Looks good

Successfully merging this pull request may close these issues.

Test featuretools compatibility with Pandas 1.1.0rc0
2 participants