Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pandas workaround back in #1731

Merged
merged 1 commit into from
Oct 12, 2021
Merged

Add pandas workaround back in #1731

merged 1 commit into from
Oct 12, 2021

Conversation

davesque
Copy link
Contributor

@davesque davesque commented Oct 8, 2021

Pull Request Description

We realized earlier today that the pandas workarounds that were removed by #1677 and #1679 are probably still needed as the bug reported in pandas-dev/pandas#22501 may still exist. More investigation is needed to understand exactly what causes data frames with misaligned categorical indices to be merged incorrectly. For now, we should at least revert the changes made by the two aforementioned PRs.

Left to do:

  • More investigation into what causes the categorical merge bug to show up
  • Regression tests that check for improper merges in _calculate_agg_features resulting from the categorical merge bug
  • Add more details in this ticket about the code that revealed the issue

@davesque davesque changed the base branch from main to woodwork-integration October 8, 2021 22:14
@gsheni gsheni requested review from thehomebrewnerd, tamargrey and a team and removed request for thehomebrewnerd and tamargrey October 12, 2021 13:43
@rwedge
Copy link
Contributor

rwedge commented Oct 12, 2021

@davesque this can merge into main, woodwork-integration branch has already been merged

@davesque davesque changed the base branch from woodwork-integration to main October 12, 2021 15:31
Effectively reverts #1677 and #1679.
@codecov
Copy link

codecov bot commented Oct 12, 2021

Codecov Report

Merging #1731 (9e72515) into main (1909f2d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1731   +/-   ##
=======================================
  Coverage   98.69%   98.69%           
=======================================
  Files         138      138           
  Lines       15361    15368    +7     
=======================================
+ Hits        15161    15168    +7     
  Misses        200      200           
Impacted Files Coverage Δ
...s/computational_backends/feature_set_calculator.py 98.44% <100.00%> (+0.01%) ⬆️
featuretools/entityset/entityset.py 99.21% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1909f2d...9e72515. Read the comment docs.

Copy link
Contributor

@thehomebrewnerd thehomebrewnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davesque No issues with the reverted changes, but just a couple comments:

  1. Have you been able to confirm that reverting these changes fixes the bug you were seeing?
  2. Can we create an issue to investigate this further and try to add tests to our test suite that will fail if remove this code with the current pandas version?

@davesque
Copy link
Contributor Author

@thehomebrewnerd

1. Have you been able to confirm that reverting these changes fixes the bug you were seeing?

Yep, it fixes the issue. Confirmed that before making the PR.

2. Can we create an issue to investigate this further and try to add tests to our test suite that will fail if remove this code with the current pandas version?

There's a bullet point in the above PR description relating to this. I think the tests should be part of this PR.

@davesque davesque changed the title Add pandas workaround back in [WIP, DO NOT MERGE] Add pandas workaround back in Oct 12, 2021
@davesque davesque changed the title [WIP, DO NOT MERGE] Add pandas workaround back in Add pandas workaround back in Oct 12, 2021
@davesque davesque merged commit 3ff8349 into main Oct 12, 2021
@davesque davesque deleted the add-pandas-workaround branch October 12, 2021 16:41
@thehomebrewnerd thehomebrewnerd mentioned this pull request Oct 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants