Support pandas 1.4.0 #1881
Codecov Report
@@            Coverage Diff             @@
##             main    #1881      +/-   ##
==========================================
- Coverage   98.83%   98.76%   -0.08%
==========================================
  Files         147      147
  Lines       16291    16307      +16
==========================================
+ Hits        16102    16106       +4
- Misses        189      201      +12
The I don't know what would have changed between pandas versions to cause this change in koalas. I've confirmed this happens when you switch to pandas 1.4.0 on both koalas 1.8.1 and 1.8.2, and I confirmed this only happens for the
dask_computed_fm = dask_fm.compute().set_index('id').loc[fm.index][fm.columns]
# update the type of the future index column so it doesn't conflict with the pandas fm
dask_fm = dask_fm.compute().astype({'id': 'int64'})
dask_computed_fm = dask_fm.set_index('id').loc[fm.index][fm.columns]
Most of the failing tests were caused by a changed (fixed, actually, I think) behavior in set_index. Previously, the following would result in the index's dtype no longer being Int64. Now it's retained as Int64, which means it gets caught in the situation that came from #1810, so updating the dtype to int64 to begin with fixes the problem.
import pandas as pd

# 'a' uses the NumPy int64 dtype; 'b' uses the nullable Int64 dtype
df = pd.DataFrame({
    'a': pd.Series([1, 2, 3, 4], dtype='int64'),
    'b': pd.Series([1, 2, 3, 3], dtype='Int64'),
})
df.set_index('b')
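The workaround described in this comment can be sketched roughly as follows. This is a minimal illustration that assumes only pandas is installed and reuses the column names from the snippet above; the variable names are made up for the example, and the index dtype printed for the uncast frame depends on the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "a": pd.Series([1, 2, 3, 4], dtype="int64"),
    "b": pd.Series([1, 2, 3, 3], dtype="Int64"),
})

# Index dtype when the nullable column is used directly (version dependent).
uncast_index_dtype = df.set_index("b").index.dtype

# Cast the future index column to NumPy int64 first, mirroring what the PR
# does for the 'id' column before calling set_index.
cast_index_dtype = df.astype({"b": "int64"}).set_index("b").index.dtype

print("without cast:", uncast_index_dtype)
print("with cast:", cast_index_dtype)
```

Casting before set_index keeps the result independent of how a given pandas version treats nullable dtypes in an index. Note that this cast only works when the column contains no missing values.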
Are we able to revert some of the check_dtype=False changes that were introduced in #1810?
There are columns beyond id that will have different dtypes from the pandas dataframe (values, for example, which is Integer for pandas and IntegerNullable otherwise because of the changes in #1810). So we need to keep it in
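To illustrate why check_dtype=False is still needed, here is a small sketch using pandas' own testing helper. The frame names and the single values column are stand-ins chosen for the example, not the actual feature matrices from the test suite.

```python
import pandas as pd

# Stand-ins for the pandas feature matrix and the Dask/Koalas-derived one.
pandas_fm = pd.DataFrame({"values": pd.Series([1, 2, 3], dtype="int64")})
other_fm = pd.DataFrame({"values": pd.Series([1, 2, 3], dtype="Int64")})

# The values match, so the comparison passes once dtype checking is disabled.
pd.testing.assert_frame_equal(pandas_fm, other_fm, check_dtype=False)

# With the default check_dtype=True the same comparison raises.
try:
    pd.testing.assert_frame_equal(pandas_fm, other_fm)
    strict_passed = True
except AssertionError:
    strict_passed = False

print("strict comparison passed:", strict_passed)
```

Because int64 and Int64 are treated as different dtypes, a strict comparison fails even though every value is equal, which is the situation the comment describes for columns other than id.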
We may be getting bit by koalas being in maintenance mode and no longer keeping up with pandas releases. Can we change the way we create the Koalas dataframe for that test to side-step this issue? If not, we could restrict the pandas version if koalas is installed, like we are doing in Woodwork right now.
@thehomebrewnerd We can definitely find a way around this test's failure since the test is just checking that, if you add a dataframe into a koalas entityset, it matches the original pandas dataframe in the entityset. I'm thinking of several options to get around this issue:
I think I like 2 the best from a simplicity standpoint, but 3 from a completeness standpoint and being able to document this behavior somewhere. But I'm not convinced that just because we can means we should allow pandas 1.4.0 right now--not having null values maintained when converting from pandas to koalas seems like a big problem to me. Since this change only showed up between pandas versions, maybe we can open an issue in pandas and, if they are able to put out a bug fix, we can restrict our pandas version to that one later on. If it's not something they can handle, then we can allow this pandas version. Thoughts @rwedge @gsheni ?
Not sure what the codecov changes are: https://app.codecov.io/gh/alteryx/featuretools/compare/1881/changes Seems like
@tamargrey not sure why this has changed but
docs/source/release_notes.rst
* Add ``__setitem__`` method to overload ``add_dataframe`` method on EntitySet (:pr:`1862`)
* Temporarily restrict woodwork max version (:pr:`1872`)
* Split Datetime and LatLong primitives into separate files (:pr:`1861`)
* Update to add support for pandas version 1.4.0 (:pr:`1881`)
Let's put this in the enhancement category
done!
rwedge left a comment
LGTM
Adds support for pandas 1.4.0
closes #1865