Added support for pandas 2 #4216

Merged
merged 1 commit into from
Jul 27, 2023

Conversation

christopherbunn
Contributor

@christopherbunn christopherbunn commented Jun 29, 2023

Resolves #4252

@codecov

codecov bot commented Jun 29, 2023

Codecov Report

Merging #4216 (bd6ccba) into main (b398501) will decrease coverage by 0.0%.
The diff coverage is 97.3%.

❗ Current head bd6ccba differs from pull request most recent head 6b42b5f. Consider uploading reports for the commit 6b42b5f to get more accurate results

@@           Coverage Diff           @@
##            main   #4216     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        349     349             
  Lines      38410   38413      +3     
=======================================
+ Hits       38291   38293      +2     
- Misses       119     120      +1     
Files Changed                                          Coverage Δ
...components/transformers/encoders/onehot_encoder.py  100.0% <ø> (ø)
evalml/preprocessing/utils.py                          100.0% <ø> (ø)
...alml/tests/component_tests/test_one_hot_encoder.py  100.0% <ø> (ø)
evalml/tests/component_tests/test_oversampler.py       100.0% <ø> (ø)
evalml/tests/component_tests/test_undersampler.py      100.0% <ø> (ø)
...s/model_understanding_tests/test_visualizations.py  100.0% <ø> (ø)
...omponents/transformers/preprocessing/decomposer.py  99.3% <50.0%> (-0.7%) ⬇️
evalml/model_understanding/visualizations.py           100.0% <100.0%> (ø)
...ransformers/preprocessing/polynomial_decomposer.py  100.0% <100.0%> (ø)
...nents/transformers/preprocessing/stl_decomposer.py  100.0% <100.0%> (ø)
... and 12 more

@gsheni
Contributor

gsheni commented Jul 25, 2023

@christopherbunn Woodwork and Featuretools now support pandas 2.0

@christopherbunn christopherbunn force-pushed the pandas_2_upgrade branch 3 times, most recently from 3ad7710 to 6a4f6e9 Compare July 26, 2023 20:40
Contributor Author

@christopherbunn christopherbunn left a comment

A few points to consider:

Comment on lines 154 to 156
# Only need to handle nullable types on pandas < 2. Kept for backwards compatibility with pandas 1.x.
if pd.__version__[0] == "1":
    X, y = cls._handle_nullable_types(cls, X, y)
Contributor Author

Added this so that we retain pandas 1.x compatibility. If we ever drop that compatibility, this check should be removed as well.

Contributor Author

Also, it's not covered because we only run Codecov on the latest dependencies (pandas 2.0), and this section only runs in the min deps CI.
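
For illustration, here is a minimal sketch of a version gate that compares the parsed major version instead of the first character of the version string, which would also match a hypothetical "10.x" release. The helper name `is_pandas_1` is an assumption made for this example and is not part of EvalML; in real code the argument would be `pd.__version__`.

```python
def is_pandas_1(version: str) -> bool:
    """Return True if the given pandas version string has major version 1.

    Hypothetical helper for illustration only; not an EvalML function.
    """
    # Parse the leading component so that "10.0.0" is not mistaken for 1.x.
    major = int(version.split(".")[0])
    return major == 1


# In real code this would be called as is_pandas_1(pd.__version__).
print(is_pandas_1("1.5.3"))  # True
print(is_pandas_1("2.0.3"))  # False
```

If the `packaging` library is already a dependency, comparing `packaging.version.Version` objects would be another reasonable option.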

>>> y = pd.Series([True, False, False, False, True])
>>> target_distribution(y)
>>> print(target_distribution(y).to_string())
Contributor Author

This change and the one on line 187 are needed because the name of the resulting series differs between pandas versions.
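
A rough sketch of why printing `to_string()` keeps the doctest stable: the default `Series` repr ends with a footer containing the series name and dtype, while `to_string()` omits that footer by default. The example assumes the distribution is computed with something like `value_counts(normalize=True)`, whose result is named "proportion" on pandas 2.x and is unnamed on pandas 1.x for an unnamed input; how `target_distribution` is actually implemented is not shown in this thread.

```python
import pandas as pd

y = pd.Series([True, False, False, False, True])

# Assumption: the distribution is built from something like value_counts.
dist = y.value_counts(normalize=True)

# The default repr ends with a name/dtype footer whose content depends on the
# pandas version; to_string() leaves that footer out by default.
text = dist.to_string()
print(text)
```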

Comment on lines 499 to 504
marks=pytest.mark.xfail(
    condition=pd.__version__[0] == "1",
    strict=True,
    raises=AssertionError,
    reason="pandas 1.x does not recognize np.Nan in Float64 subtracted_floats.",
),
Contributor Author

This is only necessary to maintain backwards compatibility.

Comment on lines 765 to 767
if index_type == "integer_index":
    assert pd.api.types.is_integer_dtype(y[y_t_new.index].index)
    assert pd.api.types.is_integer_dtype(output_inverse_y.index)
Contributor Author

Open to removing this since we already assume the indices are ints.
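
As a small illustration of the dtype-based check used in the assertions above: `pd.api.types.is_integer_dtype` accepts an index directly and does not depend on a specific index class. That matters across versions because pandas 2.0 removed `Int64Index`, although whether that was the motivation here is an assumption. The sample indexes below are made up for the example.

```python
import pandas as pd

# Made-up sample indexes for illustration.
int_index = pd.RangeIndex(5)
date_index = pd.date_range("2023-01-01", periods=5, freq="D")

# The check looks at the dtype, not the index class.
print(pd.api.types.is_integer_dtype(int_index))   # True
print(pd.api.types.is_integer_dtype(date_index))  # False
```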

Contributor

@eccabay eccabay left a comment

Thanks for doing this! We need to make sure there isn't any accidentally deleted code and figure out what the minimum version should be before this can go in.

Contributor Author

@christopherbunn christopherbunn left a comment

My bad, most of the code regressions are due to some bad merge work with your PR @eccabay 😅

@christopherbunn christopherbunn requested a review from eccabay July 27, 2023 15:00
Collaborator

@jeremyliweishih jeremyliweishih left a comment

LGTM, but +1 on @eccabay's comments re: checking for the version.

@christopherbunn christopherbunn enabled auto-merge (squash) July 27, 2023 17:04
@christopherbunn christopherbunn changed the title Add support for pandas 2 Added support for pandas 2 Jul 27, 2023
@christopherbunn christopherbunn merged commit 5b80a8e into main Jul 27, 2023
@christopherbunn christopherbunn deleted the pandas_2_upgrade branch July 27, 2023 17:43
remyogasawara pushed a commit that referenced this pull request Aug 2, 2023
* Squashed changes

* Ignored index

* Disabled column checking

* Reverted deleted code

* Updated pyproject.toml

* Replaced version check code
Development

Successfully merging this pull request may close these issues.

Add support for pandas 2.0 to EvalML
4 participants