Update/pandas 3.0 #694

Merged
guerinclement merged 12 commits into MAIF:master from dalestee:update/pandas-3.0
Apr 24, 2026

Conversation

@dalestee
Collaborator

Description

Fixes compatibility issues introduced by the pandas 3.0 migration.

Main changes

pandas 3.0 breaking changes addressed (#677)

  • StringDtype as default for string columns: pandas 3.0 now infers StringDtype instead of object for string columns and indexes. Added check_dtype=False to the test assertions so that results are consistent across pandas 2.x and 3.x. This does not change the actual functionality, since the tests check the values and not the types.

  • Read-only arrays from .to_numpy(): pandas 3.0 returns read-only arrays from .to_numpy(). Fixed by adding .copy() after .to_numpy() calls where in-place operations (e.g. .sort()) are performed — in both compare_plot() and the corresponding test.

  • Integer indexing on string-indexed Series: pandas 3.0 no longer falls back to positional indexing when accessing a Series with a string index using integer keys. Fixed by passing .to_numpy() instead of a pandas Series to LIME's explain_instance() in lime_backend.py (both classification and regression branches).
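The read-only array fix described above can be sketched as follows. This is a minimal illustration and not the actual compare_plot() code; the Series name and values are made up for the example. Copying the result of .to_numpy() gives an independent, writable array, so the in-place sort works whether or not the array returned by pandas is read-only.

```python
import pandas as pd

# Hypothetical data standing in for the values sorted in compare_plot().
values = pd.Series([3.0, 1.0, 2.0], name="contribution")

# .copy() returns an independent, writable array, so in-place operations
# are safe even if .to_numpy() handed back a read-only array.
sorted_values = values.to_numpy().copy()
sorted_values.sort()

print(sorted_values.tolist())
print(values.tolist())  # the original Series is left untouched
```

The copy costs one extra allocation per call, which is usually negligible for plotting helpers and test code.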

dalestee and others added 8 commits April 21, 2026 09:48
* fix error

* test

* test: action should pass with 3.14 and without 3.9 and 3.10

* other version mods

* test normal

* test as before

* test without 3.9 and 3.10. plus 3.14

* fix error

* test after fixing ruff errors with ruff format

* Fix remaining ruff issues

* replacing isinstance(A, (X, Y)) by isinstance(A, X | Y)

* Format code with ruff

* python tests passed

* fix pyupgrade

* reducing tests

* hook precommit

* added tests names

* test all versions at the same time

* Update shapash/explainer/multi_decorator.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update shapash/decomposition/contributions.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* changing versions for pyupgrade

* ruff

* fix: possible bug where if it was a Series instead of a DataFrame it would crash

* upgrade: more robust syntax

* update: readme

* fix: correcting fallback

* fix: fixing boolean mistake

* fix: fallback for when viewport_data = None

---------

Co-authored-by: 61153a <61153a@slhdg002.maif.local>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
data_path = dirname(dirname(abspath(__file__)))
self.ds_titanic_clean = pd.read_pickle(join(data_path, "data", "clean_titanic.pkl"))
if int(pd.__version__.split(".")[0]) >= 3:
print("Using clean_titanic_pandas_3.pkl for pandas version >= 3")
Collaborator

please remove this print

)
expected["pred"] = expected["pred"].astype(int)
- assert not pd.testing.assert_frame_equal(expected, output)
+ assert not pd.testing.assert_frame_equal(expected, output, check_dtype=False)
Collaborator

remove assert not
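For context, a small sketch of why the `assert not` wrapper is redundant: pd.testing.assert_frame_equal raises AssertionError on a mismatch and returns None otherwise, so `assert not None` always passes and adds nothing. The frames below are made up for illustration and use an int/float difference as a stand-in for the object/StringDtype difference discussed in this PR.

```python
import pandas as pd

# Illustrative frames: same values, different dtypes (int64 vs float64).
expected = pd.DataFrame({"pred": [1, 2]})
output = pd.DataFrame({"pred": [1.0, 2.0]})

# Returns None on success; a mismatch would raise AssertionError instead.
result = pd.testing.assert_frame_equal(expected, output, check_dtype=False)

# With the default check_dtype=True, the dtype difference is reported.
try:
    pd.testing.assert_frame_equal(expected, output)
    strict_failed = False
except AssertionError:
    strict_failed = True

print(result, strict_failed)
```

Calling `pd.testing.assert_frame_equal(expected, output, check_dtype=False)` on its own line is therefore enough in a test.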

dtype=object,
)
- assert not pd.testing.assert_frame_equal(expected, output)
+ assert not pd.testing.assert_frame_equal(expected, output, check_dtype=False)
Collaborator

same here

guerinclement linked an issue on Apr 23, 2026 that may be closed by this pull request
Contributor

Copilot AI left a comment

Pull request overview

This PR updates Shapash to remain compatible with pandas 3.0 behavioral changes, primarily by relaxing dtype assertions in tests, avoiding in-place mutation on read-only NumPy views, and adjusting LIME input types to avoid pandas indexing changes.

Changes:

  • Relaxed multiple assert_frame_equal checks to ignore dtype differences across pandas 2.x/3.x.
  • Updated LIME backend to pass NumPy arrays (instead of Series) to explain_instance() in some branches.
  • Added a pandas-3-specific integration test fixture and widened the pandas dependency range to <4.0.0.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/unit_tests/utils/test_columntransformer_backend.py | Uses check_dtype=False in frame equality assertions for pandas 3 string dtype changes. |
| tests/unit_tests/utils/test_category_encoders_backend.py | Uses check_dtype=False in multiple inverse-transform assertions. |
| tests/unit_tests/explainer/test_smart_state.py | Switches to assert_frame_equal(..., check_dtype=False) for pandas 3 dtype inference. |
| tests/unit_tests/explainer/test_smart_plotter.py | Copies NumPy arrays before in-place sorting to avoid pandas 3 read-only arrays. |
| tests/unit_tests/explainer/test_smart_explainer.py | Uses check_dtype=False in dataframe comparisons. |
| tests/integration_tests/test_integration_inverse_tranform.py | Loads a pandas-3-specific pickle fixture based on pandas major version. |
| tests/data/clean_titanic_pandas_3.pkl | Adds a pandas 3-specific pickled dataset fixture. |
| shapash/backend/lime_backend.py | Passes x.loc[i].values into LIME for some branches to avoid pandas 3 Series integer-indexing changes. |
| pyproject.toml | Relaxes pandas upper bound from <3.0.0 to <4.0.0. |

Comment thread shapash/backend/lime_backend.py Outdated
Comment on lines 48 to 49
exp = explainer.explain_instance(x.loc[i].values, self.model.predict_proba, num_features=x.shape[1])
lime_contrib.append({_transform_name(var_name[0], x): var_name[1] for var_name in exp.as_list()})

Copilot AI Apr 23, 2026

In run_explainer(), only the binary-classification and regression paths were updated to pass a NumPy array into explain_instance(). The multi-class classification path (the elif num_classes > 2: block further down) still calls explainer.explain_instance(x.loc[i], ...) with a pandas Series, so the pandas 3.0 integer-indexing breaking change can still trigger there. Please apply the same to_numpy()/array conversion consistently in the multi-class branch as well (and avoid reusing i as the inner-loop variable, since it currently shadows the outer loop index).
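A rough sketch of the conversion being asked for, outside of LIME itself: a row selected with .loc is a Series indexed by column names, and turning it into a NumPy array makes integer keys unambiguously positional. The frame, its labels, and the variable names are invented for the example; no LIME call is made here.

```python
import pandas as pd

# Hypothetical feature frame with string row labels and string column names.
x = pd.DataFrame(
    {"age": [22.0, 38.0], "fare": [7.25, 71.28]},
    index=["p1", "p2"],
)

row = x.loc["p1"]           # Series whose index is the column names
row_array = row.to_numpy()  # plain ndarray: integer keys are positions

first_feature = row_array[0]
print(first_feature)
```

Passing `row_array` rather than `row` to a consumer that indexes with integers sidesteps the question of label-based versus positional lookup entirely.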

Comment thread shapash/backend/lime_backend.py Outdated

else:
- exp = explainer.explain_instance(x.loc[i], self.model.predict, num_features=x.shape[1])
+ exp = explainer.explain_instance(x.loc[i].values, self.model.predict, num_features=x.shape[1])

Copilot AI Apr 23, 2026

The PR description says this fix uses .to_numpy(), but the code uses .values. For pandas objects, .to_numpy() is the recommended, explicit API (and allows copy= control), whereas .values can return different types depending on dtype and is discouraged. Consider switching to x.loc[i].to_numpy() (optionally with copy=True if LIME mutates the array) to align behavior across dtypes/pandas versions.

Comment on lines +32 to +35
if int(pd.__version__.split(".")[0]) >= 3:
    self.ds_titanic_clean = pd.read_pickle(join(data_path, "data", "clean_titanic_pandas_3.pkl"))
else:
    self.ds_titanic_clean = pd.read_pickle(join(data_path, "data", "clean_titanic.pkl"))

Copilot AI Apr 23, 2026

Selecting a different pickle for pandas>=3 is fine, but clean_titanic_pandas_3.pkl appears to contain ArrowStringArray data (it references pyarrow in the pickle payload). This makes the integration test implicitly depend on pyarrow being installed when running with pandas 3, while the test extra in pyproject.toml doesn't include pyarrow. Either ensure pyarrow is part of the test dependencies, or regenerate the pandas-3 fixture using python string storage so it can be unpickled without pyarrow.

Suggested change
- if int(pd.__version__.split(".")[0]) >= 3:
-     self.ds_titanic_clean = pd.read_pickle(join(data_path, "data", "clean_titanic_pandas_3.pkl"))
- else:
-     self.ds_titanic_clean = pd.read_pickle(join(data_path, "data", "clean_titanic.pkl"))
+ pandas_major_version = int(pd.__version__.split(".")[0])
+ default_pickle_path = join(data_path, "data", "clean_titanic.pkl")
+ if pandas_major_version >= 3:
+     pandas_3_pickle_path = join(data_path, "data", "clean_titanic_pandas_3.pkl")
+     try:
+         self.ds_titanic_clean = pd.read_pickle(pandas_3_pickle_path)
+     except (ImportError, ModuleNotFoundError) as exc:
+         if "pyarrow" not in str(exc):
+             raise
+         self.ds_titanic_clean = pd.read_pickle(default_pickle_path)
+ else:
+     self.ds_titanic_clean = pd.read_pickle(default_pickle_path)
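The version check used in both variants could also be isolated in a small helper, which keeps the selection logic testable without loading any pickle. The sketch below is only an illustration: pick_titanic_fixture is a hypothetical name, not a function in the repository, and it takes the version string as an argument instead of reading pd.__version__ so that it runs without pandas installed.

```python
from os.path import basename, join


def pick_titanic_fixture(pandas_version: str, data_dir: str) -> str:
    """Return the fixture path matching the pandas major version (hypothetical helper)."""
    # "3.0.1" -> 3; comparing integers avoids string-ordering surprises such as "10" < "3".
    major = int(pandas_version.split(".")[0])
    name = "clean_titanic_pandas_3.pkl" if major >= 3 else "clean_titanic.pkl"
    return join(data_dir, name)


print(basename(pick_titanic_fixture("2.2.3", "data")))
print(basename(pick_titanic_fixture("3.0.1", "data")))
```

In the test's setup, the call would presumably look like `pd.read_pickle(pick_titanic_fixture(pd.__version__, join(data_path, "data")))`.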

guerinclement merged commit d6a6d9b into MAIF:master on Apr 24, 2026
dalestee deleted the update/pandas-3.0 branch on April 27, 2026 08:21
Development

Successfully merging this pull request may close these issues.

Bug: Remove Pandas < 3.0 constraint

3 participants