Update/pandas 3.0 by dalestee · Pull Request #694 · MAIF/shapash

dalestee · 2026-04-23T07:24:40Z

Description

Fixes compatibility issues introduced by the pandas 3.0 migration.

Main changes

pandas 3.0 breaking changes addressed (#677 )

StringDtype as default for string columns: pandas 3.0 now infers StringDtype instead of object for string columns and indexes. Added 'check_dtype=False' to the affected test assertions so results are consistent across pandas 2.x and 3.x. This does not change the behavior under test, since the tests compare values, not dtypes.
Read-only arrays from .to_numpy(): pandas 3.0 returns read-only arrays from .to_numpy(). Fixed by adding .copy() after .to_numpy() calls where in-place operations (e.g. .sort()) are performed — in both compare_plot() and the corresponding test.
Integer indexing on string-indexed Series: pandas 3.0 no longer falls back to positional indexing when accessing a Series with a string index using integer keys. Fixed by passing .to_numpy() instead of a pandas Series to LIME's explain_instance() in lime_backend.py (both classification and regression branches).
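
The read-only array fix can be illustrated with a minimal sketch. `sorted_copy` is a hypothetical helper, not a function from Shapash; it only shows the `.to_numpy().copy()` pattern described above, which should behave the same on pandas 2.x and 3.x.

```python
import numpy as np
import pandas as pd


def sorted_copy(series: pd.Series) -> np.ndarray:
    """Return the values of `series` sorted in ascending order.

    Hypothetical helper for illustration only; not part of Shapash.
    """
    # Per the description above, .to_numpy() can hand back a read-only array
    # on pandas 3.0, so take a writable copy before sorting in place.
    values = series.to_numpy().copy()
    values.sort()
    return values


s = pd.Series([3.0, 1.0, 2.0])
result = sorted_copy(s)
```

Because the sort runs on a copy, the original Series keeps its order, which is usually what a plotting helper wants anyway.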

* fix error
* test
* test: action should pass with 3.14 and without 3.9 and 3.10
* other version mods
* test normal
* test as before
* test without 3.9 and 3.10. plus 3.14
* fix error
* test after fixing ruff errors with ruff format
* Fix remaining ruff issues
* replacing isinstance(A, (X, Y)) by isinstance(A, X | Y)
* Format code with ruff
* python tests passed
* fix pyupgrade
* reducing tests
* hook precommit
* added tests names
* test all versions at the same time
* Update shapash/explainer/multi_decorator.py
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update shapash/decomposition/contributions.py
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* changing versions for pyupgrade
* ruff
* fix: possible bug where if it was a Series instead of a DataFrame it would crash
* upgrade: more robust syntax
* update: readme
* fix: correcting fallback
* fix: fixing boolean mistake
* fix: fallback for when viewport_data = None

---------

Co-authored-by: 61153a <61153a@slhdg002.maif.local>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

guerinclement · 2026-04-23T07:28:04Z

        data_path = dirname(dirname(abspath(__file__)))
-        self.ds_titanic_clean = pd.read_pickle(join(data_path, "data", "clean_titanic.pkl"))
+        if int(pd.__version__.split(".")[0]) >= 3:
+            print("Using clean_titanic_pandas_3.pkl for pandas version >= 3")


please remove this print

guerinclement · 2026-04-23T07:29:19Z

        )
        expected["pred"] = expected["pred"].astype(int)
-        assert not pd.testing.assert_frame_equal(expected, output)
+        assert not pd.testing.assert_frame_equal(expected, output, check_dtype=False)


remove assert not
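
For context, `pd.testing.assert_frame_equal` returns `None` on success and raises `AssertionError` on a mismatch, so wrapping it in `assert not` adds nothing. A minimal sketch, using made-up frames rather than the Shapash test data, showing the bare call together with the effect of `check_dtype=False`:

```python
import pandas as pd

# Illustrative frames (not the Shapash fixtures): same values, different string dtypes.
expected = pd.DataFrame({"name": pd.Series(["a", "b"], dtype=object)})
output = pd.DataFrame({"name": pd.Series(["a", "b"], dtype="string")})

# The function returns None when the frames match, so the bare call is enough.
returned = pd.testing.assert_frame_equal(expected, output, check_dtype=False)

# With the default check_dtype=True, the dtype difference alone fails the comparison.
try:
    pd.testing.assert_frame_equal(expected, output)
    strict_check_passed = True
except AssertionError:
    strict_check_passed = False
```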

guerinclement · 2026-04-23T07:31:12Z

            dtype=object,
        )
-        assert not pd.testing.assert_frame_equal(expected, output)
+        assert not pd.testing.assert_frame_equal(expected, output, check_dtype=False)


Copilot

Pull request overview

This PR updates Shapash to remain compatible with pandas 3.0 behavioral changes, primarily by relaxing dtype assertions in tests, avoiding in-place mutation on read-only NumPy views, and adjusting LIME input types to avoid pandas indexing changes.

Changes:

Relaxed multiple assert_frame_equal checks to ignore dtype differences across pandas 2.x/3.x.
Updated LIME backend to pass NumPy arrays (instead of Series) to explain_instance() in some branches.
Added a pandas-3-specific integration test fixture and widened the pandas dependency range to <4.0.0.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

File	Description
tests/unit_tests/utils/test_columntransformer_backend.py	Uses `check_dtype=False` in frame equality assertions for pandas 3 string dtype changes.
tests/unit_tests/utils/test_category_encoders_backend.py	Uses `check_dtype=False` in multiple inverse-transform assertions.
tests/unit_tests/explainer/test_smart_state.py	Switches to `assert_frame_equal(..., check_dtype=False)` for pandas 3 dtype inference.
tests/unit_tests/explainer/test_smart_plotter.py	Copies NumPy arrays before in-place sorting to avoid pandas 3 read-only arrays.
tests/unit_tests/explainer/test_smart_explainer.py	Uses `check_dtype=False` in dataframe comparisons.
tests/integration_tests/test_integration_inverse_tranform.py	Loads a pandas-3-specific pickle fixture based on pandas major version.
tests/data/clean_titanic_pandas_3.pkl	Adds a pandas 3-specific pickled dataset fixture.
shapash/backend/lime_backend.py	Passes `x.loc[i].values` into LIME for some branches to avoid pandas 3 Series integer-indexing changes.
pyproject.toml	Relaxes pandas upper bound from `<3.0.0` to `<4.0.0`.

Copilot · 2026-04-23T07:58:37Z

+                    exp = explainer.explain_instance(x.loc[i].values, self.model.predict_proba, num_features=x.shape[1])
                    lime_contrib.append({_transform_name(var_name[0], x): var_name[1] for var_name in exp.as_list()})


In run_explainer(), only the binary-classification and regression paths were updated to pass a NumPy array into explain_instance(). The multi-class classification path (the elif num_classes > 2: block further down) still calls explainer.explain_instance(x.loc[i], ...) with a pandas Series, so the pandas 3.0 integer-indexing breaking change can still trigger there. Please apply the same to_numpy()/array conversion consistently in the multi-class branch as well (and avoid reusing i as the inner-loop variable, since it currently shadows the outer loop index).

Copilot · 2026-04-23T07:58:37Z


            else:
-                exp = explainer.explain_instance(x.loc[i], self.model.predict, num_features=x.shape[1])
+                exp = explainer.explain_instance(x.loc[i].values, self.model.predict, num_features=x.shape[1])


The PR description says this fix uses .to_numpy(), but the code uses .values. For pandas objects, .to_numpy() is the recommended, explicit API (and allows copy= control), whereas .values can return different types depending on dtype and is discouraged. Consider switching to x.loc[i].to_numpy() (optionally with copy=True if LIME mutates the array) to align behavior across dtypes/pandas versions.
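
The indexing issue behind this change can be sketched without LIME. `first_value` below is a hypothetical stand-in for any consumer that reads a row by integer position (an assumption about how the row is used downstream, not a quote of LIME's code), and the frame is made up.

```python
import numpy as np
import pandas as pd


def first_value(row) -> float:
    """Hypothetical consumer that reads the first feature by integer position."""
    return float(row[0])


x = pd.DataFrame({"age": [22.0, 38.0], "fare": [7.25, 71.28]}, index=[10, 11])

# x.loc[10] is a Series labelled by column name ("age", "fare"), so row[0]
# on it would rely on the positional fallback that pandas 3.0 removes.
# Converting first yields a plain ndarray, where row[0] is always positional.
row_array = x.loc[10].to_numpy()
assert isinstance(row_array, np.ndarray)

value = first_value(row_array)
```

Passing the array rather than the Series keeps the call independent of the pandas version, at the cost of dropping the column labels from the row.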

Copilot · 2026-04-23T07:58:37Z

+        if int(pd.__version__.split(".")[0]) >= 3:
+            self.ds_titanic_clean = pd.read_pickle(join(data_path, "data", "clean_titanic_pandas_3.pkl"))
+        else:
+            self.ds_titanic_clean = pd.read_pickle(join(data_path, "data", "clean_titanic.pkl"))


Selecting a different pickle for pandas>=3 is fine, but clean_titanic_pandas_3.pkl appears to contain ArrowStringArray data (it references pyarrow in the pickle payload). This makes the integration test implicitly depend on pyarrow being installed when running with pandas 3, while the test extra in pyproject.toml doesn't include pyarrow. Either ensure pyarrow is part of the test dependencies, or regenerate the pandas-3 fixture using python string storage so it can be unpickled without pyarrow.

Suggested change

-        if int(pd.__version__.split(".")[0]) >= 3:
-            self.ds_titanic_clean = pd.read_pickle(join(data_path, "data", "clean_titanic_pandas_3.pkl"))
-        else:
-            self.ds_titanic_clean = pd.read_pickle(join(data_path, "data", "clean_titanic.pkl"))
+        pandas_major_version = int(pd.__version__.split(".")[0])
+        default_pickle_path = join(data_path, "data", "clean_titanic.pkl")
+        if pandas_major_version >= 3:
+            pandas_3_pickle_path = join(data_path, "data", "clean_titanic_pandas_3.pkl")
+            try:
+                self.ds_titanic_clean = pd.read_pickle(pandas_3_pickle_path)
+            except (ImportError, ModuleNotFoundError) as exc:
+                if "pyarrow" not in str(exc):
+                    raise
+                self.ds_titanic_clean = pd.read_pickle(default_pickle_path)
+        else:
+            self.ds_titanic_clean = pd.read_pickle(default_pickle_path)

dalestee and others added 8 commits April 21, 2026 09:48

Merge branch 'MAIF:master' into master

0b2283a

fix: lime error

3386efb

adding version

9a62d1e

correct read file

0604c76

fix: integration tests

b25e1db

fix: error only read

be7636f

fix: type error due to pandas 3.0

f4b8091

guerinclement requested changes Apr 23, 2026

guerinclement linked an issue Apr 23, 2026 that may be closed by this pull request

Bug: Remove Pandas < 3.0 constraint #677

Closed

dalestee added 2 commits April 23, 2026 09:35

fix: remove print

d6a154b

fix: remove useless assert not

a2bf7e3

guerinclement requested a review from Copilot April 23, 2026 07:54

Copilot started reviewing on behalf of guerinclement April 23, 2026 07:54

Copilot AI reviewed Apr 23, 2026

dalestee and others added 2 commits April 23, 2026 11:32

fex: applying pr changes

a43700c

Merge branch 'master' into update/pandas-3.0

126047a

guerinclement approved these changes Apr 24, 2026

guerinclement merged commit d6a6d9b into MAIF:master Apr 24, 2026

dalestee deleted the update/pandas-3.0 branch April 27, 2026 08:21

guerinclement merged 12 commits into MAIF:master from dalestee:update/pandas-3.0
