Fix pandas3 compat: use pd.Series(...).value_counts() instead of pd.value_counts(...) #213

Merged
david-cortes-intel merged 1 commit into IntelPython:main from cakedev0:fix/compat_with_pandas3
May 27, 2026

@cakedev0
Contributor

Description

pandas 3 no longer exposes the top-level pd.value_counts function, so a couple of call sites were breaking. I switched them to pd.Series(...).value_counts() instead.
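A minimal sketch of the substitution described above, not the actual diff from this PR. The `labels` list is a hypothetical input chosen for illustration; the real call sites in the repository may pass different objects.

```python
import pandas as pd

# Hypothetical input; the real call sites may pass arrays or Series instead.
labels = ["a", "b", "a", "c", "a", "b"]

# Old spelling, which this PR reports as no longer exposed by pandas 3:
# counts = pd.value_counts(labels)

# Replacement: wrap the values in a Series, then call the method.
counts = pd.Series(labels).value_counts()

print(counts)
```

`Series.value_counts()` sorts by descending count by default, so the most frequent label comes first in the result.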


Checklist:

Completeness and readability

  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.

Signed-off-by: Arthur Lacote <arthur.lacote@probabl.ai>
@cakedev0
Contributor Author

cakedev0 commented May 26, 2026

I'm not sure why the CI is red but I think it's unrelated to my changes, so I'm marking this PR as ready for review.

@cakedev0 cakedev0 marked this pull request as ready for review May 26, 2026 16:20
Contributor

@david-cortes-intel david-cortes-intel left a comment

LGTM. CI errors are unrelated to this change.

But note that there are still other compatibility issues between the latest openml package and pandas>=3, which lead to errors:

  File "/localdisk2/mkl/dcortes/repos/scikit-learn_bench/sklbench/datasets/downloaders.py", line 99, in fetch_and_correct_openml
    x, y, _, _ = dataset.get_data(
                 ^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 820, in get_data
    data, categorical, attribute_names = self._load_data()
                                         ^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 611, in _load_data
    return self._cache_compressed_file_from_file(Path(file_to_load))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 556, in _cache_compressed_file_from_file
    attribute_names, categorical, data = self._parse_data_from_file(data_file)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 584, in _parse_data_from_file
    data, categorical, attribute_names = self._parse_data_from_arff(data_file)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 472, in _parse_data_from_arff
    pd.factorize(type_)[0]
    ^^^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/pandas/core/algorithms.py", line 791, in factorize
    values = _ensure_arraylike(values, func_name="factorize")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/pandas/core/algorithms.py", line 239, in _ensure_arraylike
    raise TypeError(
TypeError: factorize requires a Series, Index, ExtensionArray, np.ndarray or NumpyExtensionArray got list.

(CC @avolkov-intel )
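The traceback above ends in `pd.factorize` rejecting a plain `list`. As a hedged illustration only (this is not a patch taken from openml, and `factorize_list` is a hypothetical helper name), converting the list to a NumPy array before calling `pd.factorize` passes one of the input types the error message names:

```python
import numpy as np
import pandas as pd


def factorize_list(values):
    """Hypothetical helper: factorize a plain Python list.

    The list is converted to an object-dtype ndarray first, because the
    error message above lists np.ndarray among the accepted input types.
    """
    return pd.factorize(np.asarray(values, dtype=object))


# Hypothetical stand-in for the `type_` list seen in the traceback.
type_ = ["nominal", "numeric", "nominal"]
codes, uniques = factorize_list(type_)
```

Here `codes` holds the integer code assigned to each element and `uniques` holds the distinct values in order of first appearance.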

@david-cortes-intel david-cortes-intel merged commit eb5333a into IntelPython:main May 27, 2026
4 of 11 checks passed
@cakedev0
Contributor Author

> there are still other compatibility issues between the latest openml package and pandas>=3, which lead to errors

I don't see those. Do you have a config that reproduces it?

Note: I'm using pixi and I'm installing openml from PyPI.

@david-cortes-intel
Contributor

> > there are still other compatibility issues between the latest openml package and pandas>=3, which lead to errors
>
> I don't see those. Do you have a config that reproduces it?
>
> Note: I'm using pixi and I'm installing openml from PyPI.

It could be triggered like this, if you are interested in contributing to OpenML:

python -m sklbench --config configs/regular/ensemble.json --result-file try.json --filters algorithm:library=sklearn algorithm:device=cpu --prefetch

The openml package in conda-forge requires numpy<2, so right now PyPI is the only option.

@cakedev0
Contributor Author

Thanks, I will use pandas 2 then.
