

@weiji14
Member

@weiji14 weiji14 commented Nov 3, 2025

Description of proposed changes

Change the dtype passed in from "string[pyarrow_numpy]" to pd.StringDtype(storage="pyarrow", na_value=np.nan). This fixes TypeError: data type 'string[pyarrow_numpy]' not understood on the GMT Dev Tests, e.g. at https://github.com/GenericMappingTools/pygmt/actions/runs/18702196024/job/53333116958#step:15:1167

A FutureWarning was previously raised, e.g. at https://github.com/GenericMappingTools/pygmt/actions/runs/18514508242/job/52762165871#step:15:1074:

pygmt/tests/test_clib_to_numpy.py::test_to_numpy_pandas_string[string[pyarrow_numpy]]
  /home/runner/work/pygmt/pygmt/pygmt/tests/test_clib_to_numpy.py:385: FutureWarning: The 'pyarrow_numpy' storage option name is deprecated and will be removed in pandas 3.0. Use 'pd.StringDtype(storage="pyarrow", na_value=np.nan)' to construct the same dtype.
  Or enable the 'pd.options.future.infer_string = True' option globally and use the "str" alias as a shorthand notation to specify a dtype (instead of "string[pyarrow_numpy]").
    array = pd.Series(["abc", "defg", "12345"], dtype=dtype)

Xref pandas-dev/pandas#60152

Fixes #


Guidelines

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash command is:

  • /format: automatically format and lint the code

@weiji14 weiji14 added this to the 0.18.0 milestone Nov 3, 2025
@weiji14 weiji14 self-assigned this Nov 3, 2025
@weiji14 weiji14 added the maintenance (Boring but important stuff for the core devs) and run/test-gmt-dev (Trigger the GMT Dev Tests workflow in PR) labels Nov 3, 2025
pytest.param("string[pyarrow]", marks=skip_if_no(package="pyarrow")),
pytest.param("string[pyarrow_numpy]", marks=skip_if_no(package="pyarrow")),
pytest.param(
pd.StringDtype(storage="pyarrow", na_value=np.nan),
Member Author


The na_value parameter was only added in pandas 2.3 (https://pandas.pydata.org/pandas-docs/version/2.3/reference/api/pandas.StringDtype.html), and under SPEC 0 we can't drop support for pandas 2.2 until 2026-01-19, so we will need some if-then workaround for now.
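A minimal sketch of such an if-then workaround, assuming pandas 2.2 is the oldest version that still needs supporting. It uses the new constructor on pandas 2.3+, falls back to the deprecated "string[pyarrow_numpy]" alias on pandas 2.2, and returns None when pyarrow is not installed, because constructing pd.StringDtype(storage="pyarrow", ...) without pyarrow raises an ImportError. The helper name pyarrow_numpy_string_dtype and the simple major/minor version parsing are hypothetical, not the code merged in this PR.

```python
import importlib.util

import numpy as np
import pandas as pd

# Hypothetical module-level checks (illustration only, not pygmt code).
# Parse only major/minor so strings like "2.2.3" or "3.0.0.dev0+..." work.
PANDAS_VERSION = tuple(int(part) for part in pd.__version__.split(".")[:2])
HAS_PYARROW = importlib.util.find_spec("pyarrow") is not None


def pyarrow_numpy_string_dtype():
    """Return a pyarrow-backed string dtype that uses np.nan for missing values.

    Assumes pandas >= 2.2. Returns None when pyarrow is not installed, since
    pd.StringDtype(storage="pyarrow", ...) cannot be constructed without it.
    """
    if not HAS_PYARROW:
        return None
    if PANDAS_VERSION >= (2, 3):
        # The na_value parameter of pd.StringDtype exists from pandas 2.3 on.
        return pd.StringDtype(storage="pyarrow", na_value=np.nan)
    # pandas 2.2: the deprecated alias is still understood here.
    return "string[pyarrow_numpy]"
```

Gating on the pandas version rather than catching the TypeError keeps the fallback explicit and easy to delete once pandas 2.2 support is dropped.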

@weiji14 weiji14 force-pushed the fix/deprecate_pyarrow_numpy branch from 8e2a8de to 8c6f3a2 Compare November 3, 2025 02:57
Otherwise, the Python 3.12 tests will error when pd.StringDtype(storage="pyarrow", ...) is called without pyarrow installed.
Member

@seisman seisman left a comment


Looks good to me.

@weiji14 weiji14 marked this pull request as ready for review November 3, 2025 18:55
@seisman seisman removed the run/test-gmt-dev (Trigger the GMT Dev Tests workflow in PR) label Nov 4, 2025
@seisman seisman merged commit 09d7019 into main Nov 4, 2025
23 of 24 checks passed
@seisman seisman deleted the fix/deprecate_pyarrow_numpy branch November 4, 2025 02:12

Labels

maintenance (Boring but important stuff for the core devs)


3 participants