BUG: Pyarrow timestamp support for map() function #61236

arthurlw · 2025-04-06T06:10:47Z

closes BUG: PyArrow timestamp type does not work with map() function #61231
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
~~Added type annotations to new arguments/methods/functions.~~
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

mroeschke · 2025-04-07T17:01:56Z

pandas/core/algorithms.py

+            try:
+                # Convert elements to pandas.Timestamp (or datetime64[ns])
+                arr = arr.astype("datetime64[ns]")
+            except Exception:


This is the wrong place to fix this. This should be fixed in ArrowExtensionArray.map

mroeschke

Thanks, but the required fix is in ArrowExtensionArray.

mroeschke · 2025-04-14T16:57:43Z

pandas/core/arrays/arrow/array.py

@@ -1483,6 +1483,8 @@ def to_numpy(
    def map(self, mapper, na_action: Literal["ignore"] | None = None):
        if is_numeric_dtype(self.dtype):
            return map_array(self.to_numpy(), mapper, na_action=na_action)
+        elif self.dtype == "timestamp[ns][pyarrow]":
+            return map_array(self.to_numpy(dtype=object), mapper, na_action=na_action)


Can you avoid the type cast to object?

I tried using datetime64[ns] instead of object, but some tests expect Python objects (for example pd.Timestamp) and fail. I think keeping object preserves that expected behavior. Let me know if you'd prefer adjusting the tests instead.

I think the failing test would need adjustment (we get a better result when we don't return object).
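To make the trade-off in this thread concrete, here is a minimal sketch of the two conversions being compared. It uses a plain NumPy-backed datetime Series rather than a PyArrow-backed one, which is an assumption made only to keep the example free of the pyarrow dependency. The variable names are illustrative and do not come from the PR.

```python
import numpy as np
import pandas as pd

# Illustrative data: one valid timestamp and one missing value.
ser = pd.Series(pd.to_datetime(["2025-04-06", None]))

# Casting to object boxes every element as a Python-level scalar.
as_object = ser.to_numpy(dtype=object)

# The default conversion keeps a native datetime64 array.
as_native = ser.to_numpy()

print(as_object.dtype)                          # object
print(isinstance(as_object[0], pd.Timestamp))   # True
print(as_native.dtype.kind)                     # M
print(np.isnat(as_native[1]))                   # True
```

The object array is what the original tests expected, while the native array is what the reviewer asks for: it keeps the datetime dtype and represents the missing entry as NaT.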

mroeschke · 2025-04-14T16:57:51Z

pandas/tests/series/methods/test_map.py

+
+def test_map_arrow_timestamp_dict():
+    # GH 61231
+    pytest.importorskip("pyarrow", minversion="10.0.1")


Suggested change

-    pytest.importorskip("pyarrow", minversion="10.0.1")
+    pytest.importorskip("pyarrow")

mroeschke · 2025-04-14T21:05:37Z

pandas/core/arrays/arrow/array.py

@@ -1483,6 +1483,10 @@ def to_numpy(
    def map(self, mapper, na_action: Literal["ignore"] | None = None):
        if is_numeric_dtype(self.dtype):
            return map_array(self.to_numpy(), mapper, na_action=na_action)
+        elif self.dtype == "timestamp[ns][pyarrow]":


Instead of adding an elif, you can modify the existing if statement as:

    if is_numeric_dtype(self.dtype) or self.dtype.kind in "mM":
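For readers unfamiliar with the `kind in "mM"` idiom: a dtype's `kind` is a one-character code, where `"M"` marks datetimes and `"m"` marks timedeltas. The sketch below shows the combined condition on plain NumPy dtypes. The helper `uses_numpy_fast_path` is hypothetical, and the `"iufcb"` check is only a rough stand-in for pandas' `is_numeric_dtype`, not its actual implementation.

```python
import numpy as np


def uses_numpy_fast_path(dtype: np.dtype) -> bool:
    """Hypothetical stand-in for the combined condition suggested in review."""
    # Rough approximation of is_numeric_dtype: ints, unsigned ints,
    # floats, complex numbers and booleans.
    is_numeric = dtype.kind in "iufcb"
    # "m" is timedelta64, "M" is datetime64.
    is_datelike = dtype.kind in "mM"
    return is_numeric or is_datelike


for name in ["datetime64[ns]", "timedelta64[ns]", "float64", "object"]:
    print(name, uses_numpy_fast_path(np.dtype(name)))
```

Checking `kind` covers every timestamp and duration unit at once, whereas comparing against the string `"timestamp[ns][pyarrow]"` matches only one unit.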

mroeschke · 2025-04-17T20:29:40Z

pandas/tests/extension/test_arrow.py

-            result = data_missing.map(lambda x: x, na_action=na_action)
-            expected = data_missing.to_numpy(dtype=object)
-            tm.assert_numpy_array_equal(result, expected)
+            mapped = data_missing.map(lambda x: x, na_action=na_action)


Why do we have to include all this logic? Ideally this should be

    def test_map(...):
        if data_missing.dtype == "float32[pyarrow]":
            result = data_missing.map(lambda x: x, na_action=na_action)
            # map roundtrips through objects, which converts to float64
            expected = data_missing.to_numpy(dtype="float64", na_value=np.nan)
            tm.assert_numpy_array_equal(result, expected)
        else:
            super().test_map(data_missing, na_action)

Thanks for the suggestion! I agree it’d be ideal to keep this logic minimal, but this test fails specifically for timestamp[...] and duration[...], which seem to require additional normalization after .map() due to dtype coercion.

In particular, .map() on PyArrow-backed datetime/duration types returns a float64 result when pd.NA is present. To make the comparison meaningful, we cast both result and expected back to their logical dtypes (datetime64[ns] or timedelta64[ns]).

I will update the test to make this more readable.

> returns a float64 result when pd.NA is present.

This does not seem like correct behavior (it might be a secondary bug). I would expect to_numpy on those types to return their associated datetime64/timedelta64 types with NaT as the missing value.
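As a point of reference for that expectation, the sketch below shows how plain NumPy represents missing values in datetime64 and timedelta64 arrays. It does not call the PyArrow-backed `to_numpy` at all, so it says nothing about what pandas currently returns there. It only illustrates the target representation described in the comment, and the array contents are made up for illustration.

```python
import numpy as np

# A datetime64 array keeps its dtype when an element is missing;
# the gap is stored as NaT rather than as a float NaN.
stamps = np.array(["2025-04-06", "NaT"], dtype="datetime64[ns]")

# The same holds for durations.
durations = np.array(
    [np.timedelta64(1, "ns"), np.timedelta64("NaT")],
    dtype="timedelta64[ns]",
)

print(stamps.dtype.kind, np.isnat(stamps).tolist())        # M [False, True]
print(durations.dtype.kind, np.isnat(durations).tolist())  # m [False, True]
```

Because NaT lives inside the datetime and timedelta dtypes, no fallback to float64 is needed to represent a missing entry.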

mroeschke · 2025-04-18T16:25:54Z

pandas/core/arrays/arrow/array.py

+            temp = Series(datelike, dtype=datelike.dtype)
+            mapped = temp.map(mapper, na_action=na_action)
+            return mapped._values
+
        if is_numeric_dtype(self.dtype):


Instead, this line should allow datelike types so to_numpy is called

arthurlw added 6 commits April 5, 2025 22:49

implemented pyarrow timestamp support

0ce5571

added test

469e28d

precommit

1dbf5d1

whatsnew

1cf2de8

added import condition for pyarrow in test

52dc7fb

precommit

66215a9

mroeschke reviewed Apr 7, 2025

mroeschke requested changes Apr 7, 2025

arthurlw added 2 commits April 7, 2025 11:58

moved logic from algorithms.py to ArrowExtensionArray.map

4445ecd

Updated condition

49e130c

mroeschke reviewed Apr 14, 2025

mroeschke added the Arrow label Apr 14, 2025

arthurlw and others added 5 commits April 14, 2025 11:33

updated according to reviewer suggestions

79c1fe2

Merge branch 'main' into pyarrow-timestamp-support-for-map

91e6401

reverted array.py logic

19d2870

change cast to dtype="datetime64[ns]"

88c180c

precommit

a421e66

mroeschke reviewed Apr 14, 2025

arthurlw added 7 commits April 15, 2025 15:15

updated test and map condition

04dda0d

updated test

02ae824

wrap with pd.series

73c4039

casted typing

b918323

updated test logic

f3545bf

updated testing logic

4526bb1

precommit

52cd37f

mroeschke reviewed Apr 17, 2025

arthurlw added 2 commits April 17, 2025 14:36

improved code readability

25e57c3

return with proper typing

c9f47ff

mroeschke reviewed Apr 18, 2025

added to_numpy() call

105d92b
