-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-36323: [Python] Fix Timestamp scalar repr error on values outside datetime range #36877
Conversation
python/pyarrow/scalar.pxi
Outdated
@@ -113,8 +113,12 @@ cdef class Scalar(_Weakrefable): | |||
else: | |||
with nogil: | |||
check_status(self.wrapped.get().Validate()) | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python/pyarrow/scalar.pxi
Outdated
@@ -1127,4 +1131,4 @@ def scalar(value, type=None, *, from_pandas=None, MemoryPool memory_pool=None): | |||
|
|||
# retrieve the scalar from the first position | |||
scalar = GetResultValue(array.get().GetScalar(0)) | |||
return Scalar.wrap(scalar) | |||
return Scalar.wrap(scalar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you revert this change too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be some linting issues and the following test is failing:
__________________ test_sequence_timestamp_from_int_with_unit __________________
@pytest.mark.pandas
def test_sequence_timestamp_from_int_with_unit():
# TODO(wesm): This test might be rewritten to assert the actual behavior
# when pandas is not installed
data = [1]
s = pa.timestamp('s')
ms = pa.timestamp('ms')
us = pa.timestamp('us')
ns = pa.timestamp('ns')
arr_s = pa.array(data, type=s)
assert len(arr_s) == 1
assert arr_s.type == s
> assert repr(arr_s[0]) == (
"<pyarrow.TimestampScalar: datetime.datetime(1970, 1, 1, 0, 0, 1)>"
)
E assert "<pyarrow.Tim...01T00:00:01'>" == '<pyarrow.Tim... 1, 0, 0, 1)>'
E - <pyarrow.TimestampScalar: datetime.datetime(1970, 1, 1, 0, 0, 1)>
E + <pyarrow.TimestampScalar: '1970-01-01T00:00:01'>
Regarding this error I have a question. The docs appear to want the repr of the TimestampScalar to show datetime but perhaps it could display in a different format? As the initial issue mentions, values that are put in outside the range of the datetime values could break the repr functionality so would it be wise to change docs to show |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right! We should update documentation as well to reflect this change.
python/pyarrow/scalar.pxi
Outdated
def __repr__(self): | ||
if isinstance(self, TimestampScalar): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking for TimestampScalar type in the base class, let's override __repr__
in the cdef class TimestampScalar
.
Here is the implementation of Array.__repr__
, which is mentioned in the issue linked. https://github.com/apache/arrow/blob/main/python/pyarrow/array.pxi#L1222
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! I can make that change. How would you specify the time format exactly. It mentions the default here but the tests specify <pyarrow.TimestampScalar: datetime.datetime(2012, 1, 1, 0, 0, tzinfo=<UTC>)>
. This has timezone info would you like the format to include %z
& %Z
modifiers to view this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Yes, let's include the timezone when supplied. Let's try and follow the ISO 8601 standard https://en.wikipedia.org/wiki/ISO_8601 and use %z
. Maybe something like %Y-%m-%dT%H:%M:%S%z
e.g.
>>> res = pa.scalar("2000-01-01").cast(pa.timestamp("s"))
>>> res_with_tz = pc.assume_timezone(res, "America/New_York")
>>> str(pc.strftime(res_with_tz, format='%Y-%m-%dT%H:%M:%S%z'))
'2000-01-01T00:00:00-0500'
Given the comments I've made some changes. Please let me know if there is anything else needed. |
@github-actions crossbow submit -g python |
|
@github-actions crossbow submit python
Is there something that needs to be changed considering this did not exit properly? |
|
Ah, could you re-submit this PR from a feature branch instead of |
I have created a new PR here: #36942 |
I close this in favor of #36942. |
Rationale for this change
#36323
What changes are included in this PR?
Changed the way repr is handled for TimestampScalar
Are these changes tested?
I have added a very basic test for this change to see whether it will error or not if outside the range.
Are there any user-facing changes?
The functionality of TimestampScalar's repr now uses the
strftime
function.