Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36323: [Python] Fix Timestamp scalar repr error on values outside datetime range #36877

Closed
wants to merge 3 commits into from

Conversation

aboss123
Copy link
Contributor

@aboss123 aboss123 commented Jul 25, 2023

Rationale for this change

#36323

What changes are included in this PR?

Changed the way repr is handled for TimestampScalar

Are these changes tested?

I have added a very basic test for this change to see whether it will error or not if outside the range.

Are there any user-facing changes?

The functionality of TimestampScalar's repr now uses the strftime function.

@kou kou changed the title MINOR: [Python] Fix Timestamp scalar repr error on values outside datetime range GH-36323: [Python] Fix Timestamp scalar repr error on values outside datetime range Jul 26, 2023
@@ -113,8 +113,12 @@ cdef class Scalar(_Weakrefable):
else:
with nogil:
check_status(self.wrapped.get().Validate())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@@ -1127,4 +1131,4 @@ def scalar(value, type=None, *, from_pandas=None, MemoryPool memory_pool=None):

# retrieve the scalar from the first position
scalar = GetResultValue(array.get().GetScalar(0))
return Scalar.wrap(scalar)
return Scalar.wrap(scalar)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you revert this change too?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Jul 26, 2023
Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be some linting issues and the following test is failing:

__________________ test_sequence_timestamp_from_int_with_unit __________________

    @pytest.mark.pandas
    def test_sequence_timestamp_from_int_with_unit():
        # TODO(wesm): This test might be rewritten to assert the actual behavior
        # when pandas is not installed
    
        data = [1]
    
        s = pa.timestamp('s')
        ms = pa.timestamp('ms')
        us = pa.timestamp('us')
        ns = pa.timestamp('ns')
    
        arr_s = pa.array(data, type=s)
        assert len(arr_s) == 1
        assert arr_s.type == s
>       assert repr(arr_s[0]) == (
            "<pyarrow.TimestampScalar: datetime.datetime(1970, 1, 1, 0, 0, 1)>"
        )
E       assert "<pyarrow.Tim...01T00:00:01'>" == '<pyarrow.Tim... 1, 0, 0, 1)>'
E         - <pyarrow.TimestampScalar: datetime.datetime(1970, 1, 1, 0, 0, 1)>
E         + <pyarrow.TimestampScalar: '1970-01-01T00:00:01'>

@aboss123
Copy link
Contributor Author

There seems to be some linting issues and the following test is failing:

__________________ test_sequence_timestamp_from_int_with_unit __________________

    @pytest.mark.pandas
    def test_sequence_timestamp_from_int_with_unit():
        # TODO(wesm): This test might be rewritten to assert the actual behavior
        # when pandas is not installed
    
        data = [1]
    
        s = pa.timestamp('s')
        ms = pa.timestamp('ms')
        us = pa.timestamp('us')
        ns = pa.timestamp('ns')
    
        arr_s = pa.array(data, type=s)
        assert len(arr_s) == 1
        assert arr_s.type == s
>       assert repr(arr_s[0]) == (
            "<pyarrow.TimestampScalar: datetime.datetime(1970, 1, 1, 0, 0, 1)>"
        )
E       assert "<pyarrow.Tim...01T00:00:01'>" == '<pyarrow.Tim... 1, 0, 0, 1)>'
E         - <pyarrow.TimestampScalar: datetime.datetime(1970, 1, 1, 0, 0, 1)>
E         + <pyarrow.TimestampScalar: '1970-01-01T00:00:01'>

Regarding this error I have a question. The docs appear to want the repr of the TimestampScalar to show datetime but perhaps it could display in a different format? As the initial issue mentions, values that are put in outside the range of the datetime values could break the repr functionality so would it be wise to change docs to show strftime output versus the datetime one?

Copy link
Member

@danepitkin danepitkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! We should update documentation as well to reflect this change.

def __repr__(self):
if isinstance(self, TimestampScalar):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking for TimestampScalar type in the base class, let's override __repr__ in the cdef class TimestampScalar.

Here is the implementation of Array.__repr__, which is mentioned in the issue linked. https://github.com/apache/arrow/blob/main/python/pyarrow/array.pxi#L1222

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! I can make that change. How would you specify the time format exactly. It mentions the default here but the tests specify <pyarrow.TimestampScalar: datetime.datetime(2012, 1, 1, 0, 0, tzinfo=<UTC>)>. This has timezone info would you like the format to include %z & %Z modifiers to view this?

Copy link
Member

@danepitkin danepitkin Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Yes, let's include the timezone when supplied. Let's try and follow the ISO 8601 standard https://en.wikipedia.org/wiki/ISO_8601 and use %z. Maybe something like %Y-%m-%dT%H:%M:%S%z e.g.

>>> res = pa.scalar("2000-01-01").cast(pa.timestamp("s"))
>>> res_with_tz = pc.assume_timezone(res, "America/New_York")
>>> str(pc.strftime(res_with_tz, format='%Y-%m-%dT%H:%M:%S%z'))
'2000-01-01T00:00:00-0500'

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 28, 2023
@aboss123
Copy link
Contributor Author

Given the comments I've made some changes. Please let me know if there is anything else needed.

@kou
Copy link
Member

kou commented Jul 30, 2023

@github-actions crossbow submit -g python

@github-actions
Copy link

Command '['git', '-C', '/tmp/tmpryb2in98/arrow', 'fetch', 'origin', 'pull/36877/head:main']' returned non-zero exit status 128.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/5708248418

@aboss123
Copy link
Contributor Author

@github-actions crossbow submit python

Command '['git', '-C', '/tmp/tmpryb2in98/arrow', 'fetch', 'origin', 'pull/36877/head:main']' returned non-zero exit status 128.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/5708248418

Is there something that needs to be changed considering this did not exit properly?

@github-actions
Copy link

Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/5709570640

@kou
Copy link
Member

kou commented Jul 31, 2023

Ah, could you re-submit this PR from a feature branch instead of main?
Our CI tool (Crossbow doesn't support main branch on fork.)

@aboss123
Copy link
Contributor Author

Ah, could you re-submit this PR from a feature branch instead of main? Our CI tool (Crossbow doesn't support main branch on fork.)

I have created a new PR here: #36942

@kou
Copy link
Member

kou commented Jul 31, 2023

I close this in favor of #36942.

@kou kou closed this Jul 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Python] Repr of Timestamp scalars error if they are outside range of stdlib datetime.datetime
4 participants