[Python] Add timezone information when printing TimestampArray #39315

Open
AlenkaF opened this issue Dec 20, 2023 · 10 comments

@AlenkaF
Member

AlenkaF commented Dec 20, 2023

Describe the enhancement requested

PrettyPrint for Timestamp currently prints values in UTC even when a timezone is defined. This can be confusing, and there is an open PR with a simple fix that appends "Z" to the string when a timezone is defined:
#39272

This way we can at least see which timezone the data is printed in.

It would also be good to add timezone information when printing the array, for example:

# Adding timezone information in new line
>>> pa.array([0], pa.timestamp('s', tz='+02:00'))
<pyarrow.lib.TimestampArray object at 0x125319de0>
<timestamp[s, tz=+02:00]>
[
  1970-01-01 00:00:00Z
]

# Or at the end of first line (not very clear in my opinion)
>>> pa.array([0], pa.timestamp('s'))
<pyarrow.lib.TimestampArray object at 0x125319e40><timestamp[s]> 
[
  1970-01-01 00:00:00
]

In Python this can be done by adding a separate __repr__ to the TimestampArray class.

Would something similar also be needed for R or is timezone information available when printing an Array? cc @paleolimbot

Component(s)

Python, R

@paleolimbot
Member

It looks like we already print the timezone?

arrow::as_arrow_array(Sys.time())
#> Array
#> <timestamp[us, tz=America/Halifax]>
#> [
#>   2023-12-20 13:33:42.632165
#> ]

...and it also looks like our abbreviated printer displays it too:

dplyr::glimpse(arrow::arrow_table(ts = Sys.time()))
#> Table
#> 1 rows x 1 columns
#> $ ts <timestamp[us, tz=America/Halifax]> 2023-12-20 09:34:44

The Z suffix is definitely a good idea though!

@AlenkaF
Member Author

AlenkaF commented Dec 20, 2023

Thanks for the info Dewey!

AlenkaF changed the title from "[Python][R] Add timezone information when printing TimestampArray" to "[Python] Add timezone information when printing TimestampArray" on Dec 20, 2023

@jorisvandenbossche
Member

While we are improving the repr of arrays, we can probably leave out the boilerplate in <pyarrow.lib.TimestampArray object at 0x125319de0> (that's just Python's default repr), and limit it to something like <pyarrow.TimestampArray> in general (similarly for other array types).
And then that also makes it easier for TimestampArray to add additional information. Something like <pyarrow.TimestampArray[us, tz=America/Halifax]> could then be an option (although I am not fully sure I like adding a [] parametrization to the array, because that is typically only something we do for types, not arrays)

@jorisvandenbossche
Member

Some variants:

<pyarrow.TimestampArray[us, tz=America/Halifax]>
<pyarrow.TimestampArray: timestamp[us, tz=America/Halifax]>
<pyarrow.TimestampArray<timestamp[us, tz=America/Halifax]>>
<pyarrow.TimestampArray><timestamp[us, tz=America/Halifax]>

(maybe the first is actually fine, and also the most succinct)
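To make the first variant concrete, here is a rough sketch of how such a header could be assembled. The helper name `repr_header` is hypothetical and not part of pyarrow; in a real `__repr__` the unit and timezone would presumably be read from `arr.type.unit` and `arr.type.tz`.

```python
def repr_header(class_name, unit, tz=None):
    """Build a compact repr header such as <pyarrow.TimestampArray[us, tz=...]>.

    Hypothetical helper for illustration only; not an existing pyarrow function.
    """
    # Only mention the timezone when one is defined on the type.
    params = unit if tz is None else f"{unit}, tz={tz}"
    return f"<pyarrow.{class_name}[{params}]>"


print(repr_header("TimestampArray", "us", "America/Halifax"))
print(repr_header("TimestampArray", "s"))
```

For a timezone-naive array this degrades to `<pyarrow.TimestampArray[s]>`, so the same header format would work for both cases.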

@ianmcook
Member

ianmcook commented May 13, 2024

I think this and other improvements to the way timezone-aware timestamps are printed would be very helpful for users.

The way PyArrow currently prints timezone-aware timestamp values can be very confusing. For example, you might try to create a Table like this:

from datetime import datetime
import pyarrow as pa

t = pa.table(
    {'ts': [datetime(1969, 1, 1, 1, 1, 1)]},
    schema=pa.schema([("ts", pa.timestamp("us", tz="America/New_York"))])
)

When you print it, it looks like the time represents 01:01:01 EST:

t
## pyarrow.Table
## ts: timestamp[us, tz=America/New_York]
## ----
## ts: [[1969-01-01 01:01:01.000000Z]]

But upon closer inspection, it is actually representing the time 01:01:01 UTC which converts to 20:01:01 EST:

t["ts"][0]
## <pyarrow.TimestampScalar: '1968-12-31T20:01:01.000000-0500'>
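The same discrepancy can be reproduced with the standard library alone. This is only an illustration of the two renderings, not what PyArrow does internally, and it assumes a fixed UTC-05:00 offset as a stand-in for America/New_York (which was on EST on that date).

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-05:00 offset stands in for America/New_York,
# so no timezone database is needed for this illustration.
EST = timezone(timedelta(hours=-5))

# The stored instant: the naive datetime was interpreted as UTC.
stored = datetime(1969, 1, 1, 1, 1, 1, tzinfo=timezone.utc)

# Rendering 1: UTC value with a "Z" suffix, like the Table printout.
utc_text = stored.strftime("%Y-%m-%d %H:%M:%S.%f") + "Z"

# Rendering 2: wall time in the target zone, like the scalar repr.
wall_text = stored.astimezone(EST).isoformat()

print(utc_text)
print(wall_text)
```

Both strings describe the same instant; only the second one shows what a clock in New York would have read.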

@rok
Member

rok commented May 13, 2024

We could display local time (using the local_timestamp kernel), but then we should make it clear we're displaying wall time.

@ianmcook
Member

That would require pyarrow.compute though. Is that included in all the builds of PyArrow that we distribute these days?

@rok
Member

rok commented May 13, 2024

Fair point. It seems it is not. I suppose the same logic could be implemented in vanilla Python to avoid new dependencies.

@jorisvandenbossche
Member

The main issue (as was discussed in the original issue #30117, before we closed that after adding the "Z" suffix) is that showing the local timezone requires a timezone database to be present, and this is not guaranteed.

(I think the requirement on Compute could probably be fixed, by moving or replicating the logic of local_timestamp to our pretty printing utilities)

At the time the original issue was discussed, the tz database wasn't yet supported on Windows, but that has improved since (although the user still needs to download it manually and either put it in the correct location or point pyarrow to it). We could decide to print wall time if a tzdb is available, and otherwise fall back to showing the UTC values with the "Z" suffix. That would be an annoying inconsistency in the pretty printing, but it would at least make things less confusing for many users on Linux/macOS.
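A minimal sketch of that fallback in plain Python, to make the idea concrete. It assumes the standard-library `zoneinfo` module as the timezone lookup (Arrow's own printing code is C++ and uses a different tz database mechanism), and `format_timestamp` is a hypothetical name, not an existing pyarrow function.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_timestamp(epoch_seconds, tz_name):
    """Print wall time if the zone can be resolved, else UTC with a "Z" suffix.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    # Adding a timedelta to the epoch also handles pre-1970 (negative) values.
    utc_value = EPOCH + timedelta(seconds=epoch_seconds)
    try:
        zone = ZoneInfo(tz_name)
    except (ZoneInfoNotFoundError, ValueError):
        # No usable tz database entry: fall back to the UTC value plus "Z".
        return utc_value.strftime("%Y-%m-%d %H:%M:%S") + "Z"
    # Zone resolved: show wall time with its UTC offset to mark it as local.
    return utc_value.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S%z")


print(format_timestamp(0, "America/New_York"))
print(format_timestamp(0, "Not/AZone"))
```

The explicit offset on the wall-time branch is one way to keep the two outputs distinguishable, so a reader can tell from the string alone whether the fallback was used.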

@rok
Member

rok commented May 14, 2024

Thanks for the reminder, Joris! (I'd forgotten about that discussion.)
If we agree that printing in local time (with a "Z" fallback) is the way to go, I can implement the logic in the printing utilities.
