Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-4505: [C++] adding pretty print for dates, times, and timestamps #4268

Closed

Conversation

bkietz
Copy link
Member

@bkietz bkietz commented May 7, 2019

type fmt
DATE32 (days since epoch) YYYY-MM-DD
DATE64 (days since epoch, measured in ms) YYYY-MM-DD
TIMESTAMP (<TimeUnit> since epoch) YYYY-MM-DD HH:mm:ss.xxx
TIME32 (<TimeUnit> since midnight) HH:mm:ss.xxx
TIME64 (<TimeUnit> since midnight) HH:mm:ss.xxx

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
It looks almost good to me.

cpp/src/arrow/type.h Outdated Show resolved Hide resolved
cpp/src/arrow/pretty_print.cc Outdated Show resolved Hide resolved
@bkietz bkietz force-pushed the 4505-Nicer-PrettyPrint-for-date32 branch from 88de6cb to b53c013 Compare May 8, 2019 17:17
@bkietz bkietz force-pushed the 4505-Nicer-PrettyPrint-for-date32 branch from b53c013 to 9086c9d Compare May 9, 2019 17:35
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think the timezone (which is an arbitrary string) shouldn't be appended to the pretty-printed values.

Status WriteDataValues(const TimestampArray& array) {
const int64_t* data = array.raw_values();
const auto type = static_cast<const TimestampType*>(array.type().get());
const auto fmt = "%F %T" + type->timezone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timezone is part of the type, it shouldn't be appended to the pretty-printed values IMHO.

1970-01-02 03:04:05.000678Z,
null
])expected";
CheckPrimitive<TimestampType, int64_t>(timestamp(TimeUnit::MICRO, "Z"), {0, 10},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While "Z" is a valid suffix to ISO 8601 timestamps, the name of the timezone wouldn't be "Z", it would be something like "UTC".
Besides, if the timezone is named e.g. "Europe/Paris", you don't want to append that to the formatted timestamp.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Not for this PR, but it would then be nice to actually somehow show the type information in the repr of the array:

In [55]: pa.array([pd.Timestamp("2012-01-01", tz='Europe/Paris')])                                                                                            
Out[55]: 
<pyarrow.lib.TimestampArray object at 0x7ffa5d5060e8>
[
  1325376000000000
]

because currently you don't see the timezone, and also not the unit of the timestamp, without actually printing the array's type as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche That looks more like a Python-specific issue.

@pitrou
Copy link
Member

pitrou commented May 9, 2019

Thank you @bkietz. This can be merged when CI is green.

@pitrou pitrou closed this in f88474c May 10, 2019
@bkietz bkietz deleted the 4505-Nicer-PrettyPrint-for-date32 branch February 25, 2021 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants