Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-1120: Support for writing timestamp(ns) to Int96 #865

Closed
wants to merge 3 commits into from

Conversation

xhochy
Copy link
Member

@xhochy xhochy commented Jul 18, 2017

data7_us = np.array([start, start + 1, start + 2], dtype='int64') / 1000
a7_us = pa.Array.from_pandas(data7_us, type=t7_us)

table = pa.Table.from_arrays([a1, a2, a3, a4, a5, a6, a7],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't too fond of this pattern when I looked originally -- names are opaque, frequent rewriting certain lines, and difficult to reorder items (for instance to put all timestamp columns near each other). Not sure it warrants addressing on this PR, but I wonder if there's a clearer way to structure these tests.


_check_roundtrip(table, expected=expected, version='2.0')

# date64 as date32
# time32[s] to time32[ms]
# 'timestamp[ns]' is saved as INT96 timestamp
Copy link
Contributor

@c-nichols c-nichols Jul 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this doesn't explicitly check the data type on the column, there's room for a regression. Probably fine if int96 is the only option, but might be worth figuring out how to test, if the prop builder is going to be used for other things in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We at least check here that we support nanosecond precision. This is something that is not supported by any other type in Parquet.

Copy link
Contributor

@c-nichols c-nichols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Change-Id: I11b4a1136a1775ecacbe39d52e6039216cd7f80d
@@ -675,7 +675,8 @@ def read_pandas(source, columns=None, nthreads=1, metadata=None):


def write_table(table, where, row_group_size=None, version='1.0',
use_dictionary=True, compression='snappy', **kwargs):
use_dictionary=True, compression='snappy',
use_deprecated_int96_timestamps=False, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to create some kind of WriteOptions object soon since we're likely to accumulate plenty more options (like casting timestamps to milliseconds)

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@asfgit asfgit closed this in c5a89b7 Jul 18, 2017
@wesm wesm deleted the ARROW-1120 branch July 18, 2017 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants