ARROW-1120: Support for writing timestamp(ns) to Int96 #865

xhochy · 2017-07-18T14:28:13Z

Change-Id: I8b32864774b4df345bc028b4cc0f866fb8a0b9d1

c-nichols · 2017-07-18T14:56:24Z

python/pyarrow/tests/test_parquet.py

+    data7_us = np.array([start, start + 1, start + 2], dtype='int64') / 1000
+    a7_us = pa.Array.from_pandas(data7_us, type=t7_us)
+
+    table = pa.Table.from_arrays([a1, a2, a3, a4, a5, a6, a7],


I wasn't too fond of this pattern when I first looked at it: the names are opaque, certain lines get rewritten frequently, and items are difficult to reorder (for instance, to put all timestamp columns near each other). I'm not sure it warrants addressing in this PR, but I wonder if there's a clearer way to structure these tests.

c-nichols · 2017-07-18T14:58:49Z

python/pyarrow/tests/test_parquet.py


    _check_roundtrip(table, expected=expected, version='2.0')

+    # date64 as date32
+    # time32[s] to time32[ms]
+    # 'timestamp[ns]' is saved as INT96 timestamp


Since this doesn't explicitly check the data type of the column, there's room for a regression. It's probably fine if int96 is the only option, but it might be worth figuring out how to test this if the prop builder is going to be used for other things in the future.

We at least check here that we support nanosecond precision. This is something that is not supported by any other type in Parquet.
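
To illustrate why INT96 can carry nanoseconds, here is a minimal standard-library sketch of the 12-byte layout commonly used for INT96 timestamps: 8 little-endian bytes of nanoseconds within the day, followed by a 4-byte Julian day number. The helper names `ns_to_int96` and `int96_to_ns` are hypothetical and not part of pyarrow or parquet-cpp; this is not the code path the PR adds, only an approximation of the encoding it targets.

```python
import struct

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
NANOS_PER_DAY = 86400 * 10**9


def ns_to_int96(ns_since_epoch):
    """Pack nanoseconds since the Unix epoch into 12 bytes (hypothetical helper)."""
    # divmod floors, so timestamps before 1970 also get a non-negative time of day.
    days, nanos_of_day = divmod(ns_since_epoch, NANOS_PER_DAY)
    # '<qi' = little-endian int64 (nanoseconds of day) + int32 (Julian day), no padding.
    return struct.pack('<qi', nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def int96_to_ns(raw):
    """Inverse of ns_to_int96 (hypothetical helper)."""
    nanos_of_day, julian_day = struct.unpack('<qi', raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


ts_ns = 1500000000123456789
encoded = ns_to_int96(ts_ns)
roundtripped = int96_to_ns(encoded)

# Truncating to milliseconds first would drop the trailing sub-millisecond digits.
truncated_to_ms = ts_ns // 10**6 * 10**6
```

The round trip through the 12-byte form preserves every nanosecond digit, whereas a value truncated to milliseconds cannot be restored.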

c-nichols

LGTM

Change-Id: I11b4a1136a1775ecacbe39d52e6039216cd7f80d

wesm · 2017-07-18T17:57:33Z

python/pyarrow/parquet.py

@@ -675,7 +675,8 @@ def read_pandas(source, columns=None, nthreads=1, metadata=None):


 def write_table(table, where, row_group_size=None, version='1.0',
-                use_dictionary=True, compression='snappy', **kwargs):
+                use_dictionary=True, compression='snappy',
+                use_deprecated_int96_timestamps=False, **kwargs):


We might want to create some kind of WriteOptions object soon, since we're likely to accumulate plenty more options (like casting timestamps to milliseconds).
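
One possible shape for such an object, as a rough sketch only: the `WriteOptions` class and its `to_kwargs` method below are hypothetical and do not exist in pyarrow. The field names and defaults simply mirror the `write_table` signature shown in the diff above.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class WriteOptions:
    """Hypothetical container for the keyword arguments of write_table."""
    row_group_size: Optional[int] = None
    version: str = '1.0'
    use_dictionary: bool = True
    compression: str = 'snappy'
    use_deprecated_int96_timestamps: bool = False

    def to_kwargs(self):
        # Plain dict, so it could be forwarded as write_table(table, where, **kwargs).
        return asdict(self)


defaults = WriteOptions()
# Frozen instances are copied rather than mutated when one option changes.
int96_opts = replace(defaults, use_deprecated_int96_timestamps=True)
```

Bundling the options this way would keep the `write_table` signature stable as new settings are added, at the cost of one more type for users to learn.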

wesm

+1

c-nichols and others added 2 commits July 17, 2017 19:17

ARROW-1120 Support for writing timestamp(ns) to Int96

7c28835

Add flag for timestamp[ns] roundtrips

99f825d

Change-Id: I8b32864774b4df345bc028b4cc0f866fb8a0b9d1

c-nichols reviewed Jul 18, 2017

c-nichols approved these changes Jul 18, 2017

Use integer division

ff70832

Change-Id: I11b4a1136a1775ecacbe39d52e6039216cd7f80d

wesm reviewed Jul 18, 2017

wesm approved these changes Jul 18, 2017

asfgit closed this in c5a89b7 Jul 18, 2017

wesm deleted the ARROW-1120 branch July 18, 2017 19:18
