PARQUET-137 Add support for Pig datetimes #387
osayankin wants to merge 1 commit into apache:master
gszadovszky
left a comment
LGTM. Thanks for the patch.
} else if (pigType.type == DataType.CHARARRAY) {
  bytes = ((String)t.get(i)).getBytes("UTF-8");
} else if (pigType.type == DataType.DATETIME) {
  bytes = convertDateTimeToInt96((DateTime) t.get(i));
As we want to move away from INT96 to TIMESTAMP_MILLIS and TIMESTAMP_MICROS, shouldn't they be rather used here?
Is there a specific use case here? Like working with Hive or Impala?
> As we want to move away from INT96 to TIMESTAMP_MILLIS and TIMESTAMP_MICROS, shouldn't they be rather used here?

I am not sure I understood you correctly. Do you want me to replace
bytes = convertDateTimeToInt96((DateTime) t.get(i));
with something like
bytes = convertDateTimeToMillis((DateTime) t.get(i));
or
bytes = convertDateTimeToMicros((DateTime) t.get(i));
where convertDateTimeToMillis converts a DateTime to milliseconds? But doesn't DateTime.getTime() already do that job? I don't follow.

> Is there a specific use case here? Like working with Hive or Impala?

Yes. Our customer stores data in Hive and reads it in Pig, and vice versa.
We want to deprecate int96 in favor of other timestamp types here:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types
Millis would work, right?
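To make the two options in this thread concrete, here is a minimal sketch of both encodings. It is not the patch's code: `TimestampEncodings`, `toTimestampMillis`, and `toInt96` are hypothetical names, and `java.time.Instant` stands in for the Joda `DateTime` that Pig uses so the example needs only the JDK. The INT96 layout shown (8 bytes of nanoseconds within the day followed by 4 bytes of Julian day number, both little-endian) is the convention used by Hive and Impala; TIMESTAMP_MILLIS is simply milliseconds since the Unix epoch stored in an INT64.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper class for illustration only; not part of parquet-pig.
public class TimestampEncodings {
  // Julian day number of 1970-01-01 (the Unix epoch).
  public static final long JULIAN_DAY_OF_EPOCH = 2440588L;
  public static final long MILLIS_PER_DAY = 86_400_000L;
  public static final long NANOS_PER_MILLI = 1_000_000L;

  // TIMESTAMP_MILLIS: a plain long holding milliseconds since the epoch.
  public static long toTimestampMillis(Instant t) {
    return t.toEpochMilli();
  }

  // INT96 as written by Hive/Impala: 12 bytes, little-endian,
  // nanoseconds-of-day (8 bytes) then Julian day (4 bytes).
  // Sub-millisecond precision is dropped here to mirror a millisecond source.
  public static byte[] toInt96(Instant t) {
    long millis = t.toEpochMilli();
    // floorDiv/floorMod keep pre-1970 instants on the correct day.
    long daysSinceEpoch = Math.floorDiv(millis, MILLIS_PER_DAY);
    long nanosOfDay = Math.floorMod(millis, MILLIS_PER_DAY) * NANOS_PER_MILLI;
    return ByteBuffer.allocate(12)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putLong(nanosOfDay)
        .putInt((int) (daysSinceEpoch + JULIAN_DAY_OF_EPOCH))
        .array();
  }
}
```

The millisecond form needs no byte packing at all, which is part of why it is the simpler target; the INT96 form matters mainly for interoperability with files written or read by Hive and Impala.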
julienledem
left a comment
Thanks for the contribution. Adding date support sounds good. Please see comments in the review.
case DataType.DOUBLE:
  return primitive(name, PrimitiveTypeName.DOUBLE);
case DataType.DATETIME:
  throw new UnsupportedOperationException();
If we use timestamp, it would be a long with an original type of TIMESTAMP.
julienledem
left a comment
Overall this looks good. See comments about type to use.
}

@Test
public void testStorerWithDateTime() throws Exception {
"target/testStorerWithDateTime"
pigServer.registerQuery("B = LOAD '"+out+"' USING "+ParquetLoader.class.getName()+"();");
pigServer.registerQuery("Store B into 'out' using mock.Storage();");
if (pigServer.executeBatch().get(0).getStatus() != JOB_STATUS.COMPLETED) {
  throw new RuntimeException("Job failed", pigServer.executeBatch().get(0).getException());
Factor out a method for this repeated logic?
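One possible shape for such a helper is sketched below. It is an illustration only: `BatchCheck`, `JobResult`, and `assertFirstJobCompleted` are hypothetical names, and `JobResult` is a stand-in for the job object Pig returns, so the example compiles without Pig on the classpath. Note that the snippet under review calls `executeBatch()` a second time to fetch the exception; the sketch runs the batch once and reuses the result.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical test helper; the types here are stand-ins, not Pig's API.
public class BatchCheck {
  // Minimal stand-in for a finished job: a completion flag plus any failure cause.
  public static final class JobResult {
    final boolean completed;
    final Exception error;

    public JobResult(boolean completed, Exception error) {
      this.completed = completed;
      this.error = error;
    }
  }

  // Runs the batch exactly once and fails if the first job did not complete.
  public static void assertFirstJobCompleted(Supplier<List<JobResult>> executeBatch) {
    List<JobResult> jobs = executeBatch.get();
    JobResult first = jobs.get(0);
    if (!first.completed) {
      throw new RuntimeException("Job failed", first.error);
    }
  }
}
```

In the real test the supplier would wrap the `pigServer.executeBatch()` call, and each test method would replace its copy of the status check with a single call to the helper.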
No description provided.