Skip to content

PARQUET-137 Add support for Pig datetimes#387

Open
osayankin wants to merge 1 commit intoapache:masterfrom
osayankin:master
Open

PARQUET-137 Add support for Pig datetimes#387
osayankin wants to merge 1 commit intoapache:masterfrom
osayankin:master

Conversation

@osayankin
Copy link
Copy Markdown

No description provided.

Copy link
Copy Markdown
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the patch.

} else if (pigType.type == DataType.CHARARRAY) {
bytes = ((String)t.get(i)).getBytes("UTF-8");
} else if (pigType.type == DataType.DATETIME) {
bytes = convertDateTimeToInt96((DateTime) t.get(i));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we want to move away from INT96 to TIMESTAMP_MILLIS and TIMESTAMP_MICROS, shouldn't they be rather used here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seconded

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific use case here? Like working with Hive or Impala?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we want to move away from INT96 to TIMESTAMP_MILLIS and TIMESTAMP_MICROS, shouldn't they be rather used here?

I am not sure I understood you correctly. Do you want me to replace

bytes = convertDateTimeToInt96((DateTime) t.get(i));

to something like

 bytes = convertDateTimeToMillis((DateTime) t.get(i));

or

bytes = convertDateTimeToMicros((DateTime) t.get(i));

where convertDateTimeToMillis converts DateTime to milliseconds? But do not DateTime.getTime() do the job? I can not understand here.

Is there a specific use case here? Like working with Hive or Impala?

Yes. Our customer stores data in Hive and reads it in Pig and vice versa.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to deprecate int96 in favor of other timestamp types here:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types
Millis would work right?

Copy link
Copy Markdown
Member

@julienledem julienledem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. Adding date support sounds good. Please see comments in the review.

case DataType.DOUBLE:
return primitive(name, PrimitiveTypeName.DOUBLE);
case DataType.DATETIME:
throw new UnsupportedOperationException();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intended to be just binary?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we use timestamp it would be a long with an original type of TIMESTAMP.

} else if (pigType.type == DataType.CHARARRAY) {
bytes = ((String)t.get(i)).getBytes("UTF-8");
} else if (pigType.type == DataType.DATETIME) {
bytes = convertDateTimeToInt96((DateTime) t.get(i));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seconded

} else if (pigType.type == DataType.CHARARRAY) {
bytes = ((String)t.get(i)).getBytes("UTF-8");
} else if (pigType.type == DataType.DATETIME) {
bytes = convertDateTimeToInt96((DateTime) t.get(i));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific use case here? Like working with Hive or Impala?

Copy link
Copy Markdown
Member

@julienledem julienledem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good. See comments about type to use.

}

@Test
public void testStorerWithDateTime() throws Exception {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"target/testStorerWithDateTime"

pigServer.registerQuery("B = LOAD '"+out+"' USING "+ParquetLoader.class.getName()+"();");
pigServer.registerQuery("Store B into 'out' using mock.Storage();");
if (pigServer.executeBatch().get(0).getStatus() != JOB_STATUS.COMPLETED) {
throw new RuntimeException("Job failed", pigServer.executeBatch().get(0).getException());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

factor out method for this repeated logic?

} else if (pigType.type == DataType.CHARARRAY) {
bytes = ((String)t.get(i)).getBytes("UTF-8");
} else if (pigType.type == DataType.DATETIME) {
bytes = convertDateTimeToInt96((DateTime) t.get(i));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to deprecate int96 in favor of other timestamp types here:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types
Millis would work right?

case DataType.DOUBLE:
return primitive(name, PrimitiveTypeName.DOUBLE);
case DataType.DATETIME:
throw new UnsupportedOperationException();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we use timestamp it would be a long with an original type of TIMESTAMP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants