Hadoop InputRowParser for Orc file #3019
@sirpkt can we also get some docs on usage, and can you update the extensions.md file as well?
@fjy Sure, I'll update the docs.
This patch uses the Hive ORC reader, so it pulls in complex dependencies, including hive-exec.
While testing on a real server cluster, I ran into library dependency problems.
Maven dependencies updated.
"inputSpec": {
  "type": "static",
  "inputFormat": "org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat",
  "paths": "no_metrics"
what should be filled out in the paths field?
Overall this PR looks reasonable to me; however, I have little experience with ORC files. Can someone with more ORC experience take a look and comment on the API?
@sirpkt there are some merge errors.
Rebased and updated based on comments.
builder.append(parseSpec.getTimestampSpec()).append(":string");
if (parseSpec.getDimensionsSpec().getDimensionNames().size() > 0) {
  builder.append(",");
  builder.append(StringUtils.join(parseSpec.getDimensionsSpec().getDimensionNames(), ":string,")).append(":string");
do we need to append ":string" twice here ?
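On the question above: a join only places its separator between elements, so the last dimension name would be left without a type unless ":string" is appended once more after the join. The following is a minimal sketch of that behavior, not the PR's actual code. It assumes the JDK's `String.join` as a stand-in for `StringUtils.join`, and the class name, the `typeString` helper, and the `struct<...>` wrapper are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of the patch under review.
class TypeStringJoinSketch {

    // Builds a struct type string in which every column is typed as string.
    static String typeString(String timestampColumn, List<String> dimensionNames) {
        StringBuilder builder = new StringBuilder("struct<");
        builder.append(timestampColumn).append(":string");
        if (!dimensionNames.isEmpty()) {
            builder.append(",");
            // The join inserts ":string," only BETWEEN names, so the final
            // name still needs its own ":string" suffix after the join.
            builder.append(String.join(":string,", dimensionNames)).append(":string");
        }
        return builder.append(">").toString();
    }

    public static void main(String[] args) {
        // With two dimensions, the join alone yields "col1:string,col2".
        System.out.println(String.join(":string,", Arrays.asList("col1", "col2")));
        System.out.println(typeString("timestamp", Arrays.asList("col1", "col2")));
    }
}
```

Dropping the trailing append would leave the last column as a bare name with no type, which is why the second ":string" looks redundant but is not.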
@sirpkt any chance we can finish this up?
@fjy Sorry for the late response.
Updated based on the review comments.
👍 All my comments are addressed.
@@ -0,0 +1,134 @@
<?xml version="1.0" encoding="UTF-8"?>
this is missing a license header
👍 after the missing license header is added
…f orc list from array to list
@fjy Sorry for the late response.
@sirpkt can you help with this issue? https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/druid-user/wiXVMMmXOgU/fHXn6hR0CQAJ
Related to #3017
For example, a string column `col1` and an array-of-strings column `col2` are represented by `struct<col1:string,col2:array<string>>`.
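To make the shape of such a type string concrete, here is a small sketch that splits a top-level struct type string into column name and type pairs while leaving nested types such as `array<string>` intact. It is an illustration under stated assumptions, not how the extension or Hive parses type strings: the class and method names are hypothetical, and only the `struct<name:type,...>` form shown above is handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; Hive has its own type parser.
class OrcTypeStringSketch {

    // Splits "struct<a:t1,b:t2>" into an ordered map of column name -> type.
    static Map<String, String> parseStruct(String typeString) {
        String prefix = "struct<";
        if (!typeString.startsWith(prefix) || !typeString.endsWith(">")) {
            throw new IllegalArgumentException("not a struct type string: " + typeString);
        }
        String body = typeString.substring(prefix.length(), typeString.length() - 1);
        Map<String, String> fields = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return fields;
        }
        int depth = 0; // nesting level inside angle brackets
        int start = 0; // start index of the field currently being scanned
        for (int i = 0; i <= body.length(); i++) {
            // Treat end of input as a closing comma so the last field is emitted.
            char c = i < body.length() ? body.charAt(i) : ',';
            if (c == '<') {
                depth++;
            } else if (c == '>') {
                depth--;
            } else if (c == ',' && depth == 0) {
                String field = body.substring(start, i);
                int colon = field.indexOf(':');
                if (colon < 0) {
                    throw new IllegalArgumentException("missing type for field: " + field);
                }
                fields.put(field.substring(0, colon), field.substring(colon + 1));
                start = i + 1;
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("unbalanced angle brackets: " + typeString);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseStruct("struct<col1:string,col2:array<string>>"));
    }
}
```

Tracking bracket depth is what keeps a comma inside a nested type, for example in a map type, from being mistaken for a column separator.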
hadoop_orc_job.json example: inputFormat should be set to `org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat`.