
Data changes during ingestion of Array<String> using druid-parquet-extensions #5433

Closed
code-ditya opened this issue Feb 27, 2018 · 3 comments

Comments

@code-ditya

I am using druid-0.11.0 with druid-avro-extension:0.11.0, druid-parquet-extension:0.10.0, and druid-hdfs-storage:0.11.0.

My data contains a column of type Array&lt;String&gt;. If the data for this column contained the value ["str1", "str2", "str3"], then after ingestion it became ["{\"element\": \"str1\"}", "{\"element\": \"str2\"}", "{\"element\": \"str3\"}"].

The source data is stored in HDFS as Parquet with Snappy compression, and the Parquet parser is used during ingestion. The issue persisted even after I changed the compression to Gzip.

When compression was removed in HDFS, the column's values after ingestion became ["{\"array_element\": \"str1\"}", "{\"array_element\": \"str2\"}", "{\"array_element\": \"str3\"}"].

The same data was previously ingested into Druid correctly when it was in Avro format with Gzip compression, using Druid's Avro parser.

I am attaching the ingestionSpec alongside.
druid_ingestion_schema.txt
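
For context on where the wrapper records likely come from: Parquet's standard three-level list encoding stores each array value inside a single-field repeated group, and the parquet-avro converter can surface that group as an Avro record instead of unwrapping it to the element type. A sketch of the Avro schema such a column may be read back as (the field and record names here are illustrative, chosen to match the element/array_element wrappers reported above):

{
  "name": "tags",
  "type": {
    "type": "array",
    "items": {
      "type": "record",
      "name": "list",
      "fields": [
        { "name": "element", "type": "string" }
      ]
    }
  }
}

JSON-serializing each wrapper record then yields strings like {"element": "str1"}, which matches the corrupted values.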

@saurabh3091

I am also facing the same issue, and it's a blocker for us. I would really appreciate it if somebody could provide an explanation or a fix.
Regards

gauravkumar37 added a commit to gauravkumar37/druid that referenced this issue Mar 3, 2018
Fixes apache#5433
This change makes the Parquet input row reader correctly handle the List data type.
@quiet-listener

quiet-listener commented May 12, 2018

"tuningConfig": {
"jobProperties":{
"parquet.avro.add-list-element-records":"false"
}
}

Try adding "parquet.avro.add-list-element-records": "false" under jobProperties in your ingestion spec file. It worked for me.
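
For reference, a minimal sketch of where this property sits in a Hadoop-based ingestion spec (the surrounding fields are illustrative placeholders, not taken from the attached spec):

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": { ... },
    "ioConfig": { ... },
    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": {
        "parquet.avro.add-list-element-records": "false"
      }
    }
  }
}

Entries under jobProperties are passed through to the Hadoop job configuration, where the parquet-avro reader picks this setting up.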

@code-ditya
Author

Thanks @quiet-listener. Setting this property resolved the issue.
