Data changes during ingestion of Array<String> using druid-parquet-extensions #5433
Comments
I am also facing the same issue, and it's a blocker for us. I would really appreciate it if somebody could provide an explanation or a fix.
Fixes apache#5433. This change makes the Parquet input row reader correctly handle the List data type.
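The behavior in this issue is consistent with Parquet's three-level list encoding, in which each array element is wrapped in a single-field group (typically named `element`, or `array_element` by some older writers); a reader that emits that group verbatim produces the wrapped records seen above instead of the plain strings. A minimal sketch of the unwrapping step, assuming the fix works roughly along these lines (function and field names are illustrative, not Druid's actual code):

```python
def unwrap_parquet_list(values):
    """Unwrap elements of a Parquet three-level list.

    Parquet encodes Array<String> as a repeated group whose single
    field ("element", or "array_element" from some older writers)
    holds the actual value. A reader that skips this unwrapping
    emits records like {"element": "str1"} instead of "str1".
    """
    unwrapped = []
    for v in values:
        if (isinstance(v, dict) and len(v) == 1
                and next(iter(v)) in ("element", "array_element")):
            # Single-field wrapper group: pull out the inner value.
            unwrapped.append(next(iter(v.values())))
        else:
            # Already a plain value; keep it as-is.
            unwrapped.append(v)
    return unwrapped

# The shape reported in this issue: each element became a wrapped record.
raw = [{"element": "str1"}, {"element": "str2"}, {"element": "str3"}]
print(unwrap_parquet_list(raw))  # ['str1', 'str2', 'str3']
```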
Try adding this under "tuningConfig": {
Thanks @quitelistner. Setting this property helped resolve the issue.
I am using druid-0.11.0, with druid-avro-extension:0.11.0 and druid-parquet-extension:0.10.0 and druid-hdfs-storage:0.11.0.
My data contained a column of type `Array<String>`. If the data for this column contained the value `["str1", "str2", "str3"]`, then after ingestion it became `["{\"element\": \"str1\"}", "{\"element\": \"str2\"}", "{\"element\": \"str3\"}"]`.
The actual data is stored in HDFS as Parquet with Snappy compression, and the corresponding parser is used during ingestion. The issue persisted even after I changed the compression to gzip. When compression was removed, the data for the column after ingestion instead became `["{\"array_element\": \"str1\"}", "{\"array_element\": \"str2\"}", "{\"array_element\": \"str3\"}"]`.
The same data was previously ingested into Druid correctly when it was in Avro format with gzip compression, using Druid's Avro parser.
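For rows already ingested with the buggy reader, the original strings can in principle be recovered downstream, since each stored value is the JSON serialization of the wrapper record. A hypothetical cleanup sketch (field names taken from the values shown above; this is a workaround, not part of Druid):

```python
import json

def recover_strings(ingested):
    """Recover original strings from rows ingested with the buggy reader.

    Each array element arrived as a JSON string such as
    '{"element": "str1"}' (or '{"array_element": "str1"}' for the
    uncompressed files), so parse it and pull out the lone value.
    """
    out = []
    for s in ingested:
        record = json.loads(s)          # e.g. {"element": "str1"}
        out.append(next(iter(record.values())))
    return out

bad = ['{"element": "str1"}', '{"element": "str2"}', '{"element": "str3"}']
print(recover_strings(bad))  # ['str1', 'str2', 'str3']
```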
I am attaching the ingestion spec:
druid_ingestion_schema.txt