We have hundreds of thousands of Parquet files with embedded Arrow schemas whose time-valued columns have the type `Timestamp(TimeUnit::Nanosecond, Some("UTC"))`.
One of the changes made between Arrow 2.0 and 3.0 was to make the Parquet reader prefer the embedded Arrow schema over the schema generated from the Parquet columns.
But because DataFusion hardcodes the timezone field of the `Timestamp` type as `None`, we can't load any of our data after this upgrade; we get errors like:
```sql
SELECT * FROM parquet_table WHERE ("timestamp" >= to_timestamp('2010-03-24T13:00:00.000000Z') AND "timestamp" <= to_timestamp('2010-03-25T00:00:00.000000Z')) ORDER BY timestamp ASC NULLS LAST;
```

```text
Plan("'Timestamp(Nanosecond, Some("UTC")) >= Timestamp(Nanosecond, None)' can't be evaluated because there isn't a common type to coerce the types to")
```
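To make the failure concrete, here is a minimal, self-contained Rust sketch. `TimeUnit` and `DataType` below are simplified, hypothetical stand-ins for the arrow crate's types (not its real API), and `strict_comparison_coercion` / `relaxed_comparison_coercion` are illustrative names, not DataFusion functions. The strict rule mirrors the reported behaviour: two timestamps only get a common type when both unit and timezone are identical, so `Some("UTC")` vs `None` has none. The relaxed rule is one possible fix, assuming the timezone-less side already holds UTC epoch nanoseconds (as a literal parsed from a `...Z` string would).

```rust
// Hypothetical, simplified stand-ins for Arrow's TimeUnit / DataType.
// They model only what is needed to illustrate the coercion problem.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum DataType {
    Int64,
    Timestamp(TimeUnit, Option<String>),
}

/// Strict rule: types only share a common type when they are identical,
/// so a unit or timezone mismatch means "no common type" (the reported error).
fn strict_comparison_coercion(l: &DataType, r: &DataType) -> Option<DataType> {
    if l == r {
        Some(l.clone())
    } else {
        None
    }
}

/// Hypothetical relaxed rule: if the units match and at most one side carries
/// a timezone, treat the timezone-less side as being in that timezone.
/// ASSUMPTION: timezone-less values are already UTC epoch values.
fn relaxed_comparison_coercion(l: &DataType, r: &DataType) -> Option<DataType> {
    match (l, r) {
        (DataType::Timestamp(lu, ltz), DataType::Timestamp(ru, rtz)) if lu == ru => {
            match (ltz, rtz) {
                // Two different explicit timezones: still refuse to guess.
                (Some(a), Some(b)) if a != b => None,
                // Same timezone, or only one side has one: use it.
                (Some(tz), _) | (_, Some(tz)) => Some(DataType::Timestamp(*lu, Some(tz.clone()))),
                // Neither side has a timezone.
                (None, None) => Some(DataType::Timestamp(*lu, None)),
            }
        }
        // Everything else falls back to the strict rule.
        _ => strict_comparison_coercion(l, r),
    }
}

fn main() {
    // Column type from the embedded Arrow schema vs. the to_timestamp literal type.
    let column = DataType::Timestamp(TimeUnit::Nanosecond, Some("UTC".to_string()));
    let literal = DataType::Timestamp(TimeUnit::Nanosecond, None);
    println!("strict:  {:?}", strict_comparison_coercion(&column, &literal));
    println!("relaxed: {:?}", relaxed_comparison_coercion(&column, &literal));
}
```

Promoting the timezone-less side keeps the comparison on the raw `i64` epoch values without any conversion; whether that is correct depends on what the timezone-less values actually represent, which is a semantic choice any real fix would have to make.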
Any ideas/thoughts?
Comment from Andrew Lamb(alamb) @ 2021-01-20T14:26:14.792+0000:
[~m18e] I can try and take a look at fixing this -- do you have a reproducer (e.g. the input file) easily at hand?
Comment from Max Burke(m18e) @ 2021-01-21T19:36:59.503+0000:
Yup. I've attached a test file. For what it's worth, this is the change that we've applied locally to work around it: [https://github.com/urbanlogiq/arrow/commit/9be88cf2994fe55ae0d2f5ae137b9e73daac1ef0]
[^0100c909-2537-c4dc-ce1d-1b7a75d613e8.parquet]
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11324