Update arrow 40#6412
Conversation
| use datafusion_expr::ColumnarValue; | ||
|
|
||
| /// provide DataFusion default cast options | ||
| pub const DEFAULT_DATAFUSION_CAST_OPTIONS: CastOptions = CastOptions { safe: false }; |
There was a problem hiding this comment.
This change is necessary because FormatOptions isn't const constructible, it is somewhat an accident that CastOptions has historically been - any non-trivial additional options would have broken this
| SELECT '2000-01-01T00:00:00'::TIMESTAMP::TIMESTAMPTZ | ||
| ---- | ||
| 2000-01-01T08:00:00+08:00 | ||
| 1999-12-31T16:00:00+08:00 |
There was a problem hiding this comment.
This is the result of apache/arrow-rs#4201
Although I'm not entirely sure it is correct... 🤔
Edit: I'm fairly certain this isn't correct, but the logic upstream is correct so I'm confused. Perhaps DF is doing some timezone arithmetic itself somewhere?
There was a problem hiding this comment.
Ok I am fairly certain this is a bug in the scalar value logic
❯ create table foo (ts text) as values ('2000-01-01T00:00:00');
❯ select * from foo;
+---------------------+
| ts |
+---------------------+
| 2000-01-01T00:00:00 |
+---------------------+
# This is correct
❯ select arrow_cast(ts, 'Timestamp(Nanosecond, Some( "+08:00" ))') from foo;
+---------------------------+
| foo.ts |
+---------------------------+
| 2000-01-01T00:00:00+08:00 |
+---------------------------+
# This is incorrect
❯ select arrow_cast('2000-01-01T00:00:00', 'Timestamp(Nanosecond, Some( "+08:00" ))') from foo;
+-----------------------------+
| Utf8("2000-01-01T00:00:00") |
+-----------------------------+
| 1999-12-31T16:00:00+08:00 |
+-----------------------------+
1 row in set. Query took 0.002 seconds.
There was a problem hiding this comment.
Yes, I think a bunch of that logic is replicated in scalar.rs (as it wasn't available upstream in arrow-rs)
I think @berkaysynnada may have some thoughts / be thinking about this.
| ($DATA_TYPE:ident, $ENUM:expr, $ENUM2:expr, $ARRAY_TYPE:ident, $EXPR:expr, $SIZE:expr) => {{ | ||
| } | ||
|
|
||
| macro_rules! build_timestamp_array_from_option { |
There was a problem hiding this comment.
I split this out into its own macro as overloading based on number of arguments is kind of hard to parse
| // Need to call cast to cast to final data type with timezone/extra param | ||
| cast(&array, &DataType::$DATA_TYPE($ENUM, $ENUM2)) | ||
| .expect("cannot do temporal cast") | ||
| Arc::new($ARRAY_TYPE::from_value(*value, $SIZE).with_timezone_opt($TZ)) |
There was a problem hiding this comment.
This is the major change, we use with_timezone_opt instead of cast, as cast will now actually perform timezone conversion
| let display_filter = self.filter.as_ref().map_or_else( | ||
| || "".to_string(), | ||
| |f| format!(", filter={:?}", f.expression()), | ||
| |f| format!(", filter={}", f.expression()), |
There was a problem hiding this comment.
This change is so that we don't output all of CastOptions and by extension FormatOptions. It is also more correct to not be printing a debug representation as part of Display
| select arrow_cast(timestamp '2000-01-01T00:00:00', 'Timestamp(Nanosecond, Some( "+08:00" ))'); | ||
| ---- | ||
| 2000-01-01T08:00:00+08:00 | ||
| 2000-01-01T00:00:00+08:00 |
There was a problem hiding this comment.
🎉 It's correct now
|
Thanks @tustvold !❤️ |
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?