[GLUTEN-11918][VL] Fall back Cast when per-expression timezone differs from session timezone#12048
Conversation
…ne differs from session timezone
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
| val involvesTimestamp = c.child.dataType == TimestampType || | ||
| c.dataType == TimestampType |
There was a problem hiding this comment.
Thanks. This only checks the top-level child/output type, so casts like ArrayType(TimestampType) -> ArrayType(StringType), MapType(TimestampType, ...), or structs containing timestamp still go to Velox when the Cast has a per-expression timezone different from the session timezone. Spark Cast applies the same zoneId recursively to array/map/struct elements, so these cases can still return the session-timezone result instead of the expression-timezone result.
There was a problem hiding this comment.
thanks for review the key point, add other complex types such as ArrayType, MapType,StructType,UserDefinedType to recursive detect the timestamp
e087b24 to
68f26d3
Compare
|
Run Gluten Clickhouse CI on x86 |
68f26d3 to
e4d29c2
Compare
|
Run Gluten Clickhouse CI on x86 |
| case udt: UserDefinedType[_] => | ||
| containsTimestamp(udt.sqlType) | ||
| case _ => false | ||
| } |
There was a problem hiding this comment.
Could we move containsTimestamp out of the Cast branch as a private helper method?
The recursive type check is independent from the local Cast state, and keeping it as a small private helper would make the Cast branch easier to read. It also avoids redefining the local function on every visit and makes the logic easier to reuse if another timezone-sensitive expression needs the same nested timestamp check later.
There was a problem hiding this comment.
sure, and extract the recursive timestamp-type check as a private helper
involvesTimestampType,no functional change; pure readability refactor.
… TimestampType in Array/Map/Struct/UDT
e4d29c2 to
1a7fe97
Compare
|
Run Gluten Clickhouse CI on x86 |
… TimestampType in Array/Map/Struct/UDT
|
Run Gluten Clickhouse CI on x86 |
What changes are proposed in this pull request?
When CastTransformer passes expressions through Substrait, it does not carry
per-expression timezone information. Gluten/Velox only uses the session-level
timezone for cast operations. This causes incorrect results when the
per-expression timezone differs from the session timezone and the cast involves
TimestampType (e.g., timestamp to string formatting in ToPrettyString).
This patch adds a timezone consistency check in ExpressionConverter before
creating CastTransformer. If the per-expression timezone differs from the
session timezone and the cast involves TimestampType, a GlutenNotSupportException
is thrown to fall back to Spark native execution.
How was this patch tested?
Enabled GlutenToPrettyStringSuite for both Spark 4.0 and Spark 4.1 which was
previously disabled due to 1 test failure caused by this timezone issue.
ISSUE: #11918
Was this patch authored or co-authored using generative AI tooling?