[GLUTEN-11918][VL] Fall back Cast when per-expression timezone differs from session timezone #12048

Merged
zml1206 merged 3 commits into apache:main from Yao-MR:bugfix/timezone_fallback on May 8, 2026

Conversation

Yao-MR (Contributor) commented May 7, 2026

What changes are proposed in this pull request?

When CastTransformer passes expressions through Substrait, it does not carry
per-expression timezone information. Gluten/Velox only uses the session-level
timezone for cast operations. This causes incorrect results when the
per-expression timezone differs from the session timezone and the cast involves
TimestampType (e.g., timestamp to string formatting in ToPrettyString).

This patch adds a timezone consistency check in ExpressionConverter before
creating CastTransformer. If the per-expression timezone differs from the
session timezone and the cast involves TimestampType, a GlutenNotSupportException
is thrown so that the expression falls back to vanilla Spark execution.
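
A minimal sketch of the check described above, for illustration only and not the code from this patch. The helper name `needsTimezoneFallback`, its `sessionTz` parameter, and the plain string comparison of zone IDs are assumptions; it needs Spark's catalyst classes on the classpath and inspects only the top-level input and output types of the Cast.

```scala
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.TimestampType

// Hypothetical helper (name and signature are not from the patch).
// Returns true when the Cast should not be offloaded because its own
// timezone differs from the session timezone and a timestamp is involved.
def needsTimezoneFallback(
    c: Cast,
    sessionTz: String = SQLConf.get.sessionLocalTimeZone): Boolean = {
  // Only the top-level child/output types are inspected in this sketch.
  val involvesTimestamp =
    c.child.dataType == TimestampType || c.dataType == TimestampType
  // A Cast without a timezone (None) never triggers the fallback here.
  // Assumption: zone IDs are compared as plain strings.
  involvesTimestamp && c.timeZoneId.exists(_ != sessionTz)
}
```

In the patch, a positive result leads to a GlutenNotSupportException; the sketch returns a Boolean so that the decision can be examined on its own.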

How was this patch tested?

Enabled GlutenToPrettyStringSuite for both Spark 4.0 and Spark 4.1. The suite
was previously disabled because of one test failure caused by this timezone issue.

ISSUE: #11918

Was this patch authored or co-authored using generative AI tooling?

github-actions Bot commented May 7, 2026

Run Gluten Clickhouse CI on x86

@github-actions github-actions Bot added the CORE works for Gluten Core label May 7, 2026
@Yao-MR Yao-MR marked this pull request as ready for review May 7, 2026 09:29
Comment on lines +477 to +478
val involvesTimestamp = c.child.dataType == TimestampType ||
c.dataType == TimestampType
Contributor

Thanks. This only checks the top-level child/output type, so casts like ArrayType(TimestampType) -> ArrayType(StringType), MapType(TimestampType, ...), or structs containing timestamp still go to Velox when the Cast has a per-expression timezone different from the session timezone. Spark Cast applies the same zoneId recursively to array/map/struct elements, so these cases can still return the session-timezone result instead of the expression-timezone result.

Contributor Author

Thanks for the review and for catching this key point. I added handling for complex types (ArrayType, MapType, StructType, and UserDefinedType) so that the check recursively detects nested timestamps.
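
The recursive detection discussed in this thread can be sketched roughly as follows. This is an illustrative reconstruction that assumes the Spark SQL type classes are on the classpath; the function name `containsTimestamp` matches the fragment visible in the review diff, but the body here is not copied from the patch.

```scala
import org.apache.spark.sql.types._

// Returns true if the type is TimestampType or nests one at any depth.
def containsTimestamp(dt: DataType): Boolean = dt match {
  case TimestampType => true
  // Arrays: look at the element type.
  case ArrayType(elementType, _) => containsTimestamp(elementType)
  // Maps: either the key or the value type may hold a timestamp.
  case MapType(keyType, valueType, _) =>
    containsTimestamp(keyType) || containsTimestamp(valueType)
  // Structs: check every field.
  case StructType(fields) => fields.exists(f => containsTimestamp(f.dataType))
  // User-defined types: check the underlying SQL type.
  case udt: UserDefinedType[_] => containsTimestamp(udt.sqlType)
  case _ => false
}
```

Applied to both the input and output type of a Cast, such a check would also flag casts like ArrayType(TimestampType) -> ArrayType(StringType), which a top-level comparison against TimestampType misses.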

@Yao-MR Yao-MR force-pushed the bugfix/timezone_fallback branch from e087b24 to 68f26d3 Compare May 7, 2026 09:57
@Yao-MR Yao-MR force-pushed the bugfix/timezone_fallback branch from 68f26d3 to e4d29c2 Compare May 7, 2026 10:02
@Yao-MR Yao-MR requested a review from zml1206 May 7, 2026 10:04
case udt: UserDefinedType[_] =>
containsTimestamp(udt.sqlType)
case _ => false
}
Contributor

Could we move containsTimestamp out of the Cast branch as a private helper method?
The recursive type check is independent from the local Cast state, and keeping it as a small private helper would make the Cast branch easier to read. It also avoids redefining the local function on every visit and makes the logic easier to reuse if another timezone-sensitive expression needs the same nested timestamp check later.

Contributor Author

Sure. I extracted the recursive timestamp-type check into a private helper,
involvesTimestampType. No functional change; this is a pure readability refactor.

@Yao-MR Yao-MR force-pushed the bugfix/timezone_fallback branch from e4d29c2 to 1a7fe97 Compare May 8, 2026 02:16
@zml1206 zml1206 changed the title [GLUTEN-11918][VL] Fall back Cast to Spark when per-expression timezone differs from session timezone [GLUTEN-11918][VL] Fall back Cast when per-expression timezone differs from session timezone May 8, 2026
@zml1206 zml1206 merged commit aa0bcc8 into apache:main May 8, 2026
62 checks passed