[SPARK-33863][PYTHON] Respect session timezone in udf workers #53161
Conversation
@cloud-fan, @ueshin, @zhengruifeng we've discussed this but did not reach a conclusion. I have a draft here and a few questions. We probably need to discuss the implementation and its implications further.
zhengruifeng left a comment:
This is a behavior change; I think we need a flag for it.
We also need new tests in test_udf.py.
# Use the local timezone to convert the timestamp
tz = datetime.datetime.now().astimezone().tzinfo
tzname = os.environ.get("SPARK_SESSION_LOCAL_TIMEZONE", None)
To confirm: we will hit this branch for every UDF execution, not just once per Python worker initialization, right?
That's correct, but spark.session.conf.set("spark.sql.session.timeZone") doesn't seem to impact the result; this only works when I create the session with the conf already set. Any ideas? I can investigate whether that's a bug, or whether we just need to understand it; I thought you might know immediately.
@gaogaotiantian We can get the runtime config the same way as the other configs, like hideTraceback or simplifiedTraceback above. Please take a look at PythonRunner and its subclasses.
Ah, so basically override this for every subclassed worker?
Yes. Also, if we have a flag, the subclasses should decide whether to return the session local timezone or None.
So should the flag be a conf at the same level as the session local timezone, or just at the Python UDF level? And should it default to the original behavior or the new behavior?
Yes, the flag should be at the same level as the session local timezone, i.e. a runtime conf in SQLConf.
It can be enabled by default, but when disabled, the behavior should revert to the original behavior.
WDYT? cc @zhengruifeng @HyukjinKwon
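The flag semantics being proposed could look like the sketch below. The flag's env var name here is hypothetical (the actual SQLConf key is not settled in this thread), and the function takes an env mapping only for testability:

```python
import datetime
from zoneinfo import ZoneInfo


def resolve_udf_timezone(env):
    """Sketch of flag-gated timezone resolution for a Python UDF worker."""
    # Hypothetical flag name: enabled by default; when disabled, keep the
    # original behavior of using the machine-local timezone.
    if env.get("SPARK_UDF_RESPECT_SESSION_TIMEZONE", "1") != "1":
        return datetime.datetime.now().astimezone().tzinfo
    tzname = env.get("SPARK_SESSION_LOCAL_TIMEZONE")
    if tzname is not None:
        return ZoneInfo(tzname)
    return datetime.datetime.now().astimezone().tzinfo
```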
The Arrow-based UDFs already handle the session local timezone, so it may be OK to just update BasePythonUDFRunner.envVars to set the env var there instead of in PythonRunner?
envVars.put("SPARK_SIMPLIFIED_TRACEBACK", "1")
}
if (sessionLocalTimeZone.isDefined) {
  envVars.put("SPARK_SESSION_LOCAL_TIMEZONE", sessionLocalTimeZone.get)
For Arrow-based UDFs, sessionLocalTimeZone is actually already passed to the Python side:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala
Line 153 in ed23cc3
val timeZoneConf = Seq(SQLConf.SESSION_LOCAL_TIMEZONE.key -> conf.sessionLocalTimeZone)
However, this workerConf is not available for vanilla Python UDFs; we could consider supporting it there in the future. Also cc @HeartSaVioR
Yeah, it's better to pass the configs via a proper protocol instead of environment variables. But that's already the case for the vanilla Python runner, and I think it's fine to follow it.
can we add a test which sets …
What changes were proposed in this pull request?
Respect spark.sql.session.timeZone in UDF workers. This was discussed in #52980 but we decided to move it to a separate PR. There is still an open question left: the env var does not seem to pick up a timezone set at runtime via spark.conf.set. I believe this is trivial to people who are familiar with the configs, so I did not investigate too much.
Why are the changes needed?
Relying on the timezone of the local machine does not make sense.
Does this PR introduce any user-facing change?
Yes. The UDF behavior regarding timestamps and timezones will change.
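To illustrate the kind of user-facing difference at stake with plain Python (no Spark needed): the same instant renders differently depending on which timezone the worker applies, so results previously varied with the machine's local zone. The two zones below are arbitrary examples:

```python
import datetime
from zoneinfo import ZoneInfo

# One UTC instant, rendered in two candidate "worker" timezones.
instant = datetime.datetime(2024, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)

in_tokyo = instant.astimezone(ZoneInfo("Asia/Tokyo"))        # UTC+9
in_la = instant.astimezone(ZoneInfo("America/Los_Angeles"))  # UTC-8 in winter

print(in_tokyo.isoformat())  # 2024-01-01T09:00:00+09:00
print(in_la.isoformat())     # 2023-12-31T16:00:00-08:00
```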
How was this patch tested?
Manually
Was this patch authored or co-authored using generative AI tooling?
No