[SPARK-56413] Add gRPC UDF execution protocol #55657
haiyangsun-db wants to merge 5 commits into apache:master
Conversation
// (Optional) Session timezone, promoted out of [[session_conf]]
// because every eval needs it for timestamp encoding/decoding.
optional string timezone = 7;
Is `string` the canonical type to represent the timezone? I am afraid all kinds of conversion errors may happen with no schema/enum enforcement.
This is the convention from Spark; the timezone is a string in Spark.
// (Optional) Session timezone, promoted out of [[session_conf]]
// because every eval needs it for timestamp encoding/decoding.
optional string timezone = 7;
We should specify the exact format in which the timezone will be reported, since it's a string.
Timezone in Spark is a string config; we should get it from Spark and follow the same format.
Ok, great. Thank you for clarifying!
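For context on the format the reviewers agreed on: Spark's session timezone config (`spark.sql.session.timeZone`) accepts either a region-based zone ID such as `America/Los_Angeles` or a fixed zone offset such as `+08:00` or `UTC`. A hypothetical doc comment spelling this out on the field might look like the following (a sketch only; the wording is not from the PR, and the field number is taken from the quoted diff):

```proto
// (Optional) Session timezone, promoted out of [[session_conf]].
// Follows Spark's `spark.sql.session.timeZone`: either a region-based
// zone ID such as "America/Los_Angeles", or a fixed zone offset such
// as "+08:00" or "UTC".
optional string timezone = 7;
```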
sven-weber-db left a comment
Thank you for addressing the comments
session.init(Init.newBuilder()
  .setUdf(UdfPayload.newBuilder()
    .setPayload(ByteString.copyFrom(serializedFunction))
    .setFormat("py-cloudpickle-v3"))
Does this already exist? Or do we still need to create it? The only reason I am bringing it up is that examples are forever :)...
@hvanhovell could you please help take another pass?
What changes were proposed in this pull request?

Adds `udf_protocol.proto`, the gRPC wire contract between the Spark engine and a UDF worker process, as described in the SPIP. It sits next to the existing `worker_spec.proto`.

Defines a `Worker` service with two RPCs:

- `Execute(stream UdfRequest) returns (stream UdfResponse)` — one bidirectional stream per UDF execution. Lifecycle on the stream: `Init` → 0..N `DataRequest`/`DataResponse` → exactly one `Finish` or `Cancel`. `PayloadChunk` streams oversized UDF bodies.
- `Manage(WorkerRequest) returns (WorkerResponse)` — unary, worker-scoped (heartbeat, graceful shutdown).

`UdfPayload` carries the engine-opaque callable bytes plus a `format` tag, an `eval_type` worker-dispatch hint, and optional input/output encoders. `Init` carries `data_format`, schemas, `session_conf`, `task_context`, and `timezone` (the first graduate from `session_conf`); a reserved field range absorbs future graduates.

Also fixes two typos in `common.proto` (`exachanged`/`bidrectional`).

Out of scope

No planning info on the wire (no execution-shape / cardinality enum, no chained-UDF metadata). Both can be added additively later.
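To make the service shape described above concrete, here is a sketch of what the contract could look like, reconstructed only from this description; the actual `udf_protocol.proto` in the PR may differ in message layout, field names, and field numbers:

```proto
syntax = "proto3";

// Sketch reconstructed from the PR description, not the actual file.
service Worker {
  // One bidirectional stream per UDF execution. Lifecycle:
  // Init -> 0..N DataRequest/DataResponse -> exactly one Finish or Cancel.
  rpc Execute(stream UdfRequest) returns (stream UdfResponse);

  // Unary, worker-scoped control plane (heartbeat, graceful shutdown).
  rpc Manage(WorkerRequest) returns (WorkerResponse);
}

message UdfRequest {
  oneof request {
    Init init = 1;           // first message on the stream
    DataRequest data = 2;    // zero or more data batches
    PayloadChunk chunk = 3;  // streams oversized UDF bodies
    Finish finish = 4;       // exactly one Finish or Cancel ends the stream
    Cancel cancel = 5;
  }
}

message UdfPayload {
  bytes payload = 1;    // engine-opaque callable bytes
  string format = 2;    // format tag, e.g. "py-cloudpickle-v3"
  int32 eval_type = 3;  // worker-dispatch hint
  // ... plus optional input/output encoders, per the description
}
```

The `oneof` wrapper is one idiomatic way to express the stream lifecycle in proto3; the reserved field range mentioned for `Init` would let future fields graduate out of `session_conf` without renumbering.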
Why are the changes needed?
Spark Connect's UDF support today is Python-only and tied to a Python-specific
socket protocol. Onboarding other client languages requires a structured,
language-neutral wire contract. This PR lands the proto layer; engine and
worker implementations will follow.
Does this PR introduce any user-facing change?
No. Wire contract only; not yet wired into any end-to-end path.
How was this patch tested?
Verified the proto compiles with `protoc` against `common.proto` and `worker_spec.proto`, and inspected the generated descriptor for field-number and oneof correctness. End-to-end conformance tests will land with the engine-side client and first worker implementation.
Was this patch authored or co-authored using generative AI tooling?
Yes