
[SPARK-56543] Add RTM stateless benchmark#55420

Open
jerrypeng wants to merge 4 commits into apache:master from jerrypeng:SPARK-56543

Conversation

@jerrypeng
Contributor

@jerrypeng jerrypeng commented Apr 20, 2026

What changes were proposed in this pull request?

Adds RTMKafkaKafkaBenchmarkSuite, a stateless end-to-end benchmark for the Real-Time Mode (RTM) trigger in Structured Streaming.

The benchmark:

  1. Spins up a local-cluster Spark context (local-cluster[3, 5, 1024]) and a live embedded Kafka broker.
  2. Generates synthetic records at 1,000 records/sec into an input Kafka topic (5 partitions).
  3. Runs a stateless pipeline with RealTimeTrigger: reads from Kafka → base64-encodes the value → stamps a source-timestamp header → writes to an output Kafka topic.
  4. Captures per-batch processing latency via Spark's observe() API.
  5. After N batches complete, reads back the output topic and reports e2e latency percentiles (p0, p50, p90, p95, p99, p100) by comparing the source-timestamp header to the Kafka sink timestamp.
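The percentile report in step 5 can be sketched in plain Scala over the collected per-record latencies. This is a minimal sketch using a nearest-rank percentile; the object and method names here are illustrative, not the PR's actual identifiers:

```scala
// Sketch of the step-5 latency report: given per-record e2e latencies in ms
// (Kafka sink timestamp minus the source-timestamp header), compute the
// reported percentiles. Names are illustrative, not from the PR.
object LatencyReport {
  // Nearest-rank percentile over a sorted copy of the samples.
  def percentile(latenciesMs: Seq[Long], p: Double): Long = {
    require(latenciesMs.nonEmpty, "no latency samples collected")
    val sorted = latenciesMs.sorted
    val rank = math.ceil(p / 100.0 * sorted.length).toInt
    sorted(math.max(rank - 1, 0))
  }

  // p0/p50/p90/p95/p99/p100, matching the percentiles the PR reports.
  def report(latenciesMs: Seq[Long]): Map[String, Long] =
    Seq(0.0, 50.0, 90.0, 95.0, 99.0, 100.0)
      .map(p => s"p${p.toInt}" -> percentile(latenciesMs, p))
      .toMap
}
```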

Why are the changes needed?

There are currently no benchmarks that measure RTM stateless Kafka-to-Kafka latency, which makes it hard to quantify regressions or improvements to the RTM code path in CI or local development. This benchmark provides a repeatable, self-contained way to measure it.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This is a benchmark-only test suite. The suite was manually verified to compile and initialize correctly against the current codebase.

To run it explicitly:

```shell
build/sbt "sql-kafka-0-10/testOnly *RTMKafkaKafkaBenchmarkSuite"
```

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6 (claude-sonnet-4-6)

```scala
 * [[RealTimeTrigger]]. After the run it reports e2e latency percentiles.
 *
 * This benchmark intentionally runs a real local-cluster and a live Kafka broker, so it
 * is slow and is not included in the default test run. Run it explicitly when measuring
```
Member


This claims it is "not included in the default test run", but the suite extends KafkaSourceTest, so I suppose CI will run it?


```scala
val success = new AtomicLong(0)

new Timer().scheduleAtFixedRate(
```
Member


Don't we need to stop/cancel this timer?
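For reference, the cancellation the reviewer asks about can be sketched with `java.util.Timer.cancel()` in a `finally` block, so the generator stops even if the benchmark body throws. This is a minimal sketch; the thread name, tick period, and latch count are illustrative, not from the PR:

```scala
import java.util.{Timer, TimerTask}
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicLong

// Sketch: always cancel the fixed-rate generator timer, even on failure.
// Names and constants are illustrative, not the PR's actual values.
val success = new AtomicLong(0)
val enough = new CountDownLatch(5)
val timer = new Timer("data-gen", /* isDaemon = */ true)
try {
  timer.scheduleAtFixedRate(new TimerTask {
    override def run(): Unit = {
      success.incrementAndGet() // stand-in for "send one batch of records"
      enough.countDown()
    }
  }, /* delay = */ 0L, /* periodMs = */ 10L)
  enough.await() // wait until at least 5 ticks have fired
} finally {
  timer.cancel() // no further ticks are scheduled after this returns
}
```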

```scala
  }
})

latch.await()
```
Member


We probably should set a timeout. If it timeouts, we should stop query/stop generator and throw query exception.
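The bounded wait being suggested can be sketched with the timed `CountDownLatch.await(timeout, unit)` overload, which returns `false` on timeout. The helper and the two cleanup callbacks below are illustrative stand-ins for the query/generator shutdown, not the PR's actual code:

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Sketch: bound the wait for N batches; on timeout, tear down and fail loudly.
// The helper name and callbacks are illustrative, not from the PR.
def awaitBatchesOrFail(latch: CountDownLatch, timeoutSec: Long)
                      (stopQuery: () => Unit, stopGenerator: () => Unit): Unit = {
  val completed = latch.await(timeoutSec, TimeUnit.SECONDS) // false on timeout
  if (!completed) {
    stopQuery()     // illustrative: would call query.stop()
    stopGenerator() // illustrative: would interrupt and join the generator thread
    throw new IllegalStateException(
      s"Benchmark timed out with ${latch.getCount} batches remaining " +
        s"after ${timeoutSec}s")
  }
}
```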

@jerrypeng
Contributor Author

@viirya thank you for your review! I have addressed your comments. PTAL.

```scala
 * is slow. Run it explicitly when measuring RTM throughput and latency for the stateless path.
 */
class RTMKafkaKafkaBenchmarkSuite
  extends KafkaSourceTest
```
Member


This extends KafkaSourceTest and is a test suite, so CI will run this benchmark automatically. I think we should make it a program that is run manually.

Contributor Author


Added a new annotation that excludes this suite from running automatically in CI. Folks can still run it manually.

```scala
  getLatencies(longRunningBatchDurationMs, numBatches, outputTopic)
}

private def genData(url: String, topicName: String, throughput: Long): Unit = {
```
Member


genData now cancels the timer and closes the producer in finally, which is good, but the caller only invokes dataGenThread.interrupt() and then continues without waiting for that cleanup to finish. Since this suite mixes in ThreadAudit, the test may finish while the generator thread is still unwinding, sleeping, or blocked in producer.close(), which can lead to flaky thread-leak failures. Please join the thread with a bounded timeout after interrupting it, and ideally put query shutdown plus generator stop/join in an outer try/finally so cleanup also happens if await is interrupted or another exception is thrown.
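The interrupt-then-bounded-join shutdown being requested can be sketched as a small helper. The method name and timeout are illustrative, not the PR's actual identifiers:

```scala
// Sketch: stop the generator thread and wait (bounded) for its cleanup to
// finish, so a thread-audit check does not see a still-unwinding thread.
// Names and the timeout are illustrative, not from the PR.
def stopGenerator(dataGenThread: Thread, joinTimeoutMs: Long = 10000L): Unit = {
  dataGenThread.interrupt()         // wakes sleep/await; finally blocks then run
  dataGenThread.join(joinTimeoutMs) // bounded wait for the thread to unwind
  if (dataGenThread.isAlive) {
    throw new IllegalStateException("data generator did not stop in time")
  }
}
```

In the benchmark body this would be called from an outer `finally`, after stopping the query, so cleanup also happens when the await is interrupted or another exception is thrown.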

Contributor Author


done

@jerrypeng
Contributor Author

@viirya thank you for your review! I have addressed your comments. PTAL.

@jerrypeng jerrypeng requested a review from viirya April 28, 2026 05:09