[SPARK-56023][SS] Better load balance in LowLatencyMemoryStream by eason-yuchen-liu · Pull Request #54848 · apache/spark

eason-yuchen-liu · 2026-03-17T03:52:15Z

What changes were proposed in this pull request?

Rewrite addData to use records.size % numPartitions for better load balancing across partitions.

Why are the changes needed?

Previously, addData balanced load across partitions only when a sequence of rows was added in a single call. This change enables load balancing for one-row-at-a-time input patterns.
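To make the motivation concrete, here is a minimal sketch of the removed per-call assignment. It is not the real LowLatencyMemoryStream: the object name, the `Int` rows, and the `zipWithIndex` loop are assumptions for illustration; only the `index % numPartitions` expression comes from the diff.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in for the per-partition buffers (hypothetical, not Spark code).
object PerCallIndexDemo {
  // Mimics the removed logic: the partition is chosen from the item's index
  // within a single addData call, so every call starts again at partition 0.
  def addData(records: Array[ArrayBuffer[Int]], data: Seq[Int]): Unit = {
    val numPartitions = records.length
    data.iterator.zipWithIndex.foreach { case (item, index) =>
      records(index % numPartitions) += item
    }
  }

  // Applies a series of addData calls and returns the row count per partition.
  def sizesAfter(numPartitions: Int, calls: Seq[Seq[Int]]): List[Int] = {
    val records = Array.fill(numPartitions)(ArrayBuffer.empty[Int])
    calls.foreach(addData(records, _))
    records.map(_.size).toList
  }

  def main(args: Array[String]): Unit = {
    // One call with six rows spreads them evenly.
    println(sizesAfter(3, Seq(Seq(1, 2, 3, 4, 5, 6)))) // List(2, 2, 2)
    // Six calls with one row each put every row in partition 0.
    println(sizesAfter(3, (1 to 6).map(Seq(_)))) // List(6, 0, 0)
  }
}
```

Under these assumptions, the one-row-at-a-time pattern is exactly the case where the per-call index never advances past 0.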

Does this PR introduce any user-facing change?

No. This is a test-only source.

How was this patch tested?

CI.

Was this patch authored or co-authored using generative AI tooling?

No.

HeartSaVioR · 2026-03-18T06:28:53Z

...src/main/scala/org/apache/spark/sql/execution/streaming/sources/LowLatencyMemoryStream.scala

-        val partitionId = index % numPartitions
-        records(partitionId) += ((toRow(item).copy().asInstanceOf[UnsafeRow], timestamp))
+    data.iterator.foreach { item =>
+      val partitionId = records.size % numPartitions


How does this work? records.size will always be the same (= numPartitions) regardless of how the events are currently distributed, right? So records.size % numPartitions is always 0, and this change will simply put all the data in a single partition, the first partition.

While we are here, I'd love to see the test at this point.
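The reviewer's point can be reproduced with a small sketch. It assumes, as the comment implies, that `records` is a fixed-length collection holding one buffer per partition; the object name and the `Int` rows are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical reproduction of the first revision's partition choice.
object OuterSizeDemo {
  def addOneAtATime(numPartitions: Int, rows: Seq[Int]): List[Int] = {
    val records = Array.fill(numPartitions)(ArrayBuffer.empty[Int])
    rows.foreach { item =>
      // records.size counts the partition buffers, not the rows stored in them,
      // so this is always numPartitions % numPartitions == 0.
      val partitionId = records.size % numPartitions
      records(partitionId) += item
    }
    records.map(_.size).toList
  }

  def main(args: Array[String]): Unit = {
    println(addOneAtATime(3, 1 to 6)) // List(6, 0, 0)
  }
}
```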

Thanks for catching it. My bad for making such a silly mistake. I have fixed the issue and added a new unit test. Thanks.

HeartSaVioR · 2026-03-18T22:17:45Z

...src/main/scala/org/apache/spark/sql/execution/streaming/sources/LowLatencyMemoryStream.scala

-        val partitionId = index % numPartitions
-        records(partitionId) += ((toRow(item).copy().asInstanceOf[UnsafeRow], timestamp))
+    data.iterator.foreach { item =>
+      val partitionId = records.map(_.size).sum % numPartitions


nit: shall we just track the overall count separately?
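A rough sketch of what tracking the overall count separately could look like. The `numRecords` name appears in the later diff; the class name, the `Long` counter type, the `Int` rows, and the `synchronized` blocks are assumptions, not the actual Spark implementation.

```scala
import scala.collection.mutable.ArrayBuffer

object RunningCounterDemo {
  // Hypothetical stand-in for the stream's per-partition buffers.
  final class RoundRobinBuffer(numPartitions: Int) {
    private val records = Array.fill(numPartitions)(ArrayBuffer.empty[Int])
    // Running total of rows added; avoids summing every buffer's size per row.
    private var numRecords = 0L

    def addData(data: Seq[Int]): Unit = synchronized {
      data.foreach { item =>
        val partitionId = (numRecords % numPartitions).toInt
        records(partitionId) += item
        numRecords += 1
      }
    }

    def sizes: List[Int] = synchronized(records.map(_.size).toList)
  }

  def main(args: Array[String]): Unit = {
    val buffer = new RoundRobinBuffer(3)
    // Seven separate one-row calls still rotate across partitions.
    (1 to 7).foreach(i => buffer.addData(Seq(i)))
    println(buffer.sizes) // List(3, 2, 2)
  }
}
```

Compared with `records.map(_.size).sum`, the counter gives the same partition choice for round-robin-only input while doing constant work per row.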

HeartSaVioR · 2026-03-18T23:35:38Z

...src/main/scala/org/apache/spark/sql/execution/streaming/sources/LowLatencyMemoryStream.scala

    val timestamp = clock.getTimeMillis()
    data.iterator.foreach { item =>
      records(partitionId) += ((toRow(item).copy().asInstanceOf[UnsafeRow], timestamp))
+      numRecords += 1


This is beyond the scope of the PR, so this comment doesn't block the PR from merging.

Do we have a pattern that mixes writing to a specific partition with writing without specifying a partition? We would probably need to be smarter to keep the balance for that pattern, but I agree this is somewhat over-engineering, and I don't know whether we ever have that pattern.
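For illustration only, the mixed pattern raised here can be sketched as follows. The method names `addToPartition` and `addRoundRobin` are hypothetical, and the assumption that the explicit-partition write also increments `numRecords` is inferred from the diff above rather than confirmed against the merged code.

```scala
import scala.collection.mutable.ArrayBuffer

object MixedWritesDemo {
  // Hypothetical buffer that supports both write styles.
  final class Buffer(numPartitions: Int) {
    private val records = Array.fill(numPartitions)(ArrayBuffer.empty[Int])
    private var numRecords = 0L

    // Explicit write: the caller picks the partition; the counter still advances.
    def addToPartition(partitionId: Int, item: Int): Unit = {
      records(partitionId) += item
      numRecords += 1
    }

    // Round-robin write: the partition is derived from the running counter only,
    // without looking at how full each partition currently is.
    def addRoundRobin(item: Int): Unit = {
      records((numRecords % numPartitions).toInt) += item
      numRecords += 1
    }

    def sizes: List[Int] = records.map(_.size).toList
  }

  def main(args: Array[String]): Unit = {
    val buffer = new Buffer(3)
    (1 to 3).foreach(i => buffer.addToPartition(0, i)) // three explicit writes to partition 0
    (4 to 6).foreach(i => buffer.addRoundRobin(i))     // three round-robin writes
    println(buffer.sizes) // List(4, 1, 1)
  }
}
```

In this sketch the round-robin writes do not compensate for the rows already pinned to partition 0, which is the imbalance the comment describes.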

HeartSaVioR

+1 pending CI

HeartSaVioR · 2026-03-18T23:41:12Z

@eason-yuchen-liu Looks like there is a build failure. Could you please check the CI and fix it? Thanks in advance!

HeartSaVioR · 2026-03-19T03:27:21Z

https://github.com/eason-yuchen-liu/spark/runs/67676529366

CI passes.

HeartSaVioR · 2026-03-19T03:27:30Z

Thanks! Merging to master.

Closes apache#54848 from eason-yuchen-liu/lowLatencyMemoryStreamLoadBalance.

Authored-by: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

Improve load balance in LowLatencyMemoryStream addData

69061a7

HeartSaVioR reviewed Mar 18, 2026


address comments

cde4a5c

eason-yuchen-liu requested a review from HeartSaVioR March 18, 2026 22:14

HeartSaVioR reviewed Mar 18, 2026


address comment

be1b3f4

HeartSaVioR reviewed Mar 18, 2026


HeartSaVioR approved these changes Mar 18, 2026


eason-yuchen-liu added 2 commits March 18, 2026 18:18

compile

6041b7e

style

5ef5079

HeartSaVioR closed this in 0411a57 Mar 19, 2026


eason-yuchen-liu wants to merge 5 commits into apache:master from eason-yuchen-liu:lowLatencyMemoryStreamLoadBalance
