Conversation

anishshri-db
Contributor

What changes were proposed in this pull request?

Add an option to limit the number of deletions performed per maintenance operation for the RocksDB state store provider.
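As a rough illustration of how a user might enable such an option, here is a hedged usage sketch. The exact SQL config key introduced by this PR is not shown in this conversation, so the key name and value below are assumptions for illustration only.

```scala
// Hypothetical usage sketch only: the config key below is an assumption,
// not necessarily the exact name added by this PR.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("example").getOrCreate()

// Cap how many stale state store files a single maintenance pass may delete
// for the RocksDB state store provider (hypothetical key and value).
spark.conf.set(
  "spark.sql.streaming.stateStore.rocksdb.maxDeletesPerMaintenance", "100")
```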

Why are the changes needed?

In some instances, changelog deletion can take a very long time. While deletion is in progress for a partition, full snapshots cannot be uploaded for it, which affects recovery/replay scenarios. The problem is much more apparent on resource-constrained clusters. This change therefore adds an option to perform incremental cleanup, bounding the work done per maintenance operation invocation (see the sketch below).
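To make the incremental-cleanup idea concrete, here is a minimal sketch of a capped deletion pass. This is not the actual provider code from the PR; the object, method, and parameter names are hypothetical.

```scala
// Sketch of capped, incremental cleanup per maintenance invocation.
// Not the actual RocksDB state store provider code; names are hypothetical.
object IncrementalCleanupSketch {

  /**
   * Deletes at most `maxDeletesPerInvocation` stale files and returns true
   * if more files remain for the next maintenance invocation.
   */
  def cleanupOnce(
      staleFiles: Seq[String],
      maxDeletesPerInvocation: Int,
      deleteFile: String => Unit): Boolean = {
    val batch = staleFiles.take(maxDeletesPerInvocation)
    batch.foreach(deleteFile)
    // By bounding work per pass, one partition's cleanup no longer blocks
    // snapshot uploads for long stretches on constrained clusters.
    staleFiles.size > batch.size
  }
}
```

Bounding the per-pass work trades slower overall cleanup for more predictable maintenance latency, which is the trade-off motivated above for resource-constrained clusters.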

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests

[info] Run completed in 17 seconds, 591 milliseconds.
[info] Total number of tests run: 8
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 8, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db changed the title from "[SPARK-53794] Add option to limit deletions per maintenance operation associated with rocksdb state provider" to "[SPARK-53794][SS] Add option to limit deletions per maintenance operation associated with rocksdb state provider" on Oct 3, 2025
@anishshri-db
Contributor Author

cc - @HeartSaVioR - PTAL, thx !

@anishshri-db requested a review from ericm-db on October 3, 2025 18:57
@anishshri-db requested a review from liviazhu on October 5, 2025 02:01
Contributor

@liviazhu left a comment

LGTM, thank you!

@dongjoon-hyun
Member

dongjoon-hyun commented Oct 7, 2025

Hi, @anishshri-db, just a question: do you have any reason not to wait for an approval from the Apache Spark committers?

[Screenshot attached: merge status, 2025-10-06 19:23]

For some urgent cases, Apache Spark release managers can merge their PRs to unblock the release (and, of course, a late post-merge approval is expected), but I'm just curious whether you had that kind of exceptional case for your Apache Spark 4.1.0 PR.

BTW, based on the PR code, I can give my +1 to meet the Apache Spark community requirements (in case you missed getting the approval by mistake). So this is just a question; there is no big deal here.

@anishshri-db
Contributor Author

Hi @dongjoon-hyun, thanks for the pointer. Sorry, that was not intentional; I was not sure whether we always need a committer stamp for committer-authored PRs as well. Could you please stamp this one for me? I will make sure to get a committer stamp before merging from now on. Thanks!

@dongjoon-hyun
Member

dongjoon-hyun commented Oct 7, 2025

+1, LGTM.

No problem at all. I just wanted to rule out any such ASF issue for this PR. Thank you, @anishshri-db.
