Conversation

anishshri-db
Contributor

What changes were proposed in this pull request?

Add an option to limit the number of deletions performed per maintenance operation for the RocksDB state store provider.
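As a rough illustration of how a user might enable such an option, here is a hedged usage sketch. The exact SQL config key introduced by this PR is not shown in this conversation, so the key name and value below are assumptions for illustration only.

```scala
// Hypothetical usage sketch only: the config key below is an assumption,
// not necessarily the exact name added by this PR.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("example").getOrCreate()

// Cap how many stale state store files a single maintenance pass may delete
// for the RocksDB state store provider (hypothetical key and value).
spark.conf.set(
  "spark.sql.streaming.stateStore.rocksdb.maxDeletesPerMaintenance", "100")
```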

Why are the changes needed?

In some instances, changelog deletion can take a very long time. While deletion is in progress for a partition, full snapshots cannot be uploaded for it, which affects recovery/replay scenarios. The problem is much more apparent on resource-constrained clusters. This change therefore adds an option to perform incremental cleanup, bounding the work done per maintenance operation invocation (see the sketch below).
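To make the incremental-cleanup idea concrete, here is a minimal sketch of a capped deletion pass. This is not the actual provider code from the PR; the object, method, and parameter names are hypothetical.

```scala
// Sketch of capped, incremental cleanup per maintenance invocation.
// Not the actual RocksDB state store provider code; names are hypothetical.
object IncrementalCleanupSketch {

  /**
   * Deletes at most `maxDeletesPerInvocation` stale files and returns true
   * if more files remain for the next maintenance invocation.
   */
  def cleanupOnce(
      staleFiles: Seq[String],
      maxDeletesPerInvocation: Int,
      deleteFile: String => Unit): Boolean = {
    val batch = staleFiles.take(maxDeletesPerInvocation)
    batch.foreach(deleteFile)
    // By bounding work per pass, one partition's cleanup no longer blocks
    // snapshot uploads for long stretches on constrained clusters.
    staleFiles.size > batch.size
  }
}
```

Bounding the per-pass work trades slower overall cleanup for more predictable maintenance latency, which is the trade-off motivated above for resource-constrained clusters.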

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests

[info] Run completed in 17 seconds, 591 milliseconds.
[info] Total number of tests run: 8
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 8, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db changed the title from "[SPARK-53794] Add option to limit deletions per maintenance operation associated with rocksdb state provider" to "[SPARK-53794][SS] Add option to limit deletions per maintenance operation associated with rocksdb state provider" on Oct 3, 2025
@anishshri-db
Contributor Author

cc - @HeartSaVioR - PTAL, thx !

@anishshri-db requested a review from ericm-db on October 3, 2025 18:57
@anishshri-db requested a review from liviazhu on October 5, 2025 02:01
Contributor

@liviazhu left a comment

LGTM, thank you!

@dongjoon-hyun
Member

dongjoon-hyun commented Oct 7, 2025

Hi, @anishshri-db, just a question: do you have any reason not to wait for an approval from the Apache Spark committers?

[Screenshot attached: merge status, 2025-10-06 19:23]

For some urgent cases, Apache Spark release managers can merge their PRs to unblock the release (and, of course, a late post-merge approval is expected), but I'm just curious whether you had that kind of exceptional case for your Apache Spark 4.1.0 PR.

BTW, based on the PR code, I can give my +1 to meet the Apache Spark community requirements (in case you missed getting the approval by mistake). So this is just a question; there is no big deal here.

@anishshri-db
Contributor Author

Hi @dongjoon-hyun, thanks for the pointer. Sorry, that was not intentional; I was not sure whether we always need a committer stamp for committer-authored PRs as well. Could you please stamp this one for me? I will make sure to get a committer stamp before merging from now on. Thanks!

@dongjoon-hyun
Member

dongjoon-hyun commented Oct 7, 2025

+1, LGTM.

No problem at all. I just wanted to rule out any such ASF issue for this PR. Thank you, @anishshri-db.
