
[SPARK-46705][SS] Make RocksDB State Store Compaction Less Likely to fall behind #44712

Closed

Conversation


@siying (Contributor) commented on Jan 12, 2024

What changes were proposed in this pull request?

(1) Increase the RocksDB L0 compaction trigger, slowdown trigger, and stop trigger.
(2) Increase the background threads for flush and compaction to 2. To limit the chance of a CPU spike, set the CPU priority of the compaction threads to low. (A rough sketch of both tunings is shown right below.)
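For readers less familiar with these RocksDB knobs, here is a minimal standalone sketch of the two tunings using the RocksJava API. The trigger values 16 and 200 come from the diff quoted later in this conversation; the stop-writes value and the use of Env.lowerThreadPoolCPUPriority are assumptions about how the change could be expressed, not necessarily the PR's exact code.

import org.rocksdb.{ColumnFamilyOptions, Env, Priority, RocksDB => NativeRocksDB}

object CompactionTuningSketch {
  NativeRocksDB.loadLibrary()

  // (1) Raise the L0 triggers so writes are slowed or stopped much later,
  // giving compaction more headroom before it affects the write path.
  val cfOptions = new ColumnFamilyOptions()
  cfOptions.setLevel0FileNumCompactionTrigger(16)
  cfOptions.setLevel0SlowdownWritesTrigger(200)
  cfOptions.setLevel0StopWritesTrigger(1000) // illustrative value, not taken from the PR diff

  // (2) Give the shared Env two flush threads (HIGH pool) and two compaction threads
  // (LOW pool), and lower the compaction pool's CPU priority to limit CPU spikes.
  val env = Env.getDefault()
  env.setBackgroundThreads(2, Priority.HIGH)
  env.setBackgroundThreads(2, Priority.LOW)
  env.lowerThreadPoolCPUPriority(Priority.LOW)
}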

Why are the changes needed?

We introduce two RocksDB tunings to reduce the chance that RocksDB compaction falls behind and delays a checkpoint.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

We ran an end-to-end streaming query in which the RocksDB state store is reasonably loaded. This change reduces latency by about 30%.

Was this patch authored or co-authored using generative AI tooling?

No.

@HyukjinKwon (Member) commented:

Let's file a JIRA, see also https://spark.apache.org/contributing.html

@siying changed the title from "Make RocksDB State Store Compaction Less Likely to fall behind" to "[SPARK-46705][SS] Make RocksDB State Store Compaction Less Likely to fall behind" on Jan 16, 2024
// in some workloads where batch size is very large, some data might take a very long time to
// be compacted.
columnFamilyOptions.setLevel0FileNumCompactionTrigger(16)
columnFamilyOptions.setLevel0SlowdownWritesTrigger(200)
Contributor:

Do we need to make any of these configurable?
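For illustration only, here is one way such a knob could be surfaced: read the value from the state store's options map and fall back to the current hardcoded default. The config key below is invented for this sketch and does not exist in Spark; it is not a proposal from the PR.

import org.rocksdb.ColumnFamilyOptions

object ConfigurableTriggerSketch {
  // Hypothetical helper: apply a configurable L0 compaction trigger, defaulting to 16.
  def applyTrigger(cfOptions: ColumnFamilyOptions, confs: Map[String, String]): Unit = {
    val trigger = confs
      .getOrElse("spark.sql.streaming.stateStore.rocksdb.level0CompactionTrigger", "16")
      .toInt
    cfOptions.setLevel0FileNumCompactionTrigger(trigger)
  }
}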

@@ -671,6 +682,21 @@ class RocksDB(
override protected def logName: String = s"${super.logName} $loggingId"
}

object RocksDB {
Contributor:

Do the code blocks below need to be embedded within a singleton? Would the blocks be invoked currently?

cc - @HeartSaVioR - to confirm

Contributor:

Maybe it won't be evaluated till it is referenced. @siying Could you please try adding a log to see whether the log is printed without referencing the object?

Contributor Author:

Sure, I'll add logging and see how many times it is called, though I believe it is a singleton.

Contributor:

We are not saying it will be called multiple times. We mean it is not clear whether this block is ever evaluated (executed once) or never executed, because we do not explicitly refer to this object.
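For reference, a minimal standalone demonstration of the Scala semantics under discussion (this is not Spark code): the body of an object runs lazily, on first reference, and at most once.

object TuningBlock {
  // Printed only when the object is first referenced.
  println("TuningBlock initialized")
  val flushThreads = 2
}

object Main {
  def main(args: Array[String]): Unit = {
    println("before any reference")      // no "TuningBlock initialized" yet
    println(TuningBlock.flushThreads)    // triggers initialization here
    println(TuningBlock.flushThreads)    // no second "initialized" line
  }
}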

// Snapshot checkpoint requires a flush, so more threads will reduce the blocking time. More
// compaction threads will reduce the chance that compaction is backlogged, causing online
// traffic to slow down.
if (RocksDBEnv.getDefault().getBackgroundThreads(Priority.HIGH) < 2) {
Contributor:

How would this interact with the setMaxBackgroundJobs setting, though?

If the user sets maxBackgroundJobs=2, then are we explicitly overriding it to 4? Should we change the minimum allowed value for maxBackgroundJobs to 4 then?
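As context for this question, here is a sketch of how the two knobs relate; the defensive pattern shown is an assumption for discussion, not the PR's code. maxBackgroundJobs caps how many flush/compaction jobs a single DB instance schedules, while Env.setBackgroundThreads sizes the process-wide thread pools those jobs run on.

import org.rocksdb.{DBOptions, Env, Priority}

object BackgroundThreadsSketch {
  // Per-instance cap on concurrently scheduled flush/compaction jobs.
  val dbOptions = new DBOptions().setMaxBackgroundJobs(2)

  // Process-wide thread pools: only ever grow them, so a smaller user-provided
  // value is not silently overridden downward elsewhere.
  val env = Env.getDefault()
  if (env.getBackgroundThreads(Priority.HIGH) < 2) {
    env.setBackgroundThreads(2, Priority.HIGH) // flush pool
  }
  if (env.getBackgroundThreads(Priority.LOW) < 2) {
    env.setBackgroundThreads(2, Priority.LOW) // compaction pool
  }
}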


github-actions bot commented May 5, 2024

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions bot added the Stale label on May 5, 2024
@github-actions bot closed this on May 6, 2024