[SPARK-46705][SS] Make RocksDB State Store Compaction Less Likely to Fall Behind #44712
Conversation
Let's file a JIRA, see also https://spark.apache.org/contributing.html
// in some workloads where batch size is very large, some data might take a very long time to
// be compacted.
columnFamilyOptions.setLevel0FileNumCompactionTrigger(16)
columnFamilyOptions.setLevel0SlowdownWritesTrigger(200)
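For context, here is a self-contained sketch of how these column-family settings might be assembled with the RocksJava API. The option setters match the excerpt; the stop-writes trigger value is an assumption, since the excerpt does not show it (the PR description only says a stop trigger is also raised):

```scala
import org.rocksdb.{ColumnFamilyOptions, RocksDB}

object CompactionTuningSketch {
  // Load the RocksDB JNI library before constructing any options objects.
  RocksDB.loadLibrary()

  // Raise the L0 triggers so that very large batches do not stall writes:
  // compaction starts later (16 L0 files instead of RocksDB's default 4),
  // and the write slowdown kicks in much later (200 L0 files).
  def tunedColumnFamilyOptions(): ColumnFamilyOptions = {
    val cfOpts = new ColumnFamilyOptions()
    cfOpts.setLevel0FileNumCompactionTrigger(16)
    cfOpts.setLevel0SlowdownWritesTrigger(200)
    // Illustrative value only: the actual stop trigger used by the PR
    // is not visible in this excerpt.
    cfOpts.setLevel0StopWritesTrigger(1000)
    cfOpts
  }
}
```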
Do we need to make any of these configurable?
@@ -671,6 +682,21 @@ class RocksDB(
  override protected def logName: String = s"${super.logName} $loggingId"
}

object RocksDB {
Do the code blocks below need to be embedded within a singleton? Would the blocks be invoked currently?
cc - @HeartSaVioR - to confirm
Maybe it won't be evaluated until it is referenced. @siying Could you please try adding a log to see whether the log is printed without referencing the object?
Sure, I'll add logging and see how many times it is called, though I believe it is a singleton.
We are not saying it will be called multiple times. We meant it is not clear whether this block is ever evaluated (executed once) or never executed, because we do not explicitly refer to this object.
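To make the concern concrete: a Scala `object` body is evaluated lazily, on first reference, and exactly once; it is never run if nothing references the object. A minimal, self-contained demo (names here are hypothetical and unrelated to the Spark code):

```scala
object LazyInitDemo {
  var initialized = false

  // The body of a Scala `object` runs only the first time the object is
  // referenced (it compiles to a lazily initialized singleton), so the
  // side effect below is deferred until then.
  object Settings {
    initialized = true           // side effect in the object body
    val compactionTrigger = 16
  }

  def main(args: Array[String]): Unit = {
    println(s"before reference: initialized=$initialized")   // false
    println(s"trigger=${Settings.compactionTrigger}")        // forces init
    println(s"after reference: initialized=$initialized")    // true
  }
}
```

This matches the reviewers' suspicion: statements inside `object RocksDB { ... }` run only when something first references `RocksDB`, and never run if nothing does.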
// Snapshot checkpoint requires a flush, so more threads will reduce the blocking time. More
// compaction threads will reduce the chance that compaction is backlogged, causing online
// traffic to slow down.
if (RocksDBEnv.getDefault().getBackgroundThreads(Priority.HIGH) < 2) {
How would this interact with the setMaxBackgroundJobs setting though?
If the user sets maxBackgroundJobs=2, then we are explicitly overriding to 4? Should we change the minimum allowed for maxBackgroundJobs to 4 then?
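One way to sidestep the override concern is to only ever grow the thread pools, never shrink a pool a user may have sized explicitly. A hedged sketch along those lines; the class and method names (`Env.getDefault`, `getBackgroundThreads`, `setBackgroundThreads`, `lowerThreadPoolCPUPriority`) are taken from the RocksJava API as I understand it and should be treated as assumptions, not as what the PR actually does:

```scala
import org.rocksdb.{Env, Priority, RocksDB}

object BackgroundThreadTuningSketch {
  RocksDB.loadLibrary()

  // Flush runs in the HIGH-priority pool and compaction in the
  // LOW-priority pool. Only raise pool sizes; if the user already
  // configured a larger pool (e.g. via maxBackgroundJobs), leave it.
  def ensureBackgroundThreads(minThreads: Int = 2): Unit = {
    val env = Env.getDefault
    if (env.getBackgroundThreads(Priority.HIGH) < minThreads) {
      env.setBackgroundThreads(minThreads, Priority.HIGH)
    }
    if (env.getBackgroundThreads(Priority.LOW) < minThreads) {
      env.setBackgroundThreads(minThreads, Priority.LOW)
    }
    // Limit CPU spikes: lower the OS scheduling priority of the
    // compaction (LOW) thread pool, per the PR's stated intent.
    env.lowerThreadPoolCPUPriority(Priority.LOW)
  }
}
```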
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
(1) Increase the RocksDB L0 compaction trigger, slowdown trigger, and stop trigger.
(2) Increase background threads for flush and compaction to 2. To limit the chance of a CPU spike, set the CPU priority for compaction threads to low.
Why are the changes needed?
We introduce two RocksDB tunings to reduce the chance that RocksDB compaction falls behind and delays a checkpoint.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
We ran an end-to-end streaming query where the RocksDB state store is reasonably loaded. This change reduces latency by about 30%.
Was this patch authored or co-authored using generative AI tooling?
No.