[SPARK-46705][SS] Make RocksDB State Store Compaction Less Likely to Fall Behind #44712
Conversation
Let's file a JIRA, see also https://spark.apache.org/contributing.html
// in some workloads where batch size is very large, some data might take a very long time to
// be compacted.
columnFamilyOptions.setLevel0FileNumCompactionTrigger(16)
columnFamilyOptions.setLevel0SlowdownWritesTrigger(200)
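For context, here is a self-contained sketch of how these column-family settings might be assembled with the RocksJava API. The option setters match the excerpt; the stop-writes trigger value is an assumption, since the excerpt does not show it (the PR description only says a stop trigger is also raised):

```scala
import org.rocksdb.{ColumnFamilyOptions, RocksDB}

object CompactionTuningSketch {
  // Load the RocksDB JNI library before constructing any options objects.
  RocksDB.loadLibrary()

  // Raise the L0 triggers so that very large batches do not stall writes:
  // compaction starts later (16 L0 files instead of RocksDB's default 4),
  // and the write slowdown kicks in much later (200 L0 files).
  def tunedColumnFamilyOptions(): ColumnFamilyOptions = {
    val cfOpts = new ColumnFamilyOptions()
    cfOpts.setLevel0FileNumCompactionTrigger(16)
    cfOpts.setLevel0SlowdownWritesTrigger(200)
    // Illustrative value only: the actual stop trigger used by the PR
    // is not visible in this excerpt.
    cfOpts.setLevel0StopWritesTrigger(1000)
    cfOpts
  }
}
```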
Do we need to make any of these configurable?
@@ -671,6 +682,21 @@ class RocksDB(
  override protected def logName: String = s"${super.logName} $loggingId"
}

object RocksDB {
Do the code blocks below need to be embedded within a singleton? Would the blocks be invoked currently?
cc - @HeartSaVioR - to confirm
Maybe it won't be evaluated until it is referenced. @siying Could you please try adding a log to see whether the log is printed without referencing the object?
Sure, I'll add logging and see how many times it is called, though I believe it is a singleton.
We are not saying it will be called multiple times. We meant it is not clear whether this block is ever evaluated (executed once) or never executed, because we do not explicitly refer to this object.
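To make the concern concrete: a Scala `object` body is evaluated lazily, on first reference, and exactly once; it is never run if nothing references the object. A minimal, self-contained demo (names here are hypothetical and unrelated to the Spark code):

```scala
object LazyInitDemo {
  var initialized = false

  // The body of a Scala `object` runs only the first time the object is
  // referenced (it compiles to a lazily initialized singleton), so the
  // side effect below is deferred until then.
  object Settings {
    initialized = true           // side effect in the object body
    val compactionTrigger = 16
  }

  def main(args: Array[String]): Unit = {
    println(s"before reference: initialized=$initialized")   // false
    println(s"trigger=${Settings.compactionTrigger}")        // forces init
    println(s"after reference: initialized=$initialized")    // true
  }
}
```

This matches the reviewers' suspicion: statements inside `object RocksDB { ... }` run only when something first references `RocksDB`, and never run if nothing does.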
// Snapshot checkpoint requires a flush, so more threads will reduce the blocking time. More
// compaction threads will reduce the chance that compaction is backlogged, causing online
// traffic to slow down.
if (RocksDBEnv.getDefault().getBackgroundThreads(Priority.HIGH) < 2) {
How would this interact with the setMaxBackgroundJobs setting though?
If the user sets maxBackgroundJobs=2, then we are explicitly overriding to 4? Should we change the minimum allowed for maxBackgroundJobs to 4 then?
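One way to sidestep the override concern is to only ever grow the thread pools, never shrink a pool a user may have sized explicitly. A hedged sketch along those lines; the class and method names (`Env.getDefault`, `getBackgroundThreads`, `setBackgroundThreads`, `lowerThreadPoolCPUPriority`) are taken from the RocksJava API as I understand it and should be treated as assumptions, not as what the PR actually does:

```scala
import org.rocksdb.{Env, Priority, RocksDB}

object BackgroundThreadTuningSketch {
  RocksDB.loadLibrary()

  // Flush runs in the HIGH-priority pool and compaction in the
  // LOW-priority pool. Only raise pool sizes; if the user already
  // configured a larger pool (e.g. via maxBackgroundJobs), leave it.
  def ensureBackgroundThreads(minThreads: Int = 2): Unit = {
    val env = Env.getDefault
    if (env.getBackgroundThreads(Priority.HIGH) < minThreads) {
      env.setBackgroundThreads(minThreads, Priority.HIGH)
    }
    if (env.getBackgroundThreads(Priority.LOW) < minThreads) {
      env.setBackgroundThreads(minThreads, Priority.LOW)
    }
    // Limit CPU spikes: lower the OS scheduling priority of the
    // compaction (LOW) thread pool, per the PR's stated intent.
    env.lowerThreadPoolCPUPriority(Priority.LOW)
  }
}
```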
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
(1) Increase the RocksDB L0 compaction trigger, slowdown trigger, and stop trigger.
(2) Increase background threads for flush and compaction to 2. To limit the chance of a CPU spike, set the CPU priority for compaction threads to low.
Why are the changes needed?
We introduce two RocksDB tunings to reduce the chance that RocksDB compaction falls behind and delays a checkpoint.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
We ran an end-to-end streaming query where the RocksDB state store is reasonably loaded. This change reduces latency by about 30%.
Was this patch authored or co-authored using generative AI tooling?
No.