[SPARK-48208][SS] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled #46491

Closed
1 commit

Conversation

anishshri-db (Contributor)

What changes were proposed in this pull request?

Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled

Why are the changes needed?

Without this change, the memory usage we report at the partition level is actually the node-wide maximum usage.
For example, if we report this:

    "allRemovalsTimeMs" : 93,
    "commitTimeMs" : 32240,
    "memoryUsedBytes" : 15956211724278,
    "numRowsDroppedByWatermark" : 0,
    "numShufflePartitions" : 200,
    "numStateStoreInstances" : 200,

We have 200 partitions in this case, so the memory usage per partition / state store would work out to ~78 GB. However, each node has 256 GB of memory in total and there are 2 such nodes, and the cluster is configured to give RocksDB 30% of the available memory on each node, which is ~77 GB.
So the memory being reported here is actually per node rather than per partition, which can be confusing for users.
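The arithmetic behind that observation can be checked directly. This is a sanity-check sketch, under the assumption stated in this PR: with bounded memory usage, every state store reports the node-wide RocksDB cache usage, and the aggregated metric sums those per-store values.

```python
# Sanity-checking the numbers from the PR description (assumption: each of the
# 200 state stores reports the node-wide cache usage, and the progress metric
# sums them).
reported_total_bytes = 15_956_211_724_278   # memoryUsedBytes from the progress report
num_state_stores = 200

per_store_gb = reported_total_bytes / num_state_stores / 1e9
node_memory_gb = 256
rocksdb_cap_gb = node_memory_gb * 0.30      # 30% of node memory configured for RocksDB

print(f"per store: {per_store_gb:.1f} GB")   # ~79.8 GB
print(f"node cap:  {rocksdb_cap_gb:.1f} GB") # ~76.8 GB
```

The per-store figure lands right at the node-wide RocksDB cache budget rather than at any plausible per-partition footprint, which is exactly the confusion this PR removes.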

Does this PR introduce any user-facing change?

No - only a metrics reporting change

How was this patch tested?

Added unit tests

[info] Run completed in 10 seconds, 878 milliseconds.
[info] Total number of tests run: 24
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 24, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db anishshri-db changed the title [SPARK-48208] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled [SPARK-48208][SS] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled May 9, 2024
@anishshri-db
Copy link
Contributor Author

@HeartSaVioR - could you PTAL, thx !

// running on the same node and account the usage to this single cache. In this case, it's not
// possible to provide partition-level or query-level memory usage.
val memoryUsage = if (conf.boundedMemoryUsage) {
  0L
Contributor

Could you please try with -1L? I'd say let's distinguish without doubt if possible.

Contributor Author

Tried this, but it seems the progress metrics will still convert it to 0:

  "stateOperators" : [ {
    "operatorName" : "stateStoreSave",
    "numRowsTotal" : 3,
    "numRowsUpdated" : 3,
    "allUpdatesTimeMs" : 10,
    "numRowsRemoved" : 0,
    "allRemovalsTimeMs" : 0,
    "commitTimeMs" : 211,
    "memoryUsedBytes" : 0,

It seems the lowest allowed value for the SQLMetric will still be interpreted as 0.

We seem to do the same thing for reporting numRowsTotal if the trackingTotalNumRows flag is disabled as well.
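The clamping behavior observed above can be illustrated with a minimal sketch. This is hypothetical Python mimicking the reported symptom, not Spark's actual SQLMetric implementation (which is a Scala accumulator):

```python
# Hypothetical sketch of the clamping observed above: values below the metric's
# zero value surface as 0 in the progress report, so a -1 sentinel cannot be
# distinguished from an actual 0.
def reported_value(raw: int, zero_value: int = 0) -> int:
    # Anything at or below zero_value is treated as unset and reported as 0.
    return max(raw, zero_value)

print(reported_value(-1))   # 0 -- the sentinel is lost
print(reported_value(211))  # positive values pass through unchanged
```

This is why the PR keeps `0L` rather than switching to `-1L` as a "metric unavailable" marker.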

Contributor

Ah OK. Let's leave it as it is. I probably struggled with this too and forgot.

HeartSaVioR (Contributor) left a comment

+1 pending CI.

@HeartSaVioR (Contributor)

Thanks! Merging to master.

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
Closes apache#46491 from anishshri-db/task/SPARK-48208.

Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>