[SPARK-48208][SS] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled #46491
Conversation
@HeartSaVioR - could you PTAL, thx!
```scala
// running on the same node and account the usage to this single cache. In this case, it's not
// possible to provide partition level or query level memory usage.
val memoryUsage = if (conf.boundedMemoryUsage) {
  0L
```
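As a sketch of the behavior this diff introduces (hypothetical names, not the actual Spark code), the metric effectively degrades to a constant when bounded memory usage is on:

```python
def memory_usage_bytes(bounded_memory_usage: bool, rocksdb_usage_bytes: int) -> int:
    # With bounded memory usage, all RocksDB instances on a node share a
    # single block cache, so per-store usage cannot be attributed to one
    # partition or query; report 0 instead of a misleading node-level number.
    if bounded_memory_usage:
        return 0
    return rocksdb_usage_bytes

print(memory_usage_bytes(True, 79_781_058_621))   # 0
print(memory_usage_bytes(False, 79_781_058_621))  # 79781058621
```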
Could you please try with -1L? I'd say let's distinguish without doubt if possible.
Tried this, but it seems the progress metrics will still convert this to 0
```json
"stateOperators" : [ {
  "operatorName" : "stateStoreSave",
  "numRowsTotal" : 3,
  "numRowsUpdated" : 3,
  "allUpdatesTimeMs" : 10,
  "numRowsRemoved" : 0,
  "allRemovalsTimeMs" : 0,
  "commitTimeMs" : 211,
  "memoryUsedBytes" : 0,
```
It seems the lowest allowed value for the SQLMetric will still be interpreted as 0. We seem to do the same thing for reporting numRowsTotal when the trackingTotalNumRows flag is disabled as well.
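A toy model of why the -1L sentinel is lost (this is an illustration, not Spark's actual SQLMetric code): if the progress reporter floors metric values at the metric's zero value, any negative sentinel surfaces as 0 in the output:

```python
def report_metric(raw_value: int, zero_value: int = 0) -> int:
    # Toy model: values below the metric's zero value are treated as
    # invalid/unset and reported as the zero value, so a -1 sentinel
    # is indistinguishable from a genuine 0 in the progress report.
    return max(raw_value, zero_value)

print(report_metric(-1))  # 0 -- the sentinel collapses to zero
print(report_metric(42))  # 42
```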
Ah OK. Let's leave it as it is. Probably I also struggled with this and forgot.
+1 pending CI.
Thanks! Merging to master.
[SPARK-48208][SS] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled

### What changes were proposed in this pull request?
Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled.

### Why are the changes needed?
Without this, we are providing memory usage that is the max usage per node at a partition level. For example, if we report this:

```
"allRemovalsTimeMs" : 93,
"commitTimeMs" : 32240,
"memoryUsedBytes" : 15956211724278,
"numRowsDroppedByWatermark" : 0,
"numShufflePartitions" : 200,
"numStateStoreInstances" : 200,
```

We have 200 partitions in this case, so the memory usage per partition / state store would be ~78GB. However, this node has 256GB memory total and we have 2 such nodes. We have configured our cluster to use 30% of available memory on each node for RocksDB, which is ~77GB. So the memory being reported here is actually per node rather than per partition, which could be confusing for users.

### Does this PR introduce _any_ user-facing change?
No - only a metrics reporting change.

### How was this patch tested?
Added unit tests.

```
[info] Run completed in 10 seconds, 878 milliseconds.
[info] Total number of tests run: 24
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 24, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#46491 from anishshri-db/task/SPARK-48208.

Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
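The arithmetic in the description can be checked directly (the byte count and partition count are from the reported metrics above; the 30%-of-256GB figure is the cluster's configured RocksDB cap):

```python
reported_bytes = 15_956_211_724_278   # memoryUsedBytes from the progress report
num_partitions = 200                  # numStateStoreInstances

# Naive per-partition reading of the metric
per_partition_gb = reported_bytes / num_partitions / 1e9
print(f"{per_partition_gb:.1f} GB per partition")  # ~79.8 GB

# Node-level RocksDB cap: 30% of 256 GB per node
node_cap_gb = 0.30 * 256
print(f"{node_cap_gb:.1f} GB node cap")

# The per-partition figure is close to the node-level cap, not to a
# per-partition share of it, consistent with the claim that node-level
# usage was being counted once per partition.
```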