[SPARK-48208][SS] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled #46491

Closed
1 commit

Conversation

anishshri-db (Contributor)

What changes were proposed in this pull request?

Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled

Why are the changes needed?

Without this change, the memory usage we report at the partition level is actually the node-wide maximum usage.
For example, if we report this:

    "allRemovalsTimeMs" : 93,
    "commitTimeMs" : 32240,
    "memoryUsedBytes" : 15956211724278,
    "numRowsDroppedByWatermark" : 0,
    "numShufflePartitions" : 200,
    "numStateStoreInstances" : 200,

We have 200 partitions in this case, so the memory usage per partition / state store would work out to ~78 GB. However, each node has 256 GB of memory in total and there are 2 such nodes, and the cluster is configured to give RocksDB 30% of the available memory on each node, which is ~77 GB.
So the memory being reported here is actually per node rather than per partition, which can be confusing for users.
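The arithmetic behind that observation can be checked directly. This is a sanity-check sketch, under the assumption stated in this PR: with bounded memory usage, every state store reports the node-wide RocksDB cache usage, and the aggregated metric sums those per-store values.

```python
# Sanity-checking the numbers from the PR description (assumption: each of the
# 200 state stores reports the node-wide cache usage, and the progress metric
# sums them).
reported_total_bytes = 15_956_211_724_278   # memoryUsedBytes from the progress report
num_state_stores = 200

per_store_gb = reported_total_bytes / num_state_stores / 1e9
node_memory_gb = 256
rocksdb_cap_gb = node_memory_gb * 0.30      # 30% of node memory configured for RocksDB

print(f"per store: {per_store_gb:.1f} GB")   # ~79.8 GB
print(f"node cap:  {rocksdb_cap_gb:.1f} GB") # ~76.8 GB
```

The per-store figure lands right at the node-wide RocksDB cache budget rather than at any plausible per-partition footprint, which is exactly the confusion this PR removes.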

Does this PR introduce any user-facing change?

No - only a metrics reporting change

How was this patch tested?

Added unit tests

[info] Run completed in 10 seconds, 878 milliseconds.
[info] Total number of tests run: 24
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 24, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db anishshri-db changed the title [SPARK-48208] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled [SPARK-48208][SS] Skip providing memory usage metrics from RocksDB if bounded memory usage is enabled May 9, 2024
@anishshri-db
Copy link
Contributor Author

@HeartSaVioR - could you PTAL, thx !

// running on the same node and account the usage to this single cache. In this case, it's not
// possible to provide partition-level or query-level memory usage.
val memoryUsage = if (conf.boundedMemoryUsage) {
  0L
Contributor

Could you please try with -1L? I'd say let's distinguish without doubt if possible.

Contributor Author

Tried this, but it seems the progress metrics will still convert it to 0:

  "stateOperators" : [ {
    "operatorName" : "stateStoreSave",
    "numRowsTotal" : 3,
    "numRowsUpdated" : 3,
    "allUpdatesTimeMs" : 10,
    "numRowsRemoved" : 0,
    "allRemovalsTimeMs" : 0,
    "commitTimeMs" : 211,
    "memoryUsedBytes" : 0,

It seems the lowest allowed value for the SQLMetric will still be interpreted as 0.

We seem to do the same thing for reporting numRowsTotal if the trackingTotalNumRows flag is disabled as well.
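The clamping behavior observed above can be illustrated with a minimal sketch. This is hypothetical Python mimicking the reported symptom, not Spark's actual SQLMetric implementation (which is a Scala accumulator):

```python
# Hypothetical sketch of the clamping observed above: values below the metric's
# zero value surface as 0 in the progress report, so a -1 sentinel cannot be
# distinguished from an actual 0.
def reported_value(raw: int, zero_value: int = 0) -> int:
    # Anything at or below zero_value is treated as unset and reported as 0.
    return max(raw, zero_value)

print(reported_value(-1))   # 0 -- the sentinel is lost
print(reported_value(211))  # positive values pass through unchanged
```

This is why the PR keeps `0L` rather than switching to `-1L` as a "metric unavailable" marker.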

Contributor

Ah OK. Let's leave it as it is. I probably struggled with this too and forgot.

HeartSaVioR (Contributor) left a comment

+1 pending CI.

@HeartSaVioR (Contributor)

Thanks! Merging to master.

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
Closes apache#46491 from anishshri-db/task/SPARK-48208.

Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>