Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1472][part-4] feature(server): Introduce metrics for Netty's pinnedDirectMemory and usedDirectMemory #1524

Merged
merged 1 commit into from
Feb 14, 2024

Conversation

rickyma
Copy link
Contributor

@rickyma rickyma commented Feb 13, 2024

What changes were proposed in this pull request?

We need to know the exact direct memory usage of PooledByteBufAllocator in Netty.
So we should introduce metrics for Netty's pinnedDirectMemory and usedDirectMemory.

Why are the changes needed?

A sub PR for: #1519

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.

@rickyma
Copy link
Contributor Author

rickyma commented Feb 13, 2024

@jerqi PTAL

@codecov-commenter
Copy link

codecov-commenter commented Feb 13, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (d87dc90) 54.20% compared to head (49a3c72) 55.19%.

Files Patch % Lines
...pache/uniffle/server/NettyDirectMemoryTracker.java 50.00% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1524      +/-   ##
============================================
+ Coverage     54.20%   55.19%   +0.99%     
  Complexity     2808     2808              
============================================
  Files           430      410      -20     
  Lines         24410    22058    -2352     
  Branches       2082     2082              
============================================
- Hits          13231    12175    -1056     
+ Misses        10349     9124    -1225     
+ Partials        830      759      -71     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

Test Results

2 287 files  ±0  2 287 suites  ±0   4h 28m 15s ⏱️ -45s
  819 tests ±0    818 ✅ ±0   1 💤 ±0  0 ❌ ±0 
9 086 runs  ±0  9 073 ✅ ±0  13 💤 ±0  0 ❌ ±0 

Results for commit 49a3c72. ± Comparison against base commit d87dc90.

@@ -79,6 +79,8 @@ public class ShuffleServerMetrics {
private static final String USED_BUFFER_SIZE = "used_buffer_size";
private static final String READ_USED_BUFFER_SIZE = "read_used_buffer_size";
private static final String USED_DIRECT_MEMORY_SIZE = "used_direct_memory_size";
private static final String ALLOCATED_DIRECT_MEMORY_SIZE = "allocated_direct_memory_size";
private static final String PINNED_DIRECT_MEMORY_SIZE = "pinned_direct_memory_size";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the meaning of pinned?

Copy link
Contributor Author

@rickyma rickyma Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to netty/netty#11667.

To be brief, it means the direct memory actually used by PooledByteBufAllocator, cannot be used anymore. If we use the old metric usedDirectMemory, we cannot tell whether the direct memory is cached by PooledByteBufAllocator or if it's truly exhausted.

@rickyma rickyma requested a review from jerqi February 13, 2024 09:27
Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @rickyma

@jerqi jerqi merged commit 72d85a7 into apache:master Feb 14, 2024
38 checks passed
zuston pushed a commit that referenced this pull request Feb 23, 2024
…mory issue causing OOM (#1534)

### What changes were proposed in this pull request?

When we use `UnpooledByteBufAllocator` to allocate off-heap `ByteBuf`, Netty directly requests off-heap memory from the operating system instead of allocating it according to `pageSize` and `chunkSize`. This way, we can obtain the exact `ByteBuf` size during the pre-allocation of memory, avoiding distortion of metrics such as `usedMemory`. 

Moreover, we have restored the code submission of the PR [#1521](#1521). We ensure that there is sufficient direct memory for the Netty server during decoding `sendShuffleDataRequest` by taking into account the `encodedLength` of `ByteBuf` in advance during the pre-allocation of memory, thus avoiding OOM during decoding `sendShuffleDataRequest`. 

Since we are not using `PooledByteBufAllocator`, the PR [#1524](#1524) is no longer needed.

### Why are the changes needed?

A sub PR for: #1519

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
@rickyma rickyma deleted the issue-1472-part-4 branch May 5, 2024 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants