[Improvement] ShuffleBlock should be released when finished reading #74
Conversation
Did you verify this patch? I think we had better have a UT for every PR; we should at least test the feature by hand.
We have tested it in a production environment. Sometimes it is difficult to write a UT; I will try to write one for this PR later.
Has your company used Uniffle in your production environment? Just curious, could you tell me the name of your company?
I know it's sometimes difficult. If you can't write a UT for it, you should provide detailed production-environment data to prove the effect of the PR. It's best to have a screenshot.
Review thread on client-spark/common/src/main/java/org/apache/spark/shuffle/reader/RssShuffleDataIterator.java (outdated, resolved)
I think using a HeapByteBuffer is better here. What do you think?
We found Uniffle's GC time is longer than Spark's original shuffle in our tests when tasks read shuffle data, so we'd better use off-heap memory. But gRPC uses heap memory, so we can't use off-heap memory entirely. We will replace gRPC with Netty in the future, and we hope we can then use off-heap memory entirely.
When? Do we have a detailed plan?
Maybe October; we hope to do that, but there are always other important things to do. For version 0.6, we plan to support deploying Uniffle on K8S. For version 0.7, we plan to replace gRPC with Netty, but only for part of the interface (reading and writing shuffle data).
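The heap vs. off-heap distinction discussed above can be seen directly in `java.nio.ByteBuffer`. A minimal illustration (not Uniffle code): a heap buffer is backed by a `byte[]` on the Java heap, while a direct buffer holds native memory that is only freed once its owning `ByteBuffer` object is garbage-collected, which is why long-lived references to direct buffers can pin off-heap memory.

```java
import java.nio.ByteBuffer;

public class BufferDemo {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] on the Java heap.
        ByteBuffer heap = ByteBuffer.allocate(1024);

        // Direct buffer: native (off-heap) memory; the native allocation is
        // released only when the ByteBuffer object itself is reclaimed by GC,
        // so holding the reference keeps the off-heap memory alive.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        System.out.println(heap.isDirect());   // prints false
        System.out.println(direct.isDirect()); // prints true
    }
}
```

This is why dropping buffer references as soon as a block is consumed (as this PR does) matters more for off-heap usage than for heap usage: the executor's off-heap footprint tracks the lifetime of the `ByteBuffer` objects, not the lifetime of the data.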
Review thread on common/src/test/java/org/apache/uniffle/common/RssShuffleUtilsTest.java (outdated, resolved)
Codecov Report
@@ Coverage Diff @@
## master #74 +/- ##
============================================
- Coverage 56.39% 56.38% -0.02%
- Complexity 1173 1178 +5
============================================
Files 149 149
Lines 7953 7992 +39
Branches 761 766 +5
============================================
+ Hits 4485 4506 +21
- Misses 3226 3243 +17
- Partials 242 243 +1
LGTM, thanks for your contribution. It's really great work.
What changes were proposed in this pull request?
Release the ShuffleBlock when it is finished being read.
Why are the changes needed?
We found the Spark executor is easily killed by YARN, and I found it is because the executor uses too much off-heap memory when reading shuffle data.
I found most of the off-heap memory is used to store uncompressed shuffle data, and this memory is released only when GC is triggered.
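The idea of releasing each block as soon as it has been read can be sketched as an iterator that drops its reference to the current buffer the moment the buffer is drained, rather than holding it until the whole read finishes. This is a rough, hypothetical illustration (class and structure invented for this sketch; it is not the PR's actual `RssShuffleDataIterator` code):

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: release each shuffle block's buffer as soon as it is
// fully consumed, so (possibly off-heap) memory is reclaimable immediately
// instead of only after the iterator itself becomes garbage.
public class ReleasingIterator implements Iterator<Byte> {
    private final Iterator<ByteBuffer> blocks; // source of decoded block buffers
    private ByteBuffer current;                // buffer for the block being read

    public ReleasingIterator(Iterator<ByteBuffer> blocks) {
        this.blocks = blocks;
    }

    @Override
    public boolean hasNext() {
        while (current == null || !current.hasRemaining()) {
            // Drop the reference to the drained block promptly so its memory
            // can be freed without waiting for this iterator to die.
            current = null;
            if (!blocks.hasNext()) {
                return false;
            }
            current = blocks.next();
        }
        return true;
    }

    @Override
    public Byte next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.get();
    }

    public static void main(String[] args) {
        Iterator<ByteBuffer> bufs = java.util.Arrays.asList(
                ByteBuffer.wrap(new byte[]{1, 2}),
                ByteBuffer.wrap(new byte[]{3})).iterator();
        ReleasingIterator it = new ReleasingIterator(bufs);
        int count = 0;
        while (it.hasNext()) { it.next(); count++; }
        System.out.println(count); // prints 3
    }
}
```

With this pattern, at most one block's buffer is referenced at a time, so the executor's peak off-heap usage is bounded by one block rather than by everything read since the last GC.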
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a new UT.