
[Improvement] ShuffleBlock should be released when finished reading #74

Merged · 5 commits · Jul 29, 2022

Conversation

xianjingfeng (Member) commented on Jul 26, 2022

What changes were proposed in this pull request?

Release the ShuffleBlock once it has been fully read.

Why are the changes needed?

We found that Spark executors are easily killed by YARN, and the cause is that the executor uses too much off-heap memory when reading shuffle data. Most of that off-heap memory holds uncompressed shuffle data, and it is released only when a GC is triggered.
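The idea is to free a block's backing buffer eagerly instead of waiting for a GC cycle. Below is a minimal sketch (not the PR's actual code; the class name is hypothetical) of how a direct ByteBuffer can be released explicitly on JDK 8:

```java
import java.nio.ByteBuffer;

// Hypothetical helper: frees a direct ByteBuffer's native memory
// immediately instead of waiting for its wrapper object to be GC'd.
public final class DirectBufferReleaser {

  public static void release(ByteBuffer buffer) {
    if (buffer == null || !buffer.isDirect()) {
      return; // heap buffers are reclaimed by the GC as usual
    }
    // JDK 8 internal API; JDK 9+ needs reflection or --add-opens, or a
    // library call such as Netty's PlatformDependent.freeDirectBuffer.
    ((sun.nio.ch.DirectBuffer) buffer).cleaner().clean();
  }
}
```

A reader would call release(...) on each block's buffer as soon as iteration over that block completes.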

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a new UT.

jerqi (Contributor) commented on Jul 26, 2022

Did you verify this patch? I think we'd better have a UT for every PR; at the least, we should test the feature by hand.

xianjingfeng (Member, Author) commented on Jul 26, 2022

> Did you verify this patch? I think we'd better have a UT for every PR; at the least, we should test the feature by hand.

We have tested it in our production environment. Sometimes it is difficult to write a UT; I will try to write one for this PR later.

jerqi (Contributor) commented on Jul 26, 2022

> > Did you verify this patch? I think we'd better have a UT for every PR; at the least, we should test the feature by hand.
>
> We have tested it in our production environment. Sometimes it is difficult to write a UT; I will try to write one for this PR later.

Has your company used Uniffle in your production environment? Just curious: could you tell me the name of your company?

jerqi (Contributor) commented on Jul 26, 2022

> > Did you verify this patch? I think we'd better have a UT for every PR; at the least, we should test the feature by hand.
>
> We have tested it in our production environment. Sometimes it is difficult to write a UT; I will try to write one for this PR later.

I know it's sometimes difficult. If you can't write a UT for it, you should provide some detailed data from your production environment to prove the effect of the PR. A screenshot would be best.

xianjingfeng (Member, Author)

> > > Did you verify this patch? I think we'd better have a UT for every PR; at the least, we should test the feature by hand.
> >
> > We have tested it in our production environment. Sometimes it is difficult to write a UT; I will try to write one for this PR later.
>
> Has your company used Uniffle in your production environment? Just curious: could you tell me the name of your company?

SF

xianjingfeng (Member, Author)

I think using a HeapByteBuffer would be better here. What do you think?

jerqi (Contributor) commented on Jul 27, 2022

> I think using a HeapByteBuffer would be better here. What do you think?

We found that Uniffle's GC time is longer than Spark's original shuffle in our tests when tasks read shuffle data, so we'd better use off-heap memory. But gRPC uses heap memory, so we can't use off-heap memory entirely. We will replace gRPC with Netty in the future; then we hope to use off-heap memory throughout.
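For context, here is a small illustrative sketch (not Uniffle code) of the trade-off under discussion: heap buffers are reclaimed promptly by the GC but add GC pressure when they are large and short-lived, while direct buffers live off-heap and linger until their small wrapper objects are collected, which is why the PR releases them explicitly:

```java
import java.nio.ByteBuffer;

public class BufferChoiceDemo {
  public static void main(String[] args) {
    // Heap buffer: backed by a byte[] on the Java heap. Cheap to allocate
    // and promptly reclaimed, but large short-lived buffers add GC
    // pressure and may be copied to native memory during I/O.
    ByteBuffer heap = ByteBuffer.allocate(4 * 1024 * 1024);

    // Direct buffer: backed by native (off-heap) memory. Avoids heap GC
    // pressure and extra I/O copies, but the native memory is reclaimed
    // only when the small Java-side wrapper object is collected, hence
    // the need to release it explicitly once reading finishes.
    ByteBuffer direct = ByteBuffer.allocateDirect(4 * 1024 * 1024);

    System.out.println("heap.isDirect()   = " + heap.isDirect());
    System.out.println("direct.isDirect() = " + direct.isDirect());
  }
}
```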

xianjingfeng (Member, Author)

> We will replace gRPC with Netty in the future; then we hope to use off-heap memory throughout.

When? Do we have a detailed plan?

jerqi (Contributor) commented on Jul 27, 2022

> > We will replace gRPC with Netty in the future; then we hope to use off-heap memory throughout.
>
> When? Do we have a detailed plan?

Maybe October; we hope to do that, but there are always other important things to do. For version 0.6, we plan to support deploying Uniffle on K8S. For version 0.7, we plan to replace gRPC with Netty, but only for part of the interface (reading and writing shuffle data).

codecov-commenter commented on Jul 29, 2022

Codecov Report

Merging #74 (92c1b0a) into master (aa18be0) will decrease coverage by 0.01%.
The diff coverage is 68.75%.

@@             Coverage Diff              @@
##             master      #74      +/-   ##
============================================
- Coverage     56.39%   56.38%   -0.02%     
- Complexity     1173     1178       +5     
============================================
  Files           149      149              
  Lines          7953     7992      +39     
  Branches        761      766       +5     
============================================
+ Hits           4485     4506      +21     
- Misses         3226     3243      +17     
- Partials        242      243       +1     
| Impacted Files | Coverage | Δ |
| --- | --- | --- |
| .../apache/uniffle/common/exception/RssException.java | 0.00% <0.00%> | (ø) |
| ...e/spark/shuffle/reader/RssShuffleDataIterator.java | 89.70% <50.00%> | (-3.95%) ⬇️ |
| ...ava/org/apache/uniffle/common/RssShuffleUtils.java | 95.65% <100.00%> | (+2.31%) ⬆️ |
| .../apache/uniffle/coordinator/CoordinatorServer.java | 68.67% <0.00%> | (-2.22%) ⬇️ |
| .../java/org/apache/uniffle/server/ShuffleServer.java | 63.41% <0.00%> | (-1.30%) ⬇️ |
| ...he/uniffle/client/impl/ShuffleWriteClientImpl.java | 25.95% <0.00%> | (-0.10%) ⬇️ |
| ...he/uniffle/coordinator/CoordinatorGrpcService.java | 2.36% <0.00%> | (-0.06%) ⬇️ |
| ...java/org/apache/uniffle/common/rpc/GrpcServer.java | 0.00% <0.00%> | (ø) |
| .../uniffle/storage/handler/impl/LocalFileWriter.java | 90.00% <0.00%> | (ø) |
| .../hadoop/mapreduce/task/reduce/RssEventFetcher.java | 88.57% <0.00%> | (+1.68%) ⬆️ |


jerqi (Contributor) left a review

LGTM, thanks for your contribution. It's really great work.

jerqi merged commit ccb39ed into apache:master on Jul 29, 2022
xianjingfeng deleted the issue_73 branch on Apr 5, 2023