[#706] Implement spill method to avoid memory deadlock #714
Codecov Report

Coverage diff, master vs. #714:

| Metric | master | #714 | +/- |
|---|---|---|---|
| Coverage | 60.95% | 63.27% | +2.32% |
| Complexity | 1956 | 1984 | +28 |
| Files | 244 | 231 | -13 |
| Lines | 13308 | 11456 | -1852 |
| Branches | 1119 | 1125 | +6 |
| Hits | 8112 | 7249 | -863 |
| Misses | 4740 | 3804 | -936 |
| Partials | 456 | 403 | -53 |

... and 15 files with indirect coverage changes
@jerqi PTAL. I have updated the Spark 3 code; if the design is OK, I will change the Spark 2 code as well.
I did a quick pass through the code and didn't see any serious problems. I will look at it in detail later tonight. However, do you think it's worth writing a design doc to address the problems in #706?
That seems unnecessary; the optimization is not a big change to the current codebase.
Looking forward to this.
OK. Let me do some code review first, and let's decide that later. Even if a design doc is not necessary, some documentation might be.
Did some review. This PR may require more work, especially on UTs and some integration tests.
PTAL again @smallzhongfeng @advancedxy. Thanks.
The checkstyle check is still failing; please help address that.
This version looks to be in good shape. I will also review it later this weekend.
import org.apache.uniffle.common.ShuffleBlockInfo;
import org.apache.uniffle.common.util.ThreadUtils;

public class DataPusher implements Closeable {
Could DataPusher be used for MapReduce?
I haven't looked at that part of the code.
PTAL again @advancedxy
Generally LGTM, except for some minor comments.
LGTM except for some minor comments.
PTAL again @smallzhongfeng @advancedxy. BTW, this PR has been applied to our internal Uniffle, and it works well.
Just rebased onto the latest master. cc @advancedxy @smallzhongfeng
LGTM. But it seems that you need to rebase again.
Pity. Done @advancedxy
…che#714)

### What changes were proposed in this pull request?

1. Introduce `DataPusher` to replace the `eventLoop`; it can serve as a common component for Spark 2 and Spark 3.
2. Implement the `spill` method in `WriteBufferManager` to avoid memory deadlock.

### Why are the changes needed?

In the current codebase, when there are several `WriteBufferManager` instances and each is waiting to acquire memory while the others hold it, a deadlock occurs. To solve this, we implement the spill function so that the deadlock condition can be broken.

Fix: apache#706

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

1. Existing UTs
2. Newly added UTs
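To make the deadlock argument concrete, here is a minimal single-threaded sketch of the idea, not the code from this PR. All names (`SpillSketch`, `MemoryPool`, `BufferManager`, `Spillable`) are hypothetical stand-ins. It assumes a fixed-size pool shared by several buffer managers and shows how asking the other holders to spill lets a request succeed instead of waiting forever. It deliberately ignores concurrency and the actual sending of data, which is where the real implementation has to be careful.

```java
import java.util.ArrayList;
import java.util.List;

final class SpillSketch {
    /** Hypothetical stand-in for a memory consumer that can give memory back on request. */
    interface Spillable {
        long spill(long bytesNeeded);
    }

    /** Toy fixed-size pool: when it runs out, it asks other consumers to spill instead of blocking. */
    static final class MemoryPool {
        private final long capacity;
        private long used;
        private final List<Spillable> consumers = new ArrayList<>();

        MemoryPool(long capacity) { this.capacity = capacity; }

        void register(Spillable consumer) { consumers.add(consumer); }

        /** Returns the number of bytes actually granted (may be less than requested). */
        long acquire(long bytes, Spillable requester) {
            if (capacity - used < bytes) {
                for (Spillable c : consumers) {
                    if (c == requester) continue;        // only ask the other holders
                    c.spill(bytes - (capacity - used));  // spill() releases memory back to the pool
                    if (capacity - used >= bytes) break;
                }
            }
            long granted = Math.min(bytes, capacity - used);
            used += granted;
            return granted;
        }

        void release(long bytes) { used -= bytes; }

        long used() { return used; }
    }

    /** Toy buffer manager: buffers bytes, and flushes everything when asked to spill. */
    static final class BufferManager implements Spillable {
        private final MemoryPool pool;
        private long buffered;
        private long flushed;

        BufferManager(MemoryPool pool) {
            this.pool = pool;
            pool.register(this);
        }

        boolean write(long bytes) {
            long granted = pool.acquire(bytes, this);
            if (granted < bytes) {
                pool.release(granted);  // give back the partial grant
                return false;
            }
            buffered += bytes;
            return true;
        }

        @Override
        public long spill(long bytesNeeded) {
            long freed = buffered;      // this sketch always flushes the whole buffer
            flushed += freed;
            buffered = 0;
            pool.release(freed);
            return freed;
        }

        long buffered() { return buffered; }

        long flushed() { return flushed; }
    }

    public static void main(String[] args) {
        MemoryPool pool = new MemoryPool(100);
        BufferManager a = new BufferManager(pool);
        BufferManager b = new BufferManager(pool);
        a.write(60);
        b.write(40);                    // pool is now full
        boolean ok = a.write(30);       // succeeds only because b is asked to spill
        System.out.println("write succeeded: " + ok + ", b flushed: " + b.flushed());
    }
}
```

Without the spill step in `acquire`, the third write could never be satisfied while `b` holds its 40 bytes; with two managers each waiting on the other, that is the deadlock condition described above.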
…ock (apache#714)" This reverts commit 1b48c12.
### What changes were proposed in this pull request?

Disable the memory spill operation.

### Why are the changes needed?

#714 introduced the memory spill to solve the deadlock. Unfortunately, this part of the code needs to be handled carefully with respect to concurrency and data consistency, as in the fix PR #811. It still has bugs, which I will fix in the coming days. For now I want to revert #714, but the partial refactoring in #714 is still meaningful, so I am submitting this PR to disable the memory spill instead.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Not needed.
…sure data correctness (#1558)

### What changes were proposed in this pull request?

Verify the number of written records to enhance data accuracy. Make sure all data records are sent by clients. Make sure bugs like #714 are never introduced into the code.

### Why are the changes needed?

A follow-up PR for #848.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
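As a rough illustration of what verifying the number of written records can look like, here is a small sketch. It is an assumption about the general shape of such a check, not the code from #1558, and the class and method names (`RecordCountCheck`, `recordWritten`, `recordsSent`, `verify`) are hypothetical. The writer side counts every record it accepts, the sending side counts every record it has pushed, and the two totals are compared before the task is allowed to finish.

```java
import java.util.concurrent.atomic.AtomicLong;

final class RecordCountCheck {
    // Counters are atomic because, in a real client, writing and sending
    // would typically happen on different threads.
    private final AtomicLong written = new AtomicLong();
    private final AtomicLong sent = new AtomicLong();

    /** Called once for every record the writer accepts. */
    void recordWritten() { written.incrementAndGet(); }

    /** Called after a batch of records has been sent. */
    void recordsSent(long count) { sent.addAndGet(count); }

    /** Fails loudly if some written records were never sent (or were sent twice). */
    void verify() {
        long w = written.get();
        long s = sent.get();
        if (w != s) {
            throw new IllegalStateException(
                "Record count mismatch: written=" + w + ", sent=" + s);
        }
    }

    public static void main(String[] args) {
        RecordCountCheck check = new RecordCountCheck();
        for (int i = 0; i < 3; i++) {
            check.recordWritten();
        }
        check.recordsSent(2);
        try {
            check.verify();             // one record is still missing
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        check.recordsSent(1);
        check.verify();                 // counts now agree, no exception
        System.out.println("counts match");
    }
}
```

A check of this kind turns silent data loss, such as a spilled buffer that is never pushed, into an explicit task failure.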