[#2111] improvement(client): cleanup the useless shuffle server clients when unregister shuffle #2112
xianjingfeng wants to merge 4 commits into apache:master from
Test Results: 3 034 files (+152), 3 034 suites (+152), 6h 43m 35s ⏱️ (+46m 53s). Results for commit 8729ecd; ± values are compared against base commit d170004. This pull request removes 30 and adds 190 tests. Note that renamed tests count towards both.
What will happen if we don't have this patch?
If an application has many stages, the client keeps clients for all shuffle servers it has ever used, which wastes memory.
One server may be used by multiple stages. We can't remove it when only one stage has finished.
Right, this PR does not delete the clients for all shuffle servers, only the useless ones.
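A minimal sketch of the cleanup idea discussed above, assuming the client side can see which servers each registered shuffle still uses (loosely mirroring shuffleServerInfoMap). `ClientRegistry`, `FakeClient`, and the `String` server IDs are hypothetical stand-ins, not the project's classes. When a shuffle is unregistered, only clients whose server is no longer used by any remaining shuffle are closed, so a server shared by several stages keeps its client.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a shuffle server RPC client.
class FakeClient implements AutoCloseable {
  final String server;
  boolean closed = false;

  FakeClient(String server) {
    this.server = server;
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical registry: remembers which servers each shuffle uses and
// closes a client only when no registered shuffle uses its server anymore.
class ClientRegistry {
  private final Map<Integer, Set<String>> shuffleToServers = new HashMap<>();
  private final Map<String, FakeClient> serverToClient = new HashMap<>();

  synchronized FakeClient getClient(int shuffleId, String server) {
    shuffleToServers.computeIfAbsent(shuffleId, k -> new HashSet<>()).add(server);
    return serverToClient.computeIfAbsent(server, FakeClient::new);
  }

  synchronized void unregisterShuffle(int shuffleId) {
    shuffleToServers.remove(shuffleId);
    // Collect the servers still needed by at least one remaining shuffle.
    Set<String> inUse = new HashSet<>();
    for (Set<String> servers : shuffleToServers.values()) {
      inUse.addAll(servers);
    }
    // Close and drop every cached client whose server is no longer in use.
    Iterator<Map.Entry<String, FakeClient>> it = serverToClient.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, FakeClient> entry = it.next();
      if (!inUse.contains(entry.getKey())) {
        entry.getValue().close();
        it.remove();
      }
    }
  }

  synchronized boolean hasClient(String server) {
    return serverToClient.containsKey(server);
  }
}
```

Because both methods synchronize on the same registry, a client cannot be handed out for a shuffle while the cleanup pass is deciding whether its server is still in use.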
zuston left a comment:
Hmm. I think we should have a more graceful way to close the shuffle-server clients.
In my view, the closing behavior should be optional, implemented through pluggable closing policies.
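One possible shape for such pluggable closing policies, sketched under assumptions: `ClientClosePolicy` and both implementations are hypothetical names, and servers are identified by plain strings. `NeverClosePolicy` keeps every client (the current behavior described earlier in the thread), while `CloseUnusedPolicy` selects the clients whose server is not used by any registered shuffle.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical pluggable policy: decides which cached clients may be closed.
interface ClientClosePolicy {
  Set<String> selectClientsToClose(Set<String> cachedServers, Set<String> serversInUse);
}

// Keeps every client open, i.e. no cleanup at all.
class NeverClosePolicy implements ClientClosePolicy {
  @Override
  public Set<String> selectClientsToClose(Set<String> cachedServers, Set<String> serversInUse) {
    return Collections.emptySet();
  }
}

// Closes clients whose server is not used by any registered shuffle.
class CloseUnusedPolicy implements ClientClosePolicy {
  @Override
  public Set<String> selectClientsToClose(Set<String> cachedServers, Set<String> serversInUse) {
    Set<String> result = new HashSet<>(cachedServers);
    result.removeAll(serversInUse);
    return result;
  }
}
```

The factory would then only ask the configured policy for candidates and close them, keeping the selection logic swappable.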
  return serverToClients.get(shuffleServerInfo);
}

public synchronized void cleanUselessShuffleServerClients(
Is it possible that another thread still holds the client being removed?
I think it is impossible: if another thread holds the client being removed, that client's shuffleServerInfo will still be in shuffleServerInfoMap, so the client will not be removed.
+1
What are the possible policies?
@zuston Gentle ping.
  return serverToClients.get(shuffleServerInfo);
}

public synchronized void cleanUselessShuffleServerClients(
This method's implementation is not straightforward. I'd prefer an implementation like the following:
public synchronized void closeClients(List<ShuffleServerInfo> candidates) {
// ....
}
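A possible body for the suggested `closeClients`, assuming the factory stores clients in a map keyed by server. Here a `String` stands in for `ShuffleServerInfo`, and `StubClient` and `ClientFactorySketch` are hypothetical. The caller passes exactly the clients to close; unknown candidates are ignored.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a shuffle server client.
class StubClient implements AutoCloseable {
  final String server;
  boolean closed = false;

  StubClient(String server) {
    this.server = server;
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical factory: the caller decides which clients to close.
class ClientFactorySketch {
  private final Map<String, StubClient> serverToClients = new HashMap<>();

  synchronized StubClient getClient(String server) {
    return serverToClients.computeIfAbsent(server, StubClient::new);
  }

  // Removes and closes exactly the given candidates; unknown ones are ignored.
  synchronized void closeClients(Collection<String> candidates) {
    for (String server : candidates) {
      StubClient client = serverToClients.remove(server);
      if (client != null) {
        client.close();
      }
    }
  }

  synchronized int size() {
    return serverToClients.size();
  }
}
```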
Sorry for the late review; I think I missed this thread. @xianjingfeng
  return serverToClients.get(shuffleServerInfo);
}

public synchronized void closeClients(Set<ShuffleServerInfo> candidates) {
I think you misunderstood my idea. I want the candidates to be the clients to delete, rather than the clients to exclude.
public synchronized void closeClients(Set<ShuffleServerInfo> candidates) {
  clients.removeAll(candidates);
}
If we determine the candidates outside this method, we also need to synchronize outside it, as follows:
ShuffleServerClientFactory shuffleServerClientFactory = ShuffleServerClientFactory.getInstance();
synchronized (shuffleServerClientFactory) {
Set<ShuffleServerInfo> candidates = shuffleServerClientFactory.findClosableClients(getAllShuffleServers());
shuffleServerClientFactory.closeClients(candidates);
}
It feels worse than the original implementation.
In my view, since all clients are managed by shuffleServerClientFactory, this class could maintain a structure that records each client's reference count. When a client's reference count reaches 0, we can release it.
If so, the +1 could happen in getShuffleServerClient, and the -1 when the shuffle is removed in shuffleWriteClientImpl.
How about this way?
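The reference-counting proposal could look roughly like this. `RefCountedClientFactory`, `CountedClient`, `acquire`, and `release` are hypothetical names: `acquire` plays the role of the +1 in getShuffleServerClient, `release` the -1 when a shuffle is removed, and the client is closed once its count reaches 0.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a shuffle server client.
class CountedClient implements AutoCloseable {
  final String server;
  boolean closed = false;

  CountedClient(String server) {
    this.server = server;
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical factory that tracks one reference count per server.
class RefCountedClientFactory {
  private final Map<String, CountedClient> clients = new HashMap<>();
  private final Map<String, Integer> refCounts = new HashMap<>();

  // +1: where getShuffleServerClient would hand out a client.
  synchronized CountedClient acquire(String server) {
    refCounts.merge(server, 1, Integer::sum);
    return clients.computeIfAbsent(server, CountedClient::new);
  }

  // -1: when a shuffle that used this server is removed; closes the client at 0.
  synchronized void release(String server) {
    Integer count = refCounts.get(server);
    if (count == null) {
      return; // Nothing was acquired for this server.
    }
    if (count > 1) {
      refCounts.put(server, count - 1);
      return;
    }
    refCounts.remove(server);
    CountedClient client = clients.remove(server);
    if (client != null) {
      client.close();
    }
  }

  synchronized boolean isOpen(String server) {
    return clients.containsKey(server);
  }
}
```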
> If so, the +1 operation could be happened on the getShuffleServerClient. And -1 should be happened on the removing in shuffleWriteClientImpl.
I don't think this is achievable. It would require calling getShuffleServerClient (+1) before every RPC call and decrementing the count (-1) after every RPC call.
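To illustrate the objection: if the count had to cover in-flight RPCs, every call site would need an acquire/try/finally/release wrapper like the hypothetical `withClient` below. `PerCallRefCounting` and `RpcClient` are illustrative stand-ins, not project classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical RPC client that refuses calls once closed.
class RpcClient implements AutoCloseable {
  boolean closed = false;

  String send(String request) {
    if (closed) {
      throw new IllegalStateException("client already closed");
    }
    return "ack:" + request;
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical per-call reference counting around each RPC.
class PerCallRefCounting {
  private final Map<String, RpcClient> clients = new HashMap<>();
  private final Map<String, Integer> refCounts = new HashMap<>();

  private synchronized RpcClient acquire(String server) {
    refCounts.merge(server, 1, Integer::sum);
    return clients.computeIfAbsent(server, s -> new RpcClient());
  }

  private synchronized void release(String server) {
    int remaining = refCounts.merge(server, -1, Integer::sum);
    if (remaining <= 0) {
      refCounts.remove(server);
      RpcClient client = clients.remove(server);
      if (client != null) {
        client.close();
      }
    }
  }

  // Every RPC call site would have to go through this wrapper (+1 before, -1 after).
  <T> T withClient(String server, Function<RpcClient, T> rpc) {
    RpcClient client = acquire(server);
    try {
      return rpc.apply(client);
    } finally {
      release(server);
    }
  }

  synchronized boolean isOpen(String server) {
    return clients.containsKey(server);
  }
}
```

In this sketch a client with no other holders is closed after each call and recreated on the next one, which adds connection churn on top of the burden of wrapping every call site.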
This reverts commit f6485f7.
summaryzb left a comment:
How about getting the client through a Guava cache with an expiration time? That may be a more general approach, since it doesn't need to care whether the corresponding server is useless or available.
This approach has a problem if a client is retrieved from the cache only once and then kept for later use: the cache may expire and close it while it is still in use.
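The concern about held clients can be reproduced with a small stdlib stand-in for an expire-after-access cache. Guava offers this behavior through `CacheBuilder.expireAfterAccess` plus a removal listener; the class below is a hypothetical simplification with an injectable clock, not Guava itself. A caller that fetches a client once and keeps it sees that client closed by expiry even though it still holds a reference.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a cached shuffle server client.
class CachedClient implements AutoCloseable {
  boolean closed = false;

  @Override
  public void close() {
    closed = true;
  }
}

// Minimal expire-after-access cache; expired entries are closed on cleanup.
class ExpiringClientCache {
  private static final class Entry {
    final CachedClient client;
    long lastAccessMillis;

    Entry(CachedClient client, long now) {
      this.client = client;
      this.lastAccessMillis = now;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();
  private final long ttlMillis;
  private final LongSupplier clock;

  ExpiringClientCache(long ttlMillis, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  // Returns the cached client (creating one if needed) and refreshes its access time.
  synchronized CachedClient get(String server) {
    cleanUp();
    long now = clock.getAsLong();
    Entry entry = entries.get(server);
    if (entry == null) {
      entry = new Entry(new CachedClient(), now);
      entries.put(server, entry);
    }
    entry.lastAccessMillis = now;
    return entry.client;
  }

  // Closes entries idle for at least ttlMillis, regardless of who still holds them.
  synchronized void cleanUp() {
    long now = clock.getAsLong();
    Iterator<Entry> it = entries.values().iterator();
    while (it.hasNext()) {
      Entry entry = it.next();
      if (now - entry.lastAccessMillis >= ttlMillis) {
        entry.client.close();
        it.remove();
      }
    }
  }
}
```

The cache only sees accesses made through `get`, so a client used repeatedly after a single lookup looks idle to it.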
I think I should close this PR, because I find
What changes were proposed in this pull request?
Clean up useless shuffle server clients when unregistering a shuffle.
Why are the changes needed?
Fix: #2111
Does this PR introduce any user-facing change?
No.
How was this patch tested?
UT