New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17076: Optimize OrderedNodePlacementPlugin#getAllReplicasOnNode
#2076
SOLR-17076: Optimize OrderedNodePlacementPlugin#getAllReplicasOnNode
#2076
Conversation
…` field, which keeps track of all replicas. Instead of computing a new list every time. 2. Added a getAllReplicaCount method to avoid creating new list of replicas if only the count is required
OrderedNodePlacementPlugin#getAllReplicasOnNode
Got this while running
However, re-running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested a couple of changes, but overall this looks good to me!
@@ -515,6 +521,7 @@ public final void removeReplica(Replica replica) { | |||
}); | |||
if (hasReplica.get()) { | |||
removeProjectedReplicaWeights(replica); | |||
allReplicas.remove(replica); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're separately tracking replicas in a top-level map (and assuming that this is all accessed by a single thread), I think this method could be simplified to:
if (allReplicas.remove(replica)) {
// [the nested removal logic]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @magibney ! I'm wondering if these 2 statements should just be placed in
solr/solr/core/src/java/org/apache/solr/cluster/placement/plugins/OrderedNodePlacementPlugin.java
Line 516 in d890246
hasReplica.set(true); |
Perhaps I have overlooked some edge cases which using the AtomicBoolean hasReplica was necessary ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After i read the code again, I'm not 100% about the original purpose of hasReplica
flag.
However, it does make the logic a bit more isolated and easier to follow - iterate the list first, figure out if the replica exists, if it does, deal with it AFTEr finishing the traversal. In a logical level, it's pretty clean and avoid any concurrent modification exception (though the map/list being iterated on is private field anyway).
Hm...I simply move the allReplicas.remove to within the if (reps.remove(replica))
block now, cause i do think it's the best if the operations (add/remove) on the 2 collections stay close to each other.
Anyway, I think these are very minor concerns? 😊 either ways are fine
.flatMap(shard -> shard.values().stream()) | ||
.flatMap(Collection::stream) | ||
.collect(Collectors.toSet()); | ||
return new HashSet<>(allReplicas); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on how this is being used currently, there's no need to wrap this in a new HashSet. The method is public, so you never know who might call it and what their assumptions might be; but you are replacing Collectors.toSet()
, which specifically makes no guarantees about mutability/immutability, so I'd be inclined to just do Collections.unmodifableSet(allReplicas)
, taking this opportunity to make the API more conservative, without breaking any backcompat.
Also, if the motivating concern is "many replicas", the cost of instantiating a new HashMap that we don't really need is not worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, the current fix is to use the most defensive approach which does not modify the behavior at all. The minor concern with unmodifableSet
was that the "view" could still mutate after this method returns and this could break certain expectation of the caller.
Otherwise I don't have any objection using unmodifableSet
. Perhaps @HoustonPutman can share some thoughts? 😊
#2076) * 1. Changed getAllReplicasOnNode to just return a copy of `allReplicas` field, which keeps track of all replicas. Instead of computing a new list every time. 2. Added a getAllReplicaCount method to avoid creating new list of replicas if only the count is required * ./gradlew tidy * minor refactoring
apache#2076) * 1. Changed getAllReplicasOnNode to just return a copy of `allReplicas` field, which keeps track of all replicas. Instead of computing a new list every time. 2. Added a getAllReplicaCount method to avoid creating new list of replicas if only the count is required * ./gradlew tidy * minor refactoring
apache#2076) * 1. Changed getAllReplicasOnNode to just return a copy of `allReplicas` field, which keeps track of all replicas. Instead of computing a new list every time. 2. Added a getAllReplicaCount method to avoid creating new list of replicas if only the count is required * ./gradlew tidy * minor refactoring
Just a little reminder... when squash-merging, remember to edit the commit message so it's cleaned up to reflect the final result in totality; it's almost always necessary. Should not refer to running "tidy", for example. |
Thanks for the reminder, will make sure next time!
…On Wed, May 1, 2024 at 10:06 AM David Smiley ***@***.***> wrote:
Just a little reminder... when squash-merging, remember to edit the commit
message so it's cleaned up to reflect the final result in totality; it's
almost always necessary. Should not refer to running "tidy", for example.
—
Reply to this email directly, view it on GitHub
<#2076 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF67P4HYVNZXBTISWFQ2DJTZADZGZAVCNFSM6AAAAAA7LV4DYCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOBYGUYTKMBWGU>
.
You are receiving this because you modified the open/close state.Message
ID: ***@***.***>
|
https://issues.apache.org/jira/browse/SOLR-17076
Description
OrderedNodePlacementPlugin#getAllReplicasOnNode
can be slow in a cluster with lot of replicas. The effect could compound with new collection creation with many shards.Solution
Introduced a new field
allReplicas
which keeps track of all the replicas added/removed in the Plugin. Return a copy of that set instead forgetAllReplicasOnNode
.Also added a new convenient method
getAllReplicaCount
if only the count is neededTests
A minor change, rely on existing unit test case
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.