SOLR-16855: Add a MigrateReplicas API by HoustonPutman · Pull Request #1730 · apache/solr

HoustonPutman · 2023-06-27T17:17:39Z

https://issues.apache.org/jira/browse/SOLR-16855

tflobbe

Looks great. Just a couple questions

tflobbe · 2023-06-27T23:11:28Z

+      if (!clusterState.liveNodesContain(sourceNode)) {
+        throw new SolrException(
+            SolrException.ErrorCode.BAD_REQUEST, "Source Node: " + sourceNode + " is not live");
+      }


Should this be allowed?

HoustonPutman · 2023-06-29

Currently the ReplaceNode/BalanceReplicas logic does not allow moving replicas off of non-live nodes. I think that's something we should consider in a separate ticket, especially now that cores are not deleted on startup if they don't exist in ZooKeeper (#1321).

tflobbe · 2023-06-29

I'm not sure how this will work currently. My thinking is that when nodes are removed, you want to "vacate" them and have the replicas created somewhere else, so that:

- You no longer see the node in the "nodes" page, and
- You get your replication factor back to whatever it was before.

Would this API be the one to use in this case?

HoustonPutman · 2023-06-29

Yeah, I do think this is the correct API for that, but AFAIK it (REPLACENODE) is not currently able to do that.

I think we should make this a param of MigrateReplicas and ReplaceNode (migrateFromNonLiveNodes or something like that). That would somewhat necessitate setting solr.deleteUnknownCores, from the above linked PR, to true, so that when the node is started back up, it deletes the data directories from those nodes.

Again, I think this should be done in a separate JIRA/PR.
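To make the proposal in this thread concrete, here is a minimal sketch of how such a flag could gate the liveness check from the diff above. It is an illustration only: the `allowNonLive` argument stands in for the suggested `migrateFromNonLiveNodes` parameter, which does not exist in this PR, and `IllegalArgumentException` replaces `SolrException` so the example compiles without Solr on the classpath.

```java
import java.util.Set;

public class SourceNodeCheckSketch {
  /**
   * Returns normally if the source node may be vacated, throws otherwise.
   * allowNonLive is hypothetical: it mirrors the "migrateFromNonLiveNodes"
   * idea floated in the review, not an existing Solr parameter.
   */
  static void validateSourceNode(Set<String> liveNodes, String sourceNode, boolean allowNonLive) {
    if (!allowNonLive && !liveNodes.contains(sourceNode)) {
      throw new IllegalArgumentException("Source Node: " + sourceNode + " is not live");
    }
  }

  public static void main(String[] args) {
    Set<String> live = Set.of("node1:8983_solr", "node2:8983_solr");
    validateSourceNode(live, "node1:8983_solr", false); // live node: accepted
    validateSourceNode(live, "node3:8983_solr", true); // non-live node, flag set: accepted
    try {
      validateSourceNode(live, "node3:8983_solr", false); // non-live node, flag unset: rejected
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the flag unset, the behavior matches the check in this PR: a non-live source node is rejected.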

HoustonPutman · 2023-06-29T19:29:06Z

So I found a bug, or room for improvement, in the OrderedNodePlacementPlugin code due to one of the tests. Basically it does a greedy lookup to find the best node for each new replica type. However, a suboptimal ordering of the replicas being placed can result in a suboptimal set of placements. It's hard to make this logic foolproof without blowing up the computation time. But I've come up with a "hack" (not really, but it only fixes a subset of the suboptimal cases) that works for now.

We'll need to make the OrderedNodePlacementPlugin smarter to make all cases optimal, or at least closer to optimal.
But this can be done separately as an optimization in the future.

This is better than the previous implementations, which also would have had the same bug. But now we can fix it for all placement plugins at once!
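A toy illustration of the ordering problem described above, not the actual OrderedNodePlacementPlugin code. Everything here is an assumption made for the example: two nodes `A` and `B`, a node's weight is simply its replica count, replica `X` may go on either node, and replica `Y` may only go on `A`. Greedy placement picks the least-loaded allowed node and breaks ties by node name.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class GreedyOrderingSketch {
  /** Places replicas one by one on the least-loaded allowed node; ties go to the first node by name. */
  static Map<String, Integer> place(List<String> order, Map<String, Set<String>> allowed) {
    Map<String, Integer> load = new TreeMap<>(); // TreeMap iterates in node-name order
    load.put("A", 0);
    load.put("B", 0);
    for (String replica : order) {
      String best = null;
      for (String node : load.keySet()) {
        if (!allowed.get(replica).contains(node)) continue;
        if (best == null || load.get(node) < load.get(best)) best = node;
      }
      if (best == null) throw new IllegalStateException("No node available for " + replica);
      load.merge(best, 1, Integer::sum);
    }
    return load;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> allowed = Map.of("X", Set.of("A", "B"), "Y", Set.of("A"));
    System.out.println(place(List.of("X", "Y"), allowed)); // {A=2, B=0}
    System.out.println(place(List.of("Y", "X"), allowed)); // {A=1, B=1}
  }
}
```

Placing `X` first hits a tie between `A` and `B`, picks `A`, and then forces `Y` onto the same node. Placing `Y` first leaves `X` free to take the empty node, so the same greedy rule yields a balanced result.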

HoustonPutman · 2023-06-30T20:11:08Z

Ok the tests now pass (fixed a weird bug where splitShard can send a placementRequest with no actual replicas to add... but anyways)

The "hacky" fix for tied nodes is now expanded and no longer "hacky". We only loop once to retry replicas skipped because of tied nodes, but that should be OK. We can improve this later by looping more if we need to. However, the code is now generalized for any number of tied nodes, and will act accordingly given the number of replicas to place and the number of tied nodes.
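The single retry pass can be pictured with a simplified sketch. This is an assumption-laden illustration and not the logic merged in this PR: node weight is just the replica count, a replica is deferred whenever more than one allowed node shares the lowest weight, and deferred replicas are placed in one follow-up pass with ties broken by node name. The real code also takes the number of pending replicas and tied nodes into account, which this sketch does not model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TiedNodeRetrySketch {
  /** Returns every allowed node that shares the lowest current load, in node-name order. */
  static List<String> lowestLoadNodes(String replica, Map<String, Set<String>> allowed, Map<String, Integer> load) {
    List<String> best = new ArrayList<>();
    int min = Integer.MAX_VALUE;
    for (Map.Entry<String, Integer> e : load.entrySet()) {
      if (!allowed.get(replica).contains(e.getKey())) continue;
      if (e.getValue() < min) {
        min = e.getValue();
        best.clear();
      }
      if (e.getValue() == min) best.add(e.getKey());
    }
    if (best.isEmpty()) throw new IllegalStateException("No node available for " + replica);
    return best;
  }

  static Map<String, Integer> place(List<String> order, Map<String, Set<String>> allowed, List<String> nodes) {
    Map<String, Integer> load = new TreeMap<>();
    for (String node : nodes) load.put(node, 0);
    Deque<String> deferred = new ArrayDeque<>();
    for (String replica : order) {
      List<String> best = lowestLoadNodes(replica, allowed, load);
      if (best.size() > 1) {
        deferred.add(replica); // tied nodes: skip for now, decide after the others are placed
      } else {
        load.merge(best.get(0), 1, Integer::sum);
      }
    }
    while (!deferred.isEmpty()) { // the single retry pass: remaining ties go to the first node by name
      String replica = deferred.poll();
      load.merge(lowestLoadNodes(replica, allowed, load).get(0), 1, Integer::sum);
    }
    return load;
  }

  public static void main(String[] args) {
    // Replica X may go on A or B; replica Y may only go on A.
    Map<String, Set<String>> allowed = Map.of("X", Set.of("A", "B"), "Y", Set.of("A"));
    System.out.println(place(List.of("X", "Y"), allowed, List.of("A", "B"))); // {A=1, B=1}
  }
}
```

Here `X` sees `A` and `B` tied at zero and is deferred, `Y` takes its only option `A`, and the retry then puts `X` on `B`. A plain greedy pass in the same order would have stacked both replicas on `A`.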

I wish that this fix could have been separate from this PR, but there is little point having this MigrateReplicas API if it isn't going to do a good migration.

@tflobbe Amazingly, AffinityPlacementFactoryTest.testScalability is now 20% faster than before this PR (which is roughly 60-70% faster than a few weeks ago). I haven't taken the time to see how this code sped it up so much, but I'll take the win.

HoustonPutman · 2023-06-30T20:12:32Z

I'll merge this early next week in case anyone wants to take a look at the new logic/data structures used for the orderedNodes placement stuff.

HoustonPutman added 6 commits June 27, 2023 12:09

Add migrate replicas command and API

1c36757

Add docs for migrateReplicas, cleanup related docs

3e65228

Add changelog entry

f548b91

Add first tests

e0e50f4

Add integration test

5742d45

tidy

dc26b3d

HoustonPutman requested review from gerlowskija, risdenk and tflobbe June 27, 2023 17:17

tflobbe approved these changes Jun 28, 2023

HoustonPutman added 4 commits June 29, 2023 15:30

Fix orderedPlacement computation for bad edge case

4eea768

Merge remote-tracking branch 'apache' into migrate-replicas

219d575

fixes

e83b96c

Tidy

b965927

sonatype-lift Bot reviewed Jun 29, 2023

Comment thread solr/core/src/java/org/apache/solr/cluster/placement/plugins/OrderedNodePlacementPlugin.java Outdated

HoustonPutman added 5 commits June 30, 2023 13:44

Make tie-logic full-featured

60458d2

- Remove old WeightedNode sorting logic that is no longer needed - Remove Balancing stashed weight logic that isn't used

Small fixes

577f566

Refactor shared code into method.

7203d4b

Use dequeue instead of linkedList

dd1b36e

Fix scenario when the placementRequest doesn't actually request any r…

c556115

…eplicas...

HoustonPutman added 3 commits July 5, 2023 12:04

Merge remote-tracking branch 'apache' into migrate-replicas

2d849f0

Add docs for the new internal classes

b18e749

Merge remote-tracking branch 'apache' into migrate-replicas

607352e

HoustonPutman merged commit b3883ad into apache:main Jul 6, 2023

HoustonPutman deleted the migrate-replicas branch July 6, 2023 14:52

HoustonPutman added a commit that referenced this pull request Jul 6, 2023

SOLR-16855: Add a MigrateReplicas API (#1730)

fd0f6a5

Also: - Fix orderedPlacement logic for tied nodes - Remove stashed weight logic in WeightedNode that is no longer used (cherry picked from commit b3883ad)

HoustonPutman merged 18 commits into apache:main from HoustonPutman:migrate-replicas
