Previously, `find_next_node/0` didn't handle odd cases, such as the current node not being in `mem3:nodes()` or `mem3:nodes()` being empty, and just crashed with a badmatch.

Make sure to handle those cases and return `node()` for them. We already handle a `node()` target as a no-op in `push/2`; we just have to add the same handling to `maybe_resubmit/2`.

Also, to avoid confusion between "live" nodes and "all" nodes, use the `Mem3Nodes` variable name in the helper function to make it clear what we're dealing with.

Fix #5191
It is a bit cute to use a replicate to self as a fallback. I was mainly relying on the existing "replicate to self" behavior being a no-op in `couchdb/src/mem3/src/mem3_sync.erl`, lines 79 to 82 in 637fb79.

nonode in various places.
I built a small test module to play with the previous logic, wondering the same thing: why didn't we see this before?

```erlang
-module(nn).

-export([
    find_next_node/3
]).

find_next_node(Self, LiveNodes, Mem3Nodes) ->
    AllNodes0 = lists:sort(Mem3Nodes),
    AllNodes1 = [X || X <- AllNodes0, lists:member(X, LiveNodes)],
    AllNodes = AllNodes1 ++ [hd(AllNodes1)],
    [_Self, Next | _] = lists:dropwhile(fun(N) -> N =/= Self end, AllNodes),
    Next.
```

```
> c(nn).
> nn:find_next_node(n, [a,n], [a]).
** exception error: no match of right hand side value []
     in function  nn:find_next_node/3 (nn.erl, line 11)
```

So one case where we'd trigger this is if the node we're on removes itself from the nodes list. The user then reported the logs filling up and the machine being "frozen". It must have happened like this: the initial sync started while the node was still in the `mem3:nodes()` list, then the node was removed, an error happened, and initial_sync crashed. On a crash we end up restarting it: `couchdb/src/mem3/src/mem3_sync_nodes.erl`, lines 75 to 78 in d38f14f.
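For comparison with the crashing test module, here is a minimal sketch of a guarded variant that returns `Self` when there is no valid successor. The module name `nn_safe` is hypothetical, and this is only an illustration of the idea described in the PR, not the actual patch to `mem3_sync`.

```erlang
-module(nn_safe).

-export([
    find_next_node/3
]).

%% Hypothetical guarded variant of the nn test module; not the real mem3 code.
%% Returns Self whenever no other node can be picked, so the caller can
%% treat the result as a "replicate to self" no-op.
find_next_node(Self, LiveNodes, Mem3Nodes) ->
    SortedNodes = lists:sort(Mem3Nodes),
    %% Keep only the cluster members that are currently live.
    Live = [N || N <- SortedNodes, lists:member(N, LiveNodes)],
    case lists:dropwhile(fun(N) -> N =/= Self end, Live) of
        %% Self is followed by another live node: pick it.
        [Self, Next | _] -> Next;
        %% Self is last in the sorted ring: wrap around to the first node.
        [Self] -> hd(Live);
        %% Self is not a cluster member, or the member list is empty.
        [] -> Self
    end.
```

With this variant, `nn_safe:find_next_node(n, [a,n], [a])` returns `n` instead of raising a badmatch, while `nn_safe:find_next_node(c, [a,b,c], [a,b,c])` still wraps around to `a`.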