Swarm: 3.2.1
Elixir: 1.6.1
Erlang: 20.2.2
Suppose we have two nodes A and B.
We have a worker W which was spawned on node A. Now node A crashes and B takes over W as expected.
When A starts again, Swarm tries to hand W over. In our case this happens before W's supervisor, Test.Supervisor, has been started. Swarm retries the handoff and succeeds after one or two retries, but an error like the following is still logged:
[error] [swarm on swarm_test1@127.0.0.1] [tracker:handle_handoff] ** (exit) exited in: GenServer.call(Test.Supervisor, {:start_child, [:state]}, :infinity)
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
(elixir) lib/gen_server.ex:821: GenServer.call/3
(swarm) lib/swarm/tracker/tracker.ex:646: Swarm.Tracker.handle_handoff/3
(stdlib) gen_statem.erl:1240: :gen_statem.call_state_function/5
(stdlib) gen_statem.erl:1012: :gen_statem.loop_event/6
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Maybe there is a way to tell when node A is fully up again (with the whole supervision tree started), so that Swarm can safely begin the handoff.
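Until something like that exists, one possible workaround on the application side is to make the start function wait for the supervisor before calling it. The sketch below is only an illustration under assumptions: `Test.HandoffGuard`, `await_registered/3` and `start_worker/2` are hypothetical names that are not part of Swarm, and it assumes the function registered with Swarm is what ends up calling `Supervisor.start_child/2`. It uses only standard Elixir calls.

```elixir
defmodule Test.HandoffGuard do
  @moduledoc """
  Hypothetical helper (not part of Swarm): waits until a locally
  registered process exists before starting a child under it.
  """

  # Polls Process.whereis/1 until `name` is registered locally or
  # `timeout_ms` milliseconds have passed.
  def await_registered(name, timeout_ms \\ 5_000, interval_ms \\ 50) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    do_await(name, deadline, interval_ms)
  end

  defp do_await(name, deadline, interval_ms) do
    case Process.whereis(name) do
      pid when is_pid(pid) ->
        {:ok, pid}

      nil ->
        if System.monotonic_time(:millisecond) >= deadline do
          {:error, :timeout}
        else
          Process.sleep(interval_ms)
          do_await(name, deadline, interval_ms)
        end
    end
  end

  # Starts the worker only once the supervisor is registered.
  # Returns whatever Supervisor.start_child/2 returns, or
  # {:error, :timeout} if the supervisor never appeared.
  def start_worker(supervisor, args) do
    case await_registered(supervisor) do
      {:ok, _pid} -> Supervisor.start_child(supervisor, args)
      {:error, :timeout} = error -> error
    end
  end
end
```

With this in place the worker could be registered as `Swarm.register_name(:w, Test.HandoffGuard, :start_worker, [Test.Supervisor, [:state]])`, where `:w` is a placeholder name. Note that the stack trace suggests the start function runs inside the Swarm tracker process, so waiting there blocks the tracker for up to the timeout; this hides the error rather than solving the ordering problem in Swarm itself.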