Handover begins before Supervision tree is fully loaded #69

@mbaeuerle

Swarm: 3.2.1
Elixir: 1.6.1
Erlang: 20.2.2

Suppose we have two nodes, A and B.
A worker W was spawned on node A. Node A then crashes and B takes over W, as expected.
When A starts again, Swarm attempts the handover of W. In our case this happens before Test.Supervisor, the supervisor for W, has been started on A. Swarm retries the handover and succeeds after one or two retries, but an error like the following is still logged:

[error] [swarm on swarm_test1@127.0.0.1] [tracker:handle_handoff] ** (exit) exited in: GenServer.call(Test.Supervisor, {:start_child, [:state]}, :infinity)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir) lib/gen_server.ex:821: GenServer.call/3
    (swarm) lib/swarm/tracker/tracker.ex:646: Swarm.Tracker.handle_handoff/3
    (stdlib) gen_statem.erl:1240: :gen_statem.call_state_function/5
    (stdlib) gen_statem.erl:1012: :gen_statem.loop_event/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

Maybe there is a way to detect when node A is fully up again (with the whole supervision tree started), so that Swarm can safely begin the handoff only after that point.
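As a possible workaround on the application side, the start function registered with Swarm could wait until Test.Supervisor is registered before calling it. The following is only a minimal sketch under assumptions: `Test.HandoffGuard`, `await_registered/3`, and `start_worker/1` are hypothetical names that are not part of Swarm, and the sketch assumes Test.Supervisor is a locally registered `:simple_one_for_one` supervisor, as the `{:start_child, [:state]}` call in the trace suggests. Since the trace shows the start function being called from `Swarm.Tracker.handle_handoff/3`, waiting here would presumably block the tracker for that long, so the timeout should stay small.

```elixir
defmodule Test.HandoffGuard do
  @moduledoc false
  # Hypothetical helper module, not part of Swarm.

  # Polls until `name` is registered as a local process or the timeout elapses.
  # Returns {:ok, pid} or {:error, :timeout}.
  def await_registered(name, timeout_ms \\ 5_000, interval_ms \\ 50) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    do_await(name, deadline, interval_ms)
  end

  defp do_await(name, deadline, interval_ms) do
    case Process.whereis(name) do
      pid when is_pid(pid) ->
        {:ok, pid}

      nil ->
        if System.monotonic_time(:millisecond) >= deadline do
          {:error, :timeout}
        else
          Process.sleep(interval_ms)
          do_await(name, deadline, interval_ms)
        end
    end
  end

  # Assumed start function for the worker: waits for Test.Supervisor,
  # then starts the child under it. Returns {:error, :timeout} if the
  # supervisor never appears within the default timeout.
  def start_worker(state) do
    with {:ok, _sup} <- await_registered(Test.Supervisor) do
      Supervisor.start_child(Test.Supervisor, [state])
    end
  end
end
```

With this in place, the worker would be registered with something like `Swarm.register_name(:w, Test.HandoffGuard, :start_worker, [:state])`, where `:w` is a placeholder name. This only hides the race on the application side; it does not answer whether Swarm itself could delay the handoff until the node is fully started.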
