Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Horde.Supervisors supports transient and temporary processes #36

Closed
derekkraan opened this issue Nov 14, 2018 · 10 comments
Closed

Horde.Supervisors supports transient and temporary processes #36

derekkraan opened this issue Nov 14, 2018 · 10 comments
Labels
enhancement New feature or request

Comments

@derekkraan
Copy link
Owner

Currently Horde.Supervisor only supports permanent processes. For feature-parity with DynamicSupervisor, supporting transient and temporary processes is needed.

@derekkraan derekkraan added the enhancement New feature or request label Nov 14, 2018
@derekkraan
Copy link
Owner Author

Hi @QuantamHD,

This is correct. We use DynamicSupervisor for doing the actual supervision, and the challenge with temporary and transient processes is reflecting this information in the δ-CRDT. In the case of permanent processes, the state is leading, but in the case of temporary or transient processes, this is reversed.

For an overview of the behaviour of temporary and transient, see https://hexdocs.pm/elixir/Supervisor.Spec.html#module-restart-values-restart

I have shied away from doing "real" supervision in Horde.Supervisor, preferring to delegate this to DynamicSupervisor. This won't be possible anymore if we support transient and temporary children.

If you have any other questions, don't hesitate to ask!

@QuantamHD
Copy link
Contributor

QuantamHD commented Jan 15, 2019

In terms of "leading" state do you mean that the state can be set before process creation and not modified afterward for permanent strategies? While for the temporary and transient cases require CRDT modification at process creation time as well as process exits?

Could we achieve this feature with monitors on the processes managed by the DynamicSupervisor or do you think we would need a fully implemented supervisor behavior?

@derekkraan
Copy link
Owner Author

What I mean is, with permanent restart, a process is only removed from the supervisor by us, by calling DynamicSupervisor.terminate_child. With transient and temporary restart, the process is removed by the supervisor itself when the process exits (depending on the exit reason and the strategy).

We could achieve this with monitors on the processes that are managed by the DynamicSupervisor, but the complexity of our "supervision by proxy" approach is increasing every time we extend the approach, and I think we should consider (and I have been considering) whether Horde could benefit from fully implementing its own supervisor instead of relying on DynamicSupervisor.

Having our own supervisor implementation in Horde would also simplify some other parts of the code, like the graceful shutdown bits, which is already a bit of a hack.

Do you have any thoughts on this?

@QuantamHD
Copy link
Contributor

I would say that makes sense to me. We could also base the implementation off of https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/dynamic_supervisor.ex given that Horde currently tries to maintain API parity with it.

@QuantamHD
Copy link
Contributor

One question I do have is can a temporary process actually be restarted in Horde given the strict definition provided in the docs? It sounds like a node down should kill it.

@derekkraan
Copy link
Owner Author

derekkraan commented Jan 15, 2019

We could also base the implementation off Elixir's DynamicSupervisor

This sounds like a very good option. If our changes are limited, then we will also be able to port any changes / bugfixes etc from the stdlib DynamicSupervisor back to Horde's forked version and keep them mostly in sync.

One question I do have is can a temporary process actually be restarted in Horde given the strict definition provided in the docs? It sounds like a node down should kill it.

I hadn't thought about this. We are extending Supervisor into places it has not been before, so some interpretation is required. In this case I think never restarting is the correct behaviour as you say. For processes that should always be restarted until they have a normal exit code, we have :transient.

@QuantamHD
Copy link
Contributor

Keeping the bug fixes of stdlib DynamicSupervisor sounds great! It looks like we would need to insert CRDT modification code here. Could you explain to me the current "supervision by proxy" strategy employed by Horde? I see that you select nodes bases a consistent hash scheme, and it looks like you forward child specs to that node based on that. If we employed a custom Supervisor how might that strategy change?

@derekkraan
Copy link
Owner Author

The current strategy is to let DynamicSupervisor do all our supervision for us. So we proxy all calls either down to the DynamicSupervisor or to the DeltaCRDT (depending on what information is desired).

There is already a use case that I have had to work around (when the CRDT is not "leading"), and that's during a shutdown. During a shutdown, what we want is to update the CRDT state for each process as they shut down, and not at the end, after all processes have shut down. This is because there could be a big difference in shutdown time per process, and we don't want to have a scenario where a process shuts down and then has to wait for all other processes on the node to shut down before it is started on another node. So this is already a chink in the armour of our "supervision by proxy" approach.

And the way this works in practice is that there is a "graceful shutdown" process that recognizes when a shutdown has been initiated and proxies terminate messages from the ProcessCanary processes when there is a shutdown ongoing.

So this "graceful shutdown" behaviour is a candidate for improvement if we fork DynamicSupervisor.

I don't think the hash scheme ("distribution strategy") details will change if we fork DynamicSupervisor. This is purely for choosing which node a process should start on, and doesn't have anything to do with supervision of the process once it has actually started.

@QuantamHD
Copy link
Contributor

QuantamHD commented Jan 16, 2019

I have a little WIP branch up for you to look at #62 I added and modified the dynamic supervisor so that child_ids aren't thrown away and on process deletion, the supervisor will also remove the child_id from the CRDT. I have one failing test related to deduping I was hoping you could give me a hand with

@derekkraan
Copy link
Owner Author

Implemented in #70

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants