Add Swarm.Supervisor - a dynamic supervisor for distributed processes #6
Does this address the problem where, in the example provided, if the process dies it is restarted by the supervisor yet doesn't re-register with Swarm?
@dax0 yes, though in the current example it shouldn't be restarted anyway, given the restart type used there.
So what is currently the best way to create a process managed by Swarm that should be restarted?
The following seems to fail to restart when the Worker crashes. As soon as I take out …
With simple_one_for_one, if the process exits with :normal, :shutdown, or {:shutdown, term}, it won't be restarted.
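For readers less familiar with OTP restart semantics: the behaviour described here comes from the child spec's :restart option, not from Swarm itself. Below is a brief illustrative sketch using the Supervisor.Spec API that was current at the time; MyApp.Worker is a placeholder module, not part of this thread's code.

```elixir
# Standard OTP restart semantics (not Swarm-specific):
#   :permanent - always restarted
#   :transient - restarted only when the exit reason is something other than
#                :normal, :shutdown, or {:shutdown, term}
#   :temporary - never restarted
import Supervisor.Spec

children = [
  # With :transient, a crash (abnormal exit) triggers a restart, but a clean
  # shutdown does not.
  worker(MyApp.Worker, [], restart: :transient)
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :simple_one_for_one)
```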
I'm sending an abnormal exit message with Process.exit.
I have the same problem when using …
So, we're just starting a project and were looking to use Swarm with this exact use case in mind. Is this currently still an issue?
@bschaeffer Yes, if you need to supervise the processes being dynamically registered, you currently have to link/monitor those children and handle restarting them yourself. I would like to add a supervisor module to do this out of the box, but as I haven't had the need for it myself yet, and unfortunately lack the time to develop it at the moment, it may be a while before I can get to it. That said, handling the restart yourself as mentioned above is fairly easy to do, so it's not a huge obstacle, just something to be aware of.
I'm not sure what you're referring to? Currently …
@kwrooijen I was referring to the first part of my comment, which is supervising children directly by linking/monitoring them and reacting to down/exit signals appropriately. This is what Supervisor does under the covers (it also does things like handling too many restarts within a given interval, etc., but its core functionality is simply linking and monitoring the process and reacting to those signals).
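As an illustration of the link/monitor approach described in the last two comments, here is a minimal, hypothetical sketch: a watcher process starts its child through Swarm, monitors it, and re-registers a fresh copy when it dies abnormally. MyApp.WorkerWatcher and MyApp.Worker are placeholder names, not part of Swarm.

```elixir
defmodule MyApp.WorkerWatcher do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  def init(name), do: {:ok, start_and_monitor(name)}

  # Ask Swarm to start and register the worker, then monitor it ourselves.
  # A real implementation would handle {:error, _} from register_name.
  defp start_and_monitor(name) do
    {:ok, pid} = Swarm.register_name(name, MyApp.Worker, :start_link, [name])
    ref = Process.monitor(pid)
    %{name: name, pid: pid, ref: ref}
  end

  # Abnormal exit: start (and re-register) a fresh worker.
  def handle_info({:DOWN, ref, :process, _pid, reason}, %{ref: ref, name: name})
      when reason not in [:normal, :shutdown] do
    {:noreply, start_and_monitor(name)}
  end

  # Clean exit: nothing to do.
  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state) do
    {:noreply, state}
  end
end
```

A real implementation would also need the restart-intensity accounting and shutdown handling mentioned above.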
I made a makeshift dynamic supervisor that will re-register the name when the child process terminates with a reason other than a normal shutdown.

Here's the code in a gist: https://gist.github.com/narrowtux/3b21967599e1f42787b24667633f68f0

It's not a full Supervisor in the sense that it does not give you a behaviour, and it doesn't accept ….

These are all things that I imagine @bitwalker wants to see fixed before merging it into this library, but it's a start for all those who need this feature right now.
Could …? I mean, at minimum this appears to be working as a drop-in replacement for the example code in the README:

output: …
So I was able to accomplish something similar to what @narrowtux did above, but with an actual …
I'm interested!
beardedeagle, I believe the flaw with that setup is that if the process crashes and is restarted by the supervisor, it will no longer be tracked by Swarm; after a restart, Swarm.whereis_name/1 returns :undefined.

I took the route of creating a ….

The thing that I'm not pleased with is that a supervisor could end up supervising a process on a remote node, since …. It seems like the best of both worlds might be something like …
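One reading of the trade-off described above is to keep supervision strictly local and only register the already-running pid with Swarm afterwards. Here is a rough sketch of that direction, assuming a node-local DynamicSupervisor named MyApp.WorkerSup and a MyApp.Worker module (both hypothetical). Note that Swarm does not restart or relocate processes registered from an existing pid like this; the local supervisor owns their lifecycle.

```elixir
defmodule MyApp.Starter do
  # Start the worker under a node-local DynamicSupervisor, then register the
  # already-running pid with Swarm under the given name.
  def start_worker(name) do
    case DynamicSupervisor.start_child(MyApp.WorkerSup, {MyApp.Worker, name}) do
      {:ok, pid} ->
        # register_name/2 returns :yes or :no (Registry-style semantics).
        :yes = Swarm.register_name(name, pid)
        {:ok, pid}

      other ->
        other
    end
  end
end
```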
Hey guys, I'm working on a supervisor designed to do exactly what @jerel mentions. It is based on DynamicSupervisor. It's still a work in progress (I have to fix the API for naming processes), but generally it seems to work; for now you have to pass the name of the process as the first parameter of the args passed to the worker. My repo: https://github.com/kelostrada/swarm_dynamic_supervisor
Just an update here: to properly handle supervision, support needs to exist within Swarm itself, at least for things which use ….

To that end, I've been building out support for using …
@bitwalker I'm not sure why it should touch all the internals; however, I created a simple DynamicSupervisor which re-registers names in Swarm on start and after restart (by calling Swarm.register_name, which later calls the GenServer's …).

My knowledge of the subject is probably limited, though, so I might be doing something wrong. However, for now it seems to be working as expected (process groups don't work this way though).
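A stripped-down illustration of the "re-register on every (re)start" idea: if the worker claims its own name in init/1, an ordinary supervisor restart recreates the Swarm registration as a side effect. This is a hypothetical sketch, not @kelostrada's actual code; the module and name are placeholders.

```elixir
defmodule MyApp.Worker do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  def init(name) do
    # Claim (or re-claim) the name on every start, so a plain supervisor
    # restart recreates the Swarm registration. A real implementation would
    # retry here, since the tracker may still briefly hold the dead pid.
    :yes = Swarm.register_name(name, self())
    {:ok, %{name: name}}
  end
end
```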
@kelostrada How are you handling the case where the cluster topology changes? Swarm assigns processes started via ….

However, if Swarm understands external supervision and delegates it to a true supervisor, then restarts/shutdowns are honored; they don't require going through the tracker, so they behave like any other supervised process. Swarm just needs to consult the supervisor to get the new pid of the restarted process (or, in the case where the supervisor dies or the child is removed due to its restart strategy, not bother re-registering it). We can also coordinate handoff in such a way that the supervisor doesn't think it needs to restart the old process, by using ….

I guess all I'm trying to say is that yes, you can get something that works pretty well without explicit support from Swarm, but there are hidden edge cases that may bite you if you aren't aware of them, and it's my opinion that Swarm should support external supervision. I've got most of this done locally, but there are still a few bits left, plus testing to perform. If there are specific supervision requirements you think need to be addressed, now is definitely the time to bring those up.
Thanks for your in-depth analysis. Yes, you are right, I didn't handle this case. However, with ….

I never said this doesn't need to be supported internally by Swarm out of the box. I'm very happy to hear you are already working on a solution. Do you have your work in progress hosted somewhere on GitHub, perhaps? I would be happy to help test it.
I think there are a few more issues to address. I wasn't sure where to post them; I guess this post is a good start. In my case, there is a dynamic supervisor (…).

The first issue is that even though the top-level process supervisor is started and distributed properly, starting the child processes becomes quite hard with Swarm: ….

The second problem is probably the most important: it is impossible to have anything distributed this way that would accept state handoff, because Swarm has no idea how to move the small processes (started by the supervisor that was itself started by Swarm) to another node, and therefore cannot possibly handle the transitioning.

I asked around on the Elixir Slack, and pretty much no one attempted to answer, except for one person who recommended staying away from Swarm if we also wanted a more complex supervision tree. I hope the feedback will help to shape that supervisor @bitwalker mentioned, or maybe point to where some documentation is lacking (I could easily admit that some of these problems may be related to my using Swarm in a wrong way).
@hickscorp Maybe an idea is to let the state handoff/restoration logic live in the BallotBox?
@Qqwy Yeah, but that's the thing: a BallotBox is a supervisor (dynamically started), so it cannot handle the handoff message (where would I implement it? :D)
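For reference, these are the handoff-related messages Swarm sends, as documented in Swarm's README; they require GenServer callbacks, which is exactly why a dynamically started Supervisor such as the BallotBox above has nowhere to implement them. MyApp.HandoffWorker and its state shape are placeholders.

```elixir
defmodule MyApp.HandoffWorker do
  use GenServer

  def start_link(args), do: GenServer.start_link(__MODULE__, args)
  def init(args), do: {:ok, args}

  # Swarm asks the process to hand off its state before moving it.
  def handle_call({:swarm, :begin_handoff}, _from, state) do
    {:reply, {:resume, state}, state}
  end

  # The replacement process on the new node receives the handed-off state.
  def handle_cast({:swarm, :end_handoff, handed_off_state}, _state) do
    {:noreply, handed_off_state}
  end

  # Two copies of the process exist after a netsplit heals; keep one state.
  def handle_cast({:swarm, :resolve_conflict, _other_state}, state) do
    {:noreply, state}
  end

  # Swarm tells the old process to shut down after handoff.
  def handle_info({:swarm, :die}, state) do
    {:stop, :shutdown, state}
  end
end
```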
Has there been any update on this?
What about this issue?
I believe this is still planned, but time has been focused elsewhere. There are ways you can handle this yourself in the interim, though.
Are there any examples of how this can be handled in the interim? From reading this thread, it seems that @narrowtux's pseudo-Supervisor might be closest to working with restarts and topology changes. But @beardedeagle, if you have any examples of what you had in mind, that would be great.
I'm not sure what @beardedeagle has in mind either regarding the ways you can handle this yourself, @cadamis, but I figured that instead of a worker (i.e. a GenServer), I start a Supervisor (with a one_for_one strategy) which starts the worker and also acts as a proxy. The trick I am using here is that my Supervisor is registered in Swarm, and whenever I send a message, say via GenServer.call, it is sent to the supervisor. The supervisor, however, immediately forwards the message to its started worker and returns ….

Unfortunately this also means that things like hand-off when the topology changes have to be done at the Supervisor level. Basically it wraps around the GenServer worker and provides all of the infrastructure needed for Swarm plus normal supervision functionality (i.e. restarting), and also acts as a proxy whenever it needs to.
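A minimal sketch of that proxy idea, with the caveat that a literal Supervisor cannot handle arbitrary calls, so the proxy here is written as a GenServer that starts, links to, and restarts the real worker. All module names are placeholders, and Swarm's handoff messages would also arrive at this proxy and need their own clauses in real code.

```elixir
defmodule MyApp.WorkerProxy do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  def init(name) do
    Process.flag(:trap_exit, true)
    {:ok, pid} = MyApp.Worker.start_link(name)
    {:ok, %{name: name, worker: pid}}
  end

  # Forward every call to the worker and return its reply.
  def handle_call(request, _from, %{worker: pid} = state) do
    {:reply, GenServer.call(pid, request), state}
  end

  # The worker crashed: restart it locally. The Swarm registration points at
  # this proxy, so it stays valid across worker restarts.
  def handle_info({:EXIT, _pid, reason}, %{name: name} = state)
      when reason not in [:normal, :shutdown] do
    {:ok, pid} = MyApp.Worker.start_link(name)
    {:noreply, %{state | worker: pid}}
  end

  def handle_info({:EXIT, _pid, _reason}, state), do: {:noreply, state}
end
```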
I ended up working around this by using a standard DynamicSupervisor/GenServer worker setup, and then doing a check in ….

Assuming …
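A hedged guess at what such a check might look like: the worker, started by an ordinary DynamicSupervisor, verifies its Swarm registration after every (re)start and re-registers itself if the name is free. The module and name are placeholders, not the commenter's actual code.

```elixir
defmodule MyApp.SelfRegisteringWorker do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  def init(name), do: {:ok, %{name: name}, {:continue, :register}}

  def handle_continue(:register, %{name: name} = state) do
    case Swarm.whereis_name(name) do
      pid when pid == self() ->
        # Swarm already holds a registration pointing at us.
        :ok

      :undefined ->
        # We were restarted by our supervisor and the old registration is
        # gone, so claim the name again.
        Swarm.register_name(name, self())

      _other_pid ->
        # Another process (perhaps started by Swarm on a different node)
        # owns the name; defer to it.
        :ok
    end

    {:noreply, state}
  end
end
```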
@bitwalker Is this in-progress work available on a branch somewhere? I'm also very keen to get this functionality going (having just been bitten by its absence this week) and I'm happy to work on tests and anything else that still needs doing.
Implement a new supervisor module which can ensure registered processes are properly restarted according to normal supervisor semantics, but are also restarted/shifted based on cluster topology changes.
This supervisor will be for dynamic children only.