Allow clean uninitialization of silos which failed to start #2119
Comments
Another question: |
We generally don't want to terminate a silo which is going through the graceful shutdown process. Why would we?
I think the difference is that the |
@sergeybykov do you know of a way to uninitialize everything after an initialization failure so that I can try again rather than having to tear down the whole process? Do you see a problem with tweaking the |
Conceptually, I don't see a problem with trying that. I suspect the complications will be around the races from trying to stop system components that haven't initialized yet, and new kinds of non-deterministic failures (NREs?) as a result. I wonder if @gabikliot can think of other reasons. |
Yes, what Sergey wrote is correct. If we have some state where we get stuck forever (due to some error or whatnot), that is obviously a bug which needs to be fixed. Except for such potential bugs (of which I have not seen any evidence so far), I see conceptually no problem with this atomic state transition handling. If you can explain the problem again, I can try to help find a solution. |
As further clarification:
I think any call to silo.Terminate and silo.Stop should uninitialize a silo, obviously once the call returns. Isn't that what happens now? Multi-threaded simultaneous calls to those methods are queued/ignored. I actually don't remember that we ever intended for a silo to be restartable: start, stop, start. I think I assumed a silo is one-time. The state transition diagram, as far as I recall it, did not have a back arrow or loops. There was simply never a need for that. It should be easy to fix if needed, I think. |
Thanks, Gabi. It's not that I want to restart a silo, I want to retry The only viable policy right now is to crash the process. On 7 Sep 2016 4:49 PM, "Gabriel Kliot" notifications@github.com wrote:
|
Why not create and start a new one instead? What's the benefit of trying to resuscitate a failed one? |
Can you recreate this in test, or at least show the exception? I am sure it will be easy to fix, just undo something static. |
@sergeybykov that does not work, if you try to recreate your
@gabikliot I'll submit a PR with a test & a fix. It turns out that the fix is simple: I just needed to add a call to |
Yep, that's exactly what I thought. It should be trivial to fix. |
Fix #2119 by allowing full uninitialization in SiloHost
Often I see errors amounting to trying to re-initialize some singletons in Orleans which have already been initialized. Examples include GrainTypeManager and UnobservedExceptionHandler.
Usually when a user reports an error like this, the error is actually something else which caused their silo to fail initialization, but their code retries a few times and hides the original exception further up the log.
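To illustrate the failure mode described above, here is a minimal sketch of how a retry loop can bury the original exception. It does not use the real Orleans types: DemoSingleton, DemoSiloStartup, and the simulated TimeoutException are all hypothetical stand-ins chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a process-wide singleton; not the real Orleans type.
static class DemoSingleton
{
    private static bool initialized;

    public static void Initialize()
    {
        if (initialized)
            throw new InvalidOperationException("DemoSingleton is already initialized.");
        initialized = true;
    }

    // What a clean uninitialization would need to call between attempts.
    public static void Uninitialize() => initialized = false;
}

static class DemoSiloStartup
{
    // Simulates a start that sets up static state and then fails for an unrelated reason.
    public static void Start()
    {
        DemoSingleton.Initialize();
        throw new TimeoutException("Simulated: could not reach membership storage.");
    }

    // Naive retry loop: records the exception type name of each failed attempt, in order.
    public static List<string> StartWithRetries(int attempts, bool cleanUpBetweenAttempts)
    {
        var errors = new List<string>();
        for (int i = 0; i < attempts; i++)
        {
            try { Start(); }
            catch (Exception ex) { errors.Add(ex.GetType().Name); }
            if (cleanUpBetweenAttempts) DemoSingleton.Uninitialize();
        }
        return errors;
    }
}
```

Without cleanup between attempts, only the first attempt in this sketch reports the real cause; every later attempt fails on the "already initialized" guard instead, and that is the error that ends up at the bottom of the log.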
Currently it's not clear how to uninitialize a silo so that we can have a clean attempt at starting it.
We should fix this.
While looking into the issue, I see this code in Silo.Terminate() and hope someone can explain it to me. Why can we not terminate a silo which is not in the "Running" state?
If we could just remove that exception, then calls to Silo.Stop()/Silo.Shutdown() could successfully clean up the static members.
Of course, another way to fix this problem is to use DI pervasively and kill all stateful statics.
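As a rough sketch of the first option, a terminate path that tolerates a silo which never reached "Running" might look like the following. This is not the Orleans implementation: DemoSilo, DemoStatus, and the StaticsInitialized flag are hypothetical names, and the single atomic exchange is just one assumed way to make concurrent calls harmless.

```csharp
using System;
using System.Threading;

// Hypothetical sketch; names and states are illustrative, not the Orleans implementation.
enum DemoStatus { Created = 0, Running = 1, Terminated = 2 }

sealed class DemoSilo
{
    // Stand-in for process-wide static state that start-up populates.
    public static bool StaticsInitialized;

    private int status = (int)DemoStatus.Created;

    public DemoStatus Status => (DemoStatus)Volatile.Read(ref status);

    public void Start(bool fail)
    {
        StaticsInitialized = true;
        if (fail) throw new InvalidOperationException("Simulated start-up failure.");
        // Only move Created -> Running; a terminated silo stays terminated.
        Interlocked.CompareExchange(ref status, (int)DemoStatus.Running, (int)DemoStatus.Created);
    }

    // Returns true for the single caller that performs the cleanup.
    public bool Terminate()
    {
        int previous = Interlocked.Exchange(ref status, (int)DemoStatus.Terminated);
        if (previous == (int)DemoStatus.Terminated) return false; // already terminated; ignore
        StaticsInitialized = false; // clean up whether or not we ever reached Running
        return true;
    }
}
```

In this sketch a failed start leaves the silo in Created, Terminate() still clears the static state, and a fresh DemoSilo can then be started in the same process. A real fix would also have to cope with components that were only partially initialized, which this example does not model.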