Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prints "Could not find any config for exited child..." upon upgrade #46

Closed
gdb opened this issue Apr 12, 2015 · 4 comments
Closed

Prints "Could not find any config for exited child..." upon upgrade #46

gdb opened this issue Apr 12, 2015 · 4 comments

Comments

@gdb
Copy link
Contributor

gdb commented Apr 12, 2015

I now see this pretty consistently:

[MASTER 34636] ERROR: Could not find any config for exited child 34672! This probably indicates a bug in Einhorn.

Looks like it was introduced in: 673f5a8. Sounds like we had a race before, but I think this means we have a race still.

@ebroder, what would you think about de-racing by making the state_passer live past the SIGCHLD handler install? I'd guess that would mean either (a) installing the SIGCHLD handle before loading state — though I haven't checked if there's a chicken and an egg there — or (b) having some coordination mechanism between the state_passer and the master. Thoughts?

@gdb
Copy link
Contributor Author

gdb commented Apr 12, 2015

(As a note, looks like this can also be triggered when can_safely_reload? returns false.)

@ebroder
Copy link
Contributor

ebroder commented Apr 13, 2015

Hmm. So my recollection is that I wrote the code in #39 because our einhorn worker monitoring would fire because einhorn would keep the state passer in its state variables indefinitely, thus looking like a very old worker process.

But in any case, it feels like the code in #40 should safely handle all possible cases of this race - it's basically a more generalized and better version of the same issues as #39. Maybe we can revert 673f5a8 now that #40 is in place? We should make sure we think through the state upgrade path before we do, though.

I'm not totally confident all of that is right, though. But intuitively, it seems like a better fix than making the communication with the state passer more complicated.

@gdb
Copy link
Contributor Author

gdb commented Apr 13, 2015

Ah yeah, I like that much better. I'll poke at it.

  • gdb

On Sun, Apr 12, 2015 at 11:41 PM, Evan Broder notifications@github.com
wrote:

Hmm. So my recollection is that I wrote the code in #39
#39 because our einhorn worker
monitoring would fire because einhorn would keep the state passer in its
state variables indefinitely, thus looking like a very old worker process.

But in any case, it feels like the code in #40
#40 should safely handle all
possible cases of this race - it's basically a more generalized and better
version of the same issues as #39
#39. Maybe we can revert 673f5a8
673f5a8
now that #40 #40 is in place? We
should make sure we think through the state upgrade path before we do,
though.

I'm not totally confident all of that is right, though. But intuitively,
it seems like a better fix than making the communication with the state
passer more complicated.


Reply to this email directly or view it on GitHub
#46 (comment).

@adambrakhane
Copy link

Did you guys ever figure out how to resolve this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants