environmentd: Have environmentd panic on child crashes during testing #16347
With this PR, #16318 is visible directly in the main CI; Nightly is not even required: https://buildkite.com/materialize/nightlies/builds/1422#0184c2d2-25f0-4a4e-983c-9335a512c612
This looks great, except that I think the ReusePort is hiding a deeper bug! I think it's time for the process orchestrator to learn to use Unix domain sockets instead of TCP connections.
@benesch I would like to move this forward as soon as possible. It is now or never, while the builds are green even with panic-on-crash enabled.
Yes, apologies, I just can't get behind setting
@philip-stoev I'm so sorry but there appears to be at least one real bug here. The legacy upgrade tests are crashing with:
I'm not sure what's going wrong in cargo-test. Of course, it's impossible to see what's going on there, since the logs are disabled. Sigh.
I already did?
Oh, my local branch must have gotten messed up, then. Let me reset it.
Aha! It's
By default, the process orchestrator will restart any computeds and storageds that it manages. During testing, this unfortunately masks panics that should have caused the test to fail.

- Add a new environmentd command-line option that causes environmentd to panic if any of its children exits with an exit code that implies a crash or a panic happened in the child.
- Make sure the new behavior is in effect for sqllogictests, cargo unit tests, and mzcompose-based tests.
- Do not panic environmentd in mzcompose workflows that explicitly use mz_panic() to panic a computed.
- Remove the KillReplica Action from Zippy, which used mz_panic() internally. The original idea of this Action was to panic a single replica, but in practice mz_panic() is evaluated by all replicas, so they all panic together.

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
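The restart-or-panic decision described above can be sketched in Rust as follows. This is a minimal illustration, not the code from this PR: the names `Action`, `looks_like_crash`, `decide`, and `handle_exit` are hypothetical, and the crash heuristic is an assumption (termination by a signal such as SIGSEGV or SIGABRT, or exit code 101, which Rust's default panic handler produces when the main thread panics). It is Unix-only because it relies on `std::os::unix::process::ExitStatusExt`.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

/// Hypothetical outcome of a child exiting.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    Restart,
    Panic,
}

/// Assumed heuristic: a signal death or exit code 101 (Rust's default
/// exit code for a panic on the main thread) counts as a crash.
fn looks_like_crash(status: ExitStatus) -> bool {
    status.signal().is_some() || status.code() == Some(101)
}

/// Pure decision: only panic when the (hypothetical) panic-on-crash
/// option is enabled and the child appears to have crashed.
fn decide(status: ExitStatus, panic_on_crash: bool) -> Action {
    if panic_on_crash && looks_like_crash(status) {
        Action::Panic
    } else {
        Action::Restart
    }
}

/// What a supervisor loop might do with the decision.
fn handle_exit(name: &str, status: ExitStatus, panic_on_crash: bool) {
    match decide(status, panic_on_crash) {
        Action::Panic => panic!("child {name} crashed ({status}); failing the test run"),
        Action::Restart => eprintln!("child {name} exited ({status}); restarting"),
    }
}

fn main() {
    // `from_raw` takes a raw wait(2) status: exit code N is encoded as
    // N << 8, and death by signal S as S in the low 7 bits.
    let clean_exit = ExitStatus::from_raw(0);
    let rust_panic = ExitStatus::from_raw(101 << 8);
    let segfault = ExitStatus::from_raw(11); // SIGSEGV

    // Without the option, even a crash is masked by a restart.
    handle_exit("computed-1", segfault, false);
    handle_exit("storaged-1", clean_exit, true);

    println!("{:?}", decide(rust_panic, true)); // Panic
}
```

Keeping `decide` free of side effects makes the policy easy to unit-test; only `handle_exit` actually panics, which is what turns a masked child crash into a visible test failure.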
Just pushed up a fix to disable
By default, the process orchestrator will restart any computeds and storageds that it manages. During testing, this unfortunately masks panics that should have caused the test to fail.
Motivation
The fact that the process orchestrator is restarting panicked children masks bugs that should have been exposed during testing.