-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloudtest: Fortify the test_oom_clusterd test #18861
cloudtest: Fortify the test_oom_clusterd test #18861
Conversation
@@ -132,6 +135,16 @@ impl Default for ClusterReplicaSizeMap { | |||
workers: scale.into(), | |||
}, | |||
); | |||
|
|||
inner.insert( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new SIZE type , along with all the other types in this file, are defaults that are only used for testing. In the cloud, a completely separate set of SIZEs is installed and used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd feel more confident if we had this behind a if testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this entire thing is the default when executing bin/environmentd unless the entire SIZE list is overriden with a command-line option, which is what happens in the cloud
there is no easy way to distinguish "test" from "non-test" invocations of bin/environmentd at this time
The test was forcing an clusterd to OOM, but this could also cause the entire Buildkite instance that it is running on to become unresponsive. Fix by running the test against a memory-constrained Mz cluster that will OOM without bringing down the entire machine.
e1fe556
to
7fca33a
Compare
@@ -132,6 +135,16 @@ impl Default for ClusterReplicaSizeMap { | |||
workers: scale.into(), | |||
}, | |||
); | |||
|
|||
inner.insert( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd feel more confident if we had this behind a if testing
Thanks, Philip! |
The test was forcing an clusterd to OOM, but this could also cause the entire Buildkite instance that it is running on to become unresponsive.
Fix by running the test against a memory-constrained Mz cluster that will OOM without bringing down the entire machine.
Motivation
CI was failing with "Agent lost" error, indicating a runaway process.