Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cluster: Tests around cluster isolation #12335

Conversation

philip-stoev
Copy link
Contributor

@philip-stoev philip-stoev commented May 9, 2022

Test that if one cluster is disrupted in any way, the other
cluster can continue processing queries as usual.

Motivation

  • This PR adds a known-desirable feature.

#12220

Tips for reviewer

@benesch I made some small changes to c.testdrive() so that I can pass args to the testdrive container in order to control the random seed.

@philip-stoev philip-stoev marked this pull request as ready for review May 9, 2022 15:38
@philip-stoev philip-stoev requested a review from benesch May 9, 2022 16:17
Copy link
Member

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mzcompose bits all LGTM, but @frankmcsherry or someone else from the compute team might have thoughts on whether mz_sleep or mz_panic are actually going to exercise the right things, because I don't think there's any guarantee that they're executed on the active cluster (vs during optimization in the coordinator).

@frankmcsherry
Copy link
Contributor

I'm afraid I don't have thoughts other than that I don't know that mz_panic() can be certain to do anything. Committing it to a timestamp seems to create serializability issues if we hope to do any work at later timestamps. Generally "side-effecting" methods are hard to pin down, especially if you want to be certain they do a thing (optimizing out execution of mz_panic would be both a sane thing to do at some point, and also might allow tests to pass that you would want to fail).

@philip-stoev
Copy link
Contributor Author

I am removing the use of mz_panic() to avoid further controversy. Using mz_sleep however will exercise a legitimate scenario where the TCP socket is open between materialized and computed but computed is otherwise indisposed to make progress on any other dataflow.

As written, the mz_sleep will surely be run on computed as it has an non-constant argument.

@philip-stoev philip-stoev force-pushed the compute-instance-isolation-tests branch from 349ae7d to 10553f2 Compare May 10, 2022 12:24
Test that if one cluster is disrupted in any way, the other
cluster can continue processing queries as usual.
@philip-stoev philip-stoev force-pushed the compute-instance-isolation-tests branch from 10553f2 to 61b0d08 Compare May 10, 2022 12:37
@philip-stoev philip-stoev merged commit ca0d505 into MaterializeInc:main May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants