Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

acceptance: introduce partition nemesis in acceptance tests #4712

Merged
merged 2 commits into from
Apr 18, 2016

Conversation

tbg
Copy link
Member

@tbg tbg commented Feb 29, 2016

This nemesis allows node partitions to be set up during tests. Partitions can
be bidirectional, but also one-way (i.e. one node can initiate connections to
another, but not the other way around). The latter likely isn't too
interesting with the change to streaming RPCs, but it should be possible to
introduce interesting phenomena nevertheless.

Wrote a bank test with the nemesis. I found the existing bank test too
complicated for its own sake, so I re-wrote it. The plan is to make it
the more advanced bank example with constraints which would fail under
SNAPSHOT isolation (i.e. transferring money from A and B to C and D under
the constraint that A+B >= 0).

Review on Reviewable

@tbg tbg force-pushed the dookar branch 2 times, most recently from 5c6f476 to 4abbb6e Compare February 29, 2016 15:18
@mrtracy
Copy link
Contributor

mrtracy commented Feb 29, 2016

Review status: 0 of 11 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


acceptance/partition.go, line 59 [r1] (raw file):
It appears that this isn't just a one-time random sever; it appears to randomly sever the network into two groups and then restore it, several times as long as the nemesis is running. If that's the case, the comment should be extended.


acceptance/partition.go, line 99 [r1] (raw file):
Does this need to be in a goroutine? The only caller immediately blocks on the returned channel.


acceptance/iptables/iptables.go, line 18 [r1] (raw file):
Should blacklist be a type? Also, maybe this method should be "BidirectionalBlacklist", because it seems that iptables command can do more than blacklist IPs.


Comments from the review on Reviewable.io

@tbg
Copy link
Member Author

tbg commented Mar 4, 2016

Review status: 0 of 11 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


acceptance/partition.go, line 59 [r1] (raw file):
Done


acceptance/partition.go, line 99 [r1] (raw file):
No, doesn't seem necessary, at least not currently. Removed.


acceptance/iptables/iptables.go, line 18 [r1] (raw file):
Yeah, it could be a type. I'd like to leave as is for now since the blacklist is mostly relevant for the lowest level API and likely won't be called directly from tests. Same for renaming, we can do that when we actually use iptables for other stuff, which may never happen.


Comments from the review on Reviewable.io

tbg added a commit to tbg/cockroach that referenced this pull request Mar 5, 2016
tbg added a commit to tbg/cockroach that referenced this pull request Mar 6, 2016
@tbg tbg force-pushed the dookar branch 3 times, most recently from 7b0434d to 62ce786 Compare March 7, 2016 15:41
@tbg
Copy link
Member Author

tbg commented Mar 7, 2016

Looks like this just broke with Docker 1.10 (on CircleCI) since privileged mode is allowed no more. I opened a support request with CircleCI to see whether anything can be done about that.

@tbg tbg changed the title [WIP] Introduce partition nemesis in acceptance tests Introduce partition nemesis in acceptance tests Apr 15, 2016
@tbg tbg changed the title Introduce partition nemesis in acceptance tests acceptance: introduce partition nemesis in acceptance tests Apr 15, 2016
@tbg
Copy link
Member Author

tbg commented Apr 15, 2016

Resuscitated! PTAL. Doesn't run on CircleCI but fit for nightlies and local testing. I'm planning to make this work against GCE/AWS (or at least try to), which shouldn't be so bad.

@tbg tbg force-pushed the dookar branch 2 times, most recently from c324934 to b0c3f91 Compare April 15, 2016 16:09
@tamird
Copy link
Contributor

tamird commented Apr 15, 2016

Reviewed 4 of 11 files at r1, 1 of 5 files at r2, 6 of 10 files at r5, 2 of 2 files at r7.
Review status: 8 of 10 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


acceptance/chaos_test.go, line 307 [r7] (raw file):
?


acceptance/cluster/cluster.go, line 39 [r7] (raw file):
shouldn't this return a net.Addr? also looks like it's only used to get the hostname, so maybe just return that.


acceptance/cluster/localcluster.go, line 36 [r7] (raw file):
could we avoid depending on docker/docker?


acceptance/cluster/localcluster.go, line 384 [r7] (raw file):
should l.networkID be of type container.NetworkMode?


acceptance/cluster/localcluster.go, line 737 [r7] (raw file):
I think you should use a util.UnresolvedAddr here to avoid needing defaultPort.


acceptance/cluster/localcluster.go, line 811 [r7] (raw file):
nit: do this right after the first error check


acceptance/cluster/localcluster.go, line 826 [r7] (raw file):
nit: dubious use of strings.Join - default stringer is probably ok (or use %v which will just wrap with [])


acceptance/iptables/iptables.go, line 15 [r7] (raw file):
reflow


Comments from Reviewable

@bdarnell
Copy link
Contributor

Reviewed 8 of 10 files at r5, 2 of 2 files at r7.
Review status: all files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.


acceptance/partition_test.go, line 1 [r7] (raw file):
Some of the new files in this PR are missing license headers.


acceptance/iptables/iptables.go, line 15 [r1] (raw file):
Wrap this comment at 80 chars.


acceptance/iptables/iptables.go, line 34 [r7] (raw file):
Mention that arguments with embedded whitespace are not supported (because of the naive String() function)


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Apr 15, 2016

Review status: 0 of 10 files reviewed at latest revision, 14 unresolved discussions.


acceptance/chaos_test.go, line 307 [r7] (raw file):
Must've snuck in on the rebase. Removed.


acceptance/partition_test.go, line 1 [r7] (raw file):
Added. Didn't we have some license-checking script Pete wrote? May that's worthy for promotion to make check (or we just use a 'grep').


acceptance/cluster/cluster.go, line 39 [r7] (raw file):
It's used to get the IP address for iptables. Changed to net.Addr.


acceptance/cluster/localcluster.go, line 36 [r7] (raw file):
I don't see how - engine-api/client doesn't have anything as far as I can tell: https://godoc.org/github.com/docker/engine-api/client :(


acceptance/cluster/localcluster.go, line 384 [r7] (raw file):
Done.


acceptance/cluster/localcluster.go, line 737 [r7] (raw file):
How does that help? I need the port.


acceptance/cluster/localcluster.go, line 811 [r7] (raw file):
Done.


acceptance/cluster/localcluster.go, line 826 [r7] (raw file):
Both %s and %v do that, but that seems ok here. Changed.


acceptance/iptables/iptables.go, line 15 [r1] (raw file):
Done (I think this refers to Bidirectional below, correct?)


acceptance/iptables/iptables.go, line 15 [r7] (raw file):
Done.


acceptance/iptables/iptables.go, line 34 [r7] (raw file):
Done:

// Cmd is a naive command without proper support for whitespace.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: 0 of 10 files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.


acceptance/partition_test.go, line 1 [r7] (raw file):
I think you're talking about build/deplicense.sh, which attempts to determine the license for a dependency repo; it doesn't check that every file in the repo has the appropriate header. It would be easy to grep for this in make check.


acceptance/iptables/iptables.go, line 15 [r1] (raw file):
Yeah


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Apr 15, 2016

Review status: 0 of 10 files reviewed at latest revision, 13 unresolved discussions.


acceptance/partition_test.go, line 1 [r7] (raw file):
Ok, have a reminder in my queue.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Apr 16, 2016

Reviewed 7 of 10 files at r8, 1 of 1 files at r9.
Review status: 8 of 10 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


acceptance/cluster/localcluster.go, line 36 [r7] (raw file):
How big is that function? Could we copy pasta it and file an issue upstream?


acceptance/cluster/localcluster.go, line 737 [r7] (raw file):
Do you? the only caller parses out the host name and throws away the port. What am I missing?


acceptance/cluster/localcluster.go, line 285 [r9] (raw file):
man, this is weird - is it just that the API is inconsistent in using string instead of NetworkMode here and below, or are we conflating things by making networkID's type NetworkMode?


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Apr 16, 2016

Review status: 5 of 12 files reviewed at latest revision, 8 unresolved discussions.


acceptance/cluster/localcluster.go, line 36 [r7] (raw file):
It's like 100loc. I cribbed it and filed an issue (pinged this PR).


acceptance/cluster/localcluster.go, line 737 [r7] (raw file):
Oh, sorry, you're right. Changed to return net.IP (I do want the IP and not the host name).


acceptance/cluster/localcluster.go, line 285 [r9] (raw file):
It's... weird. I think string works better overall. Changed back.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Apr 16, 2016

Reviewed 7 of 11 files at r1, 4 of 5 files at r2, 5 of 5 files at r3, 11 of 11 files at r4, 4 of 10 files at r5, 2 of 10 files at r8, 5 of 6 files at r10.
Review status: 10 of 12 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


acceptance/chaos_test.go, line 369 [r10] (raw file):
How come this is in this change?

aren't 3 nodes enough? also include the number in the skip message


acceptance/cluster/localcluster.go, line 66 [r10] (raw file):
no longer used?


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Apr 16, 2016

Review status: 8 of 12 files reviewed at latest revision, 5 unresolved discussions.


acceptance/chaos_test.go, line 369 [r10] (raw file):
Drive-by. Fixed.


acceptance/cluster/localcluster.go, line 66 [r10] (raw file):
Done.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Apr 16, 2016

Reviewed 2 of 2 files at r11.
Review status: 10 of 12 files reviewed at latest revision, 3 unresolved discussions.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Apr 16, 2016

Reviewed 1 of 13 files at r12, 10 of 12 files at r13.
Review status: 11 of 13 files reviewed at latest revision, 5 unresolved discussions.


Makefile, line 189 [r13] (raw file):
did you mean to change time\.Now to time.Now? also why the inconsistent application of -l?


Makefile, line 191 [r13] (raw file):
why does it need to be $$?


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Apr 16, 2016

Review status: 10 of 13 files reviewed at latest revision, 5 unresolved discussions.


Makefile, line 189 [r13] (raw file):
Yep. I tried using -l wherever possible (some of the other ones grep out parts from the matched line). Did I miss any?


Makefile, line 191 [r13] (raw file):
Makefile :(


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Apr 16, 2016

Reviewed 1 of 1 files at r14.
Review status: 11 of 13 files reviewed at latest revision, 4 unresolved discussions.


Makefile, line 189 [r13] (raw file):
Nah, I think you got 'em all.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Apr 17, 2016

:lgtm:


Reviewed 2 of 12 files at r13.
Review status: all files reviewed at latest revision, 7 unresolved discussions.


acceptance/partition.go, line 58 [r14] (raw file):
I get that you're trying to assert the type, but I think there's no need to make the function variable; just add a static assertion instead:

var _ NemesisFn = BidirectionalPartitionNemesis

acceptance/partition.go, line 97 [r14] (raw file):
what's the point of this goroutine? looks like the function blocks until it exists?


acceptance/partition_test.go, line 32 [r14] (raw file):
what does this mean?


acceptance/partition_test.go, line 52 [r14] (raw file):
can you add a TODO to share this code with other bank test(s)?


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Apr 18, 2016

Review status: 0 of 13 files reviewed at latest revision, 7 unresolved discussions.


acceptance/partition.go, line 58 [r14] (raw file):
Done.


acceptance/partition.go, line 97 [r14] (raw file):
Goroutine removed.


acceptance/partition_test.go, line 32 [r14] (raw file):
When you're changing around the iptables stuff, this is the test you're going to want to use for manually poking around in the containers.


acceptance/partition_test.go, line 52 [r14] (raw file):
Done.


Comments from Reviewable

@tbg tbg force-pushed the dookar branch 2 times, most recently from c55c73b to 6331e75 Compare April 18, 2016 09:34
@tamird
Copy link
Contributor

tamird commented Apr 18, 2016

Reviewed 13 of 13 files at r16.
Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


Comments from Reviewable

tbg added 2 commits April 18, 2016 16:00
It was not correctly excluding files for os.Getenv, for example
it would complain about `os.Getenv` in `acceptance/util_test.go`
which should be excluded.
This nemesis allows node partitions to be set up during tests. Partitions can
be bidirectional, but also one-way (i.e. one node can initiate connections to
another, but not the other way around). The latter likely isn't too
interesting with the change to streaming RPCs, but it should be possible to
introduce interesting phenomena nevertheless.

Wrote a bank test with the nemesis. I found the existing bank test too
complicated for its own sake, so I re-wrote it. The plan is to make it
the more advanced bank example with constraints which would fail under
SNAPSHOT isolation (i.e. transferring money from A and B to C and D under
the constraint that A+B >= 0).
@tbg tbg merged commit 4fd10fb into cockroachdb:master Apr 18, 2016
@tbg tbg deleted the dookar branch April 18, 2016 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants