
[Jepsen] [YSQL] Occasional long recovery during partition-one #1992

Open
frozenspider opened this issue Aug 7, 2019 · 2 comments

@frozenspider (Contributor) commented Aug 7, 2019

(Moved from #950)

The partition-one nemesis may leave the cluster unavailable for ~100 seconds. It then recovers, even though the partition is still in effect.

Comment by @aphyr (#950 (comment)):

Just a heads-up that there may be more work to do here: I've observed occasional ~100-second recovery times for a partition isolating a single node, on version 1.3.1.0. For instance: 20190805T185957.000-0400.zip.

[Image: latency-raw plot]
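When comparing runs, one way to put a number on "recovery time" is to take the longest stretch inside a window during which no operation completed successfully. The following is a minimal Python sketch under stated assumptions: the history is taken to be already reduced to hypothetical `(time_seconds, completion_type)` pairs. Real Jepsen histories are EDN maps with a relative nanosecond `:time` field and `:ok`/`:fail`/`:info` types, so they would need converting first. `longest_unavailability` is a hypothetical helper and is not part of Jepsen.

```python
from typing import Iterable, Tuple

# Hypothetical simplified history entry: (time in seconds, completion type),
# where completion type is "ok", "fail", or "info" (mirroring Jepsen's types).
Op = Tuple[float, str]


def longest_unavailability(history: Iterable[Op], start: float, end: float) -> float:
    """Return the longest gap (in seconds) within [start, end] with no "ok" completion.

    Window edges count as boundaries, so a window with no successful
    operations at all yields end - start.
    """
    ok_times = sorted(t for t, kind in history if kind == "ok" and start <= t <= end)
    boundaries = [start] + ok_times + [end]
    # Largest distance between consecutive successful completions (or window edges).
    return max(b - a for a, b in zip(boundaries, boundaries[1:]))
```

For example, a history where successes stop at 2 s, only failures and indeterminate results arrive until 105 s, and the window ends at 110 s gives a gap of 103 s. That is the kind of ~100-second stall described here, as opposed to the short dips seen in most runs.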

The command line suggested by @aphyr was:

lein run test --os debian --version 1.3.1.0 --nemesis partition-one --nemesis-interval 200 --nemesis-schedule fixed --time-limit 500 -w ysql/append --test-count 10

on commit ed8ecfb9146816cb99e458c9cb92f59e19a7b78f

@frozenspider (Contributor, Author) commented Aug 7, 2019

Note that this is hard to reproduce:

Kyle Kingsbury [19:06]
It's pretty infrequent. Less than 1 in 10 runs seems to show it, but when it does happen, I also saw that issue with transaction expired errors actually being committed.

Most of the time recovery is quick:
[Image: latency-raw plot]

@frozenspider frozenspider added this to To do in Jepsen Testing via automation Aug 7, 2019

@frozenspider frozenspider added this to To do in SQL Support via automation Aug 7, 2019

@frozenspider frozenspider changed the title [Jepsen] [YSQL] Occasional recovery during partition-one [Jepsen] [YSQL] Occasional long recovery during partition-one Aug 7, 2019

@kmuthukk kmuthukk added kind/enhancement and removed kind/bug labels Aug 7, 2019

@frozenspider (Contributor, Author) commented Aug 7, 2019

I was able to reproduce this (once in 20 attempts):
logs-jepsen_2019-08-08_00-21-01_append_n-part1_t500.zip
[Image: latency-raw plot]

3 participants