
[Jepsen] Acknowledged inserts can be present in reads for tens of seconds, then disappear #824

Closed · aphyr opened this issue Jan 31, 2019 · 4 comments · 5 participants

@aphyr commented Jan 31, 2019

In Jepsen set tests, on a five-node Debian cluster running 1.1.10.0, we can observe the apparent loss of acknowledged writes after randomized 3/2 network partitions. Writes succeed, are visible to every subsequent read operation, then, after some critical juncture, are missing from every subsequent read, even though those reads observe all other records written before and after the missing element. The insertion of these elements, and the time they appear to go missing, appear correlated with the end of a network partition and the later recovery of the cluster to a healthy state.
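
For context on how the test detects this: a Jepsen set test inserts unique integers and periodically reads the whole set back, then flags any element that was acknowledged at insert time but absent from later reads. A minimal Python sketch of that check (Jepsen's actual checker is written in Clojure and works over its own history format; the field names below are illustrative):

```python
# Minimal sketch of a set-test loss check -- not Jepsen's actual checker.
# Each history entry is assumed to look like:
#   {"f": "add",  "type": "ok", "value": 427}          an acknowledged insert
#   {"f": "read", "type": "ok", "value": [1, 2, ...]}  a successful read
def lost_elements(history):
    acknowledged = set()  # inserts that returned success to the client
    final_read = set()    # contents of the last successful read

    for op in history:
        if op["type"] != "ok":
            continue
        if op["f"] == "add":
            acknowledged.add(op["value"])
        elif op["f"] == "read":
            final_read = set(op["value"])

    # Anything acknowledged but missing from the final read is a lost write,
    # e.g. element 427 in the first run described above.
    return acknowledged - final_read
```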

[latency-raw plot]

For instance, in this test, element 427 is written successfully from 81.8 to 92.21 seconds, during which the nemesis heals a network partition. It is present in every subsequent read from 98.4 seconds to 141.1 seconds, when the cluster happens to be recovering from a network partition which ended about 15 seconds prior; it is missing from 142 seconds through the end of the test, at 185 seconds.

[latency-raw plot]

In this test, element 1063 is written successfully from 144.18 to 144.22 seconds, right when the cluster is recovering from a partition which ended ~15 seconds prior. Element 1063 is present immediately in subsequent reads at 144.24 seconds, and persists until 269 seconds, a read which completes immediately before the nemesis isolates n1 and n4 from n2, n3, and n5. The next read, at 303 seconds, fails to observe element 1063, and it remains missing for the remainder of the test. The final read occurs at 308 seconds.

[latency-raw plot]

In this test, element 5894 is inserted successfully from 300.55 to 300.57 seconds, just as the cluster begins to recover from a previous network partition. 5894 is present in subsequent reads at 304.2 seconds, and is last observed at 346.4 seconds, just prior to a partition isolating n3 and n5 from n1, n2, and n4. That partition ends at 406 seconds, and at 423 seconds, reads begin again--but none observe element 5894.

@kmuthukk kmuthukk added this to To Do in YBase features via automation Jan 31, 2019

@kmuthukk kmuthukk added this to To do in Jepsen Testing via automation Jan 31, 2019

@ttyusupov (Contributor) commented Feb 1, 2019

Wasn't able to reproduce in 92 test runs with the following settings:

GCP n1-standard-8 host with 5 LXC YugaByte nodes, OS: Debian Jessie
lein run test --os debian --version 1.1.10.0 --test set --time-limit 300 --concurrency 5 --nemesis partition-random-halves

Also wasn't able to reproduce in 20 test runs with the same settings on the following configuration:

Jepsen control node: GCP n1-standard-4 instance, OS: CentOS 7
5 YugaByte GCP n1-standard-2 instances, OS: Debian Jessie

But after I changed the tserver flag load_balancer_max_concurrent_adds to 10, I was able to reproduce on both.

LXC:

results-sorted/ok/set/20190201T012307.000Z/jepsen.log
results-sorted/ok/set/20190201T012943.000Z/jepsen.log
results-sorted/ok/set/20190201T013558.000Z/jepsen.log
results-sorted/ok/set/20190201T014231.000Z/jepsen.log
results-sorted/ok/set/20190201T014852.000Z/jepsen.log
results-sorted/ok/set/20190201T015456.000Z/jepsen.log
results-sorted/ok/set/20190201T020130.000Z/jepsen.log
results-sorted/ok/set/20190201T020801.000Z/jepsen.log
**results-sorted/invalid/set/20190201T021432.000Z/jepsen.log**
results-sorted/ok/set/20190201T022101.000Z/jepsen.log
results-sorted/no-history/set/20190201T022744.000Z/jepsen.log
results-sorted/ok/set/20190201T022824.000Z/jepsen.log
results-sorted/ok/set/20190201T023454.000Z/jepsen.log
results-sorted/no-history/set/20190201T024054.000Z/jepsen.log
results-sorted/ok/set/20190201T024135.000Z/jepsen.log
results-sorted/ok/set/20190201T024758.000Z/jepsen.log
results-sorted/no-history/set/20190201T025427.000Z/jepsen.log
**results-sorted/invalid/set/20190201T025508.000Z/jepsen.log**
results-sorted/ok/set/20190201T030157.000Z/jepsen.log
results-sorted/ok/set/20190201T030820.000Z/jepsen.log
results-sorted/ok/set/20190201T031449.000Z/jepsen.log
results-sorted/ok/set/20190201T032111.000Z/jepsen.log
results-sorted/ok/set/20190201T032719.000Z/jepsen.log
results-sorted/ok/set/20190201T033342.000Z/jepsen.log
results-sorted/ok/set/20190201T034005.000Z/jepsen.log
results-sorted/ok/set/20190201T034641.000Z/jepsen.log
results-sorted/ok/set/20190201T035257.000Z/jepsen.log
**results-sorted/invalid/set/20190201T035926.000Z/jepsen.log**

GCP cluster:

results-sorted/ok/set/20190131T232727.000Z/jepsen.log
results-sorted/ok/set/20190131T233347.000Z/jepsen.log
results-sorted/ok/set/20190131T234015.000Z/jepsen.log
results-sorted/ok/set/20190131T234649.000Z/jepsen.log
results-sorted/ok/set/20190131T235301.000Z/jepsen.log
results-sorted/ok/set/20190131T235942.000Z/jepsen.log
results-sorted/ok/set/20190201T000648.000Z/jepsen.log
results-sorted/ok/set/20190201T001328.000Z/jepsen.log
results-sorted/ok/set/20190201T002005.000Z/jepsen.log
results-sorted/ok/set/20190201T002639.000Z/jepsen.log
results-sorted/ok/set/20190201T003306.000Z/jepsen.log
results-sorted/ok/set/20190201T003922.000Z/jepsen.log
results-sorted/ok/set/20190201T004558.000Z/jepsen.log
results-sorted/ok/set/20190201T005243.000Z/jepsen.log
**results-sorted/invalid/set/20190201T005914.000Z/jepsen.log**
results-sorted/ok/set/20190201T010536.000Z/jepsen.log
results-sorted/ok/set/20190201T011157.000Z/jepsen.log
results-sorted/ok/set/20190201T011828.000Z/jepsen.log
results-sorted/ok/set/20190201T012505.000Z/jepsen.log
results-sorted/ok/set/20190201T013139.000Z/jepsen.log
results-sorted/no-history/set/20190201T013800.000Z/jepsen.log
results-sorted/ok/set/20190201T013841.000Z/jepsen.log
results-sorted/ok/set/20190201T014454.000Z/jepsen.log
results-sorted/ok/set/20190201T015125.000Z/jepsen.log
results-sorted/ok/set/20190201T015802.000Z/jepsen.log
results-sorted/no-history/set/20190201T020423.000Z/jepsen.log
results-sorted/ok/set/20190201T020503.000Z/jepsen.log
results-sorted/ok/set/20190201T021134.000Z/jepsen.log
results-sorted/ok/set/20190201T021759.000Z/jepsen.log
results-sorted/ok/set/20190201T022355.000Z/jepsen.log
results-sorted/ok/set/20190201T023029.000Z/jepsen.log
results-sorted/ok/set/20190201T023709.000Z/jepsen.log
results-sorted/ok/set/20190201T024332.000Z/jepsen.log
results-sorted/no-history/set/20190201T024947.000Z/jepsen.log
results-sorted/ok/set/20190201T025028.000Z/jepsen.log
results-sorted/ok/set/20190201T025712.000Z/jepsen.log
results-sorted/ok/set/20190201T030322.000Z/jepsen.log
results-sorted/ok/set/20190201T030954.000Z/jepsen.log
results-sorted/ok/set/20190201T031634.000Z/jepsen.log
results-sorted/ok/set/20190201T032254.000Z/jepsen.log
**results-sorted/invalid/set/20190201T032923.000Z/jepsen.log**
results-sorted/ok/set/20190201T033545.000Z/jepsen.log
results-sorted/ok/set/20190201T034211.000Z/jepsen.log
results-sorted/ok/set/20190201T034836.000Z/jepsen.log
results-sorted/ok/set/20190201T035511.000Z/jepsen.log
results-sorted/ok/set/20190201T040134.000Z/jepsen.log
results-sorted/ok/set/20190201T040813.000Z/jepsen.log
results-sorted/ok/set/20190201T041448.000Z/jepsen.log
results-sorted/ok/set/20190201T042100.000Z/jepsen.log
results-sorted/ok/set/20190201T042725.000Z/jepsen.log
**results-sorted/invalid/set/20190201T043347.000Z/jepsen.log**
results-sorted/ok/set/20190201T044023.000Z/jepsen.log
results-sorted/ok/set/20190201T044704.000Z/jepsen.log
results-sorted/ok/set/20190201T045336.000Z/jepsen.log
results-sorted/ok/set/20190201T045954.000Z/jepsen.log
results-sorted/ok/set/20190201T050619.000Z/jepsen.log
results-sorted/ok/set/20190201T051242.000Z/jepsen.log
results-sorted/ok/set/20190201T051911.000Z/jepsen.log
results-sorted/ok/set/20190201T052553.000Z/jepsen.log
results-sorted/ok/set/20190201T053212.000Z/jepsen.log
results-sorted/ok/set/20190201T053843.000Z/jepsen.log
results-sorted/ok/set/20190201T054501.000Z/jepsen.log
results-sorted/ok/set/20190201T060216.000Z/jepsen.log
results-sorted/ok/set/20190201T060852.000Z/jepsen.log
results-sorted/ok/set/20190201T061536.000Z/jepsen.log
results-sorted/ok/set/20190201T062203.000Z/jepsen.log
results-sorted/ok/set/20190201T062846.000Z/jepsen.log
results-sorted/ok/set/20190201T063524.000Z/jepsen.log
results-sorted/ok/set/20190201T064205.000Z/jepsen.log
results-sorted/ok/set/20190201T064837.000Z/jepsen.log
results-sorted/ok/set/20190201T065512.000Z/jepsen.log
results-sorted/ok/set/20190201T070140.000Z/jepsen.log
**results-sorted/invalid/set/20190201T070741.000Z/jepsen.log**
results-sorted/ok/set/20190201T071339.000Z/jepsen.log
results-sorted/ok/set/20190201T072026.000Z/jepsen.log
results-sorted/ok/set/20190201T072704.000Z/jepsen.log
**results-sorted/invalid/set/20190201T073335.000Z/jepsen.log**
@ttyusupov (Contributor) commented Feb 1, 2019

Schedule of just the invalid results:

LXC:

8 test runs
results-sorted/invalid/set/20190201T021432.000Z/jepsen.log
8 test runs
results-sorted/invalid/set/20190201T025508.000Z/jepsen.log
9 test runs
results-sorted/invalid/set/20190201T035926.000Z/jepsen.log

GCP cluster:

13 test runs
results-sorted/invalid/set/20190201T005914.000Z/jepsen.log
25 test runs
results-sorted/invalid/set/20190201T032923.000Z/jepsen.log
9 test runs
results-sorted/invalid/set/20190201T043347.000Z/jepsen.log
21 test runs
results-sorted/invalid/set/20190201T070741.000Z/jepsen.log
3 test runs
results-sorted/invalid/set/20190201T073335.000Z/jepsen.log

@kmuthukk (Collaborator) commented Feb 2, 2019

We have been able to repro this internally now, although still very intermittently, by increasing load_balancer_max_concurrent_adds from 1 (default) to 10.

From the logs we have a theory on what's going on: this appears to happen when a Raft membership change (such as during a load-balancing operation) AND some other failure (such as a network partition) happen at the same time. We are now in the process of implementing/validating the fix.
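
One way this combination can lose an acknowledged write, sketched below purely as an illustration (the peer names, member types, and quorum arithmetic are assumptions for the example, not YugabyteDB's actual consensus code):

```python
# Hypothetical illustration only -- not YugabyteDB's consensus implementation.
# Assumes each peer in the active Raft config has a member_type ("VOTER", or
# e.g. "PRE_VOTER" for a replica the load balancer is still adding) and the
# highest OpId index it has acknowledged.
from dataclasses import dataclass

@dataclass
class Peer:
    name: str
    member_type: str
    last_acked_op: int

def committed_index_buggy(peers):
    # Pre-fix behavior (per the commit below): acknowledgements are accepted
    # from ANY member of the active config. The quorum size used here (a
    # majority of the voters) is an assumption for the illustration.
    quorum = len([p for p in peers if p.member_type == "VOTER"]) // 2 + 1
    acked = sorted((p.last_acked_op for p in peers), reverse=True)
    return acked[quorum - 1]

# Example: the leader n1 is cut off from voters n2 and n3 by a partition,
# while the load balancer is adding n4 as a new (not-yet-voting) replica.
peers = [
    Peer("n1", "VOTER", 100),      # leader, has the new insert at index 100
    Peer("n2", "VOTER", 90),       # partitioned away
    Peer("n3", "VOTER", 90),       # partitioned away
    Peer("n4", "PRE_VOTER", 100),  # replica still being added, but it acks
]
print(committed_index_buggy(peers))  # 100 -- "committed" with only one voter
# n2 and n3 still form a voter majority, so after the partition heals they can
# elect a leader that never saw index 100, and the acknowledged insert is lost.
```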

yugabyte-ci pushed a commit that referenced this issue Feb 4, 2019

#824: Only count votes from VOTERS in the active config for determining replicated OpIds

Summary: When determining which OpIds have been replicated, we now only accept votes from peers that are VOTERS in the active config. Before this change we were accepting votes from any peer that was a member of the active config.

Test Plan: New TestChangeConfigBasedOnJepsen

Reviewers: timur, mikhail, amitanand, kannan

Reviewed By: amitanand, kannan

Subscribers: ybase, bharat

Differential Revision: https://phabricator.dev.yugabyte.com/D6098
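
Relative to the hypothetical sketch in the earlier comment, the fix amounts to filtering acknowledgements down to VOTERs in the active config before checking for a majority; again an illustration rather than the actual change in D6098:

```python
# Continuation of the earlier hypothetical sketch (same Peer type and peers).
def committed_index_fixed(peers):
    # Post-fix behavior: only VOTERs in the active config can advance the
    # committed OpId; the quorum is a majority of those voters.
    voters = [p for p in peers if p.member_type == "VOTER"]
    quorum = len(voters) // 2 + 1
    acked = sorted((p.last_acked_op for p in voters), reverse=True)
    return acked[quorum - 1]

# With the same four peers as before (one reachable voter at index 100, two
# partitioned voters at 90, one not-yet-voting replica at 100):
#   committed_index_buggy(peers) -> 100  (insert wrongly treated as committed)
#   committed_index_fixed(peers) -> 90   (insert waits for a second voter)
```
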
@hectorgcr (Contributor) commented Feb 5, 2019

When determining which OpIds have been replicated, we now only accept votes from peers that are VOTERS in the active config. Before this change we were accepting votes from any peer that was a member of the active config.

@hectorgcr hectorgcr closed this Feb 5, 2019

YBase features automation moved this from To Do to Done Feb 5, 2019

Jepsen Testing automation moved this from To do to Done Feb 5, 2019

@mbautin mbautin changed the title Acknowledged inserts can be present in reads for tens of seconds, then disappear [Jepsen] Acknowledged inserts can be present in reads for tens of seconds, then disappear Feb 7, 2019
