Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Jepsen] Acknowledged inserts can be present in reads for tens of seconds, then disappear #824

Closed
aphyr opened this issue Jan 31, 2019 · 4 comments
Assignees
Labels
community/request Issues created by external users kind/enhancement This is an enhancement of an existing feature

Comments

@aphyr
Copy link

aphyr commented Jan 31, 2019

In Jepsen set tests, on a five-node debian cluster running 1.1.10.0, we can observe the apparent loss of acknowledged writes after randomized 3/2 network partitions. Writes succeed, are visible to every subsequent read operation, then, after some critical juncture, are missing from every subsequent read, despite that read observing all other records written before and after the missing element. The insertion of these elements, and the time they appear to go missing, appear correlated with the end of a network partition and the later recovery of the cluster to a healthy state.

latency-raw 83

For instance, in this test, element 427 is written successfully from 81.8 to 92.21 seconds, during which the nemesis heals a network partition. It is present in every subsequent read from 98.4 seconds to 141.1 seconds, when the cluster happens to be recovering from a network partition which ended about 15 seconds prior; it is missing from 142 seconds through the end of the test, at 185 seconds.

latency-raw 84

In this test, element 1063 is written successfully from 144.18 to 144.22 seconds, right when the cluster is recovering from a partition which ended ~15 seconds prior. Element 1063 is present immediately in subsequent reads at 144.24 seconds, and persists until 269 seconds, a read which completes immediately before the nemesis isolates n1 and n4 from n2, n3, and n5. The next read, at 303 seconds, fails to observe element 1063, and it remains missing for the remainder of the test. The final read occurs at 308 seconds.

latency-raw 85

In this test, element 5894 was inserted successfully from 300.55 to 300.57 seconds, just as the cluster is beginning to recover from a previous network partition. 5894 is present in subsequent reads at 304.2 seconds, and is last observed at 346.4 seconds, just prior to a partition isolating n3 and n5 from n1, n2, and n4. That partition ends at 406 seconds, and at 423 seconds, reads begin again--but none observe element 5894.

@kmuthukk kmuthukk added the kind/enhancement This is an enhancement of an existing feature label Jan 31, 2019
@kmuthukk kmuthukk added this to To Do in YBase features via automation Jan 31, 2019
@kmuthukk kmuthukk added this to To do in Jepsen Testing via automation Jan 31, 2019
@ttyusupov
Copy link
Contributor

Wasn't able to reproduce for 92 test runs with following settings:

GCP n1-standard-8 host with 5 LXC YugaByte nodes, OS: Debian Jessie
lein run test --os debian --version 1.1.10.0 --test set --time-limit 300 --concurrency 5 --nemesis partition-random-halves

Also wasn't able to reproduce for 20 test runs with the same settings on the following config:

Jepsen control node: GCP n1-standard-4 instance, OS: CentOS 7
5 YugaByte GCP n1-standard-2 instances, OS: Debian Jessie

But after I changed tserver flag load_balancer_max_concurrent_adds to 10 I was able to reproduce on both.

LXC:

results-sorted/ok/set/20190201T012307.000Z/jepsen.log
results-sorted/ok/set/20190201T012943.000Z/jepsen.log
results-sorted/ok/set/20190201T013558.000Z/jepsen.log
results-sorted/ok/set/20190201T014231.000Z/jepsen.log
results-sorted/ok/set/20190201T014852.000Z/jepsen.log
results-sorted/ok/set/20190201T015456.000Z/jepsen.log
results-sorted/ok/set/20190201T020130.000Z/jepsen.log
results-sorted/ok/set/20190201T020801.000Z/jepsen.log
**results-sorted/invalid/set/20190201T021432.000Z/jepsen.log**
results-sorted/ok/set/20190201T022101.000Z/jepsen.log
results-sorted/no-history/set/20190201T022744.000Z/jepsen.log
results-sorted/ok/set/20190201T022824.000Z/jepsen.log
results-sorted/ok/set/20190201T023454.000Z/jepsen.log
results-sorted/no-history/set/20190201T024054.000Z/jepsen.log
results-sorted/ok/set/20190201T024135.000Z/jepsen.log
results-sorted/ok/set/20190201T024758.000Z/jepsen.log
results-sorted/no-history/set/20190201T025427.000Z/jepsen.log
**results-sorted/invalid/set/20190201T025508.000Z/jepsen.log**
results-sorted/ok/set/20190201T030157.000Z/jepsen.log
results-sorted/ok/set/20190201T030820.000Z/jepsen.log
results-sorted/ok/set/20190201T031449.000Z/jepsen.log
results-sorted/ok/set/20190201T032111.000Z/jepsen.log
results-sorted/ok/set/20190201T032719.000Z/jepsen.log
results-sorted/ok/set/20190201T033342.000Z/jepsen.log
results-sorted/ok/set/20190201T034005.000Z/jepsen.log
results-sorted/ok/set/20190201T034641.000Z/jepsen.log
results-sorted/ok/set/20190201T035257.000Z/jepsen.log
**results-sorted/invalid/set/20190201T035926.000Z/jepsen.log**

GCP cluster:

results-sorted/ok/set/20190131T232727.000Z/jepsen.log
results-sorted/ok/set/20190131T233347.000Z/jepsen.log
results-sorted/ok/set/20190131T234015.000Z/jepsen.log
results-sorted/ok/set/20190131T234649.000Z/jepsen.log
results-sorted/ok/set/20190131T235301.000Z/jepsen.log
results-sorted/ok/set/20190131T235942.000Z/jepsen.log
results-sorted/ok/set/20190201T000648.000Z/jepsen.log
results-sorted/ok/set/20190201T001328.000Z/jepsen.log
results-sorted/ok/set/20190201T002005.000Z/jepsen.log
results-sorted/ok/set/20190201T002639.000Z/jepsen.log
results-sorted/ok/set/20190201T003306.000Z/jepsen.log
results-sorted/ok/set/20190201T003922.000Z/jepsen.log
results-sorted/ok/set/20190201T004558.000Z/jepsen.log
results-sorted/ok/set/20190201T005243.000Z/jepsen.log
**results-sorted/invalid/set/20190201T005914.000Z/jepsen.log**
results-sorted/ok/set/20190201T010536.000Z/jepsen.log
results-sorted/ok/set/20190201T011157.000Z/jepsen.log
results-sorted/ok/set/20190201T011828.000Z/jepsen.log
results-sorted/ok/set/20190201T012505.000Z/jepsen.log
results-sorted/ok/set/20190201T013139.000Z/jepsen.log
results-sorted/no-history/set/20190201T013800.000Z/jepsen.log
results-sorted/ok/set/20190201T013841.000Z/jepsen.log
results-sorted/ok/set/20190201T014454.000Z/jepsen.log
results-sorted/ok/set/20190201T015125.000Z/jepsen.log
results-sorted/ok/set/20190201T015802.000Z/jepsen.log
results-sorted/no-history/set/20190201T020423.000Z/jepsen.log
results-sorted/ok/set/20190201T020503.000Z/jepsen.log
results-sorted/ok/set/20190201T021134.000Z/jepsen.log
results-sorted/ok/set/20190201T021759.000Z/jepsen.log
results-sorted/ok/set/20190201T022355.000Z/jepsen.log
results-sorted/ok/set/20190201T023029.000Z/jepsen.log
results-sorted/ok/set/20190201T023709.000Z/jepsen.log
results-sorted/ok/set/20190201T024332.000Z/jepsen.log
results-sorted/no-history/set/20190201T024947.000Z/jepsen.log
results-sorted/ok/set/20190201T025028.000Z/jepsen.log
results-sorted/ok/set/20190201T025712.000Z/jepsen.log
results-sorted/ok/set/20190201T030322.000Z/jepsen.log
results-sorted/ok/set/20190201T030954.000Z/jepsen.log
results-sorted/ok/set/20190201T031634.000Z/jepsen.log
results-sorted/ok/set/20190201T032254.000Z/jepsen.log
**results-sorted/invalid/set/20190201T032923.000Z/jepsen.log**
results-sorted/ok/set/20190201T033545.000Z/jepsen.log
results-sorted/ok/set/20190201T034211.000Z/jepsen.log
results-sorted/ok/set/20190201T034836.000Z/jepsen.log
results-sorted/ok/set/20190201T035511.000Z/jepsen.log
results-sorted/ok/set/20190201T040134.000Z/jepsen.log
results-sorted/ok/set/20190201T040813.000Z/jepsen.log
results-sorted/ok/set/20190201T041448.000Z/jepsen.log
results-sorted/ok/set/20190201T042100.000Z/jepsen.log
results-sorted/ok/set/20190201T042725.000Z/jepsen.log
**results-sorted/invalid/set/20190201T043347.000Z/jepsen.log**
results-sorted/ok/set/20190201T044023.000Z/jepsen.log
results-sorted/ok/set/20190201T044704.000Z/jepsen.log
results-sorted/ok/set/20190201T045336.000Z/jepsen.log
results-sorted/ok/set/20190201T045954.000Z/jepsen.log
results-sorted/ok/set/20190201T050619.000Z/jepsen.log
results-sorted/ok/set/20190201T051242.000Z/jepsen.log
results-sorted/ok/set/20190201T051911.000Z/jepsen.log
results-sorted/ok/set/20190201T052553.000Z/jepsen.log
results-sorted/ok/set/20190201T053212.000Z/jepsen.log
results-sorted/ok/set/20190201T053843.000Z/jepsen.log
results-sorted/ok/set/20190201T054501.000Z/jepsen.log
results-sorted/ok/set/20190201T060216.000Z/jepsen.log
results-sorted/ok/set/20190201T060852.000Z/jepsen.log
results-sorted/ok/set/20190201T061536.000Z/jepsen.log
results-sorted/ok/set/20190201T062203.000Z/jepsen.log
results-sorted/ok/set/20190201T062846.000Z/jepsen.log
results-sorted/ok/set/20190201T063524.000Z/jepsen.log
results-sorted/ok/set/20190201T064205.000Z/jepsen.log
results-sorted/ok/set/20190201T064837.000Z/jepsen.log
results-sorted/ok/set/20190201T065512.000Z/jepsen.log
results-sorted/ok/set/20190201T070140.000Z/jepsen.log
**results-sorted/invalid/set/20190201T070741.000Z/jepsen.log**
results-sorted/ok/set/20190201T071339.000Z/jepsen.log
results-sorted/ok/set/20190201T072026.000Z/jepsen.log
results-sorted/ok/set/20190201T072704.000Z/jepsen.log
**results-sorted/invalid/set/20190201T073335.000Z/jepsen.log**

@ttyusupov
Copy link
Contributor

Just invalid results schedule:
LXC

8 test runs
results-sorted/invalid/set/20190201T021432.000Z/jepsen.log
8 test runs
results-sorted/invalid/set/20190201T025508.000Z/jepsen.log
9 test runs
results-sorted/invalid/set/20190201T035926.000Z/jepsen.log

GCP cluster:

13 test runs
results-sorted/invalid/set/20190201T005914.000Z/jepsen.log
25 test runs
results-sorted/invalid/set/20190201T032923.000Z/jepsen.log
9 test runs
results-sorted/invalid/set/20190201T043347.000Z/jepsen.log
21 test runs
results-sorted/invalid/set/20190201T070741.000Z/jepsen.log
3 test runs
results-sorted/invalid/set/20190201T073335.000Z/jepsen.log

@kmuthukk
Copy link
Collaborator

kmuthukk commented Feb 2, 2019

We have been able to repro this internally now, although still very very intermittently, by increasing load_balancer_max_concurrent_adds from 1 (default) to 10.

From the logs we have a theory on what's going on. This appears to happen when raft membership changes (such as during a load balancing operation) AND some other failure (such as a network partition) happen at the same time. Based on the logs, we have a theory on this, and are now in the process of implementing/validating the fix.

yugabyte-ci pushed a commit that referenced this issue Feb 4, 2019
…ng replicated OpIds

Summary: When determining which OpIds have been replicated, we now only accept votes from peers that are VOTERS in the active config. Before this change we were accepting votes from any peer that was a member of the active config.

Test Plan: New TestChangeConfigBasedOnJepsen

Reviewers: timur, mikhail, amitanand, kannan

Reviewed By: amitanand, kannan

Subscribers: ybase, bharat

Differential Revision: https://phabricator.dev.yugabyte.com/D6098
@hectorgcr
Copy link
Contributor

When determining which OpIds have been replicated, we now only accept votes from peers that are VOTERS in the active config. Before this change we were accepting votes from any peer that was a member of the active config.

YBase features automation moved this from To Do to Done Feb 5, 2019
Jepsen Testing automation moved this from To do to Done Feb 5, 2019
@mbautin mbautin changed the title Acknowledged inserts can be present in reads for tens of seconds, then disappear [Jepsen] Acknowledged inserts can be present in reads for tens of seconds, then disappear Feb 7, 2019
@yugabyte-ci yugabyte-ci added the community/request Issues created by external users label Jul 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community/request Issues created by external users kind/enhancement This is an enhancement of an existing feature
Projects
Development

No branches or pull requests

6 participants