Snapshot isolation violation with tablet moves #4534

Closed
aphyr opened this issue Jan 9, 2020 · 2 comments

@aphyr commented Jan 9, 2020

Immediately following tablet move operations, version 1.1.1 can exhibit transient snapshot isolation violations in the Jepsen bank test. @manishrjain says that there are known issues with tablet moves in 1.1.1, so this might be fixed already in master. I'm just writing this up so we have a record!

What version of Dgraph are you using?

1.1.1

Have you tried reproducing the issue with the latest release?

1.1.1 is the latest official release; I haven't seen this on 1.1.1-48-g157896305 yet.

What is the hardware spec (RAM, OS)?

A 5-node EC2 m4.large cluster.

Steps to reproduce the issue (command/config used to run Dgraph).

With Jepsen at commit 3bff032adf3a4277e5cbbc2cd05ecec90c69f61e, run:

lein run test --version 1.1.1 --concurrency 2n --nemesis move-tablet --time-limit 300 -w bank

Expected behaviour and actual result.

With a starting total of $100 across all accounts, we expect every read to observe a total of $100. However, immediately following move-tablet operations (grey vertical lines), clients can temporarily observe totals as low as $45. Because these anomalies are transient, I think they're probably confined to the read path; there's no evidence thus far that updates can permanently alter the total amount of money across all accounts.

[Image: bank (9)]

20200109T200137.000Z.zip
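
The invariant described above can be sketched in a few lines. This is not Jepsen's actual checker (which is written in Clojure); it is a minimal illustration that assumes each read is represented as a mapping from account id to balance. The function name `find_violations` and the sample reads are hypothetical. Because transfers only move money between accounts, any read taken from a consistent snapshot should sum to the starting total, so a read with a different sum is flagged.

```python
from typing import Dict, Iterable, List, Tuple

EXPECTED_TOTAL = 100  # starting total stated in the report


def find_violations(
    reads: Iterable[Dict[int, int]],
    expected_total: int = EXPECTED_TOTAL,
) -> List[Tuple[int, int]]:
    """Return (read index, observed total) for each read whose balances
    do not sum to expected_total."""
    violations = []
    for i, balances in enumerate(reads):
        total = sum(balances.values())
        if total != expected_total:
            violations.append((i, total))
    return violations


# Made-up sample data, not taken from the attached test results.
reads = [
    {0: 40, 1: 60},  # consistent snapshot: total 100
    {0: 45, 1: 0},   # inconsistent read: total 45
    {0: 30, 1: 70},  # consistent again, as expected for a transient anomaly
]

print(find_violations(reads))  # [(1, 45)]
```

A transient violation shows up as flagged reads clustered after a tablet move, with later reads returning to the expected total.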

@MichelDiz added the area/operations label Jan 10, 2020

@manishrjain (Contributor) commented Jan 11, 2020

PR #4496 fixes this issue. The issue was only with read transactions.

Commit ec44550 passes the bank test with the move-tablet nemesis. @danielmai to confirm.

@danielmai added the area/testing/jepsen label and removed the area/operations label Jan 11, 2020

@danielmai (Contributor) commented Jan 12, 2020

The release/v1.1 branch (e18986f) contains the fix in #4496 and will be released as v1.1.2.

Dgraph version   : v1.1.1-56-ge18986f1c
Dgraph SHA-256   : 36eecf05e86803b38a4a77da4a2410b11a7ec8d1c558d661a8024e4d0914d919
Commit SHA-1     : e18986f1c
Commit timestamp : 2020-01-11 17:49:16 -0800
Branch           : release/v1.1
Go version       : go1.13.5

This build passes 10 individual 600-second runs of the bank test with the move-tablet nemesis.

20200112T145350.000Z.zip
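
A repeated-run check like the one described above could be scripted roughly as follows. This sketch reuses only the flags from the command in the original report. The helper names are hypothetical, the thread does not say how the unreleased build was supplied to Jepsen (so `version` is just a placeholder argument), and treating a non-zero exit code as a failed run is an assumption.

```python
import subprocess
from typing import List


def bank_test_command(version: str, time_limit: int) -> List[str]:
    """Build the argv for one bank-test run with the move-tablet nemesis,
    using the same flags as the command in the original report."""
    return [
        "lein", "run", "test",
        "--version", version,
        "--concurrency", "2n",
        "--nemesis", "move-tablet",
        "--time-limit", str(time_limit),
        "-w", "bank",
    ]


def run_repeatedly(version: str, runs: int = 10, time_limit: int = 600) -> List[int]:
    """Run the test `runs` times in sequence and return each exit code.
    Assumption: a non-zero exit code indicates a failed or invalid run."""
    cmd = bank_test_command(version, time_limit)
    return [subprocess.run(cmd).returncode for _ in range(runs)]


if __name__ == "__main__":
    # Print the command rather than running it; call run_repeatedly()
    # from the Jepsen dgraph test directory to actually execute it.
    print(" ".join(bank_test_command("1.1.1", 600)))
```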
