New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client-side sequencing violations around predicate migration #2358

Closed
aphyr opened this Issue Apr 25, 2018 · 3 comments

Comments

Projects
None yet
2 participants
@aphyr

aphyr commented Apr 25, 2018

On 1.0.4, and to a lesser extent on b9169b8 (1.0.5-dev, 2018-04-24 12:28:49 +1000 nightly), client-side sequencing fails to provide order during predicate migrations, even over single records. In a workload where clients concurrently read, increment, and write back a single counter, a single client can observe that counter going backwards over time.

In the sequential test, we create a schema with an integer indexed key (using the @upsert directive), and an integer value. We then increment that counter by reading its value by key, incrementing the observed value, and writing that value back, all inside a transaction. Since increment transactions write every value they read, increment transactions must serialize under snapshot isolation, and the value of that record should rise with increasing SI timestamps.

The design concept doc states that with client-side sequencing...

In short, this map ensures that updates made by the client, or seen by the client, would never be unseen; in fact, they would be visible in a sequential order.

Since every update increase the value of the counter, no client should observe the value decrease.

Unfortunately, on 1.0.4, in the presence of predicate migrations, single clients do exactly that. In 20180424T215433.000-0500.zip, we observe successive pairs of operations where the value drops. Here, process 0 sees the value drop from 183 to 178:

      [{:type :ok,
        :f :read,
        :value 183,
        :process 0,
        :time 54094798427,
        :index 8015}
       {:type :ok,
        :f :inc,
        :value 178,
        :process 0,
        :time 54226456497,
        :index 8035}]

We can plot the values observed by each process over time, and observe processes zipping backwards and forwards in time for multiple seconds:

sequential 0 2

Version b9169b8 (v1.0.5-dev, 2018-04-24 12:28:49 +1000 nightly) reduces the magnitude and frequency of non-monotonic fluctuations, but they are still readily reproducible in a one-minute test:
20180424T205443.000-0500.zip

sequential 0 3

You can reproduce this with Jepsen f1436557a3b754b1c455c601aa39196f2c7e4f5b by running

lein run test --package-url https://github.com/dgraph-io/dgraph/releases/download/v1.0.4/dgraph-linux-amd64.tar.gz --force-download --time-limit 60 --concurrency 1n --workload sequential --sequencing client --upsert-schema --nemesis move-tablet --test-count 5

@aphyr aphyr changed the title from Client-side sequencing exhibits non-monotonicity during predicate migration to Client-side sequencing violations during predicate migration Apr 25, 2018

@aphyr aphyr changed the title from Client-side sequencing violations during predicate migration to Client-side sequencing violations around predicate migration Apr 25, 2018

@manishrjain manishrjain added the bug label Apr 25, 2018

@manishrjain

This comment has been minimized.

Show comment
Hide comment
@manishrjain

manishrjain May 18, 2018

Member

Ran the above command. This one looks like it has been fixed too. Running it with 600s timeout now.

Member

manishrjain commented May 18, 2018

Ran the above command. This one looks like it has been fixed too. Running it with 600s timeout now.

@manishrjain

This comment has been minimized.

Show comment
Hide comment
@manishrjain

manishrjain May 18, 2018

Member

Got 3 consecutive successful runs with a 600-second timeout. Think we got this.

Member

manishrjain commented May 18, 2018

Got 3 consecutive successful runs with a 600-second timeout. Think we got this.

@aphyr

This comment has been minimized.

Show comment
Hide comment
@aphyr

aphyr May 22, 2018

This looks good after a few hours of further testing.

aphyr commented May 22, 2018

This looks good after a few hours of further testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment