After a period of network partitions, Dgraph 1.0.3-dev (5563bd2) can wind up stuck in a mode where every transaction that attempts to modify a key conflicts immediately (i.e. on mutate, not on commit). Using the schema
value: [int] .
... we build up a set of integers by performing mutations associating each integer with a fixed UID; e.g. to insert the number 5, we execute a transaction with a single mutation:
{uid: "0x01"
value: 5}
To read this set, we query for all values associated with that UID using { q(func: uid($u)) { uid, value } }. As we saw in #2152, this appears to show a stale state of the DB: all values up to some number are present, then every subsequent acknowledged value is missing. To distinguish between stale reads and lost updates, we follow that read, in the same transaction, with a sequence of inserts or deletes to the exact triples which we believe were successfully inserted--if we read the 5, we insert {uid: "0x01", value: 5}, and if we failed to read 5, we delete {uid: "0x01", value: 5} instead. These update transactions fail immediately with a conflict.
INFO [2018-02-22 10:45:23,629] jepsen worker 0 - jepsen.util 140 :invoke :read nil
INFO [2018-02-22 10:45:23,631] jepsen worker 0 - jepsen.dgraph.set Forcing conflict by deleting 0
INFO [2018-02-22 10:45:23,678] jepsen worker 0 - jepsen.util 140 :fail :read nil :conflict
INFO [2018-02-22 10:45:23,849] jepsen worker 9 - jepsen.util 169 :invoke :read nil
INFO [2018-02-22 10:45:23,851] jepsen worker 9 - jepsen.dgraph.set Forcing conflict by deleting 0
INFO [2018-02-22 10:45:23,875] jepsen worker 9 - jepsen.util 169 :fail :read nil :conflict
INFO [2018-02-22 10:45:25,000] jepsen worker 6 - jepsen.util 106 :invoke :read nil
INFO [2018-02-22 10:45:25,002] jepsen worker 6 - jepsen.dgraph.set Forcing conflict by deleting 0
INFO [2018-02-22 10:45:25,048] jepsen worker 6 - jepsen.util 106 :fail :read nil :conflict
INFO [2018-02-22 10:45:30,837] jepsen worker 7 - jepsen.util 157 :invoke :read nil
INFO [2018-02-22 10:45:30,839] jepsen worker 7 - jepsen.dgraph.set Forcing conflict by deleting 0
INFO [2018-02-22 10:45:30,885] jepsen worker 7 - jepsen.util 157 :fail :read nil :conflict
INFO [2018-02-22 10:45:38,448] jepsen worker 3 - jepsen.util 153 :invoke :read nil
INFO [2018-02-22 10:45:38,450] jepsen worker 3 - jepsen.dgraph.set Forcing conflict by deleting 0
INFO [2018-02-22 10:45:38,481] jepsen worker 3 - jepsen.util 153 :fail :read nil :conflict
This state appears to persist indefinitely--in an hour without any network disruption, and no other transactions, every update failed in this way.
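The read-then-write probe described above can be sketched roughly as follows. This is only an illustration of the planning step, not the Jepsen test's actual code (which is written in Clojure) and not a Dgraph client call: the function name `plan_followup_mutations` and the `op`/`triple` dictionary shape are hypothetical, chosen for this example.

```python
from typing import Any, Dict, Iterable, List

UID = "0x01"  # the fixed UID used in the examples in this issue


def plan_followup_mutations(
    read_values: Iterable[int],
    believed_inserted: Iterable[int],
    uid: str = UID,
) -> List[Dict[str, Any]]:
    """Plan the writes that follow a read inside the same transaction.

    For every value we believe was successfully inserted:
      - if the read returned it, re-insert the same triple;
      - if the read did not return it, delete that triple.
    Either way the transaction now writes to the key it just read, so a
    concurrent committed writer should surface as a conflict.
    """
    seen = set(read_values)
    ops = []
    for value in sorted(set(believed_inserted)):
        kind = "set" if value in seen else "delete"
        ops.append({"op": kind, "triple": {"uid": uid, "value": value}})
    return ops


if __name__ == "__main__":
    # The read returned 0..2, but 0..4 were acknowledged earlier.
    for op in plan_followup_mutations([0, 1, 2], [0, 1, 2, 3, 4]):
        print(op)
```

In the log excerpt above, the read returned nil, so under this scheme the only planned write is a delete of 0, which matches the "Forcing conflict by deleting 0" lines.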
In an optimistic concurrency control system, we would expect these updates to fail if another transaction modified and committed that key some time after our update transaction began and before it completed. If transaction start times were allocated sequentially, the conflict-failure of n sequential updates to the same key implies the existence of n ongoing update transactions affecting that key, but in this test, we have no such evidence. There are any number of timed-out transactions that might be applied just in time to cause these failures, but eventually we should exhaust those.
Alternatively, these update transactions could be obtaining starting timestamps from some point far in the past, such that they conflict with an update transaction that completed long ago.
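To make the two hypotheses concrete, here is a toy first-committer-wins model. It is a sketch under stated assumptions, not Dgraph's implementation: `ToyStore`, `try_update`, and the single shared counter used for start and commit timestamps are all invented for illustration. With fresh, sequentially allocated start timestamps, each failure needs its own intervening commit; with a start timestamp stuck in the past, every update conflicts with one old commit.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class Conflict(Exception):
    """Raised when a key was committed after the transaction started."""


@dataclass
class ToyStore:
    """Toy optimistic-concurrency store (hypothetical, not Dgraph)."""

    clock: int = 0
    last_commit: Dict[str, int] = field(default_factory=dict)

    def next_ts(self) -> int:
        self.clock += 1
        return self.clock

    def begin(self) -> int:
        # Start timestamps are allocated sequentially from the shared clock.
        return self.next_ts()

    def mutate(self, key: str, start_ts: int) -> None:
        # Conflict check at mutate time: did anyone commit this key
        # after our transaction started?
        if self.last_commit.get(key, 0) > start_ts:
            raise Conflict(f"{key} committed after start_ts={start_ts}")

    def commit(self, key: str, start_ts: int) -> int:
        self.mutate(key, start_ts)
        commit_ts = self.next_ts()
        self.last_commit[key] = commit_ts
        return commit_ts


def try_update(store: ToyStore, key: str, start_ts: Optional[int] = None) -> bool:
    """Run one update; return True on commit, False on conflict."""
    ts = store.begin() if start_ts is None else start_ts
    try:
        store.commit(key, ts)
        return True
    except Conflict:
        return False


if __name__ == "__main__":
    store = ToyStore()
    key = "0x01/value"

    # Sequential updates with fresh timestamps: no concurrent writer, no conflict.
    print([try_update(store, key) for _ in range(5)])

    # A fresh transaction fails only if another commit lands after it began.
    a = store.begin()
    try_update(store, key)                   # interleaved committed writer
    print(try_update(store, key, start_ts=a))

    # A start timestamp from far in the past conflicts every time,
    # with no concurrent writers at all.
    print([try_update(store, key, start_ts=1) for _ in range(5)])
```

In this model the second scenario is the only one that produces an unbounded run of conflicts with no other transactions in flight, which is the pattern reported above.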
To reproduce this behavior, try Jepsen d8bb86a5219d17abe5a4125581350571c5ffe209, and run
lein run test -f --package-url https://github.com/dgraph-io/dgraph/releases/download/nightly/dgraph-linux-amd64.tar.gz -w uid-set --time-limit 120 --nemesis partition-random-halves --concurrency 2n --test-count 10