_version does not uniquely identify a particular version of a row #3711

Open
aphyr opened this Issue Jun 28, 2016 · 7 comments

aphyr commented Jun 28, 2016

Hi there!

I've been looking into crate.io recently, and I've found something that feels a little bit surprising. If you perform a number of concurrent blind updates to a 0.54.9-1~jessie cluster which experiences a network partition, it's possible for two reads of a single row to have the same _version but different values.

For instance, here are two primary-key reads (select (value, _version) from registers where id = ?) which returned different values for the same version of the row:

      [{:value 8631, :_version 8622} {:value 8625, :_version 8622}]

Or repeated reads which do not agree:

      [{:value 8570, :_version 8568}
       {:value 8570, :_version 8568}
       {:value 8572, :_version 8568}
       {:value 8570, :_version 8568}
       {:value 8572, :_version 8568}
       {:value 8570, :_version 8568}
       {:value 8572, :_version 8568}]

Crate's optimistic concurrency control docs say things like "This [version] is increased by 1 on every update" and "Querying for the correct _version ensures that no concurrent update has taken place." This suggests to me that _version should uniquely identify a particular version of a row, but the reads above show that it does not. That undermines the safety of Crate's concurrency model: even if conditional updates based on _version are safe, if clients can't agree on what value a particular _version identifies, there's no way to avoid concurrency anomalies. I imagine lost updates might be possible. This might also affect the safety of SQL update statements which do not rely explicitly on _version, for instance UPDATE foo SET visits = visits + 1, but I haven't tested those yet.
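
To make the failure mode concrete, here's a minimal sketch of the read-then-conditionally-update pattern those docs describe, using the same registers table as above (the literal WHERE _version syntax is my reading of the docs, not something I've tested exhaustively):

    -- Read the current value together with its version.
    SELECT value, _version FROM registers WHERE id = 1;
    -- Suppose this returns value = 8570, _version = 8568.

    -- Conditional update: intended to succeed only if no concurrent
    -- update has moved the row past version 8568.
    UPDATE registers SET value = 8571 WHERE id = 1 AND _version = 8568;

If two clients can observe different values for the same _version, a conditional update like this can overwrite a write its issuer never saw, which is exactly the lost-update anomaly the version check is supposed to prevent.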

You can reproduce this behavior by cloning Jepsen at b25e636f and running lein test in crate/, with the standard five-node setup; see Jepsen's docs for details. Or you should be able to reproduce it yourself, by having five clients, each bound to one host of a five-node cluster, perform concurrent writes to a single row, and having five more clients perform concurrent reads, recording their _versions. A ~200 second network partition isolating each node from two of its neighbors, forming an overlapping ring topology, appears to be sufficient to induce this behavior--but that's literally the first failure mode I tried, so there may be simpler ones.
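
Concretely, the workload on each client reduces to statements along these lines (the row id of 0 is just illustrative; the actual test picks its own keys and values):

    -- Writer clients: concurrent blind updates of a single row.
    UPDATE registers SET value = ? WHERE id = 0;

    -- Reader clients: record the value and version they observe.
    SELECT value, _version FROM registers WHERE id = 0;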

As advised, I'm using an explicit expected-nodes count, majority values for all minimum-master-limits in the config file, and I've lowered some timeouts to speed up the testing process. The table is a simple (pkey, value) table, replicated to all nodes.
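
For reference, that schema is roughly the following; the exact replication clause here is illustrative rather than the literal DDL from the test:

    -- A single-register table, fully replicated so every node holds a copy.
    CREATE TABLE registers (
        id INTEGER PRIMARY KEY,
        value INTEGER
    ) WITH (number_of_replicas = '0-all');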

I suspect this issue stems from (and also affects) whatever underlying Elasticsearch version you're using, but it's possible those problems have been resolved in 5.0.0. As a courtesy to your customers, may I recommend you adopt their resiliency status as a part of your documentation, so users know what behaviors they can expect?


joemoe commented Jun 28, 2016

Contributor

hi @aphyr, thanks for reporting the issue. we will have a look at this today and come up with a solution. we'll definitely make the behavior explicit to our users.

aphyr commented Jun 28, 2016

Hiya! I've written up a more thorough description of the test and results here: https://aphyr.com/posts/332-jepsen-crate-0-54-9-version-divergence, and the latest Jepsen commits include a more reliable failure schedule. I can also confirm that this fault manifests with simple network partitions, in addition to overlapping majorities.

seut commented Jul 5, 2016

Member

@aphyr we were able to reproduce this issue using an integration test (in addition to running the jepsen suite) and we've figured out that this issue happens with plain Elasticsearch as well (ran against ES 2.3, 5.0-alpha3 & master).
As we couldn't find the root cause of this issue yet, we've created an ES issue elastic/elasticsearch#19269; I'm hoping ;) the awesome ES folks can help track that down.

aphyr commented Jul 5, 2016

Oh this is terrific news, haha. I was thinking 5.0 would have this fixed! I'll see if I can put a little more time into this and see if we can get it to lose updates. :)

seut commented Jul 5, 2016

Member

@aphyr
if you have any information on which ES issues could be related, it would be awesome if you could share it. also your results from provoking lost updates would help a lot. thanks!

lukas-vlcek commented Jul 20, 2017

@seut Any update on this issue after you upgraded to ES 5.2.2? Are you still able to repro this issue?

jodok commented Jul 20, 2017

Member

@lukas-vlcek as we released the 2.0 CrateDB series, which includes a lot of resiliency fixes, we're planning to revisit @aphyr's test cases.
