One CQL call can lead to multiple writes to DocDB due to internal retries #537

Closed
ttyusupov opened this issue Nov 7, 2018 · 1 comment

@ttyusupov
Contributor

commented Nov 7, 2018

A single CQL call can result in multiple writes to DocDB with different timestamps because of retries inside TabletInvoker.
As a result, a sequence of two CQL calls (write 4, then cas 4->1) can leave three values in DocDB:

SubDocKey(DocKey(0xba22, [0], []), [ColumnId(1); HT{ physical: 1540796350911062 w: 1 }]) -> 4
SubDocKey(DocKey(0xba22, [0], []), [ColumnId(1); HT{ physical: 1540796350842148 }]) -> 1
SubDocKey(DocKey(0xba22, [0], []), [ColumnId(1); HT{ physical: 1540796338616427 w: 1 }]) -> 4

A concurrent reader then observes the values 4, 1, 4.
Jepsen treats this as a non-linearizable history, which fails the single-key ACID test.
This happens under the kill-start-tserver nemesis, where tservers are randomly killed and restarted during the workload.
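
To make the anomaly concrete, here is a minimal Python sketch that replays the three DocDB entries above as a multi-version register and reads the newest visible value at each timestamp. It is only an illustration: the `read_at` helper is hypothetical, and the model keeps just the physical component of each hybrid time (the `w: 1` write id is ignored).

```python
from bisect import bisect_right

# (physical hybrid-time component, value) pairs taken from the DocDB dump above.
versions = sorted([
    (1540796338616427, 4),  # original "write 4"
    (1540796350842148, 1),  # "cas 4 -> 1"
    (1540796350911062, 4),  # "write 4" applied a second time after a retry
])

def read_at(read_time):
    """Return the newest value with timestamp <= read_time, or None if there is none."""
    times = [t for t, _ in versions]
    idx = bisect_right(times, read_time)
    return versions[idx - 1][1] if idx else None

# A reader that reads just as each version becomes visible.
observed = [read_at(t) for t, _ in versions]
print(observed)  # [4, 1, 4]
```

The client issued "write 4" only once, before the cas, so no linearizable ordering of the two calls explains a read of 4 after a read of 1.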

@ttyusupov ttyusupov added this to To Do in YBase features via automation Nov 7, 2018

@ttyusupov ttyusupov moved this from To Do to In progress in YBase features Nov 7, 2018

@kmuthukk kmuthukk added this to To do in Jepsen Testing via automation Nov 7, 2018

@kmuthukk kmuthukk added the bug label Nov 7, 2018

yugabyte-ci pushed a commit that referenced this issue Nov 30, 2018

ENG-4123: #537: Ignore duplicate write requests
Summary:
The YQL proxy could retry a write RPC to a tserver when it did not receive a response to the previous call.
In some scenarios, however, the previous call had in fact been applied successfully. The retry would then duplicate
the write, with the second write happening outside the time window in which the client expects the operation
to take effect, which could break linearizability.

This diff adds a request id to each write RPC, so that write RPCs belonging to the same write request can be identified
and duplicates can be ignored.

Test Plan:
ybd --cxx-test ql-stress-test --gtest_filter QLStressTest.RetryWrites
ybd --cxx-test ql-stress-test --gtest_filter QLStressTest.RetryWritesWithRestarts

Reviewers: timur, mikhail

Reviewed By: mikhail

Subscribers: bogdan, ybase, bharat

Differential Revision: https://phabricator.dev.yugabyte.com/D5660
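
The request-id idea from the commit summary above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual tserver code: the `Tablet` class, the `(client_id, request_id)` key, and the logical clock are all hypothetical names chosen for the example, and the cas is modeled as a plain write.

```python
class Tablet:
    """Toy tablet that ignores write RPCs whose request id was already applied."""

    def __init__(self):
        self.log = []    # applied writes as (logical_time, value)
        self.seen = {}   # (client_id, request_id) -> logical_time of first apply
        self.clock = 0   # stand-in for the hybrid clock (assumption)

    def write(self, client_id, request_id, value):
        key = (client_id, request_id)
        if key in self.seen:
            # Duplicate of an already applied request: do not write again,
            # just report the time of the original apply.
            return self.seen[key]
        self.clock += 1
        self.log.append((self.clock, value))
        self.seen[key] = self.clock
        return self.clock


tablet = Tablet()
first = tablet.write("client-a", 1, 4)   # "write 4"; the response is lost
tablet.write("client-a", 2, 1)           # "cas 4 -> 1", modeled as a plain write
retry = tablet.write("client-a", 1, 4)   # proxy retries "write 4" with the same id

print([value for _, value in tablet.log])  # [4, 1]
print(first == retry)                      # True
```

With the duplicate ignored, the stored history stays 4, 1, so a reader can no longer see the value return to 4 after the cas.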
@kmuthukk

Collaborator

commented Dec 4, 2018

Nice work @spolitov !

Fixed in 023c20a

@kmuthukk kmuthukk closed this Dec 4, 2018

YBase features automation moved this from In progress to Done Dec 4, 2018

Jepsen Testing automation moved this from To do to Done Dec 4, 2018
