Skip to content
Browse files
[#2125] Avoid picking read time for the serializable transactions
For backward compatibility, we have a list of capabilities, supported by tserver.
Right after master restart, while it did not receive heartbeats from tservers it does not know their capabilities.
So, when meta cache is updated tserver with empty capabilities could be stored in it.

Also, we have a feature that read time for the transaction with snapshot isolation is picked at tablet server,
while processing the first request from this transaction.

But, when iterating with tablet server that does not support picking read time,
we set transaction read time at the proxy layer (i.e. YCQL or YSQL proxy).
This particular code contained a bug and also set read time for the transaction with serializable isolation.

Read time for the transaction with serializable isolation should not be set, since
if it would be set transaction with serializable isolation could miss writes made by committed transactions.

The scenario that happened in #2125:
1) All masters are turned off.
2) Meta cache is invalidated and right after master restarted client fetches tablet server information with empty capabilities.
3) When the client tries to send the next transaction request, the buggy code is executed and read time is picked for the transaction with serializable isolation
(it is T2 transaction in #2125).
4) Operation executed for this transaction missed values that were recently committed by other transactions. Following by incorrect state.

So actually it is not just serializable violation but could cause more serious consistency bugs.

Fixed by avoid setting read time for the transaction with serializable isolation.

Extended PgLibPqTest.OnConflict test:
1) Add reads to transaction
2) Find order cycles

Added PgLibPqTest.OnConlictWithKillMaster.

Test Plan: ybd --gtest_filter PgLibPqTest.OnConflict -n 100

Reviewers: mikhail

Reviewed By: mikhail

Subscribers: ybase

Differential Revision:
  • Loading branch information
spolitov committed Aug 24, 2019
1 parent 00eebc4 commit 4e87446657e6b77fe5bd6b4161b3433e8d7ac24a
Showing with 271 additions and 35 deletions.
  1. +13 −2 src/yb/client/
  2. +8 −3 src/yb/tablet/
  3. +250 −30 src/yb/yql/pgwrapper/
@@ -255,6 +255,7 @@ AsyncRpcBase<Req, Resp>::AsyncRpcBase(AsyncRpcData* data, YBConsistencyLevel con
const ConsistentReadPoint* read_point = batcher_->read_point();
bool has_read_time = false;
if (read_point) {
// Set read time for consistent read only if the table is transaction-enabled and
@@ -263,13 +264,18 @@ AsyncRpcBase<Req, Resp>::AsyncRpcBase(AsyncRpcData* data, YBConsistencyLevel con
table()->InternalSchema().table_properties().is_transactional()) {
auto read_time = read_point->GetReadTime(tablet_invoker_.tablet()->tablet_id());
if (read_time) {
has_read_time = true;
auto& transaction_metadata = batcher_->transaction_metadata();
if (!transaction_metadata.transaction_id.is_nil()) {
SetTransactionMetadata(transaction_metadata, batcher_->may_have_metadata(), &req_);
bool serializable = transaction_metadata.isolation == IsolationLevel::SERIALIZABLE_ISOLATION;
LOG_IF(DFATAL, has_read_time && serializable)
<< "Read time should NOT be specified for serializable isolation: "
<< read_point->GetReadTime().ToString();
@@ -304,8 +310,13 @@ void AsyncRpcBase<Req, Resp>::SendRpcToTserver() {
if (!tablet_invoker_.current_ts().HasCapability(CAPABILITY_PickReadTimeAtTabletServer)) {
ConsistentReadPoint* read_point = batcher_->read_point();
if (read_point && !read_point->GetReadTime()) {
auto txn = batcher_->transaction();
// If txn is not set, this is a consistent scan across multiple tablets of a
// non-transactional YCQL table.
if (!txn || txn->isolation() == IsolationLevel::SNAPSHOT_ISOLATION) {

@@ -1857,9 +1857,14 @@ Status Tablet::StartDocWriteOperation(WriteOperation* operation) {
if (!read_time) {
auto safe_time = SafeTime(RequireLease::kTrue);
read_time = ReadHybridTime::FromHybridTimeRange({safe_time, clock_->NowRange().second});
} else if (prepare_result.need_read_snapshot) {
DSCHECK_NE(isolation_level, IsolationLevel::SERIALIZABLE_ISOLATION, InvalidArgument,
"Read time should NOT be specified for serializable isolation level");
} else if (prepare_result.need_read_snapshot &&
isolation_level == IsolationLevel::SERIALIZABLE_ISOLATION) {
auto status = STATUS_FORMAT(
"Read time should NOT be specified for serializable isolation level: $0",
LOG(DFATAL) << status;
return status;

0 comments on commit 4e87446

Please sign in to comment.