Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Support locking options in SELECT statement (SELECT FOR UPDATE etc.) #1199

Open
ndeodhar opened this issue Apr 11, 2019 · 13 comments

Comments

@ndeodhar
Copy link
Contributor

commented Apr 11, 2019

(This is needed for Yugaware and it also blocks FOREIGN KEY constraints in snapshot isolation -- i.e. non-serializable).
Support FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE or FOR KEY SHARE locking options in SELECT statement.

@ndeodhar ndeodhar added the area/sql label Apr 11, 2019
@ndeodhar ndeodhar added this to To do in SQL Support via automation Apr 11, 2019
@mbautin

This comment has been minimized.

Copy link
Collaborator

commented Jun 11, 2019

A relevant blog post by Michael Paquier: https://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-for-key-share-and-for-no-key-update/

The relevant PostgreSQL documentation page: https://www.postgresql.org/docs/11/explicit-locking.html#LOCKING-ROWS

In addition to table-level locks, there are row-level locks, which are listed as below with the contexts in which they are used automatically by PostgreSQL. See Table 13.3 for a complete table of row-level lock conflicts. Note that a transaction can hold conflicting locks on the same row, even in different subtransactions; but other than that, two transactions can never hold conflicting locks on the same row. Row-level locks do not affect data querying; they block only writers and lockers to the same row.

Row-level Lock Modes

FOR UPDATE
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.

The FOR UPDATE lock mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values on certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.

FOR NO KEY UPDATE
Behaves similarly to FOR UPDATE, except that the lock acquired is weaker: this lock will not block SELECT FOR KEY SHARE commands that attempt to acquire a lock on the same rows. This lock mode is also acquired by any UPDATE that does not acquire a FOR UPDATE lock.

FOR SHARE
Behaves similarly to FOR NO KEY UPDATE, except that it acquires a shared lock rather than exclusive lock on each retrieved row. A shared lock blocks other transactions from performing UPDATE, DELETE, SELECT FOR UPDATE or SELECT FOR NO KEY UPDATE on these rows, but it does not prevent them from performing SELECT FOR SHARE or SELECT FOR KEY SHARE.

FOR KEY SHARE
Behaves similarly to FOR SHARE, except that the lock is weaker: SELECT FOR UPDATE is blocked, but not SELECT FOR NO KEY UPDATE. A key-shared lock blocks other transactions from performing DELETE or any UPDATE that changes the key values, but not other UPDATE, and neither does it prevent SELECT FOR NO KEY UPDATE, SELECT FOR SHARE, or SELECT FOR KEY SHARE.

PostgreSQL doesn't remember any information about modified rows in memory, so there is no limit on the number of rows locked at one time. However, locking a row might cause a disk write, e.g., SELECT FOR UPDATE modifies selected rows to mark them locked, and so will result in disk writes.

@mbautin

This comment has been minimized.

Copy link
Collaborator

commented Jun 11, 2019

Here is how some of these could be mapped to YugaByte DB locking / intent modes:

FOR UPDATE: strong read + strong write lock on the DocKey, as if we're replacing or deleting the entire row in DocDB.
FOR NO KEY UPDATE: strong read + weak write lock on the DocKey, as if we're reading the entire row and then writing only a subset of columns in DocDB
FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB
FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being replaced / deleted, as if we're simply reading some of the column. This is the type of locking that is used by foreign keys, so this will prevent the referenced row from disappearing. The reason it does not conflict with the FOR NO KEY UPDATE above is conceptually the following: an operation that reads the entire row and then writes a subset of columns (FOR NO KEY UPDATE) does not have to conflict with an operation that could be reading a different subset of columns (FOR KEY SHARE).

Conflict matrix:

FOR UPDATE vs FOR UPDATE = conflict (detected as strong read vs. strong write conflict)
FOR UPDATE vs FOR NO KEY UPDATE = conflict (detected as strong read vs. weak write conflict)
FOR UPDATE vs FOR SHARE = conflict (detected as strong write vs. strong read conflict)
FOR UPDATE vs FOR KEY SHARE = conflict (detected as strong write vs. weak read conflict)

FOR NO KEY UPDATE vs. FOR NO KEY UPDATE = conflict (detected as strong read vs. weak write, because each operation pretends to be reading the entire row)
FOR NO KEY UPDATE vs. FOR SHARE = conflict (detected as weak write vs. strong read, as the FOR SHARE operation pretends to be reading the entire row)
FOR NO KEY UPDATE vs. FOR KEY SHARE = OK (no conflict -- weak locks don't conflict with each other, and strong read does not conflict with weak read)

FOR SHARE vs. FOR SHARE = OK (reads locks don't conflict)
FOR SHARE vs . FOR KEY SHARE = OK (reads locks don't conflict)

FOR KEY SHARE vs. FOR KEY SHARE = OK (weak read locks obviously don't conflict)

So the above locking scheme matches the conflict matrix expected between PostgreSQL locking levels.

Open question: is the above mapping be any different for our implementation of serializable vs. snapshot isolation? What else need to be implemented at snapshot isolation to support this?

@m-iancu m-iancu added this to the v2.0 milestone Jun 13, 2019
@mbautin mbautin changed the title [YSQL] Support locking options in SELECT statement [YSQL] Support locking options in SELECT statement (SELECT FOR UPDATE etc.) Jun 24, 2019
@mbautin

This comment has been minimized.

Copy link
Collaborator

commented Jun 24, 2019

We'll need to modify TabletServiceImpl::Read to turn on the provisional record writing on reads in this case as well. Currently it is only turned on for serializable isolation.

Here is how we write the provisional records (the "read intents") on the read paths:

  if (serializable_isolation) {
    WriteRequestPB write_req;
    *write_req.mutable_write_batch()->mutable_transaction() = req->transaction();
    write_req.set_tablet_id(req->tablet_id());
    read_time.AddToPB(&write_req);
    write_req.mutable_write_batch()->set_may_have_metadata(req->may_have_metadata());
    // TODO(dtxn) write request id

    auto* write_batch = write_req.mutable_write_batch();
    auto status = leader_peer.peer->tablet()->CreateReadIntents(
        req->transaction(), req->ql_batch(), req->pgsql_batch(), write_batch);
    if (!status.ok()) {
      SetupErrorAndRespond(
          resp->mutable_error(), status, TabletServerErrorPB::UNKNOWN_ERROR, &context);
      return;
    }

    auto operation_state = std::make_unique<WriteOperationState>(
        leader_peer.peer->tablet(), &write_req, nullptr /* response */,
        docdb::OperationKind::kRead);

    auto context_ptr = std::make_shared<RpcContext>(std::move(context));
    read_context.context = context_ptr.get();
    operation_state->set_completion_callback(std::make_unique<ReadOperationCompletionCallback>(
        this, leader_peer.peer, read_context, context_ptr));
    leader_peer.peer->WriteAsync(
        std::move(operation_state), leader_peer.leader_term, context_ptr->GetClientDeadline());
    return;
  }
@mbautin

This comment has been minimized.

Copy link
Collaborator

commented Jun 24, 2019

One additional thing that needs to happen for snapshot isolation + select for ... kind of locking (and not for serializable isolation) is checking for conflicts with transactions committed between the transaction start time and the current read time. E.g. if the row was deleted between the transaction start time and current read time, we should not be able to acquire the FOR KEY SHARE lock. We should be able to reuse some of the existing logic that we use to detect these kinds of conflicts in snapshot isolation.

Look for this function in conflict_resolution.cc:

  CHECKED_STATUS CheckConflictWithCommitted(
      const TransactionId& id, HybridTime commit_time) override {
    DSCHECK(commit_time.is_valid(), Corruption, "Invalid transaction commit time");

    VLOG(4) << "Committed: " << id << ", " << commit_time;

    // commit_time equals to HybridTime::kMax means that transaction is not actually committed,
    // but is being committed. I.e. status tablet is trying to replicate COMMITTED state.
    // So we should always conflict with such transaction, because we are not able to read its
    // results.
    //
    // read_time equals to HybridTime::kMax in case of serializable isolation or when
    // read time was not yet picked for snapshot isolation.
    // So it should conflict only with transactions that are being committed.
    //
    // In all other cases we have concrete read time and should conflict with transactions
    // that were committed after this point.
    if (commit_time >= read_time_) {
      return MakeConflictStatus(id, "committed", conflicts_metric_);
    }

    return Status::OK();
  }
@mbautin

This comment has been minimized.

Copy link
Collaborator

commented Jun 24, 2019

This is the important part:

    // read_time equals to HybridTime::kMax in case of serializable isolation or when
    // read time was not yet picked for snapshot isolation.
    // So it should conflict only with transactions that are being committed.

We need to make sure read_time_ is set correctly for writing provisional records on the read path in snapshot isolation, then I think the existing conflict-with-committed detection logic should work.

@mbautin

This comment has been minimized.

Copy link
Collaborator

commented Jun 25, 2019

In tablet_service.cc we call CreateReadIntents:

    auto status = leader_peer.peer->tablet()->CreateReadIntents(
        req->transaction(), req->ql_batch(), req->pgsql_batch(), write_batch);

This is what CreateReadIntents does in tablet.cc:

Status Tablet::CreateReadIntents(
    const TransactionMetadataPB& transaction_metadata,
    const google::protobuf::RepeatedPtrField<QLReadRequestPB>& ql_batch,
    const google::protobuf::RepeatedPtrField<PgsqlReadRequestPB>& pgsql_batch,
    docdb::KeyValueWriteBatchPB* write_batch) {
  auto txn_op_ctx = VERIFY_RESULT(CreateTransactionOperationContext(transaction_metadata));

  for (const auto& ql_read : ql_batch) {
    docdb::QLReadOperation doc_op(ql_read, txn_op_ctx);
    RETURN_NOT_OK(doc_op.GetIntents(SchemaRef(), write_batch));
  }

  for (const auto& pgsql_read : pgsql_batch) {
    docdb::PgsqlReadOperation doc_op(pgsql_read, txn_op_ctx);
    RETURN_NOT_OK(doc_op.GetIntents(SchemaRef(), write_batch));
  }

  return Status::OK();
}

and it ends up calling PgsqlReadOperation::GetIntents:

Status PgsqlReadOperation::GetIntents(const Schema& schema, KeyValueWriteBatchPB* out) {
  auto pair = out->mutable_read_pairs()->Add();

  if (request_.partition_column_values().empty()) {
    // Empty components mean that we don't have primary key at all, but request
    // could still contain hash_code as part of tablet routing.
    // So we should ignore it.
    pair->set_key(std::string(1, ValueTypeAsChar::kGroupEnd));
  } else {
    std::vector<PrimitiveValue> hashed_components;
    RETURN_NOT_OK(InitKeyColumnPrimitiveValues(
        request_.partition_column_values(), schema, 0 /* start_idx */, &hashed_components));

    DocKey doc_key(request_.hash_code(), hashed_components);
    pair->set_key(doc_key.Encode().data());
  }

  pair->set_value(std::string(1, ValueTypeAsChar::kNull));
  return Status::OK();
}

The PgsqlReadOperation::GetIntents function generates top-level read intents (currently only used for serializable isolation), and then they get turned into an entire sequence of weak intents on ancestors + strong intent on the key itself later (will point out where exactly that happens in a further comment).

@mbautin

This comment has been minimized.

Copy link
Collaborator

commented Jun 25, 2019

The entry point to resolving operation conflicts is in the call from Tablet::StartDocWriteOperation to docdb::ResolveOperationConflicts.

@mbautin

This comment has been minimized.

Copy link
Collaborator

commented Jun 25, 2019

Here's where WriteOperationState gets wrapped into a WriteOperation:

auto operation = std::make_unique<WriteOperation>(std::move(state), term, deadline, this);

(in tablet_peer.cc, TabletPeer::WriteAsync).

@mbautin

This comment has been minimized.

Copy link
Collaborator

commented Jun 25, 2019

We decided that we will focus on implementing the following two lock types, but initially it would be identical to what serializable isolation read intents do today, so both of the following intent types would be the same for now:

FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB
FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being replaced / deleted, as if we're simply reading some of the column. This is the type of locking that is used by the foreign key logic in PostgreSQL when we add a row that references another table through a foreign key. A read is done on the appropriate key of the referenced table with FOR KEY SHARE, preventing that row from disappearing.

We will relax the locking mode in FOR KEY SHARE later.

@m-iancu m-iancu moved this from To do to In progress in SQL Support Jun 26, 2019
WesleyW added a commit that referenced this issue Jul 31, 2019
Summary:
Added evolutions file and changed var types to be compatible with ysql
Added workaround to #1199 to `yugabyte-db`

Test Plan: `yugabyte-db start --enable_ui`

Reviewers: bogdan, ram

Reviewed By: ram

Subscribers: yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D6976
@lhotari

This comment has been minimized.

Copy link

commented Sep 16, 2019

Is there a plan to also support the "NOWAIT" and "SKIP LOCKED" options for "SELECT ... FOR UPDATE" that PostgreSQL supports?

@parthibd

This comment has been minimized.

Copy link

commented Sep 26, 2019

Any deadline for supporting this feature?

@ben-pr-p

This comment has been minimized.

Copy link

commented Sep 27, 2019

It looks like lots of migrations runners (http://knexjs.org/, for example) use select * from migrations_table for update in order to prevent concurrent migrations from running, so this is a bigger barrier to Postgres compatibility than it might initially appear unfortunately

@parthibd

This comment has been minimized.

Copy link

commented Sep 28, 2019

The same with liquibase and hibernate. That's the part I'm really stuck on.

hectorgcr added a commit that referenced this issue Oct 16, 2019
…SELECT statements

Summary:
Support row-level explicit locking in `SELECT` statements for YSQL tables.
The new supported cases are:
- `SELECT .. FOR SHARE`
- `SELECT .. FOR KEY SHARE`

Additionally, since the `FOR KEY SHARE` case is also used internally for referential integrity
checking (i.e. foreign keys), this diff also enables DMLs on tables with foreign key references
regardless of isolation level (previously only worked in `SERIALIZABLE` isolation).

Implementation is done by passing the row mark as flag in the DocDB read request and taking
appropriate lock for those requests that have that corresponding row mark.

Limitations (that will be addressed in follow-up commits):
- The `FOR UPDATE` and `FOR NO KEY UPDATE` are not yet supported
- The locking for `FOR KEY SHARE` is stronger than strictly required (strong read lock instead of
   weak read lock). This is still correct but less efficient (will conflict more often than required).
- We don't support having different row mark settings within one DocDB (read) batch. This is fine
  for now because we do not batch such requests together, but later we might.

The current implementation makes data visible between two transactions when it shouldn't. This is because we are currently not detecting read conflicts. #2523 tracks this issue.

Test Plan:
Java test testForeignKeyConflicts has been converted to two tests: testForeignKeyConflictsWithSerializableIsolation and testForeignKeyConflictsWithSnapshotIsolation/

```postgres=# create table t(k int primary key, s text);
CREATE TABLE
postgres=# select * from t where k = 1 for key share;
 k | s
---+---
(0 rows)

postgres=# insert into t(k, s) values (1, 'a');
INSERT 0 1
postgres=# select * from t where k = 1 for key share;
 k | s
---+---
 1 | a
(1 row)

postgres=# select * from t for key share;
 k | s
---+---
 1 | a
(1 row)
```

```postgres=# create table t(k int primary key, s text);
CREATE TABLE
postgres=#  insert into t(k, s) values (1, 'a');
INSERT 0 1
postgres=# begin;
BEGIN
postgres=# insert into t(k,s) values (2, 'b');
INSERT 0 1
postgres=# select * from t for key share;
 k | s
---+---
 1 | a
 2 | b
(2 rows)

postgres=# select * from t where k = 1 for key share;
 k | s
---+---
 1 | a
(1 row)

postgres=# abort;
ROLLBACK
postgres=# select * from t where k = 1 for key share;
 k | s
---+---
 1 | a
(1 row)

postgres=# select * from t for key share;
 k | s
---+---
 1 | a
(1 row)
```

```
postgres=# create table t(id int primary key, s text);
CREATE TABLE
postgres=# create table s(id int primary key , t_id int references t(id));
CREATE TABLE
postgres=# insert into t(id, s) values (1, 'a');
INSERT 0 1
postgres=# insert into t(id, s) values (2, 'b');
INSERT 0 1
postgres=# insert into s(id, t_id) values (100, 1);
INSERT 0 1
postgres=# select s.id, t.s from s, t where s.t_id = t.id;
 id  | s
-----+---
 100 | a
(1 row)

postgres=# begin;
BEGIN
postgres=# delete from t where id = 2;
DELETE 1
postgres=# select * from t;
 id | s
----+---
  1 | a
(1 row)

postgres=# commit;
COMMIT
postgres=# select * from t;
 id | s
----+---
  1 | a
(1 row)
```

Reviewers: mihnea, mikhail

Reviewed By: mikhail

Subscribers: bogdan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D7007
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
SQL Support
  
In progress
8 participants
You can’t perform that action at this time.