Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Clone in Desktop Download ZIP

Loading…

SELECT ... LOCK IN SHARE MODE doesn't work #44

Closed
yoshinorim opened this Issue · 6 comments

2 participants

@yoshinorim
Owner

create table r1 (id1 int, id2 int, value int, primary key (id1, id2)) engine=rocksdb;
insert into r1 values (1,1,1), (2,2,2), (3,3,3), (4,4,4), (5,5,5);

T1:
begin;
update r1 set value=value+1 where id1=2 and id2=2;

T2:
begin;
select * from r1 where id1=2 and id2=2 lock in share mode;
=> not blocked

@spetrunia spetrunia was assigned by yoshinorim
@spetrunia spetrunia added this to the high-pri milestone
@spetrunia
Collaborator

So,

SELECT ... LOCK IN SHARE MODE should place read locks on rows it returns.

(a read lock is compatible with another read lock. write lock is not compatible
with either read or a write lock)

Tasks to do:

  • Lock_table and Row_lock should support read locks
  • ha_rocksdb should recognize LOCK IN SHARE MODE and request read locks.
@spetrunia
Collaborator

Looking at how to change LockTable/Row_locks to support read-write locks.

MySQL has rwlock.h, but in include/atomic/rwlock.h one can find

#define my_atomic_rwlock_init(name)        pthread_mutex_init(& (name)->rw, 0)

that is, these rw locks are actually mutexes.

@spetrunia
Collaborator

On using pthread_rwlock_*

pthread_rwlock_t provides RW-locks.
The API provides functions pthread_rwlock_timedwrlock() and
pthread_rwlock_timedrdlock() which wait for the lock and support timeouts.

One problem is that there is no way for other thread to abort the wait.
Current exclusive-locking code uses {thd_enter_cond(); mysql_cond_timedwait()}
which allows KILL command to abort the wait by signaling the condition.

Another issue is that a user of pthread_rwlock* will need to keep track
of what locks he already has. Man page for pthread_rwlock_timedrdlock() says:

The calling thread may deadlock if at the time the call is made it holds
a write lock on rwlock.

This can be fixed by keeping tracks of the write locks that are acquired by
the current transaction and simply not trying to acquire a read lock if we
already have a write lock on the same row.

Man page for pthread_rwlock_timedwrlock() says:

The calling thread may deadlock if at the time the call is made it holds
the read-write lock.

That is, there seems to be no way to "upgrade" a read lock into a write lock.

@spetrunia
Collaborator

On lock recursivity.

The user can issue

SELECT * FROM t1 WHERE pk=1 FOR UPDATE;
SELECT * FROM t1 WHERE pk=1 LOCK IN SHARE MODE;

in arbitrary order. (Current) exclusive locks allow multiple get_lock() calls which must be followed by multiple release_lock() calls.

Row_lock::owner_data and Row_lock::busy store information about how many times the owner has acquired the lock.

The situation gets more complex when we have multiple readers who may have acquired read locks multiple times, as well as the writer who may also hold read locks.

Possible ways out:
1. Support recursive locks.
2. Demand that the callers don't acquire locks they already have.

== Recursive locks ==
Row_lock should remember who has locked it (and how many times).

Currently Row_lock stores info about the thread that holds the write lock, and how many times it has acquired it. We will also need a list of read lock owners + counts.
(Q: is it ok to have a list or array or any other structure?)

Advantages:

  • Lock subsystem will have simple API.

Disadvantages:

  • Locking Internals will be heavy and complex. If there are a lot of read locks, we will be inefficient.

(however, transaction will do its lock lookups while not holding the mutex. as opposed to scanning the read lock holders which will be done while holding row lock's mutex)

== Non-recursive locks ==
Before a transaction attempts to get a lock, it should check it against the locks it has.

Currently a transaction stores locks it has acquired in an array. The array may have duplicates.

We will need to switch to std::unordered_map (or something similar), and we will also need to keep a list of locks acquired by the current statement (if the statement fails, we will need to release those)

Advantages:

  • Lock_table implementation is simpler.

Disadvantages:

  • Lock_table API will be complex.
  • Transactions will need to maintain hash tables of their locks, which is an extra expense ( number-of-locks-transaction-holds is typically greater than number-of-read-lock-holders-for-a-certain-row).
@spetrunia
Collaborator

Now, implementing a recursive r/w locks with capability to upgrade read lock + write lock. The implementation has a limit of how many read locks can be held for a given row. This limitation can be lifted in the future.

@spetrunia
Collaborator

Implementation: https://reviews.facebook.net/D38265

Limitations:

  • There is max. limit on the number of read locks one can have
  • There is one mysql_cond_t condition, and everyone who wishes to acquire a read or a write lock wait for it. Then, mysql_cond_broadcast() is used to wake up all of the waiters. This may cause unneeded wake-ups.
  • The question of read lockers starving a write locker is left to OS thread scheduling.
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Issue #44: SELECT ... LOCK IN SHARE MODE
Summary:
Add support for read and read write locks in RocksDB's LockTable.
The implementation is more concerned with correctness than with
concurrency.

Test Plan: Added unit test, enabled MTR tests

Reviewers: maykov, jtolmer, yoshinorim, hermanlee4

Reviewed By: hermanlee4

Differential Revision: https://reviews.facebook.net/D38265
f270f10
@spetrunia spetrunia closed this
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Issue #44: SELECT ... LOCK IN SHARE MODE
Summary:
Add support for read and read write locks in RocksDB's LockTable.
The implementation is more concerned with correctness than with
concurrency.

Test Plan: Added unit test, enabled MTR tests

Reviewers: maykov, jtolmer, yoshinorim, hermanlee4

Reviewed By: hermanlee4

Differential Revision: https://reviews.facebook.net/D38265
cb52d29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.