MySQLOnRocksDB/mysql-5.6
forked from facebook/mysql-5.6

DML statements over reverse-ordered CFs are very slow after #86. #107
Possible solutions:
Fix facebook/rocksdb#616. This might be complex.
We only care about the scenario where the SQL layer does Deletes/Updates on the fly. This means the SQL layer will request to delete the row that was just read. Rdb_transaction could catch the Delete operation and, before the iterator is invalidated, advance it so that it no longer points at the row that is about to be deleted.
A storage engine may implement Bulk Update/Delete API. This is not a full solution because there are cases where bulk update/delete is not used.
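The second option above (step the iterator off the row before deleting it) can be sketched roughly as follows. This is only a toy illustration: `Table` and `delete_with_prefix` are hypothetical names, and a `std::map` stands in for the transaction's indexed data. It is not the Rdb_transaction code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for an ordered store that is scanned with a cursor.
using Table = std::map<std::string, std::string>;

// Deletes every row whose key starts with `prefix` while scanning.
// The cursor is advanced BEFORE the row is erased, so it never points at
// a row that is about to disappear. Returns the number of rows deleted.
std::size_t delete_with_prefix(Table& table, const std::string& prefix) {
  std::size_t deleted = 0;
  auto it = table.lower_bound(prefix);
  while (it != table.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0) {
    auto victim = it;     // remember the row that was just read
    ++it;                 // step off it first
    table.erase(victim);  // only `victim` is invalidated by the erase
    ++deleted;
  }
  assert(it == table.end() ||
         it->first.compare(0, prefix.size(), prefix) != 0);
  return deleted;
}
```

With rows `row00` .. `row04` plus one unrelated key, calling `delete_with_prefix(table, "row")` removes the five rows and leaves the unrelated key in place.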
@spetrunia : According to Siying, it is safe to mutate WriteBatchWithIndex while iterating it, as long as it is done the right way (the most important thing is to operate on a copy of the key when changing the state). Here is a unit test to verify it -- https://github.com/facebook/rocksdb/blob/3.13.fb/utilities/write_batch_with_index/write_batch_with_index_test.cc#L1247-L1366
@igorcanadi added a test case for mutating WriteBatchWithIndex: https://reviews.facebook.net/D39501
So you should be able to add/remove keys from it. You need to be careful about how you do it. For example, this way will not work:
rocksdb::Slice rowkey= iter->key();
trx->Delete(rowkey); // (*)
iter->Prev(); // (**)
rowkey is a reference to a memory location. In our implementation, issuing Delete() with it causes problems, because that memory changes in the middle of the deletion and produces wrong results. But if you do:
std::string rowkey= iter->key().ToString(); // owned copy of the key
trx->Delete(rowkey); // (*)
iter->Prev(); // (**)
It will work.
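To make the difference concrete, here is a small self-contained toy model (C++17). `ToyBatch`, `Position`, `Delete` and `WasDeleted` are made-up names for illustration, not RocksDB APIs. The only assumption carried over from the explanation above is that the memory behind the key view changes while the delete is in progress.

```cpp
#include <cassert>
#include <cstring>
#include <set>
#include <string>
#include <string_view>

// Toy model, not RocksDB: Position() returns a non-owning view into a
// buffer that Delete() overwrites while it works, mimicking a Slice that
// points at memory which changes in the middle of the deletion.
class ToyBatch {
 public:
  // Load `k` into the shared buffer and return a view of it.
  std::string_view Position(const std::string& k) {
    assert(k.size() <= sizeof(buf_));
    std::memcpy(buf_, k.data(), k.size());
    return std::string_view(buf_, k.size());
  }

  // Overwrites the shared buffer first, then records the key it was given.
  void Delete(std::string_view k) {
    std::memset(buf_, '?', sizeof(buf_));
    deleted_.insert(std::string(k));
  }

  bool WasDeleted(const std::string& k) const {
    return deleted_.count(k) != 0;
  }

 private:
  char buf_[32] = {};
  std::set<std::string> deleted_;
};

// Unsafe: passes the view straight through, so Delete() reads clobbered
// bytes and records the wrong key. Returns whether `k` was recorded.
bool delete_via_view(ToyBatch& b, const std::string& k) {
  std::string_view rowkey = b.Position(k);
  b.Delete(rowkey);
  return b.WasDeleted(k);
}

// Safe: copies the key into an owned std::string before calling Delete().
bool delete_via_copy(ToyBatch& b, const std::string& k) {
  std::string rowkey(b.Position(k));
  b.Delete(rowkey);
  return b.WasDeleted(k);
}
```

In this model `delete_via_view` fails to record the intended key, while `delete_via_copy` records it, which mirrors the Slice versus std::string distinction described above.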
@yoshinorim @siying thanks for clarification. My example was overly simplified, in the actual code MyRocks copies away the key/value it has got from the iterator.
https://reviews.facebook.net/D45873 (Issue #86 patch) is now updated to make use of this
Finally figured out why some DELETE queries got very slow (about 100x slower) after fix for #86.
Consider a query:
EXPLAIN is:
MySQL will use the following algorithm
The table uses reverse column families, so this translates into these RocksDB
calls:
Note the lines (*) and (**).
include/rocksdb/utilities/transaction.h has this comment:
I assume it refers to this comment in issue #616:
So I implemented 'class Stabilized_iterator', which wraps the iterator returned
by GetIterator(), but keeps itself valid across Put/Merge/Delete calls.
It does so by
This works, but in the above scenario it is very slow.
Here's why. The table is in reverse-ordered CF, so it stores the data in this physical order:
However, DELETE works in the logical order. First it deletes row00, then row01, etc. Eventually, Transaction's WriteBatchWithIndex has:
We read row04. We call "trx->Delete(row04)", and the WriteBatchWithIndex now is:
Then, we call iter->Prev() (line (**)). Stabilized_iterator notes that its underlying iterator is invalidated. In order to restore it, it calls
This operation finds row04 in the table, but it also sees {kDeletedRecord, row04} in the WriteBatchWithIndex. It advances both of its underlying iterators, until it reaches another-table-row.
Then, Stabilized_iterator calls backend_iter->Prev(). In this call, the iterator walks back through the pairs of row00, row01, ... row04, until it finds row05 in the base table.
This works, but if one deletes N rows then it's O(N^2) operations.
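The quadratic growth can be seen with a toy cost model of the walk described above. This is an assumption-laden sketch, not MyRocks code: `count_steps` is a hypothetical function, rows are modeled as slots in a vector in physical (reverse) order, and one "step" is one iterator move over a slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy cost model: physical slots 0..n-1 hold row(n-1) .. row00 (reverse
// order), and slot n holds a row of another table that is never deleted.
// Rows are deleted in logical order: row00, row01, ...
// Returns the total number of iterator moves needed to delete all n rows.
std::uint64_t count_steps(std::size_t n) {
  std::vector<bool> deleted(n + 1, false);
  std::uint64_t steps = 0;
  for (std::size_t k = 0; k < n; ++k) {
    const std::size_t pos = n - 1 - k;  // physical slot of row k
    deleted[pos] = true;                // models trx->Delete(row k)

    // Restore the iterator: seek to row k, then skip forward over every
    // tombstoned row until a live row (the other table's row) is reached.
    std::size_t cur = pos;
    while (deleted[cur]) {
      ++cur;
      ++steps;
    }

    // Prev(): walk back over the same tombstones to the next live row.
    while (cur > 0) {
      --cur;
      ++steps;
      if (!deleted[cur]) break;
    }
  }
  return steps;
}
```

In this model, deleting row k costs on the order of k moves in each direction, so the total over N rows grows like N^2: doubling N roughly quadruples the step count.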