Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Clone in Desktop Download ZIP

Loading…

support binlog + rocksdb group commit #8

Closed
jonahcohen opened this Issue · 0 comments

3 participants

@jonahcohen
Collaborator

From @mdcallag:

This requires more discussion but many of us are in favor of it.

It might be time to use the binlog as the source of truth to avoid the complexity and inefficiency of keeping RocksDB and the binlog synchronized via internal XA. There are two modes for this. The first mode is durable in which case fsync is done after writing the binlog and RocksDB WAL. The other mode is non-durable in which case fsync might only be done once per second and we rely on lossless semisync to recover. Binlog as source of truth might have been discussed on a MariaDB mail list many years ago - https://lists.launchpad.net/maria-developers/msg01998.html

Some details are at http://yoshinorimatsunobu.blogspot.com/2014/04/semi-synchronous-replication-at-facebook.html

The new protocol will be:
1) write binlog
2) optionally sync binlog
3) optionally wait for semisync ack
3) commit rocksdb - this also persists the GTID within RocksDB for the most recent commit, this also makes changes from the transaction visible to others
4) optionally sync rocksdb WAL

When lossless semisync is used we skip steps 2 and 4. When lossless semisync is not used we do step 2 and skip 3. Step 4 is optional. Recovery in this case is done by:
1) query RocksDB to determine GTID of last commit it has
2) extract/replay transactions from binlog >= GTID from previous step

When running in non durable mode, then on a crash one of the following is true where the relation describes which one has more commits:
1) rocksdb > binlog
2) binlog > rocksdb
3) rocksdb == binlog
If you know which state the server is in, then you can reach state 3. If in state 1 then append events to the binlog without running them on innodb. If in state 2 then replay events to innodb without recording to binlog. If in state 3 then do nothing. Both RocksDB and the binlog can tell us the last GTID they contain and we can compare that with the binlog archived via lossless semisync to determine the state.

@maykov maykov was assigned by jonahcohen
@maykov maykov was unassigned by jonahcohen
@maykov maykov referenced this issue from a commit
@maykov maykov Testing the ASAN failure
Summary:
I saw this intermittent ASAN failure
==662590== ERROR: AddressSanitizer: SEGV on unknown address 0x000000000098 (pc 0x00000112fc64 sp 0x7f0324c71710 bp 0x7f0324c71760 T21)
AddressSanitizer can not provide additional info.
    #0 0x112fc63 in my_pthread_fastmutex_lock /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/mysys/thr_mutex.c:479
    #1 0x18fb4b0 in _ZN23Dropped_indices_manager14remove_indicesERKSt13unordered_setIjSt4hashIjESt8equal_toIjESaIjEEP12Dict_manager /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/include/mysql/psi/mysql_thread.h:688
    #2 0x18b031a in _Z17drop_index_threadPv /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/storage/rocksdb/ha_rocksdb.cc:4927
    #3 0x7f0337d611e8 in _ZN6__asan10AsanThread11ThreadStartEv ??:0
    #4 0x7f03376dbfa7 in start_thread ??:0
    #5 0x7f0335a5f5bc in __clone ??:0
Thread T21 created by T0 here:
    #0 0x7f0337d5121b in pthread_create ??:0
    #1 0x18aec6a in _ZL17rocksdb_init_funcPv /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/storage/rocksdb/ha_rocksdb.cc:1765
    #2 0x60a039 in _Z24ha_initialize_handlertonP13st_plugin_int /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/sql/handler.cc:673
    #3 0xab04e0 in _ZL17plugin_initializeP13st_plugin_int /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/sql/sql_plugin.cc:1137
    #4 0xac8fe3 in _Z11plugin_initPiPPci /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/sql/sql_plugin.cc:1431
    #5 0x5ef74c in _ZL22init_server_componentsv .cc:5482
    #6 0x5f0cce in _Z11mysqld_mainiPPc .cc:6254
    #7 0x7f033597defe in __libc_start_main ??:0
    #8 0x5d86a8 in _start /home/engshare/third-party2/glibc/2.17/src/glibc-2.17/csu/../sysdeps/x86_64/start.S:123
==662590== ABORTING
It looks like the Dropped_indices_manager mutex is accessed by the thread after the mutex was cleaned up. I'm moving the cleanup of the manager to the thread in hopes that this will fix the failure.

Test Plan: ran rocksdb suite, nothing failed

Reviewers: hermanlee4

Reviewed By: hermanlee4

Differential Revision: https://reviews.facebook.net/D42909
eeb5ef2
@maykov maykov referenced this issue from a commit
@maykov maykov Testing the ASAN failure
Summary:
I saw this intermittent ASAN failure
==662590== ERROR: AddressSanitizer: SEGV on unknown address 0x000000000098 (pc 0x00000112fc64 sp 0x7f0324c71710 bp 0x7f0324c71760 T21)
AddressSanitizer can not provide additional info.
    #0 0x112fc63 in my_pthread_fastmutex_lock /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/mysys/thr_mutex.c:479
    #1 0x18fb4b0 in _ZN23Dropped_indices_manager14remove_indicesERKSt13unordered_setIjSt4hashIjESt8equal_toIjESaIjEEP12Dict_manager /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/include/mysql/psi/mysql_thread.h:688
    #2 0x18b031a in _Z17drop_index_threadPv /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/storage/rocksdb/ha_rocksdb.cc:4927
    #3 0x7f0337d611e8 in _ZN6__asan10AsanThread11ThreadStartEv ??:0
    #4 0x7f03376dbfa7 in start_thread ??:0
    #5 0x7f0335a5f5bc in __clone ??:0
Thread T21 created by T0 here:
    #0 0x7f0337d5121b in pthread_create ??:0
    #1 0x18aec6a in _ZL17rocksdb_init_funcPv /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/storage/rocksdb/ha_rocksdb.cc:1765
    #2 0x60a039 in _Z24ha_initialize_handlertonP13st_plugin_int /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/sql/handler.cc:673
    #3 0xab04e0 in _ZL17plugin_initializeP13st_plugin_int /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/sql/sql_plugin.cc:1137
    #4 0xac8fe3 in _Z11plugin_initPiPPci /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/sql/sql_plugin.cc:1431
    #5 0x5ef74c in _ZL22init_server_componentsv .cc:5482
    #6 0x5f0cce in _Z11mysqld_mainiPPc .cc:6254
    #7 0x7f033597defe in __libc_start_main ??:0
    #8 0x5d86a8 in _start /home/engshare/third-party2/glibc/2.17/src/glibc-2.17/csu/../sysdeps/x86_64/start.S:123
==662590== ABORTING
It looks like the Dropped_indices_manager mutex is accessed by the thread after the mutex was cleaned up. I'm moving the cleanup of the manager to the thread in hopes that this will fix the failure.

Test Plan: ran rocksdb suite, nothing failed

Reviewers: hermanlee4

Reviewed By: hermanlee4

Differential Revision: https://reviews.facebook.net/D42909
28c20ef
@hermanlee hermanlee referenced this issue in facebook/mysql-5.6
Open

support binlog + rocksdb group commit #25

@hermanlee hermanlee closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.