MySQLOnRocksDB/mysql-5.6
forked from facebook/mysql-5.6

support binlog + rocksdb group commit #8
Closed
jonahcohen (Collaborator) opened this Issue · 0 comments
jonahcohen commented
This was referenced
Closed
maykov
Testing the ASAN failure
Summary:
I saw this intermittent ASAN failure
==662590== ERROR: AddressSanitizer: SEGV on unknown address 0x000000000098 (pc 0x00000112fc64 sp 0x7f0324c71710 bp 0x7f0324c71760 T21)
AddressSanitizer can not provide additional info.
#0 0x112fc63 in my_pthread_fastmutex_lock /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/mysys/thr_mutex.c:479
#1 0x18fb4b0 in _ZN23Dropped_indices_manager14remove_indicesERKSt13unordered_setIjSt4hashIjESt8equal_toIjESaIjEEP12Dict_manager /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/include/mysql/psi/mysql_thread.h:688
#2 0x18b031a in _Z17drop_index_threadPv /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/storage/rocksdb/ha_rocksdb.cc:4927
#3 0x7f0337d611e8 in _ZN6__asan10AsanThread11ThreadStartEv ??:0
#4 0x7f03376dbfa7 in start_thread ??:0
#5 0x7f0335a5f5bc in __clone ??:0
Thread T21 created by T0 here:
#0 0x7f0337d5121b in pthread_create ??:0
#1 0x18aec6a in _ZL17rocksdb_init_funcPv /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/storage/rocksdb/ha_rocksdb.cc:1765
#2 0x60a039 in _Z24ha_initialize_handlertonP13st_plugin_int /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/sql/handler.cc:673
#3 0xab04e0 in _ZL17plugin_initializeP13st_plugin_int /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/sql/sql_plugin.cc:1137
#4 0xac8fe3 in _Z11plugin_initPiPPci /data/users/jenkins/workspace/myrocks-prod/BUILD_TYPE/ASan/CLIENT_MODE/Sync/PAGE_SIZE/32/TEST_SET/MixedOther/label/mysql/mysql/sql/sql_plugin.cc:1431
#5 0x5ef74c in _ZL22init_server_componentsv .cc:5482
#6 0x5f0cce in _Z11mysqld_mainiPPc .cc:6254
#7 0x7f033597defe in __libc_start_main ??:0
#8 0x5d86a8 in _start /home/engshare/third-party2/glibc/2.17/src/glibc-2.17/csu/../sysdeps/x86_64/start.S:123
==662590== ABORTING
It looks like the Dropped_indices_manager mutex is accessed by the thread after the mutex was cleaned up. I'm moving the cleanup of the manager to the thread in hopes that this will fix the failure.
Test Plan: ran rocksdb suite, nothing failed
Reviewers: hermanlee4
Reviewed By: hermanlee4
Differential Revision: https://reviews.facebook.net/D42909
eeb5ef2
maykov
Testing the ASAN failure
28c20ef
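The race described in the referenced commit message can be sketched in miniature. This is a hypothetical Python stand-in for the C++ fix in ha_rocksdb.cc (all class and method names here are illustrative, not the real MyRocks code): a background thread repeatedly takes a mutex, and the buggy shutdown path destroyed that mutex while the thread could still be using it. The fix moves cleanup into the thread itself, after its loop exits, so the last use of the mutex always precedes teardown.

```python
import threading

class DroppedIndicesManager:
    """Toy stand-in for the real C++ manager; names are illustrative only."""
    def __init__(self):
        self.mutex = threading.Lock()
        self.indices = set()

    def remove_indices(self, ids):
        # In the C++ code, locking a destroyed mutex here is what crashed.
        with self.mutex:
            self.indices -= ids

    def cleanup(self):
        # In the buggy version, the main shutdown path called this while the
        # drop-index thread could still call remove_indices(). The fix calls
        # it from the thread itself, after the loop below has exited.
        self.indices = None

stop = threading.Event()

def drop_index_thread(mgr):
    while not stop.is_set():
        mgr.remove_indices({1})
        stop.wait(0.01)
    mgr.cleanup()  # cleanup happens here, strictly after the last mutex use

mgr = DroppedIndicesManager()
t = threading.Thread(target=drop_index_thread, args=(mgr,))
t.start()
stop.set()
t.join()
print(mgr.indices)  # prints None: cleanup ran inside the thread
```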
From @mdcallag:
This requires more discussion but many of us are in favor of it.
It might be time to use the binlog as the source of truth, avoiding the complexity and inefficiency of keeping RocksDB and the binlog synchronized via internal XA. There are two modes for this. The first mode is durable, in which case an fsync is done after writing the binlog and the RocksDB WAL. The other mode is non-durable, in which case fsync might be done only once per second and we rely on lossless semisync to recover. Binlog as source of truth might have been discussed on a MariaDB mailing list many years ago - https://lists.launchpad.net/maria-developers/msg01998.html
Some details are at http://yoshinorimatsunobu.blogspot.com/2014/04/semi-synchronous-replication-at-facebook.html
The new protocol will be:
1) write binlog
2) optionally sync binlog
3) optionally wait for semisync ack
4) commit rocksdb - this persists the GTID of the most recent commit within RocksDB and makes the changes from the transaction visible to others
5) optionally sync rocksdb WAL
When lossless semisync is used we skip steps 2 and 5. When lossless semisync is not used we do step 2 and skip step 3. Step 5 is optional. Recovery in this case is done by:
1) query RocksDB to determine GTID of last commit it has
2) extract/replay transactions from binlog >= GTID from previous step
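The commit steps and the recovery procedure above can be sketched with small in-memory stubs. Everything here is hypothetical illustration - the class names, method names, and flags are stand-ins, not real MySQL/MyRocks APIs:

```python
# Illustrative in-memory sketch of the proposed protocol; all names are
# hypothetical stand-ins, not real MySQL/MyRocks APIs.

class Binlog:
    def __init__(self):
        self.entries = []          # list of (gtid, events)
    def append(self, gtid, events):
        self.entries.append((gtid, events))
    def fsync(self):
        pass                       # a real binlog would sync to disk here
    def events_after(self, gtid):
        return [(g, e) for g, e in self.entries if g > gtid]

class RocksDB:
    def __init__(self):
        self.rows = []
        self.last_gtid = 0         # GTID persisted with the most recent commit
    def commit(self, gtid, events):
        self.rows.extend(events)   # makes the changes visible to others
        self.last_gtid = gtid      # persist the GTID within RocksDB
    def sync_wal(self):
        pass                       # a real engine would fsync the WAL here

def commit_txn(binlog, rocksdb, gtid, events, semisync=False, durable=True):
    binlog.append(gtid, events)            # 1) write binlog
    if durable and not semisync:
        binlog.fsync()                     # 2) optionally sync binlog
    if semisync:
        pass                               # 3) optionally wait for semisync ack
    rocksdb.commit(gtid, events)           # 4) commit rocksdb, persist GTID
    if durable and not semisync:
        rocksdb.sync_wal()                 # 5) optionally sync rocksdb WAL

def recover(binlog, rocksdb):
    last = rocksdb.last_gtid               # 1) GTID of last RocksDB commit
    for gtid, events in binlog.events_after(last):
        rocksdb.commit(gtid, events)       # 2) replay binlog past that GTID

binlog, rdb = Binlog(), RocksDB()
commit_txn(binlog, rdb, 1, ["row1"])
binlog.append(2, ["row2"])                 # simulate a crash after step 1
recover(binlog, rdb)                       # replays GTID 2 into RocksDB
print(rdb.last_gtid, rdb.rows)             # prints: 2 ['row1', 'row2']
```

The key property is that step 4 records the GTID atomically with the commit itself, so recovery needs no XA coordination - it only compares GTIDs and replays the binlog tail.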
When running in non-durable mode, on a crash one of the following is true, where the relation describes which side has more commits:
1) rocksdb > binlog
2) binlog > rocksdb
3) rocksdb == binlog
If you know which state the server is in, then you can reach state 3. If in state 1, append events to the binlog without running them on RocksDB. If in state 2, replay events into RocksDB without recording them to the binlog. If in state 3, do nothing. Both RocksDB and the binlog can tell us the last GTID they contain, and we can compare that with the binlog archived via lossless semisync to determine the state.
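The three-state reconciliation above reduces to a comparison of the two last GTIDs. A minimal sketch, assuming the GTIDs have already been read from RocksDB and the binlog (the function names and callback parameters are hypothetical):

```python
# Hypothetical sketch of crash-state classification and reconciliation
# from the last GTID on each side; names are illustrative only.

def crash_state(rocksdb_gtid, binlog_gtid):
    if rocksdb_gtid > binlog_gtid:
        return 1   # rocksdb ahead of binlog
    if binlog_gtid > rocksdb_gtid:
        return 2   # binlog ahead of rocksdb
    return 3       # equal: already consistent

def reconcile(rocksdb_gtid, binlog_gtid, append_to_binlog, replay_to_engine):
    state = crash_state(rocksdb_gtid, binlog_gtid)
    if state == 1:
        # append missing events to the binlog without re-running them
        append_to_binlog(binlog_gtid, rocksdb_gtid)
    elif state == 2:
        # replay missing events into the engine without re-logging them
        replay_to_engine(rocksdb_gtid, binlog_gtid)
    # state 3: nothing to do
    return state
```

Either branch moves the server to state 3, after which normal recovery (replay the binlog tail past the engine's last GTID) applies.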