This repository has been archived by the owner on Feb 10, 2023. It is now read-only.

[feature] add TokuDB row operations #1

Closed
BohuTANG opened this issue Oct 16, 2016 · 0 comments

Comments

@BohuTANG
Collaborator

BohuTANG commented Oct 16, 2016

[Summary]
Add insert/read/update/delete operations-per-second counters to measure performance.

[Usage]
mysql> SHOW ENGINE TOKUDB STATUS;

....
Type: TokuDB
Name: row: operation status
Status: Number of rows inserted 3905, updated 3350, deleted 1016, read 3461
657.09 inserts/s, 322.88 updates/s, 876.90 deletes/s, 1230.48 reads/s
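
As a rough illustration of how the counters are exercised (the table name t1 and the values below are illustrative, not taken from this issue):

```
CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=TokuDB;
INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30);  -- counted as rows inserted
UPDATE t1 SET b = b + 1 WHERE a = 2;              -- counted as rows updated
DELETE FROM t1 WHERE a = 3;                       -- counted as rows deleted
SELECT * FROM t1;                                 -- counted as rows read
SHOW ENGINE TOKUDB STATUS;                        -- look for the "row: operation status" section
```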

[Test Plan]
mtr with ASAN

[Reviewed By]
N/A

@BohuTANG BohuTANG changed the title from "FEATURE: add TokuDB row operations" to "[feature] add TokuDB row operations" on Oct 16, 2016
BohuTANG added a commit that referenced this issue Oct 16, 2016
[Summary]
Add insert/read/update/delete operations-per-second counters to measure performance.

[Usage]
mysql> SHOW ENGINE TOKUDB STATUS;

....
Type: TokuDB
Name: row: operation status
Status: Number of rows inserted 3905, updated 3350, deleted 1016, read 3461
657.09 inserts/s, 322.88 updates/s, 876.90 deletes/s, 1230.48 reads/s

[Test Plan]
./mtr --suite=tokudb suite/tokudb/t/row_operations.test [OK]
ASAN [OK]

[Reviewed By]
N/A
BohuTANG pushed a commit that referenced this issue Dec 31, 2016
ha_rocksdb::open() should check table->found_next_number_field,
not table->next_number_field (like InnoDB does)
BohuTANG pushed a commit that referenced this issue Dec 31, 2016
Two testcases are empty because RocksDB-SE doesn't support the feature:
- autoincrement.test
- autoinc_secondary.test
BohuTANG pushed a commit that referenced this issue Dec 31, 2016
Summary:
This adds two features to RocksDB.
1. Supporting START TRANSACTION WITH CONSISTENT SNAPSHOT
2. Getting current binlog position in addition to #1.
With these features, mysqldump can take consistent logical backup.

The second feature is done by START TRANSACTION WITH
CONSISTENT ROCKSDB SNAPSHOT. This is Facebook's extension, and
it works like existing START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT.

This diff changes some existing code and behaviors.

- Original Facebook-MySQL always started an InnoDB transaction
regardless of the engine clause. For example, START TRANSACTION WITH
CONSISTENT MYISAM SNAPSHOT was accepted but it actually started an
InnoDB transaction, not MyISAM. This patch does not allow specifying
an engine that does not support consistent snapshots.

mysql> start transaction with consistent myisam snapshot;
ERROR 1105 (HY000): Consistent Snapshot is not supported for this engine

Currently only InnoDB and RocksDB support consistent snapshot.
To check engines, I modified sql/sql_yacc.yy, trans_begin()
and ha_start_consistent_snapshot() to pass handlerton.

- Changed constant name from
  MYSQL_START_TRANS_OPT_WITH_CONS_INNODB_SNAPSHOT to
MYSQL_START_TRANS_OPT_WITH_CONS_ENGINE_SNAPSHOT, because it's no longer
InnoDB dependent.

- When not setting engine, START TRANSACTION WITH CONSISTENT SNAPSHOT
takes both InnoDB and RocksDB snapshots, and both InnoDB and RocksDB
participate in transaction. When executing COMMIT, both InnoDB and
RocksDB modifications are committed. Remember that XA is not supported yet,
so mixing engines is not recommended anyway.

- When setting engine, START TRANSACTION WITH CONSISTENT.. takes
snapshot for the specified engine only. But it starts both
InnoDB and RocksDB transactions.

Test Plan: mtr --suite=rocksdb,rocksdb_rpl --repeat=3

Reviewers: hermanlee4, jonahcohen, jtolmer, tian.xia, maykov, spetrunia

Reviewed By: spetrunia

Subscribers: steaphan

Differential Revision: https://reviews.facebook.net/D32355
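
A hedged usage sketch of the statements described above (the database/table names are illustrative, and the exact columns returned for the binlog position may differ by build):

```
-- Engine-specific form: snapshot for RocksDB only; per the summary above,
-- it also reports the current binlog position.
START TRANSACTION WITH CONSISTENT ROCKSDB SNAPSHOT;
SELECT COUNT(*) FROM test.t1;   -- reads see the snapshot taken above
COMMIT;

-- Engine-agnostic form: takes both InnoDB and RocksDB snapshots.
START TRANSACTION WITH CONSISTENT SNAPSHOT;
COMMIT;

-- Engines without consistent-snapshot support are rejected:
START TRANSACTION WITH CONSISTENT MYISAM SNAPSHOT;
-- ERROR 1105 (HY000): Consistent Snapshot is not supported for this engine
```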
BohuTANG pushed a commit that referenced this issue Dec 31, 2016
Summary:
Don't update secondary indexes whose columns are not included in
table->write_map at query start.

Test Plan: mtr

Reviewers: maykov, yoshinorim, hermanlee4, jonahcohen

Reviewed By: jonahcohen

Differential Revision: https://reviews.facebook.net/D33471
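
As an illustration of the intent (schema and names are assumptions, not from the commit), an UPDATE that only touches a non-indexed column should leave secondary indexes alone:

```
CREATE TABLE t (
  id INT PRIMARY KEY,
  k  INT,
  v  INT,
  KEY k_idx (k)
) ENGINE=RocksDB;

INSERT INTO t VALUES (1, 100, 0);

-- Only column v is in table->write_map here, so with this change the
-- secondary index k_idx does not need to be deleted and re-inserted.
UPDATE t SET v = v + 1 WHERE id = 1;
```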
BohuTANG pushed a commit that referenced this issue Dec 31, 2016
Summary:
Inside index_next_same() call, we should
1. first check whether the record matches the index
   lookup prefix,
2. then check pushed index condition.

If we try to check #2 without checking #1 first, we may walk
off the index lookup prefix and scan till the end of the index.

Test Plan: Run mtr

Reviewers: hermanlee4, maykov, jtolmer, yoshinorim

Reviewed By: yoshinorim

Differential Revision: https://reviews.facebook.net/D38769
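
A hypothetical query shape where this ordering matters (table and column names are assumptions):

```
CREATE TABLE t (
  a INT,
  b INT,
  c INT,
  PRIMARY KEY (a, b)
) ENGINE=RocksDB;

-- Lookup prefix: (a = 5). Pushed index condition: (b % 2 = 0).
-- index_next_same() must first verify that the current row still matches
-- a = 5; evaluating the pushed condition first could walk past the a = 5
-- range and scan to the end of the index.
SELECT * FROM t WHERE a = 5 AND b % 2 = 0;
```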
BohuTANG pushed a commit that referenced this issue Dec 31, 2016
Summary:
MyRocks had two bugs when calculating index scan cost.
1. block_size was not considered. This made the covering index scan cost
(both full index scan and range scan) much higher.
2. ha_rocksdb::records_in_range() may have estimated more rows
than the estimated number of rows in the table. This was wrong,
and the MySQL optimizer decided to use a full index scan even though
a range scan was more efficient.

This diff fixes #1 by setting stats.block_size at ha_rocksdb::open(),
and fixes #2 by reducing the number of estimated rows if it was
larger than stats.records.

Test Plan:
mtr, updating some affected test cases, and new test case
rocksdb_range2

Reviewers: hermanlee4, jkedgar, spetrunia

Reviewed By: spetrunia

Subscribers: MarkCallaghan, webscalesql-eng

Differential Revision: https://reviews.facebook.net/D55869
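
A hedged example of the kind of plan choice affected (schema is illustrative):

```
CREATE TABLE t (
  pk INT PRIMARY KEY,
  a  INT,
  KEY a_idx (a)
) ENGINE=RocksDB;

-- Before the fix, records_in_range() could report more rows than
-- stats.records, so the optimizer might pick a full index scan here;
-- with the fix, the cheaper range scan on a_idx can be chosen.
EXPLAIN SELECT pk, a FROM t WHERE a BETWEEN 10 AND 20;
```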
BohuTANG added a commit that referenced this issue Dec 24, 2017
Summary:
In the XA transaction 'XA END' phase (thd_sql_command is SQLCOM_END), the TokuDB slave creates transactions for both trx->sp_level and trx->stmt; this causes toku_xids_can_create_child to abort, since trx->sp_level->xids is 0x00.

How to reproduce:
With tokudb_debug=32, run the following queries on the master:
create table t1(a int)engine=tokudb;

xa start 'x1';
insert into t1 values(1);
xa end 'x1';
xa prepare 'x1';
xa commit 'x1';

xa start 'x2';
insert into t1 values(2);
xa end 'x2';
xa prepare 'x2';
xa commit 'x2';

Slave debug info:
xa start 'x1';
insert into t1 values(1);
xa end 'x1';
xa prepare 'x1';
xa commit 'x1';
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6533 ha_tokudb::external_lock trx (nil) (nil) (nil) (nil) 0 0
2123 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn (nil) 0x7ff2d44a3000 67108864 r=0
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6426 ha_tokudb::create_txn created master 0x7ff2d44a3000
2123 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn 0x7ff2d44a3000 0x7ff2d44a3100 1 r=0
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6468 ha_tokudb::create_txn created stmt 0x7ff2d44a3000 sp_level 0x7ff2d44a3100
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7ff2d44a3100
2123 /u01/tokudb/storage/tokudb/hatoku_hton.cc:942 tokudb_commit commit trx 0 txn 0x7ff2d44a3100 syncflag 512

xa start 'x2';
insert into t1 values(2);
xa end 'x2';
xa prepare 'x2';
xa commit 'x2';
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6533 ha_tokudb::external_lock trx 0x7ff2d44a3000 (nil) 0x7ff2d44a3000 (nil) 0 0
2123 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn 0x7ff2d44a3000 0x7ff2d44a3000 1 r=0
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6468 ha_tokudb::create_txn created stmt 0x7ff2d44a3000 sp_level 0x7ff2d44a3000
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7ff2d44a3000
2017-12-24T08:36:45.347405Z 11 [ERROR] TokuDB: toku_db_put: Transaction cannot do work when child exists

2017-12-24T08:36:45.347444Z 11 [Warning] Slave: Got error 22 from storage engine Error_code: 1030
2017-12-24T08:36:45.347448Z 11 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with 'SLAVE START'. We stopped at log 'mysql-bin.000001' position 1007
2123 /u01/tokudb/storage/tokudb/hatoku_hton.cc:972 tokudb_rollback rollback 0 txn 0x7ff2d44a3000
Segmentation fault (core dumped)

This crash is caused by parent->xids being 0x00.
The core stack info:
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x0000000000f6b647 in my_write_core (sig=sig@entry=11) at /u01/tokudb/mysys/stacktrace.c:249
#2  0x000000000086b945 in handle_fatal_signal (sig=11) at /u01/tokudb/sql/signal_handler.cc:223
#3  <signal handler called>
#4  toku_xids_can_create_child (xids=0x0) at /u01/tokudb/storage/tokudb/PerconaFT/ft/txn/xids.cc:93
#5  0x000000000080531f in toku_txn_begin_with_xid (parent=0x7f0bf501c280, txnp=0x7f0bf50a3490, logger=0x7f0c415e66c0, xid=..., snapshot_type=TXN_SNAPSHOT_CHILD, container_db_txn=0x7f0bf50a3400, for_recovery=false, read_only=false) at /u01/tokudb/storage/tokudb/PerconaFT/ft/txn/txn.cc:137
#6  0x00000000007aa6a2 in toku_txn_begin (env=0x7f0c819fde00, stxn=0x7f0bf50a3300, txn=0x7f0bf500dca8, flags=<optimized out>) at /u01/tokudb/storage/tokudb/PerconaFT/src/ydb_txn.cc:579
#7  0x0000000000f99323 in txn_begin (thd=0x7f0bf504bfc0, flags=1, txn=0x7f0bf500dca8, parent=0x7f0bf50a3300, env=<optimized out>) at /u01/tokudb/storage/tokudb/tokudb_txn.h:116
#8  ha_tokudb::create_txn (this=0x7f0bf50c8830, thd=0x7f0bf504bfc0, trx=0x7f0bf500dca0) at /u01/tokudb/storage/tokudb/ha_tokudb.cc:6458
#9  0x0000000000fa48f9 in ha_tokudb::external_lock (this=0x7f0bf50c8830, thd=0x7f0bf504bfc0, lock_type=1) at /u01/tokudb/storage/tokudb/ha_tokudb.cc:6544
#10 0x00000000008d46eb in handler::ha_external_lock (this=0x7f0bf50c8830, thd=thd@entry=0x7f0bf504bfc0, lock_type=lock_type@entry=1) at /u01/tokudb/sql/handler.cc:8352
#11 0x0000000000e4f3b4 in lock_external (count=1, tables=0x7f0bf5050688, thd=0x7f0bf504bfc0) at /u01/tokudb/sql/lock.cc:389
#12 mysql_lock_tables (thd=thd@entry=0x7f0bf504bfc0, tables=<optimized out>, count=<optimized out>, flags=0) at /u01/tokudb/sql/lock.cc:325
#13 0x0000000000cd0b6d in lock_tables (thd=thd@entry=0x7f0bf504bfc0, tables=0x7f0bf4d11020, count=<optimized out>, flags=flags@entry=0) at /u01/tokudb/sql/sql_base.cc:6705
#14 0x0000000000cd61f2 in open_and_lock_tables (thd=0x7f0bf504bfc0, tables=0x7f0bf4d11020, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f0c89629680) at /u01/tokudb/sql/sql_base.cc:6523
#15 0x0000000000ee09eb in open_and_lock_tables (flags=0, tables=<optimized out>, thd=<optimized out>) at /u01/tokudb/sql/sql_base.h:484
#16 Rows_log_event::do_apply_event (this=0x7f0bf50ab4a0, rli=0x7f0c87762800) at /u01/tokudb/sql/log_event.cc:10911
#17 0x0000000000ed71c0 in Log_event::apply_event (this=this@entry=0x7f0bf50ab4a0, rli=rli@entry=0x7f0c87762800) at /u01/tokudb/sql/log_event.cc:3329
#18 0x0000000000f1d233 in apply_event_and_update_pos (ptr_ev=ptr_ev@entry=0x7f0c89629940, thd=thd@entry=0x7f0bf504bfc0, rli=rli@entry=0x7f0c87762800) at /u01/tokudb/sql/rpl_slave.cc:4761
#19 0x0000000000f280a8 in exec_relay_log_event (rli=0x7f0c87762800, thd=0x7f0bf504bfc0) at /u01/tokudb/sql/rpl_slave.cc:5276
#20 handle_slave_sql (arg=<optimized out>) at /u01/tokudb/sql/rpl_slave.cc:7491
#21 0x00000000013c6184 in pfs_spawn_thread (arg=0x7f0bf5bea820) at /u01/tokudb/storage/perfschema/pfs.cc:2185
#22 0x00007f0c885126ba in start_thread (arg=0x7f0c8962a700) at pthread_create.c:333
#23 0x00007f0c87d293dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) f 10
#10 0x00000000008d46eb in handler::ha_external_lock (this=0x7f0bf50c8830, thd=thd@entry=0x7f0bf504bfc0, lock_type=lock_type@entry=1) at /u01/tokudb/sql/handler.cc:8352
8352    /u01/tokudb/sql/handler.cc: No such file or directory.
(gdb) p thd->lex->sql_command
 = SQLCOM_END

With the fixed patch, the debug info is:
xa start 'x1';
insert into t1 values(1);
xa end 'x1';
xa prepare 'x1';
xa commit 'x1';
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6534 ha_tokudb::external_lock trx (nil) (nil) (nil) (nil) 0 0
24111 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn (nil) 0x7f4aba689000 67108864 r=0
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6469 ha_tokudb::create_txn created stmt (nil) sp_level 0x7f4aba689000
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7f4aba689000
24111 /u01/tokudb/storage/tokudb/hatoku_hton.cc:942 tokudb_commit commit trx 0 txn 0x7f4aba689000 syncflag 512

xa start 'x2';
insert into t1 values(2);
xa end 'x2';
xa prepare 'x2';
xa commit 'x2';
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6534 ha_tokudb::external_lock trx (nil) (nil) (nil) (nil) 0 0
24111 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn (nil) 0x7f4aba689000 67108864 r=0
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6469 ha_tokudb::create_txn created stmt (nil) sp_level 0x7f4aba689000
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7f4aba689000
24111 /u01/tokudb/storage/tokudb/hatoku_hton.cc:942 tokudb_commit commit trx 0 txn 0x7f4aba689000 syncflag 512

Test:
mtr --suite=tokudb xa

Reviewed by: Rik
prohaska7 added a commit that referenced this issue Jan 5, 2018
…ecuting global constructors (before main gets called). the assert catches global mutex initialization which seems to be problematic. since tokudb has a global mutex (open_tables_mutex) and tokudb is statically linked into mysqld (plugin type mandatory), the assert fires.

we could:
1. remove the assert, or
2. rewrite tokudb to remove the open_tables_mutex, or
3. compile tokudb into a shared library.

this commit compiles tokudb into a shared library for debug builds.

GNU gdb (Ubuntu 8.0.1-0ubuntu1) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin/mysqld...done.
[New LWP 24606]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `bin/mysqld --initialize'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f4498d8df5d in __GI_abort () at abort.c:90
#2  0x00007f4498d83f17 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x5581f567ed23 "safe_mutex_inited",
    file=file@entry=0x5581f567ecf0 "/home/rfp/projects/xelabs-server/mysys/thr_mutex.c", line=line@entry=39,
    function=function@entry=0x5581f567f040 <__PRETTY_FUNCTION__.6977> "safe_mutex_init") at assert.c:92
#3  0x00007f4498d83fc2 in __GI___assert_fail (assertion=0x5581f567ed23 "safe_mutex_inited", file=0x5581f567ecf0 "/home/rfp/projects/xelabs-server/mysys/thr_mutex.c", line=39,
    function=0x5581f567f040 <__PRETTY_FUNCTION__.6977> "safe_mutex_init") at assert.c:101
#4  0x00005581f4ab1a40 in safe_mutex_init (mp=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, attr=0x5581f6336550 <my_fast_mutexattr>,
    file=0x5581f57ac188 "/home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h", line=207) at /home/rfp/projects/xelabs-server/mysys/thr_mutex.c:39
#5  0x00005581f500aa9f in my_mutex_init (mp=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, attr=0x5581f6336550 <my_fast_mutexattr>,
    file=0x5581f57ac188 "/home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h", line=207) at /home/rfp/projects/xelabs-server/include/thr_mutex.h:167
#6  0x00005581f500ac25 in inline_mysql_mutex_init (key=4294967295, that=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, attr=0x5581f6336550 <my_fast_mutexattr>,
    src_file=0x5581f57ac188 "/home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h", src_line=207)
    at /home/rfp/projects/xelabs-server/include/mysql/psi/mysql_thread.h:668
#7  0x00005581f5043ee5 in tokudb::thread::mutex_t::mutex_t (this=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, key=4294967295)
    at /home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h:207
#8  0x00005581f5043e68 in tokudb::thread::mutex_t::mutex_t (this=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>)
    at /home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h:45
#9  0x00005581f50433a4 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /home/rfp/projects/xelabs-server/storage/tokudb/ha_tokudb.cc:38
#10 0x00005581f5043936 in _GLOBAL__sub_I__ZN6tokudb8metadata4readEP9__toku_dbP13__toku_db_txnyPvmPm () at /home/rfp/projects/xelabs-server/storage/tokudb/ha_tokudb.cc:9061
#11 0x00005581f529aefd in __libc_csu_init ()
#12 0x00007f4498d76150 in __libc_start_main (main=0x5581f4025aea <main(int, char**)>, argc=2, argv=0x7ffe9dcd7118, init=0x5581f529aeb0 <__libc_csu_init>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe9dcd7108) at ../csu/libc-start.c:264
#13 0x00005581f4025a0a in _start ()
(gdb) q
prohaska7 added a commit that referenced this issue Jan 8, 2018
BohuTANG added a commit that referenced this issue Jan 18, 2018
TokuDB crashes during shutdown due to a PFS key double-free.

(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x0000000000f6b3e7 in my_write_core (sig=sig@entry=11) at /u01/tokudb/mysys/stacktrace.c:249
#2  0x000000000086b6e5 in handle_fatal_signal (sig=11) at /u01/tokudb/sql/signal_handler.cc:223
#3  <signal handler called>
#4  destroy_mutex (pfs=0x7f8fb71c9900) at /u01/tokudb/storage/perfschema/pfs_instr.cc:327
#5  0x00000000013c574a in pfs_destroy_mutex_v1 (mutex=<optimized out>) at /u01/tokudb/storage/perfschema/pfs.cc:1833
#6  0x0000000000fc350a in inline_mysql_mutex_destroy (that=0x1f84ea0 <tokudb_map_mutex>) at /u01/tokudb/include/mysql/psi/mysql_thread.h:681
#7  tokudb::thread::mutex_t::~mutex_t (this=0x1f84ea0 <tokudb_map_mutex>, __in_chrg=<optimized out>) at /u01/tokudb/storage/tokudb/tokudb_thread.h:214
#8  0x00007f8fbb38cff8 in _run_exit_handlers (status=status@entry=0, listp=0x7f8fbb7175f8 <_exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#9  0x00007f8fbb38d045 in __GI_exit (status=status@entry=0) at exit.c:104
#10 0x000000000085b7b5 in mysqld_exit (exit_code=exit_code@entry=0) at /u01/tokudb/sql/mysqld.cc:1205
#11 0x0000000000865fa6 in mysqld_main (argc=46, argv=0x7f8fbaeb3088) at /u01/tokudb/sql/mysqld.cc:5430
#12 0x00007f8fbb373830 in __libc_start_main (main=0x78d700 <main(int, char**)>, argc=10, argv=0x7ffd36ab1a28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd36ab1a18) at ../csu/libc-start.c:291
#13 0x00000000007a7b79 in _start ()

And the AddressSanitizer errors:
==27219==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f009b12d118 at pc 0x00000265d80b bp 0x7ffd3bb7ffb0 sp 0x7ffd3bb7ffa0
READ of size 8 at 0x7f009b12d118 thread T0
#0 0x265d80a in destroy_mutex(PFS_mutex*) /u01/tokudb/storage/perfschema/pfs_instr.cc:323
#1 0x1cfaef3 in inline_mysql_mutex_destroy /u01/tokudb/include/mysql/psi/mysql_thread.h:681
#2 0x1cfaef3 in tokudb::thread::mutex_t::~mutex_t() /u01/tokudb/storage/tokudb/tokudb_thread.h:214
#3 0x7f009d512ff7 (/lib/x86_64-linux-gnu/libc.so.6+0x39ff7)
#4 0x7f009d513044 in exit (/lib/x86_64-linux-gnu/libc.so.6+0x3a044)
#5 0x9a8239 in mysqld_exit /u01/tokudb/sql/mysqld.cc:1205
#6 0x9b210f in unireg_abort /u01/tokudb/sql/mysqld.cc:1175
#7 0x9b629e in init_server_components /u01/tokudb/sql/mysqld.cc:4509
#8 0x9b8aca in mysqld_main(int, char**) /u01/tokudb/sql/mysqld.cc:5001
#9 0x7f009d4f982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#10 0x7e8308 in _start (/home/ubuntu/mysql_20161216/bin/mysqld+0x7e8308)
dbkernel pushed a commit to lulabs/tokudb that referenced this issue Nov 7, 2018
Group Replication implements conflict detection in multi-primary mode
to avoid write errors on parallel operations.
Conflict detection is also engaged in single-primary mode in the
particular case of a primary change when the new primary still has a
backlog to apply. Until the backlog is flushed, conflict detection
is enabled to prevent write errors between the backlog and incoming
transactions.

The conflict detection data, which we name certification info, is
also used to detect dependencies between accepted transactions,
dependencies which rule the transaction schedule on the
parallel applier.

To prevent the certification info from growing forever, all members
periodically exchange their GTID_EXECUTED sets, whose full
intersection gives the set of transactions that are
applied on all members. Future transactions cannot conflict with
this set since all members are operating on top of it, so we can
safely remove from the certification info all write-sets that
belong to those transactions.
More details at WL#6833: Group Replication: Read-set free
Certification Module (DBSM Snapshot Isolation).

However, a corner case was found in which the garbage collection was
purging more data than it should.
The scenario is:
 1) Group with 2 members;
 2) Member1 executes:
      CREATE TABLE t1(a INT, b INT, PRIMARY KEY(a));
      INSERT INTO t1 VALUE(1, 1);
    Both members have a GTID_EXECUTED= UUID:1-4
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID1:1-4
 3) member1 executes TA
      UPDATE t1 SET b=10 WHERE a=1;
    and blocks immediately before sending the transaction to the group.
    This transaction has snapshot_version: UUID:1-4
 4) member2 executes TB
      UPDATE t1 SET b=10 WHERE a=1;
    This transaction has snapshot_version: UUID:1-4
    It goes through the complete path and is committed.
    This transaction has GTID: UUID:1000002
    Both members have a GTID_EXECUTED= UUID:1-4:1000002
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID1:1-4:1000002
 5) member2 becomes extremely slow in processing transactions, we
    simulate that by holding the transaction queue to the GR
    pipeline.
    Transaction delivery is still working, but the transaction will
    be blocked before certification.
 6) member1 is able to send its TA transaction; let's recall that
    this transaction has snapshot_version: UUID:1-4.
    On conflict detection on member1, it will conflict with #1,
    since this snapshot_version does not contain the snapshot_version
    of #1, that is, TA was executed on an earlier version than TB.
    On member2 the transaction will be delivered and will be put on
    hold before conflict detection.
 7) meanwhile the certification info garbage collection kicks in.
    Both members have a GTID_EXECUTED= UUID:1-4:1000002
    Its intersection is UUID:1-4:1000002
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID1:1-4:1000002
    The condition to purge write-sets is:
       snapshot_version.is_subset(intersection)
    We have
       "UUID:1-4:1000002".is_subset("UUID:1-4:1000002")
    which is true, so we remove #1.
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       <empty>
 8) member2 gets back to normal and we release transaction TA; let's
    recall that this transaction has snapshot_version: UUID:1-4.
    On conflict detection, since the certification info is empty,
    the transaction will be allowed to proceed, which is incorrect;
    it must roll back (as on member1) since it conflicts with TB.

The problem is in the certification garbage collection, more
precisely in the condition used to purge data: we cannot leave the
certification info empty, otherwise this situation can happen.
The condition must be changed to
       snapshot_version.is_subset_not_equals(intersection)
which will always leave a placeholder to detect delayed conflicting
transactions.

So a trace of the solution is (starting on step 7):
 7) meanwhile the certification info garbage collection kicks in.
    Both members have a GTID_EXECUTED= UUID:1-4:1000002
    Its intersection is UUID:1-4:1000002
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID1:1-4:1000002
    The condition to purge write-sets is:
       snapshot_version.is_subset_not_equals(intersection)
    We have
       "UUID:1-4:1000002".is_subset_not_equals("UUID:1-4:1000002")
    which is false, so we do not remove #1.
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID1:1-4:1000002
 8) member2 gets back to normal and we release transaction TA; let's
    recall that this transaction has snapshot_version: UUID:1-4.
    On conflict detection on member2, it will conflict with #1,
    since this snapshot_version does not contain the snapshot_version
    of #1, that is, TA was executed on an earlier version than TB.

This is the same scenario that we see in this bug, though here the
pipeline is being blocked by the distributed recovery procedure,
that is, while the joining member is applying the missing data
through the recovery channel, the incoming data is being queued.
Meanwhile the certification info garbage collection kicks in and
purges more data than it should; the result is that conflicts are
not detected.
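
The purge condition can be illustrated with MySQL's GTID_SUBSET() function (the UUID below is made up; the actual fix is in the C++ Gtid_set code, this is only an SQL analogue). GTID_SUBSET(s1, s2) is true even when the two sets are equal, which is the old, too-aggressive condition; "subset but not equal" additionally requires that s2 is not a subset of s1:

```
SET @snapshot = '3e11fa47-71ca-11e1-9e33-c80aa9429562:1-4:1000002';
SET @stable   = '3e11fa47-71ca-11e1-9e33-c80aa9429562:1-4:1000002';

SELECT GTID_SUBSET(@snapshot, @stable) AS old_condition,            -- 1: would purge the write-set
       GTID_SUBSET(@snapshot, @stable)
         AND NOT GTID_SUBSET(@stable, @snapshot) AS new_condition;  -- 0: keep the placeholder
```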
BohuTANG pushed a commit that referenced this issue Apr 3, 2019
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug, follow the steps below:

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 called store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as shown below:

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 #2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 #3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 #4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 #5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 #6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
BohuTANG pushed a commit that referenced this issue Jun 17, 2019
…E TO A SERVER

Problem
========================================================================
Running the GCS tests with ASAN occasionally reports a use-after-free of
the server reference that the acceptor_learner_task uses.

Here is an excerpt of ASAN's output:

==43936==ERROR: AddressSanitizer: heap-use-after-free on address 0x63100021c840 at pc 0x000000530ff8 bp 0x7fc0427e8530 sp 0x7fc0427e8520
WRITE of size 8 at 0x63100021c840 thread T3
    #0 0x530ff7 in server_detected /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:962
    #1 0x533814 in buffered_read_bytes /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1249
    #2 0x5481af in buffered_read_msg /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1399
    #3 0x51e171 in acceptor_learner_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4690
    #4 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
    #5 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
    #6 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
    #7 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
    #8 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)
    #9 0x7fc047ff2bfc in __clone (/lib64/libc.so.6+0xfebfc)

0x63100021c840 is located 64 bytes inside of 65688-byte region [0x63100021c800,0x63100022c898)
freed by thread T3 here:
    #0 0x7fc04a5d7508 in __interceptor_free (/lib64/libasan.so.4+0xde508)
    #1 0x52cf86 in freesrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:836
    #2 0x52ea78 in srv_unref /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:868
    #3 0x524c30 in reply_handler_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4914
    #4 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
    #5 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
    #6 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
    #7 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
    #8 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)

previously allocated by thread T3 here:
    #0 0x7fc04a5d7a88 in __interceptor_calloc (/lib64/libasan.so.4+0xdea88)
    #1 0x543604 in mksrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:721
    #2 0x543b4c in addsrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:755
    #3 0x54af61 in update_servers /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1747
    #4 0x501082 in site_install_action /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1572
    #5 0x55447c in import_config /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/site_def.c:486
    #6 0x506dfc in handle_x_snapshot /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:5257
    #7 0x50c444 in xcom_fsm /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:5325
    #8 0x516c36 in dispatch_op /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4510
    #9 0x521997 in acceptor_learner_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4772
    #10 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
    #11 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
    #12 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
    #13 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
    #14 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)

Analysis
========================================================================
The server structure is reference counted by the associated sender_task
and reply_handler_task.
When they finish, they unreference the server, which leads to its memory
being freed.

However, the acceptor_learner_task keeps a "naked" reference to the
server structure.
Under the right ordering of operations, i.e. the sender_task and
reply_handler_task terminating after the acceptor_learner_task acquires,
but before it uses, the reference to the server structure, the
acceptor_learner_task ends up accessing the server structure after it
has been freed.

Solution
========================================================================
Let the acceptor_learner_task also reference count the server structure
so it is not freed while still in use.

Reviewed-by: André Negrão <andre.negrao@oracle.com>
Reviewed-by: Venkatesh Venugopal <venkatesh.venugopal@oracle.com>
RB: 21209
BohuTANG pushed a commit that referenced this issue May 7, 2020
… keyring

Issue: when the server is started without a keyring, the redo log encryption
checks were never performed, redo log encryption wasn't initialized, but the
setting was left on, misleading the user. Later operations which triggered
this check could possibly fail.

Issue #2: redo log encryption didn't work properly when it was turned on using
the command line: in some cases, the redo log remained unencrypted (until
another operation triggered the redo log encryption setup methods to be called
again)

Both issues were caused by the refactoring of the periodic, once every
second encryption checks in upstream, in PS-5189.

This commit:

* changes the code so redo log encryption routines are called at least twice
during every startup with the encryption settings on. This fixes issue #2.
* changes the code so redo log encryption routines are called at least once
during every startup, even without a keyring or on a read-only server: this fixes
issue #1.
* adds a test ensuring that in an invalid configuration (without a keyring)
the server remains running, but the redo log encryption setting correctly displays the
OFF status
* adds a test ensuring that when the redo log is correctly configured (using the
dynamic variable or a command line parameter) the log is encrypted, and no unencrypted
data is present in it.
* backports additional redo log related bugfixes/refactorings from 8.0
(92198f4). The original commit contains fixes
for both the redo and the undo log, the backport only includes the redo log related
parts refactored to match our changes.
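
A quick way to observe the corrected behavior (variable name as used for redo log encryption in Percona Server; shown as an assumption about how one would check, not as part of the commit):

```
-- Started without a keyring plugin, the setting should now report OFF
-- instead of silently staying ON:
SHOW GLOBAL VARIABLES LIKE 'innodb_redo_log_encrypt';

-- When turned on from the command line (e.g. mysqld --innodb-redo-log-encrypt=ON)
-- with a keyring loaded, the variable stays ON and newly written redo log
-- blocks are encrypted.
```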