handle_fatal_signal (sig=6) in gcache::MemStore::discard | gcache/src/gcache_mem_store.hpp:130 #419

Closed
rameshvs02 opened this issue on Aug 16, 2016 · 4 comments

@rameshvs02 commented on Aug 16, 2016

We found this issue while performing random node-recovery testing.

Test case to reproduce the issue (a shell sketch of steps 2–4 follows the list):

  1. Start a 7-node cluster
  2. Load data using sysbench (30 tables × 1000 rows)
  3. Run a sysbench read/write workload in the background
  4. Kill 3 random nodes in the cluster with kill -9
  5. Restart the killed nodes
  6. Add a new node to the cluster
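
For reference, a minimal shell sketch of steps 2–4 (paths, socket names, pid-file layout, and sysbench options are illustrative assumptions, not the exact commands from our harness; the real test is driven by the script linked in a later comment):

BASEDIR=/tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64   # assumed layout: one datadir per node

# Step 2: load 30 tables x 1000 rows (sysbench 1.x syntax assumed)
sysbench oltp_read_write --tables=30 --table-size=1000 \
    --mysql-socket=$BASEDIR/node1/socket.sock --mysql-user=root --mysql-db=test prepare

# Step 3: run a read/write workload in the background
sysbench oltp_read_write --tables=30 --table-size=1000 --time=600 \
    --mysql-socket=$BASEDIR/node1/socket.sock --mysql-user=root --mysql-db=test run &

# Step 4: SIGKILL three random nodes out of the seven
for n in $(shuf -i 1-7 -n 3); do
    kill -9 "$(cat "$BASEDIR/node$n"/*.pid)"
done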

While adding the new node to the cluster, one of the existing cluster nodes fails with the stack trace below.

GDB info

+bt
#0  0x00007f9b64936741 in pthread_kill () from /lib64/libpthread.so.0
#1  0x000000000067b1ec in handle_fatal_signal (sig=6) at /home/galera/mysql-wsrep-5.6.30-25.15/sql/signal_handler.cc:230
#2  <signal handler called>
#3  0x00007f9b635395f7 in raise () from /lib64/libc.so.6
#4  0x00007f9b6353ace8 in abort () from /lib64/libc.so.6
#5  0x00007f9b63579317 in __libc_message () from /lib64/libc.so.6
#6  0x00007f9b63581023 in _int_free () from /lib64/libc.so.6
#7  0x00007f9b48cb276c in gcache::MemStore::discard (this=0x2d45e18, bh=0x7f9b3ed40500) at gcache/src/gcache_mem_store.hpp:130
#8  0x00007f9b48ca5b4c in gcache::GCache::seqno_release (this=0x2d45d48, seqno=seqno@entry=1972) at gcache/src/GCache_seqno.cpp:154
#9  0x00007f9b48da7120 in galera::ServiceThd::thd_func (arg=0x2d46018) at galera/src/galera_service_thd.cpp:76
#10 0x00007f9b64931dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f9b635fa28d in clone () from /lib64/libc.so.6
(gdb) 
@rameshvs02 (Author) commented on Aug 16, 2016

We have created a script to test this case. Please execute it from the base directory:
https://github.com/Percona-QA/percona-qa/blob/master/pxc-chaosmonkey-test.sh

@RoelVdP commented on Aug 16, 2016

To ensure you have the same version of the script, use revision Percona-QA/percona-qa@4a4579e
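
For example, a hypothetical invocation (the clone location and base-directory path are placeholders; the checkout target is the revision referenced above):

git clone https://github.com/Percona-QA/percona-qa.git
git -C percona-qa checkout 4a4579e
cd /path/to/base-directory   # the mysql-wsrep build directory the script expects
/path/to/percona-qa/pxc-chaosmonkey-test.sh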

@philip-galera (Contributor) commented on Aug 16, 2016

Hello,

After running the test, I ended up in a situation where two mysqld processes were sharing the same data directory (nodes 5 and 8 were both using the data directory of node 5):

32570 pts/3    Sl     0:04 /tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64/bin/mysqld --defaults-file=/tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64/my.cnf --datadir=/tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64/node5 --wsrep_cluster_address=gcomm://127.0.0.1:29108,127.0.0.1:29208,127.0.0.1:29308,127.0.0.1:29408,127.0.0.1:29508,127.0.0.1:29608,127.0.0.1:29708, --wsrep_provider_options=gmcast.listen_addr=tcp://127.0.0.1:29508 --log-error=/tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64/node5/node5.err --socket=/tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64/node5/socket.sock --port=29500

32744 pts/3    Sl     0:02 /tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64/bin/mysqld --defaults-file=/tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64/my.cnf --datadir=/tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64/node5 --wsrep_cluster_address=gcomm://127.0.0.1:29108,127.0.0.1:29208,127.0.0.1:29308,127.0.0.1:29408,127.0.0.1:29508,127.0.0.1:29608,127.0.0.1:29708,127.0.0.1:29808, --wsrep_provider_options=gmcast.listen_addr=tcp://127.0.0.1:29808 --log-error=/tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64/node8/node8.err --socket=/tmp/mysql-wsrep-5.6.31-25.15-linux-x86_64/node8/socket.sock --port=29800

Can you fix the logic in the test and see if you still get an assertion?
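
For what it's worth, one way a bash harness can guard against this is to give each node its own datadir and refuse to start mysqld if a live process already owns that directory. A hypothetical sketch (variable names and the pid-file layout are assumptions, not code from pxc-chaosmonkey-test.sh):

NODE=8
DATADIR="$BASEDIR/node$NODE"         # a newly added node must get a fresh directory
mkdir -p "$DATADIR"
for pidfile in "$DATADIR"/*.pid; do  # bail out if a running mysqld still holds this datadir
    [ -f "$pidfile" ] || continue
    if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "ERROR: $DATADIR already in use by pid $(cat "$pidfile")" >&2
        exit 1
    fi
done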

@rameshvs02 (Author) commented on Aug 17, 2016

Hi Philip,

You are right: the step that adds the extra node was reusing an existing node's datadir. After fixing it, I no longer get this assertion.

Thank you very much for looking into it.

Thanks,
Ramesh
