New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc #2762
Conversation
@@ -136,6 +136,8 @@ start slave; | |||
SHOW TABLES; | |||
|
|||
--disable_connect_log | |||
# IO thread is stopped, stop SQL thread only | |||
--let $rpl_only_running_threads= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using rpl_only_running_threads=1
is fine to keep as is (in the context of your current commit), but for future reference, there is --source include/stop_slave_sql.inc
and stop_slave_io.inc
which would set the behavior expectation more explicitly.
@@ -116,6 +116,9 @@ show status like 'Rpl_semi_sync_master_no_tx'; | |||
--connection server_2 | |||
--eval SET @@GLOBAL.debug_dbug= "$sav_server_2_dbug" | |||
--eval SET @@GLOBAL.rpl_semi_sync_slave_enabled= 0 | |||
--let $slave_io_errno=1595 | |||
--source include/wait_for_slave_io_error.inc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rpl_semi_sync_shutdown_await_ack
test will be a little more complicated to fix. If you look at the "driver" test, rpl_semi_sync_shutdown_await_ack.test
, it calls this .inc
file 4 times, each having different expectations for servers 2 and 3 to error or not.. When corrupt_queue_event
, is provided to the server, the IO thread will fail with this error (and it can be either server 2 or 3, depending on which test case is running). If a server doesn't have corrupt_queue_event
I'd imagine this errno would probably be 2003
(lost connection to server) instead. One way to do this would be using SELECT REGEXP
, something like (using server_2
as the example, and note it is untested :)):
# 2003 is the default..
--let $slave_io_errno=2003
# and if corrupt_queue_event is specified, we should await a relay log error instead
if (`SELECT REGEXP("corrupt_queue_event","$server_2_dbug")`)
{
--let $slave_io_errno=1595
}
908589c
to
3ca41ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks like a good fix and cleanup, thanks Anel! See my few notes too.
STOP SLAVE; | ||
--source include/wait_for_slave_to_stop.inc | ||
--let $rpl_only_running_threads= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to remove the above STOP SLAVE
as it is now redundant with the addition of source include/stop_slave.inc
@@ -75,17 +75,26 @@ EOF | |||
--connection server_1 | |||
DROP TABLE t1; | |||
|
|||
# Slaves IO thread will receive the disconnect error when master was shutdown | |||
# so we are allowing error on start. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit vague with the following behavior essentially starting twice:
--let rpl_allow_error=1
--source include/wait_for_slave_io_to_start.inc
--source include/start_slave.inc
I think it should be reworded to mention that the IO thread will automatically try to reconnect, and the error is allowed during this time, rather than "on start"
@@ -41,7 +41,8 @@ SELECT * FROM t1; | |||
|
|||
# restart replication for the next testcase | |||
stop slave; | |||
--source include/wait_for_slave_to_stop.inc | |||
--let $rpl_only_running_threads= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too, the above stop slave
can be removed
@@ -30,8 +30,9 @@ connection master; | |||
--let $rpl_server_number=1 | |||
source include/rpl_stop_server.inc; | |||
|
|||
--connection slave | |||
--source include/stop_slave.inc | |||
# After stoping the master, slave receives the disconnect error (2003) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should ensure this and wait to see the error before going on, and also the following 2 line comment should be deleted.
--source include/start_slave.inc | ||
--source include/wait_for_slave_sql_to_start.inc | ||
--let rpl_allow_error=1 | ||
--source include/wait_for_slave_io_to_start.inc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the above change to await the error, I imagine the IO thread will have stopped and that it should be re-started manually, i.e. it should have missed the window for automatic restart.
4412b1d
to
1db9865
Compare
1db9865
to
c5f73dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment about a quick syntax fix-up, otherwise looks good. Thanks @an3l!
sql/slave.cc
Outdated
@@ -4775,7 +4775,11 @@ pthread_handler_t handle_slave_io(void *arg) | |||
goto err; | |||
goto connected; | |||
} | |||
DBUG_EXECUTE_IF("fail_com_register_slave", goto err;); | |||
DBUG_EXECUTE_IF("fail_com_register_slave", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The braces should be on new lines.
c5f73dd
to
575a907
Compare
575a907
to
3fae200
Compare
…t_for_slave_param.inc - Reviewer: <knielsen@knielsen-hq.org> <brandon.nesterenko@mariadb.com> <andrei.elkin@mariadb.com>
3fae200
to
494a4d8
Compare
No description provided.