Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc #2762

Closed
wants to merge 1 commit into from

Conversation

an3l
Copy link
Collaborator

@an3l an3l commented Sep 18, 2023

No description provided.

@an3l an3l added the MariaDB Foundation Pull requests created by MariaDB Foundation label Sep 18, 2023
@an3l an3l added this to the 10.4 milestone Sep 18, 2023
@@ -136,6 +136,8 @@ start slave;
SHOW TABLES;

--disable_connect_log
# IO thread is stopped, stop SQL thread only
--let $rpl_only_running_threads= 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using rpl_only_running_threads=1 is fine to keep as is (in the context of your current commit), but for future reference, there is --source include/stop_slave_sql.inc and stop_slave_io.inc which would set the behavior expectation more explicitly.

@@ -116,6 +116,9 @@ show status like 'Rpl_semi_sync_master_no_tx';
--connection server_2
--eval SET @@GLOBAL.debug_dbug= "$sav_server_2_dbug"
--eval SET @@GLOBAL.rpl_semi_sync_slave_enabled= 0
--let $slave_io_errno=1595
--source include/wait_for_slave_io_error.inc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rpl_semi_sync_shutdown_await_ack test will be a little more complicated to fix. If you look at the "driver" test, rpl_semi_sync_shutdown_await_ack.test, it calls this .inc file 4 times, each having different expectations for servers 2 and 3 to error or not.. When corrupt_queue_event, is provided to the server, the IO thread will fail with this error (and it can be either server 2 or 3, depending on which test case is running). If a server doesn't have corrupt_queue_event I'd imagine this errno would probably be 2003 (lost connection to server) instead. One way to do this would be using SELECT REGEXP, something like (using server_2 as the example, and note it is untested :)):

# 2003 is the default..
--let $slave_io_errno=2003

# and if corrupt_queue_event is specified, we should await a relay log error instead
if (`SELECT REGEXP("corrupt_queue_event","$server_2_dbug")`)
{
--let $slave_io_errno=1595
}

@an3l an3l force-pushed the bb-10.4-anel-fix-rpl_init-assertion branch 2 times, most recently from 908589c to 3ca41ba Compare September 20, 2023 12:42
@an3l an3l changed the title WIP MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc Sep 20, 2023
Copy link
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks like a good fix and cleanup, thanks Anel! See my few notes too.

STOP SLAVE;
--source include/wait_for_slave_to_stop.inc
--let $rpl_only_running_threads= 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to remove the above STOP SLAVE as it is now redundant with the addition of source include/stop_slave.inc

@@ -75,17 +75,26 @@ EOF
--connection server_1
DROP TABLE t1;

# Slaves IO thread will receive the disconnect error when master was shutdown
# so we are allowing error on start.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit vague with the following behavior essentially starting twice:

--let rpl_allow_error=1
--source include/wait_for_slave_io_to_start.inc
--source include/start_slave.inc

I think it should be reworded to mention that the IO thread will automatically try to reconnect, and the error is allowed during this time, rather than "on start"

@@ -41,7 +41,8 @@ SELECT * FROM t1;

# restart replication for the next testcase
stop slave;
--source include/wait_for_slave_to_stop.inc
--let $rpl_only_running_threads= 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, the above stop slave can be removed

@@ -30,8 +30,9 @@ connection master;
--let $rpl_server_number=1
source include/rpl_stop_server.inc;

--connection slave
--source include/stop_slave.inc
# After stoping the master, slave receives the disconnect error (2003)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should ensure this and wait to see the error before going on, and also the following 2 line comment should be deleted.

--source include/start_slave.inc
--source include/wait_for_slave_sql_to_start.inc
--let rpl_allow_error=1
--source include/wait_for_slave_io_to_start.inc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above change to await the error, I imagine the IO thread will have stopped and that it should be re-started manually, i.e. it should have missed the window for automatic restart.

@an3l an3l force-pushed the bb-10.4-anel-fix-rpl_init-assertion branch 3 times, most recently from 4412b1d to 1db9865 Compare September 25, 2023 14:39
@an3l an3l force-pushed the bb-10.4-anel-fix-rpl_init-assertion branch from 1db9865 to c5f73dd Compare September 29, 2023 10:36
Copy link
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment about a quick syntax fix-up, otherwise looks good. Thanks @an3l!

sql/slave.cc Outdated
@@ -4775,7 +4775,11 @@ pthread_handler_t handle_slave_io(void *arg)
goto err;
goto connected;
}
DBUG_EXECUTE_IF("fail_com_register_slave", goto err;);
DBUG_EXECUTE_IF("fail_com_register_slave", {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The braces should be on new lines.

@an3l an3l force-pushed the bb-10.4-anel-fix-rpl_init-assertion branch from c5f73dd to 575a907 Compare October 3, 2023 12:26
@an3l an3l force-pushed the bb-10.4-anel-fix-rpl_init-assertion branch from 575a907 to 3fae200 Compare November 13, 2023 10:32
…t_for_slave_param.inc

- Reviewer: <knielsen@knielsen-hq.org>
            <brandon.nesterenko@mariadb.com>
            <andrei.elkin@mariadb.com>
@an3l an3l force-pushed the bb-10.4-anel-fix-rpl_init-assertion branch from 3fae200 to 494a4d8 Compare November 14, 2023 13:32
@an3l an3l closed this Nov 14, 2023
@an3l an3l deleted the bb-10.4-anel-fix-rpl_init-assertion branch November 14, 2023 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MariaDB Foundation Pull requests created by MariaDB Foundation
2 participants