MDEV-12012/MDEV-11969 Can't remove GTIDs for a stale GTID Domain ID
As reported in MDEV-11969, "there's no way to ditch knowledge" about a
domain that is no longer updated on a server. Besides cluttering the
output in the DBA console, stale domains can prevent a slave from
connecting to the master, as MDEV-12012 demonstrates.
Whether a domain is obsolete must be decided by the user (DBA), based on
whether the domain information is still relevant and whether the domain
will ever receive any further updates.

This patch introduces a method to discard obsolete gtid domains from
the server's binlog state. The removal requires, however, that no event
group from such a domain be present in the existing binlog files. If
there are any, the logs containing them must first be PURGEd in order for

  FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)

to succeed. Otherwise the command returns an error.
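To make the purge requirement concrete, here is a minimal C++ sketch of the per-domain safety check. It assumes (an interpretation, not the server's code) that a domain has no events in the existing binlog files exactly when every last gtid the binlog state holds for it already appears in the Gtid_list of the earliest remaining binlog file. `Gtid`, `parse_gtid_list` and `check_domain_delete` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One GTID in its textual form domain_id-server_id-seq_no, e.g. "1-1-1".
struct Gtid {
  uint32_t domain;
  uint32_t server;
  uint64_t seq;
  bool operator==(const Gtid &o) const {
    return domain == o.domain && server == o.server && seq == o.seq;
  }
};

// Parses "1-1-1,1-2-2,11-11-11"; an empty string gives an empty list.
std::vector<Gtid> parse_gtid_list(const std::string &text) {
  std::vector<Gtid> out;
  std::stringstream ss(text);
  std::string item;
  while (std::getline(ss, item, ',')) {
    Gtid g{};
    char dash1 = 0, dash2 = 0;
    std::istringstream is(item);
    is >> g.domain >> dash1 >> g.server >> dash2 >> g.seq;
    assert(dash1 == '-' && dash2 == '-');  // well-formed input assumed
    out.push_back(g);
  }
  return out;
}

enum class DeleteCheck { NotInState, Deletable, MustPurgeFirst };

// Hypothetical model of the safety rule for one domain.
//   state         : @@global.gtid_binlog_state
//   earliest_list : Gtid_list of the earliest binlog file still present
DeleteCheck check_domain_delete(const std::vector<Gtid> &state,
                                const std::vector<Gtid> &earliest_list,
                                uint32_t domain) {
  bool in_state = false;
  for (const Gtid &g : state) {
    if (g.domain != domain) continue;
    in_state = true;
    bool listed = false;
    for (const Gtid &e : earliest_list)
      if (e == g) {
        listed = true;
        break;
      }
    // A gtid newer than the earliest Gtid_list was logged into a binlog
    // file that still exists, so those files have to be purged first.
    if (!listed) return DeleteCheck::MustPurgeFirst;
  }
  return in_state ? DeleteCheck::Deletable : DeleteCheck::NotInState;
}
```

Fed with the states from the accompanying test, state `1-1-1` against the empty Gtid_list of `master-bin.000001` yields `MustPurgeFirst`, while after `PURGE BINARY LOGS TO 'master-bin.000003'` (whose Gtid_list is `[1-1-1]`) it yields `Deletable`.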

The list of obsolete domains can be computed by intersecting two
sets, the earliest (first) binlog's Gtid_list and the current value of
@@global.gtid_binlog_state, and extracting the domain id components
from the items of the intersection.
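The computation of the candidate list can be sketched as follows. This is a hypothetical C++ helper, not part of the server: it takes both sets in textual form (the Gtid_list with its surrounding brackets already stripped), keeps the domain ids of the gtids present in both, and renders the FLUSH statement. The result is only a candidate list; whether each domain is really obsolete remains the DBA's call.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// One GTID in its textual form domain_id-server_id-seq_no, e.g. "1-2-2".
struct Gtid {
  uint32_t domain;
  uint32_t server;
  uint64_t seq;
  bool operator==(const Gtid &o) const {
    return domain == o.domain && server == o.server && seq == o.seq;
  }
};

// Parses a comma-separated list such as "1-1-1,1-2-2"; "" gives {}.
std::vector<Gtid> parse_gtid_list(const std::string &text) {
  std::vector<Gtid> out;
  std::stringstream ss(text);
  std::string item;
  while (std::getline(ss, item, ',')) {
    Gtid g{};
    char dash1 = 0, dash2 = 0;
    std::istringstream is(item);
    is >> g.domain >> dash1 >> g.server >> dash2 >> g.seq;
    assert(dash1 == '-' && dash2 == '-');  // well-formed input assumed
    out.push_back(g);
  }
  return out;
}

// Domain ids of the gtids found in both the binlog state and the earliest
// binlog's Gtid_list (the intersection), sorted and de-duplicated.
std::set<uint32_t> obsolete_domain_candidates(
    const std::vector<Gtid> &state, const std::vector<Gtid> &earliest_list) {
  std::set<uint32_t> domains;
  for (const Gtid &g : state)
    for (const Gtid &e : earliest_list)
      if (g == e) {
        domains.insert(g.domain);
        break;
      }
  return domains;
}

// Renders the candidates as the statement introduced by this patch.
std::string delete_domain_statement(const std::set<uint32_t> &domains) {
  std::string sql = "FLUSH BINARY LOGS DELETE_DOMAIN_ID = (";
  bool first = true;
  for (uint32_t d : domains) {
    if (!first) sql += ", ";
    sql += std::to_string(d);
    first = false;
  }
  return sql + ")";
}
```

An empty candidate set renders as `DELETE_DOMAIN_ID = ()`, which the test below shows is accepted and simply rotates the binlog.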
FLUSH with the new DELETE_DOMAIN_ID clause still rotates the binlog,
omitting the deleted domains from the new active binlog file's Gtid_list.
Note, though, that when the command is ineffective, that is, when none of
the domains requested for deletion exists in the binlog state, rotation
does not occur.
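The rotate-or-not behaviour, together with the outcomes recorded in the test result below (an empty list rotates; a list naming only unknown domains warns without rotating; a domain still present in binlog files fails the whole command), can be modelled as a small decision function. This is a hypothetical C++ sketch that works on precomputed domain sets rather than on the real binlog state; the warning text is a paraphrase.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

enum class FlushOutcome { Rotated, NotRotated, Error };

struct FlushResult {
  FlushOutcome outcome;
  std::vector<std::string> warnings;
};

// Hypothetical model of FLUSH BINARY LOGS DELETE_DOMAIN_ID = (requested).
//   state_domains   : domains present in the binlog state (edited on success)
//   blocked_domains : domains whose gtids may still be in existing binlogs
FlushResult flush_delete_domains(std::set<uint32_t> &state_domains,
                                 const std::set<uint32_t> &blocked_domains,
                                 const std::vector<uint32_t> &requested) {
  FlushResult res{FlushOutcome::Rotated, {}};
  // An empty list behaves like a plain FLUSH BINARY LOGS.
  if (requested.empty()) return res;

  std::vector<uint32_t> to_delete;
  for (uint32_t d : requested) {
    if (state_domains.count(d) == 0) {
      res.warnings.push_back("gtid domain " + std::to_string(d) +
                             " is not in the current binlog state");
      continue;
    }
    if (blocked_domains.count(d) != 0) {
      // The whole command fails and, in this sketch, nothing is deleted.
      res.outcome = FlushOutcome::Error;
      return res;
    }
    to_delete.push_back(d);
  }
  // Ineffective command: warnings only, no rotation.
  if (to_delete.empty()) {
    res.outcome = FlushOutcome::NotRotated;
    return res;
  }
  for (uint32_t d : to_delete) state_domains.erase(d);
  return res;  // Rotated: the new binlog's Gtid_list omits deleted domains
}
```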

Deleting obsolete domains is not harmful to connected slaves as long as
the master-side *purge* of binlog files is synchronized with
FLUSH ... DELETE_DOMAIN_ID. As usual, the slaves must have processed the
last event from the purged files, so that they do not later end up
requesting a gtid from a file that is already gone.
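One way to honour this synchronization is to choose the `PURGE BINARY LOGS TO` target from the slaves' progress. `PURGE BINARY LOGS TO 'x'` removes the files listed before `x`, so the oldest master binlog file still needed by any slave is a safe target. The sketch assumes the DBA has collected, for each slave, the master binlog file of its last executed event (for example the `Relay_Master_Log_File` column of `SHOW SLAVE STATUS`); `binlog_index` and `safe_purge_target` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Numeric suffix of a binlog file name, e.g. "master-bin.000003" -> 3.
unsigned long binlog_index(const std::string &name) {
  std::string::size_type dot = name.rfind('.');
  assert(dot != std::string::npos);
  return std::strtoul(name.c_str() + dot + 1, nullptr, 10);
}

// Hypothetical helper: the newest file PURGE BINARY LOGS TO may name
// without removing anything a slave still needs. Only the files *before*
// the returned name get deleted by the purge.
std::string safe_purge_target(const std::string &master_current_file,
                              const std::vector<std::string> &slave_files) {
  std::string target = master_current_file;
  for (const std::string &f : slave_files)
    if (binlog_index(f) < binlog_index(target))
      target = f;
  return target;
}
```

If a slave still sits in a file that contains events from the domain to delete, the target stays at that file and DELETE_DOMAIN_ID keeps failing until the slave moves on, which is the intended order of operations.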
Although the command, like an ordinary FLUSH BINARY LOGS, is not
replicated, slaves that still have the extra domains won't suffer
reconnection errors, thanks to the master-slave gtid connection protocol,
which allows the master to be ignorant of a gtid domain.
Should such a slave be promoted to the master role at failover, it may
run the ex-master's

  FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)

to clean its own binlog state.
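The connection behaviour relied on here, which the rpl test below exercises, can be pictured with a deliberately simplified C++ model: for each domain in the slave's requested position, a domain the master has no record of is skipped, while a known domain in which the slave is ahead of the master is rejected (the test observes error 1236 in that case). The real check performed by the master handles more cases; `check_slave_start` and the types are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Gtid {
  uint32_t domain;
  uint32_t server;
  uint64_t seq;
};

// Parses "0-1-1,11-111-2"; an empty string gives an empty list.
std::vector<Gtid> parse_gtid_list(const std::string &text) {
  std::vector<Gtid> out;
  std::stringstream ss(text);
  std::string item;
  while (std::getline(ss, item, ',')) {
    Gtid g{};
    char dash1 = 0, dash2 = 0;
    std::istringstream is(item);
    is >> g.domain >> dash1 >> g.server >> dash2 >> g.seq;
    assert(dash1 == '-' && dash2 == '-');  // well-formed input assumed
    out.push_back(g);
  }
  return out;
}

enum class Connect { Accepted, Rejected };

// Deliberately simplified model of how a master vets a connecting slave's
// gtid position against its own gtid binlog state.
Connect check_slave_start(const std::vector<Gtid> &master_state,
                          const std::vector<Gtid> &slave_pos) {
  for (const Gtid &s : slave_pos) {
    bool domain_known = false;
    bool covered = false;
    for (const Gtid &m : master_state) {
      if (m.domain != s.domain) continue;
      domain_known = true;
      if (m.server == s.server && m.seq >= s.seq) covered = true;
    }
    // The master may be ignorant of a domain: such a domain is skipped.
    if (!domain_known) continue;
    // The slave is ahead of the master in a domain the master does know.
    if (!covered) return Connect::Rejected;
  }
  return Connect::Accepted;
}
```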

NOTES.
  suite/perfschema/r/start_server_low_digest.result
is re-recorded as a consequence of changes in the internal parser codes.
andrelkin committed Nov 15, 2017
1 parent 7e1326c commit aae4932
Showing 20 changed files with 772 additions and 77 deletions.
@@ -0,0 +1,15 @@
# ==== Purpose ====
#
# Extract Gtid_list info from SHOW BINLOG EVENTS output masking
# non-deterministic fields.
#
# ==== Usage ====
#
# [--let $binlog_file=filename]
#
if ($binlog_file)
{
--let $_in_binlog_file=in '$binlog_file'
}
--replace_column 2 # 5 #
--eval show binlog events $_in_binlog_file limit 1,1
@@ -0,0 +1,78 @@
RESET MASTER;
FLUSH BINARY LOGS DELETE_DOMAIN_ID = ();
and the command execution is effective thence rotates binlog as usual
show binary logs;
Log_name File_size
master-bin.000001 #
master-bin.000002 #
Non-existed domain is warned, the command completes without rotation
but with a warning
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (99);
Warnings:
Warning 1982 The gtid domain being deleted ('99') is not in the current binlog state
show binary logs;
Log_name File_size
master-bin.000001 #
master-bin.000002 #
SET @@SESSION.gtid_domain_id=1;
SET @@SESSION.server_id=1;
CREATE TABLE t (a int);
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
ERROR HY000: Could not delete gtid domain. Reason: binlog files may contain gtids from the domain ('1') being deleted. Make sure to first purge those files.
FLUSH BINARY LOGS;
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
ERROR HY000: Could not delete gtid domain. Reason: binlog files may contain gtids from the domain ('1') being deleted. Make sure to first purge those files.
PURGE BINARY LOGS TO 'master-bin.000003';;
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
Gtid_list of the current binlog does not contain '1':
show binlog events in 'master-bin.000004' limit 1,1;
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000004 # Gtid_list 1 # []
But the previous log's Gtid_list may have it which explains a warning from the following command
show binlog events in 'master-bin.000003' limit 1,1;
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000003 # Gtid_list 1 # [1-1-1]
Already deleted domain in Gtid_list of the earliest log is benign
but may cause a warning
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
Warnings:
Warning 1982 The current gtid binlog state is incompatible with a former one missing gtids from the '1-1' domain-server pair which is referred to in the gtid list describing an earlier state. Ignore if the domain ('1') was already explicitly deleted.
Warning 1982 The gtid domain being deleted ('1') is not in the current binlog state
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 0);
ERROR HY000: Could not delete gtid domain. Reason: binlog files may contain gtids from the domain ('1') being deleted. Make sure to first purge those files.
FLUSH BINARY LOGS;
PURGE BINARY LOGS TO 'master-bin.000005';
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 0);
Warnings:
Warning 1982 The gtid domain being deleted ('0') is not in the current binlog state
Gtid_list of the current binlog does not contain 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 0:
show binlog events in 'master-bin.000006' limit 1,1;
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000006 # Gtid_list 1 # []
SET @@SESSION.gtid_domain_id=1;;
SET @@SESSION.server_id=1;
SET @@SESSION.gtid_seq_no=1;
INSERT INTO t SET a=1;
SET @@SESSION.server_id=2;
SET @@SESSION.gtid_seq_no=2;
INSERT INTO t SET a=2;
SET @@SESSION.gtid_domain_id=11;
SET @@SESSION.server_id=11;
SET @@SESSION.gtid_seq_no=11;
INSERT INTO t SET a=11;
SET @gtid_binlog_state_saved=@@GLOBAL.gtid_binlog_state;
FLUSH BINARY LOGS;
SET @@SESSION.gtid_domain_id=11;
SET @@SESSION.server_id=11;
SET @@SESSION.gtid_seq_no=1;
INSERT INTO t SET a=1;
SELECT @gtid_binlog_state_saved "as original state", @@GLOBAL.gtid_binlog_state as "out of order for 11 domain state";
as original state out of order for 11 domain state
1-1-1,1-2-2,11-11-11 1-1-1,1-2-2,11-11-1
PURGE BINARY LOGS TO 'master-bin.000007';
the following command succeeds with warnings
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
Warnings:
Warning 1982 The current gtid binlog state is incompatible with a former one having a gtid '11-11-1' which is less than the '11-11-11' of the gtid list describing an earlier state. The state may have been affected by manually injecting a lower sequence number gtid or via replication.
DROP TABLE t;
RESET MASTER;
@@ -0,0 +1,6 @@
SET @@SESSION.debug_dbug='+d,inject_binlog_delete_domain_init_error';
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (99);
ERROR HY000: Could not delete gtid domain. Reason: injected error.
SHOW WARNINGS;
Level Code Message
Error 1982 Could not delete gtid domain. Reason: injected error.
@@ -0,0 +1,137 @@
# Prove basic properties of
#
# FLUSH BINARY LOGS DELETE_DOMAIN_ID = (...)
#
# The command removes the supplied list of domains from the current
# @@global.gtid_binlog_state provided the binlog files do not contain
# events from such domains.

# The test is not format specific. One format is chosen to run it.
--source include/have_binlog_format_mixed.inc

# Reset binlog state
RESET MASTER;

# Empty list is accepted
FLUSH BINARY LOGS DELETE_DOMAIN_ID = ();
--echo and the command execution is effective thence rotates binlog as usual
--source include/show_binary_logs.inc

--echo Non-existed domain is warned, the command completes without rotation
--echo but with a warning
--let $binlog_pre_flush=query_get_value(SHOW MASTER STATUS, Position, 1)
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (99);
--let $binlog_start=$binlog_pre_flush
--source include/show_binary_logs.inc

# Log one event in a specified domain and try to delete the domain
SET @@SESSION.gtid_domain_id=1;
SET @@SESSION.server_id=1;
CREATE TABLE t (a int);

--error ER_BINLOG_CANT_DELETE_GTID_DOMAIN
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);

# the same error after log rotation
FLUSH BINARY LOGS;
--error ER_BINLOG_CANT_DELETE_GTID_DOMAIN
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);

# the latest binlog does not contain any events, including ones from domain 1
--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
--eval PURGE BINARY LOGS TO '$purge_to_binlog';
# So now it's safe to delete
--error 0
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
--echo Gtid_list of the current binlog does not contain '1':
--let $binlog_file=query_get_value(SHOW MASTER STATUS, File, 1)
--source include/show_gtid_list.inc
--echo But the previous log's Gtid_list may have it which explains a warning from the following command
--let $binlog_file=$purge_to_binlog
--source include/show_gtid_list.inc

--echo Already deleted domain in Gtid_list of the earliest log is benign
--echo but may cause a warning
--error 0
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);

# Delete several domains. The chosen number also verifies, among other
# things, how the expected overrun of the static buffers of the underlying
# dynamic arrays is handled.
--let $domain_cnt=17
--let $server_in_domain_cnt=3
--let $domain_list=
--disable_query_log
while ($domain_cnt)
{
--let $servers=$server_in_domain_cnt
--eval SET @@SESSION.gtid_domain_id=$domain_cnt
while ($servers)
{
--eval SET @@SESSION.server_id=10*$domain_cnt + $servers
--eval INSERT INTO t SET a=@@SESSION.server_id

--dec $servers
}
--let $domain_list= $domain_cnt, $domain_list

--dec $domain_cnt
}
--enable_query_log
--let $zero=0
--let $domain_list= $domain_list$zero

--error ER_BINLOG_CANT_DELETE_GTID_DOMAIN
--eval FLUSH BINARY LOGS DELETE_DOMAIN_ID = ($domain_list)

# Now satisfy the safety condition: purge the log files containing $domain_list
FLUSH BINARY LOGS;
--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
--eval PURGE BINARY LOGS TO '$purge_to_binlog'
--error 0
--eval FLUSH BINARY LOGS DELETE_DOMAIN_ID = ($domain_list)
--echo Gtid_list of the current binlog does not contain $domain_list:
--let $binlog_file=query_get_value(SHOW MASTER STATUS, File, 1)
--source include/show_gtid_list.inc

# Show the reaction to @@global.gtid_binlog_state not succeeding the
# earlier state described by the first binlog's Gtid_list.
# This time an out-of-order gtid is logged to a domain unrelated to the deletion.

--let $del_d_id=1
--eval SET @@SESSION.gtid_domain_id=$del_d_id;
SET @@SESSION.server_id=1;
SET @@SESSION.gtid_seq_no=1;
INSERT INTO t SET a=1;
SET @@SESSION.server_id=2;
SET @@SESSION.gtid_seq_no=2;
INSERT INTO t SET a=2;

SET @@SESSION.gtid_domain_id=11;
SET @@SESSION.server_id=11;
SET @@SESSION.gtid_seq_no=11;
INSERT INTO t SET a=11;

SET @gtid_binlog_state_saved=@@GLOBAL.gtid_binlog_state;
FLUSH BINARY LOGS;

# Inject an out-of-order gtid into domain '11' (seq_no lower than the one logged before)
SET @@SESSION.gtid_domain_id=11;
SET @@SESSION.server_id=11;
SET @@SESSION.gtid_seq_no=1;
INSERT INTO t SET a=1;

SELECT @gtid_binlog_state_saved "as original state", @@GLOBAL.gtid_binlog_state as "out of order for 11 domain state";

# to delete '1', first purge the logs containing its events
--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
--eval PURGE BINARY LOGS TO '$purge_to_binlog'

--echo the following command succeeds with warnings
--eval FLUSH BINARY LOGS DELETE_DOMAIN_ID = ($del_d_id)

#
# Cleanup
#

DROP TABLE t;
RESET MASTER;
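The warnings recorded for this test ("The current gtid binlog state is incompatible with a former one ...") compare the current state with the earliest binlog's Gtid_list. Below is a hypothetical C++ model of that comparison, assuming each Gtid_list entry is expected to be covered by a state gtid of the same domain-server pair with a seq_no not lower than the listed one. It reports a missing pair (benign if the domain was already deleted) or a lower seq_no (for example an out-of-order gtid injected as in the test).

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Gtid {
  uint32_t domain;
  uint32_t server;
  uint64_t seq;
};

// Parses "1-1-1,1-2-2,11-11-1"; an empty string gives an empty list.
std::vector<Gtid> parse_gtid_list(const std::string &text) {
  std::vector<Gtid> out;
  std::stringstream ss(text);
  std::string item;
  while (std::getline(ss, item, ',')) {
    Gtid g{};
    char dash1 = 0, dash2 = 0;
    std::istringstream is(item);
    is >> g.domain >> dash1 >> g.server >> dash2 >> g.seq;
    assert(dash1 == '-' && dash2 == '-');  // well-formed input assumed
    out.push_back(g);
  }
  return out;
}

enum class Diag { Missing, Lower };

struct StateWarning {
  Diag kind;
  Gtid listed;  // the Gtid_list entry that is not covered by the state
};

// Hypothetical model: every entry of the earliest Gtid_list should be
// covered by the current state (same domain-server pair, seq_no >= listed).
std::vector<StateWarning> state_vs_earliest_list(
    const std::vector<Gtid> &state, const std::vector<Gtid> &earliest_list) {
  std::vector<StateWarning> warnings;
  for (const Gtid &e : earliest_list) {
    const Gtid *cur = nullptr;
    for (const Gtid &g : state)
      if (g.domain == e.domain && g.server == e.server) {
        cur = &g;
        break;
      }
    if (!cur)
      warnings.push_back({Diag::Missing, e});  // e.g. domain already deleted
    else if (cur->seq < e.seq)
      warnings.push_back({Diag::Lower, e});  // e.g. out-of-order injection
  }
  return warnings;
}
```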
@@ -0,0 +1,11 @@
# Check "internal" error branches of
# FLUSH BINARY LOGS DELETE_DOMAIN_ID = (...)
# handler.
--source include/have_debug.inc
--source include/have_binlog_format_mixed.inc

SET @@SESSION.debug_dbug='+d,inject_binlog_delete_domain_init_error';
--error ER_BINLOG_CANT_DELETE_GTID_DOMAIN
FLUSH BINARY LOGS DELETE_DOMAIN_ID = (99);

SHOW WARNINGS;
@@ -8,5 +8,5 @@ SELECT 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1
####################################
SELECT event_name, digest, digest_text, sql_text FROM events_statements_history_long;
event_name digest digest_text sql_text
statement/sql/truncate e1c917a43f978456fab15240f89372ca TRUNCATE TABLE truncate table events_statements_history_long
statement/sql/select 3f7ca34376814d0e985337bd588b5ffd SELECT ? + ? + SELECT 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1
statement/sql/truncate 6206ac02a54d832f55015e480e6f2213 TRUNCATE TABLE truncate table events_statements_history_long
statement/sql/select 4cc1c447d79877c4e8df0423fd0cde9a SELECT ? + ? + SELECT 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1
@@ -0,0 +1,30 @@
include/master-slave.inc
[connection master]
SET @@SESSION.gtid_domain_id=0;
CREATE TABLE t (a INT);
call mtr.add_suppression("connecting slave requested to start from.*which is not in the master's binlog");
include/stop_slave.inc
CHANGE MASTER TO master_use_gtid=slave_pos;
SET @@SESSION.gtid_domain_id=11;
SET @@SESSION.server_id=111;
SET @@SESSION.gtid_seq_no=1;
INSERT INTO t SET a=1;
SET @save.gtid_slave_pos=@@global.gtid_slave_pos;
SET @@global.gtid_slave_pos=concat(@@global.gtid_slave_pos, ",", 11, "-", 111, "-", 1 + 1);
Warnings:
Warning 1947 Specified GTID 0-1-1 conflicts with the binary log which contains a more recent GTID 0-2-2. If MASTER_GTID_POS=CURRENT_POS is used, the binlog position will override the new value of @@gtid_slave_pos.
START SLAVE IO_THREAD;
include/wait_for_slave_io_error.inc [errno=1236]
FLUSH BINARY LOGS;
PURGE BINARY LOGS TO 'master-bin.000002';;
FLUSH BINARY LOGS DELETE_DOMAIN_ID=(11);
include/start_slave.inc
INSERT INTO t SET a=1;
include/wait_for_slave_io_error.inc [errno=1236]
FLUSH BINARY LOGS;
PURGE BINARY LOGS TO 'master-bin.000004';;
FLUSH BINARY LOGS DELETE_DOMAIN_ID=(11);
include/start_slave.inc
SET @@SESSION.gtid_domain_id=0;
DROP TABLE t;
include/rpl_end.inc
@@ -0,0 +1,95 @@
# When the master's gtid binlog state diverges from the slave's
# gtid_slave_pos, the slave may be unable to connect.
# For instance, when the slave is ahead in some of the domains (see
# MDEV-12012), the master's state may require adjustment.
# In the specific case of an "old" divergent domain, that is, one from
# which no more event groups will be generated, the states can be made
# compatible by wiping the problematic domain away. After that the slave
# can connect.
#
# Notice that the slave's applied gtid state does not need to be cleaned
# similarly in order for replication to flow.
# However, this leads to an expected error once the master resumes
# binlogging to such a domain, which the test demonstrates.

--source include/master-slave.inc

--connection master
# enforce the default domain_id binlogging explicitly
SET @@SESSION.gtid_domain_id=0;
CREATE TABLE t (a INT);
--sync_slave_with_master

--connection slave
call mtr.add_suppression("connecting slave requested to start from.*which is not in the master's binlog");

--source include/stop_slave.inc
CHANGE MASTER TO master_use_gtid=slave_pos;

--connection master
# create extra gtid domains for binlog state
--let $extra_domain_id=11
--let $extra_domain_server_id=111
--let $extra_gtid_seq_no=1
--eval SET @@SESSION.gtid_domain_id=$extra_domain_id
--eval SET @@SESSION.server_id=$extra_domain_server_id
--eval SET @@SESSION.gtid_seq_no=$extra_gtid_seq_no
INSERT INTO t SET a=1;

#
# Set up the slave replication state as if slave knows more events from the extra
# domain.
#
--connection slave
SET @save.gtid_slave_pos=@@global.gtid_slave_pos;
--eval SET @@global.gtid_slave_pos=concat(@@global.gtid_slave_pos, ",", $extra_domain_id, "-", $extra_domain_server_id, "-", $extra_gtid_seq_no + 1)

# unsuccessful attempt to start slave
START SLAVE IO_THREAD;
--let $slave_io_errno=1236
--source include/wait_for_slave_io_error.inc

--connection master
# adjust the master binlog state
FLUSH BINARY LOGS;
--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
--eval PURGE BINARY LOGS TO '$purge_to_binlog';
# with final removal of the extra domain
--eval FLUSH BINARY LOGS DELETE_DOMAIN_ID=($extra_domain_id)

--connection slave
# start the slave successfully
--source include/start_slave.inc

--connection master
# but the following gtid from the *extra* domain will break replication
INSERT INTO t SET a=1;

# take note of the slave IO thread error: the extra domain, which the
# master ignored at connection time, has become active again, so the
# slave stops.
--connection slave
--let $slave_io_errno=1236
--source include/wait_for_slave_io_error.inc

# let's apply the very same medicine
--connection master
FLUSH BINARY LOGS;
--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
--eval PURGE BINARY LOGS TO '$purge_to_binlog';
# with final removal of the extra domain
--eval FLUSH BINARY LOGS DELETE_DOMAIN_ID=($extra_domain_id)

--connection slave
--source include/start_slave.inc

#
# cleanup
#
--connection master
SET @@SESSION.gtid_domain_id=0;
DROP TABLE t;

--sync_slave_with_master

--source include/rpl_end.inc
@@ -179,6 +179,7 @@ static SYMBOL symbols[] = {
{ "DELAYED", SYM(DELAYED_SYM)},
{ "DELAY_KEY_WRITE", SYM(DELAY_KEY_WRITE_SYM)},
{ "DELETE", SYM(DELETE_SYM)},
{ "DELETE_DOMAIN_ID", SYM(DELETE_DOMAIN_ID_SYM)},
{ "DESC", SYM(DESC)},
{ "DESCRIBE", SYM(DESCRIBE)},
{ "DES_KEY_FILE", SYM(DES_KEY_FILE)},
