MDEV-24626 Remove synchronous write of page0 file during file creation

During data file creation, InnoDB holds the dict_sys mutex while it
writes page 0 of the file and flushes the file. This not only causes
unnecessary contention but is also a deviation from the write-ahead
logging protocol.

The clean sequence of operations is that we first start a dictionary
transaction and write SYS_TABLES and SYS_INDEXES records that identify
the tablespace. Then, we durably write a FILE_CREATE record to the
write-ahead log and create the file.

Recovery should not unnecessarily insist that the first page of each
data file that is referred to by the redo log is valid. It must be
enough that page 0 of the tablespace can be initialized based on the
redo log contents.

We introduce a new data structure deferred_spaces that keeps track
of corrupted-looking files during recovery. The data structure holds
the last LSN of a FILE_ record referring to the data file, the
tablespace identifier, and the last known file name.
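
As a rough illustration of the bookkeeping described above, the following is a minimal sketch and not the code from this commit. The names `deferred_spaces_sketch`, `deferred_item`, `space_id_t` and the `lsn_t` alias are hypothetical stand-ins; the sketch only shows one way to keep, per tablespace identifier, the last LSN of a FILE_ record and the last known file name.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins for InnoDB's LSN and tablespace id types.
using lsn_t = std::uint64_t;
using space_id_t = std::uint32_t;

// One entry per data file whose page 0 looked invalid during recovery.
struct deferred_item {
  lsn_t lsn;              // LSN of the latest FILE_ record seen for the file
  std::string file_name;  // last known file name
};

// Sketch of a registry keyed by tablespace identifier (an assumption,
// not the layout used by the real deferred_spaces).
class deferred_spaces_sketch {
  std::map<space_id_t, deferred_item> items_;
public:
  // Record a FILE_ record. An older LSN never overwrites a newer entry.
  void add(space_id_t id, const std::string &name, lsn_t lsn) {
    auto r = items_.emplace(id, deferred_item{lsn, name});
    if (!r.second && r.first->second.lsn <= lsn) {
      r.first->second.lsn = lsn;
      r.first->second.file_name = name;
    }
  }
  // Return the entry for a tablespace, or nullptr if it is not deferred.
  const deferred_item *find(space_id_t id) const {
    auto it = items_.find(id);
    return it == items_.end() ? nullptr : &it->second;
  }
  // Forget a tablespace once it has been reconstructed.
  bool remove(space_id_t id) { return items_.erase(id) != 0; }
  bool empty() const { return items_.empty(); }
};
```

Keying by tablespace identifier means a rename seen later in the log simply updates the stored name, so only the latest name is consulted when the file is finally initialized.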

Two scenarios can happen during recovery:
i) Sufficient memory: InnoDB can reconstruct the
tablespace after parsing all redo log records.

ii) Insufficient memory (multiple apply phases): InnoDB should
store the redo log records of the deferred tablespace even though
the tablespace is not present. InnoDB should start constructing
the tablespace when it first encounters the deferred tablespace
id.
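
The two scenarios can be pictured with a toy model. This is a simplified sketch under stated assumptions, not the recovery code: `redo_record_sketch` and `batch_apply_sketch` are hypothetical names, and "constructing the tablespace" is reduced to moving an id from one set to another. With sufficient memory `apply_batch()` would be called once with all parsed records; with insufficient memory it would be called once per apply phase.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified redo record: only the page identifier matters here.
struct redo_record_sketch {
  std::uint32_t space_id;
  std::uint32_t page_no;
};

// Toy model of batched redo application with deferred tablespaces.
class batch_apply_sketch {
  std::set<std::uint32_t> deferred_;  // files that looked corrupted when scanned
  std::set<std::uint32_t> created_;   // tablespaces reconstructed so far
  std::size_t applied_ = 0;           // number of records applied
public:
  // Remember that the data file of this tablespace could not be loaded.
  void defer(std::uint32_t space_id) { deferred_.insert(space_id); }

  // Apply one batch of buffered records.
  void apply_batch(const std::vector<redo_record_sketch> &batch) {
    for (const redo_record_sketch &rec : batch) {
      // First record for a deferred id: construct the tablespace now.
      if (deferred_.erase(rec.space_id) != 0)
        created_.insert(rec.space_id);
      ++applied_;  // the record was buffered and is applied, not discarded
    }
  }

  bool is_deferred(std::uint32_t id) const { return deferred_.count(id) != 0; }
  bool is_created(std::uint32_t id) const { return created_.count(id) != 0; }
  std::size_t applied() const { return applied_; }
};
```

The point of the model is that records for a deferred id are never dropped while the file is unusable; the tablespace is created lazily, at the first record that refers to it, in whichever batch that record happens to fall.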

Mariabackup copies the zero-filled .ibd file in backup_fix_ddl() with
the .new file name extension. The Mariabackup test cases flush pages
when they deal with DDL operations during a backup.

fil_ibd_create(): Remove the write of page 0 and the flushing of the file.

fil_ibd_load(): Return FIL_LOAD_DEFER if page 0 of the tablespace
is zero-filled.
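
A minimal sketch of the zero-filled check follows, assuming that the decision can be modelled as "all bytes of page 0 are zero". The names `load_status_sketch`, `page_is_zero_filled` and `classify_first_page` are hypothetical; the real fil_ibd_load() performs further validation that is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result type; the real code returns FIL_LOAD_DEFER among others.
enum class load_status_sketch { OK, DEFER };

// True if every byte of the page buffer is zero.
inline bool page_is_zero_filled(const unsigned char *page, std::size_t size) {
  return std::all_of(page, page + size,
                     [](unsigned char b) { return b == 0; });
}

// Defer loading when page 0 has not been written yet (still all zero).
inline load_status_sketch
classify_first_page(const std::vector<unsigned char> &page0) {
  return page_is_zero_filled(page0.data(), page0.size())
             ? load_status_sketch::DEFER
             : load_status_sketch::OK;
}
```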

Datafile: Clean up the error handling, and do not report errors
if we are in the middle of recovery. The caller will check
Datafile::m_defer.

fil_node_t::deferred: Indicates whether the tablespace loading was
deferred during recovery

FIL_LOAD_DEFER: Returned by fil_ibd_load() to indicate that the
tablespace file cannot be loaded.

recv_sys_t::recover_deferred(): Invoke deferred_spaces.create() to
initialize fil_space_t based on buffered metadata and records to
initialize page 0. Ignore the flags in fil_name_t, because they are
intentionally invalid.

fil_name_process(): Update deferred_spaces.

recv_sys_t::parse(): Store the redo log records if the tablespace id
is present in deferred_spaces

recv_sys_t::recover_low(): Should recover the first page of
the tablespace even though the tablespace instance is not
present

recv_sys_t::apply(): Initialize the deferred tablespace
before applying the deferred tablespace records

recv_validate_tablespace(): Skip the validation for deferred_spaces.

recv_rename_files(): Moved and revised from recv_sys_t::apply().
Do not attempt to rename a file if a deferred-recovery tablespace
is associated with the name.

recv_recovery_from_checkpoint_start(): Invoke recv_rename_files()
and initialize all deferred tablespaces before applying redo log.

fil_node_t::read_page0(): Skip page0 validation if the tablespace
is deferred

buf_page_create_deferred(): A variant of buf_page_create() when
the fil_space_t is not available yet

This is joint work with Thirunarayanan Balathandayuthapani,
who implemented an initial prototype.
dr-m committed May 17, 2021
1 parent c290c0d commit 86dc7b4
Showing 38 changed files with 720 additions and 369 deletions.
5 changes: 4 additions & 1 deletion extra/mariabackup/fil_cur.cc
@@ -364,6 +364,7 @@ xb_fil_cur_result_t xb_fil_cur_read(xb_fil_cur_t* cursor,
 ib_int64_t offset;
 ib_int64_t to_read;
 const ulint page_size = cursor->page_size;
+bool defer = false;
 xb_ad(!cursor->is_system() || page_size == srv_page_size);

 cursor->read_filter->get_next_batch(&cursor->read_filter_ctxt,
@@ -418,13 +419,15 @@ xb_fil_cur_result_t xb_fil_cur_read(xb_fil_cur_t* cursor,
 ret = XB_FIL_CUR_ERROR;
 goto func_exit;
 }
+
+defer = space->is_deferred();
 /* check pages for corruption and re-read if necessary. i.e. in case of
 partially written pages */
 for (page = cursor->buf, i = 0; i < npages;
 page += page_size, i++) {
 unsigned page_no = cursor->buf_page_no + i;

-if (page_is_corrupted(page, page_no, cursor, space)){
+if (!defer && page_is_corrupted(page, page_no, cursor, space)) {
 retry_count--;

 if (retry_count == 0) {
79 changes: 52 additions & 27 deletions extra/mariabackup/xtrabackup.cc
@@ -510,7 +510,8 @@ bool CorruptedPages::empty() const
 }

 static void xb_load_single_table_tablespace(const std::string &space_name,
-bool set_size);
+bool set_size,
+ulint defer_space_id=0);
 static void xb_data_files_close();
 static fil_space_t* fil_space_get_by_name(const char* name);

@@ -587,7 +588,8 @@ xtrabackup_add_datasink(ds_ctxt_t *ds)
 typedef void (*process_single_tablespace_func_t)(const char *dirname,
 const char *filname,
 bool is_remote,
-bool skip_node_page0);
+bool skip_node_page0,
+ulint defer_space_id);
 static dberr_t enumerate_ibd_files(process_single_tablespace_func_t callback);

 /* ======== Datafiles iterator ======== */
@@ -1680,7 +1682,8 @@ debug_sync_point(const char *name)
 static std::set<std::string> tables_for_export;

 static void append_export_table(const char *dbname, const char *tablename,
-bool is_remote, bool skip_node_page0)
+bool is_remote, bool skip_node_page0,
+ulint defer_space_id)
 {
 if(dbname && tablename && !is_remote)
 {
@@ -3271,11 +3274,14 @@ xb_fil_io_init()
 node page0 will be read, and it's size and free pages limit
 will be set from page 0, what is neccessary for checking and fixing corrupted
 pages.
+@param[in] defer_space_id use the space id to create space object
+when there is deferred tablespace
 */
 static void xb_load_single_table_tablespace(const char *dirname,
 const char *filname,
 bool is_remote,
-bool skip_node_page0)
+bool skip_node_page0,
+ulint defer_space_id)
 {
 ut_ad(srv_operation == SRV_OPERATION_BACKUP
 || srv_operation == SRV_OPERATION_RESTORE_DELTA
@@ -3298,6 +3304,7 @@ static void xb_load_single_table_tablespace(const char *dirname,
 lsn_t flush_lsn;
 dberr_t err;
 fil_space_t *space;
+bool defer = false;

 name = static_cast<char*>(ut_malloc_nokey(pathlen));

@@ -3329,14 +3336,30 @@ static void xb_load_single_table_tablespace(const char *dirname,
 }

 for (int i = 0; i < 10; i++) {
+file->m_defer = false;
 err = file->validate_first_page(&flush_lsn);
-if (err != DB_CORRUPTION) {
+
+if (file->m_defer) {
+if (defer_space_id) {
+defer = true;
+file->set_space_id(defer_space_id);
+file->set_flags(FSP_FLAGS_PAGE_SSIZE());
+err = DB_SUCCESS;
+break;
+}
+} else if (err != DB_CORRUPTION) {
 break;
 }

 my_sleep(1000);
 }

+if (!defer && file->m_defer) {
+delete file;
+ut_free(name);
+return;
+}
+
 bool is_empty_file = file->exists() && file->is_empty_file();

 if (err == DB_SUCCESS && file->space_id() != SRV_TMP_SPACE_ID) {
@@ -3345,9 +3368,11 @@ static void xb_load_single_table_tablespace(const char *dirname,
 FIL_TYPE_TABLESPACE, NULL/* TODO: crypt_data */);

 ut_a(space != NULL);
-space->add(file->filepath(),
-skip_node_page0 ? file->detach() : pfs_os_file_t(),
-0, false, false);
+fil_node_t* node= space->add(
+file->filepath(),
+skip_node_page0 ? file->detach() : pfs_os_file_t(),
+0, false, false);
+node->deferred= defer;
 mysql_mutex_lock(&fil_system.mutex);
 space->read_page0();
 mysql_mutex_unlock(&fil_system.mutex);
@@ -3368,7 +3393,8 @@ static void xb_load_single_table_tablespace(const char *dirname,
 }

 static void xb_load_single_table_tablespace(const std::string &space_name,
-bool skip_node_page0)
+bool skip_node_page0,
+ulint defer_space_id)
 {
 std::string name(space_name);
 bool is_remote= access((name + ".ibd").c_str(), R_OK) != 0;
@@ -3379,14 +3405,13 @@ static void xb_load_single_table_tablespace(const std::string &space_name,
 buf[sizeof buf - 1]= '\0';
 const char *dbname= buf;
 char *p= strchr(buf, '/');
-if (p == 0)
+if (!p)
 die("Unexpected tablespace %s filename %s", space_name.c_str(),
 name.c_str());
-ut_a(p);
 *p= 0;
 const char *tablename= p + 1;
 xb_load_single_table_tablespace(dbname, tablename, is_remote,
-skip_node_page0);
+skip_node_page0, defer_space_id);
 }

 /** Scan the database directories under the MySQL datadir, looking for
@@ -3425,12 +3450,11 @@ static dberr_t enumerate_ibd_files(process_single_tablespace_func_t callback)

 /* General tablespaces are always at the first level of the
 data home dir */
-if (dbinfo.type == OS_FILE_TYPE_FILE) {
-bool is_isl = ends_with(dbinfo.name, ".isl");
-bool is_ibd = !is_isl && ends_with(dbinfo.name,".ibd");
-
-if (is_isl || is_ibd) {
-(*callback)(NULL, dbinfo.name, is_isl, false);
+if (dbinfo.type != OS_FILE_TYPE_FILE) {
+const bool is_isl = ends_with(dbinfo.name, ".isl");
+if (is_isl || ends_with(dbinfo.name,".ibd")) {
+(*callback)(nullptr, dbinfo.name, is_isl,
+false, 0);
 }
 }

@@ -3486,7 +3510,7 @@ static dberr_t enumerate_ibd_files(process_single_tablespace_func_t callback)
 if (strlen(fileinfo.name) > 4) {
 bool is_isl= false;
 if (ends_with(fileinfo.name, ".ibd") || ((is_isl = ends_with(fileinfo.name, ".isl"))))
-(*callback)(dbinfo.name, fileinfo.name, is_isl, false);
+(*callback)(dbinfo.name, fileinfo.name, is_isl, false, 0);
 }
 }

@@ -4567,9 +4591,9 @@ FTWRL. This ensures consistent backup in presence of DDL.
 */
 void backup_fix_ddl(CorruptedPages &corrupted_pages)
 {
-std::set<std::string> new_tables;
 std::set<std::string> dropped_tables;
 std::map<std::string, std::string> renamed_tables;
+space_id_to_name_t new_tables;

 /* Disable further DDL on backed up tables (only needed for --no-lock).*/
 pthread_mutex_lock(&backup_mutex);
@@ -4619,7 +4643,7 @@ void backup_fix_ddl(CorruptedPages &corrupted_pages)

 if (ddl_tracker.drops.find(id) == ddl_tracker.drops.end()) {
 dropped_tables.erase(name);
-new_tables.insert(name);
+new_tables[id] = name;
 if (opt_log_innodb_page_corruption)
 corrupted_pages.drop_space(id);
 }
@@ -4661,12 +4685,12 @@ void backup_fix_ddl(CorruptedPages &corrupted_pages)
 }

 DBUG_EXECUTE_IF("check_mdl_lock_works", DBUG_ASSERT(new_tables.size() == 0););
-for (std::set<std::string>::iterator iter = new_tables.begin();
-iter != new_tables.end(); iter++) {
-const char *space_name = iter->c_str();
-if (check_if_skip_table(space_name))
-continue;
-xb_load_single_table_tablespace(*iter, false);
+
+for (const auto &t : new_tables) {
+if (!check_if_skip_table(t.second.c_str())) {
+xb_load_single_table_tablespace(t.second, false,
+t.first);
+}
 }

 datafiles_iter_t it2;
@@ -4677,6 +4701,7 @@ void backup_fix_ddl(CorruptedPages &corrupted_pages)
 std::string dest_name= filename_to_spacename(
 node->name, strlen(node->name));
 dest_name.append(".new");
+
 xtrabackup_copy_datafile(node, 0, dest_name.c_str(), wf_write_through,
 corrupted_pages);
 }
26 changes: 3 additions & 23 deletions mysql-test/suite/innodb/r/log_file_name.result
@@ -1,6 +1,7 @@
 SET GLOBAL innodb_file_per_table=ON;
 FLUSH TABLES;
 CREATE TABLE t1(a INT PRIMARY KEY) ENGINE=InnoDB;
+# restart
 CREATE TABLE t3(a INT PRIMARY KEY) ENGINE=InnoDB;
 BEGIN;
 INSERT INTO t3 VALUES (33101),(347);
@@ -31,7 +32,7 @@ WHERE engine = 'innodb'
 AND support IN ('YES', 'DEFAULT', 'ENABLED');
 ENGINE SUPPORT COMMENT TRANSACTIONS XA SAVEPOINTS
 FOUND 1 /InnoDB: Ignoring data file '.*t[23].ibd' with space ID/ in mysqld.1.err
-FOUND 1 /InnoDB: Tablespace \d+ was not found at .*t1.ibd/ in mysqld.1.err
+NOT FOUND /InnoDB: Tablespace \d+ was not found at .*t1.ibd/ in mysqld.1.err
 FOUND 1 /InnoDB: Tablespace \d+ was not found at .*t3.ibd/ in mysqld.1.err
 FOUND 2 /InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace/ in mysqld.1.err
 # Fault 4: Missing data file
@@ -54,7 +55,7 @@ WHERE engine = 'innodb'
 AND support IN ('YES', 'DEFAULT', 'ENABLED');
 ENGINE SUPPORT COMMENT TRANSACTIONS XA SAVEPOINTS
 NOT FOUND /\[Note\] InnoDB: Cannot read first page of .*t2.ibd/ in mysqld.1.err
-FOUND 1 /\[ERROR\] InnoDB: Datafile .*t2.*\. Cannot determine the space ID from the first 64 pages/ in mysqld.1.err
+FOUND 1 /.*\[ERROR\] InnoDB: Cannot apply log to \[page id: space=[1-9][0-9]*, page number=3\] of corrupted file './test/t2\.ibd'/ in mysqld.1.err
 # restart
 SELECT * FROM t2;
 a
@@ -85,27 +86,6 @@ INSERT INTO u6 VALUES(2);
 # Kill the server
 # Fault 6: All-zero data file and innodb_force_recovery
 # restart: --innodb-force-recovery=1
-SELECT * FROM INFORMATION_SCHEMA.ENGINES
-WHERE engine = 'innodb'
-AND support IN ('YES', 'DEFAULT', 'ENABLED');
-ENGINE SUPPORT COMMENT TRANSACTIONS XA SAVEPOINTS
-FOUND 1 /\[Note\] InnoDB: Header page consists of zero bytes in datafile: .*u1.ibd/ in mysqld.1.err
-FOUND 1 /\[ERROR\] InnoDB: Datafile .*u1.*\. Cannot determine the space ID from the first 64 pages/ in mysqld.1.err
-NOT FOUND /\[Note\] InnoDB: Cannot read first page of .*u2.ibd/ in mysqld.1.err
-# Fault 7: Missing or wrong data file and innodb_force_recovery
-# restart: --innodb-force-recovery=1
-SELECT * FROM INFORMATION_SCHEMA.ENGINES
-WHERE engine = 'innodb'
-AND support IN ('YES', 'DEFAULT', 'ENABLED');
-ENGINE SUPPORT COMMENT TRANSACTIONS XA SAVEPOINTS
-FOUND 1 /\[Note\] InnoDB: Header page consists of zero bytes in datafile: .*u1.ibd/ in mysqld.1.err
-FOUND 1 /InnoDB: At LSN: \d+: unable to open file .*u[1-5].ibd for tablespace/ in mysqld.1.err
-FOUND 1 /\[ERROR\] InnoDB: Cannot replay rename of tablespace \d+ from '.*u4.ibd' to '.*u6.ibd' because the target file exists/ in mysqld.1.err
-# restart: --innodb-force-recovery=1
-FOUND 1 /\[Note\] InnoDB: Header page consists of zero bytes in datafile: .*u1.ibd/ in mysqld.1.err
-FOUND 1 /InnoDB: At LSN: \d+: unable to open file .*u[1-5].ibd for tablespace/ in mysqld.1.err
-FOUND 1 /\[Warning\] InnoDB: Tablespace \d+ was not found at .*u[1-5].ibd, and innodb_force_recovery was set. All redo log for this tablespace will be ignored!/ in mysqld.1.err
-# restart
 DROP TABLE u1,u2,u3,u6;
 # List of files:
 db.opt
68 changes: 8 additions & 60 deletions mysql-test/suite/innodb/t/log_file_name.test
@@ -12,6 +12,7 @@ FLUSH TABLES;

 CREATE TABLE t1(a INT PRIMARY KEY) ENGINE=InnoDB;

+--source include/restart_mysqld.inc
 --source include/no_checkpoint_start.inc
 CREATE TABLE t3(a INT PRIMARY KEY) ENGINE=InnoDB;

@@ -120,7 +121,7 @@ eval $check_no_innodb;

 let SEARCH_PATTERN= \[Note\] InnoDB: Cannot read first page of .*t2.ibd;
 --source include/search_pattern_in_file.inc
-let SEARCH_PATTERN= \[ERROR\] InnoDB: Datafile .*t2.*\. Cannot determine the space ID from the first 64 pages;
+let SEARCH_PATTERN= .*\[ERROR\] InnoDB: Cannot apply log to \\[page id: space=[1-9][0-9]*, page number=3\\] of corrupted file './test/t2\\.ibd';
 --source include/search_pattern_in_file.inc

 # Restore t2.ibd
@@ -150,13 +151,15 @@ call mtr.add_suppression("InnoDB: Cannot open datafile for read-write: '.*t2\.ib
 # The following are for aborted startup without --innodb-force-recovery:
 call mtr.add_suppression("InnoDB: Tablespace .* was not found at .*test");
 call mtr.add_suppression("InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace");
-call mtr.add_suppression("InnoDB: Cannot read first page of '.*test.[tu]2.ibd' I/O error");
+call mtr.add_suppression("InnoDB: Cannot read first page of '.*test.[tu]2.ibd': I/O error");
+call mtr.add_suppression("InnoDB: Cannot apply log to \\[page id: space=[1-9][0-9]*, page number=3\\] of corrupted file './test/t2\\.ibd'");
 call mtr.add_suppression("InnoDB: Datafile '.*test.*ibd' is corrupted");
 call mtr.add_suppression("InnoDB: Cannot replay file rename. Remove either file and try again");
 call mtr.add_suppression("InnoDB: Cannot rename.*because the target file exists");
 call mtr.add_suppression("InnoDB: Log scan aborted at LSN");
 # The following are for the --innodb-force-recovery=1 with broken u* tables:
-call mtr.add_suppression("InnoDB: The size of the file .*u1\\.ibd is only 16384 bytes, should be at least 65536");
+call mtr.add_suppression("InnoDB: The size of the file .*u[12]\\.ibd is only [1-9][0-9]* bytes, should be at least 65536");
+call mtr.add_suppression("InnoDB: The size of tablespace file '.*test/u[12].ibd' is only");
 call mtr.add_suppression("InnoDB: The error means the system cannot find the path specified");
 call mtr.add_suppression("InnoDB: .*you must create directories");
 call mtr.add_suppression("InnoDB: Cannot open datafile for read-only: '.*u[1-5]\.ibd'");
@@ -199,69 +202,14 @@ EOF

 --exec echo "" > $MYSQLD_DATADIR/test/u2.ibd

-# TODO: Test with this, once
-# Bug#18131883 IMPROVE INNODB ERROR MESSAGES REGARDING FILES
-# has been fixed:
-#--mkdir $MYSQLD_DATADIR/test/u3.ibd
-
 --copy_file $MYSQLD_DATADIR/test/u6.ibd $MYSQLD_DATADIR/test/u4.ibd

 --let $restart_parameters= --innodb-force-recovery=1
 --source include/start_mysqld.inc
-eval $check_no_innodb;
-
-let SEARCH_PATTERN= \[Note\] InnoDB: Header page consists of zero bytes in datafile: .*u1.ibd;
---source include/search_pattern_in_file.inc
-
-let SEARCH_PATTERN= \[ERROR\] InnoDB: Datafile .*u1.*\. Cannot determine the space ID from the first 64 pages;
---source include/search_pattern_in_file.inc
-
-# TODO: These errors should state the file name (u2.ibd) and be ignored
-# in innodb-force-recovery mode once
-# Bug#18131883 IMPROVE INNODB ERROR MESSAGES REGARDING FILES
-# has been fixed:
-let SEARCH_PATTERN= \[Note\] InnoDB: Cannot read first page of .*u2.ibd;
---source include/search_pattern_in_file.inc
-
---source include/shutdown_mysqld.inc
-
-# Allow --innodb-force-recovery to start despite the broken file.
-# TODO: Remove this workaround, and make --innodb-force-recovery=1
-# ignore the broken file.
---remove_file $MYSQLD_DATADIR/test/u2.ibd
-
---echo # Fault 7: Missing or wrong data file and innodb_force_recovery
-
---source include/start_mysqld.inc
-eval $check_no_innodb;
-
-let SEARCH_PATTERN= \[Note\] InnoDB: Header page consists of zero bytes in datafile: .*u1.ibd;
---source include/search_pattern_in_file.inc
-
-let SEARCH_PATTERN= InnoDB: At LSN: \d+: unable to open file .*u[1-5].ibd for tablespace;
---source include/search_pattern_in_file.inc
-
-let SEARCH_PATTERN= \[ERROR\] InnoDB: Cannot replay rename of tablespace \d+ from '.*u4.ibd' to '.*u6.ibd' because the target file exists;
---source include/search_pattern_in_file.inc
-
---remove_file $MYSQLD_DATADIR/test/u6.ibd
-
---source include/restart_mysqld.inc
-
-let SEARCH_PATTERN= \[Note\] InnoDB: Header page consists of zero bytes in datafile: .*u1.ibd;
---source include/search_pattern_in_file.inc
-
-let SEARCH_PATTERN= InnoDB: At LSN: \d+: unable to open file .*u[1-5].ibd for tablespace;
---source include/search_pattern_in_file.inc
-
-let SEARCH_PATTERN= \[Warning\] InnoDB: Tablespace \d+ was not found at .*u[1-5].ibd, and innodb_force_recovery was set. All redo log for this tablespace will be ignored!;
---source include/search_pattern_in_file.inc
-
---let $restart_parameters=
---source include/restart_mysqld.inc
-
 DROP TABLE u1,u2,u3,u6;

+--remove_file $MYSQLD_DATADIR/test/u4.ibd
+
 --echo # List of files:
 --list_files $MYSQLD_DATADIR/test
