Skip to content

Commit

Permalink
MDEV-23190 InnoDB data file extension is not crash-safe
Browse files Browse the repository at this point in the history
When InnoDB is extending a data file, it is updating the FSP_SIZE
field in the first page of the data file.

In commit 8451e09 (MDEV-11556)
we removed a work-around for this bug and made recovery stricter,
by making it track changes to FSP_SIZE via redo log records, and
extend the data files before any changes are being applied to them.

It turns out that the function fsp_fill_free_list() is not crash-safe
with respect to this when it is initializing the change buffer bitmap
page (page 1, or generally, N*innodb_page_size+1). It uses a separate
mini-transaction that is committed (and will be written to the redo
log file) before the mini-transaction that actually extended the data
file. Hence, recovery can observe a reference to a page that is
beyond the current end of the data file.

fsp_fill_free_list(): Initialize the change buffer bitmap page in
the same mini-transaction.

The rest of the changes are fixing a bug that the use of the separate
mini-transaction was attempting to work around. Namely, we must ensure
that no other thread will access the change buffer bitmap page before
our mini-transaction has been committed and all page latches have been
released.

That is, for read-ahead as well as neighbour flushing, we must avoid
accessing pages that might not yet be durably part of the tablespace.

fil_space_t::committed_size: The size of the tablespace
as persisted by mtr_commit().

fil_space_t::max_page_number_for_io(): Limit the highest page
number for I/O batches to committed_size.

MTR_MEMO_SPACE_X_LOCK: Replaces MTR_MEMO_X_LOCK for fil_space_t::latch.

mtr_x_space_lock(): Replaces mtr_x_lock() for fil_space_t::latch.

mtr_memo_slot_release_func(): When releasing MTR_MEMO_SPACE_X_LOCK,
copy space->size to space->committed_size. In this way, read-ahead
or flushing will never be invoked on pages that do not yet exist
according to FSP_SIZE.
  • Loading branch information
dr-m committed Jul 20, 2020
1 parent 98e2c17 commit 57ec42b
Show file tree
Hide file tree
Showing 24 changed files with 313 additions and 197 deletions.
9 changes: 6 additions & 3 deletions storage/innobase/buf/buf0flu.cc
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/*****************************************************************************
Copyright (c) 1995, 2017, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2013, 2019, MariaDB Corporation.
Copyright (c) 2013, 2020, MariaDB Corporation.
Copyright (c) 2013, 2014, Fusion-io
This program is free software; you can redistribute it and/or modify it under
Expand Down Expand Up @@ -1241,8 +1241,11 @@ buf_flush_try_neighbors(
/* fprintf(stderr, "Flush area: low %lu high %lu\n", low, high); */
#endif

if (high > fil_space_get_size(space)) {
high = fil_space_get_size(space);
if (fil_space_t *s = fil_space_acquire_for_io(space)) {
high = s->max_page_number_for_io(high);
fil_space_release_for_io(s);
} else {
return 0;
}

ulint count = 0;
Expand Down
68 changes: 37 additions & 31 deletions storage/innobase/buf/buf0rea.cc
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/*****************************************************************************
Copyright (c) 1995, 2013, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2015, 2017, MariaDB Corporation.
Copyright (c) 2015, 2020, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Expand Down Expand Up @@ -292,19 +292,22 @@ buf_read_ahead_random(
return(0);
}

/* Remember the tablespace version before we ask te tablespace size
below: if DISCARD + IMPORT changes the actual .ibd file meanwhile, we
do not try to read outside the bounds of the tablespace! */
if (fil_space_t *s = fil_space_acquire_for_io(space)) {
/* Remember the tablespace version along with the
tablespace size: if DISCARD + IMPORT changes the
actual .ibd file meanwhile, we do not try to read
outside the bounds of the tablespace! */
tablespace_version = s->tablespace_version;

tablespace_version = fil_space_get_version(space);

low = (offset / buf_read_ahead_random_area)
* buf_read_ahead_random_area;
high = (offset / buf_read_ahead_random_area + 1)
* buf_read_ahead_random_area;
if (high > fil_space_get_size(space)) {
low = (offset / buf_read_ahead_random_area)
* buf_read_ahead_random_area;
high = (offset / buf_read_ahead_random_area + 1)
* buf_read_ahead_random_area;
high = s->max_page_number_for_io(high);

high = fil_space_get_size(space);
fil_space_release_for_io(s);
} else {
return 0;
}

buf_pool_mutex_enter(buf_pool);
Expand Down Expand Up @@ -435,22 +438,16 @@ buf_read_page(
ulint zip_size,
ulint offset)
{
ib_int64_t tablespace_version;
ulint count;
dberr_t err = DB_SUCCESS;

tablespace_version = fil_space_get_version(space_id);

FilSpace space(space_id, true);

if (space()) {

/* We do the i/o in the synchronous aio mode to save thread
switches: hence TRUE */
count = buf_read_page_low(&err, true, BUF_READ_ANY_PAGE, space_id,
zip_size, FALSE,
tablespace_version, offset);

ulint count = buf_read_page_low(&err, /*sync=*/true,
BUF_READ_ANY_PAGE,
space_id, zip_size, FALSE,
space()->tablespace_version,
offset);
srv_stats.buf_pool_reads.add(count);
}

Expand Down Expand Up @@ -619,21 +616,30 @@ buf_read_ahead_linear(
return(0);
}

/* Remember the tablespace version before we ask te tablespace size
below: if DISCARD + IMPORT changes the actual .ibd file meanwhile, we
do not try to read outside the bounds of the tablespace! */
uint32_t space_high_limit = 0;

tablespace_version = fil_space_get_version(space);
if (fil_space_t *s = fil_space_acquire_for_io(space)) {
/* Remember the tablespace version along with the
tablespace size: if DISCARD + IMPORT changes the
actual .ibd file meanwhile, we do not try to read
outside the bounds of the tablespace! */
tablespace_version = s->tablespace_version;

buf_pool_mutex_enter(buf_pool);
space_high_limit = s->max_page_number_for_io(ULINT_UNDEFINED);

if (high > fil_space_get_size(space)) {
buf_pool_mutex_exit(buf_pool);
fil_space_release_for_io(s);
} else {
return 0;
}

if (high > space_high_limit) {
/* The area is not whole, return */

return(0);
}

buf_pool_mutex_enter(buf_pool);

if (buf_pool->n_pend_reads
> buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {
buf_pool_mutex_exit(buf_pool);
Expand Down Expand Up @@ -754,7 +760,7 @@ buf_read_ahead_linear(
return(0);
}

if (high > fil_space_get_size(space)) {
if (high > space_high_limit) {
/* The area is not whole, return */

return(0);
Expand Down
5 changes: 4 additions & 1 deletion storage/innobase/fil/fil0fil.cc
Original file line number Diff line number Diff line change
Expand Up @@ -676,7 +676,7 @@ fil_node_open_file(
#ifdef UNIV_HOTBACKUP
add_size:
#endif /* UNIV_HOTBACKUP */
space->size += node->size;
space->committed_size = space->size += node->size;
}

ulint atomic_writes = FSP_FLAGS_GET_ATOMIC_WRITES(space->flags);
Expand Down Expand Up @@ -1151,6 +1151,9 @@ fil_mutex_enter_and_prepare_for_io(
ut_a(success);
/* InnoDB data files cannot shrink. */
ut_a(space->size >= size);
if (size > space->committed_size) {
space->committed_size = size;
}

/* There could be multiple concurrent I/O requests for
this tablespace (multiple threads trying to extend
Expand Down
Loading

0 comments on commit 57ec42b

Please sign in to comment.