Commit d09426f

MDEV-26537 InnoDB corrupts files due to incorrect st_blksize calculation

The st_blksize returned by fstat(2) is not documented to be a power of 2, like we assumed in commit 58252ff (MDEV-26040). While on Linux, the st_blksize appears to report the file system block size (which hopefully is not smaller than the sector size of the underlying block device), on FreeBSD we observed st_blksize values that might have been something similar to st_size.

Also IBM AIX was affected by this. A simple test case would lead to a crash when using the minimum innodb_buffer_pool_size=5m on both FreeBSD and AIX:

    seq -f 'create table t%g engine=innodb select * from seq_1_to_200000;' \
    1 100|mysql test&
    seq -f 'create table u%g engine=innodb select * from seq_1_to_200000;' \
    1 100|mysql test&

We will fix this by not trusting st_blksize at all, and assuming that the smallest allowed write size (for O_DIRECT) is 4096 bytes. We hope that no storage systems with larger block size exist. Anything larger than 4096 bytes should be unlikely, given that it is the minimum virtual memory page size of many contemporary processors.

MariaDB Server on Microsoft Windows was not affected by this.

While the 512-byte sector size of the venerable Seagate ST-225 is still in widespread use, the minimum innodb_page_size is 4096 bytes, and innodb_log_file_size can be set in integer multiples of 65536 bytes.

The only occasion where InnoDB uses smaller data file block sizes than 4096 bytes is with ROW_FORMAT=COMPRESSED tables with KEY_BLOCK_SIZE=1 or KEY_BLOCK_SIZE=2 (or innodb_page_size=4096). For such tables, we will from now on preallocate space in integer multiples of 4096 bytes and let regular writes extend the file by 1024, 2048, or 3072 bytes.

The view INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES.FS_BLOCK_SIZE should report the raw st_blksize.

For page_compressed tables, the function fil_space_get_block_size() will map to 512 any st_blksize value that is larger than 4096.

os_file_set_size(): Assume that the file system block size is 4096 bytes, and only support extending files to integer multiples of 4096 bytes.

fil_space_extend_must_retry(): Round down the preallocation size to an integer multiple of 4096 bytes.
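
The masking problem described at the top of this message can be reproduced in isolation. The following is a minimal sketch, not InnoDB code: the helper names `mask_down`, `round_down`, and `align_down_4096` are hypothetical, and the sample block size 49152 is an arbitrary non-power-of-2 value chosen for illustration, not a value reported in MDEV-26537.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the masking idiom that the code used before this
// commit. It rounds down correctly only when blksize is a power of 2.
inline std::uint64_t mask_down(std::uint64_t offset, std::uint64_t blksize)
{
	assert(blksize != 0);
	return offset & ~(blksize - 1);
}

// Hypothetical helper: arithmetic round-down, which is correct for any
// nonzero block size, shown only for comparison.
inline std::uint64_t round_down(std::uint64_t offset, std::uint64_t blksize)
{
	assert(blksize != 0);
	return offset - offset % blksize;
}

// The approach taken by this commit: ignore st_blksize and align to a
// fixed 4096 bytes, which is a power of 2 by construction.
inline std::uint64_t align_down_4096(std::uint64_t offset)
{
	return offset & ~std::uint64_t{4095};
}
```

With the illustrative inputs offset = 114688 and blksize = 49152, `mask_down` yields 81920, which is not a multiple of 49152, while `round_down` yields 98304. With blksize = 4096 both helpers agree, which is why a fixed 4096-byte mask is safe.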
1 parent 1c378f1 commit d09426f

3 files changed: +19 -7 lines changed

mysql-test/suite/innodb/t/check_ibd_filesize.test

Lines changed: 6 additions & 0 deletions
@@ -46,6 +46,12 @@ perl;
 print "# bytes: ", (-s "$ENV{MYSQLD_DATADIR}/test/t1.ibd"), "\n";
 EOF
 INSERT INTO t1 SELECT seq,REPEAT('a',30000) FROM seq_1_to_20;
+# Ensure that the file will be extended with the last 1024-byte page
+# after the file was pre-extended in 4096-byte increments.
+--disable_query_log
+FLUSH TABLE t1 FOR EXPORT;
+UNLOCK TABLES;
+--enable_query_log
 perl;
 print "# bytes: ", (-s "$ENV{MYSQLD_DATADIR}/test/t1.ibd"), "\n";
 EOF

storage/innobase/fil/fil0fil.cc

Lines changed: 9 additions & 4 deletions
@@ -980,11 +980,16 @@ fil_space_extend_must_retry(
 	const page_size_t pageSize(space->flags);
 	const ulint page_size = pageSize.physical();
 
-	/* fil_read_first_page() expects UNIV_PAGE_SIZE bytes.
-	fil_node_open_file() expects at least 4 * UNIV_PAGE_SIZE bytes.*/
+	/* fil_read_first_page() expects innodb_page_size bytes.
+	fil_node_open_file() expects at least 4 * innodb_page_size bytes.
+	os_file_set_size() expects multiples of 4096 bytes.
+	For ROW_FORMAT=COMPRESSED tables using 1024-byte or 2048-byte
+	pages, we will preallocate up to an integer multiple of 4096 bytes,
+	and let normal writes append 1024, 2048, or 3072 bytes to the file. */
 	os_offset_t new_size = std::max(
-		os_offset_t(size - file_start_page_no) * page_size,
-		os_offset_t(FIL_IBD_FILE_INITIAL_SIZE * UNIV_PAGE_SIZE));
+		(os_offset_t(size - file_start_page_no) * page_size)
+		& ~os_offset_t(4095),
+		os_offset_t(FIL_IBD_FILE_INITIAL_SIZE << srv_page_size_shift));
 
 	*success = os_file_set_size(node->name, node->handle, new_size,
 		FSP_FLAGS_HAS_PAGE_COMPRESSION(space->flags));
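
The new_size expression in this hunk can be tried out on its own. The sketch below restates it under stated assumptions and is not the InnoDB function: `plan_preallocation`, the `Preallocation` struct, and all parameter names are hypothetical, `initial_pages` stands in for FIL_IBD_FILE_INITIAL_SIZE (whose value does not appear in this diff), and `page_size_shift` is assumed to be at least 12 so that the minimum size is itself a multiple of 4096.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type: how many bytes get preallocated, and how many
// bytes are left for ordinary page writes to append afterwards.
struct Preallocation {
	std::uint64_t preallocated;
	std::uint64_t tail;
};

// Sketch of the rounding in fil_space_extend_must_retry() after this
// commit. Assumes page_size_shift >= 12 (innodb_page_size >= 4096).
inline Preallocation plan_preallocation(std::uint64_t pages,
					std::uint64_t physical_page_size,
					std::uint64_t initial_pages,
					unsigned page_size_shift)
{
	const std::uint64_t wanted = pages * physical_page_size;
	// Round the requested size down to a multiple of 4096 bytes.
	const std::uint64_t rounded = wanted & ~std::uint64_t{4095};
	// Never preallocate less than the minimum initial file size.
	const std::uint64_t preallocated =
		std::max(rounded, initial_pages << page_size_shift);
	assert(preallocated % 4096 == 0);
	// Whatever was rounded away is left to regular writes.
	const std::uint64_t tail =
		wanted > preallocated ? wanted - preallocated : 0;
	return {preallocated, tail};
}
```

For example, 71 pages of 1024 bytes with `initial_pages = 4` and `page_size_shift = 14` request 72704 bytes; 69632 bytes are preallocated and the remaining 3072 bytes are left for regular writes, matching the "1024, 2048, or 3072 bytes" case in the comment.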

storage/innobase/os/os0file.cc

Lines changed: 4 additions & 3 deletions
@@ -5405,6 +5405,8 @@ os_file_set_size(
 	os_offset_t size,
 	bool is_sparse)
 {
+	ut_ad(!(size & 4095));
+
 #ifdef _WIN32
 	/* On Windows, changing file size works well and as expected for both
 	sparse and normal files.
@@ -5446,7 +5448,7 @@ os_file_set_size(
 		if (current_size >= size) {
 			return true;
 		}
-		current_size &= ~os_offset_t(statbuf.st_blksize - 1);
+		current_size &= ~4095ULL;
 		err = posix_fallocate(file, current_size,
 			size - current_size);
 	}
@@ -5486,8 +5488,7 @@ os_file_set_size(
 	if (fstat(file, &statbuf)) {
 		return false;
 	}
-	os_offset_t current_size = statbuf.st_size
-		& ~os_offset_t(statbuf.st_blksize - 1);
+	os_offset_t current_size = statbuf.st_size & ~4095ULL;
 #endif
 	if (current_size >= size) {
 		return true;
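
As a rough illustration of the POSIX path after this change, the sketch below extends a file from a 4096-aligned starting offset without consulting st_blksize. It is a simplified stand-in under stated assumptions, not the InnoDB implementation: `extend_file` is a hypothetical name, it assumes a platform that provides posix_fallocate() (such as Linux or FreeBSD), and it omits the retry and fallback paths that the elided parts of the real function may contain.

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical, simplified stand-in for the POSIX branch of
// os_file_set_size(). Returns true if the file is at least `size` bytes
// long on return.
inline bool extend_file(int fd, std::uint64_t size)
{
	// Mirrors the new debug assertion: only multiples of 4096 are supported.
	assert(!(size & 4095));

	struct stat statbuf;
	if (fstat(fd, &statbuf)) {
		return false;
	}

	// st_blksize is deliberately ignored; align the start to 4096 bytes.
	const std::uint64_t current_size =
		std::uint64_t(statbuf.st_size) & ~std::uint64_t{4095};
	if (current_size >= size) {
		return true;
	}

	// posix_fallocate() returns 0 on success or an error number on failure.
	return posix_fallocate(fd, off_t(current_size),
			       off_t(size - current_size)) == 0;
}
```

Starting the allocation at the aligned offset rather than at st_size means a partially filled final 4096-byte block is covered by the request as well.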
