Skip to content

Commit

Permalink
MDEV-17380 innodb_flush_neighbors=ON should be ignored on SSD
Browse files Browse the repository at this point in the history
For tablespaces that do not reside on spinning storage, it does
not make sense to attempt to write nearby pages when writing out
dirty pages from the InnoDB buffer pool. It is actually detrimental
to performance and to the life span of flash ROM storage.

With this change, MariaDB will detect whether an InnoDB file resides
on solid-state storage. The detection has been implemented for Linux
and Microsoft Windows. For other systems, we will err on the safe side
and assume that files reside on SSD.

As part of this change, we will reduce the number of fstat() calls
when opening data files on POSIX systems and slightly clean up some
file I/O code.

FIXME: os_is_sparse_file_supported() on POSIX works in a destructive
manner. Thus, we can only invoke it when creating files, not when
opening them.

For diagnostics, we introduce the column ON_SSD to the table
INFORMATION_SCHEMA.INNODB_TABLESPACES_SCRUBBING. The table
INNODB_SYS_TABLESPACES might seem more appropriate, but its purpose
is to reflect the contents of the InnoDB system table SYS_TABLESPACES,
which we would like to remove at some point.

On Microsoft Windows, querying StorageDeviceSeekPenaltyProperty
sometimes returns ERROR_GEN_FAILURE instead of ERROR_INVALID_FUNCTION
or ERROR_NOT_SUPPORTED. We will silently ignore also this error,
and assume that the file does not reside on SSD.

On Linux, the detection will be based on the files
/sys/block/*/queue/rotational and /sys/block/*/dev.
Especially for USB storage, it is possible that
/sys/block/*/queue/rotational will wrongly report 1 instead of 0.

fil_node_t::on_ssd: Whether the InnoDB data file resides on
solid-state storage.

fil_system_t::ssd: Collection of Linux block devices that reside on
non-rotational storage.

fil_system_t::create(): Detect ssd on Linux based on the contents
of /sys/block/*/queue/rotational and /sys/block/*/dev.

fil_system_t::is_ssd(dev_t): Determine if a Linux block device is
non-rotational. Partitions will be identified with the containing
block device by assuming that the least significant 4 bits of the
minor number identify a partition, and that the "partition number"
of the entire device is 0.
  • Loading branch information
dr-m committed Apr 1, 2019
1 parent 2d825e9 commit 10dd290
Show file tree
Hide file tree
Showing 10 changed files with 476 additions and 413 deletions.
1 change: 0 additions & 1 deletion config.h.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -184,7 +184,6 @@
#cmakedefine HAVE_PERROR 1
#cmakedefine HAVE_POLL 1
#cmakedefine HAVE_POSIX_FALLOCATE 1
#cmakedefine HAVE_LINUX_FALLOC_H 1
#cmakedefine HAVE_FALLOC_PUNCH_HOLE_AND_KEEP_SIZE 1
#cmakedefine HAVE_PREAD 1
#cmakedefine HAVE_PAUSE_INSTRUCTION 1
Expand Down
1 change: 0 additions & 1 deletion configure.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -196,7 +196,6 @@ CHECK_INCLUDE_FILES (inttypes.h HAVE_INTTYPES_H)
CHECK_INCLUDE_FILES (langinfo.h HAVE_LANGINFO_H)
CHECK_INCLUDE_FILES (link.h HAVE_LINK_H)
CHECK_INCLUDE_FILES (linux/unistd.h HAVE_LINUX_UNISTD_H)
CHECK_INCLUDE_FILES (linux/falloc.h HAVE_LINUX_FALLOC_H)
CHECK_INCLUDE_FILES (limits.h HAVE_LIMITS_H)
CHECK_INCLUDE_FILES (locale.h HAVE_LOCALE_H)
CHECK_INCLUDE_FILES (malloc.h HAVE_MALLOC_H)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -385,7 +385,7 @@ SPACE NAME ENCRYPTION_SCHEME KEYSERVER_REQUESTS MIN_KEY_VERSION CURRENT_KEY_VERS
Warnings:
Warning 1012 InnoDB: SELECTing from INFORMATION_SCHEMA.innodb_tablespaces_encryption but the InnoDB storage engine is not installed
select * from information_schema.innodb_tablespaces_scrubbing;
SPACE NAME COMPRESSED LAST_SCRUB_COMPLETED CURRENT_SCRUB_STARTED CURRENT_SCRUB_ACTIVE_THREADS CURRENT_SCRUB_PAGE_NUMBER CURRENT_SCRUB_MAX_PAGE_NUMBER ROTATING_OR_FLUSHING
SPACE NAME COMPRESSED LAST_SCRUB_COMPLETED CURRENT_SCRUB_STARTED CURRENT_SCRUB_ACTIVE_THREADS CURRENT_SCRUB_PAGE_NUMBER CURRENT_SCRUB_MAX_PAGE_NUMBER ON_SSD
Warnings:
Warning 1012 InnoDB: SELECTing from INFORMATION_SCHEMA.innodb_tablespaces_scrubbing but the InnoDB storage engine is not installed
select * from information_schema.innodb_mutexes;
Expand Down
15 changes: 10 additions & 5 deletions storage/innobase/buf/buf0flu.cc
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/*****************************************************************************
Copyright (c) 1995, 2017, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2013, 2018, MariaDB Corporation.
Copyright (c) 2013, 2019, MariaDB Corporation.
Copyright (c) 2013, 2014, Fusion-io
This program is free software; you can redistribute it and/or modify it under
Expand Down Expand Up @@ -1314,9 +1314,13 @@ buf_flush_try_neighbors(
buf_pool_t* buf_pool = buf_pool_get(page_id);

ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST);
fil_space_t* space = fil_space_acquire_for_io(page_id.space());
if (!space) {
return 0;
}

if (UT_LIST_GET_LEN(buf_pool->LRU) < BUF_LRU_OLD_MIN_LEN
|| srv_flush_neighbors == 0) {
|| !srv_flush_neighbors || !space->is_rotational()) {
/* If there is little space or neighbor flushing is
not enabled then just flush the victim. */
low = page_id.page_no();
Expand Down Expand Up @@ -1371,9 +1375,8 @@ buf_flush_try_neighbors(
}
}

const ulint space_size = fil_space_get_size(page_id.space());
if (high > space_size) {
high = space_size;
if (high > space->size) {
high = space->size;
}

DBUG_PRINT("ib_buf", ("flush %u:%u..%u",
Expand Down Expand Up @@ -1450,6 +1453,8 @@ buf_flush_try_neighbors(
buf_pool_mutex_exit(buf_pool);
}

space->release_for_io();

if (count > 1) {
MONITOR_INC_VALUE_CUMULATIVE(
MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE,
Expand Down
Loading

0 comments on commit 10dd290

Please sign in to comment.