MDEV-32268: GNU libc posix_fallocate() may be extremely slow
os_file_set_size(): Let us invoke the Linux system call fallocate(2)
directly, because the GNU libc posix_fallocate() implements a fallback
that writes to the file 1 byte every 4096 or fewer bytes. In one
environment, invoking fallocate() directly would lead to 4 times the
file growth rate during ALTER TABLE. Presumably, what happened was
that the NFS server used a smaller allocation block size than 4096 bytes
and therefore created a heavily fragmented sparse file when
posix_fallocate() was used. For example, extending a file by 4 MiB
would create 1,024 file fragments. When the file is actually being
written to with data, it would be "unsparsed".
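
The fragment count in the example above is plain arithmetic, and it can be checked with a small sketch. The helper name `probe_writes` is hypothetical and is not part of glibc or InnoDB. It only approximates the fallback as one 1-byte write per step over the extended range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not glibc or InnoDB code): approximate number of
// 1-byte writes an emulated posix_fallocate() would issue when it touches
// one byte per `step` bytes over a range of `len` bytes.
inline std::uint64_t probe_writes(std::uint64_t len, std::uint64_t step)
{
  assert(step > 0);
  return (len + step - 1) / step;  // round up to whole steps
}
```

With `len` = 4 MiB and `step` = 4096, this gives the 1,024 writes mentioned above. If the server allocates in units smaller than the step, each of those writes can become its own fragment.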

The built-in EOPNOTSUPP fallback in os_file_set_size() writes a buffer
of 1 MiB of NUL bytes. This fallback was always used with musl libc and
other Linux implementations of posix_fallocate().
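
The overall approach can be sketched as a standalone function: try fallocate(2) directly on Linux, and fall back to writing 1 MiB buffers of NUL bytes only when the file system reports EOPNOTSUPP. This is a minimal sketch, not the actual os_file_set_size(). The name `extend_file` is hypothetical, and its error handling is simplified: unlike the real code, it does not check the server shutdown state.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <vector>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper, not InnoDB's os_file_set_size(): grow fd to at
// least `size` bytes. Returns true on success.
bool extend_file(int fd, off_t size)
{
  assert(size >= 0);
  struct stat st;
  if (fstat(fd, &st)) return false;
  if (st.st_size >= size) return true;      // already large enough

  // Start from the last 4096-byte boundary, as the patched code does.
  const off_t start = st.st_size & ~off_t{4095};
  int err;
  do {
#ifdef __linux__
    // Direct system call wrapper: no user-space emulation on failure.
    err = fallocate(fd, 0, start, size - start) ? errno : 0;
#else
    err = posix_fallocate(fd, start, size - start);
#endif
  } while (err == EINTR);

  if (err == 0) return true;
  if (err != EOPNOTSUPP) return false;

  // Fallback: append NUL bytes in chunks of at most 1 MiB.
  static const std::vector<char> zeros(1 << 20, 0);
  for (off_t pos = st.st_size; pos < size;) {
    const size_t n = static_cast<size_t>(
        std::min<off_t>(size - pos, static_cast<off_t>(zeros.size())));
    const ssize_t w = pwrite(fd, zeros.data(), n, pos);
    if (w < 0 && errno == EINTR) continue;  // retry interrupted write
    if (w <= 0) return false;
    pos += w;
  }
  return true;
}
```

The zero-fill path writes real data blocks rather than isolated single bytes, so it does not produce the sparse, fragmented layout described above.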
dr-m committed Jan 18, 2024
1 parent 615f4a8 commit ee1407f
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions storage/innobase/os/os0file.cc
@@ -4934,8 +4934,18 @@ os_file_set_size(
           return true;
         }
         current_size &= ~4095ULL;
+# ifdef __linux__
+        if (!fallocate(file, 0, current_size,
+                       size - current_size)) {
+          err = 0;
+          break;
+        }
+
+        err = errno;
+# else
         err = posix_fallocate(file, current_size,
                               size - current_size);
+# endif
       }
     } while (err == EINTR
              && srv_shutdown_state <= SRV_SHUTDOWN_INITIATED);