innodb buffer pool size not consistent with large pages #258
I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Hi, thank you for your contribution. Please confirm this code is submitted under the terms of the OCA (Oracle's Contribution Agreement) you have previously signed by cutting and pasting the following text as a comment:
This also fixes https://bugs.mysql.com/bug.php?id=79379.
Hi, thank you for your contribution. Your code has been assigned to an internal queue. Please follow
As highlighted in bug #90943 and in Fernando Laudares's FOSDEM talk,
the InnoDB buffer pool is greedy when used with large pages.
https://fosdem.org/2019/schedule/event/hugepages_databases/
To highlight how greedy:
Before:
$ gdb --args ./runtime_output_directory/mysqld --no-defaults --datadir=/tmp/mysqldata --innodb-buffer-pool-size=20M --innodb-buffer-pool-instances=2 --innodb-buffer-pool-chunk-size=2M --large-pages --large-page-size=2M
(gdb) break buf_pool_init
Breakpoint 1 at 0x1fb0fe0: file /home/dan/repos/mysql-server/storage/innobase/buf/buf0buf.cc, line 1432.
(gdb) break os_mem_alloc_large(unsigned long*)
Breakpoint 2 at 0x1e54a20: file /home/dan/repos/mysql-server/storage/innobase/os/os0proc.cc, line 83.
(gdb) r
Thread 2 "mysqld" hit Breakpoint 1, buf_pool_init (total_size=20971520, n_instances=1) at /home/dan/repos/mysql-server/storage/innobase/buf/buf0buf.cc:1436
1436 const ulint size = total_size / n_instances;
(gdb) n
1442 NUMA_MEMPOLICY_INTERLEAVE_IN_SCOPE;
(gdb) p size
$1 = 20971520
Thread 3 "mysqld" hit Breakpoint 2, os_mem_alloc_large (n=n@entry=0x7fffe5598918) at /home/dan/repos/mysql-server/storage/innobase/os/os0proc.cc:83
83 if (!os_use_large_pages || !os_large_page_size) {
(gdb) f
83 if (!os_use_large_pages || !os_large_page_size) {
(gdb) p *n
$2 = 2162688
(gdb) n
89 size = ut_2pow_round(*n + (os_large_page_size - 1), os_large_page_size);
(gdb)
91 shmid = shmget(IPC_PRIVATE, (size_t)size, SHM_HUGETLB | SHM_R | SHM_W);
(gdb)
92 if (shmid < 0) {
(gdb) p size
$3 = 4194304
(gdb) n
97 ptr = shmat(shmid, NULL, 0);
(gdb)
98 if (ptr == (void *)-1) {
(gdb) p ptr
$4 = (void *) 0x7fffe4800000
Looking at OS allocation:
$ cd /proc/$(pidof mysqld);egrep -A 20 '(SYSV|huge)' smaps
7fffe4800000-7fffe4c00000 rw-s 00000000 00:0f 37519390 /SYSV00000000 (deleted)
Size: 4096 kB
KernelPageSize: 2048 kB
MMUPageSize: 2048 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
VmFlags: rd wr sh mr mw me ms de ht sd
So for a 2M chunk size, 4M are allocated, 49% of which isn't used.
The same linear relationship holds with 1G or even 16G huge pages.
That's a lot of wastage. Asking sysadmins to set
innodb_buffer_pool_chunk_size to 2% less than the large page size seems
like a poor choice.
After this commit, things get sane:
Thread 3 "mysqld" hit Breakpoint 1, buf_chunk_init (buf_pool=0x7fffe0325618, chunk=0x7fffd80012d8, mem_size=2097152, mutex=0x7fffe624d350)
at /home/dan/repos/mysql-server/storage/innobase/buf/buf0buf.cc:982
982 {
(gdb) n
995 if (!buf_pool->allocate_chunk(mem_size, chunk)) {
(gdb) c
Continuing.
Thread 3 "mysqld" hit Breakpoint 2, os_mem_alloc_large (n=n@entry=0x7fffe5598918) at /home/dan/repos/mysql-server/storage/innobase/os/os0proc.cc:83
83 if (!os_use_large_pages || !os_large_page_size) {
(gdb) p *n
$1 = 2097152
(gdb) n
89 size = ut_2pow_round(*n + (os_large_page_size - 1), os_large_page_size);
(gdb)
91 shmid = shmget(IPC_PRIVATE, (size_t)size, SHM_HUGETLB | SHM_R | SHM_W);
(gdb) p size
$2 = 2097152
(gdb) n
92 if (shmid < 0) {
(gdb)
97 ptr = shmat(shmid, NULL, 0);
(gdb)
98 if (ptr == (void *)-1) {
(gdb) p ptr
$3 = (void *) 0x7fffe4a00000
$ cd /proc/$(pidof mysqld);egrep -A 20 '(SYSV|huge)' smaps
7fffe4a00000-7fffe4c00000 rw-s 00000000 00:0f 37552220 /SYSV00000000 (deleted)
Size: 2048 kB
KernelPageSize: 2048 kB
MMUPageSize: 2048 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
VmFlags: rd wr sh mr mw me ms de ht sd
A 2M allocation when we specified innodb_buffer_pool_chunk_size=2M.
Rather than adding a small extra amount to the size of each chunk, keep
it at the specified size. The rest of the chunk initialization code
adapts to this small size reduction. This has been done in the general
case, not just for large pages, to keep it simple.
The chunk size is controlled by innodb-buffer-pool-chunk-size.
Increasing it in the code by the length of a descriptor table causes
trouble with large pages. With innodb-buffer-pool-chunk-size set to 2M,
the code before this commit would have added a small extra amount to
this value when allocating it. While not normally a problem, with large
pages it then requires additional space: a whole extra large page. With
a number of pools, or with 1G or 16G large pages, this is quite
significant.
By removing this additional amount, DBAs can set
innodb-buffer-pool-chunk-size to the large page size, or a multiple of
it, and actually get that amount allocated. Previously they had to
fudge it to a smaller value.