grooverdan

As highlighted in bug #90943 and in Fernando Laudares' FOSDEM 2019 talk,
the InnoDB buffer pool is greedy when used with large pages.
https://fosdem.org/2019/schedule/event/hugepages_databases/

To highlight how greedy:

Before:

$ gdb --args ./runtime_output_directory/mysqld --no-defaults --datadir=/tmp/mysqldata --innodb-buffer-pool-size=20M --innodb-buffer-pool-instances=2 --innodb-buffer-pool-chunk-size=2M --large-pages --large-page-size=2M

(gdb) break buf_pool_init
Breakpoint 1 at 0x1fb0fe0: file /home/dan/repos/mysql-server/storage/innobase/buf/buf0buf.cc, line 1432.
(gdb) break os_mem_alloc_large(unsigned long*)
Breakpoint 2 at 0x1e54a20: file /home/dan/repos/mysql-server/storage/innobase/os/os0proc.cc, line 83.
(gdb) r
Thread 2 "mysqld" hit Breakpoint 1, buf_pool_init (total_size=20971520, n_instances=1) at /home/dan/repos/mysql-server/storage/innobase/buf/buf0buf.cc:1436
1436 const ulint size = total_size / n_instances;
(gdb) n
1442 NUMA_MEMPOLICY_INTERLEAVE_IN_SCOPE;
(gdb) p size
$1 = 20971520
Thread 3 "mysqld" hit Breakpoint 2, os_mem_alloc_large (n=n@entry=0x7fffe5598918) at /home/dan/repos/mysql-server/storage/innobase/os/os0proc.cc:83
83 if (!os_use_large_pages || !os_large_page_size) {
(gdb) f
83 if (!os_use_large_pages || !os_large_page_size) {
(gdb) p *n
$2 = 2162688
(gdb) n
89 size = ut_2pow_round(*n + (os_large_page_size - 1), os_large_page_size);
(gdb)
91 shmid = shmget(IPC_PRIVATE, (size_t)size, SHM_HUGETLB | SHM_R | SHM_W);
(gdb)
92 if (shmid < 0) {
(gdb) p size
$3 = 4194304
(gdb) n
97 ptr = shmat(shmid, NULL, 0);
(gdb)
98 if (ptr == (void *)-1) {
(gdb) p ptr
$4 = (void *) 0x7fffe4800000

Looking at OS allocation:

$ cd /proc/$(pidof mysqld);egrep -A 20 '(SYSV|huge)' smaps
7fffe4800000-7fffe4c00000 rw-s 00000000 00:0f 37519390 /SYSV00000000 (deleted)
Size: 4096 kB
KernelPageSize: 2048 kB
MMUPageSize: 2048 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
VmFlags: rd wr sh mr mw me ms de ht sd

So for a 2M chunk size, 4M is allocated, about 48% of which isn't used.

The same linear relationship holds with 1G or even 16G huge pages.
That's a lot of wastage. Asking sysadmins to set
innodb_buffer_pool_chunk_size to 2% less than the large-page-size seems
like a poor choice.

After this commit, things get sane:

Thread 3 "mysqld" hit Breakpoint 1, buf_chunk_init (buf_pool=0x7fffe0325618, chunk=0x7fffd80012d8, mem_size=2097152, mutex=0x7fffe624d350)
at /home/dan/repos/mysql-server/storage/innobase/buf/buf0buf.cc:982
982 {
(gdb) n
995 if (!buf_pool->allocate_chunk(mem_size, chunk)) {
(gdb) c
Continuing.

Thread 3 "mysqld" hit Breakpoint 2, os_mem_alloc_large (n=n@entry=0x7fffe5598918) at /home/dan/repos/mysql-server/storage/innobase/os/os0proc.cc:83
83 if (!os_use_large_pages || !os_large_page_size) {
(gdb) p *n
$1 = 2097152
(gdb) n
89 size = ut_2pow_round(*n + (os_large_page_size - 1), os_large_page_size);
(gdb)
91 shmid = shmget(IPC_PRIVATE, (size_t)size, SHM_HUGETLB | SHM_R | SHM_W);
(gdb) p size
$2 = 2097152
(gdb) n
92 if (shmid < 0) {
(gdb)
97 ptr = shmat(shmid, NULL, 0);
(gdb)
98 if (ptr == (void *)-1) {
(gdb) p ptr
$3 = (void *) 0x7fffe4a00000

$ cd /proc/$(pidof mysqld);egrep -A 20 '(SYSV|huge)' smaps
7fffe4a00000-7fffe4c00000 rw-s 00000000 00:0f 37552220 /SYSV00000000 (deleted)
Size: 2048 kB
KernelPageSize: 2048 kB
MMUPageSize: 2048 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
VmFlags: rd wr sh mr mw me ms de ht sd

A 2M allocation when we specified innodb_buffer_pool_chunk_size=2M.

Rather than adding a small extra amount to the size of each chunk, keep
the chunk at the specified size. The rest of the chunk initialization
code adapts to this small size reduction. The change applies to the
general case, not just to large pages, to keep it simple.

The chunk size is controlled by innodb-buffer-pool-chunk-size. The code
increased this by the size of a descriptor table, which causes trouble
with large pages. With innodb-buffer-pool-chunk-size set to 2M, the code
before this commit would have added a small extra amount to this value
when allocating. That is not normally a problem, but with large pages
the extra amount requires a whole additional large page. With a number
of pools, or with 1G or 16G large pages, this is quite significant.
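
The rounding arithmetic seen in the gdb sessions above can be reproduced
with a small sketch. It assumes ut_2pow_round(n, m) rounds n down to a
multiple of the power-of-two m, which matches the values printed by gdb.
The names round_up_to_large_page and wasted_bytes are hypothetical
helpers for illustration, not functions in the MySQL source.

```cpp
#include <cstdint>

// Hypothetical helper: rounds a request up to a whole number of large pages.
// Mirrors the expression shown at os0proc.cc:89 in the gdb session,
//   ut_2pow_round(*n + (os_large_page_size - 1), os_large_page_size)
// under the assumption that ut_2pow_round(n, m) is n & ~(m - 1).
// page_size must be a power of two.
constexpr std::uint64_t round_up_to_large_page(std::uint64_t n,
                                               std::uint64_t page_size) {
  return (n + (page_size - 1)) & ~(page_size - 1);
}

// Hypothetical helper: bytes allocated from the OS but never requested.
constexpr std::uint64_t wasted_bytes(std::uint64_t requested,
                                     std::uint64_t page_size) {
  return round_up_to_large_page(requested, page_size) - requested;
}

// Values taken from the gdb transcripts (--large-page-size=2M).
constexpr std::uint64_t kLargePageSize = 2097152;
constexpr std::uint64_t kRequestBefore = 2162688;  // *n before this commit
constexpr std::uint64_t kRequestAfter = 2097152;   // *n after this commit

// Before: 64 KiB over 2M spills into a second large page.
static_assert(round_up_to_large_page(kRequestBefore, kLargePageSize) == 4194304,
              "before the commit, two large pages are allocated");
// After: the request is exactly one large page.
static_assert(round_up_to_large_page(kRequestAfter, kLargePageSize) == 2097152,
              "after the commit, one large page is allocated");
```

With these inputs, wasted_bytes(kRequestBefore, kLargePageSize) comes to
2031616 bytes, while wasted_bytes(kRequestAfter, kLargePageSize) is 0.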

By removing this additional amount, DBAs can set
innodb-buffer-pool-chunk-size to the large page size, or a multiple of
it, and actually get that amount allocated. Previously they had to fudge
a slightly smaller value.

@grooverdan
Author

I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

@mysql-oca-bot

Hi, thank you for your contribution. Please confirm this code is submitted under the terms of the OCA (Oracle's Contribution Agreement) you have previously signed by cutting and pasting the following text as a comment:
"I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it."
Thanks

@akopytov

This also fixes https://bugs.mysql.com/bug.php?id=79379.

@mysql-oca-bot

Hi, thank you for your contribution. Your code has been assigned to an internal queue. Please follow
bug http://bugs.mysql.com/bug.php?id=94693 for updates.
Thanks

@grooverdan grooverdan deleted the 8.0-bug-XXXXX-innodb-buf-pool-chunk-overallocate branch March 18, 2019 22:06