Commit c12d24e
MDEV-23369 False sharing in page_hash_latch::read_lock_wait()

MDEV-22871 refactored the InnoDB buf_pool.page_hash to use a simple rw-lock implementation that avoids a spinloop between non-contended read-lock requests, simply using std::atomic::fetch_add() for the lock acquisition.

Alas, in a write-heavy stress test on a 56-core system with 1,000 concurrent client connections, the server would stop processing any transactions every now and then. The reason turned out to be false sharing. Attaching a debugger to the server during one such hang revealed that 22 of the 1,033 threads were polling in page_hash_latch::read_lock_wait() on the same object, which appeared to be in unlocked state (no readers or writers). All 22 requests were for accessing an undo log page, each with a distinct page number.

To eliminate such false sharing, we will make buf_pool.page_hash.array contain one page_hash_latch per CPU data cache line. On AMD64, this will grow the size of the array by a factor of 8/7, or almost 15%. For a 50GiB buffer pool of 16KiB pages, the buf_pool.page_hash.array would grow from 25MiB to 28.6MiB. On other instruction set architectures, the incurred memory overhead may be smaller.

Thanks to Vladislav Vaintroub for noticing this anomaly.
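
For context, a minimal standalone C++ sketch of the arithmetic behind the 8/7 figure. This is not MariaDB code: CACHE_LINE is a hypothetical stand-in for the build-time constant CPU_LEVEL1_DCACHE_LINESIZE, and it assumes 64-byte cache lines and 8-byte pointers (AMD64).

#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the build-time constant CPU_LEVEL1_DCACHE_LINESIZE.
constexpr std::size_t CACHE_LINE= 64;            // AMD64 L1 data cache line
// Same formula as the patch: one latch slot plus this many payload slots
// fill exactly one cache line of pointer-sized array elements.
constexpr std::size_t ELEMENTS_PER_LATCH= CACHE_LINE / sizeof(void*) - 1;  // 7

int main()
{
  std::printf("payload cells per latch: %zu\n", ELEMENTS_PER_LATCH);

  // Growth of the array relative to pure payload: 8 slots now carry 7 cells.
  double growth= double(ELEMENTS_PER_LATCH + 1) / double(ELEMENTS_PER_LATCH);
  std::printf("array growth factor: %.4f (about %.1f%%)\n",
              growth, (growth - 1) * 100);

  // The commit message's example: a 50GiB buffer pool of 16KiB pages needs
  // roughly 25MiB of hash cells, which becomes about 28.6MiB with padding.
  std::printf("25MiB of cells -> %.1fMiB with one latch per cache line\n",
              25.0 * growth);
  return 0;
}
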
1 parent 8ddebb3 commit c12d24e

1 file changed: 2 additions, 1 deletion

storage/innobase/include/buf0buf.h

Lines changed: 2 additions & 1 deletion
@@ -1824,7 +1824,8 @@ class buf_pool_t
   {
     /** Number of array[] elements per page_hash_latch.
     Must be one less than a power of 2. */
-    static constexpr size_t ELEMENTS_PER_LATCH= 1023;
+    static constexpr size_t ELEMENTS_PER_LATCH= CPU_LEVEL1_DCACHE_LINESIZE /
+      sizeof(void*) - 1;
 
     /** number of payload elements in array[] */
     Atomic_relaxed<ulint> n_cells;
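
Why the constant "must be one less than a power of 2": an illustrative sketch (assuming AMD64 sizes; latch_slot is a hypothetical helper, not the buf_pool_t API) of how a single AND with the inverted constant locates the latch that guards any cell.

#include <cassert>
#include <cstddef>

// Illustrative sketch only, not the actual buf_pool_t interface: with
// ELEMENTS_PER_LATCH one less than a power of 2, the latch guarding any
// cell can be found by masking the cell index down to the first slot of
// its cache-line-sized group, where the latch is stored.
constexpr std::size_t ELEMENTS_PER_LATCH= 64 / sizeof(void*) - 1;  // 7 on AMD64

// Hypothetical helper: index of the latch slot that guards cell i.
constexpr std::size_t latch_slot(std::size_t i) { return i & ~ELEMENTS_PER_LATCH; }

int main()
{
  static_assert(!((ELEMENTS_PER_LATCH + 1) & ELEMENTS_PER_LATCH),
                "must be one less than a power of 2 for the mask to work");
  assert(latch_slot(1) == 0);   // cells 1..7 share the latch in slot 0
  assert(latch_slot(7) == 0);
  assert(latch_slot(9) == 8);   // the next group starts one cache line later
  return 0;
}

Rounding an index down with a mask is only correct when the group size (ELEMENTS_PER_LATCH + 1) is a power of two, hence the comment retained in the diff above.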
