MDEV-33588 buf::Block_hint is a performance hog#3098
Conversation
|
|
storage/innobase/btr/btr0pcur.cc
Outdated
| release_left_block: | ||
| mtr->release_last_page(); | ||
| return false; | ||
| mtr->lock_register(savepoint + 1, mtr_memo_type_t(mode << 5)); |
There was a problem hiding this comment.
This code path is obviously not covered by our regression test suite, but it causes massive amounts of failures in @mleich1’s RQG test runs. It should be savepoint and not savepoint + 1. Typically we would have savepoint==1 here, corresponding to a freshly started mini-transaction that holds block at mtr->m_memo slot 0 and left at mtr->m_memo slot 1.
There was a problem hiding this comment.
b56d2eb addresses this (in another branch).
There was a problem hiding this comment.
There is still something wrong with btr_pcur_move_backward_from_page(). I hope to get an rr replay trace for that soon.
There was a problem hiding this comment.
9b169a6 hopefully addresses this by invoking btr_latch_prev() and by improving that function so that the preceding block is always mtr->at_savepoint(1).
| const page_t *const p= prev->page.frame; | ||
| if (memcmp_aligned<4>(FIL_PAGE_NEXT + p, FIL_PAGE_OFFSET + page, 4) || | ||
| memcmp_aligned<2>(FIL_PAGE_TYPE + p, FIL_PAGE_TYPE + page, 2) || | ||
| memcmp_aligned<2>(PAGE_HEADER + PAGE_INDEX_ID + p, | ||
| PAGE_HEADER + PAGE_INDEX_ID + page, 8) || | ||
| page_is_comp(p) != page_is_comp(page)) | ||
| { | ||
| ut_ad("corrupted" == 0); // FIXME: remove this |
There was a problem hiding this comment.
Matthias reported an assertion failure here, due to the FIL_PAGE_NEXT check. In the rr replay trace that I analyzed, the cause is that while btr_latch_prev() was waiting for the previous page latch in buf_page_get_gen(), there was a btr_page_split_and_insert() executed in another thread.
I think that we must implement a retry logic in case the FIL_PAGE_PREV of our current page changed while we were not holding a page latch.
There was a problem hiding this comment.
Indeed, latching right to left is not straight forward. AFAIK, we don't need such latching order in general case. It is needed for backward scan and perhaps some other special cases. In 8.0, for pessimistic operation, the order is allowed to reverse for non-leaf pages with certain strict latching rules (Yasufulmi's improvement SX latching).
Can you tell me the scenario when we have this flow ? Yet to see the details but hope it is not in genera; DML/Query path if the design is not significantly different from 5.7/8.0.
There was a problem hiding this comment.
The SX latch mode was implemented in MySQL 5.7 already. I renamed it to U in the sux_lock. It is not free from hangs; see f209647 for a prominent example.
This code is mostly being used in btr_pcur_move_backward_from_page() but also in the change buffer (BTR_MODIFY_PREV).
mariadb-DebarunBanerjee
left a comment
There was a problem hiding this comment.
Hi Marko,
The patch is doing way more than removing buf::Block_hint. In fact it is making a major design change.
I checked the design and doesn't look very promising. The current design is to buffer fix the page and take the locks previous-current-next in order. This is used for usual scan and repositioning. During reposition the only possible conflict is a concurrent pessimistic operation.
The modified design is doing a try latch on reverse order which is bound to conflict and cause repeated retry (which we nee to do) with any concurrent Query. Historically the reverse scan is also known to have performance issue.
I don't agree with the design change and I think will cause issues for us in future. May be if you can explain the reason for the performance hog, we can improve it without changing the latch order.
Regards,
Deb
| const page_t *const p= prev->page.frame; | ||
| if (memcmp_aligned<4>(FIL_PAGE_NEXT + p, FIL_PAGE_OFFSET + page, 4) || | ||
| memcmp_aligned<2>(FIL_PAGE_TYPE + p, FIL_PAGE_TYPE + page, 2) || | ||
| memcmp_aligned<2>(PAGE_HEADER + PAGE_INDEX_ID + p, | ||
| PAGE_HEADER + PAGE_INDEX_ID + page, 8) || | ||
| page_is_comp(p) != page_is_comp(page)) | ||
| { | ||
| ut_ad("corrupted" == 0); // FIXME: remove this |
There was a problem hiding this comment.
Indeed, latching right to left is not straight forward. AFAIK, we don't need such latching order in general case. It is needed for backward scan and perhaps some other special cases. In 8.0, for pessimistic operation, the order is allowed to reverse for non-leaf pages with certain strict latching rules (Yasufulmi's improvement SX latching).
Can you tell me the scenario when we have this flow ? Yet to see the details but hope it is not in genera; DML/Query path if the design is not significantly different from 5.7/8.0.
storage/innobase/buf/buf0buf.cc
Outdated
| { | ||
| transactional_shared_lock_guard<page_hash_latch> g | ||
| {buf_pool.page_hash.lock_get(chain)}; | ||
| if (UNIV_UNLIKELY(id != block->page.id() || !block->page.frame)) | ||
| if (UNIV_UNLIKELY(!buf_pool.is_uncompressed(block) || | ||
| id != block->page.id() || !block->page.frame)) | ||
| return false; |
There was a problem hiding this comment.
I tested this with 79e3a65 (#3107) to see if the transactional_lock_guard works well:
perf record -e tx-abort ./mtr innodb.innodb_buffer_pool_resize
perf reportIn that branch, it is OK; we have much more xabort() in the buffer pool lookup functions:
Samples: 5K of event 'tx-abort', Event count (approx.): 101716
Overhead Command Shared Object Symbol
99,98% mariadbd mariadbd [.] buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool)
0,01% mariadbd mariadbd [.] buf_page_create_low(page_id_t, unsigned long, mtr_t*, buf_block_t*)
0,00% mariadbd mariadbd [.] buf_page_optimistic_get(rw_lock_type_t, buf_block_t*, page_id_t, unsigned long, mtr_t*)
In a longer sysbench benchmark, there were more xabort() in the optimistic function:
Samples: 64K of event 'tx-abort', Event count (approx.): 476326
Overhead Command Shared Object Symbol
83,71% mariadbd mariadbd [.] buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool)
15,73% mariadbd mariadbd [.] buf_page_optimistic_get(rw_lock_type_t, buf_block_t*, page_id_t, unsigned long, mtr_t*)
0,33% mariadbd mariadbd [.] TMLockGuard::TMLockGuard(lock_sys_t::hash_table&, page_id_t)
0,10% mariadbd mariadbd [.] buf_page_create_low(page_id_t, unsigned long, mtr_t*, buf_block_t*)
0,08% mariadbd mariadbd [.] buf_page_free(fil_space_t*, unsigned int, mtr_t*)
0,05% mariadbd mariadbd [.] trx_t::commit_low(mtr_t*)
For this branch, the traversal of buf_pool.chunks could make the memory transaction too big. I reran the same workload on a merge of 4ac8c4c and a4389ec, and got this:
Samples: 18K of event 'tx-abort', Event count (approx.): 278138
Overhead Command Shared Object Symbol
99,55% mariadbd mariadbd [.] buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool)
0,20% mariadbd mariadbd [.] buf_page_init_for_read(unsigned long, page_id_t, unsigned long, bool)
0,13% mariadbd mariadbd [.] buf_page_create_low(page_id_t, unsigned long, mtr_t*, buf_block_t*)
0,06% mariadbd mariadbd [.] buf_page_optimistic_get(rw_lock_type_t, buf_block_t*, page_id_t, unsigned long, mtr_t*)
0,04% mariadbd mariadbd [.] trx_t::commit_low(mtr_t*)
0,02% mariadbd mariadbd [.] TMLockGuard::TMLockGuard(lock_sys_t::hash_table&, page_id_t)
As we can see, the does not appear to be any issue with the buf_page_optimistic_get() here either. There are fewer samples, though: The server did not complete the innodb_fast_shutdown=0. I had to kill the process because it hung due to running out of buffer pool. With #3107 we would abort the buffer pool resize in that scenario.
There was a problem hiding this comment.
The removed Block_hint::buffer_fix_block_if_still_valid() was also essentially performing a buf_pool.page_hash lookup, for acquiring a latch on the hash table slice.
One thing that we can do better is to only buffer-fix the block while holding buf_pool.page_hash, and try to acquire the buf_page_t::lock after we have released the buf_pool.page_hash. The buffer-fix will prevent eviction.
There was a problem hiding this comment.
With the ccd4b96 simplification of buf_page_optimistic_get(), the throughput numbers for my 60-second 64-thread 64×100k row Sysbench oltp_update_index were virtually unchanged between that commit and its parent.
ccd4b96 to
c10fd5d
Compare
|
@mariadb-DebarunBanerjee, thank you for the review. c10fd5d addresses some of your concerns. Please re-review. The retry logic in case |
dr-m
left a comment
There was a problem hiding this comment.
As discussed today, I must try to create an mtr test case that covers the btr_pcur_move_backward_from_page() code path.
mariadb-DebarunBanerjee
left a comment
There was a problem hiding this comment.
@dr-m The latest changes look good to me. It changes the design and hence there is some risk but I think it is for good. Let's go ahead.
Added some minor comments.
| ut_ad(!block->page.is_read_fixed()); | ||
| if (UNIV_UNLIKELY(block->page.id() != page_id)) { | ||
| block->page.lock.s_unlock(); | ||
| block->page.lock.x_lock(); | ||
| goto page_id_mismatch; | ||
| } | ||
| get_latch_valid: |
There was a problem hiding this comment.
For my edification, why is the page ID mismatch check is now not needed in the "else" part of the "if" condition ?
if (state >= buf_page_t::UNFIXED
&& allow_ibuf_merge
&& fil_page_get_type(block->page.frame) == FIL_PAGE_INDEX
&& page_is_leaf(block->page.frame)) {
There was a problem hiding this comment.
The function buf_pool_t::corrupted_evict() does two things before releasing the exclusive page latch: Invalidate the page ID, and set the state to FREED while preserving the buffer-fix count. In the else branch we detect the latter:
if (UNIV_UNLIKELY(state < buf_page_t::UNFIXED)) {
mtr->release_last_page();
goto ignore_unfixed;
}In the if branch the condition for page ID mismatch could in fact be redundant; the goto ignore_block or goto ignore_unfixed for the state mismatch should do the same. It is the responsibility of buf_page_t::read_complete() to invoke buf_pool_t::corrupted_evict() when needed. I will revise the logic and remove the redundant check for page ID mismatch.
In so-called optimistic buffer pool lookups, we must not dereference a block descriptor before we have made sure that it is accessible. While buf_pool_t::resize() is running, block descriptors could become invalid. The buf::Block_hint class was essentially duplicating a buf_pool.page_hash lookup that was executed in buf_page_optimistic_get() anyway. For better locality of reference, we had better execute that lookup only once. buf_page_optimistic_fix(): Prepare for buf_page_optimistic_get(). This basically is a simpler version of Buf::Block_hint. buf_page_optimistic_get(): Assume that buf_page_optimistic_fix() has been called and the page identifier verified. Should the block be evicted, the block->modify_clock will be invalidated; we do not need to check the block->page.id() again. It suffices to check the block->modify_clock after acquiring the page latch. btr_pcur_t::old_page_id: Store the expected page identifier for buf_page_optimistic_fix(). btr_pcur_t::block_when_stored: Remove. This was duplicating page_cur_t::block. btr_pcur_optimistic_latch_leaves(): Remove redundant parameters. First, invoke buf_page_optimistic_fix() on the requested page. If needed, acquire a latch on the left page. Finally, acquire a latch on the target page and recheck the block->modify_clock. If the page had been freed while we were not holding a page latch, fall back to the slow path. Validate the FIL_PAGE_PREV after acquiring a latch on the current page. The block->modify_clock is only being incremented when records are deleted or pages reorganized or evicted; it does not guard against concurrent page splits. Reviewed by: Debarun Banerjee
Description
The class
buf::Block_hintthat was added in order to fix MDEV-23693 is an unnecessary performance hog that causes unnecessary lookups onbuf_pool.page_hash.Originally, I was thinking that this function could be removed as part of rewriting the buffer pool resizing in MDEV-29445, but it turns out that we can do it independently of that. We only need to add a field
btr_pcur_t::old_page_idto remember the page identifier where the persistent cursor position was stored, and pass this identifier tobuf_page_optimistic_get(), which will have to perform abuf_pool.page_hashlookup anyway.buf_page_optimistic_fix(): Prepare forbuf_page_optimistic_get(). This basically is a simpler version ofBuf::Block_hint.buf_page_optimistic_get(): Assume thatbuf_page_optimistic_fix()has been called and the page identifier validated. Should the block be evicted, theblock->modify_clockwill be invalidated; we do not need to check theblock->page.id()again. It suffices to check theblock->modify_clockafter acquiring the page latch.btr_pcur_optimistic_latch_leaves(): Remove redundant parameters. First, invokebuf_page_optimistic_get()on the requested page. If needed, acquire a latch on the left page. Finally, acquire a latch on the target page and recheck theblock->modify_clock, and that the left page (if any) points to the current page.btr_latch_prev(): Always append the previous page tomtr_t::m_memoafter the current page, even if we need to wait for the page latch. If theFIL_PAGE_PREVof the current page was changed while we were not holding a latch, fetch the previous page again. (Theblock->modify_clockis not incremented during page merges!) Add some consistency checks.btr_pcur_move_backward_from_page(): Simplify the logic, now that the left block will be at the top ofmtr_t::m_memo.Release Notes
Nothing needed; this is a rather small performance improvement.
How can this PR be tested?
The function
btr_pcur_move_backward_from_page()is not covered by the regression test suite. It is covered by RQG, as we learned back when working on MDEV-30400 and MDEV-29835.Basing the PR against the correct MariaDB version
This is a performance bug fix that depends on earlier substantial refactoring that is only available starting with 10.6.
PR quality check