Skip to content
Permalink
Browse files
MDEV-21136 InnoDB's records_in_range estimates can be way off
Get rid of BTR_ESTIMATE and btr_cur_t::path_arr.

Before the fix btr_estimate_n_rows_in_range_low() used two
btr_cur_search_to_nth_level() calls to create two arrays of tree path,
the array per border. And then it tried to estimate the number of rows
diving level-by-level with the array elements. As the path pages are
unlatched during the arrays iterating, the tree could be modified, the
estimation function called itself until the number of attempts exceed.

After the fix the estimation happens during search process. Roughly, the
algorithm is the following. Dive in the left page, then if there are pages
between left and right ones, read a few pages to the right, if the right
page is reached, fetch it and count the exact number of rows, otherwise
count the estimated number of rows, and fetch the right page.

The latching order corresponds to WL#6326 rules, i.e.:

(2.1) [same as (1.1)]: Page latches must be acquired in descending order
of tree level.

(2.2) When acquiring a node pointer page latch at level L, we must hold
the left sibling page latch (at level L) or some ancestor latch
(at level>L).

When we dive to the level down, the parent page is unlatched only after
the the current level page is latched. When we estimate the number of rows
on some level, we latch the left border, then fetch the next page, and
then fetch the next page unlatching the previous page after the current
page is latched until the right border is reached. I.e. the left sibling
is always latched when we acquire page latch on the same level. When we
reach the right border, the current page is unlatched, and then the right
border is latched. Following to (2.2) rule, we can do this because the
right border's parent is latched.
  • Loading branch information
vlad-lesin committed Jul 25, 2022
1 parent 6156a2b commit 222e800
Show file tree
Hide file tree
Showing 5 changed files with 573 additions and 562 deletions.

0 comments on commit 222e800

Please sign in to comment.