Lower level index pages are missing from the parent page [CORE1300] #717
Submitted by: @hvlad
Sometimes a parent page is missing the nodes that point to lower-level pages.
This error can be detected by gfix. For each such case, it reports two errors for the same index page in firebird.log:
Index XXX is corrupt on page YYY level 1. File: \Fb2\fb2.0\src\jrd\validation.cpp, line: 1656
(line number as of the 2.0 sources)
As far as I know, this error by itself can't lead to wrong query results, but I don't know whether it can lead to more serious corruption if it stays in an actively modified index for a long time.
This bug can also produce 'wrong page type expected 7 found XXX' bugchecks.
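The report doesn't show what gfix's validation actually checks. As a rough model of the inconsistency, assuming each index level is a left-to-right sibling chain whose pages must all be referenced from the level above, the sketch below reports lower-level pages that no parent-level page points to. Page, find_orphans and the sibling/children fields are hypothetical names for illustration, not Firebird's page layout or the logic in validation.cpp.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical toy model of index pages; not Firebird's on-disk structures.
@dataclass
class Page:
    number: int
    level: int                                          # 0 = leaf level
    sibling: int | None = None                          # right sibling on the same level
    children: list[int] = field(default_factory=list)   # pages referenced one level down

def find_orphans(pages: dict[int, Page], first_by_level: dict[int, int]) -> list[tuple[int, int]]:
    """Return (level, page) for every page on its level's sibling chain
    that is not referenced by any page on the level above."""
    orphans: list[tuple[int, int]] = []
    for level in range(max(first_by_level)):            # every level that has a parent level
        referenced: set[int] = set()
        pno = first_by_level[level + 1]
        while pno is not None:                          # collect all downward pointers of the parent level
            referenced.update(pages[pno].children)
            pno = pages[pno].sibling
        pno = first_by_level[level]
        while pno is not None:                          # walk this level left to right
            if pno not in referenced:
                orphans.append((level, pno))
            pno = pages[pno].sibling
    return orphans
```

For example, with level-1 page 1 pointing at leaves 10 and 11 while the leaf chain is 10 -> 11 -> 12, find_orphans returns [(0, 12)]: leaf 12 exists, but its pointer never reached the parent.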
Commented by: @hvlad
Below is an explanation of the bug and of how it can be fixed:
When we add an entry to the index, we descend the b-tree branch down to the leaf page and insert the node there; if the leaf page splits, we add the new page number to the page one level up.
The upper page number is remembered in a variable ("index" in add_node) before we hand off from this page down to the leaf page.
After that, we re-fetch the upper page by the remembered page number and add the split page number to it. Note that we don't retain a lock on the upper page while inserting the key into the leaf page.
But this upper page can be removed from the index by the time we finish the split at the lower level, and then we insert the split page number into a removed page.
If the removed page has already been re-allocated at this point (not the case in this bug report) as a non-index page, we may get a 'wrong page type expected 7 found xxx' error. In my test case the page was not re-allocated, so the test completed without such errors.
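The interleaving described above can be replayed step by step in a toy model, without real pages, locks or threads. ToyIndex, garbage_collect and add_with_split_unsafe are hypothetical names, and modelling index garbage collection as "merge the page into a neighbour and drop it from the root" is an assumption made only for this illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class IndexPage:
    number: int
    children: list[int] = field(default_factory=list)   # child (leaf) page numbers
    released: bool = False                               # True once removed from the index

class ToyIndex:
    """Two-level toy index: a root pointing at level-1 pages, which point at leaves."""

    def __init__(self) -> None:
        self.pages: dict[int, IndexPage] = {}
        self.root: list[int] = []

    def add_level1(self, number: int, children: list[int]) -> None:
        self.pages[number] = IndexPage(number, list(children))
        self.root.append(number)

    def garbage_collect(self, number: int, merge_into: int) -> None:
        """Another attachment merges page `number` into `merge_into` and drops it from the root."""
        page = self.pages[number]
        self.pages[merge_into].children.extend(page.children)
        page.children.clear()
        page.released = True
        self.root.remove(number)

    def reachable_leaves(self) -> list[int]:
        return sorted(c for n in self.root for c in self.pages[n].children)

def add_with_split_unsafe(index: ToyIndex, parent_no: int, new_leaf: int,
                          meanwhile: Callable[[], None]) -> None:
    remembered = parent_no      # plays the role of the "index" variable in add_node
    # Hand off to the leaf: the lock on the parent is NOT retained here.
    meanwhile()                 # leaf insert + split; other attachments keep running
    # Re-fetch the parent by number and insert the split page number into it.
    index.pages[remembered].children.append(new_leaf)
```

With a parent page 1 holding leaves 10 and 11, splitting leaf 11 into a new leaf 12 while another attachment merges page 1 away leaves 12 written into the released page and unreachable from the root, the same kind of missing node that gfix reports.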
I can offer three ways to solve the issue:
a) mark the parent page with the btr_dont_gc flag before CCH_HANDOFF and clear the flag after returning from add_node.
Easy to implement, but it makes additional page fetches with an LCK_write lock, which are unnecessary in most cases.
b) retain an LCK_read lock on the parent page: replace CCH_HANDOFF with CCH_FETCH and remove the last CCH_FETCH.
Also easy to implement, but it keeps more than one page locked. I doubt it can lead to deadlocks, since page locks are acquired in a strict order and we have a similar locking scheme in btr\garbage_collect.
c) detect that the parent page has changed and search for the correct insertion point starting from the root page.
Harder to implement, and unnecessary in most cases, since the parent page can be changed many times and still stay in the index at its original place.
I've implemented both (a) and (b) and found that (b) is much worse from a performance point of view.
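Option (a) can be sketched in a toy model where a boolean dont_gc stands in for the btr_dont_gc page flag and a concurrent garbage collector simply refuses to touch a flagged page. IndexPage, try_garbage_collect and add_with_split_flagged are hypothetical names; the sketch assumes a single inserter per parent and does not model how the real flag is written to the page.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class IndexPage:
    number: int
    children: list[int] = field(default_factory=list)
    dont_gc: bool = False       # stands in for the btr_dont_gc page flag (hypothetical model)
    in_index: bool = True

def try_garbage_collect(page: IndexPage, target: IndexPage) -> bool:
    """Merge `page` into `target` unless it is protected; return True if it was merged."""
    if page.dont_gc:
        return False
    target.children.extend(page.children)
    page.children.clear()
    page.in_index = False
    return True

def add_with_split_flagged(parent: IndexPage, new_leaf: int,
                           meanwhile: Callable[[], None]) -> None:
    parent.dont_gc = True       # set before the handoff (an extra write fetch of the parent in real code)
    try:
        meanwhile()             # leaf insert + split while other attachments run
        parent.children.append(new_leaf)   # the parent cannot have been garbage-collected
    finally:
        parent.dont_gc = False  # cleared after returning from the lower level (another write fetch)
```

The cost mentioned for (a) corresponds to the two extra writes of the parent around every insert, even though a concurrent garbage collection of that parent is rare.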
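Option (b) can be modelled with simple reader/writer counters: the inserter keeps a read lock on the parent while it works on the leaf, so a garbage collector that needs a write lock on the parent cannot remove it in the meantime. Locks are taken strictly top-down, which is the ordering the comment relies on to rule out deadlocks. Page, lock_read, lock_write, try_garbage_collect and add_with_split_retained are hypothetical; a real implementation would wait instead of failing, and the sketch glosses over how the parent is modified while only read-locked.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Page:
    number: int
    level: int                                         # 0 = leaf
    children: list[int] = field(default_factory=list)
    readers: int = 0                                   # holders of a read lock (models LCK_read)
    writer: bool = False                               # a held write lock (models LCK_write)
    in_index: bool = True

def lock_read(page: Page) -> bool:
    if page.writer:
        return False
    page.readers += 1
    return True

def lock_write(page: Page) -> bool:
    if page.writer or page.readers:
        return False
    page.writer = True
    return True

def try_garbage_collect(page: Page) -> bool:
    """GC needs a write lock on the page; returns False where the real code would wait."""
    if not lock_write(page):
        return False
    page.children.clear()
    page.in_index = False
    page.writer = False
    return True

def add_with_split_retained(parent: Page, leaf: Page, new_leaf: int,
                            meanwhile: Callable[[], None]) -> None:
    # Always lock top-down: parent before child.
    if parent.level <= leaf.level:
        raise ValueError("parent must be on a higher level than the leaf")
    if not lock_read(parent):        # fetch instead of handoff: the parent stays read-locked
        raise RuntimeError("parent busy; real code would wait")
    try:
        if not lock_write(leaf):
            raise RuntimeError("leaf busy; real code would wait")
        try:
            meanwhile()                          # leaf insert + split
            parent.children.append(new_leaf)     # no re-fetch: the parent was never released
        finally:
            leaf.writer = False
    finally:
        parent.readers -= 1
```

The extra cost of (b) is visible even here: the parent stays locked for the whole leaf operation, so any writer of that parent is blocked for longer.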
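Option (c) could look roughly like the sketch below. It assumes a per-page change counter (change_count) that is remembered together with the page number before the handoff; if the counter has moved, the insertion point is searched again from the root of a two-level toy index. How Firebird would actually detect a changed parent isn't described in the issue, and Level1Page, ToyIndex, find_parent, merge_into_left and add_split_pointer are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Level1Page:
    number: int
    low_key: int                                     # lowest key this page covers (hypothetical)
    leaves: list[int] = field(default_factory=list)  # child leaf page numbers
    in_index: bool = True
    change_count: int = 0                            # bumped on every modification (hypothetical)

@dataclass
class ToyIndex:
    pages: dict[int, Level1Page] = field(default_factory=dict)
    root: list[int] = field(default_factory=list)    # level-1 page numbers, ordered by low_key

    def find_parent(self, key: int) -> Level1Page:
        """Descend from the root: the last level-1 page whose low_key <= key."""
        candidates = [self.pages[n] for n in self.root if self.pages[n].low_key <= key]
        return candidates[-1] if candidates else self.pages[self.root[0]]

    def merge_into_left(self, number: int) -> None:
        """Concurrent GC: fold a page into its left neighbour and drop it from the root."""
        pos = self.root.index(number)
        if pos == 0:
            raise ValueError("leftmost page has no left neighbour")
        left, page = self.pages[self.root[pos - 1]], self.pages[number]
        left.leaves.extend(page.leaves)
        left.change_count += 1
        page.leaves.clear()
        page.in_index = False
        page.change_count += 1
        self.root.remove(number)

def add_split_pointer(index: ToyIndex, remembered_no: int, remembered_count: int,
                      split_key: int, new_leaf: int) -> int:
    page = index.pages[remembered_no]
    if page.change_count != remembered_count:
        # The parent changed while we were at the leaf level and may be gone:
        # search for the insertion point again, starting from the root.
        page = index.find_parent(split_key)
    page.leaves.append(new_leaf)
    page.change_count += 1
    return page.number
```

This also shows the drawback noted for (c): the counter moves whenever the parent is modified at all, so a parent that is still in place forces a root-to-leaf search that ends at the same page.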