-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
MDEV-15132 Avoid accessing the TRX_SYS page
InnoDB maintains an internal persistent sequence of transaction identifiers. This sequence is used for assigning both transaction start identifiers (DB_TRX_ID=trx->id) and end identifiers (trx->no) as well as end identifiers for the mysql.transaction_registry table that was introduced in MDEV-12894. TRX_SYS_TRX_ID_WRITE_MARGIN: Remove. After this many updates of the sequence we used to update the TRX_SYS page. We can avoid accessing the TRX_SYS page if we modify the InnoDB startup so that resurrecting the sequence from other pages of the transaction system. TRX_SYS_TRX_ID_STORE: Deprecate. The field only exists for the purpose of upgrading from an earlier version of MySQL or MariaDB. Starting with this fix, MariaDB will rely on the fields TRX_UNDO_TRX_ID, TRX_UNDO_TRX_NO in the undo log header page of each non-committed transaction, and on the new field TRX_RSEG_MAX_TRX_ID in rollback segment header pages. Because of this change, setting innodb_force_recovery=5 or 6 may cause the system to recover with trx_sys.get_max_trx_id()==0. We must adjust checks for invalid DB_TRX_ID and PAGE_MAX_TRX_ID accordingly. We will change the startup and shutdown messages to display the trx_sys.get_max_trx_id() in addition to the log sequence number. trx_sys_t::flush_max_trx_id(): Remove. trx_undo_mem_create_at_db_start(), trx_undo_lists_init(): Add an output parameter max_trx_id, to be updated from TRX_UNDO_TRX_ID, TRX_UNDO_TRX_NO. TRX_RSEG_MAX_TRX_ID: New field, for persisting trx_sys.get_max_trx_id() at the time of the latest transaction commit. Startup is not reading the undo log pages of committed transactions. We want to avoid additional page accesses on startup, as well as trouble when all undo logs have been emptied. On startup, we will simply determine the maximum value from all pages that are being read anyway. TRX_RSEG_FORMAT: Redefined from TRX_RSEG_MAX_SIZE. Old versions of InnoDB wrote uninitialized garbage to unused data fields. Because of this, we cannot simply introduce a new field in the rollback segment pages and expect it to be always zero, like it would if the database was created by a recent enough InnoDB version. Luckily, it looks like the field TRX_RSEG_MAX_SIZE was always written as 0xfffffffe. We will indicate a new subformat of the page by writing 0 to this field. This has the nice side effect that after a downgrade to older versions of InnoDB, transactions should fail to allocate any undo log, that is, writes will be blocked. So, there is no problem of getting corrupted transaction identifiers after downgrading. trx_rseg_t::max_size: Remove. trx_rseg_header_create(): Remove the parameter max_size=ULINT_MAX. trx_purge_add_undo_to_history(): Update TRX_RSEG_MAX_SIZE (and TRX_RSEG_FORMAT if needed). This is invoked on transaction commit. trx_rseg_mem_restore(): If TRX_RSEG_FORMAT contains 0, read TRX_RSEG_MAX_SIZE. trx_rseg_array_init(): Invoke trx_sys.init_max_trx_id(max_trx_id + 1) where max_trx_id was the maximum that was encountered in the rollback segment pages and the undo log pages of recovered active, XA PREPARE, or some committed transactions. (See trx_purge_add_undo_to_history() which invokes trx_rsegf_set_nth_undo(..., FIL_NULL, ...); not all committed transactions will be immediately detached from the rollback segment header.)
- Loading branch information
Showing
11 changed files
with
104 additions
and
166 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
c7d0448
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,dr-m,
I don't understand how you use the TRX_UNDO_TRX_ID and TRX_UNDO_TRX_NO fields instead of TRX_SYS_TRX_ID_STORE to complete the assignment of the transaction number. Can you answer it for me?
c7d0448
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@1100hk, it turned out that the field
TRX_SYS_TRX_ID_STORE
is almost redundant. We are basically splitting it to existing fields and the new fieldsTRX_RSEG_MAX_TRX_ID
in rollback segment header pages.What we do is similar to executing the equivalent of
SELECT MAX(autoinc_col) FROM t
for recovering an auto-increment value, with the exception that we are not introducing any extra page reads at startup, and that any gaps in the sequence of transaction identifiers will be preserved (thanks to the addition ofTRX_RSEG_MAX_TRX_ID
). The main motivation why we introduced that field is that it is distributed on rollback segment header pages, and transaction commit was already modifying those pages anyway. By splitting the field, we avoid the bottleneck of accessing theTRX_SYS
page during normal operation. The access toTRX_SYS_TRX_ID_STORE
was causing a potential issue for @svoj’strx_sys
scalability fixes (probably MDEV-15104, because MDEV-15059 was pushed before my change). I felt that it was easiest to eliminate the redundant accesses to theTRX_SYS
page.On startup, InnoDB always reads the undo log records of any incomplete (recovered) transactions, and it also reads all rollback segment header pages to find information about not-yet-purged committed transactions.
For incomplete transactions that have persistently generated any undo log records, the start identifier (
DB_TRX_ID
ortrx->id
) will be available in the undo logs. For committed transactions, the end identifier (trx->no
) will be written inTRX_UNDO_TRX_NO
, but such records will be removed by purge later. That is why we need theTRX_RSEG_MAX_TRX_ID
.This area is partly covered by my talk Deep Dive: InnoDB Transactions and Write Paths (in particular, the slide "A Page View of the InnoDB Transaction Layer").
Also, note that there was a follow-up fix in dcc09af, which deals with pre-existing garbage on undo log pages.
c7d0448
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for your generous sharing. After listening to your explanation and reading the source code, I have understood the idea. Thank you very much for sharing a great learning platform, and I will share it with my partner. Thank you.