From ab0190101b0587e0e03b2d75a967050b9a85fd1b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marko=20M=C3=A4kel=C3=A4?=
Date: Fri, 21 Oct 2022 10:02:54 +0300
Subject: [PATCH] MDEV-24402: InnoDB CHECK TABLE ... EXTENDED

Until now, the attribute EXTENDED of CHECK TABLE was ignored by InnoDB,
and InnoDB only counted the records in each index according to the
current read view. Unless the attribute QUICK was specified, the
function btr_validate_index() would be invoked to validate the B-tree
structure (the sibling and child links between index pages).

The EXTENDED check will not only count all index records according to
the current read view, but also ensure that any delete-marked records
in the clustered index are waiting for the purge of history, and that
all secondary index records point to a version of the clustered index
record that is waiting for the purge of history. In other words, no
index may contain orphan records. Normal MVCC reads and the
non-EXTENDED version of CHECK TABLE would ignore these orphans.

Unpurged records merely result in warnings (at most one per index),
not errors, and no indexes will be flagged as corrupted due to such
garbage. It will remain possible to SELECT data from such indexes or
tables (which will skip such records) or to rebuild the table to
reclaim some space.

We introduce purge_sys.end_view that will be (almost) a copy of
purge_sys.view at the end of a batch of purging committed transaction
history. It is not an exact copy, because if the size of a purge batch
is limited by innodb_purge_batch_size, some records that
purge_sys.view would allow to be purged will be left over for
subsequent batches.

The purge_sys.view is relevant in the purge of committed transaction
history, to determine if records are safe to remove.

The new purge_sys.end_view is relevant in MVCC operations and in
CHECK TABLE ... EXTENDED. It tells which undo log records are safe to
access (have not been discarded at the end of a purge batch).
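The relationship between the two views can be illustrated with a small
stand-alone sketch. It is not the InnoDB implementation: ToyView and
make_end_view() are hypothetical names, the visibility test only mirrors
the shape of ReadViewBase::changes_visible(), and the assumption that the
end view is obtained by clamping the low limit of a copy of the purge view
is inferred from clamp_low_limit_id() in this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using trx_id_t = std::uint64_t;

// Toy stand-in for a read view; not the real ReadViewBase.
struct ToyView
{
  trx_id_t up_limit_id;       // ids below this are always visible
  trx_id_t low_limit_id;      // ids at or above this are never visible
  std::vector<trx_id_t> ids;  // sorted ids that were active at snapshot time

  // Same shape as the visibility test in read0types.h.
  bool changes_visible(trx_id_t id) const
  {
    if (id >= low_limit_id)
      return false;
    return id < up_limit_id || ids.empty() ||
      !std::binary_search(ids.begin(), ids.end(), id);
  }

  // Lower the low limit, never raise it.
  void clamp_low_limit_id(trx_id_t limit)
  {
    if (low_limit_id > limit)
      low_limit_id = limit;
  }
};

// Hypothetical helper: derive the end-of-batch view from the purge view.
// batch_end is assumed to be the first transaction id whose history the
// size-limited batch did not get to process.
inline ToyView make_end_view(const ToyView &view, trx_id_t batch_end)
{
  ToyView end_view = view;
  end_view.clamp_low_limit_id(batch_end);
  return end_view;
}
```

With a purge view of {30, 100, {30, 45}} and a batch that stopped at
transaction 50, the changes of transaction 70 are visible to the purge
view (they may be purged in a later batch) but not to the end view, which
under the stated assumption means that their undo log records have not
been discarded yet.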

purge_sys.clone_oldest_view(): In trx_lists_init_at_db_start(), clone
the oldest read view similar to purge_sys_t::clone_end_view() so that
CHECK TABLE ... EXTENDED will not report bogus failures between InnoDB
restart and the completed purge of committed transaction history.

purge_sys_t::is_purgeable(): Replaces purge_sys_t::changes_visible()
in the case that purge_sys.latch will not be held by the caller.
Among other things, this guards access to BLOBs. It is not safe to
dereference any BLOBs of a delete-marked purgeable record, because
they may have already been freed.

purge_sys_t::view_guard::view(): Return a reference to purge_sys.view
that will be protected by purge_sys.latch, held by
purge_sys_t::view_guard.

purge_sys_t::end_view_guard::view(): Return a reference to
purge_sys.end_view while it is protected by purge_sys.end_latch.
Whenever a thread needs to retrieve an older version of a clustered
index record, it will hold a page latch on the clustered index page
and potentially also on a secondary index page that points to the
clustered index page. If these pages contain purgeable records that
would be accessed by a currently running purge batch, the progress of
the purge batch would be blocked by the page latches. Hence, it is
safe to make a copy of purge_sys.end_view while holding an index page
latch, and consult the copy of the view to determine whether a record
should already have been purged.

btr_validate_index(): Remove a redundant check.

row_check_index_match(): Check if a secondary index record and a
version of a clustered index record match each other.

row_check_index(): Replaces row_scan_index_for_mysql().
Count the records in each index directly, duplicating the relevant
logic from row_search_mvcc(). Initialize check_table_extended_view for
CHECK ... EXTENDED while holding an index leaf page latch.
If we encounter an orphan record, the copy of purge_sys.end_view that
we make is safe for visibility checks, and trx_undo_get_undo_rec() will
check for the safety to access each undo log record. Should that check
fail, we should return DB_MISSING_HISTORY to report a corrupted index.

The EXTENDED check tries to match each secondary index record with
every available clustered index record version, by duplicating the logic
of row_vers_build_for_consistent_read() and invoking
trx_undo_prev_version_build() directly.

Before invoking row_check_index_match() on delete-marked clustered index
record versions, we will consult purge_sys.is_purgeable() in order to
avoid accessing freed BLOBs.

We will always check that the DB_TRX_ID or PAGE_MAX_TRX_ID does not
exceed the global maximum. Orphan secondary index records will be
flagged only if everything up to PAGE_MAX_TRX_ID has been purged.

We warn also about clustered index records whose nonzero DB_TRX_ID
should have been reset in purge or rollback.

trx_set_rw_mode(): Move an assertion from
ReadView::set_creator_trx_id().

trx_undo_prev_version_build(): Remove two debug-only parameters, and
return an error code instead of a Boolean.

trx_undo_get_undo_rec(): Return a pointer to the undo log record, or
nullptr if one cannot be retrieved. Instead of consulting the
purge_sys.view, consult the purge_sys.end_view to determine which
records can be accessed.

trx_undo_get_rec_if_purgeable(): A variant of trx_undo_get_undo_rec()
that will consult purge_sys.view instead of purge_sys.end_view.

TRX_UNDO_CHECK_PURGEABILITY: A new parameter to
trx_undo_prev_version_build(), passed by row_vers_old_has_index_entry()
so that purge_sys.view instead of purge_sys.end_view will be consulted
to determine whether a secondary index record may be safely purged.

row_upd_changes_disowned_external(): Remove. This should be more
expensive than briefly latching purge_sys in
trx_undo_prev_version_build() (which may make use of transactional
memory).

row_sel_reset_old_vers_heap(): New function, split from
row_sel_build_prev_vers_for_mysql().

row_sel_build_prev_vers_for_mysql(): Reorder some parameters
to simplify the call to row_sel_reset_old_vers_heap().

row_search_for_mysql(): Replaced with direct calls to row_search_mvcc().

sel_node_get_nth_plan(): Define inline in row0sel.h.

open_step(): Define at the call site, in simplified form.

sel_node_reset_cursor(): Merged with the only caller open_step().

---

ReadViewBase::check_trx_id_sanity(): Remove.
Let us handle "future" DB_TRX_ID in a more meaningful way:

row_sel_clust_sees(): Return DB_SUCCESS if the record is visible,
DB_SUCCESS_LOCKED_REC if it is invisible, and DB_CORRUPTION if
the DB_TRX_ID is in the future.

row_undo_mod_must_purge(), row_undo_mod_clust(): Silently ignore
corrupted DB_TRX_ID. We are in ROLLBACK, and we should have noticed
that corruption when we were about to modify the record in the first
place (leading us to refuse the operation).

row_vers_build_for_consistent_read(): Return DB_CORRUPTION if
DB_TRX_ID is in the future.
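
The three-way result described for row_sel_clust_sees() can be sketched
as follows. This is a toy model, not the code in row0sel.cc: toy_err and
toy_clust_sees() are hypothetical, the list of active transaction ids is
left out, and treating "in the future" as "not below the system-wide
maximum transaction id" is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstdint>

using trx_id_t = std::uint64_t;

// Local stand-ins named after the codes in the text; not InnoDB's dberr_t.
enum class toy_err { SUCCESS, SUCCESS_LOCKED_REC, CORRUPTION };

// Hypothetical classifier for the DB_TRX_ID of a clustered index record.
// view_low_limit: ids at or above this are invisible to the read view.
// max_trx_id: ids at or above this have not been assigned yet (assumption).
inline toy_err toy_clust_sees(trx_id_t rec_trx_id, trx_id_t view_low_limit,
                              trx_id_t max_trx_id)
{
  if (rec_trx_id < view_low_limit)
    return toy_err::SUCCESS;           // visible (active-id list omitted)
  if (rec_trx_id >= max_trx_id)
    return toy_err::CORRUPTION;        // "future" DB_TRX_ID
  return toy_err::SUCCESS_LOCKED_REC;  // valid, but not visible to the view
}
```

Returning a distinct code for the invisible case lets a caller tell
"look for an older version" apart from "fail the statement", which a
plain Boolean could not express.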

Tested by: Matthias Leich
Reviewed by: Vladislav Lesin
---
 .../suite/gcol/r/innodb_virtual_purge.result |   6 +-
 .../suite/gcol/t/innodb_virtual_purge.test   |   6 +-
 .../suite/innodb/r/trx_id_future.result      |  10 +-
 mysql-test/suite/innodb/t/trx_id_future.test |  14 +-
 storage/innobase/CMakeLists.txt              |   1 -
 storage/innobase/btr/btr0btr.cc              |   5 -
 storage/innobase/dict/dict0stats.cc          |   1 +
 storage/innobase/fts/fts0ast.cc              |   3 +-
 storage/innobase/handler/ha_innodb.cc        |  83 +-
 storage/innobase/include/read0types.h        |  43 +-
 storage/innobase/include/row0mysql.h         |  26 +-
 storage/innobase/include/row0sel.h           |  87 +-
 storage/innobase/include/row0sel.inl         | 138 ---
 storage/innobase/include/row0upd.h           |  10 +-
 storage/innobase/include/row0vers.h          |   8 +-
 storage/innobase/include/trx0purge.h         |  76 +-
 storage/innobase/include/trx0rec.h           |  22 +-
 storage/innobase/lock/lock0lock.cc           |   1 +
 storage/innobase/que/que0que.cc              |  25 +-
 storage/innobase/row/row0log.cc              |  10 +-
 storage/innobase/row/row0merge.cc            |  14 +-
 storage/innobase/row/row0mysql.cc            | 181 +---
 storage/innobase/row/row0sel.cc              | 955 ++++++++++++++++--
 storage/innobase/row/row0umod.cc             |  53 +-
 storage/innobase/row/row0upd.cc              |  40 -
 storage/innobase/row/row0vers.cc             |  54 +-
 storage/innobase/trx/trx0purge.cc            |  68 +-
 storage/innobase/trx/trx0rec.cc              | 146 ++-
 storage/innobase/trx/trx0sys.cc              |  34 -
 storage/innobase/trx/trx0trx.cc              |   3 +-
 30 files changed, 1246 insertions(+), 877 deletions(-)
 delete mode 100644 storage/innobase/include/row0sel.inl

diff --git a/mysql-test/suite/gcol/r/innodb_virtual_purge.result b/mysql-test/suite/gcol/r/innodb_virtual_purge.result
index ee88527ec2e04..48a2d31338267 100644
--- a/mysql-test/suite/gcol/r/innodb_virtual_purge.result
+++ b/mysql-test/suite/gcol/r/innodb_virtual_purge.result
@@ -24,7 +24,7 @@ COMMIT;
 UPDATE t1 SET a=1;
 connection default;
 InnoDB 0 transactions not purged
-CHECK TABLE t1;
+CHECK TABLE t1 EXTENDED;
 Table Op Msg_type Msg_text
 test.t1 check status OK
 SELECT b1 FROM t1;
@@ -123,7 +123,7 @@ COMMIT;
disconnect con1; connection default; InnoDB 0 transactions not purged -CHECK TABLE t1; +CHECK TABLE t1 EXTENDED; Table Op Msg_type Msg_text test.t1 check status OK SELECT b1 FROM t1; @@ -134,7 +134,7 @@ SELECT * FROM t1; a b b1 a1 a4 b3 100 10 10 100 90 100 100 10 10 100 90 100 -CHECK TABLE t2; +CHECK TABLE t2 EXTENDED; Table Op Msg_type Msg_text test.t2 check status OK DROP TABLE t2, t1, t0; diff --git a/mysql-test/suite/gcol/t/innodb_virtual_purge.test b/mysql-test/suite/gcol/t/innodb_virtual_purge.test index c79a817dd4eda..e9e4caf8e07fc 100644 --- a/mysql-test/suite/gcol/t/innodb_virtual_purge.test +++ b/mysql-test/suite/gcol/t/innodb_virtual_purge.test @@ -38,7 +38,7 @@ UPDATE t1 SET a=1; connection default; --source ../../innodb/include/wait_all_purged.inc -CHECK TABLE t1; +CHECK TABLE t1 EXTENDED; SELECT b1 FROM t1; @@ -123,11 +123,11 @@ disconnect con1; connection default; --source ../../innodb/include/wait_all_purged.inc -CHECK TABLE t1; +CHECK TABLE t1 EXTENDED; SELECT b1 FROM t1; SELECT * FROM t1; -CHECK TABLE t2; +CHECK TABLE t2 EXTENDED; DROP TABLE t2, t1, t0; CREATE TABLE t1 (a VARCHAR(30), b INT, a2 VARCHAR(30) GENERATED ALWAYS AS (a) VIRTUAL); diff --git a/mysql-test/suite/innodb/r/trx_id_future.result b/mysql-test/suite/innodb/r/trx_id_future.result index 1ddc0e64f8b51..4f88b1d478398 100644 --- a/mysql-test/suite/innodb/r/trx_id_future.result +++ b/mysql-test/suite/innodb/r/trx_id_future.result @@ -6,13 +6,9 @@ SET GLOBAL innodb_purge_rseg_truncate_frequency=1; CREATE TABLE t1(a INT) row_format=redundant engine=innoDB; INSERT INTO t1 VALUES(1); InnoDB 0 transactions not purged -NOT FOUND /\[Warning\] InnoDB: A transaction id in a record of table `test`\.`t1` is newer than the system-wide maximum/ in mysqld.1.err call mtr.add_suppression("\\[Warning\\] InnoDB: A transaction id in a record of table `test`\\.`t1` is newer than the system-wide maximum"); -SET @save_count = @@max_error_count; -SET max_error_count = 1; +call 
mtr.add_suppression("\\[ERROR\\] InnoDB: We detected index corruption"); +call mtr.add_suppression("Index for table 't1' is corrupt; try to repair it"); SELECT * FROM t1; -a -Warnings: -Warning 1642 InnoDB: Transaction id in a record of table `test`.`t1` is newer than system-wide maximum. -SET max_error_count = @save_count; +ERROR HY000: Index for table 't1' is corrupt; try to repair it DROP TABLE t1; diff --git a/mysql-test/suite/innodb/t/trx_id_future.test b/mysql-test/suite/innodb/t/trx_id_future.test index b897800fa91b0..18077549cf68a 100644 --- a/mysql-test/suite/innodb/t/trx_id_future.test +++ b/mysql-test/suite/innodb/t/trx_id_future.test @@ -57,19 +57,11 @@ syswrite(FILE, $page, $ps)==$ps || die "Unable to write $file\n"; close(FILE) || die "Unable to close $file"; EOF -# Debug assertions would fail due to the injected corruption. ---let $restart_parameters= --loose-skip-debug-assert --source include/start_mysqld.inc - -let SEARCH_FILE= $MYSQLTEST_VARDIR/log/mysqld.1.err; -let SEARCH_PATTERN= \[Warning\] InnoDB: A transaction id in a record of table `test`\.`t1` is newer than the system-wide maximum; ---source include/search_pattern_in_file.inc - call mtr.add_suppression("\\[Warning\\] InnoDB: A transaction id in a record of table `test`\\.`t1` is newer than the system-wide maximum"); +call mtr.add_suppression("\\[ERROR\\] InnoDB: We detected index corruption"); +call mtr.add_suppression("Index for table 't1' is corrupt; try to repair it"); -# A debug assertion would cause a duplicated message to be output. 
-SET @save_count = @@max_error_count; -SET max_error_count = 1; +--error ER_NOT_KEYFILE SELECT * FROM t1; -SET max_error_count = @save_count; DROP TABLE t1; diff --git a/storage/innobase/CMakeLists.txt b/storage/innobase/CMakeLists.txt index fa6f3490fb1d5..422d7cd6f4c0f 100644 --- a/storage/innobase/CMakeLists.txt +++ b/storage/innobase/CMakeLists.txt @@ -338,7 +338,6 @@ SET(INNOBASE_SOURCES include/row0row.h include/row0row.inl include/row0sel.h - include/row0sel.inl include/row0types.h include/row0uins.h include/row0umod.h diff --git a/storage/innobase/btr/btr0btr.cc b/storage/innobase/btr/btr0btr.cc index 6843771ac8c8d..e503215e05b27 100644 --- a/storage/innobase/btr/btr0btr.cc +++ b/storage/innobase/btr/btr0btr.cc @@ -5276,11 +5276,6 @@ btr_validate_index( dict_index_t* index, /*!< in: index */ const trx_t* trx) /*!< in: transaction or NULL */ { - /* Full Text index are implemented by auxiliary tables, not the B-tree */ - if (index->online_status != ONLINE_INDEX_COMPLETE || - (index->type & (DICT_FTS | DICT_CORRUPT))) - return DB_SUCCESS; - const bool lockout= index->is_spatial(); mtr_t mtr; diff --git a/storage/innobase/dict/dict0stats.cc b/storage/innobase/dict/dict0stats.cc index cd81d4c7386a7..8d2fa9cdc4298 100644 --- a/storage/innobase/dict/dict0stats.cc +++ b/storage/innobase/dict/dict0stats.cc @@ -33,6 +33,7 @@ Created Jan 06, 2010 Vasil Dimov #include #include "log.h" #include "btr0btr.h" +#include "que0que.h" #include #include diff --git a/storage/innobase/fts/fts0ast.cc b/storage/innobase/fts/fts0ast.cc index bb42f7c9f54f7..74d02d6381795 100644 --- a/storage/innobase/fts/fts0ast.cc +++ b/storage/innobase/fts/fts0ast.cc @@ -1,7 +1,7 @@ /***************************************************************************** Copyright (c) 2007, 2020, Oracle and/or its affiliates. All Rights Reserved. -Copyright (c) 2018, MariaDB Corporation. +Copyright (c) 2018, 2022, MariaDB Corporation. 
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software @@ -28,6 +28,7 @@ Created 2007/3/16 Sunny Bains. #include "fts0ast.h" #include "fts0pars.h" #include "fts0fts.h" +#include "trx0trx.h" /* The FTS ast visit pass. */ enum fts_ast_visit_pass_t { diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc index 4c7a9d6e6a44c..de407ce9ae5b8 100644 --- a/storage/innobase/handler/ha_innodb.cc +++ b/storage/innobase/handler/ha_innodb.cc @@ -3662,7 +3662,7 @@ ha_innobase::init_table_handle_for_HANDLER(void) innobase_register_trx(ht, m_user_thd, m_prebuilt->trx); /* We did the necessary inits in this function, no need to repeat them - in row_search_for_mysql */ + in row_search_mvcc() */ m_prebuilt->sql_stat_start = FALSE; @@ -7411,7 +7411,7 @@ ha_innobase::build_template( /* We must at least fetch all primary key cols. Note that if the clustered index was internally generated by InnoDB on the row id (no primary key was - defined), then row_search_for_mysql() will always + defined), then row_search_mvcc() will always retrieve the row id to a special buffer in the m_prebuilt struct. */ @@ -8943,7 +8943,7 @@ statement issued by the user. We also increment trx->n_mysql_tables_in_use. instructions to m_prebuilt->template of the table handle instance in ::index_read. The template is used to save CPU time in large joins. - 3) In row_search_for_mysql, if m_prebuilt->sql_stat_start is true, we + 3) In row_search_mvcc(), if m_prebuilt->sql_stat_start is true, we allocate a new consistent read view for the trx if it does not yet have one, or in the case of a locking read, set an InnoDB 'intention' table level lock on the table. @@ -9245,7 +9245,7 @@ ha_innobase::change_active_index( } /* The caller seems to ignore this. Thus, we must check - this again in row_search_for_mysql(). */ + this again in row_search_mvcc(). 
*/ DBUG_RETURN(convert_error_code_to_mysql(DB_MISSING_HISTORY, 0, NULL)); } @@ -9845,9 +9845,9 @@ ha_innobase::ft_read( int error; - switch (dberr_t ret = row_search_for_mysql(buf, PAGE_CUR_GE, - m_prebuilt, - ROW_SEL_EXACT, 0)) { + switch (dberr_t ret = row_search_mvcc(buf, PAGE_CUR_GE, + m_prebuilt, + ROW_SEL_EXACT, 0)) { case DB_SUCCESS: error = 0; table->status = 0; @@ -15186,8 +15186,10 @@ ha_innobase::check( DBUG_ENTER("ha_innobase::check"); DBUG_ASSERT(thd == ha_thd()); + DBUG_ASSERT(thd == m_user_thd); ut_a(m_prebuilt->trx->magic_n == TRX_MAGIC_N); ut_a(m_prebuilt->trx == thd_to_trx(thd)); + ut_ad(m_prebuilt->trx->mysql_thd == thd); if (m_prebuilt->mysql_template == NULL) { /* Build the template; we will use a dummy template @@ -15197,7 +15199,6 @@ ha_innobase::check( } if (!m_prebuilt->table->space) { - ib_senderrf( thd, IB_LOG_LEVEL_ERROR, @@ -15205,10 +15206,7 @@ ha_innobase::check( table->s->table_name.str); DBUG_RETURN(HA_ADMIN_CORRUPT); - - } else if (!m_prebuilt->table->is_readable() && - !m_prebuilt->table->space) { - + } else if (!m_prebuilt->table->is_readable()) { ib_senderrf( thd, IB_LOG_LEVEL_ERROR, ER_TABLESPACE_MISSING, @@ -15229,6 +15227,9 @@ ha_innobase::check( ? TRX_ISO_READ_UNCOMMITTED : TRX_ISO_REPEATABLE_READ; + trx_start_if_not_started(m_prebuilt->trx, false); + m_prebuilt->trx->read_view.open(m_prebuilt->trx); + for (dict_index_t* index = dict_table_get_first_index(m_prebuilt->table); index; @@ -15237,25 +15238,22 @@ ha_innobase::check( if (!index->is_committed()) { continue; } + if (index->type & DICT_FTS) { + /* We do not check any FULLTEXT INDEX. 
*/ + continue; + } - if (!(check_opt->flags & T_QUICK) - && !index->is_corrupted()) { - - dberr_t err = btr_validate_index( - index, m_prebuilt->trx); - - if (err != DB_SUCCESS) { - is_ok = false; - - push_warning_printf( - thd, - Sql_condition::WARN_LEVEL_WARN, - ER_NOT_KEYFILE, - "InnoDB: The B-tree of" - " index %s is corrupted.", - index->name()); - continue; - } + if ((check_opt->flags & T_QUICK) || index->is_corrupted()) { + } else if (btr_validate_index(index, m_prebuilt->trx) + != DB_SUCCESS) { + is_ok = false; + push_warning_printf(thd, + Sql_condition::WARN_LEVEL_WARN, + ER_NOT_KEYFILE, + "InnoDB: The B-tree of" + " index %s is corrupted.", + index->name()); + continue; } /* Instead of invoking change_active_index(), set up @@ -15277,7 +15275,7 @@ ha_innobase::check( if (UNIV_UNLIKELY(!m_prebuilt->index_usable)) { if (index->is_corrupted()) { push_warning_printf( - m_user_thd, + thd, Sql_condition::WARN_LEVEL_WARN, HA_ERR_INDEX_CORRUPT, "InnoDB: Index %s is marked as" @@ -15286,7 +15284,7 @@ ha_innobase::check( is_ok = false; } else { push_warning_printf( - m_user_thd, + thd, Sql_condition::WARN_LEVEL_WARN, HA_ERR_TABLE_DEF_CHANGED, "InnoDB: Insufficient history for" @@ -15299,18 +15297,22 @@ ha_innobase::check( m_prebuilt->sql_stat_start = TRUE; m_prebuilt->template_type = ROW_MYSQL_DUMMY_TEMPLATE; m_prebuilt->n_template = 0; - m_prebuilt->need_to_access_clustered = FALSE; + m_prebuilt->read_just_key = 0; + m_prebuilt->autoinc_error = DB_SUCCESS; + m_prebuilt->need_to_access_clustered = + !!(check_opt->flags & T_EXTEND); dtuple_set_n_fields(m_prebuilt->search_tuple, 0); m_prebuilt->select_lock_type = LOCK_NONE; /* Scan this index. 
*/ - if (dict_index_is_spatial(index)) { + if (index->is_spatial()) { ret = row_count_rtree_recs(m_prebuilt, &n_rows); + } else if (index->type & DICT_FTS) { + ret = DB_SUCCESS; } else { - ret = row_scan_index_for_mysql( - m_prebuilt, index, &n_rows); + ret = row_check_index(m_prebuilt, &n_rows); } DBUG_EXECUTE_IF( @@ -15319,11 +15321,18 @@ ha_innobase::check( ret = DB_CORRUPTION; }); - if (ret == DB_INTERRUPTED || thd_killed(m_user_thd)) { + if (ret == DB_INTERRUPTED || thd_killed(thd)) { /* Do not report error since this could happen during shutdown */ break; } + + if (ret == DB_SUCCESS + && m_prebuilt->autoinc_error != DB_MISSING_HISTORY) { + /* See if any non-fatal errors were reported. */ + ret = m_prebuilt->autoinc_error; + } + if (ret != DB_SUCCESS) { /* Assume some kind of corruption. */ push_warning_printf( diff --git a/storage/innobase/include/read0types.h b/storage/innobase/include/read0types.h index bc02fc065f572..e002f1b77e1cc 100644 --- a/storage/innobase/include/read0types.h +++ b/storage/innobase/include/read0types.h @@ -121,19 +121,6 @@ class ReadViewBase inline void snapshot(trx_t *trx); - /** - Check whether transaction id is valid. - @param[in] id transaction id to check - @param[in] name table name - - @todo changes_visible() was an unfortunate choice for this check. - It should be moved towards the functions that load trx id like - trx_read_trx_id(). No need to issue a warning, error log message should - be enough. Although statement should ideally fail if it sees corrupt - data. - */ - static void check_trx_id_sanity(trx_id_t id, const table_name_t &name); - /** Check whether the changes by id are visible. @param[in] id transaction id to check against the view @@ -149,26 +136,6 @@ class ReadViewBase !std::binary_search(m_ids.begin(), m_ids.end(), id); } - /** - Check whether the changes by id are visible. 
- @param[in] id transaction id to check against the view - @param[in] name table name - @return whether the view sees the modifications of id. - */ - bool changes_visible(trx_id_t id, const table_name_t &name) const - MY_ATTRIBUTE((warn_unused_result)) - { - if (id >= m_low_limit_id) - { - check_trx_id_sanity(id, name); - return false; - } - return id < m_up_limit_id || - m_ids.empty() || - !std::binary_search(m_ids.begin(), m_ids.end(), id); - } - - /** @param id transaction to check @return true if view sees transaction id @@ -180,6 +147,13 @@ class ReadViewBase /** @return the low limit id */ trx_id_t low_limit_id() const { return m_low_limit_id; } + + /** Clamp the low limit id for purge_sys.end_view */ + void clamp_low_limit_id(trx_id_t limit) + { + if (m_low_limit_id > limit) + m_low_limit_id= limit; + } }; @@ -250,7 +224,6 @@ class ReadView: public ReadViewBase */ void set_creator_trx_id(trx_id_t id) { - ut_ad(id > 0); ut_ad(m_creator_trx_id == 0); m_creator_trx_id= id; } @@ -275,8 +248,6 @@ class ReadView: public ReadViewBase A wrapper around ReadViewBase::changes_visible(). Intended to be called by the ReadView owner thread. */ - bool changes_visible(trx_id_t id, const table_name_t &name) const - { return id == m_creator_trx_id || ReadViewBase::changes_visible(id, name); } bool changes_visible(trx_id_t id) const { return id == m_creator_trx_id || ReadViewBase::changes_visible(id); } diff --git a/storage/innobase/include/row0mysql.h b/storage/innobase/include/row0mysql.h index 3c624621b1ddd..a49e2c3f44187 100644 --- a/storage/innobase/include/row0mysql.h +++ b/storage/innobase/include/row0mysql.h @@ -1,7 +1,7 @@ /***************************************************************************** Copyright (c) 2000, 2017, Oracle and/or its affiliates. All Rights Reserved. -Copyright (c) 2017, 2021, MariaDB Corporation. +Copyright (c) 2017, 2022, MariaDB Corporation. 
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software @@ -263,7 +263,7 @@ row_update_for_mysql( /** This can only be used when the current transaction is at READ COMMITTED or READ UNCOMMITTED isolation level. -Before calling this function row_search_for_mysql() must have +Before calling this function row_search_mvcc() must have initialized prebuilt->new_rec_locks to store the information which new record locks really were set. This function removes a newly set clustered index record lock under prebuilt->pcur or @@ -382,22 +382,6 @@ row_rename_table_for_mysql( FOREIGN KEY constraints */ MY_ATTRIBUTE((nonnull, warn_unused_result)); -/*********************************************************************//** -Scans an index for either COOUNT(*) or CHECK TABLE. -If CHECK TABLE; Checks that the index contains entries in an ascending order, -unique constraint is not broken, and calculates the number of index entries -in the read view of the current transaction. -@return DB_SUCCESS or other error */ -dberr_t -row_scan_index_for_mysql( -/*=====================*/ - row_prebuilt_t* prebuilt, /*!< in: prebuilt struct - in MySQL handle */ - const dict_index_t* index, /*!< in: index */ - ulint* n_rows) /*!< out: number of entries - seen in the consistent read */ - MY_ATTRIBUTE((warn_unused_result)); - /* A struct describing a place for an individual column in the MySQL row format which is presented to the table handler in ha_innobase. This template struct is used to speed up row transformations between @@ -606,7 +590,7 @@ struct row_prebuilt_t { ROW_READ_TRY_SEMI_CONSISTENT and to simply skip the row. If the row matches, the next call to - row_search_for_mysql() will lock + row_search_mvcc() will lock the row. 
This eliminates lock waits in some cases; note that this breaks @@ -615,7 +599,7 @@ struct row_prebuilt_t { the session is using READ COMMITTED or READ UNCOMMITTED isolation level, set in - row_search_for_mysql() if we set a new + row_search_mvcc() if we set a new record lock on the secondary or clustered index; this is used in row_unlock_for_mysql() @@ -847,7 +831,7 @@ innobase_rename_vc_templ( #define ROW_MYSQL_REC_FIELDS 1 #define ROW_MYSQL_NO_TEMPLATE 2 #define ROW_MYSQL_DUMMY_TEMPLATE 3 /* dummy template used in - row_scan_and_check_index */ + row_check_index() */ /* Values for hint_need_to_fetch_extra_cols */ #define ROW_RETRIEVE_PRIMARY_KEY 1 diff --git a/storage/innobase/include/row0sel.h b/storage/innobase/include/row0sel.h index eb83a4bcad656..8134c60fe7206 100644 --- a/storage/innobase/include/row0sel.h +++ b/storage/innobase/include/row0sel.h @@ -1,7 +1,7 @@ /***************************************************************************** Copyright (c) 1997, 2017, Oracle and/or its affiliates. -Copyright (c) 2017, MariaDB Corporation. +Copyright (c) 2017, 2022, MariaDB Corporation. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software @@ -24,8 +24,7 @@ Select Created 12/19/1997 Heikki Tuuri *******************************************************/ -#ifndef row0sel_h -#define row0sel_h +#pragma once #include "data0data.h" #include "que0types.h" @@ -58,15 +57,6 @@ void sel_col_prefetch_buf_free( /*======================*/ sel_buf_t* prefetch_buf); /*!< in, own: prefetch buffer */ -/*********************************************************************//** -Gets the plan node for the nth table in a join. 
-@return plan node */ -UNIV_INLINE -plan_t* -sel_node_get_nth_plan( -/*==================*/ - sel_node_t* node, /*!< in: select node */ - ulint i); /*!< in: get ith plan node */ /**********************************************************************//** Performs a select step. This is a high-level function used in SQL execution graphs. @@ -76,14 +66,6 @@ row_sel_step( /*=========*/ que_thr_t* thr); /*!< in: query thread */ /**********************************************************************//** -Performs an execution step of an open or close cursor statement node. -@return query thread to run next or NULL */ -UNIV_INLINE -que_thr_t* -open_step( -/*======*/ - que_thr_t* thr); /*!< in: query thread */ -/**********************************************************************//** Performs a fetch for a cursor. @return query thread to run next or NULL */ que_thr_t* @@ -136,37 +118,7 @@ row_sel_convert_mysql_key_to_innobase( ulint key_len); /*!< in: MySQL key value length */ -/** Searches for rows in the database. This is used in the interface to -MySQL. This function opens a cursor, and also implements fetch next -and fetch prev. NOTE that if we do a search with a full key value -from a unique index (ROW_SEL_EXACT), then we will not store the cursor -position and fetch next or fetch prev must not be tried to the cursor! - -@param[out] buf buffer for the fetched row in MySQL format -@param[in] mode search mode PAGE_CUR_L -@param[in,out] prebuilt prebuilt struct for the table handler; - this contains the info to search_tuple, - index; if search tuple contains 0 field then - we position the cursor at start or the end of - index, depending on 'mode' -@param[in] match_mode 0 or ROW_SEL_EXACT or ROW_SEL_EXACT_PREFIX -@param[in] direction 0 or ROW_SEL_NEXT or ROW_SEL_PREV; - Note: if this is != 0, then prebuilt must has a - pcur with stored position! In opening of a - cursor 'direction' should be 0. 
-@return DB_SUCCESS, DB_RECORD_NOT_FOUND, DB_END_OF_INDEX, DB_DEADLOCK, -DB_LOCK_TABLE_FULL, DB_CORRUPTION, or DB_TOO_BIG_RECORD */ -UNIV_INLINE -dberr_t -row_search_for_mysql( - byte* buf, - page_cur_mode_t mode, - row_prebuilt_t* prebuilt, - ulint match_mode, - ulint direction) - MY_ATTRIBUTE((warn_unused_result)); - -/** Searches for rows in the database using cursor. +/** Search for rows in the database using cursor. Function is mainly used for tables that are shared across connections and so it employs technique that can help re-construct the rows that transaction is suppose to see. @@ -184,7 +136,8 @@ It also has optimization such as pre-caching the rows, using AHI, etc. Note: if this is != 0, then prebuilt must has a pcur with stored position! In opening of a cursor 'direction' should be 0. -@return DB_SUCCESS or error code */ +@return DB_SUCCESS, DB_RECORD_NOT_FOUND, DB_END_OF_INDEX, DB_DEADLOCK, +DB_LOCK_TABLE_FULL, DB_CORRUPTION, or DB_TOO_BIG_RECORD */ dberr_t row_search_mvcc( byte* buf, @@ -210,6 +163,21 @@ row_count_rtree_recs( ulint* n_rows); /*!< out: number of entries seen in the consistent read */ +/** +Check the index records in CHECK TABLE. +The index must contain entries in an ascending order, +unique constraint must not be violated by duplicated keys, +and the number of index entries is counted in according to the +current read view. + +@param prebuilt index and transaction +@param n_rows number of records counted + +@return error code +@retval DB_SUCCESS if no error was found */ +dberr_t row_check_index(row_prebuilt_t *prebuilt, ulint *n_rows) + MY_ATTRIBUTE((nonnull, warn_unused_result)); + /** Read the max AUTOINC value from an index. @param[in] index index starting with an AUTO_INCREMENT column @return the largest AUTO_INCREMENT value @@ -382,6 +350,17 @@ struct sel_node_t{ fetches */ }; +/** +Get the plan node for a table in a join. 
+@param node query graph node for SELECT +@param i plan node element +@return ith plan node */ +inline plan_t *sel_node_get_nth_plan(sel_node_t *node, ulint i) +{ + ut_ad(i < node->n_tables); + return &node->plans[i]; +} + /** Fetch statement node */ struct fetch_node_t{ que_common_t common; /*!< type: QUE_NODE_FETCH */ @@ -476,7 +455,3 @@ row_sel_field_store_in_mysql_format_func( #endif /* UNIV_DEBUG */ const byte* data, /*!< in: data to store */ ulint len); /*!< in: length of the data */ - -#include "row0sel.inl" - -#endif diff --git a/storage/innobase/include/row0sel.inl b/storage/innobase/include/row0sel.inl deleted file mode 100644 index 7880605ca8f4a..0000000000000 --- a/storage/innobase/include/row0sel.inl +++ /dev/null @@ -1,138 +0,0 @@ -/***************************************************************************** - -Copyright (c) 1997, 2014, Oracle and/or its affiliates. All Rights Reserved. - -This program is free software; you can redistribute it and/or modify it under -the terms of the GNU General Public License as published by the Free Software -Foundation; version 2 of the License. - -This program is distributed in the hope that it will be useful, but WITHOUT -ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS -FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. - -You should have received a copy of the GNU General Public License along with -this program; if not, write to the Free Software Foundation, Inc., -51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA - -*****************************************************************************/ - -/**************************************************//** -@file include/row0sel.ic -Select - -Created 12/19/1997 Heikki Tuuri -*******************************************************/ - -#include "que0que.h" - -/*********************************************************************//** -Gets the plan node for the nth table in a join. 
-@return plan node */ -UNIV_INLINE -plan_t* -sel_node_get_nth_plan( -/*==================*/ - sel_node_t* node, /*!< in: select node */ - ulint i) /*!< in: get ith plan node */ -{ - ut_ad(i < node->n_tables); - - return(node->plans + i); -} - -/*********************************************************************//** -Resets the cursor defined by sel_node to the SEL_NODE_OPEN state, which means -that it will start fetching from the start of the result set again, regardless -of where it was before, and it will set intention locks on the tables. */ -UNIV_INLINE -void -sel_node_reset_cursor( -/*==================*/ - sel_node_t* node) /*!< in: select node */ -{ - node->state = SEL_NODE_OPEN; -} - -/**********************************************************************//** -Performs an execution step of an open or close cursor statement node. -@return query thread to run next or NULL */ -UNIV_INLINE -que_thr_t* -open_step( -/*======*/ - que_thr_t* thr) /*!< in: query thread */ -{ - sel_node_t* sel_node; - open_node_t* node; - ulint err; - - ut_ad(thr); - - node = (open_node_t*) thr->run_node; - ut_ad(que_node_get_type(node) == QUE_NODE_OPEN); - - sel_node = node->cursor_def; - - err = DB_SUCCESS; - - if (node->op_type == ROW_SEL_OPEN_CURSOR) { - - /* if (sel_node->state == SEL_NODE_CLOSED) { */ - - sel_node_reset_cursor(sel_node); - /* } else { - err = DB_ERROR; - } */ - } else { - if (sel_node->state != SEL_NODE_CLOSED) { - - sel_node->state = SEL_NODE_CLOSED; - } else { - err = DB_ERROR; - } - } - - if (err != DB_SUCCESS) { - /* SQL error detected */ - fprintf(stderr, "SQL error %lu\n", (ulong) err); - - ut_error; - } - - thr->run_node = que_node_get_parent(node); - - return(thr); -} - - -/** Searches for rows in the database. This is used in the interface to -MySQL. This function opens a cursor, and also implements fetch next -and fetch prev. 
NOTE that if we do a search with a full key value -from a unique index (ROW_SEL_EXACT), then we will not store the cursor -position and fetch next or fetch prev must not be tried to the cursor! - -@param[out] buf buffer for the fetched row in MySQL format -@param[in] mode search mode PAGE_CUR_L -@param[in,out] prebuilt prebuilt struct for the table handler; - this contains the info to search_tuple, - index; if search tuple contains 0 field then - we position the cursor at start or the end of - index, depending on 'mode' -@param[in] match_mode 0 or ROW_SEL_EXACT or ROW_SEL_EXACT_PREFIX -@param[in] direction 0 or ROW_SEL_NEXT or ROW_SEL_PREV; - Note: if this is != 0, then prebuilt must has a - pcur with stored position! In opening of a - cursor 'direction' should be 0. -@return DB_SUCCESS, DB_RECORD_NOT_FOUND, DB_END_OF_INDEX, DB_DEADLOCK, -DB_LOCK_TABLE_FULL, DB_CORRUPTION, or DB_TOO_BIG_RECORD */ -UNIV_INLINE -dberr_t -row_search_for_mysql( - byte* buf, - page_cur_mode_t mode, - row_prebuilt_t* prebuilt, - ulint match_mode, - ulint direction) -{ - return(row_search_mvcc(buf, mode, prebuilt, match_mode, direction)); -} diff --git a/storage/innobase/include/row0upd.h b/storage/innobase/include/row0upd.h index d47ec793f8914..cc05df395ea7b 100644 --- a/storage/innobase/include/row0upd.h +++ b/storage/innobase/include/row0upd.h @@ -1,7 +1,7 @@ /***************************************************************************** Copyright (c) 1996, 2018, Oracle and/or its affiliates. All Rights Reserved. -Copyright (c) 2017, 2020, MariaDB Corporation. +Copyright (c) 2017, 2022, MariaDB Corporation. 
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software @@ -118,14 +118,6 @@ row_upd_changes_field_size_or_external( dict_index_t* index, /*!< in: index */ const rec_offs* offsets,/*!< in: rec_get_offsets(rec, index) */ const upd_t* update);/*!< in: update vector */ -/***********************************************************//** -Returns true if row update contains disowned external fields. -@return true if the update contains disowned external fields. */ -bool -row_upd_changes_disowned_external( -/*==============================*/ - const upd_t* update) /*!< in: update vector */ - MY_ATTRIBUTE((nonnull, warn_unused_result)); /***************************************************************//** Builds an update vector from those fields which in a secondary index entry diff --git a/storage/innobase/include/row0vers.h b/storage/innobase/include/row0vers.h index e05b18a8cccde..60f310e1b0f66 100644 --- a/storage/innobase/include/row0vers.h +++ b/storage/innobase/include/row0vers.h @@ -1,7 +1,7 @@ /***************************************************************************** Copyright (c) 1997, 2016, Oracle and/or its affiliates. All Rights Reserved. -Copyright (c) 2017, 2019, MariaDB Corporation. +Copyright (c) 2017, 2022, MariaDB Corporation. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software @@ -55,7 +55,7 @@ row_vers_impl_x_locked( const rec_offs* offsets); /** Finds out if a version of the record, where the version >= the current -purge view, should have ientry as its secondary index entry. We check +purge_sys.view, should have ientry as its secondary index entry. We check if there is any not delete marked version of the record where the trx id >= purge view, and the secondary index entry == ientry; exactly in this case we return TRUE. 
@@ -85,7 +85,9 @@ row_vers_old_has_index_entry( Constructs the version of a clustered index record which a consistent read should see. We assume that the trx id stored in rec is such that the consistent read should not see rec in its present version. -@return DB_SUCCESS or DB_MISSING_HISTORY */ +@return error code +@retval DB_SUCCESS if a previous version was fetched +@retval DB_MISSING_HISTORY if the history is missing (a sign of corruption) */ dberr_t row_vers_build_for_consistent_read( /*===============================*/ diff --git a/storage/innobase/include/trx0purge.h b/storage/innobase/include/trx0purge.h index 6109f7fb35824..3711599bd8ca9 100644 --- a/storage/innobase/include/trx0purge.h +++ b/storage/innobase/include/trx0purge.h @@ -24,8 +24,7 @@ Purge old versions Created 3/26/1996 Heikki Tuuri *******************************************************/ -#ifndef trx0purge_h -#define trx0purge_h +#pragma once #include "trx0sys.h" #include "que0types.h" @@ -123,7 +122,8 @@ class purge_sys_t /** latch protecting view, m_enabled */ alignas(CPU_LEVEL1_DCACHE_LINESIZE) mutable srw_spin_lock latch; private: - /** The purge will not remove undo logs which are >= this view */ + /** Read view at the start of a purge batch. Any encountered index records + that are older than view will be removed. */ ReadViewBase view; /** whether purge is enabled; protected by latch and std::atomic */ std::atomic m_enabled; @@ -133,6 +133,12 @@ class purge_sys_t Atomic_counter m_SYS_paused; /** number of stop_FTS() calls without resume_FTS() */ Atomic_counter m_FTS_paused; + + /** latch protecting end_view */ + alignas(CPU_LEVEL1_DCACHE_LINESIZE) srw_spin_lock_low end_latch; + /** Read view at the end of a purge batch (copied from view). Any undo pages + containing records older than end_view may be freed. 
*/ + ReadViewBase end_view; public: que_t* query; /*!< The query graph which will do the parallelized purge operation */ @@ -261,28 +267,56 @@ class purge_sys_t /** check stop_SYS() */ void check_stop_FTS() { if (must_wait_FTS()) wait_FTS(); } - /** A wrapper around ReadView::changes_visible(). */ - bool changes_visible(trx_id_t id, const table_name_t &name) const - { - return view.changes_visible(id, name); - } + /** Determine if the history of a transaction is purgeable. + @param trx_id transaction identifier + @return whether the history is purgeable */ + TRANSACTIONAL_TARGET bool is_purgeable(trx_id_t trx_id) const; + /** A wrapper around ReadView::low_limit_no(). */ trx_id_t low_limit_no() const { - /* Other callers than purge_coordinator_callback() must be holding - purge_sys.latch here. The purge coordinator task may call this - without holding any latch, because it is the only thread that may - modify purge_sys.view. */ + /* This function may only be called by purge_coordinator_callback(). + + The purge coordinator task may call this without holding any latch, + because it is the only thread that may modify purge_sys.view. + + Any other threads that access purge_sys.view must hold purge_sys.latch, + typically via purge_sys_t::view_guard. */ return view.low_limit_no(); } /** A wrapper around trx_sys_t::clone_oldest_view(). */ + template<bool also_end_view= false> void clone_oldest_view() { latch.wr_lock(SRW_LOCK_CALL); trx_sys.clone_oldest_view(&view); + if (also_end_view) + (end_view= view). + clamp_low_limit_id(head.trx_no ? head.trx_no : tail.trx_no); latch.wr_unlock(); } + /** Update end_view at the end of a purge batch.
*/ + inline void clone_end_view(); + + struct view_guard + { + inline view_guard(); + inline ~view_guard(); + + /** @return purge_sys.view */ + inline const ReadViewBase &view() const; + }; + + struct end_view_guard + { + inline end_view_guard(); + inline ~end_view_guard(); + + /** @return purge_sys.end_view */ + inline const ReadViewBase &view() const; + }; + /** Stop the purge thread and check n_ref_count of all auxiliary and common table associated with the fts table. @param table parent FTS table @@ -294,4 +328,20 @@ class purge_sys_t /** The global data structure coordinating a purge */ extern purge_sys_t purge_sys; -#endif /* trx0purge_h */ +purge_sys_t::view_guard::view_guard() +{ purge_sys.latch.rd_lock(SRW_LOCK_CALL); } + +purge_sys_t::view_guard::~view_guard() +{ purge_sys.latch.rd_unlock(); } + +const ReadViewBase &purge_sys_t::view_guard::view() const +{ return purge_sys.view; } + +purge_sys_t::end_view_guard::end_view_guard() +{ purge_sys.end_latch.rd_lock(); } + +purge_sys_t::end_view_guard::~end_view_guard() +{ purge_sys.end_latch.rd_unlock(); } + +const ReadViewBase &purge_sys_t::end_view_guard::view() const +{ return purge_sys.end_view; } diff --git a/storage/innobase/include/trx0rec.h b/storage/innobase/include/trx0rec.h index 52fd97fef9d57..58ec5ab17071d 100644 --- a/storage/innobase/include/trx0rec.h +++ b/storage/innobase/include/trx0rec.h @@ -181,17 +181,17 @@ trx_undo_report_row_operation( is being called purge view and we would like to get the purge record even it is in the purge view (in normal case, it will return without fetching the purge record */ -#define TRX_UNDO_PREV_IN_PURGE 0x1 +static constexpr ulint TRX_UNDO_PREV_IN_PURGE = 1; /** This tells trx_undo_prev_version_build() to fetch the old value in the undo log (which is the after image for an update) */ -#define TRX_UNDO_GET_OLD_V_VALUE 0x2 +static constexpr ulint TRX_UNDO_GET_OLD_V_VALUE = 2; + +/** indicate a call from row_vers_old_has_index_entry() */ +static constexpr ulint 
TRX_UNDO_CHECK_PURGEABILITY = 4; /** Build a previous version of a clustered index record. The caller must hold a latch on the index page of the clustered index record. -@param index_rec clustered index record in the index tree -@param index_mtr mtr which contains the latch to index_rec page - and purge_view @param rec version of a clustered index record @param index clustered index @param offsets rec_get_offsets(rec, index) @@ -210,14 +210,12 @@ must hold a latch on the index page of the clustered index record. @param v_status status determine if it is going into this function by purge thread or not. And if we read "after image" of undo log -@retval true if previous version was built, or if it was an insert -or the table has been rebuilt -@retval false if the previous version is earlier than purge_view, -or being purged, which means that it may have been removed */ -bool +@return error code +@retval DB_SUCCESS if previous version was successfully built, +or if it was an insert or the undo record refers to the table before rebuild +@retval DB_MISSING_HISTORY if the history is missing */ +dberr_t trx_undo_prev_version_build( - const rec_t *index_rec, - mtr_t *index_mtr, const rec_t *rec, dict_index_t *index, rec_offs *offsets, diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc index 8490773dc6894..0d6a069d50d1b 100644 --- a/storage/innobase/lock/lock0lock.cc +++ b/storage/innobase/lock/lock0lock.cc @@ -44,6 +44,7 @@ Created 5/7/1996 Heikki Tuuri #include "row0vers.h" #include "pars0pars.h" #include "srv0mon.h" +#include "que0que.h" #include diff --git a/storage/innobase/que/que0que.cc b/storage/innobase/que/que0que.cc index 3ea5c15bccc58..80c34af279051 100644 --- a/storage/innobase/que/que0que.cc +++ b/storage/innobase/que/que0que.cc @@ -1,7 +1,7 @@ /***************************************************************************** Copyright (c) 1996, 2016, Oracle and/or its affiliates. All Rights Reserved. 
-Copyright (c) 2017, 2021, MariaDB Corporation. +Copyright (c) 2017, 2022, MariaDB Corporation. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software @@ -566,6 +566,27 @@ que_node_type_string( } #endif /* DBUG_TRACE */ + +/**********************************************************************//** +Performs an execution step of an open or close cursor statement node. +@param thr query thread */ +static void open_step(que_thr_t *thr) +{ + open_node_t *node= static_cast<open_node_t*>(thr->run_node); + ut_ad(que_node_get_type(node) == QUE_NODE_OPEN); + sel_node_t *sel_node= node->cursor_def; + + if (node->op_type == ROW_SEL_OPEN_CURSOR) + sel_node->state= SEL_NODE_OPEN; + else + { + ut_ad(sel_node->state != SEL_NODE_CLOSED); + sel_node->state= SEL_NODE_CLOSED; + } + + thr->run_node= que_node_get_parent(node); +} + /**********************************************************************//** Performs an execution step on a query thread.
@return query thread to run next: it may differ from the input @@ -636,7 +657,7 @@ que_thr_step( } else if (type == QUE_NODE_FETCH) { thr = fetch_step(thr); } else if (type == QUE_NODE_OPEN) { - thr = open_step(thr); + open_step(thr); } else if (type == QUE_NODE_FUNC) { proc_eval_step(thr); diff --git a/storage/innobase/row/row0log.cc b/storage/innobase/row/row0log.cc index ac011697e2c18..cfc3d6f18f2aa 100644 --- a/storage/innobase/row/row0log.cc +++ b/storage/innobase/row/row0log.cc @@ -3842,9 +3842,8 @@ UndorecApplier::get_old_rec(const dtuple_t &tuple, dict_index_t *index, ut_ad(len == DATA_ROLL_PTR_LEN); if (is_same(roll_ptr)) return version; - trx_undo_prev_version_build(*clust_rec, &mtr, version, index, - *offsets, heap, &prev_version, nullptr, - nullptr, 0); + trx_undo_prev_version_build(version, index, *offsets, heap, &prev_version, + nullptr, nullptr, 0); version= prev_version; } while (version); @@ -4014,9 +4013,8 @@ void UndorecApplier::log_update(const dtuple_t &tuple, if (match_rec == rec) copy_rec= rec_copy(mem_heap_alloc( heap, rec_offs_size(offsets)), match_rec, offsets); - trx_undo_prev_version_build(rec, &mtr, match_rec, clust_index, - offsets, heap, &prev_version, nullptr, - nullptr, 0); + trx_undo_prev_version_build(match_rec, clust_index, offsets, heap, + &prev_version, nullptr, nullptr, 0); prev_offsets= rec_get_offsets(prev_version, clust_index, prev_offsets, clust_index->n_core_fields, diff --git a/storage/innobase/row/row0merge.cc b/storage/innobase/row/row0merge.cc index 1b6d23660bb1d..f2b29ae0f16f4 100644 --- a/storage/innobase/row/row0merge.cc +++ b/storage/innobase/row/row0merge.cc @@ -2124,8 +2124,14 @@ row_merge_read_clustered_index( ut_ad(trx->read_view.is_open()); ut_ad(rec_trx_id != trx->id); - if (!trx->read_view.changes_visible( - rec_trx_id, old_table->name)) { + if (!trx->read_view.changes_visible(rec_trx_id)) { + if (rec_trx_id + >= trx->read_view.low_limit_id() + && rec_trx_id + >= trx_sys.get_max_trx_id()) { + goto 
corrupted_rec; + } + rec_t* old_vers; row_vers_build_for_consistent_read( @@ -4412,9 +4418,7 @@ row_merge_is_index_usable( && (index->table->is_temporary() || index->table->no_rollback() || index->trx_id == 0 || !trx->read_view.is_open() - || trx->read_view.changes_visible( - index->trx_id, - index->table->name))); + || trx->read_view.changes_visible(index->trx_id))); } /** Build indexes on a table by reading a clustered index, creating a temporary diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc index 10fe321a70287..de469c5b0889f 100644 --- a/storage/innobase/row/row0mysql.cc +++ b/storage/innobase/row/row0mysql.cc @@ -1766,7 +1766,7 @@ row_update_for_mysql(row_prebuilt_t* prebuilt) /** This can only be used when the current transaction is at READ COMMITTED or READ UNCOMMITTED isolation level. -Before calling this function row_search_for_mysql() must have +Before calling this function row_search_mvcc() must have initialized prebuilt->new_rec_locks to store the information which new record locks really were set. This function removes a newly set clustered index record lock under prebuilt->pcur or @@ -2937,182 +2937,3 @@ row_rename_table_for_mysql( return(err); } - -/*********************************************************************//** -Scans an index for either COUNT(*) or CHECK TABLE. -If CHECK TABLE; Checks that the index contains entries in an ascending order, -unique constraint is not broken, and calculates the number of index entries -in the read view of the current transaction. 
-@return DB_SUCCESS or other error */ -dberr_t -row_scan_index_for_mysql( -/*=====================*/ - row_prebuilt_t* prebuilt, /*!< in: prebuilt struct - in MySQL handle */ - const dict_index_t* index, /*!< in: index */ - ulint* n_rows) /*!< out: number of entries - seen in the consistent read */ -{ - dtuple_t* prev_entry = NULL; - ulint matched_fields; - byte* buf; - dberr_t ret; - rec_t* rec; - int cmp; - ibool contains_null; - ulint i; - ulint cnt; - mem_heap_t* heap = NULL; - rec_offs offsets_[REC_OFFS_NORMAL_SIZE]; - rec_offs* offsets; - rec_offs_init(offsets_); - - *n_rows = 0; - - /* Don't support RTree Leaf level scan */ - ut_ad(!dict_index_is_spatial(index)); - - if (dict_index_is_clust(index)) { - /* The clustered index of a table is always available. - During online ALTER TABLE that rebuilds the table, the - clustered index in the old table will have - index->online_log pointing to the new table. All - indexes of the old table will remain valid and the new - table will be unaccessible to MySQL until the - completion of the ALTER TABLE. */ - } else if (dict_index_is_online_ddl(index) - || (index->type & DICT_FTS)) { - /* Full Text index are implemented by auxiliary tables, - not the B-tree. We also skip secondary indexes that are - being created online. 
*/ - return(DB_SUCCESS); - } - - ulint bufsize = std::max(srv_page_size, - prebuilt->mysql_row_len); - buf = static_cast<byte*>(ut_malloc_nokey(bufsize)); - heap = mem_heap_create(100); - - cnt = 1000; - - ret = row_search_for_mysql(buf, PAGE_CUR_G, prebuilt, 0, 0); -loop: - /* Check thd->killed every 1,000 scanned rows */ - if (--cnt == 0) { - if (trx_is_interrupted(prebuilt->trx)) { - ret = DB_INTERRUPTED; - goto func_exit; - } - cnt = 1000; - } - - switch (ret) { - case DB_SUCCESS: - break; - case DB_DEADLOCK: - case DB_LOCK_TABLE_FULL: - case DB_LOCK_WAIT_TIMEOUT: - case DB_INTERRUPTED: - goto func_exit; - default: - ib::warn() << "CHECK TABLE on index " << index->name << " of" - " table " << index->table->name << " returned " << ret; - /* (this error is ignored by CHECK TABLE) */ - /* fall through */ - case DB_END_OF_INDEX: - ret = DB_SUCCESS; -func_exit: - ut_free(buf); - mem_heap_free(heap); - - return(ret); - } - - *n_rows = *n_rows + 1; - - /* else this code is doing handler::check() for CHECK TABLE */ - - /* row_search...
returns the index record in buf, record origin offset - within buf stored in the first 4 bytes, because we have built a dummy - template */ - - rec = buf + mach_read_from_4(buf); - - offsets = rec_get_offsets(rec, index, offsets_, index->n_core_fields, - ULINT_UNDEFINED, &heap); - - if (prev_entry != NULL) { - matched_fields = 0; - - cmp = cmp_dtuple_rec_with_match(prev_entry, rec, offsets, - &matched_fields); - contains_null = FALSE; - - /* In a unique secondary index we allow equal key values if - they contain SQL NULLs */ - - for (i = 0; - i < dict_index_get_n_ordering_defined_by_user(index); - i++) { - if (UNIV_SQL_NULL == dfield_get_len( - dtuple_get_nth_field(prev_entry, i))) { - - contains_null = TRUE; - break; - } - } - - const char* msg; - - if (cmp > 0) { - ret = DB_INDEX_CORRUPT; - msg = "index records in a wrong order in "; -not_ok: - ib::error() - << msg << index->name - << " of table " << index->table->name - << ": " << *prev_entry << ", " - << rec_offsets_print(rec, offsets); - /* Continue reading */ - } else if (dict_index_is_unique(index) - && !contains_null - && matched_fields - >= dict_index_get_n_ordering_defined_by_user( - index)) { - ret = DB_DUPLICATE_KEY; - msg = "duplicate key in "; - goto not_ok; - } - } - - { - mem_heap_t* tmp_heap = NULL; - - /* Empty the heap on each round. But preserve offsets[] - for the row_rec_to_index_entry() call, by copying them - into a separate memory heap when needed. 
*/ - if (UNIV_UNLIKELY(offsets != offsets_)) { - ulint size = rec_offs_get_n_alloc(offsets) - * sizeof *offsets; - - tmp_heap = mem_heap_create(size); - - offsets = static_cast( - mem_heap_dup(tmp_heap, offsets, size)); - } - - mem_heap_empty(heap); - - prev_entry = row_rec_to_index_entry( - rec, index, offsets, heap); - - if (UNIV_LIKELY_NULL(tmp_heap)) { - mem_heap_free(tmp_heap); - } - } - - ret = row_search_for_mysql( - buf, PAGE_CUR_G, prebuilt, 0, ROW_SEL_NEXT); - - goto loop; -} diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc index 8e18aedef21ac..9162fda91edf4 100644 --- a/storage/innobase/row/row0sel.cc +++ b/storage/innobase/row/row0sel.cc @@ -36,6 +36,8 @@ Created 12/19/1997 Heikki Tuuri #include "dict0boot.h" #include "trx0undo.h" #include "trx0trx.h" +#include "trx0purge.h" +#include "trx0rec.h" #include "btr0btr.h" #include "btr0cur.h" #include "btr0sea.h" @@ -54,6 +56,7 @@ Created 12/19/1997 Heikki Tuuri #include "buf0lru.h" #include "srv0srv.h" #include "srv0mon.h" +#include "sql_error.h" #ifdef WITH_WSREP #include "mysql/service_wsrep.h" /* For wsrep_thd_skip_locking */ #endif @@ -282,7 +285,6 @@ row_sel_sec_rec_is_for_clust_rec( rec_offs_init(clust_offsets_); rec_offs_init(sec_offsets_); - ib_vcol_row vc(heap); clust_offs = rec_get_offsets(clust_rec, clust_index, clust_offs, @@ -952,9 +954,12 @@ row_sel_test_other_conds( @param index clustered index @param offsets rec_get_offsets(rec, index) @param view consistent read view -@return whether rec is visible in view */ -static bool row_sel_clust_sees(const rec_t *rec, const dict_index_t &index, - const rec_offs *offsets, const ReadView &view) +@retval DB_SUCCESS if rec is visible in view +@retval DB_SUCCESS_LOCKED_REC if rec is not visible in view +@retval DB_CORRUPTION if the DB_TRX_ID is corrupted */ +static dberr_t row_sel_clust_sees(const rec_t *rec, const dict_index_t &index, + const rec_offs *offsets, + const ReadView &view) { ut_ad(index.is_primary()); 
ut_ad(page_rec_is_user_rec(rec)); @@ -962,8 +967,16 @@ static bool row_sel_clust_sees(const rec_t *rec, const dict_index_t &index, ut_ad(!rec_is_metadata(rec, index)); ut_ad(!index.table->is_temporary()); - return view.changes_visible(row_get_rec_trx_id(rec, &index, offsets), - index.table->name); + const trx_id_t id= row_get_rec_trx_id(rec, &index, offsets); + + if (view.changes_visible(id)) + return DB_SUCCESS; + if (UNIV_LIKELY(id < view.low_limit_id() || id < trx_sys.get_max_trx_id())) + return DB_SUCCESS_LOCKED_REC; + + ib::warn() << "A transaction id in a record of table " << index.table->name + << " is newer than the system-wide maximum."; + return DB_CORRUPTION; } /*********************************************************************//** @@ -1074,9 +1087,15 @@ row_sel_get_clust_rec( old_vers = NULL; - if (!row_sel_clust_sees(clust_rec, *index, offsets, - *node->read_view)) { + err = row_sel_clust_sees(clust_rec, *index, offsets, + *node->read_view); + switch (err) { + default: + goto err_exit; + case DB_SUCCESS: + break; + case DB_SUCCESS_LOCKED_REC: err = row_sel_build_prev_vers( node->read_view, index, clust_rec, &offsets, &heap, &plan->old_vers_heap, @@ -1594,8 +1613,8 @@ row_sel_try_search_shortcut( ULINT_UNDEFINED, &heap); if (dict_index_is_clust(index)) { - if (!row_sel_clust_sees(rec, *index, offsets, - *node->read_view)) { + if (row_sel_clust_sees(rec, *index, offsets, *node->read_view) + != DB_SUCCESS) { return SEL_RETRY; } } else if (!srv_read_only_mode) { @@ -1962,9 +1981,16 @@ row_sel( a previous version of the record */ if (dict_index_is_clust(index)) { - if (!node->read_view->changes_visible( - row_get_rec_trx_id(rec, index, offsets), - index->table->name)) { + const trx_id_t id = row_get_rec_trx_id( + rec, index, offsets); + + if (!node->read_view->changes_visible(id)) { + if (id >= node->read_view->low_limit_id() + && id >= trx_sys.get_max_trx_id()) { + err = DB_CORRUPTION; + goto lock_wait_or_error; + } + err = row_sel_build_prev_vers( 
node->read_view, index, rec, &offsets, &heap, &plan->old_vers_heap, @@ -3230,6 +3256,14 @@ static bool row_sel_store_mysql_rec( DBUG_RETURN(true); } +static void row_sel_reset_old_vers_heap(row_prebuilt_t *prebuilt) +{ + if (prebuilt->old_vers_heap) + mem_heap_empty(prebuilt->old_vers_heap); + else + prebuilt->old_vers_heap= mem_heap_create(200); +} + /*********************************************************************//** Builds a previous version of a clustered index record for a consistent read @return DB_SUCCESS or error code */ @@ -3237,9 +3271,8 @@ static MY_ATTRIBUTE((warn_unused_result)) dberr_t row_sel_build_prev_vers_for_mysql( /*==============================*/ - ReadView* read_view, /*!< in: read view */ + row_prebuilt_t* prebuilt, /*!< in/out: prebuilt struct */ dict_index_t* clust_index, /*!< in: clustered index */ - row_prebuilt_t* prebuilt, /*!< in: prebuilt struct */ const rec_t* rec, /*!< in: record in a clustered index */ rec_offs** offsets, /*!< in/out: offsets returned by rec_get_offsets(rec, clust_index) */ @@ -3253,18 +3286,12 @@ row_sel_build_prev_vers_for_mysql( column data */ mtr_t* mtr) /*!< in: mtr */ { - dberr_t err; - - if (prebuilt->old_vers_heap) { - mem_heap_empty(prebuilt->old_vers_heap); - } else { - prebuilt->old_vers_heap = mem_heap_create(200); - } + row_sel_reset_old_vers_heap(prebuilt); - err = row_vers_build_for_consistent_read( - rec, mtr, clust_index, offsets, read_view, offset_heap, + return row_vers_build_for_consistent_read( + rec, mtr, clust_index, offsets, + &prebuilt->trx->read_view, offset_heap, prebuilt->old_vers_heap, old_vers, vrow); - return(err); } /** Helper class to cache clust_rec and old_vers */ @@ -3341,7 +3368,6 @@ Row_sel_get_clust_rec_for_mysql::operator()( access the clustered index */ { dict_index_t* clust_index; - const rec_t* clust_rec; rec_t* old_vers; trx_t* trx; @@ -3364,7 +3390,7 @@ Row_sel_get_clust_rec_for_mysql::operator()( return err; } - clust_rec = btr_pcur_get_rec(prebuilt->clust_pcur); 
+ const rec_t* clust_rec = btr_pcur_get_rec(prebuilt->clust_pcur); prebuilt->clust_pcur->trx_if_known = trx; @@ -3392,8 +3418,6 @@ Row_sel_get_clust_rec_for_mysql::operator()( if (!rtr_info->matches->valid) { mysql_mutex_unlock(&rtr_info->matches->rtr_match_mutex); clust_rec = NULL; - - err = DB_SUCCESS; goto func_exit; } mysql_mutex_unlock(&rtr_info->matches->rtr_match_mutex); @@ -3403,15 +3427,11 @@ Row_sel_get_clust_rec_for_mysql::operator()( && prebuilt->select_lock_type == LOCK_NONE) { clust_rec = NULL; - - err = DB_SUCCESS; goto func_exit; } if (rec != btr_pcur_get_rec(prebuilt->pcur)) { clust_rec = NULL; - - err = DB_SUCCESS; goto func_exit; } @@ -3437,18 +3457,14 @@ Row_sel_get_clust_rec_for_mysql::operator()( nullptr)); ut_ad(low_match < dtuple_get_n_fields_cmp(tuple)); mem_heap_free(heap); - clust_rec = NULL; - err = DB_SUCCESS; - goto func_exit; #endif /* UNIV_DEBUG */ } else if (!rec_get_deleted_flag(rec, dict_table_is_comp(sec_index->table)) || prebuilt->select_lock_type != LOCK_NONE) { /* In a rare case it is possible that no clust rec is found for a delete-marked secondary index - record: if in row0umod.cc in - row_undo_mod_remove_clust_low() we have already removed + record: if row_undo_mod_clust() has already removed the clust rec, while purge is still cleaning and removing secondary index records associated with earlier versions of the clustered index record. 
@@ -3464,11 +3480,8 @@ Row_sel_get_clust_rec_for_mysql::operator()( "InnoDB: clust index record ", stderr); rec_print(stderr, clust_rec, clust_index); err = DB_CORRUPTION; - clust_rec = NULL; - goto func_exit; } - err = DB_SUCCESS; clust_rec = NULL; goto func_exit; } @@ -3504,11 +3517,20 @@ Row_sel_get_clust_rec_for_mysql::operator()( if (trx->isolation_level == TRX_ISO_READ_UNCOMMITTED || clust_index->table->is_temporary()) { + } else { /* If the isolation level allows reading of uncommitted data, then we never look for an earlier version */ - } else if (!row_sel_clust_sees(clust_rec, *clust_index, - *offsets, trx->read_view)) { + err = row_sel_clust_sees(clust_rec, *clust_index, + *offsets, trx->read_view); + } + + switch (err) { + default: + return err; + case DB_SUCCESS: + break; + case DB_SUCCESS_LOCKED_REC: const buf_page_t& bpage = btr_pcur_get_block( prebuilt->clust_pcur)->page; @@ -3521,7 +3543,7 @@ Row_sel_get_clust_rec_for_mysql::operator()( /* The following call returns 'offsets' associated with 'old_vers' */ err = row_sel_build_prev_vers_for_mysql( - &trx->read_view, clust_index, prebuilt, + prebuilt, clust_index, clust_rec, offsets, offset_heap, &old_vers, vrow, mtr); @@ -3978,7 +4000,8 @@ row_sel_try_search_shortcut_for_mysql( *offsets = rec_get_offsets(rec, index, *offsets, index->n_core_fields, ULINT_UNDEFINED, heap); - if (!row_sel_clust_sees(rec, *index, *offsets, trx->read_view)) { + if (row_sel_clust_sees(rec, *index, *offsets, trx->read_view) + != DB_SUCCESS) { return SEL_RETRY; } @@ -4376,8 +4399,8 @@ row_search_mvcc( /* We need to get the virtual column values stored in secondary index key, if this is covered index scan or virtual key read is requested. */ - bool need_vrow = dict_index_has_virtual(prebuilt->index) - && prebuilt->read_just_key; + bool need_vrow = prebuilt->read_just_key + && prebuilt->index->has_virtual(); /* Reset the new record lock info if READ UNCOMMITTED or READ COMMITED isolation level is used. 
Then @@ -4881,11 +4904,6 @@ row_search_mvcc( rec = btr_pcur_get_rec(pcur); - if (!index->table->is_readable()) { - err = DB_DECRYPTION_FAILED; - goto page_read_error; - } - ut_ad(!!page_rec_is_comp(rec) == comp); ut_ad(page_rec_is_leaf(rec)); @@ -5324,18 +5342,24 @@ row_search_mvcc( high force recovery level set, we try to avoid crashes by skipping this lookup */ - if (!row_sel_clust_sees(rec, *index, offsets, - trx->read_view)) { + err = row_sel_clust_sees(rec, *index, offsets, + trx->read_view); + + switch (err) { + default: + goto lock_wait_or_error; + case DB_SUCCESS: + break; + case DB_SUCCESS_LOCKED_REC: ut_ad(srv_force_recovery < SRV_FORCE_NO_UNDO_LOG_SCAN); rec_t* old_vers; /* The following call returns 'offsets' associated with 'old_vers' */ err = row_sel_build_prev_vers_for_mysql( - &trx->read_view, clust_index, - prebuilt, rec, &offsets, &heap, - &old_vers, need_vrow ? &vrow : NULL, - &mtr); + prebuilt, clust_index, + rec, &offsets, &heap, &old_vers, + need_vrow ? &vrow : nullptr, &mtr); if (err != DB_SUCCESS) { @@ -5476,8 +5500,7 @@ row_search_mvcc( &offsets, &heap, need_vrow ? &vrow : NULL, &mtr); - if (prebuilt->skip_locked && - err == DB_LOCK_WAIT) { + if (err == DB_LOCK_WAIT && prebuilt->skip_locked) { err = lock_trx_handle_wait(trx); } switch (err) { @@ -5486,7 +5509,6 @@ row_search_mvcc( /* The record did not exist in the read view */ ut_ad(prebuilt->select_lock_type == LOCK_NONE || dict_index_is_spatial(index)); - goto next_rec; } break; @@ -5581,9 +5603,7 @@ row_search_mvcc( && !prebuilt->templ_contains_blob && !prebuilt->clust_index_was_generated && !prebuilt->used_in_HANDLER - && prebuilt->template_type != ROW_MYSQL_DUMMY_TEMPLATE && !prebuilt->in_fts_query) { - /* Inside an update, for example, we do not cache rows, since we may use the cursor position to do the actual update, that is why we require ...lock_type == LOCK_NONE. 
@@ -5648,29 +5668,8 @@ row_search_mvcc( if (prebuilt->n_fetch_cached < MYSQL_FETCH_CACHE_SIZE) { goto next_rec; } - } else { - if (UNIV_UNLIKELY - (prebuilt->template_type == ROW_MYSQL_DUMMY_TEMPLATE)) { - /* CHECK TABLE: fetch the row */ - - if (result_rec != rec - && !prebuilt->need_to_access_clustered) { - /* We used 'offsets' for the clust - rec, recalculate them for 'rec' */ - offsets = rec_get_offsets(rec, index, offsets, - index->n_core_fields, - ULINT_UNDEFINED, - &heap); - result_rec = rec; - } - - memcpy(buf + 4, result_rec - - rec_offs_extra_size(offsets), - rec_offs_size(offsets)); - mach_write_to_4(buf, - rec_offs_extra_size(offsets) + 4); - } else if (!prebuilt->pk_filter && !prebuilt->idx_cond) { + if (!prebuilt->pk_filter && !prebuilt->idx_cond) { /* The record was not yet converted to MySQL format. */ if (!row_sel_store_mysql_rec( buf, prebuilt, result_rec, vrow, @@ -6026,18 +6025,11 @@ row_count_rtree_recs( prebuilt->mysql_row_len); buf = static_cast<byte*>(ut_malloc_nokey(bufsize)); - ulint cnt = 1000; + ulint direction = 0; - ret = row_search_for_mysql(buf, PAGE_CUR_WITHIN, prebuilt, 0, 0); loop: - /* Check thd->killed every 1,000 scanned rows */ - if (--cnt == 0) { - if (trx_is_interrupted(prebuilt->trx)) { - ret = DB_INTERRUPTED; - goto func_exit; - } - cnt = 1000; - } + ret = row_search_mvcc(buf, PAGE_CUR_WITHIN, prebuilt, 0, direction); + direction = ROW_SEL_NEXT; switch (ret) { case DB_SUCCESS: @@ -6059,12 +6051,777 @@ row_count_rtree_recs( return(ret); } - *n_rows = *n_rows + 1; + ++*n_rows; + goto loop; +} + +/** Check if a version of a clustered index record and a secondary +index record match. 
+ +@param prebuilt index and transaction +@param clust_rec a version of a clustered index record +@param clust_index clustered index +@param clust_offsets rec_get_offsets(clust_rec, clust_index) +@param rec secondary index leaf page record +@param offsets rec_get_offsets(rec, index) +@return an error code +@retval DB_SUCCESS if rec matches clust_rec +@retval DB_SUCCESS_LOCKED_REC if rec does not match clust_rec +*/ +static dberr_t row_check_index_match(row_prebuilt_t *prebuilt, + const rec_t *clust_rec, + const dict_index_t *clust_index, + const rec_offs *clust_offsets, + const rec_t *rec, + const dict_index_t *index, + const rec_offs *offsets) +{ + ut_ad(index == prebuilt->index); - ret = row_search_for_mysql( - buf, PAGE_CUR_WITHIN, prebuilt, 0, ROW_SEL_NEXT); + ib_vcol_row vc(index->has_virtual() ? mem_heap_create(256) : nullptr); - goto loop; + const uint16_t n= index->n_user_defined_cols; + + for (uint16_t i= 0; i < n; i++) + { + ulint pos= 0; + ulint len, sec_len; + + const dict_field_t &ifield= index->fields[i]; + const byte *sec_field= rec_get_nth_field(rec, offsets, i, &sec_len); + const byte *field; + + if (ifield.col->is_virtual()) + { + /* Virtual column values must be reconstructed from the base columns. 
*/ + row_ext_t *ext; + byte *record= vc.record(prebuilt->trx->mysql_thd, clust_index, + &prebuilt->m_mysql_table); + const dict_v_col_t *v_col= reinterpret_cast<const dict_v_col_t*> + (ifield.col); + dtuple_t *row= row_build(ROW_COPY_POINTERS, + clust_index, clust_rec, clust_offsets, + nullptr, nullptr, nullptr, &ext, vc.heap); + if (dfield_t *vfield= + innobase_get_computed_value(row, v_col, clust_index, &vc.heap, + nullptr, nullptr, + prebuilt->trx->mysql_thd, + prebuilt->m_mysql_table, + record, nullptr, nullptr)) + { + len= vfield->len; + field= static_cast<byte*>(vfield->data); + } + else + { + innobase_report_computed_value_failed(row); + return DB_COMPUTE_VALUE_FAILED; + } + } + else + { + pos= dict_col_get_clust_pos(ifield.col, clust_index); + field= rec_get_nth_cfield(clust_rec, clust_index, clust_offsets, pos, + &len); + if (len == UNIV_SQL_NULL) + { + if (sec_len == UNIV_SQL_NULL) + continue; + return DB_SUCCESS_LOCKED_REC; + } + if (sec_len == UNIV_SQL_NULL) + return DB_SUCCESS_LOCKED_REC; + + if (rec_offs_nth_extern(clust_offsets, pos)) + { + if (len == BTR_EXTERN_FIELD_REF_SIZE) + goto compare_blobs; + len-= BTR_EXTERN_FIELD_REF_SIZE; + } + + if (ifield.prefix_len) + { + len= + dtype_get_at_most_n_mbchars(ifield.col->prtype, ifield.col->mbminlen, + ifield.col->mbmaxlen, + ifield.prefix_len, len, + reinterpret_cast<const char*>(field)); + if (len < sec_len) + goto check_for_blob; + } + else + { +check_for_blob: + if (rec_offs_nth_extern(clust_offsets, pos)) + { +compare_blobs: + if (!row_sel_sec_rec_is_for_blob(ifield.col->mtype, + ifield.col->prtype, + ifield.col->mbminlen, + ifield.col->mbmaxlen, + field, len, sec_field, sec_len, + ifield.prefix_len, + clust_index->table)) + return DB_SUCCESS_LOCKED_REC; + continue; + } + } + } + + if (cmp_data_data(ifield.col->mtype, ifield.col->prtype, + field, len, sec_field, sec_len)) + return DB_SUCCESS_LOCKED_REC; + } + + return DB_SUCCESS; +} + +/** +Check the index records in CHECK TABLE. 
+The index must contain entries in ascending order, +unique constraints must not be violated by duplicate keys, +and the number of index entries is counted according to the +current read view. + +@param prebuilt index and transaction +@param n_rows number of records counted + +@return error code +@retval DB_SUCCESS if no error was found */ +dberr_t row_check_index(row_prebuilt_t *prebuilt, ulint *n_rows) +{ + rec_offs offsets_[REC_OFFS_NORMAL_SIZE]; + rec_offs_init(offsets_); + + *n_rows= 0; + dict_index_t *const index= prebuilt->index; + + prebuilt->fetch_direction= ROW_SEL_NEXT; + + if (!index->is_btree()) + return DB_CORRUPTION; + + mem_heap_t *heap= mem_heap_create(100); + + dtuple_t *prev_entry= nullptr; + mtr_t mtr; + mtr.start(); + + dict_index_t *clust_index= dict_table_get_first_index(prebuilt->table); + + dberr_t err= btr_pcur_open_at_index_side(true, index, BTR_SEARCH_LEAF, + prebuilt->pcur, false, 0, &mtr); + if (UNIV_UNLIKELY(err != DB_SUCCESS)) + { +func_exit: + mtr.commit(); + mem_heap_free(heap); + return err; + } + + if (const trx_id_t bulk_trx_id= index->table->bulk_trx_id) + if (!prebuilt->trx->read_view.changes_visible(bulk_trx_id)) + goto func_exit; + + ReadView check_table_extended_view; + ReadView &view= + prebuilt->need_to_access_clustered && + prebuilt->trx->isolation_level != TRX_ISO_READ_UNCOMMITTED + ? check_table_extended_view : prebuilt->trx->read_view; + if (&view == &check_table_extended_view) + check_table_extended_view.set_creator_trx_id(prebuilt->trx->id); + +page_loop: + if (&view == &check_table_extended_view) + /* In CHECK TABLE...EXTENDED, we make a copy of purge_sys.end_view + while holding a shared latch on the index leaf page. + Should a currently active purge batch desire to remove any further + records from this page, it would be blocked by our page latch. 
+ + We will consult check_table_extended_view to determine if a + clustered index record corresponding to a secondary index record + is visible to the current purge batch. Right after we have made our + copy, purge_sys.end_view is free to be changed again. + + If we have an orphan secondary index record, we may attempt to + request a clustered index record version that cannot be retrieved + any more because the undo log records may have been freed + (according to the purge_sys.end_view). In such a case, + trx_undo_get_undo_rec() would cause + trx_undo_prev_version_build() and trx_undo_prev_version_build() + to return DB_MISSING_HISTORY. */ + static_cast<ReadViewBase&>(check_table_extended_view)= + purge_sys_t::end_view_guard{}.view(); + +rec_loop: + ut_ad(err == DB_SUCCESS); + + if (!btr_pcur_move_to_next_on_page(prebuilt->pcur)) + { + err= DB_CORRUPTION; + goto func_exit; + } + + const rec_t *rec= btr_pcur_get_rec(prebuilt->pcur); + rec_offs *offsets= offsets_; + + if (page_rec_is_supremum(rec)) + { + next_page: + if (btr_pcur_is_after_last_in_tree(prebuilt->pcur)) + goto func_exit; + err= btr_pcur_move_to_next_page(prebuilt->pcur, &mtr); + if (err == DB_SUCCESS && trx_is_interrupted(prebuilt->trx)) + err= DB_INTERRUPTED; + if (UNIV_UNLIKELY(err != DB_SUCCESS)) + goto func_exit; + goto page_loop; + } + + offsets= rec_get_offsets(rec, index, offsets, index->n_core_fields, + ULINT_UNDEFINED, &heap); + + const auto info_bits= + rec_get_info_bits(rec, prebuilt->table->not_redundant()); + const bool rec_deleted= info_bits & REC_INFO_DELETED_FLAG; + + if (UNIV_UNLIKELY(info_bits & REC_INFO_MIN_REC_FLAG)) + { + if (*n_rows || !index->is_instant()) + { + push_warning_printf(prebuilt->trx->mysql_thd, + Sql_condition::WARN_LEVEL_WARN, ER_NOT_KEYFILE, + "InnoDB: invalid record encountered"); + prebuilt->autoinc_error= DB_INDEX_CORRUPT; + } + goto next_rec; + } + + if (index->is_clust()) + { + if (prebuilt->trx->isolation_level == TRX_ISO_READ_UNCOMMITTED) + { + if (!rec_deleted) + goto 
count_row; + goto next_rec; + } + + trx_id_t rec_trx_id= row_get_rec_trx_id(rec, index, offsets); + + if (rec_trx_id >= prebuilt->trx->read_view.low_limit_id() && + UNIV_UNLIKELY(rec_trx_id >= trx_sys.get_max_trx_id())) + { + invalid_trx_id: + if (prebuilt->autoinc_error == DB_SUCCESS) + push_warning_printf(prebuilt->trx->mysql_thd, + Sql_condition::WARN_LEVEL_WARN, + ER_NOT_KEYFILE, + "InnoDB: DB_TRX_ID=" TRX_ID_FMT + " exceeds the system-wide maximum", + rec_trx_id); + prebuilt->autoinc_error= DB_CORRUPTION; + goto next_rec; + } + + if (!prebuilt->trx->read_view.changes_visible(rec_trx_id)) + { + ut_ad(srv_force_recovery < SRV_FORCE_NO_UNDO_LOG_SCAN); + rec_t *old_vers; + /* The following call returns 'offsets' associated with 'old_vers' */ + err= row_sel_build_prev_vers_for_mysql(prebuilt, index, rec, &offsets, + &heap, &old_vers, nullptr, &mtr); + + if (err != DB_SUCCESS) + goto func_exit; + + if (old_vers) + { + rec= old_vers; + rec_trx_id= row_get_rec_trx_id(rec, index, offsets); + + if (rec_trx_id >= prebuilt->trx->read_view.low_limit_id() && + UNIV_UNLIKELY(rec_trx_id >= trx_sys.get_max_trx_id())) + goto invalid_trx_id; + + if (!rec_get_deleted_flag(rec, prebuilt->table->not_redundant())) + goto count_row; + } + else + offsets= rec_get_offsets(rec, index, offsets, index->n_core_fields, + ULINT_UNDEFINED, &heap); + goto next_rec; + } + else if (!rec_deleted && !rec_trx_id); + else if (!check_table_extended_view.changes_visible(rec_trx_id)); + else if (prebuilt->autoinc_error == DB_SUCCESS) + { + const char *msg= rec_deleted + ? 
"Unpurged clustered index record" + : "Clustered index record with stale history"; + + ib::warn w; + w << msg << " in table " << index->table->name << ": " + << rec_offsets_print(rec, offsets); + prebuilt->autoinc_error= DB_MISSING_HISTORY; + push_warning_printf(prebuilt->trx->mysql_thd, + Sql_condition::WARN_LEVEL_WARN, + ER_NOT_KEYFILE, "InnoDB: %s", w.m_oss.str().c_str()); + } + + if (!rec_deleted) + goto count_row; + + goto next_rec; + } + else if (const trx_id_t page_trx_id= page_get_max_trx_id(page_align(rec))) + { + if (page_trx_id >= trx_sys.get_max_trx_id()) + goto invalid_PAGE_MAX_TRX_ID; + if (prebuilt->trx->isolation_level == TRX_ISO_READ_UNCOMMITTED); + else if (&view == &check_table_extended_view || rec_deleted || + !view.sees(page_trx_id)) + { + bool got_extended_match= &view == &check_table_extended_view; + const auto savepoint= mtr.get_savepoint(); + + row_build_row_ref_in_tuple(prebuilt->clust_ref, rec, index, offsets); + err= btr_pcur_open_with_no_init(clust_index, prebuilt->clust_ref, + PAGE_CUR_LE, BTR_SEARCH_LEAF, + prebuilt->clust_pcur, &mtr); + if (err != DB_SUCCESS) + goto func_exit; + + const rec_t *clust_rec= btr_pcur_get_rec(prebuilt->clust_pcur); + + /* Note: only if the search ends up on a non-infimum record is the + low_match value the real match to the search tuple */ + + if (!page_rec_is_user_rec(clust_rec) || + btr_pcur_get_low_match(prebuilt->clust_pcur) < clust_index->n_uniq) + { + if (!rec_deleted) + { + not_found: + /* MDEV-29823 FIXME: There is a race condition between + rollback, purge, and possibly other SQL connections that + are creating and releasing read views. At the time + row_undo_mod_del_mark_or_remove_sec_low() is executing + rollback on a secondary index record, purge_sys.view + may not allow it to delete the record, and it will be + delete-marked. 
Eventually purge_sys.view would advance, + but the delete-marked record could never be removed, + because no undo log record was ever added to + the purge queue by trx_purge_add_undo_to_history(). + + For now, we will not flag an error about orphan secondary index + records that are delete-marked; we will only warn about them. */ + + if (!rec_deleted || prebuilt->autoinc_error == DB_SUCCESS) + { + ib::error_or_warn w(!rec_deleted); + w << "Clustered index record not found for index " + << index->name << " of table " << index->table->name + << ": " << rec_offsets_print(rec, offsets); + push_warning_printf(prebuilt->trx->mysql_thd, + Sql_condition::WARN_LEVEL_WARN, + ER_NOT_KEYFILE, "InnoDB: %s", + w.m_oss.str().c_str()); + } + + if (prebuilt->autoinc_error == DB_SUCCESS) + prebuilt->autoinc_error= rec_deleted + ? DB_MISSING_HISTORY + : DB_CORRUPTION; + } + else if (&view == &check_table_extended_view) + extended_not_found: + if (view.changes_visible(page_trx_id)) + goto not_found; + did_not_find: + mtr.rollback_to_savepoint(savepoint); + goto next_rec; + } + + rec_offs *clust_offsets; + trx_id_t rec_trx_id; + rec_t *old_vers= nullptr; + + bool found_in_view= false; + trx_id_t visible_trx_id= ~0ULL; + + if (ulint trx_id_offset= clust_index->trx_id_offset) + { + clust_offsets= nullptr; + read_trx_id: + rec_trx_id= trx_read_trx_id(clust_rec + trx_id_offset); + + if (clust_rec[trx_id_offset + DATA_TRX_ID_LEN] & 0x80) + { + if (UNIV_UNLIKELY + (rec_get_deleted_flag(clust_rec, + prebuilt->table->not_redundant()))) + { + err= DB_CORRUPTION; + goto func_exit; + } + + /* This is the oldest available record version (fresh insert). 
*/ + if (!view.changes_visible(rec_trx_id)) + { + if (rec_trx_id >= view.low_limit_id() && + UNIV_UNLIKELY(rec_trx_id >= trx_sys.get_max_trx_id())) + goto invalid_rec_trx_id; + if (got_extended_match) + goto check_latest_version; + goto did_not_find; + } + } + } + else + { + clust_offsets= rec_get_offsets(clust_rec, clust_index, nullptr, + clust_index->n_core_fields, + ULINT_UNDEFINED, &heap); + ulint trx_id_pos= clust_index->n_uniq ? clust_index->n_uniq : 1; + ulint len; + trx_id_offset= rec_get_nth_field_offs(clust_offsets, trx_id_pos, &len); + ut_ad(len == DATA_TRX_ID_LEN); + goto read_trx_id; + } + + if (got_extended_match) + { + check_latest_version: + /* In CHECK TABLE...EXTENDED, always check if the secondary + index record matches the latest clustered index record + version, no matter if it is visible in our own read view. + + If the latest clustered index version is delete-marked and + purgeable, it is not safe to fetch any BLOBs for column prefix + indexes because they may already have been freed. */ + if (rec_trx_id && + rec_get_deleted_flag(clust_rec, + prebuilt->table->not_redundant()) && + purge_sys.is_purgeable(rec_trx_id)) + goto did_not_find; + + if (!clust_offsets) + clust_offsets= rec_get_offsets(clust_rec, clust_index, nullptr, + clust_index->n_core_fields, + ULINT_UNDEFINED, &heap); + err= row_check_index_match(prebuilt, + clust_rec, clust_index, clust_offsets, + rec, index, offsets); + + switch (err) { + default: + goto func_exit; + case DB_SUCCESS_LOCKED_REC: + case DB_SUCCESS: + break; + } + + got_extended_match= err == DB_SUCCESS; + err= DB_SUCCESS; + + if (!prebuilt->trx->read_view.changes_visible(rec_trx_id)) + /* While CHECK TABLE ... EXTENDED checks for a matching + clustered index record version for each secondary index + record, it must count only those records that belong to its + own read view. 
+ + If the latest version of clust_rec matches rec but is not + in our read view, there may still be an older version of + clust_rec that not only matches rec but is in our view. + We must evaluate old versions before deciding whether rec + should be counted. */ + goto check_old_vers; + + /* Remember that this is the visible clust_rec for rec, + and whether it matches rec. */ + visible_trx_id= rec_trx_id; + found_in_view= got_extended_match && + !rec_get_deleted_flag(clust_rec, + prebuilt->table->not_redundant()); + + if (!got_extended_match) + goto check_old_vers; + + if (!found_in_view) + goto did_not_find; + + found_match: + mtr.rollback_to_savepoint(savepoint); + goto count_row; + } + else if (!view.changes_visible(rec_trx_id)) + { + check_old_vers: + if (rec_trx_id >= view.low_limit_id() && + UNIV_UNLIKELY(rec_trx_id >= trx_sys.get_max_trx_id())) + { + invalid_rec_trx_id: + if (prebuilt->autoinc_error == DB_SUCCESS) + push_warning_printf(prebuilt->trx->mysql_thd, + Sql_condition::WARN_LEVEL_WARN, + ER_NOT_KEYFILE, + "InnoDB: DB_TRX_ID=" TRX_ID_FMT + " exceeds the system-wide maximum", + rec_trx_id); + goto not_found; + } + + if (!clust_offsets) + clust_offsets= rec_get_offsets(clust_rec, clust_index, nullptr, + clust_index->n_core_fields, + ULINT_UNDEFINED, &heap); + + row_sel_reset_old_vers_heap(prebuilt); + /* The following is adapted from row_vers_build_for_consistent_read() + because when using check_table_extended_view, we must + consider every available version of the clustered index record. 
*/ + mem_heap_t *vers_heap= nullptr; + + for (;;) + { + mem_heap_t *prev_heap= vers_heap; + vers_heap= mem_heap_create(1024); + err= trx_undo_prev_version_build(clust_rec, + clust_index, clust_offsets, + vers_heap, &old_vers, + nullptr, nullptr, 0); + if (prev_heap) + mem_heap_free(prev_heap); + if (err != DB_SUCCESS) + { + old_vers_err: + mem_heap_free(vers_heap); + if (err == DB_MISSING_HISTORY) + { + err= DB_SUCCESS; + if (got_extended_match) + goto did_not_find; + goto not_found; + } + goto func_exit; + } + + if (UNIV_UNLIKELY(!old_vers)) + { + mem_heap_free(vers_heap); + /* We did not find a matching clustered index record version + for the secondary index record. Normal CHECK TABLE will simply + not count the secondary index record; CHECK TABLE ... EXTENDED + will flag such orphan records if appropriate. + + A secondary index record may be "temporarily orphan" + if purge is in progress. We will only flag them if + everything up to PAGE_MAX_TRX_ID has been fully purged. + + "Temporary orphans" may be produced when + row_undo_mod_clust() resets the DB_TRX_ID of the latest + clust_rec version or when trx_undo_prev_version_build() + encounters a BLOB that may have been freed according to + purge_sys.view (not purge_sys.end_view). 
*/ + if (&view == &check_table_extended_view && !got_extended_match) + goto extended_not_found; + goto did_not_find; + } + + clust_rec= old_vers; + clust_offsets= rec_get_offsets(clust_rec, clust_index, clust_offsets, + clust_index->n_core_fields, + ULINT_UNDEFINED, &heap); + + rec_trx_id= row_get_rec_trx_id(clust_rec, clust_index, + clust_offsets); + + if (UNIV_UNLIKELY(rec_trx_id >= + prebuilt->trx->read_view.low_limit_id() && + rec_trx_id >= trx_sys.get_max_trx_id())) + { + mem_heap_free(vers_heap); + goto invalid_rec_trx_id; + } + + const bool rec_visible= + prebuilt->trx->read_view.changes_visible(rec_trx_id); + const bool clust_rec_deleted= + rec_get_deleted_flag(clust_rec, prebuilt->table->not_redundant()); + + if (&view != &prebuilt->trx->read_view) + { + /* It is not safe to fetch BLOBs of committed delete-marked + records that may have been freed in purge. */ + err= clust_rec_deleted && rec_trx_id && + purge_sys.is_purgeable(rec_trx_id) + ? DB_SUCCESS_LOCKED_REC + : row_check_index_match(prebuilt, + clust_rec, clust_index, clust_offsets, + rec, index, offsets); + + switch (err) { + default: + goto old_vers_err; + case DB_SUCCESS_LOCKED_REC: + if (rec_visible && !~visible_trx_id) + visible_trx_id= rec_trx_id; + continue; + case DB_SUCCESS: + got_extended_match= true; + if (!rec_visible) + continue; + if (!~visible_trx_id) + { + visible_trx_id= rec_trx_id; + found_in_view= !clust_rec_deleted; + } + mem_heap_free(vers_heap); + if (!found_in_view) + goto did_not_find; + goto found_match; + } + } + else if (rec_visible) + { + if (!clust_rec_deleted) + { + clust_rec= rec_copy(mem_heap_alloc(heap, + rec_offs_size(clust_offsets)), + clust_rec, clust_offsets); + rec_offs_make_valid(clust_rec, clust_index, true, clust_offsets); + } + mem_heap_free(vers_heap); + if (clust_rec_deleted) + goto did_not_find; + goto check_match; + } + } + } + else if (rec_get_deleted_flag(clust_rec, + prebuilt->table->not_redundant())) + goto did_not_find; + + ut_ad(clust_rec); + 
ut_ad(&view != &check_table_extended_view); + + /* If we had to go to an earlier version of row or the secondary + index record is delete marked, then it may be that the secondary + index record corresponding to clust_rec (or old_vers) is not + rec; in that case we must ignore such row because in our + snapshot rec would not have existed. Remember that from rec we + cannot see directly which transaction id corresponds to it: we + have to go to the clustered index record. A query where we want + to fetch all rows where the secondary index value is in some + interval would return a wrong result if we would not drop rows + which we come to visit through secondary index records that + would not really exist in our snapshot. */ + + if (rec_deleted) + { + if (!clust_offsets) + clust_offsets= rec_get_offsets(clust_rec, clust_index, nullptr, + clust_index->n_core_fields, + ULINT_UNDEFINED, &heap); + check_match: + /* This clustered index record version exists in + prebuilt->trx->read_view and is not delete-marked. + By design, any BLOBs in it are not allowed to be + freed in the purge of committed transaction history. 
*/ + err= row_check_index_match(prebuilt, clust_rec, clust_index, + clust_offsets, rec, index, offsets); + switch (err) { + case DB_SUCCESS: + break; + case DB_SUCCESS_LOCKED_REC: + err= DB_SUCCESS; + goto did_not_find; + default: + goto func_exit; + } + } + + mtr.rollback_to_savepoint(savepoint); + } + } + else + { + invalid_PAGE_MAX_TRX_ID: + if (UNIV_LIKELY(srv_force_recovery < SRV_FORCE_NO_UNDO_LOG_SCAN)) + { + push_warning_printf(prebuilt->trx->mysql_thd, + Sql_condition::WARN_LEVEL_WARN, ER_NOT_KEYFILE, + "InnoDB: Invalid PAGE_MAX_TRX_ID=%llu" + " in index '%-.200s'", + page_trx_id, index->name()); + prebuilt->autoinc_error= DB_INDEX_CORRUPT; + } + goto next_rec; + } + +count_row: + ++*n_rows; + + if (prev_entry) + { + ulint matched_fields= 0; + int cmp= cmp_dtuple_rec_with_match(prev_entry, rec, offsets, + &matched_fields); + const char* msg; + + if (UNIV_LIKELY(cmp < 0)); + else if (cmp > 0) + { + prebuilt->autoinc_error= DB_INDEX_CORRUPT; + msg= "index records in a wrong order in "; +not_ok: + ib::error() << msg << index->name << " of table " << index->table->name + << ": " << *prev_entry << ", " + << rec_offsets_print(rec, offsets); + } + else if (index->is_unique() && matched_fields >= + dict_index_get_n_ordering_defined_by_user(index)) + { + /* NULL values in unique indexes are considered not to be duplicates */ + for (ulint i= 0; i < dict_index_get_n_ordering_defined_by_user(index); + i++) + if (dfield_is_null(dtuple_get_nth_field(prev_entry, i))) + goto next_rec; + + if (prebuilt->autoinc_error == DB_SUCCESS) + prebuilt->autoinc_error= DB_DUPLICATE_KEY; + msg= "duplicate key in "; + goto not_ok; + } + } + +next_rec: + ut_ad(err == DB_SUCCESS); + + { + mem_heap_t *tmp_heap= nullptr; + + /* Empty the heap on each round. But preserve offsets[] + for the row_rec_to_index_entry() call, by copying them + into a separate memory heap when needed. 
*/ + if (UNIV_UNLIKELY(offsets != offsets_)) + { + ulint size= rec_offs_get_n_alloc(offsets) * sizeof *offsets; + tmp_heap= mem_heap_create(size); + offsets= static_cast<rec_offs*>(mem_heap_dup(tmp_heap, offsets, size)); + } + + mem_heap_empty(heap); + prev_entry= row_rec_to_index_entry(rec, index, offsets, heap); + + if (UNIV_LIKELY_NULL(tmp_heap)) + mem_heap_free(tmp_heap); + } + + if (btr_pcur_is_after_last_on_page(prebuilt->pcur)) + goto next_page; + + goto rec_loop; } /*******************************************************************//** diff --git a/storage/innobase/row/row0umod.cc b/storage/innobase/row/row0umod.cc index 91925219ea855..cca44f0192077 100644 --- a/storage/innobase/row/row0umod.cc +++ b/storage/innobase/row/row0umod.cc @@ -216,26 +216,23 @@ static ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index) } /** Determine if rollback must execute a purge-like operation. -@param[in,out] node row undo -@param[in,out] mtr mini-transaction +@param node row undo @return whether the record should be purged */ -static bool row_undo_mod_must_purge(undo_node_t* node, mtr_t* mtr) +static bool row_undo_mod_must_purge(const undo_node_t &node) { - ut_ad(node->rec_type == TRX_UNDO_UPD_DEL_REC); - ut_ad(!node->table->is_temporary()); + ut_ad(node.rec_type == TRX_UNDO_UPD_DEL_REC); + ut_ad(!node.table->is_temporary()); - btr_cur_t* btr_cur = btr_pcur_get_btr_cur(&node->pcur); - ut_ad(btr_cur->index->is_primary()); - DEBUG_SYNC_C("rollback_purge_clust"); + const btr_cur_t &btr_cur= node.pcur.btr_cur; + ut_ad(btr_cur.index->is_primary()); + DEBUG_SYNC_C("rollback_purge_clust"); - if (!purge_sys.changes_visible(node->new_trx_id, node->table->name)) { - return false; - } + if (!purge_sys.is_purgeable(node.new_trx_id)) + return false; - const rec_t* rec = btr_cur_get_rec(btr_cur); - - return trx_read_trx_id(rec + row_trx_id_offset(rec, btr_cur->index)) - == node->new_trx_id; + const rec_t *rec= btr_cur_get_rec(&btr_cur); + return trx_read_trx_id(rec + 
row_trx_id_offset(rec, btr_cur.index)) == + node.new_trx_id; } /***********************************************************//** @@ -251,7 +248,6 @@ row_undo_mod_clust( { btr_pcur_t* pcur; mtr_t mtr; - bool have_latch = false; dberr_t err; dict_index_t* index; @@ -347,9 +343,7 @@ row_undo_mod_clust( btr_pcur_commit_specify_mtr(pcur, &mtr); } else { index->set_modified(mtr); - have_latch = true; - purge_sys.latch.rd_lock(SRW_LOCK_CALL); - if (!row_undo_mod_must_purge(node, &mtr)) { + if (!row_undo_mod_must_purge(*node)) { goto mtr_commit_exit; } err = btr_cur_optimistic_delete(&pcur->btr_cur, 0, @@ -358,9 +352,7 @@ row_undo_mod_clust( goto mtr_commit_exit; } err = DB_SUCCESS; - purge_sys.latch.rd_unlock(); btr_pcur_commit_specify_mtr(pcur, &mtr); - have_latch = false; } mtr.start(); @@ -376,9 +368,7 @@ row_undo_mod_clust( if (index->table->is_temporary()) { mtr.set_log_mode(MTR_LOG_NO_REDO); } else { - have_latch = true; - purge_sys.latch.rd_lock(SRW_LOCK_CALL); - if (!row_undo_mod_must_purge(node, &mtr)) { + if (!row_undo_mod_must_purge(*node)) { goto mtr_commit_exit; } index->set_modified(mtr); @@ -400,17 +390,12 @@ row_undo_mod_clust( mtr.start(); if (pcur->restore_position(BTR_MODIFY_LEAF, &mtr) - != btr_pcur_t::SAME_ALL) { - goto mtr_commit_exit; - } - rec_t* rec = btr_pcur_get_rec(pcur); - have_latch = true; - purge_sys.latch.rd_lock(SRW_LOCK_CALL); - if (!purge_sys.changes_visible(node->new_trx_id, - node->table->name)) { + != btr_pcur_t::SAME_ALL + || !purge_sys.is_purgeable(node->new_trx_id)) { goto mtr_commit_exit; } + rec_t* rec = btr_pcur_get_rec(pcur); ulint trx_id_offset = index->trx_id_offset; ulint trx_id_pos = index->n_uniq ? 
index->n_uniq : 1; /* Reserve enough offsets for the PRIMARY KEY and @@ -477,10 +462,6 @@ row_undo_mod_clust( } mtr_commit_exit: - if (have_latch) { - purge_sys.latch.rd_unlock(); - } - btr_pcur_commit_specify_mtr(pcur, &mtr); func_exit: diff --git a/storage/innobase/row/row0upd.cc b/storage/innobase/row/row0upd.cc index 61ac22ca27a7b..26c434ca4741c 100644 --- a/storage/innobase/row/row0upd.cc +++ b/storage/innobase/row/row0upd.cc @@ -469,46 +469,6 @@ row_upd_changes_field_size_or_external( return(FALSE); } -/***********************************************************//** -Returns true if row update contains disowned external fields. -@return true if the update contains disowned external fields. */ -bool -row_upd_changes_disowned_external( -/*==============================*/ - const upd_t* update) /*!< in: update vector */ -{ - const upd_field_t* upd_field; - const dfield_t* new_val; - ulint new_len; - ulint n_fields; - ulint i; - - n_fields = upd_get_n_fields(update); - - for (i = 0; i < n_fields; i++) { - const byte* field_ref; - - upd_field = upd_get_nth_field(update, i); - new_val = &(upd_field->new_val); - new_len = dfield_get_len(new_val); - - if (!dfield_is_ext(new_val)) { - continue; - } - - ut_ad(new_len >= BTR_EXTERN_FIELD_REF_SIZE); - - field_ref = static_cast<const byte*>(dfield_get_data(new_val)) - + new_len - BTR_EXTERN_FIELD_REF_SIZE; - - if (field_ref[BTR_EXTERN_LEN] & BTR_EXTERN_OWNER_FLAG) { - return(true); - } - } - - return(false); -} - /***************************************************************//** Builds an update vector from those fields which in a secondary index entry differ from a record that has the equal ordering fields. 
NOTE: we compare diff --git a/storage/innobase/row/row0vers.cc b/storage/innobase/row/row0vers.cc index 04cf9640f0501..372d30149f01c 100644 --- a/storage/innobase/row/row0vers.cc +++ b/storage/innobase/row/row0vers.cc @@ -1,7 +1,7 @@ /***************************************************************************** Copyright (c) 1997, 2017, Oracle and/or its affiliates. All Rights Reserved. -Copyright (c) 2017, 2021, MariaDB Corporation. +Copyright (c) 2017, 2022, MariaDB Corporation. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software @@ -104,6 +104,9 @@ row_vers_impl_x_locked_low( DBUG_ENTER("row_vers_impl_x_locked_low"); ut_ad(rec_offs_validate(rec, index, offsets)); + ut_ad(mtr->memo_contains_page_flagged(clust_rec, + MTR_MEMO_PAGE_S_FIX + | MTR_MEMO_PAGE_X_FIX)); if (ulint trx_id_offset = clust_index->trx_id_offset) { trx_id = mach_read_from_6(clust_rec + trx_id_offset); @@ -190,7 +193,7 @@ row_vers_impl_x_locked_low( heap = mem_heap_create(1024); trx_undo_prev_version_build( - clust_rec, mtr, version, clust_index, clust_offsets, + version, clust_index, clust_offsets, heap, &prev_version, NULL, dict_index_has_virtual(index) ? 
&vrow : NULL, 0);
@@ -527,6 +530,10 @@ row_vers_build_cur_vrow_low(
 					= DATA_MISSING;
 		}
 
+		ut_ad(mtr->memo_contains_page_flagged(rec,
+						      MTR_MEMO_PAGE_S_FIX
+						      | MTR_MEMO_PAGE_X_FIX));
+
 		version = rec;
 
 		/* If this is called by purge thread, set TRX_UNDO_PREV_IN_PURGE
@@ -543,7 +550,7 @@ row_vers_build_cur_vrow_low(
 			version, clust_index, clust_offsets);
 
 		trx_undo_prev_version_build(
-			rec, mtr, version, clust_index, clust_offsets,
+			version, clust_index, clust_offsets,
 			heap, &prev_version, NULL, vrow, status);
 
 		if (heap2) {
@@ -643,6 +650,10 @@ row_vers_vc_matches_cluster(
 	/* First compare non-virtual columns (primary keys) */
 	ut_ad(index->n_fields == n_fields);
 	ut_ad(n_fields == dtuple_get_n_fields(icentry));
+	ut_ad(mtr->memo_contains_page_flagged(rec,
+					      MTR_MEMO_PAGE_S_FIX
+					      | MTR_MEMO_PAGE_X_FIX));
+
 	{
 		const dfield_t* a = ientry->fields;
 		const dfield_t* b = icentry->fields;
@@ -684,7 +695,7 @@ row_vers_vc_matches_cluster(
 		ut_ad(roll_ptr != 0);
 
 		trx_undo_prev_version_build(
-			rec, mtr, version, clust_index, clust_offsets,
+			version, clust_index, clust_offsets,
 			heap, &prev_version, NULL, vrow,
 			TRX_UNDO_PREV_IN_PURGE | TRX_UNDO_GET_OLD_V_VALUE);
 
@@ -834,7 +845,7 @@ row_vers_build_cur_vrow(
 }
 
 /** Finds out if a version of the record, where the version >= the current
-purge view, should have ientry as its secondary index entry. We check
+purge_sys.view, should have ientry as its secondary index entry. We check
 if there is any not delete marked version of the record where the trx
 id >= purge view, and the secondary index entry == ientry; exactly in
 this case we return TRUE.
@@ -1016,11 +1027,12 @@ row_vers_old_has_index_entry(
 		heap = mem_heap_create(1024);
 		vrow = NULL;
 
-		trx_undo_prev_version_build(rec, mtr, version,
+		trx_undo_prev_version_build(version,
 					    clust_index, clust_offsets,
-					    heap, &prev_version, NULL,
+					    heap, &prev_version, nullptr,
 					    dict_index_has_virtual(index)
-					    ? &vrow : NULL, 0);
+					    ? &vrow : nullptr,
+					    TRX_UNDO_CHECK_PURGEABILITY);
 		mem_heap_free(heap2); /* free version and clust_offsets */
 
 		if (!prev_version) {
@@ -1099,7 +1111,9 @@ row_vers_old_has_index_entry(
 Constructs the version of a clustered index record which a consistent
 read should see. We assume that the trx id stored in rec is such that
 the consistent read should not see rec in its present version.
-@return DB_SUCCESS or DB_MISSING_HISTORY */
+@return error code
+@retval DB_SUCCESS if a previous version was fetched
+@retval DB_MISSING_HISTORY if the history is missing (a sign of corruption) */
 dberr_t
 row_vers_build_for_consistent_read(
 /*===============================*/
@@ -1139,7 +1153,7 @@ row_vers_build_for_consistent_read(
 
 	trx_id = row_get_rec_trx_id(rec, index, *offsets);
 
-	ut_ad(!view->changes_visible(trx_id, index->table->name));
+	ut_ad(!view->changes_visible(trx_id));
 
 	ut_ad(!vrow || !(*vrow));
 
@@ -1157,12 +1171,10 @@ row_vers_build_for_consistent_read(
 		/* If purge can't see the record then we can't rely on
 		the UNDO log record. */
 
-		bool purge_sees = trx_undo_prev_version_build(
-			rec, mtr, version, index, *offsets, heap,
+		err = trx_undo_prev_version_build(
+			version, index, *offsets, heap,
 			&prev_version, NULL, vrow, 0);
 
-		err = (purge_sees) ? DB_SUCCESS : DB_MISSING_HISTORY;
-
 		if (prev_heap != NULL) {
 			mem_heap_free(prev_heap);
 		}
@@ -1184,7 +1196,7 @@ row_vers_build_for_consistent_read(
 
 		trx_id = row_get_rec_trx_id(prev_version, index, *offsets);
 
-		if (view->changes_visible(trx_id, index->table->name)) {
+		if (view->changes_visible(trx_id)) {
 
 			/* The view already sees this version: we can copy
 			it to in_heap and return */
@@ -1201,8 +1213,11 @@ row_vers_build_for_consistent_read(
 				dtuple_dup_v_fld(*vrow, in_heap);
 			}
 			break;
+		} else if (trx_id >= view->low_limit_id()
+			   && trx_id >= trx_sys.get_max_trx_id()) {
+			err = DB_CORRUPTION;
+			break;
 		}
-
 		version = prev_version;
 	}
 
@@ -1319,10 +1334,9 @@ row_vers_build_for_semi_consistent_read(
 		heap2 = heap;
 		heap = mem_heap_create(1024);
 
-		if (!trx_undo_prev_version_build(rec, mtr, version, index,
-						 *offsets, heap,
-						 &prev_version,
-						 in_heap, vrow, 0)) {
+		if (trx_undo_prev_version_build(version, index, *offsets, heap,
+						&prev_version, in_heap, vrow,
+						0) != DB_SUCCESS) {
 			mem_heap_free(heap);
 			heap = heap2;
 			heap2 = NULL;
diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
index b5d00d9624f10..625d3223bdc68 100644
--- a/storage/innobase/trx/trx0purge.cc
+++ b/storage/innobase/trx/trx0purge.cc
@@ -42,10 +42,6 @@ Created 3/26/1996 Heikki Tuuri
 
 #include
 
-#ifdef UNIV_PFS_RWLOCK
-extern mysql_pfs_key_t trx_purge_latch_key;
-#endif /* UNIV_PFS_RWLOCK */
-
 /** Maximum allowable purge history length. <=0 means 'infinite'. */
 ulong srv_max_purge_lag = 0;
 
@@ -184,6 +180,7 @@ void purge_sys_t::create()
   hdr_page_no= 0;
   hdr_offset= 0;
   latch.SRW_LOCK_INIT(trx_purge_latch_key);
+  end_latch.init();
   mysql_mutex_init(purge_sys_pq_mutex_key, &pq_mutex, nullptr);
   truncate.current= NULL;
   truncate.last= NULL;
@@ -205,11 +202,40 @@ void purge_sys_t::close()
   trx->state= TRX_STATE_NOT_STARTED;
   trx->free();
   latch.destroy();
+  end_latch.destroy();
   mysql_mutex_destroy(&pq_mutex);
   mem_heap_free(heap);
   heap= nullptr;
 }
 
+/** Determine if the history of a transaction is purgeable.
+@param trx_id transaction identifier
+@return whether the history is purgeable */
+TRANSACTIONAL_TARGET bool purge_sys_t::is_purgeable(trx_id_t trx_id) const
+{
+  bool purgeable;
+#if !defined SUX_LOCK_GENERIC && !defined NO_ELISION
+  purgeable= false;
+  if (xbegin())
+  {
+    if (!latch.is_write_locked())
+    {
+      purgeable= view.changes_visible(trx_id);
+      xend();
+    }
+    else
+      xabort();
+  }
+  else
+#endif
+  {
+    latch.rd_lock(SRW_LOCK_CALL);
+    purgeable= view.changes_visible(trx_id);
+    latch.rd_unlock();
+  }
+  return purgeable;
+}
+
 /*================ UNDO LOG HISTORY LIST =============================*/
 
 /** Prepend the history list with an undo log.
@@ -1199,7 +1225,6 @@ trx_purge_attach_undo_recs(ulint n_purge_threads)
 
 	i = 0;
 
-	const ulint batch_size = srv_purge_batch_size;
 	std::unordered_map table_id_map;
 	mem_heap_empty(purge_sys.heap);
 
@@ -1251,7 +1276,7 @@ trx_purge_attach_undo_recs(ulint n_purge_threads)
 
 		node->undo_recs.push(purge_rec);
 
-		if (n_pages_handled >= batch_size) {
+		if (n_pages_handled >= srv_purge_batch_size) {
 			break;
 		}
 	}
@@ -1303,14 +1328,14 @@ extern tpool::waitable_task purge_worker_task;
 /** Wait for pending purge jobs to complete. */
 static void trx_purge_wait_for_workers_to_complete()
 {
-  bool notify_wait = purge_worker_task.is_running();
+  const bool notify_wait{purge_worker_task.is_running()};
 
   if (notify_wait)
-    tpool::tpool_wait_begin();
+    tpool::tpool_wait_begin();
 
   purge_worker_task.wait();
 
-  if(notify_wait)
+  if (notify_wait)
     tpool::tpool_wait_end();
 
   /* There should be no outstanding tasks as long
@@ -1318,12 +1343,33 @@ static void trx_purge_wait_for_workers_to_complete()
   ut_ad(srv_get_task_queue_length() == 0);
 }
 
+/** Update end_view at the end of a purge batch. */
+TRANSACTIONAL_INLINE void purge_sys_t::clone_end_view()
+{
+  /* This is only invoked only by the purge coordinator,
+  which is the only thread that can modify our inputs head, tail, view.
+  Therefore, we only need to protect end_view from concurrent reads. */
+
+  /* Limit the end_view similar to what trx_purge_truncate_history() does. */
+  const trx_id_t trx_no= head.trx_no ? head.trx_no : tail.trx_no;
+#ifdef SUX_LOCK_GENERIC
+  end_latch.wr_lock();
+#else
+  transactional_lock_guard g(end_latch);
+#endif
+  end_view= view;
+  end_view.clamp_low_limit_id(trx_no);
+#ifdef SUX_LOCK_GENERIC
+  end_latch.wr_unlock();
+#endif
+}
+
 /**
 Run a purge batch.
 @param n_tasks number of purge tasks to submit to the queue
 @param truncate whether to truncate the history at the end of the batch
 @return number of undo log pages handled in the batch */
-ulint trx_purge(ulint n_tasks, bool truncate)
+TRANSACTIONAL_TARGET ulint trx_purge(ulint n_tasks, bool truncate)
 {
 	que_thr_t* thr = NULL;
 	ulint n_pages_handled;
@@ -1357,6 +1403,8 @@ ulint trx_purge(ulint n_tasks, bool truncate)
 
 	trx_purge_wait_for_workers_to_complete();
 
+	purge_sys.clone_end_view();
+
 	if (truncate) {
 		trx_purge_truncate_history();
 	}
diff --git a/storage/innobase/trx/trx0rec.cc b/storage/innobase/trx/trx0rec.cc
index 766b7da154349..dc24f083d05d1 100644
--- a/storage/innobase/trx/trx0rec.cc
+++ b/storage/innobase/trx/trx0rec.cc
@@ -2069,51 +2069,49 @@ trx_undo_get_undo_rec_low(
 	return undo_rec;
 }
 
-/** Copy an undo record to heap.
-@param[in] roll_ptr roll pointer to record
-@param[in,out] heap memory heap where copied
-@param[in] trx_id id of the trx that generated
-			the roll pointer: it points to an
-			undo log of this transaction
-@param[in] name table name
-@param[out] undo_rec own: copy of the record
-@retval true if the undo log has been
-truncated and we cannot fetch the old version
-@retval false if the undo log record is available
-NOTE: the caller must have latches on the clustered index page. */
-static MY_ATTRIBUTE((warn_unused_result))
-bool
-trx_undo_get_undo_rec(
-	roll_ptr_t roll_ptr,
-	mem_heap_t* heap,
-	trx_id_t trx_id,
-	const table_name_t& name,
-	trx_undo_rec_t** undo_rec)
+/** Copy an undo record to heap, to check if a secondary index record
+can be safely purged.
+@param trx_id DB_TRX_ID corresponding to roll_ptr
+@param name table name
+@param roll_ptr DB_ROLL_PTR pointing to the undo log record
+@param heap memory heap for allocation
+@return copy of the record
+@retval nullptr if the version is visible to purge_sys.view */
+static trx_undo_rec_t *trx_undo_get_rec_if_purgeable(trx_id_t trx_id,
+                                                     const table_name_t &name,
+                                                     roll_ptr_t roll_ptr,
+                                                     mem_heap_t* heap)
 {
-	purge_sys.latch.rd_lock(SRW_LOCK_CALL);
-
-	bool missing_history = purge_sys.changes_visible(trx_id, name);
-	if (!missing_history) {
-		*undo_rec = trx_undo_get_undo_rec_low(roll_ptr, heap);
-		missing_history = !*undo_rec;
-	}
-
-	purge_sys.latch.rd_unlock();
-
-	return missing_history;
+  {
+    purge_sys_t::view_guard check;
+    if (!check.view().changes_visible(trx_id))
+      return trx_undo_get_undo_rec_low(roll_ptr, heap);
+  }
+  return nullptr;
 }
 
-#ifdef UNIV_DEBUG
-#define ATTRIB_USED_ONLY_IN_DEBUG
-#else /* UNIV_DEBUG */
-#define ATTRIB_USED_ONLY_IN_DEBUG MY_ATTRIBUTE((unused))
-#endif /* UNIV_DEBUG */
+/** Copy an undo record to heap.
+@param trx_id DB_TRX_ID corresponding to roll_ptr
+@param name table name
+@param roll_ptr DB_ROLL_PTR pointing to the undo log record
+@param heap memory heap for allocation
+@return copy of the record
+@retval nullptr if the undo log is not available */
+static trx_undo_rec_t *trx_undo_get_undo_rec(trx_id_t trx_id,
+                                             const table_name_t &name,
+                                             roll_ptr_t roll_ptr,
+                                             mem_heap_t *heap)
+{
+  {
+    purge_sys_t::end_view_guard check;
+    if (!check.view().changes_visible(trx_id))
+      return trx_undo_get_undo_rec_low(roll_ptr, heap);
+  }
+  return nullptr;
+}
 
 /** Build a previous version of a clustered index record. The caller
 must hold a latch on the index page of the clustered index record.
-@param index_rec clustered index record in the index tree
-@param index_mtr mtr which contains the latch to index_rec page
-			and purge_view
 @param rec version of a clustered index record
 @param index clustered index
 @param offsets rec_get_offsets(rec, index)
@@ -2134,14 +2132,13 @@ must hold a latch on the index page of the clustered index record.
 			And if we read "after image" of undo log
 @param undo_block undo log block which was cached during
 			online dml apply or nullptr
-@retval true if previous version was built, or if it was an insert
-or the table has been rebuilt
-@retval false if the previous version is earlier than purge_view,
-or being purged, which means that it may have been removed */
-bool
+@return error code
+@retval DB_SUCCESS if previous version was successfully built,
+or if it was an insert or the undo record refers to the table before rebuild
+@retval DB_MISSING_HISTORY if the history is missing */
+TRANSACTIONAL_TARGET
+dberr_t
 trx_undo_prev_version_build(
-	const rec_t *index_rec ATTRIB_USED_ONLY_IN_DEBUG,
-	mtr_t *index_mtr ATTRIB_USED_ONLY_IN_DEBUG,
 	const rec_t *rec,
 	dict_index_t *index,
 	rec_offs *offsets,
@@ -2151,7 +2148,6 @@ trx_undo_prev_version_build(
 	dtuple_t **vrow,
 	ulint v_status)
 {
-	trx_undo_rec_t* undo_rec = NULL;
 	dtuple_t* entry;
 	trx_id_t rec_trx_id;
 	ulint type;
@@ -2166,11 +2162,7 @@ trx_undo_prev_version_build(
 	byte* buf;
 
 	ut_ad(!index->table->is_temporary());
-	ut_ad(index_mtr->memo_contains_page_flagged(index_rec,
-						    MTR_MEMO_PAGE_S_FIX
-						    | MTR_MEMO_PAGE_X_FIX));
 	ut_ad(rec_offs_validate(rec, index, offsets));
-	ut_a(index->is_primary());
 
 	roll_ptr = row_get_rec_roll_ptr(rec, index, offsets);
 
@@ -2178,27 +2170,20 @@ trx_undo_prev_version_build(
 
 	if (trx_undo_roll_ptr_is_insert(roll_ptr)) {
 		/* The record rec is the first inserted version */
-		return(true);
+		return DB_SUCCESS;
 	}
 
 	rec_trx_id = row_get_rec_trx_id(rec, index, offsets);
 
 	ut_ad(!index->table->skip_alter_undo);
 
-	if (trx_undo_get_undo_rec(
-		    roll_ptr, heap, rec_trx_id, index->table->name,
-		    &undo_rec)) {
-		if (v_status & TRX_UNDO_PREV_IN_PURGE) {
-			/* We are fetching the record being purged */
-			undo_rec = trx_undo_get_undo_rec_low(roll_ptr, heap);
-			if (!undo_rec) {
-				return false;
-			}
-		} else {
-			/* The undo record may already have been purged,
-			during purge or semi-consistent read. */
-			return(false);
-		}
+	trx_undo_rec_t* undo_rec = v_status == TRX_UNDO_CHECK_PURGEABILITY
+		? trx_undo_get_rec_if_purgeable(rec_trx_id, index->table->name,
+						roll_ptr, heap)
+		: trx_undo_get_undo_rec(rec_trx_id, index->table->name,
+					roll_ptr, heap);
+	if (!undo_rec) {
+		return DB_MISSING_HISTORY;
 	}
 
 	const byte *ptr =
@@ -2209,7 +2194,7 @@ trx_undo_prev_version_build(
 		/* The table should have been rebuilt, but purge has
 		not yet removed the undo log records for the
 		now-dropped old table (table_id). */
-		return(true);
+		return DB_SUCCESS;
 	}
 
 	ptr = trx_undo_update_rec_get_sys_cols(ptr, &trx_id, &roll_ptr,
@@ -2257,24 +2242,9 @@ trx_undo_prev_version_build(
 	delete-marked record by trx_id, no transactions need to
 	access the BLOB. */
 
-	/* the row_upd_changes_disowned_external(update) call could be
-	omitted, but the synchronization on purge_sys.latch is likely
-	more expensive. */
-
-	if ((update->info_bits & REC_INFO_DELETED_FLAG)
-	    && row_upd_changes_disowned_external(update)) {
-		purge_sys.latch.rd_lock(SRW_LOCK_CALL);
-
-		bool missing_extern = purge_sys.changes_visible(
-			trx_id, index->table->name);
-
-		purge_sys.latch.rd_unlock();
-
-		if (missing_extern) {
-			/* treat as a fresh insert, not to
-			cause assertion error at the caller. */
-			return(true);
-		}
+	if (update->info_bits & REC_INFO_DELETED_FLAG
+	    && purge_sys.is_purgeable(trx_id)) {
+		return DB_SUCCESS;
 	}
 
 	/* We have to set the appropriate extern storage bits in the
@@ -2289,8 +2259,8 @@ trx_undo_prev_version_build(
 	following call is safe. */
 	if (!row_upd_index_replace_new_col_vals(entry, *index, update,
 						heap)) {
-		ut_a(v_status & TRX_UNDO_PREV_IN_PURGE);
-		return false;
+		return (v_status & TRX_UNDO_PREV_IN_PURGE)
+			? DB_MISSING_HISTORY : DB_CORRUPTION;
 	}
 
 	/* Get number of externally stored columns in updated record */
@@ -2387,7 +2357,7 @@ trx_undo_prev_version_build(
 			v_status & TRX_UNDO_PREV_IN_PURGE);
 	}
 
-	return(true);
+	return DB_SUCCESS;
 }
 
 /** Read virtual column value from undo log
diff --git a/storage/innobase/trx/trx0sys.cc b/storage/innobase/trx/trx0sys.cc
index 2479e5a4cc127..d344f3a0c8338 100644
--- a/storage/innobase/trx/trx0sys.cc
+++ b/storage/innobase/trx/trx0sys.cc
@@ -44,40 +44,6 @@ Created 3/26/1996 Heikki Tuuri
 /** The transaction system */
 trx_sys_t trx_sys;
 
-/** Check whether transaction id is valid.
-@param[in] id transaction id to check
-@param[in] name table name */
-void
-ReadViewBase::check_trx_id_sanity(
-	trx_id_t id,
-	const table_name_t& name)
-{
-	if (id >= trx_sys.get_max_trx_id()) {
-
-		ib::warn() << "A transaction id"
-			   << " in a record of table "
-			   << name
-			   << " is newer than the"
-			   << " system-wide maximum.";
-		ut_ad(0);
-		THD *thd = current_thd;
-		if (thd != NULL) {
-			char table_name[MAX_FULL_NAME_LEN + 1];
-
-			innobase_format_name(
-				table_name, sizeof(table_name),
-				name.m_name);
-
-			push_warning_printf(thd, Sql_condition::WARN_LEVEL_WARN,
-					    ER_SIGNAL_WARN,
-					    "InnoDB: Transaction id"
-					    " in a record of table"
-					    " %s is newer than system-wide"
-					    " maximum.", table_name);
-		}
-	}
-}
-
 #ifdef UNIV_DEBUG
 /* Flag to control TRX_RSEG_N_SLOTS behavior debugging. */
 uint trx_rseg_n_slots_debug = 0;
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 111f8fe5f3aa5..f9fe2c19fe154 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -785,7 +785,7 @@ dberr_t trx_lists_init_at_db_start()
 		ib::info() << "Trx id counter is " << trx_sys.get_max_trx_id();
 	}
 
-	purge_sys.clone_oldest_view();
+	purge_sys.clone_oldest_view();
 	return DB_SUCCESS;
 }
 
@@ -2168,6 +2168,7 @@ trx_set_rw_mode(
 	ut_ad(trx->rsegs.m_redo.rseg != 0);
 
 	trx_sys.register_rw(trx);
+	ut_ad(trx->id);
 
 	/* So that we can see our own changes. */
 	if (trx->read_view.is_open()) {