MDEV-37365: Crash on concurrent ALTER TABLE parent + INSERT on FK child#5085
MDEV-37365: Crash on concurrent ALTER TABLE parent + INSERT on FK child#5085arcivanov wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses MDEV-37365 by implementing a prelocking mechanism that acquires MDL_SHARED_READ on foreign key parent tables during DML operations on child tables, preventing crashes during concurrent DDL. Key changes include the addition of prepare_fk_referenced_prelocking_list in sql_base.cc and the references_foreign_key method in the handler interface. Review feedback points out a high-severity issue where a raw return is used instead of DBUG_RETURN, potentially corrupting the debug stack, and a compilation error caused by passing LEX_CSTRING objects by value instead of by address to table_already_fk_prelocked.
5e329f4 to
a6d53bd
Compare
|
@gemini-code-assist rereview |
There was a problem hiding this comment.
Code Review
This pull request addresses MDEV-37365 by implementing metadata lock (MDL) prelocking for foreign key parent tables during DML operations on child tables, preventing crashes during concurrent DDL. It also introduces early foreign key checks for DROP and TRUNCATE operations using upgradable shared locks. Review feedback identifies several critical issues in the prelocking implementation, including incorrect iterator initialization, type mismatches between LEX_CSTRING and character pointers, and the need for proper object construction. Improvements were also suggested for error reporting consistency and optimizing redundant schema lock acquisitions.
4c76984 to
f030374
Compare
gkodinov
left a comment
There was a problem hiding this comment.
Thank you for your contribution! This is a preliminary review.
The diff answers the formal criteria. I'm approving it for that. Please stay tuned for the final review.
| if (prepare_fk_prelocking_list(thd, prelocking_ctx, table_list, | ||
| need_prelocking, | ||
| table_list->trg_event_map)) | ||
| return TRUE; | ||
|
|
||
| if (prepare_fk_referenced_prelocking_list(thd, prelocking_ctx, table_list, | ||
| need_prelocking)) | ||
| return TRUE; |
There was a problem hiding this comment.
As far as I understand, both @vuvova and @svoj have been against acquiring more locks during DML operations. I can imagine that this could introduce a significant performance regression.
I believe that the DDL/DML races can be fixed by extending the locking during DDL statements. Did you try implementing the following: Any DDL operation that is dropping or renaming a table, or dropping or adding foreign key constraints needs to exclusively lock all child and parent table names, in addition to locking the current table name.
There was a problem hiding this comment.
I can imagine that this could introduce a significant performance regression.
So the whole point of this approach is that lock MDL_SHARED_READ is basically never contended on except by the MDL_EXCLUSIVE.
Every SELECT anywhere takes an MDL_SHARED_READ on every table involved.
Every INSERT used to take MDL_SHARED_WRITE on a table being inserted into but will after this patch also take an MDL_SHARED_READ on every table referenced by an FK (which will be uncontended by virtually everybody else).
I haven't looked at MariaDB's internal lock implementation but acquiring a shared uncontended lock should be virtually performance-neutral (since the lock is not distributed), especially in comparison to the other parts of the queries.
Exclusively locking all children and a parent tables for DDL, on the other hand, stops everything that is happening on those child tables. The actual time to lock is going to take a very long time on heavily loaded database with a table with a large number of FKs. Depending on the locking fairness (I don't know the locking implementation for MDL) it may end up that MDL_EXCLUSIVE for each table will get pushed down behind literal thousands of MDL_SHARED_READ/WRITE. Those locks could be held for SECONDS per query (large SELECT queries) which means that if you're locking, for example, 1 parent and 5 children by the time you are locking the 5th child down the first child to be locked has been sitting in MDL_EXCLUSIVE for seconds to tens of seconds or worse (very much depends on how many queries per second are there) with ALL operations halted on those tables including SELECTS (defeating MVCC!).
Unless I'm extremely confused about the nature of MDL_EXCLUSIVE the idea of locking parent and children exclusively to me sounds like the worst possible approach behind only maybe acquiring an exclusive lock on the whole database (if it were a thing).
There was a problem hiding this comment.
I’ll need to restore the context of this issue before I can give a meaningful answer. I don’t recall expressing an opinion regarding FK locking. I’d be less concerned about performance and more concerned about introducing additional possible deadlocks.
There was a problem hiding this comment.
MDL_SHARED_READ is relatively cheap, but nevertheless it is a contention point. I don't mean contention with MDL_EXCLUSIVE, but rather contention on a lower level locking primitives.
However we have 18985d8, which solves similar SHARED vs SHARED contention bottleneck, but only in the BACKUP namespace. It can be extended to the TABLE namespace too. It will make MDL_SHARED_READ absolutely cheap.
I'm more worried about reading list of FK references, which takes global X-latch? Even global S-latch issued by references_foreign_key() is harmful.
If we lock child during ALTER TABLE parent instead, it may involve one correctness issue. There's LOCK TABLES parent WRITE and REPAIR TABLE parent, both take MDL_SHARED_NO_READ_WRITE. Solution proposed Arcadiy will be conflicting with these, solution proposed by Marko won't. These are probably irrelevant with InnoDB, but we have other engines that may be willing to implement FK and conflicting may make sense.
Many popular RDBMSes appear to handle FK-related locking on the DML side. Moving this logic to DDL would significantly broaden the scope of conflicting locks beyond what is strictly necessary, whereas the DML-side approach only locks the objects that are actually involved.
I’d go with the DML-side approach, assuming we can make both FK list lookups and MDL_SHARED_READ acquisition inexpensive. We should follow up with Marko on the former; the latter is already known to be fixable.
|
Sorry for the delay on our end, @arcivanov. @vuvova confirmed that he is aware of the question here. But he's currently a bit busy with other work. He also wanted me to set expectations a bit by providing some general timelines (based on https://mariadb.com/docs/release-notes/community-server/about/release-model):
Note that these dates are not set in stone and might fluctuate a bit, given the priorities and load of the individual developers. I've confirmed with Serg that we're in one such fluctuation currently. Long story short: you should hopefully have your reply in a couple of days or so. I will monitor the situation and keep everybody updated in the meanwhile. |
svoj
left a comment
There was a problem hiding this comment.
I believe the approach introduced in the first patch is worth pursuing and should be developed further. My main concerns are the number of dict_sys locks it requires and taking additional locks in cases where they could potentially be avoided.
The second patch does not seem worthwhile to me, unless @arcivanov has a strong case for keeping it. It adds complexity for relatively little benefit.
At the same time, I believe ALTER TABLE parent ADD INDEX(c) is logically unrelated to foreign keys, and in principle InnoDB could probably handle such operations without introducing additional MDL locking.
| const bool empty= m_prebuilt->table->foreign_set.empty(); | ||
| dict_sys.unfreeze(); | ||
| return !empty; | ||
| } |
There was a problem hiding this comment.
Can't we cache this flag somewhere (m_prebuilt?) so that we don't have to touch dict_sys latch at all? OTOH isn't foreign_set constant for given table, do we need to lock the latch at all?
| @@ -5106,6 +5106,85 @@ | |||
| DBUG_RETURN(FALSE); | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
Consecutive functions should be separated with 2 empty lines in between
| --source include/have_innodb.inc | ||
| --source include/have_metadata_lock_info.inc | ||
| --source include/have_debug_sync.inc | ||
| --source include/count_sessions.inc |
There was a problem hiding this comment.
count_sessions.inc is rudimentary, it doesn't solve what it was intended to solve.
| TABLE *table= table_list->table; | ||
|
|
||
| /* Avoid the heavier get_foreign_key_list() (which acquires dict_sys | ||
| exclusive latch) for tables that have no FK references to parents. */ |
There was a problem hiding this comment.
Multi-line comments are normally formatted as:
/*
Avoid the heavier get_foreign_key_list() (which acquires dict_sys
exclusive latch) for tables that have no FK references to parents.
*/
This comment refers to dict_sys exclusive latch, which is a non-generic InnoDB specific term. It'd be nice to make it more generic.
| When performing DML on a child table that has foreign keys referencing | ||
| other tables, InnoDB's FK constraint check acquires InnoDB-internal locks | ||
| on the parent table. Without corresponding MDL on the parent, concurrent | ||
| DDL on the parent can crash (MDEV-37365). |
There was a problem hiding this comment.
A bug description, even if relevant, does not make for a good function comment.
| --connection dml3 | ||
| --reap | ||
| --inc $i | ||
| } |
There was a problem hiding this comment.
We don't normally put stress tests into mtr. The other tests ensure that MDL_SHARED_READ lock is acquired, it is enough for verification. You may also issue concurrent ALTER TABLE while INSERT INTO child.parent_id transaction is active and demonstrate that ALTER waits.
| --connect(con_insert,localhost,root,,test) | ||
| --let $con_insert_id= `SELECT CONNECTION_ID()` | ||
| SET DEBUG_SYNC= 'after_lock_tables_takes_lock SIGNAL locked WAIT_FOR go'; | ||
| --send INSERT INTO child VALUES (1) |
There was a problem hiding this comment.
Is this debug sync complexity really needed? INSERT should start a transaction that will keep holding the parent lock even upon its completion. Isn't default connection enough to verify this?
| if (prepare_fk_prelocking_list(thd, prelocking_ctx, table_list, | ||
| need_prelocking, | ||
| table_list->trg_event_map)) | ||
| return TRUE; | ||
|
|
||
| if (prepare_fk_referenced_prelocking_list(thd, prelocking_ctx, table_list, | ||
| need_prelocking)) | ||
| return TRUE; |
There was a problem hiding this comment.
MDL_SHARED_READ is relatively cheap, but nevertheless it is a contention point. I don't mean contention with MDL_EXCLUSIVE, but rather contention on a lower level locking primitives.
However we have 18985d8, which solves similar SHARED vs SHARED contention bottleneck, but only in the BACKUP namespace. It can be extended to the TABLE namespace too. It will make MDL_SHARED_READ absolutely cheap.
I'm more worried about reading list of FK references, which takes global X-latch? Even global S-latch issued by references_foreign_key() is harmful.
If we lock child during ALTER TABLE parent instead, it may involve one correctness issue. There's LOCK TABLES parent WRITE and REPAIR TABLE parent, both take MDL_SHARED_NO_READ_WRITE. Solution proposed Arcadiy will be conflicting with these, solution proposed by Marko won't. These are probably irrelevant with InnoDB, but we have other engines that may be willing to implement FK and conflicting may make sense.
Many popular RDBMSes appear to handle FK-related locking on the DML side. Moving this logic to DDL would significantly broaden the scope of conflicting locks beyond what is strictly necessary, whereas the DML-side approach only locks the objects that are actually involved.
I’d go with the DML-side approach, assuming we can make both FK list lookups and MDL_SHARED_READ acquisition inexpensive. We should follow up with Marko on the former; the latter is already known to be fixable.
|
|
||
| arena= thd->activate_stmt_arena_if_needed(&backup); | ||
|
|
||
| table->file->get_foreign_key_list(thd, &fk_list); |
There was a problem hiding this comment.
Ehm. It means we touch global dict_sys latch up to 4 times per table:
referenced_by_foreign_key() - S-lock
get_parent_foreign_key_list() - S-lock
references_foreign_key() - S-lock
get_foreign_key_list() - X-lock (why not S-lock?)
Even for tables that don't have anything to do with FK we get 2 S-locks. That's a lot.
|
|
||
| --connection con_insert | ||
| SET DEBUG_SYNC= 'after_lock_tables_takes_lock SIGNAL locked WAIT_FOR go'; | ||
| --send UPDATE child SET a=2 WHERE a=1 |
There was a problem hiding this comment.
You're still holding locks from the previous transaction, right? If so, this test doesn't really test what it was supposed to test.
svoj
left a comment
There was a problem hiding this comment.
Thinking more on this...
thr1: lock and open child
thr1: get FK parent list
thr2: rename table parent to parent1
thr1: lock parent
Looks like the above is still permitted. If so, the bug isn't fully fixed.
There was a problem hiding this comment.
Pull request overview
This PR addresses MDEV-37365 by ensuring DML on an FK child also acquires MDL_SHARED_READ on its FK parent(s) via prelocking, preventing crashes when concurrent DDL on the parent invalidates InnoDB dictionary objects while FK locks are held.
Changes:
- Add child→parent FK prelocking (
prepare_fk_referenced_prelocking_list()), using a lightweight handler fast-path (handler::references_foreign_key()+ InnoDB override). - Add early FK checks under MDL_SHARED_UPGRADABLE for TRUNCATE and DROP to return FK-specific errors (while other DDL blocks at MDL and times out as documented).
- Update/add MTR coverage for new MDL behavior and adjust existing expected results accordingly.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| storage/innobase/handler/ha_innodb.h | Adds InnoDB override declaration for references_foreign_key(). |
| storage/innobase/handler/ha_innodb.cc | Implements references_foreign_key() and extends open() behavior for FK-check opens. |
| storage/innobase/dict/dict0dict.cc | Avoids crash when table->space is NULL while reporting corruption/provider issues. |
| sql/handler.h | Introduces handler::references_foreign_key() virtual fast-path API. |
| sql/sql_base.cc | Adds child→parent FK parent prelocking list preparation and wires it into DML prelocking strategy. |
| sql/sql_truncate.cc | Adds early FK parent check under SU then upgrades to X for TRUNCATE (when FK checks enabled). |
| sql/sql_table.cc | Adds early FK parent check under SU then upgrades to X for DROP TABLE and integrates BACKUP_DDL protocol. |
| include/my_base.h | Adds HA_OPEN_FOR_FK_CHECK open flag for metadata-only FK checks. |
| mysql-test/suite/perfschema/r/mdl_func.result | Updates expected MDL rows (now includes SU). |
| mysql-test/suite/innodb/t/foreign_key.test | Updates behavior expectations (timeouts vs FK-specific errors) and uses lock_wait_timeout. |
| mysql-test/suite/innodb/r/foreign_key.result | Updates expected output for adjusted FK/MDL behavior. |
| mysql-test/suite/innodb/r/monitor.result | Updates InnoDB monitor expected counters due to extra metadata opens. |
| mysql-test/main/innodb_fk_mdl.test | New regression/behavior test for FK-parent MDL acquisition and crash reproducer scenario. |
| mysql-test/main/innodb_fk_mdl.result | Expected output for new innodb_fk_mdl test. |
| mysql-test/main/partition_debug_sync.test | Adjusts debug sync expectations for new locking behavior. |
| mysql-test/main/partition_debug_sync.result | Updates expected result for adjusted debug sync flow. |
| mysql-test/main/backup_locks.result | Updates expected MDL rows (now includes SU). |
| mysql-test/main/backup_lock.result | Updates expected MDL rows (now includes SU). |
| mysql-test/main/lock_multi.result | Updates expected output (removal of prior timeout warning output). |
| mysql-test/main/query_cache_innodb.result | Updates expected count of logged “Invalid (old?) table or database name” errors. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| TABLE_LIST *tl= thd->alloc<TABLE_LIST>(1); | ||
| tl->init_one_table_for_prelocking(fk->referenced_db, fk->referenced_table, | ||
| NULL, TL_READ, TABLE_LIST::PRELOCK_FK, table_list->belong_to_view, | ||
| 0, &prelocking_ctx->query_tables_last, table_list->for_insert_data); |
When `ALTER TABLE` runs on a parent table with FK children and concurrent `INSERT` runs on a child table, the server crashes in `innobase_reload_table()` → `dict_sys.remove()` with assertion `table->n_rec_locks == 0`. The root cause is that `INSERT INTO child` performs its FK constraint check inside InnoDB, acquiring InnoDB-internal locks (LOCK_IS + record locks) on the parent table without any corresponding MDL on the parent. When ALTER's commit phase tears down and recreates the parent's `dict_table_t`, it hits those still-held locks. The fix closes the gap by extending the DML prelocking strategy: when a child table with foreign keys is opened for DML, the SQL layer now also prelocks the FK parent table(s) with `TL_READ` (→ `MDL_SHARED_READ`). This properly declares the FK dependency at the MDL layer, so DDL on the parent (which holds `MDL_EXCLUSIVE`) will wait for child DML transactions to complete before proceeding. Implementation: - New function `prepare_fk_referenced_prelocking_list()` in `sql/sql_base.cc`, symmetric to the existing `prepare_fk_prelocking_list()` (which handles the parent→children direction for cascading FK actions). Uses `get_foreign_key_list()` to find referenced parent tables and prelocks them with `TL_READ` + `PRELOCK_FK` (→ `OPEN_STUB`, so only MDL is acquired, no table open). - New `handler::references_foreign_key()` virtual (+ InnoDB override) as a lightweight early-exit check, symmetric to the existing `referenced_by_foreign_key()`. Uses `dict_sys.freeze()` (shared latch) to check `foreign_set.empty()`, avoiding the heavier `get_foreign_key_list()` (exclusive latch) for tables without FKs. - Called from `DML_prelocking_strategy::handle_table()` in both the `trg_event_map` and `slave_fk_event_map` branches. Behavioral change: DDL on a parent table (`ALTER`, `DROP`, `TRUNCATE`, `RENAME`) now blocks at the MDL layer while any child table has an open transaction that touched FK columns (even if the DML statement failed). Previously, DDL could proceed and return FK-specific errors (`ER_TRUNCATE_ILLEGAL_FK`, `ER_ROW_IS_REFERENCED_2`), but InnoDB-internal locks were still held by the child, leading to crashes on concurrent `ALTER TABLE`. With this fix, DDL gets `ER_LOCK_WAIT_TIMEOUT` instead, controlled by `lock_wait_timeout` (not `innodb_lock_wait_timeout`, since the conflict is at the MDL layer). Once the child transaction ends, DDL returns the same FK-specific errors as before. The regression is narrow: the DDL was never going to succeed anyway (FK constraints prevent it regardless of MDL), so only the error code changes, not the outcome. The old behavior was a crash waiting to happen. `innodb.foreign_key` test (MDEV-26554 section) updated accordingly. Galera/WSREP: no `wsrep_foreign_key_append()` needed in the new function — the child→parent FK check is read-only and doesn't require writeset certification keys for the parent table.
Review feedback by Sergey Vojtovich on PR MariaDB#5085: - Replace the `get_foreign_key_list()` call in `prepare_fk_referenced_prelocking_list()` with a new lightweight `handler::get_fk_referenced_table_names()` returning only the names of the referenced tables (new struct `FK_TABLE_NAME`). The InnoDB implementation holds the shared `dict_sys` latch instead of the exclusive one: unlike `get_foreign_key_info()`, it does not need to load referenced tables into the dictionary cache. The name conversion is quiet, so prelocking a child table with a pre-1.0 (`#mysql50#`) name no longer writes to the error log (reverts the `query_cache_innodb.result` change). - `ha_innobase::references_foreign_key()` no longer acquires the `dict_sys` latch: the set of foreign keys of a table open for DML cannot change concurrently, since that would require DDL on the table holding `MDL_EXCLUSIVE`, which conflicts with the metadata lock the DML statement holds; the table object is pinned by the open handle. With both changes, the child-to-parent direction of DML prelocking touches no global `dict_sys` latch at all for tables without foreign keys, and takes one shared (instead of exclusive) latch for tables with them. - Handle out-of-memory when allocating prelocking `TABLE_LIST` elements, in both `prepare_fk_prelocking_list()` and `prepare_fk_referenced_prelocking_list()`. - Replace the unreachable empty-list early return with a `DBUG_ASSERT` and a comment proving the invariant. - The function comment now describes what the function does instead of the bug that prompted it; comments follow the multi-line format and avoid engine-specific terms; two blank lines between functions. - Rework the `innodb_fk_mdl` test per review: * no debug_sync: a transaction holds its metadata locks until commit, so they are observable from the default connection after a plain `BEGIN; <DML>`; * the stress test is replaced with a deterministic test showing `ALTER TABLE parent` waiting for the metadata lock; * `evalp` instead of `eval` with `--replace_result`; * lock queries filter on the connection's THREAD_ID: background purge threads acquire `MDL_SHARED` on recently modified tables, making unfiltered `metadata_lock_info` output nondeterministic; * new test verifying the parent lock is acquired regardless of `@@foreign_key_checks`. The prelocking must not depend on it: the prelocking list of a prepared statement or stored routine statement is computed once and reused across executions, while `@@foreign_key_checks` may change between them.
A rename rewrites the foreign key metadata shared between both tables
of every foreign key relationship the renamed table participates in.
DML prelocking reads that metadata (`prepare_fk_prelocking_list()`,
`prepare_fk_referenced_prelocking_list()`) while holding a metadata
lock on the DML table only, so the following race was possible:
thr1: open child, acquiring MDL_SHARED_WRITE on child
thr1: read the FK parent list -> "parent"
thr2: RENAME TABLE parent TO parent1 (MDL_EXCLUSIVE on parent and
parent1 only) -- rewrites the FK to reference parent1
thr1: acquire MDL_SHARED_READ on the stale name "parent"
thr1's FK check then acquires engine-internal locks on parent1 without
any MDL on it, recreating the MDEV-37365 crash scenario with a
concurrent ALTER TABLE on parent1.
Fix: `RENAME TABLE` and `ALTER TABLE ... RENAME` now acquire
`MDL_EXCLUSIVE` on all tables related to the renamed table by foreign
keys (the children referencing it and the parents it references),
plus intention-exclusive schema locks, before performing the rename:
- `fk_append_related_table_mdl_requests()` collects the lock requests
from an open handler (children via `get_parent_foreign_key_list()`,
parents via `get_fk_referenced_table_names()`).
- `mysql_rename_tables()` calls the new `fk_lock_related_tables()`
after `lock_table_names()`: the exclusive locks on the renamed
names guarantee that the set of related tables cannot grow
concurrently (adding a foreign key requires a conflicting lock on
both tables involved), so the collected set is complete. Each
source table's share is opened briefly for the FK discovery;
sources that are temporary tables, views, missing, or in engines
without foreign key support are skipped.
- `mysql_alter_table()` appends the requests to the existing
target-name lock batch when the ALTER renames the table (the table
is already open there).
The DML side holds its metadata lock on the DML table from open until
transaction end, and the rename now requires `MDL_EXCLUSIVE` on that
same table, so the rename is serialized with the entire window between
the FK metadata read and the related-name lock acquisition.
This mirrors MySQL WL#6049 item 3: DDL statements which affect
FK-related metadata, like RENAME TABLE, acquire X locks on the tables
participating in the foreign key.
Summary
MDL_SHARED_READon parent table(s) via prelocking, preventing crashes when concurrent DDL on the parent tears downdict_table_twhile InnoDB-internal FK locks are heldprepare_fk_referenced_prelocking_list()insql/sql_base.cc— symmetric to existingprepare_fk_prelocking_list()(parent→children direction)handler::references_foreign_key()virtual + InnoDB override for lightweight early-exit checkslave_fk_event_mappath) also gets FK parent prelockingBehavioral change
DDL on parent now gets
ER_LOCK_WAIT_TIMEOUTinstead of FK-specific errors while a child has an open transaction. The DDL was never going to succeed anyway (FK constraints prevent it). Once the child transaction ends, behavior is identical to before.