Skip to content

MDEV-39344: trx_disconnect_prepared() uses wrong mutex#5094

Merged
dr-m merged 3 commits into
10.6from
10.6-MDEV-39344
May 28, 2026
Merged

MDEV-39344: trx_disconnect_prepared() uses wrong mutex#5094
dr-m merged 3 commits into
10.6from
10.6-MDEV-39344

Conversation

@dr-m
Copy link
Copy Markdown
Contributor

@dr-m dr-m commented May 19, 2026

trx_t::disconnect_prepared(): Replaces trx_disconnect_prepared(). Protect the data members with trx_t::mutex.

fetch_data_into_cache(): Fix a correctness regression that had been introduced in 5cbf4e3 when the INFORMATION_SCHEMA views innodb_trx, innodb_locks, innodb_lock_waits were introduced in the InnoDB Plugin for MySQL 5.1, which later became the built-in InnoDB in MySQL 5.5 and MariaDB Server 5.5.

@dr-m dr-m requested a review from svoj May 19, 2026 08:28
@dr-m dr-m self-assigned this May 19, 2026
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment thread storage/innobase/buf/buf0buf.cc Outdated
Comment on lines +1659 to +1625
bool found= false;
find_interesting_trx f
{found, withdraw_started, my_hrtime_coarse()};
withdraw_started = current_time;
const my_hrtime_t current_hrtime{my_hrtime_coarse()};
bool found{false};
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to this file buf0buf.cc are only applicable to the 10.6 branches that do not include #3107. We’re only replacing callback-based traversal with an inline range for loop here.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the InnoDB transaction system by replacing the thread_safe_trx_ilist_t wrapper with a standard ilist protected by the global lock_sys.latch and converting functor-based iterations into manual loops. Feedback points out a logic error in the InnoDB monitor where view_count might be incorrectly reported as zero, a bug in the Information Schema cache regarding truncation signaling, and potential data races when accessing transaction fields. Additionally, there is a concern that using the central lock_sys.latch for transaction registration could create a scalability bottleneck.

Comment thread storage/innobase/srv/srv0srv.cc Outdated
"--------------\n"
"ROW OPERATIONS\n"
"--------------\n"
"%zu read views open inside InnoDB\n", view_count);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The view_count variable may incorrectly report 0 here if the InnoDB lock monitor is disabled. It is initialized to 0 at line 777 and only updated within the conditional block at line 796. To ensure accurate reporting in the monitor output, view_count should be fetched regardless of whether the lock monitor is active, while ensuring that lock_sys.latch is held during the call to trx_sys.view_count().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intentional, but not mentioned in the commit message. If we need to protect trx_sys.trx_list with something else than lock_sys.latch then this problem would go away; we’d unconditionally read the view_count at this spot.

Comment thread storage/innobase/trx/trx0i_s.cc Outdated
Comment thread storage/innobase/buf/buf0buf.cc Outdated
Comment on lines +1631 to +1632
if (!trx.is_started() || !trx.mysql_thd
|| withdraw_started <= trx.start_time_micro) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Accessing trx.mysql_thd and trx.is_started() without holding trx.mutex is a data race. These fields are modified in trx_t::disconnect_prepared() under the protection of trx.mutex. While the impact in this context (buffer pool resizing diagnostics) is likely limited to incorrect warnings, it is technically undefined behavior. Consider acquiring the transaction mutex before checking these fields.

Comment thread storage/innobase/trx/trx0trx.cc Outdated
Comment on lines +323 to +325
lock_sys.wr_lock(SRW_LOCK_CALL);
trx_list.push_front(*trx);
lock_sys.wr_unlock();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Protecting the global transaction list with the central lock_sys.latch during register_trx and deregister_trx may introduce a scalability bottleneck. These functions are called at the start and end of every transaction. Moving from a dedicated list mutex to an exclusive lock on the heavily-contended lock_sys.latch could increase contention in high-concurrency environments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly this we need to find out by running some performance tests.

Comment on lines -787 to -788
void create() { mysql_mutex_init(trx_sys_mutex_key, &mutex, nullptr); }
void close() { mysql_mutex_destroy(&mutex); }
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot to remove the trx_sys_mutex_key. If it turns out that this change will reduce performance due to increased contention on lock_sys.latch, I’ll put back the mutex, preferrably as srw_mutex or srw_lock_low to reduce overhead.

Copy link
Copy Markdown
Contributor

@svoj svoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel I like the idea of moving trx_sys.trx_list under the lock_sys.latch protection. I had hard time trying to follow the updated code. I feel like it increases maintenance cost. And it doesn't directly relate to the reported problem, it was about trx members protection rather than trx_sys.trx_list.

I'm less concerned about extra pressure on the lock_sys.latch, but I agree that it should be tested and we need to suggest correct benchmarks for verification that'd cover worst case scenario. E.g. combination of connect + oltp_read_write?

Why would we want to have this change in 10.6, which is EOL in 1.5 months? Shouldn't such refactoring be targeted for the main?

Also if #5043 is going to be pushed, this list is becoming somewhat redundant. We can usr rw_trx_ids.trxs instead.

/* todo/fixme: suggest to do it at innodb prepare */
trx->will_lock= false;
trx_sys.rw_trx_hash.put_pins(trx);
will_lock= false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously will_lock was updated out of critical section. IIUC it is always accessed by the owner thread, except for 2 occurrences in lock0lock.cc.

  1. check_trx_state() can be called with either trx->mutex or trx->rw_trx_hash_element->mutex locked, there's one case when both are locked
  2. lock_print_info doesn't lock trx->mutex
  3. will_lock is updated in lots of places without holding trx->mutex

This looks like a very inconsistent locking pattern. I don't see much value in moving this specific will_lock update under trx->mutex. I suggest that we should reevaluate its locking pattern separately.

Comment thread storage/innobase/trx/trx0trx.cc Outdated
ut_ad(mysql_thd);
ut_ad(!mysql_log_file_name);
read_view.close();
mutex.wr_lock();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not mutex_lock()? Because of assert()?

Comment thread storage/innobase/trx/trx0sys.cc Outdated
size_t total_trx= 0, prepared_trx= 0;

trx_sys.trx_list.for_each([&](const trx_t &trx) {
for (const trx_t &trx : trx_list)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need a lock here? IIUC it can be executed in multithreaded environment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this function trx_sys.any_active_transactions() is being invoked from two places during shutdown to check if any transactions are still active. During that time, trx_sys.trx_list may be concurrently modified, and we’d have to protect the access.

@dr-m
Copy link
Copy Markdown
Contributor Author

dr-m commented May 27, 2026

I don't feel I like the idea of moving trx_sys.trx_list under the lock_sys.latch protection. I had hard time trying to follow the updated code. I feel like it increases maintenance cost. And it doesn't directly relate to the reported problem, it was about trx members protection rather than trx_sys.trx_list.

I'm less concerned about extra pressure on the lock_sys.latch, but I agree that it should be tested and we need to suggest correct benchmarks for verification that'd cover worst case scenario. E.g. combination of connect + oltp_read_write?

Why would we want to have this change in 10.6, which is EOL in 1.5 months? Shouldn't such refactoring be targeted for the main?

Also if #5043 is going to be pushed, this list is becoming somewhat redundant. We can usr rw_trx_ids.trxs instead.

Thank you. I will revise this to only fix the reported problem and not refactor the trx_sys at this point. A separate ticket may be filed for the trx_sys refactoring.

For 10.6, the simpler patch to fix the reported problem should be good enough. This more extensive patch would be applicable to any later major version branch; only the find_interesting_trx is something that was removed in MDEV-29445.

@dr-m dr-m force-pushed the 10.6-MDEV-39344 branch from ae744e4 to 6bc94e0 Compare May 27, 2026 08:11
@dr-m dr-m requested a review from svoj May 27, 2026 08:11
@dr-m dr-m force-pushed the 10.6-MDEV-39344 branch from 5395628 to cb6b5e2 Compare May 27, 2026 08:47
Copy link
Copy Markdown
Contributor

@svoj svoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is a step towards the right direction. It keeps issue described in c0817da fixed.

However c0817da unintentionally fixed similar issue in lock_print_info_all_transactions(). This fix reopens it. The monitor code looks quite fragile, so I leave it up to you to decide if you want to dig into this further.

/* these are protected by cache->rw_lock.wr_lock() */
trx_i_s_cache_clear(cache);
/* this flag may be set by fetch_data_into_cache_low() below */
cache->is_truncated= false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is correct. But locking pattern of is_truncated is broken. It is subject of data race as well as race condition.

thr1: trx_i_s_cache_start_write(trx_i_s_cache);
thr1: trx_i_s_possibly_fetch_data_into_cache(trx_i_s_cache); // all is good, not truncated
thr1: trx_i_s_cache_end_write(trx_i_s_cache);

thr1: if (trx_i_s_cache_is_truncated(trx_i_s_cache)) // data race

thr2: trx_i_s_cache_start_write(trx_i_s_cache);
thr2: trx_i_s_possibly_fetch_data_into_cache(trx_i_s_cache); // truncated
thr2: trx_i_s_cache_end_write(trx_i_s_cache);

thr1: starts reading truncated data with no warning issued

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. The probability of this TOCTOU race could be reduced by the timestamp logic.

I ended up folding all this logic to a single function call that returns whether the cache was truncated. Also, there is no need for pointer indirection or for passing a pointer to a singleton static object across compilation unit boundaries.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change. It solves data race, but it doesn't solve reported TOCTOU. :)

I’m also not sure which approach is preferable here: keeping trx_i_s_cache_t hidden as a static class inside trx0i_s.cc, or making it a globally visible class with private members. I suspect the former is mostly a remnant of the pre-C++ era.

dr-m added 3 commits May 28, 2026 09:23
trx_t::disconnect_prepared(): Replaces trx_disconnect_prepared().
Protect is_recovered, mysql_thd with mutex, to be consistent with
protecting fetch_data_into_cache_low().

This fixes up
commit c0817da (MDEV-29575).

Suggested by: Sergey Vojtovich
There is a memory limit implemented for the INFORMATION_SCHEMA tables
innodb_trx, innodb_locks, innodb_lock_waits. The predicate
trx_i_s_cache_is_truncated() would never hold in
trx_i_s_common_fill_table() due to a logic bug that had been
introduced in a refactoring.

fetch_data_into_cache(): Shrink the critical section of lock_sys.latch
and do not unconditionally reset cache->is_truncated when leaving the
function. This fixes a correctness regression that had been introduced
back in 2007 by commit 5cbf4e3
for the InnoDB Plugin for MySQL 5.1, which later became
the built-in InnoDB in MySQL 5.5 and MariaDB Server 5.5.

Reviewed by: Sergey Vojtovich
trx_i_s_possibly_fetch_data_into_cache(): Fetch into the cache and
return whether the cache was truncated. Previously, we checked this
flag without holding any latch.

trx_i_s_cache_start_write(), trx_i_s_cache_end_write(),
trx_i_s_cache_is_truncated(): Remove. These are now part of
trx_i_s_possibly_fetch_data_into_cache().

trx_i_s_cache: Remove. The singleton cache object can be a static object
with no pointer indirection.
@dr-m dr-m force-pushed the 10.6-MDEV-39344 branch from cb6b5e2 to 1606951 Compare May 28, 2026 06:24
@dr-m dr-m merged commit 1606951 into 10.6 May 28, 2026
12 of 14 checks passed
@dr-m dr-m deleted the 10.6-MDEV-39344 branch May 28, 2026 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants