Commit e515e80

MDEV-34689 Redo log corruption at high load
Issue:
During mtr_t::commit(), if there is not enough space available in the redo log buffer, we flush the buffer. During the flush, the LSN lock is released, allowing other concurrent mtr to commit. After the flush we reacquire the lock but use the old LSN obtained before the check. This can lead to redo log corruption: the LSN moves backwards, with the possibility of data loss and an unrecoverable server if the server aborts for any reason or is shut down with innodb_fast_shutdown=2. With a normal shutdown, recovery fails to map the checkpoint LSN to the correct offset.

In debug mode it hits:

    log0log.cc:863: lsn_t log_t::write_buf()
    Assertion `new_buf_free == ((lsn - first_lsn) & write_size_1)' failed.

In release mode, after a normal shutdown, restart fails:

    [ERROR] InnoDB: Missing FILE_CHECKPOINT(8416546) at 8416546
    [ERROR] InnoDB: Log scan aborted at LSN 8416546

Backup fails reading the corrupt redo log:

    [00] 2024-07-31 20:59:10 Retrying read of log at LSN=7334851
    [00] FATAL ERROR: 2024-07-31 20:59:11 Was only able to copy log from 7334851 to 7334851, not 8416446; try increasing innodb_log_file_size

Unless a backup is attempted, or the server is shut down or killed immediately, the corrupt part of the redo log is eventually truncated and there may be no visible issue in release mode.

This issue was introduced by the following commit:

    commit a635c40
    MDEV-27774 Reduce scalability bottlenecks in mtr_t::commit()

Fix:
If we need to release the latch and flush redo before writing the mtr logs, make sure to get the latest system LSN after reacquiring the redo system latch.
1 parent 9ab3794 commit e515e80

1 file changed: +7 -1 lines changed
storage/innobase/mtr/mtr0mtr.cc

Lines changed: 7 additions & 1 deletion
@@ -1042,13 +1042,19 @@ std::pair<lsn_t,byte*> log_t::append_prepare(size_t size, bool ex) noexcept
   size_t b{spin ? lock_lsn() : buf_free.load(std::memory_order_relaxed)};
   write_to_buf++;
 
-  const lsn_t l{lsn.load(std::memory_order_relaxed)}, end_lsn{l + size};
+  lsn_t l{lsn.load(std::memory_order_relaxed)}, end_lsn{l + size};
 
   if (UNIV_UNLIKELY(pmem
                     ? (end_lsn -
                        get_flushed_lsn(std::memory_order_relaxed)) > capacity()
                     : b + size >= buf_size))
+  {
     b= append_prepare_wait<spin>(b, ex, l);
+    /* While flushing log, we had released the lsn lock and LSN could have
+    progressed in the meantime. */
+    l= lsn.load(std::memory_order_relaxed);
+    end_lsn= l + size;
+  }
 
   size_t new_buf_free= b + size;
   if (pmem && new_buf_free >= file_size)
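
The stale-LSN hazard fixed above can be reproduced in miniature. The following is a minimal single-threaded sketch, not InnoDB code: `ToyLog`, `wait_for_space`, `final_lsn_after_race` and the `reload_after_wait` flag are hypothetical names introduced only for illustration, and the "concurrent" writer is simulated by a callback that runs while the lock is released. It assumes only what the commit message states: the LSN is read under a lock, the lock is dropped while waiting for buffer space, and the value read earlier may be stale once the lock is reacquired.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>

struct ToyLog;  // hypothetical stand-in for the redo log, not InnoDB's log_t
using Concurrent = std::function<void(ToyLog&)>;

struct ToyLog {
  std::mutex lsn_lock;        // plays the role of the LSN lock
  std::uint64_t lsn = 1000;   // current end of the log
  std::size_t buf_free = 0;   // bytes already used in the log buffer
  std::size_t buf_size = 64;  // log buffer capacity

  // Simulated flush: releases the lock, lets `concurrent` run (standing in
  // for other mini-transactions committing), then reacquires the lock.
  void wait_for_space(std::unique_lock<std::mutex>& guard,
                      const Concurrent& concurrent) {
    guard.unlock();
    concurrent(*this);  // the LSN may advance here
    guard.lock();
    buf_free = 0;       // buffer is empty after the simulated flush
  }

  // Reserves `size` bytes and returns the start LSN of the reservation.
  // With reload_after_wait == false this mimics the pre-fix behaviour.
  std::uint64_t append_prepare(std::size_t size, bool reload_after_wait,
                               const Concurrent& concurrent) {
    std::unique_lock<std::mutex> guard(lsn_lock);
    std::uint64_t l = lsn;  // read before the space check
    if (buf_free + size >= buf_size) {
      wait_for_space(guard, concurrent);
      if (reload_after_wait)
        l = lsn;            // the fix: re-read after reacquiring the lock
    }
    lsn = l + size;
    buf_free += size;
    return l;
  }
};

// Runs one reservation of 10 bytes against a nearly full buffer while a
// simulated concurrent writer advances the LSN by 20 during the wait.
std::uint64_t final_lsn_after_race(bool reload_after_wait) {
  ToyLog log;
  log.buf_free = 60;
  log.append_prepare(10, reload_after_wait, [](ToyLog& t) {
    std::lock_guard<std::mutex> g(t.lsn_lock);
    t.lsn += 20;
  });
  return log.lsn;
}
```

In this toy model, `final_lsn_after_race(false)` ends at 1010 even though the simulated concurrent writer had already advanced the LSN to 1020, so the LSN moves backwards. `final_lsn_after_race(true)` ends at 1030, with both reservations accounted for. The real code uses atomics and a latch rather than `std::mutex`, so this only illustrates the ordering problem, not the actual locking protocol.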
