Bug #100586 Assertion failure m_status == DA_ERROR
in Diagnostics_area::mysql_errno()
#304
https://bugs.mysql.com/bug.php?id=100586
Background
In general, if a replication applier thread fails to execute a
transaction because of an InnoDB deadlock or because the transaction's
execution time exceeded InnoDB's innodb_lock_wait_timeout, it
automatically retries slave_transaction_retries times before stopping
with an error.
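The retry behaviour described above can be sketched roughly as follows. This is a simplified model, not the server's code: ApplyResult, RetryOutcome and apply_with_retries are hypothetical names, and max_retries stands in for slave_transaction_retries.

```cpp
#include <functional>

// Hypothetical classification of one execution attempt (not MySQL's types).
enum class ApplyResult { Ok, TemporaryError, PermanentError };

struct RetryOutcome {
  bool succeeded;
  unsigned long attempts;  // total executions, including the first one
};

// Runs `apply` once and, while it reports a temporary error (for example a
// deadlock or a lock wait timeout), re-runs it up to `max_retries` more times.
RetryOutcome apply_with_retries(const std::function<ApplyResult()> &apply,
                                unsigned long max_retries) {
  unsigned long attempts = 0;
  for (;;) {
    ++attempts;
    const ApplyResult result = apply();
    if (result == ApplyResult::Ok) return {true, attempts};
    // Permanent errors are never retried.
    if (result == ApplyResult::PermanentError) return {false, attempts};
    // Temporary error: give up once the retry budget is used up.
    if (attempts > max_retries) return {false, attempts};
  }
}
```

With max_retries set to 2, a transaction that always hits a temporary error is executed three times in this model before the applier gives up.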
And when --slave_preserve_commit_order is enabled, the replica server
ensures that transactions are externalized on the replica in the same
order as they appear in the replica's relay log, and prevents gaps in
the sequence of transactions that have been executed from the relay log.
If a thread's execution is completed before its preceding thread, then
the executing thread waits until all previous transactions are committed
before committing.
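As a rough illustration of the ordering rule, the single-threaded model below only lets a transaction commit when it is next in relay log order. CommitOrderGate and its methods are hypothetical names; the real server blocks the worker on a condition variable rather than returning false.

```cpp
#include <cstdint>

// Hypothetical model of commit-order preservation: transactions carry
// consecutive sequence numbers and must commit in that order.
class CommitOrderGate {
 public:
  explicit CommitOrderGate(std::uint64_t first_seq)
      : next_to_commit_(first_seq) {}

  // True when every transaction with a lower sequence number has committed.
  bool may_commit(std::uint64_t seq) const { return seq == next_to_commit_; }

  // Commits `seq` if it is its turn. Returns false when the caller finished
  // executing early and still has to wait for preceding transactions.
  bool try_commit(std::uint64_t seq) {
    if (!may_commit(seq)) return false;
    ++next_to_commit_;
    return true;
  }

 private:
  std::uint64_t next_to_commit_;
};
```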
Problem & Analysis
When --slave_preserve_commit_order is enabled on the replica and the
waiting thread has locked rows that are needed by the thread executing
the preceding transaction (as per the relay log), the InnoDB deadlock
detection algorithm detects the deadlock between the workers and asks
the waiting thread to roll back (only if the preceding transaction's
sequence number is lower than that of the waiting thread).
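The sequence number check can be pictured with the small model below. It is a sketch under assumptions: WorkerModel and report_deadlock_if_needed are hypothetical names, and the real check lives inside the server's commit order handling.

```cpp
#include <cstdint>

// Hypothetical stand-in for an applier worker.
struct WorkerModel {
  explicit WorkerModel(std::uint64_t seq) : sequence_number(seq) {}
  std::uint64_t sequence_number;    // position of its transaction in the relay log
  bool rollback_requested = false;  // set when the worker is asked to roll back
};

// Called when `requester` is blocked on a row lock held by `holder`, while
// `holder` is itself waiting for its turn to commit. The holder is asked to
// roll back only if the requester precedes it in commit order.
bool report_deadlock_if_needed(const WorkerModel &requester,
                               WorkerModel &holder) {
  if (requester.sequence_number < holder.sequence_number) {
    holder.rollback_requested = true;
    return true;
  }
  return false;
}
```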
When this happens, the waiting thread wakes up from the cond_wait(SPCO)
and learns that it was asked to roll back by its preceding transaction,
because it was holding a lock that the other transaction needs in order
to progress. It then rolls back its transaction so that the preceding
transaction can be committed, and retries the transaction.
But just before the transaction is retried, the worker checks whether it
encountered any errors during its execution. If there is no error, it
simulates an ER_LOCK_DEADLOCK error so that the failure is treated as a
temporary error and the worker thread retries the transaction.
However, when the retries are exhausted, the worker thread logs an error
into the error log by accessing the thread's diagnostics area through
thd->get_stmt_da()->mysql_errno(). If the error had been simulated (not
raised through a my_error() function call), the diagnostics area would
still be empty, causing the assertion DBUG_ASSERT(m_status == DA_ERROR);
to fail.
NOTE: This assertion is observed only when both
slave_transaction_retries and slave_preserve_commit_order are enabled on
the replica server and is more likely to happen when
slave_transaction_retries is set to a lower value.
Fix
Slave_worker::retry_transaction() has been modified to call
thd->get_stmt_da()->mysql_errno() only when thd->is_error() evaluates to
true.
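The guard can be illustrated with a minimal model of the diagnostics area. This is a sketch, not the patch itself: DiagnosticsAreaModel and errno_for_log are hypothetical names, plain assert stands in for DBUG_ASSERT, and returning 0 for an empty diagnostics area is an assumption made only for this example.

```cpp
#include <cassert>

// Simplified stand-in for Diagnostics_area, limited to what the text mentions.
class DiagnosticsAreaModel {
 public:
  enum Status { DA_EMPTY, DA_OK, DA_ERROR };

  // Models what a my_error() call leaves behind: an error status and a code.
  void set_error(unsigned int error_code) {
    m_status = DA_ERROR;
    m_errno = error_code;
  }

  bool is_error() const { return m_status == DA_ERROR; }

  // Like the real accessor, this is only valid once an error was recorded.
  unsigned int mysql_errno() const {
    assert(m_status == DA_ERROR);
    return m_errno;
  }

 private:
  Status m_status = DA_EMPTY;
  unsigned int m_errno = 0;
};

// Guarded lookup: read the error number only when an error was recorded.
// A simulated deadlock never goes through set_error(), so the area is empty.
unsigned int errno_for_log(const DiagnosticsAreaModel &da) {
  return da.is_error() ? da.mysql_errno() : 0;  // 0 is a placeholder (assumption)
}
```

In this model, calling mysql_errno() directly on an untouched DiagnosticsAreaModel trips the assertion, while errno_for_log() returns without touching it.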