Skip to content

Conversation

venkatesh-prasad-v
Copy link
Contributor

https://bugs.mysql.com/bug.php?id=100586

Background

In general, if a replication applier thread fails to execute a
transaction because of an InnoDB deadlock or because the transaction's
execution time exceeded InnoDB's innodb_lock_wait_timeout, it
automatically retries slave_transaction_retries times before stopping
with an error.

And when --slave_preserve_commit_order is enabled, the replica server
ensures that transactions are externalized on the replica in the same
order as they appear in the replica's relay log, and prevents gaps in
the sequence of transactions that have been executed from the relay log.
If a thread's execution is completed before its preceding thread, then
the executing thread waits until all previous transactions are committed
before committing.

Problem & Analysis

When --slave_preserve_commit_order is enabled on slave and if the
waiting thread has locked the rows which are needed by the thread
executing the previous transaction(as per relay log), then the innodb
deadlock detection algorithm detects the deadlock between workers and
will ask the waiting thread to rollback (only if its sequence number is
lesser than that of the waiting thread).

When this happens, the waiting thread wakes up from the cond_wait(SPCO)
and it gets to know that it was asked to rollback by its preceding
transaction as it was holding a lock that is needed by the other
transaction to progress. It then rolls back its transaction so that the
the preceding transaction can be committed and retries the transaction.

But just before the transaction is retried, the worker checks if it
encountered any errors during its execution. If there is no error, it
simulates ER_LOCK_DEADLOCK error in order for it to be considered as a
temporary error so that the worker thread retries the transaction.

However, when the retries are exhausted, the worker thread logs an error
into the error log by accessing the thread's diagnostic_area by calling
thd->get_stmt_da()->mysql_errno(). If the error had been simulate (not
called through my_error function call), the diagnostic_area would
still be empty and thus making the assertion DBUG_ASSERT(m_status == DA_ERROR); to fail.

NOTE: This assertion is observed only when both
slave_transaction_retries and slave_preserve_commit_order are enabled on
the replica server and is more likely to happen when
slave_transaction_retries is set to a lower value.

Fix

Slave_transaction:::retry_transaction() has been modified to call
thd->get_stmt_da()->mysql_errno() only when thd->is_error() is
evaluated to true.

…area::mysql_errno()`

https://bugs.mysql.com/bug.php?id=100586

Background
----------
In general, if a replication applier thread fails to execute a
transaction because of an InnoDB deadlock or because the transaction's
execution time exceeded InnoDB's innodb_lock_wait_timeout, it
automatically retries slave_transaction_retries times before stopping
with an error.

And when --slave_preserve_commit_order is enabled, the replica server
ensures that transactions are externalized on the replica in the same
order as they appear in the replica's relay log, and prevents gaps in
the sequence of transactions that have been executed from the relay log.
If a thread's execution is completed before its preceding thread, then
the executing thread waits until all previous transactions are committed
before committing.

Problem & Analysis
------------------
When --slave_preserve_commit_order is enabled on slave and if the
waiting thread has locked the rows which are needed by the thread
executing the previous transaction(as per relay log), then the innodb
deadlock detection algorithm detects the deadlock between workers and
will ask the waiting thread to rollback (only if its sequence number is
lesser than that of the waiting thread).

When this happens, the waiting thread wakes up from the cond_wait(SPCO)
and it gets to know that it was asked to rollback by its preceding
transaction as it was holding a lock that is needed by the other
transaction to progress. It then rolls back its transaction so that the
the preceding transaction can be committed and retries the transaction.

But just before the transaction is retried, the worker checks if it
encountered any errors during its execution. If there is no error, it
simulates ER_LOCK_DEADLOCK error in order for it to be considered as a
temporary error so that the worker thread retries the transaction.

However, when the retries are exhausted, the worker thread logs an error
into the error log by accessing the thread's diagnostic_area by calling
`thd->get_stmt_da()->mysql_errno()`. If the error had been simulate (not
called through `my_error` function call), the diagnostic_area would
still be empty and thus making the assertion `DBUG_ASSERT(m_status ==
DA_ERROR);` to fail.

NOTE: This assertion is observed only when both
slave_transaction_retries and slave_preserve_commit_order are enabled on
the replica server and is more likely to happen when
slave_transaction_retries is set to a lower value.

Fix
---
`Slave_transaction:::retry_transaction()` has been modified to call
`thd->get_stmt_da()->mysql_errno()` only when `thd->is_error()` is
evaluated to true.
@venkatesh-prasad-v
Copy link
Contributor Author

I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

@mysql-oca-bot
Copy link

Hi, thank you for your contribution. Your code has been assigned to an internal queue. Please follow
bug http://bugs.mysql.com/bug.php?id=100591 for updates.
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants