
MDEV-32633: Fix Galera cluster <-> native replication interaction #3111

Closed
denis-protivensky wants to merge 1 commit into MariaDB:10.4 from mariadb-corporation:10.4-MDEV-32633

@denis-protivensky

  • The Jira issue number for this PR is: MDEV-32633

Description

Galera multi-cluster setups can be connected through native replication when each Galera cluster is configured with a separate domain ID.
For this setup to work, the domain ID in every GTID event generated at transaction commit must be replaced with the value configured for Wsrep replication.
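As a rough illustration of the commit-time rewrite, the Python sketch below models a GTID as `domain_id-server_id-seq_no` and swaps in the cluster's configured domain ID. The `Gtid` class and `gtid_for_commit` function are hypothetical, and treating the configured value as the `wsrep_gtid_domain_id` setting is an assumption; the actual logic lives in MariaDB's C++ binlog code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gtid:
    """Hypothetical model of a MariaDB GTID: domain_id-server_id-seq_no."""
    domain_id: int
    server_id: int
    seq_no: int

    def __str__(self) -> str:
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


def gtid_for_commit(generated: Gtid, wsrep_domain_id: int) -> Gtid:
    """Return the GTID to write at commit with its domain ID replaced by the
    value configured for Wsrep replication (assumed: wsrep_gtid_domain_id).
    Server ID and sequence number are left untouched."""
    return replace(generated, domain_id=wsrep_domain_id)


# Usage: a locally committed transaction in a cluster configured with domain 2.
print(gtid_for_commit(Gtid(0, 1, 5), wsrep_domain_id=2))  # 2-1-5
```

Only the domain component changes, which is what lets each cluster's transactions be told apart when the clusters replicate to each other.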

At the same time, a GTID event may already carry the correct domain ID if it arrived through native replication from another Galera cluster.
When such an event is applied, either by a native replication slave thread or by the Wsrep applier, the GTID event is written at transaction start and is not written again at commit.
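The rule for where the GTID event is written can be sketched as a small predicate. This is a hypothetical Python model, not MariaDB's code: `ThreadKind`, `gtid_written_at_trx_start` and `gtid_written_at_commit` are illustrative names, and a replicated GTID is represented as a plain string.

```python
from enum import Enum, auto
from typing import Optional


class ThreadKind(Enum):
    """Hypothetical classification of the thread committing a transaction."""
    CLIENT = auto()          # ordinary local client session
    SLAVE_SQL = auto()       # native replication slave thread
    WSREP_APPLIER = auto()   # Galera (Wsrep) applier thread


def gtid_written_at_trx_start(kind: ThreadKind, applied_gtid: Optional[str]) -> bool:
    """True when a slave or Wsrep applier thread applies a transaction that
    carries a GTID received through native replication. That GTID already
    has the correct domain ID, so it is written unchanged at start."""
    return applied_gtid is not None and kind in (
        ThreadKind.SLAVE_SQL,
        ThreadKind.WSREP_APPLIER,
    )


def gtid_written_at_commit(kind: ThreadKind, applied_gtid: Optional[str]) -> bool:
    """A GTID is generated at commit (with the domain ID replaced) only when
    none was written at start, giving one GTID event per transaction."""
    return not gtid_written_at_trx_start(kind, applied_gtid)


# Usage: only applier threads carrying a replicated GTID write it at start.
print(gtid_written_at_trx_start(ThreadKind.CLIENT, None))              # False
print(gtid_written_at_trx_start(ThreadKind.WSREP_APPLIER, "1-1-7"))    # True
```

Defining the commit-side decision as the negation of the start-side one is the property that rules out both missing and duplicate GTID events in this model.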

This PR fixes several problems in that code:

  • applying GTID events didn't work, because they are applied without a running server transaction and no Wsrep transaction was started
  • the GTID event generated at transaction start didn't carry the "standalone" and "is_transactional" flags of the original applied GTID event
  • the condition deciding that a GTID event had already been written at transaction start (and so must not be written again at commit) assumed that this GTID event was the first one found in the transaction/statement caches; that wasn't always the case, so duplicate GTID events were written
  • instead of searching the caches for a GTID event, a simple check now applies the exact rules described above for when the event is written at transaction start
  • the test case now verifies the exact GTID events applied after the two Galera clusters have synced.
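The second, third and fourth fixes can be illustrated together with a toy binlog model. Everything here is hypothetical (`GtidEvent`, `Trx`, `Binlog`, `writes_gtid_at_start`); it only shows the intended behavior: the start-of-transaction GTID copies the applied event's `standalone` and `is_transactional` flags, and the commit path re-evaluates the same rule instead of scanning caches, so exactly one GTID event is written.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class GtidEvent:
    """Hypothetical GTID event model with the two flags named in this PR."""
    domain_id: int
    server_id: int
    seq_no: int
    standalone: bool
    is_transactional: bool


@dataclass
class Trx:
    """Hypothetical per-transaction state."""
    applied_gtid: Optional[GtidEvent]  # GTID received through native replication
    by_applier: bool                   # slave thread or Wsrep applier


def writes_gtid_at_start(trx: Trx) -> bool:
    """Single rule shared by the start and the commit path."""
    return trx.by_applier and trx.applied_gtid is not None


@dataclass
class Binlog:
    events: List[GtidEvent] = field(default_factory=list)

    def on_trx_start(self, trx: Trx) -> None:
        if writes_gtid_at_start(trx):
            src = trx.applied_gtid
            # Copy the flags from the applied event instead of using defaults.
            self.events.append(GtidEvent(
                src.domain_id, src.server_id, src.seq_no,
                standalone=src.standalone,
                is_transactional=src.is_transactional,
            ))

    def on_trx_commit(self, trx: Trx, generated: GtidEvent,
                      wsrep_domain_id: int) -> None:
        # Re-check the rule rather than looking for the first GTID event in
        # the caches, so a GTID written at start is never written twice.
        if writes_gtid_at_start(trx):
            return
        self.events.append(replace(generated, domain_id=wsrep_domain_id))


# Usage: a transaction from another cluster (domain 1) applied by an applier.
log = Binlog()
incoming = GtidEvent(1, 10, 42, standalone=False, is_transactional=True)
trx = Trx(applied_gtid=incoming, by_applier=True)
log.on_trx_start(trx)
log.on_trx_commit(trx, GtidEvent(0, 20, 7, True, False), wsrep_domain_id=2)
print(len(log.events), log.events[0] == incoming)  # 1 True
```

A local transaction in the same model takes the other branch: nothing is written at start, and the commit path writes the generated GTID with the Wsrep domain ID.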

Release Notes

This fix applies to version 10.4 only.

How can this PR be tested?

Re-enabled the previously failing MTR test.

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch.
  • This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@janlindstrom added the Codership (Codership Galera) label on Mar 18, 2024
@sysprg (Contributor) commented Jun 4, 2024

Thanks, the fix has been merged into the head revision, but for version 10.5+, since version 10.4 is no longer in active development:
0cc9b49
a483872
c21aa48

@sysprg closed this Jun 4, 2024