-
Notifications
You must be signed in to change notification settings - Fork 16.5k
Description
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
We are running MySQL in group replication mode and as per docs a primary key is required in a table. We noticed during our migration from Airflow 2.1.x to 2.3.4 that Airflow creates a temporary table to perform xcom related migrations by inserting the records and then renaming the table __airflow_tmp_xcom to xcom. The insert statement fails since the primary key is created after insertions. We have made a fix to move the primary key creation to be on the temporary table. Another possible solution would be to set the pragma as per docs. I have opened this issue for discussion since this might be helpful to others who might be running a similar MySQL setup and face this issue with this and possibly other migrations to 2.4.x .
airflow/airflow/migrations/versions/0102_2_3_0_switch_xcom_table_to_use_run_id.py
Lines 114 to 124 in 4dd3c6a
| op.execute(f"INSERT INTO __airflow_tmp_xcom {query.selectable.compile(op.get_bind())}") | |
| if is_sqlite: | |
| op.execute("PRAGMA foreign_keys=off") | |
| op.drop_table("xcom") | |
| if is_sqlite: | |
| op.execute("PRAGMA foreign_keys=on") | |
| op.rename_table("__airflow_tmp_xcom", "xcom") | |
| with op.batch_alter_table("xcom") as batch_op: | |
| batch_op.create_primary_key("xcom_pkey", ["dag_run_id", "task_id", "map_index", "key"]) |
https://dev.mysql.com/doc/refman/8.0/en/group-replication-requirements.html
Primary Keys. Every table that is to be replicated by the group must have a defined primary key, or primary key equivalent where the equivalent is a non-null unique key. Such keys are required as a unique identifier for every row within a table, enabling the system to determine which transactions conflict by identifying exactly which rows each transaction has modified. Group Replication has its own built-in set of checks for primary keys or primary key equivalents, and does not use the checks carried out by the sql_require_primary_key system variable. You may set sql_require_primary_key=ON for a server instance where Group Replication is running, and you may set the REQUIRE_TABLE_PRIMARY_KEY_CHECK option of the CHANGE REPLICATION SOURCE TO | CHANGE MASTER TO statement to ON for a Group Replication channel. However, be aware that you might find some transactions that are permitted under Group Replication's built-in checks are not permitted under the checks carried out when you set sql_require_primary_key=ON or REQUIRE_TABLE_PRIMARY_KEY_CHECK=ON.
What you think should happen instead
The migration should succeed without any issues.
How to reproduce
Perform airflow db upgrade to 2.3.0 with MySQL in group replication mode.
Operating System
Redhat
Versions of Apache Airflow Providers
No response
Deployment
Other Docker-based deployment
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct