Don't use CREATE TABLE AS SELECT ... with mySQL #19999

Merged 2 commits on Dec 3, 2021


SamWheating
Contributor

Closes: #19988

Splitting this CREATE TABLE AS SELECT query into two queries (CREATE TABLE ... LIKE followed by INSERT INTO ... SELECT) because the former fails on MySQL when GTID consistency is enforced (ERROR 1786), as happens with GTID-based replication.

This PR is inherently difficult to test/validate, but I ran the proposed queries against the metadata database in our development environment (where we first ran into this issue) and was able to replicate the issue and verify the fix:

// replicating the issue:

MySQL [airflow] > create table _airflow_moved__2_2__task_instance as select source.* from task_instance as source;
ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.

// Validating that splitting the query fixes the issue

MySQL [airflow] > create table _airflow_moved__2_2__task_instance like task_instance;
Query OK, 0 rows affected (0.063 sec)

MySQL [airflow] > INSERT INTO _airflow_moved__2_2__task_instance select source.* from task_instance as source left join dag_run as dr on (source.dag_id = dr.dag_id and source.execution_date = dr.execution_date) where dr.id is null;
Query OK, 22900 rows affected (2.110 sec)

Tomorrow I can create a patched image with these changes and validate the fix in our development environment.
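
For illustration only, the split described above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code changed in this PR: `build_move_statements` and its parameters are hypothetical names, and the function only builds SQL strings rather than executing them.

```python
from typing import List


def build_move_statements(
    dialect_name: str,
    source_table: str,
    target_table: str,
    join_and_filter: str = "",
) -> List[str]:
    """Build the SQL that copies selected rows into a new table.

    Hypothetical helper for illustration; all names are assumptions.
    """
    select = f"SELECT source.* FROM {source_table} AS source {join_and_filter}".strip()
    if dialect_name == "mysql":
        # CREATE TABLE ... SELECT is rejected when GTID consistency is
        # enforced (ERROR 1786), so create the empty table first and
        # fill it with a separate INSERT ... SELECT.
        return [
            f"CREATE TABLE {target_table} LIKE {source_table}",
            f"INSERT INTO {target_table} {select}",
        ]
    # Other dialects keep the single-statement form.
    return [f"CREATE TABLE {target_table} AS {select}"]


# Example: the two statements for the task_instance case shown above.
for statement in build_move_statements(
    "mysql",
    "task_instance",
    "_airflow_moved__2_2__task_instance",
    "LEFT JOIN dag_run AS dr ON (source.dag_id = dr.dag_id "
    "AND source.execution_date = dr.execution_date) WHERE dr.id IS NULL",
):
    print(statement)
```

Returning a list of statements keeps the caller's execution loop identical for every dialect; only the number of statements differs.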



@potiuk potiuk added this to the Airflow 2.2.3 milestone Dec 3, 2021
@github-actions

github-actions bot commented Dec 3, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the "full tests needed" label Dec 3, 2021
@uranusjr uranusjr requested a review from ashb December 3, 2021 07:21
@ashb
Member

ashb commented Dec 3, 2021

Can you define "doesn't play nicely with"? Because in my testing it did work (but I didn't test MySQL beyond one or two rows).

Duh, read the full error message Ash.

@ashb
Member

ashb commented Dec 3, 2021

Ah, this doesn't play nicely when replication is turned on.

@ashb
Member

ashb commented Dec 3, 2021

I am amazed at MySQL's ability to find new ways to make me sad.

Review comment on airflow/utils/db.py (outdated, resolved)
@ashb ashb merged commit 4996501 into apache:main Dec 3, 2021
@potiuk
Member

potiuk commented Dec 3, 2021

I am amazed at MySQL's ability to find new ways to make me sad.

Very, very, very much so. I expressed my feelings on MySQL today at the talk I gave at Data Science Summit a bit after I found out that about half of the audience used it, unfortunately. I hope that brings more people to Postgres.

Labels: full tests needed (We need to run full set of tests for this PR to merge), type:bug-fix (Changelog: Bug Fixes)
Projects: None yet
Linked issue that merging this pull request may close: Migrations fail due to GTID Consistency Violation when using GTID Replication
4 participants