
Duplicate entry with MySQL backend (Airflow 2.0.0) #13925

@gedOHub

Description

Apache Airflow version: 2.0.0

Kubernetes version: -

Environment:

  • Two servers running docker
  • Docker image: apache/airflow:2.0.0-python3.7
  • Each server contains:
    • webserver
    • scheduler
    • celery worker 1
    • celery worker 2
    • celery flower
  • Broker - Redis version 6.0.5
  • DB backend - MySQL version 8.0.20

What happened:

Error message:

[...] {taskinstance.py:1038} INFO - Executing <Task(BashOperator): xxx.yyy> on ...
[...] {standard_task_runner.py:51} INFO - Started process 4245 to run task
[...] {standard_task_runner.py:75} INFO - Running: ['airflow', 'tasks', 'run', 'zzz', 'xxx.yyy', '...', '--job-id', '8039', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/ccc/ccc.zip', '--cfg-path', '/tmp/tmpq2aaznri']
[...] {standard_task_runner.py:76} INFO - Job 8039: Subtask xxx.yyy
[...] {logging_mixin.py:103} INFO - Running <TaskInstance: zzz.xxx.yyy ... [running]> on host HOST
[...] {taskinstance.py:1396} ERROR - (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry 'zzz-xxx.can_run_tasks_loca' for key 'rendered_task_instance_fields.PRIMARY'")
[SQL: INSERT INTO rendered_task_instance_fields (dag_id, task_id, execution_date, rendered_fields, k8s_pod_yaml) VALUES (%s, %s, %s, %s, %s)]
[parameters: ('zzz', 'xxx.yyy', datetime.datetime(..., ..., ..., ..., ...), '{"bash_command": "mkdir -p ~/eee && rm -rf ~/eee", "env": null}', 'null')]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    cursor, statement, parameters, context
  File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
  File "/home/airflow/.local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 255, in execute
    self.errorhandler(self, exc, value)
  File "/home/airflow/.local/lib/python3.7/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/home/airflow/.local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 252, in execute
    res = self._query(query)
  File "/home/airflow/.local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 378, in _query
    db.query(q)
  File "/home/airflow/.local/lib/python3.7/site-packages/MySQLdb/connections.py", line 280, in query
    _mysql.connection.query(self, query)
_mysql_exceptions.IntegrityError: (1062, "Duplicate entry 'zzz-xxx.yyy' for key 'rendered_task_instance_fields.PRIMARY'")
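
From the failing INSERT above, rendered_task_instance_fields appears to be keyed on (dag_id, task_id, execution_date), so error 1062 means a row for this exact task instance already exists when the task tries to write its rendered fields again. A quick way to confirm that is to look for the colliding row. A minimal diagnostic sketch, assuming the SQLAlchemy 1.3.x bundled with Airflow 2.0.0; the connection URL, dag_id and task_id are placeholders:

```python
# Minimal diagnostic sketch, not Airflow code. Substitute the real connection
# URL, dag_id and task_id from the failing run.
from sqlalchemy import create_engine, text

engine = create_engine("mysql://user:pass@mysql-host/airflow")  # assumption: your metadata DB

query = text(
    """
    SELECT dag_id, task_id, execution_date
    FROM rendered_task_instance_fields
    WHERE dag_id = :dag_id AND task_id = :task_id
    ORDER BY execution_date DESC
    """
)

with engine.connect() as conn:
    # Any row returned for the failing execution_date is the one the INSERT
    # in the traceback collided with.
    for row in conn.execute(query, {"dag_id": "zzz", "task_id": "xxx.yyy"}):
        print(row)
```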

What you expected to happen:

The DAG runs without failing.

How to reproduce it:

Use the MySQL backend?
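
I can't trigger it on demand, but the constraint that fails is easy to demonstrate outside Airflow. Below is a minimal sketch (not Airflow's own model code, assuming the SQLAlchemy 1.3.x that ships with Airflow 2.0.0) of a table keyed on (dag_id, task_id, execution_date): a second session inserting the same key violates the primary key, which on MySQL surfaces as error 1062 exactly as in the log. In-memory SQLite is used only so the sketch runs standalone.

```python
# Minimal sketch of the failing constraint, not Airflow's own model.
import datetime

from sqlalchemy import Column, DateTime, String, Text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class RenderedFieldsDemo(Base):
    # Stand-in for rendered_task_instance_fields with the same composite key.
    __tablename__ = "rendered_fields_demo"
    dag_id = Column(String(250), primary_key=True)
    task_id = Column(String(250), primary_key=True)
    execution_date = Column(DateTime, primary_key=True)
    rendered_fields = Column(Text)


engine = create_engine("sqlite://")  # assumption: swap in a MySQL URL to see error 1062
Base.metadata.create_all(engine)
SessionLocal = sessionmaker(bind=engine)

row = dict(
    dag_id="zzz",
    task_id="xxx.yyy",
    execution_date=datetime.datetime(2021, 1, 15),
    rendered_fields='{"bash_command": "..."}',
)

# First write of the task instance succeeds.
session_a = SessionLocal()
session_a.add(RenderedFieldsDemo(**row))
session_a.commit()
session_a.close()

# A second session (think: a retry or another worker) writes the same key
# and hits the primary key constraint.
session_b = SessionLocal()
session_b.add(RenderedFieldsDemo(**row))
try:
    session_b.commit()
except IntegrityError as exc:
    print("duplicate primary key:", exc.orig)
finally:
    session_b.close()
```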

Anything else we need to know:

This problem occurs once in a while. I haven't found any pattern.

Maybe this is related to issue #9148.
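
For what it's worth, one generic way to make this kind of write idempotent on MySQL is INSERT ... ON DUPLICATE KEY UPDATE, which SQLAlchemy exposes through its MySQL dialect. The sketch below only illustrates that pattern, it is not a patch for Airflow; the table and column names mirror the failing INSERT in the log, while the column types and the connection URL are assumptions.

```python
# Sketch of an idempotent write using MySQL's INSERT ... ON DUPLICATE KEY UPDATE
# via SQLAlchemy's MySQL dialect. Column names come from the failing INSERT in
# the log; types and connection URL are placeholders.
import datetime

from sqlalchemy import Column, DateTime, MetaData, String, Table, Text, create_engine
from sqlalchemy.dialects.mysql import insert

metadata = MetaData()
rtif = Table(
    "rendered_task_instance_fields",
    metadata,
    Column("dag_id", String(250), primary_key=True),
    Column("task_id", String(250), primary_key=True),
    Column("execution_date", DateTime, primary_key=True),
    Column("rendered_fields", Text),
    Column("k8s_pod_yaml", Text),
)

engine = create_engine("mysql://user:pass@mysql-host/airflow")  # assumption: your metadata DB

stmt = insert(rtif).values(
    dag_id="zzz",
    task_id="xxx.yyy",
    execution_date=datetime.datetime(2021, 1, 15),
    rendered_fields='{"bash_command": "..."}',
    k8s_pod_yaml="null",
)
# If the (dag_id, task_id, execution_date) row already exists, update it
# instead of raising error 1062.
stmt = stmt.on_duplicate_key_update(
    rendered_fields=stmt.inserted.rendered_fields,
    k8s_pod_yaml=stmt.inserted.k8s_pod_yaml,
)

with engine.begin() as conn:
    conn.execute(stmt)
```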


    Labels

    affected_version:2.0, area:Scheduler, kind:bug, pending-response, stale
