Open
Labels: Stale Bug Report, area:providers, kind:bug, needs-triage, provider:openlineage, AIP-53
Description
Apache Airflow Provider(s)
openlineage
Versions of Apache Airflow Providers
1.14.0
Also seeing missing DAG start events with versions <= 1.12.0. However, those versions did not log the exception, so it is hard to tell whether this is the same issue.
Apache Airflow version
2.10.1
Operating System
Amazon Linux
Deployment
Amazon (AWS) MWAA
Deployment details
MWAA with:
- requirements.txt with apache-airflow-providers-openlineage==1.14.0
- startup.sh with OPENLINEAGE_URL pointing to a webserver that logs all received requests
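The report does not include the request-logging webserver itself. As a hypothetical stand-in, here is a minimal standard-library sketch that accepts POST requests on any path, logs each JSON body, and keeps it in memory. The names LoggingHandler, make_server and received_events, and the default port 5000, are assumptions for illustration, not part of the original setup; OPENLINEAGE_URL would point at wherever this server listens.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("lineage-sink")

# Every decoded request body is kept here so it can be inspected later.
received_events = []


class LoggingHandler(BaseHTTPRequestHandler):
    """Hypothetical sink: logs and stores every POSTed body, always answers 200."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body)
        except ValueError:
            # Not valid JSON (or not valid UTF-8): keep the raw text instead.
            event = {"raw": body.decode("utf-8", errors="replace")}
        received_events.append(event)
        log.info("%s %s", self.path, json.dumps(event))
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request access log; do_POST logs instead.
        pass


def make_server(host="0.0.0.0", port=5000):
    # Port 5000 is an arbitrary, assumed default.
    return HTTPServer((host, port), LoggingHandler)
```

Start it with make_server().serve_forever(); the call blocks, so in practice it would run as its own process next to the MWAA environment.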
What happened
OpenLineage provider failed to send some DAG start events, with the following exception in the scheduler logs:
[2024-12-17T00:44:00.564+0000] {listener.py:528} WARNING - Failed to submit method to executor
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <functools._lru_cache_wrapper object at 0x7fb8f1c02980>: it's not the same object as airflow.models.abstractoperator.AbstractOperator.get_parse_time_mapped_ti_count
"""
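The last line of the traceback comes from pickle's identity check: a function wrapped by functools.lru_cache is pickled by reference to its qualified name, and pickle refuses if looking that name up returns a different object than the one being pickled. The sketch below reproduces only that generic mechanism, using a hypothetical function name (parse_time_count) and an artificial rebinding; it does not show why the scheduler ended up holding a wrapper that differs from AbstractOperator.get_parse_time_mapped_ti_count.

```python
import functools
import pickle


@functools.lru_cache(maxsize=None)
def parse_time_count(x):
    # Hypothetical stand-in for any lru_cache-decorated function.
    return x * 2


# Keep a reference to the wrapper that exists right now...
stale_wrapper = parse_time_count

# ...then rebind the module-level name to a new wrapper around the same
# underlying function, so the name no longer resolves to stale_wrapper.
parse_time_count = functools.lru_cache(maxsize=None)(stale_wrapper.__wrapped__)

pickling_error = None
try:
    # pickle serializes the wrapper by its qualified name, looks that name up,
    # finds a different object, and raises PicklingError.
    pickle.dumps(stale_wrapper)
except pickle.PicklingError as exc:
    pickling_error = exc

print(pickling_error)
```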
What you think should happen instead
No response
How to reproduce
The failures to send events were non-deterministic and appear to be caused by a race condition. They seem to occur more frequently when multiple DAGs are being scheduled simultaneously.
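One way to spot the missing events in the output of the request-logging webserver mentioned above is to group the received OpenLineage run events by run.runId and list runs that have events but no START. This is a minimal sketch assuming the logged bodies are OpenLineage RunEvent dicts with eventType and run.runId fields; the function name runs_missing_start and the sample data are hypothetical.

```python
from collections import defaultdict


def runs_missing_start(events):
    """Return sorted run IDs that have at least one event but no START event."""
    seen_types = defaultdict(set)
    for event in events:
        run_id = event.get("run", {}).get("runId")
        if run_id is None:
            continue  # not a run event, or an unparsable body
        seen_types[run_id].add(event.get("eventType"))
    return sorted(run_id for run_id, types in seen_types.items() if "START" not in types)


# Hypothetical sample: run "b" completed without a START ever being received.
sample = [
    {"eventType": "START", "run": {"runId": "a"}, "job": {"name": "frequent_dag_0"}},
    {"eventType": "COMPLETE", "run": {"runId": "a"}, "job": {"name": "frequent_dag_0"}},
    {"eventType": "COMPLETE", "run": {"runId": "b"}, "job": {"name": "frequent_dag_1"}},
]
print(runs_missing_start(sample))  # ['b']
```

A run whose START was lost and which has not yet emitted any other event will not show up, so this check is best run after the affected DAG runs have finished.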
I used the following DAG file to reproduce the issue; with it, at least one DAG start event failed to send almost every minute.
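from datetime import timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago


def print_message():
    print("Task is running")


# Four DAGs scheduled every minute, so their runs are created at the same time.
for i in range(4):
    with DAG(
        f"frequent_dag_{i}",
        schedule_interval=timedelta(minutes=1),
        start_date=days_ago(1),
        catchup=False,
    ) as dag:
        # The operator is attached to `dag` by the surrounding context manager.
        PythonOperator(
            task_id=f"print_task_{i}",
            python_callable=print_message,
        )
Anything else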
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct
MassyB