Description
Apache Airflow version
2.2.4 (latest released)
What happened
Hello everyone,
I am creating 6 Airflow task instances to run in parallel. However, most of the time one or two of them fail, and the logs say very little about the reason.
sim_num = 6
sim_all_paths = []
for i in range(sim_num):
    sim_single_path = ECSOperator(
        task_id="sim_single_path_" + str(i),
        task_definition_id="airflow-python-jobs",
        command=PythonCommand(
            module="jobs.inventory_optimization.sim_single_path",
        ),
        container_config=EC2ContainerConfig.x_large,
        execution_timeout=timedelta(hours=15),
        task_role="tai_main_role"
    )
    sim_all_paths.append(sim_single_path)
The error in the logs says: airflow.providers.amazon.aws.exceptions.ECSOperatorError: {'tasks': [], 'failures': [{'arn': 'arn:aws:ecs:eu-west-1:xxxx:container-instance/zzzzz', 'reason': 'RESOURCE:CPU'}], 'ResponseMetadata': {'RequestId': 'xxxx', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'xxxxx', 'content-type': 'application/x-amz-json-1.1', 'content-length': '146', 'date': 'Mon, 14 Mar 2022 16:18:11 GMT'}, 'RetryAttempts': 0}}
However, when I checked the Container Instance, the Registered CPU and memory values were the same as the Available values, and there were no errors in the task logs.
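For reference, the failure in the error above is a placement failure: `tasks` is empty and the only entry in `failures` has the reason `RESOURCE:CPU`, which ECS reports when the container instance did not have enough unreserved CPU at the moment the task was placed. Below is a minimal sketch of how such a response could be classified, for example to decide whether a retry makes sense. The helper names `resource_failures` and `is_capacity_error` are hypothetical and not part of Airflow or boto3; the response dict is the one from the log, trimmed to the relevant keys.

```python
from typing import Any, Dict, List

# Reasons starting with this prefix mean the container instance lacked a
# requested resource (CPU, memory, ...) when ECS tried to place the task.
RESOURCE_PREFIX = "RESOURCE:"


def resource_failures(response: Dict[str, Any]) -> List[str]:
    """Return the reasons of all failures caused by missing resources."""
    return [
        failure.get("reason", "")
        for failure in response.get("failures", [])
        if failure.get("reason", "").startswith(RESOURCE_PREFIX)
    ]


def is_capacity_error(response: Dict[str, Any]) -> bool:
    """True when no task was started and every failure is a RESOURCE:* one."""
    failures = response.get("failures", [])
    return (
        not response.get("tasks")
        and bool(failures)
        and len(resource_failures(response)) == len(failures)
    )


# Response taken from the error in the logs, trimmed to the relevant keys.
response = {
    "tasks": [],
    "failures": [
        {
            "arn": "arn:aws:ecs:eu-west-1:xxxx:container-instance/zzzzz",
            "reason": "RESOURCE:CPU",
        }
    ],
}

print(resource_failures(response))  # ['RESOURCE:CPU']
print(is_capacity_error(response))  # True
```

If the check returns `True`, the failure is likely transient capacity contention between the parallel tasks rather than a bug in the job itself, so retrying after a delay or starting fewer tasks at once are reasonable things to try.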
Things I have already tried:
- Upgraded to Airflow 2.2.4.
- Added "AIRFLOW__CORE__KILLED_TASK_CLEANUP_TIME": "604800" to the Airflow environment variables.
Any ideas on how to tackle this issue?
Thanks in advance.
What you expected to happen
No response
How to reproduce
No response
Operating System
Linux
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct