[AIRFLOW-6012] - performance improvement to end task result queue when k8 labels do not need scrubbing #6602
wyndhblb wants to merge 2 commits into apache:master
Codecov Report
@@ Coverage Diff @@
## master #6602 +/- ##
==========================================
- Coverage 83.69% 83.48% -0.21%
==========================================
Files 650 650
Lines 37427 37429 +2
==========================================
- Hits 31325 31249 -76
- Misses 6102 6180 +78
XD-DENG left a comment
The logic of your change seems incorrect to me.
Let's say the dag_id/task_id in your labels is "randomdag"/"randomtask". They will surely pass the check you added (_make_safe_label_value("randomdag") returns "randomdag"), so the function returns directly. But this DAG/task doesn't necessarily exist in the DB.
Correct me if I'm wrong.
Yes, that is correct, and it is the current behavior in the "main release" (1.10.6). See https://github.com/apache/airflow/blob/master/airflow/executors/kubernetes_executor.py#L781. We have experienced a ~20 min speed-up in general scheduling with this small change. With hundreds of tasks completing every 10-15 min, running that DB loop for every finished task is prohibitively expensive.
Please give this PR a more descriptive title. PR titles form part of our changelog, and this one doesn't give enough information to say what performance is improved or where.
Possible duplicate of #6340.
Closing this in favour of #6340.
Make sure you have checked all steps below.
Jira
Description
When hundreds of tasks finish, draining the task result queue takes a very long time (many DB calls and a large loop over task instances).
We can make the loop exit quickly when the labels do not need to be checked against their scrubbed ("safe") form.
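The trade-off discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the executor's code: `make_safe_label_value` is a simplified stand-in for `_make_safe_label_value`, `known_tasks` is a hypothetical in-memory list standing in for the task instances read from the DB, and `scan_lookup` / `fast_path_lookup` are made-up names. Each function returns the matched ids together with the number of rows it had to compare.

```python
import hashlib
import re

MAX_LABEL_LEN = 63  # Kubernetes limits label values to 63 characters


def make_safe_label_value(value):
    """Simplified stand-in for the executor's label scrubbing (assumption)."""
    # Strip leading/trailing non-alphanumerics and any disallowed character.
    safe = re.sub(r"^[^a-zA-Z0-9]*|[^a-zA-Z0-9_\-.]|[^a-zA-Z0-9]*$", "", value)
    if len(safe) > MAX_LABEL_LEN or safe != value:
        # The value was altered: append a short hash of the original.
        digest = hashlib.md5(value.encode()).hexdigest()[:9]
        safe = safe[: MAX_LABEL_LEN - len(digest) - 1] + "-" + digest
    return safe


def scan_lookup(labels, known_tasks):
    """Baseline: compare the labels with every known task (stands in for the DB loop)."""
    compared = 0
    for dag_id, task_id in known_tasks:
        compared += 1
        if (make_safe_label_value(dag_id) == labels["dag_id"]
                and make_safe_label_value(task_id) == labels["task_id"]):
            return (dag_id, task_id), compared
    return None, compared


def fast_path_lookup(labels, known_tasks):
    """Return immediately when both labels already equal their scrubbed form."""
    dag_id, task_id = labels["dag_id"], labels["task_id"]
    if (make_safe_label_value(dag_id) == dag_id
            and make_safe_label_value(task_id) == task_id):
        return (dag_id, task_id), 0  # no rows compared, existence NOT verified
    return scan_lookup(labels, known_tasks)  # otherwise fall back to the scan


# Hypothetical data: 500 finished tasks, plus labels for a task that does not exist.
known_tasks = [("dag_%d" % i, "task") for i in range(500)]
real = {"dag_id": "dag_499", "task_id": "task"}
ghost = {"dag_id": "randomdag", "task_id": "randomtask"}

print(scan_lookup(real, known_tasks))        # (('dag_499', 'task'), 500)
print(fast_path_lookup(real, known_tasks))   # (('dag_499', 'task'), 0)
print(scan_lookup(ghost, known_tasks))       # (None, 500)
print(fast_path_lookup(ghost, known_tasks))  # (('randomdag', 'randomtask'), 0)
```

The last two lines show both sides of the discussion: the early return removes the per-task scan, but it also reports ids for "randomdag"/"randomtask" without confirming that such a task exists.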