Skip to content

[AIRFLOW-6012] - performance improvement to end task result queue when k8 labels do not need scrubbing #6602

Closed
wyndhblb wants to merge 2 commits intoapache:masterfrom
asappinc:k8-clean-up-improvement
Closed

[AIRFLOW-6012] - performance improvement to end task result queue when k8 labels do not need scrubbing #6602
wyndhblb wants to merge 2 commits intoapache:masterfrom
asappinc:k8-clean-up-improvement

Conversation

@wyndhblb
Copy link
Contributor

@wyndhblb wyndhblb commented Nov 18, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-XXX
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

For hundreds of tasks finishing, this is a very long process (lots of DB calls and a large loop).

We can make the loop exit quickly if we don't need to check for the label safety.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@codecov-io
Copy link

Codecov Report

Merging #6602 into master will decrease coverage by 0.2%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6602      +/-   ##
==========================================
- Coverage   83.69%   83.48%   -0.21%     
==========================================
  Files         650      650              
  Lines       37427    37429       +2     
==========================================
- Hits        31325    31249      -76     
- Misses       6102     6180      +78
Impacted Files Coverage Δ
airflow/executors/kubernetes_executor.py 58.71% <0%> (-0.29%) ⬇️
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 45.25% <0%> (-46.72%) ⬇️
airflow/kubernetes/refresh_config.py 50.98% <0%> (-23.53%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py 77.14% <0%> (-21.43%) ⬇️
airflow/configuration.py 89.13% <0%> (-3.63%) ⬇️
airflow/jobs/scheduler_job.py 74.96% <0%> (+1.19%) ⬆️
airflow/utils/dag_processing.py 58.48% <0%> (+2.3%) ⬆️
airflow/executors/__init__.py 67.34% <0%> (+4.08%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd6a950...6acf15c. Read the comment docs.

Copy link
Member

@XD-DENG XD-DENG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic of your change seems incorrect to me.

Let's say the dag_id/task_id in your labels is "randomdag"/"randomtask", surely they will pass the check you added (_make_safe_label_value("randomdag") will return "randomdag"), and return directly. But this DAG/Task don't necessarily exist in DB.

Correct me if I'm wrong.

@wyndhblb
Copy link
Contributor Author

Yes that is correct, and is the current behavior in the "main release" (1.10.6).

should https://github.com/apache/airflow/blob/master/airflow/executors/kubernetes_executor.py#L781
fail to find the pod/tasks in it's queue, the KeyError will trap it, and simply move on.

We have experienced ~20min speed up in general scheduling with this small change, having hundreds of tasks that complete every 10-15min makes that DB loop prohibitively expensive for every done task.

@ashb
Copy link
Member

ashb commented Nov 22, 2019

Please give this PR a more descriptive title - they form part of our changelog and this doesn't give enough info to say performance what/where.

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@wyndhblb wyndhblb changed the title [AIRFLOW-6012] - performance improvement to cleanup [AIRFLOW-6012] - performance improvement to end task result queue when k8 labels do not need scrubbing Nov 22, 2019
@mik-laj mik-laj added the k8s label Nov 25, 2019
@kaxil kaxil requested a review from dimberman December 3, 2019 00:01
@adityav
Copy link
Contributor

adityav commented Dec 9, 2019

possible duplicate of #6340

@ashb
Copy link
Member

ashb commented Dec 12, 2019

Closing this in favour of #6340.

@ashb ashb closed this Dec 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants