Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-4311] Remove sleep in LocalExecutor #5096

Merged
merged 1 commit into from Apr 14, 2019
Merged

[AIRFLOW-4311] Remove sleep in LocalExecutor #5096

merged 1 commit into from Apr 14, 2019

Conversation

BasPH
Copy link
Contributor

@BasPH BasPH commented Apr 14, 2019

The LocalExecutor currently sleeps after running a task. This is unnecessary and the time could be used more efficiently for running the next task. I wrote a performance test (not added in code, see below) which ran a set of dummy tasks much faster as the executor doesn't sleep in between tasks. (with parallelism=2, 1000 dummy tasks completed in 8 seconds vs 505 seconds)

I'm confused about the reason behind the sleeps. The very first Airflow commit added them, I'm guessing for debugging, and it hasn't been looked at ever since.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-4311
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

I did not add tests, however wrote one for testing the performance with and without changes:

def test_performance(self):
    executor = LocalExecutor(parallelism=2)
    executor.start()

    for i in range(1000):
        key, command = "test_{0}".format(i), ["true", "testing..."]
        executor.execute_async(key=key, command=command)
        executor.running[key] = True

    executor.end()

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@mik-laj
Copy link
Member

mik-laj commented Apr 14, 2019

It looks interesting. I recently noticed that LocalExecutor works very slowly. When I added tests of several DAGs in tests, they did over 50 minutes, so I received a timeout error on Travis.
PolideaInternal#77
I was to look at it in future.

I am definitely for this PR, but travis reports a error

======================================================================
55) FAIL: test_run_naive_taskinstance (tests.test_jobs.BackfillJobTest)
----------------------------------------------------------------------
   Traceback (most recent call last):
    tests/test_jobs.py line 749 in test_run_naive_taskinstance
      self.assertEqual(ti_dependent0.state, State.FAILED)
   AssertionError: None != 'failed'

   -------------------- >> begin captured stdout << ---------------------
   [2019-04-14 15:40:00,636] {test_task_view_type_check.py:52} INFO - class_instance type: <class 'unusual_prefix_61c0ab525b75d060ea41c0aa11a98c88efc72c26_test_task_view_type_check.CallableClass'>
   [2019-04-14 15:40:00,756] {test_task_view_type_check.py:52} INFO - class_instance type: <class 'unusual_prefix_61c0ab525b75d060ea41c0aa11a98c88efc72c26_test_task_view_type_check.CallableClass'>

   --------------------- >> end captured stdout << ----------------------
   -------------------- >> begin captured logging << --------------------
   root: INFO: class_instance type: <class 'unusual_prefix_61c0ab525b75d060ea41c0aa11a98c88efc72c26_test_task_view_type_check.CallableClass'>
   root: INFO: class_instance type: <class 'unusual_prefix_61c0ab525b75d060ea41c0aa11a98c88efc72c26_test_task_view_type_check.CallableClass'>
   --------------------- >> end captured logging << ---------------------

@BasPH
Copy link
Contributor Author

BasPH commented Apr 14, 2019

Think it's the flaky CI, the other jobs run successfully and the test also completes successfully locally and on my own Travis setup. I'll rerun.

@codecov-io
Copy link

Codecov Report

Merging #5096 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5096      +/-   ##
==========================================
+ Coverage   76.86%   76.86%   +<.01%     
==========================================
  Files         463      463              
  Lines       29799    29795       -4     
==========================================
- Hits        22905    22903       -2     
+ Misses       6894     6892       -2
Impacted Files Coverage Δ
airflow/executors/local_executor.py 81.05% <ø> (+1.25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b53d99...f1dd17d. Read the comment docs.

@milton0825
Copy link
Contributor

@BasPH good findings.

@feluelle
Copy link
Member

feluelle commented Apr 14, 2019

If it is really that much of a difference (8 seconds vs 505 seconds) it would be so great to have this change 👍

the time could be used more efficiently for running the next task

Everyone needs a break even the LocalExecutor :D We have like hours of sleep and he cannot sleep a second :(

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Awesome job @BasPH

@Fokko Fokko merged commit 6cc7712 into apache:master Apr 14, 2019
Fokko pushed a commit to Fokko/incubator-airflow that referenced this pull request Apr 14, 2019
@BasPH BasPH deleted the bash-remove-sleep-localexecutor branch April 14, 2019 20:33
@ashb
Copy link
Member

ashb commented Apr 14, 2019

A sleep(0) might be still useful - I think from my C days that would give other processes a better chance to be scheduled, but not sleep for any time.

But it's late and I need sleep, so I could be making stuff up.

@ashb
Copy link
Member

ashb commented Apr 14, 2019

Given we exec a process it doesn't matter anyway. Nm

@Fokko
Copy link
Contributor

Fokko commented Apr 14, 2019

You mean the Java yield() :-)

cthenderson pushed a commit to cthenderson/apache-airflow that referenced this pull request Apr 16, 2019
andriisoldatenko pushed a commit to andriisoldatenko/airflow that referenced this pull request Jul 26, 2019
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
dharamsk pushed a commit to postmates/airflow that referenced this pull request Aug 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants