[AIRFLOW-6921] Fetch celery states in bulk by mik-laj · Pull Request #7542 · apache/airflow

mik-laj · 2020-02-26T09:24:35Z

Before:
For each task, we send n query
After:
When we use BaseKeyValueStoreBackend or DatabaseBackend, we send only one query
When we use other backends, we send n queries (old behavior)

This change will be particularly significant if someone is using a backend that is on a different node.

Issue link: AIRFLOW-6921

Make sure to mark the boxes below before creating PR: [x]

Description above provides context of the change
Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID^*
Unit tests coverage for changes (not needed for documentation changes)
Commits follow "How to write a good git commit message"
Relevant documentation is updated including usage instructions.
I will engage committers as explained in Contribution Workflow Example.

^* For document-only changes commit message can start with [AIRFLOW-XXXX].

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

stale · 2020-04-23T04:32:52Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

mik-laj · 2020-05-06T00:47:42Z

CC: @KevinYang21 @saguziel Can you look at this?

turbaszek · 2020-05-06T11:00:28Z

airflow/executors/celery_executor.py

Suggested change

exception_traceback = "Celery Task ID: {}\n{}".format(async_result, traceback.format_exc())

exception_traceback = f"Celery Task ID: {async_result}\n{traceback.format_exc()}"

airflow/executors/celery_executor.py

KevinYang21

Elegant solution, nice. Do we have any performance data also to share?

mik-laj · 2020-05-11T07:39:25Z

Backend: <class 'celery.backends.database.DatabaseBackend'> (Postgres)
My solution;

Loop time: 33.211 ms
Loop time: 16.434 ms
Loop time: 33.305 ms
Loop time: 36.204 ms
Loop time: 19.188 ms
Average time: 27.825 ms

Multiprocessing:

Loop time: 3520.833 ms
Loop time: 3015.769 ms
Loop time: 3014.007 ms
Loop time: 3008.050 ms
Loop time: 3011.210 ms
Average time: 3114.103 ms

Diff: -3087 ms (114 00%)

Backend: <class 'celery.backends.redis.RedisBackend'>
My solution:

Loop time: 4.023 ms
Loop time: 1.918 ms
Loop time: 5.565 ms
Loop time: 1.459 ms
Loop time: 1.200 ms
Average time: 3.417 ms

Multiprocessing:

Loop time: 1144.165 ms
Loop time: 686.941 ms
Loop time: 688.492 ms
Loop time: 691.988 ms
Loop time: 680.671 ms
Average time: 778.577 ms

Diff: -775 ms (258 33 %)

Details

import time

from airflow.executors.celery_executor import BulkStateFetcher, execute_command, app


def create_tasks(task_count: int):
    tasks = [["airflow", "version"] for _ in range(task_count)]
    async_tasks = []

    for task in tasks:
        async_tasks.append(execute_command.apply_async(args=[task]))
    time.sleep(1)
    return async_tasks


from perf_kit.repeat_and_time import timing, repeat
REPEAT_COUNT = 5


print("Backend: ", type(app.backend))
tasks = create_tasks(5000)


@timing(REPEAT_COUNT)
@repeat(REPEAT_COUNT)
@timing()
def case():
    fetcher = BulkStateFetcher(sync_parralelism=16)
    fetcher.get_many({})


case()

@timing(REPEAT_COUNT)
@repeat(REPEAT_COUNT)
@timing()
def case():
    fetcher = BulkStateFetcher(sync_parralelism=16)
    fetcher._get_many_using_multiprocessing(tasks)


case()

mik-laj · 2020-05-11T08:25:29Z

This is a test when all components are local. In a real environment, network latency has a big impact on the result because many round-trips. I mainly try to limit the number of queries to limit the number of round-trips.

turbaszek · 2020-05-11T08:37:52Z

@mik-laj big kudos for this change! Should we merge it? :)

mik-laj · 2020-05-11T08:43:50Z

Yes. I did rebase today and i waited for the result.

auvipy · 2020-05-11T08:49:33Z

great job!

mik-laj · 2020-05-11T08:56:35Z

Result for 500 tasks (A more common case):
Backend: <class 'celery.backends.database.DatabaseBackend'> (Postgres)

My solution:

Loop time: 39.800 ms
Loop time: 16.189 ms
Loop time: 15.799 ms
Loop time: 14.845 ms
Loop time: 19.363 ms
Average time: 21.314 ms

Multiprocessing:

Loop time: 500.316 ms
Loop time: 472.745 ms
Loop time: 488.694 ms
Loop time: 494.459 ms
Loop time: 495.087 ms
Average time: 490.514 ms

Diff: -469 ms (22 33%)

Backend: <class 'celery.backends.redis.RedisBackend'>
My solution:

Loop time: 4.843 ms
Loop time: 0.943 ms
Loop time: 1.421 ms
Loop time: 0.757 ms
Loop time: 1.215 ms
Average time: 2.051 ms

Multiprocessing:

Loop time: 290.430 ms
Loop time: 168.682 ms
Loop time: 269.204 ms
Loop time: 285.174 ms
Loop time: 276.070 ms
Average time: 258.048 ms

Diff: -256 ms (128 00%)

mik-laj · 2020-05-11T10:25:53Z

For 1 task:
Backend: <class 'celery.backends.redis.RedisBackend'>
My solution:

Loop time: 2.100 ms
Loop time: 0.722 ms
Loop time: 0.871 ms
Loop time: 0.589 ms
Loop time: 0.877 ms
Average time: 1.162 ms

Multiprocessing

Loop time: 130.874 ms
Loop time: 109.608 ms
Loop time: 120.918 ms
Loop time: 114.716 ms
Loop time: 116.159 ms
Average time: 118.662 ms

Diff: 117 ms

Backend: <class 'celery.backends.database.DatabaseBackend'> (Postgres)
My solution:

Loop time: 36.081 ms
Loop time: 17.839 ms
Loop time: 16.895 ms
Loop time: 18.450 ms
Loop time: 20.996 ms
Average time: 22.199 ms

Multiprocessing:

Loop time: 266.681 ms
Loop time: 116.485 ms
Loop time: 114.453 ms
Loop time: 111.432 ms
Loop time: 111.997 ms
Average time: 144.389 ms

Diff: -122 ms

ashb · 2020-05-11T10:37:38Z

Nice one, using multiprocessing pool there always felt a bit wrong.

ashb · 2020-05-11T10:46:09Z

@auvipy speaking as a celery dev: everything here is using a "public" API right? You don't see any long-term problems with this?

(It looks all good to my eyes, but I'm not familiar with internals of Celery)

boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Feb 26, 2020

mik-laj force-pushed the AIRFLOW-6921 branch 6 times, most recently from 9538e8a to 9377e21 Compare February 27, 2020 12:48

ashb requested review from KevinYang21 and saguziel February 27, 2020 13:50

mik-laj added the area:performance label Feb 29, 2020

mik-laj force-pushed the AIRFLOW-6921 branch 2 times, most recently from 490f219 to 0c2a1e1 Compare March 1, 2020 00:16

mik-laj force-pushed the AIRFLOW-6921 branch from 0c2a1e1 to e4b3044 Compare March 8, 2020 17:16

turbaszek mentioned this pull request Mar 18, 2020

update celery to 4.4.2 #7754

Merged

6 tasks

stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Apr 23, 2020

mik-laj added the pinned Protect from Stalebot auto closing label Apr 23, 2020

stale bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Apr 23, 2020

mik-laj force-pushed the AIRFLOW-6921 branch 4 times, most recently from 38b6b25 to 4a6d2ed Compare May 5, 2020 08:45

auvipy approved these changes May 5, 2020

View reviewed changes

mik-laj changed the title ~~[AIRFLOW-6921][WIP] Fetch celery states in bulk~~ [AIRFLOW-6921] Fetch celery states in bulk May 6, 2020

auvipy approved these changes May 6, 2020

View reviewed changes

turbaszek reviewed May 6, 2020

View reviewed changes

airflow/executors/celery_executor.py Outdated Show resolved Hide resolved

turbaszek approved these changes May 6, 2020

View reviewed changes

KevinYang21 approved these changes May 8, 2020

View reviewed changes

Kamil Breguła added 5 commits May 11, 2020 09:46

[AIRFLOW-6921] Fetch celery states in bulk

ee872cb

fixup! [AIRFLOW-6921] Fetch celery states in bulk

7c52b94

fixup! fixup! [AIRFLOW-6921] Fetch celery states in bulk

b9a16b9

fixup! fixup! fixup! [AIRFLOW-6921] Fetch celery states in bulk

9551edd

fixup! fixup! fixup! fixup! [AIRFLOW-6921] Fetch celery states in bulk

6730ff9

mik-laj force-pushed the AIRFLOW-6921 branch from 3bb76a5 to 6730ff9 Compare May 11, 2020 07:46

mik-laj merged commit 5f3774a into apache:master May 11, 2020

mik-laj deleted the AIRFLOW-6921 branch May 11, 2020 08:44

mik-laj mentioned this pull request May 13, 2020

Send task to queue in bulk - Celery Executor #8854

Open

dirrao mentioned this pull request Oct 16, 2023

Scheduler is spending most of its time in clear_not_launched_queued_tasks function #34877

Closed

2 tasks

potiuk mentioned this pull request May 8, 2024

Fix: Reuse ProcessPoolExecutor in CeleryExecutor #39484

Closed

	exception_traceback = "Celery Task ID: {}\n{}".format(async_result, traceback.format_exc())
	exception_traceback = f"Celery Task ID: {async_result}\n{traceback.format_exc()}"

Conversation

mik-laj commented Feb 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stale bot commented Apr 23, 2020

Uh oh!

mik-laj commented May 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

turbaszek May 6, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

KevinYang21 left a comment

Choose a reason for hiding this comment

Uh oh!

mik-laj commented May 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mik-laj commented May 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

turbaszek commented May 11, 2020

Uh oh!

mik-laj commented May 11, 2020

Uh oh!

auvipy commented May 11, 2020

Uh oh!

mik-laj commented May 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mik-laj commented May 11, 2020

Uh oh!

ashb commented May 11, 2020

Uh oh!

ashb commented May 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

mik-laj commented Feb 26, 2020 •

edited

Loading

mik-laj commented May 6, 2020 •

edited

Loading

mik-laj commented May 11, 2020 •

edited

Loading

mik-laj commented May 11, 2020 •

edited

Loading

mik-laj commented May 11, 2020 •

edited

Loading

ashb commented May 11, 2020 •

edited

Loading