Celery autoscaling overrides normal worker_concurrency setting #8480

Closed
jsmodic opened this issue Apr 20, 2020 · 18 comments

jsmodic commented Apr 20, 2020

Airflow 1.10.9
Celery 4.3.0

The Celery 'worker_concurrency' setting is overridden by autoscaling no matter what you configure.

The problematic code is here:

'autoscale': autoscale,

(this is on head, but the same bug exists in the 1.10.x version of this file)

On head, if you don't set autoscaling at all but do set worker concurrency, you always get the default autoscaling setting (which is 12 workers).

Not adding "autoscale" to the dictionary when autoscaling is not set is enough to solve this (at least, I tested the worker_concurrency half of things, and the requested concurrency is honored with that change).
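
For reference, a minimal sketch of the suggested fix, assuming an options dict like the one the airflow worker command builds and unpacks into Celery (the function and attribute names here are illustrative, not the exact upstream code):

from airflow.configuration import conf

def build_worker_options(args):
    # 'args' is the parsed CLI namespace; the attribute names mirror the
    # worker subcommand's flags but are illustrative, not the upstream code.
    options = {
        'hostname': args.celery_hostname,
        'queues': args.queues,
        'concurrency': args.concurrency,
    }
    # Only pass 'autoscale' to Celery when it was actually configured.
    # An unconditional 'autoscale': autoscale entry lets Celery's autoscaler
    # silently override the requested concurrency.
    autoscale = args.autoscale or conf.get('celery', 'worker_autoscale', fallback=None)
    if autoscale:
        options['autoscale'] = autoscale
    return options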

jsmodic added the kind:bug label Apr 20, 2020

eeshugerman commented:

I just ran into this too. Airflow 1.10.9 / Celery 4.4.0.

This may be unrelated, but I also noticed that the worker wasn't actually autoscaling: worker_autoscale defaults to 16,12 (max,min), but concurrency stayed put at 12, even when there were tasks waiting with all dependencies met.

jsmodic (Author) commented Apr 21, 2020

Autoscaling isn't supposed to be supported in Celery 4.x, so I wouldn't be surprised to see it behaving erratically anyway. I was just surprised to see it always defaulting to 12 workers even with "-c 1" 🤔

turbaszek (Member) commented Apr 21, 2020

Duplicate of #8044

@jsmodic you are right, autoscaling is flaky in Celery 4.x, and there are issues reporting this problem in Celery itself. If you want production-grade autoscaling and you are using a k8s cluster, this may be interesting to you: https://www.astronomer.io/blog/the-keda-autoscaler/

Indeed, -c 1 spawns 12 workers in 1.10.9... I tested it on master and it seems to be fixed (-c n spawns the main Celery process and n child processes). Same on 1.10.10:

root       412 48.3  2.0 660108 123128 pts/0   Sl+  14:07   0:02 [celeryd: celery@3715dff2587f:MainProcess] -active- (worker -c 2)
root       422 50.5  1.7 374028 107824 pts/0   S+   14:07   0:02 /usr/local/bin/python /usr/local/bin/airflow serve_logs
root       426  0.0  1.5 383524 92124 pts/0    S+   14:07   0:00 [celeryd: celery@3715dff2587f:ForkPoolWorker-1]
root       427  0.0  1.5 383528 92124 pts/0    S+   14:07   0:00 [celeryd: celery@3715dff2587f:ForkPoolWorker-2]

@jsmodic if you think we can close the issue - please do :)

turbaszek (Member) commented:

@dimberman @ashb just an idea: maybe we should remove / deprecate the "celery autoscaling" option?

ashb (Member) commented Apr 21, 2020

Isn't the worker_autoscale option unset by default?

# The maximum and minimum concurrency that will be used when starting workers with the
# ``airflow celery worker`` command (always keep minimum processes, but grow
# to maximum if necessary). Note the value should be max_concurrency,min_concurrency
# Pick these numbers based on resources on worker box and the nature of the task.
# If autoscale option is available, worker_concurrency will be ignored.
# http://docs.celeryproject.org/en/latest/reference/celery.bin.worker.html#cmdoption-celery-worker-autoscale
# Example: worker_autoscale = 16,12
# worker_autoscale =

So where is this default coming from?

turbaszek (Member) commented:

> So where is this default coming from?

I will take a closer look, but as reported, on 1.10.9 changing the worker concurrency from the CLI doesn't work, so it's possible that the autoscaling option is also broken.

jsmodic (Author) commented Apr 21, 2020

@turbaszek what celery version are you using? On 1.10.9 it starts up like this for me on Celery 4.3.0 with no autoscaling configured anywhere.

From Celery's stdout on startup:

[config]
app: airflow.executors.celery_executor:0x7fe57b350be0
transport: {my_transport}
results: {my_db}
concurrency: {min=12, max=16} (prefork)
task events: OFF (enable -E to monitor tasks in this worker)

On the machine:

ps aux | grep celery
[celeryd: celery@ip-my-ip:MainProcess] -active- (worker -q queue -c 4)
[celeryd: celery@ip-my-ip:ForkPoolWorker-1]
[celeryd: celery@ip-my-ip:ForkPoolWorker-2]
[celeryd: celery@ip-my-ip:ForkPoolWorker-3]
[celeryd: celery@ip-my-ip:ForkPoolWorker-4]
[celeryd: celery@ip-my-ip:ForkPoolWorker-5]
[celeryd: celery@ip-my-ip:ForkPoolWorker-6]
[celeryd: celery@ip-my-ip:ForkPoolWorker-7]
[celeryd: celery@ip-my-ip:ForkPoolWorker-8]
[celeryd: celery@ip-my-ip:ForkPoolWorker-9]
[celeryd: celery@ip-my-ip:ForkPoolWorker-10]
[celeryd: celery@ip-my-ip:ForkPoolWorker-11]
[celeryd: celery@ip-my-ip:ForkPoolWorker-12]

There are only 4 cores on that machine; that's what triggered me to dig into how there could possibly be 12 workers.

turbaszek (Member) commented:

> @turbaszek what celery version are you using? On 1.10.9 it starts up like this for me on Celery 4.3.0 with no autoscaling configured anywhere.

I am testing this in breeze, and I've got celery==4.4.2.

> There are only 4 cores on that machine; that's what triggered me to dig into how there could possibly be 12 workers.

Of course, you can have more processes than cores. But it doesn't mean that everything will be executed at the same time :)

eeshugerman commented Apr 22, 2020

> So where is this default coming from?

Well there's this...

- name: worker_autoscale
  description: |
    The maximum and minimum concurrency that will be used when starting workers with the
    ``airflow celery worker`` command (always keep minimum processes, but grow
    to maximum if necessary). Note the value should be max_concurrency,min_concurrency
    Pick these numbers based on resources on worker box and the nature of the task.
    If autoscale option is available, worker_concurrency will be ignored.
    http://docs.celeryproject.org/en/latest/reference/celery.bin.worker.html#cmdoption-celery-worker-autoscale
  version_added: ~
  type: string
  example: 16,12
  default: ~

...but from what I can tell that file is only used for documentation, right?

I can't find anything about it in the Celery source either 🤔

turbaszek (Member) commented:

> ...but from what I can tell that file is only used for documentation, right?

Not exactly, this file is used to generate default_airflow.cfg :)
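
As a hedged illustration of why that matters: values present (uncommented) in the generated default_airflow.cfg act as defaults for config lookups, so a setting can take effect even when the user's airflow.cfg never mentions it. A minimal sketch, assuming conf.get's fallback keyword:

from airflow.configuration import conf

# On 1.10.9, where default_airflow.cfg ships worker_autoscale uncommented,
# this returns "16,12" even if the user's airflow.cfg is silent on it.
# On 1.10.10+, where the option is commented out, the fallback is returned.
autoscale = conf.get('celery', 'worker_autoscale', fallback=None)
print(autoscale)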

jsmodic (Author) commented Apr 23, 2020

My airflow.cfg (and celery_config.py) doesn't reference autoscaling at all (I haven't really updated them since 1.10.2 or so, before this feature landed in Airflow). It's interesting that you can't replicate the behavior, though. When I test with my config, with and without 'autoscale' in the dictionary, I see the bad behavior and then the expected behavior, respectively.

(I'm not doing a very scientific test, since I can't easily set up a clean environment; I'm editing the dist-packages directly on a development machine.)

turbaszek (Member) commented:

> It's interesting that you can't replicate the behavior, though.

@jsmodic I was able to verify that the -w flag doesn't work on 1.10.9. I will take a deeper look at autoscaling over the weekend 👌

nadflinn (Contributor) commented May 3, 2020

The autoscale config has been commented out since 1.10.10, which is a change from 1.10.9 where it isn't, and I think this accounts for the difference in the behavior people are seeing.
https://github.com/apache/airflow/blob/1.10.9/airflow/config_templates/default_airflow.cfg#L480
https://github.com/apache/airflow/blob/1.10.10/airflow/config_templates/default_airflow.cfg#L516

Based on the comment above the setting, it makes sense that you wouldn't be able to set worker concurrency in 1.10.9:
> If autoscale option is available, worker_concurrency will be ignored

jsmodic (Author) commented May 3, 2020

As far as I tested, it should be enough to remove 'autoscale' from the dictionary unpacked into Celery when autoscaling isn't configured in airflow.cfg. But I haven't confirmed that this works on other versions of Celery, or how it behaves if you actually want autoscaling.

nadflinn (Contributor) commented May 4, 2020

@jsmodic from what I understand, you're editing the default configuration in Airflow. Without actually editing the package itself, I think you are going to have trouble avoiding autoscale in 1.10.9, because it is set as a default. Given that it is commented out in the package default config (default_airflow.cfg) as of 1.10.10, it is now up to the user whether to include it via airflow.cfg or by other means.

nadflinn (Contributor) commented May 4, 2020

Following up on this: it's not ideal, but if on 1.10.9 you want to pin the worker process count to, say, 8 workers, just include

worker_autoscale = 8,8

in your airflow.cfg.

As @eeshugerman pointed out, autoscaling is not working, so even if you set the upper bound higher you would still sit at 8 workers. A caveat to this: I dug into autoscaling not working, and I think it is only broken in Celery with the combination of settings worker_prefetch_multiplier=1 and task_acks_late=True, which are the settings Airflow uses. I submitted a patch to Celery that should fix this.
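
For reference, a minimal sketch of the setting combination in question; the broker URL is a placeholder, and worker_prefetch_multiplier / task_acks_late are standard Celery 4.x configuration keys:

from celery import Celery

app = Celery('airflow.executors.celery_executor',
             broker='redis://localhost:6379/0')  # placeholder broker URL

app.conf.update(
    worker_prefetch_multiplier=1,  # each worker process prefetches one task
    task_acks_late=True,           # acknowledge only after the task finishes
)
# Under this combination the prefork pool's autoscaler reportedly failed to
# scale past the minimum, matching the behavior described in this thread.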

nadflinn (Contributor) commented May 9, 2020

@turbaszek I think this can be closed, correct?
