Description
The FAQ documentation for max_threads currently reads as follows:
max_threads: Scheduler will spawn multiple threads in parallel to schedule dags. This is controlled by max_threads with default value of 2. User should increase this value to a larger value (e.g numbers of cpus where scheduler runs - 1) in production.
The example in parentheses confuses new developers because they read it as saying that the number of scheduler threads must not exceed the number of CPUs - 1. I have seen many Airflow installations where max_threads is set to exactly the number of CPUs - 1. In practice, the upper limit should be determined by the size of the instance (CPU and memory) on which the scheduler is installed. Because of this misunderstanding, many new Airflow developers have told me that Airflow is very slow at scheduling DAGs. When I look into their configuration, I find max_threads limited to the number of CPUs.
Please consider changing it to the following:
max_threads: The scheduler spawns multiple threads in parallel to schedule DAGs. This is controlled by max_threads, which defaults to 2. In production, users should increase this to a larger value that fits the size of the hardware the scheduler is installed on.
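To illustrate that the setting is just an integer with no built-in tie to the CPU count, here is a minimal sketch. It assumes an Airflow 1.10-style airflow.cfg where max_threads lives in the [scheduler] section. It also relies on Airflow's AIRFLOW__{SECTION}__{KEY} environment-variable convention, under which the environment variable overrides the file. The sketch uses Python's standard configparser rather than Airflow's own config loader, and the helper name scheduler_max_threads is hypothetical.

```python
import configparser
import os

# Hypothetical airflow.cfg excerpt; only [scheduler] max_threads matters here.
SAMPLE_CFG = """
[scheduler]
max_threads = 16
"""


def scheduler_max_threads(cfg_text: str) -> int:
    """Return max_threads, letting AIRFLOW__SCHEDULER__MAX_THREADS override the file.

    This approximates Airflow's env-over-file precedence with plain configparser;
    it is not Airflow's actual configuration code.
    """
    env_value = os.environ.get("AIRFLOW__SCHEDULER__MAX_THREADS")
    if env_value is not None:
        return int(env_value)
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint("scheduler", "max_threads")


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    threads = scheduler_max_threads(SAMPLE_CFG)
    # Nothing here caps threads at cpus - 1; sizing is an operator decision.
    print(f"max_threads={threads}, cpus={cpus}")
```

The value of 16 in the sample is arbitrary. The right number depends on the CPU and memory available to the scheduler host, which is the point the proposed wording makes.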