
Cleanup pending pods on scale down #817

Open
BitTheByte opened this issue Sep 16, 2023 · 2 comments

Comments


BitTheByte commented Sep 16, 2023

Currently the operator retires workers through the scheduler's HTTP or RPC APIs, but those only control workers that have already connected to Dask. The operator should also take into account Kubernetes worker pods that are still in the Pending phase: they trigger a pointless Kubernetes cluster scale-up, then connect to Dask only to be retired. A scale down should therefore retire active workers and also prevent Pending pods from ever reaching the Running state.

jacobtomlinson (Member) commented

Agreed.

We could add a check here for any Pods that aren't in a Running phase and delete those before calling retire_workers (if that's even necessary any more).

```python
if workers_needed < 0:
    worker_ids = await retire_workers(
```
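A rough sketch of what that check could look like, written against the official kubernetes_asyncio client rather than whatever client the operator uses internally. The function name and label selector are assumptions for illustration only:

```python
from kubernetes_asyncio import client, config


async def delete_non_running_worker_pods(cluster_name: str, namespace: str) -> int:
    """Delete worker pods that are not Running so they never become
    workers that would immediately be retired again. Returns the count deleted."""
    # Inside the operator this would be in-cluster config; kubeconfig is used
    # here so the sketch also runs from a laptop.
    await config.load_kube_config()
    async with client.ApiClient() as api:
        v1 = client.CoreV1Api(api)
        pods = await v1.list_namespaced_pod(
            namespace,
            # Assumed labels; adjust to whatever the operator actually
            # applies to worker pods.
            label_selector=f"dask.org/cluster-name={cluster_name},dask.org/component=worker",
        )
        deleted = 0
        for pod in pods.items:
            if pod.status.phase != "Running":
                await v1.delete_namespaced_pod(pod.metadata.name, namespace)
                deleted += 1
        return deleted
```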


BitTheByte commented Sep 18, 2023

Looks good to me. We should also subtract the pending workers we delete from the number of workers passed to retire_workers.
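For example, building on the earlier sketch; here retire_workers is passed in as a stand-in for the operator's existing call, whose real signature may differ:

```python
from typing import Awaitable, Callable


async def scale_down(
    workers_needed: int,
    cluster_name: str,
    namespace: str,
    retire_workers: Callable[[int], Awaitable[list[str]]],
) -> list[str]:
    """Hypothetical wiring: delete non-Running pods first, then retire only
    as many connected workers as are still needed to reach the target."""
    if workers_needed >= 0:
        return []
    n_to_remove = -workers_needed
    deleted = await delete_non_running_worker_pods(cluster_name, namespace)
    # Pods deleted while Pending already count towards the requested reduction.
    remaining = n_to_remove - deleted
    if remaining > 0:
        return await retire_workers(remaining)
    return []
```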
