Pick worker with lowest memory use by percentage, not absolute #7266

Currently, the `worker_objective` function uses the worker's managed memory (`ws.nbytes`) as a tiebreaker when a task is estimated to start at the same time on multiple workers: https://github.com/dask/distributed/blob/00bf8ed477ea1bbce5b0a4d33645351b375606f2/distributed/scheduler.py#L3236

In a heterogeneous cluster, this means we might pick a small worker with little memory to spare over a large worker that has plenty of memory available but holds more data in absolute terms.
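To make this concrete, here is a small sketch with made-up numbers; the worker names, sizes, and start times are hypothetical, not taken from a real cluster. Each worker is reduced to the `(start_time, ws.nbytes)` tuple that the non-actor branch of `worker_objective` returns. Because the start times tie, `min` falls through to the byte count and picks the small worker, even though it is at 50% of its memory limit while the large worker is under 10%.

```python
GiB = 2**30

# Hypothetical workers whose estimated start times tie.
# name -> (start_time, nbytes, memory_limit)
workers = {
    "small": (0.5, 2 * GiB, 4 * GiB),   # 50% of its limit in use
    "large": (0.5, 6 * GiB, 64 * GiB),  # about 9% of its limit in use
}


def absolute_key(name: str) -> tuple[float, int]:
    start_time, nbytes, _memory_limit = workers[name]
    # Mirrors the current non-actor branch: (start_time, ws.nbytes)
    return (start_time, nbytes)


chosen = min(workers, key=absolute_key)
print(chosen)  # small
```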

Maybe we should compare by percentage of memory used, rather than total bytes used:

```diff
diff --git a/distributed/scheduler.py b/distributed/scheduler.py
index eb5828bf..5325af4b 100644
--- a/distributed/scheduler.py
+++ b/distributed/scheduler.py
@@ -3233,7 +3233,7 @@ class SchedulerState:
         if ts.actor:
             return (len(ws.actors), start_time, ws.nbytes)
         else:
-            return (start_time, ws.nbytes)
+            return (start_time, ws.nbytes / ws.memory_limit)
 
     def add_replica(self, ts: TaskState, ws: WorkerState):
         """Note that a worker holds a replica of a task with state='memory'"""
```
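A minimal sketch of the proposed tiebreak outside the scheduler, using a hypothetical `FakeWorkerState` that carries only the two attributes the diff touches. It assumes that a worker without a memory limit may report `memory_limit == 0`, in which case `ws.nbytes / ws.memory_limit` would raise `ZeroDivisionError`; the sketch treats such a worker as 0% full, which is one possible choice rather than established scheduler behavior.

```python
from dataclasses import dataclass

GiB = 2**30


@dataclass
class FakeWorkerState:
    """Hypothetical stand-in for the WorkerState attributes used in the diff."""

    name: str
    nbytes: int        # managed memory currently held on the worker
    memory_limit: int  # assumed to be 0 when the worker has no limit


def memory_fraction(ws: FakeWorkerState) -> float:
    # Assumption: an unlimited worker counts as 0% full instead of dividing by zero
    if not ws.memory_limit:
        return 0.0
    return ws.nbytes / ws.memory_limit


def objective(ws: FakeWorkerState, start_time: float) -> tuple[float, float]:
    # Same shape as the proposed non-actor branch: start time first, memory fraction second
    return (start_time, memory_fraction(ws))


workers = [
    FakeWorkerState("small", nbytes=2 * GiB, memory_limit=4 * GiB),
    FakeWorkerState("large", nbytes=6 * GiB, memory_limit=64 * GiB),
]
# Both workers are estimated to start the task at the same time, so memory breaks the tie
best = min(workers, key=lambda ws: objective(ws, start_time=0.5))
print(best.name)  # large
```

With the same inputs, comparing by fraction (0.5 vs. 0.09375) flips the choice to the large worker, which is the behavior this issue asks for.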

https://github.com/dask/distributed/pull/7248 does this for root tasks when queuing is enabled. I think it would make sense to do it in all cases, though.

cc @fjetter @crusaderky
