Reduce memory usage of scheduler process - Optimize scheduler.py::TaskState class #8331

milesgranger · 2023-11-08T07:49:09Z

Part of #7998

The first approach, in ef4b7dd, moved attributes into descriptors that only assign a value on first access. This gives a ~34% reduction in memory compared to main. The downside is some boilerplate for each attribute that needs lazy creation (although this could be generalized, for example by prefixing each with __lazy_ and then iterating over those annotations after initialization).
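For illustration, the lazy-descriptor idea could look roughly like the sketch below. This is not the code from ef4b7dd: `Lazy` and `LazyTaskState` are hypothetical names, and the sketch relies on an instance `__dict__`, which a class using `__slots__` would not have.

```python
from __future__ import annotations

from typing import Any, Callable


class Lazy:
    """Non-data descriptor that builds its value on first access only."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self.factory = factory

    def __set_name__(self, owner: type, name: str) -> None:
        # Remember the attribute name this descriptor was assigned to
        self.name = name

    def __get__(self, instance: Any, owner: type | None = None) -> Any:
        if instance is None:
            return self
        value = self.factory()
        # Cache on the instance; because this descriptor defines no __set__,
        # later lookups find the instance attribute first and skip __get__.
        instance.__dict__[self.name] = value
        return value


class LazyTaskState:
    """Hypothetical stand-in for TaskState, only for this sketch."""

    dependents = Lazy(set)
    waiters = Lazy(set)

    def __init__(self, key: str) -> None:
        self.key = key


ts = LazyTaskState("x")
print("waiters" in ts.__dict__)  # nothing allocated until first access
ts.waiters.add("y")
print("waiters" in ts.__dict__)
```

The trade-off mentioned above shows up here as one descriptor declaration per lazy attribute.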

The second approach, which is the current direction of this PR, uses optionals for more attributes in this class, primarily those holding a set or dict. This gives a ~42% reduction in memory compared to main. The downside is much more changed code, checks for None everywhere, and the tests are currently wildly broken. 😅
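A minimal sketch of why optionals help, assuming CPython and hypothetical `EagerTask`/`OptionalTask` classes (these are not the real TaskState): an empty set still has a non-trivial footprint, while `None` is a shared singleton.

```python
from __future__ import annotations

import sys
from typing import Optional, Set


class EagerTask:
    """Hypothetical: allocates every container up front."""

    __slots__ = ("key", "waiters", "who_has")

    def __init__(self, key: str) -> None:
        self.key = key
        self.waiters = set()
        self.who_has = set()


class OptionalTask:
    """Hypothetical: containers start as None and are created on demand."""

    __slots__ = ("key", "waiters", "who_has")

    waiters: Optional[Set[str]]
    who_has: Optional[Set[str]]

    def __init__(self, key: str) -> None:
        self.key = key
        self.waiters = None
        self.who_has = None


def add_waiter(ts: OptionalTask, other: str) -> None:
    # The cost of the optional approach: every writer must handle None
    if ts.waiters is None:
        ts.waiters = set()
    ts.waiters.add(other)


def container_bytes(ts) -> int:
    """Shallow size of the containers that are actually allocated."""
    total = 0
    for name in ("waiters", "who_has"):
        value = getattr(ts, name)
        if value is not None:
            total += sys.getsizeof(value)
    return total


print(container_bytes(EagerTask("a")), container_bytes(OptionalTask("a")))
```

`sys.getsizeof` is shallow, so this only hints at the per-task overhead; the memray snippet gives the real picture for the scheduler.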

Snippet I'm using for this comparison to main:

import time

from dask.datasets import timeseries
from dask.distributed import Client
from distributed.diagnostics.memray import memray_scheduler


def main():
    ddf = timeseries("2020", "2021", partition_freq="2h")
    ddf2 = ddf.shuffle(on="x", shuffle="tasks")

    # Number of keys in the graph, for reference
    print(len(ddf2.__dask_graph__()))

    with Client() as client:
        # Profile scheduler memory with memray while the computation runs
        with memray_scheduler():
            fut = client.compute(ddf2.size)
            time.sleep(90)
        fut.cancel()


if __name__ == "__main__":
    main()

cc @fjetter

github-actions · 2023-11-08T10:40:48Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

      27 files ±0       27 suites ±0 13h 41m 1s ⏱️ - 1h 13m 13s
  3 926 tests ±0   3 803 ✔️ +4   117 💤 ±0 6 ❌ - 4
49 391 runs ±0 46 970 ✔️ +5 2 414 💤 +1 7 ❌ - 6

For more details on these failures, see this check.

Results for commit 45447a6. ± Comparison against base commit 0dc9e88.

♻️ This comment has been updated with latest results.

milesgranger · 2023-11-08T11:59:10Z

Okay @fjetter, I think this is ready for a proper review. The failing tests don't seem related (most/all of the ones failing here pass locally as well).

distributed/active_memory_manager.py

distributed/shuffle/_scheduler_plugin.py

distributed/scheduler.py

fjetter · 2023-11-08T13:13:10Z

To summarize my review:

If possible, we should use assert foo is not None instead of if foo else (), since it helps readability a lot. The scheduler is a messy, complex system, and knowing for a fact that an attribute exists helps.
There are a bunch of places where we're clearing existing dicts/sets. In the spirit of only keeping what's necessary, we can instead just reset them to None.
There are a couple of very common sets that are used for every task, and I think we should just keep them non-optional. This primarily concerns dependencies and dependents. They are all but guaranteed to be populated, so we can initialize them right away and save ourselves a bit of complexity.
There are a couple of sets that are going to be populated for every single task, but only for a certain amount of time. This includes who_has, waiters, and waiting_on. We will only save memory by delaying initialization if we ensure that the sets are actually released once those collections are no longer required (which is typically true once a task is in memory). See the point above about clearing.
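The first two review points can be sketched as follows. `Task`, `mark_in_memory` and `replicas` are hypothetical names for this example only and do not mirror the scheduler's real transition code.

```python
from __future__ import annotations

from typing import List, Optional, Set


class Task:
    """Hypothetical, heavily reduced stand-in for TaskState."""

    __slots__ = ("key", "state", "who_has", "waiting_on")

    def __init__(self, key: str) -> None:
        self.key = key
        self.state = "waiting"
        self.who_has: Optional[Set[str]] = None
        self.waiting_on: Optional[Set[str]] = None


def mark_in_memory(ts: Task, worker: str) -> None:
    ts.state = "memory"
    if ts.who_has is None:
        ts.who_has = set()
    ts.who_has.add(worker)
    # Reset to None rather than calling .clear(): an emptied set object
    # stays allocated, whereas dropping the reference lets it be freed.
    ts.waiting_on = None


def replicas(ts: Task) -> List[str]:
    # In this sketch a task in memory always has a replica, so state the
    # invariant instead of silently tolerating None with `ts.who_has or ()`.
    assert ts.state == "memory"
    assert ts.who_has is not None
    return sorted(ts.who_has)


ts = Task("x")
ts.waiting_on = {"dep-1"}
mark_in_memory(ts, "worker-a")
print(replicas(ts), ts.waiting_on)
```

The assert documents where an attribute is guaranteed to exist, which is the readability gain the review asks for.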

[skip ci] Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>

fjetter · 2023-11-09T13:08:40Z

distributed/scheduler.py

-        if self.scheduler.validate:
-            assert self not in ts.who_has
-            assert ts not in self.has_what


I changed add_replica to be idempotent so this validation is no longer valid
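As a rough illustration of what idempotent means here (a sketch with hypothetical `Worker` and `Task` classes, not the actual distributed implementation): calling `add_replica` twice for the same pair leaves the bookkeeping unchanged, so an assertion that the replica is not yet recorded would fail on the second call.

```python
from __future__ import annotations

from typing import Dict, Optional, Set


class Task:
    """Hypothetical stand-in for TaskState."""

    def __init__(self, key: str, nbytes: int) -> None:
        self.key = key
        self.nbytes = nbytes
        self.who_has: Optional[Set["Worker"]] = None


class Worker:
    """Hypothetical stand-in for WorkerState."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.nbytes = 0
        # dict used as an insertion-ordered set of tasks
        self.has_what: Dict[Task, None] = {}


def add_replica(ws: Worker, ts: Task) -> None:
    if ts in ws.has_what:
        # Already recorded: return early so a repeated call is a no-op
        return
    ws.nbytes += ts.nbytes
    ws.has_what[ts] = None
    if ts.who_has is None:
        ts.who_has = set()
    ts.who_has.add(ws)


ws, ts = Worker("w"), Task("x", nbytes=10)
add_replica(ws, ts)
add_replica(ws, ts)  # second call changes nothing
print(ws.nbytes, len(ws.has_what))
```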

fjetter · 2023-11-10T14:28:59Z

test failures are all unrelated

distributed/active_memory_manager.py

Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>

fjetter · 2023-11-10T14:45:48Z

test failures are all unrelated

kind of sad considering all the failures...

milesgranger force-pushed the milesgranger/7998-update_graph-optimize branch from 02d1844 to eb69001 Compare November 8, 2023 07:51

Move most attrs to descriptors (about 35% memorys savings)

ef4b7dd

milesgranger force-pushed the milesgranger/7998-update_graph-optimize branch 4 times, most recently from 9ed69f1 to 67a80cf Compare November 8, 2023 09:20

Use optionals for TaskState where possible (~42% reduction)

f952b61

milesgranger force-pushed the milesgranger/7998-update_graph-optimize branch from 67a80cf to f952b61 Compare November 8, 2023 11:03

milesgranger marked this pull request as ready for review November 8, 2023 11:58

milesgranger requested review from jacobtomlinson and fjetter as code owners November 8, 2023 11:58

fjetter reviewed Nov 8, 2023

milesgranger and others added 11 commits November 8, 2023 14:33

Update distributed/active_memory_manager.py

ecc0c09

[skip ci] Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>

Update distributed/shuffle/_scheduler_plugin.py

ac0e6fd

[skip ci] Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>

Update distributed/scheduler.py

b3c35d0

[skip ci] Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>

Update distributed/scheduler.py

e97fa45

[skip ci] Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>

Update distributed/scheduler.py

a880960

[skip ci] Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>

Update distributed/scheduler.py

2e6fa95

[skip ci] Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>

Make TaskState.dependents/dependencies non-optional

7df5240

Trust ts.who_has.remove is not None

8cd2b9e

active_memory_manager.py::_find_dropper early check who_has is None

35d68f7

active_memory_manager.py::ReduceReplicas::run assert ts.who_has

c847897

Assign None instead of clearing optional sets

62bf269

fjetter changed the title ~~Optimize scheduler.py::TaskState class~~ Reduce memory usage of scheduler process - Optimize scheduler.py::TaskState class Nov 9, 2023

fjetter added 2 commits November 9, 2023 13:27

review (#1)

23e9ed7

remove assert in add_replica

04aa3e4

fjetter reviewed Nov 9, 2023

fjetter added 2 commits November 9, 2023 15:54

do not call set_restrictions with None

1474cd1

fix shuffle annotation handling

485d61c

fjetter reviewed Nov 10, 2023

Update distributed/active_memory_manager.py

45447a6

Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>

fjetter merged commit 30086af into dask:main Nov 15, 2023
25 of 34 checks passed

milesgranger deleted the milesgranger/7998-update_graph-optimize branch November 15, 2023 13:24
