Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory usage of scheduler process - Optimize scheduler.py::TaskState class #8331

Merged

Conversation

milesgranger
Copy link
Contributor

@milesgranger milesgranger commented Nov 8, 2023

Part of #7998

The first approach in ef4b7dd moved things into descriptors, which would only assign a value when being accessed. This results in ~34% reduction in memory when compared to main. Downside was there is some boilerplace for each attribute requiring a lazy creation (albeit this could be generalized by prefixing each with __lazy_ for example then iterating on those annotations after initialization..)

Second approach, the current direction of this PR, uses optionals for more attributes in this class. Primarily those using set/dict. This results in ~42% reduction in memory when compared to main. Downside is much more changed code, checking for None everywhere and the tests are currently wildly broken. 😅

Snippet I'm using for this comparison to main:

from dask.datasets import timeseries
from dask.distributed import Client
from distributed.diagnostics.memray import memray_scheduler

def main():
    ddf = timeseries("2020", "2021", partition_freq='2h')
    ddf2 = ddf.shuffle(on="x", shuffle='tasks')

    print(len(ddf2.__dask_graph__()))

    with Client() as client:
        with memray_scheduler():
            fut = client.compute(ddf2.size)
            import time
            time.sleep(90)
        fut.cancel()

if __name__ == '__main__':
    main()

cc @fjetter

@milesgranger milesgranger force-pushed the milesgranger/7998-update_graph-optimize branch from 02d1844 to eb69001 Compare November 8, 2023 07:51
@milesgranger milesgranger force-pushed the milesgranger/7998-update_graph-optimize branch 4 times, most recently from 9ed69f1 to 67a80cf Compare November 8, 2023 09:20
Copy link
Contributor

github-actions bot commented Nov 8, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       27 files  ±0         27 suites  ±0   13h 41m 1s ⏱️ - 1h 13m 13s
  3 926 tests ±0    3 803 ✔️ +4     117 💤 ±0  6  - 4 
49 391 runs  ±0  46 970 ✔️ +5  2 414 💤 +1  7  - 6 

For more details on these failures, see this check.

Results for commit 45447a6. ± Comparison against base commit 0dc9e88.

♻️ This comment has been updated with latest results.

@milesgranger milesgranger force-pushed the milesgranger/7998-update_graph-optimize branch from 67a80cf to f952b61 Compare November 8, 2023 11:03
@milesgranger milesgranger marked this pull request as ready for review November 8, 2023 11:58
@milesgranger
Copy link
Contributor Author

Okay, @fjetter I think this is ready for a proper review then. Failing tests don't seem related, (ran most/all of those failing here locally fine as well).

distributed/active_memory_manager.py Outdated Show resolved Hide resolved
distributed/active_memory_manager.py Outdated Show resolved Hide resolved
distributed/active_memory_manager.py Outdated Show resolved Hide resolved
distributed/shuffle/_scheduler_plugin.py Outdated Show resolved Hide resolved
distributed/scheduler.py Outdated Show resolved Hide resolved
distributed/scheduler.py Outdated Show resolved Hide resolved
distributed/scheduler.py Outdated Show resolved Hide resolved
distributed/scheduler.py Outdated Show resolved Hide resolved
distributed/scheduler.py Outdated Show resolved Hide resolved
distributed/scheduler.py Outdated Show resolved Hide resolved
@fjetter
Copy link
Member

fjetter commented Nov 8, 2023

I think summarizing my review

  • If possible we should use assert foo is not None instead of if foo else () since it helps readability a lot. The scheduler is a messy, complex system and knowing for a fact that an attribute exists helps
  • there are a bunch of places where we're clearing existing dicts/sets. In the spirit of only keeping what's necessary, we can instead just reset them to None
  • There are a couple of very common sets that are used for every task and I think we should just keep them non-optional. This concerns primarily dependencies and dependents. They are all but guaranteed to be populated so we can initialize them right away and save ourselves a bit of complexity
  • There are a couple of sets that are going to be populated for every single task but only for a certain amount of time. This includes who_has, waiters, waiting_on. We will only save ourselves memory by delaying initialization if we ensure that the sets are actually released once those collections are no longer required (which is typically true once a task is in memory). See the above comment about clear

milesgranger and others added 11 commits November 8, 2023 14:33
[skip ci]

Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
[skip ci]

Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
[skip ci]

Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
[skip ci]

Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
[skip ci]

Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
[skip ci]

Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
@fjetter fjetter changed the title Optimize scheduler.py::TaskState class Reduce memory usage of scheduler process - Optimize scheduler.py::TaskState class Nov 9, 2023
Comment on lines -819 to -821
if self.scheduler.validate:
assert self not in ts.who_has
assert ts not in self.has_what
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed add_replica to be idempotent so this validation is no longer valid

@fjetter
Copy link
Member

fjetter commented Nov 10, 2023

test failures are all unrelated

Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
@fjetter
Copy link
Member

fjetter commented Nov 10, 2023

test failures are all unrelated

kind of sad considering all the failures...

@fjetter fjetter merged commit 30086af into dask:main Nov 15, 2023
25 of 34 checks passed
@milesgranger milesgranger deleted the milesgranger/7998-update_graph-optimize branch November 15, 2023 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants