Make Parsl stop serializing the entire Runner instance #616

Merged on Nov 11, 2021 (3 commits)

jrueb (Contributor) commented Nov 11, 2021

Parsl serializes the callables it is given and, for each chunk, keeps one copy of the serialized buffer in local memory. In Coffea's case, these callables are currently bound methods of Runner, so Parsl serializes the entire Runner instance. However, the Runner instance contains things like caches and executors, which can be user-provided and very large, and this makes the serialized buffer very large as well. In my tests I easily got buffer sizes of ~ MB, and with 10000 chunks, which is a normal number for my applications, this amounts to ~ 10 GB. All of this memory is occupied for no good reason. In some extreme cases I was even able to fill up all the memory on my machine.
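To illustrate the effect, here is a minimal sketch using the standard `pickle` module rather than Parsl's own serialization layer. The `Runner` class, its `cache` attribute and the `work` method are hypothetical stand-ins, not Coffea's actual code. Pickling a bound method pickles the instance it is bound to, so the payload grows with whatever the instance holds.

```python
import pickle


class Runner:
    """Hypothetical stand-in for a runner that holds a large cache."""

    def __init__(self, cache_bytes):
        # Simulates a large, user-provided cache stored on the instance.
        self.cache = bytes(cache_bytes)

    def work(self, chunk):
        # The work itself never touches the cache.
        return chunk * 2


runner = Runner(cache_bytes=1_000_000)

# A bound method is pickled as (instance, method name),
# so the whole instance, cache included, ends up in the payload.
payload = pickle.dumps(runner.work)
payload_size = len(payload)

# The restored method is bound to a full copy of the instance.
restored = pickle.loads(payload)
```

If one such buffer is kept per chunk, the total memory scales as the number of chunks times `payload_size`.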

This patch makes the serialized methods static, so that serialization works independently of the Runner instance. The affected methods are config, automatic_retries, metadata_fetcher and _work_function.
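The idea can be sketched as follows, again with plain `pickle` and a hypothetical `Runner`. The name `_work_function` comes from the list above, but its signature, the `scale` attribute and the `make_task` helper are assumptions made for illustration and do not reflect the actual patch. A static method is pickled by reference to its class, and only the explicitly passed arguments travel with it.

```python
import pickle
from functools import partial


class Runner:
    """Hypothetical stand-in; not Coffea's actual Runner."""

    def __init__(self, cache_bytes, scale):
        self.cache = bytes(cache_bytes)  # large state that should stay local
        self.scale = scale               # small value the work actually needs

    @staticmethod
    def _work_function(chunk, scale):
        # No access to self: everything needed arrives as an argument.
        return chunk * scale

    def make_task(self):
        # Hand over only the small values the task needs, not the instance.
        return partial(Runner._work_function, scale=self.scale)


runner = Runner(cache_bytes=1_000_000, scale=2)

# The static function is pickled by name, so the cache is not included.
payload = pickle.dumps(runner.make_task())
payload_size = len(payload)

task = pickle.loads(payload)
result = task(3)
```

With this layout the serialized buffer stays small regardless of how much state the instance accumulates.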

lgray (Collaborator) commented Nov 11, 2021

Thanks for catching this!

lgray merged commit 183cbe4 into CoffeaTeam:master on Nov 11, 2021