audit and regression-test pickle/dill serialization of parsl internals #3491

Open
benclifford opened this issue Jun 20, 2024 · 0 comments

Is your feature request related to a problem? Please describe.

When Parsl sends an app to be executed in another process (such as a high throughput executor worker), it creates a callable and serializes it using, by default, Parsl's DillCallableSerializer class.

That class in turn relies on dill and pickle serialization.

In some cases, basic pickle serialization of functions or partials can be used: only a reference to the callable object is sent, along with its parameters (which might themselves be callables). Roughly, this happens when the callable has __module__ and __name__ defined and looking up that name in that module gives back the same callable object.
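As a minimal sketch of the by-reference case, using only the standard library (json.dumps is just a convenient stand-in for an importable function, not anything Parsl-specific):

```python
import json
import pickle
from functools import partial

# json.dumps can be found again as json.dumps, so pickle stores only
# the module name and the function name, not the function body.
payload = pickle.dumps(json.dumps)
assert b"json" in payload and b"dumps" in payload
assert pickle.loads(payload) is json.dumps

# A partial is pickled as a reference to the callable plus its arguments.
compact = partial(json.dumps, sort_keys=True)
restored = pickle.loads(pickle.dumps(compact))
assert restored.func is json.dumps
assert restored({"b": 1, "a": 2}) == '{"a": 2, "b": 1}'
```

The payload here is a few dozen bytes because it contains names rather than code.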

In other cases, those conditions do not hold, and dill produces a much larger serialization containing the definition of the callable, which can often bring along a lot of weight. For example, PR #3491 makes a relatively small change to the definition of bash_apps that reduces a simple bash app serialization from 6940 bytes to 2305 bytes, by switching remote_side_bash_executor from "send the whole definition of remote_side_bash_executor" to "send a reference to remote_side_bash_executor".

The conditions under which each method is used are quite subtle, and it is easy to make a code change that switches from one to the other without realising it: the point of dill choosing for you is that most of the time you don't notice.
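One illustrative way the condition can break without anyone noticing is decoration. The sketch below uses plain pickle as a stand-in (for ordinary functions it only supports by-reference serialization, so success means "by reference"). The module demo_mod, the logged decorator and the pickles_by_reference helper are all hypothetical names made up for this example.

```python
import pickle
import sys
import types

# Build a throwaway module so the example does not depend on how this
# snippet itself is run. "demo_mod" and its contents are hypothetical.
source = '''
import functools

def helper(x):
    return x + 1

def logged(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

@logged
def decorated(x):
    return x + 1
'''
demo_mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo_mod
exec(source, demo_mod.__dict__)


def pickles_by_reference(fn):
    """Return True if plain pickle accepts fn, i.e. it resolves by name."""
    try:
        pickle.dumps(fn)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# An undecorated module-level function resolves to itself.
plain_ok = pickles_by_reference(demo_mod.helper)
# The wrapper is what demo_mod.decorated now names, so it resolves too.
wrapper_ok = pickles_by_reference(demo_mod.decorated)
# The original function still claims to be demo_mod.decorated, but that
# name now points at the wrapper, so the identity check fails.
inner_ok = pickles_by_reference(demo_mod.decorated.__wrapped__)
```

Here plain_ok and wrapper_ok are True while inner_ok is False. A serializer that falls back to sending the definition would handle the last case without complaint, which is how a switch like this can go unnoticed.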

Describe the solution you'd like

Audit serialized function definitions to find where parsl internals are being sent as full definitions (as remote_side_bash_executor is before PR #3491), and try to have them sent as pickle-style references instead.

Implement some kind of regression testing so that the test suite detects when a future change accidentally switches an internal piece of parsl from one serialization form to the other.
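A rough sketch of what such a check might look like, under some loud assumptions: it uses plain pickle rather than Parsl's serializer, assert_sent_by_reference and the max_bytes threshold are invented for illustration, and json.dumps stands in for whichever Parsl internals would actually be listed in a real test.

```python
import json
import pickle
import pickletools


def assert_sent_by_reference(fn, max_bytes=200):
    """Fail if fn cannot be pickled as a small module/name reference.

    max_bytes is an arbitrary illustrative threshold, not a Parsl value.
    """
    payload = pickle.dumps(fn)  # raises if fn is not importable by name
    opcodes = {op.name for op, _arg, _pos in pickletools.genops(payload)}
    assert opcodes & {"GLOBAL", "STACK_GLOBAL"}, f"{fn!r}: no global reference"
    assert len(payload) <= max_bytes, f"{fn!r}: {len(payload)} bytes"
    return len(payload)


def make_local():
    # A function defined inside another function cannot be looked up by
    # module and name, so it can never be sent as a reference.
    def local_fn():
        return None
    return local_fn


# An importable function passes the check.
size = assert_sent_by_reference(json.dumps)

# A non-importable function is detected as a failure.
try:
    assert_sent_by_reference(make_local())
except (pickle.PicklingError, AttributeError):
    detected = True
else:
    detected = False
```

A test in the Parsl suite would more likely serialize with the configured serializer and compare payload sizes or contents, but the idea is the same: pin down which internals are expected to travel as references and fail loudly when one stops doing so.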
