Is your feature request related to a problem? Please describe.
When Parsl sends an app to be executed in another process (such as a high throughput executor worker), it creates a callable and serializes it using, by default, Parsl's `DillCallableSerializer` class, which in turn relies on dill and pickle serialization.
In some cases, basic pickle serialization of functions or partials can be used: only a reference to the callable is sent, along with any parameters (which might themselves be callables). Roughly, this happens when the callable has `__module__` and `__name__` defined and looking up that name in that module yields the same callable.
In other cases, those conditions do not hold, and dill produces a much larger serialization containing the definition of the callable, which can often bring along a lot of weight. For example, PR #3491 makes a relatively small change to the definition of bash_apps and reduces a simple bash app serialization from 6940 bytes to 2305 bytes by switching `remote_side_bash_executor` from "send the whole definition of `remote_side_bash_executor`" to "send a reference to `remote_side_bash_executor`".
The situations in which each of these two methods is used are quite subtle, and it is easy to make a code change that switches from one to the other without realising it: the point of dill choosing for you is that most of the time you don't notice.
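To make the by-reference condition concrete, here is a minimal sketch using only the standard library `pickle` (not dill or Parsl). The helper `resolves_to_itself` and the function `make_nested` are hypothetical names invented for this illustration; the check is only an approximation of what pickle does internally.

```python
import importlib
import json
import pickle


def resolves_to_itself(fn):
    """Approximate the by-reference condition: does module.name give back fn?"""
    module_name = getattr(fn, "__module__", None)
    qualname = getattr(fn, "__qualname__", None) or getattr(fn, "__name__", None)
    if not module_name or not qualname:
        return False
    try:
        obj = importlib.import_module(module_name)
        for part in qualname.split("."):
            obj = getattr(obj, part)
    except (ImportError, AttributeError):
        return False
    return obj is fn


def make_nested():
    # The inner function cannot be found by name from its module,
    # so plain pickle cannot send it as a reference.
    def nested(x):
        return x + 1
    return nested


# A module-level function pickles as a short reference to "json.dumps".
by_reference = pickle.dumps(json.dumps)
print(resolves_to_itself(json.dumps), len(by_reference))

# A nested function fails the lookup; plain pickle refuses it outright.
try:
    pickle.dumps(make_nested())
    nested_picklable = True
except (pickle.PicklingError, AttributeError):
    nested_picklable = False
print(resolves_to_itself(make_nested()), nested_picklable)
```

Where plain pickle raises an error for the nested function, dill instead falls back to serializing the definition, which is the larger form described above.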
Describe the solution you'd like
Audit serialized function definitions to see where internals of Parsl are being sent as full definitions (as `remote_side_bash_executor` was before PR #3491) and try to make them be sent as pickle-style references.
Implement some kind of regression testing so that the test suite detects when a future change accidentally switches an internal piece of Parsl from one serialization form to the other.
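One possible shape for such a regression check is a size budget on the serialized payload, since the two forms differ a lot in size (6940 vs 2305 bytes in the example above). This is only a sketch: `assert_serialized_within` is a hypothetical helper, the 200 byte budget is arbitrary, and the `serialize` parameter defaults to stdlib `pickle.dumps` as a stand-in for whatever serializer Parsl actually uses.

```python
import json
import pickle


def assert_serialized_within(fn, max_bytes, serialize=pickle.dumps):
    """Fail if serializing fn produces more than max_bytes bytes.

    serialize is any callable mapping an object to bytes; pickle.dumps
    is an assumed stand-in for the real serializer under test.
    """
    size = len(serialize(fn))
    if size > max_bytes:
        raise AssertionError(
            f"{fn!r} serialized to {size} bytes (budget {max_bytes}); "
            "it may now be sent by definition rather than by reference"
        )
    return size


# A module-level function sent by reference stays well under the budget.
size = assert_serialized_within(json.dumps, max_bytes=200)
print(size)
```

A test suite could call this for each internal callable that is expected to travel by reference, with a budget chosen from the currently observed size plus some slack.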