Compress serialized config data with zlib #4465
base: main
Conversation
Thanks @jacobtomlinson -- do you think we need to handle cases where zlib is not installed? |
@quasiben I was under the impression that zlib is part of the Python standard library? |
@jacobtomlinson, apologies. It is in the standard library! |
Maybe we should consider some more general purpose solution to this? Maybe we should write things into a file and use the config system? |
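For context, a minimal sketch of that file-based idea, assuming the VM's boot script writes the forwarded config into one of the YAML locations Dask's config system already searches (such as ~/.config/dask/) so it is picked up on import. The file name and payload here are illustrative, not part of this PR:

```python
# Hypothetical boot-time step on the new VM: write the forwarded config
# into Dask's default config directory so dask.config loads it on import.
import os

import yaml

# Example payload; in practice this would be the config forwarded from
# the machine that launched the cluster.
forwarded_config = {"distributed": {"logging": {"distributed": "info"}}}

config_dir = os.path.expanduser("~/.config/dask")  # a default Dask config search path
os.makedirs(config_dir, exist_ok=True)
with open(os.path.join(config_dir, "forwarded.yaml"), "w") as f:
    yaml.dump(forwarded_config, f)
```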
I agree there is probably a better way to pass this config along; this PR is just treating a symptom. Passing a file as part of VM creation on any cloud is non-trivial, though. I could imagine passing it through some storage like S3, but that adds complexity. |
Coming from here (thanks @quasiben), I can confirm that this PR solved my issue; at least on this side of things there is no more error. However, my machine is stuck at ... Edit: so I did a bad thing; I launched the AMI created by Packer, edited every distributed/utils.py file in the conda envs as in the merge changes, and created a new AMI from that image. I still get the same behavior as above, unfortunately. |
@ZirconCode glad this PR got you to the next error. Could you raise an issue on dask-cloudprovider about this? |
I was reminded of this issue by https://stackoverflow.com/questions/75815578/starting-ec2cluster-with-dask-cloudprovider . I can't think of any reason NOT to do this, even though one would think crypto data is not very compressible. |
While working on dask-cloudprovider I've noticed there is a limit to the amount of user_data you can pass to AWS. Among other things, a serialized copy of the local Dask config is passed via the user_data, and depending on how much config a user has, the AWS API may reject the call. See dask/dask-cloudprovider#249.
In an attempt to mitigate this I've added zlib compression to the config serialization utility methods. From the limited testing I've done locally, I've seen a ~60% reduction in size.
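For illustration, here is a minimal sketch of this kind of pipeline: YAML-dump the config, zlib-compress it, then base64-encode it so the result is plain ASCII that can be embedded in user_data. The function names and exact composition are assumptions for illustration, not necessarily the code in this PR:

```python
import base64
import zlib

import yaml  # PyYAML; already a Dask dependency


def serialize_config(data: dict) -> str:
    # Dump to YAML, compress with zlib, then base64-encode so the
    # result is an ASCII string safe to embed in EC2 user_data.
    return base64.urlsafe_b64encode(zlib.compress(yaml.dump(data).encode())).decode()


def deserialize_config(blob: str) -> dict:
    # Reverse the steps: base64-decode, decompress, parse the YAML.
    return yaml.safe_load(zlib.decompress(base64.urlsafe_b64decode(blob.encode())).decode())


# Round-trip check on an example config:
cfg = {"distributed": {"comm": {"timeouts": {"connect": "30s"}}}}
blob = serialize_config(cfg)
assert deserialize_config(blob) == cfg
```

Note that base64 inflates the payload by roughly a third, so the net saving depends on how compressible the YAML is; for very small configs the compressed blob can even come out larger than the input.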