
Serialization family to preserve headers of the underlying dumps functions. #5380

Merged
merged 18 commits into dask:main from preserve_dump_header on Nov 5, 2021

Conversation

@madsbk (Contributor) commented Oct 1, 2021

Currently, the dask and cuda serialization families overwrite the headers of the underlying dumps functions. For instance, cuda_dumps overwrites "type-serialized", which is a problem when Dask uses pickle5 while the underlying loads function uses pickle protocol=4. Besides, overwriting an underlying protocol's header is bad style.

This PR fixes the issue.
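
For context, a minimal sketch of the pattern described above, assuming the family-level dumps/loads pair in distributed/protocol/serialize.py: the type-specific header is kept intact under a "sub-header" key rather than merged into the family header, and the family-level loads hands it back unchanged. This is illustrative only, not the exact code merged in this PR.

# Illustrative sketch of the "sub-header" pattern; not the merged implementation.
import pickle  # stdlib pickle used here purely for illustration

from distributed.protocol.serialize import dask_serialize, dask_deserialize


def dask_dumps(x):
    # Dispatch to the type-specific dumps (e.g. the numpy serializer).
    dumps = dask_serialize.dispatch(type(x))
    sub_header, frames = dumps(x)
    # Keep the underlying header untouched under "sub-header" instead of
    # merging it, so keys such as "type-serialized" are never clobbered.
    header = {
        "sub-header": sub_header,
        "type-serialized": pickle.dumps(type(x)),
        "serializer": "dask",
    }
    return header, frames


def dask_loads(header, frames):
    typ = pickle.loads(header["type-serialized"])
    loads = dask_deserialize.dispatch(typ)
    # Hand the underlying loads its own, unmodified header.
    return loads(header["sub-header"], frames)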

@madsbk (Contributor, Author) commented Oct 1, 2021

Note that I have changed the test test_compression_numpy_list() to better reflect what (I think) we want to support.

@madsbk madsbk marked this pull request as ready for review October 1, 2021 14:36
@jrbourbeau (Member) left a comment

Thanks @madsbk! It looks like there's another spot we need to insert ["sub-headers"]

____________________________ test_deserialize_grad _____________________________

    def test_deserialize_grad():
        a = np.random.rand(8, 1)
        t = torch.tensor(a, requires_grad=True, dtype=torch.float)
>       t2 = deserialize(*serialize(t))

distributed/protocol/tests/test_torch.py:48: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
distributed/protocol/serialize.py:410: in deserialize
    return loads(header, frames)
distributed/protocol/serialize.py:50: in dask_loads
    return loads(header["sub-header"], frames)
distributed/protocol/torch.py:36: in deserialize_torch_Tensor
    x = dask_deserialize.dispatch(np.ndarray)(header, frames)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

header = {'device': 'cpu', 'requires_grad': True, 'serializer': 'dask', 'sub-header': {'dtype': (0, '<f4'), 'shape': (8, 1), 'strides': (4, 4), 'writeable': [True]}, ...}
frames = [<memory at 0x13abbdb80>]

    @dask_deserialize.register(np.ndarray)
    def deserialize_numpy_ndarray(header, frames):
        with log_errors():
            if header.get("pickle"):
                return pickle.loads(frames[0], buffers=frames[1:])
    
            (frame,) = frames
>           (writeable,) = header["writeable"]
E           KeyError: 'writeable'

distributed/protocol/numpy.py:116: KeyError

@jakirkham would you have time to take a look at this PR?

Review thread on distributed/protocol/serialize.py (outdated, resolved)
@madsbk (Contributor, Author) commented Oct 5, 2021

Thanks @madsbk! It looks like there's another spot we need to insert ["sub-headers"]

Thanks, the torch protocol now also uses a sub-header
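
For illustration, here is a hedged sketch of what "also uses a sub-header" might look like on the torch side, based on the traceback above: the numpy header is nested under "sub-header" and unwrapped again before dispatching to the ndarray deserializer. Names mirror distributed/protocol/torch.py, but the details (CPU-only handling, gradient flag) are assumptions, not the merged code.

# Hedged sketch of torch sub-header handling; not the exact code from this PR.
import numpy as np
import torch

from distributed.protocol.serialize import dask_serialize, dask_deserialize


@dask_serialize.register(torch.Tensor)
def serialize_torch_Tensor(t):
    # Serialize the underlying ndarray with the numpy serializer (CPU tensors
    # only, for simplicity) and keep its header intact under "sub-header".
    sub_header, frames = dask_serialize.dispatch(np.ndarray)(t.detach().numpy())
    header = {
        "sub-header": sub_header,
        "requires_grad": t.requires_grad,
        "device": t.device.type,
    }
    return header, frames


@dask_deserialize.register(torch.Tensor)
def deserialize_torch_Tensor(header, frames):
    # Forward only the nested numpy header, so deserialize_numpy_ndarray
    # finds keys such as "writeable" where it expects them.
    x = dask_deserialize.dispatch(np.ndarray)(header["sub-header"], frames)
    t = torch.from_numpy(x)
    if header["requires_grad"]:
        t = t.requires_grad_(True)
    return t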

@jakirkham (Member) commented

@jakirkham would you have time to take a look at this PR?

@jrbourbeau, were you just wanting feedback on that test failure or was there something else you were looking for?

@madsbk (Contributor, Author) commented Oct 5, 2021

Hmm, distributed/protocol/tests/test_torch.py::test_resnet is still failing. Will look at it tomorrow.

@madsbk (Contributor, Author) commented Oct 6, 2021

As far as I can see, the CI errors aren't related to this PR.

FAILED distributed/tests/test_stress.py::test_stress_creation_and_deletion - ...
FAILED distributed/comm/tests/test_ws.py::test_collections - tornado.util.Tim..
FAILED distributed/deploy/tests/test_adaptive.py::test_adaptive_local_cluster_multi_workers
FAILED distributed/tests/test_asyncprocess.py::test_exit_callback 

@madsbk (Contributor, Author) commented Oct 6, 2021

@jrbourbeau, I think this is ready to be merged.

@jakirkham (Member) commented

@jrbourbeau was there anything else we needed to do here or is this good to go?

@madsbk (Contributor, Author) commented Nov 4, 2021

@jrbourbeau it would be good to get this merged; it is blocking Dask-CUDA: rapidsai/dask-cuda#746

@jakirkham (Member) commented

Going to merge to unblock Dask-CUDA. The PyTorch issue noted originally has been resolved.

Most of the failures here seem to be related to Distributed's concurrent.futures support and are unrelated to this PR. There is one failure with test_spill_to_disk, but it only happens in one job. We might want to keep an eye on that, though.

@jakirkham jakirkham merged commit 4b2d1f2 into dask:main Nov 5, 2021
@madsbk madsbk deleted the preserve_dump_header branch November 11, 2021 07:35