What
docs/source/protocol.rst lists, as an advantage of the scheduler dealing in MsgPack, that "The Scheduler is protected from unpickling unsafe code," and says a pickled function sent to the scheduler will not be unpacked but kept as bytes.
This is inaccurate. The scheduler does unpickle __Pickled__ / ToPickle-wrapped frames during frame decode in distributed/protocol/core.py::loads(). The deserialize=False flag gates only the __Serialized__ branch; the __Pickled__ branch calls pickle.loads() unconditionally.
This is by design. The client wraps control-plane fields (code, annotations, span_metadata) in ToPickle on update-graph, and the scheduler must unpack them. The ToPickle docstring states it directly: both the scheduler and workers automatically unpickle the object on arrival.
So the docs contradict the code and the docstring. deserialize=False protects the Serialized (forwarded task data) path, not the Pickled (control-plane) path.
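To make the asymmetry concrete, here is a minimal toy model of the two branches. It is not the code in distributed/protocol/core.py: the function name toy_loads, the message layout (a dict mapping a marker key to payload bytes), and the CallsOnLoad class are all assumptions made for illustration. It only sketches the behavior described above, where deserialize=False gates one branch and not the other.

```python
import pickle


class CallsOnLoad:
    """Hypothetical payload: unpickling it calls a callable chosen by the sender."""

    def __reduce__(self):
        # The unpickler will call int("42"). A hostile sender could name
        # any importable callable here instead of a harmless builtin.
        return (int, ("42",))


def toy_loads(msg, deserialize=True):
    """Toy decoder (assumed shape, not the real loads())."""
    if "__Pickled__" in msg:
        # Control-plane branch: unpickled regardless of `deserialize`.
        return pickle.loads(msg["__Pickled__"])
    if "__Serialized__" in msg:
        if not deserialize:
            # Forwarded task data: kept as opaque bytes.
            return msg["__Serialized__"]
        return pickle.loads(msg["__Serialized__"])
    return msg


blob = pickle.dumps(CallsOnLoad())

kept = toy_loads({"__Serialized__": blob}, deserialize=False)
ran = toy_loads({"__Pickled__": blob}, deserialize=False)

print(type(kept).__name__)  # the Serialized path hands back the raw bytes
print(ran)                  # the Pickled path has already run the callable
```

With the same bytes and the same deserialize=False flag, only the Serialized path leaves the payload untouched. That is the distinction the current protocol.rst wording glosses over.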
Why it matters
The claim is more than cosmetic given the default posture: the scheduler binds 0.0.0.0:8786, distributed.comm.require-encryption defaults to null, and there is no shared secret. An operator relying on the documented "protected from unpickling" guarantee is misled about the actual trust boundary.
Suggested fix
Replace the "protected from unpickling unsafe code" claim with a security note rather than deleting it silently. For example, state that the scheduler unpickles control-plane (ToPickle) frames during decode, that access to the scheduler port must therefore be treated as trusted, and that network-level controls (and TLS via require-encryption) are the recommended mitigation. This mirrors the established threat model (cf. the withdrawn CVE-2024-10096).
Context
A Sonar maintainer reviewing a private GitHub Security Advisory confirmed the behavior is intended and that the protocol.rst line is a documentation bug, and asked that this issue be filed. Happy to open a PR with the corrected wording.