Expose Multi-Segment / Scatter-Gather Payload APIs in zenoh-python #730

@BigTailFox

Describe the feature

Motivation

Many high-performance serialization frameworks expose messages as multiple independent memory segments instead of a single contiguous buffer.

Examples include:

  • Cap'n Proto (to_segments() / from_segments())
  • Apache Arrow
  • FlatBuffers (advanced builders)
  • Custom shared-memory allocators

Currently, zenoh-python primarily exposes payloads as a single bytes-like object. As a result, applications often need to:

  1. Serialize into multiple segments
  2. Copy and concatenate those segments into one contiguous buffer
  3. Publish through zenoh
  4. Potentially split them again on the receiving side

This introduces unnecessary memory copies and allocation overhead, especially for large messages and high-frequency data streams.
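The copy in step 2 can be observed with plain Python objects. This is a minimal sketch using only the standard library; the `segments` list is a hypothetical stand-in for serializer output, and no zenoh call is involved.

```python
# Hypothetical stand-ins for serializer segments; sizes are arbitrary.
segments = [bytearray(b"a" * 4), bytearray(b"b" * 8), bytearray(b"c" * 2)]

# Step 2: joining allocates one new buffer and copies every segment into it.
contiguous = b"".join(segments)

# Step 4: the receiver can split by slicing a memoryview (no further copy),
# but only if the segment lengths are known out of band.
lengths = [len(s) for s in segments]
view = memoryview(contiguous)
received = []
offset = 0
for n in lengths:
    received.append(view[offset:offset + n])
    offset += n

# Mutating a source segment does not affect the joined buffer,
# which confirms that the join made a copy.
segments[0][0] = ord("z")
join_made_a_copy = contiguous[0] == ord("a")
```

For a payload of several megabytes, that join touches every byte once more on the publishing side before anything is handed to the transport.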

For robotics, simulation, perception, and ML workloads, payloads can easily reach several megabytes per sample, making these copies a significant bottleneck.

Proposed API

Expose a scatter-gather / multi-segment payload interface in zenoh-python.

Publisher Side

Example:

segments = msg.to_segments()

payload = zenoh.ZBytes.from_segments(
    memoryview(seg) for seg in segments
)

pub.put(payload)

or

pub.put_segments([
    memoryview(seg0),
    memoryview(seg1),
    memoryview(seg2),
])

Subscriber Side

Example:

def callback(sample):
    segments = sample.payload.segments()

    msg = MyCapnpType.from_segments(segments)

where each returned segment is exposed as a zero-copy Python buffer (memoryview or equivalent).

Benefits

  • Zero-copy integration
    Enables efficient integration with Cap'n Proto, Apache Arrow, shared-memory allocators, and custom serialization frameworks.

  • Reduced memory bandwidth
    Avoids unnecessary buffer concatenation and splitting.

  • Better SHM utilization
    Fits naturally with zenoh shared-memory transports where payloads may already be represented as multiple buffers internally.

  • Alignment with Rust APIs
    The Rust implementation already contains abstractions such as ZBytes and slice-based payload handling.
    Exposing similar capabilities in Python would allow advanced users to leverage the same performance characteristics.

Lifetime and Ownership Considerations

One important aspect of this feature is the lifetime relationship between Python buffer objects and the underlying zenoh payload.

For publisher-side APIs, it should be explicit whether ZBytes.from_segments(...):

  1. Copies the input buffers immediately, or
  2. Retains references to the Python buffer owners until the payload is no longer needed by zenoh.

For zero-copy behavior, option 2 is preferred, but the API must guarantee that the referenced Python objects remain alive for the full lifetime of the constructed ZBytes object and any asynchronous send operation using it.

For example:

segments = msg.to_segments()
payload = zenoh.ZBytes.from_segments(segments)
pub.put(payload)

In this case, payload should keep the original segment owners alive until zenoh no longer needs the data.
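The retention behavior of option 2 can be sketched in pure Python. `SegmentedPayload` and `Segment` below are hypothetical names used only for illustration; they are not zenoh types, and a real binding would hold these references on the native side.

```python
import gc
import weakref


class Segment(bytearray):
    """Hypothetical segment owner; subclassing makes it weak-referenceable."""


class SegmentedPayload:
    """Hypothetical stand-in for a multi-segment payload (not a zenoh type)."""

    def __init__(self, buffers):
        # Strong references to the buffer-exporting objects.
        self._owners = tuple(buffers)
        # Zero-copy views; each memoryview also pins its exporter.
        self._views = tuple(memoryview(o) for o in self._owners)

    def segments(self):
        return self._views


seg = Segment(b"hello")
probe = weakref.ref(seg)          # lets us observe when the owner is freed

payload = SegmentedPayload([seg])
del seg                           # the caller drops its own reference
gc.collect()
alive_while_payload_exists = probe() is not None

first = bytes(payload.segments()[0])

del payload                       # the payload is no longer needed
gc.collect()
released_after_payload = probe() is None
```

The caller can drop its own reference right after constructing the payload, and the segment is freed only once the payload itself goes away.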

Similarly, for subscriber-side APIs, if sample.payload.segments() returns memoryview objects, those memoryviews must keep the underlying payload alive for as long as the memoryviews are accessible.

A safe design could follow these principles:

  • ZBytes.from_segments(...) stores strong references to the Python buffer-exporting objects.
  • The returned ZBytes owns or references those Python objects for its full lifetime.
  • pub.put(payload) must either synchronously complete the necessary handoff or retain the payload internally until transmission is complete.
  • sample.payload.segments() returns memoryviews whose base object keeps the Sample / ZBytes payload alive.
  • The API should clearly document whether the returned views are valid only inside the callback or can outlive it.
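For the subscriber side, standard `memoryview` semantics already provide most of the needed behavior. The sketch below uses a `bytearray` as a hypothetical stand-in for a received payload; it only illustrates how views pin their base object and how an explicit release ends their validity.

```python
# Hypothetical stand-in for a received payload; not a zenoh object.
payload = bytearray(b"segment-0segment-1")

base = memoryview(payload)
views = [base[0:9], base[9:18]]   # zero-copy slices over the same storage

# Every view records the object it was created from.
same_base = all(v.obj is payload for v in views)

# While views exist, the exporter refuses operations that would move its storage.
try:
    payload.extend(b"!")
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

contents = [bytes(v) for v in views]

# Ending validity explicitly, e.g. when a callback returns.
for v in views:
    v.release()
base.release()

try:
    views[0][0]
    usable_after_release = True
except ValueError:
    usable_after_release = False

payload.extend(b"!")              # allowed again once no views remain
```

A binding that wants views to be valid only inside the callback could release them on return; one that allows views to outlive the callback would leave them unreleased and rely on the base reference instead.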

This is especially important for integration with Python buffer protocol objects such as:

memoryview
bytes
bytearray
mmap
numpy.ndarray
torch.Tensor on CPU
Cap'n Proto segment buffers
shared-memory-backed arrays

Without clear ownership and lifetime guarantees, exposing a scatter-gather zero-copy API could lead to use-after-free bugs or hidden copies that defeat the purpose of the feature.
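One way to treat these object types uniformly is to go through `memoryview`, which accepts any buffer-protocol exporter. The helper `as_byte_view` below is a hypothetical name, and the sketch sticks to standard-library exporters; `numpy.ndarray` and CPU tensors would take the same path provided they are C-contiguous.

```python
import array
import mmap


def as_byte_view(obj):
    """Hypothetical helper: flat unsigned-byte view of a buffer, without copying."""
    view = memoryview(obj)
    if not view.c_contiguous:
        # A strided buffer cannot be handed over as one segment without a copy.
        raise ValueError("segment must be C-contiguous")
    return view.cast("B")


shared = mmap.mmap(-1, 8)         # anonymous mapping, stand-in for shared memory
shared[:4] = b"mmap"

sources = [
    b"bytes",
    bytearray(b"bytearray"),
    array.array("d", [1.0, 2.0]),
    shared,
]
views = [as_byte_view(s) for s in sources]
sizes = [v.nbytes for v in views]

# No copy was made: a write through the source is visible through the view.
sources[1][0] = ord("B")
write_visible = views[1][0] == ord("B")
```

Rejecting non-contiguous input, rather than silently copying it, keeps hidden copies out of an API whose purpose is to avoid them.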

Additional Consideration

The proposal is not intended to define transport-level multipart semantics.

Applications that need segment boundaries preserved end to end can still encode framing information in their payload format.

The goal is simply to expose payloads as multiple memory segments when possible, allowing applications to avoid unnecessary copies.
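As an illustration of application-level framing, a small header can list the segment count and lengths so that the receiver can rebuild the boundaries from a contiguous buffer. The header layout and the names `frame` and `split` below are assumptions made for this sketch, not part of any zenoh or Cap'n Proto format.

```python
import struct


def frame(segments):
    """Pack a hypothetical header: u32 segment count, then one u64 length each."""
    lengths = [memoryview(s).nbytes for s in segments]
    return struct.pack(f"<I{len(lengths)}Q", len(lengths), *lengths)


def split(buffer):
    """Recover zero-copy segment views from a header followed by the body."""
    view = memoryview(buffer)
    (count,) = struct.unpack_from("<I", view, 0)
    lengths = struct.unpack_from(f"<{count}Q", view, 4)
    offset = 4 + 8 * count
    out = []
    for n in lengths:
        out.append(view[offset:offset + n])
        offset += n
    return out


segments = [b"alpha", b"be", b"gamma!"]
header = frame(segments)

# The join stands in for whatever the transport does with the segments;
# with a multi-segment API, the header could be sent as one more segment.
wire = header + b"".join(segments)
recovered = [bytes(v) for v in split(wire)]
```

With framing carried in the payload, the receiver does not depend on the transport preserving segment boundaries.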

Use Case

One concrete example is Cap'n Proto:

Current path:

Capnp Segments
    ↓
Concatenate
    ↓
Python bytes
    ↓
zenoh.put()

Desired path:

Capnp Segments
    ↓
ZBytes / Segments
    ↓
zenoh.put()

and on the receiving side:

zenoh Payload Segments
    ↓
Capnp.from_segments()

so that no application-level copy is needed between the serializer and the deserializer.
