WireTransferable result batches. #18909

Merged
gianm merged 8 commits into apache:master from gianm:msq-wirexfr
Jan 14, 2026

Contributor

@gianm commented Jan 12, 2026

This patch introduces a "WireTransferable" interface that is used to serialize batches of results represented by RowsAndColumns objects. This allows query implementations to control the exact manner in which their result batches are modeled, serialized, and deserialized, which enables greater efficiency.

For example, this would allow an MSQ implementation of TopN (which does not currently exist) to use a data structure between leaf and merge stages that matches what it uses in its native implementation. It would also allow this data structure to be serialized efficiently, without needing to use JSON/Smile.

WireTransferable is used in two places:

  1. In ObjectMappers, when they read or write RowsAndColumns objects.

  2. In MSQ channels, when frames or other batches are read or written to disk or network.

Prior to this patch, the ObjectMapper path only handled FrameRowsAndColumns, and the channel path only handled Frames, in both cases using a Frame-specific serialization format. These Frame-specific serialization formats are still used by default when the result batches are Frames. The new FrameWireTransferable format can be enabled in both places by setting
druid.serde.rac.useLegacyFrameSerialization = false.

As part of this change, the ReadableFrameChannels and WritableFrameChannels used by MSQ are generalized to allow transmitting any RowsAndColumns, not just Frames. When Frames are transmitted, they are enclosed in a thin FrameRowsAndColumns wrapper. In the interest of minimizing the size of the diff, the names of the classes are not changed.
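
To illustrate the idea of letting each batch type own its wire format, here is a minimal sketch of a type-tagged envelope with a deserializer registry. Everything in it is an assumption made for illustration: `TransferableBatch`, `LongBatch`, `WireCodec`, and the byte layout are hypothetical and are not the interface or format this patch adds.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the Druid codebase.
interface TransferableBatch {
  String typeId();   // identifies which deserializer can read the payload
  byte[] toBytes();  // batch-specific binary payload, no JSON/Smile involved
}

// Example batch type that chooses its own compact layout: count, then values.
final class LongBatch implements TransferableBatch {
  static final String TYPE = "longs";
  final long[] values;

  LongBatch(long[] values) { this.values = values; }

  public String typeId() { return TYPE; }

  public byte[] toBytes() {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.length * Long.BYTES);
    buf.putInt(values.length);
    for (long v : values) {
      buf.putLong(v);
    }
    return buf.array();
  }

  static LongBatch fromBytes(ByteBuffer buf) {
    long[] values = new long[buf.getInt()];
    for (int i = 0; i < values.length; i++) {
      values[i] = buf.getLong();
    }
    return new LongBatch(values);
  }
}

// Envelope layout: [type length][type id as UTF-8][payload].
final class WireCodec {
  private final Map<String, Function<ByteBuffer, TransferableBatch>> deserializers = new HashMap<>();

  void register(String typeId, Function<ByteBuffer, TransferableBatch> deserializer) {
    deserializers.put(typeId, deserializer);
  }

  byte[] write(TransferableBatch batch) {
    byte[] type = batch.typeId().getBytes(StandardCharsets.UTF_8);
    byte[] payload = batch.toBytes();
    return ByteBuffer.allocate(Integer.BYTES + type.length + payload.length)
        .putInt(type.length)
        .put(type)
        .put(payload)
        .array();
  }

  TransferableBatch read(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    byte[] type = new byte[buf.getInt()];
    buf.get(type);
    String typeId = new String(type, StandardCharsets.UTF_8);
    Function<ByteBuffer, TransferableBatch> deserializer = deserializers.get(typeId);
    if (deserializer == null) {
      throw new IllegalArgumentException("No deserializer registered for type: " + typeId);
    }
    // Hand the deserializer a view that starts at the payload.
    return deserializer.apply(buf.slice());
  }
}
```

In this sketch, a caller registers `LongBatch::fromBytes` under `LongBatch.TYPE` once, and `read` then dispatches on the type id stored in the envelope, so the same path can carry any registered batch type.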

github-actions bot added the labels Area - Batch Ingestion, Area - Querying, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) on Jan 12, 2026

/**
 * Deserializer for frames.
 */
public static class Deserializer implements WireTransferable.Deserializer

Code scanning / CodeQL notice: Class has same name as super class

Deserializer has the same name as its supertype org.apache.druid.query.rowsandcols.semantic.WireTransferable$Deserializer.

private final int offset;
private final int length;

public ByteArrayOffsetAndLen(byte[] array, int offset, int length)
Member

What is the use case for offset? AFAICT it is always set to 0, but I'm trying to wrap my head around what it would mean, or how it would be used, if it were set to non-zero. I guess packing multiple WireTransferable objects into the same blob?

Contributor Author

It's meant to be something that can be created from an on-heap ByteBuffer without doing any copies, so it supports offset/length in the same way a ByteBuffer supports position/limit, although those aren't currently used.
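
As a rough illustration of the no-copy idea described above, the sketch below wraps the readable region of an on-heap `ByteBuffer` as an (array, offset, length) triple. `ByteSlice` is a hypothetical stand-in for the class under review, and the `wrap` and `asByteBuffer` helpers are assumptions rather than methods from the patch. Only standard `java.nio.ByteBuffer` calls are used.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the (array, offset, length) holder discussed in the review.
final class ByteSlice {
  final byte[] array;
  final int offset;
  final int length;

  ByteSlice(byte[] array, int offset, int length) {
    if (offset < 0 || length < 0 || length > array.length - offset) {
      throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
    }
    this.array = array;
    this.offset = offset;
    this.length = length;
  }

  // Shares the buffer's backing array instead of copying it.
  // The readable region [position, limit) maps to [offset, offset + length).
  static ByteSlice wrap(ByteBuffer buf) {
    if (!buf.hasArray()) {
      // Direct and read-only buffers do not expose a backing array.
      throw new IllegalArgumentException("Buffer has no accessible backing array");
    }
    return new ByteSlice(buf.array(), buf.arrayOffset() + buf.position(), buf.remaining());
  }

  // View the same bytes as a ByteBuffer again, still without copying.
  ByteBuffer asByteBuffer() {
    return ByteBuffer.wrap(array, offset, length);
  }
}
```

A non-zero offset arises naturally here whenever the source buffer's position is not 0, or when the buffer is itself a slice of a larger array, which is why `arrayOffset()` is added to `position()`.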

Member

any reason not to just call this write?

Contributor Author

I named it writeRAC because the old method was called writeFrame, but I suppose it could be just write.

Contributor Author

Renamed it write.

Member

it seems like this method should be called read and the other should be readAsFrame or readFrame

Contributor Author

This was about diff-minimization. Changing the name of read() adds another chunk of changed lines at various call sites that only deal with Frames.

Member

@clintropolis clintropolis Jan 13, 2026

yea fair, it still seems worth changing since this is now the real 'read' method; it looks like only 37 callers of read after the changes in this PR, so maybe not too disruptive? fine to do as a follow-up too though

Contributor Author

Hmm, I'll just change it now.
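
The naming question in this thread can be pictured with a small sketch, assuming the generic method is called `read` and the Frame-specific one `readFrame`, as suggested above. All types here (`Batch`, `Frame`, `FrameBatch`, `QueueChannel`) are hypothetical simplifications and do not reproduce the channel classes touched by the patch.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch; these types only mimic the roles named in the thread.
interface Batch {}

final class Frame {
  final byte[] bytes;

  Frame(byte[] bytes) { this.bytes = bytes; }
}

// Thin wrapper that lets a Frame travel through a channel of generic batches.
final class FrameBatch implements Batch {
  final Frame frame;

  FrameBatch(Frame frame) { this.frame = frame; }
}

final class QueueChannel {
  private final Queue<Batch> queue = new ArrayDeque<>();

  void write(Batch batch) {
    queue.add(batch);
  }

  // Generic read: returns whatever batch type was written.
  // Throws NoSuchElementException if the channel is empty.
  Batch read() {
    return queue.remove();
  }

  // Frame-specific convenience for call sites that only deal with Frames.
  Frame readFrame() {
    Batch batch = read();
    if (batch instanceof FrameBatch) {
      return ((FrameBatch) batch).frame;
    }
    throw new IllegalStateException("Expected a frame but got: " + batch.getClass().getName());
  }
}
```

With this shape, call sites that only handle Frames keep a one-line call, while new code can call `read` and handle any batch type.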

gianm merged commit 10e56bf into apache:master on Jan 14, 2026 (40 checks passed)
gianm deleted the msq-wirexfr branch on January 14, 2026 18:20
kgyrtkirk added this to the 37.0.0 milestone on Jan 20, 2026