WireTransferable result batches.#18909
Conversation
This patch introduces a "WireTransferable" interface that is used to serialize batches of results represented by RowsAndColumns objects. This allows query implementations to control the exact manner in which their result batches are modeled, serialized, and deserialized, which enables greater efficiency. For example, this would allow an MSQ implementation of TopN (which does not currently exist) to use a data structure between leaf and merge stages that matches what it uses in its native implementation. It would also allow this data structure to be serialized efficiently, without needing to use JSON/Smile. WireTransferable is used in two places: 1) In ObjectMappers, when they read or write RowsAndColumns objects. 2) In MSQ channels, when frames or other batches are read or written to disk or network. Prior to this patch, the ObjectMapper path only handled FrameRowsAndColumns, and the channel path only handled Frames, in both cases using a Frame-specific serialization format. These Frame-specific serialization formats are still used by default. The new FrameWireTransferable format can be enabled in both places by setting druid.serde.rac.useLegacyFrameSerialization = false. As part of this change, the ReadableFrameChannels and WritableFrameChannels used by MSQ are generalized to allow transmitting any RowsAndColumns, not just Frames. When Frames are transmitted, they are enclosed in a thin FrameRowsAndColumns wrapper. In the interest of minimizing the size of the diff, the names of the classes are not changed.
| private final int offset; | ||
| private final int length; | ||
|
|
||
| public ByteArrayOffsetAndLen(byte[] array, int offset, int length) |
There was a problem hiding this comment.
what is the use case for offset? Afaict it seems like it always set to 0, but trying to wrap my head around what it would mean/how it would be used if it was set to non-zero, like i guess packing multiple WireTransferrable into the same blob?
There was a problem hiding this comment.
It's meant to be something that can be created from an onheap ByteBuffer without doing any copies, so it supports offset/length like position/limit. Although those aren't currently used.
There was a problem hiding this comment.
any reason not to just call this write?
There was a problem hiding this comment.
I named it writeRAC because the old method was called writeFrame, but I suppose it could be just write.
There was a problem hiding this comment.
it seems like this method should be called read and the other should be readAsFrame or readFrame
There was a problem hiding this comment.
This was about diff-minimization. Changing the name of read() adds another chunk of changed lines at various call sites that only deal with Frames.
There was a problem hiding this comment.
yea fair, it still seems worth changing since this is now the real 'read' method; it looks like only 37 callers of read after the changes in this PR, so maybe not too disruptive? fine to do as a follow-up too though
There was a problem hiding this comment.
Hmm, I'll just change it now.
This patch introduces a "WireTransferable" interface that is used to serialize batches of results represented by RowsAndColumns objects. This allows query implementations to control the exact manner in which their result batches are modeled, serialized, and deserialized, which enables greater efficiency.
For example, this would allow an MSQ implementation of TopN (which does not currently exist) to use a data structure between leaf and merge stages that matches what it uses in its native implementation. It would also allow this data structure to be serialized efficiently, without needing to use JSON/Smile.
WireTransferable is used in two places:
In ObjectMappers, when they read or write RowsAndColumns objects.
In MSQ channels, when frames or other batches are read or written to disk or network.
Prior to this patch, the ObjectMapper path only handled FrameRowsAndColumns, and the channel path only handled Frames, in both cases using a Frame-specific serialization format. These Frame-specific serialization formats are still used by default when the result batches are Frames. The new FrameWireTransferable format can be enabled in both places by setting
druid.serde.rac.useLegacyFrameSerialization = false.As part of this change, the ReadableFrameChannels and WritableFrameChannels used by MSQ are generalized to allow transmitting any RowsAndColumns, not just Frames. When Frames are transmitted, they are enclosed in a thin FrameRowsAndColumns wrapper. In the interest of minimizing the size of the diff, the names of the classes are not changed.