Description
Introduce Apache Arrow as a first-class in-memory representation inside Druid's MSQ engine by adding ArrowRowsAndColumns.
The #19456 discussion identified that swapping Druid's row-at-a-time JVM hot path requires touching multiple load-bearing abstractions (Frame, FrameProcessor, channels, leaf segment readers, shuffle, planner).
As @gianm and @jtuglu1 noted, the first move is the operator with the smallest blast radius (one non-leaf, single-input FrameProcessor), backed by a reusable abstraction (ArrowRowsAndColumns) so that every later implementation phase can reuse it.
Motivation
Introduce ArrowRowsAndColumns and adopt it inside GroupByPostShuffleFrameProcessor, as the first concrete step of the #19456 modernisation program.
Related
• #19456 — Native, vectorised, zero-copy execution path (this is the first concrete step)
• #13458 — RowsAndColumns introduction (the abstraction this extends)
• #18909 — WireTransferable (the seam that Phase C will use)