MSQ: introduce ArrowRowsAndColumns and adopt it inside GroupByPostShuffleFrameProcessor #19499

@Shekharrajak

Description

Introduce Apache Arrow as a first-class in-memory representation inside Druid's MSQ engine by adding ArrowRowsAndColumns.

The #19456 discussion identified that swapping Druid's row-at-a-time JVM hot path requires touching multiple load-bearing abstractions (Frame, FrameProcessor, channels, leaf segment readers, shuffle, planner).

As @gianm and @jtuglu1 mentioned, the first move is the operator with the smallest blast radius (one non-leaf, single-input FrameProcessor), backed by a reusable abstraction (ArrowRowsAndColumns), so that every later implementation phase can reuse it.
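To make the shape of that first step concrete, here is a rough, dependency-free sketch of a single-input operator consuming a column-oriented batch. Everything in it is hypothetical: `Batch` and `mergeAdjacent` are illustrative stand-ins, not Druid's `RowsAndColumns`, not Arrow's Java API, and not the actual logic of GroupByPostShuffleFrameProcessor. It also assumes the input arrives clustered by a non-null grouping key, which may not match the real processor's contract.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: these types are not Druid or Arrow classes.
final class ColumnarGroupBySketch {

  /** Stand-in for a columnar batch: one parallel array per column. */
  static final class Batch {
    final String[] keys; // grouping column (assumed non-null values)
    final long[] sums;   // partial aggregate column

    Batch(String[] keys, long[] sums) {
      if (keys.length != sums.length) {
        throw new IllegalArgumentException("columns must have equal length");
      }
      this.keys = keys;
      this.sums = sums;
    }

    int numRows() {
      return keys.length;
    }
  }

  /**
   * Merges adjacent rows that share a key by adding their partial sums.
   * Assumes rows are already clustered by key (an assumption of this sketch).
   */
  static Batch mergeAdjacent(Batch in) {
    List<String> outKeys = new ArrayList<>();
    List<Long> outSums = new ArrayList<>();
    for (int i = 0; i < in.numRows(); i++) {
      int last = outKeys.size() - 1;
      if (last >= 0 && outKeys.get(last).equals(in.keys[i])) {
        // Same key as the previous output row: fold into its running sum.
        outSums.set(last, outSums.get(last) + in.sums[i]);
      } else {
        // New key: start a new output row.
        outKeys.add(in.keys[i]);
        outSums.add(in.sums[i]);
      }
    }
    long[] sums = new long[outSums.size()];
    for (int i = 0; i < sums.length; i++) {
      sums[i] = outSums.get(i);
    }
    return new Batch(outKeys.toArray(new String[0]), sums);
  }

  public static void main(String[] args) {
    Batch in = new Batch(
        new String[] {"a", "a", "b", "c", "c"},
        new long[] {1, 2, 5, 10, 20});
    Batch out = mergeAdjacent(in);
    for (int i = 0; i < out.numRows(); i++) {
      System.out.println(out.keys[i] + " -> " + out.sums[i]);
    }
  }
}
```

The point of the sketch is only that the operator reads whole columns from one batch abstraction and emits another batch of the same kind, which is what would let later phases reuse the abstraction without touching this operator again.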

Motivation

Introduce ArrowRowsAndColumns and adopt it inside GroupByPostShuffleFrameProcessor as the first concrete step of the #19456 modernisation program.

Related

#19456 — Native, vectorised, zero-copy execution path (this is the first concrete step)
#13458 — RowsAndColumns introduction (the abstraction this extends)
#18909 — WireTransferable (the seam that Phase C will use)
