[MSE] Add Apache Arrow as native columnar block format#18207
Draft
praveenc7 wants to merge 6 commits into apache:master from
…ht inter-node transport
…boxService and OpChainExecutionContext
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18207 +/- ##
============================================
- Coverage 63.34% 63.26% -0.09%
+ Complexity 1627 1624 -3
============================================
Files 3238 3252 +14
Lines 197003 198121 +1118
Branches 30464 30644 +180
============================================
+ Hits 124801 125349 +548
- Misses 62202 62670 +468
- Partials 10000 10102 +102
Flags with carried forward coverage won't be shown.
Contributor
Code review
Found 10 issues:
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.
Contributor
Author
@yashmayya Still working on this; I will let you know by next week when it is ready for full review. I am testing different strategies to see the performance difference.
…in Arrow path

Copy elimination:
- Flight receiver: replace unload/reload with TransferPair (0 copies vs 2)
- Flight sender: skip slice+transfer for single-batch blocks
- ArrowJoinProbe: bulk ArrowBuf gather for fixed-width vectors
- ArrowBlockConverter: direct buffer read with byte-swap for ColumnarDataBlock
- Dictionary merge: skip decode when all batches share the same dictionary
- Dictionary sharing: reference-counted SharedDictionaryProvider instead of deep copy

Correctness fixes from code review:
- Rename ArrowLookupTable.finalize() to build() to avoid Object.finalize() GC conflict
- Use CAS loop in retain() for thread-safe reference counting
- Guard bulk gather fast path with getNullCount()==0 to preserve nulls
- Make KeySelector Arrow methods default to avoid breaking external implementations
- Handle empty right side in ArrowLookupTable.build() without NPE
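Why the bulk gather needs the `getNullCount()==0` guard can be illustrated with a minimal sketch. It models a fixed-width column with a plain `int[]` plus a `java.util.BitSet` validity bitmap instead of real Arrow vectors; `IntColumn` and `Gather` are hypothetical names, not classes from this PR. A copy that moves only the value buffer silently turns nulls into values, so it is only safe when the source has no nulls.

```java
import java.util.BitSet;

// Hypothetical model of a fixed-width column: values plus a validity bitmap
// (bit set = non-null), loosely mirroring an Arrow fixed-width vector.
final class IntColumn {
  final int[] values;
  final BitSet validity;

  IntColumn(int[] values, BitSet validity) {
    this.values = values;
    this.validity = validity;
  }

  int nullCount() {
    return values.length - validity.cardinality();
  }
}

final class Gather {
  /** Gathers the rows at the given indices, e.g. matched build-side rows of a join. */
  static IntColumn gather(IntColumn src, int[] indices) {
    int[] out = new int[indices.length];
    BitSet outValidity = new BitSet(indices.length);
    if (src.nullCount() == 0) {
      // Fast path: no nulls, so values are copied without consulting validity
      for (int i = 0; i < indices.length; i++) {
        out[i] = src.values[indices[i]];
      }
      outValidity.set(0, indices.length);
    } else {
      // Slow path: carry each row's validity bit so nulls survive the gather;
      // null slots keep the default value 0 and stay unset in the bitmap
      for (int i = 0; i < indices.length; i++) {
        int row = indices[i];
        if (src.validity.get(row)) {
          out[i] = src.values[row];
          outValidity.set(i);
        }
      }
    }
    return new IntColumn(out, outValidity);
  }
}
```

In the real Arrow path the fast branch would be a bulk buffer copy rather than a loop; the point of the sketch is only which branch may skip the validity bitmap.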
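The reference-counted dictionary sharing and the CAS loop in `retain()` follow a common pattern, sketched below. `RefCountedDictionary` is a hypothetical stand-in, not the PR's `SharedDictionaryProvider`. The reason for a CAS loop rather than a plain `incrementAndGet()` is that the loop can refuse to increment from zero, so a dictionary whose buffers were already freed cannot be "resurrected" by a late `retain()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical shared dictionary holder illustrating retain/release counting.
final class RefCountedDictionary {
  private final AtomicInteger refCount = new AtomicInteger(1); // creator holds one reference
  private final String[] values;
  private volatile boolean freed;

  RefCountedDictionary(String[] values) {
    this.values = values;
  }

  /** Adds a reference; fails if the count already dropped to zero. */
  void retain() {
    while (true) {
      int current = refCount.get();
      if (current <= 0) {
        throw new IllegalStateException("dictionary already released");
      }
      // Increment only if no other thread changed the count since we read it
      if (refCount.compareAndSet(current, current + 1)) {
        return;
      }
    }
  }

  /** Drops a reference; returns true if this call released the last one. */
  boolean release() {
    int remaining = refCount.decrementAndGet();
    if (remaining < 0) {
      throw new IllegalStateException("release() called more times than retain()");
    }
    if (remaining == 0) {
      freed = true; // a real implementation would close the underlying Arrow buffers here
      return true;
    }
    return false;
  }

  int refCount() {
    return refCount.get();
  }

  boolean isFreed() {
    return freed;
  }

  String lookup(int id) {
    if (freed) {
      throw new IllegalStateException("dictionary freed");
    }
    return values[id];
  }
}
```

Each consumer that shares the dictionary calls `retain()` once and `release()` once, which replaces the per-consumer deep copy.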
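Making the new KeySelector Arrow methods `default` keeps existing implementations source-compatible. The sketch below uses a hypothetical, simplified interface (`KeySelectorSketch`), not Pinot's actual `KeySelector` signatures: an implementation written before the columnar methods existed still compiles and gets a row-based fallback for free.

```java
// Hypothetical, simplified SPI in the spirit of KeySelector; not Pinot's real interface.
interface KeySelectorSketch<T> {
  // Row-based method that existing (external) implementations already provide.
  T getKey(Object[] row);

  // Methods added later for the columnar path. Giving them default bodies means
  // implementations written against the old interface keep compiling unchanged.
  default boolean supportsColumnarKeys() {
    return false;
  }

  // Fallback: rebuild the row from column arrays and delegate to the row-based method.
  default T getKeyFromColumns(Object[][] columns, int rowIndex) {
    Object[] row = new Object[columns.length];
    for (int c = 0; c < columns.length; c++) {
      row[c] = columns[c][rowIndex];
    }
    return getKey(row);
  }
}

// An "external" implementation that predates the columnar methods.
final class FirstColumnKeySelector implements KeySelectorSketch<Object> {
  @Override
  public Object getKey(Object[] row) {
    return row[0];
  }
}
```

Built-in selectors can override the defaults with a real columnar implementation, while third-party ones keep working on the slower fallback.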
Arrow Flight wiring:
- MailboxService creates FlightMailboxServer and FlightChannelManager when Arrow is enabled
- getSendingMailbox() returns FlightSendingMailbox for remote targets (zero-copy IPC)
- Flight port defaults to gRPC port + 1 (configurable via pinot.query.runner.flight.port)
- EOS/error signalling stays on gRPC for rolling-upgrade compatibility
- Flight runs plaintext only (TLS requires PEM certs; TODO for follow-up)

Correctness fixes:
- Close Arrow allocator in OpChain.cancel() (was only in close(), leaked on errors)
- Fall back to row path for RIGHT/FULL joins (buildNonMatchRightRows needs _rightTable)
- Rename ArrowLookupTable.finalize() to build() to avoid Object.finalize() GC conflict
- CAS loop in retain() for thread-safe reference counting
- Guard bulk gather with getNullCount()==0 to preserve nulls
- Make KeySelector Arrow methods default to avoid SPI break
- Fix FLOAT/DOUBLE ClassCastException in ArrowBlockConverter
- Close per-query Arrow BufferAllocator in OpChain.close()

LOC reduction:
- Remove unused ArrowSchemaConverter (inlined into ArrowDataBlock)
- Extract shared tryDirectCopy4/8 for endian-swap fast path
- Merge 4 probe methods into 2 unified implementations
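The Flight port rule (explicit `pinot.query.runner.flight.port`, otherwise gRPC port + 1) could be resolved roughly as below. The property key comes from the commit message; `FlightPortConfig`, `resolveFlightPort`, and the plain `Map` config are hypothetical, and the "must differ from the gRPC port" check is an illustrative addition.

```java
import java.util.Map;

final class FlightPortConfig {
  // Property key as named in the commit message
  static final String FLIGHT_PORT_KEY = "pinot.query.runner.flight.port";

  /** Returns the explicitly configured Flight port, or grpcPort + 1 when unset. */
  static int resolveFlightPort(Map<String, String> config, int grpcPort) {
    String configured = config.get(FLIGHT_PORT_KEY);
    if (configured == null || configured.trim().isEmpty()) {
      return grpcPort + 1;
    }
    int port = Integer.parseInt(configured.trim());
    // Illustrative sanity check: the gRPC and Flight servers cannot share a port
    if (port == grpcPort) {
      throw new IllegalArgumentException("Flight port must differ from gRPC port " + grpcPort);
    }
    return port;
  }
}
```

A derived default keeps single-node setups zero-config, but deployments that already use gRPC port + 1 for something else would need to set the property explicitly.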
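The two allocator fixes (closing in both `OpChain.cancel()` and `OpChain.close()`) are safest when the release is idempotent, since a chain can be cancelled and later closed. A minimal sketch, assuming a hypothetical `QueryAllocator` interface in place of Arrow's `BufferAllocator` and a hypothetical `OpChainSketch` in place of `OpChain`:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a per-query allocator such as Arrow's BufferAllocator.
interface QueryAllocator extends AutoCloseable {
  @Override
  void close();
}

final class OpChainSketch implements AutoCloseable {
  private final QueryAllocator allocator;
  private final AtomicBoolean released = new AtomicBoolean(false);

  OpChainSketch(QueryAllocator allocator) {
    this.allocator = allocator;
  }

  /** Normal completion path. */
  @Override
  public void close() {
    releaseOnce();
  }

  /** Error or cancellation path; must free the same resources as close(). */
  void cancel(Throwable cause) {
    releaseOnce();
  }

  private void releaseOnce() {
    // compareAndSet guarantees the allocator is closed exactly once,
    // even if cancel() and close() race or are both invoked
    if (released.compareAndSet(false, true)) {
      allocator.close();
    }
  }
}
```

Routing both paths through one guarded method avoids the original leak (cancel never releasing) without introducing a double close.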
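The endian-swap fast path can be illustrated with `java.nio.ByteBuffer`. This sketch assumes the source block stores values in Java's default big-endian order while Arrow buffers are little-endian; `EndianCopy.copyInts` is a hypothetical helper, not the PR's `tryDirectCopy4`. When the orders already match, a single raw byte copy suffices; otherwise each 4-byte value is swapped.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndianCopy {
  /**
   * Copies count 4-byte ints from src (starting at its position) into a new
   * buffer that uses targetOrder. The returned buffer is flipped for reading.
   */
  static ByteBuffer copyInts(ByteBuffer src, int count, ByteOrder targetOrder) {
    // duplicate() gives an independent position so src is not advanced,
    // but it resets the byte order to BIG_ENDIAN, so restore src's order
    ByteBuffer in = src.duplicate().order(src.order());
    ByteBuffer out = ByteBuffer.allocate(count * Integer.BYTES).order(targetOrder);
    if (in.order() == targetOrder) {
      // Fast path: identical byte layout, so one bulk copy of the raw bytes suffices
      ByteBuffer raw = in.slice();
      raw.limit(count * Integer.BYTES);
      out.put(raw);
    } else {
      // Orders differ: getInt decodes using the source order and putInt encodes
      // using the target order, which byte-swaps every value
      for (int i = 0; i < count; i++) {
        out.putInt(in.getInt());
      }
    }
    out.flip();
    return out;
  }
}
```

An 8-byte variant would follow the same shape with `Long.BYTES`, `getLong`, and `putLong`.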
Summary
#18205
Testing Done
TODO