feat(dataframe): add executeStream(allocator) for incremental batch iteration #51
Open
LantaoJin wants to merge 1 commit into
DataFrame.collect(allocator) materializes every batch into a Vec<RecordBatch> on the Rust heap before the first batch crosses the FFI boundary into Java. For TB-scale or unbounded result sets, this OOMs the Rust side regardless of how memory accounting is configured downstream.

Adds DataFrame.executeStream(BufferAllocator) -> ArrowReader as a peer to collect, sharing the same lifecycle (consumes the DataFrame, caller closes the returned reader, allocator must outlive it) but pulling batches lazily. The native side wraps DataFusion's existing SessionContext::execute_stream output (a SendableRecordBatchStream) in a small StreamingReader adapter that bridges async Stream::next() to Arrow's synchronous RecordBatchReader trait; each call to ArrowReader.loadNextBatch() on the Java side drives one runtime().block_on(stream.next()) on the Rust side. Memory pressure stays bounded by the executor pipeline plus a single in-flight batch instead of the full result set.

collect remains on its current code path and is unchanged behaviorally; only its Javadoc gains a forward-pointer to executeStream for analytics-scale queries. A follow-up could consolidate collect onto executeStream + concat (~10 LOC, no API change) but that refactor is out of scope here to keep the diff focused on adding the streaming primitive.

Tests cover equivalence with collect (same row count over a small VALUES query), the consumes-DataFrame contract (second collect/executeStream/count throws after a successful executeStream), incremental delivery (with batchSize=2 over 5 rows the reader yields multiple batches and no single batch holds the full result), early-close survival (closing the reader mid-stream does not panic), TPC-H integration gated on the SF1 lineitem table, and column-value correctness (pins actual cell values across batches, not just row counts).
native/Cargo.toml gains futures = "0.3" for StreamExt::next; it is already pulled in transitively by tokio and datafusion, the addition just makes the import path explicit.
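The async-to-sync bridge performed by the StreamingReader adapter can be sketched with the standard library alone. Everything below is illustrative, not the PR's actual code: `BatchStream` stands in for DataFusion's `SendableRecordBatchStream`, the hand-rolled spin-polling `block_on` stands in for the shared Tokio runtime's `block_on`, and all names are hypothetical.

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal executor stand-in: the real code calls runtime().block_on(...)
// on a shared Tokio runtime; this std-only version just spin-polls.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { raw() }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => std::thread::yield_now(),
        }
    }
}

// Stand-in for SendableRecordBatchStream: yields one "batch" per call.
struct BatchStream {
    batches: Vec<Vec<i64>>,
}

impl BatchStream {
    async fn next(&mut self) -> Option<Vec<i64>> {
        if self.batches.is_empty() { None } else { Some(self.batches.remove(0)) }
    }
}

// The adapter shape: a synchronous reader whose next() drives exactly one
// block_on(stream.next()), so at most one batch is in flight at a time.
struct StreamingReader {
    stream: BatchStream,
}

impl Iterator for StreamingReader {
    type Item = Vec<i64>;
    fn next(&mut self) -> Option<Vec<i64>> {
        block_on(self.stream.next())
    }
}

fn main() {
    let reader = StreamingReader {
        stream: BatchStream { batches: vec![vec![1, 2], vec![3, 4], vec![5]] },
    };
    // Each iteration pulls a single batch across the sync boundary.
    let rows: Vec<i64> = reader.flatten().collect();
    println!("{:?}", rows); // [1, 2, 3, 4, 5]
}
```

The design point this illustrates: the consumer never sees more than the current batch, so peak memory is the executor pipeline plus one batch, regardless of total result size.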
Which issue does this PR close?
Rationale for this change
`DataFrame.collect(allocator)` is the only way to retrieve query results today. Internally, the native handler calls `df.collect().await`, which materializes every batch into a `Vec<RecordBatch>` on the Rust heap before the first batch crosses the FFI boundary. For TB-scale or unbounded result sets, this OOMs the Rust side regardless of how memory accounting is configured downstream.

DataFusion's own `SessionContext::execute_stream` already returns a `SendableRecordBatchStream` that yields one batch at a time. The Java binding just needs to wire that to the existing `FFI_ArrowArrayStream` path so each `loadNextBatch()` from Java drives one `stream.next()` on the Rust side.

What changes are included in this PR?
- `DataFrame.executeStream(BufferAllocator)` → `ArrowReader` — peer to `collect`. Same lifecycle (consumes the DataFrame, caller closes the returned reader, allocator must outlive it). Same return type so existing reader-driven code paths work unchanged.
- `native/src/lib.rs` — new `Java_org_apache_datafusion_DataFrame_executeStreamDataFrame` JNI handler plus a small `StreamingReader` adapter that bridges DataFusion's async `SendableRecordBatchStream` to Arrow's synchronous `RecordBatchReader` interface. Each `next()` drives one `runtime().block_on(stream.next())`.
- `native/Cargo.toml` — adds `futures = "0.3"` for `StreamExt::next`.
- `collect`'s Javadoc is updated to point at `executeStream` for analytics-scale queries.
- `collect` is unchanged. The two methods coexist; `collect` remains the right call for small results that benefit from a single owning buffer, `executeStream` is the right call for unbounded or very large result sets.

Are these changes tested?
Yes — 6 new tests in `DataFrameExecuteStreamTest`.

Are there any user-facing changes?
Yes — purely additive. New public API:
`DataFrame.executeStream(BufferAllocator)` → `ArrowReader`. `collect`'s behavior is unchanged; only its Javadoc gains a forward-pointer to `executeStream`. No deprecations, no API removals.

A new direct build dependency (`futures = "0.3"`) is added to the native crate for `StreamExt`. It was already pulled in transitively by `tokio` and `datafusion`, so this just makes the dependency explicit on the import path.
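For reference, the manifest change described above amounts to one line in `native/Cargo.toml`; this is a sketch, and the surrounding section layout in the actual crate may differ:

```toml
[dependencies]
# Brings StreamExt::next into scope for driving the record batch stream;
# already present transitively via tokio and datafusion.
futures = "0.3"
```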