[Arrow] Add ChunkScanState interface to preserve chunk-offset when scanning #8307
…keep state about how much of a chunk has been processed already
…duce an empty (or null) chunk, to signal depleted
Hi @Tishj thanks for the PR, it looks great! I just had some small comments.
You might also need to update the Arrow IPC extension.
I had a look at the |
Thanks! |
Merge pull request duckdb/duckdb#8307 from Tishj/chunk_scan_state [Arrow] Add ChunkScanState interface to preserve chunk-offset when scanning Merge pull request duckdb/duckdb#8089 from pdet/basepython_tz Python TIMESTAMPTZ support Merge pull request duckdb/duckdb#8052 from Mytherin/evenlessci [CI] More CI reduction and clean-up Merge pull request duckdb/duckdb#7701 from Mause/bugfix/python-union-description Add union to test_all_types, and arrow and json R/W
- Merge pull request duckdb/duckdb#8307 from Tishj/chunk_scan_state: [Arrow] Add ChunkScanState interface to preserve chunk-offset when scanning
- Merge pull request duckdb/duckdb#8089 from pdet/basepython_tz: Python TIMESTAMPTZ support
- Merge pull request duckdb/duckdb#8052 from Mytherin/evenlessci: [CI] More CI reduction and clean-up
- Merge pull request duckdb/duckdb#7701 from Mause/bugfix/python-union-description: Add union to test_all_types, and arrow and json R/W
- Merge pull request duckdb/duckdb#8497 from samansmink/pending-execute-result-api-change: Add PendingExecutionResult::ALL_TASKS_BLOCKED
This PR adds an abstract `ChunkScanState` class, which can be used when scanning from a source that produces data chunks that may only be partially processed.

Previously we used a `CurrentChunk` struct, located in the `QueryResult`, to keep this state; it was only used in the `ArrowUtil::FetchChunk` method. We now use a `ChunkScanState` there instead, which allows us to replace the source that these chunks come from. This is currently only implemented for one source, the `QueryResult`. This PR retires the `CurrentChunk` struct.

The motivation for this PR is work on the upcoming pyarrow result collector. In that PR we will need to scan from a `ColumnDataCollection` in `ArrowUtil::FetchChunk`, and we will need similar state for that.
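To illustrate the idea of keeping the chunk offset in a replaceable scan state, here is a minimal, self-contained sketch. It is not the class added by this PR: the `Chunk` alias (a plain vector of ints), `VectorScanState`, `FetchRows`, and all method names and signatures are assumptions made for illustration only. It shows a consumer that fetches fixed-size batches and resumes in the middle of a partially processed chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data chunk: just a vector of row values.
using Chunk = std::vector<int>;

// Abstract scan state: owns the current chunk and remembers how many of its
// rows have already been consumed. Subclasses decide where chunks come from.
class ChunkScanState {
public:
    virtual ~ChunkScanState() = default;

    // Loads the next chunk from the source. Returns false once the source is
    // depleted, in which case the current chunk is left empty.
    virtual bool LoadNextChunk() = 0;

    std::size_t CurrentOffset() const { return offset_; }
    std::size_t RemainingInChunk() const { return current_.size() - offset_; }
    const Chunk &CurrentChunk() const { return current_; }

    // Marks n more rows of the current chunk as processed.
    void IncreaseOffset(std::size_t n) {
        assert(n <= RemainingInChunk());
        offset_ += n;
    }

protected:
    // Installs a freshly loaded chunk and resets the offset.
    void SetChunk(Chunk chunk) {
        current_ = std::move(chunk);
        offset_ = 0;
    }

private:
    Chunk current_;
    std::size_t offset_ = 0;
};

// One concrete source (hypothetical): chunks that are already materialized.
class VectorScanState : public ChunkScanState {
public:
    explicit VectorScanState(std::vector<Chunk> chunks) : chunks_(std::move(chunks)) {}

    bool LoadNextChunk() override {
        if (next_ >= chunks_.size()) {
            SetChunk(Chunk()); // an empty chunk signals that the source is depleted
            return false;
        }
        SetChunk(chunks_[next_++]);
        return true;
    }

private:
    std::vector<Chunk> chunks_;
    std::size_t next_ = 0;
};

// Consumer that only sees the abstract interface: fetches up to batch_size
// rows, crossing chunk boundaries and leaving the offset where it stopped.
std::vector<int> FetchRows(ChunkScanState &state, std::size_t batch_size) {
    std::vector<int> out;
    while (out.size() < batch_size) {
        if (state.RemainingInChunk() == 0 && !state.LoadNextChunk()) {
            break; // source depleted
        }
        const std::size_t n = std::min(batch_size - out.size(), state.RemainingInChunk());
        const auto first = state.CurrentChunk().begin() + static_cast<std::ptrdiff_t>(state.CurrentOffset());
        out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(n));
        state.IncreaseOffset(n);
    }
    return out;
}
```

Because `FetchRows` depends only on the abstract base class, a different source could be swapped in by adding another subclass, without touching the consumer. In this sketch the offset is stored in the scan state rather than in the result object, so consecutive calls pick up where the previous one stopped.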