[Arrow] Add ChunkScanState interface to preserve chunk-offset when scanning #8307

Merged: 14 commits merged on Aug 1, 2023

@Tishj Tishj commented Jul 19, 2023

This PR adds an abstract ChunkScanState class for scanning from a source that produces data chunks, where a chunk may be only partially processed by a single scan.

Previously this state was kept in a CurrentChunk struct located in the QueryResult, and it was only used in the ArrowUtil::FetchChunk method.
ArrowUtil::FetchChunk now uses a ChunkScanState instead, which allows us to replace the source that these chunks come from.

The interface is currently implemented for only one source, the QueryResult.
Since that implementation takes over the role of CurrentChunk, this PR retires that struct.
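
To illustrate the idea for readers who have not opened the diff, here is a minimal, self-contained sketch of a scan state that preserves the chunk offset between fetches. It is not the code from this PR: `Chunk`, `VectorScanState`, `FetchBatch`, and all method names are hypothetical stand-ins, and a plain `std::vector<int>` takes the place of a real data chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data chunk: just a list of row values.
using Chunk = std::vector<int>;

// Abstract scan state: holds the current chunk plus the offset of the first
// row that has not been consumed yet. Subclasses decide where chunks come from.
class ChunkScanState {
public:
    virtual ~ChunkScanState() = default;

    // Load the next chunk from the source; returns false once it is exhausted.
    virtual bool LoadNextChunk() = 0;

    bool ChunkIsEmpty() const { return offset >= current.size(); }
    std::size_t RemainingInChunk() const {
        return ChunkIsEmpty() ? 0 : current.size() - offset;
    }
    const Chunk &CurrentChunk() const { return current; }
    std::size_t CurrentOffset() const { return offset; }

    // Mark `count` rows of the current chunk as consumed.
    void IncreaseOffset(std::size_t count) {
        assert(count <= RemainingInChunk());
        offset += count;
    }

protected:
    // Install a freshly loaded chunk and restart the offset at its first row.
    void SetChunk(Chunk next) {
        current = std::move(next);
        offset = 0;
    }

private:
    Chunk current;
    std::size_t offset = 0;
};

// One concrete source (an assumption for this sketch): pre-materialized chunks.
class VectorScanState : public ChunkScanState {
public:
    explicit VectorScanState(std::vector<Chunk> input) : chunks(std::move(input)) {}

    bool LoadNextChunk() override {
        if (next >= chunks.size()) {
            return false;
        }
        SetChunk(std::move(chunks[next++]));
        return true;
    }

private:
    std::vector<Chunk> chunks;
    std::size_t next = 0;
};

// Fetch up to `batch_size` rows. A chunk that is only partially consumed stays
// in the state, so the next call resumes at the saved offset.
std::vector<int> FetchBatch(ChunkScanState &state, std::size_t batch_size) {
    std::vector<int> out;
    while (out.size() < batch_size) {
        if (state.ChunkIsEmpty() && !state.LoadNextChunk()) {
            break; // source exhausted
        }
        if (state.ChunkIsEmpty()) {
            continue; // the source produced an empty chunk; try the next one
        }
        const std::size_t n = std::min(batch_size - out.size(), state.RemainingInChunk());
        const auto begin =
            state.CurrentChunk().begin() + static_cast<std::ptrdiff_t>(state.CurrentOffset());
        out.insert(out.end(), begin, begin + static_cast<std::ptrdiff_t>(n));
        state.IncreaseOffset(n);
    }
    return out;
}
```

In this sketch, a source holding chunks of 3, 2, and 4 rows that is fetched with a batch size of 4 yields batches of 4, 4, and 1 rows. Because `FetchBatch` only depends on the abstract class, a different chunk source can be plugged in by adding another subclass.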

The motivation for this PR is work on the upcoming pyarrow result collector.
In that PR we will need to scan from a ColumnDataCollection in ArrowUtil::FetchChunk, which requires similar state.

@Tishj Tishj requested a review from pdet July 19, 2023 11:54
@Tishj Tishj marked this pull request as draft July 19, 2023 13:21
@Tishj Tishj marked this pull request as ready for review July 19, 2023 17:02
@github-actions github-actions bot marked this pull request as draft July 20, 2023 07:20

@pdet pdet left a comment

Hi @Tishj thanks for the PR, it looks great! I just had some small comments.

You might also need to update the arrow ipc extension

src/main/chunk_scan_state.cpp (2 review comments, outdated and resolved)
@Tishj Tishj marked this pull request as ready for review July 24, 2023 09:09
@github-actions github-actions bot marked this pull request as draft July 24, 2023 12:02
@Tishj Tishj marked this pull request as ready for review July 24, 2023 12:02
@github-actions github-actions bot marked this pull request as draft July 26, 2023 09:35
@Tishj Tishj marked this pull request as ready for review July 27, 2023 10:05
Tishj commented Jul 28, 2023

> Hi @Tishj thanks for the PR, it looks great! I just had some small comments.
>
> You might also need to update the arrow ipc extension

I had a look at the duckdblabs/arrow extension repo and compiled with it: there were no errors, and it doesn't seem to be affected by this PR. It only deals with the ArrowAppender, and this abstracts it away from the implementation details.

@Mytherin Mytherin merged commit 75aa9bb into duckdb:master Aug 1, 2023
53 checks passed
Mytherin commented Aug 1, 2023

Thanks!

krlmlr pushed a commit to krlmlr/duckdb-r that referenced this pull request Sep 2, 2023
Merge pull request duckdb/duckdb#8307 from Tishj/chunk_scan_state

[Arrow] Add ChunkScanState interface to preserve chunk-offset when scanning

Merge pull request duckdb/duckdb#8089 from pdet/basepython_tz

Python TIMESTAMPTZ support

Merge pull request duckdb/duckdb#8052 from Mytherin/evenlessci

[CI] More CI reduction and clean-up

Merge pull request duckdb/duckdb#7701 from Mause/bugfix/python-union-description

Add union to test_all_types, and arrow and json R/W
krlmlr pushed a commit to krlmlr/duckdb-r that referenced this pull request Sep 2, 2023
- Merge pull request duckdb/duckdb#8307 from Tishj/chunk_scan_state: [Arrow] Add ChunkScanState interface to preserve chunk-offset when scanning

- Merge pull request duckdb/duckdb#8089 from pdet/basepython_tz: Python TIMESTAMPTZ support

- Merge pull request duckdb/duckdb#8052 from Mytherin/evenlessci: [CI] More CI reduction and clean-up

- Merge pull request duckdb/duckdb#7701 from Mause/bugfix/python-union-description: Add union to test_all_types, and arrow and json R/W

- Merge pull request duckdb/duckdb#8497 from samansmink/pending-execute-result-api-change: Add PendingExecutionResult::ALL_TASKS_BLOCKED
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Sep 5, 2023
- Merge pull request duckdb/duckdb#8307 from Tishj/chunk_scan_state: [Arrow] Add ChunkScanState interface to preserve chunk-offset when scanning

- Merge pull request duckdb/duckdb#8089 from pdet/basepython_tz: Python TIMESTAMPTZ support

- Merge pull request duckdb/duckdb#8052 from Mytherin/evenlessci: [CI] More CI reduction and clean-up

- Merge pull request duckdb/duckdb#7701 from Mause/bugfix/python-union-description: Add union to test_all_types, and arrow and json R/W

- Merge pull request duckdb/duckdb#8497 from samansmink/pending-execute-result-api-change: Add PendingExecutionResult::ALL_TASKS_BLOCKED
3 participants