Parallel order preserving CREATE TABLE AS and INSERT INTO #5082

Merged
11 commits merged on Oct 25, 2022

Mytherin
Collaborator

This PR adds support for parallel processing of CREATE TABLE AS and INSERT INTO queries in an order preserving manner. That is to say, the order of the tuples in the table is guaranteed to be the same as the insertion order.

This method is used when order preservation is enabled (which defaults to true), and can be used as long as the source supports batch indexes (see #3700).

The PhysicalBatchInsert works by materializing row groups as a RowGroupCollection per batch index, and merging the collections together in the correct order after the insertion is completed. If a row group collection is too small (which can happen with selective predicates), it is merged with the row group collections of adjacent batch indexes. If a row group collection grows larger than one row group, its row groups are optimistically flushed to disk.
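To illustrate the idea, here is a minimal Python sketch of order-preserving parallel insertion. It is not DuckDB's implementation: `process_batch`, `merge_in_batch_order`, `parallel_ordered_insert`, and the tiny `ROW_GROUP_SIZE` threshold are hypothetical names and values chosen for illustration, plain lists stand in for row group collections, and the optimistic flush to disk is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical threshold, kept tiny so the example is easy to follow.
ROW_GROUP_SIZE = 4


def process_batch(batch_index, rows):
    """Worker: materialize one batch into its own collection (a list here)."""
    kept = [r for r in rows if r % 2 == 0]  # stand-in for a selective predicate
    return batch_index, kept


def merge_in_batch_order(collections, min_size=ROW_GROUP_SIZE):
    """Combine per-batch collections by ascending batch index.

    A collection smaller than min_size is merged with the adjacent
    (following) batch instead of becoming a row group of its own.
    """
    row_groups = []
    pending = []
    for batch_index in sorted(collections):
        pending.extend(collections[batch_index])
        if len(pending) >= min_size:
            row_groups.append(pending)
            pending = []
    if pending:  # leftover rows form the final, possibly small, row group
        row_groups.append(pending)
    return row_groups


def parallel_ordered_insert(batches, workers=4):
    """Process batches in parallel, then restore insertion order on merge."""
    collections = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_batch, i, rows)
                   for i, rows in enumerate(batches)]
        # Batches may finish in any order; the batch index is what
        # determines their final position.
        for future in as_completed(futures):
            batch_index, rows = future.result()
            collections[batch_index] = rows
    return merge_in_batch_order(collections)
```

Because each worker's output is keyed by its batch index, the completion order of the workers does not affect the final order of the rows.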

There are two remaining limitations:

  • Parallelism is disabled when the table has indexes
  • Parallelism is disabled when there is a RETURNING clause present

Both of those should be resolvable in a subsequent PR.
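The conditions described in this PR can be summarized as a small predicate. This is only a sketch of the rules as stated above; `InsertPlan`, its fields, and `can_use_parallel_batch_insert` are hypothetical names and do not correspond to DuckDB's planner code.

```python
from dataclasses import dataclass


@dataclass
class InsertPlan:
    # Hypothetical description of an insert; field names are illustrative.
    preserve_insertion_order: bool = True   # order preservation defaults to true
    source_supports_batch_index: bool = False
    table_has_indexes: bool = False
    has_returning_clause: bool = False


def can_use_parallel_batch_insert(plan):
    """Return True when the order-preserving parallel path applies."""
    return (
        plan.preserve_insertion_order
        and plan.source_supports_batch_index
        and not plan.table_has_indexes      # current limitation
        and not plan.has_returning_clause   # current limitation
    )
```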

@lnkuiper perhaps you want to add batch index support to the PhysicalOrder operator, to allow for parallel materialization of ordered tables as well?

@Alex-Monahan
Contributor

I checked #3700, and it didn't mention Arrow tables. Since that is a common data import method, could batch indexes be added there also? This is AWESOME stuff!

@Mytherin Mytherin merged commit 09cf8f0 into duckdb:master Oct 25, 2022
@Mytherin Mytherin deleted the parallelorderpreserving branch January 7, 2023 14:58