Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further parallelize index creation #5812

Merged
merged 18 commits into from
Jan 12, 2023
Merged

Conversation

taniabogatsch
Copy link
Contributor

Before this PR, we sorted the index key columns and executed key column expressions in the PhysicalCreateIndex operator. However, this limited the sorting performance and decreased the index creation's performance in general.

This PR moves the sorting and the expression execution out of the PhyscialCreateIndex operator and adds them as extra operators to the CREATE INDEX pipeline. There are also some other minor code improvements.

@taniabogatsch taniabogatsch requested review from pdet, Mytherin and Tishj and removed request for pdet January 1, 2023 22:47
Copy link
Contributor

@Tishj Tishj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some nitpicky observations, but it looks good :)

One more thought I had, but I'm not sure of the added benefit over the existing solution:
Maybe we want to create a NodeChildIterator instead of the NextPosAndByte functions?
That would encapsulate the position, so you don't have to keep track of this position yourself and keep giving it back to the function.

@taniabogatsch
Copy link
Contributor Author

One more thought I had, but I'm not sure of the added benefit over the existing solution:
Maybe we want to create a NodeChildIterator instead of the NextPosAndByte functions?
That would encapsulate the position, so you don't have to keep track of this position yourself and keep giving it back to the function.

I have a list of possible code refactoring ideas/improvements for the ART. I will prolly put them in an issue these days, and I think that this should go there, too. Instead of adding it to this PR. This iterator would also make sense for similar functions iterating node children.

taniabogatsch added a commit to taniabogatsch/duckdb that referenced this pull request Jan 3, 2023
@taniabogatsch taniabogatsch requested review from Tishj and removed request for Mytherin January 4, 2023 09:53
Copy link
Contributor

@Tishj Tishj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me :)

@taniabogatsch taniabogatsch mentioned this pull request Jan 9, 2023
16 tasks
@Mytherin Mytherin merged commit e4c51a8 into duckdb:master Jan 12, 2023
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just one minor comment - perhaps you can fix it in a future PR:

@taniabogatsch taniabogatsch deleted the ART-parallel branch January 12, 2023 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants