Further parallelize index creation #5812

taniabogatsch · 2023-01-01T22:47:03Z

Before this PR, we sorted the index key columns and executed the key column expressions inside the PhysicalCreateIndex operator itself. However, this limited sorting performance and slowed down index creation in general.

This PR moves the sorting and the expression execution out of the PhysicalCreateIndex operator and adds them as separate operators to the CREATE INDEX pipeline. The PR also includes some other minor code improvements.
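To make the new pipeline shape concrete, here is a minimal C++ sketch under simplifying assumptions; it is not DuckDB code. Key-expression evaluation and sorting run as separate steps, each worker evaluates and sorts only its own partition, and the sorted runs are merged at the end. The names `Row`, `Key`, `ExecuteKeyExpression`, `SortPartition`, `MergeRuns`, and `BuildSortedKeys`, the example key expression `a * 10 + b`, and the strided row partitioning are all hypothetical; a real pipeline would feed the merged keys into the ART instead of returning a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the reorganized CREATE INDEX work:
// execute key expressions -> sort per worker -> merge -> (build index).
// None of these names are DuckDB APIs.
using Row = std::vector<int64_t>;
using Key = int64_t;

// Step 1: key-expression execution as its own step.
// Assumed example expression: key = a * 10 + b over columns (a, b).
Key ExecuteKeyExpression(const Row &row) {
	return row[0] * 10 + row[1];
}

// Step 2: each worker sorts only its own partition, so no single
// operator has to sort all keys by itself.
void SortPartition(std::vector<Key> &partition) {
	std::sort(partition.begin(), partition.end());
}

// Step 3: merge the sorted runs pairwise until one sorted run remains.
std::vector<Key> MergeRuns(std::vector<std::vector<Key>> runs) {
	while (runs.size() > 1) {
		std::vector<std::vector<Key>> next;
		for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
			std::vector<Key> merged;
			merged.reserve(runs[i].size() + runs[i + 1].size());
			std::merge(runs[i].begin(), runs[i].end(), runs[i + 1].begin(), runs[i + 1].end(),
			           std::back_inserter(merged));
			next.push_back(std::move(merged));
		}
		if (runs.size() % 2 == 1) {
			// An odd run out is carried over to the next round unchanged.
			next.push_back(std::move(runs.back()));
		}
		runs = std::move(next);
	}
	return runs.empty() ? std::vector<Key>() : std::move(runs.front());
}

// Runs steps 1 and 2 on num_threads workers, then merges their results.
std::vector<Key> BuildSortedKeys(const std::vector<Row> &rows, std::size_t num_threads) {
	assert(num_threads > 0);
	std::vector<std::vector<Key>> partitions(num_threads);
	std::vector<std::thread> workers;
	for (std::size_t t = 0; t < num_threads; t++) {
		workers.emplace_back([&rows, &partitions, num_threads, t] {
			// Worker t handles rows t, t + num_threads, ... and writes only
			// to partitions[t], so no locking is needed.
			for (std::size_t i = t; i < rows.size(); i += num_threads) {
				partitions[t].push_back(ExecuteKeyExpression(rows[i]));
			}
			SortPartition(partitions[t]);
		});
	}
	for (auto &worker : workers) {
		worker.join();
	}
	return MergeRuns(std::move(partitions));
}
```

For example, `BuildSortedKeys({{3, 1}, {1, 2}, {2, 0}}, 2)` evaluates the keys 31, 12, and 20 and returns `{12, 20, 31}`. Sorting per partition lets every thread progress independently; only the final merge combines their work.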

src/execution/index/art/node256.cpp

src/execution/physical_plan/plan_create_index.cpp

Tishj

I made some nitpicky observations, but it looks good :)

One more thought I had, but I'm not sure of the added benefit over the existing solution:
Maybe we want to create a NodeChildIterator instead of the NextPosAndByte functions?
That would encapsulate the position, so you don't have to keep track of this position yourself and keep giving it back to the function.
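The difference between the two styles can be sketched in C++ under heavy simplification. The `Node256` and `Node` types below are hypothetical stand-ins (256 child slots, `nullptr` marking an empty slot). `NextChildPos` only imitates the position-passing idea behind the `NextPosAndByte` functions and does not reproduce their actual signature, and `NodeChildIterator` plus the `Children` range adapter are illustrative names for the suggested iterator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for an ART Node256: one child slot per key byte,
// nullptr marks an empty slot. Not the DuckDB type.
struct Node {};

struct Node256 {
	std::array<Node *, 256> children {};
};

// Style 1: the caller owns the position and passes it back in each call.
// Advances pos to the next occupied slot; returns false when none is left.
bool NextChildPos(const Node256 &node, std::size_t &pos) {
	while (pos < node.children.size() && !node.children[pos]) {
		pos++;
	}
	return pos < node.children.size();
}

// Style 2: an iterator that owns the position, so callers never track it.
class NodeChildIterator {
public:
	NodeChildIterator(const Node256 &node_p, std::size_t pos_p) : node(node_p), pos(pos_p) {
		NextChildPos(node, pos); // skip leading empty slots
	}
	// Yields (key byte, child pointer) for the current occupied slot.
	std::pair<uint8_t, Node *> operator*() const {
		assert(pos < node.children.size()); // must not dereference end()
		return {static_cast<uint8_t>(pos), node.children[pos]};
	}
	NodeChildIterator &operator++() {
		pos++;
		NextChildPos(node, pos); // skip empty slots
		return *this;
	}
	bool operator!=(const NodeChildIterator &other) const {
		return pos != other.pos;
	}

private:
	const Node256 &node;
	std::size_t pos;
};

// Range adapter enabling `for (auto [byte, child] : Children{node})`.
struct Children {
	const Node256 &node;
	NodeChildIterator begin() const {
		return NodeChildIterator(node, 0);
	}
	NodeChildIterator end() const {
		return NodeChildIterator(node, node.children.size());
	}
};
```

With the iterator, a caller writes `for (auto [byte, child] : Children{node})` and never handles the position itself; the position-passing loop instead requires the caller to keep `pos` alive and increment it after every occupied slot.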

taniabogatsch · 2023-01-03T15:07:54Z

> One more thought I had, but I'm not sure of the added benefit over the existing solution:
> Maybe we want to create a NodeChildIterator instead of the NextPosAndByte functions?
> That would encapsulate the position, so you don't have to keep track of this position yourself and keep giving it back to the function.

I have a list of possible code refactoring ideas and improvements for the ART. I will probably put them in an issue in the next few days, and I think this suggestion belongs there too, rather than in this PR. Such an iterator would also make sense for other functions that iterate over node children.

Tishj

Looks good to me :)

test/fuzzer/pedro/index_current_timestamp.test

Mytherin

Thanks! Just one minor comment - perhaps you can fix it in a future PR:

src/execution/physical_plan/plan_create_index.cpp

taniabogatsch added 13 commits November 17, 2022 18:40

parallel sort

a45d6c6

Merge branch 'feature' into ART-sort-operator

a5953c4

put expression execution in separate operator

933950a

Merge branch 'feature' into ART-sort

2ca7001

moving sort into operator

1d9e6a5

changed merge strategy of nodes

2809490

small improvements

83e2806

Merge branch 'master' into ART-parallel

b4631c0

fix expression execution

0b09ab7

fixed expression bug and adjusted fuzzer tests

ac44779

tidy fixes

ee3a131

don't allow specific scalar functions as index keys

08aeae5

fixed casting to incorrect type

285ee3b

taniabogatsch requested review from pdet, Mytherin and Tishj and removed request for pdet January 1, 2023 22:47

Tishj reviewed Jan 3, 2023

src/execution/index/art/node256.cpp

Tishj reviewed Jan 3, 2023

src/execution/physical_plan/plan_create_index.cpp (outdated)

Tishj reviewed Jan 3, 2023

using HAS_SIDE_EFFECTS for non-deterministic functions

24aa4aa

Merge branch 'master' into ART-parallel

56d82ce

taniabogatsch added a commit to taniabogatsch/duckdb that referenced this pull request Jan 3, 2023

node refactoring as suggested in duckdb#5812

8022303

taniabogatsch requested review from Tishj and removed request for Mytherin January 4, 2023 09:53

Tishj approved these changes Jan 9, 2023

taniabogatsch mentioned this pull request Jan 9, 2023

Index improvements #5865 (closed, 16 tasks)

Tmonster reviewed Jan 11, 2023

test/fuzzer/pedro/index_current_timestamp.test

added explicit error messages to test cases

459cf5c

taniabogatsch added 2 commits January 11, 2023 10:33

remove accidentally added tmp test file ^^

4aa48b9

Merge branch 'master' into ART-parallel

42a6890

Mytherin merged commit e4c51a8 into duckdb:master Jan 12, 2023

Mytherin reviewed Jan 12, 2023

src/execution/physical_plan/plan_create_index.cpp

taniabogatsch deleted the ART-parallel branch January 12, 2023 11:36
