Map restructure #5768

LindsayWray · 2022-12-22T15:34:05Z

Map Restructure

Currently, MAP is stored internally as a struct containing two lists: STRUCT{ [keys], [values] }.
This restructure stores MAP as a list of structs, LIST[ {key, val}, {key, val}, …], which allows a map to be treated internally as a list and therefore to reuse much of the existing list functionality.
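To make the two layouts concrete, here is a minimal C++ sketch under simplifying assumptions. All types here are hypothetical stand-ins chosen for illustration, not DuckDB's actual Vector API: a simplified list_entry_t (offset, length), OldMapColumn, NewMapColumn, KeyValue, and the Restructure helper. Keys are assumed to be strings and values 64-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for list_entry_t: an (offset, length) window into a
// flat child vector. idx_t is assumed to be a 64-bit unsigned integer.
using idx_t = uint64_t;
struct list_entry_t {
    idx_t offset;
    idx_t length;
};

// Old layout: STRUCT{ [keys], [values] }, one list entry each for keys and values.
struct OldMapColumn {
    std::vector<list_entry_t> key_entries;   // one per row
    std::vector<list_entry_t> value_entries; // one per row
    std::vector<std::string> keys;           // flat child vector of all keys
    std::vector<int64_t> values;             // flat child vector of all values
};

// New layout: LIST[ {key, val}, ... ], a single list entry per row.
struct KeyValue {
    std::string key;
    int64_t value;
};
struct NewMapColumn {
    std::vector<list_entry_t> entries; // one per row
    std::vector<KeyValue> children;    // flat child vector of key/value structs
};

// Rewrites an old-layout column into the new layout, row by row.
NewMapColumn Restructure(const OldMapColumn &old_map) {
    NewMapColumn result;
    for (std::size_t row = 0; row < old_map.key_entries.size(); row++) {
        const list_entry_t &k = old_map.key_entries[row];
        const list_entry_t &v = old_map.value_entries[row];
        // Keys and values of one map have the same length.
        result.entries.push_back({static_cast<idx_t>(result.children.size()), k.length});
        for (idx_t i = 0; i < k.length; i++) {
            result.children.push_back({old_map.keys[k.offset + i], old_map.values[v.offset + i]});
        }
    }
    return result;
}
```

Because the new column is just "list entries plus a flat child vector", any routine written against that shape (length, slicing, offsets) applies to maps unchanged, which is the reuse the PR describes.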

This is now in line with the Arrow structure:

Map data is nested data where each value is a variable number of key-item pairs. Its physical representation is the same as a list of {key, item} structs.

Memory usage is reduced, since only one list_entry_t struct per map is now needed instead of two.
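A rough back-of-the-envelope sketch of the saving follows. It assumes list_entry_t consists of two 64-bit fields (offset and length), as in the simplified mirror below. It counts only the list entries, treating child data and validity masks as roughly equal in both layouts. OldEntryBytes and NewEntryBytes are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified mirror of list_entry_t: two 64-bit fields.
using idx_t = uint64_t;
struct list_entry_t {
    idx_t offset;
    idx_t length;
};
static_assert(sizeof(list_entry_t) == 16, "assumed: two 64-bit fields, no padding");

// Bytes of list entries for `rows` MAP values in each layout.
// Child data and validity masks are left out of the comparison.
constexpr std::size_t OldEntryBytes(std::size_t rows) {
    return rows * 2 * sizeof(list_entry_t); // one entry for keys, one for values
}
constexpr std::size_t NewEntryBytes(std::size_t rows) {
    return rows * sizeof(list_entry_t); // one entry for the key/value structs
}
```

For a chunk of 2048 rows (the vector size assumed here), this gives 65,536 bytes of list entries before and 32,768 bytes after.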

[Figure: previous state, with two list_entry_t structs required for the keys and the values separately]

[Figure: this PR, with one list_entry_t struct for the key/value structs]

This change also affects other map functions, namely map_from_entries. Since the input is already parsed as a list of key/value structs, its memory can be referenced as a whole instead of moving each value into a new block of memory. This does, however, change the former behavior, in which NULL elements were silently skipped, rather than causing an error, when the query was performed on multiple rows. See #3600; I’ve also discussed this with @Tishj. Preserving that behavior would make it impossible to reuse the memory, since skipping elements changes the list lengths and offsets, unless perhaps a selection vector were used.
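The following sketch shows why skipping NULL elements conflicts with referencing the input memory. Dropping an element shortens its row and shifts the offset of every later row. As a result, the child data must be copied into a compacted buffer and the list entries rewritten. The types are hypothetical simplifications (a NULL struct element is modeled as an empty std::optional). SkipNullEntries is an illustrative name, not a DuckDB function.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using idx_t = uint64_t;
struct list_entry_t {
    idx_t offset;
    idx_t length;
};

// A list element is a {key, value} struct; NULL is an empty optional.
using Entry = std::optional<std::pair<std::string, int64_t>>;

struct ListColumn {
    std::vector<list_entry_t> entries; // one (offset, length) per row
    std::vector<Entry> children;       // flat child vector
};

// Former-style behaviour: silently drop NULL elements. Every dropped element
// shortens its row and shifts all later offsets, so the children must be
// copied into a new, compacted buffer and every list entry recomputed.
ListColumn SkipNullEntries(const ListColumn &input) {
    ListColumn out;
    for (const list_entry_t &e : input.entries) {
        list_entry_t row{static_cast<idx_t>(out.children.size()), 0};
        for (idx_t i = 0; i < e.length; i++) {
            const Entry &child = input.children[e.offset + i];
            if (!child.has_value()) {
                continue; // NULL element skipped
            }
            out.children.push_back(child);
            row.length++;
        }
        out.entries.push_back(row);
    }
    return out;
}
```

For rows [{a, 1}, NULL] and [{b, 2}], the second row's offset moves from 2 to 1. The input's list entries therefore cannot be reused as-is.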

So in this PR I’ve opted to reference the allocated memory and throw an error when a NULL value is encountered (keys cannot be NULL).
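Here is a minimal sketch of that approach. It assumes a shared child buffer (modeled with std::shared_ptr) stands in for DuckDB's buffer referencing. The input is validated first, and an error is thrown on a NULL element or a NULL key. The result then shares the input's child data rather than copying it. The types and MapFromEntries are hypothetical and only mirror the idea, not the actual map_from_entries implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

using idx_t = uint64_t;
struct list_entry_t {
    idx_t offset;
    idx_t length;
};

// A key/value struct whose key may be NULL; a NULL element is an empty optional.
struct KeyValue {
    std::optional<std::string> key;
    int64_t value;
};
using Element = std::optional<KeyValue>;

struct ListColumn {
    std::vector<list_entry_t> entries;                    // one (offset, length) per row
    std::shared_ptr<const std::vector<Element>> children; // child buffer, shareable
};

// map_from_entries, sketched: validate every element, then build the result
// from a copy of the list entries and the *same* child buffer (no element copy).
ListColumn MapFromEntries(const ListColumn &input) {
    for (const list_entry_t &e : input.entries) {
        for (idx_t i = 0; i < e.length; i++) {
            const Element &el = (*input.children)[e.offset + i];
            if (!el.has_value()) {
                throw std::invalid_argument("map_from_entries: NULL element");
            }
            if (!el->key.has_value()) {
                throw std::invalid_argument("map keys can not be NULL");
            }
        }
    }
    return ListColumn{input.entries, input.children};
}
```

Since offsets and lengths never change, validation is the only per-element work. The key/value data itself is never moved.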

Tishj · 2022-12-22T15:38:42Z

Sounds like a good change!

Regarding:

So in this PR I’ve opted for moving the memory and throwing an error when encountering a NULL value (keys can not be NULL)

Do you then also verify that keys don't contain duplicates?

LindsayWray · 2022-12-22T15:43:28Z


Yes! It goes through the MapConversionVerify function, which performs all of those checks and returns the appropriate error.
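The PR does not show MapConversionVerify itself. What follows is therefore only a hypothetical sketch of the two checks discussed here: no NULL keys, and no duplicate keys within a single map. VerifyMapRows and the MapCheck result codes are illustrative names. The sketch returns a result code, whereas the PR describes the real function as returning the appropriate error.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

using idx_t = uint64_t;
struct list_entry_t {
    idx_t offset;
    idx_t length;
};

// Hypothetical result codes for the two checks.
enum class MapCheck { VALID, NULL_KEY, DUPLICATE_KEY };

// Checks every map row: keys must be non-NULL and unique within that row.
MapCheck VerifyMapRows(const std::vector<list_entry_t> &entries,
                       const std::vector<std::optional<std::string>> &keys) {
    for (const list_entry_t &e : entries) {
        std::unordered_set<std::string> seen; // keys of the current map only
        for (idx_t i = 0; i < e.length; i++) {
            const std::optional<std::string> &key = keys[e.offset + i];
            if (!key) {
                return MapCheck::NULL_KEY;
            }
            if (!seen.insert(*key).second) {
                return MapCheck::DUPLICATE_KEY;
            }
        }
    }
    return MapCheck::VALID;
}
```

Duplicates are tracked per map, so the same key may still appear in different rows.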

Mytherin

Thanks for the PR! Great rework. Some minor comments; otherwise this looks good to me:

src/include/duckdb/common/types/vector.hpp

src/storage/statistics/base_statistics.cpp

Lindsay Wray and others added 11 commits November 21, 2022 17:14

benchmark tests (74d3062)
Setup map restructure (a18a297)
Merge remote-tracking branch 'upstream/feature' into map_restructure (e4c75a4)
Merge branch 'master' into map_restructure (a5c1c99)
fixing several components that broke due to map restructure (3890926)
Merge branch 'feature' into map_restructure_2 (765529b)
refactor map changes throughout codebase (4c095f1)
merge (0244045)
format fix (5102c05)
extra test (de7314c)
rm comments (7ce2d7c)

Mytherin reviewed Dec 23, 2022

src/include/duckdb/common/types/vector.hpp (outdated)

src/storage/statistics/base_statistics.cpp (outdated)

Lindsay Wray and others added 8 commits December 23, 2022 13:42

rm PhysicalType::MAP (93e1df2)
merge confl (b72bd52)
hopefully fixes CI errors (673f6a2)
hopefully fixes CI errors (4098df8)
more ci error fixes (6fa81a0)
Fix python CI tests (99a1432)
Merge branch 'master' into map_restructure_physical_type (c973d63)
merge fix (a438c46)

hannes merged commit db2bb06 into duckdb:master Jan 2, 2023

begelundmuller mentioned this pull request Feb 15, 2023: Update DuckDB to 0.7.0 marcboeker/go-duckdb#72 (merged)

mmzeeman mentioned this pull request Feb 26, 2023: Wip upgrade duckdb to v0.7.0 mmzeeman/educkdb#24 (merged)
