Skip to content

DictionaryEncodedColumnPartSerde: Read values using smooshReader.#17688

Closed
gianm wants to merge 1 commit intoapache:masterfrom
gianm:fix-large-value-sections
Closed

DictionaryEncodedColumnPartSerde: Read values using smooshReader.#17688
gianm wants to merge 1 commit intoapache:masterfrom
gianm:fix-large-value-sections

Conversation

@gianm
Copy link
Contributor

@gianm gianm commented Jan 30, 2025

The write side uses a FileSmoosher for valueWriter, which means that the value component may be split over multiple files. In this case, the read side needs a SmooshedFileMapper.

It is plausible for multi-valued columns to have value sections in excess of 2 GB: imagine a segment with 10 million rows and an average of 50 values per row. The value section would have 500 million ints, which is 2 GB.

I believe this is the cause of #7943.

The write side uses a FileSmoosher for valueWriter, which means that the
value component may be split over multiple files. In this case, the
read side needs a SmooshedFileMapper.

It is plausible for multi-valued columns to have value sections in excess
of 2 GB: imagine a segment with 10 million rows and an average of 50 values
per row. The value section would have 500 million ints, which is 2 GB.
@gianm
Copy link
Contributor Author

gianm commented Jan 31, 2025

Marking draft since the patch should have a test.

@gianm gianm marked this pull request as draft January 31, 2025 00:00
@gianm
Copy link
Contributor Author

gianm commented Jan 31, 2025

Btw, I think the same basic problem exists in BlockLayoutColumnarDoublesSupplier. The corresponding writer can write multi-file, but the reader can only read single-file. Probably exists other places too. We should scan for all of them and come up with a way to test them all.

@gianm
Copy link
Contributor Author

gianm commented Jan 31, 2025

Closing in favor of #17691.

@gianm gianm closed this Jan 31, 2025
@gianm gianm deleted the fix-large-value-sections branch January 31, 2025 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant