Implement lazy columns replication in JOIN and ARRAY JOIN #88752
Avogar merged 44 commits into ClickHouse:master
Workflow [PR], commit [57d94ce] Summary: ❌
namespace DB
{

/// Wrapper around ColumnVector to store indexes.
I think this part was extracted from LowCardinality with no change, so I can skip reviewing it.
However, I added a few new simple methods that I needed in ColumnReplicated.
I found a bug in the ColumnVariant::index implementation during testing. I will also create a separate bug-fix PR with this change so that it can be backported.
ColumnReplicated::ColumnReplicated(MutableColumnPtr && nested_column_)
    : nested_column(std::move(nested_column_))
{
    indexes.insertIndexesRange(0, nested_column->size());
Can we deprecate this ctor? If ColumnReplicated can replace any column, then we can always use the initial one without indexes.
Also, maybe we can check that no sparse/LC is possible inside. (not sure, but at least it does not make a lot of sense to me).
> Can we deprecate this ctor? If ColumnReplicated can replace any column, then we can always use the initial one without indexes.

We cannot use the initial one if we need to insert into it from a ColumnReplicated. This is needed in MergingSortedTransform. Otherwise we would have to convert ColumnReplicated to a full column there and lose its benefits.
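The trade-off in this reply can be illustrated with a minimal sketch. It is not the ClickHouse class: `ReplicatedSketch` and all of its members are hypothetical names, and strings stand in for arbitrary column values. It only shows the general idea of keeping values once, representing rows as indexes, and starting from identity indexes so that rows from another replicated column can be inserted later.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the ColumnReplicated idea: values are
// stored once in `nested`, and each logical row is an index into it.
struct ReplicatedSketch
{
    std::vector<std::string> nested;   // stored values
    std::vector<std::size_t> indexes;  // logical row -> position in `nested`

    // Mirrors the constructor under review: wrap a column with identity
    // indexes, so that row i refers to nested[i].
    explicit ReplicatedSketch(std::vector<std::string> nested_)
        : nested(std::move(nested_))
    {
        indexes.reserve(nested.size());
        for (std::size_t i = 0; i < nested.size(); ++i)
            indexes.push_back(i);
    }

    std::size_t size() const { return indexes.size(); }

    const std::string & at(std::size_t row) const { return nested[indexes[row]]; }

    // Lazy replication: append `times` extra copies of logical row `row`
    // by adding indexes only; no value is copied.
    void replicateRow(std::size_t row, std::size_t times)
    {
        assert(row < indexes.size());
        const std::size_t idx = indexes[row];  // copy before the vector may reallocate
        indexes.insert(indexes.end(), times, idx);
    }

    // Insert one logical row from another column, as a merge step might.
    // The value is copied once into our own nested storage.
    void insertFrom(const ReplicatedSketch & src, std::size_t row)
    {
        nested.push_back(src.at(row));
        indexes.push_back(nested.size() - 1);
    }

    // Convert to a full column: one materialized value per logical row.
    std::vector<std::string> materialize() const
    {
        std::vector<std::string> full;
        full.reserve(indexes.size());
        for (std::size_t idx : indexes)
            full.push_back(nested[idx]);
        return full;
    }
};
```

In this sketch, `materialize()` is the step that lazy replication tries to postpone: it is the only place where a value is copied once per logical row.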
> Also, maybe we can check that no sparse/LC is possible inside. (not sure, but at least it does not make a lot of sense to me).

Sparse inside Replicated 100% makes sense: a Sparse column can contain big values that we want to avoid replicating.
LC inside Replicated doesn't make sense, so let's avoid it.
> Sparse inside Replicated 100% makes sense

Sparse is kind of included in Replicated. At least you can reuse the same internal column when converting Sparse -> Replicated and only rebuild the indexes.
If it's ok, I will add proper Sparse -> Replicated conversion later, maybe even in a separate PR.
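The Sparse -> Replicated conversion suggested in this thread could look roughly like the sketch below. It assumes a sparse layout in which `values[0]` is the default value, the remaining entries are the non-default values in row order, and `offsets` lists the rows that hold them. `SparseSketch` and `buildReplicatedIndexes` are hypothetical names, not ClickHouse APIs, and the eventual implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed sparse layout for this sketch: values[0] is the default value,
// values[1..] are the non-default values in row order, and offsets[k] is the
// row number that holds values[k + 1].
struct SparseSketch
{
    std::vector<std::string> values;
    std::vector<std::size_t> offsets;
    std::size_t rows = 0;
};

// Build only the index vector for a replicated view over `sparse.values`.
// Default rows point at position 0; the k-th non-default row points at k + 1.
// The values themselves are not touched, so they could be shared as-is.
std::vector<std::size_t> buildReplicatedIndexes(const SparseSketch & sparse)
{
    assert(sparse.values.size() == sparse.offsets.size() + 1);
    std::vector<std::size_t> indexes(sparse.rows, 0);
    for (std::size_t k = 0; k < sparse.offsets.size(); ++k)
    {
        assert(sparse.offsets[k] < sparse.rows);
        indexes[sparse.offsets[k]] = k + 1;
    }
    return indexes;
}
```

Under these assumptions the conversion costs one pass over the offsets plus one index per row, and no value is copied.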
…ion, add a test for an old client requesting replicated columns
…olumn-replication
8cb04b3
Cherry pick #88752 to 25.10: Implement lazy columns replication in JOIN and ARRAY JOIN
Backport #88752 to 25.10: Implement lazy columns replication in JOIN and ARRAY JOIN
Tuple itself cannot be Sparse, but some previous ClickHouse versions may have written this into serialization.json, and such a table cannot be loaded. Follow-up for: ClickHouse#88752
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Implement lazy columns replication in JOIN and ARRAY JOIN. Avoid converting special column representations such as Sparse and Replicated to full columns in some output formats. This avoids unnecessary copying of data in memory.
Closes #82669.
To use lazy replication, enable the settings enable_lazy_columns_replication and allow_special_serialization_kinds_in_output_formats (they are disabled by default for now).
Documentation entry for user-facing changes