refactor(expr): allow sparse column id in chunk #8789
The memory consumption of |
The index of the hashmap seems unnecessary for representing a chunk. Maybe we should keep this map outside the chunk.
The idea behind this change is to use the global column id all the way through planning, block pruning, and execution. It reduces the complexity of reusing the same column id across different phases in which the column references used in the Expr differ (some of the column refs are eliminated by the constant folder), especially if, in the future, every chunk is given its own Expr with its own set of column refs.
The column order is the same as the field order in the schema. When we wanted to operate on one column in the chunk, we would write code like this:

    let index = schema.index_of(&col_name)?; // find the column's position in Vec<DataField>
    let col = &chunk.columns()[index];

How do we achieve this after refactoring?
Can we put |
To unlock the full power of the adaptive constant folder, each chunk should be able to have a different number of columns even if the chunks belong to the same query. Therefore, every Chunk will have its own Schema to retrieve its unique id mapping.
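A minimal sketch of what a per-chunk id mapping could look like. It is not the code in this PR: `SparseChunk`, `ColumnId`, `Column`, `add_column`, `column_by_id`, and `example_chunks` are hypothetical names chosen for illustration, and a plain `Vec<i64>` stands in for a real column type.

```rust
use std::collections::HashMap;

/// Hypothetical global column id, assumed to be assigned once during planning.
type ColumnId = usize;

/// Hypothetical stand-in for a column; a real column would be typed.
type Column = Vec<i64>;

/// Hypothetical chunk that carries its own id -> position mapping,
/// so the ids it holds do not have to be 0, 1, 2, ...
#[derive(Debug, Default)]
struct SparseChunk {
    columns: Vec<Column>,
    positions: HashMap<ColumnId, usize>,
}

impl SparseChunk {
    /// Adds a column under a global id. Returns false (and changes nothing)
    /// if the id is already present in this chunk.
    fn add_column(&mut self, id: ColumnId, column: Column) -> bool {
        if self.positions.contains_key(&id) {
            return false;
        }
        self.positions.insert(id, self.columns.len());
        self.columns.push(column);
        true
    }

    /// Looks a column up by its global id instead of by its position.
    fn column_by_id(&self, id: ColumnId) -> Option<&Column> {
        self.positions.get(&id).map(|&pos| &self.columns[pos])
    }

    fn num_columns(&self) -> usize {
        self.columns.len()
    }
}

/// Builds two chunks for the same imagined query. The second one lacks
/// column 2, as if a constant folder had eliminated that column ref.
fn example_chunks() -> (SparseChunk, SparseChunk) {
    let mut full = SparseChunk::default();
    full.add_column(0, vec![1, 2]);
    full.add_column(2, vec![3, 4]);
    full.add_column(7, vec![5, 6]);

    let mut folded = SparseChunk::default();
    folded.add_column(0, vec![1, 2]);
    folded.add_column(7, vec![5, 6]);

    (full, folded)
}
```

In this sketch both chunks answer `column_by_id(7)` with the same data even though the column sits at a different position in each, which is the property a global id is meant to provide.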
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
Each column in a `Chunk` has an individual column id, and the ids are not necessarily continuous. When converting between `Chunk` and `DataBlock`, we assume the ids are continuous.
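The continuity assumption can be illustrated with a small sketch. The `Chunk` and `DataBlock` structs below are simplified, hypothetical stand-ins rather than the real Databend types, and `chunk_to_block` and `block_to_chunk` are illustrative names. The sketch also assumes that "continuous" means the ids are exactly 0, 1, ..., n-1.

```rust
use std::collections::BTreeMap;

/// Hypothetical global column id.
type ColumnId = usize;

/// Hypothetical stand-in for a column.
type Column = Vec<i64>;

/// Simplified stand-in for a chunk: columns keyed by possibly sparse ids.
struct Chunk {
    columns: BTreeMap<ColumnId, Column>,
}

/// Simplified stand-in for a block: columns addressed by position only.
struct DataBlock {
    columns: Vec<Column>,
}

/// Converts a chunk to a positional block. The conversion is only
/// well-defined when the ids are exactly 0..n, so anything else is an error.
fn chunk_to_block(chunk: Chunk) -> Result<DataBlock, String> {
    let mut columns = Vec::with_capacity(chunk.columns.len());
    // BTreeMap iterates in ascending key order, so ids arrive sorted.
    for (expected, (id, column)) in chunk.columns.into_iter().enumerate() {
        if id != expected {
            return Err(format!(
                "column id {} found where {} was expected",
                id, expected
            ));
        }
        columns.push(column);
    }
    Ok(DataBlock { columns })
}

/// Converts a block back to a chunk by using each position as the id.
fn block_to_chunk(block: DataBlock) -> Chunk {
    Chunk {
        columns: block.columns.into_iter().enumerate().collect(),
    }
}
```

Rejecting sparse ids in `chunk_to_block` is one possible design choice; the point is only that a positional block cannot represent a gap in the ids without extra information.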