refactor(expr): allow sparse column id in chunk #8789

andylokandy · 2022-11-14T09:16:41Z

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Each column in a Chunk has its own column id, and these ids are not necessarily contiguous.

When converting between Chunk and DataBlock, we assume the ids are contiguous.
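
As a rough illustration of the summary above, here is a minimal sketch of a chunk keyed by sparse column ids. `SparseChunk`, `ColumnId`, `Column`, and `into_dense` are hypothetical stand-ins, not the types or methods touched by this PR. The dense conversion simply orders columns by id, which is one possible way to get back to a positional layout.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real types; names are illustrative only.
type ColumnId = usize;
type Column = Vec<i64>;

/// A chunk whose columns are keyed by a (possibly sparse) column id.
#[derive(Debug, Default)]
struct SparseChunk {
    columns: HashMap<ColumnId, Column>,
}

impl SparseChunk {
    /// Insert a column under the given id, replacing any previous one.
    fn add_column(&mut self, id: ColumnId, column: Column) {
        self.columns.insert(id, column);
    }

    /// Look a column up by its id; ids that are absent return `None`.
    fn column(&self, id: ColumnId) -> Option<&Column> {
        self.columns.get(&id)
    }

    /// Convert to a dense, positional layout by sorting on column id.
    /// The ids are returned alongside the columns so the mapping is not lost.
    fn into_dense(self) -> (Vec<ColumnId>, Vec<Column>) {
        let mut entries: Vec<(ColumnId, Column)> = self.columns.into_iter().collect();
        entries.sort_by_key(|(id, _)| *id);
        entries.into_iter().unzip()
    }
}
```

With this shape, a chunk holding only ids 2 and 7 stores two columns rather than eight, at the cost of a hash lookup per access.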

RinChanNOWWW · 2022-11-14T10:12:58Z

The memory consumption of HashMap may be larger than Vec when the chunk is almost full.

sundy-li · 2022-11-14T10:32:56Z

The hashmap's index does not seem useful for representing a chunk.

Maybe we should keep this map outside the chunk.

andylokandy · 2022-11-14T12:13:51Z

The idea behind this change is to use the global column id all the way through planning, block pruning, and execution. It reduces the complexity of reusing the same column id across different phases, in which the column references used in the Expr differ (some of the column refs are eliminated by the constant folder), especially if in the future we give every chunk its own Expr with its own set of column refs.

RinChanNOWWW · 2022-11-14T12:32:30Z

The idea behind this change is to use the global column id all the way through the process of planning, block pruning, and execution.

The column order is the same as the field order in the schema. When we want to operate on a column in the chunk, we write code like this:

let index = schema.index_of(&col_name)?; // find the column's position in Vec<DataField>
let col = &chunk.columns()[index];

How do we achieve this after refactoring Chunk?

b41sh · 2022-11-14T13:20:46Z

The idea behind this change is to use the global column id all the way through planning, block pruning, and execution. It reduces the complexity of reusing the same column id across different phases, in which the column references used in the Expr differ (some of the column refs are eliminated by the constant folder), especially if in the future we give every chunk its own Expr with its own set of column refs.

Can we put the ColumnId into the Schema? The index of a Column in the Chunk can then be looked up from the Schema by its ColumnId.
This keeps the structure of Chunk simpler. Schema and Chunk already need to be used together, so this would not add many extra operations.
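
A minimal sketch of this alternative, assuming the schema stores one column id per field in the same order as the chunk's columns. `Schema`, `Chunk`, `index_of_id`, and `column_by_id` here are hypothetical and only illustrate the lookup path; they are not the existing Databend APIs.

```rust
// Hypothetical stand-ins; the real Schema and Chunk carry much more information.
type ColumnId = usize;
type Column = Vec<i64>;

/// Schema that records the global column id of each field, in field order.
struct Schema {
    column_ids: Vec<ColumnId>,
}

impl Schema {
    /// Position of the field with the given column id, if present.
    fn index_of_id(&self, id: ColumnId) -> Option<usize> {
        self.column_ids.iter().position(|c| *c == id)
    }
}

/// Chunk stays a plain positional list of columns.
struct Chunk {
    columns: Vec<Column>,
}

/// Resolve a column by global id: id -> position (via Schema) -> column (via Chunk).
fn column_by_id<'a>(schema: &Schema, chunk: &'a Chunk, id: ColumnId) -> Option<&'a Column> {
    let index = schema.index_of_id(id)?;
    chunk.columns.get(index)
}
```

The linear scan in `index_of_id` is only for brevity; a real schema could cache an id-to-index map if lookups are frequent.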

sundy-li · 2022-11-14T13:34:53Z

The idea behind this change is to use the global column id all the way through planning, block pruning, and execution. It reduces the complexity of reusing the same column id across different phases, in which the column references used in the Expr differ (some of the column refs are eliminated by the constant folder), especially if in the future we give every chunk its own Expr with its own set of column refs.

It should be possible to create a Chunk on the storage side without any information from the Expr or the binder. How would we convert an arrow::Chunk into a Chunk?

andylokandy · 2022-11-14T13:39:27Z

This keeps the structure of Chunk simpler. Schema and Chunk already need to be used together, so this would not add many extra operations.

To unlock the full power of the adaptive constant folder, each chunk should be able to have a different number of columns, even when the chunks belong to the same query. Therefore, every Chunk will have its own Schema from which to retrieve its unique id mapping.
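
To make this concrete, here is a small sketch in which each chunk carries its own id-to-position mapping, so two chunks of the same query can hold different column sets while sharing global ids. `ChunkWithSchema` and its methods are hypothetical names used only for illustration.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real types.
type ColumnId = usize;
type Column = Vec<i64>;

/// A chunk that owns its id-to-position mapping (its "own schema").
struct ChunkWithSchema {
    positions: BTreeMap<ColumnId, usize>,
    columns: Vec<Column>,
}

impl ChunkWithSchema {
    /// Build a chunk from (global id, column) pairs; ids are assumed to be unique.
    fn new(entries: Vec<(ColumnId, Column)>) -> Self {
        let mut positions = BTreeMap::new();
        let mut columns = Vec::with_capacity(entries.len());
        for (id, column) in entries {
            positions.insert(id, columns.len());
            columns.push(column);
        }
        Self { positions, columns }
    }

    /// Look a column up by its global id.
    fn column(&self, id: ColumnId) -> Option<&Column> {
        self.positions.get(&id).map(|&i| &self.columns[i])
    }

    /// Number of columns physically present in this chunk.
    fn num_columns(&self) -> usize {
        self.columns.len()
    }
}
```

A chunk for which the constant folder eliminated most column refs might then carry a single column under id 5, while another chunk of the same query carries ids 0, 5, and 9.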

refactor(expr): allow sparse column id in chunk

32f23f3

andylokandy requested a review from sundy-li November 14, 2022 09:17

mergify bot added the pr-refactor label (this PR changes the code base without new features or bugfix) Nov 14, 2022

andylokandy requested a review from leiysky November 14, 2022 09:17

leiysky approved these changes Nov 14, 2022

sundy-li requested a review from RinChanNOWWW November 14, 2022 09:36

sundy-li approved these changes Nov 14, 2022

Merge branch 'main' into hm

4c12fa5

BohuTANG merged commit 2fb477c into databendlabs:main Nov 14, 2022

sundy-li mentioned this pull request Nov 14, 2022

Revert "refactor(expr): allow sparse column id in chunk" #8794

Merged

andylokandy mentioned this pull request Nov 16, 2022

refactor(function): allow sparse column id for constant folder #8821

Merged

andylokandy mentioned this pull request Nov 29, 2022

refactor(expr): allow sparse column id in chunk #9008

Merged
