feat: Support projection #192

evenyag · 2022-08-19T09:57:53Z

Changes

- Add ProjectedSchema to represent a Schema with projection
  - The ProjectedSchema maintains two schemas: one for reading data from memtables and SSTs, which reads all row key columns and internal columns, and one that the user expects to see after projection
- Memtables and SSTs can accept the ProjectedSchema as a read option to push down the projection and adjust the output Batch schema to the projected schema
- Change the internal representation of Batch: it now holds all columns in a single Vec
- The Mito table now passes the projection to ScanRequest
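The two-schema idea in the list above can be sketched roughly as follows. This is not the code from this PR: `ColumnKind`, `Column`, `ProjectedSketch`, `col`, and `names` are hypothetical names, columns are reduced to a name and a role, and the internal column names used in the usage note are made up. The sketch also assumes the projection only refers to user-visible columns and leaves out validation beyond a bounds check.

```rust
/// Role a column plays in the stored layout (hypothetical simplification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnKind {
    RowKey,
    Value,
    Internal,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Column {
    name: String,
    kind: ColumnKind,
}

/// Small constructor helper for the sketch.
fn col(name: &str, kind: ColumnKind) -> Column {
    Column { name: name.to_string(), kind }
}

/// Sketch of a schema pair: one to read with, one to show to the user.
#[derive(Debug)]
struct ProjectedSketch {
    /// Columns read from memtables and SSTs, in stored order.
    schema_to_read: Vec<Column>,
    /// Columns the user asked for, in the order requested.
    projected_user_schema: Vec<Column>,
}

impl ProjectedSketch {
    /// `projection` holds indices into `full`.
    /// Returns `None` if any index is out of range.
    fn new(full: &[Column], projection: &[usize]) -> Option<Self> {
        if projection.iter().any(|&i| i >= full.len()) {
            return None;
        }
        // Always keep row key and internal columns; keep a value column
        // only when the projection asks for it.
        let schema_to_read = full
            .iter()
            .enumerate()
            .filter(|(i, c)| c.kind != ColumnKind::Value || projection.contains(i))
            .map(|(_, c)| c.clone())
            .collect();
        let projected_user_schema = projection.iter().map(|&i| full[i].clone()).collect();
        Some(Self { schema_to_read, projected_user_schema })
    }
}

/// Column names, for inspecting a schema in the sketch.
fn names(columns: &[Column]) -> Vec<&str> {
    columns.iter().map(|c| c.name.as_str()).collect()
}
```

With a full schema of `host`, `ts` (row keys), `cpu`, `mem` (values) and two internal columns, projecting `[3, 0]` reads every column except `cpu`, while the user-facing schema is just `mem`, `host` in that order.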

Related Issues

feat: Projection pushdown #177

Unresolved Problems

SstSchema is no longer used only by the SST module; it also represents the schema with internal columns, so InternalSchema or StoreSchema may be a more appropriate name. I will rename it in another PR.

codecov · 2022-08-24T03:09:24Z

Codecov Report

Merging #192 (effa23b) into develop (4a11715) will increase coverage by 0.46%.
The diff coverage is 95.60%.

@@             Coverage Diff             @@
##           develop     #192      +/-   ##
===========================================
+ Coverage    75.45%   75.91%   +0.46%     
===========================================
  Files          237      245       +8     
  Lines        18670    19650     +980     
===========================================
+ Hits         14088    14918     +830     
- Misses        4582     4732     +150

| Flag | Coverage Δ | |
|---|---|---|
| rust | `75.91% <95.60%> (+0.46%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/datatypes/src/error.rs | `50.00% <0.00%> (+2.77%)` | ⬆️ |
| src/storage/src/memtable/version.rs | `98.66% <ø> (ø)` | |
| src/storage/src/region/tests.rs | `98.05% <ø> (ø)` | |
| src/storage/src/error.rs | `31.57% <50.00%> (-0.28%)` | ⬇️ |
| src/storage/src/metadata.rs | `95.74% <66.66%> (-0.72%)` | ⬇️ |
| src/storage/src/snapshot.rs | `75.75% <75.00%> (+1.47%)` | ⬆️ |
| src/storage/src/sst.rs | `90.52% <83.33%> (-0.69%)` | ⬇️ |
| src/storage/src/read.rs | `89.87% <85.71%> (-0.90%)` | ⬇️ |
| src/storage/src/chunk.rs | `96.59% <94.44%> (-0.75%)` | ⬇️ |
| src/storage/src/schema.rs | `93.52% <94.77%> (+1.00%)` | ⬆️ |

... and 35 more

killme2008

LGTM

v0y4g3r

LGTM

* feat: Add projected schema
* feat: Use projected schema to read sst
* feat: Use vector of column to implement Batch
* feat: Use projected schema to convert batch to chunk
* feat: Add no_projection() to build ProjectedSchema
* feat: Memtable supports projection

  The btree memtable uses `is_needed()` to filter out unneeded value columns, then uses `ProjectedSchema::batch_from_parts()` to construct the batch, so it doesn't need to know the layout of internal columns.
* test: Add tests for ProjectedSchema
* test: Add tests for ProjectedSchema

  Also returns an error if the `projected_columns` used to build the `ProjectedSchema` is empty.
* test: Add test for memtable projection
* feat: Table pass projection to storage engine
* fix: Use timestamp column name as schema metadata

  This fixes the issue where the metadata refers to the wrong timestamp column if DataFusion reorders the fields of the Arrow schema.
* fix: Fix projected schema not passed to memtable
* feat: Add tests for region projection
* chore: fix clippy
* test: Add test for unordered projection
* chore: Move projected_schema to ReadOptions

  Also fix some typos

evenyag requested review from killme2008 and v0y4g3r August 23, 2022 03:11

evenyag force-pushed the feat/storage-projection branch from 0f3ec71 to 9a48944 August 23, 2022 06:29

evenyag added 12 commits August 23, 2022 17:47

feat: Add projected schema

939095d

feat: Use projected schema to read sst

7223629

feat: Use vector of column to implement Batch

9008346

feat: Use projected schema to convert batch to chunk

285123c

feat: Add no_projection() to build ProjectedSchema

b82f4ef

feat: Memtable supports projection

118e9ab

The btree memtable uses `is_needed()` to filter out unneeded value columns, then uses `ProjectedSchema::batch_from_parts()` to construct the batch, so it doesn't need to know the layout of internal columns.
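A rough sketch of that division of labor, assuming a batch that stores every column in one `Vec` (key columns, then value columns, then internal columns). The names `Projection`, `Batch`, `ColumnData`, and `read_batch` are hypothetical, the signatures of `is_needed` and `batch_from_parts` here are guesses rather than the ones in this PR, and columns are plain `Vec<i64>` for brevity.

```rust
use std::collections::HashSet;

/// A column of values; real vectors are typed, this sketch uses i64 only.
type ColumnData = Vec<i64>;

/// Hypothetical batch that keeps all columns in a single Vec.
#[derive(Debug, PartialEq)]
struct Batch {
    columns: Vec<ColumnData>,
}

/// Hypothetical projection helper that remembers which value columns are needed.
struct Projection {
    needed_value_columns: HashSet<usize>,
}

impl Projection {
    /// Whether the value column at `value_idx` must be read.
    fn is_needed(&self, value_idx: usize) -> bool {
        self.needed_value_columns.contains(&value_idx)
    }

    /// The only place in this sketch that knows the column layout:
    /// keys first, then values, then the two internal columns.
    fn batch_from_parts(
        &self,
        keys: Vec<ColumnData>,
        values: Vec<ColumnData>,
        sequences: ColumnData,
        op_types: ColumnData,
    ) -> Batch {
        let mut columns = keys;
        columns.extend(values);
        columns.push(sequences);
        columns.push(op_types);
        Batch { columns }
    }
}

/// Memtable-side read: drop unneeded value columns, then hand the parts over.
fn read_batch(
    projection: &Projection,
    keys: Vec<ColumnData>,
    all_values: Vec<ColumnData>,
    sequences: ColumnData,
    op_types: ColumnData,
) -> Batch {
    let values: Vec<ColumnData> = all_values
        .into_iter()
        .enumerate()
        .filter(|(idx, _)| projection.is_needed(*idx))
        .map(|(_, column)| column)
        .collect();
    projection.batch_from_parts(keys, values, sequences, op_types)
}
```

In this arrangement `read_batch` never indexes into the final column list, so a change to where internal columns live would only touch `batch_from_parts`.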

test: Add tests for ProjectedSchema

b92bbee

test: Add tests for ProjectedSchema

33f822b

Also returns an error if the `projected_columns` used to build the `ProjectedSchema` is empty.
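The empty-projection check could look something like the sketch below. `ProjectionError` and `validate_projection` are hypothetical; the actual error variants added by this commit are not shown in this thread, and the out-of-range check is an extra assumption.

```rust
use std::fmt;

/// Hypothetical error type for an invalid projection.
#[derive(Debug, PartialEq, Eq)]
enum ProjectionError {
    /// No column was selected.
    Empty,
    /// A selected index does not exist in the schema.
    OutOfRange { index: usize, num_columns: usize },
}

impl fmt::Display for ProjectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProjectionError::Empty => {
                write!(f, "projection must select at least one column")
            }
            ProjectionError::OutOfRange { index, num_columns } => write!(
                f,
                "projected column index {} is out of range for {} columns",
                index, num_columns
            ),
        }
    }
}

impl std::error::Error for ProjectionError {}

/// Reject an empty projection before any schema is built from it.
fn validate_projection(
    projected_columns: &[usize],
    num_columns: usize,
) -> Result<(), ProjectionError> {
    if projected_columns.is_empty() {
        return Err(ProjectionError::Empty);
    }
    if let Some(&index) = projected_columns.iter().find(|&&i| i >= num_columns) {
        return Err(ProjectionError::OutOfRange { index, num_columns });
    }
    Ok(())
}
```

Failing early keeps later code from having to handle a batch with no user-visible columns.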

test: Add test for memtable projection

166e1e8

feat: Table pass projection to storage engine

967e675

fix: Use timestamp column name as schema metadata

a5b1457

This fixes the issue where the metadata refers to the wrong timestamp column if DataFusion reorders the fields of the Arrow schema.
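To illustrate why a column name is more robust than a position here: a name stored in schema metadata can be resolved against whatever field order the schema currently has. The sketch below uses a stand-in `SchemaSketch` type rather than the Arrow schema, and the metadata key `example.timestamp_column` is made up; neither is taken from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical metadata key; the key used by the PR is not shown in this thread.
const TIMESTAMP_COLUMN_KEY: &str = "example.timestamp_column";

/// Stand-in for a schema: ordered field names plus string metadata.
struct SchemaSketch {
    fields: Vec<String>,
    metadata: HashMap<String, String>,
}

impl SchemaSketch {
    /// Build a schema and record the timestamp column by name.
    fn new(fields: &[&str], timestamp_column: &str) -> Self {
        let mut metadata = HashMap::new();
        metadata.insert(
            TIMESTAMP_COLUMN_KEY.to_string(),
            timestamp_column.to_string(),
        );
        Self {
            fields: fields.iter().map(|f| f.to_string()).collect(),
            metadata,
        }
    }

    /// Resolve the timestamp column against the current field order.
    fn timestamp_index(&self) -> Option<usize> {
        let name = self.metadata.get(TIMESTAMP_COLUMN_KEY)?;
        self.fields.iter().position(|f| f == name)
    }
}
```

If the fields are reordered from `host, ts, cpu` to `ts, cpu, host`, the lookup follows the column to its new position, which a stored positional reference would not do.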

fix: Fix projected schema not passed to memtable

9957e2d

evenyag force-pushed the feat/storage-projection branch from 899cc9d to 9957e2d August 23, 2022 11:31

killme2008 marked this pull request as ready for review August 24, 2022 02:48

evenyag added 3 commits August 24, 2022 12:03

feat: Add tests for region projection

18904a5

chore: fix clippy

fd5c1a9

test: Add test for unordered projection

60e841a

v0y4g3r reviewed Aug 25, 2022

src/storage/src/region/tests/projection.rs (outdated, resolved)

src/storage/src/sst.rs (outdated, resolved)

killme2008 approved these changes Aug 25, 2022

chore: Move projected_schema to ReadOptions

effa23b

Also fix some typos

v0y4g3r approved these changes Aug 25, 2022

v0y4g3r merged commit 53637c9 into develop Aug 25, 2022

v0y4g3r deleted the feat/storage-projection branch August 25, 2022 07:27

evenyag mentioned this pull request Sep 1, 2022

feat: Projection pushdown #177

Closed
