feat: Support projection #192

Merged
merged 16 commits into from
Aug 25, 2022

@evenyag evenyag commented Aug 19, 2022

Changes

  • Add ProjectedSchema to represent a Schema with a projection applied
    • The ProjectedSchema maintains two schemas: one for reading data from memtables and SSTs, which reads all row key columns and internal columns, and one that the user expects to see after projection
  • Memtables and SSTs can accept the ProjectedSchema through the read options to push down the projection and adjust the output Batch schema to the projected schema
  • Change the internal representation of Batch: it now holds all columns in a single Vec
  • The Mito table now passes the projection to ScanRequest
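
To make the two-schema idea above concrete, here is a minimal sketch, not the code from this PR. The `Column` and `ColumnKind` types, the `new` signature, the error enum, and the internal column names `__sequence` and `__op_type` are all assumptions made for illustration. It assumes the read schema keeps every row key column plus the projected value columns and then appends the internal columns, while the user schema follows the order of the projection indices.

```rust
use std::collections::HashSet;

// Hypothetical, simplified model; the real storage types are richer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnKind {
    RowKey,
    Value,
    Internal,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Column {
    name: String,
    kind: ColumnKind,
}

impl Column {
    fn new(name: &str, kind: ColumnKind) -> Self {
        Column { name: name.to_string(), kind }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum ProjectionError {
    Empty,
    OutOfBounds(usize),
}

#[derive(Debug)]
struct ProjectedSchema {
    /// Columns read from memtables and SSTs.
    read_columns: Vec<Column>,
    /// Columns the user expects to see, in projection order.
    user_columns: Vec<Column>,
}

impl ProjectedSchema {
    /// `projected_columns` holds indices into `table_columns` (assumed layout).
    fn new(table_columns: &[Column], projected_columns: &[usize]) -> Result<Self, ProjectionError> {
        if projected_columns.is_empty() {
            return Err(ProjectionError::Empty);
        }
        if let Some(&bad) = projected_columns.iter().find(|&&i| i >= table_columns.len()) {
            return Err(ProjectionError::OutOfBounds(bad));
        }
        let needed: HashSet<usize> = projected_columns.iter().copied().collect();
        // Keep all row key columns and only the projected value columns.
        let mut read_columns: Vec<Column> = table_columns
            .iter()
            .enumerate()
            .filter(|(i, c)| c.kind == ColumnKind::RowKey || needed.contains(i))
            .map(|(_, c)| c.clone())
            .collect();
        // Internal columns are always read; these names are placeholders.
        read_columns.push(Column::new("__sequence", ColumnKind::Internal));
        read_columns.push(Column::new("__op_type", ColumnKind::Internal));
        let user_columns = projected_columns.iter().map(|&i| table_columns[i].clone()).collect();
        Ok(ProjectedSchema { read_columns, user_columns })
    }
}
```

With columns `host`, `ts` (row keys), `cpu`, `mem` (values) and the projection `[3, 0]`, this sketch reads `host`, `ts`, `mem` and the two internal columns, and exposes `mem`, `host` to the user.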

Related Issues

Unresolved Problems

  • The SstSchema is no longer used only by the SST module; it also represents the schema with internal columns, so InternalSchema or StoreSchema may be a more appropriate name. I will rename it in another PR

The btree memtable uses `is_needed()` to filter out unneeded value columns,
then uses `ProjectedSchema::batch_from_parts()` to construct the
batch, so it doesn't need to know the layout of internal columns.
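
A rough sketch of that division of labor follows. Only the names `is_needed()` and `batch_from_parts()` come from the commit message; their signatures, the `needed_values` field, the `read_batch` helper, and the `Vec<i64>` column type are assumptions chosen to keep the example small.

```rust
/// Stand-in for a typed vector; real columns are not all `i64`.
type ColumnData = Vec<i64>;

/// Batch that holds every column in a single Vec.
#[derive(Debug, PartialEq)]
struct Batch {
    columns: Vec<ColumnData>,
}

struct ProjectedSchema {
    /// `needed_values[i]` is true when the i-th value column must be read.
    needed_values: Vec<bool>,
}

impl ProjectedSchema {
    /// Whether the value column at `value_idx` is part of the projection.
    fn is_needed(&self, value_idx: usize) -> bool {
        self.needed_values.get(value_idx).copied().unwrap_or(false)
    }

    /// Assemble a batch. Only this method knows where internal columns go
    /// (assumed order: row keys, values, sequence, op type).
    fn batch_from_parts(
        &self,
        row_keys: Vec<ColumnData>,
        values: Vec<ColumnData>,
        sequences: ColumnData,
        op_types: ColumnData,
    ) -> Batch {
        let mut columns = row_keys;
        columns.extend(values);
        columns.push(sequences);
        columns.push(op_types);
        Batch { columns }
    }
}

/// Memtable side: drop unneeded value columns, then let the schema build the batch.
fn read_batch(
    schema: &ProjectedSchema,
    row_keys: Vec<ColumnData>,
    all_values: Vec<ColumnData>,
    sequences: ColumnData,
    op_types: ColumnData,
) -> Batch {
    let values: Vec<ColumnData> = all_values
        .into_iter()
        .enumerate()
        .filter(|(i, _)| schema.is_needed(*i))
        .map(|(_, v)| v)
        .collect();
    schema.batch_from_parts(row_keys, values, sequences, op_types)
}
```

The memtable never indexes into the finished batch by position, so a change to the internal column layout would stay inside `batch_from_parts()`.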
Also returns an error if the `projected_columns` used to build the
`ProjectedSchema` is empty.
This fixes the issue where the metadata refers to the wrong timestamp column
if DataFusion reorders the fields of the arrow schema.
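
The reasoning behind keying the metadata on the column name can be sketched as follows. This is an illustration only: `SimpleSchema`, its methods, and the metadata key string are hypothetical and use plain standard-library types rather than the arrow schema API. A stored index goes stale as soon as the fields are reordered, while a stored name can be resolved again against the new field order.

```rust
use std::collections::HashMap;

/// Hypothetical metadata key; the real key is not shown on this page.
const TIMESTAMP_COLUMN_KEY: &str = "timestamp_column";

#[derive(Debug, Clone)]
struct SimpleSchema {
    fields: Vec<String>,
    metadata: HashMap<String, String>,
}

impl SimpleSchema {
    /// Record the timestamp column by name rather than by position.
    fn new(fields: &[&str], timestamp_column: &str) -> Self {
        let mut metadata = HashMap::new();
        metadata.insert(TIMESTAMP_COLUMN_KEY.to_string(), timestamp_column.to_string());
        SimpleSchema {
            fields: fields.iter().map(|f| f.to_string()).collect(),
            metadata,
        }
    }

    /// Simulate a query engine reordering fields while keeping the metadata.
    fn reordered(&self, order: &[usize]) -> Self {
        SimpleSchema {
            fields: order.iter().map(|&i| self.fields[i].clone()).collect(),
            metadata: self.metadata.clone(),
        }
    }

    /// Resolve the timestamp column against the current field order.
    fn timestamp_index(&self) -> Option<usize> {
        let name = self.metadata.get(TIMESTAMP_COLUMN_KEY)?;
        self.fields.iter().position(|f| f == name)
    }
}
```

For fields `host`, `ts`, `cpu`, the lookup returns index 1; after reordering to `ts`, `host`, `cpu` it returns index 0, whereas a stored index of 1 would now point at `host`.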
@killme2008 killme2008 marked this pull request as ready for review August 24, 2022 02:48

codecov bot commented Aug 24, 2022

Codecov Report

Merging #192 (effa23b) into develop (4a11715) will increase coverage by 0.46%.
The diff coverage is 95.60%.

@@             Coverage Diff             @@
##           develop     #192      +/-   ##
===========================================
+ Coverage    75.45%   75.91%   +0.46%     
===========================================
  Files          237      245       +8     
  Lines        18670    19650     +980     
===========================================
+ Hits         14088    14918     +830     
- Misses        4582     4732     +150     
| Flag | Coverage Δ |
|------|------------|
| rust | 75.91% <95.60%> (+0.46%) ⬆️ |


| Impacted Files | Coverage Δ |
|----------------|------------|
| src/datatypes/src/error.rs | 50.00% <0.00%> (+2.77%) ⬆️ |
| src/storage/src/memtable/version.rs | 98.66% <ø> (ø) |
| src/storage/src/region/tests.rs | 98.05% <ø> (ø) |
| src/storage/src/error.rs | 31.57% <50.00%> (-0.28%) ⬇️ |
| src/storage/src/metadata.rs | 95.74% <66.66%> (-0.72%) ⬇️ |
| src/storage/src/snapshot.rs | 75.75% <75.00%> (+1.47%) ⬆️ |
| src/storage/src/sst.rs | 90.52% <83.33%> (-0.69%) ⬇️ |
| src/storage/src/read.rs | 89.87% <85.71%> (-0.90%) ⬇️ |
| src/storage/src/chunk.rs | 96.59% <94.44%> (-0.75%) ⬇️ |
| src/storage/src/schema.rs | 93.52% <94.77%> (+1.00%) ⬆️ |

... and 35 more


@killme2008 killme2008 left a comment

LGTM


@v0y4g3r v0y4g3r left a comment

LGTM

@v0y4g3r v0y4g3r merged commit 53637c9 into develop Aug 25, 2022
@v0y4g3r v0y4g3r deleted the feat/storage-projection branch August 25, 2022 07:27
clickme-zxy pushed a commit that referenced this pull request Aug 26, 2022
* feat: Add projected schema

* feat: Use projected schema to read sst

* feat: Use vector of column to implement Batch

* feat: Use projected schema to convert batch to chunk

* feat: Add no_projection() to build ProjectedSchema

* feat: Memtable supports projection

The btree memtable uses `is_needed()` to filter out unneeded value columns,
then uses `ProjectedSchema::batch_from_parts()` to construct the
batch, so it doesn't need to know the layout of internal columns.

* test: Add tests for ProjectedSchema

* test: Add tests for ProjectedSchema

Also returns an error if the `projected_columns` used to build the
`ProjectedSchema` is empty.

* test: Add test for memtable projection

* feat: Table pass projection to storage engine

* fix: Use timestamp column name as schema metadata

This fixes the issue where the metadata refers to the wrong timestamp column
if DataFusion reorders the fields of the arrow schema.

* fix: Fix projected schema not passed to memtable

* feat: Add tests for region projection

* chore: fix clippy

* test: Add test for unordered projection

* chore: Move projected_schema to ReadOptions

Also fix some typos
@evenyag evenyag mentioned this pull request Sep 1, 2022
paomian pushed a commit to paomian/greptimedb that referenced this pull request Oct 19, 2023