feat: Support projection #192
Codecov Report
@@             Coverage Diff             @@
##           develop     #192      +/-   ##
===========================================
+ Coverage    75.45%   75.91%   +0.46%
===========================================
  Files          237      245       +8
  Lines        18670    19650     +980
===========================================
+ Hits         14088    14918     +830
- Misses        4582     4732     +150
LGTM
LGTM
* feat: Add projected schema
* feat: Use projected schema to read sst
* feat: Use vector of column to implement Batch
* feat: Use projected schema to convert batch to chunk
* feat: Add no_projection() to build ProjectedSchema
* feat: Memtable supports projection
  The btree memtable uses `is_needed()` to filter out unneeded value columns, then uses `ProjectedSchema::batch_from_parts()` to construct the batch, so it doesn't need to know the layout of internal columns.
* test: Add tests for ProjectedSchema
* test: Add tests for ProjectedSchema
  Also return an error if the `projected_columns` used to build the `ProjectedSchema` is empty.
* test: Add test for memtable projection
* feat: Table pass projection to storage engine
* fix: Use timestamp column name as schema metadata
  This fixes the issue where the metadata refers to the wrong timestamp column if DataFusion reorders the fields of the Arrow schema.
* fix: Fix projected schema not passed to memtable
* feat: Add tests for region projection
* chore: fix clippy
* test: Add test for unordered projection
* chore: Move projected_schema to ReadOptions
  Also fix some typos.
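The timestamp-metadata fix can be illustrated with a small Rust sketch. It assumes a simplified stand-in for an Arrow schema (`SimpleSchema`, an ordered list of field names plus string metadata) and a hypothetical metadata key `timestamp_column`; neither is the project's real type or key. Recording the timestamp column by name lets its position be re-resolved after the fields are reordered, whereas a stored index would silently point at another field.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for an Arrow schema: ordered field
/// names plus string key/value metadata. Not the project's real type.
#[derive(Clone, Debug)]
pub struct SimpleSchema {
    pub fields: Vec<String>,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical metadata key; the key used by the real code may differ.
pub const TIMESTAMP_KEY: &str = "timestamp_column";

impl SimpleSchema {
    /// Builds a schema whose metadata records the timestamp column by name.
    pub fn new(fields: &[&str], timestamp: &str) -> Self {
        let mut metadata = HashMap::new();
        metadata.insert(TIMESTAMP_KEY.to_string(), timestamp.to_string());
        SimpleSchema {
            fields: fields.iter().map(|f| f.to_string()).collect(),
            metadata,
        }
    }

    /// Resolves the timestamp column's current position from its name, so the
    /// answer stays correct after the fields are reordered.
    pub fn timestamp_index(&self) -> Option<usize> {
        let name = self.metadata.get(TIMESTAMP_KEY)?;
        self.fields.iter().position(|f| f == name)
    }

    /// Simulates a query engine reordering the fields; metadata is copied
    /// unchanged, which is why a stored column index would go stale.
    pub fn reordered(&self, order: &[usize]) -> Self {
        SimpleSchema {
            fields: order.iter().map(|&i| self.fields[i].clone()).collect(),
            metadata: self.metadata.clone(),
        }
    }
}
```

For example, reordering `["host", "ts", "cpu"]` with the order `[2, 0, 1]` yields `["cpu", "host", "ts"]`, moving `ts` from index 1 to index 2. A lookup by name still finds it, while the old index 1 now refers to `host`.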
Changes
* Add `ProjectedSchema` to represent a `Schema` with projection.
* `ProjectedSchema` maintains two schemas: one for reading data from memtables and SSTs, which reads all row key columns and internal columns, and one for the schema the user expects to see after projection.
* Use `ProjectedSchema` as a read option to push down projection and adjust the output `Batch` schema to the projected schema.
* Change the internal representation of `Batch`: it now holds all columns in a single `Vec`.
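The PR description gives no code for `ProjectedSchema`, so the following is only a minimal Rust sketch of the idea under simplified assumptions: every column is a `Vec<i64>`, the user schema is a run of row-key columns followed by value columns, and internal columns are appended after the value columns on the storage side. Only the names `ProjectedSchema`, `is_needed()`, `batch_from_parts()`, and `Batch` come from the PR; their signatures here are guesses, and `Column`, `ProjectionError`, and `project_batch` are hypothetical helpers introduced for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical column type: every column holds `i64` values in this sketch.
pub type Column = Vec<i64>;

/// A batch stores all of its columns in a single `Vec`, in schema order.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub columns: Vec<Column>,
}

/// Hypothetical error type for building a projection.
#[derive(Debug, PartialEq)]
pub enum ProjectionError {
    /// The projection selected no columns.
    EmptyProjection,
    /// A projected index does not exist in the user schema.
    IndexOutOfBounds(usize),
}

/// Simplified, hypothetical projected schema. The user schema is
/// `num_row_keys` row-key columns followed by value columns.
pub struct ProjectedSchema {
    num_row_keys: usize,
    /// Column indices requested by the user, in the requested order.
    projection: Vec<usize>,
    /// Indices (relative to the value columns) of value columns to read.
    needed_values: BTreeSet<usize>,
}

impl ProjectedSchema {
    pub fn new(
        num_row_keys: usize,
        num_values: usize,
        projection: Vec<usize>,
    ) -> Result<Self, ProjectionError> {
        if projection.is_empty() {
            return Err(ProjectionError::EmptyProjection);
        }
        let mut needed_values = BTreeSet::new();
        for &idx in &projection {
            if idx >= num_row_keys + num_values {
                return Err(ProjectionError::IndexOutOfBounds(idx));
            }
            if idx >= num_row_keys {
                needed_values.insert(idx - num_row_keys);
            }
        }
        Ok(ProjectedSchema { num_row_keys, projection, needed_values })
    }

    /// Returns whether the value column at `value_idx` must be read.
    pub fn is_needed(&self, value_idx: usize) -> bool {
        self.needed_values.contains(&value_idx)
    }

    /// Builds the storage-side batch: every row-key column, the needed value
    /// columns (already filtered with `is_needed`), then internal columns.
    pub fn batch_from_parts(
        &self,
        row_keys: Vec<Column>,
        values: Vec<Column>,
        internal: Vec<Column>,
    ) -> Batch {
        let mut columns = row_keys;
        columns.extend(values);
        columns.extend(internal);
        Batch { columns }
    }

    /// Converts a storage-side batch into the user's projected column order,
    /// dropping unrequested row keys and all internal columns.
    pub fn project_batch(&self, batch: &Batch) -> Batch {
        let columns = self
            .projection
            .iter()
            .map(|&idx| {
                let pos = if idx < self.num_row_keys {
                    idx
                } else {
                    // Needed value columns follow all row keys, in ascending order.
                    self.num_row_keys
                        + self.needed_values.range(..idx - self.num_row_keys).count()
                };
                batch.columns[pos].clone()
            })
            .collect();
        Batch { columns }
    }
}
```

With one row key and three value columns, the unordered projection `[3, 0]` (last value column first, then the row key) marks only the third value column as needed. A memtable-style reader filters its value columns with `is_needed()`, `batch_from_parts()` assembles the row key, that value column, and the internal columns, and `project_batch` returns just the two requested columns in the requested order. Keeping every row-key and internal column on the read path mirrors the description's note that the read schema includes them regardless of the user's projection.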
Related Issues
Unresolved Problems
`SstSchema` is no longer used only by the SST module; it also represents the schema with internal columns, so `InternalSchema` or `StoreSchema` may be a more appropriate name. I will rename it in another PR.