feat: avoid pulling unnecessary columns when querying append mode table #1307

Merged: 14 commits, Dec 22, 2023

@Rachelint Rachelint (Contributor) commented Nov 15, 2023

Rationale

Closes #1302

The pulled Arrow record batches are guaranteed to include the primary key columns, but those columns go unused in queries on append mode tables.
This PR refactors the whole record batch pulling path, both for readability and so that primary key columns are no longer pulled when they are unused.

Detailed Changes

  • Refactor RowProjector into RecordFetchingContext, which holds only the needed information, and pass it to ScanRequest and SstReadOptions instead of the much heavier ProjectedSchema.
  • Refactor RecordBatchWithKey into FetchingRecordBatch, which holds the primary key indexes only on demand.
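
To make the idea behind these two changes concrete, here is a minimal sketch of "primary key columns on demand". It is not the code from this PR: `FetchingContext`, `build_fetching_context`, and the `need_primary_keys` flag are hypothetical names invented for illustration, and columns are reduced to plain schema indexes. The assumption is that some read paths (for example, one that must merge or deduplicate rows by key) need the key columns, while append mode queries do not.

```rust
/// Hypothetical, simplified stand-in for a fetching context.
/// It is NOT the RecordFetchingContext type introduced by this PR.
#[derive(Debug, Clone, PartialEq)]
pub struct FetchingContext {
    /// Indexes (into the table schema) of the columns to pull from storage.
    pub fetching_column_indexes: Vec<usize>,
    /// Positions of the primary key columns inside the fetched batch.
    /// `None` when the query does not need them.
    pub primary_key_indexes: Option<Vec<usize>>,
}

/// Build the context from the user's projection.
///
/// `projection` and `primary_key_columns` are schema column indexes.
/// `need_primary_keys` is an assumed flag: false for append mode queries,
/// true for read paths that must see the keys.
pub fn build_fetching_context(
    projection: &[usize],
    primary_key_columns: &[usize],
    need_primary_keys: bool,
) -> FetchingContext {
    let mut fetching: Vec<usize> = projection.to_vec();

    if !need_primary_keys {
        // Pull exactly what was projected and nothing else.
        return FetchingContext {
            fetching_column_indexes: fetching,
            primary_key_indexes: None,
        };
    }

    // Keys are needed: append any key column the projection left out.
    for &pk in primary_key_columns {
        if !fetching.contains(&pk) {
            fetching.push(pk);
        }
    }

    // Record where each key column ends up inside the fetched batch.
    let positions = primary_key_columns
        .iter()
        .map(|pk| {
            fetching
                .iter()
                .position(|col| col == pk)
                .expect("every primary key column was added above")
        })
        .collect();

    FetchingContext {
        fetching_column_indexes: fetching,
        primary_key_indexes: Some(positions),
    }
}
```

In this sketch, `build_fetching_context(&[3], &[0, 1], false)` pulls only column 3, whereas passing `true` also pulls columns 0 and 1 and records their positions in the batch. Keeping the key positions in an `Option` makes the "not pulled" case explicit at the type level instead of relying on an empty list.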

Test Plan

Tested by existing and newly added tests.

@Rachelint Rachelint force-pushed the avoid-pulling-unnecessary-columns branch 3 times, most recently from 48e29cf to 78e4e25 Compare November 16, 2023 03:42
@Rachelint Rachelint marked this pull request as ready for review November 17, 2023 09:58
@Rachelint Rachelint force-pushed the avoid-pulling-unnecessary-columns branch from 5cb0d04 to eba48ba Compare November 19, 2023 14:47
@Rachelint Rachelint force-pushed the avoid-pulling-unnecessary-columns branch from 1e43ff9 to 3c0c4c0 Compare November 29, 2023 09:00
@Rachelint Rachelint force-pushed the avoid-pulling-unnecessary-columns branch from 3c0c4c0 to 45a8788 Compare November 29, 2023 09:13
@jiacai2050 jiacai2050 changed the base branch from main to dev December 4, 2023 03:58
common_types/src/projected_schema.rs (outdated, resolved)
common_types/src/projected_schema.rs (resolved)
common_types/src/record_batch.rs (resolved)
analytic_engine/src/instance/read.rs (outdated, resolved)
common_types/src/projected_schema.rs (outdated, resolved)
common_types/src/projected_schema.rs (outdated, resolved)
@Rachelint Rachelint force-pushed the avoid-pulling-unnecessary-columns branch from 479c0e6 to f70da9b Compare December 22, 2023 08:41
@tanruixiang tanruixiang (Member) left a comment

LGTM. Merge after CI passes.

@tanruixiang tanruixiang merged commit 4abc764 into apache:dev Dec 22, 2023
6 checks passed
Development

Successfully merging this pull request may close these issues.

Wasteful to pull all primary key columns in query of append mode
4 participants