feat: Implement dedup and filter for vectors #245

Merged
merged 21 commits into develop from feat/dedup-vector on Sep 19, 2022


evenyag
Contributor

@evenyag evenyag commented Sep 9, 2022

Changes

This PR adds the compute operations on vectors that are needed to implement the DedupReader.

  • Adds a VectorOp supertrait of Vector that provides compute operations for vectors
  • Implements dedup and filter
  • Moves the replicate method to VectorOp
  • Implements ScalarVector for ListVector, so all vectors except NullVector can share the same dedup implementation based on the ScalarVector trait
  • Since ListVector now implements ScalarVector, it can use replicate_scalar() to implement replicate()
  • A small refactor that makes ListValueRef::Ref a struct variant instead of a tuple variant, for consistency with the Indexed variant
  • Refactors the PrimitiveElement trait and moves the Scalar and ScalarRef trait bounds onto it, so PrimitiveVector<T: PrimitiveElement> always satisfies the ScalarVector trait bound
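
The changes above share one dedup/filter implementation across vector types through ScalarVector, keep compute operations on a separate VectorOp trait, and put the scalar bounds on PrimitiveElement. The Rust sketch below illustrates that layering under loud assumptions: ScalarVector, PrimitiveElement, PrimitiveVector, VectorOp and the helper dedup_sorted are heavily simplified, hypothetical stand-ins (nullable values in a Vec, a plain bool slice instead of a bitmap, no error handling), not GreptimeDB's actual signatures. dedup assumes the input is sorted, so duplicates are adjacent; the prev argument lets a run that continues from the previous batch be detected.

```rust
// Hypothetical, heavily simplified stand-ins for the traits this PR names.
// GreptimeDB's real signatures differ (bitmaps, Result, dyn Vector, ...).

/// Indexed access to nullable values: all the shared dedup/filter needs.
pub trait ScalarVector {
    type Item: PartialEq + Clone;
    fn len(&self) -> usize;
    fn get_data(&self, idx: usize) -> Option<Self::Item>;
}

/// The bounds live on the element trait itself, so any `T: PrimitiveElement`
/// already satisfies what `PrimitiveVector<T>: ScalarVector` requires.
pub trait PrimitiveElement: Copy + PartialEq + 'static {}
impl PrimitiveElement for i64 {}
impl PrimitiveElement for f64 {}

pub struct PrimitiveVector<T: PrimitiveElement> {
    values: Vec<Option<T>>,
}

impl<T: PrimitiveElement> PrimitiveVector<T> {
    pub fn new(values: Vec<Option<T>>) -> Self {
        Self { values }
    }
}

impl<T: PrimitiveElement> ScalarVector for PrimitiveVector<T> {
    type Item = T;
    fn len(&self) -> usize {
        self.values.len()
    }
    fn get_data(&self, idx: usize) -> Option<T> {
        self.values[idx]
    }
}

/// Compute operations on their own trait, implemented once for every
/// `ScalarVector` by the blanket impl below.
pub trait VectorOp: ScalarVector {
    /// Sets `selected[i] = true` when row `i` starts a new run of equal
    /// values (nulls compare equal to nulls). Requires a sorted vector.
    /// `prev` is the preceding batch, whose last row may continue a run.
    /// Entries that are already `true` are left untouched.
    fn dedup(&self, selected: &mut [bool], prev: Option<&Self>) {
        let mut last = prev.and_then(|p| p.len().checked_sub(1).map(|i| p.get_data(i)));
        for i in 0..self.len() {
            let cur = self.get_data(i);
            if last.as_ref() != Some(&cur) {
                selected[i] = true;
            }
            last = Some(cur);
        }
    }

    /// Keeps the rows whose mask entry is `true`.
    fn filter(&self, mask: &[bool]) -> Vec<Option<Self::Item>> {
        (0..self.len())
            .filter(|&i| mask[i])
            .map(|i| self.get_data(i))
            .collect()
    }
}

impl<V: ScalarVector> VectorOp for V {}

/// Generic code only needs `T: PrimitiveElement`; no extra
/// `PrimitiveVector<T>: ScalarVector` bound has to be spelled out.
pub fn dedup_sorted<T: PrimitiveElement>(v: &PrimitiveVector<T>) -> Vec<Option<T>> {
    let mut selected = vec![false; v.len()];
    v.dedup(&mut selected, None);
    v.filter(&selected)
}
```

With the blanket impl, a new vector type gets dedup and filter just by implementing ScalarVector, which is the reuse the PR aims for when it implements ScalarVector for ListVector.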

@evenyag evenyag force-pushed the feat/dedup-vector branch 2 times, most recently from 59901e1 to 8502409 on September 13, 2022 11:43
@evenyag evenyag changed the title feat: Implement dedup operation for vectors feat: Implement dedup and filter for vectors Sep 14, 2022
@evenyag evenyag marked this pull request as ready for review September 15, 2022 03:19
@evenyag evenyag force-pushed the feat/dedup-vector branch 2 times, most recently from 35d07f2 to ab3fc5d on September 15, 2022 03:41
@codecov

codecov bot commented Sep 15, 2022

Codecov Report

Merging #245 (43dc7c7) into develop (a649f34) will increase coverage by 0.53%.
The diff coverage is 95.00%.

@@             Coverage Diff             @@
##           develop     #245      +/-   ##
===========================================
+ Coverage    83.33%   83.87%   +0.53%     
===========================================
  Files          295      297       +2     
  Lines        25338    25737     +399     
===========================================
+ Hits         21116    21587     +471     
+ Misses        4222     4150      -72     
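
The percentages in the coverage diff follow from the hit and line counts (coverage = hits / lines). The sketch below reproduces them, assuming this report uses Codecov's round-down-to-two-decimals setting and computes the change from the unrounded ratios; under that assumption the results match the report (83.33% before, 83.87% after, +0.53%), whereas subtracting the already-rounded percentages would give 0.54%. The helpers coverage_bp and delta_bp are hypothetical and work in hundredths of a percent to avoid floating-point error.

```rust
/// Coverage in hundredths of a percent, rounded down (integer division).
pub fn coverage_bp(hits: u64, lines: u64) -> u64 {
    hits * 10_000 / lines
}

/// Change in coverage between two (hits, lines) pairs, in hundredths of a
/// percent, taken from the unrounded ratios and then rounded down.
/// Uses h2/l2 - h1/l1 = (h2*l1 - h1*l2) / (l1*l2); assumes coverage rose.
pub fn delta_bp(before: (u64, u64), after: (u64, u64)) -> u64 {
    let (h1, l1) = before;
    let (h2, l2) = after;
    (h2 * l1 - h1 * l2) * 10_000 / (l1 * l2)
}
```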
Flag Coverage Δ
rust 83.87% <95.00%> (+0.53%) ⬆️


Impacted Files Coverage Δ
src/common/time/src/lib.rs 100.00% <ø> (ø)
src/datatypes/src/error.rs 45.94% <0.00%> (-2.63%) ⬇️
src/datatypes/src/types/primitive_type.rs 93.33% <ø> (ø)
src/datatypes/src/vectors.rs 94.73% <ø> (+5.26%) ⬆️
src/datatypes/src/vectors/binary.rs 90.10% <0.00%> (ø)
src/datatypes/src/vectors/boolean.rs 88.88% <50.00%> (+0.17%) ⬆️
src/datatypes/src/vectors/eq.rs 95.14% <77.77%> (-0.94%) ⬇️
src/datatypes/src/scalars.rs 78.73% <90.00%> (+8.50%) ⬆️
src/datatypes/src/vectors/list.rs 91.62% <91.66%> (+0.71%) ⬆️
src/datatypes/src/value.rs 95.40% <92.30%> (+0.29%) ⬆️
... and 22 more


src/datatypes/src/value.rs (resolved)
src/datatypes/src/vectors/constant.rs (resolved)
src/datatypes/src/vectors/list.rs (outdated, resolved)
src/datatypes/src/vectors/list.rs (outdated, resolved)
src/datatypes/src/vectors/operations.rs (outdated, resolved)
src/datatypes/src/vectors/operations.rs (outdated, resolved)
src/datatypes/src/vectors/operations/dedup.rs (outdated, resolved)
src/datatypes/src/vectors/operations/dedup.rs (outdated, resolved)
Collaborator

@MichaelScofield MichaelScofield left a comment

LGTM

Contributor

@v0y4g3r v0y4g3r left a comment

LGTM

@v0y4g3r v0y4g3r merged commit e697ba9 into develop Sep 19, 2022
@v0y4g3r v0y4g3r deleted the feat/dedup-vector branch September 19, 2022 06:05
paomian pushed a commit to paomian/greptimedb that referenced this pull request Oct 19, 2023
* feat: Dedup vector

* refactor: Re-export Date/DateTime/Timestamp

* refactor: Named field for ListValueRef::Ref

Use the named field val instead of a tuple for variant ListValueRef::Ref,
for consistency with ListValueRef::Indexed

* feat: Implement ScalarVector for ListVector

Also implements ScalarVectorBuilder for ListVectorBuilder, Scalar for
ListValue and ScalarRef for ListValueRef

* test: Add tests for ScalarVector implementation of ListVector

* feat: Implement dedup using match_scalar_vector

* refactor: Move dedup func to individual mod

* chore: Update ListValueRef comments

* refactor: Move replicate to VectorOp

Move compute operations to the VectorOp trait, which acts as a supertrait
of Vector. This lets us later put the dedup/filter methods on VectorOp
instead of defining too many methods on the Vector trait.

* refactor: Move scalar bounds to PrimitiveElement

Move the Scalar and ScalarRef trait bounds to PrimitiveElement, so for each
native type that implements PrimitiveElement, its PrimitiveVector always
implements ScalarVector and can be used as a ScalarVector without
additional trait bounds

* refactor: Move dedup to VectorOp

Remove the compute mod and move the dedup logic to operations::dedup

* feat: Implement VectorOp::filter

* test: Move replicate test of primitive to replicate.rs

* test: Add more replicate tests

* test: Add tests for dedup and filter

Also fix NullVector::dedup and ConstantVector::dedup

* style: fix clippy

* chore: Remove unused scalar.rs

* test: Add more tests for VectorOp and fix failed tests

Also fix eq not being implemented for TimestampVector.

* chore: Address CR comments

* chore: mention vector should be sorted in comment

* refactor: slice the vector directly in replicate_primitive_with_type
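
Several commits above move replicate onto VectorOp, and the description notes that ListVector can now implement replicate() through replicate_scalar(). The PR text does not state replicate's contract, so the sketch below assumes a cumulative-offsets convention: element i is copied offsets[i] - offsets[i - 1] times, with the offset before the first element taken as 0. The free function and its slice-based signature are a simplified, hypothetical illustration, not GreptimeDB's API.

```rust
/// Hypothetical stand-in for a generic `replicate_scalar`, assuming cumulative
/// offsets: element `i` is copied `offsets[i] - offsets[i - 1]` times, with the
/// offset before the first element taken as 0.
pub fn replicate_scalar<T: Clone>(values: &[Option<T>], offsets: &[usize]) -> Vec<Option<T>> {
    assert_eq!(values.len(), offsets.len(), "expected one offset per element");
    // The last offset is the total length of the output.
    let total = offsets.last().copied().unwrap_or(0);
    let mut out = Vec::with_capacity(total);
    let mut start = 0;
    for (value, &end) in values.iter().zip(offsets) {
        assert!(end >= start, "offsets must be non-decreasing");
        // Append `end - start` copies of this element (possibly zero).
        out.resize(out.len() + (end - start), value.clone());
        start = end;
    }
    out
}
```

Because the loop only needs cloneable items, one generic version can serve any vector type that exposes its values as scalars, which mirrors the reuse the PR gets from implementing ScalarVector for ListVector.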
Labels
None yet
Projects
None yet
3 participants