[multistage]Adding more tuple sketch scalar functions and integration tests #11517

Merged
merged 1 commit into from Sep 10, 2023

Conversation

xiangfu0
Contributor

@xiangfu0 xiangfu0 commented Sep 6, 2023

  • Adding more tuple sketch scalar functions
  • Adding more tuple sketch integration tests with union and join
  • Adding more theta sketch integration tests with union and join

Sample queries:
  1. Intersection on sketch bytes with filters
SELECT 
    GET_INT_TUPLE_SKETCH_ESTIMATE(
        INT_SUM_TUPLE_SKETCH_INTERSECTION(
          DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(metTupleSketchBytes) FILTER (WHERE id = 1 OR id = 2),
          DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(metTupleSketchBytes) FILTER (WHERE id = 2 OR id = 3)
        )
    )
FROM myTable
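
To make the semantics of the query above concrete, here is a minimal Python sketch of what the nested calls compute. It assumes an exact (non-sampled) model in which a tuple sketch is just a map from key to integer summary. The names `merge_sum`, `intersect_sum`, `estimate`, `aggregate`, and the sample rows are hypothetical, and adding the summaries on intersection is an assumption about the sum mode. The real functions operate on serialized sketch bytes and return approximate estimates.

```python
from functools import reduce

# Toy model: a "tuple sketch" is a dict mapping key -> integer summary.
# Real sketches are sampled and approximate; this exact version only
# illustrates the set semantics.

def merge_sum(a, b):
    """Union of two toy sketches; summaries of shared keys are added."""
    out = dict(a)
    for key, value in b.items():
        out[key] = out.get(key, 0) + value
    return out

def intersect_sum(a, b):
    """Keep only keys present in both; summaries are added (assumed sum mode)."""
    return {key: a[key] + b[key] for key in a.keys() & b.keys()}

def estimate(sketch):
    """Exact distinct count of the toy sketch."""
    return len(sketch)

# Hypothetical rows: id -> sketch stored in that row.
rows = {
    1: {"u1": 1, "u2": 1},
    2: {"u2": 1, "u3": 1},
    3: {"u3": 1, "u4": 1},
}

def aggregate(ids):
    """Mimics an aggregation restricted by FILTER (WHERE id = ... OR id = ...)."""
    return reduce(merge_sum, (rows[i] for i in ids), {})

left = aggregate([1, 2])    # keys u1, u2, u3
right = aggregate([2, 3])   # keys u2, u3, u4
result = estimate(intersect_sum(left, right))
print(result)
```

In this toy data only `u2` and `u3` appear on both sides, so the intersection keeps two keys.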
  1. TupleSketch after a union of multiple subqueries
SELECT 
    DISTINCT_COUNT_TUPLE_SKETCH(metTupleSketchBytes),
    DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(metTupleSketchBytes),
    SUM_VALUES_INTEGER_SUM_TUPLE_SKETCH(metTupleSketchBytes), 
    AVG_VALUE_INTEGER_SUM_TUPLE_SKETCH(metTupleSketchBytes)
FROM (
    SELECT metTupleSketchBytes FROM myTable WHERE id = 4
    UNION ALL
    SELECT metTupleSketchBytes FROM myTable WHERE id = 5
    UNION ALL
    SELECT metTupleSketchBytes FROM myTable WHERE id = 6
    UNION ALL
    SELECT metTupleSketchBytes FROM myTable WHERE id = 7
)
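
As a rough illustration of the aggregations in the query above, the following Python sketch merges the per-row sketches from the four subqueries and derives a distinct count, a sum of values, and an average value. It again assumes an exact key-to-integer model. The row contents and the helper `merge_sum` are hypothetical, and the real aggregation functions return estimates derived from sampled sketches rather than exact numbers.

```python
def merge_sum(a, b):
    """Union of two toy sketches; summaries of shared keys are added."""
    out = dict(a)
    for key, value in b.items():
        out[key] = out.get(key, 0) + value
    return out

# Hypothetical sketches held by the rows with id = 4, 5, 6, 7.
# UNION ALL simply concatenates the rows before aggregation.
unioned_rows = [
    {"u1": 2, "u2": 1},
    {"u2": 3},
    {"u3": 4},
    {"u1": 1, "u4": 1},
]

merged = {}
for sketch in unioned_rows:
    merged = merge_sum(merged, sketch)

distinct_count = len(merged)             # number of distinct keys
sum_values = sum(merged.values())        # total of all integer summaries
avg_value = sum_values / distinct_count  # mean summary per key

print(distinct_count, sum_values, avg_value)
```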
  1. ThetaSketch after join
SELECT a.dimValue, distinctCountThetaSketch(b.thetaSketchCol)
FROM
(SELECT dimName, dimValue, thetaSketchCol FROM myTable WHERE dimName = 'gender' AND dimValue = 'Female') a 
JOIN 
(SELECT dimName, dimValue, thetaSketchCol FROM myTable WHERE dimName = 'gender' AND dimValue = 'Male') b 
ON a.dimName = b.dimName
GROUP BY a.dimValue;
  1. TupleSketch with Intersection/Union after join
SELECT
    GET_INT_TUPLE_SKETCH_ESTIMATE(
        INT_SUM_TUPLE_SKETCH_INTERSECTION(
            DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(a.metTupleSketchBytes),
            DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(b.metTupleSketchBytes)
        )
    ),
    GET_INT_TUPLE_SKETCH_ESTIMATE(
        INT_SUM_TUPLE_SKETCH_UNION(
            DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(a.metTupleSketchBytes),
            DISTINCT_COUNT_RAW_INTEGER_SUM_TUPLE_SKETCH(b.metTupleSketchBytes)
        )
    )
FROM
    (SELECT id, metTupleSketchBytes FROM myTable WHERE id < 8 ) a
JOIN
    (SELECT id, metTupleSketchBytes FROM myTable WHERE id > 3 ) b
ON
    a.id = b.id
  1. ThetaSketch with Intersection/Union after join
SELECT 
    GET_THETA_SKETCH_ESTIMATE(
        THETA_SKETCH_INTERSECT(
            DISTINCT_COUNT_RAW_THETA_SKETCH(a.thetaSketchCol, ''),
            DISTINCT_COUNT_RAW_THETA_SKETCH(b.thetaSketchCol, '')
        )
    ),
    GET_THETA_SKETCH_ESTIMATE(
        THETA_SKETCH_UNION(
            DISTINCT_COUNT_RAW_THETA_SKETCH(a.thetaSketchCol, ''), 
            DISTINCT_COUNT_RAW_THETA_SKETCH(b.thetaSketchCol, '')
        )
    ) 
FROM 
    (SELECT dimName, dimValue, thetaSketchCol FROM myTable WHERE dimName = 'gender' AND dimValue = 'Female') a
JOIN 
    (SELECT dimName, dimValue, thetaSketchCol FROM myTable WHERE dimName = 'gender' AND dimValue = 'Male') b
ON
    a.dimName = b.dimName
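
Because both sides of the join above share `dimName = 'gender'`, every Female row pairs with every Male row. The Python sketch below illustrates why this row duplication should not change the distinct-count results: set union is idempotent, so merging the same sketch several times leaves the key set unchanged. Plain Python sets stand in for theta sketches here, and the table contents are hypothetical. Real theta sketches are approximate.

```python
from itertools import product

# Hypothetical rows: (dimName, dimValue, set of user ids standing in
# for the theta sketch stored in thetaSketchCol).
table = [
    ("gender", "Female", {"u1", "u2"}),
    ("gender", "Female", {"u2", "u3"}),
    ("gender", "Male", {"u3", "u4"}),
    ("gender", "Male", {"u5"}),
]

a = [row for row in table if row[0] == "gender" and row[1] == "Female"]
b = [row for row in table if row[0] == "gender" and row[1] == "Male"]

# JOIN ... ON a.dimName = b.dimName: every a row matches every b row.
joined = [(ra, rb) for ra, rb in product(a, b) if ra[0] == rb[0]]

# Aggregating each side over the joined rows merges duplicates,
# which has no effect on a set union.
a_side = set().union(*(ra[2] for ra, _ in joined))
b_side = set().union(*(rb[2] for _, rb in joined))

intersection_estimate = len(a_side & b_side)
union_estimate = len(a_side | b_side)

print(len(joined), intersection_estimate, union_estimate)
```

With this toy data the join produces four row pairs, yet each side still reduces to three distinct users, one of whom (`u3`) is shared.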

@codecov-commenter

codecov-commenter commented Sep 6, 2023

Codecov Report

Merging #11517 (8de0a62) into master (b25b62a) will decrease coverage by 0.06%.
Report is 11 commits behind head on master.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master   #11517      +/-   ##
============================================
- Coverage     63.07%   63.02%   -0.06%     
- Complexity      207     1108     +901     
============================================
  Files          2320     2320              
  Lines        124598   124691      +93     
  Branches      19022    19036      +14     
============================================
- Hits          78596    78581      -15     
- Misses        40408    40511     +103     
- Partials       5594     5599       +5     
Flag Coverage Δ
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 62.99% <0.00%> (+12.94%) ⬆️
java-17 14.48% <0.00%> (-48.44%) ⬇️
java-20 62.87% <0.00%> (+12.94%) ⬆️
temurin 63.02% <0.00%> (-0.06%) ⬇️
unittests 63.01% <0.00%> (-0.06%) ⬇️
unittests1 67.41% <0.00%> (-0.10%) ⬇️
unittests2 14.50% <0.00%> (-0.01%) ⬇️


Files Changed Coverage Δ
...he/pinot/core/function/scalar/SketchFunctions.java 59.43% <0.00%> (-24.57%) ⬇️

... and 27 files with indirect coverage changes


@xiangfu0 xiangfu0 changed the title Adding more theta sketch integration test Adding more tuple sketch scalar functions and integration tests Sep 6, 2023
@xiangfu0 xiangfu0 changed the title Adding more tuple sketch scalar functions and integration tests [multistage]Adding more tuple sketch scalar functions and integration tests Sep 6, 2023
@xiangfu0 xiangfu0 added feature query multi-stage Related to the multi-stage query engine labels Sep 6, 2023
@xiangfu0 xiangfu0 requested a review from snleee September 6, 2023 17:54
Adding more sketch integration test
@xiangfu0 xiangfu0 merged commit d211d89 into apache:master Sep 10, 2023
21 checks passed
Labels
feature multi-stage Related to the multi-stage query engine query testing

4 participants