Skip to content

uniqTheta* variations don't seem to work on theta sketches within MV #45241

@githubdev2018

Description

@githubdev2018

I'm trying to re-wirte a Druid query using uniqTheta* & MV combination in ClickHouse to achieve a similar outcome. But I'm getting 0 result as opposed to 1 with the below example.

According to Druid doc, the DS_THETA function can work on sketches.

How could the below Druid query be rewritten in ClickHouse using the uniqTheta* variations?

SELECT THETA_SKETCH_ESTIMATE(
         THETA_SKETCH_INTERSECT(
           DS_THETA(theta_uid) FILTER(WHERE "show" = 'Bridgerton' AND "episode" = 'S1E1'),
           DS_THETA(theta_uid) FILTER(WHERE "show" = 'Bridgerton' AND "episode" = 'S1E2')
         )
       ) AS users
FROM ts_tutorial

Here's an attempt to solve it using MV

CREATE TABLE ts_tutorial
(
   date_id Date,
   uid String,
   show String,
   episode String
)
ENGINE = MergeTree()
ORDER BY (date_id, show, episode, uid);

CREATE TABLE tutorial_tbl
(
   date_id Date,
   show String,
   episode String,
   theta_uid AggregateFunction(uniqTheta, String)
)
ENGINE = AggregatingMergeTree()
ORDER BY (date_id, show, episode);

CREATE MATERIALIZED VIEW IF NOT EXISTS tutorial_mv TO tutorial_tbl
AS
    SELECT
        date_id,
        show,
        episode,
        uniqThetaState(uid) as theta_uid

    FROM ts_tutorial
    GROUP BY date_id, show, episode
;

INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-19','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-20','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-20','carol','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-20','dan','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','carol','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','alice','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','carol','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','erin','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-23','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-23','alice','Game of Thrones','S1E1');

And to answer: How many users watched both episodes of Bridgerton?

SELECT finalizeAggregation(
         uniqThetaIntersect(
           uniqThetaStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E1'),
           uniqThetaStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E2')
         )
       ) AS users
FROM tutorial_mv

The above query would return 0 instead of 1 for user carol. Which doesn't seem to work.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions