I'm trying to re-wirte a Druid query using uniqTheta* & MV combination in ClickHouse to achieve a similar outcome. But I'm getting 0 result as opposed to 1 with the below example.
How could the below Druid query be rewritten in ClickHouse using the uniqTheta* variations?
SELECT THETA_SKETCH_ESTIMATE(
THETA_SKETCH_INTERSECT(
DS_THETA(theta_uid) FILTER(WHERE "show" = 'Bridgerton' AND "episode" = 'S1E1'),
DS_THETA(theta_uid) FILTER(WHERE "show" = 'Bridgerton' AND "episode" = 'S1E2')
)
) AS users
FROM ts_tutorial
CREATE TABLE ts_tutorial
(
date_id Date,
uid String,
show String,
episode String
)
ENGINE = MergeTree()
ORDER BY (date_id, show, episode, uid);
CREATE TABLE tutorial_tbl
(
date_id Date,
show String,
episode String,
theta_uid AggregateFunction(uniqTheta, String)
)
ENGINE = AggregatingMergeTree()
ORDER BY (date_id, show, episode);
CREATE MATERIALIZED VIEW IF NOT EXISTS tutorial_mv TO tutorial_tbl
AS
SELECT
date_id,
show,
episode,
uniqThetaState(uid) as theta_uid
FROM ts_tutorial
GROUP BY date_id, show, episode
;
INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-19','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-19','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-20','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-20','carol','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-20','dan','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','alice','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','carol','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-21','alice','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','carol','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-22','bob','Bridgerton','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-22','erin','Bridgerton','S1E2');
INSERT INTO ts_tutorial VALUES ('2022-05-23','erin','Game of Thrones','S1E1');
INSERT INTO ts_tutorial VALUES ('2022-05-23','alice','Game of Thrones','S1E1');
SELECT finalizeAggregation(
uniqThetaIntersect(
uniqThetaStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E1'),
uniqThetaStateIf(theta_uid, show = 'Bridgerton' AND episode = 'S1E2')
)
) AS users
FROM tutorial_mv
The above query would return 0 instead of 1 for user carol. Which doesn't seem to work.
I'm trying to re-wirte a Druid query using uniqTheta* & MV combination in ClickHouse to achieve a similar outcome. But I'm getting 0 result as opposed to 1 with the below example.
According to Druid doc, the DS_THETA function can work on sketches.
How could the below Druid query be rewritten in ClickHouse using the uniqTheta* variations?
Here's an attempt to solve it using MV
And to answer: How many users watched both episodes of Bridgerton?
The above query would return 0 instead of 1 for user carol. Which doesn't seem to work.