Hi, I wanted to dedupe data using replacingMergeTree. One issue i'm facing is that i have two columns event_received_at(datetime) and event_id(string). I want to dedupe data by event_id, but sorting should be by event_received_at.
According to the docs replacingMergeTree dedupes based on sorting key for me it is event_received_at, but there can be two different events which has the same event_received_at.
https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/replacingmergetree/
So is it possible to dedupe by event_id and sort by event_received_at using replacingMergeTree
or there is another mergeTree that could achieve this functionality?
tried below example:
CREATE TABLE IF NOT EXISTS test.app
(
event_id String,
event_received_at DateTime
) ENGINE = ReplacingMergeTree() order by (event_id, event_received_at) primary key event_id;
INSERT INTO test.app FORMAT CSV
abc, 2019-12-23 14:36:31
abc, 2020-12-23 14:36:31
def, 2019-12-23 14:36:31
select * from test.app final;
output:
abc, 2019-12-23 14:36:31
abc, 2020-12-23 14:36:31
def, 2019-12-23 14:36:31
expecting output to be like this
abc, 2020-12-23 14:36:31
def, 2019-12-23 14:36:31
if its not possible by replacingMergeTree is there another mergeTree which could achieve the above functionality
Hi, I wanted to dedupe data using replacingMergeTree. One issue i'm facing is that i have two columns event_received_at(datetime) and event_id(string). I want to dedupe data by event_id, but sorting should be by event_received_at.
According to the docs replacingMergeTree dedupes based on sorting key for me it is event_received_at, but there can be two different events which has the same event_received_at.
https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/replacingmergetree/
So is it possible to dedupe by event_id and sort by event_received_at using replacingMergeTree
or there is another mergeTree that could achieve this functionality?
tried below example:
expecting output to be like this
if its not possible by replacingMergeTree is there another mergeTree which could achieve the above functionality