How to guarantee data deduplication in ClickHouse? #1178
At first, I thought ReplacingMergeTree could do this. After trying several times (inserting a set of data from a file with version 1, then inserting the same data set with version 2), I found that this method can't achieve data deduplication, even if I create a materialized view using SELECT with the FINAL keyword, or GROUP BY with max(ver).
I also read in the documentation (https://clickhouse.yandex/docs/en/table_engines/replacingmergetree.html) that ReplacingMergeTree does not guarantee data deduplication.
So how do I guarantee data deduplication when I have to insert the same set of data several times (for example, when loading data from a file into a table is terminated by an unexpected exception)?
If you use Replicated tables, they will deduplicate inserted blocks of data:
(see the documentation text after "Blocks of data are deduplicated.")
Non-replicated tables don't have this feature.
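To make the idea of block-level deduplication concrete, here is a rough Python sketch. It is only a toy model, not ClickHouse code: the class `BlockDedupTable`, the `window` parameter and the use of SHA-256 are all assumptions made for illustration. The point it tries to show is that a retried insert is skipped only when the whole block is identical (same rows, same order), and that only a limited number of recent block hashes are remembered.

```python
import hashlib
from collections import OrderedDict


class BlockDedupTable:
    """Toy model of insert-block deduplication (hypothetical, not ClickHouse code)."""

    def __init__(self, window=100):
        # `window` is an illustrative assumption: how many recent block
        # hashes are remembered before the oldest one is forgotten.
        self.window = window
        self.recent_hashes = OrderedDict()
        self.rows = []

    def insert_block(self, block):
        """Insert a list of rows; return False if the block was a duplicate."""
        # Hash the block as a whole, so row order and content both matter.
        digest = hashlib.sha256(repr(block).encode("utf-8")).hexdigest()
        if digest in self.recent_hashes:
            return False  # identical block seen recently: skip it
        self.recent_hashes[digest] = None
        if len(self.recent_hashes) > self.window:
            self.recent_hashes.popitem(last=False)  # forget the oldest hash
        self.rows.extend(block)
        return True


table = BlockDedupTable()
block = [(1, "a"), (2, "b")]
first = table.insert_block(block)       # stored
retry = table.insert_block(block)       # same block again: skipped
reordered = table.insert_block([(2, "b"), (1, "a")])  # different block: stored
```

In this model, a retry after a failed load is safe only if the client resends the data split into the same blocks in the same order. A block with the same rows in a different order counts as new data and is stored again.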