Setting do_not_merge_across_partitions_select_final = 1 does not filter deleted rows in ReplacingMergeTree #49685
Comments
repro https://fiddle.clickhouse.com/23eb97fc-b305-4ffc-a299-fad738bda8ad
CREATE TABLE t
(
`account_id` UInt64,
`_is_deleted` UInt8,
`_version` UInt64
)
ENGINE = ReplacingMergeTree(_version, _is_deleted)
ORDER BY (account_id);
insert into t select number, 0, 1 from numbers(1e3);
insert into t select number, 1, 1 from numbers(1e2);
optimize table t final;
select count() from t final;
┌─count()─┐
│ 900 │
└─────────┘
select count() from t final SETTINGS do_not_merge_across_partitions_select_final = 1;
┌─count()─┐
│ 1000 │
└─────────┘
select count() from t;
┌─count()─┐
│ 1000 │
└─────────┘ |
For me, this is the expected behaviour. You can add
Maybe this case should be added in the documentation. |
Hm, so table's / merge_tree settings solve the issue (https://fiddle.clickhouse.com/de6ccea6-63bb-4bcc-ab9d-fc19f4d4502c). |
If you add this settings, |
@youennL-cs |
I understand the reasoning and I can always filter using the However, I feel like having a setting ( Could we at least document this somehow? |
@den-crane , And yes, this specific behaviour with this setting should be documented, to avoid future confusion. |
@youennL-cs I mean that ReplacingMT can merge new parts
I guess ClickHouse is able to merge parts 3 + 4; the result will contain no rows with key 1. |
the row magically appears after optimize
drop table t;
CREATE TABLE t
(
`account_id` UInt64,
`_is_deleted` UInt8,
`_version` UInt64
)
ENGINE = ReplacingMergeTree(_version, _is_deleted)
ORDER BY (account_id)
settings clean_deleted_rows= 'Always' as select number, 0, 1 from numbers (1e7);
optimize table t final;
insert into t select number, 0, 2 from numbers(600);
insert into t select number, 1, 3 from numbers(300);
insert into t select number, 0, 4 from numbers(10);
select * from t final where account_id = 11; --------- no row
Ok.
0 rows in set. Elapsed: 0.005 sec. Processed 9.09 thousand rows, 154.56 KB (2.00 million rows/s., 34.02 MB/s.)
optimize table t; --------- optimize
select * from t final where account_id = 11;
┌─account_id─┬─_is_deleted─┬─_version─┐
│ 11 │ 0 │ 1 │ ---- row has resurrected
└────────────┴─────────────┴──────────┘
1 row in set. Elapsed: 0.013 sec. Processed 8.50 thousand rows, 144.53 KB (643.01 thousand rows/s., 10.93 MB/s.) |
I think one more mode is needed |
Yep, that was discussed in PR #41005 (comment). And that's why it's not enabled by default: it can give unreliable results. Maybe we can try to implement logic in the merge like the following: "if the merge includes the minimum block of the partition, it's safe to remove rows marked as deleted". Right now optimize deletes the last state of the row. |
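To make the resurrection and the proposed safeguard concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not ClickHouse code: parts are plain lists of (account_id, _is_deleted, _version) tuples, the names `merge`, `select_final` and `guarded_merge` are made up for illustration, and "the merge includes the minimum block of the partition" is approximated as "the merge includes the oldest part". Row counts are scaled down from the example above.

```python
# Toy model (assumption: NOT ClickHouse code). A part is a list of
# (account_id, is_deleted, version) rows; parts are ordered oldest first.

def merge(parts, clean_deleted):
    """Merge parts ReplacingMergeTree-style: keep the highest version per key
    (later rows win ties); optionally drop rows whose final state is deleted."""
    latest = {}
    for part in parts:
        for key, is_deleted, version in part:
            if key not in latest or version >= latest[key][1]:
                latest[key] = (is_deleted, version)
    return [(k, d, v) for k, (d, v) in latest.items()
            if not (clean_deleted and d == 1)]

def select_final(parts):
    """SELECT ... FINAL over all parts: collapse and hide deleted rows."""
    return merge(parts, clean_deleted=True)

def find(rows, key):
    return [r for r in rows if r[0] == key]

def guarded_merge(all_parts, first, last):
    """Hypothetical safeguard: only clean deleted rows when the merged range
    starts at the oldest part, so no older version can survive elsewhere."""
    clean = (first == 0)
    merged = merge(all_parts[first:last + 1], clean)
    return all_parts[:first] + [merged] + all_parts[last + 1:]

# Scaled-down version of the inserts above.
base = [(n, 0, 1) for n in range(20)]   # big, already optimized part
p2 = [(n, 0, 2) for n in range(15)]
p3 = [(n, 1, 3) for n in range(12)]     # delete markers, covers key 11
p4 = [(n, 0, 4) for n in range(5)]      # does not cover key 11

before = find(select_final([base, p2, p3, p4]), 11)

# Unsafe: merge only the three small parts and clean deleted rows.
unsafe = merge([p2, p3, p4], clean_deleted=True)
after_unsafe = find(select_final([base, unsafe]), 11)

# Guarded: the same partial merge keeps the delete marker.
after_guarded = find(select_final(guarded_merge([base, p2, p3, p4], 1, 3)), 11)

print(before)         # []
print(after_unsafe)   # [(11, 0, 1)]
print(after_guarded)  # []
```

In the unsafe case the delete marker for key 11 is dropped while version 1 still sits in the untouched big part, so FINAL shows the old row again. In the guarded case the marker survives the partial merge and keeps hiding it.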
idea1. Even with
CREATE TABLE t
(
`account_id` UInt64,
`_is_deleted` UInt8,
`_version` UInt64
)
ENGINE = ReplacingMergeTree(_version, _is_deleted)
ORDER BY (account_id);
insert into t select number, 0, 1 from numbers(1e8);
insert into t select number, 1, 1 from numbers(1);
optimize table t final;
select count() from t final;
--
99999999
Elapsed: 1.629 sec.
select count() from t final SETTINGS do_not_merge_across_partitions_select_final = 1;
--
100000000
Elapsed: 0.313 sec
select count() from t final where _is_deleted=0 SETTINGS do_not_merge_across_partitions_select_final = 1;
--
99999999
Elapsed: 0.332 sec
select count() from (select _is_deleted from t final where _is_deleted=0) SETTINGS do_not_merge_across_partitions_select_final = 1;
--
99999999
Elapsed: 0.293 sec
The result is correct and there is no performance degradation.
idea2. Another solution is to store meta-information in a part indicating that it has deleted rows and disable something like |
@den-crane @youennL-cs : the setting should not affect the result even if you document this odd behavior. So let us apply the filtering when final is set |
That is wrong. That setting does not avoid merges, it limits the scope of FINAL to the parts of a single partition (so it makes FINAL work just as a usual merge, while by default it also tries to find duplicates across partitions). The expectation that it should work sounds reasonable and valid. Probably it is not working due to some silly missing line somewhere. |
It's this logic working there: ClickHouse/src/Processors/QueryPlan/ReadFromMergeTree.cpp Lines 988 to 999 in eddd932
So if there is more than a single part in the partition, the is_deleted logic works as expected. We can just try to disable that part of the logic there by adding a condition like
But that means that we will still do FINAL on partitions containing a single part (but in the scope of one partition only). |
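A rough Python sketch of why the count differs, under the assumption (inferred from this comment, not copied from ReadFromMergeTree.cpp) that with the setting enabled a partition holding a single part is read as-is, so no FINAL processing and no is_deleted filtering happens for it. The helper names `collapse` and `select_final` and the `skip_single_part` flag are hypothetical, and the real code may check additional conditions.

```python
# Toy model (assumption: NOT the ClickHouse implementation).
# A row is (account_id, is_deleted, version); a part is a list of rows.

def collapse(rows):
    """Keep the highest version per key; later rows win ties."""
    latest = {}
    for key, is_deleted, version in rows:
        if key not in latest or version >= latest[key][1]:
            latest[key] = (is_deleted, version)
    return [(k, d, v) for k, (d, v) in latest.items()]

def select_final(partitions, skip_single_part=False):
    """partitions: dict of partition id -> list of parts.
    skip_single_part models do_not_merge_across_partitions_select_final = 1."""
    out = []
    for parts in partitions.values():
        if skip_single_part and len(parts) == 1:
            out.extend(parts[0])  # read as-is: deleted rows leak through
            continue
        rows = [r for part in parts for r in part]
        out.extend(r for r in collapse(rows) if r[1] == 0)
    return out

live = [(n, 0, 1) for n in range(1000)]
tombstones = [(n, 1, 1) for n in range(100)]

# Before OPTIMIZE: two parts in one partition, the setting changes nothing.
two_parts = {"all": [live, tombstones]}
count_two_parts = len(select_final(two_parts, skip_single_part=True))

# After OPTIMIZE FINAL without cleanup: one part that still holds the
# 100 rows marked as deleted.
one_part = {"all": [collapse(live + tombstones)]}
count_default = len(select_final(one_part))
count_with_setting = len(select_final(one_part, skip_single_part=True))

print(count_two_parts, count_default, count_with_setting)  # 900 900 1000
```

This reproduces the 900 versus 1000 counts from the fiddle at the top of the thread: the deleted rows only show up once the partition has been merged down to a single part.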
Also I think you should run the test suite with |
We are using Clickhouse 23.3.2 version and we have a few ReplacingMergeTree tables defined like below:
When we are querying with FINAL, we get different results if we add the setting do_not_merge_across_partitions_select_final = 1. It seems like it skips filtering the deleted rows:
Unfortunately I couldn't reproduce in a local setup.
Is it expected that this setting affects the latest behaviour of ReplacingMergeTrees which filters the deleted rows, or is this a bug?