
Support SQL standard "delete from ... where ..." syntax and lightweight implementation on merge tree tables #37893

Merged
44 commits merged on Jul 26, 2022

Conversation

zhangjmruc
Contributor

@zhangjmruc zhangjmruc commented Jun 7, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support SQL standard DELETE FROM syntax on merge tree tables and a lightweight delete implementation for the MergeTree family.

Description:
This is a step toward lightweight delete for the MergeTree family. This pull request adds parsing of the SQL standard DELETE syntax; a lightweight delete is converted to a normal update of the form "UPDATE _row_exists = 0 WHERE predicate".
A _row_exists system virtual column is stored in each data part.
SELECT queries and merges now apply the deleted-rows mask implicitly, treating it as a PREWHERE predicate that filters out the rows where _row_exists = 0.
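As a sketch of the rewrite described above (the table name and predicate are illustrative, not from the PR):

```sql
-- User-facing SQL standard syntax:
DELETE FROM t WHERE id < 100000;

-- is internally converted to a normal mutation on the virtual column:
ALTER TABLE t UPDATE _row_exists = 0 WHERE id < 100000;

-- and subsequent reads implicitly filter on the mask, roughly as if:
SELECT * FROM t PREWHERE _row_exists;
```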

=== Tests for lightweight delete and normal delete with "mutations_sync = 1"
The table has 12 columns with different data types, 100,000,000 rows in total, across about 110 parts.
The results show that the lightweight delete runs much faster than a normal delete: about 200 ms vs. 8 s.

--- 15 single-row deletes
[WeCom screenshot: 39322126-b3bc-495e-9ea9-6db1da27418e]

--- 10 range-row deletes
[WeCom screenshot: d1dd0c0c-1dd5-48c0-a4e9-638c11a245f2]

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Jun 7, 2022
@zhangjmruc zhangjmruc changed the title Support delete from ... where syntax on merge tree tables Support SQL standard "delete from ... where ..." syntax on merge tree tables Jun 8, 2022
Member

@alesapin alesapin left a comment


From my point of view, such syntax will be more confusing for users. DELETEs in ClickHouse are extremely slow and expensive operations. When the user sees the ALTER prefix in an ALTER DELETE query, it becomes clearer that it's not an ordinary OLTP-database delete operation and may have to be executed in a special way. So I think we shouldn't introduce this new standard syntax until we make our DELETEs faster.

@alesapin alesapin self-assigned this Jun 8, 2022
@alesapin alesapin marked this pull request as draft June 12, 2022 11:13
@zhangjmruc zhangjmruc marked this pull request as ready for review June 28, 2022 10:28
@zhangjmruc
Contributor Author

From my point of view, such syntax will be more confusing for users. DELETEs in ClickHouse are extremely slow and expensive operations. When the user sees the ALTER prefix in an ALTER DELETE query, it becomes clearer that it's not an ordinary OLTP-database delete operation and may have to be executed in a special way. So I think we shouldn't introduce this new standard syntax until we make our DELETEs faster.

@alesapin I implemented a prototype for lightweight delete on merge tree with wide parts only. The lightweight delete makes hard links from the source part and adds a deleted_row_mask.bin in the new part.

@zhangjmruc zhangjmruc requested a review from alesapin June 29, 2022 09:44
@zhangjmruc
Contributor Author

@alexey-milovidov, I tried to implement a prototype for lightweight delete plus the SQL standard syntax.
The deleted mask is now stored as a String.
For SELECT queries, the deleted mask works like a row filter: the deleted rows are filtered out implicitly.
Please help by adding the "can be tested" label.

@zhangjmruc zhangjmruc changed the title Support SQL standard "delete from ... where ..." syntax on merge tree tables Support SQL standard "delete from ... where ..." syntax and lightweight implementation on merge tree tables Jun 30, 2022
@zhangjmruc zhangjmruc force-pushed the feature/sql-standard-delete branch 2 times, most recently from 9d1c5e3 to 6236450 Compare June 30, 2022 10:05
@zhangjmruc
Contributor Author

@alesapin I implemented a version of lightweight delete. You left a "changes requested" review; however, I cannot resolve it, so merging is blocked. Please help by adding the "can be tested" label so I can continue the work.

@lqhl

lqhl commented Jul 1, 2022

When we were using ClickHouse, we were not satisfied with the performance of deletion, so we developed this lightweight delete feature using lazy deletion. After preliminary testing, we found that deletion performance is greatly improved compared to the original ALTER TABLE ... DELETE. We want to contribute our lightweight delete solution to the community to help ClickHouse meet more user needs. Can you mark this code as "can be tested", so that we can run a comprehensive test? If there are functional or performance problems, we are willing to keep revising according to the community's feedback. We also plan to submit further features, such as lightweight delete on ReplicatedMergeTree, on top of this one.

@alesapin

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jul 3, 2022
@zhangjmruc zhangjmruc force-pushed the feature/sql-standard-delete branch from 6236450 to 716928c Compare July 4, 2022 01:48
@zhangjmruc
Contributor Author

zhangjmruc commented Jul 4, 2022

@alesapin @alexey-milovidov The failed tests are not related to this feature. Would you please take a look at the code changes for lightweight delete on merge tree with wide parts? Let me know if you have any comments.

@zhangjmruc
Contributor Author

zhangjmruc commented Jul 5, 2022

@alexey-milovidov Hi Alexey, please continue the code review for the lightweight delete.
Summary of the 3 failing checks:

  1. Stress test (memory, actions): table test_15.tp_2 is a replicated merge tree; lightweight delete doesn't kick in.
    2022.07.04 07:49:09.157043 [ 542812 ] {} test_15.tp_2 (debc2ecd-e9b2-497c-8c48-7ec57d5d4258): Cannot quickly remove directory /var/lib/clickhouse/store/deb/debc2ecd-e9b2-497c-8c48-7ec57d5d4258/delete_tmp_all_0_0_0_2 by removing files; fallback to recursive removal. Reason: Code: 458. DB::ErrnoException: Cannot unlink file /var/lib/clickhouse/store/deb/debc2ecd-e9b2-497c-8c48-7ec57d5d4258/delete_tmp_all_0_0_0_2/pp.proj, errno: 21, strerror: Is a directory. (CANNOT_UNLINK) (version 22.7.1.1)
    ...
    Referenced from error log:
    2022.07.04 07:31:59.987510 [ 232572 ] {} test_15.tp_2 (debc2ecd-e9b2-497c-8c48-7ec57d5d4258): void DB::StorageReplicatedMergeTree::mutationsFinalizingTask(): Code: 242. DB::Exception: Table is in readonly mode (replica path: /clickhouse/tables/s1/01710_projection_fetch_test_15/replicas/2_r1). (TABLE_IS_READ_ONLY)

  2. Stress test (address, actions): clickhouse-server failed to start because the former server had not exited.
    Referenced from runlog.log:

  • echo 'Cannot start clickhouse-server'
  • echo -e 'Cannot start clickhouse-server\tFAIL'
  • cat /var/log/clickhouse-server/stdout.log
    Cannot start clickhouse-server
    /var/run/clickhouse-server/clickhouse-server.pid file exists and contains pid = 460221.
    The process with pid = 460221 is already running.
  3. Stateless tests (thread, actions) [1/3] -- not related; the failed test contains only INSERT and SELECT.
00731_long_merge_tree_select_opened_files FAIL 97.64
2022-07-04 00:40:48 Received exception from server (version 22.7.1): 2022-07-04 00:40:48 Code: 81. DB::Exception: Received from localhost:9000. DB::Exception: Database test_2vos10 doesn't exist. (UNKNOWN_DATABASE) 2022-07-04 00:40:48 (query: SYSTEM FLUSH LOGS)

Member

@alesapin alesapin left a comment


Great contribution, thank you! Please check my review remarks. If possible, please add comments directly to the code instead of answering in PR.

I'll ask @davenger to check the read part.

Review comments were left on:
src/Interpreters/MutationsInterpreter.h
src/Parsers/ASTDeleteQuery.cpp
src/Storages/IStorage.h
src/Storages/MergeTree/DataPartStorageOnDisk.cpp
src/Storages/MergeTree/IMergeTreeDataPart.h
src/Interpreters/InterpreterDeleteQuery.cpp
src/Parsers/ParserDeleteQuery.h
src/Storages/MergeTree/IMergeTreeDataPart.cpp
src/Storages/MergeTree/MergeTreeDataPartDeletedMask.cpp
@Algunenano
Member

Question about the feature in general:

A new data part is created after a lightweight delete; all files except checksums, deleted_rows_mask.bin... are created as hard links of the source part. The deleted rows mask is updated based on the old one in the source part plus the new deleted rows' _part_offset according to the WHERE conditions.

How does this work with projections? Hard-linking the data from the projections will lead to incorrect data. Shouldn't they be regenerated every time? Or maybe invalidated? Or, for now, maybe throw an error as not supported.

@davenger
Member

davenger commented Jul 7, 2022

@zhangjmruc Thank you very much for working on this feature! I looked through the implementation and I'd like to suggest some improvements.

As far as I can see from the code, the whole deleted_rows mask for a part is always loaded into memory with the part metadata.
This might consume a significant amount of memory for big tables.

The other thing is filtering: making deleted_rows more like an ordinary column with a corresponding .mrk file should benefit it.
Let's say we have a table like this

CREATE TABLE t (id UInt64 , value String) ENGINE MergeTree() ORDER BY id;

INSERT INTO t SELECT number, randomString(1000) FROM system.numbers LIMIT 1000000;

where the values are rather big strings.
And at some point we delete a range of rows that covers several granules:

DELETE FROM t WHERE id < 100000

and then we read from the table:

SELECT id, length(value) FROM t ORDER BY id LIMIT 3

Query id: b70bf4f9-e882-47eb-9ba6-cc82a8a3cdca

┌─────id─┬─length(value)─┐
│ 100000 │          1000 │
│ 100001 │          1000 │
│ 100002 │          1000 │
└────────┴───────────────┘

3 rows in set. Elapsed: 0.561 sec. Processed 106.50 thousand rows, 6.61 MB (189.89 thousand rows/s., 11.78 MB/s.)

This is rather slow because we don't skip whole granules where deleted_mask is 1; we still read the big value column for them.

If we add a fake "always true" PREWHERE condition, the query executes much faster:

SELECT id, length(value) FROM t PREWHERE id >= 0 ORDER BY id LIMIT 3

Query id: 989e1721-31f2-4f8b-84e2-e08a6ad581ba

┌─────id─┬─length(value)─┐
│ 100000 │          1000 │
│ 100001 │          1000 │
│ 100002 │          1000 │
└────────┴───────────────┘

3 rows in set. Elapsed: 0.087 sec. Processed 139.26 thousand rows, 42.44 MB (1.59 million rows/s., 486.05 MB/s.)

Here it first reads the thin id column, filters it by the mask, and only then reads the big value column for the needed rows.

In order to have this fast filtering, the deletion_mask check should be the first step in the PREWHERE chain. I recently refactored the reader code to support multiple PREWHERE steps (#37165) and I'm willing to help with implementing this deletion_mask filtering.
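The I/O argument above can be illustrated with a small model. This is a sketch, not ClickHouse internals: GRANULE, the mask layout, and rows_to_read are all illustrative. The point is that putting the deletion mask first in the PREWHERE chain means the heavy column is read only for surviving rows.

```python
# Toy model of mask-first PREWHERE filtering (illustrative, not ClickHouse code).
# mask[i] == 1 means row i is deleted.
GRANULE = 8192  # rows per granule, matching the default index_granularity

def rows_to_read(mask, mask_first):
    """How many rows of the heavy column must be read."""
    if not mask_first:
        # Without the mask as the first PREWHERE step, the heavy column
        # is read for every row and only filtered afterwards.
        return len(mask)
    read = 0
    for start in range(0, len(mask), GRANULE):
        granule = mask[start:start + GRANULE]
        # Fully deleted granules contribute nothing; partially deleted
        # granules contribute only their surviving rows.
        read += len(granule) - sum(granule)
    return read

# As in the example above: 1,000,000 rows, the first 100,000 deleted.
mask = [1] * 100_000 + [0] * 900_000
print(rows_to_read(mask, mask_first=True))   # 900000
print(rows_to_read(mask, mask_first=False))  # 1000000
```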

@zhangjmruc zhangjmruc requested a review from alesapin July 8, 2022 03:25
@zhangjmruc
Contributor Author

zhangjmruc commented Jul 8, 2022

Great contribution, thank you! Please check my review remarks. If possible, please add comments directly to the code instead of answering in PR.

I'll ask @davenger to check read part.

@alesapin Please help review the code changes and check whether they are as expected.
The failed integration test cases are expected when we make ALTER DELETE lightweight, because the delete and the drop-column are mutated separately, hence the count of hard links differs from the normal case.

For MergeTreeDataPartDeletedMask, I referred to the implementation of #32774. According to the design in the lightweight delete RFC, the deleted mask should be saved as a roaring bitmap. From the perspective of easier query optimization, @davenger suggested the deleted mask should be saved as a .bin file plus a .mrk file. Maybe we need to create another issue to figure out which of the three implementations is optimal. I'm willing to discuss this further with @davenger or others to determine the final solution for the deleted mask, and I'm also willing to be responsible for its final implementation, but I hope to be able to merge the other code into master first.

@zhangjmruc
Contributor Author

Question about the feature in general:

A new data part is created after a lightweight delete; all files except checksums, deleted_rows_mask.bin... are created as hard links of the source part. The deleted rows mask is updated based on the old one in the source part plus the new deleted rows' _part_offset according to the WHERE conditions.

How does this work with projections? Hard-linking the data from the projections will lead to incorrect data. Shouldn't they be regenerated every time? Or maybe invalidated? Or, for now, maybe throw an error as not supported.

I disabled the lightweight delete mutation for parts with projections. Thank you for pointing this out.

@zhangjmruc
Contributor Author

zhangjmruc commented Jul 8, 2022

@davenger Thank you for suggesting improvements.
Indeed, loading the deleted_rows mask with the metadata will consume memory for big tables. Hopefully, if we make deleted_rows like an ordinary column, this can be avoided. For now, I removed the loadDeletedMask() function and rewrote getDeletedMask() to load the deleted_rows mask only when needed.

Please help me implement the deleted_rows filter. If possible, I'd like the version in this pull request to be considered a prototype for LWD, with an improved version (_delete.bin and _delete.mrk, or another better solution) to follow later.
Please let me know your opinion.

My concerns here are as follows.
Let's say the deleted_rows mask is stored in _delete.bin.

  1. Is the column _delete a virtual column?
  2. When should we add "_delete = 0" to PREWHERE? For all SELECTs on MergeTree tables with a lightweight-delete mask?
  3. How do we read the _delete column in IMergeTreeReader? If we add it to PREWHERE's read columns, it is not a physical column in columns.txt.

The other thing is filtering: making deleted_rows more like an ordinary column with a corresponding .mrk file should benefit it.

In order to have this fast filtering, the deletion_mask check should be the first step in the PREWHERE chain. I recently refactored the reader code to support multiple PREWHERE steps (#37165) and I'm willing to help with implementing this deletion_mask filtering.

@zhangjmruc zhangjmruc force-pushed the feature/sql-standard-delete branch 2 times, most recently from d055501 to 473a46b Compare July 12, 2022 01:40
@davenger davenger closed this Jul 25, 2022
@davenger davenger reopened this Jul 25, 2022
@zhangjmruc
Contributor Author

Hi @davenger, thank you for helping rework the lightweight delete logic! The deleted mask is stored as a normal column, and SELECT queries can work faster with a PREWHERE condition on the deleted mask.
However, regarding performance: because DELETE ... WHERE queries are translated to UPDATE _row_exists = 0 WHERE ..., and the dedicated lightweight delete task was removed, lightweight delete is now treated as an ordinary update mutation on the system column _row_exists. For cases where we delete 1 row in a part with a large number of rows, how does the lightweight delete execution perform? Also, for cases that mutate all columns, the deleted mask is not applied; could PREWHERE on the deleted mask make those faster?
For a lightweight delete, the system table mutations and mutation.txt show its command as "UPDATE _row_exists = 0 WHERE predicate". Can this be used as a marker of lightweight delete?
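For reference, a sketch of how such mutations could be spotted, assuming the command text quoted above appears in system.mutations (the LIKE pattern is an assumption, not confirmed by the PR):

```sql
SELECT mutation_id, command, is_done
FROM system.mutations
WHERE command LIKE '%_row_exists = 0%';
```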

Review comments were left on:
src/Storages/LightweightDeleteDescription.h
src/Storages/MergeTree/MutateTask.cpp
@alesapin
Member

We still need to implement the replicated case.

@davenger davenger force-pushed the feature/sql-standard-delete branch from 3f42378 to 48de02a Compare July 25, 2022 14:32
@lqhl

lqhl commented Jul 26, 2022

We still need to implement the replicated case.

@alesapin We have already implemented lightweight delete for ReplicatedMergeTree. Shall we submit the code to this PR, or shall we merge this one first and start a new PR for the replicated case?

@zhangjmruc
Contributor Author

@davenger @alesapin I tried to implement lightweight delete for the replicated case and added a test case for it. Please help review it and let me know your opinion.

Member

@alesapin alesapin left a comment


I like the simplicity of the implementation for RMT. Thanks!

@davenger
Member

2 perf test failures are expected because this new setting is not present in the old build:
lightweight_delete | DB::Exception: Unknown setting allow_experimental_lightweight_delete.
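For readers trying this out: on builds that do include the setting mentioned above, lightweight delete has to be enabled before DELETE FROM is accepted. A sketch (the table name and predicate are illustrative):

```sql
SET allow_experimental_lightweight_delete = 1;
DELETE FROM t WHERE id < 100000;
```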

@WillMeng

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support SQL standard DELETE FROM syntax on merge tree tables and a lightweight delete implementation for wide parts only.

Description: This was planned for lightweight delete on merge tree families. In this pull request, we can parse the SQL standard DELETE syntax, and the lightweight delete is implemented for wide parts only. A new data part is created after a lightweight delete; all files except checksums, deleted_rows_mask.bin... are created as hard links of the source part. The deleted rows mask is updated based on the old one in the source part plus the new deleted rows' _part_offset according to the WHERE conditions. The SELECT query and MERGE now apply the deleted rows mask implicitly, filtering out the "_delete = 1" rows.

=== Tests for lightweight delete and normal delete with "mutations_sync = 1". The table has 12 columns with different data types, 100,000,000 rows in total, across about 110 parts. The results show that the lightweight delete runs much faster than a normal delete: about 200 ms vs. 8 s.

--- 15 single-row deletes [WeCom screenshot: 39322126-b3bc-495e-9ea9-6db1da27418e]

--- 10 range-row deletes [WeCom screenshot: d1dd0c0c-1dd5-48c0-a4e9-638c11a245f2]

Thanks for the feature. We use "SETTINGS min_bytes_for_wide_part = '100M'" a lot in our tables; when will compact parts be supported?

@zhangjmruc
Contributor Author

@WillMeng I will update the descriptions. Compact parts and ReplicatedMergeTree are both supported. Please see the new test cases.

@WillMeng

@WillMeng I will update the descriptions. Compact parts and ReplicatedMergeTree are both supported. Please see the new test cases.

Thank you. I have another question: when is the data marked "deleted" cleared from disk?

@zhangjmruc
Contributor Author

@WillMeng I will update the descriptions. Compact parts and ReplicatedMergeTree are both supported. Please see the new test cases.

Thank you. I have another question: when is the data marked "deleted" cleared from disk?

@WillMeng The "deleted" rows will be cleared during background merges.
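A sketch of forcing that cleanup manually rather than waiting for a background merge (the table name is illustrative; note that OPTIMIZE ... FINAL rewrites the parts, so it is itself an expensive operation):

```sql
OPTIMIZE TABLE t FINAL;
```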

@WillMeng

WillMeng commented Aug 22, 2022

@zhangjmruc Hi, we tried out this feature for the first time and found these problems:

  1. SELECT COUNT() FROM system.mutations WHERE is_done = 0 returns 92568, and reads often time out (30 seconds).
  2. The destination table becomes very slow to read, and reads often time out.

Our table schema is like this:

CREATE TABLE default.custom_conversion_day_details
(
    `chan_pub_id` Int64,
    `channel_id` Int32,
    `publisher_id` Int32,
    `pub_account_id` Int64,
    `pub_account_name` String,
    `pub_campaign_id` Int64,
    `pub_campaign_name` String,
    `pub_adgroup_id` Int64,
    `pub_adgroup_name` String,
    `data_date` Date
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(data_date)
PRIMARY KEY chan_pub_id
ORDER BY chan_pub_id
SETTINGS min_bytes_for_wide_part = '100M', index_granularity = 8192;

@zhangjmruc
Contributor Author

@WillMeng Would you please share more details with us about the unfinished mutations? The column latest_fail_reason contains info about the failure reason. Or you could provide the error messages from /var/log/clickhouse-server/clickhouse-server.err.log.
Please open a new issue for your problem, and we can continue investigating there.

The lightweight delete just converts the delete to an update on the virtual column _row_exists. I'm not sure why there are so many unfinished mutations.
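A sketch of a diagnostic query for the stuck mutations discussed above, using columns that exist in system.mutations:

```sql
SELECT mutation_id, command, parts_to_do, latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time
LIMIT 10;
```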
