Support SQL standard "delete from ... where ..." syntax and lightweight implementation on merge tree tables #37893
Conversation
From my point of view, such syntax would be more confusing for users. DELETEs in ClickHouse are extremely slow and expensive operations. When the user sees the ALTER prefix in an ALTER DELETE query, it becomes clearer that this is not an ordinary OLTP database delete operation and may have to be executed in a special way. So I think we shouldn't introduce this new standard syntax until we make our DELETEs faster.
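For reference, the two syntaxes under discussion could look like this (the table name and predicate are hypothetical):

```sql
-- Existing ClickHouse mutation syntax; the ALTER prefix signals
-- that this is a heavy, asynchronous rewrite of data parts:
ALTER TABLE my_table DELETE WHERE id = 42;

-- SQL standard syntax proposed in this PR:
DELETE FROM my_table WHERE id = 42;
```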
Force-pushed from 9d1c5e3 to 93b8c55
@alesapin I implemented a prototype for lightweight delete on MergeTree with wide parts only. The lightweight delete makes hard links from the source part and adds a deleted_row_mask.bin file in the new part.
@alexey-milovidov, I tried to implement a prototype for lightweight delete + SQL standard syntax.
Force-pushed from 9d1c5e3 to 6236450
@alesapin I implemented a version of lightweight delete. You left a change request, but I cannot resolve it, so merging is blocked. Please help add a "can be tested" tag so I can continue the work.
When we were using ClickHouse, we were not satisfied with the performance of deletion, so we developed the lightweight delete feature using lazy deletion. After preliminary testing, we found that deletion performance is greatly improved compared to the original.
Force-pushed from 6236450 to 716928c
@alesapin @alexey-milovidov The failed tests are not related to this feature. Would you please take a look at the code changes for lightweight delete on merge tree with wide parts? Let me know if you have any comments.
@alexey-milovidov Hi Alexey, please help continue the code review for the lightweight delete.
Great contribution, thank you! Please check my review remarks. If possible, please add comments directly to the code instead of answering in PR.
I'll ask @davenger to check the read path.
Question about the feature in general:
How does this work with projections? Hard linking the data from the projections will lead to incorrect data. Shouldn't they be regenerated every time? Or maybe invalidated? Or maybe, for now, throw an error as not supported.
@zhangjmruc Thank you very much for working on this feature! I looked through the implementation and I'd like to suggest some improvements. As far as I can see from the code, the whole deleted_rows mask for a part is always loaded into memory with the part metadata. The other thing is about filtering: making deleted_rows behave more like an ordinary column, with a corresponding .mrk file, should benefit it.
Consider a table where the values are rather big strings, and then we read from the table. This is rather slow, because we don't skip whole granules where deleted_mask is 1 and read the big values anyway. But if we add a fake "always true" PREWHERE condition, the query executes much faster: it first reads the thin mask column and only then the heavy columns, for the surviving rows. In order to have this fast filtering, the deletion_mask check should be the first step in the PREWHERE chain. I recently refactored the reader code to support multiple steps of PREWHERE (#37165) and I'm willing to help with implementing this deletion_mask filtering.
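A hypothetical sketch of the trick described above (the table, column names, and predicates are made up for illustration; real timings depend on the data):

```sql
-- Hypothetical table: a thin mask column next to heavy string values.
CREATE TABLE test_mask
(
    id UInt64,
    value String,        -- rather big strings
    deleted_mask UInt8   -- 1 = row is deleted
)
ENGINE = MergeTree ORDER BY id;

-- Slow: granules of the heavy 'value' column are read
-- even where every row is marked deleted.
SELECT count() FROM test_mask
WHERE deleted_mask = 0 AND value LIKE 'x%';

-- Faster: filtering on the thin mask column in PREWHERE lets
-- granules containing only deleted rows be skipped before the
-- heavy column is touched.
SELECT count() FROM test_mask
PREWHERE deleted_mask = 0
WHERE value LIKE 'x%';
```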
@alesapin Please help review parts of the code changes and check whether they are as expected. For MergeTreeDataPartDeletedMask, I referred to the implementation in #32774. According to the design in the lightweight delete RFC, the deleted mask should be saved as a roaring bitmap. From the perspective of easier query optimization, davenger suggested the deleted mask should be saved as a .bin file plus a .mrk file. Maybe we need to create another issue to figure out which of the three implementations is more optimal. I'm willing to discuss with davenger or others to determine the final solution for this deleted mask, and also willing to be responsible for its final implementation, but I hope we can merge the other code into master first.
I disabled the lightweight delete mutation for parts with projections. Thank you for pointing this out.
@davenger Thank you for suggesting improvements. Please help me implement the deleted-rows filter. If possible, I would like the version in this pull request to be considered a prototype for LWD, with an improved version using _delete.bin and _delete.mrk (or other, better solutions) to follow later. My concerns here are:
Force-pushed from d055501 to 473a46b
Hi @davenger, thank you for helping rework the lightweight delete logic! The deleted mask is now stored as a normal column, and SELECT queries can work faster with a PREWHERE condition on the deleted mask.
We still need to implement the replicated case.
Force-pushed from 3f42378 to 48de02a
@alesapin We have already implemented lightweight delete for ReplicatedMergeTree. Shall we submit the code to this PR, or can we merge this one first and start a new PR for the replicated case?
I like the simplicity of the implementation for RMT. Thanks!
The 2 perf test failures are expected because this new setting is not present in the old build:
Thanks for the feature. We use a lot of "SETTINGS min_bytes_for_wide_part = '100M'" in our tables; when will Compact parts be supported?
@WillMeng I will update the description. Compact parts and ReplicatedMergeTree are both supported. Please refer to the new test cases.
Thank you. I have another question: when is the data marked "deleted" actually cleared from disk?
@zhangjmruc Hi, we tried out this feature for the first time and found these problems:
Our table schema looks like this:
ENGINE = MergeTree
@WillMeng Would you please share more details about the unfinished mutations? The column latest_fail_reason in system.mutations contains info about the failure reason. Alternatively, you could provide error messages from /var/log/clickhouse-server/clickhouse-server.err.log. Lightweight delete just converts the delete to an update on the virtual column _row_exists, so I'm not sure why there are so many unfinished mutations.
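A quick way to inspect stuck mutations along the lines suggested above (the table name is hypothetical):

```sql
-- Look for mutations that have not finished on the affected table
-- and inspect why the latest attempt failed.
SELECT
    mutation_id,
    command,
    parts_to_do,
    is_done,
    latest_fail_reason
FROM system.mutations
WHERE table = 'my_table'   -- hypothetical table name
  AND is_done = 0;
```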
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support SQL standard DELETE FROM syntax on merge tree tables and lightweight delete implementation for merge tree families.
Description:
This is planned for lightweight delete on merge tree families. In this pull request, we can parse the SQL standard DELETE syntax, and the lightweight delete is converted to a normal update like "update _row_exists = 0 where predicate".
There will be a _row_exists virtual system column stored in the data part.
SELECT queries and merges now apply the deleted rows mask implicitly, treated as a PREWHERE predicate, filtering out the "_row_exists = 0" rows.
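A sketch of the rewrite described above (the table name and predicate are hypothetical):

```sql
-- User-facing SQL standard syntax:
DELETE FROM my_table WHERE id = 42;

-- Internally this becomes a lightweight mutation that flips the
-- _row_exists virtual column for matching rows, roughly:
ALTER TABLE my_table UPDATE _row_exists = 0 WHERE id = 42;

-- SELECT queries and merges then implicitly filter deleted rows,
-- as if every query carried a PREWHERE-style predicate:
SELECT * FROM my_table PREWHERE _row_exists = 1;
```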
=== Tests for lightweight delete and normal delete with "mutations_sync = 1"
The table has 12 columns with different data types, 100,000,000 rows in total, split across about 110 parts.
The results show that lightweight delete runs much faster than normal delete: ~200 ms vs ~8 s.
--- 15 single row deletes
--- 10 range rows deletes
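The two delete shapes used in the tests above could look like this (column names and ranges are hypothetical; timings depend entirely on hardware and data):

```sql
-- Make each statement wait for the mutation to finish, so the
-- measured time covers the whole operation.
SET mutations_sync = 1;

-- Single-row delete (repeated 15 times with different ids):
DELETE FROM my_table WHERE id = 123456;

-- Range delete (repeated 10 times with different ranges):
DELETE FROM my_table WHERE id BETWEEN 1000000 AND 2000000;
```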