
Support SQL standard "delete from ... where ..." syntax and lightweight implementation on merge tree tables #37893

Merged
44 commits merged on Jul 26, 2022

Conversation

zhangjmruc
Contributor

@zhangjmruc zhangjmruc commented Jun 7, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support SQL standard DELETE FROM syntax on merge tree tables and a lightweight delete implementation for the MergeTree family.

Description:
This is a step toward lightweight delete for the MergeTree family. This pull request adds parsing of the SQL standard DELETE syntax; a lightweight delete is converted to a normal update of the form "UPDATE _row_exists = 0 WHERE predicate".
A _row_exists system virtual column is stored in each data part.
SELECT queries and merges now apply the deleted-rows mask implicitly, treating it as a PREWHERE predicate that filters out the rows where _row_exists = 0.
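As a sketch of the rewrite described above (the table name and predicate are illustrative, not from the PR):

```sql
-- User-facing SQL standard syntax:
DELETE FROM t WHERE id < 100000;

-- is internally converted to a normal mutation on the virtual column:
ALTER TABLE t UPDATE _row_exists = 0 WHERE id < 100000;

-- and subsequent reads implicitly filter on the mask, roughly as if:
SELECT * FROM t PREWHERE _row_exists;
```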

=== Tests for lightweight delete and normal delete with "mutations_sync = 1"
The table has 12 columns with different data types, 100,000,000 rows in total, across about 110 parts.
The results show that the lightweight delete runs much faster than a normal delete: about 200 ms vs. 8 s.

--- 15 single-row deletes
[WeCom screenshot: 39322126-b3bc-495e-9ea9-6db1da27418e]

--- 10 range-row deletes
[WeCom screenshot: d1dd0c0c-1dd5-48c0-a4e9-638c11a245f2]

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Jun 7, 2022
@zhangjmruc zhangjmruc changed the title Support delete from ... where syntax on merge tree tables Support SQL standard "delete from ... where ..." syntax on merge tree tables Jun 8, 2022
Member

@alesapin alesapin left a comment


From my point of view, such syntax will be more confusing for users. DELETEs in ClickHouse are extremely slow and expensive operations. When the user sees the ALTER prefix in an ALTER DELETE query, it becomes clearer that it's not an ordinary OLTP-database delete operation and may have to be executed in a special way. So I think we shouldn't introduce this new standard syntax until we make our DELETEs faster.

@alesapin alesapin self-assigned this Jun 8, 2022
@alesapin alesapin marked this pull request as draft June 12, 2022 11:13
@zhangjmruc zhangjmruc marked this pull request as ready for review June 28, 2022 10:28
@zhangjmruc
Contributor Author

From my point of view, such syntax will be more confusing for users. DELETEs in ClickHouse are extremely slow and expensive operations. When the user sees the ALTER prefix in an ALTER DELETE query, it becomes clearer that it's not an ordinary OLTP-database delete operation and may have to be executed in a special way. So I think we shouldn't introduce this new standard syntax until we make our DELETEs faster.

@alesapin I implemented a prototype for lightweight delete on merge tree with wide parts only. The lightweight delete makes hard links from the source part and adds a deleted_row_mask.bin in the new part.

@zhangjmruc zhangjmruc requested a review from alesapin June 29, 2022 09:44
@zhangjmruc
Contributor Author

@alexey-milovidov, I tried to implement a prototype for lightweight delete plus the SQL standard syntax.
The deleted mask is now stored as a String.
For SELECT queries, the deleted mask works like a row filter: the deleted rows are filtered out implicitly.
Please help by adding the "can be tested" label.

@zhangjmruc zhangjmruc changed the title Support SQL standard "delete from ... where ..." syntax on merge tree tables Support SQL standard "delete from ... where ..." syntax and lightweight implementation on merge tree tables Jun 30, 2022
@zhangjmruc zhangjmruc force-pushed the feature/sql-standard-delete branch 2 times, most recently from 9d1c5e3 to 6236450 Compare June 30, 2022 10:05
@zhangjmruc
Contributor Author

@alesapin I implemented a version of lightweight delete. You left a "changes requested" review; however, I cannot resolve it, so merging is blocked. Please help by adding the "can be tested" label so I can continue the work.

@lqhl

lqhl commented Jul 1, 2022

When we were using ClickHouse, we were not satisfied with the performance of deletion, so we developed this lightweight delete feature using lazy deletion. After preliminary testing, we found that deletion performance is greatly improved compared to the original ALTER TABLE ... DELETE. We want to contribute our lightweight delete solution to the community to help ClickHouse meet more user needs. Can you mark this code as "can be tested", so that we can run a comprehensive test? If there are functional or performance problems, we are willing to keep revising according to the community's feedback. We also plan to submit further features, such as lightweight delete on ReplicatedMergeTree, on top of this one.

@alesapin

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jul 3, 2022
@zhangjmruc zhangjmruc force-pushed the feature/sql-standard-delete branch from 6236450 to 716928c Compare July 4, 2022 01:48
@zhangjmruc
Contributor Author

zhangjmruc commented Jul 4, 2022

@alesapin @alexey-milovidov The failed tests are not related to this feature. Would you please take a look at the code changes for lightweight delete on merge tree with wide parts? Let me know if you have any comments.

@zhangjmruc
Contributor Author

zhangjmruc commented Jul 5, 2022

@alexey-milovidov Hi Alexey, please continue the code review for the lightweight delete.
Summary of the 3 failing checks:

  1. Stress test (memory, actions): table test_15.tp_2 is a replicated merge tree; lightweight delete doesn't kick in.
    2022.07.04 07:49:09.157043 [ 542812 ] {} test_15.tp_2 (debc2ecd-e9b2-497c-8c48-7ec57d5d4258): Cannot quickly remove directory /var/lib/clickhouse/store/deb/debc2ecd-e9b2-497c-8c48-7ec57d5d4258/delete_tmp_all_0_0_0_2 by removing files; fallback to recursive removal. Reason: Code: 458. DB::ErrnoException: Cannot unlink file /var/lib/clickhouse/store/deb/debc2ecd-e9b2-497c-8c48-7ec57d5d4258/delete_tmp_all_0_0_0_2/pp.proj, errno: 21, strerror: Is a directory. (CANNOT_UNLINK) (version 22.7.1.1)
    ...
    Referenced from error log:
    2022.07.04 07:31:59.987510 [ 232572 ] {} test_15.tp_2 (debc2ecd-e9b2-497c-8c48-7ec57d5d4258): void DB::StorageReplicatedMergeTree::mutationsFinalizingTask(): Code: 242. DB::Exception: Table is in readonly mode (replica path: /clickhouse/tables/s1/01710_projection_fetch_test_15/replicas/2_r1). (TABLE_IS_READ_ONLY)

  2. Stress test (address, actions): clickhouse-server failed to start because the former server had not exited.
    Referenced from runlog.log:

  • echo 'Cannot start clickhouse-server'
  • echo -e 'Cannot start clickhouse-server\tFAIL'
  • cat /var/log/clickhouse-server/stdout.log
    Cannot start clickhouse-server
    /var/run/clickhouse-server/clickhouse-server.pid file exists and contains pid = 460221.
    The process with pid = 460221 is already running.
  3. Stateless tests (thread, actions) [1/3] -- not related; the failed test contains only INSERT and SELECT.
00731_long_merge_tree_select_opened_files FAIL 97.64
2022-07-04 00:40:48 Received exception from server (version 22.7.1): 2022-07-04 00:40:48 Code: 81. DB::Exception: Received from localhost:9000. DB::Exception: Database test_2vos10 doesn't exist. (UNKNOWN_DATABASE) 2022-07-04 00:40:48 (query: SYSTEM FLUSH LOGS)

Member

@alesapin alesapin left a comment


Great contribution, thank you! Please check my review remarks. If possible, please add comments directly to the code instead of answering in PR.

I'll ask @davenger to check the read part.

Review comments were left on:
src/Interpreters/MutationsInterpreter.h
src/Parsers/ASTDeleteQuery.cpp
src/Storages/IStorage.h
src/Storages/MergeTree/DataPartStorageOnDisk.cpp
src/Storages/MergeTree/IMergeTreeDataPart.h
src/Interpreters/InterpreterDeleteQuery.cpp
src/Parsers/ParserDeleteQuery.h
src/Storages/MergeTree/IMergeTreeDataPart.cpp
src/Storages/MergeTree/MergeTreeDataPartDeletedMask.cpp
@Algunenano
Member

Question about the feature in general:

A new data part is created after a lightweight delete; all files except checksums, deleted_rows_mask.bin... are created as hard links of the source part. The deleted rows mask is updated based on the old one in the source part plus the new deleted rows' _part_offset according to the WHERE conditions.

How does this work with projections? Hard-linking the data from the projections will lead to incorrect data. Shouldn't they be regenerated every time? Or maybe invalidated? Or, for now, maybe throw an error as not supported.

@davenger
Member

davenger commented Jul 7, 2022

@zhangjmruc Thank you very much for working on this feature! I looked through the implementation and I'd like to suggest some improvements.

As far as I can see from the code, the whole deleted_rows mask for a part is always loaded into memory with the part metadata.
This might consume a significant amount of memory for big tables.

The other thing is filtering: making deleted_rows more like an ordinary column with a corresponding .mrk file should benefit it.
Let's say we have a table like this

CREATE TABLE t (id UInt64 , value String) ENGINE MergeTree() ORDER BY id;

INSERT INTO t SELECT number, randomString(1000) FROM system.numbers LIMIT 1000000;

where the values are rather big strings.
And at some point we delete a range of rows that covers several granules:

DELETE FROM t WHERE id < 100000

and then we read from the table:

SELECT id, length(value) FROM t ORDER BY id LIMIT 3

Query id: b70bf4f9-e882-47eb-9ba6-cc82a8a3cdca

┌─────id─┬─length(value)─┐
│ 100000 │          1000 │
│ 100001 │          1000 │
│ 100002 │          1000 │
└────────┴───────────────┘

3 rows in set. Elapsed: 0.561 sec. Processed 106.50 thousand rows, 6.61 MB (189.89 thousand rows/s., 11.78 MB/s.)

This is rather slow because we don't skip whole granules where deleted_mask is 1; we still read the big value column for them.

If we add a fake "always true" PREWHERE condition, the query executes much faster:

SELECT id, length(value) FROM t PREWHERE id >= 0 ORDER BY id LIMIT 3

Query id: 989e1721-31f2-4f8b-84e2-e08a6ad581ba

┌─────id─┬─length(value)─┐
│ 100000 │          1000 │
│ 100001 │          1000 │
│ 100002 │          1000 │
└────────┴───────────────┘

3 rows in set. Elapsed: 0.087 sec. Processed 139.26 thousand rows, 42.44 MB (1.59 million rows/s., 486.05 MB/s.)

Here it first reads the thin id column, filters it by the mask, and only then reads the big value column for the needed rows.

In order to have this fast filtering, the deletion_mask check should be the first step in the PREWHERE chain. I recently refactored the reader code to support multiple PREWHERE steps (#37165) and I'm willing to help with implementing this deletion_mask filtering.
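The I/O argument above can be illustrated with a small model. This is a sketch, not ClickHouse internals: GRANULE, the mask layout, and rows_to_read are all illustrative. The point is that putting the deletion mask first in the PREWHERE chain means the heavy column is read only for surviving rows.

```python
# Toy model of mask-first PREWHERE filtering (illustrative, not ClickHouse code).
# mask[i] == 1 means row i is deleted.
GRANULE = 8192  # rows per granule, matching the default index_granularity

def rows_to_read(mask, mask_first):
    """How many rows of the heavy column must be read."""
    if not mask_first:
        # Without the mask as the first PREWHERE step, the heavy column
        # is read for every row and only filtered afterwards.
        return len(mask)
    read = 0
    for start in range(0, len(mask), GRANULE):
        granule = mask[start:start + GRANULE]
        # Fully deleted granules contribute nothing; partially deleted
        # granules contribute only their surviving rows.
        read += len(granule) - sum(granule)
    return read

# As in the example above: 1,000,000 rows, the first 100,000 deleted.
mask = [1] * 100_000 + [0] * 900_000
print(rows_to_read(mask, mask_first=True))   # 900000
print(rows_to_read(mask, mask_first=False))  # 1000000
```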

@zhangjmruc zhangjmruc requested a review from alesapin July 8, 2022 03:25
@zhangjmruc
Contributor Author

zhangjmruc commented Jul 8, 2022

Great contribution, thank you! Please check my review remarks. If possible, please add comments directly to the code instead of answering in PR.

I'll ask @davenger to check read part.

@alesapin Please help review the code changes and check whether they are as expected.
The failed integration test cases are expected when we make ALTER DELETE lightweight, because the delete and the drop-column are mutated separately, hence the count of hard links differs from the normal case.

For MergeTreeDataPartDeletedMask, I referred to the implementation of #32774. According to the design in the lightweight delete RFC, the deleted mask should be saved as a roaring bitmap. From the perspective of easier query optimization, @davenger suggested the deleted mask should be saved as a .bin file plus a .mrk file. Maybe we need to create another issue to figure out which of the three implementations is optimal. I'm willing to discuss this further with @davenger or others to determine the final solution for the deleted mask, and I'm also willing to be responsible for its final implementation, but I hope to be able to merge the other code into master first.

@zhangjmruc
Contributor Author

Question about the feature in general:

A new data part is created after a lightweight delete; all files except checksums, deleted_rows_mask.bin... are created as hard links of the source part. The deleted rows mask is updated based on the old one in the source part plus the new deleted rows' _part_offset according to the WHERE conditions.

How does this work with projections? Hard-linking the data from the projections will lead to incorrect data. Shouldn't they be regenerated every time? Or maybe invalidated? Or, for now, maybe throw an error as not supported.

I disabled the lightweight delete mutation for parts with projections. Thank you for pointing this out.

@zhangjmruc
Contributor Author

zhangjmruc commented Jul 8, 2022

@davenger Thank you for suggesting improvements.
Indeed, loading the deleted_rows mask with the metadata will consume memory for big tables. Hopefully, if we make deleted_rows like an ordinary column, this can be avoided. For now, I removed the loadDeletedMask() function and rewrote getDeletedMask() to load the deleted_rows mask only when needed.

Please help me implement the deleted_rows filter. If possible, I'd like the version in this pull request to be considered a prototype for LWD, with an improved version (_delete.bin and _delete.mrk, or another better solution) to follow later.
Please let me know your opinion.

My concerns here are as follows.
Let's say the deleted_rows mask is stored in _delete.bin.

  1. Is the column _delete a virtual column?
  2. When should we add "_delete = 0" to PREWHERE? For all SELECTs on MergeTree tables with a lightweight-delete mask?
  3. How do we read the _delete column in IMergeTreeReader? If we add it to PREWHERE's read columns, it is not a physical column in columns.txt.

The other thing is filtering: making deleted_rows more like an ordinary column with a corresponding .mrk file should benefit it.

In order to have this fast filtering, the deletion_mask check should be the first step in the PREWHERE chain. I recently refactored the reader code to support multiple PREWHERE steps (#37165) and I'm willing to help with implementing this deletion_mask filtering.

@zhangjmruc zhangjmruc force-pushed the feature/sql-standard-delete branch 2 times, most recently from d055501 to 473a46b Compare July 12, 2022 01:40
@davenger davenger closed this Jul 25, 2022
@davenger davenger reopened this Jul 25, 2022
@zhangjmruc
Contributor Author

Hi @davenger, thank you for helping rework the lightweight delete logic! The deleted mask is stored as a normal column, and SELECT queries can work faster with a PREWHERE condition on the deleted mask.
However, regarding performance: because DELETE ... WHERE queries are translated to UPDATE _row_exists = 0 WHERE ..., and the dedicated lightweight delete task was removed, lightweight delete is now treated as an ordinary update mutation on the system column _row_exists. For cases where we delete 1 row in a part with a large number of rows, how does the lightweight delete execution perform? Also, for cases that mutate all columns, the deleted mask is not applied; could PREWHERE on the deleted mask make those faster?
For a lightweight delete, the system table mutations and mutation.txt show its command as "UPDATE _row_exists = 0 WHERE predicate". Can this be used as a marker of lightweight delete?
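For reference, a sketch of how such mutations could be spotted, assuming the command text quoted above appears in system.mutations (the LIKE pattern is an assumption, not confirmed by the PR):

```sql
SELECT mutation_id, command, is_done
FROM system.mutations
WHERE command LIKE '%_row_exists = 0%';
```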

Review comments were left on:
src/Storages/LightweightDeleteDescription.h
src/Storages/MergeTree/MutateTask.cpp
@alesapin
Member

We still need to implement the replicated case.

@davenger davenger force-pushed the feature/sql-standard-delete branch from 3f42378 to 48de02a Compare July 25, 2022 14:32
@lqhl

lqhl commented Jul 26, 2022

We still need to implement the replicated case.

@alesapin We have already implemented lightweight delete for ReplicatedMergeTree. Shall we submit the code to this PR, or shall we merge this one first and start a new PR for the replicated case?

@zhangjmruc
Contributor Author

@davenger @alesapin I tried to implement lightweight delete for the replicated case and added a test case for it. Please help review it and let me know your opinion.

Member

@alesapin alesapin left a comment


I like the simplicity of the implementation for RMT. Thanks!

@davenger
Member

2 perf test failures are expected because this new setting is not present in the old build:
lightweight_delete | DB::Exception: Unknown setting allow_experimental_lightweight_delete.
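For readers trying this out: on builds that do include the setting mentioned above, lightweight delete has to be enabled before DELETE FROM is accepted. A sketch (the table name and predicate are illustrative):

```sql
SET allow_experimental_lightweight_delete = 1;
DELETE FROM t WHERE id < 100000;
```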

@WillMeng

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support SQL standard DELETE FROM syntax on merge tree tables and a lightweight delete implementation for wide parts only.

Description: This was planned for lightweight delete on merge tree families. In this pull request, we can parse the SQL standard DELETE syntax, and the lightweight delete is implemented for wide parts only. A new data part is created after a lightweight delete; all files except checksums, deleted_rows_mask.bin... are created as hard links of the source part. The deleted rows mask is updated based on the old one in the source part plus the new deleted rows' _part_offset according to the WHERE conditions. The SELECT query and MERGE now apply the deleted rows mask implicitly, filtering out the "_delete = 1" rows.

=== Tests for lightweight delete and normal delete with "mutations_sync = 1". The table has 12 columns with different data types, 100,000,000 rows in total, across about 110 parts. The results show that the lightweight delete runs much faster than a normal delete: about 200 ms vs. 8 s.

--- 15 single-row deletes [WeCom screenshot: 39322126-b3bc-495e-9ea9-6db1da27418e]

--- 10 range-row deletes [WeCom screenshot: d1dd0c0c-1dd5-48c0-a4e9-638c11a245f2]

Thanks for the feature. We use "SETTINGS min_bytes_for_wide_part = '100M'" a lot in our tables; when will compact parts be supported?

@zhangjmruc
Contributor Author

@WillMeng I will update the descriptions. Compact parts and ReplicatedMergeTree are both supported. Please see the new test cases.

@WillMeng

@WillMeng I will update the descriptions. Compact parts and ReplicatedMergeTree are both supported. Please see the new test cases.

Thank you. I have another question: when is the data marked "deleted" cleared from disk?

@zhangjmruc
Contributor Author

@WillMeng I will update the descriptions. Compact parts and ReplicatedMergeTree are both supported. Please see the new test cases.

Thank you. I have another question: when is the data marked "deleted" cleared from disk?

@WillMeng The "deleted" rows will be cleared during background merges.
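A sketch of forcing that cleanup manually rather than waiting for a background merge (the table name is illustrative; note that OPTIMIZE ... FINAL rewrites the parts, so it is itself an expensive operation):

```sql
OPTIMIZE TABLE t FINAL;
```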

@WillMeng

WillMeng commented Aug 22, 2022

@zhangjmruc Hi, we tried out this feature for the first time and found these problems:

  1. SELECT COUNT() FROM system.mutations WHERE is_done = 0 returns 92568, and reads often time out (30 seconds).
  2. The destination table becomes very slow to read, and reads often time out.

Our table schema is like this:

CREATE TABLE default.custom_conversion_day_details
(
    `chan_pub_id` Int64,
    `channel_id` Int32,
    `publisher_id` Int32,
    `pub_account_id` Int64,
    `pub_account_name` String,
    `pub_campaign_id` Int64,
    `pub_campaign_name` String,
    `pub_adgroup_id` Int64,
    `pub_adgroup_name` String,
    `data_date` Date
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(data_date)
PRIMARY KEY chan_pub_id
ORDER BY chan_pub_id
SETTINGS min_bytes_for_wide_part = '100M', index_granularity = 8192;

@zhangjmruc
Contributor Author

@WillMeng Would you please share more details with us about the unfinished mutations? The column latest_fail_reason contains info about the failure reason. Or you could provide the error messages from /var/log/clickhouse-server/clickhouse-server.err.log.
Please open a new issue for your problem, and we can continue investigating there.

The lightweight delete just converts the delete to an update on the virtual column _row_exists. I'm not sure why there are so many unfinished mutations.
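A sketch of a diagnostic query for the stuck mutations discussed above, using columns that exist in system.mutations:

```sql
SELECT mutation_id, command, parts_to_do, latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time
LIMIT 10;
```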
