Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

delete SQL Cause memory overflow #9670

Closed
showsmall opened this issue Mar 15, 2020 · 7 comments
Closed

delete SQL Cause memory overflow #9670

showsmall opened this issue Mar 15, 2020 · 7 comments
Assignees
Labels
bug Confirmed user-visible misbehaviour in official release comp-mutations ALTER UPDATE/DELETE

Comments

@showsmall
Copy link

OS_version:CentOS release 6.10 (Final)
kernel_version:4.4.214-1.el6.elrepo.x86_64
clickhouse_version:20.1.4 revision 54431

create table:
CREATE TABLE ck.detail (id Int64,chat_info_id Int64 COMMENT 'chat_id',company_id Int64 COMMENT 'ID', content String COMMENT 'str', create_datetime DateTime COMMENT 'time', integer1 Int64,integer2 Int64,integer3 Int64,integer4 Int64,integer5 Int64,char1 String, char2 String, char3 String, char4 String, char5 String,ck_sign Int8 DEFAULT CAST('1', 'Int8') COMMENT 'ckSign') ENGINE = ReplicatedCollapsingMergeTree('/clickhouse/tables/01-01/detail', 'detail_28', ck_sign) PARTITION BY company_id ORDER BY (id, chat_info_id, create_datetime);

execute sql:ALTER TABLE detail DELETE WHERE (company_id = 37) AND (id = 121632077),Causes a full table scan and overflows system memory,sql Did not hit any partition.
execute sql:SELECT * FROM detail WHERE (company_id = 37) AND (id = 121632077),sql Hit the partition with high efficiency.
I don't understand why?

clickhouse logs:
2020.03.15 18:23:26.817086 [ 46 ] {344250df-1179-4b6a-b50a-469eaeee09a3} executeQuery: (from 172.16.10.26:53606) ALTER TABLE detail DELETE WHERE (company_id = 37) AND (id = 121632077)
2020.03.15 18:23:26.830361 [ 46 ] {344250df-1179-4b6a-b50a-469eaeee09a3} ck.detail: Created mutation with ID 0000000003
2020.03.15 18:23:26.830813 [ 28 ] {} ck.detail (ReplicatedMergeTreeQueue): Loading 1 mutation entries: 0000000003 - 0000000003
2020.03.15 18:23:26.835891 [ 46 ] {344250df-1179-4b6a-b50a-469eaeee09a3} MemoryTracker: Peak memory usage (for query): 0.00 B.
2020.03.15 18:23:26.835963 [ 46 ] {} MemoryTracker: Peak memory usage (total): 0.00 B.
2020.03.15 18:23:26.836002 [ 46 ] {} TCPHandler: Processed in 0.019 sec.
2020.03.15 18:23:26.841030 [ 35 ] {} ck.detail (ReplicatedMergeTreeQueue): Pulling 1 entries to queue: log-0000148990 - log-0000148990
2020.03.15 18:23:26.844889 [ 35 ] {} ck.detail (ReplicatedMergeTreeQueue): Pulled 1 entries to queue.
2020.03.15 18:23:26.845024 [ 9 ] {} ck.detail: Executing log entry to mutate part 1_0_6492_2131_7772 to 1_0_6492_2131_7773
2020.03.15 18:23:26.845079 [ 9 ] {} DiskLocal: Reserving 8.41 MiB on disk default, having unreserved 337.14 GiB.
2020.03.15 18:23:26.845669 [ 9 ] {} ck.detail (SelectExecutor): Key condition: unknown, (column 0 in [121632077, 121632077]), and
2020.03.15 18:23:26.845687 [ 9 ] {} ck.detail (SelectExecutor): MinMax index condition: (column 0 in [37, 37]), unknown, and
2020.03.15 18:23:26.845705 [ 9 ] {} ck.detail (SelectExecutor): Selected 0 parts by date, 0 parts by key, 0 marks to read from 0 ranges
2020.03.15 18:23:26.845730 [ 9 ] {} InterpreterSelectQuery: FetchColumns -> Complete
2020.03.15 18:23:26.845783 [ 9 ] {} Aggregator: Aggregating
2020.03.15 18:23:26.845799 [ 9 ] {} Aggregator: Aggregation method: without_key
2020.03.15 18:23:26.845815 [ 9 ] {} Aggregator: Aggregated. 0 to 1 rows (from 0.000 MiB) in 0.000 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2020.03.15 18:23:26.845822 [ 9 ] {} Aggregator: Merging aggregated data
2020.03.15 18:23:26.845841 [ 9 ] {} virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2020.03.15 18:23:26.845873 [ 9 ] {} ck.detail (MergerMutator): Part 1_0_6492_2131_7772 doesn't change up to mutation version 7773
2020.03.15 18:23:26.845892 [ 9 ] {} DiskLocal: Reserving 7.65 MiB on disk default, having unreserved 337.13 GiB.
2020.03.15 18:23:26.845920 [ 9 ] {} ck.detail: Cloning part /Data/clickhouse/clickhousedb/data/ck/detail/1_0_6492_2131_7772/ to /Data/clickhouse/clickhousedb/data/ck/detail/tmp_clone_1_0_6492_2131_7773
2020.03.15 18:23:26.847549 [ 26 ] {} ck.detail (ReplicatedMergeTreeQueue): Pulling 1 entries to queue: log-0000148991 - log-0000148991
2020.03.15 18:23:26.847741 [ 9 ] {} ck.detail: Renaming temporary part tmp_clone_1_0_6492_2131_7773 to 1_0_6492_2131_7773.
2020.03.15 18:23:26.849444 [ 26 ] {} ck.detail (ReplicatedMergeTreeQueue): Pulled 1 entries to queue.
2020.03.15 18:23:26.849514 [ 21 ] {} ck.detail: Executing log entry to mutate part 1_6493_7768_930_7772 to 1_6493_7768_930_7773
2020.03.15 18:23:26.849550 [ 21 ] {} DiskLocal: Reserving 2.00 MiB on disk default, having unreserved 337.13 GiB.
2020.03.15 18:23:26.850237 [ 21 ] {} ck.detail (SelectExecutor): Key condition: unknown, (column 0 in [121632077, 121632077]), and
2020.03.15 18:23:26.850261 [ 21 ] {} ck.detail (SelectExecutor): MinMax index condition: (column 0 in [37, 37]), unknown, and
2020.03.15 18:23:26.850273 [ 21 ] {} ck.detail (SelectExecutor): Selected 0 parts by date, 0 parts by key, 0 marks to read from 0 ranges
2020.03.15 18:23:26.850323 [ 21 ] {} InterpreterSelectQuery: FetchColumns -> Complete

2020.03.15 18:23:27.560326 [ 19 ] {} ck.detail (SelectExecutor): Key condition: unknown, (column 0 in [121632077, 121632077]), and
2020.03.15 18:23:27.560345 [ 19 ] {} ck.detail (SelectExecutor): MinMax index condition: (column 0 in [37, 37]), unknown, and
2020.03.15 18:23:27.560353 [ 19 ] {} ck.detail (SelectExecutor): Selected 0 parts by date, 0 parts by key, 0 marks to read from 0 ranges
2020.03.15 18:23:27.560380 [ 19 ] {} InterpreterSelectQuery: FetchColumns -> Complete
2020.03.15 18:23:27.560433 [ 19 ] {} Aggregator: Aggregating
2020.03.15 18:23:27.560447 [ 19 ] {} Aggregator: Aggregation method: without_key
2020.03.15 18:23:27.560474 [ 19 ] {} Aggregator: Aggregated. 0 to 1 rows (from 0.000 MiB) in 0.000 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2020.03.15 18:23:27.560486 [ 19 ] {} Aggregator: Merging aggregated data
2020.03.15 18:23:27.560509 [ 19 ] {} virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2020.03.15 18:23:27.560555 [ 19 ] {} ck.detail (MergerMutator): Part 87_0_0_0_3 doesn't change up to mutation version 4
2020.03.15 18:23:27.560577 [ 19 ] {} DiskLocal: Reserving 1.00 MiB on disk default, having unreserved 333.15 GiB.
2020.03.15 18:23:27.560605 [ 19 ] {} ck.detail: Cloning part /Data/clickhouse/clickhousedb/data/ck/detail/87_0_0_0_3/ to /Data/clickhouse/clickhousedb/data/ck/detail/tmp_clone_87_0_0_0_4
2020.03.15 18:23:27.562207 [ 19 ] {} ck.detail: Renaming temporary part tmp_clone_87_0_0_0_4 to 87_0_0_0_4.
2020.03.15 18:23:27.570979 [ 27 ] {} ck.detail (ReplicatedMergeTreeQueue): Pulling 1 entries to queue: log-0000149049 - log-0000149049
2020.03.15 18:23:27.579359 [ 27 ] {} ck.detail (ReplicatedMergeTreeQueue): Pulled 1 entries to queue.
2020.03.15 18:23:27.579481 [ 21 ] {} ck.detail: Executing log entry to mutate part 90_0_0_0_3 to 90_0_0_0_4
2020.03.15 18:23:27.579536 [ 21 ] {} DiskLocal: Reserving 1.00 MiB on disk default, having unreserved 333.15 GiB.
2020.03.15 18:23:27.580043 [ 21 ] {} ck.detail (SelectExecutor): Key condition: unknown, (column 0 in [121632077, 121632077]), and
2020.03.15 18:23:27.580085 [ 21 ] {} ck.detail (SelectExecutor): MinMax index condition: (column 0 in [37, 37]), unknown, and
2020.03.15 18:23:27.580100 [ 21 ] {} ck.detail (SelectExecutor): Selected 0 parts by date, 0 parts by key, 0 marks to read from 0 ranges
2020.03.15 18:23:27.580135 [ 21 ] {} InterpreterSelectQuery: FetchColumns -> Complete
2020.03.15 18:23:27.580213 [ 21 ] {} Aggregator: Aggregating
2020.03.15 18:23:27.580242 [ 21 ] {} Aggregator: Aggregation method: without_key
2020.03.15 18:23:27.580281 [ 21 ] {} Aggregator: Aggregated. 0 to 1 rows (from 0.000 MiB) in 0.000 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2020.03.15 18:23:27.580318 [ 21 ] {} Aggregator: Merging aggregated data
2020.03.15 18:23:27.580348 [ 21 ] {} virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2020.03.15 18:23:27.580401 [ 21 ] {} ck.detail (MergerMutator): Part 90_0_0_0_3 doesn't change up to mutation version 4
2020.03.15 18:23:27.580432 [ 21 ] {} DiskLocal: Reserving 1.00 MiB on disk default, having unreserved 333.15 GiB.
2020.03.15 18:23:27.580479 [ 21 ] {} ck.detail: Cloning part /Data/clickhouse/clickhousedb/data/ck/detail/90_0_0_0_3/ to /Data/clickhouse/clickhousedb/data/ck/detail/tmp_clone_90_0_0_0_4
2020.03.15 18:23:27.582087 [ 21 ] {} ck.detail: Renaming temporary part tmp_clone_90_0_0_0_4 to 90_0_0_0_4.
2020.03.15 18:23:28.414744 [ 7 ] {} MemoryTracker: Current memory usage: 1.00 GiB.
2020.03.15 18:23:29.476398 [ 7 ] {} MemoryTracker: Current memory usage: 2.00 GiB.
2020.03.15 18:23:30.652241 [ 7 ] {} MemoryTracker: Current memory usage: 3.00 GiB.
2020.03.15 18:23:33.417664 [ 7 ] {} MemoryTracker: Current memory usage: 4.00 GiB.
2020.03.15 18:23:37.244366 [ 7 ] {} MemoryTracker: Current memory usage: 5.00 GiB.
2020.03.15 18:23:40.171480 [ 7 ] {} MemoryTracker: Current memory usage: 6.00 GiB.
2020.03.15 18:23:44.143887 [ 7 ] {} MemoryTracker: Current memory usage: 7.00 GiB.
2020.03.15 18:23:45.744932 [ 7 ] {} MemoryTracker: Current memory usage: 8.00 GiB.

Here is the query SQL log:
2020.03.15 18:49:45.753888 [ 46 ] {6742553c-0615-4eb1-b6f6-982447a2a1d3} executeQuery: (from 172.16.10.26:53618) SELECT * FROM detail WHERE (company_id = 37) AND (id = 121632077)
2020.03.15 18:49:45.754415 [ 46 ] {6742553c-0615-4eb1-b6f6-982447a2a1d3} InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "company_id = 37" moved to PREWHERE
2020.03.15 18:49:45.756014 [ 46 ] {6742553c-0615-4eb1-b6f6-982447a2a1d3} ck.detail (SelectExecutor): Key condition: unknown, (column 0 in [121632077, 121632077]), and, unknown, and
2020.03.15 18:49:45.756049 [ 46 ] {6742553c-0615-4eb1-b6f6-982447a2a1d3} ck.detail (SelectExecutor): MinMax index condition: (column 0 in [37, 37]), unknown, and, (column 0 in [37, 37]), and
2020.03.15 18:49:45.756085 [ 46 ] {6742553c-0615-4eb1-b6f6-982447a2a1d3} ck.detail (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1 marks to read from 1 ranges
2020.03.15 18:49:45.756258 [ 46 ] {6742553c-0615-4eb1-b6f6-982447a2a1d3} ck.detail (SelectExecutor): Reading approx. 8192 rows with 1 streams
2020.03.15 18:49:45.756402 [ 46 ] {6742553c-0615-4eb1-b6f6-982447a2a1d3} InterpreterSelectQuery: FetchColumns -> Complete
2020.03.15 18:49:45.756580 [ 46 ] {6742553c-0615-4eb1-b6f6-982447a2a1d3} executeQuery: Query pipeline:
Expression
Expression
Filter
MergeTreeThread

2020.03.15 18:49:45.774276 [ 46 ] {6742553c-0615-4eb1-b6f6-982447a2a1d3} executeQuery: Read 8192 rows, 2.82 MiB in 0.020 sec., 403349 rows/sec., 139.06 MiB/sec.
2020.03.15 18:49:45.774335 [ 46 ] {6742553c-0615-4eb1-b6f6-982447a2a1d3} MemoryTracker: Peak memory usage (for query): 8.37 MiB.
2020.03.15 18:49:45.774440 [ 46 ] {} MemoryTracker: Peak memory usage (total): 8.37 MiB.
2020.03.15 18:49:45.774480 [ 46 ] {} TCPHandler: Processed in 0.021 sec.

@showsmall showsmall added the bug Confirmed user-visible misbehaviour in official release label Mar 15, 2020
@KochetovNicolai
Copy link
Member

KochetovNicolai commented Mar 16, 2020

Causes a full table scan and overflows system memory,sql Did not hit any partition.

It is not quite correct. Query indeed check every part of table. But for every part we create separate select query which counts the number of rows affected. If no rows affected, part is just renamed. It is written in logs:

Start processing part:

2020.03.15 18:23:27.579481 [ 21 ] {} ck.detail: Executing log entry to mutate part 90_0_0_0_3 to 90_0_0_0_4

Get the number of affected rows. 0 parts means we do not read data at all.

2020.03.15 18:23:27.580100 [ 21 ] {} ck.detail (SelectExecutor): Selected 0 parts by date, 0 parts by key, 0 marks to read from 0 ranges

This message says that part wasn't changed, so we just rename it.

2020.03.15 18:23:27.580401 [ 21 ] {} ck.detail (MergerMutator): Part 90_0_0_0_3 doesn't change up to mutation version 4

Also, take a look at thread number in []. Messages

2020.03.15 18:23:28.414744 [ 7 ] {} MemoryTracker: Current memory usage: 1.00 GiB.
2020.03.15 18:23:29.476398 [ 7 ] {} MemoryTracker: Current memory usage: 2.00 GiB.
2020.03.15 18:23:30.652241 [ 7 ] {} MemoryTracker: Current memory usage: 3.00 GiB.
2020.03.15 18:23:33.417664 [ 7 ] {} MemoryTracker: Current memory usage: 4.00 GiB.
2020.03.15 18:23:37.244366 [ 7 ] {} MemoryTracker: Current memory usage: 5.00 GiB.
2020.03.15 18:23:40.171480 [ 7 ] {} MemoryTracker: Current memory usage: 6.00 GiB.
2020.03.15 18:23:44.143887 [ 7 ] {} MemoryTracker: Current memory usage: 7.00 GiB.
2020.03.15 18:23:45.744932 [ 7 ] {} MemoryTracker: Current memory usage: 8.00 GiB.

say that thread number 7 is allocating memory, but there are no other messages in log with this thread, so I can't understand what's going on (it may be as well caused by independent select). So, please copy other messages with this thread number above. Probably, it will help to understand what it does.

@onine007
Copy link

onine007 commented Mar 17, 2020

I got same issue . #9533
and it works fine on version 19.16.14 and 19.15.4

@showsmall
Copy link
Author

showsmall commented Mar 17, 2020

@KochetovNicolai
A、Does the process of renaming a file take up a lot of memory.
B、Cloning part /Data/clickhouse/clickhousedb/data/ck/detail/1_0_6492_2131_7772/ to /Data/clickhouse/clickhousedb/data/ck/detail/tmp_clone_1_0_6492_2131_7773:The previous log is for partition replication. Does this process take up a lot of memory.
C、I checked the log again. Thread number 7 did appear after executing alter table detail delete where (company id = 37) and (id = 121632077). I restarted the Clickhouse service before executing SQL. Please see the attachment for the detailed log

iss.log

@KochetovNicolai
Copy link
Member

A. and B. takes almost no memory.

@KochetovNicolai
Copy link
Member

Well, according to log, threads number 7 does actual mutation:

2020.03.15 18:23:27.160160 [ 7 ] {} <Trace> ck.detail: Executing log entry to mutate part 37_0_12020_13_12037 to 37_0_12020_13_12038
2020.03.15 18:23:27.160214 [ 7 ] {} <Debug> DiskLocal: Reserving 3.99 GiB on disk `default`, having unreserved 337.14 GiB.
2020.03.15 18:23:27.160787 [ 7 ] {} <Debug> ck.detail (SelectExecutor): Key condition: unknown, (column 0 in [121632077, 121632077]), and
2020.03.15 18:23:27.160816 [ 7 ] {} <Debug> ck.detail (SelectExecutor): MinMax index condition: (column 0 in [37, 37]), unknown, and
2020.03.15 18:23:27.160849 [ 7 ] {} <Debug> ck.detail (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1 marks to read from 1 ranges
2020.03.15 18:23:27.160906 [ 7 ] {} <Trace> MergeTreeSelectProcessor: Reading 1 ranges from part 37_0_12020_13_12037, approx. 8192 rows starting from 79804627
2020.03.15 18:23:27.160967 [ 7 ] {} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2020.03.15 18:23:27.161064 [ 7 ] {} <Trace> Aggregator: Aggregating
2020.03.15 18:23:27.161547 [ 7 ] {} <Trace> Aggregator: Aggregation method: without_key
2020.03.15 18:23:27.161596 [ 7 ] {} <Trace> Aggregator: Aggregated. 1 to 1 rows (from 0.000 MiB) in 0.000 sec. (2002.178 rows/sec., 0.017 MiB/sec.)
2020.03.15 18:23:27.161617 [ 7 ] {} <Trace> Aggregator: Merging aggregated data
2020.03.15 18:23:27.161643 [ 7 ] {} <Trace> virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2020.03.15 18:23:27.161711 [ 7 ] {} <Trace> ck.detail (MergerMutator): Mutating part 37_0_12020_13_12037 to mutation version 12038
2020.03.15 18:23:27.162623 [ 7 ] {} <Debug> ck.detail (SelectExecutor): Key condition: unknown, (column 0 in [121632077, 121632077]), and, not
2020.03.15 18:23:27.162654 [ 7 ] {} <Debug> ck.detail (SelectExecutor): MinMax index condition: (column 0 in [37, 37]), unknown, and, not
2020.03.15 18:23:27.166885 [ 7 ] {} <Debug> ck.detail (SelectExecutor): Selected 1 parts by date, 1 parts by key, 10349 marks to read from 1 ranges
2020.03.15 18:23:27.166980 [ 7 ] {} <Trace> MergeTreeSelectProcessor: Reading 1 ranges from part 37_0_12020_13_12037, approx. 80153406 rows starting from 0
2020.03.15 18:23:27.167055 [ 7 ] {} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2020.03.15 18:23:28.414744 [ 7 ] {} <Debug> MemoryTracker: Current memory usage: 1.00 GiB.
2020.03.15 18:23:29.476398 [ 7 ] {} <Debug> MemoryTracker: Current memory usage: 2.00 GiB.
2020.03.15 18:23:30.652241 [ 7 ] {} <Debug> MemoryTracker: Current memory usage: 3.00 GiB.
2020.03.15 18:23:33.417664 [ 7 ] {} <Debug> MemoryTracker: Current memory usage: 4.00 GiB.
2020.03.15 18:23:37.244366 [ 7 ] {} <Debug> MemoryTracker: Current memory usage: 5.00 GiB.
2020.03.15 18:23:40.171480 [ 7 ] {} <Debug> MemoryTracker: Current memory usage: 6.00 GiB.
2020.03.15 18:23:44.143887 [ 7 ] {} <Debug> MemoryTracker: Current memory usage: 7.00 GiB.
2020.03.15 18:23:45.744932 [ 7 ] {} <Debug> MemoryTracker: Current memory usage: 8.00 GiB.

Which is ok, cause part is from affected partition (part name stats from 37)

@KochetovNicolai
Copy link
Member

This still may happen if you have "fat" rows.
Maybe temporarily decreasing of max_block_size for default user may help.

@alesapin
Copy link
Member

It was not completely fixed. Complete fix in #9860.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed user-visible misbehaviour in official release comp-mutations ALTER UPDATE/DELETE
Projects
None yet
Development

No branches or pull requests

5 participants