
Memory limit (total) exceeded during insert leads to partial write #11546

Closed

dmitryikh opened this issue Jun 9, 2020 · 8 comments

Labels: bug (Confirmed user-visible misbehaviour in official release), st-need-info (We need extra data to continue (waiting for response))

Comments

@dmitryikh

dmitryikh commented Jun 9, 2020


Describe the bug
I use the Java ClickHouse connector to insert data into ClickHouse. Once I inserted 4 rows and got the following answer from ClickHouse:

Caused by: ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 241, host: clickhouse.prod.env, port: 38123; Code: 241, e.displayText() = DB::Exception: Memory limit (total) exceeded: would use 123.87 GiB (attempt to allocate chunk of 4217748 bytes), maximum: 123.87 GiB (version 19.16.10.44 (official build))

(The memory seems to be consumed by other queries.)

After that I found that 3 of the 4 records were inserted and one (the last in the ORDER BY sort) was not.

It seems that ClickHouse violates block insert atomicity in case of out-of-memory errors?

IMHO, a block write should be atomic: either all rows are inserted or none of them.

@dmitryikh dmitryikh added the bug (Confirmed user-visible misbehaviour in official release) label on Jun 9, 2020
@alexey-milovidov alexey-milovidov added the st-need-info (We need extra data to continue (waiting for response)) label on Jun 9, 2020
@alexey-milovidov
Member

Insertion of one block into a table of the MergeTree family is atomic. This is true in all ClickHouse versions.
Is it true that all records were sent in one block?

@dmitryikh
Author

dmitryikh commented Jun 9, 2020

I didn't dive into the Java ClickHouse driver, but I suppose this code should insert in one block:

    // create the batch insert statement (this.insertSql is built elsewhere in the class)
    try (Connection connection = dataSource.getConnection();
         PreparedStatement statement = connection.prepareStatement(this.insertSql)) {
        // fill the statement with one batch entry per record
        populateInsertStatement(recordsAsJson, statement);
        // execute the whole batch and return the total number of affected rows
        return Arrays.stream(statement.executeBatch()).sum();
    } catch (SQLException e) {
        // omitted
    }

We do inserts once per minute, and a single block can easily be detected by the same value in the updated_time column (DEFAULT now()).

We bumped into the problem described above only once, when we got the out-of-memory error.

@den-crane
Contributor

@dmitryikh Check that all 4 records belong to 1 partition (the table's PARTITION BY). Partitions break an insert into several blocks.
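
A minimal sketch of how this can be checked over JDBC, assuming a hypothetical table test_partition_blocks partitioned by day (the table name, schema, and connection URL below are illustrative, not taken from this issue). One INSERT whose rows span two partitions produces one part per partition, and each part is written atomically on its own:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PartitionBlocksDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/default");
                 Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS test_partition_blocks "
                         + "(d Date, v UInt32) ENGINE = MergeTree PARTITION BY d ORDER BY v");
                // a single INSERT whose rows fall into two different partitions (two dates)
                st.execute("INSERT INTO test_partition_blocks VALUES "
                         + "('2020-06-08', 1), ('2020-06-08', 2), ('2020-06-09', 3), ('2020-06-09', 4)");
                // the one insert above created one part per touched partition
                try (ResultSet rs = st.executeQuery(
                        "SELECT partition, count() AS parts, sum(rows) AS total_rows "
                      + "FROM system.parts "
                      + "WHERE table = 'test_partition_blocks' AND active "
                      + "GROUP BY partition ORDER BY partition")) {
                    while (rs.next()) {
                        System.out.printf("partition=%s parts=%d rows=%d%n",
                                rs.getString(1), rs.getLong(2), rs.getLong(3));
                    }
                }
            }
        }
    }

If the memory limit is hit after one partition's part has already been committed but before the remaining parts are written, only the rows of the committed partitions survive, which is consistent with 3 of the 4 rows showing up in the table above.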

@dmitryikh
Author

@den-crane, yes, you're right! I missed that.
Thank you.

I'm going to close the issue.

@RonanMorgan

RonanMorgan commented Jul 17, 2020

Hello,

I have a similar issue:
Caused by: java.lang.Throwable: Code: 241, e.displayText() = DB::Exception: Memory limit (total) exceeded: would use 28.06 GiB (attempt to allocate chunk of 4382610 bytes), maximum: 28.06 GiB (version 20.5.2.7 (official build))

I am using the official JDBC driver to save 1 500K rows, PARTITION BY the starting date (and it's a two-day data set).

The code:

    // one write()/send() per JSON row, i.e. one INSERT request per row
    while (json_iterator.hasNext()) {
        try {
            sth.write()
               .table("poc.visits_distributed")
               .sql("INSERT INTO poc.visits_distributed")
               .format(ClickHouseFormat.JSONEachRow)
               .data(new ByteArrayInputStream(json_iterator.next().getBytes()))
               .send();
        } catch (Exception e) {
            throw e;
        }
    }

I tried to set
<max_insert_block_size>1024</max_insert_block_size>
<min_insert_block_size_rows>128</min_insert_block_size_rows>
in the users.xml configuration, in the default profile, but it didn't work.

Any idea? Do I have to create a new issue for that?

@RonanMorgan

RonanMorgan commented Jul 17, 2020

OK, I fixed my issue by using the batch API of the JDBC driver instead of the writer one. I still don't understand the error log, though (maybe there is an overhead when writing rows one by one).
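
For reference, a minimal sketch of that batch approach, assuming a hypothetical two-column schema (start_date, payload) for poc.visits_distributed, since the real columns are not shown in this thread:

    import java.sql.Connection;
    import java.sql.Date;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchInsertSketch {
        // hypothetical row holder; the real schema of poc.visits_distributed is not shown here
        record Visit(Date startDate, String payloadJson) {}

        static void insertVisits(List<Visit> visits) throws SQLException {
            String sql = "INSERT INTO poc.visits_distributed (start_date, payload) VALUES (?, ?)";
            try (Connection conn = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/poc");
                 PreparedStatement stmt = conn.prepareStatement(sql)) {
                for (Visit v : visits) {
                    stmt.setDate(1, v.startDate());
                    stmt.setString(2, v.payloadJson());
                    stmt.addBatch(); // queue the row client-side
                }
                // one round trip: the driver sends all queued rows as a single INSERT,
                // letting the server form large blocks instead of one block per row
                stmt.executeBatch();
            }
        }
    }

Sending all rows in one INSERT also avoids the per-request overhead of the row-by-row writer loop above.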

@den-crane
Contributor

@RonanMorgan you can't set min_insert_block_size_rows / max_insert_block_size in the server config. They should be set as query properties in the JDBC query.
I think your issue is because of JSONEachRow / parallel parsing; try to set input_format_parallel_parsing=0 in JDBC.
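
A minimal sketch of what passing those settings from the client could look like. Whether arbitrary ClickHouse settings are forwarded as connection properties or URL parameters depends on the JDBC driver version, so the property names below are assumptions to check against your driver's documentation rather than a confirmed API:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class JdbcSettingsSketch {
        static Connection connect() throws Exception {
            // set the insert-related settings per session/query instead of in the server-side users.xml
            Properties props = new Properties();
            props.setProperty("input_format_parallel_parsing", "0");  // assumed to be forwarded as a ClickHouse setting
            props.setProperty("min_insert_block_size_rows", "128");
            props.setProperty("max_insert_block_size", "1024");
            // the same idea as URL parameters (again, driver-version dependent):
            // jdbc:clickhouse://localhost:8123/poc?input_format_parallel_parsing=0&min_insert_block_size_rows=128
            return DriverManager.getConnection("jdbc:clickhouse://localhost:8123/poc", props);
        }
    }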

@Qsimple

Qsimple commented Jun 28, 2021

I met the same error; the server version is 20.4.2. When I insert 400MB+ of data into ONE partition of a MergeTree table, the Memory limit (total) exception happens. The client retries, but in the end we got more data in ClickHouse than in the data source (Hive).
SUM(written_rows) in system.query_log shows the same data size as the data source, but sum(rows) in system.parts shows more.
There are exactly 1000 queries with type = 2 in system.query_log and 4 queries with error_code = 241. How can I locate the extra data in ClickHouse?

Hope for your response, thanks.
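
The comparison being described boils down to two queries like the following (a sketch; db.target_table is a hypothetical name, and in practice you would also filter by the time range of the failed load):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CompareInsertedRows {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/default");
                 Statement st = conn.createStatement()) {
                // rows reported as written by finished INSERT queries (type = 2 is QueryFinish)
                try (ResultSet rs = st.executeQuery(
                        "SELECT sum(written_rows) FROM system.query_log "
                      + "WHERE type = 2 AND query LIKE 'INSERT INTO db.target_table%'")) {
                    rs.next();
                    System.out.println("written_rows (query_log): " + rs.getLong(1));
                }
                // rows currently stored in the table's active parts
                try (ResultSet rs = st.executeQuery(
                        "SELECT sum(rows) FROM system.parts "
                      + "WHERE database = 'db' AND table = 'target_table' AND active")) {
                    rs.next();
                    System.out.println("rows (parts): " + rs.getLong(1));
                }
            }
        }
    }

If some of the failed inserts had already committed parts before hitting the memory limit, retrying the full batch writes those rows a second time, which would explain the surplus in system.parts.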
