
ClickHouse reads 40 million rows and writes them into MySQL; after about 35 million rows have been read, an error is reported. #1550

Open
T-M-L-C opened this issue Feb 23, 2024 · 17 comments
Labels
bug · investigating (Investigation of a root cause is ongoing)

Comments

T-M-L-C commented Feb 23, 2024

clickhouse-jdbc version: 0.4.6
error message: java.sql.SQLException: java.io.StreamCorruptedException: Reached end of input stream after reading 6719 of 10487 bytes

ClickHouse version: 23.3.2.1
error message: Code: 24. DB::Exception: Cannot write to ostream at offset 1082130432: While executing ParallelFormattingOutputFormat. (CANNOT_WRITE_TO_OSTREAM)
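
(The reporter's code is not shown; the failing pattern is presumably a streaming copy along these lines. This is a hypothetical sketch only: URLs, table and column names are invented. If the MySQL inserts stall the read loop longer than the server is willing to hold unsent data, the server aborts the response mid-stream, which would match both error messages above.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class CopyClickHouseToMySql {
    public static void main(String[] args) throws Exception {
        // Hypothetical connections; URLs and credentials are placeholders.
        try (Connection ch = DriverManager.getConnection("jdbc:clickhouse://ch-host:8123", "user", "pass");
             Connection my = DriverManager.getConnection("jdbc:mysql://mysql-host:3306/db", "user", "pass");
             Statement read = ch.createStatement()) {
            read.setFetchSize(100_000); // stream rows instead of buffering all 40M in memory
            try (ResultSet rs = read.executeQuery("SELECT id, payload FROM big_table");
                 PreparedStatement write = my.prepareStatement(
                         "INSERT INTO target (id, payload) VALUES (?, ?)")) {
                int inBatch = 0;
                while (rs.next()) {
                    write.setLong(1, rs.getLong(1));
                    write.setString(2, rs.getString(2));
                    write.addBatch();
                    if (++inBatch == 10_000) {
                        // If this batch insert stalls for too long, the ClickHouse
                        // side times out and the input stream breaks mid-read.
                        write.executeBatch();
                        inBatch = 0;
                    }
                }
                if (inBatch > 0) write.executeBatch();
            }
        }
    }
}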

T-M-L-C added the bug label Feb 23, 2024

T-M-L-C commented Feb 23, 2024

[screenshot attached]

chernser (Contributor) commented

Good day, @T-M-L-C !
Is the problem reproducible? Is it on the latest version, too?

Thanks in advance!

chernser added the investigating label Jun 18, 2024

TimonSP commented Jul 23, 2024

I'm seeing a similar issue, but it is intermittent; it occurs mostly when the client side is under pressure (several concurrent statements).

ClickHouse:

Code: 24. DB::Exception: Cannot write to ostream at offset 7340032: While executing ParallelFormattingOutputFormat. (CANNOT_WRITE_TO_OSTREAM) (version 23.8.14.6 (official build))

Client side:

Error reading from db: java.io.StreamCorruptedException: Reached end of input stream after reading 70390 of 93625 bytes

Any thoughts on what the reason could be, and how to avoid such failures?


chernser commented Jul 23, 2024

@TimonSP What are your client / server versions?
(removed internal link)


TimonSP commented Jul 23, 2024

@chernser, sorry, the link does not work for me (Page not found).

ClickHouse version 23.8.14.6
clickhouse-java version 0.4.6


wallacms commented Aug 21, 2024

I'm seeing this same issue using:
ClickHouse 23.9.1.1854
clickhouse-java version 0.6.3

Similar to @TimonSP, we see this intermittently but frequently. We are not able to reproduce it on demand, but we could turn on any debug logging that might be helpful. It always occurs when reading large numbers of records from ClickHouse (>10M). We see it both with the ClickHouse JDBC driver and with the native HTTP client.

One guess I had is that this could be related to TCP send/recv buffers. Is that a plausible explanation? How does ClickHouse handle the TCP send buffer being full?

Also, the link to the issue does not work for me either; I think that's because it's in the "clickhouse-private" repo. Can you give some more information about what that issue references?


TimonSP commented Sep 2, 2024

@chernser could you please copy-paste the ClickHouse issue from [internal link]
so we can check whether it applies to our case?


TimonSP commented Sep 2, 2024

I managed to reproduce the issue using Thread.sleep as a load simulation.
I'm using a simple query that generates 10M numbers, with two 25-second pauses after 1,000 and 2,000 rows.
The issue reproduces every time with such pauses (with shorter pauses it sometimes succeeds).

import com.clickhouse.jdbc.ClickHouseDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

public class ConsoleTestException {
    public static void main(String[] args) {
        String user_name = "***";
        String user_password = "***";
        String url = "jdbc:clickhouse://***:8123";
        String query = "SELECT toString(cityHash64(number)) FROM numbers(10000000)";

        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");

        Properties properties = new Properties();

        ClickHouseDataSource dataSource;
        Connection connection;
        PreparedStatement statement;
        ResultSet rs;
        int idx = 1;
        try {
            dataSource = new ClickHouseDataSource(url, properties);
            connection = dataSource.getConnection(user_name, user_password);
            statement = connection.prepareStatement(query);
            rs = statement.executeQuery();
            while (rs.next()) {
                if (idx % 100 == 0){
                    System.out.println(String.format("[%s] %s", sdf.format(new Date()), idx));

                    if (idx == 1000 || idx == 2000){
                        try {
                            // simulate a slow consumer: stop reading for 25 s
                            Thread.sleep(25000);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                        System.out.println(String.format("[%s] sleep completed", sdf.format(new Date())));
                    }
                }
                idx++;
            }
            System.out.println(String.format("[%s] total %s", sdf.format(new Date()), idx));
        } catch (SQLException e) {
            System.out.println(String.format("[%s] last %s", sdf.format(new Date()), idx));
            e.printStackTrace();
        }
    }
}

But I'm afraid it is a ClickHouse server issue; it is reproducible with the Python clickhouse_connect client too:

import clickhouse_connect
import time

if __name__ == '__main__':
    client = clickhouse_connect.get_client(
        host="***",
        port=8123,
        username="***",
        password="***"
    )
    with client.query_rows_stream('SELECT toString(cityHash64(number)) FROM numbers(10000000)') as stream:
        idx = 1
        for row in stream:
            if idx % 1000 == 0:
                print(idx)
            if idx == 1000 or idx == 2000:
                time.sleep(25)  # simulate a slow consumer
            idx += 1

Fails with clickhouse_connect.driver.exceptions.StreamFailureError


TimonSP commented Sep 3, 2024

The same code on 24.3.7.30 produced a different, clearer exception:

Code: 209. DB::NetException: Timeout exceeded while writing to socket ([::ffff:10.131.0.27]:55976, 30000 ms): While executing ParallelFormattingOutputFormat. (SOCKET_TIMEOUT) (version 24.3.7.30 (official build)) (from [::ffff:***]:55976) (in query: SELECT toString(cityHash64(number)) FROM numbers(10000000))

That helped me find what looks like the root cause of the issue: http_send_timeout. Note that it has to be set in the default profile, and a restart is apparently required; see ClickHouse/ClickHouse#64731 for details.

I have some trouble checking this myself, as I don't have access to change the default profile.
@wallacms, I hope it helps you.
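
For anyone who can change the server configuration: the setting apparently has to be raised in the server-side default profile (e.g. http_send_timeout under profiles/default in users.xml), not per query. To at least verify the value currently in effect, here is a minimal sketch that queries system.settings, reusing the placeholder URL and credentials from the repro above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CheckSendTimeout {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:clickhouse://***:8123"; // placeholder, as in the repro above
        try (Connection conn = DriverManager.getConnection(url, "***", "***");
             Statement stmt = conn.createStatement();
             // system.settings reports the value effective for the current session
             ResultSet rs = stmt.executeQuery(
                     "SELECT name, value, changed FROM system.settings " +
                     "WHERE name = 'http_send_timeout'")) {
            while (rs.next()) {
                System.out.printf("%s = %s (changed: %s)%n",
                        rs.getString("name"), rs.getString("value"), rs.getString("changed"));
            }
        }
    }
}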


wallacms commented Sep 3, 2024

I can confirm that we fixed this problem by increasing the http_send_timeout in the default profile.


chernser commented Sep 3, 2024

@TimonSP @wallacms sorry, it is an internal link.
It is about memory allocation, but I don't think it is related here: if there were a memory issue, it would be reported on the server side.
There is a related PR, ClickHouse/ClickHouse#51732; the change was made recently, this year.

@wallacms

  • A 10M+ row result should not by itself be a problem; with a cloud instance the current Java client reads >1.6 GB datasets without issues.
  • You can raise the server-side logger level to debug and watch when the HTTP session is created. The server outputs "Logout" when the request is done; if anything goes wrong, the server will report it.


chernser commented Sep 3, 2024

@TimonSP
As I understand it, http_send_timeout should only be a problem for slow readers (clients). Here is the relevant CH code: https://github.com/ClickHouse/ClickHouse/blob/master/src/Server/HTTP/HTTPServerRequest.cpp, and here is the description of what a send/recv timeout means for a socket:

"Specify the receiving or sending timeouts until reporting an error. The argument is a struct timeval. If an input or output function blocks for this period of time, and data has been sent or received, the return value of that function will be the amount of data transferred;"

I read this as: http_send_timeout defines how long the server will hold unsent data before giving up. Your example code with Thread.sleep emulates exactly that case. A timeout of 30 seconds is already quite large, so hitting it means the client is not receiving data at all for a long time.
@wallacms mentioned socket buffers; tuning them may help release the server connection sooner, but I don't think it would solve the problem.
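
To make the mechanism concrete: once the client stops reading, the client-side receive buffer and the server-side send buffer fill up, and the server's next write blocks; http_send_timeout bounds how long the server stays blocked. Below is a self-contained illustration using plain TCP sockets (nothing ClickHouse-specific; the 3-second watchdog plays the role of http_send_timeout):

import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;

public class BlockedSenderDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket listener = new ServerSocket(0);
             Socket slowReader = new Socket("localhost", listener.getLocalPort());
             Socket sender = listener.accept()) {
            // slowReader never reads, so its receive buffer and the sender's
            // send buffer eventually fill up (a few MB may flow through first).
            AtomicLong lastProgress = new AtomicLong(System.currentTimeMillis());
            // Watchdog: emulates a server-side send timeout by closing the
            // connection when no write has completed for 3 seconds.
            Thread watchdog = new Thread(() -> {
                try {
                    while (System.currentTimeMillis() - lastProgress.get() <= 3000) {
                        Thread.sleep(500);
                    }
                    System.out.println("watchdog: no progress for 3 s, closing socket");
                    sender.close();
                } catch (Exception ignored) { }
            });
            watchdog.setDaemon(true);
            watchdog.start();

            OutputStream out = sender.getOutputStream();
            byte[] chunk = new byte[64 * 1024];
            long total = 0;
            try {
                while (true) {
                    out.write(chunk); // blocks once both TCP buffers are full
                    total += chunk.length;
                    lastProgress.set(System.currentTimeMillis());
                }
            } catch (Exception e) {
                // Analogous to CANNOT_WRITE_TO_OSTREAM / SOCKET_TIMEOUT on the server.
                System.out.println("send failed after " + total + " bytes: " + e);
            }
        }
    }
}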

Summary

  • Increasing the timeout may merely hide the problem.
  • We need to understand why the client/server is not moving data in each case:
    • Is it a GC pause?
    • Is it a lock somewhere?
    • Is it because reading from the source is slow?


wallacms commented Sep 3, 2024

@chernser We've observed this problem primarily in the context of accessing ClickHouse through Dremio. Since Dremio is a distributed execution engine, it can have long pauses in reading results from ClickHouse while it is servicing other queries or doing upstream processing on the results it gets from ClickHouse. Thus, in our case increasing the timeout is a reasonable thing to do, because we expect long pauses.
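
Where raising the server-side timeout is not an option, one generic client-side mitigation (a sketch, not something used in this thread; all names are illustrative) is to drain the ResultSet into a bounded queue as fast as rows arrive and run the slow downstream work on a separate thread, so the HTTP stream never sits idle longer than http_send_timeout:

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DecoupledReader {
    private static final List<String> POISON = new ArrayList<>(); // end-of-stream marker

    // Drains rows quickly so the server never waits on us, while a worker
    // thread performs the slow per-batch processing (e.g. writing elsewhere).
    public static void drainAndProcess(ResultSet rs, RowBatchHandler handler)
            throws SQLException, InterruptedException {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(16); // bounded: applies back-pressure
        Thread worker = new Thread(() -> {
            try {
                List<String> b;
                while ((b = queue.take()) != POISON) {
                    handler.handle(b); // slow work happens here, off the read path
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        worker.start();
        List<String> batch = new ArrayList<>(10_000);
        while (rs.next()) {
            batch.add(rs.getString(1));
            if (batch.size() == 10_000) {
                queue.put(batch); // blocks only if the worker is 16 batches behind
                batch = new ArrayList<>(10_000);
            }
        }
        if (!batch.isEmpty()) queue.put(batch);
        queue.put(POISON);
        worker.join();
    }

    public interface RowBatchHandler {
        void handle(List<String> batch) throws Exception;
    }
}

The bounded queue still applies back-pressure if the worker falls far behind, but it absorbs pauses of up to queue-capacity × batch-size rows without ever stalling the socket read.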


TimonSP commented Sep 4, 2024

@chernser
It looks like a normal case: there is no problem with reading or on the server side, but the data is being processed during the read.
In my case it is a BI tool that reads data from ClickHouse and stores it in its own storage, so that last step can take time.


chernser commented Sep 4, 2024

@wallacms @TimonSP thank you for sharing your use-cases.

@TimonSP how do you handle errors like:

  • writing to BI tool storage failed
  • reading from ClickHouse failed

Do you track which data is already synced, or do you repeat the whole request?


TimonSP commented Sep 4, 2024

@chernser the whole process repeats; it is a third-party tool for us, so we cannot control its behavior.

BTW, I've found that the default for this setting was decreased in 23.6: ClickHouse/ClickHouse#51171


laeg commented Nov 5, 2024

@TimonSP - which BI tool are you using?
