Preserve original exception when sendCancel fails during INSERT #97339
alexey-milovidov merged 6 commits into master
Conversation
In `ClientBase::processInsertQuery`, when the client-side parallel parser throws a parsing exception (e.g., one with the correct row number), the catch block calls `sendCancel()` to notify the server. But `sendCancel()` performs socket I/O via `Connection::sendCancel()`, which can throw `NETWORK_ERROR` if the server already closed the connection. This new exception escapes the catch block and permanently replaces the original parsing exception: the `throw;` is never reached.

This was observed as a flaky failure in `01583_parallel_parsing_exception_with_offset`: the test expects "(at row 2000001)" in the error, but the client reports "Connection reset by peer" instead.

The fix wraps `sendCancel()` and `receiveEndOfQueryForInsert()` in an inner try-catch so connection errors during cleanup cannot replace the original exception.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=b841c75e5133904d06b5d02f79485aceff0af3b6&name_0=MasterCI&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Flaky check shows that the test is slow, but that's okay for this test.
Fgrtue left a comment
I was not able to reproduce the bug, though I can see that the changed code is error-prone by nature and that this change might fix it.
In the server logs of the failed cases I can see the following pattern:
1. the client sends some data;
2. the server tries to send a profile event to the client;
3. the server receives a network error, "Connection reset by peer", while writing to the socket.

The error happens after about 5 minutes (300 seconds), which is our default `receive_timeout`.
Claude suggests that there can be a deadlock scenario:
- Client is sending massive amounts of data and fills the server's TCP receive buffer
- Server periodically tries to send profile events back
- Client's TCP receive buffer fills up (client busy sending, not reading responses)
- Server's send() blocks waiting for buffer space (up to 300 seconds)
- During this time, server stops reading from client
- Client's TCP send buffer also fills up
- Both sides are blocked on send operations → effective deadlock
From there I could not find any evidence that the flakiness is related to the overridden exception. However, we can try the fix and see whether it does the job; it is a valid change anyway.
Summary
`ClientBase::processInsertQuery` calls `sendCancel()` to notify the server when the parallel parser throws. `sendCancel()` performs socket I/O via `Connection::sendCancel()`, which can throw `NETWORK_ERROR` if the server already closed the connection, so the original `throw;` is never reached. The fix wraps the cleanup (`sendCancel` + `receiveEndOfQueryForInsert`) in an inner try-catch so connection errors cannot replace the original exception.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=b841c75e5133904d06b5d02f79485aceff0af3b6&name_0=MasterCI&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix the client reporting `NETWORK_ERROR` instead of the actual parsing error (with the correct row number) when an INSERT with parallel parsing encounters invalid data.

Documentation entry for user-facing changes
🤖 Generated with Claude Code