Set a non-zero exit code when client elapses receive_timeout #91432

Merged
Fgrtue merged 4 commits into ClickHouse:master from sberss:clickhouse-client-receive-timeout-exit-code
Dec 13, 2025

Conversation

@sberss
Contributor

@sberss sberss commented Dec 3, 2025

Changelog category (leave one):

  • Backward Incompatible Change

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Update clickhouse-client to return a non-zero exit code (159 - TIMEOUT_EXCEEDED) when a query times out due to receive_timeout. Previously, timeouts would return exit code 0 (success), making it difficult for scripts and automation to detect timeout failures.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Change Details:

Motivation:
When running clickhouse-client in non-interactive mode (scripts/automation), timeout errors were incorrectly treated as successful execution. The client would print "Timeout exceeded while receiving data from server" but exit with code 0, causing scripts to incorrectly assume the query completed successfully.

This is particularly problematic for long-running INSERT statements where timeouts are a real failure condition that should stop script execution.

Behavior Change:

  • Before: Timeout → prints error message → exit code 0
  • After: Timeout → prints error message → exit code 159 (TIMEOUT_EXCEEDED)

Example:

# Before this fix - timeout returns 0.
clickhouse-client --receive_timeout=5 --query="INSERT INTO table SELECT * FROM huge_table"
# Timeout exceeded while receiving data from server. Waited for 5 seconds, timeout is 5 seconds.
echo $?  # Returns: 0 (incorrectly indicates success)

# After this fix - timeout returns error code.
clickhouse-client --receive_timeout=5 --query="INSERT INTO table SELECT * FROM huge_table"
# Timeout exceeded while receiving data from server. Waited for 5 seconds, timeout is 5 seconds.
echo $?  # Returns: 159 (correctly indicates timeout error)

# Scripts can now properly detect and handle timeout failures:
if ! clickhouse-client --query="..."; then
    echo "Query failed or timed out"
    exit 1
fi
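
A script may also want to tell a timeout apart from other failures rather than treating every non-zero status the same way. The following is a minimal sketch of that idea. The helper names `classify_exit` and `run_and_classify` are hypothetical; only the value 159 (TIMEOUT_EXCEEDED) comes from the description above.

```shell
#!/usr/bin/env bash
# Exit code that clickhouse-client returns on receive_timeout after this change
# (value taken from the PR description).
TIMEOUT_EXCEEDED=159

# Hypothetical helper: map an exit status to a short label.
classify_exit() {
    case "$1" in
        0) echo "ok" ;;
        "$TIMEOUT_EXCEEDED") echo "timeout" ;;
        *) echo "error" ;;
    esac
}

# Hypothetical helper: run any command, then print the label for its status.
run_and_classify() {
    "$@"
    classify_exit "$?"
}
```

For example, `run_and_classify clickhouse-client --receive_timeout=5 --query="..."` would print `timeout` as its last line when the client exits with 159, which lets a caller retry on timeouts while still aborting on other errors.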

@Fgrtue Fgrtue self-assigned this Dec 4, 2025
@Fgrtue Fgrtue added the can be tested Allows running workflows for external contributors label Dec 4, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented Dec 4, 2025

Workflow [PR], commit [a196398]

Summary:

| job_name | test_name | status | info | comment |
| --- | --- | --- | --- | --- |
| Integration tests (amd_asan, db disk, old analyzer, 3/6) | | failure | | |
| | test_concurrent_ttl_merges/test.py::test_limited_ttl_merges_in_empty_pool_replicated | FAIL | cidb | |
| | test_plain_rewritable_backward_compatibility/test.py::test_backward_compatibility[disk = 's3_plain_rewritable'-0] | FAIL | cidb | |
| | test_plain_rewritable_backward_compatibility/test.py::test_backward_compatibility[table_disk = 1, disk = 's3_plain_rewritable'-1] | FAIL | cidb | |
| | test_concurrent_ttl_merges/test.py::test_limited_ttl_merges_two_replicas | FAIL | cidb | |
| | test_plain_rewritable_backward_compatibility/test.py::test_backward_compatibility_readonly_tables | FAIL | cidb | |
| | test_plain_rewritable_backward_compatibility/test.py::test_backward_compatibility_bug_80393 | FAIL | cidb | |
| BuzzHouse (amd_debug) | | failure | | |
| | Logical error: 'Inconsistent AST formatting in Function_arrayElement: the query: | FAIL | cidb, issue | |

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Dec 4, 2025
@Fgrtue
Contributor

Fgrtue commented Dec 4, 2025

Hi @sberss! Thank you for the contribution. Could you please add a test that covers the improvement you are adding?

Here are the docs for your reference.

@sberss sberss force-pushed the clickhouse-client-receive-timeout-exit-code branch from f978952 to 2997c1e Compare December 5, 2025 11:42
@sberss sberss force-pushed the clickhouse-client-receive-timeout-exit-code branch from 2997c1e to ad54a68 Compare December 5, 2025 11:43
@sberss
Contributor Author

sberss commented Dec 5, 2025

Good shout @Fgrtue - I have added a test to check for the new exit code.

@Fgrtue
Contributor

Fgrtue commented Dec 5, 2025

@sberss could you also add a test with a different receive_timeout value? For example, when it's set to 1.
I tested clickhouse client --receive_timeout=1 --query="SELECT sleep(2); SELECT 1" and instead of the expected exception and error code, I received a normal response with 0 code. Just want to make sure that the use cases you're addressing work correctly after this change.

@Fgrtue Fgrtue requested a review from Copilot December 5, 2025 15:06
Contributor

Copilot AI left a comment

Pull request overview

This PR improves error handling in clickhouse-client by ensuring timeout failures return a proper non-zero exit code (159 - TIMEOUT_EXCEEDED) instead of exiting successfully with code 0. This change enables scripts and automation to correctly detect and handle query timeout scenarios.

Key Changes:

  • Modified timeout handling to throw an exception instead of just printing an error message
  • Added comprehensive test coverage to verify the new exit code behavior

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/Client/ClientBase.cpp | Changed timeout handling from printing an error message to throwing a TIMEOUT_EXCEEDED exception, ensuring non-zero exit code |
| tests/queries/0_stateless/03734_client_receive_timeout_exit_code.sh | Added test script to verify timeout returns exit code 159 and error message is displayed |
| tests/queries/0_stateless/03734_client_receive_timeout_exit_code.reference | Added expected output for the test (exit code 159 and grep success code 0) |

@sberss
Contributor Author

sberss commented Dec 5, 2025

> I tested clickhouse client --receive_timeout=1 --query="SELECT sleep(2); SELECT 1" and instead of the expected exception and error code, I received a normal response with 0 code. Just want to make sure that the use cases you're addressing work correctly after this change.

This also seems to happen on the master branch. I was a bit confused by this also, and digging a bit further it seems it's not super easy to hit this timeout! I was seeing this when downloading files from S3 (i.e. SELECT * FROM s3('...');), and it doesn't feel like a good idea to add that as a test.

I'm struggling to find anything else that triggers the timeout when set to greater than 0. Do you have any suggestions?

@ClickHouse ClickHouse deleted a comment from Copilot AI Dec 5, 2025
@sberss sberss force-pushed the clickhouse-client-receive-timeout-exit-code branch from ad54a68 to 3337ad6 Compare December 5, 2025 16:01
@Fgrtue
Contributor

Fgrtue commented Dec 5, 2025

I don't have ideas for the test at the moment, but I just wanted to make sure that this is the behavior you want. Probably the client does receive some packets internally from the server, and that is why the timeout does not fire. However, I can imagine it happening when no packets are received for longer than the receive_timeout. Let's leave the test as it is for now, then.

Could you also fix the Exception::createRuntime(ErrorCodes::TIMEOUT_EXCEEDED, error_message) to a more optimal version, and then we can test it.

@sberss
Contributor Author

sberss commented Dec 5, 2025

> Could you also fix the Exception::createRuntime(ErrorCodes::TIMEOUT_EXCEEDED, error_message) to a more optimal version, and then we can test it.

If this refers to the Copilot message, it seems to have been deleted. Do you have the specifics on what is needed here?

@Fgrtue
Contributor

Fgrtue commented Dec 5, 2025

> If this refers to the Copilot message, it seems to have been deleted. Do you have the specifics on what is needed here?

I deleted the Copilot message as it was misleading and added my own explanation of what I suggest doing. If it works for you, feel free to use it as guidance.

@sberss
Contributor Author

sberss commented Dec 8, 2025

Sorry @Fgrtue, I'm potentially being completely blind, but I don't see the explanation you are referring to.

@Fgrtue
Contributor

Fgrtue commented Dec 8, 2025

No worries, I'll copy it here:

The Exception constructor takes a PreformattedMessage and an error code. At the moment we additionally create another exception before passing it to the make_unique constructor. This seems redundant, since we can just create a PreformattedMessage instead of a std::string and pass it as an argument to make_unique.

Could you adjust this part please?

@sberss sberss force-pushed the clickhouse-client-receive-timeout-exit-code branch from 3337ad6 to a07fd91 Compare December 8, 2025 17:16
@sberss
Contributor Author

sberss commented Dec 8, 2025

Ah, I see what you mean - does the change I made look OK?

@CLAassistant

CLAassistant commented Dec 9, 2025

CLA assistant check
All committers have signed the CLA.

@Fgrtue Fgrtue force-pushed the clickhouse-client-receive-timeout-exit-code branch from fcfc6e1 to fc35478 Compare December 9, 2025 11:23
@Fgrtue
Contributor

Fgrtue commented Dec 9, 2025

Hi @sberss, your change looked a bit different from what I had in mind, so I updated it to streamline the implementation. I hope this is fine.

I discussed the test issue with other colleagues. As I mentioned, the server is probably sending some information to the client, which is why the timeout doesn't fire. It's unclear what exactly the server is sending, and this requires some investigation.

Unfortunately, I don't think I can merge without a test that properly validates the functionality with a non-trivial receive_timeout. I suggest investigating what the server is doing and why we cannot make the timeout fire. Once this is understood, you will be able to write an appropriate test and we can proceed with the merge.

@sberss
Contributor Author

sberss commented Dec 9, 2025

I had another look into this and found that the server was sending progress packets every 100ms (controlled by the interactive_delay setting), which kept resetting the receive timeout counter. So with SELECT sleep(2) and receive_timeout=1, the client would receive a progress packet every 100ms, preventing the timeout from ever firing.

I've updated the test to set interactive_delay=10000000 (10 seconds), which disables progress packets during the test and means the test works as expected when the timeout is set >0. Is this an acceptable solution?
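
The finding above can be turned into a small check. This is only a sketch, not the test that was merged: it assumes `interactive_delay` can be passed to the client as a command-line option in the same way as `receive_timeout`, and the `CLICKHOUSE_CLIENT` variable and the `check_receive_timeout` function name are hypothetical.

```shell
#!/usr/bin/env bash
# Client command to run; override CLICKHOUSE_CLIENT to point at another binary.
# (Variable name is an assumption made for this sketch.)
CLIENT="${CLICKHOUSE_CLIENT:-clickhouse-client}"

# Hypothetical check: run a 2-second query with a 1-second receive_timeout
# and progress packets effectively disabled (interactive_delay is in
# microseconds, so 10000000 is 10 seconds).
# Prints the client's exit code, then the number of stderr lines that
# contain the timeout message.
check_receive_timeout() {
    local stderr_output exit_code
    # $CLIENT is left unquoted on purpose so it may carry extra arguments.
    # Capture stderr only; discard the query result on stdout.
    stderr_output=$($CLIENT --receive_timeout=1 --interactive_delay=10000000 \
        --query="SELECT sleep(2)" 2>&1 >/dev/null)
    exit_code=$?
    echo "$exit_code"
    echo "$stderr_output" | grep -c "Timeout exceeded while receiving data from server"
}
```

Calling `check_receive_timeout` against a server with this change applied would be expected to print 159 followed by a non-zero match count. Without the `interactive_delay` override, the progress packets described above would keep resetting the timeout and the query would finish normally.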

@sberss sberss force-pushed the clickhouse-client-receive-timeout-exit-code branch from 8fbe23f to a0d71f2 Compare December 9, 2025 12:25
…ogress packets resetting the timeout counter
@sberss sberss force-pushed the clickhouse-client-receive-timeout-exit-code branch from a0d71f2 to a196398 Compare December 11, 2025 18:12
@Fgrtue
Contributor

Fgrtue commented Dec 11, 2025

Hi @sberss! Sorry for the late reply. Yes, the tests look good now. Let's see how the CI run goes, and if none of the failures are related to this change we will proceed.

@Fgrtue
Contributor

Fgrtue commented Dec 12, 2025

BuzzHouse amd_debug cidb

Integration tests (amd_asan, db disk, old analyzer, 3/6) -- several tests: cidb cidb cidb cidb cidb cidb

  • All of these tests also failed on master on 11.12.25 or 12.12.25. They seem to be flaky and unrelated to these changes.

@Fgrtue Fgrtue added this pull request to the merge queue Dec 13, 2025
Merged via the queue into ClickHouse:master with commit 002a1ee Dec 13, 2025
375 of 382 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Dec 13, 2025

Labels

  • can be tested: Allows running workflows for external contributors
  • pr-improvement: Pull request with some product improvements
  • pr-synced-to-cloud: The PR is synced to the cloud repo


5 participants