
Improve GLOBAL IN/JOIN performance and respect max_execution_time #95966

Open
c-end wants to merge 2 commits into ClickHouse:master from c-end:global_in

Conversation

c-end (Contributor) commented Feb 4, 2026

This PR tries to address some issues with queries using GLOBAL IN/JOIN.

The first issue is performance. Processing the data from the GLOBAL subquery can take a significant amount of time on the remote replicas receiving it. The data is sent by RemoteQueryExecutor in blocks; the default block size is 65409 rows. TCPHandler::receiveData creates a new QueryPipeline and PushingPipelineExecutor for each block, and MemorySink::onFinish copies all existing blocks of the memory table before appending the new one (I think it does that for snapshot isolation). This makes external table initialization O(n^2), with n being the number of blocks. We've seen GLOBAL IN subqueries in production that produced billions of rows, leading to thousands of blocks. For those cases the quadratic complexity becomes an issue: queries were running for hours.

I'm addressing this performance problem by using a single pipeline and executor per external table instead of per block. This allows us to flush the data from MemorySink to StorageMemory only once. This should be fine because nothing will query the memory table until all data is available.

It's worth noting that with the analyzer enabled, this is less of an issue because external table blocks are squashed (see the min_external_table_block_size_rows and min_external_table_block_size_bytes settings). So there are fewer blocks, but the fundamental issue still exists.

The second issue is that no timeouts are checked during external table initialization. If table initialization takes longer than max_execution_time (because of the performance issue described above or any other reason), the query should be aborted. This is fixed by checking the elapsed time in the read loop of TCPHandler::readData.

In addition to that, I think that timeouts like receive_timeout and send_timeout should be capped by max_execution_time, if present. This avoids waiting for a dead/frozen initiator replica longer than necessary. This is done in TCPHandler::extractConnectionSettingsFromContext.

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Improve performance for GLOBAL IN/JOIN subqueries that return a large number of rows and respect max_execution_time on remote servers.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Touches core TCP protocol handling for external table reception and connection timeouts; bugs here could cause stuck queries, premature timeouts, or unfinished pipeline executors under error paths.

Overview
Improves performance of GLOBAL IN/JOIN external table initialization by reusing a single QueryPipeline/PushingPipelineExecutor per external table instead of recreating one per received block, and ensures these executors are finished/cancelled via QueryState lifecycle management.

Makes remote replicas respect max_execution_time during external table reception (throws TIMEOUT_EXCEEDED) and caps send_timeout/receive_timeout/poll_interval by max_execution_time when set. Adds profile events (InitializeExternalTablesMicroseconds, SendExternalTablesMicroseconds) plus a new failpoint (sleep_on_receive_external_table_data) and an integration test covering both execution-time and socket-timeout scenarios.

Written by Cursor Bugbot for commit 13bb0b1.

c-end (Contributor, Author) commented Feb 9, 2026

@alexey-milovidov could you or someone else who's familiar with this part of the codebase take a look? I ran some tests in one of our production clusters and the changes from this PR indeed seem to fix the "stuck" (extremely slow) GLOBAL IN queries.

clickhouse-gh bot (Contributor) commented Feb 9, 2026

Workflow [PR], commit [13bb0b1]

Summary (failing jobs and tests):

Integration tests (amd_asan, flaky): failure
    test_global_in/test.py::test_max_execution_time_socket_timeout[1-2-3]: FAIL (cidb)
Stateless tests (amd_asan, distributed plan, parallel, 2/2): failure
    00755_avg_value_size_hint_passing: FAIL (cidb)
Integration tests (amd_tsan, 6/6): failure
    test_global_in/test.py::test_max_execution_time_socket_timeout[0]: FAIL (cidb)
    test_global_in/test.py::test_max_execution_time_socket_timeout[1]: FAIL (cidb)
Integration tests (amd_msan, 6/6): failure
    test_global_in/test.py::test_success[1]: FAIL (cidb)
    test_global_in/test.py::test_max_execution_time_timeout[1]: FAIL (cidb)
    test_global_in/test.py::test_max_execution_time_socket_timeout[1]: FAIL (cidb)
Integration tests (amd_llvm_coverage, 5/5): failure
    test_global_in/test.py::test_success[0]: FAIL (cidb)
    test_global_in/test.py::test_success[1]: FAIL (cidb)
    test_global_in/test.py::test_max_execution_time_timeout[0]: FAIL (cidb)
    test_global_in/test.py::test_max_execution_time_socket_timeout[0]: FAIL (cidb)

clickhouse-gh bot added the pr-performance label (Pull request with some performance improvements) on Feb 9, 2026
nickitat added the can be tested label (Allows running workflows for external contributors) on Feb 16, 2026
clickhouse-gh bot added the manual approve label (Manual approve required to run CI) on Feb 19, 2026
c-end (Contributor, Author) commented Feb 23, 2026

@nickitat could you take a look and trigger another CI run? Locally, the test I added passes even if I run it multiple times. But the timeouts might be too strict for busy CI runners.

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

while (receivePacketsExpectData(state))
{
    if (max_execution_time.totalSeconds() > 0 && watch.elapsedSeconds() > max_execution_time.totalSeconds())
        throw Exception(ErrorCodes::TIMEOUT_EXCEEDED, "Timeout exceeded while reading external table data. Spent {} seconds, timeout is {} seconds.", watch.elapsedSeconds(), max_execution_time.totalSeconds());

Integer truncation of fractional max_execution_time causes wrong timeout

Low Severity

max_execution_time.totalSeconds() truncates to integer, causing incorrect behavior for fractional-second values. For max_execution_time = 1.5s, totalSeconds() returns 1, so the timeout fires at 1 second instead of 1.5. For max_execution_time = 0.5s, totalSeconds() returns 0, so the guard totalSeconds() > 0 fails and the timeout check is completely skipped. The same truncation issue affects the saturate_timeout guard and poll_interval capping in extractConnectionSettingsFromContext. Using totalMicroseconds() (or comparing Poco::Timespan values directly) would avoid the precision loss.


clickhouse-gh bot (Contributor) commented Mar 10, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.80% +0.00%
Functions 23.90% 23.90% +0.00%
Branches 76.30% 76.30% +0.00%

PR changed-lines coverage: 97.12% (101/104)
Diff coverage report
Uncovered code

c-end (Contributor, Author) commented Mar 11, 2026

@nickitat could you take a look at the code changes? I will look into making the integration test less flaky. Locally it works reliably even if I run it a hundred times, but maybe the query is just too big for CI.
