Improve GLOBAL IN/JOIN performance and respect max_execution_time by c-end · Pull Request #95966 · ClickHouse/ClickHouse

c-end · 2026-02-04T16:56:46Z

This PR tries to address some issues with queries using GLOBAL IN/JOIN.

The first issue is about performance. Processing the data from the GLOBAL subquery can take a significant amount of time on the remote replicas receiving this data. The data is sent by RemoteQueryExecutor in blocks (the default block size is 65409 rows). TCPHandler::receiveData creates a new QueryPipeline and PushingPipelineExecutor for each block, and MemorySink::onFinish copies all existing blocks of the memory table before appending the new block (I think it does that for snapshot isolation). Since the sink is finished once per block, receiving the i-th block copies the i-1 blocks already stored, which makes the external table initialization O(n^2) in the number of blocks n. We've seen GLOBAL IN subqueries in production that produced billions of rows, leading to thousands of blocks. For those cases, the quadratic complexity becomes an issue (queries were running for hours).
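To make the cost concrete, here is a toy model in C++ (not ClickHouse code). A block is reduced to its row count, and `ToyMemoryTable::finish` is a hypothetical stand-in that mimics the described MemorySink::onFinish behaviour of copying every stored block before appending. Finishing once per block copies 0 + 1 + ... + (n - 1) = n(n - 1)/2 blocks in total, while finishing once per table copies none.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model, not ClickHouse code: a "block" is only its row count, and
// finish() builds a new snapshot that copies every block already stored
// before appending the new ones.
struct ToyMemoryTable {
    std::vector<std::size_t> blocks;
    std::size_t copied_blocks = 0;  // total number of existing blocks copied so far

    void finish(const std::vector<std::size_t> & new_blocks) {
        std::vector<std::size_t> snapshot = blocks;  // copy all existing blocks
        copied_blocks += blocks.size();
        snapshot.insert(snapshot.end(), new_blocks.begin(), new_blocks.end());
        blocks = std::move(snapshot);
    }
};

// Old pattern: one pipeline/sink per received block, so finish() runs n times.
std::size_t copiesWithSinkPerBlock(std::size_t n_blocks, std::size_t rows_per_block) {
    ToyMemoryTable table;
    for (std::size_t i = 0; i < n_blocks; ++i)
        table.finish({rows_per_block});
    return table.copied_blocks;  // 0 + 1 + ... + (n - 1) = n * (n - 1) / 2
}

// New pattern: one sink per external table, so finish() runs once at the end.
std::size_t copiesWithSinkPerTable(std::size_t n_blocks, std::size_t rows_per_block) {
    ToyMemoryTable table;
    table.finish(std::vector<std::size_t>(n_blocks, rows_per_block));
    return table.copied_blocks;  // nothing is stored yet, so nothing is copied
}
```

For 1000 blocks the per-block variant copies 499500 blocks and the per-table variant copies none. The real cost of each copy depends on how StorageMemory stores blocks, which this model ignores.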

I'm addressing this performance problem by using a single pipeline and executor per external table instead of per block. This allows us to flush the data from MemorySink to StorageMemory only once. This should be fine because nothing will query the memory table until all data is available.
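The single-executor-per-table idea can be sketched roughly as follows. `PendingTable` and `ExternalTablesReceiver` are hypothetical names standing in for the per-table QueryPipeline/PushingPipelineExecutor pair. The sketch assumes the receiver is told explicitly (via `finishAll`) when all external table data has arrived, and it only models when blocks become visible, not the actual pipeline.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one external table's pipeline + executor.
struct PendingTable {
    std::vector<std::size_t> buffered_blocks;  // pushed, not yet visible to readers
    std::vector<std::size_t> visible_blocks;   // what a query on the memory table would see
    bool finished = false;

    void push(std::size_t rows) {
        assert(!finished && "no blocks may arrive after finish()");
        buffered_blocks.push_back(rows);
    }

    // Single flush per table: all buffered blocks become visible at once.
    void finish() {
        visible_blocks.insert(visible_blocks.end(), buffered_blocks.begin(), buffered_blocks.end());
        buffered_blocks.clear();
        finished = true;
    }
};

class ExternalTablesReceiver {
public:
    // Called for every received data block; the sink for a table is created
    // on its first block and reused for all later ones.
    void onBlock(const std::string & table_name, std::size_t rows) {
        tables[table_name].push(rows);
    }

    // Called once, after all external table data has arrived.
    void finishAll() {
        for (auto & entry : tables)
            entry.second.finish();
    }

    const PendingTable & get(const std::string & table_name) const { return tables.at(table_name); }

private:
    std::map<std::string, PendingTable> tables;
};
```

Until `finishAll` runs, nothing is visible in any table, which matches the assumption that nothing queries the memory table before all data is available.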

It's worth noting that with the analyzer enabled, this is less of an issue because external table blocks are squashed (see the min_external_table_block_size_rows and min_external_table_block_size_bytes settings). So there are fewer blocks, but the fundamental issue still exists.
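A rough sketch of why squashing helps without removing the quadratic term. `squash` is a hypothetical greedy squasher: it assumes a squashed block is emitted as soon as it reaches either the row or the byte threshold (standing in for min_external_table_block_size_rows and min_external_table_block_size_bytes). Squashing by a factor k cuts the n(n - 1)/2 block copies by roughly k^2, but the growth in n stays quadratic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyBlock {
    std::size_t rows;
    std::size_t bytes;
};

// Greedy squashing sketch (assumption: emit once min_rows OR min_bytes is
// reached; a trailing partial block is emitted as-is).
std::vector<ToyBlock> squash(const std::vector<ToyBlock> & input, std::size_t min_rows, std::size_t min_bytes) {
    std::vector<ToyBlock> output;
    ToyBlock acc{0, 0};
    for (const ToyBlock & block : input) {
        acc.rows += block.rows;
        acc.bytes += block.bytes;
        if (acc.rows >= min_rows || acc.bytes >= min_bytes) {
            output.push_back(acc);
            acc = {0, 0};
        }
    }
    if (acc.rows > 0)
        output.push_back(acc);  // remainder smaller than both thresholds
    return output;
}
```

With 100 input blocks of 65409 rows and an illustrative 1,000,000-row threshold (not the setting's default), 16 input blocks fit into each squashed block, so the output has 7 blocks: 6 full ones plus a remainder.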

The second issue is that no timeouts are checked during external table initialization. If table initialization takes longer than max_execution_time (because of the performance issue described above or any other reason), the query should be aborted. This is fixed by checking the elapsed time in the read loop of TCPHandler::readData.
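The read-loop check can be sketched like this. `readExternalTables`, `receive_next_packet` and `TimeoutExceeded` are hypothetical names (the actual loop uses receivePacketsExpectData and throws TIMEOUT_EXCEEDED). The sketch compares whole std::chrono durations rather than whole seconds, so fractional limits keep their precision, and it treats a zero limit as "no limit".

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>

struct TimeoutExceeded : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// receive_next_packet returns false once all external table data has arrived.
void readExternalTables(const std::function<bool()> & receive_next_packet,
                        std::chrono::microseconds max_execution_time)
{
    assert(max_execution_time.count() >= 0);
    const auto start = std::chrono::steady_clock::now();
    while (receive_next_packet())
    {
        // Check the deadline after every packet instead of only after the loop.
        const auto elapsed = std::chrono::steady_clock::now() - start;
        if (max_execution_time.count() > 0 && elapsed > max_execution_time)
            throw TimeoutExceeded("Timeout exceeded while reading external table data");
    }
}
```

If each packet takes at least 2 ms and the limit is 10 ms, the loop throws after a handful of packets instead of draining the whole stream.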

In addition to that, I think that timeouts like receive_timeout and send_timeout should be capped by max_execution_time, if present. This avoids waiting for a dead/frozen initiator replica longer than necessary. This is done in TCPHandler::extractConnectionSettingsFromContext.
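Capping the socket timeouts could look roughly like this. `ConnectionTimeouts` and `capByMaxExecutionTime` are hypothetical names, not the actual code in TCPHandler::extractConnectionSettingsFromContext. Following the PR overview, poll_interval is capped the same way, and a zero max_execution_time is treated as "not set".

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

struct ConnectionTimeouts {
    std::chrono::microseconds send_timeout;
    std::chrono::microseconds receive_timeout;
    std::chrono::microseconds poll_interval;
};

ConnectionTimeouts capByMaxExecutionTime(ConnectionTimeouts timeouts, std::chrono::microseconds max_execution_time)
{
    assert(max_execution_time.count() >= 0);
    if (max_execution_time.count() > 0)  // 0 means "no limit": leave timeouts unchanged
    {
        // Never wait on the socket longer than the query is allowed to run.
        timeouts.send_timeout = std::min(timeouts.send_timeout, max_execution_time);
        timeouts.receive_timeout = std::min(timeouts.receive_timeout, max_execution_time);
        timeouts.poll_interval = std::min(timeouts.poll_interval, max_execution_time);
    }
    return timeouts;
}
```

Comparing full-precision durations keeps a 1.5 s limit at 1.5 s rather than rounding it down to whole seconds.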

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Improve performance for GLOBAL IN/JOIN subqueries that return a large number of rows and respect max_execution_time on remote servers.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Note

Medium Risk
Touches core TCP protocol handling for external table reception and connection timeouts; bugs here could cause stuck queries, premature timeouts, or unfinished pipeline executors under error paths.

Overview
Improves performance of GLOBAL IN/JOIN external table initialization by reusing a single QueryPipeline/PushingPipelineExecutor per external table instead of recreating one per received block, and ensures these executors are finished/cancelled via QueryState lifecycle management.
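One way to guarantee that a per-table executor is either finished or cancelled, even on error paths, is an RAII guard. `ToyExecutor` and `ExecutorGuard` are hypothetical; this is a sketch of the general pattern, not the QueryState code in this PR. The guard records whether finish() succeeded and cancels the executor in its destructor otherwise.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for PushingPipelineExecutor; it only logs calls.
struct ToyExecutor {
    std::vector<std::string> * log;
    void finish() { log->push_back("finish"); }
    void cancel() { log->push_back("cancel"); }
};

// The executor is finished explicitly on success; if the guard is destroyed
// first (e.g. while an exception unwinds), it cancels the executor instead.
class ExecutorGuard {
public:
    explicit ExecutorGuard(ToyExecutor executor_) : executor(executor_) {
        assert(executor.log != nullptr);
    }
    ExecutorGuard(const ExecutorGuard &) = delete;
    ExecutorGuard & operator=(const ExecutorGuard &) = delete;

    ~ExecutorGuard() {
        if (!finished)
            executor.cancel();
    }

    void finish() {
        executor.finish();
        finished = true;  // only reached if finish() did not throw
    }

private:
    ToyExecutor executor;
    bool finished = false;
};
```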

Makes remote replicas respect max_execution_time during external table reception (throws TIMEOUT_EXCEEDED) and caps send_timeout/receive_timeout/poll_interval by max_execution_time when set. Adds profile events (InitializeExternalTablesMicroseconds, SendExternalTablesMicroseconds) plus a new failpoint (sleep_on_receive_external_table_data) and an integration test covering both execution-time and socket-timeout scenarios.

Written by Cursor Bugbot for commit 13bb0b1.

c-end · 2026-02-09T12:05:01Z

@alexey-milovidov could you or someone else who's familiar with this part of the codebase take a look? I ran some tests in one of our production clusters and the changes from this PR indeed seem to fix the "stuck" (extremely slow) GLOBAL IN queries.

clickhouse-gh · 2026-02-09T21:44:49Z

Workflow [PR], commit [13bb0b1]

Summary: ❌

job_name	test_name	status	info
Integration tests (amd_asan, flaky)		failure
	test_global_in/test.py::test_max_execution_time_socket_timeout[1-2-3]	FAIL	cidb
Stateless tests (amd_asan, distributed plan, parallel, 2/2)		failure
	00755_avg_value_size_hint_passing	FAIL	cidb
Integration tests (amd_tsan, 6/6)		failure
	test_global_in/test.py::test_max_execution_time_socket_timeout[0]	FAIL	cidb
	test_global_in/test.py::test_max_execution_time_socket_timeout[1]	FAIL	cidb
Integration tests (amd_msan, 6/6)		failure
	test_global_in/test.py::test_success[1]	FAIL	cidb
	test_global_in/test.py::test_max_execution_time_timeout[1]	FAIL	cidb
	test_global_in/test.py::test_max_execution_time_socket_timeout[1]	FAIL	cidb
Integration tests (amd_llvm_coverage, 5/5)		failure
	test_global_in/test.py::test_success[0]	FAIL	cidb
	test_global_in/test.py::test_success[1]	FAIL	cidb
	test_global_in/test.py::test_max_execution_time_timeout[0]	FAIL	cidb
	test_global_in/test.py::test_max_execution_time_socket_timeout[0]	FAIL	cidb

c-end · 2026-02-23T14:51:12Z

@nickitat could you take a look and trigger another CI run? Locally, the test I added passes even if I run it multiple times. But the timeouts might be too strict for busy CI runners.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-03-10T12:25:53Z

src/Server/TCPHandler.cpp

    while (receivePacketsExpectData(state))
    {
+        if (max_execution_time.totalSeconds() > 0 && watch.elapsedSeconds() > max_execution_time.totalSeconds())
+            throw Exception(ErrorCodes::TIMEOUT_EXCEEDED, "Timeout exceeded while reading external table data. Spent {} seconds, timeout is {} seconds.", watch.elapsedSeconds(), max_execution_time.totalSeconds());


Integer truncation of fractional max_execution_time causes wrong timeout

Low Severity

max_execution_time.totalSeconds() truncates to integer, causing incorrect behavior for fractional-second values. For max_execution_time = 1.5s, totalSeconds() returns 1, so the timeout fires at 1 second instead of 1.5. For max_execution_time = 0.5s, totalSeconds() returns 0, so the guard totalSeconds() > 0 fails and the timeout check is completely skipped. The same truncation issue affects the saturate_timeout guard and poll_interval capping in extractConnectionSettingsFromContext. Using totalMicroseconds() (or comparing Poco::Timespan values directly) would avoid the precision loss.

Additional Locations (1)

src/Server/TCPHandler.cpp#L1039-L1043
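The truncation described in this review can be reproduced with std::chrono, used here as a stand-in for Poco::Timespan: `duration_cast<seconds>` truncates the same way totalSeconds() does. Both functions are hypothetical. `timedOutTruncated` mirrors the flagged pattern (assuming elapsed time is measured in fractional seconds), and `timedOutPrecise` compares full-precision durations, as the review suggests.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::microseconds;

// Flagged pattern: the limit is truncated to whole seconds before comparing.
bool timedOutTruncated(microseconds elapsed, microseconds max_execution_time) {
    const auto limit_s = std::chrono::duration_cast<std::chrono::seconds>(max_execution_time).count();
    const double elapsed_s = std::chrono::duration<double>(elapsed).count();
    return limit_s > 0 && elapsed_s > static_cast<double>(limit_s);
}

// Suggested fix: compare the durations themselves, keeping sub-second precision.
bool timedOutPrecise(microseconds elapsed, microseconds max_execution_time) {
    return max_execution_time.count() > 0 && elapsed > max_execution_time;
}
```

With a 1.5 s limit and 1.2 s elapsed, the truncated check already fires (limit rounded down to 1 s); with a 0.5 s limit, the truncated limit is 0, so the guard disables the check entirely.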

clickhouse-gh · 2026-03-10T16:23:30Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	83.80%	83.80%	+0.00%
Functions	23.90%	23.90%	+0.00%
Branches	76.30%	76.30%	+0.00%

PR changed-lines coverage: 97.12% (101/104)

c-end · 2026-03-11T09:36:34Z

@nickitat could you take a look at the code changes? I will look into making the integration test less flaky. Locally it works reliably even if I run it a hundred times, but maybe the query is just too big for CI.

c-end force-pushed the global_in branch from 255e4bc to 44f1ecb on February 9, 2026 12:00

clickhouse-gh bot added the pr-performance label (Pull request with some performance improvements) on Feb 9, 2026

nickitat added the can be tested label (Allows running workflows for external contributors) on Feb 16, 2026

clickhouse-gh bot added the manual approve label (Manual approve required to run CI) on Feb 19, 2026

cursor bot reviewed Mar 10, 2026


c-end added 2 commits March 10, 2026 12:27

2960bb5 Improve GLOBAL IN/JOIN performance and respect max_execution_time

13bb0b1 make test more robust

c-end force-pushed the global_in branch from 2999fa6 to 13bb0b1 on March 10, 2026 12:27

c-end wants to merge 2 commits into ClickHouse:master from c-end:global_in


2 participants
