Reset buffer cancellation on rewind #102524

Merged
nikitamikhaylov merged 6 commits into ClickHouse:master from yurifedoseev:fix/buffer-rewind-cancel
Apr 15, 2026

yurifedoseev (Contributor) commented Apr 13, 2026

While testing a debug build I noticed chassert errors:

<Fatal> : Logical error: 'ReadBuffer is canceled. Can't read from it.'

Stack Trace:
2026-04-13 06:14:50.009640000
FATAL
[ 129 ] () <Fatal> BaseDaemon: Stack trace: 0x00007d0c4482e9fd 0x00007d0c447da476 0x00007d0c447c07f3 0x00005af4983b8036 0x00005af4983b8d87 0x00005af4984ef854 0x00005af4985f2fb2 0x00005af4985e9031 0x00005af4985f2b19 0x00005af4985f7847 0x00005af49854f819 0x00005af498556c8e 0x00007d0c4482cac3 0x00007d0c448be8d0

2026-04-13 06:14:50.009677000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 3. pthread_kill @ 0x00000000000969fd

2026-04-13 06:14:50.009697000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 4. gsignal @ 0x0000000000042476

2026-04-13 06:14:50.009709000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 5. __lgamma_r_finite @ 0x00000000000287f3

2026-04-13 06:14:50.025476000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 6. /ClickHouse/src/Common/Exception.cpp:60:5: DB::abortOnFailedAssertion(String const&, std::basic_string_view<char, std::char_traits<char>>, void* const*, unsigned long, unsigned long) @ 0x0000000013b72036

2026-04-13 06:14:50.040368000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 7. /ClickHouse/src/Common/Exception.cpp:66:5: ? @ 0x0000000013b72d87

2026-04-13 06:14:50.043907000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 8. /ClickHouse/src/IO/ReadBuffer.cpp:90:5: DB::ReadBuffer::next() @ 0x0000000013ca9854

2026-04-13 06:14:50.082812000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 9.0. inlined from /ClickHouse/src/IO/ReadBuffer.h:81: DB::ReadBuffer::eof()

2026-04-13 06:14:50.082842000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 9.1. inlined from /ClickHouse/src/IO/ReadHelpers.h:1920: DB::skipWhitespaceIfAny(DB::ReadBuffer&, bool)

2026-04-13 06:14:50.082847000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 9. /ClickHouse/src/Common/AsynchronousMetrics.cpp:591:38: DB::AsynchronousMetrics::BlockDeviceStatValues::read(DB::ReadBuffer&) @ 0x0000000013dacfb2

2026-04-13 06:14:50.191881000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 12.3. inlined from /ClickHouse/contrib/llvm-project/libcxx/include/tuple:1404: decltype(auto) std::apply[abi:sqe220101]<DB::AsynchronousMetrics::start()::$_0&, std::tuple<>&>(DB::AsynchronousMetrics::start()::$_0&, std::tuple<>&)

2026-04-13 06:14:50.191887000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 12.4. inlined from /ClickHouse/src/Common/ThreadPool.h:312: operator()

2026-04-13 06:14:50.191899000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 12.5. inlined from /ClickHouse/contrib/llvm-project/libcxx/include/__type_traits/invoke.h:90: std::__invoke_result_impl<void, ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::AsynchronousMetrics::start()::$_0>(DB::AsynchronousMetrics::start()::$_0&&)::'lambda'()&>::type std::__invoke[abi:sqe220101]<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::AsynchronousMetrics::start()::$_0>(DB::AsynchronousMetrics::start()::$_0&&)::'lambda'()&>(ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::AsynchronousMetrics::start()::$_0>(DB::AsynchronousMetrics::start()::$_0&&)::'lambda'()&)

2026-04-13 06:14:50.191911000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 12.6. inlined from /ClickHouse/contrib/llvm-project/libcxx/include/__type_traits/invoke.h:350: void std::__invoke_void_return_wrapper<void, true>::__call[abi:sqe220101]<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::AsynchronousMetrics::start()::$_0>(DB::AsynchronousMetrics::start()::$_0&&)::'lambda'()&>(ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::AsynchronousMetrics::start()::$_0>(DB::AsynchronousMetrics::start()::$_0&&)::'lambda'()&)

2026-04-13 06:14:50.191921000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 12.7. inlined from /ClickHouse/contrib/llvm-project/libcxx/include/__type_traits/invoke.h:356: DB::AsynchronousMetrics::start()::$_0 std::__invoke_r[abi:sqe220101]<void, ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::AsynchronousMetrics::start()::$_0>(DB::AsynchronousMetrics::start()::$_0&&)::'lambda'()&>()

2026-04-13 06:14:50.191927000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 12. /ClickHouse/contrib/llvm-project/libcxx/include/__functional/function.h:443:62: ? @ 0x0000000013db1847

2026-04-13 06:14:50.204254000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 13.0. inlined from /ClickHouse/contrib/llvm-project/libcxx/include/__functional/function.h:502: ?

2026-04-13 06:14:50.204281000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 13.1. inlined from /ClickHouse/contrib/llvm-project/libcxx/include/__functional/function.h:754: ?

2026-04-13 06:14:50.204286000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 13. /ClickHouse/src/Common/ThreadPool.cpp:809:12: ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x0000000013d09819

2026-04-13 06:14:50.226512000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 14.0. inlined from /ClickHouse/contrib/llvm-project/libcxx/include/__type_traits/invoke.h:0: std::__invoke_result_impl<void, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>::type std::__invoke[abi:sqe220101]<void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>(void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*&&)

2026-04-13 06:14:50.226555000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 14.1. inlined from /ClickHouse/contrib/llvm-project/libcxx/include/__thread/thread.h:161: void std::__thread_execute[abi:sqe220101]<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*, 0ul, 1ul>(std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>&, std::__integer_sequence<unsigned long, 0ul, 1ul>)

2026-04-13 06:14:50.226561000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 14. /ClickHouse/contrib/llvm-project/libcxx/include/__thread/thread.h:169: void* std::__thread_proxy[abi:sqe220101]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x0000000013d10c8e
2026-04-13 06:14:50.226584000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 15. ? @ 0x0000000000094ac3

2026-04-13 06:14:50.226600000
FATAL
[ 129 ] () <Fatal> BaseDaemon: 16. ? @ 0x00000000001268d0

2026-04-13 06:14:50.709913000
FATAL
[ 129 ] () <Fatal> BaseDaemon: Integrity check of the executable successfully passed (checksum: 84B0D13EE5AC6303B5A54097728ED598)

The check is defined in src/IO/ReadBuffer.cpp, line 90:

chassert(!isCanceled(), "ReadBuffer is canceled. Can't read from it.");

The issue is caused by calling ReadBufferFromFileDescriptor::rewind() without resetting the canceled flag: once a read has been canceled, the flag survives the rewind and the next read trips the assertion.
The fix resets the canceled flag on rewind. The same fix is applied to AsynchronousReadBufferFromFileDescriptor::rewind() for consistency.
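
The failure mode and the fix can be modeled with a small toy class. This is only a sketch under assumptions: `ToyReadBuffer`, its `reset_cancel_on_rewind` switch, and `readsAfterCancelAndRewind` are hypothetical names, not ClickHouse code, and the toy throws an exception where the real code uses chassert so that the effect can be observed without aborting.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for a read buffer; NOT the ClickHouse ReadBuffer class.
class ToyReadBuffer
{
public:
    ToyReadBuffer(std::string data_, bool reset_cancel_on_rewind_)
        : data(std::move(data_)), reset_cancel_on_rewind(reset_cancel_on_rewind_) {}

    // Marks the buffer unusable, e.g. after an interrupted read.
    void cancel() { canceled = true; }
    bool isCanceled() const { return canceled; }

    // Returns the next byte, or -1 at end of data.
    // Throws if the buffer is canceled (the real code asserts instead).
    int next()
    {
        if (canceled)
            throw std::logic_error("ReadBuffer is canceled. Can't read from it.");
        assert(pos <= data.size()); // internal invariant of the toy
        if (pos == data.size())
            return -1;
        return static_cast<unsigned char>(data[pos++]);
    }

    // Moves back to the start. With the switch enabled, also clears the canceled flag.
    void rewind()
    {
        pos = 0;
        if (reset_cancel_on_rewind)
            canceled = false;
    }

private:
    std::string data;
    std::size_t pos = 0;
    bool canceled = false;
    bool reset_cancel_on_rewind;
};

// Returns true if a read succeeds after cancel() followed by rewind().
bool readsAfterCancelAndRewind(bool with_fix)
{
    ToyReadBuffer buf("8 0 sda 100", with_fix);
    buf.cancel();
    buf.rewind();
    try
    {
        return buf.next() == '8';
    }
    catch (const std::logic_error &)
    {
        return false;
    }
}
```

Calling `readsAfterCancelAndRewind(false)` models the behavior before the patch (the stale flag makes the read fail), while `readsAfterCancelAndRewind(true)` models the patched behavior.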

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixes a chassert exception 'ReadBuffer is canceled' in debug builds in AsynchronousMetrics, caused by rewind() not resetting the buffer cancellation flag.

clickhouse-gh Bot (Contributor) commented Apr 13, 2026

Workflow [PR], commit [7b51a5a]

Summary:


AI Review

Summary

This PR fixes a real state-reset bug in rewind for both ReadBufferFromFileDescriptor and AsynchronousReadBufferFromFileDescriptor by clearing the inherited canceled flag before the next read cycle. It also adds regression tests for both sync and async paths. I did not find correctness, safety, concurrency, or performance issues in the patch itself.

Missing context
  • ⚠️ CI logs/results were not provided in the review input, so validation here is source-based only.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ✅ Approve

clickhouse-gh Bot added the pr-build (Pull request with build/testing/packaging improvement) label Apr 13, 2026
yurifedoseev marked this pull request as ready for review April 13, 2026 08:11
Ergus (Member) left a comment

Could we add a test for this fix?

Ergus (Member) commented Apr 13, 2026

This looks like a bugfix, not an improvement.

yurifedoseev (Contributor, Author) commented Apr 13, 2026

> This looks like a bugfix, not an improvement.

I was a little confused by the "bugfix (user-visible misbehavior)" wording, because the issue is only visible to users in debug builds. Let me set it as a bugfix.

> Could we add a test for this fix?

Sure, let me try to add it.
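
The remark that the problem only shows up in debug builds fits how debug-only assertion macros usually behave. The sketch below assumes that chassert is compiled out in release builds; `SKETCH_DEBUG_CHECKS`, `SKETCH_CHASSERT`, and `readFromCanceledBuffer` are hypothetical names for illustration, and the macro counts failures instead of aborting so the difference is visible.

```cpp
#include <cassert>

// Hypothetical switch; a real build system would define it for debug builds.
#ifndef SKETCH_DEBUG_CHECKS
#define SKETCH_DEBUG_CHECKS 0
#endif

// Counts failed checks instead of aborting, so the effect can be observed.
static int failed_checks = 0;

#if SKETCH_DEBUG_CHECKS
// Debug flavor: the condition is evaluated and a failure is recorded.
#    define SKETCH_CHASSERT(cond) do { if (!(cond)) ++failed_checks; } while (false)
#else
// Release flavor: the condition is type-checked but never evaluated.
#    define SKETCH_CHASSERT(cond) do { (void)sizeof(cond); } while (false)
#endif

// Stand-in for a read attempted on a canceled buffer.
// Returns the number of checks that have fired so far.
int readFromCanceledBuffer()
{
    const bool canceled = true;
    SKETCH_CHASSERT(!canceled);
    return failed_checks;
}
```

Compiled with the default `SKETCH_DEBUG_CHECKS=0`, the check is silent and the stale flag goes unnoticed; compiled with `-DSKETCH_DEBUG_CHECKS=1`, the same call records a failure.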

clickhouse-gh Bot added the pr-bugfix (Pull request with bugfix, not backported by default) label and removed the pr-build (Pull request with build/testing/packaging improvement) label Apr 13, 2026
Comment thread src/IO/AsynchronousReadBufferFromFileDescriptor.cpp
Ergus self-assigned this Apr 13, 2026
Ergus (Member) commented Apr 13, 2026

Hi @yurifedoseev

As you describe the issue, it could be reproduced from SQL, right? If so, a SQL test is preferred over unit tests in cases like this.

yurifedoseev (Contributor, Author) commented Apr 13, 2026

> Hi @yurifedoseev
>
> As you describe the issue it could be reproduced from sql right? If so, then an sql test is preferred over unit tests in cases like this.

The error wasn't triggered by any specific SQL. The device->rewind() method was called from AsynchronousMetrics::BlockDeviceStatValues, which runs in a background thread. The error is reproduced when chassert is active (i.e. in a debug build) and one of the async metrics reads is canceled.

What's your recommendation on tests? I see that the rewind method is used in AsynchronousMetrics and MemoryWorker. So far I've noticed the issue only with AsynchronousMetrics.
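
The calling pattern described here, a background task that keeps one file open and rewinds it before every poll, can be sketched with plain POSIX calls. This is an illustration under assumptions: `RewindableFdReader` and `pollThreeTimes` are hypothetical names, a temporary file stands in for the block device statistics file, and the cancellation is simulated with an explicit `cancel()` call rather than a real interrupted read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <unistd.h>

// Hypothetical reader (not a ClickHouse class) that keeps one descriptor open.
class RewindableFdReader
{
public:
    explicit RewindableFdReader(int fd_) : fd(fd_) {}

    // Marks the reader unusable, standing in for an interrupted read.
    void cancel() { canceled = true; }

    // Seeks back to the start and clears the canceled flag.
    void rewind()
    {
        if (::lseek(fd, 0, SEEK_SET) == static_cast<off_t>(-1))
            throw std::runtime_error("lseek failed");
        canceled = false;
    }

    // Reads from the current offset to end of file.
    std::string readAll()
    {
        if (canceled)
            throw std::logic_error("reader is canceled");
        std::string out;
        char chunk[256];
        for (;;)
        {
            const ssize_t n = ::read(fd, chunk, sizeof(chunk));
            if (n < 0)
                throw std::runtime_error("read failed");
            if (n == 0)
                break;
            out.append(chunk, static_cast<std::size_t>(n));
        }
        return out;
    }

private:
    int fd;
    bool canceled = false;
};

// Simulates three polling ticks over a temporary file; the second tick ends canceled.
std::string pollThreeTimes()
{
    char path[] = "/tmp/rewind_sketch_XXXXXX";
    const int fd = ::mkstemp(path);
    if (fd < 0)
        throw std::runtime_error("mkstemp failed");
    ::unlink(path); // the open descriptor keeps the file alive

    const std::string payload = "8 0 sda 100\n";
    if (::write(fd, payload.data(), payload.size()) != static_cast<ssize_t>(payload.size()))
    {
        ::close(fd);
        throw std::runtime_error("write failed");
    }

    RewindableFdReader reader(fd);
    std::string last;
    for (int tick = 0; tick < 3; ++tick)
    {
        reader.rewind(); // without the flag reset, the last tick would throw
        last = reader.readAll();
        if (tick == 1)
            reader.cancel();
    }
    ::close(fd);
    return last;
}
```

Because `rewind()` clears the flag, the third tick reads the payload normally even though the second tick left the reader canceled. Error paths are kept minimal here; a production version would manage the descriptor with RAII.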

alexey-milovidov (Member) left a comment

Do not merge before fixing "Logical error: '!res.empty()' (STID: 2508-3ea0)"

clickhouse-gh Bot (Contributor) commented Apr 14, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.00% | 84.00% | +0.00% |
| Functions | 90.90% | 90.90% | +0.00% |
| Branches | 76.60% | 76.60% | +0.00% |

Changed lines: 100.00% (42/42) · Uncovered code

Full report · Diff report

yurifedoseev (Contributor, Author) commented

> Do not merge before fixing "Logical error: '!res.empty()' (STID: 2508-3ea0)"

I've merged master with your latest fix for !res.empty(). Looks ok now.

Ergus (Member) commented Apr 15, 2026

I think we can merge this. @alexey-milovidov any objection?

nikitamikhaylov added this pull request to the merge queue Apr 15, 2026
Merged via the queue into ClickHouse:master with commit d48daf2 Apr 15, 2026
162 checks passed
robot-ch-test-poll added the pr-synced-to-cloud (The PR is synced to the cloud repo) label Apr 15, 2026
Labels

pr-bugfix: Pull request with bugfix, not backported by default
pr-synced-to-cloud: The PR is synced to the cloud repo

5 participants