
Allow disabling of PSI_*_* async metrics collection #88557

Merged
alesapin merged 3 commits into ClickHouse:master from MikhailBurdukov:allow_disable_collecting_psi on Dec 13, 2025

Conversation

@MikhailBurdukov
Contributor

@MikhailBurdukov MikhailBurdukov commented Oct 15, 2025

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Allow disabling of PSI_*_* async metrics collection

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Details

We run ClickHouse in a fairly specific containerization infrastructure, and we have hit an issue where the /proc/pressure/* files exist:

 ~ # ls -la /proc/pressure/
total 0
dr-xr-xr-x    5 root root 0 Oct  2 13:00 .
dr-xr-xr-x 3055 root root 0 Oct  2 13:00 ..
-r--r--r--    1 root root 0 Oct  2 13:00 cpu
-r--r--r--    1 root root 0 Oct  2 13:00 io
-r--r--r--    1 root root 0 Oct  2 13:00 memory

But it is impossible to read from them:

~ # cat /proc/pressure/io
cat: /proc/pressure/io: Operation not supported
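
The situation above (the file is listed, but reading it fails) can be detected with a small probe. This is a minimal sketch assuming a POSIX system; `probeReadable` is a hypothetical helper written for illustration, not a ClickHouse function:

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: returns 0 if the file can be opened and read,
// otherwise the errno of the call that failed.
int probeReadable(const std::string & path)
{
    int fd = ::open(path.c_str(), O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return errno;               // e.g. ENOENT when the file does not exist

    char buf[256];
    ssize_t n = ::pread(fd, buf, sizeof(buf), 0);
    int err = (n < 0) ? errno : 0;  // save errno before close() can change it
    ::close(fd);
    return err;
}
```

In the environment described here, such a probe would be expected to return errno 95 (`EOPNOTSUPP` on Linux) for /proc/pressure/io, matching the `cat` output above.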

These failed reads cause a lot of spam in the ClickHouse logs, with exceptions logged every time ClickHouse reads from these files:

2025.10.15 09:41:53.005752 [ 10180 ] {}  ReadBuffer: ReadBuffer is canceled by the exception: Code: 74. DB::ErrnoException: Cannot read from file /proc/pressure/cpu: , errno: 95, strerror: Operation not supported. (CANNOT_READ_FROM_FILE_DESCRIPTOR), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000133ac8df
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c8579ce
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000c857480
3. DB::ErrnoException::ErrnoException(int, int, FormatStringHelperImpl::type>, String&&) @ 0x0000000013489d00
4. void DB::ErrnoException::throwFromPath(int, String const&, FormatStringHelperImpl::type>, String&&) @ 0x00000000134884c4
5. DB::ReadBufferFromFileDescriptor::readImpl(char*, unsigned long, unsigned long, unsigned long) const @ 0x000000001349c10f
6. DB::ReadBufferFromFileDescriptor::nextImpl() @ 0x000000001349c1a0
7. DB::ReadBuffer::next() @ 0x000000001349a2ad
8. DB::readPressureFile(std::unordered_map, std::equal_to, std::allocator>>&, String const&, DB::ReadBufferFromFilePRead&, std::unordered_map, std::equal_to, std::allocator>>&, bool) @ 0x00000000135b0074
9. DB::AsynchronousMetrics::update(std::chrono::time_point>>, bool) @ 0x00000000135a0b39
10. void std::__function::__policy_invoker::__call_impl[abi:ne190107]::ThreadFromGlobalPoolImpl(DB::AsynchronousMetrics::start()::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x00000000135b5f3a
11. ThreadPoolImpl::ThreadFromThreadPool::worker() @ 0x00000000135077d2
12. void* std::__thread_proxy[abi:ne190107]>, void (ThreadPoolImpl::ThreadFromThreadPool::*)(), ThreadPoolImpl::ThreadFromThreadPool*>>(void*) @ 0x000000001350f29a
13. ? @ 0x0000000000094ac3
14. ? @ 0x00000000001268c0
 (version 25.8.8.26 (official build))
2025.10.15 09:41:53.005796 [ 10180 ] {}  void DB::AsynchronousMetrics::update(TimePoint, bool): Code: 74. DB::ErrnoException: Cannot read from file /proc/pressure/cpu: , errno: 95, strerror: Operation not supported. (CANNOT_READ_FROM_FILE_DESCRIPTOR), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000133ac8df
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c8579ce
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000c857480
3. DB::ErrnoException::ErrnoException(int, int, FormatStringHelperImpl::type>, String&&) @ 0x0000000013489d00
4. void DB::ErrnoException::throwFromPath(int, String const&, FormatStringHelperImpl::type>, String&&) @ 0x00000000134884c4
5. DB::ReadBufferFromFileDescriptor::readImpl(char*, unsigned long, unsigned long, unsigned long) const @ 0x000000001349c10f
6. DB::ReadBufferFromFileDescriptor::nextImpl() @ 0x000000001349c1a0
7. DB::ReadBuffer::next() @ 0x000000001349a2ad
8. DB::readPressureFile(std::unordered_map, std::equal_to, std::allocator>>&, String const&, DB::ReadBufferFromFilePRead&, std::unordered_map, std::equal_to, std::allocator>>&, bool) @ 0x00000000135b0074
9. DB::AsynchronousMetrics::update(std::chrono::time_point>>, bool) @ 0x00000000135a0b39
10. void std::__function::__policy_invoker::__call_impl[abi:ne190107]::ThreadFromGlobalPoolImpl(DB::AsynchronousMetrics::start()::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x00000000135b5f3a
11. ThreadPoolImpl::ThreadFromThreadPool::worker() @ 0x00000000135077d2
12. void* std::__thread_proxy[abi:ne190107]>, void (ThreadPoolImpl::ThreadFromThreadPool::*)(), ThreadPoolImpl::ThreadFromThreadPool*>>(void*) @ 0x000000001350f29a
13. ? @ 0x0000000000094ac3
14. ? @ 0x00000000001268c0
 (version 25.8.8.26 (official build))

This PR adds a setting to disable collection of such metrics.
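
The idea can be sketched as follows. The setting name is not stated in this description, so `collect_psi_metrics`, `MetricsConfig`, and `psiFilesToRead` are hypothetical names used only for illustration and do not reflect the actual ClickHouse code:

```cpp
#include <string>
#include <vector>

// Hypothetical configuration holder; the real setting name may differ.
struct MetricsConfig
{
    bool collect_psi_metrics = true;  // assumed default: collection stays enabled
};

// Returns the PSI files the metrics updater should read on this iteration.
// When collection is disabled, nothing is read, so no read errors are logged.
std::vector<std::string> psiFilesToRead(const MetricsConfig & config)
{
    if (!config.collect_psi_metrics)
        return {};
    return {"/proc/pressure/cpu", "/proc/pressure/io", "/proc/pressure/memory"};
}
```

Skipping the files entirely, rather than catching the exception on each update, avoids both the failed read and the log output.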

@alesapin alesapin self-assigned this Oct 16, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented Oct 16, 2025

Workflow [PR], commit [284ae23]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_tsan, parallel, 2/2) | | failure | | |
| | 01443_merge_truncate_long | FAIL | cidb, issue | ISSUE CREATED |
| BuzzHouse (amd_debug) | | failure | | |
| | Logical error: 'Inconsistent AST formatting: the query: | FAIL | cidb, issue | ISSUE EXISTS |
| BuzzHouse (amd_ubsan) | | failure | | |
| | Let op! | ERROR | cidb, issue | ISSUE CREATED |

@alesapin
Member

@MikhailBurdukov the only thing is that this is not a bug fix but an improvement.

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Oct 16, 2025
@MikhailBurdukov
Contributor Author

@alesapin
Could you add the "can be tested" label, please?

I fixed the changelog category, but can we try to backport the PR anyway? The issue is pretty painful for our clusters. 🙏

@MikhailBurdukov
Contributor Author

@alesapin ping

@MikhailBurdukov
Contributor Author

@alesapin Could you add the "can be tested" label, please?

@alesapin alesapin added the can be tested Allows running workflows for external contributors label Nov 19, 2025
@clickhouse-gh clickhouse-gh bot added pr-improvement Pull request with some product improvements and removed pr-bugfix Pull request with bugfix, not backported by default labels Nov 19, 2025
@MikhailBurdukov
Contributor Author

@alesapin
Can we merge the PR?

@alesapin alesapin added this pull request to the merge queue Dec 13, 2025
Merged via the queue into ClickHouse:master with commit 43172b4 Dec 13, 2025
128 of 132 checks passed
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Dec 13, 2025
@MikhailBurdukov
Contributor Author

@alesapin This PR is an improvement, but can we try to backport it to 25.8 anyway?

Labels

  • can be tested: Allows running workflows for external contributors
  • pr-improvement: Pull request with some product improvements
  • pr-synced-to-cloud: The PR is synced to the cloud repo

3 participants