Skip to content

Reduce query memory tracking drift for executable UDF#83929

Merged
azat merged 1 commit intoClickHouse:masterfrom
korowa:fix-exec-udf-mem-tracking
Jul 18, 2025
Merged

Reduce query memory tracking drift for executable UDF#83929
azat merged 1 commit intoClickHouse:masterfrom
korowa:fix-exec-udf-mem-tracking

Conversation

@korowa
Copy link
Copy Markdown
Contributor

@korowa korowa commented Jul 17, 2025

Currently executable functions, when called over multiple blocks during single query execution, will produce a significant overhead in query memory tracker.

Mostly, this is caused by the execution of tasks sending data to shell commands: TimeoutWriteBufferFromFileDescriptor, which is a part of SendDataTask is created in the query thread and increments its memory counter, but then SendDataTasks are executed and destroyed in "anonymous" threads from global pool, decrementing only global memory counter, which causes significant difference in tracked memory values at the query level and the global one, and may lead to MEMORY_LIMIT_EXCEEDED errors.

This patch partially fixes the memory drift, specifically its part produced by TimeoutWriteBufferFromFileDescriptor objects, by attaching data sending threads to the query thread group.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Reduce query memory tracking overhead for executable user-defined functions.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@korowa korowa changed the title fix: memory tracking drift for executable UDF Reduce memory tracking drift for executable UDF Jul 17, 2025
@azat azat self-assigned this Jul 17, 2025
@Algunenano Algunenano added the can be tested Allows running workflows for external contributors label Jul 17, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jul 17, 2025

Workflow [PR], commit [fb79ace]

Summary:

job_name test_name status info comment
Stateless tests (amd_tsan, s3 storage, 1/3) failure
03002_part_log_rmt_fetch_merge_error FAIL

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Jul 17, 2025
Copy link
Copy Markdown
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good! But the test requires some adjustments

Currently executable functions, when called over multiple blocks
during single query execution, will produce a significant overhead
in query memory tracker.

Mostly, this is caused by the execution of tasks sending data to
shell commands: `TimeoutWriteBufferFromFileDescriptor`, which is
a part of `SendDataTask` is created in the query thread and
increments its memory counter, but then `SendDataTask`s are
executed and destroyed in "anonymous" threads from global pool,
decrementing only global memory counter, which causes significant
difference in tracked memory values at the query level and the
global one, and may lead to MEMORY_LIMIT_EXCEEDED errors.

This patch partially fixes the memory drift, specifically its part
produced by `TimeoutWriteBufferFromFileDescriptor` objects, by
attaching data sending threads to the query thread group.
@korowa korowa force-pushed the fix-exec-udf-mem-tracking branch from d22ad6b to fb79ace Compare July 17, 2025 17:39
@korowa
Copy link
Copy Markdown
Contributor Author

korowa commented Jul 18, 2025

03002_part_log_rmt_fetch_merge_error | FAIL

Looks unrelated, but I'll still check what caused the failure.

@azat azat changed the title Reduce memory tracking drift for executable UDF Reduce query memory tracking drift for executable UDF Jul 18, 2025
@azat azat enabled auto-merge July 18, 2025 08:12
@azat azat added this pull request to the merge queue Jul 18, 2025
Merged via the queue into ClickHouse:master with commit 31e0fc2 Jul 18, 2025
120 of 123 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants