Detect AI coding agent in client and expose it as client_agent by alexey-milovidov · Pull Request #106619 · ClickHouse/ClickHouse

alexey-milovidov · 2026-06-06T01:04:27Z

clickhouse-client and clickhouse-local now detect whether they were invoked under a known AI coding agent (Claude Code, Cursor, OpenAI Codex, Gemini CLI, Goose, Cline, OpenCode, GitHub Copilot, etc.) by inspecting environment variables only — no filesystem probing. The detected canonical agent id (e.g. claude-code, cursor, gemini-cli) is stored in a new ClientInfo::client_agent field, sent over the TCP protocol (gated on a new protocol revision for backward compatibility), and exposed as a client_agent LowCardinality(String) column in system.query_log, system.query_thread_log and system.processes.

The motivation is observability: server-side, operators can attribute query traffic to the AI coding agent that produced it, similar to how clickhousectl already tags its outbound User-Agent. A generic AGENT environment variable is also honored verbatim, and the field is empty when no agent is detected.

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added detection of the AI coding agent (Claude Code, Cursor, Codex, Gemini CLI, Goose, etc.) that invoked clickhouse-client or clickhouse-local, based on environment variables. The detected agent is reported in the new client_agent column of system.query_log, system.query_thread_log and system.processes.

Documentation entry for user-facing changes

Documentation update is included (system table docs for query_log, query_thread_log and processes).

clickhouse-client and clickhouse-local now detect whether they were invoked under a known AI coding agent (Claude Code, Cursor, OpenAI Codex, Gemini CLI, Goose, etc.) by inspecting environment variables only (no filesystem checks), and report the detected canonical agent id in a new `ClientInfo::client_agent` field. The field is serialized over the TCP protocol (gated on the new `DBMS_MIN_REVISION_WITH_CLIENT_AGENT_IN_CLIENT_INFO` revision) and exposed as a `client_agent` `LowCardinality(String)` column in `system.query_log`, `system.query_thread_log` and `system.processes`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The test runner itself may be invoked under an AI coding agent, whose environment variables would leak into the spawned clickhouse-client and make the test non-deterministic. Clear all known agent markers before each case so only the variable under test is set. Also cover the `cursor` case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-06-06T01:05:06Z

Workflow [PR], commit [08862e7]

Summary: ✅

Performance Comparison: Performance dashboard

AI Review

Summary

This PR adds environment-based AI agent detection to clickhouse-client / clickhouse-local, carries the detected client_agent through ClientInfo, and exposes it in query/process observability tables. The main TCP/query-log path is coherent, and the earlier propagation and async batching issues are fixed in the current code, but the new ClientInfo field is also written into the persisted async Distributed insert header in a way that breaks old-reader compatibility for queued files.

Findings

❌ Blockers

[src/Interpreters/ClientInfo.cpp:216] ClientInfo::write is used inside the async Distributed insert file header before the existing rows / bytes / block-header fields. Older binaries that do not know DBMS_MIN_REVISION_WITH_CLIENT_AGENT_IN_CLIENT_INFO stop reading ClientInfo before client_agent, then DistributedAsyncInsertHeader::read interprets the leftover client_agent string length as rows. This breaks the append-compatible header contract and can corrupt pending async Distributed queue files after downgrade or when an older binary has to drain files written by a newer one.
Suggested fix: keep the embedded ClientInfo serialization old-reader-compatible for distributed queue files, and store client_agent as a trailing async-header field after the existing shard/table fields, or add an equivalent context-specific serialization path that old readers can safely ignore.

Final Verdict

Status: ❌ Block

Minimum required action: fix the async Distributed header compatibility issue before merge.

alexey-milovidov

The code looks good in general, but see the above about the usage of remote connections from clickhouse-local.

Also I'm curious - do we invoke the ClientInfo constructor once in clickhouse-client/clickhouse-local?

Serialize `client_agent` for all interfaces (at the end of the stream, independent of the initial interface) instead of only inside the `TCP` branch. Without this, a `clickhouse-local` query (`LOCAL` interface) that is forwarded to a remote shard via `RemoteQueryExecutor` would drop the field, so remote `system.query_log` / `system.query_thread_log` could not attribute traffic originating from `clickhouse-local`. Also use `SYSTEM FLUSH LOGS query_log` instead of the global form (style check) and extend the test to cover the `clickhouse-local` -> remote secondary-query propagation path.

alexey-milovidov · 2026-06-06T01:32:50Z

Thanks for the review. Addressed in aa625ad:

Remote connections from clickhouse-local. Good catch — the field was serialized only inside the TCP branch, so a clickhouse-local query (LOCAL interface) forwarded to a remote shard dropped client_agent. I moved the serialization out of the interface-specific block to the end of the stream, so it is written/read for all interfaces and survives forwarding. Extended 04305_client_agent_detection with a clickhouse-local → remote(...) case that asserts the secondary query on the remote logs client_agent = claude-code (matched via initial_query_id).

Do we invoke the ClientInfo constructor / detection once? The env-var detection runs once per process, not per query. It happens in fillOSUserHostNameAndVersionInfo via this chain, called once at startup:

Client::processOptions / LocalServer::main
  -> ClientBase::initClientContext            (once)
    -> Context::setQueryKindInitial
      -> ClientInfo::setInitialQuery
        -> fillOSUserHostNameAndVersionInfo
          -> detectClientAgent                (reads env vars)

client_context is a single long-lived object reused for every query in the session, so the detected value is computed once and then just carried along (the ClientInfo object itself is copied around, e.g. via Context::createCopy, but detection is not re-run). The ClientInfo default constructor only initializes the socket-address members; it does not touch the environment, which is important because ClientInfo is also constructed server-side in many places where agent detection would be meaningless.

Also fixed the style check (SYSTEM FLUSH LOGS query_log instead of the global form).

…rg::alter PR #106102 (merged 2026-06-05 17:43 UTC) refactored Iceberg::alter to declare last_version and compression_method outside an if/else that assigns them in both branches. Clang-tidy cppcoreguidelines-init-variables flags the bare declarations and Build (arm_tidy) is built with -warnings-as-errors, so the build fails for every PR whose CI ran on master after this commit. CIDB shows 30+ unrelated PRs hitting this in the last 2 days (e.g. #89360, #103540, #106120, #106386, #106404, #106522, #105102, #106590). Initialize both variables to safe defaults at declaration. They are unconditionally overwritten in both if and else branches before use, so behavior is unchanged. The new defaults are only relevant if a future code path skips both assignments, in which case last_version=0 and compression_method=CompressionMethod::None are sane no-op values (the same defaults the old structured-binding form would produce through default-construction of the destructured aggregate). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit d72cac5)

alexey-milovidov · 2026-06-06T15:48:09Z

@groeneai, fix the failed test for Kafka and send a PR with the fix.

groeneai · 2026-06-06T20:17:56Z

@alexey-milovidov The failed Kafka test test_storage_kafka/test_avro_schema_registry.py::test_kafka_produce_consume_avro_confluent is an infrastructure flake unrelated to this PR's client_agent change. Details:

The full traceback ends in Poco::TimeoutException while the server tries to POST to the in-cluster Confluent schema-registry HTTP endpoint at AvroConfluentSchemaRegistry.cpp:209 (HTTPSession::receive -> receiveResponse):

Code: 1000. DB::Exception: Timeout: while registering schema under subject 'avro_confluent_roundtrip_NSPJJU-value':
While executing WaitForAsyncInsert.
...
12. AvroConfluentSchemaRegistry.cpp:209: ConfluentSchemaRegistry::registerSchema()
13. AvroConfluentSchemaRegistry.cpp:271: ConfluentSchemaRegistry::registerSchema()
14. AvroRowOutputFormat.cpp:633: AvroConfluentRowOutputFormat::write()

CIDB cross-check (30d):

test_storage_kafka/test_avro_schema_registry%: 5 hits on PR Optimize Profile Events by using Per CPU counters for server-wide and users #105056 (azat shard-profile-events-per-cpu, completely unrelated), 1 hit on this PR.
AvroConfluentSchemaRegistry + TimeoutException test_context: 1 PR / 1 hit (this one).

The PR diff (ClientInfo, client_agent env-var detection, protocol bump, doc updates) does not touch AvroConfluentSchemaRegistry, HTTPSession, the schema-registry test fixtures, or Kafka. It is a one-off network-timeout flake against the in-cluster schema-registry HTTP endpoint, not a PR-caused regression. I do not see any actionable fix to file in a separate PR — please rerun the integration job, it should pass.

If the same timeout starts hitting many unrelated PRs we will create a chronic-flaky tracking task for the registry test fixture; right now it is not flaky enough to warrant one (5 of 6 hits all on the same azat PR).

Session: cron:clickhouse-ci-task-worker:20260606-201300

alexey-milovidov · 2026-06-07T01:16:08Z

It is a one-off network-timeout flake against the in-cluster schema-registry HTTP endpoint, not a PR-caused regression. I do not see any actionable fix to file in a separate PR — please rerun the integration job, it should pass.

@groeneai, We always have to fix CI failures, even when they happen once in a blue moon. Find a way to work around network failures - e.g., add retries around that in the test or in the code that works with the schema registry.

groeneai · 2026-06-07T04:19:25Z

@alexey-milovidov fix PR opened: #106661

It adds a bounded retry loop with exponential backoff around ConfluentSchemaRegistry::fetchSchema and registerSchema, gated by two new format settings (format_avro_schema_registry_max_retries, format_avro_schema_registry_retry_initial_backoff_ms). The retry only fires on transport-class errors (timeouts, connection refused, DNS, HTTP 5xx/408/429); schema-validation failures stay fatal. The single Poco timeout that hit test_storage_kafka/test_avro_schema_registry.py::test_kafka_produce_consume_avro_confluent will be absorbed by the default 5 retries instead of aborting the INSERT.

`ClientInfo::write` now persists `client_agent` into async `Distributed` insert file headers. The batching path in `DistributedAsyncInsertDirectoryQueue` keyed `BatchHeader` only by `settings`, `query`, and `client_info.query_kind`, so two pending inserts with the same query/header but different `client_agent` values were merged and sent by a single `RemoteInserter` initialized from the first file's `client_info`. The remote shard then recorded all of the batched traffic under that first agent, breaking the attribution contract for async `Distributed` inserts. Add `client_info.client_agent` to both the `BatchHeader` equality comparison and its hash so files with different agents are not batched together. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-07T14:08:37Z

Picked this up via /continue-pr:

Merged master (branch was 271 commits behind and red). Clean merge, no conflicts; protocol revision 54485 remains unique after the merge.
Addressed the remaining unresolved review thread (async Distributed insert misattribution): client_info.client_agent is now part of BatchHeader::operator== and BatchHeader::Hash in DistributedAsyncInsertDirectoryQueue, so queued insert files with different detected agents are no longer batched under one ClientInfo (commit 08862e7a584). Thread replied + resolved.
Built the full clickhouse binary clean and verified 04305_client_agent_detection end-to-end against a local server (all 5 cases match the reference, including the clickhouse-local → remote(...) forwarding case): claude-code / cursor / my-custom-agent / empty / local->remote: claude-code.
Sole CI failure is test_storage_kafka/test_avro_schema_registry.py::test_kafka_produce_consume_avro_confluent — a one-off Poco::TimeoutException against the in-cluster Confluent schema-registry HTTP endpoint, unrelated to client_agent. This is already being fixed by @groeneai in Retry transient Confluent Schema Registry failures #106661 (retry loop around ConfluentSchemaRegistry).

clickhouse-gh · 2026-06-07T14:14:07Z

+    /// Sent for all interfaces (not only TCP): the detected client agent must also be preserved
+    /// when a clickhouse-local query (LOCAL interface) is forwarded to remote shards.
+    if (server_protocol_revision >= DBMS_MIN_REVISION_WITH_CLIENT_AGENT_IN_CLIENT_INFO)
+        writeBinary(client_agent, out);


ClientInfo::write is also used inside the persisted async Distributed insert header, before the existing rows / bytes / block-header fields are written. That header format is designed to stay append-compatible: old readers call client_info.read(header_buf, distributed_header.revision) and then immediately parse rows if header_buf.hasPendingData. A binary that does not know DBMS_MIN_REVISION_WITH_CLIENT_AGENT_IN_CLIENT_INFO will therefore leave this new client_agent string unread and interpret its length byte as rows, corrupting pending async Distributed files written by a newer node after downgrade or by any older binary that has to drain those files.

Please keep the embedded ClientInfo layout old-reader-compatible for distributed queue files, and persist client_agent as a trailing async-header field after the existing shard/table fields (or add an equivalent context-specific serialization path) so older readers can safely ignore it.

clickhouse-gh · 2026-06-07T18:01:52Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	84.50%	84.50%	+0.00%
Functions	92.30%	92.30%	+0.00%
Branches	77.20%	77.10%	-0.10%

Changed lines: Changed C/C++ lines covered by tests: 83/85 (97.65%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@alexey-milovidov

A single network glitch when talking to the Confluent Schema Registry currently aborts the surrounding INSERT. The HTTP client in `AvroConfluentSchemaRegistry` makes one POST/GET attempt and propagates the first `Poco::TimeoutException` or `Poco::Net::NetException` to the caller, so a transient registry outage breaks `Kafka` ingestion that relies on `AvroConfluent`. Add a bounded retry loop around `fetchSchema` and `registerSchema` with exponential backoff (capped at 10 seconds). Schema-validation errors (HTTP 409 `INCOMPATIBLE_SCHEMA`, `INCORRECT_DATA`, `BAD_ARGUMENTS`) stay fatal so misconfigured users do not wait through every retry. Two new format settings control the policy: - `format_avro_schema_registry_max_retries` (default 5, capped at 20, 0 disables retries) - `format_avro_schema_registry_retry_initial_backoff_ms` (default 100, capped at 60000) Reported by @alexey-milovidov on ClickHouse#106619. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov and others added 2 commits June 6, 2026 01:03

clickhouse-gh Bot added the pr-feature Pull request with new product feature label Jun 6, 2026

clickhouse-gh Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread src/Interpreters/ClientInfo.cpp Outdated

alexey-milovidov commented Jun 6, 2026

View reviewed changes

clickhouse-gh Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread src/Interpreters/ClientInfo.cpp

groeneai mentioned this pull request Jun 7, 2026

Retry transient Confluent Schema Registry failures #106661

Merged

alexey-milovidov and others added 2 commits June 7, 2026 13:58

Merge remote-tracking branch 'origin/master' into client-agent-detection

6c93c8a

clickhouse-gh Bot reviewed Jun 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect AI coding agent in client and expose it as client_agent#106619

Detect AI coding agent in client and expose it as client_agent#106619
alexey-milovidov wants to merge 6 commits into
masterfrom
client-agent-detection

alexey-milovidov commented Jun 6, 2026

Uh oh!

clickhouse-gh Bot commented Jun 6, 2026 •

edited

Loading

Uh oh!

Uh oh!

alexey-milovidov left a comment

Uh oh!

alexey-milovidov commented Jun 6, 2026

Uh oh!

Uh oh!

alexey-milovidov commented Jun 6, 2026

Uh oh!

groeneai commented Jun 6, 2026

Uh oh!

alexey-milovidov commented Jun 7, 2026 •

edited

Loading

Uh oh!

groeneai commented Jun 7, 2026

Uh oh!

alexey-milovidov commented Jun 7, 2026

Uh oh!

clickhouse-gh Bot Jun 7, 2026

Uh oh!

clickhouse-gh Bot commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

alexey-milovidov commented Jun 6, 2026

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Findings

Final Verdict

Uh oh!

Uh oh!

alexey-milovidov left a comment

Choose a reason for hiding this comment

Uh oh!

alexey-milovidov commented Jun 6, 2026

Uh oh!

Uh oh!

alexey-milovidov commented Jun 6, 2026

Uh oh!

groeneai commented Jun 6, 2026

Uh oh!

alexey-milovidov commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

groeneai commented Jun 7, 2026

Uh oh!

alexey-milovidov commented Jun 7, 2026

Uh oh!

clickhouse-gh Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot commented Jun 7, 2026

LLVM Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

clickhouse-gh Bot commented Jun 6, 2026 •

edited

Loading

alexey-milovidov commented Jun 7, 2026 •

edited

Loading