Detect AI coding agent in client and expose it as client_agent #106619
alexey-milovidov wants to merge 6 commits into
clickhouse-client and clickhouse-local now detect whether they were invoked under a known AI coding agent (Claude Code, Cursor, OpenAI Codex, Gemini CLI, Goose, etc.) by inspecting environment variables only (no filesystem checks), and report the detected canonical agent id in a new `ClientInfo::client_agent` field. The field is serialized over the TCP protocol (gated on the new `DBMS_MIN_REVISION_WITH_CLIENT_AGENT_IN_CLIENT_INFO` revision) and exposed as a `client_agent` `LowCardinality(String)` column in `system.query_log`, `system.query_thread_log` and `system.processes`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
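The environment-only detection described above could look roughly like the following. This is a minimal sketch, not the PR's code: the function name `detectClientAgent`, the `Environment` alias, and the marker variable names in the table are illustrative assumptions, as is the choice to check specific markers before the generic `AGENT` variable.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

/// Environment snapshot passed in explicitly, so the logic can be exercised
/// without touching the real process environment.
using Environment = std::map<std::string, std::string>;

/// Returns a canonical agent id, or an empty string when nothing is detected.
/// NOTE: the marker variable names below are illustrative assumptions,
/// not the list used by the PR.
std::string detectClientAgent(const Environment & env)
{
    static const std::vector<std::pair<std::string, std::string>> markers = {
        {"CLAUDECODE", "claude-code"},
        {"CURSOR_AGENT", "cursor"},
        {"GEMINI_CLI", "gemini-cli"},
    };

    for (const auto & [variable, agent_id] : markers)
    {
        auto it = env.find(variable);
        if (it != env.end() && !it->second.empty())
            return agent_id;
    }

    /// Generic fallback: the value of AGENT is reported verbatim.
    /// (Checking it last is an assumption of this sketch.)
    auto it = env.find("AGENT");
    if (it != env.end())
        return it->second;

    return {};
}
```

In a real client the snapshot would be filled from the process environment (for example via `std::getenv`) once at startup.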
The test runner itself may be invoked under an AI coding agent, whose environment variables would leak into the spawned clickhouse-client and make the test non-deterministic. Clear all known agent markers before each case so only the variable under test is set. Also cover the `cursor` case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Workflow [PR], commit [08862e7] Summary: ✅
AI Review
Summary: This PR adds environment-based AI agent detection to
Findings: ❌ Blockers
Final Verdict: Status: ❌ Block. Minimum required action: fix the async
alexey-milovidov left a comment
The code looks good in general, but see the above about the usage of remote connections from clickhouse-local.
Also I'm curious - do we invoke the ClientInfo constructor once in clickhouse-client/clickhouse-local?
Serialize `client_agent` for all interfaces (at the end of the stream, independent of the initial interface) instead of only inside the `TCP` branch. Without this, a `clickhouse-local` query (`LOCAL` interface) that is forwarded to a remote shard via `RemoteQueryExecutor` would drop the field, so remote `system.query_log` / `system.query_thread_log` could not attribute traffic originating from `clickhouse-local`. Also use `SYSTEM FLUSH LOGS query_log` instead of the global form (style check) and extend the test to cover the `clickhouse-local` -> remote secondary-query propagation path.
Thanks for the review. Addressed in Remote connections from Do we invoke the
Also fixed the style check ( |
…rg::alter PR #106102 (merged 2026-06-05 17:43 UTC) refactored Iceberg::alter to declare last_version and compression_method outside an if/else that assigns them in both branches. Clang-tidy cppcoreguidelines-init-variables flags the bare declarations and Build (arm_tidy) is built with -warnings-as-errors, so the build fails for every PR whose CI ran on master after this commit. CIDB shows 30+ unrelated PRs hitting this in the last 2 days (e.g. #89360, #103540, #106120, #106386, #106404, #106522, #105102, #106590). Initialize both variables to safe defaults at declaration. They are unconditionally overwritten in both if and else branches before use, so behavior is unchanged. The new defaults are only relevant if a future code path skips both assignments, in which case last_version=0 and compression_method=CompressionMethod::None are sane no-op values (the same defaults the old structured-binding form would produce through default-construction of the destructured aggregate). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit d72cac5)
@groeneai, fix the failed test for Kafka and send a PR with the fix. |
@alexey-milovidov The failed Kafka test The full traceback ends in CIDB cross-check (30d):
The PR diff ( If the same timeout starts hitting many unrelated PRs we will create a chronic-flaky tracking task for the registry test fixture; right now it is not flaky enough to warrant one (5 of 6 hits all on the same azat PR). Session: cron:clickhouse-ci-task-worker:20260606-201300 |
@groeneai, We always have to fix CI failures, even when they happen once in a blue moon. Find a way to work around network failures - e.g., add retries around that in the test or in the code that works with the schema registry. |
@alexey-milovidov fix PR opened: #106661 It adds a bounded retry loop with exponential backoff around |
`ClientInfo::write` now persists `client_agent` into async `Distributed` insert file headers. The batching path in `DistributedAsyncInsertDirectoryQueue` keyed `BatchHeader` only by `settings`, `query`, and `client_info.query_kind`, so two pending inserts with the same query/header but different `client_agent` values were merged and sent by a single `RemoteInserter` initialized from the first file's `client_info`. The remote shard then recorded all of the batched traffic under that first agent, breaking the attribution contract for async `Distributed` inserts. Add `client_info.client_agent` to both the `BatchHeader` equality comparison and its hash so files with different agents are not batched together. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
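To illustrate the batching fix, here is a simplified stand-in for the batch key. The names `BatchKey` and `BatchKeyHash` are hypothetical, and the real `BatchHeader` also carries settings and a block header that are omitted here. The point is only that a field added to the equality comparison must also feed the hash, so that files with different agents land in different batches.

```cpp
#include <cstddef>
#include <functional>
#include <string>

/// Simplified, hypothetical stand-in for the batching key.
struct BatchKey
{
    std::string query;
    int query_kind = 0;
    std::string client_agent;

    bool operator==(const BatchKey & other) const
    {
        return query == other.query
            && query_kind == other.query_kind
            && client_agent == other.client_agent;
    }
};

struct BatchKeyHash
{
    std::size_t operator()(const BatchKey & key) const
    {
        std::size_t seed = std::hash<std::string>{}(key.query);
        combine(seed, std::hash<int>{}(key.query_kind));
        /// client_agent participates in the hash as well as in equality.
        combine(seed, std::hash<std::string>{}(key.client_agent));
        return seed;
    }

private:
    /// Common hash-mixing step; the constant is the usual golden-ratio value.
    static void combine(std::size_t & seed, std::size_t value)
    {
        seed ^= value + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
    }
};
```

With such a key, `std::unordered_map<BatchKey, Batch, BatchKeyHash>` keeps inserts from different agents in separate batches, so each one is sent with its own `client_info`.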
Picked this up via
    /// Sent for all interfaces (not only TCP): the detected client agent must also be preserved
    /// when a clickhouse-local query (LOCAL interface) is forwarded to remote shards.
    if (server_protocol_revision >= DBMS_MIN_REVISION_WITH_CLIENT_AGENT_IN_CLIENT_INFO)
        writeBinary(client_agent, out);
ClientInfo::write is also used inside the persisted async Distributed insert header, before the existing rows / bytes / block-header fields are written. That header format is designed to stay append-compatible: old readers call client_info.read(header_buf, distributed_header.revision) and then immediately parse rows if header_buf.hasPendingData. A binary that does not know DBMS_MIN_REVISION_WITH_CLIENT_AGENT_IN_CLIENT_INFO will therefore leave this new client_agent string unread and interpret its length byte as rows, corrupting pending async Distributed files written by a newer node after downgrade or by any older binary that has to drain those files.
Please keep the embedded ClientInfo layout old-reader-compatible for distributed queue files, and persist client_agent as a trailing async-header field after the existing shard/table fields (or add an equivalent context-specific serialization path) so older readers can safely ignore it.
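The failure mode described in this comment can be reproduced with a toy encoding. Everything in the sketch below is an assumption made for illustration: strings are one length byte plus payload, `rows` is a single byte, and the header holds only a query, the agent, and `rows`. The real format uses variable-length integers and more fields, and the comment asks for the trailing field to come after the shard/table fields, which this toy header does not have.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

/// Toy encoding (an assumption for brevity): a string is one length byte
/// followed by its bytes; an integer is one byte.
void writeString(std::string & out, const std::string & s)
{
    out.push_back(static_cast<char>(s.size()));
    out += s;
}

void writeByte(std::string & out, std::uint8_t v)
{
    out.push_back(static_cast<char>(v));
}

std::uint8_t readByte(const std::string & in, std::size_t & pos)
{
    return static_cast<std::uint8_t>(in.at(pos++));
}

std::string readString(const std::string & in, std::size_t & pos)
{
    std::size_t size = readByte(in, pos);
    std::string s = in.substr(pos, size);
    pos += size;
    return s;
}

/// Toy header layout: [client info: query][client_agent?][rows][client_agent?].
/// agent_inside_client_info = true mimics the layout the review objects to.
std::string writeHeader(const std::string & query, const std::string & agent,
                        std::uint8_t rows, bool agent_inside_client_info)
{
    std::string out;
    writeString(out, query);
    if (agent_inside_client_info)
        writeString(out, agent);
    writeByte(out, rows);
    if (!agent_inside_client_info)
        writeString(out, agent); /// trailing field: an old reader never looks this far
    return out;
}

/// An "old" reader that knows nothing about client_agent.
std::uint8_t oldReaderRows(const std::string & header)
{
    std::size_t pos = 0;
    readString(header, pos);      /// the client info it understands
    return readByte(header, pos); /// whatever follows is taken as rows
}
```

With the agent embedded in the client info, `oldReaderRows` returns the length byte of the agent string instead of the real row count; with the agent appended after `rows`, the old reader still sees the correct value and simply ignores the tail.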
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 83/85 (97.65%) | Lost baseline coverage: none · Uncovered code |
A single network glitch when talking to the Confluent Schema Registry currently aborts the surrounding INSERT. The HTTP client in `AvroConfluentSchemaRegistry` makes one POST/GET attempt and propagates the first `Poco::TimeoutException` or `Poco::Net::NetException` to the caller, so a transient registry outage breaks `Kafka` ingestion that relies on `AvroConfluent`.

Add a bounded retry loop around `fetchSchema` and `registerSchema` with exponential backoff (capped at 10 seconds). Schema-validation errors (HTTP 409 `INCOMPATIBLE_SCHEMA`, `INCORRECT_DATA`, `BAD_ARGUMENTS`) stay fatal so misconfigured users do not wait through every retry.

Two new format settings control the policy:
- `format_avro_schema_registry_max_retries` (default 5, capped at 20, 0 disables retries)
- `format_avro_schema_registry_retry_initial_backoff_ms` (default 100, capped at 60000)

Reported by @alexey-milovidov on ClickHouse#106619.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
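A bounded retry loop with capped exponential backoff, as described in this commit message, might be structured as follows. This is a sketch under stated assumptions rather than the code from the fix PR: `TransientError`, `backoffDelay` and `retryWithBackoff` are hypothetical names, the doubling factor is assumed, and the sleep function is injectable only so the policy can be checked without waiting. The 10-second cap is taken from the text above.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <thread>

/// Hypothetical marker for errors worth retrying (timeouts, network errors).
struct TransientError : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

/// Delay before retry number `retry` (0-based): initial * 2^retry, capped at `cap`.
std::chrono::milliseconds backoffDelay(
    std::chrono::milliseconds initial, std::size_t retry, std::chrono::milliseconds cap)
{
    std::chrono::milliseconds delay = initial;
    for (std::size_t i = 0; i < retry && delay < cap; ++i)
        delay *= 2;
    return std::min(delay, cap);
}

/// Calls `operation` once, plus up to `max_retries` more times on TransientError.
/// Any other exception (e.g. a schema-validation failure) propagates immediately.
template <typename Operation>
auto retryWithBackoff(
    Operation operation,
    std::size_t max_retries,
    std::chrono::milliseconds initial_backoff,
    const std::function<void(std::chrono::milliseconds)> & sleep
        = [](std::chrono::milliseconds d) { std::this_thread::sleep_for(d); })
{
    const std::chrono::milliseconds cap{10000}; /// 10-second cap from the description
    for (std::size_t retry = 0;; ++retry)
    {
        try
        {
            return operation();
        }
        catch (const TransientError &)
        {
            if (retry >= max_retries)
                throw; /// retry budget exhausted: surface the last error
            sleep(backoffDelay(initial_backoff, retry, cap));
        }
    }
}
```

Passing `max_retries = 0` makes the first transient failure propagate, which matches the "0 disables retries" behavior of the setting described above.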
`clickhouse-client` and `clickhouse-local` now detect whether they were invoked under a known AI coding agent (Claude Code, Cursor, OpenAI Codex, Gemini CLI, Goose, Cline, OpenCode, GitHub Copilot, etc.) by inspecting environment variables only, with no filesystem probing. The detected canonical agent id (e.g. `claude-code`, `cursor`, `gemini-cli`) is stored in a new `ClientInfo::client_agent` field, sent over the TCP protocol (gated on a new protocol revision for backward compatibility), and exposed as a `client_agent` `LowCardinality(String)` column in `system.query_log`, `system.query_thread_log` and `system.processes`.

The motivation is observability: server-side, operators can attribute query traffic to the AI coding agent that produced it, similar to how `clickhousectl` already tags its outbound User-Agent. A generic `AGENT` environment variable is also honored verbatim, and the field is empty when no agent is detected.

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added detection of the AI coding agent (Claude Code, Cursor, Codex, Gemini CLI, Goose, etc.) that invoked `clickhouse-client` or `clickhouse-local`, based on environment variables. The detected agent is reported in the new `client_agent` column of `system.query_log`, `system.query_thread_log` and `system.processes`.

Documentation entry for user-facing changes

`query_log`, `query_thread_log` and `processes`).