Summary
When the operator creates tables on a newly added host, the endpoint list used to discover schemas is scoped to the entire ClickHouseInstallation rather than to the target host's cluster. In a CHI that defines multiple clusters, the operator may pick a host from an unrelated cluster as the query source, which can return an incorrect/incomplete set of CREATE statements for the new node.
The scoping is an old latent design issue (see timeline below); it became reliably reproducible for us after upgrading to 0.26.
Details
HostCreateTables (pkg/model/chi/schemer/schemer.go) calls getReplicatedObjectsSQLs / getDistributedObjectsSQLs. Both use api.ClickHouseInstallation{} as the Names(NameFQDNs, ...) scope:
pkg/model/chi/schemer/replicated.go lines 69, 76, 83
pkg/model/chi/schemer/distributed.go lines 59, 66, 73
With that scope, createFQDNs in pkg/model/chi/namer/name.go routes through createPodFQDNsOfCHI, which walks every host in every cluster of the CHI. QueryUnzip2Columns then calls QueryAny (pkg/model/clickhouse/cluster.go:76), which iterates the endpoints sequentially and returns the first one that answers — including hosts that belong to a different cluster than the one being scaled.
The SQL itself does filter by cluster name via clusterAllReplicas('<target cluster>', system.tables), but the replicated-table query in pkg/model/chi/schemer/sql.go (around line 205) joins against system.databases on the executing node:
FROM clusterAllReplicas('<cluster>', system.tables) tables
LOCAL JOIN system.databases databases on (databases.name = tables.database)
WHERE database NOT IN (...) AND databases.engine IN ('Ordinary','Atomic','Memory','Lazy')
The system.databases join resolves against the local databases of whichever node served the query. If QueryAny picks a node from a different cluster that does not host the same databases (or hosts them with different engines), schema discovery returns the wrong set of CREATE statements.
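To make the effect of the local join concrete, here is a small Go model of the filtering step. It is a sketch under assumptions: `tableRow`, `localJoin`, and the database names are hypothetical, the `NOT IN (...)` exclusion list is left out, and the real work is done by ClickHouse, not by code like this.

```go
package main

import "fmt"

// tableRow is a hypothetical row as returned from clusterAllReplicas(..., system.tables).
type tableRow struct {
	database string
	name     string
}

// allowedEngines mirrors the engine list in the WHERE clause quoted above.
var allowedEngines = map[string]bool{
	"Ordinary": true, "Atomic": true, "Memory": true, "Lazy": true,
}

// localJoin models the inner join against the executing node's system.databases:
// a table survives only if its database exists locally with an allowed engine.
// localDatabases maps database name to engine on the node that runs the query.
func localJoin(tables []tableRow, localDatabases map[string]string) []tableRow {
	var out []tableRow
	for _, t := range tables {
		engine, ok := localDatabases[t.database]
		if ok && allowedEngines[engine] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Tables that exist on the primary cluster's replicas (made-up names).
	tables := []tableRow{{database: "events", name: "clicks"}}

	primaryNode := map[string]string{"events": "Atomic"}
	secondaryNode := map[string]string{"metrics": "Atomic"}

	// Same remote table list, different executing node.
	fmt.Println(len(localJoin(tables, primaryNode)), len(localJoin(tables, secondaryNode))) // prints "1 0"
}
```

The remote side of the query is identical in both calls; only the executing node's database list differs, and that alone drops the table from the result.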
Beyond the LOCAL JOIN behavior, the intent is also off: the endpoints used to discover schemas for cluster X should be hosts in cluster X — not any host in the CHI. shouldCreateReplicatedObjects already uses api.ChiShard{} / api.Cluster{} scopes for related logic.
Why this started showing up after 0.26
Walking the git history on the relevant code:
- Scoping bug is old. CreateFQDNs(host, …ClickHouseInstallation{}…) for schema discovery dates back to at least 2021 (commit 6b946799d, "dev: unify CreateFQDNs function"), which itself only renamed an older CreatePodFQDNsOfCHI(host.GetCHI()) call. Semantically this has always walked all clusters of the CHI.
- LOCAL JOIN form: commit d49187d0b (2024-06, first included in release-0.23.6) switched sqlCreateTableReplicated from a has(subquery, ...) form to LOCAL JOIN system.databases. Both forms filter by the executing node's local databases, but the JOIN form is stricter (inner join), so cross-cluster misses are more visible.
- DNS rework in 0.26: commit 6d625de69 (Feb 2026, first included in release-0.26.0, "dev: rework k8s DNS") added a trailing dot to patternNamespaceDomain (%s.svc.cluster.local.) "to force absolute DNS lookup, avoiding slow search-suffix resolution with ndots:5". Before that, cross-namespace/cross-cluster endpoints in the CHI-wide list could fail DNS resolution fast under common k8s DNS configs, causing QueryAny to fall through and eventually hit a same-cluster host. After 0.26, every CHI endpoint resolves reliably, and QueryAny returns from the first endpoint in the slice, which, with CHI-wide scoping, can easily be a host from a different cluster.
So the 0.26 DNS fix did not introduce the scoping bug; it removed an accidental failure mode that had been masking it.
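For context on the trailing dot, the sketch below models the standard resolver rule behind ndots: a name with a trailing dot is treated as absolute, and otherwise a name with fewer dots than ndots is tried against the search suffixes first. `searchListTriedFirst` and the host name are hypothetical, and this models only that rule, not the operator's DNS code or any particular resolver's failure behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// searchListTriedFirst reports whether a resolver configured with the given
// ndots value would try the search suffixes before the name as written.
// A trailing dot marks the name as absolute, so the search list is skipped.
func searchListTriedFirst(name string, ndots int) bool {
	if strings.HasSuffix(name, ".") {
		return false
	}
	return strings.Count(name, ".") < ndots
}

func main() {
	// Made-up host name following the %s.svc.cluster.local pattern quoted above.
	relative := "chi-demo-primary-0-0.ns.svc.cluster.local"
	absolute := relative + "."

	fmt.Println(searchListTriedFirst(relative, 5), searchListTriedFirst(absolute, 5)) // prints "true false"
}
```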
Reproduction
Define a CHI with two clusters, e.g. a primary data cluster with multiple replicas and a secondary cluster (a small analytics or monitoring cluster) with a different set of databases / table engines. Create tables only on the primary cluster. Scale the primary cluster by adding a replica. Depending on which endpoint QueryAny selects, schema discovery for the new replica may run from the secondary cluster's node and return a result filtered by the secondary cluster's local system.databases.
Symptoms:
- Missing or inconsistent tables on the newly added replica relative to existing replicas in the same cluster.
- CREATE statements that correspond to the wrong cluster's schema context.
Proposed fix
Change Names(NameFQDNs, host, api.ClickHouseInstallation{}, false) to Names(NameFQDNs, host, api.Cluster{}, false) in both replicated.go and distributed.go. That scopes schema-discovery endpoints to the target host's cluster, matching the existing intent of these functions and the scoping already used by shouldCreateReplicatedObjects / shouldCreateDistributedObjects.
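The difference between the two scopes can be illustrated with a small self-contained model. Everything here is hypothetical (`host`, `scope`, `discoveryEndpoints`, the host names); it does not reproduce the namer package, and it ignores the final boolean argument of Names, whose meaning is not covered in this issue.

```go
package main

import "fmt"

// host is a hypothetical stand-in for a CHI host and the cluster it belongs to.
type host struct {
	fqdn    string
	cluster string
}

// scope is a hypothetical stand-in for the scope argument
// (api.ClickHouseInstallation{} versus api.Cluster{}).
type scope int

const (
	scopeCHI scope = iota
	scopeCluster
)

// discoveryEndpoints returns the FQDNs that would be offered as query sources
// for schema discovery on target, under the given scope.
func discoveryEndpoints(all []host, target host, s scope) []string {
	var fqdns []string
	for _, h := range all {
		if s == scopeCluster && h.cluster != target.cluster {
			continue // cluster scope: skip hosts from other clusters
		}
		fqdns = append(fqdns, h.fqdn)
	}
	return fqdns
}

func main() {
	all := []host{
		{fqdn: "chi-demo-secondary-0-0", cluster: "secondary"},
		{fqdn: "chi-demo-primary-0-0", cluster: "primary"},
		{fqdn: "chi-demo-primary-0-1", cluster: "primary"},
	}
	newReplica := all[2]

	fmt.Println("CHI scope:    ", discoveryEndpoints(all, newReplica, scopeCHI))
	fmt.Println("cluster scope:", discoveryEndpoints(all, newReplica, scopeCluster))
}
```

Under cluster scope the secondary host never enters the candidate list, so a first-responder selection cannot land on it regardless of ordering or DNS behavior.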
PR: #1965