HostCreateTables uses CHI-wide endpoints for schema discovery, leaks schemas across clusters in multi-cluster CHI #1964

@lukas-pfannschmidt-tr

Summary

When the operator creates tables on a newly added host, the endpoint list used to discover schemas is scoped to the entire ClickHouseInstallation rather than to the target host's cluster. In a CHI that defines multiple clusters, the operator may pick a host from an unrelated cluster as the query source, which can return an incorrect or incomplete set of CREATE statements for the new node.

The scoping is an old latent design issue (see timeline below); it became reliably reproducible for us after upgrading to 0.26.

Details

HostCreateTables (pkg/model/chi/schemer/schemer.go) calls getReplicatedObjectsSQLs / getDistributedObjectsSQLs. Both use api.ClickHouseInstallation{} as the Names(NameFQDNs, ...) scope:

  • pkg/model/chi/schemer/replicated.go lines 69, 76, 83
  • pkg/model/chi/schemer/distributed.go lines 59, 66, 73

With that scope, createFQDNs in pkg/model/chi/namer/name.go routes through createPodFQDNsOfCHI, which walks every host in every cluster of the CHI. QueryUnzip2Columns then calls QueryAny (pkg/model/clickhouse/cluster.go:76), which iterates the endpoints sequentially and returns the first one that answers — including hosts that belong to a different cluster than the one being scaled.
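The selection behavior described above can be sketched in a few lines of Go. This is a minimal model, not the operator's code: the `endpoint` type, the `up` flag, and the host names are hypothetical, and `queryAny` here only imitates the "first endpoint that answers wins" order attributed to QueryAny.

```go
package main

import (
	"errors"
	"fmt"
)

// endpoint is a hypothetical stand-in for one host FQDN in the endpoint list.
type endpoint struct {
	fqdn    string
	cluster string
	up      bool // whether the host answers queries in this simulation
}

// queryAny imitates a sequential scan that returns the first endpoint that
// answers. It models the selection order only, not the real QueryAny.
func queryAny(endpoints []endpoint) (endpoint, error) {
	for _, ep := range endpoints {
		if ep.up {
			return ep, nil
		}
	}
	return endpoint{}, errors.New("no endpoint answered")
}

func main() {
	// CHI-wide list: the monitoring cluster's host happens to come first.
	chiWide := []endpoint{
		{fqdn: "chi-demo-monitoring-0-0", cluster: "monitoring", up: true},
		{fqdn: "chi-demo-primary-0-0", cluster: "primary", up: true},
		{fqdn: "chi-demo-primary-0-1", cluster: "primary", up: true},
	}

	src, err := queryAny(chiWide)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Scaling "primary", yet the query source belongs to "monitoring".
	fmt.Printf("query source: %s (cluster %s)\n", src.fqdn, src.cluster)
}
```

Nothing in the scan looks at cluster membership, so the outcome depends entirely on the order and scope of the list it is given.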

The SQL itself does filter by cluster name via clusterAllReplicas('<target cluster>', system.tables), but the replicated-table query in pkg/model/chi/schemer/sql.go (around line 205) joins against system.databases on the executing node:

FROM clusterAllReplicas('<cluster>', system.tables) tables
LOCAL JOIN system.databases databases on (databases.name = tables.database)
WHERE database NOT IN (...) AND databases.engine IN ('Ordinary','Atomic','Memory','Lazy')

The system.databases join resolves against the local databases of whichever node served the query. If QueryAny picks a node from a different cluster that does not host the same databases (or hosts them with different engines), schema discovery returns the wrong set of CREATE statements.
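To make the effect of the join concrete, here is a small Go simulation of the filter. It is a sketch under simplifying assumptions: the `table` type, the `discover` function, and the database and table names are hypothetical; the cluster-wide table list stands in for the clusterAllReplicas result and the map stands in for system.databases on the executing node.

```go
package main

import (
	"fmt"
	"sort"
)

// table is a simplified row of system.tables as seen across the target cluster.
type table struct {
	database string
	name     string
}

// discover models the inner join: a table survives only if its database
// exists on the executing node with one of the allowed engines.
func discover(clusterTables []table, localDatabases map[string]string) []string {
	allowed := map[string]bool{"Ordinary": true, "Atomic": true, "Memory": true, "Lazy": true}
	var out []string
	for _, t := range clusterTables {
		engine, ok := localDatabases[t.database]
		if ok && allowed[engine] {
			out = append(out, t.database+"."+t.name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Tables that exist on the primary cluster.
	clusterTables := []table{
		{database: "events", name: "clicks"},
		{database: "events", name: "views"},
		{database: "billing", name: "invoices"},
	}

	// Local databases (name -> engine) on two possible executing nodes.
	primaryNode := map[string]string{"events": "Atomic", "billing": "Atomic"}
	monitoringNode := map[string]string{"events": "Atomic", "metrics": "Atomic"}

	fmt.Println("served by primary node:   ", discover(clusterTables, primaryNode))
	fmt.Println("served by monitoring node:", discover(clusterTables, monitoringNode))
}
```

The same cluster-wide table list yields a shorter result when the executing node lacks the billing database, which is the kind of incomplete discovery described above.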

Beyond the LOCAL JOIN behavior, the intent is also off: the endpoints used to discover schemas for cluster X should be hosts in cluster X — not any host in the CHI. shouldCreateReplicatedObjects already uses api.ChiShard{} / api.Cluster{} scopes for related logic.

Why this started showing up after 0.26

Walking the git history on the relevant code:

  • Scoping bug is old. CreateFQDNs(host, …ClickHouseInstallation{}…) for schema discovery dates back to at least 2021 (commit 6b946799d, "dev: unify CreateFQDNs function"), which itself only renamed an older CreatePodFQDNsOfCHI(host.GetCHI()) call. Semantically this has always walked all clusters of the CHI.
  • LOCAL JOIN form: commit d49187d0b (2024-06, first included in release-0.23.6) switched sqlCreateTableReplicated from a has(subquery, ...) form to LOCAL JOIN system.databases. Both forms filter by the executing node's local databases, but the JOIN form is stricter (it is an inner join), so cross-cluster misses are more visible.
  • DNS rework in 0.26: commit 6d625de69 (Feb 2026, first included in release-0.26.0, "dev: rework k8s DNS") added a trailing dot to patternNamespaceDomain (%s.svc.cluster.local.) "to force absolute DNS lookup, avoiding slow search-suffix resolution with ndots:5". Before that, cross-namespace/cross-cluster endpoints in the CHI-wide list could fail DNS resolution fast under common k8s DNS configs, causing QueryAny to fall through and eventually hit a same-cluster host. After 0.26, every CHI endpoint resolves reliably, and QueryAny happily returns from the first endpoint in the slice — which, with CHI-wide scoping, can easily be a host from a different cluster.

So the 0.26 DNS fix did not introduce the scoping bug; it removed an accidental failure mode that had been masking it.
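For readers unfamiliar with the trailing-dot change, the following Go sketch models the candidate order a stub resolver following resolv.conf(5) semantics would try for a name. It is an illustration only: `lookupOrder`, the search list, and the host names are assumptions, and the resolver actually used by the operator may behave differently in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupOrder returns the fully qualified candidates a resolv.conf-style stub
// resolver would try, in order. Simplified model for illustration.
func lookupOrder(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as absolute: the search list is skipped.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	var withSearch []string
	for _, suffix := range search {
		withSearch = append(withSearch, name+"."+suffix+".")
	}
	absolute := name + "."
	if strings.Count(name, ".") >= ndots {
		// Enough dots: try the name as given first, then the search list.
		return append([]string{absolute}, withSearch...)
	}
	// Too few dots: walk the search list first, the name as given last.
	return append(withSearch, absolute)
}

func main() {
	// Typical pod search list for a pod in namespace "ns-a" (assumed values).
	search := []string{"ns-a.svc.cluster.local", "svc.cluster.local", "cluster.local"}

	relative := "chi-demo-primary-0-0.ns-b.svc.cluster.local"
	for _, n := range []string{relative, relative + "."} {
		fmt.Printf("%q -> %d candidate(s)\n", n, len(lookupOrder(n, search, 5)))
		for _, c := range lookupOrder(n, search, 5) {
			fmt.Println("   ", c)
		}
	}
}
```

With ndots:5 the four-dot name goes through every search suffix before it is tried as-is, while the trailing-dot form is looked up exactly once.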

Reproduction

  • Define a CHI with two clusters, e.g. a primary data cluster with multiple replicas and a secondary cluster (a small analytics or monitoring cluster) with a different set of databases / table engines.
  • Create tables only on the primary cluster.
  • Scale the primary cluster by adding a replica.

Depending on which endpoint QueryAny selects, schema discovery for the new replica may run from the secondary cluster's node and return a result filtered by the secondary cluster's local system.databases.

Symptoms:

  • Missing or inconsistent tables on the newly added replica relative to existing replicas in the same cluster.
  • CREATE statements that correspond to the wrong cluster's schema context.

Proposed fix

Change Names(NameFQDNs, host, api.ClickHouseInstallation{}, false) to Names(NameFQDNs, host, api.Cluster{}, false) in both replicated.go and distributed.go. That scopes schema-discovery endpoints to the target host's cluster, matching the existing intent of these functions and the scoping already used by shouldCreateReplicatedObjects / shouldCreateDistributedObjects.
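The difference between the two scopes can be illustrated with a simplified model. The types and functions below (`cluster`, `installation`, `fqdnsOfCHI`, `fqdnsOfCluster`) are hypothetical and do not mirror the operator's namer API; they only show which hosts end up in the endpoint list under each scope.

```go
package main

import "fmt"

// cluster and installation are hypothetical, simplified stand-ins for the
// operator's API types; they carry only what this sketch needs.
type cluster struct {
	name  string
	hosts []string // host FQDNs
}

type installation struct {
	clusters []cluster
}

// fqdnsOfCHI walks every host in every cluster (the CHI-wide scope).
func fqdnsOfCHI(chi installation) []string {
	var out []string
	for _, c := range chi.clusters {
		out = append(out, c.hosts...)
	}
	return out
}

// fqdnsOfCluster returns only the hosts of the named cluster (the proposed scope).
func fqdnsOfCluster(chi installation, name string) []string {
	for _, c := range chi.clusters {
		if c.name == name {
			// Copy so callers cannot mutate the installation's slice.
			return append([]string(nil), c.hosts...)
		}
	}
	return nil
}

func main() {
	chi := installation{clusters: []cluster{
		{name: "monitoring", hosts: []string{"chi-demo-monitoring-0-0"}},
		{name: "primary", hosts: []string{"chi-demo-primary-0-0", "chi-demo-primary-0-1"}},
	}}

	fmt.Println("CHI scope:    ", fqdnsOfCHI(chi))
	fmt.Println("cluster scope:", fqdnsOfCluster(chi, "primary"))
}
```

With the cluster scope, a first-answer-wins scan can only ever land on a host that shares the target cluster's databases, regardless of endpoint order.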

PR: #1965
