HostCreateTables uses CHI-wide endpoints for schema discovery, leaks schemas across clusters in multi-cluster CHI

## Summary

When the operator creates tables on a newly added host, the endpoint list used to discover schemas is scoped to the entire `ClickHouseInstallation` rather than to the target host's cluster. In a CHI that defines multiple `clusters`, the operator may pick a host from an unrelated cluster as the query source, which can return an incorrect or incomplete set of CREATE statements for the new node.

The scoping is an old latent design issue (see timeline below); it became reliably reproducible for us after upgrading to `0.26`.

## Details

`HostCreateTables` (`pkg/model/chi/schemer/schemer.go`) calls `getReplicatedObjectsSQLs` / `getDistributedObjectsSQLs`. Both use `api.ClickHouseInstallation{}` as the `Names(NameFQDNs, ...)` scope:

- `pkg/model/chi/schemer/replicated.go` lines 69, 76, 83
- `pkg/model/chi/schemer/distributed.go` lines 59, 66, 73

With that scope, `createFQDNs` in `pkg/model/chi/namer/name.go` routes through `createPodFQDNsOfCHI`, which walks every host in every cluster of the CHI. `QueryUnzip2Columns` then calls `QueryAny` (`pkg/model/clickhouse/cluster.go:76`), which iterates the endpoints sequentially and returns the result from the first one that answers. That endpoint can be a host that belongs to a different cluster than the one being scaled.
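The selection behavior described above can be modeled in a few lines. This is a minimal sketch, not the operator's code: the `endpoint` type, the `queryAny` function, and the host names are all hypothetical stand-ins, and "answers" is reduced to a boolean flag.

```go
package main

import (
	"errors"
	"fmt"
)

// endpoint is a hypothetical stand-in for one host in the CHI-wide list.
type endpoint struct {
	FQDN    string
	Cluster string
	Up      bool // whether this host would answer the query
}

// queryAny models "iterate sequentially, first endpoint that answers wins".
// It is an illustration only, not the operator's QueryAny.
func queryAny(endpoints []endpoint) (endpoint, error) {
	for _, ep := range endpoints {
		if ep.Up {
			return ep, nil
		}
	}
	return endpoint{}, errors.New("no endpoint answered")
}

func main() {
	// A CHI-wide list: hosts of every cluster, in CHI order (names are made up).
	chiWide := []endpoint{
		{"chi-demo-monitoring-0-0", "monitoring", true},
		{"chi-demo-primary-0-0", "primary", true},
		{"chi-demo-primary-0-1", "primary", true},
	}

	served, err := queryAny(chiWide)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Even when the host being added is in "primary", the query is served
	// by whichever endpoint comes first and is reachable.
	fmt.Printf("schema query served by %s (cluster %s)\n", served.FQDN, served.Cluster)
}
```

In this model nothing ties the serving endpoint to the target cluster; the outcome depends only on list order and reachability.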

The SQL itself does filter by cluster name via `clusterAllReplicas('<target cluster>', system.tables)`, but the replicated-table query in `pkg/model/chi/schemer/sql.go` (around line 205) joins against `system.databases` on the executing node:

```sql
FROM clusterAllReplicas('<cluster>', system.tables) tables
LOCAL JOIN system.databases databases on (databases.name = tables.database)
WHERE database NOT IN (...) AND databases.engine IN ('Ordinary','Atomic','Memory','Lazy')
```

The `system.databases` join resolves against the local databases of whichever node served the query. If `QueryAny` picks a node from a different cluster that does not host the same databases (or hosts them with different engines), schema discovery returns the wrong set of CREATE statements.
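To make the effect of the join concrete, here is a small simulation, written as a sketch under stated assumptions. The `table` type, the `discover` function, and the database names are hypothetical; the `NOT IN (...)` exclusion list from the query is not modeled. It only emulates the described behavior: rows from the cluster-wide `system.tables` view survive if the executing node has a matching local database with an allowed engine.

```go
package main

import (
	"fmt"
	"sort"
)

// table is a hypothetical row of clusterAllReplicas('<cluster>', system.tables).
type table struct {
	Database string
	Name     string
}

// allowedEngines mirrors the engine filter in the query's WHERE clause.
var allowedEngines = map[string]bool{
	"Ordinary": true, "Atomic": true, "Memory": true, "Lazy": true,
}

// discover emulates the inner join against the executing node's local
// system.databases (modeled as database name -> engine).
func discover(clusterTables []table, localDatabases map[string]string) []string {
	var out []string
	for _, t := range clusterTables {
		engine, ok := localDatabases[t.Database]
		if ok && allowedEngines[engine] {
			out = append(out, t.Database+"."+t.Name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Tables that exist on the target (primary) cluster.
	clusterTables := []table{{"events", "raw"}, {"events", "daily"}, {"metrics", "cpu"}}

	// Local databases as seen by two different executing nodes.
	primaryNode := map[string]string{"events": "Atomic", "metrics": "Atomic"}
	monitoringNode := map[string]string{"metrics": "Atomic"}

	fmt.Println("served by primary node:   ", discover(clusterTables, primaryNode))
	fmt.Println("served by monitoring node:", discover(clusterTables, monitoringNode))
}
```

The same cluster-wide table list yields a shorter result when the executing node lacks the `events` database, which is the kind of incomplete discovery described above.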

Independently of the LOCAL JOIN behavior, the scope itself does not match the intent: the endpoints used to discover schemas for cluster X should be hosts in cluster X, not any host in the CHI. `shouldCreateReplicatedObjects` already uses `api.ChiShard{}` / `api.Cluster{}` scopes for related logic.

## Why this started showing up after 0.26

Walking the git history on the relevant code:

- **Scoping bug is old.** `CreateFQDNs(host, …ClickHouseInstallation{}…)` for schema discovery dates back to at least 2021 (commit `6b946799d`, "dev: unify CreateFQDNs function"), which itself only renamed an older `CreatePodFQDNsOfCHI(host.GetCHI())` call. Semantically this has always walked all clusters of the CHI.
- **LOCAL JOIN form**: commit `d49187d0b` (2024-06, first included in `release-0.23.6`) switched `sqlCreateTableReplicated` from a `has(subquery, ...)` form to `LOCAL JOIN system.databases`. Both forms filter by the executing node's local databases, but the JOIN form is stricter (inner join), so cross-cluster misses are more visible.
- **DNS rework in 0.26**: commit `6d625de69` (Feb 2026, first included in `release-0.26.0`, "dev: rework k8s DNS") added a trailing dot to `patternNamespaceDomain` (`%s.svc.cluster.local.`) "to force absolute DNS lookup, avoiding slow search-suffix resolution with ndots:5". Before that, cross-namespace/cross-cluster endpoints in the CHI-wide list could fail DNS resolution fast under common k8s DNS configs, causing `QueryAny` to fall through and eventually hit a same-cluster host. After 0.26, every CHI endpoint resolves reliably, so `QueryAny` returns the result from the first endpoint in the slice. With CHI-wide scoping, that endpoint can easily be a host from a different cluster.

So the 0.26 DNS fix did not introduce the scoping bug. It removed an accidental failure mode that had been masking it.

## Reproduction

Define a CHI with two clusters, e.g. a primary data cluster with multiple replicas and a secondary cluster (a small analytics or monitoring cluster) with a different set of databases / table engines. Create tables only on the primary cluster. Scale the primary cluster by adding a replica. Depending on which endpoint `QueryAny` selects, schema discovery for the new replica may run from the secondary cluster's node and return a result filtered by the secondary cluster's local `system.databases`.

Symptoms:
- Missing or inconsistent tables on the newly added replica relative to existing replicas in the same cluster.
- CREATE statements that correspond to the wrong cluster's schema context.

## Proposed fix

Change `Names(NameFQDNs, host, api.ClickHouseInstallation{}, false)` to `Names(NameFQDNs, host, api.Cluster{}, false)` in both `replicated.go` and `distributed.go`. That scopes schema-discovery endpoints to the target host's cluster, matching the existing intent of these functions and the scoping already used by `shouldCreateReplicatedObjects` / `shouldCreateDistributedObjects`.
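The intended effect of the narrower scope can be sketched as a filter over the host list. This is an illustration only: the `host` type and `fqdnsOfCluster` are hypothetical and do not reflect how `Names` or `api.Cluster{}` are implemented, and the meaning of the trailing `false` argument is not modeled here.

```go
package main

import "fmt"

// host is a hypothetical stand-in for a CHI host and the cluster it belongs to.
type host struct {
	FQDN    string
	Cluster string
}

// fqdnsOfCluster keeps only the FQDNs of hosts in the target host's cluster,
// which is the effect the cluster-level scope is meant to have.
func fqdnsOfCluster(all []host, target host) []string {
	var out []string
	for _, h := range all {
		if h.Cluster == target.Cluster {
			out = append(out, h.FQDN)
		}
	}
	return out
}

func main() {
	// Existing hosts of the CHI (names are made up).
	all := []host{
		{"chi-demo-monitoring-0-0", "monitoring"},
		{"chi-demo-primary-0-0", "primary"},
		{"chi-demo-primary-0-1", "primary"},
	}
	// The replica being added to the primary cluster.
	newReplica := host{"chi-demo-primary-0-2", "primary"}

	// Only primary-cluster hosts remain as schema-discovery candidates.
	fmt.Println(fqdnsOfCluster(all, newReplica))
}
```

With the candidate list restricted this way, a first-responder selection can only land on a host from the cluster being scaled.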

PR: #1965
