roachtest: introduce testCluster interface for pluggable cluster backends #169382

Open
golgeek wants to merge 5 commits into cockroachdb:master from golgeek:ludo/rt-prepare-alternative-clusters-backend

@golgeek golgeek commented Apr 29, 2026

Previously, the roachtest runner was tightly coupled to *clusterImpl (the roachprod-backed cluster implementation). The runner, cluster registry, github reporter, and test monitor all accessed clusterImpl fields directly, making it impossible to plug in an alternative cluster backend without modifying runner internals.

This was inadequate because upcoming work to support managed-service clusters (e.g. CockroachDB Cloud) requires the runner to operate against clusters it does not provision via roachprod.

This patch introduces testCluster, an internal interface that captures the runner's view of a cluster, and threads it through all runner plumbing in order to decouple the runner from the concrete roachprod implementation. The early commits define the interface and mechanically replace *clusterImpl with testCluster across the runner, registry, github reporter, and test monitor. Later commits clean up remaining concrete dependencies: the post-validation consistency check is inlined in the runner, the cluster factory is split into a public entry point and a roachprod-specific implementation, and the last type assertion against *clusterImpl in test helpers is replaced with an interface-based accessor. clusterImpl continues to be the only implementation; no behavior changes.

Epic: none
Release note: None

golgeek added 2 commits April 29, 2026 13:59
Introduce `testCluster`, an internal interface that captures the
runner's view of a cluster. The runner currently accesses many
`clusterImpl` fields directly; the interface forces these accesses
through methods so that alternative cluster backends can be plugged
in without touching runner internals.

The interface embeds `cluster.Cluster` (the public test API) and adds
the lifecycle and configuration methods the runner needs: destroy,
save, wipe, cockroach staging, labels, and architecture/encryption
knobs. `clusterImpl` satisfies the new interface; no behavior changes.

Epic: none
Release note: None

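The shape described in this commit can be sketched roughly as below. This is a minimal, self-contained illustration, not the interface from the patch: the `Cluster` stand-in for `cluster.Cluster`, the method set, and every signature are assumptions chosen to keep the example short.

```go
package main

import "fmt"

// Cluster stands in for the public test API (cluster.Cluster in the PR).
// Its single method is hypothetical.
type Cluster interface {
	Name() string
}

// testCluster is the runner's view of a cluster: the public API plus the
// lifecycle and configuration methods the runner needs. The methods listed
// here are illustrative, not the real interface surface.
type testCluster interface {
	Cluster
	Destroy() error
	Arch() string
	EncryptedAtRest() bool
}

// clusterImpl plays the role of the roachprod-backed implementation.
type clusterImpl struct {
	name      string
	arch      string
	encAtRest bool
	destroyed bool
}

func (c *clusterImpl) Name() string          { return c.name }
func (c *clusterImpl) Destroy() error        { c.destroyed = true; return nil }
func (c *clusterImpl) Arch() string          { return c.arch }
func (c *clusterImpl) EncryptedAtRest() bool { return c.encAtRest }

// Compile-time check that *clusterImpl satisfies testCluster.
var _ testCluster = (*clusterImpl)(nil)

// describe is runner-side code: it sees only the interface, never the fields.
func describe(c testCluster) string {
	return fmt.Sprintf("%s (arch=%s, encrypted=%t)", c.Name(), c.Arch(), c.EncryptedAtRest())
}

func main() {
	c := &clusterImpl{name: "local", arch: "amd64"}
	fmt.Println(describe(c))
}
```

Because `describe` accepts the interface, a second backend would only need to provide the same methods; nothing on the runner side would change.
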
Replace `*clusterImpl` with `testCluster` throughout the runner,
cluster registry, github issue reporting, and test monitor. All
direct field accesses (e.g. `c.arch`, `c.encAtRest`,
`c.clusterSettings`) are replaced with the corresponding interface
methods introduced in the previous commit.

A few methods that were missing from the interface surface are added
here: `OS()`, `SetUseDRPC()`, `GetLiveMigrationVMs()`, `Extend()`,
`Saved()`, `status()`, and `SetClusterSetting()`.

In `github.go`, a nil-interface guard (`isNilTestCluster`) is added
because the cluster may be nil when constructing issue info for tests
that failed before cluster creation.

Epic: none
Release note: None

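The nil-interface guard mentioned in this commit addresses a well-known Go pitfall: an interface value holding a nil `*clusterImpl` does not compare equal to `nil`. The sketch below shows one way such a guard could be written. Only the name `isNilTestCluster` comes from the commit message; the reflection-based body, the one-method `testCluster`, and the `clusterNameForIssue` helper are assumptions for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// testCluster is reduced to one hypothetical method for this example.
type testCluster interface {
	Name() string
}

type clusterImpl struct{ name string }

func (c *clusterImpl) Name() string { return c.name }

// isNilTestCluster reports whether c is a nil interface or an interface
// wrapping a nil pointer. The implementation is an assumption; the patch
// may implement the guard differently.
func isNilTestCluster(c testCluster) bool {
	if c == nil {
		return true
	}
	v := reflect.ValueOf(c)
	return v.Kind() == reflect.Ptr && v.IsNil()
}

// clusterNameForIssue is a hypothetical caller that must tolerate tests
// which failed before a cluster was created.
func clusterNameForIssue(c testCluster) string {
	if isNilTestCluster(c) {
		return "<no cluster>"
	}
	return c.Name()
}

func main() {
	var missing *clusterImpl
	var c testCluster = missing // typed nil stored in an interface

	fmt.Println(c == nil, isNilTestCluster(c)) // prints "false true"
	fmt.Println(clusterNameForIssue(c))
	fmt.Println(clusterNameForIssue(&clusterImpl{name: "demo"}))
}
```

A plain `c == nil` check was sufficient while the parameter was a `*clusterImpl`; once the parameter is an interface, the same check silently stops catching the typed-nil case.
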
@golgeek golgeek requested a review from a team as a code owner April 29, 2026 21:42
@golgeek golgeek requested review from cpj2195 and shailendra-patel and removed request for a team April 29, 2026 21:42

trunk-io Bot commented Apr 29, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

golgeek added 3 commits April 29, 2026 18:23
Remove `assertConsistentReplicas` entirely: drop it from the
`testCluster` interface and delete the implementation on `clusterImpl`.
The method's only caller in `postTestAssertions` now uses
`roachtestutil.CheckReplicaDivergenceOnDB` directly, wrapped in a
20-minute timeout. Behavior is unchanged.

Epic: none
Release note: None

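Wrapping a check in a timeout at the call site, as this commit does for the replica divergence check, typically looks like the sketch below. The `checkFn` type, `runWithTimeout`, and `slowCheck` are hypothetical stand-ins; the real call is `roachtestutil.CheckReplicaDivergenceOnDB`, whose signature is not shown in this PR and is not reproduced here. The durations are shortened from 20 minutes to milliseconds so the example runs quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// checkFn stands in for a consistency check that honors context
// cancellation. It is hypothetical.
type checkFn func(ctx context.Context) error

// runWithTimeout runs check under a deadline derived from ctx.
func runWithTimeout(ctx context.Context, timeout time.Duration, check checkFn) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel() // release the timer even if check returns early
	return check(ctx)
}

// slowCheck simulates a check that needs d to finish unless ctx ends first.
func slowCheck(d time.Duration) checkFn {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// timedOut reports whether err was caused by the deadline expiring.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// demo runs one check that fits within its timeout and one that does not.
func demo() (fastErr, slowErr error) {
	ctx := context.Background()
	fastErr = runWithTimeout(ctx, time.Second, slowCheck(time.Millisecond))
	slowErr = runWithTimeout(ctx, 10*time.Millisecond, slowCheck(time.Second))
	return fastErr, slowErr
}

func main() {
	fastErr, slowErr := demo()
	fmt.Println("fast check error:", fastErr)
	fmt.Println("slow check timed out:", timedOut(slowErr))
}
```

Keeping the timeout in the caller rather than in a cluster method means the check no longer has to be part of the `testCluster` interface.
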
Separate the cluster factory's public entry point (`newCluster`) from
the roachprod-specific implementation (`newRoachprodCluster`). This
lets a future managed-service backend provide its own `newCluster`
without modifying the roachprod path.

Introduce the type alias `roachprodCluster = clusterImpl` so that the
roachprod backend can be referenced by a self-documenting name. Rename
the `roachprodCluster` enum constant to `roachprodClusterType` to
avoid the name collision.

Epic: none
Release note: None

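The factory split and the type alias from this commit can be illustrated as follows. The names `newCluster`, `newRoachprodCluster`, `roachprodCluster`, `clusterImpl`, and `roachprodClusterType` come from the commit message; the `clusterKind` type, the dispatch on it, the function signatures, and the `Backend` method are assumptions made so the sketch compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// testCluster is reduced to one hypothetical method for this example.
type testCluster interface {
	Backend() string
}

type clusterImpl struct{ name string }

func (c *clusterImpl) Backend() string { return "roachprod" }

// roachprodCluster is an alias, not a new type: it shares clusterImpl's
// identity and method set.
type roachprodCluster = clusterImpl

// clusterKind is a hypothetical enum. The constant carries the "Type"
// suffix so that it does not collide with the alias above.
type clusterKind int

const roachprodClusterType clusterKind = iota

// newRoachprodCluster is the backend-specific constructor.
func newRoachprodCluster(name string) (*roachprodCluster, error) {
	if name == "" {
		return nil, errors.New("cluster name must not be empty")
	}
	return &roachprodCluster{name: name}, nil
}

// newCluster is the backend-agnostic entry point. Dispatching on a kind
// value is an assumption about how a second backend might be selected.
func newCluster(kind clusterKind, name string) (testCluster, error) {
	switch kind {
	case roachprodClusterType:
		c, err := newRoachprodCluster(name)
		if err != nil {
			// Return an untyped nil so callers never see a typed-nil interface.
			return nil, err
		}
		return c, nil
	default:
		return nil, fmt.Errorf("unsupported cluster kind %d", kind)
	}
}

func main() {
	c, err := newCluster(roachprodClusterType, "demo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	_, isImpl := c.(*clusterImpl) // the alias and the original are one type
	fmt.Println(c.Backend(), isImpl)
}
```

Since an alias introduces no new type, existing code that refers to `*clusterImpl` keeps compiling while new code can use the more descriptive name.
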
Replace a `c.(*clusterImpl).encAtRest` type assertion in the test
helpers with an interface-based accessor
(`c.(interface{ EncryptedAtRest() bool })`). This removes the last
direct reference to `*clusterImpl` outside the cluster package,
allowing tests to work with any `testCluster` implementation.

Epic: none
Release note: None

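The difference between asserting on the concrete type and asserting on an anonymous interface can be sketched as below. The accessor name `EncryptedAtRest` and the `encAtRest` field come from the commit message; `managedCluster`, the `encryptedAtRest` helper, and the one-method `testCluster` are hypothetical. The sketch uses the two-value form of the assertion so a backend without the accessor yields `false` instead of a panic, which may differ from what the patch does.

```go
package main

import "fmt"

// testCluster is reduced to one hypothetical method for this example.
type testCluster interface {
	Name() string
}

type clusterImpl struct {
	name      string
	encAtRest bool
}

func (c *clusterImpl) Name() string          { return c.name }
func (c *clusterImpl) EncryptedAtRest() bool { return c.encAtRest }

// managedCluster is a hypothetical second backend that does not expose
// the encryption accessor.
type managedCluster struct{ name string }

func (m *managedCluster) Name() string { return m.name }

// encryptedAtRest asks for the capability rather than the concrete type,
// so it works with any backend that offers EncryptedAtRest.
func encryptedAtRest(c testCluster) bool {
	e, ok := c.(interface{ EncryptedAtRest() bool })
	return ok && e.EncryptedAtRest()
}

func main() {
	fmt.Println(encryptedAtRest(&clusterImpl{name: "a", encAtRest: true})) // true
	fmt.Println(encryptedAtRest(&clusterImpl{name: "b"}))                  // false
	fmt.Println(encryptedAtRest(&managedCluster{name: "c"}))               // false
}
```
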
@golgeek golgeek force-pushed the ludo/rt-prepare-alternative-clusters-backend branch from b41b757 to 03aba12 Compare April 29, 2026 22:24

golgeek commented Apr 30, 2026

TeamCity smoke tests (ran on the PR stacked over this one):
