
feat(hetzner): floating-IP config surface + stable API endpoint wiring for Talos × Hetzner #5719

Merged
devantler merged 2 commits into main from claude/hetzner-floating-ip-config
Jul 2, 2026

Conversation

@devantler

Contributor

🤖 Generated by the Daily AI Assistant

Fixes #5714 — slice 2 of the stable-API-endpoint lane (devantler-tech/platform#2120, maintainer-confirmed option A; slice 1 = the provider primitives, merged via #5699).

What

  • Config surface (OptionsHetzner, flat fields per the PlacementGroup* precedent): floatingIPEnabled (default false) + floatingIPLocation (defaults to the cluster's location, applied in applyHetznerDefaults).
  • Wiring (Talos × Hetzner create path): when enabled, updateConfigsWithEndpoint ensures the cluster-owned floating IP, attaches it to the first control-plane server, and renders it as the cluster endpoint via the existing PKI-preserving WithEndpoint + WithCertSANs seams. The SAN set = floating IP plus every control-plane node IP — the node IPs must stay verifiable because KSail's own readiness checks (and break-glass access) dial nodes directly.
  • Disabled ⇒ byte-identical behaviour: endpoint stays the first CP's reachable IP, no SAN patch, no Hetzner API calls (covered by tests).
  • Generated files: schemas/ksail-config.schema.json (gen_schema) + declarative-configuration.mdx (go generate ./docs/...) — regenerated, not hand-edited.
  • Registered the new string field on the env-var drift test's provider-infra skip list (consumed verbatim, like location).
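
The endpoint and SAN rules described above can be sketched as a small pure function. This is a minimal illustration, not the PR's code: `endpointAndSANs` and its parameters are hypothetical names, and the IPs are documentation-range placeholders. An empty `floatingIP` stands in for the disabled path.

```go
package main

import "fmt"

// endpointAndSANs is a hypothetical helper written for illustration only.
// nodeIPs holds the control-plane node IPs in server order; floatingIP is
// empty when the feature is disabled.
func endpointAndSANs(floatingIP string, nodeIPs []string) (string, []string, error) {
	if len(nodeIPs) == 0 {
		return "", nil, fmt.Errorf("no control-plane nodes")
	}

	if floatingIP == "" {
		// Disabled path: the endpoint is the first control-plane IP and
		// no SAN patch is produced.
		return nodeIPs[0], nil, nil
	}

	// Enabled path: the floating IP comes first, followed by every
	// control-plane node IP so direct node dials stay verifiable.
	sans := make([]string, 0, len(nodeIPs)+1)
	sans = append(sans, floatingIP)
	sans = append(sans, nodeIPs...)

	return floatingIP, sans, nil
}

func main() {
	endpoint, sans, err := endpointAndSANs(
		"203.0.113.10",
		[]string{"198.51.100.1", "198.51.100.2"},
	)
	if err != nil {
		panic(err)
	}

	fmt.Println(endpoint, sans)
}
```

Keeping the selection logic free of API calls is one way to make the "disabled means no Hetzner traffic" property easy to test in isolation.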

Flags for steering

  • Known limitation (deliberate slice boundary): the floating IP is attached and rendered, but no node answers on it until a Talos VIP block exists. Node-side ownership handover is tracked in #5718, "feat(hetzner): generate the Talos VIP machine-config for the floating-IP endpoint (node-side ownership handover)", which also holds the deviceSelector-vs-name and token-delivery design decisions. Until it lands, enabled clusters need a user-provided machine.network VIP patch; the schema description and field doc say so explicitly. Because the feature is opt-in and off by default, existing clusters are unaffected.
  • JSON casing: floatingIPEnabled/floatingIPLocation (not floatingIp…) to match the sibling workerPublicIPv4/controlPlanePublicIPv4 fields; tagliatelle silenced with a reasoned nolint per the matchOIDCIdentity precedent.

Validation

  • go build ./... clean; 1317 tests green across the touched packages (apis, provisioner/talos, configmanager/talos, schemas), including 7 new tests: defaults resolution (empty → cluster location, custom preserved), disabled-path no-op (endpoint = CP IP, no SAN patch, no API calls), enabled-path via the httptest fake Hetzner API from the floating_ip_test.go pattern (attach called once, endpoint = floating IP, SANs = FIP + both CP IPs), and the no-control-planes guard.
  • golangci-lint run --new-from-rev origin/main: no issues.

Follow-ups after this slice: #5718 (Talos VIP generation), then the platform-side Talos patch + ksail.prod.yaml + runbook Scenario 9 (HIGH blast-radius CP roll, maintainer-sequenced per #2120).

…g for Talos × Hetzner

Adds the opt-in config surface for the stable-API-endpoint lane
(platform#2120 option A): floatingIPEnabled + floatingIPLocation on
OptionsHetzner. When enabled, cluster create ensures the cluster-owned
floating IP (EnsureFloatingIP, #5699), attaches it to the first
control-plane server, and renders it as the cluster endpoint with the
floating IP plus every control-plane node IP in the certificate SANs
(node IPs must stay verifiable — readiness checks and break-glass access
dial nodes directly). Disabled (default) renders byte-identical configs.

Node-side ownership handover (Talos VIP block) ships separately (#5718);
until then the address must be claimed via a user Talos patch.

Fixes #5714
@coderabbitai

coderabbitai Bot commented Jul 2, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: defc34eb-8df4-4810-b325-f97c0ad79f13

📥 Commits

Reviewing files that changed from the base of the PR and between 79564fe and d5dcb98.

⛔ Files ignored due to path filters (1)
  • web/ui/src/generated/ksail-config.ts is excluded by !**/generated/**
📒 Files selected for processing (2)
  • charts/ksail-operator/crds/ksail.io_clusters.yaml
  • pkg/svc/chat/docs_generated.go

📝 Walkthrough


Adds Hetzner floating IP configuration, defaulting, Talos endpoint/SAN rendering, tests, and matching schema/docs updates.

Changes

Hetzner Floating IP Endpoint Feature

| Layer / File(s) | Summary |
| --- | --- |
| Hetzner options schema and defaulting: pkg/apis/cluster/v1alpha1/options.go, pkg/apis/cluster/v1alpha1/envvar_drift_test.go, pkg/svc/provisioner/cluster/talos/factory.go | Adds FloatingIPEnabled and FloatingIPLocation to OptionsHetzner, updates the drift-guard skip list, and defaults FloatingIPLocation to Location in applyHetznerDefaults. |
| Endpoint and certificate SAN rendering logic: pkg/svc/provisioner/cluster/talos/provisioner_hetzner.go | Updates updateConfigsWithEndpoint to accept ctx, hzProvider, clusterName, and conditionally provisions/attaches a floating IP via new ensureFloatingIPEndpoint, regenerating configs and cert SANs; updates the applyConfigsAndBootstrap call site. |
| Test seams and floating IP endpoint tests: pkg/svc/provisioner/cluster/talos/export_test.go, pkg/svc/provisioner/cluster/talos/provisioner_hetzner_floating_ip_test.go | Adds UpdateConfigsWithEndpointForTest and TalosConfigsForTest helpers, plus a new test file covering FloatingIPLocation defaulting, endpoint/SAN rendering with floating IP enabled/disabled, and no-control-plane error handling via a mocked Hcloud HTTP server. |
| Schema and documentation updates: docs/src/content/docs/configuration/declarative-configuration.mdx, schemas/ksail-config.schema.json | Documents floatingIPEnabled and floatingIPLocation in the declarative configuration docs and JSON schema. |

Estimated code review effort: 3 (Moderate) | ~30 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Provisioner
  participant ensureFloatingIPEndpoint
  participant HetznerProvider
  participant TalosConfigs

  Provisioner->>Provisioner: updateConfigsWithEndpoint(ctx, hzProvider, clusterName, controlPlaneServers)
  alt FloatingIPEnabled
    Provisioner->>ensureFloatingIPEndpoint: ensure and attach floating IP
    ensureFloatingIPEndpoint->>HetznerProvider: create/find floating IP
    HetznerProvider-->>ensureFloatingIPEndpoint: floating IP address
    ensureFloatingIPEndpoint->>HetznerProvider: attach floating IP to control-plane server
    ensureFloatingIPEndpoint-->>Provisioner: floating IP + SAN list
    Provisioner->>TalosConfigs: regenerate configs with floating IP endpoint and SANs
  else FloatingIPDisabled
    Provisioner->>TalosConfigs: regenerate configs with first control-plane IP as endpoint
  end


🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main Hetzner floating-IP and stable endpoint wiring change. |
| Description check | ✅ Passed | The description is directly related to the changeset and matches the implemented floating-IP configuration and wiring. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

MegaLinter analysis: Success

✅ Linters with no issues

actionlint, bash-exec, git_diff, hadolint, jscpd, jsonlint, lychee, markdown-table-formatter, markdownlint, prettier, prettier, shellcheck, shfmt, stylelint, syft, trivy-sbom, trufflehog, v8r, v8r, yamllint


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
pkg/svc/provisioner/cluster/talos/provisioner_hetzner_floating_ip_test.go (1)

166-226: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add error-path coverage for ensureFloatingIPEndpoint.

Current tests cover disabled/enabled happy paths and the no-control-plane guard, but not EnsureFloatingIP failing (e.g. an existing non-ksail-owned floating IP, or ownership rejection) nor AttachFloatingIPToServer failing (e.g. the assign action returning an error). Both are realistic failure modes of this new opt-in path and aren't exercised.

Want me to draft these two additional test cases using the existing floatingIPEndpointTestServer-style httptest fixtures?

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/svc/provisioner/cluster/talos/provisioner_hetzner_floating_ip_test.go`
around lines 166 - 226, Add error-path test coverage for
ensureFloatingIPEndpoint in the floating IP opt-in flow. Extend the existing
floatingIPEndpointTestServer-style fixture and the
TestUpdateConfigsWithEndpoint_FloatingIPEnabled area to cover two failures:
EnsureFloatingIP returning an error for a non-owned or rejected floating IP, and
AttachFloatingIPToServer failing when the assign action errors. Use
UpdateConfigsWithEndpointForTest and the Hetzner provider path to assert these
errors are surfaced.
pkg/svc/provisioner/cluster/talos/provisioner_hetzner.go (1)

159-178: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Redundant recomputation of the first control-plane node's address.

firstCPIP is computed at Line 159 and then discarded whenever FloatingIPEnabled is true (overwritten at Line 176), and ensureFloatingIPEndpoint's loop over controlPlaneServers (Lines 235-242) recomputes hetznerNodeTalosAddress for controlPlaneServers[0] a second time. Not a correctness issue, just avoidable duplicate work.

♻️ Possible simplification
-	certSANs := make([]string, 0, len(controlPlaneServers)+1)
-	certSANs = append(certSANs, endpointIP)
-
-	for _, server := range controlPlaneServers {
-		nodeIP, nodeAddrErr := hetznerNodeTalosAddress(server)
-		if nodeAddrErr != nil {
-			return "", nil, nodeAddrErr
-		}
-
-		certSANs = append(certSANs, nodeIP)
-	}
+	certSANs := make([]string, 0, len(controlPlaneServers)+1)
+	certSANs = append(certSANs, endpointIP, firstCPIP)
+
+	for _, server := range controlPlaneServers[1:] {
+		nodeIP, nodeAddrErr := hetznerNodeTalosAddress(server)
+		if nodeAddrErr != nil {
+			return "", nil, nodeAddrErr
+		}
+
+		certSANs = append(certSANs, nodeIP)
+	}

(requires threading firstCPIP into ensureFloatingIPEndpoint's parameters)

Also applies to: 232-242

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/svc/provisioner/cluster/talos/provisioner_hetzner.go` around lines 159 -
178, Avoid recomputing the first control-plane address in
provisioner_hetzner.go: the initial hetznerNodeTalosAddress result is already
available as firstCPIP in the main provisioning flow and is being recalculated
again inside ensureFloatingIPEndpoint. Thread firstCPIP into
ensureFloatingIPEndpoint and reuse it for the first server instead of calling
hetznerNodeTalosAddress(controlPlaneServers[0]) again, while keeping the
endpointIP assignment logic in the caller unchanged.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: de3a93c8-20cf-4f14-b3ef-9d3d24e569da

📥 Commits

Reviewing files that changed from the base of the PR and between 79b1f43 and 79564fe.

📒 Files selected for processing (8)
  • docs/src/content/docs/configuration/declarative-configuration.mdx
  • pkg/apis/cluster/v1alpha1/envvar_drift_test.go
  • pkg/apis/cluster/v1alpha1/options.go
  • pkg/svc/provisioner/cluster/talos/export_test.go
  • pkg/svc/provisioner/cluster/talos/factory.go
  • pkg/svc/provisioner/cluster/talos/provisioner_hetzner.go
  • pkg/svc/provisioner/cluster/talos/provisioner_hetzner_floating_ip_test.go
  • schemas/ksail-config.schema.json

@devantler devantler marked this pull request as ready for review July 2, 2026 19:21
@devantler devantler marked this pull request as draft July 2, 2026 19:21
@github-code-quality

Contributor

Code Coverage Overview

Languages: Go

Go / code-coverage/go

The overall coverage in the branch remains at 64%, unchanged from the branch.

Code coverage summary of the most impacted files:

| File | 6dd9c01 | d5dcb98 | +/- |
| --- | --- | --- | --- |
| pkg/cli/cluster...ocal_service.go | 92% | 92% | 0% |
| pkg/svc/provisi...alos/factory.go | 22% | 24% | +2% |
| pkg/svc/provisi...oner_hetzner.go | 8% | 16% | +8% |

