fix(cluster-install): correct v1.4.x namespace, storageclass gate, dashboard OIDC#14
Conversation
…shboard OIDC Three corrections to the cluster-install skill found while installing Cozystack v1.4.2 on a fresh Talos cluster end to end. cozy-installer namespace: install into cozy-system with --create-namespace. v1.4.0 stopped templating the cozy-system Namespace in the chart and moved namespace creation to --create-namespace plus a pre-install labeler hook (in kube-system, hostNetwork, NotReady-tolerant) that stamps PSA=privileged. The previous --namespace kube-system form (no --create-namespace) makes the labeler hook fail with 'namespaces "cozy-system" not found' and aborts the install before the operator deploys. Both v1.3.x and v1.4.x forms are now documented side by side. StorageClasses: gate on a live 'kubectl get storageclass' check instead of a version branch. The prior guidance skipped this phase on installer_version >= 1.4.0 assuming the tenant CRD exposes spec.storageClasses and the operator auto-creates them. On v1.4.2 the shipped tenant CRD has no storageClasses field and nothing creates the classes, so the cluster reaches all-HRs-Ready with zero StorageClasses and every stateful PVC stuck Pending. The live check creates the linstor defaults whenever the cluster comes up empty and self-skips if a future release starts creating them. Also documents that SC creation belongs inside the Phase 8 watch loop, gated on linstor-controller Ready, to avoid deadlocking on PVC-dependent HRs. dashboard OIDC: enable authentication.oidc, set the root tenant spec.host (it does not inherit publishing.host), and expose keycloak. The non-OIDC token-proxy dashboard path is broken on v1.4.2 (container never binds its port and CrashLoops on its own liveness probe), so a working web dashboard requires Keycloak. This mirrors what the upstream e2e install does. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
📝 WalkthroughWalkthroughThis PR updates cluster-install documentation: the cozy-installer Helm release and commands now target ChangesCluster-install v1.4.0+ behavior updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request updates the cozystack cluster installation documentation to support v1.4.x changes, including updating the installer namespace to cozy-system, documenting OIDC/Keycloak requirements for the dashboard, patching the root tenant's host, and shifting StorageClass creation to a live-cluster check inside the Phase 8 watch loop. The review feedback suggests normalizing the installer version variable to prevent prefix issues in Helm commands and replacing a placeholder with the correct variable in the documentation.
| # v1.4.0+ (current): release in cozy-system, --create-namespace REQUIRED, | ||
| # the pre-install labeler hook stamps PSA=privileged on the new namespace. | ||
| helm --kube-context $CTX upgrade --install cozy-installer \ | ||
| oci://ghcr.io/cozystack/cozystack/cozy-installer \ | ||
| --version $INSTALLER_VERSION \ | ||
| --namespace kube-system \ | ||
| --namespace cozy-system --create-namespace \ | ||
| --set cozystackOperator.variant=$INSTALLER_VARIANT \ | ||
| --set cozystack.apiServerHost=$API_HOST \ | ||
| --set cozystack.apiServerPort=$API_PORT \ | ||
| --wait --timeout 10m |
There was a problem hiding this comment.
The Helm command uses $INSTALLER_VERSION directly, which might contain the v prefix (e.g., v1.4.2). Helm's OCI client expects the exact tag without the v prefix. To prevent installation failures, normalize the version string to $INSTALLER_VERSION_OCI before running the helm command, matching the behavior in SKILL.md.
| # v1.4.0+ (current): release in cozy-system, --create-namespace REQUIRED, | |
| # the pre-install labeler hook stamps PSA=privileged on the new namespace. | |
| helm --kube-context $CTX upgrade --install cozy-installer \ | |
| oci://ghcr.io/cozystack/cozystack/cozy-installer \ | |
| --version $INSTALLER_VERSION \ | |
| --namespace kube-system \ | |
| --namespace cozy-system --create-namespace \ | |
| --set cozystackOperator.variant=$INSTALLER_VARIANT \ | |
| --set cozystack.apiServerHost=$API_HOST \ | |
| --set cozystack.apiServerPort=$API_PORT \ | |
| --wait --timeout 10m | |
| # Normalise: v1.3.3 → 1.3.3 (Helm's OCI client matches the registry tag as-is) | |
| INSTALLER_VERSION_OCI="${INSTALLER_VERSION#v}" | |
| # v1.4.0+ (current): release in cozy-system, --create-namespace REQUIRED, | |
| # the pre-install labeler hook stamps PSA=privileged on the new namespace. | |
| helm --kube-context $CTX upgrade --install cozy-installer \ | |
| oci://ghcr.io/cozystack/cozystack/cozy-installer \ | |
| --version "$INSTALLER_VERSION_OCI" \ | |
| --namespace cozy-system --create-namespace \ | |
| --set cozystackOperator.variant=$INSTALLER_VARIANT \ | |
| --set cozystack.apiServerHost=$API_HOST \ | |
| --set cozystack.apiServerPort=$API_PORT \ | |
| --wait --timeout 10m |
There was a problem hiding this comment.
Good catch — fixed in 4a4da47. Both helm examples now strip a leading v into $INSTALLER_VERSION_OCI before --version, matching Phase 6 in SKILL.md. Confirmed against the ghcr registry: the cozy-installer chart tags are published as X.Y.Z with no v prefix, so a v-prefixed --version would not resolve.
| # v1.3.x ONLY — do not use against a v1.4+ chart. | ||
| helm --kube-context $CTX upgrade --install cozy-installer \ | ||
| oci://ghcr.io/cozystack/cozystack/cozy-installer \ | ||
| --version $INSTALLER_VERSION \ | ||
| --namespace kube-system \ | ||
| --set cozystackOperator.variant=$INSTALLER_VARIANT \ | ||
| --wait --timeout 10m |
There was a problem hiding this comment.
The Helm command uses $INSTALLER_VERSION directly, which might contain the v prefix (e.g., v1.3.3). Helm's OCI client expects the exact tag without the v prefix. To prevent installation failures, normalize the version string to $INSTALLER_VERSION_OCI before running the helm command, matching the behavior in SKILL.md.
| # v1.3.x ONLY — do not use against a v1.4+ chart. | |
| helm --kube-context $CTX upgrade --install cozy-installer \ | |
| oci://ghcr.io/cozystack/cozystack/cozy-installer \ | |
| --version $INSTALLER_VERSION \ | |
| --namespace kube-system \ | |
| --set cozystackOperator.variant=$INSTALLER_VARIANT \ | |
| --wait --timeout 10m | |
| # v1.3.x ONLY — do not use against a v1.4+ chart. | |
| # Normalise: v1.3.3 → 1.3.3 (Helm's OCI client matches the registry tag as-is) | |
| INSTALLER_VERSION_OCI="${INSTALLER_VERSION#v}" | |
| helm --kube-context $CTX upgrade --install cozy-installer \ | |
| oci://ghcr.io/cozystack/cozystack/cozy-installer \ | |
| --version "$INSTALLER_VERSION_OCI" \ | |
| --namespace kube-system \ | |
| --set cozystackOperator.variant=$INSTALLER_VARIANT \ | |
| --wait --timeout 10m |
There was a problem hiding this comment.
Fixed in 4a4da47 — the v1.3.x example now normalises to $INSTALLER_VERSION_OCI as well.
| --output jsonpath='{range .items[?(@.status.conditions[?(@.type=="Ready" && @.status!="True")])]}{.metadata.namespace}/{.metadata.name} {end}' | ||
| ``` | ||
|
|
||
| On the full `system`-bundle path you may also want the root tenant's `etcd`/`monitoring`/`seaweedfs` services (this is what cozystack's own `hack/e2e-install-cozystack.bats` patches): extend the patch to `{"spec":{"ingress":true,"host":"<HOST>","monitoring":true,"etcd":true,"seaweedfs":true}}` when those were selected in Phase 4. Leave them at their defaults otherwise. |
There was a problem hiding this comment.
Fixed in eed8670 — the extended patch example now uses ${HOST} to match the watch-loop command above it.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/cozystack/skills/cluster-install/references/values-template.md`:
- Around line 124-125: Update the "After Package apply" example to patch both
the root tenant CR fields required by the Phase 8 watch loop: set spec.host to
the Package publishing.host value and set spec.ingress: true; locate the root
tenant CR editing command/example in values-template.md and modify it so it
writes both spec.host and spec.ingress (refer to the root tenant CR, spec.host,
spec.ingress, and the Phase 8 watch loop) instead of only patching ingress.
- Around line 31-37: The helm example uses $INSTALLER_VERSION directly which
fails if callers include a leading "v"; update the docs/snippet to explicitly
normalize the installer tag by stripping a leading "v" when used in the helm
--version argument (e.g., use the shell-parameter form that removes a leading v
such as ${INSTALLER_VERSION#v}) and mention in the surrounding text that OCI
chart tags must be X.Y.Z (no "v"); change the helm line that contains --version
$INSTALLER_VERSION to use the normalized form and keep the rest
(cozystackOperator.variant, --namespace cozy-system, --create-namespace)
unchanged.
In `@plugins/cozystack/skills/cluster-install/SKILL.md`:
- Around line 643-644: Scan SKILL.md for hardcoded namespace snippets
referencing "kube-system" (especially in operator-facing plan/actions/artifacts
and NOTES/install snippets mentioning "cozy-installer") and update them to the
v1.4+ default "cozy-system" unless the snippet is explicitly guarded by an
installer_version legacy branch for v1.3.x; ensure any examples or command lines
that create or reference the namespace match the guidance in
references/values-template.md, preserve the v1.3 `--namespace kube-system` form
only inside explicit legacy sections, and verify all occurrences of
"cozy-installer" namespace usage are aligned or commented as legacy.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 344f950d-6fc7-4771-b271-29a3d88569aa
📒 Files selected for processing (2)
plugins/cozystack/skills/cluster-install/SKILL.mdplugins/cozystack/skills/cluster-install/references/values-template.md
…helm commands The cozy-installer OCI chart tags are published as X.Y.Z with no leading v (verified against the ghcr registry), but both helm command examples in the reference passed $INSTALLER_VERSION verbatim. An operator-supplied v1.4.2 would not match the registry tag 1.4.2 and the install would fail. Strip the leading v into $INSTALLER_VERSION_OCI before --version, matching what SKILL.md Phase 6 already does. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…tem namespace Phase 6 was updated to the v1.4 release layout (cozy-system, --create-namespace, pre-install labeler hook), but the plan-view summary, the artifacts summary, and the issue reproduction template still described the v1.3 form where cozy-installer lived in kube-system and the chart templated the namespace itself. An operator following those leftover snippets would run --namespace kube-system against a v1.4 chart and the labeler hook would abort the install. Bring all three in line with Phase 6. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Phase 8.6 was changed to create default StorageClasses whenever the live cluster comes up empty, on the finding that v1.4.2 does not auto-create them. But two sibling references still asserted the opposite — that v1.4+ exposes tenants.apps.cozystack.io spec.storageClasses and the operator creates the classes from the tenant declaration. That field is absent from the shipped tenant CRD on v1.4.2 and from the monorepo source through current HEAD, so nothing auto-creates StorageClasses on v1.4 either. Align the pitfall note and the wizard preflight entry with the live-gate guidance so the bundle no longer carries contradictory instructions. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…here
The Phase 8 watch loop was updated to set both spec.host and spec.ingress
on the root tenant, because it ships with spec.host: "" and does not
inherit publishing.host. Every other spot still patched or described ingress
alone: the plan-view action list, the standing rule in the lessons section,
the system-bundle extended-patch example (which also used a <HOST>
placeholder instead of ${HOST}), the After Package apply reference snippet,
and the stalled-install recovery snippet in known-failures. Set host in all
of them so no example leaves the per-tenant ingress objects rendering
against an empty domain.
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
plugins/cozystack/skills/cluster-install/references/provider-pitfalls.md (1)
81-81: 💤 Low valueConsider a version-agnostic heading to reduce maintenance.
The heading explicitly lists "v1.3.x and v1.4.2", but line 85 states the issue persists "through current HEAD", suggesting it affects versions beyond v1.4.2. A version-specific heading will require updates whenever a new release continues the same behavior. Consider a more general heading like "Cozystack does not create StorageClasses automatically" and document affected versions in the mechanism paragraph where they're already detailed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/cozystack/skills/cluster-install/references/provider-pitfalls.md` at line 81, Update the versioned heading "## Cozystack does not create StorageClasses automatically (v1.3.x and v1.4.2)" to a version-agnostic title such as "## Cozystack does not create StorageClasses automatically", and move the specific version details currently in the mechanism paragraph (which notes the behavior persists "through current HEAD") into that paragraph so affected versions are documented there instead of in the heading.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@plugins/cozystack/skills/cluster-install/references/provider-pitfalls.md`:
- Line 81: Update the versioned heading "## Cozystack does not create
StorageClasses automatically (v1.3.x and v1.4.2)" to a version-agnostic title
such as "## Cozystack does not create StorageClasses automatically", and move
the specific version details currently in the mechanism paragraph (which notes
the behavior persists "through current HEAD") into that paragraph so affected
versions are documented there instead of in the heading.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f523b947-54e3-42e8-8b15-832d69990cfa
📒 Files selected for processing (6)
plugins/cozystack/skills/cluster-install/SKILL.mdplugins/cozystack/skills/cluster-install/references/issue-templates.mdplugins/cozystack/skills/cluster-install/references/known-failures.mdplugins/cozystack/skills/cluster-install/references/provider-pitfalls.mdplugins/cozystack/skills/cluster-install/references/values-template.mdplugins/cozystack/skills/wizard/SKILL.md
✅ Files skipped from review due to trivial changes (2)
- plugins/cozystack/skills/wizard/SKILL.md
- plugins/cozystack/skills/cluster-install/references/issue-templates.md
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/cozystack/skills/cluster-install/references/values-template.md
- plugins/cozystack/skills/cluster-install/SKILL.md
What
Three corrections to the
cluster-installskill, found while installing Cozystack v1.4.2 on a fresh Talos cluster end to end. Each item is a place where the skill's guidance matched v1.3.x behaviour but breaks on v1.4.x.1. cozy-installer namespace (
references/values-template.md, Phase 6)The skill installed the chart with
--namespace kube-systemand explicitly avoided--create-namespace. That is the v1.3.x contract. As of v1.4.0 (cozystack/cozystack#2508) the chart no longer templates thecozy-systemNamespace on the helm path — the release lives incozy-systemand the caller must pass--create-namespace, after which a pre-install hook Job (cozy-system-labeler, inkube-system, hostNetwork, NotReady/CNI-not-ready tolerant) stamps the new namespace withPodSecurity enforce=privileged+cozystack.io/system=true.With the old form against a v1.4 chart the labeler hook fails with
namespaces "cozy-system" not foundand the whole install aborts before the operator deploys. Both forms are now documented side by side so the operator picks the one matchinginstaller_version.2. StorageClasses — gate on the live cluster, not a version number (Phase 8.6)
Phase 8.6 skipped on
installer_version >= 1.4.0, assuming the tenant CRD exposesspec.storageClassesand the operator auto-creates the classes. On v1.4.2 that field does not exist in the shipped CRD (kubectl get crd tenants.apps.cozystack.io -o yaml | grep -c storageClass→0) and nothing creates the StorageClasses, so the cluster reaches all-HRs-Ready withkubectl get storageclassempty. Every stateful PVC (keycloak-db, etcd, seaweedfs, vmstorage/vlstorage) then sitsPending, which cascades: keycloak CrashLoops with no DB, and cozystack-api/controller/dashboard never go Ready.The fix replaces the version branch with a live
kubectl get storageclasscheck — create the linstor defaults whenever the cluster comes up empty, self-skip if a future release starts creating them. Also documents that SC creation belongs inside the Phase 8 watch loop (gated onlinstor-controllerReady), not after all-HRs-Ready, to avoid deadlocking on PVC-dependent HRs.3. Dashboard requires OIDC/Keycloak (Phase 8 +
references/values-template.md)The skill enabled
spec.ingresson the root tenant but never enabled OIDC and never set the tenantspec.host. Two problems:authentication.oidc.enableddefaults tofalseand theisp-full*overlays do not turn it on, so no Keycloak deploys and the dashboard falls back to its non-OIDCtoken-proxycontainer — which on v1.4.2 never binds its port and CrashLoops on its own liveness probe. The cluster sits at "88/90 HRs Ready" forever.spec.host: ""and does not inheritpublishing.hostfrom the Platform Package, so per-tenant ingress objects render against an empty domain.The watch-loop patch now sets both
spec.hostandspec.ingress, and the skill enablesauthentication.oidc.enabledpluskeycloakinexposedServices(baked into the Platform Package, with a runtime-patch fallback). This mirrors what the upstreamhack/e2e-install-cozystack.batsdoes.Not in scope (already correct)
The OCI
install.image-ignored trap (Talos booted from a disk image skips the installer, so cozystack extensions never land) is already handled bytalos-bootstrapPhase 11 verification + Phase 11.5 auto-upgrade. No change needed there.Testing
All three were validated by a full end-to-end install of Cozystack v1.4.2 on a 3-node Talos cluster on a NAT'd cloud provider: cluster reached 98/98 HelmReleases Ready, 49/49 certificates issued, dashboard reachable over SSO. Docs-only change to the skill — no executable code paths altered beyond the embedded command snippets.
Summary by CodeRabbit
New Features
Documentation
Bug Fixes