You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Loaded debug-openshell-cluster skill. openshell gateway start against ghcr.io/nvidia/openshell/cluster:dev (current main, f17806c) aborts after ~2 min with × K8s namespace not ready.
Cluster container k3s log repeatedly emits ApplyManifestFailed for /var/lib/rancher/k3s/server/manifests/envoy-gateway-openshell.yaml — the server could not find the requested resource. The manifest defines a gateway.networking.k8s.io/v1GatewayClass; the bundled cluster image installs no Gateway API CRDs.
Dockerfile.images:204 wildcard-copies deploy/kube/manifests/*.yaml into /opt/openshell/manifests/; cluster-entrypoint.sh:354-359 then copies that dir into k3s's static manifest dir unconditionally. So the file leaks into the bundled k3s deploy path.
envoy-gateway-openshell.yaml:4-7 documents itself as opt-in: "Apply after a successful Skaffold deploy when gateway routing is enabled: mise run helm:gateway:apply". tasks/helm.toml:59-61 runs kubectl apply against the same path — that path expects Envoy Gateway CRDs already present.
The repeated retries on the unknown CRD slow the deploy controller enough that the bundled openshell HelmChart does not finish installing inside wait_for_namespace's 60-attempt / ~2 min budget (crates/openshell-bootstrap/src/lib.rs:1184), so the openshell namespace never appears.
Caller-side configuration cannot fix this — the file is baked into the cluster image. Fix has to land in the cluster image build.
Description
Actual:openshell gateway start against a clean cluster:dev times out because k3s cannot apply envoy-gateway-openshell.yaml. The openshell namespace is never created within the bootstrap budget.
Expected:gateway start completes; the bundled k3s applies only the manifests intended for the bundled deploy path (agent-sandbox.yaml, openshell-helmchart.yaml).
Reproduction Steps
Repro'd locally on macOS 15.x + Docker Desktop with openshell 0.0.36 (stable, PyPI) against the :dev cluster image. Same signature observed on Ubuntu 24.04 in CI.
uv tool install -U openshell (or any 0.0.36+ install). Ensure no prior gateway is running (openshell gateway destroy if needed).
Confirm the file is in the cluster image:
docker run --rm --entrypoint cat ghcr.io/nvidia/openshell/cluster:dev \
/opt/openshell/manifests/envoy-gateway-openshell.yaml
prints the GatewayClass eg YAML. Image digest at time of repro: sha256:20e0190c1cacbe13036afce465db93ea6fae9f763ec65bf9c3301d4d093ec75a.
After ~2 min, exits with × K8s namespace not ready ╰─▶ timed out waiting for namespace 'openshell' to exist. Container-logs section of the error includes repeated ApplyManifestFailed for envoy-gateway-openshell.yaml.
× Gateway failed: os-bugrepro
Error: × K8s namespace not ready
╰─▶ timed out waiting for namespace 'openshell' to exist: Error from server
(NotFound): namespaces "openshell" not found
container logs:
...
time="2026-05-06T13:24:08Z" level=error msg="Failed to process config:
failed to process /var/lib/rancher/k3s/server/manifests/envoy-gateway-
openshell.yaml: the server could not find the requested resource"
object="kube-system/envoy-gateway-openshell" kind="Addon"
apiVersion="k3s.cattle.io/v1" type="Warning" reason="ApplyManifestFailed"
message="Applying manifest at \"/var/lib/rancher/k3s/server/manifests/
envoy-gateway-openshell.yaml\" failed: the server could not find the
requested resource"
Suggested Fix
The file is documented as opt-in for Skaffold/Helm dev clusters that already have Envoy Gateway CRDs installed; it should not ship in the bundled-cluster path. Two minimally invasive options:
Move it out of deploy/kube/manifests/ (e.g. deploy/kube/gateway-extras/) and update tasks/helm.toml (helm:gateway:apply) to the new path.
Or narrow Dockerfile.images:204COPY to an explicit allowlist (agent-sandbox.yaml openshell-helmchart.yaml) instead of *.yaml.
Agent-First Checklist
I pointed my agent at the repo and had it investigate this issue
I loaded relevant skills (debug-openshell-cluster)
My agent could not resolve this — the diagnostic above explains why (root cause is in the cluster image build; no caller-side configuration reaches it)
Agent Diagnostic
debug-openshell-clusterskill.openshell gateway startagainstghcr.io/nvidia/openshell/cluster:dev(currentmain,f17806c) aborts after ~2 min with× K8s namespace not ready.ApplyManifestFailedfor/var/lib/rancher/k3s/server/manifests/envoy-gateway-openshell.yaml—the server could not find the requested resource. The manifest defines agateway.networking.k8s.io/v1GatewayClass; the bundled cluster image installs no Gateway API CRDs.Dockerfile.images:204wildcard-copiesdeploy/kube/manifests/*.yamlinto/opt/openshell/manifests/;cluster-entrypoint.sh:354-359then copies that dir into k3s's static manifest dir unconditionally. So the file leaks into the bundled k3s deploy path.envoy-gateway-openshell.yaml:4-7documents itself as opt-in: "Apply after a successful Skaffold deploy when gateway routing is enabled:mise run helm:gateway:apply".tasks/helm.toml:59-61runskubectl applyagainst the same path — that path expects Envoy Gateway CRDs already present.HelmChartdoes not finish installing insidewait_for_namespace's 60-attempt / ~2 min budget (crates/openshell-bootstrap/src/lib.rs:1184), so theopenshellnamespace never appears.git logconfirms regression in5116cc2(PR feat(helm): add kubernetes local-dev environment #1158, merged 2026-05-05). Pin4483c860did not have the file and reproduces clean.Caller-side configuration cannot fix this — the file is baked into the cluster image. Fix has to land in the cluster image build.
Description
Actual:
openshell gateway startagainst a cleancluster:devtimes out because k3s cannot applyenvoy-gateway-openshell.yaml. Theopenshellnamespace is never created within the bootstrap budget.Expected:
gateway startcompletes; the bundled k3s applies only the manifests intended for the bundled deploy path (agent-sandbox.yaml,openshell-helmchart.yaml).Reproduction Steps
Repro'd locally on macOS 15.x + Docker Desktop with
openshell 0.0.36(stable, PyPI) against the:devcluster image. Same signature observed on Ubuntu 24.04 in CI.uv tool install -U openshell(or any 0.0.36+ install). Ensure no prior gateway is running (openshell gateway destroyif needed).Confirm the file is in the cluster image:
prints the
GatewayClass egYAML. Image digest at time of repro:sha256:20e0190c1cacbe13036afce465db93ea6fae9f763ec65bf9c3301d4d093ec75a.Start a gateway against that image:
After ~2 min, exits with
× K8s namespace not ready ╰─▶ timed out waiting for namespace 'openshell' to exist. Container-logs section of the error includes repeatedApplyManifestFailedforenvoy-gateway-openshell.yaml.Environment
openshell 0.0.36from PyPI.ubuntu-24.04), Docker Engine 27.x.ghcr.io/nvidia/openshell/cluster:devatmain(f17806c).4483c860. Regression:5116cc2(PR feat(helm): add kubernetes local-dev environment #1158).Logs
Captured from the local repro above:
Suggested Fix
The file is documented as opt-in for Skaffold/Helm dev clusters that already have Envoy Gateway CRDs installed; it should not ship in the bundled-cluster path. Two minimally invasive options:
deploy/kube/manifests/(e.g.deploy/kube/gateway-extras/) and updatetasks/helm.toml(helm:gateway:apply) to the new path.Dockerfile.images:204COPYto an explicit allowlist (agent-sandbox.yaml openshell-helmchart.yaml) instead of*.yaml.Agent-First Checklist
debug-openshell-cluster)