-
Notifications
You must be signed in to change notification settings - Fork 178
Open
Labels
state:triage-neededOpened without agent diagnostics and needs triageOpened without agent diagnostics and needs triage
Description
Agent Diagnostic
- Loaded
debug-openshell-clusterskill from .agents/skills/ - Ran
openshell status→ "tls handshake eof" (server not running) - Ran
openshell doctor logs --lines 50→ orphaned cgroup cleanup, no errors - Ran
openshell doctor exec -- kubectl get namespaces→ openshell namespace exists - Ran
openshell doctor exec -- kubectl -n openshell get secrets→ TLS secrets
missing - Ran
openshell doctor exec -- kubectl -n openshell describe pod openshell-0:- FailedMount: secret "openshell-server-tls" not found
- FailedMount: secret "openshell-server-client-ca" not found
- Root cause: CLI timed out at
wait_for_namespace()before reaching
reconcile_pki()step - K3s was still initializing (~2 min on first run with image pulls)
- Workaround: manually generated PKI with openssl, applied secrets, ran
openshell gateway add --local
Description
On first run, openshell gateway start times out waiting for the openshell namespace
(~120s) while K3s is still initializing. The CLI exits before reaching the PKI
generation step in reconcile_pki().
The container keeps running and K3s eventually creates the namespace, but without
TLS secrets the openshell-0 pod is stuck in ContainerCreating with FailedMount
errors. The gateway never becomes healthy.
Expected: Gateway starts successfully with TLS secrets created.
Actual: Timeout at namespace wait, PKI step skipped, cluster left in broken state.
Reproduction Steps
- Fresh install:
curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh - Run
openshell gateway start - Observe timeout error: "K8s namespace not ready"
- Container keeps running but
openshell statusshows "tls handshake eof" openshell doctor exec -- kubectl -n openshell get secretsshows no TLS secrets
Environment
- OS: macOS (Apple Silicon, Darwin 25.3.0)
- Docker: Colima + Docker Engine 27.4.0 (4 CPUs, 8GB RAM)
- OpenShell: 0.0.11-dev.2+g1d071b8d9
Logs
Deploying local gateway openshell...
Checking Docker
Downloading gateway
Initializing environment
Error: × K8s namespace not ready
╰─▶ timed out waiting for namespace 'openshell' to exist: Error from server
(NotFound):
namespaces "openshell" not found
# After timeout, pod status:
$ openshell doctor exec -- kubectl -n openshell describe pod openshell-0
Events:
Warning FailedMount kubelet MountVolume.SetUp failed for volume "tls-client-ca"
: secret "openshell-server-client-ca" not found
Warning FailedMount kubelet MountVolume.SetUp failed for volume "tls-cert" :
secret "openshell-server-tls" not foundAgent-First Checklist
- I pointed my agent at the repo and had it investigate this issue
- I loaded relevant skills (e.g.,
debug-openshell-cluster,debug-inference,openshell-cli) - My agent could not resolve this — the diagnostic above explains why
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
state:triage-neededOpened without agent diagnostics and needs triageOpened without agent diagnostics and needs triage