Troubleshooting
Problem: Kubeconfig has OCI exec plugin config but OCI CLI is not installed on the runner.
Solution: The workflow now builds kubeconfig programmatically from a static SA token instead of using OCI exec auth.
Problem: gh secret set KUBECONFIG_DEV --body "$(<kubeconfig)" stores truncated/broken data.
Root Cause: Base64-encoded kubeconfig files are large (~4000 chars), and gh secret set --body fails intermittently with payloads of that size.
Solution:
- Use a small SA token (~930 chars) instead of full kubeconfig
- Build kubeconfig programmatically in the workflow:
kubectl config set-cluster ... --server=... --certificate-authority=...
kubectl config set-credentials ... --token=${{ secrets.SA_TOKEN }}
kubectl config set-context ... --cluster=... --user=...
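A minimal sketch of those commands wrapped in a shell function, assuming the cluster CA certificate is available as a file on the runner. The cluster, user, and context names (`oke`, `deployer`, `oke-deployer`), the function name, and the `KUBECTL` override variable are illustrative placeholders, not names taken from the workflow.

```shell
# Sketch: build a kubeconfig from a static ServiceAccount token.
# The names oke, deployer and oke-deployer are hypothetical placeholders.
build_kubeconfig() {
  local server="$1" ca_file="$2" token="$3" namespace="$4"
  # KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
  local kubectl="${KUBECTL:-kubectl}"

  "$kubectl" config set-cluster oke \
    --server="$server" \
    --certificate-authority="$ca_file" \
    --embed-certs=true &&
  "$kubectl" config set-credentials deployer --token="$token" &&
  "$kubectl" config set-context oke-deployer \
    --cluster=oke --user=deployer --namespace="$namespace" &&
  "$kubectl" config use-context oke-deployer
}
```

In a workflow step this might be called as `build_kubeconfig "$API_SERVER" ca.crt "$SA_TOKEN" oscar-dev`, with `SA_TOKEN` mapped from `secrets.SA_TOKEN` through the step's `env:` block. `API_SERVER` and `ca.crt` are assumed inputs.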
Problem: Deployment hangs for 5+ minutes then fails with context deadline exceeded.
Causes:
- Missing ServiceAccount (pod can't be created)
- Missing ghcr-pullsecret (ImagePullBackOff)
- Missing oscar-app-secret (CrashLoopBackOff)
- Missing DATABASE_URL (connection refused)
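As a rough triage aid, the causes above can be matched against the STATUS column of `kubectl get pods`. The helper below is hypothetical (`likely_cause` is not part of the repository) and only restates the list; real failures can have other causes.

```shell
# Sketch: map a pod status to the likely missing resource from the list above.
likely_cause() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "check the ghcr-pullsecret image pull secret" ;;
    CrashLoopBackOff)
      echo "check oscar-app-secret and DATABASE_URL" ;;
    "")
      # An empty status means no pod was listed at all.
      echo "no pod found: check that the ServiceAccount exists" ;;
    *)
      echo "no known cause for status: $1" ;;
  esac
}
```

For example, `likely_cause ImagePullBackOff` points at the pull secret, and `likely_cause ""` covers the case where the Deployment never produced a pod.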
Debug: The CD workflow runs debug commands on failure. To inspect the namespace interactively:
kubectl get pods -n oscar-dev
kubectl describe pod <pod-name> -n oscar-dev
kubectl logs <pod-name> -n oscar-dev
Problem: App starts but crashes after 5 seconds because database is unreachable.
Solution: Deploy Postgres manually or set postgresql.enabled=true in values:
# Manual deployment
kubectl run postgres --image=docker.io/library/postgres:16-alpine \
--env=POSTGRES_PASSWORD=postgres \
--env=POSTGRES_DB=opencrm \
-n oscar-dev
Problem: ImagePullBackOff on official images like postgres:16-alpine.
Root Cause: OKE's CRI-O runtime refuses short image names unless they're in the public registry allowlist.
Solution: Always use full image path:
- docker.io/library/postgres:16-alpine (not postgres:16-alpine)
- docker.io/library/redis:7-alpine (not redis:7-alpine)
- docker.io/bitnami/nginx-ingress-controller:latest (not nginx-ingress-controller:latest)
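The expansion can be sketched as a small shell function that applies Docker Hub's usual defaults: a name without a slash is treated as an official image under `docker.io/library`, and a name with an organization but no registry host gets the `docker.io` prefix. `qualify_image` is a hypothetical helper. It cannot guess the organization of a non-official image, so `nginx-ingress-controller:latest` must be passed as `bitnami/nginx-ingress-controller:latest`.

```shell
# Sketch: expand a short image name to a fully qualified Docker Hub reference.
qualify_image() {
  local image="$1" first
  case "$image" in
    */*)
      first="${image%%/*}"
      case "$first" in
        *.*|*:*|localhost)
          # First component looks like a registry host: leave unchanged.
          echo "$image" ;;
        *)
          # Organization without a registry host: assume Docker Hub.
          echo "docker.io/$image" ;;
      esac ;;
    *)
      # No slash at all: an official image under docker.io/library.
      echo "docker.io/library/$image" ;;
  esac
}

qualify_image postgres:16-alpine   # prints docker.io/library/postgres:16-alpine
```

References that already name a registry, such as `ghcr.io/captdany/oscar`, pass through untouched.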
Problem: kubectl commands fail on the GitHub runner.
Root Cause: Kubeconfig references OCI exec plugin (command: oci) which doesn't exist on the runner.
Solution: Recreate kubeconfig using kubectl config commands with static SA token (see Deployment-Guide#Service Account Setup).
Problem: Helm deploy succeeds but no pods appear.
Root Cause: Helm chart was missing templates/serviceaccount.yaml. The deployment referenced serviceAccountName: oscar but no such SA existed.
Solution: Add the template — see deploy/helm/opencrm/templates/serviceaccount.yaml.
Problem: Service is accessible internally but no ingress.
Root Cause: Helm chart was missing templates/ingress.yaml.
Solution: Add the template — see deploy/helm/opencrm/templates/ingress.yaml.
Problem: Docker build fails with this error.
Root Cause: Git tags like v0.1.0+abc123 contain + which Docker rejects.
Solution: Replace + with - in the version string:
VERSION=$(echo "$VERSION" | sed 's/+/-/g')
Problem: Docker push fails.
Root Cause: ${{ github.repository }} expands to CaptDany/oscar (uppercase C).
Solution: Hardcode as captdany/oscar or pipe through tr '[:upper:]' '[:lower:]'.
Problem: Docker push denied.
Root Cause: Docker push to ghcr.io/CaptDany/oscar fails because the repository name has uppercase characters.
Solution: Use all-lowercase repository name: ghcr.io/captdany/oscar.
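The two Docker naming fixes above (no `+` in tags, no uppercase in repository names) can be combined when building the image reference. This is a sketch: `REPOSITORY` and `VERSION` are example values standing in for `${{ github.repository }}` and the git-derived version, and the variable names are assumptions.

```shell
# Sketch: build a Docker-safe image reference from example inputs.
REPOSITORY="CaptDany/oscar"
VERSION="v0.1.0+abc123"

# Registry paths must be lowercase.
IMAGE="ghcr.io/$(printf '%s' "$REPOSITORY" | tr '[:upper:]' '[:lower:]')"
# Docker tags cannot contain "+", so replace it with "-".
TAG="$(printf '%s' "$VERSION" | tr '+' '-')"

echo "${IMAGE}:${TAG}"   # prints ghcr.io/captdany/oscar:v0.1.0-abc123
```

Normalizing in one place keeps the build and push steps in agreement about the final reference.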
Problem: kubectl get certificate still shows READY as False after hours.
Root Cause: DNS for oscar-crm.cc doesn't resolve to the cluster IP, so Let's Encrypt can't complete HTTP-01 challenge.
Solution: Configure DNS first, then cert-manager will automatically complete the challenge. See DNS-TLS.
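Before waiting on cert-manager, it can help to confirm that the domain already resolves to the cluster's ingress address. The sketch below is a hypothetical helper that reads resolved addresses from stdin, one per line, and succeeds only if the expected IP is among them. `resolves_to` and `INGRESS_IP` are assumed names, and the actual ingress IP must come from your own cluster.

```shell
# Sketch: succeed only if the expected IP appears among the addresses
# read from stdin (one per line, e.g. the output of `dig +short`).
resolves_to() {
  local expected="$1"
  # -x matches whole lines only, -F treats the IP as a fixed string.
  grep -qxF -- "$expected"
}
```

A possible use is `dig +short A oscar-crm.cc | resolves_to "$INGRESS_IP" && echo "DNS ready"`, run before checking the certificate status again.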
Problem: HTTP solver pods remain after failed challenge.
Solution: They're managed by cert-manager and will be cleaned up when the challenge succeeds, or you can delete them:
kubectl delete pod -n oscar-dev -l acme.cert-manager.io/http01-solver=true
Problem: The warning "Node.js 20 actions are deprecated." appears in workflow runs.
Context: GitHub Actions is phasing out Node.js 20. By June 16, 2026, all actions must use Node.js 24.
Fix: Update action versions once releases that run on Node.js 24 are available. No action is currently required on our part.