Troubleshooting

CD Pipeline

"exec plugin: invalid apiVersion" (OCI CLI)

Problem: The kubeconfig contains OCI exec plugin configuration, but the OCI CLI is not installed on the runner.

Solution: The workflow now builds kubeconfig programmatically from a static SA token instead of using OCI exec auth.

"gh secret set" corruption

Problem: gh secret set KUBECONFIG_DEV --body "$(<kubeconfig)" stores truncated/broken data.

Root Cause: The base64-encoded kubeconfig is large (~4000 chars), and gh secret set --body fails intermittently with payloads of that size.

Solution:

Use a small SA token (~930 chars) instead of full kubeconfig

Build kubeconfig programmatically in the workflow:

kubectl config set-cluster ... --server=... --certificate-authority=...
kubectl config set-credentials ... --token=${{ secrets.SA_TOKEN }}
kubectl config set-context ... --cluster=... --user=...
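
As a minimal sketch, the three commands above can be wrapped in one function that writes to a dedicated kubeconfig file. The cluster, user, and context names (oscar-cluster, ci-deployer, oscar-dev) and the function name are hypothetical, not taken from the actual workflow. The --embed-certs flag is an addition here so that the resulting file does not depend on the CA file path.

```shell
# Sketch: build a kubeconfig from a static ServiceAccount token.
# oscar-cluster, ci-deployer and the oscar-dev context name are hypothetical.
build_kubeconfig() (
  server="$1"    # API server URL
  ca_file="$2"   # path to the cluster CA certificate (PEM)
  token="$3"     # ServiceAccount token, e.g. passed in from secrets.SA_TOKEN
  out="$4"       # kubeconfig file to write

  kubectl config set-cluster oscar-cluster \
    --server="$server" \
    --certificate-authority="$ca_file" \
    --embed-certs=true \
    --kubeconfig="$out" &&
  kubectl config set-credentials ci-deployer \
    --token="$token" \
    --kubeconfig="$out" &&
  kubectl config set-context oscar-dev \
    --cluster=oscar-cluster \
    --user=ci-deployer \
    --namespace=oscar-dev \
    --kubeconfig="$out" &&
  kubectl config use-context oscar-dev --kubeconfig="$out"
)
```

In a workflow step this might be called as build_kubeconfig "$K8S_SERVER" ca.pem "$SA_TOKEN" "$RUNNER_TEMP/kubeconfig", where K8S_SERVER and SA_TOKEN are assumed environment variables populated from repository secrets.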

"helm deploy" stuck beyond timeout

Problem: Deployment hangs for 5+ minutes then fails with context deadline exceeded.

Causes:

Missing ServiceAccount (pod can't be created)
Missing ghcr-pull secret (ImagePullBackOff)
Missing oscar-app-secret (CrashLoopBackOff)
Missing DATABASE_URL (connection refused)
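
The first three causes can be caught before helm runs. The following is a sketch of a pre-flight check, assuming the object names mentioned on this page (ServiceAccount oscar, secrets ghcr-pull and oscar-app-secret, namespace oscar-dev). The function name is hypothetical. DATABASE_URL is not checked because this page does not say where it is stored.

```shell
# Sketch: report which prerequisites are missing before running helm.
# Object names are the ones mentioned on this page; adjust as needed.
preflight() (
  ns="${1:-oscar-dev}"
  missing=0
  for obj in serviceaccount/oscar secret/ghcr-pull secret/oscar-app-secret; do
    if kubectl get "$obj" -n "$ns" >/dev/null 2>&1; then
      echo "ok       $obj"
    else
      echo "MISSING  $obj"
      missing=$((missing + 1))
    fi
  done
  # Non-zero exit status when anything is missing, so a CI step fails early.
  [ "$missing" -eq 0 ]
)
```

Running preflight oscar-dev as a step before the Helm deploy would fail in seconds rather than after the deploy timeout.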

Debug: The CD workflow runs debug commands on failure. To inspect the pods interactively:

kubectl get pods -n oscar-dev
kubectl describe pod <pod-name> -n oscar-dev
kubectl logs <pod-name> -n oscar-dev

"Failed to ping database"

Problem: The app starts but crashes after 5 seconds because the database is unreachable.

Solution: Deploy Postgres manually or set postgresql.enabled=true in values:

# Manual deployment
kubectl run postgres --image=docker.io/library/postgres:16-alpine \
  --env=POSTGRES_PASSWORD=postgres \
  --env=POSTGRES_DB=opencrm \
  -n oscar-dev
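
After a manual deployment, Postgres needs a few seconds before it accepts connections. One way to wait for it is a small retry helper, sketched below. The retry function is hypothetical. The commented usage assumes the pod is named postgres (as created by the kubectl run above) and relies on pg_isready, which ships in the official postgres image.

```shell
# Sketch: run a command until it succeeds or the attempts run out.
# usage: retry <attempts> <delay-seconds> <command> [args...]
retry() (
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      exit 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  exit 1
)

# Example (assumes the pod name "postgres" from the manual deployment):
#   retry 30 2 kubectl exec postgres -n oscar-dev -- pg_isready -U postgres
```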

OKE Cluster

CRI-O short name mode

Problem: ImagePullBackOff on official images like postgres:16-alpine.

Root Cause: OKE's CRI-O runtime refuses short image names unless they're in the public registry allowlist.

Solution: Always use full image path:

docker.io/library/postgres:16-alpine (not postgres:16-alpine)
docker.io/library/redis:7-alpine (not redis:7-alpine)
docker.io/bitnami/nginx-ingress-controller:latest (not nginx-ingress-controller:latest)
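
The expansion follows Docker Hub's convention: a name with no registry host belongs to docker.io, and a single-component name belongs to docker.io/library. The helper below is a sketch of that rule for use in scripts. The function name qualify_image is hypothetical, and the host detection (a dot or colon in the first path component, or localhost) is a simplification.

```shell
# Sketch: expand a short image name to a fully qualified docker.io reference.
qualify_image() (
  image="$1"
  case "$image" in
    */*) first="${image%%/*}" ;;  # text before the first slash
    *)   first="" ;;              # single-component name, no slash
  esac
  case "$first" in
    "")                echo "docker.io/library/$image" ;;  # e.g. postgres:16-alpine
    localhost|*.*|*:*) echo "$image" ;;                    # already names a registry host
    *)                 echo "docker.io/$image" ;;          # e.g. bitnami/<name>
  esac
)

qualify_image "postgres:16-alpine"   # prints docker.io/library/postgres:16-alpine
```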

"exec: executable oci not found"

Problem: kubectl commands fail on the GitHub runner.

Root Cause: Kubeconfig references OCI exec plugin (command: oci) which doesn't exist on the runner.

Solution: Recreate kubeconfig using kubectl config commands with static SA token (see Deployment-Guide#Service Account Setup).

Kubernetes Objects

ServiceAccount not created

Problem: Helm deploy succeeds but no pods appear.

Root Cause: Helm chart was missing templates/serviceaccount.yaml. The deployment referenced serviceAccountName: oscar but no such SA existed.

Solution: Add the template; see deploy/helm/opencrm/templates/serviceaccount.yaml.

Ingress not created

Problem: The Service is reachable inside the cluster, but no Ingress exists.

Root Cause: Helm chart was missing templates/ingress.yaml.

Solution: Add the template; see deploy/helm/opencrm/templates/ingress.yaml.

Docker Build

"invalid reference format"

Problem: Docker build fails with this error.

Root Cause: Git tags like v0.1.0+abc123 contain + which Docker rejects.

Solution: Replace + with - in the version string:

VERSION=$(echo "$VERSION" | sed 's/+/-/g')
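
Docker tags may contain only letters, digits, underscores, periods, and hyphens, so + is not the only character that can trigger this error. A slightly more general sketch replaces every disallowed character. The function name sanitize_tag is hypothetical, and the sketch does not enforce the other tag rules (128-character limit, no leading period or hyphen).

```shell
# Sketch: replace every character that is not valid in a Docker tag with "-".
sanitize_tag() {
  printf '%s\n' "$1" | sed 's/[^A-Za-z0-9_.-]/-/g'
}

sanitize_tag "v0.1.0+abc123"   # prints v0.1.0-abc123
```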

"repository name must be lowercase"

Problem: Docker push fails.

Root Cause: ${{ github.repository }} expands to CaptDany/oscar (uppercase C).

Solution: Hardcode as captdany/oscar or pipe through tr '[:upper:]' '[:lower:]'.
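
The tr variant can be sketched as a small helper that lowercases only the repository part and leaves the tag untouched, since tags may legitimately contain uppercase letters. The function name image_ref is hypothetical. In a workflow the first argument would typically come from an environment variable set to ${{ github.repository }}.

```shell
# Sketch: build a ghcr.io image reference with a lowercase repository name.
image_ref() (
  repo=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  echo "ghcr.io/$repo:$2"
)

image_ref "CaptDany/oscar" "v0.1.0-abc123"   # prints ghcr.io/captdany/oscar:v0.1.0-abc123
```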

"denied: installation not allowed" (ghcr.io)

Problem: Docker push denied.

Root Cause: Docker push to ghcr.io/CaptDany/oscar fails because the repository name has uppercase characters.

Solution: Use all-lowercase repository name: ghcr.io/captdany/oscar.

cert-manager

Certificate stuck at "pending"

Problem: kubectl get certificate still shows READY as False after hours.

Root Cause: DNS for oscar-crm.cc doesn't resolve to the cluster IP, so Let's Encrypt can't complete the HTTP-01 challenge.

Solution: Configure DNS first, then cert-manager will automatically complete the challenge. See DNS-TLS.
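
Before waiting on cert-manager, it can help to confirm that the record resolves where expected. The sketch below assumes dig is available and that you already know the external IP the cluster is exposed on. The function name dns_points_to is hypothetical.

```shell
# Sketch: succeed only if <host> has an A record equal to <expected-ip>.
dns_points_to() (
  host="$1"
  expected_ip="$2"
  # dig +short prints one address per line; -Fx requires an exact line match.
  dig +short "$host" A | grep -Fxq "$expected_ip"
)

# Example (the IP is a placeholder):
#   dns_points_to oscar-crm.cc 203.0.113.10 && echo "DNS ok"
```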

cm-acme-http-solver pods stick around

Problem: HTTP solver pods remain after failed challenge.

Solution: They're managed by cert-manager and will be cleaned up when the challenge succeeds, or you can delete them:

kubectl delete pod -n oscar-dev -l acme.cert-manager.io/http01-solver=true

GitHub Actions

Node.js 20 deprecation warning

Problem: The warning "Node.js 20 actions are deprecated." appears in workflow runs.

Context: GitHub Actions is phasing out Node.js 20. By June 16, 2026, all actions must use Node.js 24.

Fix: Update action versions as releases that run on Node.js 24 become available. No action is required on our part for now.
