Skip to content

fix(e2e): expose Go in DevSpace#22

Merged
rowan-stein merged 5 commits into
mainfrom
noa/issue-21
May 26, 2026
Merged

fix(e2e): expose Go in DevSpace#22
rowan-stein merged 5 commits into
mainfrom
noa/issue-21

Conversation

@casey-brooks
Copy link
Copy Markdown
Contributor

Summary

Investigation

  • Run 26409523632 progressed past the deployment-name patch, then the apps pod never listened on :50051; probes reported connection refused and the pod entered CrashLoopBackOff.
  • Reproduced the dev container shell environment locally with ghcr.io/agynio/devcontainer-go:1: the image contains Go at /usr/local/go/bin/go, but sh -lc resolves a PATH without /usr/local/go/bin, so go is not found.
  • With the explicit patched PATH, the same container resolves both go and buf.

Test & Lint Summary

  • buf generate --template buf.gen.yaml: passed
  • go vet ./...: passed with no errors
  • go test ./...: 2 packages passed, 0 failed, 0 skipped; 4 packages had no test files
  • go build ./...: passed
  • helm dependency build charts/apps: passed
  • helm lint charts/apps: 1 chart linted, 0 failed; no lint errors
  • helm template apps charts/apps --set fullnameOverride=apps: passed and confirmed rendered deployment name apps
  • docker run --rm --env PATH='/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' --entrypoint sh ghcr.io/agynio/devcontainer-go:1 -c 'echo PATH=$PATH; command -v go; go version; command -v buf; buf --version': passed and confirmed go/buf are resolvable

@casey-brooks
Copy link
Copy Markdown
Contributor Author

Test & Lint Summary

  • buf generate --template buf.gen.yaml: passed
  • go vet ./...: passed with no errors
  • go test ./...: 2 packages passed, 0 failed, 0 skipped; 4 packages had no test files
  • go build ./...: passed
  • helm dependency build charts/apps: passed
  • helm lint charts/apps: 1 chart linted, 0 failed; no lint errors
  • helm template apps charts/apps --set fullnameOverride=apps: passed and confirmed rendered deployment name apps
  • docker run --rm --env PATH='/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' --entrypoint sh ghcr.io/agynio/devcontainer-go:1 -c 'echo PATH=$PATH; command -v go; go version; command -v buf; buf --version': passed and confirmed go/buf are resolvable

@noa-lucent
Copy link
Copy Markdown

Review tooling note: the gh pr-review extension is not installed in this environment (unknown command "pr-review"), so I cannot open a pending inline review via the required extension. I reviewed the PR manually and did not find any code issues to block this change.

The deployment name now matches the rendered fullnameOverride=apps deployment, and the patched container PATH explicitly includes /usr/local/go/bin, which addresses the reported DevSpace crash caused by go not being resolvable.

noa-lucent
noa-lucent previously approved these changes May 25, 2026
Copy link
Copy Markdown

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. I reviewed the DevSpace deployment patch and found no blocking issues.

Note: the gh pr-review extension is unavailable in this environment, so I could not submit through that extension, but I completed the review with the standard GitHub CLI.

@rowan-stein
Copy link
Copy Markdown
Collaborator

E2E is still failing with probe connection refused + CrashLoopBackOff on port 50051 (run 26411083483). Please inspect the pod logs from the e2e diagnostics to see why the container exits before binding. The PATH fix may not be taking effect in CI.

@casey-brooks
Copy link
Copy Markdown
Contributor Author

Follow-up Update

Pushed 94bf2bc to keep the DevSpace pod alive while source sync/build is still in progress.

Investigation

  • Run 26411083483 has no downloadable artifacts (gh run download and the Actions artifacts API both returned no artifacts), so there were no pod log/event diagnostics available to pull.
  • The job log still shows only DevSpace status messages: the patched deployment succeeds, then kubelet probes hit :50051 before the source-deployed service is listening, eventually causing CrashLoopBackOff and DevSpace timeout.
  • This means Kubernetes liveness is killing the source-deploy container during the bootstrap/sync/build window. DevSpace only starts sync after the selected pod is Ready, so TCP readiness on the app port creates a deadlock for source deploy.

Fix

  • The DevSpace patch now replaces liveness/readiness probes with lightweight exec probes so the pod can become Ready for DevSpace sync.
  • The pipeline still performs its own explicit :50051 health wait after start_dev, so E2E remains gated on the app actually listening before tests proceed.
  • Increased that explicit health wait from 120s to 300s to cover first-run source sync, protobuf generation, module downloads, and go run compilation.

Test & Lint Summary

  • buf generate --template buf.gen.yaml: passed
  • go vet ./...: passed with no errors
  • go test ./...: 2 packages passed, 0 failed, 0 skipped; 4 packages had no test files
  • go build ./...: passed
  • helm dependency build charts/apps: passed
  • helm lint charts/apps: 1 chart linted, 0 failed; no lint errors
  • helm template apps charts/apps --set fullnameOverride=apps: passed and confirmed rendered deployment name apps
  • docker run --rm --env PATH='/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' --entrypoint sh ghcr.io/agynio/devcontainer-go:1 -c 'command -v go; go version; command -v buf; buf --version': passed and confirmed go/buf are resolvable

@casey-brooks
Copy link
Copy Markdown
Contributor Author

Follow-up Update

Pushed f73b5c3 to PR #22.

Diagnostics / Crash Reason

  • Run 26411083483 had no downloadable artifacts (gh run download and the Actions artifacts API both returned zero artifacts), so pod logs/events were not available from that run.
  • The newer run on 94bf2bc (26412596934) exposed the actual failure in the job log: DevSpace completed initial sync, then lost the pod with command terminated with exit code 137.
  • That means the container process was being killed/restarted during the initial source sync/startup path, not just failing because of PATH.

Fix

  • The patched pod now runs sleep infinity as its stable container command so DevSpace can complete readiness, port-forward setup, and initial source sync without racing the app bootstrap process.
  • After start_dev returns, the pipeline explicitly runs protobuf generation and starts go run ./cmd/apps-service inside the already-synced container with nohup.
  • The existing explicit :50051 health wait remains the gate before E2E proceeds.
  • If that health wait fails, the pipeline prints /tmp/apps-service.log to expose the actual service crash reason in the Actions log.

Test & Lint Summary

  • buf generate --template buf.gen.yaml: passed
  • go vet ./...: passed with no errors
  • go test ./...: 2 packages passed, 0 failed, 0 skipped; 4 packages had no test files
  • go build ./...: passed
  • helm dependency build charts/apps: passed
  • helm lint charts/apps: 1 chart linted, 0 failed; no lint errors
  • helm template apps charts/apps --set fullnameOverride=apps: passed and confirmed rendered deployment name apps
  • docker run --rm --env PATH='/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' --entrypoint sh ghcr.io/agynio/devcontainer-go:1 -c 'command -v go; go version; command -v buf; buf --version; command -v sleep': passed and confirmed required tools are resolvable

@rowan-stein
Copy link
Copy Markdown
Collaborator

E2E still failing. Key log line: (run 26413137414). So PATH/go availability is still wrong in the source-deploy container. The exec-probe workaround helped DevSpace sync, but we must ensure the container actually has Go installed and/or set the correct PATH/GO binary location (may not be /usr/local/go/bin in this image).

@casey-brooks
Copy link
Copy Markdown
Contributor Author

Follow-up Update

Pushed 2d7cf01 to PR #22.

Diagnosis

Latest run 26413137414 reached source sync and attempted to start the service, but /tmp/apps-service.log showed:

nohup: failed to run command 'go': No such file or directory

So the shell used by exec_container still could not resolve go from PATH, even though the image has Go installed at /usr/local/go/bin/go.

Fix

  • Added startup diagnostics to echo PATH and print the selected Go binary/version.
  • Resolve Go explicitly at runtime, in this order:
    1. /usr/local/go/bin/go
    2. /root/.nix-profile/bin/go
    3. /home/godev/.nix-profile/bin/go
    4. command -v go
  • Start the service with nohup "${GO_BIN}" run ./cmd/apps-service ... instead of relying on bare go.
  • If Go is not found, print discovered */bin/go candidates before failing.

Test & Lint Summary

  • buf generate --template buf.gen.yaml: passed
  • go vet ./...: passed with no errors
  • go test ./...: 2 packages passed, 0 failed, 0 skipped; 4 packages had no test files
  • go build ./...: passed
  • helm dependency build charts/apps: passed
  • helm lint charts/apps: 1 chart linted, 0 failed; no lint errors
  • helm template apps charts/apps --set fullnameOverride=apps: passed and confirmed rendered deployment name apps
  • docker run --rm --entrypoint bash ghcr.io/agynio/devcontainer-go:1 -lc '<go-discovery-script>': passed and confirmed the same discovery logic selects /usr/local/go/bin/go and finds buf

@rowan-stein
Copy link
Copy Markdown
Collaborator

Bootstrap PR #530 merged (tracing-app chart bump). Please re-run the E2E check for this PR so it picks up the updated stack.

@rowan-stein
Copy link
Copy Markdown
Collaborator

I can’t trigger re-run via workflow_dispatch (workflow lacks it). Since bootstrap #530 is merged, please hit Re-run jobs for the E2E check on this PR so it provisions with the updated tracing-app chart.

@rowan-stein
Copy link
Copy Markdown
Collaborator

E2E still failing due to tracing-app Playwright CreateAgent availability serialization. Fix is in agynio/e2e PR #149; once merged, please re-run E2E on this PR.

@rowan-stein rowan-stein merged commit 7649f60 into main May 26, 2026
2 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

E2E: devspace deploy crashloop after patch (connection refused on :50051)

3 participants