fix(e2e): fix cross-node networking in Kind cluster #29

Merged
jensens merged 1 commit into main from fix/e2e-networking on Apr 9, 2026

@jensens commented Apr 9, 2026

Summary

Three fixes for the E2E dial tcp :9090 i/o timeout (#28):

1. Namespace label for NetworkPolicy

Label the operator namespace with vinyl.bluedynamics.eu/operator-namespace=true. Since Kind v0.24.0, kindnet enforces NetworkPolicies via kube-network-policies. The agent NetworkPolicy only allows ingress from namespaces that carry this label, so without it traffic to agent port 9090 is blocked.
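To illustrate why the missing label blocks traffic, here is a minimal Go sketch of matchLabels-style selection. It is a simplified model, not the code kube-network-policies runs; the function name matchesSelector and the namespace name operator-system are hypothetical, and only the label key and value come from this PR.

```go
package main

import "fmt"

// Label key and value taken from the PR description.
const (
	operatorLabelKey   = "vinyl.bluedynamics.eu/operator-namespace"
	operatorLabelValue = "true"
)

// matchesSelector reports whether every key/value pair in selector is also
// present in labels. This is a simplified model of matchLabels semantics,
// not the actual policy engine.
func matchesSelector(labels, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Selector assumed to be what the agent NetworkPolicy uses for ingress.
	selector := map[string]string{operatorLabelKey: operatorLabelValue}

	// Hypothetical namespace labels before and after the fix.
	unlabeled := map[string]string{"kubernetes.io/metadata.name": "operator-system"}
	labeled := map[string]string{
		"kubernetes.io/metadata.name": "operator-system",
		operatorLabelKey:              operatorLabelValue,
	}

	fmt.Println("unlabeled namespace allowed:", matchesSelector(unlabeled, selector))
	fmt.Println("labeled namespace allowed:", matchesSelector(labeled, selector))
}
```

Under this model the unlabeled namespace fails the selector, which matches the symptom: connections to port 9090 time out rather than being refused.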

2. TCP MTU probing

Run sudo sysctl -w net.ipv4.tcp_mtu_probing=1 before creating the Kind cluster. GitHub Actions runners often have an MTU below 1500 (VXLAN overhead), but kindnet's MTU detection is broken (kind #3940): it always sets 1500, which causes silent packet drops on cross-node traffic.
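The arithmetic behind the drops can be sketched in Go. The path MTU of 1450 below is an illustrative assumption (1500 minus a typical 50-byte VXLAN overhead), not a value measured on a runner, and the helper names are hypothetical.

```go
package main

import "fmt"

const (
	ipv4HeaderBytes = 20 // IPv4 header without options
	tcpHeaderBytes  = 20 // TCP header without options
)

// mssForMTU returns the largest TCP payload that fits into a single IPv4
// packet of the given MTU, assuming no IP or TCP options.
func mssForMTU(mtu int) int {
	return mtu - ipv4HeaderBytes - tcpHeaderBytes
}

// fitsPath reports whether a full-size packet built for ifaceMTU can cross a
// path whose smallest link MTU is pathMTU without being fragmented.
func fitsPath(ifaceMTU, pathMTU int) bool {
	return ifaceMTU <= pathMTU
}

func main() {
	const configuredMTU = 1500  // what kindnet sets, per the PR description
	const assumedPathMTU = 1450 // ASSUMPTION: illustrative value only

	fmt.Println("MSS derived from configured MTU:", mssForMTU(configuredMTU))
	fmt.Println("MSS the path can actually carry:", mssForMTU(assumedPathMTU))
	fmt.Println("full-size packets fit the path:", fitsPath(configuredMTU, assumedPathMTU))
}
```

Small packets such as the TCP handshake fit either way, so only full-size segments are lost. With tcp_mtu_probing=1 the kernel starts probing for a smaller segment size once it detects such a black hole, which is presumably why the sysctl works around the wrong interface MTU.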

3. Agent HTTP client timeout

Add an explicit 30s timeout to the Go http.Client, configurable via the --agent-client-timeout flag. Without it the client has no timeout (a zero Timeout means no limit), so goroutines hang indefinitely on unreachable agents.
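A minimal sketch of how such a flag could be wired to the client, assuming the standard flag package. Only the flag name and the 30s default come from this PR; parseTimeout, newAgentClient, and the flag set name "operator" are hypothetical and may differ from the actual operator code.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"os"
	"time"
)

// defaultAgentTimeout is the 30s default stated in the PR description.
const defaultAgentTimeout = 30 * time.Second

// parseTimeout reads --agent-client-timeout from args and falls back to the
// default when the flag is absent.
func parseTimeout(args []string) (time.Duration, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	timeout := fs.Duration("agent-client-timeout", defaultAgentTimeout,
		"timeout for HTTP requests to agents")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *timeout, nil
}

// newAgentClient returns an http.Client whose Timeout bounds the whole
// request: connecting, sending, and reading the response body.
func newAgentClient(timeout time.Duration) *http.Client {
	return &http.Client{Timeout: timeout}
}

func main() {
	timeout, err := parseTimeout(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	client := newAgentClient(timeout)
	fmt.Println("agent client timeout:", client.Timeout)
}
```

With a bounded client, a request to an unreachable agent returns an error once the timeout elapses, so the caller can retry instead of blocking forever.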

Fixes #28

Test plan

  • go build ./... clean
  • Unit tests pass
  • All pre-commit hooks pass (fmt, vet, golangci-lint, shellcheck)
  • E2E Chainsaw tests (running in CI now)

🤖 Generated with Claude Code

Three fixes for the E2E dial tcp :9090 i/o timeout (#28):

1. Label operator namespace with vinyl.bluedynamics.eu/operator-namespace=true
   so the agent NetworkPolicy allows traffic from the operator. Since Kind
   v0.24.0, kindnet enforces NetworkPolicies — without this label, agent
   port 9090 is blocked.

2. Enable TCP MTU probing (sysctl net.ipv4.tcp_mtu_probing=1) before
   creating the Kind cluster. GitHub Actions runners often have MTU < 1500
   due to VXLAN encapsulation, and kindnet's MTU auto-detection is broken
   (always picks 1500). This causes silent packet drops on cross-node traffic.

3. Add explicit 30s timeout to the agent HTTP client. The Go default
   http.Client has no timeout, causing goroutines to hang indefinitely
   on unreachable agents instead of failing fast and retrying.

Fixes #28

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jensens merged commit ef986fa into main on Apr 9, 2026
7 of 8 checks passed
@jensens deleted the fix/e2e-networking branch on April 9, 2026, 23:22
Development

Successfully merging this pull request may close these issues.

E2E: agent port 9090 unreachable in Kind cluster — dial tcp i/o timeout