fix: only mark VinylCache Ready when all replicas have VCL #25

Merged
jensens merged 4 commits into main from fix/multi-replica-readiness
Apr 9, 2026

jensens commented Apr 9, 2026

Summary

The operator was marking VinylCache as Ready=True after pushing VCL to any available pods, without verifying all requested replicas were up. This caused all multi-replica E2E tests to fail (#20).

Changes

  • calculatePhase(): Require ReadyPeers >= TotalPeers for PhaseReady
  • updateStatus(): Set VCLSynced=False + Progressing=True when only partial replicas have VCL
  • Reconciler: Requeue after 5s (not 5min) when not all replicas are ready
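The first change can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: only the names calculatePhase, PhaseReady, ReadyPeers and TotalPeers come from this PR. The `Phase` type, the other two phase constants, the function signature, and the handling of zero requested replicas are assumptions made for the example.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the operator's phase type.
type Phase string

const (
	PhasePending     Phase = "Pending"     // assumed name, not from the PR
	PhaseProgressing Phase = "Progressing" // assumed name, not from the PR
	PhaseReady       Phase = "Ready"       // PhaseReady is named in the PR
)

// calculatePhase reports PhaseReady only when every requested replica
// has the VCL, i.e. readyPeers >= totalPeers. The signature is an
// assumption; the real function may take a status struct instead.
func calculatePhase(readyPeers, totalPeers int) Phase {
	switch {
	case totalPeers <= 0:
		// Assumption: nothing requested yet is treated as pending.
		return PhasePending
	case readyPeers >= totalPeers:
		return PhaseReady
	default:
		// Some, but not all, replicas are ready.
		return PhaseProgressing
	}
}

func main() {
	for ready := 0; ready <= 3; ready++ {
		fmt.Printf("%d/3 ready: %s\n", ready, calculatePhase(ready, 3))
	}
}
```

Using `>=` rather than `==` mirrors the wording of the change and keeps the check tolerant of a brief window during scale-down where more peers report ready than are currently requested.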

Before

1/3 pods ready → VCLSynced=True → Ready=True (wrong!)

After

1/3 pods ready → VCLSynced=False → Progressing=True → requeue 5s
2/3 pods ready → VCLSynced=False → Progressing=True → requeue 5s
3/3 pods ready → VCLSynced=True → Ready=True ✓

Fixes #24
Refs #20

Test plan

  • Unit tests pass (go test ./internal/controller/...)
  • go build ./... clean
  • All pre-commit hooks pass (fmt, vet, golangci-lint)
  • E2E Chainsaw tests (xkey-invalidation, scaling, cluster-routing, ha-operator)

🤖 Generated with Claude Code

The operator was marking VinylCache as Ready after pushing VCL to any
available pods, without checking that all requested replicas were up.
This caused all multi-replica E2E tests to fail.

Changes:
- calculatePhase: require ReadyPeers >= TotalPeers for PhaseReady
- updateStatus: set VCLSynced=False and Progressing=True when partial
- Reconciler: requeue after 5s when not all replicas ready (vs 5min)

Fixes #24
Refs #20

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jensens force-pushed the fix/multi-replica-readiness branch from 147ebb6 to bf65be0 on April 9, 2026 21:33
jensens and others added 3 commits April 10, 2026 00:01
The CI workflow was building the agent binary as cloud-vinyl-varnish:dev
(wrong name) and never loading a real varnish image. The install script
also never passed the agent image to Helm, so the operator defaulted to
cloud-vinyl-agent:<appVersion> which didn't exist in Kind.

Fixes:
- Build agent as cloud-vinyl-agent:dev (correct name)
- Pull and tag stock varnish:7.6 as cloud-vinyl-varnish:dev
- Pass AGENT_IMAGE to Helm via install-operator.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There is no custom cloud-vinyl-varnish image — use stock varnish:7.6
directly in test fixtures. Pre-pull into Kind so pods don't need to
fetch from Docker Hub during tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jensens merged commit 78a5751 into main on Apr 9, 2026
7 of 8 checks passed
jensens deleted the fix/multi-replica-readiness branch on April 9, 2026 23:03

Development

Successfully merging this pull request may close these issues.

fix: operator marks VinylCache Ready before all replicas are up
