fix: propagate probe fields and emit spec.storage to varnishd (#37) #38

Merged
jensens merged 2 commits into main from fix/probe-and-storage-propagation on Apr 18, 2026

jensens commented on Apr 15, 2026

Summary

Two production bugs in v0.4.0 surfaced in #37, both dropping user spec on the floor:

  • Probe fields beyond .url were ignored. The backend template hardcoded interval=5s timeout=2s window=5 threshold=3. Prod saw healthy-but-slow Plone pods marked sick by the 2s timeout, which the shard director with healthy=CHOSEN turned into a 503 storm.
  • spec.storage never reached varnishd. The varnish container had no -s args, so varnishd used the stock image default -s malloc,100M. A spec requesting 1500M ran with 100M, which produced 352k allocation failures and a 24% hit ratio.

Fix

  • Generator resolves all backends[].probe fields (Interval, Timeout, Window, Threshold, ExpectedResponse) with the previous hardcoded values as defaults — no behaviour change for users who only set .url. expected_response is emitted only when set.
  • storageArgs() helper emits one -s <name>=<type>,<options> per spec.storage[] entry. Malloc → bytes. File → path,bytes. Unknown types are skipped (webhook already rejects them).
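
The storage mapping described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the StorageSpec type, its field names, and the choice to emit -s and its value as separate argv elements are all assumptions, and sizes are assumed to be already converted to bytes.

```go
package main

import "fmt"

// StorageSpec is a hypothetical stand-in for one spec.storage[] entry.
// The real CRD type and field names may differ.
type StorageSpec struct {
	Name  string // storage name, e.g. "s0"
	Type  string // "malloc" or "file"
	Bytes int64  // size, assumed already converted to bytes
	Path  string // backing file path, only used by "file"
}

// storageArgs returns one "-s", "<name>=<type>,<options>" pair per entry.
// Entries with an unknown type are skipped, on the assumption that the
// admission webhook has already rejected them.
func storageArgs(specs []StorageSpec) []string {
	var args []string
	for _, s := range specs {
		switch s.Type {
		case "malloc":
			args = append(args, "-s", fmt.Sprintf("%s=malloc,%d", s.Name, s.Bytes))
		case "file":
			args = append(args, "-s", fmt.Sprintf("%s=file,%s,%d", s.Name, s.Path, s.Bytes))
		}
	}
	return args
}

func main() {
	args := storageArgs([]StorageSpec{
		{Name: "s0", Type: "malloc", Bytes: 1500000000},
		{Name: "disk", Type: "file", Path: "/var/lib/varnish/cache.bin", Bytes: 10000000000},
		{Name: "x", Type: "bogus"}, // skipped
	})
	for _, a := range args {
		fmt.Println(a)
	}
}
```

An empty spec.storage yields no -s arguments in this sketch, so varnishd would keep falling back to the image default in that case.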

Tests

  • TestGenerate_ProbeFields_PropagatedToPerPodBackends — explicit spec values reach every per-pod backend.
  • TestGenerate_ProbeDefaults_AppliedWhenFieldsUnset — defaults match previously hardcoded values.
  • TestStorageArgs_{Malloc,File,Multiple,Empty} — the four interesting shapes.

Full go test ./... green.

Test plan (reviewer)

  • Install ghcr.io/bluedynamics/cloud-vinyl-operator:0.4.1 in a test cluster.
  • Apply a VinylCache with spec.storage: [{name: s0, type: malloc, size: 1500M}] and spec.backends[0].probe: {timeout: 5s, window: 10, threshold: 8}.
  • kubectl exec into the varnish pod: confirm varnishadm vcl.show -v shows the configured probe values and ps -f 1 shows -s s0=malloc,1500000000.
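
The expected 1500000000 in the last step follows from reading the M suffix as decimal mega (10^6), as Kubernetes quantities do. The sketch below shows that arithmetic with a hand-rolled parser; sizeToBytes and its suffix table are hypothetical, and this PR does not say whether the operator uses resource.Quantity or its own parsing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// units lists a few Kubernetes-style suffixes: binary (Ki, Mi, Gi) and
// decimal (k, M, G). Hypothetical and deliberately incomplete.
var units = []struct {
	suffix string
	mult   int64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30},
	{"k", 1000}, {"M", 1000 * 1000}, {"G", 1000 * 1000 * 1000},
}

// sizeToBytes converts a size such as "1500M" into a byte count.
// A bare integer is taken as bytes.
func sizeToBytes(size string) (int64, error) {
	for _, u := range units {
		if strings.HasSuffix(size, u.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(size, u.suffix), 10, 64)
			if err != nil {
				return 0, fmt.Errorf("invalid size %q: %w", size, err)
			}
			return n * u.mult, nil
		}
	}
	n, err := strconv.ParseInt(size, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", size, err)
	}
	return n, nil
}

func main() {
	for _, s := range []string{"1500M", "1500Mi", "100"} {
		b, err := sizeToBytes(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = %d bytes\n", s, b)
	}
}
```

Under this reading, 1500M is 1500000000 bytes while 1500Mi would be 1572864000 bytes, so the suffix in the spec determines which number should appear in the varnishd command line.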

Refs #37

jensens added 2 commits April 15, 2026 15:37
Two v0.4.0 production bugs from #37:

- Probe fields besides URL were dropped: the backend template
  hardcoded interval=5s timeout=2s window=5 threshold=3 regardless
  of spec.backends[].probe values. Under load, a healthy-but-slow
  backend was marked sick by the 2s timeout, which the shard
  director with healthy=CHOSEN turned into a 503 storm. Fixed the
  generator to resolve all probe fields (with the same defaults
  as the previous hardcoded values) and threaded them through the
  template. ExpectedResponse is emitted only when set.

- spec.storage was persisted on the CR but never reached varnishd.
  The varnish container's Args had no -s flags, so varnishd fell
  through to the stock image default of -s malloc,100M. Production
  saw 352k allocation failures and a 24% hit ratio on a spec that
  asked for 1500M. Added storageArgs() helper emitting one -s
  <name>=<type>,<options> per spec.storage entry.

Refs #37

Local Claude Code session directory, not relevant to the repo.
Was flagged as an untracked change during every PR creation.
jensens merged commit 6be0c9d into main on Apr 18, 2026
8 checks passed
jensens deleted the fix/probe-and-storage-propagation branch on April 18, 2026 12:58