Skip to content

feat: BAN ACL + handler (#39)#42

Merged
jensens merged 2 commits intomainfrom
feat/ban-support
Apr 18, 2026
Merged

feat: BAN ACL + handler (#39)#42
jensens merged 2 commits intomainfrom
feat/ban-support

Conversation

@jensens
Copy link
Copy Markdown
Member

@jensens jensens commented Apr 18, 2026

Summary

spec.invalidation.ban was wired through the CRD + webhook validator but never reached the generated VCL. Users who set ban.enabled: true saw the field accepted on the CR and their allowedSources CIDRs validated, but BAN requests fell through to the regular cache-miss path — never invalidating anything.

Fix

When spec.invalidation.ban.enabled: true:

  • vinyl_ban_allowed ACL is emitted containing:
    • 127.0.0.1 (always)
    • the operator pod IP (when set — invalidation-proxy source, via #40)
    • entries from spec.invalidation.ban.allowedSources
  • vcl_recv BAN handler:
    • 403 if client.ip not in vinyl_ban_allowed
    • 400 if the X-Vinyl-Ban header is missing
    • ban(req.http.X-Vinyl-Ban) + synth(200, "Banned") otherwise

Default behaviour is unchanged — when ban.enabled is unset/false, neither the ACL nor the handler are emitted.

Usage

spec:
  invalidation:
    ban:
      enabled: true
      allowedSources:
        - "10.0.0.0/8"
# From any IP in the ACL:
curl -X BAN -H 'X-Vinyl-Ban: obj.http.X-Cache-Tag ~ "user-42"' http://varnish.svc/

Tests

  • TestGenerate_BAN_DisabledByDefault — ACL + handler absent when disabled.
  • TestGenerate_BAN_EnabledEmitsACLAndHandler — both present, correct shape.
  • TestGenerate_BAN_OperatorIPInACL — operator IP lands in BAN ACL too.
  • TestGenerate_BAN_AllowedSourcesHonored — user CIDRs present.
  • TestGenerate_BAN_EnabledButNotConfigured_OnlyLocalhost — minimal ACL shape.

Full go test ./... green.

Out of scope

  • Rate limiting (spec.invalidation.ban.rateLimitPerMinute): still inert. Left as follow-up in the issue — needs decision on mechanism (Varnish-side vmod_ratelimit vs. operator-proxy-side). BAN expressions are idempotent so unchecked rate is usually fine, but production may want enforcement.
  • Docs: no new how-to yet. The CRD reference table documents the field; a how-to example can follow.
  • E2E chainsaw test: the e2e Varnish image is agent-only (no real Varnish), so end-to-end BAN exercising isn't possible in CI today.

Closes #39

🤖 Generated with Claude Code

jensens added 2 commits April 18, 2026 15:36
…abled

Previously spec.invalidation.ban was persisted on the CR and its
AllowedSources validated by the webhook, but none of it reached the
generated VCL: no vinyl_ban_allowed ACL, no BAN handler in vcl_recv.
Users who set ban.enabled: true saw the field accepted but BAN
requests fell through to the regular cache-miss path.

Generator changes:
- HasBAN flag on TemplateData (from spec.invalidation.ban.enabled).
- vinyl_ban_allowed ACL emitted when HasBAN, populated with:
    127.0.0.1        (always)
    <operator IP>    (when set — invalidation proxy source)
    <allowedSources> (from spec.invalidation.ban.allowedSources)
- BAN handler added to vcl_recv:
    - 403 when client.ip not in vinyl_ban_allowed
    - 400 when X-Vinyl-Ban header missing
    - ban(req.http.X-Vinyl-Ban) + synth(200, "Banned") otherwise

Rate limiting (spec.invalidation.ban.rateLimitPerMinute) is still
inert — tracked as future work in #39 but deferred for the MVP.

Closes #39
…ting

Critical, blocking-for-v0.4.2 issues raised by the end-to-end code
review of the combined #40 + #41 + #42 changeset:

- C1: cmd/operator/main.go never read POD_IP, so the OperatorIP field
  that #40 plumbed through the generator was always empty in
  production. Added os.Getenv("POD_IP") at reconciler construction
  so the PURGE ACL actually contains the operator pod IP and
  invalidation-proxy requests pass the ACL.

- C2: the BAN handler emitted the deprecated bare ban() form — the
  project's own architecture docs explicitly require std.ban().
  Switched to std.ban() and added a std.ban_error() check that
  returns synth(400) with the compile error when the X-Vinyl-Ban
  expression is malformed. Previously a bad expression silently
  returned 200 with no invalidation. Regression-guard assertion
  locks in the shape.

- I2: ban-lurker-friendly x-url / x-host headers were only emitted
  when HasXkey. With BAN enabled but xkey disabled, users writing
  ban("obj.http.x-url ~ ...") would find the header missing and
  fall back to req.* bans that accumulate O(n*m). Widened the gate
  to {{- if or .HasXkey .HasBAN }} in both vcl_backend_response and
  vcl_deliver.

- I3: trim on the BAN block (cosmetic — no stray blank line).

- I4: docs/sources/reference/vinylcache-spec.md updated:
  * shard.by/healthy note clarifies cluster-peer is auto, per-backend
    is user-snippet.
  * New invalidation rows for ban.enabled, ban.allowedSources,
    ban.rateLimitPerMinute (latter explicitly marked inert).
  * Security note + usage example for the X-Vinyl-Ban header.

- M1/M5: BAN-handler test now extracts the handler block and asserts
  on that substring (was: flat-file substring matches that could
  match noise elsewhere). Added TestGenerate_BAN_BanLurkerHeadersEmitted
  and TestGenerate_BAN_RateLimit_CurrentlyInert to lock in the
  deferred rate-limit state.

- E2E: vcl-validation cleanup timeout bumped from default (30s) to
  60s — the flake that reappeared in #42's CI run was the same
  symptom this test hit before the v0.4.0 release.

Refs #18 #36 #39
@jensens
Copy link
Copy Markdown
Member Author

jensens commented Apr 18, 2026

Critical fixes from end-to-end review pushed in ed5b06f. Scope expanded beyond BAN-only:

Critical (were blockers for v0.4.2):

Important:

  • Ban-lurker-friendly x-url/x-host headers are now also emitted when HasBAN (previously only HasXkey). Prevents the O(n*m) ban accumulation pattern when users write ban("obj.http.x-url ...") with xkey disabled.
  • Template whitespace trim on the BAN block.
  • Reference docs updated: BAN fields documented, shard.by/healthy note clarified (cluster-peer auto, per-backend user-snippet), security caveat + X-Vinyl-Ban usage example added.

Test tightening:

  • BAN-handler test extracts the handler block and asserts on that substring (tighter).
  • Regression guard: bare ban(req.http...) must never be emitted.
  • New TestGenerate_BAN_BanLurkerHeadersEmitted and TestGenerate_BAN_RateLimit_CurrentlyInert.

Flake:

  • vcl-validation cleanup timeout bumped from default (30s) to 60s — the context deadline exceeded CI fail was the same symptom from before v0.4.0.

Still deferred (noted as future work):

  • spec.invalidation.ban.rateLimitPerMinute enforcement — needs design decision (Varnish vmod_vsthrottle vs operator-proxy-side).
  • Cluster-peer behaviour change (C3 from the review): by=HASH / healthy=CHOSEN as defaults instead of the previous hardcoded by=URL. This will go into the v0.4.2 release notes as a user-observable change.

CI rerun incoming.

@jensens jensens merged commit 31222df into main Apr 18, 2026
8 checks passed
@jensens jensens deleted the feat/ban-support branch April 18, 2026 14:27
jensens added a commit that referenced this pull request Apr 18, 2026
v0.4.2 added features that weren't reflected in the how-to / explanation docs:

- BAN now has two routes: via the operator invalidation proxy
  (X-Ban-Expression header, validated proxy-side) and directly to
  varnishd (X-Vinyl-Ban header, requires spec.invalidation.ban.enabled).
  purge-ban.md and invalidation.md now document both, including
  status-code semantics and the security caveat about arbitrary ban
  expressions from any IP in the ACL.
- The Varnish-side ACLs (vinyl_purge_allowed / vinyl_ban_allowed)
  introduced by #40 + #42 are now explained in invalidation.md
  alongside the pre-existing proxy-side allowedSources.
- create-cache.md was still showing backends with host+port — updated
  to the current serviceRef shape and the explicit image field.

No code changes.

Refs #18 #36 #39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement BAN support: CRD is wired but VCL has no handler or ACL

1 participant