feat(ca): step-ca backend behind --ca-backend=step-ca #56
omega gains a second non-disk CA backend after Vault PKI, validating that the ADR 0005 Plugin pattern holds against an upstream signer with a meaningfully different auth shape.

step-ca authenticates every /1.0/sign call with a one-time token (OTT): a JWT signed by a JWK provisioner whose matching public JWK is configured on the step-ca side in ca.json. omega loads the provisioner's private JWK at startup and mints a per-request OTT that pins the SPIFFE ID via `sans`, the CA root via `sha`, and a 5-minute `exp` so a leaked OTT cannot be replayed long after the fact.

Trust anchors come from GET /roots.pem and are cached behind the same Lock/check/Unlock/fetch/Lock/store dance the Vault backend uses, so a transient step-ca blip serves the stale bundle rather than breaking every workload's handshake.

X.509-SVID signing delegates to step-ca; JWT-SVID signing stays local for the ADR 0005 reason that per-token network signing would add a hop to every JWT validation and the 5-minute JWT-SVID TTL makes that trade-off unattractive.

New flags:
- --ca-step-ca-url
- --ca-step-ca-provisioner
- --ca-step-ca-provisioner-key-file
- --ca-step-ca-ca-cert

Tests (internal/server/identity/step_ca_test.go) stand up a step-ca mock that actually verifies the OTT signature against the configured public JWK and pins the `sha` claim to the served root, so a regression that breaks OTT signing trips the test instead of silently working against a permissive mock.

A runnable examples/ca-step-ca/ demo follows in a separate PR to keep the scope of this one focused on the backend itself.
Code Review
This pull request introduces a new CA backend for Smallstep step-ca, enabling X.509-SVID signing via the /1.0/sign endpoint using one-time tokens. The implementation includes the stepCAAuthority logic, new CLI flags, and comprehensive unit tests. Review feedback identifies a potential thundering herd issue in the BundlePEM refresh logic, where concurrent requests could overwhelm the upstream CA upon cache expiry. It is recommended to use singleflight to ensure only one refresh operation is in flight at a time.
Gemini flagged the new step-ca BundlePEM as vulnerable to a
thundering herd: N concurrent IssueSVID callers across a TTL expiry
boundary would each trigger their own /roots.pem fetch. The same
shape exists in the older vault-pki backend, so fix both for
consistency.
Add a separate `refreshMu sync.Mutex` (not the existing
read/write mutex `bundleMu`, which guards the cached bytes themselves)
and use it to serialise the refresh path:
1. Fast path: bundleMu.RLock + freshness check, return defensive
copy. Unchanged.
2. Slow path: capture stale bytes, release bundleMu, acquire
refreshMu, recheck the cache (some other goroutine may have
refreshed while we were queued), only then fetch.
Net effect under load is one HTTP round-trip per TTL window
instead of one per concurrent caller, without adding a new
dependency (an explicit `singleflight` would work too but the
recheck pattern stays self-contained).
Summary
Adds a second non-disk CA backend after Vault PKI, validating the ADR 0005 Plugin pattern against an upstream signer with a meaningfully different auth shape.
step-ca authenticates every `/1.0/sign` call with a one-time token (OTT): a JWT signed by a JWK provisioner whose matching public JWK is configured in step-ca's `ca.json`. omega:
1. Loads the provisioner's private JWK at startup (`--ca-step-ca-provisioner-key-file`).
2. Fetches trust anchors from `GET /roots.pem` and pins the SHA-256 of the first root.
3. Mints a per-request OTT pinned to one SPIFFE ID (`sans`), one root (`sha`), and a 5-minute expiry. A leaked OTT cannot be replayed long after the fact.
4. Sends `{csr, ott, notBefore, notAfter}` to `/1.0/sign`, validates that the returned leaf carries the requested SPIFFE ID and the CSR's public key, and returns the assembled SVID.

X.509-SVID signing delegates to step-ca; JWT-SVID signing stays local (the same ADR 0005 trade-off Vault PKI makes: per-token network signing would add a hop to every JWT validation, and the 5-minute JWT-SVID TTL makes that unattractive).
Trust anchors are cached behind the same `Lock/check/Unlock/fetch/Lock/store` pattern the Vault backend uses, so a transient step-ca blip serves the stale bundle rather than breaking every workload's handshake.

Scope layer
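Plugin. Adds `KindStepCA` + `step_ca.go` to `internal/server/identity/`. No changes to the `Authority` interface: the second backend in this layer is the validation that the interface holds, exactly the goal of ADR 0005.

New flags
- `--ca-step-ca-url`
- `--ca-step-ca-provisioner`
- `--ca-step-ca-provisioner-key-file`
- `--ca-step-ca-ca-cert`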
Test plan
- `go test -race -count=1 ./internal/server/identity/` passes
- `TestStepCAIssueSVID_RoundTrip`, `TestStepCARejectsForeignTrustDomain`, `TestStepCAMissingConfigRejected`, `TestStepCARejectsPublicProvisionerJWK`
- The step-ca mock verifies the OTT signature against the configured public JWK and pins the `sha` claim to the served root, so a regression that breaks OTT signing trips the test instead of silently working against a permissive mock
- `go vet ./...` clean
- `markdownlint-cli2 CHANGELOG.md README.md docs/ca-plugin-guide.md`: 0 error(s)

Docs
- `docs/ca-plugin-guide.md` Step 7 standards-alignment table: bumped from "vault-pki implemented" to "vault-pki + step-ca implemented"
- `README.md` Standards-alignment row: same
- `CHANGELOG.md` Unreleased: new entry covering the flag set, the OTT pinning shape, and the JWT-stays-local rationale

Follow-up
A runnable `examples/ca-step-ca/` demo with a mock-step-ca binary follows in a separate PR (same shape as `examples/ca-vault-pki/`). Keeping it out of this PR keeps the scope to the backend implementation itself.