feat(ca): step-ca backend behind --ca-backend=step-ca #56

Merged
kanywst merged 2 commits into main from feat/ca-step-ca
May 12, 2026

@kanywst kanywst commented May 12, 2026

Summary

Adds a second non-disk CA backend after Vault PKI, validating the ADR 0005 Plugin pattern against an upstream signer with a meaningfully different auth shape.

step-ca authenticates every /1.0/sign call with a one-time token (OTT): a JWT signed by a JWK provisioner whose matching public JWK is configured in step-ca's ca.json. omega:

  1. Loads the provisioner's private JWK at startup (--ca-step-ca-provisioner-key-file).
  2. Probes GET /roots.pem and pins the SHA-256 of the first root.
  3. For every CSR, mints a fresh OTT whose claims bind the request to one SPIFFE ID (sans), one root (sha), and a 5-minute expiry. A leaked OTT cannot be replayed long after the fact.
  4. POSTs {csr, ott, notBefore, notAfter} to /1.0/sign, validates the returned leaf carries the requested SPIFFE ID + CSR's public key, and returns the assembled SVID.

X.509-SVID signing delegates to step-ca; JWT-SVID signing stays local (same ADR 0005 trade-off Vault PKI makes — per-token network signing would add a hop to every JWT validation, and the 5-minute JWT-SVID TTL makes that unattractive).

Trust anchors are cached behind the same Lock/check/Unlock/fetch/Lock/store pattern the Vault backend uses, so a transient step-ca blip serves the stale bundle rather than breaking every workload's handshake.
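A minimal sketch of that caching shape, assuming a hypothetical `bundleCache` type whose `fetch` field stands in for the GET /roots.pem call. The names and the single-mutex layout are illustrative rather than taken from the backend.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errUpstream simulates a transient step-ca failure in this sketch.
var errUpstream = errors.New("upstream unavailable")

// bundleCache is a hypothetical cache for the trust bundle.
type bundleCache struct {
	mu      sync.Mutex
	data    []byte
	fetched time.Time
	ttl     time.Duration
	fetch   func() ([]byte, error) // stand-in for GET /roots.pem
}

// BundlePEM follows Lock/check/Unlock/fetch/Lock/store: the mutex is never
// held across the network call, and a failed fetch falls back to stale bytes.
func (c *bundleCache) BundlePEM() ([]byte, error) {
	c.mu.Lock()
	fresh := c.data != nil && time.Since(c.fetched) < c.ttl
	cached := append([]byte(nil), c.data...) // defensive copy
	c.mu.Unlock()
	if fresh {
		return cached, nil
	}

	fetched, err := c.fetch() // no lock held here
	if err != nil {
		if len(cached) > 0 {
			return cached, nil // serve the stale bundle
		}
		return nil, err // nothing cached yet, so surface the error
	}

	c.mu.Lock()
	c.data = append([]byte(nil), fetched...)
	c.fetched = time.Now()
	c.mu.Unlock()
	return fetched, nil
}

func main() {
	calls := 0
	c := &bundleCache{ttl: 0, fetch: func() ([]byte, error) {
		calls++
		if calls == 1 {
			return []byte("root-A"), nil
		}
		return nil, errUpstream
	}}
	first, _ := c.BundlePEM()
	second, err := c.BundlePEM() // fetch fails, stale copy is returned
	fmt.Println(string(first), string(second), err)
}
```

The error is only surfaced when there is nothing cached at all, which is the startup case where failing loudly is preferable to serving an empty bundle.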

Scope layer

Plugin. Adds KindStepCA + step_ca.go to internal/server/identity/. No changes to the Authority interface — the second backend in this layer is the validation that the interface holds, exactly the goal of ADR 0005.

New flags

--ca-step-ca-url <URL>
--ca-step-ca-provisioner <name>
--ca-step-ca-provisioner-key-file <path>
--ca-step-ca-ca-cert <path>

Test plan

  • go test -race -count=1 ./internal/server/identity/ passes
  • New tests: TestStepCAIssueSVID_RoundTrip, TestStepCARejectsForeignTrustDomain, TestStepCAMissingConfigRejected, TestStepCARejectsPublicProvisionerJWK
  • The roundtrip test stands up a step-ca mock that actually verifies the OTT signature against the configured public JWK and pins the sha claim to the served root, so a regression that breaks OTT signing trips the test instead of silently working against a permissive mock
  • go vet ./... clean
  • markdownlint-cli2 CHANGELOG.md README.md docs/ca-plugin-guide.md → 0 error(s)
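For illustration, the check a non-permissive mock performs might look like the sketch below. `verifyOTT`, `signES256`, and `newKey` are hypothetical names, ES256 is assumed, and the real mock in step_ca_test.go may be structured differently and check more (for example `exp`, `aud`, or the header `alg`).

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"math/big"
	"strings"
)

var enc = base64.RawURLEncoding

// newKey generates a throwaway P-256 key for the sketch.
func newKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// signES256 plays the client side: it signs the claims as a compact JWT.
func signES256(key *ecdsa.PrivateKey, claims map[string]any) (string, error) {
	body, err := json.Marshal(claims)
	if err != nil {
		return "", err
	}
	input := enc.EncodeToString([]byte(`{"alg":"ES256","typ":"JWT"}`)) + "." + enc.EncodeToString(body)
	digest := sha256.Sum256([]byte(input))
	r, s, err := ecdsa.Sign(rand.Reader, key, digest[:])
	if err != nil {
		return "", err
	}
	sig := make([]byte, 64)
	r.FillBytes(sig[:32])
	s.FillBytes(sig[32:])
	return input + "." + enc.EncodeToString(sig), nil
}

// verifyOTT plays the mock CA side: it rejects tokens not signed by the
// configured public key, or whose sha claim does not match the served root.
func verifyOTT(token string, pub *ecdsa.PublicKey, servedRootSHA string) error {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return errors.New("malformed token")
	}
	sig, err := enc.DecodeString(parts[2])
	if err != nil || len(sig) != 64 {
		return errors.New("malformed signature")
	}
	digest := sha256.Sum256([]byte(parts[0] + "." + parts[1]))
	r := new(big.Int).SetBytes(sig[:32])
	s := new(big.Int).SetBytes(sig[32:])
	if !ecdsa.Verify(pub, digest[:], r, s) {
		return errors.New("signature does not match configured JWK")
	}
	payload, err := enc.DecodeString(parts[1])
	if err != nil {
		return errors.New("malformed payload")
	}
	var claims struct {
		SHA string `json:"sha"`
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return err
	}
	if claims.SHA != servedRootSHA {
		return errors.New("sha claim does not match served root")
	}
	return nil
}

func main() {
	key, err := newKey()
	if err != nil {
		panic(err)
	}
	tok, err := signES256(key, map[string]any{"sha": "abc123"})
	if err != nil {
		panic(err)
	}
	fmt.Println(verifyOTT(tok, &key.PublicKey, "abc123"))
}
```

A mock built this way fails the round trip if OTT signing regresses, instead of accepting any bearer string.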

Docs

  • docs/ca-plugin-guide.md Step 7 standards-alignment table — bumped from "vault-pki implemented" to "vault-pki + step-ca implemented"
  • README.md Standards-alignment row — same
  • CHANGELOG.md Unreleased — new entry covering the flag set, the OTT pinning shape, and the JWT-stays-local rationale

Follow-up

A runnable examples/ca-step-ca/ demo with a mock-step-ca binary follows in a separate PR (same shape as examples/ca-vault-pki/). Keeping it out of this PR keeps the scope to the backend implementation itself.

omega gains a second non-disk CA backend after Vault PKI, validating
that the ADR 0005 Plugin pattern holds against an upstream signer
with a meaningfully different auth shape.

step-ca authenticates every /1.0/sign call with a one-time token
(OTT): a JWT signed by a JWK provisioner whose matching public JWK
is configured on the step-ca side in ca.json. omega loads the
provisioner's private JWK at startup and mints a per-request OTT
that pins the SPIFFE ID via `sans`, the CA root via `sha`, and a
5-minute `exp` so a leaked OTT cannot be replayed long after the
fact. Trust anchors come from GET /roots.pem and are cached behind
the same Lock/check/Unlock/fetch/Lock/store dance the Vault backend
uses, so a transient step-ca blip serves the stale bundle rather
than breaking every workload's handshake.

X.509-SVID signing delegates to step-ca; JWT-SVID signing stays
local for the ADR 0005 reason that per-token network signing
would add a hop to every JWT validation and the 5-minute JWT-SVID
TTL makes that trade-off unattractive.

New flags:
- --ca-step-ca-url
- --ca-step-ca-provisioner
- --ca-step-ca-provisioner-key-file
- --ca-step-ca-ca-cert

Tests (internal/server/identity/step_ca_test.go) stand up a
step-ca mock that actually verifies the OTT signature against the
configured public JWK and pins the sha claim to the served root,
so a regression that breaks OTT signing trips the test instead of
silently working against a permissive mock.

A runnable examples/ca-step-ca/ demo follows in a separate PR to
keep the scope of this one focused on the backend itself.
coderabbitai Bot commented May 12, 2026

Warning

Rate limit exceeded

@kanywst has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 34 minutes and 27 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between e9b44d3 and c28f4a4.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • README.md
  • docs/ca-plugin-guide.md
  • internal/cli/server.go
  • internal/server/identity/authority.go
  • internal/server/identity/step_ca.go
  • internal/server/identity/step_ca_test.go
  • internal/server/identity/vault_pki.go

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new CA backend for Smallstep step-ca, enabling X.509-SVID signing via the /1.0/sign endpoint using one-time tokens. The implementation includes the stepCAAuthority logic, new CLI flags, and comprehensive unit tests. Review feedback identifies a potential thundering herd issue in the BundlePEM refresh logic, where concurrent requests could overwhelm the upstream CA upon cache expiry. It is recommended to use singleflight to ensure only one refresh operation is in flight at a time.

Comment thread internal/server/identity/step_ca.go
Comment thread internal/server/identity/step_ca.go
gemini flagged the new step-ca BundlePEM as vulnerable to a
thundering herd: N concurrent IssueSVID callers across a TTL expiry
boundary would each trigger their own /roots.pem fetch. The same
shape exists in the older vault-pki backend, so fix both for
consistency.

Add a separate `refreshMu sync.Mutex` (not the existing read/write
mutex `bundleMu`, which guards the cached bytes themselves) and use
it to serialise the refresh path:

  1. Fast path: bundleMu.RLock + freshness check, return defensive
     copy. Unchanged.
  2. Slow path: capture stale bytes, release bundleMu, acquire
     refreshMu, recheck the cache (some other goroutine may have
     refreshed while we were queued), only then fetch.

Net effect under load is one HTTP round-trip per TTL window
instead of one per concurrent caller, without adding a new
dependency (an explicit `singleflight` would work too but the
recheck pattern stays self-contained).
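
A sketch of the two-mutex recheck described above, assuming a
hypothetical `bundleCache` type with a `fetch` stand-in for the
/roots.pem call (field names other than `bundleMu` and `refreshMu`
are illustrative, as is the `countFetches` helper):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

type bundleCache struct {
	bundleMu  sync.RWMutex // guards data and fetched
	data      []byte
	fetched   time.Time
	refreshMu sync.Mutex // serialises the refresh path only
	ttl       time.Duration
	fetch     func() ([]byte, error)
}

// cached returns a defensive copy of the bundle and whether it is fresh.
func (c *bundleCache) cached() ([]byte, bool) {
	c.bundleMu.RLock()
	defer c.bundleMu.RUnlock()
	fresh := c.data != nil && time.Since(c.fetched) < c.ttl
	return append([]byte(nil), c.data...), fresh
}

func (c *bundleCache) BundlePEM() ([]byte, error) {
	if data, fresh := c.cached(); fresh { // fast path
		return data, nil
	}
	c.refreshMu.Lock()
	defer c.refreshMu.Unlock()
	stale, fresh := c.cached() // recheck: another caller may have refreshed
	if fresh {
		return stale, nil
	}
	fetched, err := c.fetch()
	if err != nil {
		if len(stale) > 0 {
			return stale, nil
		}
		return nil, err
	}
	c.bundleMu.Lock()
	c.data = append([]byte(nil), fetched...)
	c.fetched = time.Now()
	c.bundleMu.Unlock()
	return fetched, nil
}

// countFetches starts n concurrent callers against a cold cache and
// reports how many upstream fetches were made.
func countFetches(n int) int64 {
	var fetches int64
	c := &bundleCache{ttl: time.Minute, fetch: func() ([]byte, error) {
		atomic.AddInt64(&fetches, 1)
		time.Sleep(10 * time.Millisecond) // simulated network latency
		return []byte("root"), nil
	}}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = c.BundlePEM()
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&fetches)
}

func main() {
	fmt.Println("upstream fetches:", countFetches(50))
}
```

Readers on the fast path never touch refreshMu, so the extra mutex
only costs anything when the cache is actually stale.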
@kanywst kanywst merged commit 9050fea into main May 12, 2026
25 checks passed
@kanywst kanywst deleted the feat/ca-step-ca branch May 12, 2026 17:16