docs: README rewrite + follow-up fixes from external review#29
Merged
Conversation
README split: the configuration mega-table moved into a new
docs/configuration.md grouped by purpose (required, listeners,
identity, signing, replay store, rate limits, resource management,
production posture, observability), with the alerting playbook +
PromQL recipes alongside it. The README itself drops to ~370 lines
and gains a TL;DR runtime contract + Requirements section near the
top so a new reader can deploy without scrolling.
Also addresses a batch of follow-up items from a review pass:
- Doc-drift fixes:
- threat-model row 12 referenced token/seal.go and
token/seal_test.go; actual paths are token/token.go and
token/fuzz_test.go + token/token_test.go.
- README endpoint table now flags /readyz as living on the
metrics listener (port 9090), not the public router —
matches the actual main.go wiring.
- Per-replica scope of the in-process rate limiter is now
called out in both README (production-posture section) and
docs/configuration.md (rate-limiting section), so an
operator sizing N replicas multiplies the per-endpoint
ceiling accordingly.
- Code/contract fixes:
- MCP_TOOL_METRICS_MAX_CARDINALITY=0 is now accepted as the
documented "disable the cap" sentinel. The consumer at
metrics.ToolCardinality.ToolLabel already gates on
MaxCardinality > 0, so the only blocker was the config
validator rejecting 0 with "must be a positive integer".
Negative values still fail. New regression test
TestLoad_ToolMetrics_ZeroDisablesCap pins the contract.
- Supply-chain pins:
- Dockerfile base images (golang:1.26-alpine and
gcr.io/distroless/static-debian13:nonroot) pinned by
sha256 digest so a base-image rebuild can't silently
change the published binary's environment.
- CI's govulncheck install pinned to v1.1.4 (was @latest).
- README polish:
- TL;DR uses an explicit <your-idp-secret> placeholder
instead of a literal ellipsis (copy-paste-safe), and a
reachable-Redis hostname is called out for readers running
outside the demo compose network.
- Configuration table cross-references the key-rotation
runbook from the TOKEN_SIGNING_SECRET row.
- OIDC_ALLOW_INSECURE_HTTP moved next to the other
PROD_MODE-gated relaxation toggles in
docs/configuration.md.
- MCP_PER_SUBJECT_CONCURRENCY split into its own
"Resource management" sub-section (it caps in-flight
requests, not request rate).
- Observability heading renamed for a clean GitHub anchor.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two-part change:
1. README rewrite
The README's 39-row configuration mega-table was unscannable. Split into:
docs/configuration.md(NEW): full env-var reference grouped by purpose — Required / Listeners & logging / Identity & authorization / Signing & rotation / Replay store / Rate limiting / Resource management / Production posture / Logging & observability — plus the alerting playbook and PromQL recipes that lived inline.A new reader can land on the README, deploy in 5 minutes, and dig into the reference doc only when they need to. The architecture diagram, sealed-state TTL table, endpoint table, and supply-chain verification sections all stay in the README.
2. Follow-up fixes from a review pass
Doc drift caught by a fresh-eyes review:
docs/threat-model.mdrow 12 referencedtoken/seal.goandtoken/seal_test.go; actual paths aretoken/token.go+token/fuzz_test.go+token/token_test.go./readyzendpoint in the README table was flagged as if public; it lives on the metrics listener (port 9090). Fixed.docs/configuration.md(rate-limiting section). Operators sizingNreplicas multiply the per-endpoint ceiling accordingly.Contract fix:
MCP_TOOL_METRICS_MAX_CARDINALITY=0is now accepted as the documented "disable the cap" sentinel. The consumer atmetrics.ToolCardinality.ToolLabelalready gates onMaxCardinality > 0; the only blocker was the config validator rejecting0with"must be a positive integer". Negative values still fail. New regression testTestLoad_ToolMetrics_ZeroDisablesCappins the contract.Supply-chain pins:
@sha256digest (golang:1.26-alpineandgcr.io/distroless/static-debian13:nonroot). A base-image rebuild can no longer silently change the published binary's environment.govulncheckinstall pinned tov1.1.4(was@latest).README polish:
<your-idp-secret>placeholder (copy-paste-safe) and a reachable Redis hostname.TOKEN_SIGNING_SECRETrow cross-references the key-rotation runbook.OIDC_ALLOW_INSECURE_HTTPmoved next to the otherPROD_MODE-gated relaxation toggles.MCP_PER_SUBJECT_CONCURRENCYsplit into a dedicated "Resource management" sub-section (it's a concurrency cap, not a rate limit).Items deferred to a follow-up PR
The same review flagged two items that need their own focused PR with regression tests:
TOKEN_SIGNING_SECRETentropy gate is structurally flawed. A 64-charopenssl rand -hex 32value is drawn from a 16-symbol alphabet — the expected number of distinct chars is ~15.75, so a real random secret can intermittently fail the< 16 distinct bytescheck while a patterned0123456789abcdef0123456789abcdef(exactly 16 distinct bytes) passes. False-positive AND false-negative on the same heuristic. Will replace with an obvious-pattern check + align docs onmanifests/scripts/generate-signing-secret.sh.N × per-replicamath.Test plan
test,lint,keycloak-e2e,fuzz-smoke,manifest-prod,govulncheck,build).go test ./config/ -run "ToolMetrics" -vpasses (incl. newZeroDisablesCapregression).