admin: AdminForward integration for S3 admin ops (P2 slice 2b) #673
bootjp merged 3 commits into feat/admin-s3-writes
@claude review
Claude finished @bootjp's task in 4m 3s
PR Review: admin: AdminForward integration for S3 admin ops (P2 slice 2b)
Summary
The PR correctly implements follower→leader forwarding for the three S3 admin write operations, faithfully mirrors the Dynamo forwarding contract, and includes a solid test suite (19 new tests). The wiring change in
Issues Found
1.
Code Review
This pull request introduces S3 bucket administrative operation forwarding, enabling follower nodes to delegate bucket creation, deletion, and ACL modifications to the leader via gRPC. The implementation includes updates to the gRPC client and server, Protobuf definitions, and the S3 HTTP handler. Feedback was provided to align the validation and decoding logic in the handleCreateBucket handler with the existing HTTP path to ensure consistency.
	if bytes.IndexByte(payload, 0) >= 0 {
		return rejectForward(http.StatusBadRequest, "invalid_body",
			"create-bucket payload contains a NUL byte")
	}
	dec := json.NewDecoder(bytes.NewReader(payload))
	dec.DisallowUnknownFields()
	var body CreateBucketRequest
	if err := dec.Decode(&body); err != nil {
		return rejectForward(http.StatusBadRequest, "invalid_body",
			"create-bucket payload is not valid JSON")
	}
	if dec.More() {
		return rejectForward(http.StatusBadRequest, "invalid_body",
			"create-bucket payload has trailing data")
	}
	if strings.TrimSpace(body.BucketName) == "" {
		return rejectForward(http.StatusBadRequest, "invalid_body",
			"bucket_name is required")
	}
The validation and decoding logic in handleCreateBucket diverges from the leader-direct HTTP path (handleCreate in s3_handler.go). Reusing the existing validation helper ensures consistency. Additionally, ensure the request body size is limited using http.MaxBytesReader to prevent Denial of Service attacks from oversized payloads, as required by repository security rules.
Suggested change:
	if bytes.IndexByte(payload, 0) >= 0 {
		return rejectForward(http.StatusBadRequest, "invalid_body", "request body contains a NUL byte")
	}
	dec := json.NewDecoder(bytes.NewReader(payload))
	dec.DisallowUnknownFields()
	var body CreateBucketRequest
	if err := dec.Decode(&body); err != nil {
		return rejectForward(http.StatusBadRequest, "invalid_body", "request body is not valid JSON")
	}
	if dec.More() {
		return rejectForward(http.StatusBadRequest, "invalid_body", "request body has trailing data after the first JSON value")
	}
	if err := validateCreateBucketRequest(body); err != nil {
		return rejectForward(http.StatusBadRequest, "invalid_body", err.Error())
	}
References
- When handling HTTP requests, always limit the size of the request body (e.g., using http.MaxBytesReader) to prevent Denial of Service attacks from oversized payloads.
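The body-size rule referenced above can be illustrated with a minimal sketch. It assumes a plain `net/http` handler path; `readLimitedBody`, `statusForBodySize`, and the `maxBodyBytes` constant are hypothetical names for illustration (the 64 KiB figure is the one the PR description cites), and the forwarded gRPC path may enforce its limit differently.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBodyBytes is an assumed limit; 64 KiB is the figure the PR description mentions.
const maxBodyBytes = 64 << 10

// readLimitedBody is a hypothetical helper: it caps the request body with
// http.MaxBytesReader and reports 413 when the cap is exceeded.
func readLimitedBody(w http.ResponseWriter, r *http.Request) ([]byte, int) {
	r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)
	body, err := io.ReadAll(r.Body)
	if err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			return nil, http.StatusRequestEntityTooLarge
		}
		return nil, http.StatusBadRequest
	}
	return body, http.StatusOK
}

// statusForBodySize drives readLimitedBody with an in-memory request of n bytes.
func statusForBodySize(n int) int {
	req := httptest.NewRequest(http.MethodPost, "/buckets",
		strings.NewReader(strings.Repeat("a", n)))
	_, status := readLimitedBody(httptest.NewRecorder(), req)
	return status
}

func main() {
	fmt.Println(statusForBodySize(128), statusForBodySize(maxBodyBytes+1))
}
```

Wrapping the body before any read means the limit applies no matter which decoder consumes it afterwards.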
Three findings from claude + Gemini review on 70213e0:

1) **Issue 1: `logUnexpectedSourceError` slog key was "table"**
When called for bucket operations the field key was `"table"` but the value was a bucket name. Log queries on `table=` would find spurious bucket-error entries; queries on `bucket=` would miss the audit lines entirely. Renamed the parameter and the slog key to `resource` so the same forensic query works for both resource families.

2) **Gemini security-high + Claude Issue 2: validation divergence**
`handleCreateBucket` only checked `strings.TrimSpace(name) == ""` while the HTTP path's `validateCreateBucketRequest` also rejects whitespace-padded names like `" bucket "`. The forward path would have accepted them, then hit the adapter's `validateS3BucketName` with a less actionable error message, giving different SPA behaviour depending on whether the request was leader-direct or follower-forwarded. Fix: call `validateCreateBucketRequest(body)` in `handleCreateBucket` exactly like `decodeCreateTableRequest` is shared between the table-side handlers.

3) **Issue 3: only CREATE_BUCKET tested for nil-BucketsSource → 501**
`DELETE_BUCKET` and `PUT_BUCKET_ACL` had identical `if s.buckets == nil` guards but no coverage. Replaced `TestForwardServer_CreateBucket_NoBucketsSourceReturns501` with a table-driven `TestForwardServer_BucketOps_NoBucketsSourceReturns501` sweeping all three operations. A future op added without the nil guard fails CI immediately.

Plus a new `TestForwardServer_CreateBucket_RejectsWhitespacePaddedName` that pins the validation-parity fix from #2.

Rebased onto the latest `feat/admin-s3-writes` (which now carries the slice 2a review fixes) so the stack stays clean.
70213e0 to e5ef0a9
@claude review
e5ef0a9 (rebased onto the updated #669 branch) addresses all three findings:
Build / vet / lint / admin tests all clean.
Claude finished @bootjp's task in 1m 52s
Follow-up Review: Verifying fixes from e5ef0a9
All three findings from the previous review are correctly addressed in e5ef0a9.
Issue 1: slog key
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 70213e047c
	if deps.buckets != nil {
		srv = srv.WithBucketsSource(deps.buckets)
Register AdminForward for S3-only deployments
This S3 wiring block is unreachable whenever Dynamo is disabled because readyForRegistration() still requires tables != nil; in a cluster started with --dynamoAddr empty but S3 enabled, leaders never register AdminForward, so follower-side S3 writes now attempt forwarding and consistently fail with gRPC Unimplemented (surfacing as 503) instead of transparently reaching the leader. Allow registration when roles exist and at least one source (tables or buckets) is present, and keep unsupported ops returning 501.
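The relaxed predicate suggested here could look roughly like the sketch below. The `forwardDeps` struct and the empty placeholder interfaces are hypothetical stand-ins for the real `adminForwardServerDeps` and its source types, which are not shown on this page.

```go
package main

import "fmt"

// Empty placeholder interfaces; the real RoleStore, TablesSource and
// BucketsSource carry methods that are not shown in this conversation.
type RoleStore interface{}
type TablesSource interface{}
type BucketsSource interface{}

// forwardDeps is a hypothetical stand-in for adminForwardServerDeps.
type forwardDeps struct {
	roles   RoleStore
	tables  TablesSource
	buckets BucketsSource
}

// readyForRegistration requires roles plus at least one source, so an
// S3-only deployment still registers the AdminForward service.
func (d forwardDeps) readyForRegistration() bool {
	return d.roles != nil && (d.tables != nil || d.buckets != nil)
}

func main() {
	s3Only := forwardDeps{roles: struct{}{}, buckets: struct{}{}}
	noRoles := forwardDeps{buckets: struct{}{}}
	fmt.Println(s3Only.readyForRegistration(), noRoles.readyForRegistration())
}
```

Operations whose source is missing would still need their own guard in the dispatcher, which is where the 501 response comes from.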
	if strings.TrimSpace(body.BucketName) == "" {
		return rejectForward(http.StatusBadRequest, "invalid_body",
			"bucket_name is required")
Enforce create-bucket validation parity on forwarded calls
The forwarded create-bucket handler only checks TrimSpace(bucket_name) == "" and skips validateCreateBucketRequest, so requests with leading/trailing whitespace can take a different path than leader-direct HTTP (which rejects them as invalid_body before hitting the source). This breaks the stated follower/leader response parity and can become a functional divergence if a BucketsSource implementation is less strict than the current adapter.
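To make the parity gap concrete, here is a minimal sketch of a shared validator that both the leader-direct and forwarded paths could call. The body of `validateCreateBucketRequest` below is an assumption (the real function lives in the repository and is not shown here), as are the second error message and the `isValidName` helper.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CreateBucketRequest mirrors the request type named in the diff; only the
// bucket_name field is modelled here.
type CreateBucketRequest struct {
	BucketName string `json:"bucket_name"`
}

// validateCreateBucketRequest is an illustrative reimplementation, not the
// repository's code: it rejects empty names and whitespace-padded names.
func validateCreateBucketRequest(body CreateBucketRequest) error {
	trimmed := strings.TrimSpace(body.BucketName)
	if trimmed == "" {
		return errors.New("bucket_name is required")
	}
	if trimmed != body.BucketName {
		return errors.New("bucket_name must not have leading or trailing whitespace")
	}
	return nil
}

// isValidName is a small helper for exercising the validator.
func isValidName(name string) bool {
	return validateCreateBucketRequest(CreateBucketRequest{BucketName: name}) == nil
}

func main() {
	fmt.Println(isValidName("bucket"), isValidName(" bucket "), isValidName("   "))
}
```

A bare `TrimSpace(name) == ""` check would accept `" bucket "`, which is the divergence described above.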
Two findings from claude review #675:

1. **P2 was marked ✅ shipped but #669 / #673 are still in flight.** PR description correctly noted slices 2a/2b haven't merged, but the table cell read "✅ shipped". Future readers reading the table alone would conclude main has the S3 write paths today, which is wrong. Fixed: P2 is now 🟡 partial with the in-flight PRs explicitly called out, mirroring the P4 row's shape.
2. **P1 row claimed criteria 1–6 but criterion 5 is in the outstanding items as deferred.** The two cells contradicted each other. Updated P1 to "criteria 1–4 + 6 (criterion 5 deferred, see outstanding items)" so the table is consistent with the outstanding-items section below.

The cosmetic notes (Last-updated field, verbose Status header) are left for a future cleanup; they are not factually wrong, only non-conforming to the README convention. Calling them out as a follow-up rather than baking the precedent into this doc.
ddc3b9b to 5a67834
P4 deliverable from docs/design/2026_04_24_proposed_admin_dashboard.md Section 8: a single self-contained operator-facing reference for the admin HTTP listener.

Covers:
- Quick-start invocation for a loopback dev cluster
- Required + optional flag reference, with explanations of why each guard exists (TLS hard-error, rolling-update caveats, etc.)
- TLS topologies (loopback / TLS / discouraged plaintext-non-loopback)
- Role model + how live role re-validation works on every state-changing request
- The full /admin/api/v1/* surface (auth + cluster + dynamo + s3, including the slice 2 write paths and the AdminForward forwarding contract)
- forwarded_from audit log shape and why it carries the follower's node ID
- Troubleshooting guide for the common failure modes operators hit during initial bring-up (missing credentials, TLS hard-error, 401 ambiguity, 503 leader_unavailable, bucket_not_empty, blank-screen / placeholder bundle)
- Cross-references to the design doc and architecture overview

The Section 8 P4 plan also called out "TLS, read-only role, CSRF" as deliverables; those are already implemented (see config.go's validateTLS / validateAccessKeyRoles, the role gates in DynamoHandler.principalForWrite + S3Handler.principalForWrite, and the CSRFDoubleSubmit middleware). This doc stitches them into a single reference operators can land on without reading code.

Stacked on #669 (P2 slice 2a) + #673 (P2 slice 2b) so the API-surface table can describe S3 write endpoints as shipped. Once both land in main, this rebases cleanly.
Slice 2b of P2 (docs/design/2026_04_24_proposed_admin_dashboard.md
Section 3.3.2 + 4.1): a follower-side S3 admin write
(POST /buckets, PUT /buckets/{name}/acl, DELETE /buckets/{name})
now hands off to the leader transparently, completing the same
end-to-end forwarding contract Dynamo writes received in #644 +
Stacked on #669 (P2 slice 2a). Once #669 merges, this rebases
cleanly onto main.
Proto (proto/admin_forward.proto):
- Three new ADMIN_OP enum values (CREATE_BUCKET / DELETE_BUCKET /
PUT_BUCKET_ACL) appended after the Dynamo block so existing
wire-format integers stay stable. Regenerated with the pinned
protoc 29.3 / protoc-gen-go 1.36.11 / protoc-gen-go-grpc 1.6.1.
Leader-side ForwardServer (internal/admin/forward_server.go):
- WithBucketsSource lets deployments wire the S3 dispatcher
optionally — Dynamo-only builds keep the receiver nil and the
three new operations return 501 NotImplemented.
- Three new dispatch arms: handleCreateBucket / handleDeleteBucket /
handlePutBucketAcl. Each one mirrors the leader-direct HTTP
path's payload contract (NUL-byte rejection, 64 KiB limit,
DisallowUnknownFields, trailing-token rejection, slash-in-name
rejection) so a hostile follower cannot smuggle a payload past
validations the leader-direct path enforces.
- forwardBucketsErrorResponse mirrors forwardErrorResponse on the
Dynamo side: ErrBucketsForbidden / NotLeader / NotFound /
AlreadyExists / NotEmpty + ValidationError each map to the same
HTTP status the leader-direct writeBucketsError produces, so
forwarded and leader-direct responses are byte-for-byte
indistinguishable from the SPA's view.
- isStructuredSourceError extended to recognise the bucket
sentinels so they are NOT logged at LevelError on the leader.
- Forward's switch was extracted into dispatchForward to keep the
parent function under cyclop's 10-branch ceiling as the operation
enum grew.
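The sentinel-to-status mapping described for forwardBucketsErrorResponse can be sketched as below. The 409 for already-exists and not-empty comes from the test list in this description; the 403, 404 and 503 choices, the sentinel names other than ErrBucketsNotLeader, and the `statusForBucketsError` helper itself are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Sentinel names follow the pattern in the description; only
// ErrBucketsNotLeader is spelled out there, the rest are assumed.
var (
	ErrBucketsForbidden     = errors.New("buckets: forbidden")
	ErrBucketsNotLeader     = errors.New("buckets: not leader")
	ErrBucketsNotFound      = errors.New("buckets: not found")
	ErrBucketsAlreadyExists = errors.New("buckets: already exists")
	ErrBucketsNotEmpty      = errors.New("buckets: not empty")
)

// statusForBucketsError is a hypothetical helper that maps structured source
// errors to HTTP statuses; unknown errors fall through to 500.
func statusForBucketsError(err error) int {
	switch {
	case errors.Is(err, ErrBucketsForbidden):
		return http.StatusForbidden // assumed
	case errors.Is(err, ErrBucketsNotLeader):
		return http.StatusServiceUnavailable // assumed
	case errors.Is(err, ErrBucketsNotFound):
		return http.StatusNotFound // assumed
	case errors.Is(err, ErrBucketsAlreadyExists), errors.Is(err, ErrBucketsNotEmpty):
		return http.StatusConflict // 409 per the test list
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	wrapped := fmt.Errorf("delete bucket: %w", ErrBucketsNotEmpty)
	fmt.Println(statusForBucketsError(wrapped))
}
```

Using `errors.Is` rather than `==` keeps the mapping stable when a source wraps its sentinel with extra context.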
Follower-side LeaderForwarder (internal/admin/forward_client.go):
- Interface gains ForwardCreateBucket / ForwardDeleteBucket /
ForwardPutBucketAcl. PutBucketAcl carries both the bucket name
(from the URL path) and the new ACL (from the request body) in
one JSON payload — same approach handleDeleteBucket takes for
the bucket name.
- gRPCForwardClient methods reuse the existing forward() helper
for transport, so connection-cache reuse and ErrLeaderUnavailable
signalling behave identically across resource types.
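A minimal sketch of the single-payload approach for PutBucketAcl follows. The `putBucketACLPayload` type, its `acl` field name, and the `encodePutBucketACL` / `mustEncode` helpers are hypothetical; only the `name` key is confirmed elsewhere in this conversation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// putBucketACLPayload is a hypothetical wire shape: the bucket name from the
// URL path and the ACL from the request body travel in one JSON object.
type putBucketACLPayload struct {
	Name string `json:"name"`
	ACL  string `json:"acl"` // field name is an assumption
}

// encodePutBucketACL builds the payload a follower would hand to the forwarder.
func encodePutBucketACL(name, acl string) ([]byte, error) {
	return json.Marshal(putBucketACLPayload{Name: name, ACL: acl})
}

// mustEncode is a convenience wrapper for the demo below.
func mustEncode(name, acl string) string {
	b, err := encodePutBucketACL(name, acl)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustEncode("photos", "private"))
}
```

Carrying both values in one payload lets the leader decode with the same strict rules it applies to the other operations.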
Handler integration (internal/admin/s3_handler.go):
- New `forwarder LeaderForwarder` field + WithLeaderForwarder method.
- handleCreate / handlePutACL / handleDelete now consult tryForward*
helpers when the source returned ErrBucketsNotLeader; the helpers
are gated on `errors.Is(err, ErrBucketsNotLeader) && forwarder != nil`
so a leader-direct rejection (already-exists, not-found, etc.) is
never re-applied at the leader.
- writeForwardResult / writeForwardFailure mirror the Dynamo handler's
pattern: nosniff + Cache-Control:no-store + Retry-After:1 on 503.
ErrLeaderUnavailable does NOT log at LevelError (elections are
routine); transport errors do log so operators can investigate.
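The forwarding gate and the 503 failure response described above can be sketched as follows. The `shouldForward` and `failureResponse` helpers, the JSON body, and the `errBucketsAlreadyExists` sentinel are assumptions for illustration; the header set and the gate condition follow the description.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

var (
	ErrBucketsNotLeader     = errors.New("buckets: not leader")
	errBucketsAlreadyExists = errors.New("buckets: already exists") // illustrative
)

// shouldForward mirrors the described gate: forward only when the source
// said "not leader" and a forwarder is actually wired in.
func shouldForward(err error, hasForwarder bool) bool {
	return errors.Is(err, ErrBucketsNotLeader) && hasForwarder
}

// writeForwardFailure sketches the 503 response: nosniff, no-store, and a
// Retry-After of one second. The body shape is an assumption.
func writeForwardFailure(w http.ResponseWriter) {
	h := w.Header()
	h.Set("Content-Type", "application/json")
	h.Set("X-Content-Type-Options", "nosniff")
	h.Set("Cache-Control", "no-store")
	h.Set("Retry-After", "1")
	w.WriteHeader(http.StatusServiceUnavailable)
	_, _ = w.Write([]byte(`{"error":"leader_unavailable"}`))
}

// failureResponse records the failure response for inspection.
func failureResponse() (int, string) {
	rec := httptest.NewRecorder()
	writeForwardFailure(rec)
	return rec.Code, rec.Header().Get("Retry-After")
}

func main() {
	code, retry := failureResponse()
	fmt.Println(shouldForward(ErrBucketsNotLeader, true), code, retry)
}
```

Because the gate keys on the not-leader sentinel alone, a rejection such as already-exists is answered locally and never replayed at the leader.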
Wiring (main.go + main_admin_forward.go):
- adminForwardServerDeps gains a `buckets` field; readyForRegistration
still requires only TablesSource + RoleStore (so cluster-only or
Dynamo-only builds keep registering Dynamo forwarding without S3).
- runtimeServerRunner.start() now creates *adapter.S3Server BEFORE
startRaftServers (in addition to dynamoServer) so the leader-side
ForwardServer registration sees both adapters. The reorder is safe:
each adapter listens on its own address and the raft TCP listeners
are independent.
- ServerDeps.Forwarder now plumbs through buildS3HandlerForDeps too,
so the follower's S3Handler picks up the same LeaderForwarder
instance the Dynamo handler does.
Tests:
- 9 forward-server tests covering the three new bucket operations:
happy path / no-BucketsSource→501 / bad-JSON 400 / already-exists
409 / not-empty 409 / slash-in-name 400 / missing-acl 400 /
payload-too-large 413 (sweep over all three ops).
- 4 forward-client tests covering ForwardCreateBucket /
ForwardDeleteBucket / ForwardPutBucketAcl happy-path payload
shapes + ErrLeaderUnavailable on no-leader.
- stubLeaderForwarder gains 3 bucket-side forward methods so
existing dynamo tests still satisfy the LeaderForwarder
interface, and the new stub fields let bucket-handler tests
verify the forward arguments.
- 6 handler integration tests on S3Handler.tryForward*: forwarded
create / delete / put-acl happy paths (replay leader's status
+ payload + content-type), forwarder ErrLeaderUnavailable → 503
+ Retry-After, transport-error → 503 + no leakage, and a 3-axis
gate sweep proving the forwarder is NOT invoked on
AlreadyExists / Forbidden / generic source errors.
Closes design 3.3.2 acceptance criteria 2 (transparent forwarding)
+ 6 (forwarded_from in audit log) for S3 admin writes; criterion 3
(election-period 503 + retry) is also live for S3 because the
existing tryForward helpers reuse the same fallback paths.
e5ef0a9 to cafac05
Latest claude review on #675 flagged that lines 3 and 6 still described P2 as fully shipped: Line 3: "P1, **P2**, and P3 of the implementation plan have shipped" Line 6: "after P1 + **P2** + P3 landed" But the status table I added in 7617daf correctly marks P2 as 🟡 partial (read-only slice 1 in #658 landed; write paths #669 and #673 still in flight). The header prose contradicting the table is exactly the kind of inconsistency that triggers the rename to `_partial_` in the first place — readers skimming the top of the doc would conclude P2 was done. Rewrote the Status line and the parenthetical on Last updated to match the table, calling out P1 + P3 as fully shipped, P2 as the read-only slice only, and P4 as TLS/role/CSRF live with the operator doc landing now. Pointed readers at the per-phase status table for the breakdown. No other changes; the table itself was already correct.
Codex on PR #673 caught that `readyForRegistration()` predicate gated registration on `tables != nil`, which meant a cluster started with `--dynamoAddr` empty but S3 enabled never registered the AdminForward gRPC service at all. Followers attempting to forward S3 admin writes (CREATE_BUCKET / DELETE_BUCKET / PUT_BUCKET_ACL) hit gRPC `Unimplemented` and surfaced as 503 to the SPA, instead of transparently reaching the leader. The S3 write path was advertised as "follower-forwarded" by P2 slice 2b but actually only worked when Dynamo was also configured.

Fix is symmetric: relax the predicate to require RoleStore plus **at least one** source (TablesSource or BucketsSource) and have the dispatcher reject ops for the missing surface with the same 501 it already returns when the inverse source is nil.

Per CLAUDE.md "test the bug first" convention, add the regression test before the fix: TestForwardServer_DynamoOps_NoTablesSourceReturns501, symmetric with the existing TestForwardServer_BucketOps_NoBucketsSourceReturns501. With nil source, CREATE_TABLE used to slip past the source nil-deref because validation rejected the payload first; DELETE_TABLE panicked on `s.source.AdminDeleteTable`. Both now return 501.

Plus three readyForRegistration assertions:
- buckets-only deployment registers (the regression case)
- missing-roles + buckets-only does NOT register
- tables-only / both-sources continue to register

While in there, the per-op nil-source guards I added tipped the linter's dupl detector. Refactored the dispatch path to keep forward_server tidy:

1. **`checkOpAvailability`**: pulls the per-op nil-source switch out of `dispatchForward` so the latter stays under the cyclomatic ceiling. One place to update when a future op joins the enum, instead of five.
2. **`decodeNamedPayload`**: the {"name": "..."} JSON shape is shared between Dynamo handleDelete and S3 handleDeleteBucket (NUL guard, DisallowUnknownFields, dec.More(), empty-name reject, slash reject; every guard had a paired comment citing the original review that introduced it). Centralising the helper keeps that audit trail in one place; the opLabel parameter ("delete" / "delete-bucket") preserves the path-specific wording in error responses.
3. **`auditDeleteSuccess`**: handleDelete and handleDeleteBucket each emitted a 7-line `admin_audit` block that differed only in the operation label and the `table` vs `bucket` field name. Folded into one helper so a future delete-shaped op automatically picks up the same field set.
4. **`notImplementedForwardResponse`**: deduplicates the 501 construction across the five Dynamo + S3 op cases.

No behaviour change to any of the existing successful paths; all forward_server tests pass unchanged.

Tests added: the S3-only registration assertion + the new symmetric DynamoOps_NoTablesSourceReturns501 sweep.

`go test -race ./internal/admin/ .` passes.
`golangci-lint run ./internal/admin/... .` reports 0 issues.
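The guard sequence listed for `decodeNamedPayload` can be sketched as below. This is an illustrative reimplementation under assumptions: the real signature, return type, and error wording are not shown in this conversation, and the `accepts` helper exists only for the demo.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// namedPayload models the {"name": "..."} shape shared by the delete ops.
type namedPayload struct {
	Name string `json:"name"`
}

// decodeNamedPayload applies the guards in the order the commit message lists
// them. opLabel ("delete" / "delete-bucket") only affects error wording.
func decodeNamedPayload(payload []byte, opLabel string) (string, error) {
	if bytes.IndexByte(payload, 0) >= 0 {
		return "", fmt.Errorf("%s payload contains a NUL byte", opLabel)
	}
	dec := json.NewDecoder(bytes.NewReader(payload))
	dec.DisallowUnknownFields()
	var body namedPayload
	if err := dec.Decode(&body); err != nil {
		return "", fmt.Errorf("%s payload is not valid JSON", opLabel)
	}
	if dec.More() {
		return "", fmt.Errorf("%s payload has trailing data", opLabel)
	}
	if strings.TrimSpace(body.Name) == "" {
		return "", errors.New("name is required")
	}
	if strings.Contains(body.Name, "/") {
		return "", errors.New("name must not contain a slash")
	}
	return body.Name, nil
}

// accepts reports whether a raw payload passes every guard.
func accepts(raw string) bool {
	_, err := decodeNamedPayload([]byte(raw), "delete-bucket")
	return err == nil
}

func main() {
	name, err := decodeNamedPayload([]byte(`{"name":"photos"}`), "delete-bucket")
	fmt.Println(name, err)
	fmt.Println(accepts(`{"name":"a/b"}`), accepts(`{"name":"a","extra":1}`))
}
```

Keeping the guards in one helper means a new delete-shaped operation inherits all of them by calling it, rather than by copying the sequence.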
@claude review
Addressed the Codex P1 finding on
The gemini security-high finding on
Latest commit: f3e9278.
Claude finished @bootjp's task in 2m 12s
Follow-up Review: Verifying Codex P1 fix (f3e9278)
All three changes from the trigger comment are correctly implemented. Details:
Codex P1:
Per `docs/design/README.md`'s lifecycle convention, the admin dashboard design doc is now "partial":

- **P1** (DynamoDB CRUD + AdminForward): shipped via #634, #635, #644, #648
- **P2** (S3 buckets list/create/delete/ACL + DescribeTable): shipped via #658, with #669 + #673 in flight
- **P3** (React SPA + embed): shipped via #649, #650
- **P4** (TLS / role / CSRF / operator docs): TLS, role, CSRF are already live in P1; operator docs in #674

Independent of the in-flight slice 2 PRs (#669/#673) and the docs PR (#674): this rename only reflects what is already on main today, plus an "Implementation status" table mapping each phase to the PR it landed in.

## What this PR changes

- `git mv` the design doc from `2026_04_24_proposed_admin_dashboard.md` to `2026_04_24_partial_admin_dashboard.md` so its history follows
- Add an "Implementation status" header table indexing each phase to the PRs that landed it
- List the outstanding open items so future readers know what is still owed against the original proposal:
  - AdminForward acceptance criterion 5 (rolling-upgrade compat flag): deferred
  - S3 object browser: explicitly out of scope per Section 2 Non-goals
  - TLS cert hot-reload: restart-to-rotate is the documented model

When the rolling-upgrade flag lands, the doc gets renamed once more to `2026_04_24_implemented_admin_dashboard.md` per the README's lifecycle convention.
P4 deliverable from `docs/design/2026_04_24_proposed_admin_dashboard.md` Section 8: a single self-contained operator-facing reference for the admin HTTP listener.

**Stacked on #669 (slice 2a) + #673 (slice 2b)** so the API-surface table can describe S3 write endpoints as shipped. Rebases cleanly onto main once those land.

## Sections

- **Quick start**: minimal loopback-dev invocation
- **Configuration reference**: required vs optional flags, with the rationale for each guard
- **Hard-error startup conditions**: the explicit cases where the process refuses to start (missing creds, partial TLS, non-loopback without TLS, role-list overlap)
- **TLS setup**: loopback / TLS / discouraged plaintext-non-loopback topologies
- **Roles**: read-only vs full + how live role re-validation works on every state-changing request
- **API surface**: full `/admin/api/v1/*` table including the slice 2 S3 write endpoints
- **Follower → leader forwarding**: what the SPA sees, the rolling-update caveat, and the election-period 503 + Retry-After contract
- **Audit log**: `admin_audit` slog shape and the `forwarded_from` field
- **Troubleshooting**: the common bring-up failures (missing creds, TLS hard-error, 401 ambiguity, 503 leader_unavailable, bucket_not_empty, placeholder bundle blank-screen)

## What is NOT in this PR

Section 8's P4 plan also called out "TLS, read-only role, CSRF" as deliverables; those are already implemented in main (see `config.go`'s `validateTLS` / `validateAccessKeyRoles`, the role gates in `DynamoHandler.principalForWrite` + `S3Handler.principalForWrite`, and the `CSRFDoubleSubmit` middleware). This doc stitches them into a single reference operators can land on without reading code.

## Test plan

- [x] Markdown renders cleanly (manual check)
- [x] Cross-references match real file paths (design doc, architecture overview, proto file)
- [x] Flag names match `main.go`'s flag definitions verbatim
- [x] The `bucket_not_empty` 409 response shape matches `S3Handler.writeBucketsError`
Three findings from claude + Gemini review on 70213e0: 1) **Issue 1 — `logUnexpectedSourceError` slog key was "table"** When called for bucket operations the field key was `"table"` but the value was a bucket name. Log queries on `table=` would find spurious bucket-error entries; queries on `bucket=` would miss the audit lines entirely. Renamed the parameter and the slog key to `resource` so the same forensic query works for both resource families. 2) **Gemini security-high + Claude Issue 2 — validation divergence** `handleCreateBucket` only checked `strings.TrimSpace(name) == ""` while the HTTP path's `validateCreateBucketRequest` also rejects whitespace-padded names like `" bucket "`. The forward path would have accepted them, then hit the adapter's `validateS3BucketName` with a less actionable error message — different SPA behaviour depending on whether the request was leader-direct or follower-forwarded. Fix: call `validateCreateBucketRequest(body)` in `handleCreateBucket` exactly like `decodeCreateTableRequest` is shared between the table-side handlers. 3) **Issue 3 — only CREATE_BUCKET tested for nil-BucketsSource → 501** `DELETE_BUCKET` and `PUT_BUCKET_ACL` had identical `if s.buckets == nil` guards but no coverage. Replaced `TestForwardServer_CreateBucket_NoBucketsSourceReturns501` with a table-driven `TestForwardServer_BucketOps_NoBucketsSourceReturns501` sweeping all three operations. A future op added without the nil guard fails CI immediately. Plus a new `TestForwardServer_CreateBucket_RejectsWhitespacePaddedName` that pins the validation-parity fix from #2. Rebased onto the latest `feat/admin-s3-writes` (which now carries the slice 2a review fixes) so the stack stays clean.
Codex on PR #673 caught that the `readyForRegistration()` predicate gated registration on `tables != nil`, which meant a cluster started with `--dynamoAddr` empty but S3 enabled never registered the AdminForward gRPC service at all. Followers attempting to forward S3 admin writes (CREATE_BUCKET / DELETE_BUCKET / PUT_BUCKET_ACL) hit gRPC `Unimplemented`, which surfaced as 503 to the SPA instead of transparently reaching the leader. The S3 write path was advertised as "follower-forwarded" by P2 slice 2b but actually only worked when Dynamo was also configured.

Fix is symmetric: relax the predicate to require RoleStore plus **at least one** source (TablesSource or BucketsSource) and have the dispatcher reject ops for the missing surface with the same 501 it already returns when the inverse source is nil.

Per the CLAUDE.md "test the bug first" convention, add the regression test before the fix: `TestForwardServer_DynamoOps_NoTablesSourceReturns501`, symmetric with the existing `TestForwardServer_BucketOps_NoBucketsSourceReturns501`. With a nil source, CREATE_TABLE used to slip past the source nil-deref because validation rejected the payload first; DELETE_TABLE panicked on `s.source.AdminDeleteTable`. Both now return 501.

Plus three readyForRegistration assertions:

- buckets-only deployment registers (the regression case)
- missing-roles + buckets-only does NOT register
- tables-only / both-sources continue to register

While in there, the per-op nil-source guards I added tipped the linter's dupl detector. Refactored the dispatch path to keep forward_server tidy:

1. **`checkOpAvailability`**: pulls the per-op nil-source switch out of `dispatchForward` so the latter stays under the cyclomatic ceiling. One place to update when a future op joins the enum, instead of five.
2. **`decodeNamedPayload`**: the {"name": "..."} JSON shape is shared between Dynamo handleDelete and S3 handleDeleteBucket (NUL guard, DisallowUnknownFields, dec.More(), empty-name reject, slash reject; every guard had a paired comment citing the original review that introduced it). Centralising the helper keeps that audit trail in one place; the opLabel parameter ("delete" / "delete-bucket") preserves the path-specific wording in error responses.
3. **`auditDeleteSuccess`**: handleDelete and handleDeleteBucket each emitted a 7-line `admin_audit` block that differed only in the operation label and the `table` vs `bucket` field name. Folded into one helper so a future delete-shaped op automatically picks up the same field set.
4. **`notImplementedForwardResponse`**: deduplicates the 501 construction across the five Dynamo + S3 op cases.

No behaviour change to any of the existing successful paths; all forward_server tests pass unchanged.

Tests added: the S3-only registration assertion + the new symmetric DynamoOps_NoTablesSourceReturns501 sweep.

- `go test -race ./internal/admin/ .`: passes.
- `golangci-lint run ./internal/admin/... .`: 0 issues.
P4 deliverable from docs/design/2026_04_24_proposed_admin_dashboard.md Section 8: a single self-contained operator-facing reference for the admin HTTP listener.

Covers:

- Quick-start invocation for a loopback dev cluster
- Required + optional flag reference, with explanations of why each guard exists (TLS hard-error, rolling-update caveats, etc.)
- TLS topologies (loopback / TLS / discouraged plaintext-non-loopback)
- Role model + how live role re-validation works on every state-changing request
- The full /admin/api/v1/* surface (auth + cluster + dynamo + s3, including the slice 2 write paths and the AdminForward forwarding contract)
- forwarded_from audit log shape and why it carries the follower's node ID
- Troubleshooting guide for the common failure modes operators hit during initial bring-up (missing credentials, TLS hard-error, 401 ambiguity, 503 leader_unavailable, bucket_not_empty, blank-screen / placeholder bundle)
- Cross-references to the design doc and architecture overview

The Section 8 P4 plan also called out "TLS, read-only role, CSRF" as deliverables; those are already implemented (see config.go's validateTLS / validateAccessKeyRoles, the role gates in DynamoHandler.principalForWrite + S3Handler.principalForWrite, and the CSRFDoubleSubmit middleware). This doc stitches them into a single reference operators can land on without reading code.

Stacked on #669 (P2 slice 2a) + #673 (P2 slice 2b) so the API-surface table can describe S3 write endpoints as shipped. Once both land in main, this rebases cleanly.
## Summary

State-refresh of `docs/design/2026_04_24_partial_admin_dashboard.md` now that all four phases of the original P1–P4 plan have shipped.

| Phase | Was | Now |
|---|---|---|
| P1 | shipped | shipped (no change) |
| P2 | partial (read-only #658 only) | **shipped** (+ #669 writes + #673 forwarding) |
| P3 | shipped | shipped (no change) |
| P4 | mostly | **shipped** (+ operator doc #674, deployment runbook #669, script wiring #669+#678) |

Also added the **AdminDeleteBucket TOCTOU** to Outstanding open items. coderabbitai flagged it during PR #669 review (pre-existing race, `s3.go:deleteBucket` inherits the same shape; recorded as a code comment on `AdminDeleteBucket` for the future fix).

Doc stays at `_partial_` rather than promoting to `_implemented_` because two outstanding items remain: AdminForward acceptance criterion 5 (rolling-upgrade flag, explicitly deferred at design time) and the new TOCTOU. Updated the closing rename trigger to cover both.

## Test plan

- [x] No design changes; this is a state refresh
- [x] Status table entries cross-checked against the merge commits on main
- [x] AdminDeleteBucket TOCTOU description matches the code comment in adapter/s3_admin.go and the operator-side workaround in docs/admin_deployment.md
P2 slice 2b: a follower-side S3 admin write (POST /buckets, PUT /buckets/{name}/acl, DELETE /buckets/{name}) now hands off to the leader transparently, completing the same end-to-end forwarding contract Dynamo writes received in #644 + #648.
Stacked on #669 (P2 slice 2a). Once #669 merges, this rebases cleanly onto main.
Summary
- `ADMIN_OP` enum values (CREATE_BUCKET/DELETE_BUCKET/PUT_BUCKET_ACL) appended after the Dynamo block so existing wire integers stay stable. Regenerated with the pinned protoc 29.3 / protoc-gen-go 1.36.11 / protoc-gen-go-grpc 1.6.1.
- `ForwardServer.WithBucketsSource` lets deployments wire the S3 dispatcher optionally; Dynamo-only builds keep it nil and the new operations return 501. Three new dispatch arms mirror the leader-direct HTTP path's payload contract (NUL-byte rejection, 64 KiB limit, `DisallowUnknownFields`, trailing-token rejection, slash-in-name rejection).
- `forwardBucketsErrorResponse` mirrors `forwardErrorResponse`: `ErrBucketsForbidden`/`NotLeader`/`NotFound`/`AlreadyExists`/`NotEmpty` + `*ValidationError` each map to the same HTTP status `writeBucketsError` produces, so forwarded and leader-direct responses are byte-identical from the SPA's view.
- `LeaderForwarder` interface gains `ForwardCreateBucket`/`ForwardDeleteBucket`/`ForwardPutBucketAcl`. `ForwardPutBucketAcl` carries both the bucket name (URL path) and the new ACL (request body) in one JSON payload, the same approach `ForwardDeleteTable` takes for the table name.
- `S3Handler.WithLeaderForwarder` + `tryForwardCreateBucket`/`tryForwardPutBucketAcl`/`tryForwardDeleteBucket` gated on `ErrBucketsNotLeader && forwarder != nil`. `writeForwardResult`/`writeForwardFailure` mirror the Dynamo handler's pattern (nosniff + Cache-Control:no-store + Retry-After:1 on 503).
- `runtimeServerRunner.start()` now creates `*adapter.S3Server` before `startRaftServers` so the leader-side `ForwardServer` registration sees both adapters. `adminForwardServerDeps.buckets` is wired through `registerAdminForwardServer`. `ServerDeps.Forwarder` now plumbs into `S3Handler` too.

What is NOT in this PR
Test plan
- `go build ./...`
- `go vet ./...`
- `golangci-lint run` (admin + main + adapter + proto: 0 issues)
- `go test ./internal/admin/ -count=1 -race`: full admin suite passes (19 new tests across forward_server / forward_client / s3_handler integration)
- `go test . -count=1 -race`: main package passes
- `./adapter/` times out due to a pre-existing flake (verified earlier on PRs #648 / #658 / #669; unrelated to this branch)

Acceptance criteria coverage (Section 3.3.2 + 4.1)
- `validatePrincipal` runs before bucket dispatch)
- `forwarded_from` in audit log

Self-review (5 lenses)
- `BucketsSource` interface that the leader-direct HTTP path uses; the only new code path is the JSON envelope + principal re-validation.
- `gRPCForwardClient` reuse the existing `forward()` helper, so connection-cache reuse and `ErrLeaderUnavailable` signalling behave identically across resource types. The `runtimeServerRunner.start()` reorder is safe: each adapter listens on its own address and the raft TCP listeners are independent.
- `forwardBucketsErrorResponse` produces byte-identical envelopes to `writeBucketsError` so forwarded and leader-direct responses are indistinguishable.