admin: CreateTable / DeleteTable write endpoints (P1, leader-only)#634
admin: CreateTable / DeleteTable write endpoints (P1, leader-only)#634
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 47 minutes and 36 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (11)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@claude review |
There was a problem hiding this comment.
Code Review
This pull request introduces a SigV4-bypass administrative interface for DynamoDB-compatible tables, enabling listing, description, creation, and deletion through the admin dashboard. The implementation includes a new adapter layer for core logic, an HTTP handler with pagination and validation, and wiring into the admin server. Review feedback points out a redundant table name validation in the adapter and recommends refactoring the JSON response utility to encode the body before flushing headers, ensuring response reliability.
| if strings.TrimSpace(legacy.TableName) == "" { | ||
| return nil, newDynamoAPIError(http.StatusBadRequest, dynamoErrValidation, "missing table name") | ||
| } |
There was a problem hiding this comment.
This check for an empty table name is redundant. buildLegacyCreateTableInput (called at line 200) already performs this validation and returns a structured dynamoErrValidation error if the name is missing or whitespace-only. Removing this improves code maintainability by eliminating unreachable logic.
References
- Remove dead or unreachable code paths, such as asynchronous handlers for tasks that are always performed synchronously, to improve code clarity and maintainability.
| w.WriteHeader(status) | ||
| if err := json.NewEncoder(w).Encode(body); err != nil { | ||
| if logger == nil { | ||
| logger = slog.Default() | ||
| } | ||
| logger.LogAttrs(ctx, slog.LevelWarn, "admin response encode failed", | ||
| slog.String("error", err.Error()), | ||
| ) | ||
| } |
There was a problem hiding this comment.
Calling w.WriteHeader(status) before encoding the JSON body can lead to partial or malformed responses if json.NewEncoder(w).Encode(body) fails (e.g., due to a broken connection or an unexpected encoding error). Since the status code and headers are already flushed, the client might misinterpret a truncated response as a successful one. It is safer to encode the body into a buffer or use json.Marshal first, and only write the headers and status code if encoding succeeds.
func writeAdminJSONStatus(w http.ResponseWriter, ctx context.Context, logger *slog.Logger, status int, body any) {
b, err := json.Marshal(body)
if err != nil {
if logger == nil {
logger = slog.Default()
}
logger.LogAttrs(ctx, slog.LevelWarn, "admin response encode failed",
slog.String("error", err.Error()),
)
writeJSONError(w, http.StatusInternalServerError, "internal_error", "failed to encode response")
return
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.Header().Set("Cache-Control", "no-store")
w.WriteHeader(status)
_, _ = w.Write(b)
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 14ae1e9bb0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| dec := json.NewDecoder(body) | ||
| dec.DisallowUnknownFields() | ||
| var out CreateTableRequest | ||
| if err := dec.Decode(&out); err != nil { |
There was a problem hiding this comment.
Reject trailing JSON after create-table payload
decodeCreateTableRequest decodes exactly one JSON value and returns success immediately, so bodies like {"table_name":...}{"extra":1} are accepted even though they are not a single valid request document. This weakens the stated strict-decoding contract and can let malformed or concatenated payloads create/delete tables unexpectedly instead of returning 400 invalid_body; add an EOF check with a second decode/token read after the first decode succeeds.
Useful? React with 👍 / 👎.
14ae1e9 to
70e16e1
Compare
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
@claude review (Gemini medium 2件 + Codex P2 1件対応) |
70e16e1 to
3b0d05d
Compare
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
@claude review (Codex P2 slash-in-name 対応) |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
1c14fc1 to
070fa3f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 070fa3f0ca
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if dec.More() { | ||
| return CreateTableRequest{}, errors.New("request body has trailing data after the JSON object") |
There was a problem hiding this comment.
Reject NUL-delimited trailing request content
decodeCreateTableRequest relies on dec.More() to enforce a single JSON document, but with goccy/go-json a body like {"table_name":...}\x00{"extra":1} is accepted: the decoder treats the NUL byte as end-of-input, More() is false, and the trailing bytes are silently ignored. That means malformed/concatenated payloads can still pass invalid_body checks on POST /admin/api/v1/dynamo/tables, violating the strict-body contract for this new write path.
Useful? React with 👍 / 👎.
decodeCreateTableRequest used dec.More() to enforce a single JSON
document per request, but goccy/go-json treats a raw NUL as
end-of-input — a body like `{"table_name":...}\x00{"extra":1}`
parsed cleanly, dec.More() returned false, and the trailing
content was silently dropped. Codex P2 on PR #634 flagged this
as a payload-smuggling vector.
Fix: read the body once, scan for NUL before decoding. JSON has
no need for raw NUL (control characters must be \u-escaped per
RFC 8259), so any NUL is a strong signal of either tooling
misconfiguration or deliberate smuggling. Reject with 400.
The existing dec.More() check stays — it catches the well-formed
trailing-token cases (a second `{...}` or trailing `42`) that
would otherwise pass NUL-validation but still violate the
strict-body contract.
Tests: extend TestDynamoHandler_CreateTable_RejectsBadJSON with
two NUL vectors — the trailing `{"extra":1}` from the Codex
report, and a bare trailing NUL with no extra payload.
|
@claude review (Codex P2: NUL-byte payload smuggling 対応 — body 読み取り後に NUL byte 検出で 400 reject) |
decodeCreateTableRequest used dec.More() to enforce a single JSON
document per request, but goccy/go-json treats a raw NUL as
end-of-input — a body like `{"table_name":...}\x00{"extra":1}`
parsed cleanly, dec.More() returned false, and the trailing
content was silently dropped. Codex P2 on PR #634 flagged this
as a payload-smuggling vector.
Fix: read the body once, scan for NUL before decoding. JSON has
no need for raw NUL (control characters must be \u-escaped per
RFC 8259), so any NUL is a strong signal of either tooling
misconfiguration or deliberate smuggling. Reject with 400.
The existing dec.More() check stays — it catches the well-formed
trailing-token cases (a second `{...}` or trailing `42`) that
would otherwise pass NUL-validation but still violate the
strict-body contract.
Tests: extend TestDynamoHandler_CreateTable_RejectsBadJSON with
two NUL vectors — the trailing `{"extra":1}` from the Codex
report, and a bare trailing NUL with no extra payload.
84bf258 to
dcac6e4
Compare
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Two related findings on PR #635 / forward_server.go's handleDelete: - NUL-byte smuggling: handleDelete decoded with goccy/go-json but skipped the explicit NUL scan that decodeCreateTableRequest applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}` passes dec.More() because goccy treats NUL as end-of-input. Add the same pre-decode NUL rejection. - Slash-in-name divergence: the HTTP handleDelete and handleDescribe both reject `/` in the table name with 404, but the forwarded delete just passed body.Name straight through to AdminDeleteTable. A forwarded call could therefore act on slash-bearing tables that a leader-direct call would 404. Reject symmetrically before invoking the source. Tests: two new ForwardServer cases — NUL payload + slash name. Both confirm the source is never invoked when the precondition fails (defence in depth — an asymmetric stub source would still make the test green if we only checked the response code).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 84bf2580e0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err != nil { | ||
| writeJSONError(w, http.StatusBadRequest, "invalid_body", err.Error()) | ||
| return |
There was a problem hiding this comment.
Return 413 for oversized create-table bodies
POST /admin/api/v1/dynamo/tables currently maps every decode failure to 400 invalid_body, so when BodyLimit trips (http.MaxBytesError) the client still gets a 400 instead of the canonical 413 payload_too_large used elsewhere in admin handlers. This breaks the middleware contract documented in internal/admin/middleware.go and makes oversized requests indistinguishable from malformed JSON for callers and retry logic.
Useful? React with 👍 / 👎.
decodeCreateTableRequest used to surface every read/parse failure as the same generic "invalid_body" string, so handleCreate mapped all of them to 400 — including the BodyLimit/MaxBytesReader overflow that should produce 413 payload_too_large. The middleware contract in internal/admin/middleware.go promises 413 on oversized bodies (WriteMaxBytesError lives in that file exactly for this purpose). Codex P2 on PR #634 flagged the write path as the only handler that broke that contract: callers and retry logic could not distinguish "body too big" from "body malformed", and oversize requests would be retried as if a caller-side fix was possible. Fix: introduce errCreateBodyTooLarge as a sentinel returned only when io.ReadAll trips MaxBytesReader. handleCreate matches the sentinel via errors.Is and routes to WriteMaxBytesError, which emits the canonical 413 + payload_too_large body. All other decode paths still produce 400 invalid_body unchanged. Test: TestDynamoHandler_CreateTable_OversizedBodyReturns413 wraps the request body in MaxBytesReader (mirroring what the real BodyLimit middleware does) and confirms the response is 413 with a payload_too_large code. Also asserts the stub source is not touched on rejection.
Two related findings on PR #635 / forward_server.go's handleDelete: - NUL-byte smuggling: handleDelete decoded with goccy/go-json but skipped the explicit NUL scan that decodeCreateTableRequest applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}` passes dec.More() because goccy treats NUL as end-of-input. Add the same pre-decode NUL rejection. - Slash-in-name divergence: the HTTP handleDelete and handleDescribe both reject `/` in the table name with 404, but the forwarded delete just passed body.Name straight through to AdminDeleteTable. A forwarded call could therefore act on slash-bearing tables that a leader-direct call would 404. Reject symmetrically before invoking the source. Tests: two new ForwardServer cases — NUL payload + slash name. Both confirm the source is never invoked when the precondition fails (defence in depth — an asymmetric stub source would still make the test green if we only checked the response code).
|
@claude review (Codex P2 対応: oversized body は 400 ではなく 413 payload_too_large を返すよう修正) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 41a00638ee
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| return &admin.ValidationError{Message: msg} | ||
| default: | ||
| return err //nolint:wrapcheck // forwarded so the handler logs but does not surface it. |
There was a problem hiding this comment.
Map leader-churn dispatch errors to not-leader response
The bridge currently only maps adapter.ErrAdminNotLeader, but AdminCreateTable/AdminDeleteTable can still return raw coordinator leadership-loss errors after the initial leader check (e.g., leadership changes between isVerifiedDynamoLeader and Dispatch in adapter/dynamodb_admin.go). Those errors fall through this default branch and become a generic 500, so transient leader churn is reported as server failure instead of 503 leader_unavailable with retry semantics.
Useful? React with 👍 / 👎.
decodeCreateTableRequest used dec.More() to enforce a single JSON
document per request, but goccy/go-json treats a raw NUL as
end-of-input — a body like `{"table_name":...}\x00{"extra":1}`
parsed cleanly, dec.More() returned false, and the trailing
content was silently dropped. Codex P2 on PR #634 flagged this
as a payload-smuggling vector.
Fix: read the body once, scan for NUL before decoding. JSON has
no need for raw NUL (control characters must be \u-escaped per
RFC 8259), so any NUL is a strong signal of either tooling
misconfiguration or deliberate smuggling. Reject with 400.
The existing dec.More() check stays — it catches the well-formed
trailing-token cases (a second `{...}` or trailing `42`) that
would otherwise pass NUL-validation but still violate the
strict-body contract.
Tests: extend TestDynamoHandler_CreateTable_RejectsBadJSON with
two NUL vectors — the trailing `{"extra":1}` from the Codex
report, and a bare trailing NUL with no extra payload.
decodeCreateTableRequest used to surface every read/parse failure as the same generic "invalid_body" string, so handleCreate mapped all of them to 400 — including the BodyLimit/MaxBytesReader overflow that should produce 413 payload_too_large. The middleware contract in internal/admin/middleware.go promises 413 on oversized bodies (WriteMaxBytesError lives in that file exactly for this purpose). Codex P2 on PR #634 flagged the write path as the only handler that broke that contract: callers and retry logic could not distinguish "body too big" from "body malformed", and oversize requests would be retried as if a caller-side fix was possible. Fix: introduce errCreateBodyTooLarge as a sentinel returned only when io.ReadAll trips MaxBytesReader. handleCreate matches the sentinel via errors.Is and routes to WriteMaxBytesError, which emits the canonical 413 + payload_too_large body. All other decode paths still produce 400 invalid_body unchanged. Test: TestDynamoHandler_CreateTable_OversizedBodyReturns413 wraps the request body in MaxBytesReader (mirroring what the real BodyLimit middleware does) and confirms the response is 413 with a payload_too_large code. Also asserts the stub source is not touched on rejection.
translateAdminTablesError previously only matched adapter.ErrAdminNotLeader, which AdminCreateTable raises from its up-front isVerifiedDynamoLeader guard. After that guard, the kv coordinator can still drop leadership mid-Dispatch — the resulting error (kv.ErrLeaderNotFound, adapter.ErrNotLeader, adapter.ErrLeaderNotFound, or any of the wire-level "not leader" phrases) fell through to the default branch and surfaced as a generic 500 with no Retry-After. Codex P2 on PR #634 flagged this as a regression in the write path's retry contract. Fix: introduce isLeaderChurnError that mirrors kv's closed list of leader-churn sentinels (the typed sentinels plus the wire-level phrases for errors that have crossed a gRPC boundary and lost their type). Hook it into translateAdminTablesError so any matching error maps to admin.ErrTablesNotLeader, which the handler already turns into 503 leader_unavailable + Retry-After: 1. Tests pin all four typed sentinels, all four wire phrases, and prove an unrelated "team leader" error still falls through to the default branch — the focused substring match avoids swallowing random sentences that happen to mention leadership.
|
@claude review |
|
Claude finished @bootjp's task in 2m 46s —— View job Code Review
Overall AssessmentSolid, well-structured PR. All findings from the previous review rounds (Gemini mediums + Codex P2 items) have been addressed. The code is clean, the test coverage is thorough, and the layered security model is correctly implemented. Three minor comment-staleness issues below — none block merge. Previously Raised Issues — Status
New Findings1. Stale route table in The layout comment still lists only // Layout:
// POST /admin/api/v1/auth/login
// POST /admin/api/v1/auth/logout
// GET /admin/api/v1/cluster
// GET /admin/api/v1/dynamo/tables (auth required, read-only)
// GET /admin/api/v1/dynamo/tables/{name} (auth required, read-only)Now that this PR ships 2. Stale // Tables is the read-only DynamoDB admin source. Optional: a nil
// value disables /admin/api/v1/dynamo/tables{,/{name}} ...
3. Slightly misleading "Read-only endpoints" comment ( // Read-only endpoints share the protect chain so a missing
// session or CSRF token still 401s/403s the same way as a write.
// The Audit middleware is a no-op for GET (it only logs state-
// changing methods) so we get the consistent guards without the
// noise of an audit line per dashboard poll.
var dynamoChain http.HandlerThis What Looks Good
Summary: Three stale comments, all trivial one-line fixes. No correctness, security, or behaviour issues found. LGTM with those addressed at your discretion. |
handleCreate / handleDelete previously trusted whatever role was embedded in the session JWT. Tokens are valid for one hour, and the role is captured at login time, so an access key downgraded or revoked mid-session would keep mutating tables for up to 60 minutes despite the operator's intent. Codex P1 on PR #635 flagged the gap; the leader-side ForwardServer already does this re-evaluation, the HTTP path now does it too so leader- direct and forwarded write paths share the same authorisation contract. Pieces: - Extract RoleStore + MapRoleStore into role_store.go (shared between the HTTP handler and the existing forward server). - DynamoHandler gains a roles RoleStore field plus WithRoleStore option. Nil keeps the previous trust-the-JWT behaviour for single-tenant deployments where the role map never changes. - principalForWrite centralises the check: pull principal from context (fail closed if missing), enforce RoleFull on the JWT role, then — when a RoleStore is configured — re-look-up the access key against the live map and reject if absent or downgraded. The downstream call uses the live role. - ServerDeps.Roles is now plumbed into the dynamo handler via MapRoleStore so production builds pick up the live check automatically. Tests: three new cases under the live-role-revalidation banner — downgrade (full → read_only), removal (access key absent), and the same revocation flow on the delete path. All three confirm the source layer is never touched on rejection.
Two related findings on PR #635 / forward_server.go's handleDelete: - NUL-byte smuggling: handleDelete decoded with goccy/go-json but skipped the explicit NUL scan that decodeCreateTableRequest applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}` passes dec.More() because goccy treats NUL as end-of-input. Add the same pre-decode NUL rejection. - Slash-in-name divergence: the HTTP handleDelete and handleDescribe both reject `/` in the table name with 404, but the forwarded delete just passed body.Name straight through to AdminDeleteTable. A forwarded call could therefore act on slash-bearing tables that a leader-direct call would 404. Reject symmetrically before invoking the source. Tests: two new ForwardServer cases — NUL payload + slash name. Both confirm the source is never invoked when the precondition fails (defence in depth — an asymmetric stub source would still make the test green if we only checked the response code).
Two changes on the AdminForward leader-side dispatcher: - Codex P2 on PR #635: forwarded 503 leader_unavailable lost the Retry-After: 1 header that the leader-direct HTTP path emits. Add retry_after_seconds to AdminForwardResponse so the follower's bridge can rebuild the same HTTP header (the bridge side wires it in the next PR; the proto + leader populate it here). Direct HTTP path is unaffected — the proto field is only consumed during forwarding. - Drop the duplicate RoleStore/MapRoleStore declarations from forward_server.go. They now live in role_store.go (added on PR #634 for HTTP-side role revalidation) so both surfaces share one definition. Tests: extend TestForwardServer_CreateTable_LeaderSteppedDownReturns503 to pin the retry_after_seconds=1 hint.
|
@claude review |
|
Claude finished @bootjp's task in 4m 19s —— View job Code Review
Overall AssessmentVery clean PR. The most recent commit ( Previously Raised Issues — Status
Remaining Findings (all carry-overs from the 17:10 review)1. Stale route table in // Layout:
//
// POST /admin/api/v1/auth/login (no auth, rate-limited)
// POST /admin/api/v1/auth/logout (auth required)
// GET /admin/api/v1/cluster (auth required)
// GET /admin/api/v1/dynamo/tables (auth required, read-only)
// GET /admin/api/v1/dynamo/tables/{name} (auth required, read-only)Missing 2. Stale // Tables is the read-only DynamoDB admin source. Optional: a nil
// value disables /admin/api/v1/dynamo/tables{,/{name}} ...
3. Stale "Read-only endpoints" comment ( // Read-only endpoints share the protect chain so a missing
// session or CSRF token still 401s/403s the same way as a write.
What Looks Good
Summary: Three stale comment-only fixes remain from the previous round ( |
Three doc-only fixes from Claude's review on PR #634 — the behavioural changes from this PR's commit series moved past their original comments: - ServerDeps.Tables description still said "read-only DynamoDB admin source" but TablesSource now exposes AdminCreateTable and AdminDeleteTable too. - buildAPIMux godoc layout omitted POST /dynamo/tables and DELETE /dynamo/tables/{name}, and the "(auth required, read-only)" annotation on the GET lines was wrong. - The dynamoChain wiring comment said "Read-only endpoints share the protect chain..." but the chain now serves both reads and writes; the rationale (Audit no-ops for GET) is unchanged. No behavioural change; doc-only.
Two related findings on PR #635 / forward_server.go's handleDelete: - NUL-byte smuggling: handleDelete decoded with goccy/go-json but skipped the explicit NUL scan that decodeCreateTableRequest applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}` passes dec.More() because goccy treats NUL as end-of-input. Add the same pre-decode NUL rejection. - Slash-in-name divergence: the HTTP handleDelete and handleDescribe both reject `/` in the table name with 404, but the forwarded delete just passed body.Name straight through to AdminDeleteTable. A forwarded call could therefore act on slash-bearing tables that a leader-direct call would 404. Reject symmetrically before invoking the source. Tests: two new ForwardServer cases — NUL payload + slash name. Both confirm the source is never invoked when the precondition fails (defence in depth — an asymmetric stub source would still make the test green if we only checked the response code).
Two changes on the AdminForward leader-side dispatcher: - Codex P2 on PR #635: forwarded 503 leader_unavailable lost the Retry-After: 1 header that the leader-direct HTTP path emits. Add retry_after_seconds to AdminForwardResponse so the follower's bridge can rebuild the same HTTP header (the bridge side wires it in the next PR; the proto + leader populate it here). Direct HTTP path is unaffected — the proto field is only consumed during forwarding. - Drop the duplicate RoleStore/MapRoleStore declarations from forward_server.go. They now live in role_store.go (added on PR #634 for HTTP-side role revalidation) so both surfaces share one definition. Tests: extend TestForwardServer_CreateTable_LeaderSteppedDownReturns503 to pin the retry_after_seconds=1 hint.
|
@claude review |
|
Claude finished @bootjp's task in 2m 27s —— View job Code Review
Overall AssessmentAll findings from every prior review round are resolved. No new correctness, security, or behavioural issues found. LGTM. Previously Raised Issues — Status
New FindingsNone. After reading all 11 changed files, the code is clean and no new issues were identified. A few implementation choices that are worth acknowledging as correct:
What Looks Good
Summary: All 15 tracked findings from prior rounds are resolved. No new issues. LGTM — ready to merge. |
Two related findings on PR #635 / forward_server.go's handleDelete: - NUL-byte smuggling: handleDelete decoded with goccy/go-json but skipped the explicit NUL scan that decodeCreateTableRequest applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}` passes dec.More() because goccy treats NUL as end-of-input. Add the same pre-decode NUL rejection. - Slash-in-name divergence: the HTTP handleDelete and handleDescribe both reject `/` in the table name with 404, but the forwarded delete just passed body.Name straight through to AdminDeleteTable. A forwarded call could therefore act on slash-bearing tables that a leader-direct call would 404. Reject symmetrically before invoking the source. Tests: two new ForwardServer cases — NUL payload + slash name. Both confirm the source is never invoked when the precondition fails (defence in depth — an asymmetric stub source would still make the test green if we only checked the response code).
Two changes on the AdminForward leader-side dispatcher: - Codex P2 on PR #635: forwarded 503 leader_unavailable lost the Retry-After: 1 header that the leader-direct HTTP path emits. Add retry_after_seconds to AdminForwardResponse so the follower's bridge can rebuild the same HTTP header (the bridge side wires it in the next PR; the proto + leader populate it here). Direct HTTP path is unaffected — the proto field is only consumed during forwarding. - Drop the duplicate RoleStore/MapRoleStore declarations from forward_server.go. They now live in role_store.go (added on PR #634 for HTTP-side role revalidation) so both surfaces share one definition. Tests: extend TestForwardServer_CreateTable_LeaderSteppedDownReturns503 to pin the retry_after_seconds=1 hint.
Stacked on #634. Lays down the proto contract and the leader-side dispatcher for the follower-to-leader forwarding path mandated by design Section 3.3. The follower-side client + bridge wiring lands in a separate PR to keep this one reviewable. ## Summary - New `proto/admin_forward.proto` with `service AdminForward` carrying an `AdminPrincipal` + opaque JSON payload + the operation enum. Operation-agnostic so adding a new admin endpoint does not require a wire-format bump. - `internal/admin/forward_server.go` is the leader-side gRPC handler. `validatePrincipal` re-evaluates the inbound access-key against the leader's own `RoleStore` and demotes a follower-claimed full role when the leader sees read-only — this is the design's *adapter side is the source of truth for authz* invariant for forwarded calls (Section 3.3.1). - `forwardErrorResponse` mirrors the HTTP handler's `writeTablesError` mapping (403 / 503 / 404 / 409 / 400 / 500) so a forwarded call is indistinguishable from a leader-direct call to the SPA. Raw adapter error text never leaves the leader; everything is encoded into the structured response payload. - `forwarded_from` is echoed into the leader's audit log line on every successful forwarded operation (acceptance criterion 6). ## What is NOT in this PR - Follower-side AdminForward client + bridge wiring (acceptance criterion 2). The follower's HTTP handler still returns `503 leader_unavailable` + `Retry-After: 1` from #634 — this PR is a no-op on the request path until the client wiring lands. - Election-period 503 handling (criterion 3) — comes with the client wiring. - Rolling-upgrade compatibility flag `admin.leader_forward_v2` (criterion 5) — depends on a Raft-level cluster version bump that is independently substantial. - gRPC server registration in `main.go` — wired in the next PR alongside the client. ## Acceptance criteria coverage | # | Criterion | This PR | |---|---|---| | 1 | Leader direct write | ✓ existing (PR #634) | | 2 | Follower forwards transparently | ⏳ next PR (client) | | 3 | Election period 503 + retry | ⏳ next PR (client) | | 4 | Leader demotes stale full role | ✓ test added | | 5 | Rolling-upgrade compat flag | ⏳ deferred | | 6 | `forwarded_from` in audit log | ✓ test added | ## Test plan - [x] `go build ./...` - [x] `go vet ./...` - [x] `golangci-lint run` (admin package: 0 issues) - [x] `go test ./internal/admin/ -run TestForwardServer -count=1` (11 leader-side tests pass — criterion 1, 4, 6, plus error-mapping coverage) - [ ] Wire follower side and exercise an end-to-end follower → leader call against a live cluster
…artial) (#644) Phase 1 + 2 of Task #26: the follower-side `LeaderForwarder` client and its integration into the dynamo HTTP handler. Builds on the AdminForward leader-side dispatcher landed via #635. ## Summary - New `LeaderForwarder` interface (`internal/admin/forward_client.go`) decouples the dynamo HTTP handler from `pb.AdminForwardClient`. The handler stays proto-free; the bridge in `main_admin.go` (next phase) plugs in the gRPC-backed implementation. - `gRPCForwardClient` translates a `CreateTableRequest` / table-name into an `AdminForwardRequest`, dials via a `GRPCConnFactory` (production wraps `kv.GRPCConnCache`), and re-shapes the response into `ForwardResult` (status, payload, content-type). - `forwarded_from = nodeID` is populated so the leader's audit log carries the trace (criterion 6, leader-side already shipped in #635). - Defensive: `status_code == 0` upgrades to `502 Bad Gateway`; missing `ContentType` fills the JSON default. Both surface transport bugs rather than producing silently-malformed SPA responses. - `ErrLeaderUnavailable` sentinel signals the "no leader known" case so the handler can map to 503 + `Retry-After: 1` (criterion 3). - `DynamoHandler` gains a `forwarder` field and `WithLeaderForwarder` option. When set, `handleCreate` / `handleDelete` catch `ErrTablesNotLeader` from the source and forward to the leader transparently — the SPA cannot tell forwarded from leader-direct. - `writeForwardResult` re-emits the leader's structured response verbatim (status + payload + content-type), so a forwarded `409 Conflict` from the leader stays `409` on the wire — no re-classification. - `writeForwardFailure` maps `ErrLeaderUnavailable` (election in flight) and gRPC transport errors to 503 + `Retry-After: 1`. `ErrLeaderUnavailable` is intentionally NOT logged at error level (elections are routine); transport errors are logged at LevelError so operators can investigate. ## What is NOT in this PR - gRPC server registration in `main.go` (production wiring of the `ForwardServer` from #635) — comes in the next phase. - The bridge that wraps `kv.GRPCConnCache` and supplies `LeaderAddressResolver` — same phase. - Election-period retry-loop on the client side (criterion 3 partial: this PR returns 503 + Retry-After; the SPA / client retries the request; criterion 3 fully needs the production bridge to dial actual leader-discovery). ## Test plan - [x] `go build ./...` - [x] `go vet ./...` - [x] `golangci-lint run` (admin package: 0 issues) - [x] `go test ./internal/admin/ -count=1 -race` - 8 forward-client unit tests: constructor input validation, both Forward operations including principal/payload/op-enum/`forwarded_from` round-trip, `ErrLeaderUnavailable`, dial/RPC errors propagated with `cockroachdb/errors` wrapping, zero status code upgrade, missing content type fallback - 9 handler integration tests: transparent forward for create + delete, no-forwarder fallback to 503, `ErrLeaderUnavailable` → 503 + Retry-After, transport error → 503 + log, leader 409 pass-through, leader 503 + Retry-After preserved, role check short-circuits before forward, body validation short-circuits before forward - [ ] Wire production bridge + register `pb.RegisterAdminForwardServer` in `main.go` and exercise an end-to-end follower → leader call against a real cluster (next PR). ## Acceptance criteria coverage | # | Criterion | This PR | |---|---|---| | 1 | Leader direct write | ✓ (in main since #634) | | 2 | Follower forwards transparently | ✓ wiring done; needs main.go gRPC registration to take effect | | 3 | Election-period 503 + retry | ✓ partial — handler returns 503 + Retry-After; full transparency needs the bridge | | 4 | Leader demotes stale full role | ✓ (in main since #635) | | 5 | Rolling-upgrade compat flag | ⏳ deferred (cluster-version bump) | | 6 | `forwarded_from` in audit log | ✓ (in main since #635) |
…de bridge (#648) Phase 3 of Task #26 / AdminForward design 3.3. Closes the wiring loop opened by #635 (leader-side dispatcher) and #644 (follower-side client + handler integration): the admin HTTP DynamoHandler now hands `ErrTablesNotLeader` writes off to the in-process `LeaderForwarder`, and the leader-side gRPC `AdminForward` service is registered on every node so the forwarded RPC actually lands somewhere. ## Summary - **`admin.ServerDeps.Forwarder`** (3315bf4) — new optional field on the high-level admin server composition surface. `admin.NewServer` calls `WithLeaderForwarder` when set; nil leaves the existing 503 + Retry-After:1 fallback untouched. Single-node / leader-only deployments are unaffected. - **`main_admin_forward.go`** (f3e529c) — production bridge between `kv.GRPCConnCache` and the admin layer's typed `PBAdminForwardClient` (via `pb.NewAdminForwardClient`). Includes `buildLeaderForwarder`, `roleStoreFromFlags`, `adminForwardServerDeps`, and a `readyForRegistration` gate. - **Runtime wiring** (19f6b70) — `runtimeServerRunner.start()` now creates `dynamoServer` BEFORE `startRaftServers`, so the leader-side `ForwardServer` can use it as its `TablesSource`. `startRaftServers` accepts the new `adminForwardServerDeps` bundle and delegates to `registerAdminForwardServer`. `startServers` constructs a single `*kv.GRPCConnCache` and forwards it (with the `Coordinator`) to `startAdminFromFlags`, which in turn builds the production `LeaderForwarder` via `buildAdminLeaderForwarder` and passes it through `ServerDeps.Forwarder`. ## What is NOT in this PR - Rolling-upgrade compatibility flag (criterion 5) — still deferred behind a cluster-version bump. - Election-period retry loop on the follower's bridge — the handler returns 503 + Retry-After:1, the SPA / curl re-issues, and the bridge dials again on the next attempt. Inline retry inside the handler would just hide the latency from operators. ## Test plan - [x] `go build ./...` - [x] `go vet ./...` - [x] `golangci-lint run` (main + admin packages: 0 issues) - [x] `go test ./internal/admin/ -count=1 -race` — full admin suite passes (existing 21 forward tests + 1 new `TestServer_ServerDepsForwarderIsWired`) - [x] `go test . -count=1 -race` — main package passes (4 new bridge / role-store / readyForRegistration tests) - [ ] `adapter` package times out at 240s on this branch AND on `main` — pre-existing flake unrelated to this PR (verified by checking out main and running the same suite) - [ ] End-to-end smoke against a 3-node cluster — needs the local Jepsen runner; blocked on the next PR or a manual run ## Acceptance criteria coverage | # | Criterion | This PR | |---|---|---| | 1 | Leader direct write | ✓ (in main since #634) | | 2 | Follower forwards transparently | ✓ — wiring complete; both leader-side gRPC service AND follower-side bridge are now plumbed | | 3 | Election-period 503 + retry | ✓ — handler still returns 503 + Retry-After:1 when no leader is known; the SPA / client retries and the next attempt re-dials | | 4 | Leader demotes stale full role | ✓ (in main since #635) | | 5 | Rolling-upgrade compat flag | ⏳ deferred (cluster-version bump) | | 6 | `forwarded_from` in audit log | ✓ (in main since #635 — bridge now stamps `--raftId` onto every forwarded request) | ## Self-review (5 lenses) 1. **Data loss**: No FSM / Raft / Pebble path changes; admin writes still go through the same `AdminCreateTable` / `AdminDeleteTable` Raft proposal once they reach the leader. 2. **Concurrency**: `runner.start()` reorder confirmed safe — `startDynamoDBServer` only depends on `coordinate` + `shardStore` (already constructed); raft TCP listeners are independent of the dynamo HTTP listener. Worst case during the 100 ms-or-so reorder window is the same "no leader yet" 503 the old order would have produced if a request landed before raft converged. 3. **Performance**: One additional `&kv.GRPCConnCache{}` allocation per process; one closure for the resolver. No hot-path changes. 4. **Data consistency**: Forwarded requests re-validate the principal at the leader (criterion 4, already shipped); commit-ts ordering is unchanged because the leader's `AdminCreateTable` path does its own `HLC.Next`. 5. **Test coverage**: 4 new main-package tests + 1 new admin-package server-level test. `TestServer_ServerDepsForwarderIsWired` is the regression test for "future refactor drops the forwarder before it reaches the dynamo handler" — exactly the failure mode the wiring change in this PR introduces the risk of. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added leader-forwarding capability for admin requests sent to follower nodes * Implemented role-based access control for admin API operations * Enhanced DynamoDB request routing to automatically direct operations to leader nodes * **Tests** * Added comprehensive integration and unit tests validating leader-forwarding paths <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Three findings from Claude review on PR #670: 1. (P1) translateAdminQueuesError did not catch leader-churn errors. AdminDeleteQueue passes the upfront isVerifiedSQSLeader check, then dispatches deleteQueueWithRetry which can hit a kv coordinator that just lost leadership. The resulting kv.ErrLeaderNotFound / adapter.ErrNotLeader / wrapped "not leader" suffixes were not in the translator's switch — they fell to default and the admin handler rendered a generic 500 instead of the spec'd 503 + Retry-After: 1. Added the `case isLeaderChurnError(err)` arm mirroring translateAdminTablesError's identical fix from PR #634. 2. (P2) No tests for translateAdminQueuesError. Mirrored the three Dynamo equivalents in main_admin_test.go: - TestTranslateAdminQueuesError_LeaderChurn covers every kv sentinel + canonical wrapped-suffix variant. - TestTranslateAdminQueuesError_LeaderPhraseInMiddleOfMessage pins the HasSuffix matcher behaviour against false positives on user-supplied error messages mid-string. - TestTranslateAdminQueuesError_UnrelatedErrorPassesThrough confirms the detector does not swallow innocent "leader" mentions outside the canonical phrase set. 3. (Low / polish) SqsList.tsx subtitle leaked the Go file path "adapter/sqs_admin.go" and the internal milestone name "Phase 3.A" to end users — DynamoList / S3List don't do this. Replaced with operator-facing prose describing what the page does. Verified: - go test -run TestTranslateAdminQueuesError . — passes - go build ./... clean - go test -race ./internal/admin/... + go test -race -run TestSQS ./adapter/ — pass - golangci-lint run ./adapter/... ./internal/admin/... ./... — 0 issues - cd web/admin && npm run lint (tsc --strict) clean
Three findings from Claude review on PR #670: 1. (P1) translateAdminQueuesError did not catch leader-churn errors. AdminDeleteQueue passes the upfront isVerifiedSQSLeader check, then dispatches deleteQueueWithRetry which can hit a kv coordinator that just lost leadership. The resulting kv.ErrLeaderNotFound / adapter.ErrNotLeader / wrapped "not leader" suffixes were not in the translator's switch — they fell to default and the admin handler rendered a generic 500 instead of the spec'd 503 + Retry-After: 1. Added the `case isLeaderChurnError(err)` arm mirroring translateAdminTablesError's identical fix from PR #634. 2. (P2) No tests for translateAdminQueuesError. Mirrored the three Dynamo equivalents in main_admin_test.go: - TestTranslateAdminQueuesError_LeaderChurn covers every kv sentinel + canonical wrapped-suffix variant. - TestTranslateAdminQueuesError_LeaderPhraseInMiddleOfMessage pins the HasSuffix matcher behaviour against false positives on user-supplied error messages mid-string. - TestTranslateAdminQueuesError_UnrelatedErrorPassesThrough confirms the detector does not swallow innocent "leader" mentions outside the canonical phrase set. 3. (Low / polish) SqsList.tsx subtitle leaked the Go file path "adapter/sqs_admin.go" and the internal milestone name "Phase 3.A" to end users — DynamoList / S3List don't do this. Replaced with operator-facing prose describing what the page does. Verified: - go test -run TestTranslateAdminQueuesError . — passes - go build ./... clean - go test -race ./internal/admin/... + go test -race -run TestSQS ./adapter/ — pass - golangci-lint run ./adapter/... ./internal/admin/... ./... — 0 issues - cd web/admin && npm run lint (tsc --strict) clean
Per `docs/design/README.md`'s lifecycle convention, the admin dashboard design doc is now "partial": - **P1** (DynamoDB CRUD + AdminForward) — shipped via #634, #635, #644, #648 - **P2** (S3 buckets list/create/delete/ACL + DescribeTable) — shipped via #658, with #669 + #673 in flight - **P3** (React SPA + embed) — shipped via #649, #650 - **P4** (TLS / role / CSRF / operator docs) — TLS, role, CSRF are already live in P1; operator docs in #674 Independent of the in-flight slice 2 PRs (#669/#673) and the docs PR (#674) — this rename only reflects what is already on main today, plus an "Implementation status" table mapping each phase to the PR it landed in. ## What this PR changes - `git mv` the design doc from `2026_04_24_proposed_admin_dashboard.md` to `2026_04_24_partial_admin_dashboard.md` so its history follows - Add an "Implementation status" header table indexing each phase to the PRs that landed it - List the outstanding open items so future readers know what is still owed against the original proposal: - AdminForward acceptance criterion 5 (rolling-upgrade compat flag) — deferred - S3 object browser — explicitly out of scope per Section 2 Non-goals - TLS cert hot-reload — restart-to-rotate is the documented model When the rolling-upgrade flag lands, the doc gets renamed once more to `2026_04_24_implemented_admin_dashboard.md` per the README's lifecycle convention.
Catches the three locations the prior round (7c63a7b) missed: adapter/sqs_admin_test.go (TestAdminQueueSummary_CreatedAtUsesMillisNotHLC, TestMetaAttributesForAdmin_IncludesQueueArnAndLastModified) and main_admin_test.go (TestTranslateAdminQueuesError_LeaderChurn). Same pattern: the why-rationale stays, only the "fifth-round Claude review on PR #670" / "seventh-round Claude review on PR #670" / "Codex P2 + Claude P1 on PR #670" attributions are dropped. Pre-existing PR #634 references at main_admin_test.go:373 / :397 were added by a prior merged PR and are out of scope here.
Stacked on #633 (the read-only chunk). Writes are limited to the leader node for now; follower-side
AdminForwardRPC (design Section 3.3 acceptance criteria 1-6) ships in a follow-up PR. Mergeable on its own — followers respond503 leader_unavailable+Retry-After: 1.Summary
POST /admin/api/v1/dynamo/tablesandDELETE /admin/api/v1/dynamo/tables/{name}both go through the existing protect chain (BodyLimit → SessionAuth → Audit → CSRF). The handler also enforcesRoleFullso a read-only key cannot create or delete even with a valid CSRF token.AdminCreateTable/AdminDeleteTabletake anAdminPrincipaland re-validate the role at the adapter layer even when a higher tier already enforced it. Preserves the design's adapter side is the source of truth for authz invariant (Section 3.2). Two sentinel errors (ErrAdminNotLeader,ErrAdminForbidden) signal the structured failure modes.main_admin.gotranslates adapter errors to admin sentinels (ErrTablesNotLeaderto 503 +Retry-After: 1,ErrTablesForbiddento 403,ResourceInUseto 409,ResourceNotFoundto 404,ValidationExceptionto 400). Raw adapter error text is never surfaced to clients; everything else falls through to a generic 500 with the original message logged at error level.DisallowUnknownFields); each validation message is plain English so the SPA can render it directly.adapter.AdminCreateTableInput/admin.CreateTableRequest) stay independent so neither package imports the other; the bridge keeps them in sync and any drift breaks the build there.Test plan
go build ./...go vet ./...golangci-lint run(admin, adapter, root: 0 issues)go test ./internal/admin/ -count=1(49 tests pass — 14 new write-handler unit tests, 4 new server-level integration tests)go test ./adapter/ -count=1 -run 'TestDynamoDB_Admin'(14 tests pass — 9 new write-path tests including duplicate rejection, role enforcement at adapter, validation errors, delete missing to ResourceNotFound, etc.)curl -X POST .../dynamo/tableswith full-role cookies + CSRF header to 201 + JSON summaryRetry-After: 1DELETEon a non-existent table to 404not_foundStacked roadmap
GET /tables+GET /tables/{name}(in review)POST+DELETE(leader-only)