admin: AdminForward RPC + leader-side dispatcher (P1, partial)#635
admin: AdminForward RPC + leader-side dispatcher (P1, partial)#635
Conversation
|
@claude review |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 37 minutes and 15 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces administrative functionality for DynamoDB-compatible tables, including SigV4-bypass endpoints for listing, describing, creating, and deleting tables. It also adds a gRPC service to forward administrative write operations from followers to the Raft leader. Feedback indicates that the handleList implementation fetches all table names before paginating in memory, which could lead to memory pressure; it is recommended to implement pagination at the source level. Additionally, the handleDelete method in the forwarding server lacks payload size validation, posing a potential denial-of-service risk.
| func (h *DynamoHandler) handleList(w http.ResponseWriter, r *http.Request) { | ||
| limit, err := parseDynamoListLimit(r.URL.Query().Get("limit")) | ||
| if err != nil { | ||
| writeJSONError(w, http.StatusBadRequest, "invalid_limit", err.Error()) | ||
| return | ||
| } | ||
| startAfter, err := decodeDynamoNextToken(r.URL.Query().Get("next_token")) | ||
| if err != nil { | ||
| writeJSONError(w, http.StatusBadRequest, "invalid_next_token", err.Error()) | ||
| return | ||
| } | ||
|
|
||
| names, err := h.source.AdminListTables(r.Context()) | ||
| if err != nil { | ||
| h.logger.LogAttrs(r.Context(), slog.LevelError, "admin dynamo list tables failed", | ||
| slog.String("error", err.Error()), | ||
| ) | ||
| writeJSONError(w, http.StatusInternalServerError, "dynamo_list_failed", | ||
| "failed to list tables; see server logs") | ||
| return | ||
| } | ||
|
|
||
| page, next := paginateDynamoTableNames(names, startAfter, limit) |
There was a problem hiding this comment.
The current implementation of handleList fetches the entire list of table names from the source and then performs pagination in memory. In a cluster with a large number of tables, this approach can lead to significant memory pressure and potential OOM issues. Following the repository rule to prevent unbounded memory growth, the TablesSource interface and its implementations should be updated to support pagination at the source level (e.g., by passing limit and startAfter to AdminListTables).
References
- To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections that can grow from external requests. Reject new requests when the bound is reached.
| if err := json.Unmarshal(req.GetPayload(), &body); err != nil || body.Name == "" { | ||
| return rejectForward(http.StatusBadRequest, "invalid_body", "delete payload missing name") | ||
| } |
There was a problem hiding this comment.
The handleDelete method unmarshals the payload without checking its size. To prevent Denial of Service attacks, apply a size limit to the incoming request payload before unmarshaling.
References
- When handling HTTP requests, always limit the size of the request body to prevent Denial of Service attacks from oversized payloads.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9373f3e252
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| var body CreateTableRequest | ||
| if err := json.Unmarshal(req.GetPayload(), &body); err != nil { | ||
| return rejectForward(http.StatusBadRequest, "invalid_body", "request body is not valid JSON") |
There was a problem hiding this comment.
Reuse strict create-table validation for forwarded payloads
ForwardServer.handleCreate decodes payload with plain json.Unmarshal, which bypasses the validation logic used by the HTTP path (decodeCreateTableRequest), including unknown-field rejection and semantic field checks. Once follower forwarding is enabled, create requests routed through a follower can be accepted even when the same body would be rejected on a leader-direct call, so forwarded and direct admin behavior diverges and invalid inputs can reach the adapter layer.
Useful? React with 👍 / 👎.
| if strings.TrimSpace(out.TableName) == "" { | ||
| return CreateTableRequest{}, errors.New("table_name is required") | ||
| } |
There was a problem hiding this comment.
Reject slash-bearing table names at create time
Create validation currently only enforces that table_name is non-empty, so names like foo/bar pass. But both handleDescribe and handleDelete reject names containing / with 404, which means a table created via this endpoint can become unreachable for subsequent admin describe/delete operations on the same API surface.
Useful? React with 👍 / 👎.
8604343 to
cae18bc
Compare
|
@claude review (Gemini security-medium + Codex P1 対応) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cae18bc2e0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| gsi := adapter.AdminCreateGSI{ | ||
| Name: g.Name, | ||
| PartitionKey: adapter.AdminAttribute{Name: g.PartitionKey.Name, Type: g.PartitionKey.Type}, | ||
| ProjectionType: g.Projection.Type, |
There was a problem hiding this comment.
Canonicalize GSI projection type before adapter translation
validateGSI accepts case-insensitive projection values (it uppercases gsi.Projection.Type for validation), but the bridge forwards the raw string unchanged to the adapter. Because buildCreateTableProjection in adapter/dynamodb.go only accepts exact uppercase values, inputs like "include" pass handler validation and then fail later as adapter validation errors. This creates a confusing mismatch where accepted admin payloads are rejected downstream instead of being normalized once at the boundary.
Useful? React with 👍 / 👎.
validateGSI accepted case-insensitive projection values for SPA ergonomics, but only validated them — the original (possibly lowercase) string then flowed through the bridge to the adapter, where buildCreateTableProjection only matches exact uppercase. So "include" passed handler validation only to fail later at the adapter as ValidationException. Codex P2 flagged this as a boundary mismatch on PR #635. Fix: validateGSI now writes the uppercase form back to gsi.Projection.Type when the value is recognised. Every downstream consumer (the adapter bridge in main_admin.go and the AdminForward server's handleCreate) sees the canonical shape. Tests: round-trip "include" through the handler and confirm src.lastCreateInput.GSI[0].Projection.Type is "INCLUDE"; sweep the five mixed-case variants ALL/All/all + KEYS_ONLY/keys_only + Include to confirm 201 in every case.
cae18bc to
1b7682c
Compare
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
@claude review (Codex P2 GSI projection_type canonicalisation を rebase で取り込み) |
validateGSI accepted case-insensitive projection values for SPA ergonomics, but only validated them — the original (possibly lowercase) string then flowed through the bridge to the adapter, where buildCreateTableProjection only matches exact uppercase. So "include" passed handler validation only to fail later at the adapter as ValidationException. Codex P2 flagged this as a boundary mismatch on PR #635. Fix: validateGSI now writes the uppercase form back to gsi.Projection.Type when the value is recognised. Every downstream consumer (the adapter bridge in main_admin.go and the AdminForward server's handleCreate) sees the canonical shape. Tests: round-trip "include" through the handler and confirm src.lastCreateInput.GSI[0].Projection.Type is "INCLUDE"; sweep the five mixed-case variants ALL/All/all + KEYS_ONLY/keys_only + Include to confirm 201 in every case.
1b7682c to
588276a
Compare
|
@claude[agent] review |
Review complete. Code is production-ready with all security findings addressed in 588276a. Principal re-validation, payload limits, strict JSON decoding, and audit trail all correctly implemented. 11 tests pass, builds clean. ✅ |
588276a to
85c83d3
Compare
validateGSI accepted case-insensitive projection values for SPA ergonomics, but only validated them — the original (possibly lowercase) string then flowed through the bridge to the adapter, where buildCreateTableProjection only matches exact uppercase. So "include" passed handler validation only to fail later at the adapter as ValidationException. Codex P2 flagged this as a boundary mismatch on PR #635. Fix: validateGSI now writes the uppercase form back to gsi.Projection.Type when the value is recognised. Every downstream consumer (the adapter bridge in main_admin.go and the AdminForward server's handleCreate) sees the canonical shape. Tests: round-trip "include" through the handler and confirm src.lastCreateInput.GSI[0].Projection.Type is "INCLUDE"; sweep the five mixed-case variants ALL/All/all + KEYS_ONLY/keys_only + Include to confirm 201 in every case.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 85c83d39b6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| dec := json.NewDecoder(bytes.NewReader(payload)) | ||
| dec.DisallowUnknownFields() | ||
| var body struct { | ||
| Name string `json:"name"` | ||
| } |
There was a problem hiding this comment.
Reject NUL-smuggled bytes in forwarded delete payloads
handleDelete decodes forwarded JSON directly with go-json but never performs the raw NUL-byte check that decodeCreateTableRequest applies for create operations. Because this decoder can treat \x00 as end-of-input, a payload like {"name":"users"}\x00{"extra":1} can be accepted as valid and bypass the trailing-data check, reintroducing the payload-smuggling class this admin path already fixed elsewhere. Add the same pre-decode NUL rejection (or a shared strict decoder) on the delete forwarding path.
Useful? React with 👍 / 👎.
| if body.Name == "" { | ||
| return rejectForward(http.StatusBadRequest, "invalid_body", "delete payload missing name") | ||
| } |
There was a problem hiding this comment.
Block slash-bearing names in forwarded delete requests
The forwarded delete path only rejects empty names and then passes body.Name straight to AdminDeleteTable, but the HTTP admin delete/describe endpoints reject any table name containing / before dispatch. That mismatch means forwarded calls can act on slash-bearing tables that leader-direct admin calls would always 404, creating divergent behavior between direct and forwarded admin operations. Apply the same slash check here before invoking the source.
Useful? React with 👍 / 👎.
85c83d3 to
7069141
Compare
Two related findings on PR #635 / forward_server.go's handleDelete: - NUL-byte smuggling: handleDelete decoded with goccy/go-json but skipped the explicit NUL scan that decodeCreateTableRequest applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}` passes dec.More() because goccy treats NUL as end-of-input. Add the same pre-decode NUL rejection. - Slash-in-name divergence: the HTTP handleDelete and handleDescribe both reject `/` in the table name with 404, but the forwarded delete just passed body.Name straight through to AdminDeleteTable. A forwarded call could therefore act on slash-bearing tables that a leader-direct call would 404. Reject symmetrically before invoking the source. Tests: two new ForwardServer cases — NUL payload + slash name. Both confirm the source is never invoked when the precondition fails (defence in depth — an asymmetric stub source would still make the test green if we only checked the response code).
|
@claude review (Codex P2 ×2 対応: handleDelete に NUL-byte smuggling 検出 + slash-in-name reject) |
Two related findings on PR #635 / forward_server.go's handleDelete: - NUL-byte smuggling: handleDelete decoded with goccy/go-json but skipped the explicit NUL scan that decodeCreateTableRequest applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}` passes dec.More() because goccy treats NUL as end-of-input. Add the same pre-decode NUL rejection. - Slash-in-name divergence: the HTTP handleDelete and handleDescribe both reject `/` in the table name with 404, but the forwarded delete just passed body.Name straight through to AdminDeleteTable. A forwarded call could therefore act on slash-bearing tables that a leader-direct call would 404. Reject symmetrically before invoking the source. Tests: two new ForwardServer cases — NUL payload + slash name. Both confirm the source is never invoked when the precondition fails (defence in depth — an asymmetric stub source would still make the test green if we only checked the response code).
216fec3 to
e045bfb
Compare
|
@claude review |
|
Claude finished @bootjp's task in 2m 11s —— View job Code Review — PR #635: AdminForward RPC + leader-side dispatcher
Overall verdictReady to merge. This is the cleanest version yet. All prior security findings (Gemini security-medium, Codex P1×2, Codex P2×6) and all gaps from the previous two Claude reviews have been correctly closed. Two cosmetic carry-overs remain; details below. What changed since the last Claude review ✅Codex P1 — JWT role staleness on direct HTTP path ( Codex P2 — Race-window tests ( Security / correctness — no new concernsAll surfaces the design mandates are correctly wired:
Minor carry-overs (non-blocking)1.
type errorBody struct {
Error string `json:"error"`
Message string `json:"message,omitempty"`
}
type errorResponse struct {
Error string `json:"error"`
Message string `json:"message,omitempty"`
}Same package, same JSON tags, identical wire shape. The 2.
// in Forward(), after nil check:
forwardedFrom := strings.Map(func(r rune) rune {
if r == '\n' || r == '\r' { return ' ' }
return r
}, req.GetForwardedFrom())Then pass Design notes
Test coverage (current state)21 test cases in |
Two related findings on PR #635 / forward_server.go's handleDelete: - NUL-byte smuggling: handleDelete decoded with goccy/go-json but skipped the explicit NUL scan that decodeCreateTableRequest applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}` passes dec.More() because goccy treats NUL as end-of-input. Add the same pre-decode NUL rejection. - Slash-in-name divergence: the HTTP handleDelete and handleDescribe both reject `/` in the table name with 404, but the forwarded delete just passed body.Name straight through to AdminDeleteTable. A forwarded call could therefore act on slash-bearing tables that a leader-direct call would 404. Reject symmetrically before invoking the source. Tests: two new ForwardServer cases — NUL payload + slash name. Both confirm the source is never invoked when the precondition fails (defence in depth — an asymmetric stub source would still make the test green if we only checked the response code).
forwardErrorResponse already maps ErrTablesNotLeader to 503 leader_unavailable for the case where the leader passes its isVerifiedDynamoLeader guard but then drops leadership mid- Dispatch — the comment at forward_server.go:248 documents the race. No test exercised that path though, which Claude's review on PR #635 called out as a gap. Add two tests that pin both the create and delete sides: TestForwardServer_CreateTable_LeaderSteppedDownReturns503 and TestForwardServer_DeleteTable_LeaderSteppedDownReturns503. Both inject ErrTablesNotLeader through the stub source and assert 503 + leader_unavailable in the response body. These are pure stub-driven unit tests so they run alongside the existing 17 forward-server cases without spinning up a real Raft cluster.
Two changes on the AdminForward leader-side dispatcher: - Codex P2 on PR #635: forwarded 503 leader_unavailable lost the Retry-After: 1 header that the leader-direct HTTP path emits. Add retry_after_seconds to AdminForwardResponse so the follower's bridge can rebuild the same HTTP header (the bridge side wires it in the next PR; the proto + leader populate it here). Direct HTTP path is unaffected — the proto field is only consumed during forwarding. - Drop the duplicate RoleStore/MapRoleStore declarations from forward_server.go. They now live in role_store.go (added on PR #634 for HTTP-side role revalidation) so both surfaces share one definition. Tests: extend TestForwardServer_CreateTable_LeaderSteppedDownReturns503 to pin the retry_after_seconds=1 hint.
5445b0a to
d15c06c
Compare
Two carry-overs from Claude's review on PR #635: - forward_server.go defined errorBody{Error, Message} which was byte-for-byte identical to errorResponse in router.go (same package, same JSON tags). Drop errorBody and use the existing errorResponse type — no functional change, eliminates drift. - ForwardedFrom from the gRPC request was written into slog's LogAttrs verbatim. With JSON output the encoder escapes \n/\r for us, but a text-format handler would let a malicious follower-supplied node id split one audit line into two — defeating log-aggregation parsing or spoofing a synthetic log entry. Sanitise once at the RPC entry point in Forward() and thread the cleaned string through handleCreate / handleDelete / the warning-log paths. Test: TestForwardServer_SanitisesForwardedFromInLog uses a real slog text-format handler (where JSON's auto-escape doesn't help) and confirms a forwarded_from value containing "\n" comes out as spaces in the log line.
|
@claude review |
|
Claude finished @bootjp's task in 2m 5s —— View job Code Review — PR #635: AdminForward RPC + leader-side dispatcher
Overall verdictReady to merge. Commit Changes in the latest commit (
|
| Finding | Status |
|---|---|
| Payload size cap on create + delete | ✅ 64 KiB pre-decode cap on both paths |
| NUL-byte smuggling on delete | ✅ Pre-decode bytes.IndexByte scan |
| Trailing-data injection | ✅ dec.More() check on both paths |
| Slash-in-name on create + delete | ✅ Enforced via decodeCreateTableRequest + explicit delete check |
| Unknown-field bypass on forwarded create | ✅ Reuses decodeCreateTableRequest (forward_server.go:133) |
| Principal re-evaluation on forwarded path | ✅ validatePrincipal ignores follower-claimed role, live LookupRole |
| Principal re-evaluation on direct HTTP path | ✅ principalForWrite live LookupRole when RoleStore configured (dynamo_handler.go:377) |
| Raw error leakage | ✅ forwardErrorResponse never exposes adapter errors; logUnexpectedSourceError logs server-side only |
forwarded_from log injection |
✅ sanitiseForwardedFrom at entry point |
isLeaderChurnError false positives |
✅ HasSuffix not Contains (main_admin.go:286-289) |
Retry-After across forwarding |
✅ retry_after_seconds field in proto; set to 1 on ErrTablesNotLeader |
Minor observations (non-blocking)
1. content_type proto comment doesn't document the 204 exception
AdminForwardResponse.content_type at proto/admin_forward.proto:75 says "in practice always application/json; charset=utf-8". The 204 delete success path (forward_server.go:203) sets no ContentType — an empty string. This is correct behaviour (no body, no content type), but it contradicts the comment. When the follower bridge is wired in the next PR, the author needs to know not to blindly forward a blank Content-Type header for non-204 responses. A one-line comment addition would prevent a silent parity bug:
// content_type is the response content type the leader chose; in
// practice always "application/json; charset=utf-8". Empty for
// 204 No Content responses (DELETE success), which carry no body.
string content_type = 3;2. TestForwardServer_DeleteTable_LeaderSteppedDownReturns503 does not assert RetryAfterSeconds
The create-side test (forward_server_test.go:342) correctly asserts require.Equal(t, int32(1), resp.GetRetryAfterSeconds()). The delete-side test (forward_server_test.go:358-361) asserts 503 and "leader_unavailable" but not the retry-after hint. Both paths go through the same forwardErrorResponse which sets resp.RetryAfterSeconds = 1, so the delete path does return it — the assertion is just missing. Worth a one-liner addition for symmetry.
Test coverage (current state)
22 test cases in forward_server_test.go. All acceptance criteria in scope for this PR (1, 4, 6) have direct tests. Security regression tests cover NUL-byte, slash, unknown-fields, payload-too-large, trailing-data, and log-injection on both create and delete. Race-window 503 paths covered. Log-emission contract tests cover both positive and negative signal. The new TestForwardServer_SanitisesForwardedFromInLog is well-targeted: it uses a text-format handler rather than a JSON handler, directly testing the case JSON encoding's auto-escape would mask.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
@claude review |
|
Claude finished @bootjp's task in 1m 46s —— View job Code Review — PR #635: AdminForward RPC + leader-side dispatcher
Overall verdictReady to merge. This version is unchanged from the previous Claude review pass (the only new commit is a merge of CLAUDE.md documentation from main). All Gemini security-medium, Codex P1×2, and Codex P2×8 findings raised across this PR's review cycle have been correctly addressed. Two minor observations from the previous review remain open — details below. Security posture (all green)
Minor carry-overs (non-blocking — same as previous review)1. The create-side counterpart ( require.Equal(t, int32(1), resp.GetRetryAfterSeconds())The delete-side test ( 2.
// content_type is the response content type the leader chose; in
// practice always "application/json; charset=utf-8" but echoed
// explicitly so the follower does not have to guess.
string content_type = 3;The 204 DELETE success path ( Test coverage (current state)22 tests in |
Lays down the proto contract and the leader-side handler for the follower-leader forwarding path described in design Section 3.3. The follower-side client + bridge wiring lands in a separate PR to keep this one reviewable. Proto: - proto/admin_forward.proto with AdminForward.Forward(...) - AdminPrincipal carries access_key + role; the leader re-evaluates it against its own role index rather than trusting the follower (Section 3.3.1 invariant) - AdminOperation enum keeps the proto operation-agnostic so adding a future endpoint does not require a wire-format change - payload bytes carry the operation-specific JSON body verbatim - forwarded_from echoes the follower's node id into the leader's audit log line (acceptance criterion 6) Leader-side handler (internal/admin/forward_server.go): - ForwardServer implements pb.AdminForwardServer - RoleStore / MapRoleStore keep the access-key role lookup abstract so tests can swap in an in-memory map - validatePrincipal demotes a follower-claimed full role to forbidden when the leader's view is read-only (criterion 4) - forwardErrorResponse mirrors the HTTP handler's writeTablesError status mapping so a forwarded call is indistinguishable from a leader-direct call to the SPA - audit log entries on success carry forwarded_from; the leader is the source of truth (criterion 6) What is NOT in this PR (follow-ups): - Follower-side AdminForward client + bridge wiring (criterion 2) - Election-period 503 + Retry-After handling (criterion 3) - Rolling-upgrade compatibility flag admin.leader_forward_v2 (criterion 5) — needs a Raft-level cluster version bump - gRPC server registration in main.go Tests cover criteria 1, 4, 6 (leader direct, principal demotion, forwarded_from in audit log) and the structured error mapping (create-table 409, delete missing 404, generic error 500 with sanitised body, bad JSON 400, unknown operation 400).
Two findings on the leader-side AdminForward handler: - Gemini security-medium: handleDelete unmarshalled the raw payload with no size cap, so a hostile follower could push a multi-MiB body and consume memory before the JSON parser noticed. Apply a 64 KiB cap (mirrors the HTTP path defaultBodyLimit) on both handleCreate and handleDelete; payloads past the cap return 413 payload_too_large without ever touching json.Unmarshal. - Codex P1: handleCreate decoded with plain json.Unmarshal, bypassing the strict checks the HTTP path runs through decodeCreateTableRequest (DisallowUnknownFields, dec.More() trailing-token rejection, slash-in-name validation, the rest of validateCreateTableRequest). Reuse decodeCreateTableRequest so a forwarded create cannot smuggle past validations a leader-direct create would have caught. handleDelete also gains DisallowUnknownFields + dec.More() so its payload contract matches the create path. Tests cover the new 413 paths, the unknown-field rejection, and the slash-in-name rejection on the forwarded create path.
Two related findings on PR #635 / forward_server.go's handleDelete: - NUL-byte smuggling: handleDelete decoded with goccy/go-json but skipped the explicit NUL scan that decodeCreateTableRequest applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}` passes dec.More() because goccy treats NUL as end-of-input. Add the same pre-decode NUL rejection. - Slash-in-name divergence: the HTTP handleDelete and handleDescribe both reject `/` in the table name with 404, but the forwarded delete just passed body.Name straight through to AdminDeleteTable. A forwarded call could therefore act on slash-bearing tables that a leader-direct call would 404. Reject symmetrically before invoking the source. Tests: two new ForwardServer cases — NUL payload + slash name. Both confirm the source is never invoked when the precondition fails (defence in depth — an asymmetric stub source would still make the test green if we only checked the response code).
Two related findings on the leader-side AdminForward dispatcher.
- Operation-specific 500 codes: forwardErrorResponse mapped any
unmapped failure to {"error":"internal", ...}, but the HTTP
handler's writeTablesError emits dynamo_create_failed /
dynamo_delete_failed. A forwarded write that hit the same
backend/storage error therefore returned a different code than
a leader-direct write — clients branching on those codes saw
divergent behaviour. Fix: forwardErrorResponse now takes an
`op` argument ("create" or "delete") so the fallthrough emits
dynamo_<op>_failed, matching leader-direct parity exactly.
- Log unexpected source errors: handleCreate / handleDelete on
the forward path returned 500 with "see leader logs" in the
body but never logged anything for non-sentinel source
failures, leaving operators with no breadcrumb for forwarded
500s. Add logUnexpectedSourceError that emits an
admin_forward_<op>_failed line at LevelError carrying the
table, forwarded_from, and raw error. Sentinels (forbidden,
not-found, already-exists, validation) skip the log — they're
routine client-side outcomes, and logging at LevelError would
drown the operational signal (same policy as the HTTP path's
writeTablesError).
Tests:
- TestForwardServer_CreateTable_GenericErrorReturns500 now also
asserts the response body contains dynamo_create_failed.
- New TestForwardServer_DeleteTable_GenericErrorUsesOperationSpecificCode
pins the delete-side code parity.
- New TestForwardServer_LogsUnexpectedSourceError captures the
slog output and confirms the table, forwarded_from, and raw
error all appear.
- New TestForwardServer_DoesNotLogStructuredSourceErrors sweeps
the four sentinel cases and confirms the LevelError handler
never emits anything.
forwardErrorResponse already maps ErrTablesNotLeader to 503 leader_unavailable for the case where the leader passes its isVerifiedDynamoLeader guard but then drops leadership mid- Dispatch — the comment at forward_server.go:248 documents the race. No test exercised that path though, which Claude's review on PR #635 called out as a gap. Add two tests that pin both the create and delete sides: TestForwardServer_CreateTable_LeaderSteppedDownReturns503 and TestForwardServer_DeleteTable_LeaderSteppedDownReturns503. Both inject ErrTablesNotLeader through the stub source and assert 503 + leader_unavailable in the response body. These are pure stub-driven unit tests so they run alongside the existing 17 forward-server cases without spinning up a real Raft cluster.
Two changes on the AdminForward leader-side dispatcher: - Codex P2 on PR #635: forwarded 503 leader_unavailable lost the Retry-After: 1 header that the leader-direct HTTP path emits. Add retry_after_seconds to AdminForwardResponse so the follower's bridge can rebuild the same HTTP header (the bridge side wires it in the next PR; the proto + leader populate it here). Direct HTTP path is unaffected — the proto field is only consumed during forwarding. - Drop the duplicate RoleStore/MapRoleStore declarations from forward_server.go. They now live in role_store.go (added on PR #634 for HTTP-side role revalidation) so both surfaces share one definition. Tests: extend TestForwardServer_CreateTable_LeaderSteppedDownReturns503 to pin the retry_after_seconds=1 hint.
Two carry-overs from Claude's review on PR #635: - forward_server.go defined errorBody{Error, Message} which was byte-for-byte identical to errorResponse in router.go (same package, same JSON tags). Drop errorBody and use the existing errorResponse type — no functional change, eliminates drift. - ForwardedFrom from the gRPC request was written into slog's LogAttrs verbatim. With JSON output the encoder escapes \n/\r for us, but a text-format handler would let a malicious follower-supplied node id split one audit line into two — defeating log-aggregation parsing or spoofing a synthetic log entry. Sanitise once at the RPC entry point in Forward() and thread the cleaned string through handleCreate / handleDelete / the warning-log paths. Test: TestForwardServer_SanitisesForwardedFromInLog uses a real slog text-format handler (where JSON's auto-escape doesn't help) and confirms a forwarded_from value containing "\n" comes out as spaces in the log line.
5944f5c to
e6e3596
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e6e35964e1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| func NewForwardServer(source TablesSource, roles RoleStore, logger *slog.Logger) *ForwardServer { | ||
| if logger == nil { | ||
| logger = slog.Default() | ||
| } | ||
| return &ForwardServer{source: source, roles: roles, logger: logger} |
There was a problem hiding this comment.
Reject nil dependencies in NewForwardServer
NewForwardServer accepts nil for both roles and source, but Forward unconditionally dereferences these fields (validatePrincipal calls s.roles.LookupRole, and write paths call s.source.Admin*). A wiring mistake (or partially initialized server during rollout) will therefore panic on the first forwarded request instead of returning a controlled error response, which can take down the admin RPC handler. Add constructor-time validation or fail-closed guards before dispatch.
Useful? React with 👍 / 👎.
…artial) (#644) Phase 1 + 2 of Task #26: the follower-side `LeaderForwarder` client and its integration into the dynamo HTTP handler. Builds on the AdminForward leader-side dispatcher landed via #635. ## Summary - New `LeaderForwarder` interface (`internal/admin/forward_client.go`) decouples the dynamo HTTP handler from `pb.AdminForwardClient`. The handler stays proto-free; the bridge in `main_admin.go` (next phase) plugs in the gRPC-backed implementation. - `gRPCForwardClient` translates a `CreateTableRequest` / table-name into an `AdminForwardRequest`, dials via a `GRPCConnFactory` (production wraps `kv.GRPCConnCache`), and re-shapes the response into `ForwardResult` (status, payload, content-type). - `forwarded_from = nodeID` is populated so the leader's audit log carries the trace (criterion 6, leader-side already shipped in #635). - Defensive: `status_code == 0` upgrades to `502 Bad Gateway`; missing `ContentType` fills the JSON default. Both surface transport bugs rather than producing silently-malformed SPA responses. - `ErrLeaderUnavailable` sentinel signals the "no leader known" case so the handler can map to 503 + `Retry-After: 1` (criterion 3). - `DynamoHandler` gains a `forwarder` field and `WithLeaderForwarder` option. When set, `handleCreate` / `handleDelete` catch `ErrTablesNotLeader` from the source and forward to the leader transparently — the SPA cannot tell forwarded from leader-direct. - `writeForwardResult` re-emits the leader's structured response verbatim (status + payload + content-type), so a forwarded `409 Conflict` from the leader stays `409` on the wire — no re-classification. - `writeForwardFailure` maps `ErrLeaderUnavailable` (election in flight) and gRPC transport errors to 503 + `Retry-After: 1`. `ErrLeaderUnavailable` is intentionally NOT logged at error level (elections are routine); transport errors are logged at LevelError so operators can investigate. ## What is NOT in this PR - gRPC server registration in `main.go` (production wiring of the `ForwardServer` from #635) — comes in the next phase. - The bridge that wraps `kv.GRPCConnCache` and supplies `LeaderAddressResolver` — same phase. - Election-period retry-loop on the client side (criterion 3 partial: this PR returns 503 + Retry-After; the SPA / client retries the request; criterion 3 fully needs the production bridge to dial actual leader-discovery). ## Test plan - [x] `go build ./...` - [x] `go vet ./...` - [x] `golangci-lint run` (admin package: 0 issues) - [x] `go test ./internal/admin/ -count=1 -race` - 8 forward-client unit tests: constructor input validation, both Forward operations including principal/payload/op-enum/`forwarded_from` round-trip, `ErrLeaderUnavailable`, dial/RPC errors propagated with `cockroachdb/errors` wrapping, zero status code upgrade, missing content type fallback - 9 handler integration tests: transparent forward for create + delete, no-forwarder fallback to 503, `ErrLeaderUnavailable` → 503 + Retry-After, transport error → 503 + log, leader 409 pass-through, leader 503 + Retry-After preserved, role check short-circuits before forward, body validation short-circuits before forward - [ ] Wire production bridge + register `pb.RegisterAdminForwardServer` in `main.go` and exercise an end-to-end follower → leader call against a real cluster (next PR). ## Acceptance criteria coverage | # | Criterion | This PR | |---|---|---| | 1 | Leader direct write | ✓ (in main since #634) | | 2 | Follower forwards transparently | ✓ wiring done; needs main.go gRPC registration to take effect | | 3 | Election-period 503 + retry | ✓ partial — handler returns 503 + Retry-After; full transparency needs the bridge | | 4 | Leader demotes stale full role | ✓ (in main since #635) | | 5 | Rolling-upgrade compat flag | ⏳ deferred (cluster-version bump) | | 6 | `forwarded_from` in audit log | ✓ (in main since #635) |
…de bridge (#648) Phase 3 of Task #26 / AdminForward design 3.3. Closes the wiring loop opened by #635 (leader-side dispatcher) and #644 (follower-side client + handler integration): the admin HTTP DynamoHandler now hands `ErrTablesNotLeader` writes off to the in-process `LeaderForwarder`, and the leader-side gRPC `AdminForward` service is registered on every node so the forwarded RPC actually lands somewhere. ## Summary - **`admin.ServerDeps.Forwarder`** (3315bf4) — new optional field on the high-level admin server composition surface. `admin.NewServer` calls `WithLeaderForwarder` when set; nil leaves the existing 503 + Retry-After:1 fallback untouched. Single-node / leader-only deployments are unaffected. - **`main_admin_forward.go`** (f3e529c) — production bridge between `kv.GRPCConnCache` and the admin layer's typed `PBAdminForwardClient` (via `pb.NewAdminForwardClient`). Includes `buildLeaderForwarder`, `roleStoreFromFlags`, `adminForwardServerDeps`, and a `readyForRegistration` gate. - **Runtime wiring** (19f6b70) — `runtimeServerRunner.start()` now creates `dynamoServer` BEFORE `startRaftServers`, so the leader-side `ForwardServer` can use it as its `TablesSource`. `startRaftServers` accepts the new `adminForwardServerDeps` bundle and delegates to `registerAdminForwardServer`. `startServers` constructs a single `*kv.GRPCConnCache` and forwards it (with the `Coordinator`) to `startAdminFromFlags`, which in turn builds the production `LeaderForwarder` via `buildAdminLeaderForwarder` and passes it through `ServerDeps.Forwarder`. ## What is NOT in this PR - Rolling-upgrade compatibility flag (criterion 5) — still deferred behind a cluster-version bump. - Election-period retry loop on the follower's bridge — the handler returns 503 + Retry-After:1, the SPA / curl re-issues, and the bridge dials again on the next attempt. Inline retry inside the handler would just hide the latency from operators. ## Test plan - [x] `go build ./...` - [x] `go vet ./...` - [x] `golangci-lint run` (main + admin packages: 0 issues) - [x] `go test ./internal/admin/ -count=1 -race` — full admin suite passes (existing 21 forward tests + 1 new `TestServer_ServerDepsForwarderIsWired`) - [x] `go test . -count=1 -race` — main package passes (4 new bridge / role-store / readyForRegistration tests) - [ ] `adapter` package times out at 240s on this branch AND on `main` — pre-existing flake unrelated to this PR (verified by checking out main and running the same suite) - [ ] End-to-end smoke against a 3-node cluster — needs the local Jepsen runner; blocked on the next PR or a manual run ## Acceptance criteria coverage | # | Criterion | This PR | |---|---|---| | 1 | Leader direct write | ✓ (in main since #634) | | 2 | Follower forwards transparently | ✓ — wiring complete; both leader-side gRPC service AND follower-side bridge are now plumbed | | 3 | Election-period 503 + retry | ✓ — handler still returns 503 + Retry-After:1 when no leader is known; the SPA / client retries and the next attempt re-dials | | 4 | Leader demotes stale full role | ✓ (in main since #635) | | 5 | Rolling-upgrade compat flag | ⏳ deferred (cluster-version bump) | | 6 | `forwarded_from` in audit log | ✓ (in main since #635 — bridge now stamps `--raftId` onto every forwarded request) | ## Self-review (5 lenses) 1. **Data loss**: No FSM / Raft / Pebble path changes; admin writes still go through the same `AdminCreateTable` / `AdminDeleteTable` Raft proposal once they reach the leader. 2. **Concurrency**: `runner.start()` reorder confirmed safe — `startDynamoDBServer` only depends on `coordinate` + `shardStore` (already constructed); raft TCP listeners are independent of the dynamo HTTP listener. Worst case during the 100 ms-or-so reorder window is the same "no leader yet" 503 the old order would have produced if a request landed before raft converged. 3. **Performance**: One additional `&kv.GRPCConnCache{}` allocation per process; one closure for the resolver. No hot-path changes. 4. **Data consistency**: Forwarded requests re-validate the principal at the leader (criterion 4, already shipped); commit-ts ordering is unchanged because the leader's `AdminCreateTable` path does its own `HLC.Next`. 5. **Test coverage**: 4 new main-package tests + 1 new admin-package server-level test. `TestServer_ServerDepsForwarderIsWired` is the regression test for "future refactor drops the forwarder before it reaches the dynamo handler" — exactly the failure mode the wiring change in this PR introduces the risk of. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added leader-forwarding capability for admin requests sent to follower nodes * Implemented role-based access control for admin API operations * Enhanced DynamoDB request routing to automatically direct operations to leader nodes * **Tests** * Added comprehensive integration and unit tests validating leader-forwarding paths <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Per `docs/design/README.md`'s lifecycle convention, the admin dashboard design doc is now "partial": - **P1** (DynamoDB CRUD + AdminForward) — shipped via #634, #635, #644, #648 - **P2** (S3 buckets list/create/delete/ACL + DescribeTable) — shipped via #658, with #669 + #673 in flight - **P3** (React SPA + embed) — shipped via #649, #650 - **P4** (TLS / role / CSRF / operator docs) — TLS, role, CSRF are already live in P1; operator docs in #674 Independent of the in-flight slice 2 PRs (#669/#673) and the docs PR (#674) — this rename only reflects what is already on main today, plus an "Implementation status" table mapping each phase to the PR it landed in. ## What this PR changes - `git mv` the design doc from `2026_04_24_proposed_admin_dashboard.md` to `2026_04_24_partial_admin_dashboard.md` so its history follows - Add an "Implementation status" header table indexing each phase to the PRs that landed it - List the outstanding open items so future readers know what is still owed against the original proposal: - AdminForward acceptance criterion 5 (rolling-upgrade compat flag) — deferred - S3 object browser — explicitly out of scope per Section 2 Non-goals - TLS cert hot-reload — restart-to-rotate is the documented model When the rolling-upgrade flag lands, the doc gets renamed once more to `2026_04_24_implemented_admin_dashboard.md` per the README's lifecycle convention.
Stacked on #634. Lays down the proto contract and the leader-side dispatcher for the follower-to-leader forwarding path mandated by design Section 3.3. The follower-side client + bridge wiring lands in a separate PR to keep this one reviewable.
Summary
proto/admin_forward.protowithservice AdminForwardcarrying anAdminPrincipal+ opaque JSON payload + the operation enum. Operation-agnostic so adding a new admin endpoint does not require a wire-format bump.internal/admin/forward_server.gois the leader-side gRPC handler.validatePrincipalre-evaluates the inbound access-key against the leader's ownRoleStoreand demotes a follower-claimed full role when the leader sees read-only — this is the design's adapter side is the source of truth for authz invariant for forwarded calls (Section 3.3.1).forwardErrorResponsemirrors the HTTP handler'swriteTablesErrormapping (403 / 503 / 404 / 409 / 400 / 500) so a forwarded call is indistinguishable from a leader-direct call to the SPA. Raw adapter error text never leaves the leader; everything is encoded into the structured response payload.forwarded_fromis echoed into the leader's audit log line on every successful forwarded operation (acceptance criterion 6).What is NOT in this PR
503 leader_unavailable+Retry-After: 1from admin: CreateTable / DeleteTable write endpoints (P1, leader-only) #634 — this PR is a no-op on the request path until the client wiring lands.admin.leader_forward_v2(criterion 5) — depends on a Raft-level cluster version bump that is independently substantial.main.go— wired in the next PR alongside the client.Acceptance criteria coverage
forwarded_fromin audit logTest plan
go build ./...go vet ./...golangci-lint run(admin package: 0 issues)go test ./internal/admin/ -run TestForwardServer -count=1(11 leader-side tests pass — criterion 1, 4, 6, plus error-mapping coverage)