admin: CreateTable / DeleteTable write endpoints (P1, leader-only)#634

Merged
bootjp merged 18 commits into main from feat/admin-dynamo-tables-write
Apr 25, 2026

@bootjp bootjp commented Apr 25, 2026

Stacked on #633 (the read-only chunk). Writes are limited to the leader node for now; follower-side AdminForward RPC (design Section 3.3 acceptance criteria 1-6) ships in a follow-up PR. Mergeable on its own — followers respond 503 leader_unavailable + Retry-After: 1.

Summary

  • POST /admin/api/v1/dynamo/tables and DELETE /admin/api/v1/dynamo/tables/{name} both go through the existing protect chain (BodyLimit → SessionAuth → Audit → CSRF). The handler also enforces RoleFull so a read-only key cannot create or delete even with a valid CSRF token.
  • Adapter side: AdminCreateTable / AdminDeleteTable take an AdminPrincipal and re-validate the role at the adapter layer even when a higher tier already enforced it. This preserves the design's "adapter side is the source of truth for authz" invariant (Section 3.2). Two sentinel errors (ErrAdminNotLeader, ErrAdminForbidden) signal the structured failure modes.
  • Bridge in main_admin.go translates adapter errors to admin sentinels and HTTP statuses (ErrTablesNotLeader → 503 + Retry-After: 1, ErrTablesForbidden → 403, ResourceInUse → 409, ResourceNotFound → 404, ValidationException → 400). Raw adapter error text is never surfaced to clients; everything else falls through to a generic 500 with the original message logged at error level.
  • Strict JSON decoding (DisallowUnknownFields); each validation message is plain English so the SPA can render it directly.
  • Two summary structs (adapter.AdminCreateTableInput / admin.CreateTableRequest) stay independent so neither package imports the other; the bridge keeps them in sync and any drift breaks the build there.
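
For readers who want the error mapping above in code form, here is a minimal sketch. It is not the code in main_admin.go: ErrTablesNotLeader and ErrTablesForbidden are named in the summary, but ErrTablesAlreadyExists, ErrTablesNotFound, ErrTablesInvalid and the statusFor helper are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Sentinels. The first two names appear in the PR summary; the other three
// are hypothetical stand-ins for the ResourceInUse / ResourceNotFound /
// ValidationException cases.
var (
	ErrTablesNotLeader     = errors.New("admin tables: not leader")
	ErrTablesForbidden     = errors.New("admin tables: forbidden")
	ErrTablesAlreadyExists = errors.New("admin tables: already exists")
	ErrTablesNotFound      = errors.New("admin tables: not found")
	ErrTablesInvalid       = errors.New("admin tables: validation failed")
)

// statusFor returns the HTTP status for an error and, when non-empty,
// the Retry-After header value. Unknown errors become a generic 500.
func statusFor(err error) (status int, retryAfter string) {
	switch {
	case errors.Is(err, ErrTablesNotLeader):
		return http.StatusServiceUnavailable, "1"
	case errors.Is(err, ErrTablesForbidden):
		return http.StatusForbidden, ""
	case errors.Is(err, ErrTablesAlreadyExists):
		return http.StatusConflict, ""
	case errors.Is(err, ErrTablesNotFound):
		return http.StatusNotFound, ""
	case errors.Is(err, ErrTablesInvalid):
		return http.StatusBadRequest, ""
	default:
		return http.StatusInternalServerError, ""
	}
}

func main() {
	// Wrapped sentinels still match because errors.Is walks the chain.
	wrapped := fmt.Errorf("create table: %w", ErrTablesNotLeader)
	fmt.Println(statusFor(wrapped))
	fmt.Println(statusFor(errors.New("raw adapter text stays server-side")))
}
```

Matching with errors.Is rather than string comparison keeps the raw adapter message out of the decision, which fits the "never surfaced to clients" rule above.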

Test plan

  • go build ./...
  • go vet ./...
  • golangci-lint run (admin, adapter, root: 0 issues)
  • go test ./internal/admin/ -count=1 (49 tests pass — 14 new write-handler unit tests, 4 new server-level integration tests)
  • go test ./adapter/ -count=1 -run 'TestDynamoDB_Admin' (14 tests pass, including 9 new write-path tests covering duplicate rejection, role enforcement at the adapter, validation errors, delete of a missing table → ResourceNotFound, etc.)
  • Manual smoke against a running node:
    • curl -X POST .../dynamo/tables with full-role cookies + CSRF header → 201 + JSON summary
    • same against a follower → 503 + Retry-After: 1
    • DELETE on a non-existent table → 404 not_found
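
The follower smoke check can also be approximated in-process. The sketch below is an assumption-heavy illustration, not the PR's wiring: leaderOnly, probe, and the JSON error body are hypothetical, and the real handler reaches the 503 through the adapter sentinel rather than a wrapper like this.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// leaderOnly is a hypothetical wrapper: on a non-leader it answers
// 503 with Retry-After: 1 and never calls the wrapped write handler.
func leaderOnly(isLeader func() bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !isLeader() {
			w.Header().Set("Retry-After", "1")
			w.Header().Set("Content-Type", "application/json; charset=utf-8")
			w.WriteHeader(http.StatusServiceUnavailable)
			// Body shape is an assumption; only the code string is from the PR.
			_, _ = w.Write([]byte(`{"error":"leader_unavailable"}`))
			return
		}
		next.ServeHTTP(w, r)
	})
}

// probe sends one POST through the wrapper and reports the status code
// and the Retry-After header the client would see.
func probe(leader bool) (int, string) {
	created := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusCreated)
	})
	h := leaderOnly(func() bool { return leader }, created)
	req := httptest.NewRequest(http.MethodPost, "/admin/api/v1/dynamo/tables", strings.NewReader(`{}`))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Retry-After")
}

func main() {
	fmt.Println(probe(true))  // leader path
	fmt.Println(probe(false)) // follower path
}
```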

Stacked roadmap

  1. #633 (admin: read-only DynamoDB tables endpoints, P1): read-only GET /tables + GET /tables/{name} (in review)
  2. THIS PR: POST + DELETE (leader-only)
  3. AdminForward RPC + follower-leader forwarding (Section 3.3 acceptance criteria 1-6)
  4. S3 read-only endpoints
  5. S3 write endpoints
  6. SPA (React + Vite, embed.FS)

@coderabbitai

coderabbitai Bot commented Apr 25, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 47 minutes and 36 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between 8e3bb37 and 5783c2a.

📒 Files selected for processing (11)
  • adapter/dynamodb_admin.go
  • adapter/dynamodb_admin_test.go
  • internal/admin/dynamo_handler.go
  • internal/admin/dynamo_handler_test.go
  • internal/admin/role_store.go
  • internal/admin/router.go
  • internal/admin/server.go
  • internal/admin/server_test.go
  • main.go
  • main_admin.go
  • main_admin_test.go

bootjp commented Apr 25, 2026

@claude review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a SigV4-bypass administrative interface for DynamoDB-compatible tables, enabling listing, description, creation, and deletion through the admin dashboard. The implementation includes a new adapter layer for core logic, an HTTP handler with pagination and validation, and wiring into the admin server. Review feedback points out a redundant table name validation in the adapter and recommends refactoring the JSON response utility to encode the body before flushing headers, ensuring response reliability.

Comment thread adapter/dynamodb_admin.go Outdated
Comment on lines +204 to +206
if strings.TrimSpace(legacy.TableName) == "" {
	return nil, newDynamoAPIError(http.StatusBadRequest, dynamoErrValidation, "missing table name")
}

medium

This check for an empty table name is redundant. buildLegacyCreateTableInput (called at line 200) already performs this validation and returns a structured dynamoErrValidation error if the name is missing or whitespace-only. Removing this improves code maintainability by eliminating unreachable logic.

References
  1. Remove dead or unreachable code paths, such as asynchronous handlers for tasks that are always performed synchronously, to improve code clarity and maintainability.

Comment on lines +518 to +526
w.WriteHeader(status)
if err := json.NewEncoder(w).Encode(body); err != nil {
	if logger == nil {
		logger = slog.Default()
	}
	logger.LogAttrs(ctx, slog.LevelWarn, "admin response encode failed",
		slog.String("error", err.Error()),
	)
}

medium

Calling w.WriteHeader(status) before encoding the JSON body can lead to partial or malformed responses if json.NewEncoder(w).Encode(body) fails (e.g., due to a broken connection or an unexpected encoding error). Since the status code and headers are already flushed, the client might misinterpret a truncated response as a successful one. It is safer to encode the body into a buffer or use json.Marshal first, and only write the headers and status code if encoding succeeds.

func writeAdminJSONStatus(w http.ResponseWriter, ctx context.Context, logger *slog.Logger, status int, body any) {
	b, err := json.Marshal(body)
	if err != nil {
		if logger == nil {
			logger = slog.Default()
		}
		logger.LogAttrs(ctx, slog.LevelWarn, "admin response encode failed",
			slog.String("error", err.Error()),
		)
		writeJSONError(w, http.StatusInternalServerError, "internal_error", "failed to encode response")
		return
	}
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	w.Header().Set("Cache-Control", "no-store")
	w.WriteHeader(status)
	_, _ = w.Write(b)
}


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14ae1e9bb0

dec := json.NewDecoder(body)
dec.DisallowUnknownFields()
var out CreateTableRequest
if err := dec.Decode(&out); err != nil {

P2: Reject trailing JSON after create-table payload

decodeCreateTableRequest decodes exactly one JSON value and returns success immediately, so bodies like {"table_name":...}{"extra":1} are accepted even though they are not a single valid request document. This weakens the stated strict-decoding contract and can let malformed or concatenated payloads create/delete tables unexpectedly instead of returning 400 invalid_body; add an EOF check with a second decode/token read after the first decode succeeds.
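
One way to implement the suggested EOF check is sketched below, assuming the standard library's encoding/json rather than whichever decoder the repository uses. createTableRequest is a trimmed-down hypothetical shape and decodeStrict is an illustrative name, not the PR's decodeCreateTableRequest.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// createTableRequest is a hypothetical, minimal request shape.
type createTableRequest struct {
	TableName string `json:"table_name"`
}

// decodeStrict accepts exactly one JSON object with known fields only.
func decodeStrict(raw []byte) (createTableRequest, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()

	var out createTableRequest
	if err := dec.Decode(&out); err != nil {
		return createTableRequest{}, fmt.Errorf("invalid JSON body: %w", err)
	}

	// A second Decode must report io.EOF. A nil error means a second
	// document was present; any other error means trailing garbage.
	var extra json.RawMessage
	if err := dec.Decode(&extra); !errors.Is(err, io.EOF) {
		return createTableRequest{}, errors.New("request body has trailing data after the JSON object")
	}
	return out, nil
}

func main() {
	for _, body := range []string{
		`{"table_name":"users"}`,
		`{"table_name":"users"}{"extra":1}`,
		`{"table_name":"users","bogus":true}`,
	} {
		_, err := decodeStrict([]byte(body))
		fmt.Printf("%-40s rejected=%v\n", body, err != nil)
	}
}
```

The second-Decode form also rejects a stray trailing `]` or `}`, which a bare More() check on the standard decoder would let through.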

bootjp force-pushed the feat/admin-dynamo-tables-write branch from 14ae1e9 to 70e16e1 on April 25, 2026 14:11

bootjp commented Apr 25, 2026

@claude review (addressed 2 Gemini medium findings + 1 Codex P2 finding)

bootjp force-pushed the feat/admin-dynamo-tables-write branch from 70e16e1 to 3b0d05d on April 25, 2026 14:20

bootjp commented Apr 25, 2026

@claude review (addressed Codex P2: slash-in-name)

bootjp commented Apr 25, 2026

@claude review (pulled in bottom-up the Codex P2 raised on PR #635: canonicalise GSI projection_type in validateGSI)

bootjp force-pushed the feat/admin-dynamo-tables-write branch from 1c14fc1 to 070fa3f on April 25, 2026 14:46

bootjp commented Apr 25, 2026

@claude review (rebased to pick up the changes from #633)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 070fa3f0ca

Comment on lines +375 to +376
if dec.More() {
	return CreateTableRequest{}, errors.New("request body has trailing data after the JSON object")

P2: Reject NUL-delimited trailing request content

decodeCreateTableRequest relies on dec.More() to enforce a single JSON document, but with goccy/go-json a body like {"table_name":...}\x00{"extra":1} is accepted: the decoder treats the NUL byte as end-of-input, More() is false, and the trailing bytes are silently ignored. That means malformed/concatenated payloads can still pass invalid_body checks on POST /admin/api/v1/dynamo/tables, violating the strict-body contract for this new write path.

bootjp added a commit that referenced this pull request Apr 25, 2026
decodeCreateTableRequest used dec.More() to enforce a single JSON
document per request, but goccy/go-json treats a raw NUL as
end-of-input — a body like `{"table_name":...}\x00{"extra":1}`
parsed cleanly, dec.More() returned false, and the trailing
content was silently dropped. Codex P2 on PR #634 flagged this
as a payload-smuggling vector.

Fix: read the body once, scan for NUL before decoding. JSON has
no need for raw NUL (control characters must be \u-escaped per
RFC 8259), so any NUL is a strong signal of either tooling
misconfiguration or deliberate smuggling. Reject with 400.

The existing dec.More() check stays — it catches the well-formed
trailing-token cases (a second `{...}` or trailing `42`) that
would otherwise pass NUL-validation but still violate the
strict-body contract.

Tests: extend TestDynamoHandler_CreateTable_RejectsBadJSON with
two NUL vectors — the trailing `{"extra":1}` from the Codex
report, and a bare trailing NUL with no extra payload.
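
The pre-decode scan described in this commit message can be sketched as follows. This is an illustration under assumptions, not the committed code: readBodyRejectingNUL, errNULInBody and rejectsNUL are hypothetical names, and only the bytes.IndexByte scan over the fully read body reflects what the message describes.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// errNULInBody is a hypothetical sentinel the handler would map to 400.
var errNULInBody = errors.New("request body contains a raw NUL byte")

// readBodyRejectingNUL reads the whole body once and refuses any raw NUL
// before a JSON decoder ever sees the bytes, so the check does not depend
// on how a particular decoder treats NUL.
func readBodyRejectingNUL(body io.Reader) ([]byte, error) {
	raw, err := io.ReadAll(body)
	if err != nil {
		return nil, fmt.Errorf("read body: %w", err)
	}
	if bytes.IndexByte(raw, 0) >= 0 {
		return nil, errNULInBody
	}
	return raw, nil
}

// rejectsNUL reports whether the given body trips the NUL guard.
func rejectsNUL(body string) bool {
	_, err := readBodyRejectingNUL(strings.NewReader(body))
	return errors.Is(err, errNULInBody)
}

func main() {
	fmt.Println(rejectsNUL(`{"table_name":"users"}`))
	fmt.Println(rejectsNUL("{\"table_name\":\"users\"}\x00{\"extra\":1}"))
	fmt.Println(rejectsNUL("{\"table_name\":\"users\"}\x00"))
}
```

An escaped `\u0000` inside a JSON string is still plain ASCII on the wire, so this scan only fires on raw NUL bytes.
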

bootjp commented Apr 25, 2026

@claude review (addressed Codex P2: NUL-byte payload smuggling; after reading the body, a NUL byte is detected and the request is rejected with 400)

bootjp force-pushed the feat/admin-dynamo-tables-write branch from 84bf258 to dcac6e4 on April 25, 2026 15:12

bootjp added a commit that referenced this pull request Apr 25, 2026
Two related findings on PR #635 / forward_server.go's
handleDelete:

- NUL-byte smuggling: handleDelete decoded with goccy/go-json
  but skipped the explicit NUL scan that decodeCreateTableRequest
  applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}`
  passes dec.More() because goccy treats NUL as end-of-input.
  Add the same pre-decode NUL rejection.

- Slash-in-name divergence: the HTTP handleDelete and
  handleDescribe both reject `/` in the table name with 404, but
  the forwarded delete just passed body.Name straight through to
  AdminDeleteTable. A forwarded call could therefore act on
  slash-bearing tables that a leader-direct call would 404. Reject
  symmetrically before invoking the source.

Tests: two new ForwardServer cases — NUL payload + slash name.
Both confirm the source is never invoked when the precondition
fails (defence in depth — an asymmetric stub source would still
make the test green if we only checked the response code).
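
The "source is never invoked" assertion from this commit message can be pictured with a counting stub. Everything below is hypothetical scaffolding (countingSource, forwardDelete, errTableNotFound); only the rule itself, rejecting `/` in the name before calling the source, comes from the message.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errTableNotFound is a hypothetical sentinel standing in for the 404 path.
var errTableNotFound = errors.New("table not found")

// countingSource is a stub that records how often a delete reached it.
type countingSource struct{ deletes int }

func (s *countingSource) AdminDeleteTable(name string) error {
	s.deletes++
	return nil
}

// forwardDelete rejects slash-bearing names before touching the source,
// mirroring what the direct HTTP path is described as doing.
func forwardDelete(src *countingSource, name string) error {
	if strings.Contains(name, "/") {
		return errTableNotFound
	}
	return src.AdminDeleteTable(name)
}

func main() {
	src := &countingSource{}
	fmt.Println(forwardDelete(src, "users/archive"), src.deletes)
	fmt.Println(forwardDelete(src, "users"), src.deletes)
}
```

Checking the counter as well as the returned error is what makes the test meaningful: a stub that happened to return an error for other reasons would otherwise pass.
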

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84bf2580e0

Comment on lines +282 to +284
if err != nil {
	writeJSONError(w, http.StatusBadRequest, "invalid_body", err.Error())
	return

P2: Return 413 for oversized create-table bodies

POST /admin/api/v1/dynamo/tables currently maps every decode failure to 400 invalid_body, so when BodyLimit trips (http.MaxBytesError) the client still gets a 400 instead of the canonical 413 payload_too_large used elsewhere in admin handlers. This breaks the middleware contract documented in internal/admin/middleware.go and makes oversized requests indistinguishable from malformed JSON for callers and retry logic.

bootjp added a commit that referenced this pull request Apr 25, 2026
decodeCreateTableRequest used to surface every read/parse failure
as the same generic "invalid_body" string, so handleCreate mapped
all of them to 400 — including the BodyLimit/MaxBytesReader
overflow that should produce 413 payload_too_large.

The middleware contract in internal/admin/middleware.go promises
413 on oversized bodies (WriteMaxBytesError lives in that file
exactly for this purpose). Codex P2 on PR #634 flagged the
write path as the only handler that broke that contract: callers
and retry logic could not distinguish "body too big" from "body
malformed", and oversize requests would be retried as if a
caller-side fix was possible.

Fix: introduce errCreateBodyTooLarge as a sentinel returned only
when io.ReadAll trips MaxBytesReader. handleCreate matches the
sentinel via errors.Is and routes to WriteMaxBytesError, which
emits the canonical 413 + payload_too_large body. All other
decode paths still produce 400 invalid_body unchanged.

Test: TestDynamoHandler_CreateTable_OversizedBodyReturns413
wraps the request body in MaxBytesReader (mirroring what the
real BodyLimit middleware does) and confirms the response is
413 with a payload_too_large code. Also asserts the stub source
is not touched on rejection.
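
A minimal sketch of the sentinel approach described above, using only the standard library. The repository's WriteMaxBytesError helper is replaced here by plain http.Error, and readLimitedBody and statusForBody are hypothetical names; the errors.As match on *http.MaxBytesError requires Go 1.19 or later.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// errCreateBodyTooLarge is returned only when the body limit trips.
var errCreateBodyTooLarge = errors.New("request body exceeds the configured limit")

// readLimitedBody separates "too large" from every other read failure.
func readLimitedBody(r *http.Request) ([]byte, error) {
	raw, err := io.ReadAll(r.Body)
	if err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			return nil, errCreateBodyTooLarge
		}
		return nil, fmt.Errorf("read body: %w", err)
	}
	return raw, nil
}

// handleCreate is a stand-in handler: 413 for oversize, 400 otherwise.
func handleCreate(w http.ResponseWriter, r *http.Request) {
	if _, err := readLimitedBody(r); err != nil {
		if errors.Is(err, errCreateBodyTooLarge) {
			http.Error(w, "payload_too_large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "invalid_body", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusCreated)
}

// statusForBody wraps the body in MaxBytesReader, as a body-limit
// middleware would, and returns the status the client receives.
func statusForBody(body string, limit int64) int {
	req := httptest.NewRequest(http.MethodPost, "/admin/api/v1/dynamo/tables", strings.NewReader(body))
	rec := httptest.NewRecorder()
	req.Body = http.MaxBytesReader(rec, req.Body, limit)
	handleCreate(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusForBody(`{"table_name":"users"}`, 1024))
	fmt.Println(statusForBody(`{"table_name":"users"}`, 8))
}
```
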

bootjp commented Apr 25, 2026

@claude review (addressed Codex P2: oversized bodies now return 413 payload_too_large instead of 400)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41a00638ee

Comment thread main_admin.go
	}
	return &admin.ValidationError{Message: msg}
default:
	return err //nolint:wrapcheck // forwarded so the handler logs but does not surface it.

P2: Map leader-churn dispatch errors to not-leader response

The bridge currently only maps adapter.ErrAdminNotLeader, but AdminCreateTable/AdminDeleteTable can still return raw coordinator leadership-loss errors after the initial leader check (e.g., leadership changes between isVerifiedDynamoLeader and Dispatch in adapter/dynamodb_admin.go). Those errors fall through this default branch and become a generic 500, so transient leader churn is reported as server failure instead of 503 leader_unavailable with retry semantics.

bootjp added a commit that referenced this pull request Apr 25, 2026
translateAdminTablesError previously only matched
adapter.ErrAdminNotLeader, which AdminCreateTable raises from its
up-front isVerifiedDynamoLeader guard. After that guard, the kv
coordinator can still drop leadership mid-Dispatch — the resulting
error (kv.ErrLeaderNotFound, adapter.ErrNotLeader,
adapter.ErrLeaderNotFound, or any of the wire-level "not leader"
phrases) fell through to the default branch and surfaced as a
generic 500 with no Retry-After. Codex P2 on PR #634 flagged this
as a regression in the write path's retry contract.

Fix: introduce isLeaderChurnError that mirrors kv's closed list of
leader-churn sentinels (the typed sentinels plus the wire-level
phrases for errors that have crossed a gRPC boundary and lost
their type). Hook it into translateAdminTablesError so any
matching error maps to admin.ErrTablesNotLeader, which the handler
already turns into 503 leader_unavailable + Retry-After: 1.

Tests pin all four typed sentinels, all four wire phrases, and
prove an unrelated "team leader" error still falls through to the
default branch — the focused substring match avoids swallowing
random sentences that happen to mention leadership.
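
The matcher described in this commit message might look roughly like the sketch below. The sentinel values and wire phrases are placeholders, not kv's actual closed list, and churnFromMessage is a hypothetical helper; the shape (typed errors.Is checks first, then a suffix match on the message) follows the description.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Placeholder sentinels; the real list lives in the kv and adapter packages.
var (
	errNotLeader      = errors.New("not leader")
	errLeaderNotFound = errors.New("leader not found")
)

// Placeholder wire phrases for errors that lost their type over gRPC.
var leaderChurnSuffixes = []string{"not leader", "leader not found"}

// isLeaderChurnError reports whether err signals transient leader loss.
func isLeaderChurnError(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, errNotLeader) || errors.Is(err, errLeaderNotFound) {
		return true
	}
	// Suffix match rather than Contains, so a sentence that merely
	// mentions leadership mid-message is not treated as churn.
	msg := err.Error()
	for _, suffix := range leaderChurnSuffixes {
		if strings.HasSuffix(msg, suffix) {
			return true
		}
	}
	return false
}

// churnFromMessage classifies an untyped error built from msg.
func churnFromMessage(msg string) bool {
	return isLeaderChurnError(errors.New(msg))
}

func main() {
	fmt.Println(isLeaderChurnError(fmt.Errorf("dispatch: %w", errNotLeader)))
	fmt.Println(churnFromMessage("rpc error: code = Unknown desc = not leader"))
	fmt.Println(churnFromMessage("ask the team leader about quotas"))
}
```
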

bootjp commented Apr 25, 2026

@claude review

claude Bot commented Apr 25, 2026

Claude finished @bootjp's task in 2m 46s


Code Review

  • Read repository context
  • Read changed files
  • Analyze adapter layer
  • Analyze HTTP handler
  • Analyze bridge/main wiring
  • Review tests
  • Post final review

Overall Assessment

Solid, well-structured PR. All findings from the previous review rounds (Gemini mediums + Codex P2 items) have been addressed. The code is clean, the test coverage is thorough, and the layered security model is correctly implemented. Three minor comment-staleness issues below — none block merge.


Previously Raised Issues — Status

| Issue | Reviewer | Status |
|---|---|---|
| JSON encode-before-write race | Gemini | ✅ Fixed: writeAdminJSONStatus marshals to buffer first (dynamo_handler.go:622-654) |
| Redundant empty-name check in AdminCreateTable | Gemini | ✅ Removed: comment at dynamodb_admin.go:212-213 confirms dead code was cleaned up |
| 413 for oversized body | Codex P2 | ✅ Fixed: errCreateBodyTooLarge + IsMaxBytesError check (dynamo_handler.go:296-298) |
| NUL-byte payload smuggling | Codex P2 | ✅ Fixed: bytes.IndexByte(raw, 0) guard before decoding (dynamo_handler.go:395-397) |
| Trailing JSON after valid object | Codex P2 | ✅ Fixed: dec.More() check after decode (dynamo_handler.go:409-411) |
| Leader-churn dispatch errors map to 503 | Codex P2 | ✅ Fixed: isLeaderChurnError in translateAdminTablesError (main_admin.go:247-254) |
| strings.Contains → HasSuffix in churn matcher | Codex P2 | ✅ Fixed: main_admin.go:286-289 |
| Structured errors ordered before churn check | Codex P2 | ✅ Fixed: IsAdminTableAlreadyExists/IsAdminValidation before isLeaderChurnError (main_admin.go:237-246) |
| GSI projection_type canonicalisation | Codex P2 | ✅ Fixed: validateGSI uppercases in-place (dynamo_handler.go:500-507) |
| Outdated DynamoHandler struct comment | Previous Claude | ✅ Fixed: now describes GET+POST on /tables, GET+DELETE on /tables/{name} (dynamo_handler.go:157-163) |
| Whitespace table name propagating to adapter | Previous Claude | ✅ Fixed: in.TableName = strings.TrimSpace(in.TableName) at validation boundary (dynamo_handler.go:431) |

New Findings

1. Stale route table in buildAPIMux godoc (internal/admin/server.go:136-137)

The layout comment still lists only GET routes for dynamo:

// Layout:
//   POST   /admin/api/v1/auth/login
//   POST   /admin/api/v1/auth/logout
//   GET    /admin/api/v1/cluster
//   GET    /admin/api/v1/dynamo/tables              (auth required, read-only)
//   GET    /admin/api/v1/dynamo/tables/{name}       (auth required, read-only)

Now that this PR ships POST /tables and DELETE /tables/{name}, the layout comment is missing two lines and the (read-only) annotation on both existing lines is wrong.

2. Stale Tables field comment (internal/admin/server.go:35)

// Tables is the read-only DynamoDB admin source. Optional: a nil
// value disables /admin/api/v1/dynamo/tables{,/{name}} ...

TablesSource now declares AdminCreateTable and AdminDeleteTable, so calling it "read-only" is misleading.

3. Slightly misleading "Read-only endpoints" comment (internal/admin/server.go:197-201)

// Read-only endpoints share the protect chain so a missing
// session or CSRF token still 401s/403s the same way as a write.
// The Audit middleware is a no-op for GET (it only logs state-
// changing methods) so we get the consistent guards without the
// noise of an audit line per dashboard poll.
var dynamoChain http.Handler

This dynamoChain now serves both reads and writes. The "Audit is a no-op for GET" rationale is still correct (writes produce audit entries, reads do not), but the opening "Read-only endpoints" label is misleading. A one-word fix ("Dynamo endpoints…" or "Read and write endpoints…") suffices. Minor, with no correctness impact.


What Looks Good

  • Adapter isolation: internal/admin has zero adapter imports; the dynamoTablesBridge in main_admin.go owns all cross-package translation. Struct divergence breaks the build at the bridge.
  • Defence-in-depth authz: Role is checked at the handler level (lines 289, 325) and at the adapter level (AdminCreateTable/AdminDeleteTable), correctly anticipating the future AdminForward RPC path where the adapter cannot trust that the caller already enforced role.
  • Error vocabulary isolation: translateAdminTablesError maps adapter internals to admin sentinels without leaking raw error text. Typed-sentinel checks before substring churn-matching prevents the false-503 class of bugs. The new TestTranslateAdminTablesError_LeaderPhraseInMiddleOfMessage test pins the HasSuffix behaviour precisely.
  • JSON safety: Marshal to buffer first, then write headers + body — the Gemini medium issue is correctly resolved and is covered by both unit tests and integration tests.
  • Request safety: NUL-byte detection, trailing-JSON rejection, and 413 for oversized bodies are all tested with dedicated cases in TestDynamoHandler_CreateTable_RejectsBadJSON and TestDynamoHandler_CreateTable_OversizedBodyReturns413.
  • Empty body contract: 204 No Content on DELETE verified with require.Empty(t, rec.Body.Bytes()) — SPA can skip body parsing on success.
  • Whitespace-name trimming correctly added at the handler validation boundary, with two new tests (TrimsWhitespaceFromTableName, WhitespaceOnlyNameRejected) that confirm trimming doesn't weaken the empty-name guard.
  • Server-level integration: Four new integration tests (TestServer_DynamoCreateTable_FullRoleHappyPath, _ReadOnlyRoleRejected, _MissingCSRFRejected, TestServer_DynamoDeleteTable_FullRoleHappyPath) close the end-to-end middleware stack under a real HTTP server.

Summary: Three stale comments, all trivial one-line fixes. No correctness, security, or behaviour issues found. LGTM with those addressed at your discretion.

handleCreate / handleDelete previously trusted whatever role was
embedded in the session JWT. Tokens are valid for one hour, and
the role is captured at login time, so an access key downgraded
or revoked mid-session would keep mutating tables for up to 60
minutes despite the operator's intent. Codex P1 on PR #635
flagged the gap; the leader-side ForwardServer already does
this re-evaluation, the HTTP path now does it too so leader-
direct and forwarded write paths share the same authorisation
contract.

Pieces:
- Extract RoleStore + MapRoleStore into role_store.go (shared
  between the HTTP handler and the existing forward server).
- DynamoHandler gains a roles RoleStore field plus
  WithRoleStore option. Nil keeps the previous trust-the-JWT
  behaviour for single-tenant deployments where the role map
  never changes.
- principalForWrite centralises the check: pull principal from
  context (fail closed if missing), enforce RoleFull on the JWT
  role, then — when a RoleStore is configured — re-look-up the
  access key against the live map and reject if absent or
  downgraded. The downstream call uses the live role.
- ServerDeps.Roles is now plumbed into the dynamo handler via
  MapRoleStore so production builds pick up the live check
  automatically.

Tests: three new cases under the live-role-revalidation banner —
downgrade (full → read_only), removal (access key absent), and
the same revocation flow on the delete path. All three confirm
the source layer is never touched on rejection.
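
The check that principalForWrite centralises might look roughly like the sketch below. It follows the steps listed in the commit message, but every type, the `LookupRole` method name, and the error value are assumptions for illustration rather than the code in role_store.go or dynamo_handler.go.

```go
package main

import (
	"errors"
	"fmt"
)

type Role string

const (
	RoleFull     Role = "full"
	RoleReadOnly Role = "read_only"
)

// Principal stands in for the identity extracted from the session JWT.
type Principal struct {
	AccessKey string
	Role      Role
}

// RoleStore is modelled on the description above; the method name is assumed.
type RoleStore interface {
	LookupRole(accessKey string) (Role, bool)
}

// MapRoleStore is a map-backed RoleStore (hypothetical shape).
type MapRoleStore map[string]Role

func (m MapRoleStore) LookupRole(k string) (Role, bool) {
	r, ok := m[k]
	return r, ok
}

var errForbidden = errors.New("forbidden")

// principalForWriteSketch fails closed when the principal is missing, enforces
// RoleFull on the JWT role, then re-checks the access key against the live
// store when one is configured. The returned principal carries the live role.
func principalForWriteSketch(jwt *Principal, roles RoleStore) (Principal, error) {
	if jwt == nil || jwt.Role != RoleFull {
		return Principal{}, errForbidden
	}
	if roles == nil {
		// No store configured: keep the previous trust-the-JWT behaviour.
		return *jwt, nil
	}
	live, ok := roles.LookupRole(jwt.AccessKey)
	if !ok || live != RoleFull {
		return Principal{}, errForbidden
	}
	return Principal{AccessKey: jwt.AccessKey, Role: live}, nil
}

func main() {
	p := &Principal{AccessKey: "AK1", Role: RoleFull}
	_, err := principalForWriteSketch(p, MapRoleStore{"AK1": RoleReadOnly})
	fmt.Println("downgraded key:", err)
	_, err = principalForWriteSketch(p, MapRoleStore{})
	fmt.Println("removed key:", err)
}
```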
bootjp added a commit that referenced this pull request Apr 25, 2026
Two related findings on PR #635 / forward_server.go's
handleDelete:

- NUL-byte smuggling: handleDelete decoded with goccy/go-json
  but skipped the explicit NUL scan that decodeCreateTableRequest
  applies. Same vector as the #634 fix — `{"name":"users"}\x00{"extra":1}`
  passes dec.More() because goccy treats NUL as end-of-input.
  Add the same pre-decode NUL rejection.

- Slash-in-name divergence: the HTTP handleDelete and
  handleDescribe both reject `/` in the table name with 404, but
  the forwarded delete just passed body.Name straight through to
  AdminDeleteTable. A forwarded call could therefore act on
  slash-bearing tables that a leader-direct call would 404. Reject
  symmetrically before invoking the source.

Tests: two new ForwardServer cases — NUL payload + slash name.
Both confirm the source is never invoked when the precondition
fails (defence in depth — an asymmetric stub source would still
make the test green if we only checked the response code).
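
The two preconditions described above can be sketched as follows. The example uses the standard library's `encoding/json` rather than goccy/go-json, and the request type, error values, and function name are hypothetical; it only illustrates the order of checks, not the PR's implementation.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type deleteRequest struct {
	Name string `json:"name"`
}

var (
	errBadPayload = errors.New("invalid request body")
	errNotFound   = errors.New("table not found")
)

// decodeDeleteSketch applies the checks in order: NUL scan, strict decode,
// second-value check, then the slash-in-name guard.
func decodeDeleteSketch(raw []byte) (deleteRequest, error) {
	var req deleteRequest
	// Reject NUL before decoding: a decoder that treats NUL as end-of-input
	// would otherwise silently ignore whatever follows it.
	if bytes.IndexByte(raw, 0) >= 0 {
		return req, errBadPayload
	}
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&req); err != nil {
		return req, errBadPayload
	}
	// Reject a second JSON value following the first object.
	if dec.More() {
		return req, errBadPayload
	}
	// Mirror the HTTP path: a slash in the name is treated as not found.
	if strings.Contains(req.Name, "/") {
		return req, errNotFound
	}
	return req, nil
}

func main() {
	for _, raw := range []string{
		`{"name":"users"}`,
		"{\"name\":\"users\"}\x00{\"extra\":1}",
		`{"name":"users"}{"extra":1}`,
		`{"name":"a/b"}`,
	} {
		_, err := decodeDeleteSketch([]byte(raw))
		fmt.Printf("%q -> %v\n", raw, err)
	}
}
```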
bootjp added a commit that referenced this pull request Apr 25, 2026
Two changes on the AdminForward leader-side dispatcher:

- Codex P2 on PR #635: forwarded 503 leader_unavailable lost
  the Retry-After: 1 header that the leader-direct HTTP path
  emits. Add retry_after_seconds to AdminForwardResponse so the
  follower's bridge can rebuild the same HTTP header (the bridge
  side wires it in the next PR; the proto + leader populate it
  here). Direct HTTP path is unaffected — the proto field is
  only consumed during forwarding.
- Drop the duplicate RoleStore/MapRoleStore declarations from
  forward_server.go. They now live in role_store.go (added on
  PR #634 for HTTP-side role revalidation) so both surfaces
  share one definition.

Tests: extend TestForwardServer_CreateTable_LeaderSteppedDownReturns503
to pin the retry_after_seconds=1 hint.
@bootjp
Owner Author

bootjp commented Apr 25, 2026

@claude review

@claude

claude Bot commented Apr 25, 2026

Claude finished @bootjp's task in 4m 19s


Code Review

  • Read repository context
  • Read changed files
  • Analyze adapter layer
  • Analyze HTTP handler
  • Analyze bridge/main wiring
  • Review tests
  • Post final review

Overall Assessment

Very clean PR. The most recent commit (c2bfcd0) wires live role revalidation (Codex P1 from PR #635) correctly. All substantive findings from prior review rounds are now resolved. Three comment-staleness items from the previous Claude round (@claude review at 17:10 UTC) remain in internal/admin/server.go — otherwise LGTM.


Previously Raised Issues — Status

| Issue | Reviewer | Status |
|---|---|---|
| JSON encode-before-write race | Gemini | ✅ Fixed: writeAdminJSONStatus marshals to buffer first (dynamo_handler.go:678-710) |
| Redundant empty-name check in AdminCreateTable | Gemini | ✅ Removed: comment at dynamodb_admin.go:212-213 confirms dead code cleaned up |
| 413 for oversized body | Codex P2 | ✅ Fixed: errCreateBodyTooLarge + IsMaxBytesError check (dynamo_handler.go:306-310) |
| NUL-byte payload smuggling | Codex P2 | ✅ Fixed: bytes.IndexByte(raw, 0) guard before decoding (dynamo_handler.go:451-453) |
| Trailing JSON after valid object | Codex P2 | ✅ Fixed: dec.More() check after decode (dynamo_handler.go:465-467) |
| Leader-churn dispatch errors map to 503 | Codex P2 | ✅ Fixed: isLeaderChurnError in translateAdminTablesError (main_admin.go:247-254) |
| strings.Contains → HasSuffix in churn matcher | Codex P2 | ✅ Fixed: main_admin.go:286-289 |
| Structured errors ordered before churn check | Codex P2 | ✅ Fixed: IsAdminTableAlreadyExists/IsAdminValidation before isLeaderChurnError (main_admin.go:237-246) |
| GSI projection_type canonicalisation | Codex P2 | ✅ Fixed: validateGSI uppercases in-place (dynamo_handler.go:556-563) |
| Outdated DynamoHandler struct comment | Claude (prev) | ✅ Fixed: now describes GET+POST on /tables, GET+DELETE on /tables/{name} (dynamo_handler.go:157-163) |
| Whitespace table name propagating to adapter | Claude (prev) | ✅ Fixed: in.TableName = strings.TrimSpace(in.TableName) (dynamo_handler.go:487) |
| Live role revalidation on writes | Codex P1 (#635) | ✅ Fixed: principalForWrite re-checks against RoleStore (dynamo_handler.go:356-390); covered by TestDynamoHandler_CreateTable_LiveRoleRevocation and _LiveRoleAccessKeyRemoved |
| Stale route table in buildAPIMux godoc | Claude (prev) | ❌ Still present, see below |
| Stale Tables field comment | Claude (prev) | ❌ Still present, see below |
| Stale "Read-only endpoints" comment | Claude (prev) | ❌ Still present, see below |

Remaining Findings (all carry-overs from the 17:10 review)

1. Stale route table in buildAPIMux godoc (internal/admin/server.go:142-148)

// Layout:
//
//	POST   /admin/api/v1/auth/login                 (no auth, rate-limited)
//	POST   /admin/api/v1/auth/logout                (auth required)
//	GET    /admin/api/v1/cluster                    (auth required)
//	GET    /admin/api/v1/dynamo/tables              (auth required, read-only)
//	GET    /admin/api/v1/dynamo/tables/{name}       (auth required, read-only)

Missing POST /admin/api/v1/dynamo/tables (auth required, full role) and DELETE /admin/api/v1/dynamo/tables/{name} (auth required, full role); the (auth required, read-only) annotation on both GET lines is wrong.

2. Stale Tables field comment (internal/admin/server.go:35-39)

// Tables is the read-only DynamoDB admin source. Optional: a nil
// value disables /admin/api/v1/dynamo/tables{,/{name}} ...

TablesSource now includes AdminCreateTable and AdminDeleteTable, so "read-only" is wrong.

3. Stale "Read-only endpoints" comment (internal/admin/server.go:207-208)

// Read-only endpoints share the protect chain so a missing
// session or CSRF token still 401s/403s the same way as a write.

dynamoChain now serves both reads and writes; "Read-only endpoints" is misleading. The rationale (Audit is a no-op for GET) is still valid. A one-line fix suffices: // Dynamo endpoints (reads and writes) share the protect chain….


What Looks Good

  • Adapter isolation intact: internal/admin has zero adapter imports; all cross-package translation lives in dynamoTablesBridge in main_admin.go. Struct drift breaks the build at the bridge — exactly as intended.
  • Defence-in-depth authz: Role enforced at the handler (principalForWrite:367) and at the adapter (AdminCreateTable:202, AdminDeleteTable:249). The adapter re-check is the one the future AdminForward RPC path will rely on, so the invariant is already correct for both the current direct path and the forwarded path.
  • Live role revalidation (new in c2bfcd0): principalForWrite re-evaluates the access key against MapRoleStore(deps.Roles) before touching the source; three unit tests and one integration test pin the revocation, downgrade, and key-removal cases. The principal that reaches the source carries the live role, not the JWT-frozen one.
  • Error vocabulary isolation: translateAdminTablesError maps adapter internals to admin sentinels without leaking raw text. Typed-sentinel checks before isLeaderChurnError prevent the false-503 class. TestTranslateAdminTablesError_LeaderPhraseInMiddleOfMessage directly pins the HasSuffix behavior.
  • JSON safety: Marshal-to-buffer pattern in writeAdminJSONStatus correctly prevents partial JSON on encode failure. Covered by both unit and integration tests.
  • Request safety: NUL-byte detection, trailing-JSON rejection, and 413 for oversized bodies all tested with dedicated cases. Correct ordering: NUL check → JSON decode → dec.More() → field validation.
  • 204 No Content contract: TestDynamoHandler_DeleteTable_HappyPath asserts require.Empty(t, rec.Body.Bytes()) — SPA can skip body parsing on success.
  • Middleware ordering: CSRF checked before handler body is read; Audit runs before CSRF so CSRF rejections still appear in the audit log. TestServer_DynamoCreateTable_MissingCSRFRejected verifies the gate and checks that the source was not touched.
  • isLeaderChurnError comment is correctly updated to explain why HasSuffix is required and where the canonical phrase list lives.

Summary: Three stale comment-only fixes remain from the previous round (server.go:35, server.go:142-148, server.go:207). No correctness, security, or behaviour issues found. LGTM once those are addressed.

bootjp added 2 commits April 26, 2026 02:27
Three doc-only fixes from Claude's review on PR #634 — the
behavioural changes from this PR's commit series moved past
their original comments:

- ServerDeps.Tables description still said "read-only DynamoDB
  admin source" but TablesSource now exposes AdminCreateTable
  and AdminDeleteTable too.
- buildAPIMux godoc layout omitted POST /dynamo/tables and
  DELETE /dynamo/tables/{name}, and the "(auth required,
  read-only)" annotation on the GET lines was wrong.
- The dynamoChain wiring comment said "Read-only endpoints
  share the protect chain..." but the chain now serves both
  reads and writes; the rationale (Audit no-ops for GET) is
  unchanged.

No behavioural change; doc-only.
@bootjp
Owner Author

bootjp commented Apr 25, 2026

@claude review

@claude

claude Bot commented Apr 25, 2026

Claude finished @bootjp's task in 2m 27s


Code Review

  • Read repository context
  • Read changed files
  • Check previously identified issues (stale comments in server.go)
  • Analyze adapter layer
  • Analyze HTTP handler
  • Analyze bridge/main wiring
  • Review tests
  • Post final review

Overall Assessment

All findings from every prior review round are resolved. No new correctness, security, or behavioural issues found. LGTM.


Previously Raised Issues — Status

| Issue | Reviewer | Status |
|---|---|---|
| JSON encode-before-write race | Gemini | ✅ Fixed: marshal-to-buffer in writeAdminJSONStatus (dynamo_handler.go:679-710) |
| Redundant empty-name check in AdminCreateTable | Gemini | ✅ Removed: comment at dynamodb_admin.go:212-213 confirms dead code cleaned up |
| 413 for oversized body | Codex P2 | ✅ Fixed: errCreateBodyTooLarge + IsMaxBytesError check (dynamo_handler.go:307-310) |
| NUL-byte payload smuggling | Codex P2 | ✅ Fixed: bytes.IndexByte(raw, 0) guard (dynamo_handler.go:451-453) |
| Trailing JSON after valid object | Codex P2 | ✅ Fixed: dec.More() check (dynamo_handler.go:465-467) |
| Leader-churn dispatch errors map to 503 | Codex P2 | ✅ Fixed: isLeaderChurnError in translateAdminTablesError (main_admin.go:247-254) |
| strings.Contains → HasSuffix in churn matcher | Codex P2 | ✅ Fixed: main_admin.go:286-289 |
| Structured errors ordered before churn check | Codex P2 | ✅ Fixed: IsAdminTableAlreadyExists/IsAdminValidation before isLeaderChurnError (main_admin.go:237-246) |
| GSI projection_type canonicalisation | Codex P2 | ✅ Fixed: validateGSI uppercases in-place (dynamo_handler.go:556-563) |
| Outdated DynamoHandler struct comment | Claude (prev) | ✅ Fixed: now describes GET+POST on /tables, GET+DELETE on /tables/{name} (dynamo_handler.go:157-163) |
| Whitespace table name propagating to adapter | Claude (prev) | ✅ Fixed: strings.TrimSpace at validation boundary (dynamo_handler.go:487) |
| Live role revalidation on writes | Codex P1 (#635) | ✅ Fixed: principalForWrite re-checks against RoleStore (dynamo_handler.go:356-390) |
| Stale route table in buildAPIMux godoc | Claude (prev) | ✅ Fixed: now lists POST /tables and DELETE /tables/{name} (server.go:144-150) |
| Stale Tables field comment | Claude (prev) | ✅ Fixed: "read-only" removed; now covers list, describe, create, delete (server.go:35-40) |
| Stale "Read-only endpoints" comment | Claude (prev) | ✅ Fixed: "Dynamo endpoints (reads and writes) share the protect chain" (server.go:210-214) |

New Findings

None. After reading all 11 changed files, the code is clean and no new issues were identified.

A few implementation choices that are worth acknowledging as correct:

  • Auth-before-parse ordering in handleCreate (dynamo_handler.go:301-312): principalForWrite runs before decodeCreateTableRequest. This means a read-only caller gets 403 rather than 413 even on an oversized body, but this is correct — the BodyLimit middleware has already capped the body at the connection level before the handler is entered, so the auth check doesn't bypass the size cap.
  • strings.EqualFold in buildLegacyGSI (dynamodb_admin.go:379): The adapter uses case-insensitive "INCLUDE" matching when populating NonKeyAttributes. Since validateGSI in the handler already canonicalises to uppercase before the request reaches the adapter, this is always uppercase in practice — the EqualFold is a defence-in-depth guard for direct callers of the adapter method.
  • NonKeyAttributes copy (main_admin.go:206): append([]string(nil), g.Projection.NonKeyAttributes...) correctly avoids slice aliasing between the admin DTO and the adapter input. Clean.
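
The slice-aliasing point in the last bullet is easy to demonstrate in isolation. The two struct types and the `toAdapter` helper below are hypothetical stand-ins for the admin DTO and the adapter input; only the `append([]string(nil), ...)` idiom comes from the review.

```go
package main

import "fmt"

// adminProjection and adapterProjection are hypothetical mirror types.
type adminProjection struct{ NonKeyAttributes []string }
type adapterProjection struct{ NonKeyAttributes []string }

// toAdapter copies the slice so the two structs never share a backing array.
func toAdapter(p adminProjection) adapterProjection {
	return adapterProjection{
		NonKeyAttributes: append([]string(nil), p.NonKeyAttributes...),
	}
}

func main() {
	in := adminProjection{NonKeyAttributes: []string{"email", "age"}}
	out := toAdapter(in)
	out.NonKeyAttributes[0] = "changed"
	fmt.Println(in.NonKeyAttributes[0]) // prints "email": the input is untouched
}
```

A plain assignment (`NonKeyAttributes: p.NonKeyAttributes`) would make the write through `out` visible in `in`.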

What Looks Good

  • Bridge isolation: internal/admin has zero adapter imports; all cross-package translation lives in dynamoTablesBridge in main_admin.go. Struct divergence breaks the build at the bridge.
  • Defence-in-depth authz: Role checked at the handler (principalForWrite:356) and at the adapter (AdminCreateTable:202, AdminDeleteTable:249). Both re-checks are correct for the future AdminForward RPC path.
  • Live role revalidation: principalForWrite re-evaluates against MapRoleStore(deps.Roles). Three unit tests + one integration test pin the revocation, downgrade, and key-removal cases.
  • Error vocabulary isolation: translateAdminTablesError maps adapter internals without leaking raw text. TestTranslateAdminTablesError_LeaderPhraseInMiddleOfMessage directly pins the HasSuffix false-positive protection.
  • Request safety: NUL-byte detection, trailing-JSON rejection, and 413 for oversized bodies all tested with dedicated cases and in the correct order: NUL → decode → dec.More() → field validation.
  • Test coverage: 49 handler unit tests + 14 adapter integration tests, covering all major failure modes (413, NUL smuggling, trailing JSON, 503 + Retry-After, 409, 404, 403, role enforcement at adapter). Server-level integration tests close the end-to-end middleware stack.

Summary: All 15 tracked findings from prior rounds are resolved. No new issues. LGTM — ready to merge.

@bootjp bootjp merged commit 7d9a607 into main Apr 25, 2026
8 checks passed
@bootjp bootjp deleted the feat/admin-dynamo-tables-write branch April 25, 2026 19:06
bootjp added a commit that referenced this pull request Apr 25, 2026
Stacked on #634. Lays down the proto contract and the leader-side
dispatcher for the follower-to-leader forwarding path mandated by design
Section 3.3. The follower-side client + bridge wiring lands in a
separate PR to keep this one reviewable.

## Summary

- New `proto/admin_forward.proto` with `service AdminForward` carrying
an `AdminPrincipal` + opaque JSON payload + the operation enum.
Operation-agnostic so adding a new admin endpoint does not require a
wire-format bump.
- `internal/admin/forward_server.go` is the leader-side gRPC handler.
`validatePrincipal` re-evaluates the inbound access-key against the
leader's own `RoleStore` and demotes a follower-claimed full role when
the leader sees read-only — this is the design's *adapter side is the
source of truth for authz* invariant for forwarded calls (Section
3.3.1).
- `forwardErrorResponse` mirrors the HTTP handler's `writeTablesError`
mapping (403 / 503 / 404 / 409 / 400 / 500) so a forwarded call is
indistinguishable from a leader-direct call to the SPA. Raw adapter
error text never leaves the leader; everything is encoded into the
structured response payload.
- `forwarded_from` is echoed into the leader's audit log line on every
successful forwarded operation (acceptance criterion 6).
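
The demotion step that `validatePrincipal` performs could be sketched as below. The types, the error value, and the function name are hypothetical, and the rule that an unknown access key is rejected outright is an assumption; the commit message only states that a follower-claimed full role is demoted when the leader sees read-only.

```go
package main

import (
	"errors"
	"fmt"
)

type Role string

const (
	RoleFull     Role = "full"
	RoleReadOnly Role = "read_only"
)

type Principal struct {
	AccessKey string
	Role      Role
}

var errUnknownAccessKey = errors.New("unknown access key")

// validatePrincipalSketch treats the leader's own role map as authoritative.
// The claimed role is never upgraded; it is demoted when the leader's view
// of the access key is read-only.
func validatePrincipalSketch(claimed Principal, leaderRoles map[string]Role) (Principal, error) {
	live, ok := leaderRoles[claimed.AccessKey]
	if !ok {
		return Principal{}, errUnknownAccessKey // assumed behaviour
	}
	if live == RoleReadOnly {
		claimed.Role = RoleReadOnly
	}
	return claimed, nil
}

func main() {
	leader := map[string]Role{"AK1": RoleReadOnly}
	got, err := validatePrincipalSketch(Principal{AccessKey: "AK1", Role: RoleFull}, leader)
	fmt.Println(got.Role, err)
}
```

Because the adapter re-validates the role as well, a demoted principal attempting a write would then be rejected there rather than in the dispatcher.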

## What is NOT in this PR

- Follower-side AdminForward client + bridge wiring (acceptance
criterion 2). The follower's HTTP handler still returns `503
leader_unavailable` + `Retry-After: 1` from #634 — this PR is a no-op on
the request path until the client wiring lands.
- Election-period 503 handling (criterion 3) — comes with the client
wiring.
- Rolling-upgrade compatibility flag `admin.leader_forward_v2`
(criterion 5) — depends on a Raft-level cluster version bump that is
independently substantial.
- gRPC server registration in `main.go` — wired in the next PR alongside
the client.

## Acceptance criteria coverage

| # | Criterion | This PR |
|---|---|---|
| 1 | Leader direct write | ✓ existing (PR #634) |
| 2 | Follower forwards transparently | ⏳ next PR (client) |
| 3 | Election period 503 + retry | ⏳ next PR (client) |
| 4 | Leader demotes stale full role | ✓ test added |
| 5 | Rolling-upgrade compat flag | ⏳ deferred |
| 6 | `forwarded_from` in audit log | ✓ test added |

## Test plan

- [x] `go build ./...`
- [x] `go vet ./...`
- [x] `golangci-lint run` (admin package: 0 issues)
- [x] `go test ./internal/admin/ -run TestForwardServer -count=1` (11
leader-side tests pass — criterion 1, 4, 6, plus error-mapping coverage)
- [ ] Wire follower side and exercise an end-to-end follower → leader
call against a live cluster
bootjp added a commit that referenced this pull request Apr 25, 2026
…artial) (#644)

Phase 1 + 2 of Task #26: the follower-side `LeaderForwarder` client and
its integration into the dynamo HTTP handler. Builds on the AdminForward
leader-side dispatcher landed via #635.

## Summary

- New `LeaderForwarder` interface (`internal/admin/forward_client.go`)
decouples the dynamo HTTP handler from `pb.AdminForwardClient`. The
handler stays proto-free; the bridge in `main_admin.go` (next phase)
plugs in the gRPC-backed implementation.
- `gRPCForwardClient` translates a `CreateTableRequest` / table-name
into an `AdminForwardRequest`, dials via a `GRPCConnFactory` (production
wraps `kv.GRPCConnCache`), and re-shapes the response into
`ForwardResult` (status, payload, content-type).
- `forwarded_from = nodeID` is populated so the leader's audit log
carries the trace (criterion 6, leader-side already shipped in #635).
- Defensive: `status_code == 0` upgrades to `502 Bad Gateway`; missing
`ContentType` fills the JSON default. Both surface transport bugs rather
than producing silently-malformed SPA responses.
- `ErrLeaderUnavailable` sentinel signals the "no leader known" case so
the handler can map to 503 + `Retry-After: 1` (criterion 3).
- `DynamoHandler` gains a `forwarder` field and `WithLeaderForwarder`
option. When set, `handleCreate` / `handleDelete` catch
`ErrTablesNotLeader` from the source and forward to the leader
transparently — the SPA cannot tell forwarded from leader-direct.
- `writeForwardResult` re-emits the leader's structured response
verbatim (status + payload + content-type), so a forwarded `409
Conflict` from the leader stays `409` on the wire — no
re-classification.
- `writeForwardFailure` maps `ErrLeaderUnavailable` (election in flight)
and gRPC transport errors to 503 + `Retry-After: 1`.
`ErrLeaderUnavailable` is intentionally NOT logged at error level
(elections are routine); transport errors are logged at LevelError so
operators can investigate.
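
The two defensive defaults described above can be sketched as a small normalisation step. The `ForwardResult` fields follow the names in the summary, but the `normalizeForwardResult` helper and the exact default content type string are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
)

// ForwardResult mirrors the shape named in the summary (status, payload,
// content-type); the field names here are assumed.
type ForwardResult struct {
	StatusCode  int
	Payload     []byte
	ContentType string
}

// normalizeForwardResult surfaces a transport bug as 502 instead of writing
// a zero status, and fills in a JSON content type when the leader sent none.
func normalizeForwardResult(r ForwardResult) ForwardResult {
	if r.StatusCode == 0 {
		r.StatusCode = http.StatusBadGateway
	}
	if r.ContentType == "" {
		r.ContentType = "application/json" // assumed default value
	}
	return r
}

func main() {
	fmt.Printf("%+v\n", normalizeForwardResult(ForwardResult{}))
	fmt.Printf("%+v\n", normalizeForwardResult(ForwardResult{StatusCode: 409, ContentType: "application/json"}))
}
```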

## What is NOT in this PR

- gRPC server registration in `main.go` (production wiring of the
`ForwardServer` from #635) — comes in the next phase.
- The bridge that wraps `kv.GRPCConnCache` and supplies
`LeaderAddressResolver` — same phase.
- Election-period retry-loop on the client side (criterion 3 partial:
this PR returns 503 + Retry-After; the SPA / client retries the request;
criterion 3 fully needs the production bridge to dial actual
leader-discovery).

## Test plan

- [x] `go build ./...`
- [x] `go vet ./...`
- [x] `golangci-lint run` (admin package: 0 issues)
- [x] `go test ./internal/admin/ -count=1 -race`
- 8 forward-client unit tests: constructor input validation, both
Forward operations including principal/payload/op-enum/`forwarded_from`
round-trip, `ErrLeaderUnavailable`, dial/RPC errors propagated with
`cockroachdb/errors` wrapping, zero status code upgrade, missing content
type fallback
- 9 handler integration tests: transparent forward for create + delete,
no-forwarder fallback to 503, `ErrLeaderUnavailable` → 503 +
Retry-After, transport error → 503 + log, leader 409 pass-through,
leader 503 + Retry-After preserved, role check short-circuits before
forward, body validation short-circuits before forward
- [ ] Wire production bridge + register `pb.RegisterAdminForwardServer`
in `main.go` and exercise an end-to-end follower → leader call against a
real cluster (next PR).

## Acceptance criteria coverage

| # | Criterion | This PR |
|---|---|---|
| 1 | Leader direct write | ✓ (in main since #634) |
| 2 | Follower forwards transparently | ✓ wiring done; needs main.go gRPC registration to take effect |
| 3 | Election-period 503 + retry | ✓ partial: handler returns 503 + Retry-After; full transparency needs the bridge |
| 4 | Leader demotes stale full role | ✓ (in main since #635) |
| 5 | Rolling-upgrade compat flag | ⏳ deferred (cluster-version bump) |
| 6 | `forwarded_from` in audit log | ✓ (in main since #635) |
bootjp added a commit that referenced this pull request Apr 26, 2026
…de bridge (#648)

Phase 3 of Task #26 / AdminForward design 3.3. Closes the wiring loop
opened by #635 (leader-side dispatcher) and #644 (follower-side client +
handler integration): the admin HTTP DynamoHandler now hands
`ErrTablesNotLeader` writes off to the in-process `LeaderForwarder`, and
the leader-side gRPC `AdminForward` service is registered on every node
so the forwarded RPC actually lands somewhere.

## Summary

- **`admin.ServerDeps.Forwarder`** (3315bf4) — new optional field on
the high-level admin server composition surface. `admin.NewServer` calls
`WithLeaderForwarder` when set; nil leaves the existing 503 +
Retry-After:1 fallback untouched. Single-node / leader-only deployments
are unaffected.
- **`main_admin_forward.go`** (f3e529c) — production bridge between
`kv.GRPCConnCache` and the admin layer's typed `PBAdminForwardClient`
(via `pb.NewAdminForwardClient`). Includes `buildLeaderForwarder`,
`roleStoreFromFlags`, `adminForwardServerDeps`, and a
`readyForRegistration` gate.
- **Runtime wiring** (19f6b70) — `runtimeServerRunner.start()` now
creates `dynamoServer` BEFORE `startRaftServers`, so the leader-side
`ForwardServer` can use it as its `TablesSource`. `startRaftServers`
accepts the new `adminForwardServerDeps` bundle and delegates to
`registerAdminForwardServer`. `startServers` constructs a single
`*kv.GRPCConnCache` and forwards it (with the `Coordinator`) to
`startAdminFromFlags`, which in turn builds the production
`LeaderForwarder` via `buildAdminLeaderForwarder` and passes it through
`ServerDeps.Forwarder`.

## What is NOT in this PR

- Rolling-upgrade compatibility flag (criterion 5) — still deferred
behind a cluster-version bump.
- Election-period retry loop on the follower's bridge — the handler
returns 503 + Retry-After:1, the SPA / curl re-issues, and the bridge
dials again on the next attempt. Inline retry inside the handler would
just hide the latency from operators.

## Test plan

- [x] `go build ./...`
- [x] `go vet ./...`
- [x] `golangci-lint run` (main + admin packages: 0 issues)
- [x] `go test ./internal/admin/ -count=1 -race` — full admin suite
passes (existing 21 forward tests + 1 new
`TestServer_ServerDepsForwarderIsWired`)
- [x] `go test . -count=1 -race` — main package passes (4 new bridge /
role-store / readyForRegistration tests)
- [ ] `adapter` package times out at 240s on this branch AND on `main` —
pre-existing flake unrelated to this PR (verified by checking out main
and running the same suite)
- [ ] End-to-end smoke against a 3-node cluster — needs the local Jepsen
runner; blocked on the next PR or a manual run

## Acceptance criteria coverage

| # | Criterion | This PR |
|---|---|---|
| 1 | Leader direct write | ✓ (in main since #634) |
| 2 | Follower forwards transparently | ✓ wiring complete; both leader-side gRPC service AND follower-side bridge are now plumbed |
| 3 | Election-period 503 + retry | ✓ handler still returns 503 + Retry-After:1 when no leader is known; the SPA / client retries and the next attempt re-dials |
| 4 | Leader demotes stale full role | ✓ (in main since #635) |
| 5 | Rolling-upgrade compat flag | ⏳ deferred (cluster-version bump) |
| 6 | `forwarded_from` in audit log | ✓ (in main since #635; bridge now stamps `--raftId` onto every forwarded request) |

## Self-review (5 lenses)

1. **Data loss**: No FSM / Raft / Pebble path changes; admin writes
still go through the same `AdminCreateTable` / `AdminDeleteTable` Raft
proposal once they reach the leader.
2. **Concurrency**: `runner.start()` reorder confirmed safe —
`startDynamoDBServer` only depends on `coordinate` + `shardStore`
(already constructed); raft TCP listeners are independent of the dynamo
HTTP listener. Worst case during the 100 ms-or-so reorder window is the
same "no leader yet" 503 the old order would have produced if a request
landed before raft converged.
3. **Performance**: One additional `&kv.GRPCConnCache{}` allocation per
process; one closure for the resolver. No hot-path changes.
4. **Data consistency**: Forwarded requests re-validate the principal at
the leader (criterion 4, already shipped); commit-ts ordering is
unchanged because the leader's `AdminCreateTable` path does its own
`HLC.Next`.
5. **Test coverage**: 4 new main-package tests + 1 new admin-package
server-level test. `TestServer_ServerDepsForwarderIsWired` is the
regression test for a future refactor that drops the forwarder before
it reaches the dynamo handler, which is exactly the failure mode this
PR's wiring change makes possible.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Added leader-forwarding capability for admin requests sent to follower nodes
  * Implemented role-based access control for admin API operations
  * Enhanced DynamoDB request routing to automatically direct operations to leader nodes

* **Tests**
  * Added comprehensive integration and unit tests validating leader-forwarding paths

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
bootjp added a commit that referenced this pull request Apr 26, 2026
Three findings from Claude review on PR #670:

1. (P1) translateAdminQueuesError did not catch leader-churn errors.
   AdminDeleteQueue passes the upfront isVerifiedSQSLeader check,
   then dispatches deleteQueueWithRetry which can hit a kv coordinator
   that just lost leadership. The resulting kv.ErrLeaderNotFound /
   adapter.ErrNotLeader / wrapped "not leader" suffixes were not in
   the translator's switch — they fell to default and the admin
   handler rendered a generic 500 instead of the spec'd 503 +
   Retry-After: 1. Added the `case isLeaderChurnError(err)` arm
   mirroring translateAdminTablesError's identical fix from PR #634.

2. (P2) No tests for translateAdminQueuesError. Mirrored the three
   Dynamo equivalents in main_admin_test.go:
     - TestTranslateAdminQueuesError_LeaderChurn covers every kv
       sentinel + canonical wrapped-suffix variant.
     - TestTranslateAdminQueuesError_LeaderPhraseInMiddleOfMessage
       pins the HasSuffix matcher behaviour against false positives
       on user-supplied error messages mid-string.
     - TestTranslateAdminQueuesError_UnrelatedErrorPassesThrough
       confirms the detector does not swallow innocent "leader"
       mentions outside the canonical phrase set.

3. (Low / polish) SqsList.tsx subtitle leaked the Go file path
   "adapter/sqs_admin.go" and the internal milestone name
   "Phase 3.A" to end users — DynamoList / S3List don't do this.
   Replaced with operator-facing prose describing what the page does.

Verified:
  - go test -run TestTranslateAdminQueuesError . — passes
  - go build ./... clean
  - go test -race ./internal/admin/... + go test -race -run TestSQS ./adapter/ — pass
  - golangci-lint run ./adapter/... ./internal/admin/... ./... — 0 issues
  - cd web/admin && npm run lint (tsc --strict) clean
bootjp added a commit that referenced this pull request Apr 26, 2026
Per `docs/design/README.md`'s lifecycle convention, the admin dashboard
design doc is now "partial":

- **P1** (DynamoDB CRUD + AdminForward) — shipped via #634, #635, #644,
#648
- **P2** (S3 buckets list/create/delete/ACL + DescribeTable) — shipped
via #658, with #669 + #673 in flight
- **P3** (React SPA + embed) — shipped via #649, #650
- **P4** (TLS / role / CSRF / operator docs) — TLS, role, CSRF are
already live in P1; operator docs in #674

Independent of the in-flight slice 2 PRs (#669/#673) and the docs PR
(#674) — this rename only reflects what is already on main today, plus
an "Implementation status" table mapping each phase to the PR it landed
in.

## What this PR changes

- `git mv` the design doc from `2026_04_24_proposed_admin_dashboard.md`
to `2026_04_24_partial_admin_dashboard.md` so its history follows
- Add an "Implementation status" header table indexing each phase to the
PRs that landed it
- List the outstanding open items so future readers know what is still
owed against the original proposal:
  - AdminForward acceptance criterion 5 (rolling-upgrade compat flag):
    deferred
  - S3 object browser: explicitly out of scope per Section 2 Non-goals
  - TLS cert hot-reload: restart-to-rotate is the documented model
When the rolling-upgrade flag lands, the doc gets renamed once more to
`2026_04_24_implemented_admin_dashboard.md` per the README's lifecycle
convention.
bootjp added a commit that referenced this pull request Apr 26, 2026
Catches the three locations the prior round (7c63a7b) missed:
adapter/sqs_admin_test.go (TestAdminQueueSummary_CreatedAtUsesMillisNotHLC,
TestMetaAttributesForAdmin_IncludesQueueArnAndLastModified) and
main_admin_test.go (TestTranslateAdminQueuesError_LeaderChurn). Same
pattern: the why-rationale stays, only the "fifth-round Claude
review on PR #670" / "seventh-round Claude review on PR #670" /
"Codex P2 + Claude P1 on PR #670" attributions are dropped.

Pre-existing PR #634 references at main_admin_test.go:373 / :397
were added by a prior merged PR and are out of scope here.