docs: add admin dashboard operator guide (P4) #674
bootjp merged 6 commits into feat/admin-s3-writes
Conversation
@claude review |
|
Claude finished @bootjp's task in 3m 42s —— View job Review
Overall the doc is solid and well-structured. Flag names, CSRF fields, cookie names, credentials file format,

Bug 1: Key size is wrong (high)
const sessionSigningKeyLen = 64 // must decode to exactly 64 bytes

Bug 2: Broken link in the Cross-references section (moderate)
The actual file on disk is

Bug 3: Audit log sample shows fields that aren't emitted (moderate)
The actual slog entries in and for forwarded writes:
If operators build log parsers or dashboards against the documented shape, they will find zero matches. The

Nit: Misleading "data-plane" label in troubleshooting (minor)

Verified correct ✓
|
Code Review
This pull request introduces a comprehensive operator guide for the elastickv admin dashboard, detailing its configuration, security model, and operational behavior. The review feedback correctly identifies a terminology contradiction regarding data-plane vs. admin endpoints and an inconsistent filename reference in the design rationale section.
| two seconds, check Raft leader status via the data-plane | ||
| `/admin/api/v1/cluster` endpoint or `cmd/elastickv-admin`. |
Referring to the /admin/api/v1/cluster endpoint as "data-plane" is confusing and contradictory, as the introduction (line 16) explicitly distinguishes the admin listener from the data-plane endpoints (DynamoDB and S3).
Suggested change:
- two seconds, check Raft leader status via the data-plane
- `/admin/api/v1/cluster` endpoint or `cmd/elastickv-admin`.
+ two seconds, check Raft leader status via the
+ `/admin/api/v1/cluster` endpoint or `cmd/elastickv-admin`.
| ## Cross-references | ||
| - Design rationale: [docs/design/2026_04_24_partial_admin_dashboard.md](design/2026_04_24_partial_admin_dashboard.md) |
The design document filename here (2026_04_24_partial_admin_dashboard.md) is inconsistent with the reference at the beginning of the file and in the pull request description (2026_04_24_proposed_admin_dashboard.md).
Suggested change:
- - Design rationale: [docs/design/2026_04_24_partial_admin_dashboard.md](design/2026_04_24_partial_admin_dashboard.md)
+ - Design rationale: [docs/design/2026_04_24_proposed_admin_dashboard.md](design/2026_04_24_proposed_admin_dashboard.md)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a5980e6b51
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| | Flag | Description | | ||
| |---|---| | ||
| | `-adminEnabled` | Master on/off switch. Default `false`. | | ||
| | `-adminSessionSigningKey` *or* `-adminSessionSigningKeyFile` *or* `ELASTICKV_ADMIN_SESSION_SIGNING_KEY` | Cluster-shared base64-encoded HS256 key (≥ 32 raw bytes / 44 base64 chars). **Must be the same on every node** — JWTs minted by node A are verified by node B during follower→leader forwarding, so a mismatch breaks the dashboard's read paths on follower nodes. The `*File` / env-var forms keep the secret out of `/proc/<pid>/cmdline`. | |
State the exact 64-byte session key requirement
This row says the admin signing key can be ≥ 32 raw bytes / 44 base64 chars, but the server currently hard-fails unless the decoded key is exactly 64 bytes (sessionSigningKeyLen = 64 in internal/admin/config.go via decodeSigningKey). Operators who follow this guidance with a 32-byte key will be unable to start the admin listener.
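To make the size arithmetic concrete, here is a minimal Go sketch of generating and checking a key of the required length. Only the constant `sessionSigningKeyLen = 64` comes from the review above; `newSigningKey` and `checkSigningKey` are hypothetical helpers written for illustration and are not the project's `decodeSigningKey`.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// sessionSigningKeyLen mirrors the constant quoted in the review.
const sessionSigningKeyLen = 64

// newSigningKey returns a fresh random 64-byte key in standard base64.
// Hypothetical helper, not part of elastickv.
func newSigningKey() (string, error) {
	raw := make([]byte, sessionSigningKeyLen)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// checkSigningKey is an illustrative validator: the decoded key must be
// exactly 64 bytes, so a 32-byte key is rejected.
func checkSigningKey(encoded string) error {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return err
	}
	if len(raw) != sessionSigningKeyLen {
		return errors.New("signing key must decode to exactly 64 bytes")
	}
	return nil
}

func main() {
	key, err := newSigningKey()
	if err != nil {
		panic(err)
	}
	// 64 raw bytes encode to 88 characters with standard padding.
	fmt.Println(len(key), checkSigningKey(key) == nil)
}
```

A key produced this way can be written to the file passed via `-adminSessionSigningKeyFile` so that the secret stays out of the process command line.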
| `-adminReadOnlyAccessKeys`. The dashboard does not distinguish the | ||
| two cases on the wire — both produce 401 — but the leader's audit | ||
| log shows the precise reason. |
Document 403 for keys missing from admin allow-lists
The troubleshooting text says both bad secrets and non-allowlisted access keys return 401, but AuthService.authenticate distinguishes them: invalid key/secret returns 401 invalid_credentials, while a valid credential not present in -adminFullAccessKeys/-adminReadOnlyAccessKeys returns 403 forbidden (internal/admin/auth_handler.go). This mismatch can send operators down the wrong path when debugging login failures.
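The distinction can be sketched as a small decision function. This is an illustration only: `loginOutcome` is a hypothetical name, the two error codes are the ones quoted above, and the success status is an assumption rather than something taken from `auth_handler.go`.

```go
package main

import (
	"fmt"
	"net/http"
)

// loginOutcome is a hypothetical model of the two failure modes described
// in the review: bad credentials versus valid credentials with no admin role.
func loginOutcome(credsValid, hasAdminRole bool) (int, string) {
	switch {
	case !credsValid:
		// Wrong access key or secret.
		return http.StatusUnauthorized, "invalid_credentials"
	case !hasAdminRole:
		// Valid credentials, but the access key is in neither
		// -adminFullAccessKeys nor -adminReadOnlyAccessKeys.
		return http.StatusForbidden, "forbidden"
	default:
		// Success status is an assumption made for this sketch.
		return http.StatusOK, ""
	}
}

func main() {
	fmt.Println(loginOutcome(false, false))
	fmt.Println(loginOutcome(true, false))
	fmt.Println(loginOutcome(true, true))
}
```

For an operator, the practical reading is that a 401 points at the credentials file, while a 403 points at the allow-list flags.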
70213e0 to e5ef0a9
|
@claude review 303f8f9 (rebased onto updated #673
|
…ane label

Three substantive findings + one nit from claude review on #674:

1. **HS256 key size (high)**: said "≥ 32 raw bytes / 44 base64 chars" but the validator in internal/admin/config.go enforces exactly 64 raw bytes. A 32-byte key fails startup with a confusing error. Updated to "exactly 64 raw bytes, 88 base64 chars (standard padding) or 86 (RawURLEncoding)". Operators copying the doc into key-generation tooling now produce a key that actually validates.
2. **Audit log sample (moderate)**: showed method=POST / path=... / status=201 / duration=8.2ms, but none of those fields are emitted. The actual slog entries use operation= + a resource key (bucket= or table=). Replaced the example with two real shapes (leader-direct and forwarded), so an operator building a log parser against the documented contract finds matches.
3. **Cross-reference link (moderate)**: line 249 used the _partial_ filename for the design doc, but #675 (the rename PR) hasn't landed yet. Point to _proposed_ (matches main today) with an inline note that the link will follow once #675 lands. The intro paragraph at line 5 already used _proposed_, so this removes the inconsistency claude flagged.
4. **"data-plane" label nit**: /admin/api/v1/cluster is an admin-plane endpoint, not data-plane. "data-plane" implied the DynamoDB / S3 / Redis ports. Replaced with "admin".

The cosmetic Last-updated / Status verbosity items on #675 are tracked separately.
a5980e6 to 303f8f9
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Claude finished @bootjp's task in 3m 51s —— View job Review
Confirmed fixes ✓
All four stated fixes are present and correct against the code.
Bug A —
|
e5ef0a9 to cafac05
303f8f9 to b9b0b0c
Two findings from the latest claude review on #674:

1. **Audit log shapes (Bug A)**: the previous example claimed `admin_audit ... operation=delete_table table=orders` was a leader-direct sample, but `DynamoHandler.handleDelete` (dynamo_handler.go:379-396) does not emit any handler-level admin_audit; that shape only appears via `ForwardServer` on the forwarded path. It also omitted the `Audit` middleware line (middleware.go:206-232), which wraps every non-GET admin request with method= / path= / status= / remote= / duration=. An operator building a parser against the doc would have missed half of the actual lines. Rewrote the section as three explicit shapes: Audit middleware (always), S3Handler op line (only emits from s3_handler.go:299/333/355), ForwardServer op line (covers both Dynamo and S3 forwarded paths). Removed the misleading `delete_table` leader-direct sample and replaced the Dynamo examples with forwarded-only ones, which is where they actually originate. Added a one-sentence note that a single request typically produces two lines (op-specific + middleware) so log parsers know to treat the key as a union.
2. **Login 401 vs 403 (Codex P2)**: the troubleshooting entry said "both produce 401" but auth_handler.go:336-345 returns 403 forbidden when the credentials match but the access key is not in `-adminFullAccessKeys` / `-adminReadOnlyAccessKeys`. Split into two entries: 401 invalid_credentials (bad access_key or secret) and 403 forbidden (valid SigV4 creds, but no admin role assignment), with the remediation specific to each case.

No functional changes. Doc-only.
|
@claude review Addressed both findings from the previous review:
Latest commit: 05560e7. |
|
Claude finished @bootjp's task in 1m 47s —— View job Review
Both fixes are correct and verified against the code. LGTM.

Bug A: Audit log shapes ✓ confirmed fixed
The section now documents three shapes:
The note at line 180 that "a single state-changing request typically produces two audit lines" is accurate. The qualifier "typically" correctly covers the edge case where a leader-direct DynamoDB write produces only the middleware line (no DynamoHandler operation line), while S3 and forwarded DynamoDB writes produce two.

Bug B: 401 vs 403 at login ✓ confirmed fixed
The troubleshooting section now has two separate entries:
Both match

No new findings
All items from prior rounds remain confirmed. The doc is ready to merge once the stacked #669 / #673 dependencies land.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 05560e795e
| |---|---| | ||
| | `-adminListen` | host:port for the admin listener. Defaults to `127.0.0.1:8080`. | | ||
| | `-adminTLSCertFile` / `-adminTLSKeyFile` | PEM cert + key. Both must be set together; a partial config fails validation at startup. | | ||
| | `-adminAllowPlaintextNonLoopback` | Explicit opt-out for the non-loopback-without-TLS startup hard-error. **Strongly discouraged** — enables the dashboard to mint cookies without the `Secure` attribute and ship session JWTs over plaintext. Use only for short-lived test rigs you control. | |
Correct plaintext non-loopback cookie behavior
This line says -adminAllowPlaintextNonLoopback makes the dashboard mint cookies without Secure, but that flag only bypasses the startup TLS guard (internal/admin/config.go); cookie security is controlled separately by -adminAllowInsecureDevCookie (main_admin.go passes it to AuthServiceOpts.InsecureCookie, and buildCookie uses that to set Secure). In the documented non-loopback HTTP scenario, operators who set only -adminAllowPlaintextNonLoopback can end up with Secure cookies that browsers won't send over plaintext, causing login/session behavior to fail contrary to this guidance.
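The separation of the two flags can be sketched as follows. This is a hedged illustration, not the project's `buildCookie`: the `adminFlags` struct and `buildSessionCookie` are hypothetical, the cookie name `admin_session` comes from the verification tables in this thread, and the `Path` and `HttpOnly` values are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
)

// adminFlags holds only the two flags discussed in this comment.
type adminFlags struct {
	AllowPlaintextNonLoopback bool // bypasses the startup TLS guard only
	AllowInsecureDevCookie    bool // the only input to the Secure attribute
}

// buildSessionCookie is a hypothetical stand-in for the real cookie builder.
// Secure depends on AllowInsecureDevCookie and nothing else.
func buildSessionCookie(f adminFlags, token string) *http.Cookie {
	return &http.Cookie{
		Name:     "admin_session",
		Value:    token,
		Path:     "/",  // assumption for this sketch
		HttpOnly: true, // assumption for this sketch
		Secure:   !f.AllowInsecureDevCookie,
	}
}

func main() {
	onlyPlaintext := adminFlags{AllowPlaintextNonLoopback: true}
	both := adminFlags{AllowPlaintextNonLoopback: true, AllowInsecureDevCookie: true}

	// With only the plaintext flag, the cookie is still Secure, so a browser
	// will not send it back over plain HTTP.
	fmt.Println(buildSessionCookie(onlyPlaintext, "jwt").Secure)
	fmt.Println(buildSessionCookie(both, "jwt").Secure)
}
```

Under these assumptions, a working plaintext non-loopback rig needs both flags, which matches the failure mode described in the comment.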
| header. The SPA's API client honours `Retry-After` and re-issues | ||
| the request once. Operators investigating "intermittent 503s" |
Remove unsupported automatic Retry-After retry claim
The guide claims the SPA API client honors Retry-After and retries once, but the frontend client currently performs a single fetch call with no retry path (web/admin/src/api/client.ts, and useApi also has no 503 retry logic). During leader elections, this mismatch changes operator expectations and can lead to unresolved write failures because users wait for an automatic retry that never happens.
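Since the SPA does not retry, an external script driving the JSON API would have to honour the header itself. A minimal sketch of that decision in Go, assuming the server sends `Retry-After` as a whole number of seconds on a 503; `retryDelay` is a hypothetical helper, not part of elastickv.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryDelay reports whether a response is worth re-issuing once and how
// long to wait first. It only handles the delta-seconds form of Retry-After.
func retryDelay(status int, retryAfter string) (time.Duration, bool) {
	if status != http.StatusServiceUnavailable {
		return 0, false
	}
	secs, err := strconv.Atoi(retryAfter)
	if err != nil || secs < 0 {
		return 0, false
	}
	return time.Duration(secs) * time.Second, true
}

func main() {
	// A leader_unavailable response as described in this thread.
	d, ok := retryDelay(http.StatusServiceUnavailable, "1")
	fmt.Println(d, ok)

	// Any other status is surfaced to the caller without a retry.
	d, ok = retryDelay(http.StatusConflict, "")
	fmt.Println(d, ok)
}
```

A caller would sleep for the returned duration and repeat the request a single time, then surface the second failure as-is.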
P4 deliverable from docs/design/2026_04_24_proposed_admin_dashboard.md Section 8: a single self-contained operator-facing reference for the admin HTTP listener. Covers:

- Quick-start invocation for a loopback dev cluster
- Required + optional flag reference, with explanations of why each guard exists (TLS hard-error, rolling-update caveats, etc.)
- TLS topologies (loopback / TLS / discouraged plaintext-non-loopback)
- Role model + how live role re-validation works on every state-changing request
- The full /admin/api/v1/* surface (auth + cluster + dynamo + s3, including the slice 2 write paths and the AdminForward forwarding contract)
- forwarded_from audit log shape and why it carries the follower's node ID
- Troubleshooting guide for the common failure modes operators hit during initial bring-up (missing credentials, TLS hard-error, 401 ambiguity, 503 leader_unavailable, bucket_not_empty, blank-screen / placeholder bundle)
- Cross-references to the design doc and architecture overview

The Section 8 P4 plan also called out "TLS, read-only role, CSRF" as deliverables; those are already implemented (see config.go's validateTLS / validateAccessKeyRoles, the role gates in DynamoHandler.principalForWrite + S3Handler.principalForWrite, and the CSRFDoubleSubmit middleware). This doc stitches them into a single reference operators can land on without reading code.

Stacked on #669 (P2 slice 2a) + #673 (P2 slice 2b) so the API-surface table can describe S3 write endpoints as shipped. Once both land in main, this rebases cleanly.
05560e7 to c9da923
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c9da9235a7
| **`Audit` middleware** — emitted for every non-GET/HEAD/OPTIONS | ||
| request that reaches the admin mux on this node, regardless of which | ||
| handler served it. Always present on the node that received the HTTP |
Correct audit coverage statement for write requests
This text says the Audit middleware is emitted for every non-GET/HEAD/OPTIONS request regardless of handler, but routing does not do that: /auth/logout is explicitly wired through protectNoAudit (no Audit wrapper), and the protected chain runs SessionAuth before Audit, so unauthenticated write attempts are rejected before any middleware audit line is produced (internal/admin/server.go). As written, operators may incorrectly assume all write attempts are guaranteed to appear in admin_audit logs.
Two Codex P2 findings on the latest review of #674:

1. **Cookie Secure attribute (line 55 + the TLS topology section)**: I wrote that `-adminAllowPlaintextNonLoopback` "enables the dashboard to mint cookies without the `Secure` attribute". That conflates two flags. In the actual wiring (auth_handler.go:114 + auth_handler.go:458-468 + main_admin.go:597), `secureCookie` is `!opts.InsecureCookie`, which comes from `-adminAllowInsecureDevCookie`, **not** `-adminAllowPlaintextNonLoopback`. The latter only bypasses the startup TLS guard (config.go:112). Operationally this is the difference between "fails closed" and "broken end-to-end": an operator who reads the old doc and sets only `-adminAllowPlaintextNonLoopback` gets a listener that accepts plaintext, but the dashboard still mints `Secure=true` cookies that the browser refuses to send back over the plaintext channel. Login appears to succeed and every subsequent request 401s, exactly the bug Codex was guarding against. Rewrote the line-55 entry to scope `-adminAllowPlaintextNonLoopback` to the listener TLS guard only, and added the cross-reference to `-adminAllowInsecureDevCookie` for the cookie attribute. Updated the loopback section to drop the misleading "Secure regardless" claim and explain when a loopback rig actually needs the dev-cookie flag. Added a paragraph at the bottom of "Discouraged: plaintext non-loopback" calling out that a working plaintext rig needs both flags, and that the two are deliberately separate so misconfiguration fails closed on either axis instead of silently downgrading both at once.
2. **SPA Retry-After claim (line 172)**: I wrote that "the SPA's API client honours Retry-After and re-issues the request once." Codex caught that web/admin/src/api/client.ts:75 does a single `fetch` call with no retry path, and useApi.ts has no 503 retry logic either. The doc was setting an operator expectation the code does not meet: under leader election a user would see the 503 surface directly and conclude "intermittent failure" instead of re-issuing the action. Rewrote the section to describe the current behaviour accurately: the server emits `Retry-After: 1`, but the SPA surfaces the 503 directly and the operator must re-issue. Kept the header documented because a future client (or an external operator script driving the JSON API) can still honour it; it is part of the server's contract.

No code changes; doc-only. The cookie wiring claim was verified against auth_handler.go and main_admin.go; the SPA claim was verified against web/admin/src/api/client.ts and useApi.ts.
|
@claude review Addressed both Codex P2 findings on c9da923:
Latest commit: 02d52dc. Doc-only; no code changes. |
|
Claude finished @bootjp's task in 2m 36s —— View job Review
Both stated fixes are correct and verified. One Codex P2 finding from the

Confirmed fixes ✓
Bug — Audit coverage claim still overstated (Codex P2 from
|
| Claim | Source | Result |
|---|---|---|
| All -admin* flag names | main.go:117–128 | ✓ |
| HS256 key exactly 64 bytes / 88 base64 std | config.go:14 | ✓ |
| admin_session / admin_csrf / X-Admin-CSRF | middleware.go:19–21 | ✓ |
| bucket_not_empty JSON error code | s3_handler.go:419 | ✓ |
| Credentials file JSON shape | main_sigv4_creds.go:16,20–21 | ✓ |
| TLS 1.2+ enforced | main_admin.go:536 | ✓ |
| leader_unavailable 503 + Retry-After: 1 | s3_handler.go:411–413 | ✓ |
| S3Handler operation shapes | s3_handler.go:299–304,333–339,355–360 | ✓ |
| ForwardServer shapes | forward_server.go:170–176,224–230,391–398 | ✓ |
| 401 vs 403 at login | auth_handler.go:336–346 | ✓ |
| secureCookie: !opts.InsecureCookie | auth_handler.go:114 | ✓ |
| InsecureCookie wired from adminCfg.AllowInsecureDevCookie | main_admin.go:597 | ✓ |
| SPA single fetch, no retry | client.ts:75 | ✓ |
| design/2026_04_24_proposed_admin_dashboard.md link | repo | ✓ |
Codex P2 on the latest review of #674: the audit-log section said the Audit middleware "is emitted for every non-GET/HEAD/OPTIONS request that reaches the admin mux on this node, regardless of which handler served it." That overstates coverage:

- `/auth/login` is wired through `publicAuth`, not `protect`. The middleware can't identify the actor without a pre-existing session, so it is intentionally skipped. AuthService emits `admin_audit action=login` directly (auth_handler.go:432).
- `/auth/logout` is wired through `protectNoAudit`. The middleware is omitted to avoid emitting two audit lines per logout (a generic one plus the action-specific one); the specific line is the one operators want (auth_handler.go:442).
- The `protect` chain is `BodyLimit → SessionAuth → Audit → CSRFDoubleSubmit → handler` (server.go:247-255). Audit only fires AFTER SessionAuth accepts the session, so an unauthenticated write attempt (missing / expired / invalid cookie) gets 401'd before reaching Audit and produces no middleware line. CSRF-rejected requests still produce one, by design, because the actor is known by then.

An operator building log-monitoring on top of the previous text could conclude that every write attempt is guaranteed to appear in `admin_audit`, then miss CSRF-bypassing probes against an unauthenticated endpoint.

Rewrote the middleware-shape section to:

- Scope the rule to the protected mux chain.
- Explain the ordering choice (Audit between SessionAuth and CSRFDoubleSubmit) so the CSRF-rejection-still-audits guarantee is preserved in the doc.
- Enumerate the two auth endpoints that bypass the middleware and what they emit instead.
- Adjust the intro paragraph: protected-chain mutations produce two audit lines, auth endpoints produce one (was: "every state-changing request typically produces two").

No behaviour change; doc-only. Verified against internal/admin/server.go:236-296 and internal/admin/auth_handler.go:432-449.
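The ordering argument in this commit message can be sketched with a toy chain. This is an illustration under stated assumptions, not the code in internal/admin/server.go: `chain`, `sessionAuth`, `audit`, `csrfCheck`, and `probe` are hypothetical, BodyLimit is omitted, the audit step only counts lines instead of logging them, and the 403 status for a CSRF rejection is an assumption. The cookie and header names are the ones listed in the verification tables.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

type middleware func(http.Handler) http.Handler

// chain wraps h so that the first middleware listed runs first.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// sessionAuth rejects requests without a session cookie before audit runs.
func sessionAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := r.Cookie("admin_session"); err != nil {
			http.Error(w, "unauthenticated", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// audit counts one line for every request that got past sessionAuth.
func audit(lines *int) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			next.ServeHTTP(w, r)
			*lines++
		})
	}
}

// csrfCheck stands in for CSRFDoubleSubmit: header must equal the cookie.
func csrfCheck(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		c, err := r.Cookie("admin_csrf")
		if err != nil || c.Value == "" || r.Header.Get("X-Admin-CSRF") != c.Value {
			http.Error(w, "csrf", http.StatusForbidden) // status is an assumption
			return
		}
		next.ServeHTTP(w, r)
	})
}

// probe sends one POST through the chain and returns the response status
// and the number of audit lines it produced.
func probe(withSession, withCSRF bool) (int, int) {
	lines := 0
	created := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusCreated)
	})
	h := chain(created, sessionAuth, audit(&lines), csrfCheck)

	req := httptest.NewRequest(http.MethodPost, "/admin/api/v1/s3/buckets", nil)
	if withSession {
		req.AddCookie(&http.Cookie{Name: "admin_session", Value: "jwt"})
	}
	if withCSRF {
		req.AddCookie(&http.Cookie{Name: "admin_csrf", Value: "tok"})
		req.Header.Set("X-Admin-CSRF", "tok")
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, lines
}

func main() {
	fmt.Println(probe(false, false)) // unauthenticated: no audit line
	fmt.Println(probe(true, false))  // CSRF-rejected: still audited
	fmt.Println(probe(true, true))   // accepted write: audited
}
```

Placing the audit step after session validation is what makes the actor known for every audited line, at the cost of unauthenticated attempts leaving no middleware line.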
|
@claude review Addressed the new Codex P2 finding on The previous text overstated coverage by claiming the
Updated the middleware-shape section to enumerate which endpoints are wrapped and which aren't, plus the intro paragraph (auth endpoints produce one line, protected-chain mutations produce two). Latest commit: 9010085. Doc-only. |
|
Claude finished @bootjp's task in 2m 15s —— View job Review
The Codex P2 audit-coverage finding is correctly addressed. One new bug in the middleware sample path, introduced in this round.

Confirmed fixes ✓

Bug: Wrong path in middleware audit sample (moderate)
The path constant is

Nit:
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 02d52dc6e3
| Login and logout emit their own `admin_audit` lines with | ||
| `action=login` / `action=logout` (plus `actor`, `claimed_actor`, | ||
| `remote`, `status`) so the JWT's lifetime can be correlated with the |
Fix logout audit schema description
The guide says both login and logout admin_audit entries include claimed_actor, but auditLogout only logs action, actor, remote, and status (internal/admin/auth_handler.go). This mismatch can break log pipelines that validate field presence from this doc and will misclassify normal logout events as malformed.
| wherever the process slog handler is wired). A single state-changing | ||
| request typically produces **two** audit lines: one operation-specific | ||
| line from the source that performed the mutation, plus one generic | ||
| HTTP-shaped line from the `Audit` middleware. The shapes differ by |
Don’t claim write requests usually emit two audit lines
This states a state-changing request typically yields an operation-specific line plus an Audit middleware line, but leader-direct Dynamo mutations only emit the middleware line (there is no Dynamo operation admin_audit emission in internal/admin/dynamo_handler.go). Operators who alert on a two-line pattern will undercount or incorrectly flag normal Dynamo writes.
Per `docs/design/README.md`'s lifecycle convention, the admin dashboard design doc is now "partial":

- **P1** (DynamoDB CRUD + AdminForward): shipped via #634, #635, #644, #648
- **P2** (S3 buckets list/create/delete/ACL + DescribeTable): shipped via #658, with #669 + #673 in flight
- **P3** (React SPA + embed): shipped via #649, #650
- **P4** (TLS / role / CSRF / operator docs): TLS, role, CSRF are already live in P1; operator docs in #674

Independent of the in-flight slice 2 PRs (#669/#673) and the docs PR (#674), this rename only reflects what is already on main today, plus an "Implementation status" table mapping each phase to the PR it landed in.

## What this PR changes

- `git mv` the design doc from `2026_04_24_proposed_admin_dashboard.md` to `2026_04_24_partial_admin_dashboard.md` so its history follows
- Add an "Implementation status" header table indexing each phase to the PRs that landed it
- List the outstanding open items so future readers know what is still owed against the original proposal:
  - AdminForward acceptance criterion 5 (rolling-upgrade compat flag): deferred
  - S3 object browser: explicitly out of scope per Section 2 Non-goals
  - TLS cert hot-reload: restart-to-rotate is the documented model

When the rolling-upgrade flag lands, the doc gets renamed once more to `2026_04_24_implemented_admin_dashboard.md` per the README's lifecycle convention.
Codex P2 on the latest review of #674: my line 260 said both login and logout `admin_audit` entries include `claimed_actor`, but `auditLogout` (auth_handler.go:442-449) only emits `action`, `actor`, `remote`, and `status`. `claimed_actor` is login-only because the field exists to distinguish the typed access key from the authenticated one — logout has no claim to verify (the actor comes from the already-validated session cookie). A log pipeline that validated field presence against the previous text would misclassify every successful logout as malformed, exactly the operator-facing bug Codex flagged. Replaced the one-line summary with two concrete sample lines — one for login (with `claimed_actor`), one for logout (without) — and a closing sentence telling parsers that `claimed_actor` is present-only-on-login. Same shape verified against auth_handler.go:432-449. No code changes; doc-only.
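A log pipeline that follows this contract could validate field presence along these lines. The sketch assumes a space-separated `key=value` layout with no spaces inside values; the field names come from the commit message above, while `parseAuditLine`, `missingFields`, and every sample value in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAuditLine splits a key=value log line into a map. Tokens without
// an "=" (such as the leading admin_audit marker) are skipped.
func parseAuditLine(line string) map[string]string {
	fields := make(map[string]string)
	for _, tok := range strings.Fields(line) {
		if k, v, ok := strings.Cut(tok, "="); ok {
			fields[k] = v
		}
	}
	return fields
}

// missingFields lists required keys that an auth audit entry lacks.
// claimed_actor is required on login only, never on logout.
func missingFields(fields map[string]string) []string {
	required := []string{"action", "actor", "remote", "status"}
	if fields["action"] == "login" {
		required = append(required, "claimed_actor")
	}
	var missing []string
	for _, k := range required {
		if _, ok := fields[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	// Sample values are hypothetical; only the field names follow the doc.
	login := "admin_audit action=login actor=AKIA_ADMIN claimed_actor=AKIA_ADMIN remote=10.0.0.7:51234 status=200"
	logout := "admin_audit action=logout actor=AKIA_ADMIN remote=10.0.0.7:51234 status=200"

	fmt.Println(missingFields(parseAuditLine(login)))
	fmt.Println(missingFields(parseAuditLine(logout)))
}
```

Treating `claimed_actor` as conditional keeps a normal logout from being classified as malformed.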
|
@claude review Addressed the Codex P2 finding on The previous text said both login and logout Replaced the one-line summary with two concrete sample audit lines (login with Latest commit: 85df320. Doc-only. |
|
Claude finished @bootjp's task in 2m 4s —— View job Review
The Confirmed fix ✓ —
|
| Shape | Source verified | Result |
|---|---|---|
| auditLogin emits claimed_actor | auth_handler.go:436 | ✓ |
| auditLogout does NOT emit claimed_actor | auth_handler.go:442–449 | ✓ |
| Doc lines 266–269 show two distinct samples, claimed_actor present on login, absent on logout | docs/admin.md:266–269 | ✓ |
| "Log parsers consuming this shape should treat claimed_actor as present-only-on-login" | docs/admin.md:271–272 | ✓ |
Bug — Middleware audit sample path still wrong (moderate, carried from previous round, not yet fixed)
docs/admin.md line 229:
admin_audit actor=AKIA_ADMIN role=full method=POST path=/admin/api/v1/buckets status=201 remote=10.0.0.7:51234 duration=8.2ms
internal/admin/s3_handler.go:26:
pathS3Buckets = "/admin/api/v1/s3/buckets"
The /s3 segment is still missing in the sample path. An operator who copies this into a log filter or alert rule will find zero matches. This was first flagged in the round on commit 9010085 and was not included in the 85df320f changes.
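The zero-match outcome is easy to reproduce with a small filter. In this sketch the constant and the documented sample line are copied from the review above; `matchesBucketWrite` is a hypothetical filter written for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// pathS3Buckets is the constant quoted from internal/admin/s3_handler.go.
const pathS3Buckets = "/admin/api/v1/s3/buckets"

// matchesBucketWrite is a hypothetical alert-rule predicate: it matches an
// audit line whose path field equals the real bucket endpoint.
func matchesBucketWrite(line string) bool {
	for _, tok := range strings.Fields(line) {
		if tok == "path="+pathS3Buckets {
			return true
		}
	}
	return false
}

func main() {
	// The sample as it appears in the doc, without the /s3 segment.
	documented := "admin_audit actor=AKIA_ADMIN role=full method=POST path=/admin/api/v1/buckets status=201 remote=10.0.0.7:51234 duration=8.2ms"
	// The same line with the path corrected.
	corrected := strings.Replace(documented, "/v1/buckets", "/v1/s3/buckets", 1)

	fmt.Println(matchesBucketWrite(documented), matchesBucketWrite(corrected))
}
```

The same mismatch applies in the other direction: a filter copied from the documented sample would never match lines emitted for the real endpoint.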
Nit — "Typically two" intro contradicts the per-source detail for leader-direct Dynamo (Codex P2, partially open)
docs/admin.md lines 196–199:
> A protected-chain mutation (Dynamo / S3 / cluster / keyviz writes) typically produces two audit lines: one operation-specific line from the source that performed the mutation, plus one generic HTTP-shaped line from the `Audit` middleware.

docs/admin.md lines 232–235:
> `S3Handler` operation line … Only the S3 admin path emits these; the DynamoDB admin path relies on the middleware line plus the forwarded line below for its audit trail.

internal/admin/dynamo_handler.go: zero admin_audit slog calls, confirmed by grep.

The per-source section correctly states DynamoDB has no operation-specific line. But the intro's definition of "two" is "operation-specific line + middleware line," which is wrong for leader-direct Dynamo (one line only: the middleware). The "typically" qualifier was meant to cover this, but the two statements still directly contradict for the common case: a direct-leader DynamoDB write. An operator debugging a Dynamo operation audit trail who reads the intro first will expect two lines and spend time hunting a second line that was never emitted.

Suggested fix: change the intro to scope "two lines" to S3 explicitly and flag Dynamo as middleware-only except when forwarded:
> A protected-chain S3 mutation typically produces two audit lines: one operation-specific line from `S3Handler` and one generic HTTP-shaped line from the `Audit` middleware. DynamoDB mutations produce one middleware line on the leader-direct path; forwarded DynamoDB operations additionally emit a `ForwardServer` operation line.
Verified correct ✓
All items confirmed across prior rounds still hold.
| Claim | Source | Result |
|---|---|---|
| HS256 key exactly 64 bytes / 88 base64 std | `config.go:14` | ✓ |
| `admin_session` / `admin_csrf` / `X-Admin-CSRF` | `middleware.go:19–21` | ✓ |
| `bucket_not_empty` JSON error code | `s3_handler.go:419` | ✓ |
| TLS 1.2+ enforced | `main_admin.go:536` | ✓ |
| `leader_unavailable` 503 + `Retry-After: 1` | `s3_handler.go:411–413` | ✓ |
| `S3Handler` operation shapes | `s3_handler.go:299–304,333–339,355–360` | ✓ |
| `ForwardServer` shapes | `forward_server.go:170–176,224–230,391–398` | ✓ |
| 401 vs 403 at login | `auth_handler.go:336–346` | ✓ |
| `secureCookie: !opts.InsecureCookie` | `auth_handler.go:114` | ✓ |
| SPA single `fetch`, no retry | `client.ts:75` | ✓ |
| Audit middleware chain order (SessionAuth → Audit → CSRF) | `server.go:247–254` | ✓ |
| `/auth/logout` through `protectNoAudit` | `server.go:262–268, 284` | ✓ |
| `claimed_actor` on login, absent on logout | `auth_handler.go:432–449` | ✓ |
…ane label

Three substantive findings + one nit from claude review on #674:

1. **HS256 key size (high)**: said "≥ 32 raw bytes / 44 base64 chars" but the validator in internal/admin/config.go enforces exactly 64 raw bytes. A 32-byte key fails startup with a confusing error. Updated to "exactly 64 raw bytes", i.e. 88 base64 chars (standard padding) or 86 (RawURLEncoding). Operators copying the doc into key-generation tooling now produce a key that actually validates.
2. **Audit log sample (moderate)**: showed method=POST / path=... / status=201 / duration=8.2ms, none of which are emitted. The actual slog entries use operation= + a resource key (bucket= or table=). Replaced the example with two real shapes (leader-direct and forwarded), so an operator building a log parser against the documented contract finds matches.
3. **Cross-reference link (moderate)**: line 249 used the _partial_ filename for the design doc, but #675 (the rename PR) hasn't landed yet. Point to _proposed_ (matches main today) with an inline note that the link will follow once #675 lands. The intro paragraph at line 5 already used _proposed_, so this removes the inconsistency claude flagged.
4. **"data-plane" label nit**: /admin/api/v1/cluster is an admin-plane endpoint, not data-plane. "data-plane" implied the DynamoDB / S3 / Redis ports. Replaced with "admin".

The cosmetic Last-updated / Status verbosity items on #675 are tracked separately.
Two findings from the latest claude review on #674:

1. **Audit log shapes (Bug A)**: the previous example claimed `admin_audit ... operation=delete_table table=orders` was a leader-direct sample, but `DynamoHandler.handleDelete` (dynamo_handler.go:379-396) does not emit any handler-level admin_audit; that shape only appears via `ForwardServer` on the forwarded path. It also omitted the `Audit` middleware line (middleware.go:206-232) which wraps every non-GET admin request with method= / path= / status= / remote= / duration=. An operator building a parser against the doc would have missed half of the actual lines. Rewrote the section as three explicit shapes: Audit middleware (always), S3Handler op line (only emits from s3_handler.go:299/333/355), ForwardServer op line (covers both Dynamo and S3 forwarded paths). Removed the misleading `delete_table` leader-direct sample and replaced the Dynamo examples with forwarded-only ones, which is where they actually originate. Added a one-sentence note that a single request typically produces two lines (op-specific + middleware) so log parsers know to treat the key as a union.
2. **Login 401 vs 403 (Codex P2)**: the troubleshooting entry said "both produce 401" but auth_handler.go:336-345 returns 403 forbidden when the credentials match but the access key is not in `-adminFullAccessKeys` / `-adminReadOnlyAccessKeys`. Split into two entries: 401 invalid_credentials (bad access_key or secret) and 403 forbidden (valid SigV4 creds, but no admin role assignment), with the remediation specific to each case.

No functional changes. Doc-only.
Two Codex P2 findings on the latest review of #674:

1. **Cookie Secure attribute (line 55 + the TLS topology section)**: I wrote that `-adminAllowPlaintextNonLoopback` "enables the dashboard to mint cookies without the `Secure` attribute". That conflates two flags. In the actual wiring (auth_handler.go:114 + auth_handler.go:458-468 + main_admin.go:597), `secureCookie` is `!opts.InsecureCookie` which comes from `-adminAllowInsecureDevCookie`, **not** `-adminAllowPlaintextNonLoopback`. The latter only bypasses the startup TLS guard (config.go:112). Operationally this is the difference between "fails closed" and "broken end-to-end": an operator who reads the old doc and sets only `-adminAllowPlaintextNonLoopback` gets a listener that accepts plaintext, but the dashboard still mints `Secure=true` cookies that the browser refuses to send back over the plaintext channel. Login appears to succeed and every subsequent request 401s, exactly the bug Codex was guarding against. Rewrote the line-55 entry to scope `-adminAllowPlaintextNonLoopback` to the listener TLS guard only, and added the cross-reference to `-adminAllowInsecureDevCookie` for the cookie attribute. Updated the loopback section to drop the misleading "Secure regardless" claim and explain when a loopback rig actually needs the dev-cookie flag. Added a paragraph at the bottom of "Discouraged: plaintext non-loopback" calling out that a working plaintext rig needs both flags, and that the two are deliberately separate so misconfiguration fails closed on either axis instead of silently downgrading both at once.
2. **SPA Retry-After claim (line 172)**: I wrote that "the SPA's API client honours Retry-After and re-issues the request once." Codex caught that web/admin/src/api/client.ts:75 does a single `fetch` call with no retry path, and useApi.ts has no 503 retry logic either. The doc was setting an operator expectation the code does not meet: under leader election a user would see the 503 surface directly and conclude "intermittent failure" instead of re-issuing the action. Rewrote the section to describe the current behaviour accurately: server emits `Retry-After: 1`, but the SPA surfaces the 503 directly and the operator must re-issue. Kept the header documented because a future client (or an external operator script driving the JSON API) can still honour it; it is part of the server's contract.

No code changes; doc-only. The cookie wiring claim was verified against auth_handler.go and main_admin.go; the SPA claim was verified against web/admin/src/api/client.ts and useApi.ts.
Codex P2 on the latest review of #674: the audit-log section said the Audit middleware "is emitted for every non-GET/HEAD/OPTIONS request that reaches the admin mux on this node, regardless of which handler served it." That overstates coverage:

- `/auth/login` is wired through `publicAuth`, not `protect`: the middleware can't identify the actor without a pre-existing session, so it is intentionally skipped. AuthService emits `admin_audit action=login` directly (auth_handler.go:432).
- `/auth/logout` is wired through `protectNoAudit`: the middleware is omitted to avoid emitting two audit lines per logout (a generic one plus the action-specific one); the specific line is the one operators want (auth_handler.go:442).
- The `protect` chain is `BodyLimit → SessionAuth → Audit → CSRFDoubleSubmit → handler` (server.go:247-255). Audit only fires AFTER SessionAuth accepts the session, so an unauthenticated write attempt (missing / expired / invalid cookie) gets 401'd before reaching Audit and produces no middleware line. CSRF-rejected requests still produce one, by design, because the actor is known by then.

An operator building log-monitoring on top of the previous text could conclude that every write attempt is guaranteed to appear in `admin_audit`, then miss CSRF-bypassing probes against an unauthenticated endpoint. Rewrote the middleware-shape section to:

- Scope the rule to the protected mux chain.
- Explain the ordering choice (Audit between SessionAuth and CSRFDoubleSubmit) so the CSRF-rejection-still-audits guarantee is preserved in the doc.
- Enumerate the two auth endpoints that bypass the middleware and what they emit instead.
- Adjust the intro paragraph: protected-chain mutations produce two audit lines, auth endpoints produce one (was: "every state-changing request typically produces two").

No behaviour change; doc-only. Verified against internal/admin/server.go:236-296 and internal/admin/auth_handler.go:432-449.
## Summary

State-refresh of `docs/design/2026_04_24_partial_admin_dashboard.md` now that all four phases of the original P1–P4 plan have shipped.

| Phase | Was | Now |
|---|---|---|
| P1 | shipped | shipped (no change) |
| P2 | partial (read-only #658 only) | **shipped** (+ #669 writes + #673 forwarding) |
| P3 | shipped | shipped (no change) |
| P4 | mostly | **shipped** (+ operator doc #674, deployment runbook #669, script wiring #669+#678) |

Also added the **AdminDeleteBucket TOCTOU** to Outstanding open items: coderabbitai flagged it during PR #669 review (pre-existing race, `s3.go:deleteBucket` inherits the same shape; recorded as a code comment on `AdminDeleteBucket` for the future fix).

Doc stays at `_partial_` rather than promoting to `_implemented_` because two outstanding items remain: AdminForward acceptance criterion 5 (rolling-upgrade flag, explicitly deferred at design time) and the new TOCTOU. Updated the closing rename trigger to cover both.

## Test plan

- [x] No design changes; this is a state refresh
- [x] Status table entries cross-checked against the merge commits on main
- [x] AdminDeleteBucket TOCTOU description matches the code comment in adapter/s3_admin.go and the operator-side workaround in docs/admin_deployment.md
P4 deliverable from `docs/design/2026_04_24_proposed_admin_dashboard.md` Section 8: a single self-contained operator-facing reference for the admin HTTP listener.

Stacked on #669 (slice 2a) + #673 (slice 2b) so the API-surface table can describe S3 write endpoints as shipped. Rebases cleanly onto main once those land.

Sections

- … `/admin/api/v1/*` table including the slice 2 S3 write endpoints
- … `admin_audit` slog shape and the `forwarded_from` field

What is NOT in this PR

Section 8's P4 plan also called out "TLS, read-only role, CSRF" as deliverables; those are already implemented in main (see `config.go`'s `validateTLS` / `validateAccessKeyRoles`, the role gates in `DynamoHandler.principalForWrite` + `S3Handler.principalForWrite`, and the `CSRFDoubleSubmit` middleware). This doc stitches them into a single reference operators can land on without reading code.

Test plan

- … `main.go`'s flag definitions verbatim
- … `bucket_not_empty` 409 response shape matches `S3Handler.writeBucketsError`