Skip to content

admin: read-only S3 bucket endpoints (P2 slice 1)#658

Merged
bootjp merged 3 commits intomainfrom
feat/admin-s3-readonly
Apr 26, 2026
Merged

admin: read-only S3 bucket endpoints (P2 slice 1)#658
bootjp merged 3 commits intomainfrom
feat/admin-s3-readonly

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 26, 2026

P2 slice 1 of docs/design/2026_04_24_proposed_admin_dashboard.md. Read-only S3 admin endpoints land first so the SPA's S3List and S3Detail pages stop hitting 404. Write paths (POST/PUT/DELETE/ACL) ship in slice 2 together with AdminForward integration; until then they reply 405.

Summary

  • *adapter.S3Server.AdminListBuckets / AdminDescribeBucket — SigV4-bypass read methods. Share loadBucketMetaAt + the metadata-prefix scan with the SigV4 path, so a SigV4 listBuckets and the admin dashboard cannot drift.
  • internal/admin/buckets_source.goBucketsSource interface + BucketSummary DTO + sentinel errors (ErrBucketsForbidden / ErrBucketsNotLeader / ErrBucketsNotFound / ErrBucketsAlreadyExists). ErrBucketsForbidden is wired in this slice; the others are reserved for slice 2 to keep the error vocabulary additive.
  • internal/admin/s3_handler.goS3Handler with handleList (paginated) + handleDescribe (404 for missing). Sub-paths under /buckets/{name}/ (the future /acl) return 404 here so a SPA bug pointed at this build cannot accidentally hit the describe path with a "{name}/acl" string.
  • Shared list_pagination.go — centralises base64url cursor + limit parsing/clamping so the Dynamo and S3 handlers cannot diverge on validation rules. Drops the now-redundant parseDynamoListLimit / decodeDynamoNextToken / encodeDynamoNextToken from dynamo_handler.go in favour of parseListPaginationParams / decodeListNextToken / encodeListNextToken.
  • apiRouteTable in server.go — bundles the precomposed middleware chains and dispatches by URL prefix. buildAPIMux's body went from 13 cyclomatic branches to 6, leaving headroom for the next resource family (SQS queues, etc.) to land without another refactor.
  • Production wiringmain_s3.go's startS3Server now returns *adapter.S3Server; runtimeServerRunner stores it; startAdminFromFlags accepts it and threads bucketsBridgeadmin.BucketsSourceServerDeps.Buckets.

What is NOT in this PR

  • AdminCreateBucket / AdminPutBucketAcl / AdminDeleteBucket on the adapter and their HTTP/AdminForward counterparts. Slice 2.
  • RoleStore plumb-through for S3. Slice 2 (read-only is fine for any authenticated session today; the write paths will need it).
  • forwarded_from audit trail for S3 admin writes. Slice 2 — needs the pb.AdminOperation enum extended.
  • The merge-freeze-acceptable changes to proto/admin_forward.proto. Slice 2.

Test plan

Acceptance criteria coverage (Section 4.1)

Endpoint This PR
GET /admin/api/v1/s3/buckets
POST /admin/api/v1/s3/buckets ⏳ slice 2
GET /admin/api/v1/s3/buckets/{name}
PUT /admin/api/v1/s3/buckets/{name}/acl ⏳ slice 2
DELETE /admin/api/v1/s3/buckets/{name} ⏳ slice 2

Self-review (5 lenses)

  1. Data loss: No FSM / Raft / Pebble path changes. AdminListBuckets / AdminDescribeBucket are read-only and use the same ScanAt / loadBucketMetaAt as the SigV4 path.
  2. Concurrency: No new shared state. The handler holds only the immutable BucketsSource interface; the bridge holds only the immutable *S3Server. pinReadTS is the same pattern the SigV4 path uses for snapshot stability.
  3. Performance: One additional metadata scan per GET /admin/api/v1/s3/buckets, identical to the SigV4 listBuckets. Pagination caps at 1000 buckets per page (silent clamp on oversize). No hot-path changes.
  4. Data consistency: Read-only; no commit-ts ordering risk. Admin reads use the same lease-read window as the SigV4 path.
  5. Test coverage: 13 handler tests + 4 adapter tests + the existing dynamo/cluster test surfaces still pass after the shared-helper refactor. The pagination cursor round-trip test pins the wire format so a future cursor-encoding change cannot ship without breaking the test.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added S3 bucket admin API endpoints to list all buckets and retrieve detailed bucket information, including creation timestamps.
    • Implemented pagination support with cursor-based next_token for browsing large bucket collections.
    • Added proper authorization checks and error handling for admin bucket operations (403 for forbidden, 404 for not found, 500 for errors).

Slice 1 of P2 (docs/design/2026_04_24_proposed_admin_dashboard.md
Section 4.1): GET /admin/api/v1/s3/buckets and GET /s3/buckets/{name}
become reachable so the SPA's S3List and S3Detail pages stop hitting
404. Write paths (POST / PUT / DELETE / ACL change) ship in slice 2
together with AdminForward integration; for now they reply 405.

Adapter:
- AdminListBuckets / AdminDescribeBucket on *adapter.S3Server bypass
  SigV4 the same way AdminListTables / AdminDescribeTable do on the
  Dynamo side. Both share loadBucketMetaAt + the metadata-prefix scan,
  so a SigV4 listBuckets and the admin dashboard cannot drift.
- AdminBucketSummary projects only the fields the dashboard needs;
  CreatedAt is left as the raw HLC and formatted in the bridge.

Admin package:
- BucketsSource interface + BucketSummary DTO with ISO-8601 CreatedAt.
- S3Handler with handleList (paginated) + handleDescribe (404 for
  missing). Sub-paths under /buckets/{name}/ (the future /acl) return
  404 here so a SPA bug pointed at this build cannot accidentally hit
  the describe path with a "{name}/acl" string.
- ServerDeps.Buckets wires the source into NewServer; nil silently
  disables the routes (mirrors the Tables nil contract).
- Shared list_pagination.go centralises base64url cursor + limit
  parsing and clamping so the Dynamo and S3 handlers cannot diverge
  on validation rules. Drops parseDynamoListLimit /
  decodeDynamoNextToken in favour of the shared helpers.
- buildAPIMux's route dispatch refactored into apiRouteTable so adding
  S3 (and future SQS / queues) does not push the function past
  cyclop's 10-branch ceiling.

Production wiring:
- main_s3.go's startS3Server now returns *adapter.S3Server so the
  admin listener can hold a reference. (nil, nil) is the explicit
  "S3 disabled" signal — empty --s3Address is a valid configuration,
  not an error.
- main.go's runtimeServerRunner stores s3Server alongside dynamoServer.
- main_admin.go's startAdminFromFlags accepts *adapter.S3Server,
  builds bucketsBridge → admin.BucketsSource, and threads it through
  startAdminServer / buildAdminHTTPServer / ServerDeps.Buckets.

Tests:
- 13 handler tests covering empty + populated list, pagination cursor
  round-trip, invalid limit / next_token shapes, oversize-limit
  silent clamp, list / describe storage failures, ErrBucketsForbidden
  → 403, missing bucket → 404, non-GET methods → 405,
  /buckets/{name}/sub → 404, FormatBucketCreatedAt second-precision
  RFC3339 + zero-HLC empty string.
- 4 adapter-level tests for AdminListBuckets / AdminDescribeBucket
  against an in-memory MVCCStore: empty list returns non-nil
  []AdminBucketSummary{}, populated list reflects buckets created via
  the SigV4 path, describe returns the populated summary with ACL
  preserved, missing bucket returns (nil, false, nil).

P2 slice 2 (next PR) will add AdminCreateBucket / AdminPutBucketAcl /
AdminDeleteBucket on the adapter, the corresponding write methods on
BucketsSource, and forwarding integration so a follower can hand S3
admin writes to the leader transparently.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 26, 2026

@claude review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 26, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 37 minutes and 44 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 37 minutes and 44 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 185f05fb-4d82-4262-97c4-f092df5bae53

📥 Commits

Reviewing files that changed from the base of the PR and between ab0b074 and 1bbd1da.

📒 Files selected for processing (8)
  • adapter/s3_admin.go
  • adapter/s3_admin_test.go
  • internal/admin/dynamo_handler.go
  • internal/admin/s3_handler.go
  • internal/admin/s3_handler_test.go
  • kv/hlc.go
  • main_admin.go
  • main_admin_forward_test.go
📝 Walkthrough

Walkthrough

This PR adds S3 bucket administration endpoints to the admin API by introducing an adapter layer for bucket metadata introspection, an HTTP handler with pagination support, dedicated pagination utilities, and integration into the existing admin server infrastructure.

Changes

Cohort / File(s) Summary
S3 Adapter Layer
adapter/s3_admin.go, adapter/s3_admin_test.go
New AdminBucketSummary DTO and methods AdminListBuckets and AdminDescribeBucket to scan and project S3 bucket metadata at pinned read timestamps. Tests validate pagination, field projections, and error handling for both operations.
Admin Domain Layer
internal/admin/buckets_source.go
New BucketsSource interface with read-only bucket methods, BucketSummary DTO for JSON responses, and sentinel error variables (ErrBucketsForbidden, ErrBucketsNotLeader, ErrBucketsNotFound, ErrBucketsAlreadyExists) to standardize error handling across admin handlers.
Pagination Utilities
internal/admin/list_pagination.go
Centralized pagination parser (parseListPaginationParams) that validates and derives limit and startAfter from query parameters with standardized error responses, plus base64url encoder/decoder functions.
S3 Admin HTTP Handler
internal/admin/s3_handler.go, internal/admin/s3_handler_test.go
New S3Handler serving GET /buckets (list with pagination) and /buckets/{name} (describe) endpoints. Includes route validation, error mapping (403/404/500), timestamp formatting, and pagination logic. Tests cover success/failure paths, pagination, validation, and method restrictions.
Admin Server Integration
internal/admin/server.go, internal/admin/dynamo_handler.go
Refactored pagination logic in dynamo_handler.go to use shared parseListPaginationParams and encodeListNextToken. Extended server.go with buildS3HandlerForDeps, dependency validation helper, and routing dispatch logic for S3 paths.
Startup and Wiring
main.go, main_s3.go, main_admin.go, main_s3_test.go, main_admin_test.go
Updated startS3Server to return (*adapter.S3Server, error) for downstream reuse. Wired S3 server into admin handler via new newBucketsSource bridge in main_admin.go, translating adapter DTOs to admin domain types. Updated tests to handle new signatures.

Sequence Diagram

sequenceDiagram
    actor Client
    participant Handler as S3Handler
    participant Pagination as Pagination Logic
    participant Bridge as bucketsBridge
    participant Adapter as S3Server (adapter)
    participant Store as MVCC Store

    Client->>Handler: GET /admin/api/v1/s3/buckets?limit=10&next_token=...
    Handler->>Pagination: parseListPaginationParams(req)
    Pagination-->>Handler: (limit, startAfter) or error
    Handler->>Bridge: AdminListBuckets(ctx)
    Bridge->>Adapter: AdminListBuckets(ctx)
    Adapter->>Store: metadata prefix scan @ pinned timestamp
    Store-->>Adapter: [bucket metadata]
    Adapter-->>Bridge: [AdminBucketSummary]
    Bridge-->>Handler: [BucketSummary]
    Handler->>Handler: paginateBuckets(results, startAfter, limit)
    Handler-->>Client: { buckets: [...], next_token?: "..." }

    Client->>Handler: GET /admin/api/v1/s3/buckets/my-bucket
    Handler->>Bridge: AdminDescribeBucket(ctx, "my-bucket")
    Bridge->>Adapter: AdminDescribeBucket(ctx, "my-bucket")
    Adapter->>Store: load bucket metadata @ pinned timestamp
    Store-->>Adapter: bucket metadata or not found
    Adapter-->>Bridge: AdminBucketSummary or (nil, false, nil)
    Bridge-->>Handler: BucketSummary or error
    Handler-->>Client: { name: "my-bucket", ... } or 404
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Hop, hop! New buckets in the admin view,
Paginated lists with cursors spun true,
From MVCC stores to handlers so fine,
S3 introspection's now mine, all mine! 🪣✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.91% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: implementing read-only S3 bucket admin endpoints, with a specific slice marker indicating this is the first part of a multi-phase effort.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/admin-s3-readonly

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 3m 49s —— View job


Code Review — admin: read-only S3 bucket endpoints (P2 slice 1)

  • Gather context and understand the PR
  • Read changed files
  • Review adapter/s3_admin.go
  • Review internal/admin/buckets_source.go
  • Review internal/admin/list_pagination.go
  • Review internal/admin/s3_handler.go
  • Review internal/admin/server.go changes
  • Review main_s3.go / main_admin.go / main.go changes
  • Review test files
  • Check dynamo_handler.go refactor

Overall this is a clean, well-structured slice. The layering (adapter → bridge → BucketsSource → S3Handler) mirrors the Dynamo side faithfully, the shared list_pagination.go extraction avoids future divergence, and the test coverage is solid. A few issues below — one functional truncation bug, one dead-code defect, and a couple of nits.


Issue 1 — AdminListBuckets silently truncates at 1000 buckets (functional)

adapter/s3_admin.go:42-48

kvs, err := s.store.ScanAt(ctx,
    []byte(s3keys.BucketMetaPrefix),
    prefixScanEnd([]byte(s3keys.BucketMetaPrefix)),
    s3MaxKeys, readTS)   // hard ceiling of 1000

s3MaxKeys = 1000. If a deployment has > 1000 buckets, AdminListBuckets silently returns only the first 1000, and the admin handler's own pagination (also capped at 1000) runs over this already-truncated slice — pages 2+ are unreachable.

The interface contract in buckets_source.go says "returns every bucket this server knows about" which is inaccurate in this case.

Compare with DynamoDBServer.listTableNames which iterates through all pages via scanAllByPrefixAt's internal loop — so Dynamo and S3 behave asymmetrically. Since the PR explicitly ties the admin scan to the SigV4 path's behavior ("so a SigV4 listBuckets and the admin dashboard cannot drift"), the limit is inherited from the SigV4 path — but the interface doc and the pagination handler both imply unbounded coverage.

Suggested fix: either (a) document the 1000-bucket ceiling on the BucketsSource.AdminListBuckets interface and add the same caveat to the adapter docstring, or (b) make AdminListBuckets use a paginated loop like scanAllByPrefixAt on the Dynamo side. Option (a) is lower risk for this slice.

Fix this →


Issue 2 — Dead code: page == nil is unreachable in paginateBuckets

internal/admin/s3_handler.go:189-213

page := buckets[start:end]        // slice expression on non-nil slice → always non-nil
if end < len(buckets) && len(page) > 0 {
    return page, page[len(page)-1].Name
}
if page == nil {                  // ← unreachable; slice expression never returns nil
    return []BucketSummary{}, ""
}
return page, ""

A slice expression on a non-nil slice in Go is never nil, even when the resulting length is 0. The if page == nil branch is dead code. The parallel paginateDynamoTableNames in dynamo_handler.go does not have this guard (it just returns the empty sub-slice), confirming the guard isn't needed.

This won't produce a wrong result today (the empty sub-slice returned by the last return page, "" is correct), but the dead branch adds noise and implies a nil case that can't exist.

Fix this →


Issue 3 — Orphaned comment atop paginateDynamoTableNames (nit)

internal/admin/dynamo_handler.go:712-720

// parseDynamoListLimit translates the ?limit= query parameter into a
// concrete page size. The shared parseListLimit lives in
// list_pagination.go; this comment is preserved here only because
// it documents the historical rationale ...
//
// paginateDynamoTableNames slices `names` ...
func paginateDynamoTableNames(...) { ... }

The first paragraph references parseDynamoListLimit, which this PR removed. The comment header no longer matches the function below it. Future readers will be confused about which function is being introduced. The second paragraph is the real doc for paginateDynamoTableNames. The first paragraph should either be removed entirely or, if the historical rationale is worth keeping, converted to a comment inside parseListLimit in list_pagination.go.

Fix this →


Issue 4 — hlcPhysicalShift can silently drift from kv.hlcLogicalBits (low-risk nit)

internal/admin/s3_handler.go:31

const hlcPhysicalShift = 16

kv/hlc.go has const hlcLogicalBits = 16. The comment explains the import-cycle rationale ("admin must not import kv"), which is correct. But if hlcLogicalBits ever changes (e.g. a future HLC format revision), hlcPhysicalShift won't follow automatically — and the test TestFormatBucketCreatedAt_RoundtripsSecondPrecision would still pass since it encodes the constant's current value as the expected test input.

A static assertion (a compile-time test: _ = [1]struct{}{}[hlcPhysicalShift-16]) or a dedicated test that reads the constant from a shared internal/hlctypes package (if/when that ever exists) would make this drift-proof. Not blocking, but worth tracking.


Issue 5 — wrapcheck nolint at bridge boundary loses callsite frame (nit)

main_admin.go:196,208

return nil, err //nolint:wrapcheck // adapter wraps internally with cockroachdb/errors.

cockroachdb/errors preserves the stack trace from the original callsite, so the nolint is defensible. However, the bridge callsite in bucketsBridge.AdminListBuckets / AdminDescribeBucket doesn't appear anywhere in the error chain — if an operator is debugging a 500 from /admin/api/v1/s3/buckets, the trace jumps from the adapter to the handler with no indication the bridge was traversed. A errors.Wrapf(err, "admin buckets list") would add context without extra cost (the stack is already there). Minor, and consistent with the Dynamo bridge's nolint pattern, but worth mentioning.


Positive observations

  • The apiRouteTable refactor is a clear improvement — cyclomatic complexity drops from 13 branches to 6 and each new resource family adds exactly one field + a two-line predicate, no rewrites needed.
  • Applying the full protect middleware chain (session + CSRF) to read-only S3 endpoints is the right security posture — even an idempotent GET can leak bucket names to a cross-site attacker.
  • The pagination cursor round-trip test (TestS3Handler_ListBuckets_PaginationCursorRoundtrips) pins the wire format explicitly, which is exactly the right regression anchor for a user-visible API cursor.
  • The error-leak tests (require.NotContains(rec.Body.String(), "sentinel")) are a good pattern.
  • summaryFromBucketMeta centralizing the DTO projection means AdminListBuckets and AdminDescribeBucket cannot produce different shapes for the same bucket.
  • buildS3HandlerForDeps correctly defers MapRoleStore to slice 2 — read-only doesn't need it and keeping the wiring minimal now avoids shipping a half-plumbed interface.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces read-only S3 admin endpoints to the dashboard, allowing users to list and describe S3 buckets. It includes the implementation of the AdminListBuckets and AdminDescribeBucket methods in the S3 adapter, a new S3Handler in the admin package, and shared pagination logic used by both DynamoDB and S3 admin handlers. Feedback suggests ensuring that bucket listing handles potential ErrBucketsForbidden errors for consistency and addressing the use of a hard-coded limit in metadata scans to prevent incomplete results while maintaining memory safety.

Comment thread adapter/s3_admin.go Outdated
Comment on lines +42 to +45
kvs, err := s.store.ScanAt(ctx,
[]byte(s3keys.BucketMetaPrefix),
prefixScanEnd([]byte(s3keys.BucketMetaPrefix)),
s3MaxKeys, readTS)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The AdminListBuckets method uses s3MaxKeys as a hard limit for the metadata scan, which may cause the admin dashboard to show incomplete results. While this should be updated to retrieve more entries to satisfy the BucketsSource interface, ensure that a fixed upper bound is still applied to the total number of buckets retrieved to prevent unbounded memory growth and potential OOM issues from collections that grow based on external requests.

References
  1. To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections that can grow from external requests, such as pending configuration changes. Reject new requests when the bound is reached.

Comment on lines +120 to +127
if err != nil {
h.logger.LogAttrs(r.Context(), slog.LevelError, "admin s3 list buckets failed",
slog.String("error", err.Error()),
)
writeJSONError(w, http.StatusInternalServerError, "s3_list_failed",
"failed to list buckets; see server logs")
return
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The handleList method does not handle ErrBucketsForbidden, which is inconsistent with the implementation in handleDescribe. If the underlying source returns this error (e.g., due to future role-based access control integration), the user will receive a 500 Internal Server Error instead of a 403 Forbidden. It is better to handle this sentinel error consistently across all admin endpoints.

Suggested change
if err != nil {
h.logger.LogAttrs(r.Context(), slog.LevelError, "admin s3 list buckets failed",
slog.String("error", err.Error()),
)
writeJSONError(w, http.StatusInternalServerError, "s3_list_failed",
"failed to list buckets; see server logs")
return
}
buckets, err := h.source.AdminListBuckets(r.Context())
if err != nil {
if errors.Is(err, ErrBucketsForbidden) {
writeJSONError(w, http.StatusForbidden, "forbidden",
"this endpoint requires a full-access role")
return
}
h.logger.LogAttrs(r.Context(), slog.LevelError, "admin s3 list buckets failed",
slog.String("error", err.Error()),
)
writeJSONError(w, http.StatusInternalServerError, "s3_list_failed",
"failed to list buckets; see server logs")
return
}

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab0b074a3d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/s3_admin.go Outdated
Comment on lines +42 to +45
kvs, err := s.store.ScanAt(ctx,
[]byte(s3keys.BucketMetaPrefix),
prefixScanEnd([]byte(s3keys.BucketMetaPrefix)),
s3MaxKeys, readTS)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop truncating AdminListBuckets at 1000 entries

AdminListBuckets is documented and used as the full source for admin pagination, but this scan is hard-limited to s3MaxKeys (1000). In clusters with more than 1000 buckets, /admin/api/v1/s3/buckets will silently omit everything after the first 1000, and next_token will stop advancing once the in-memory page reaches that cap, making the remaining buckets unreachable.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
internal/admin/dynamo_handler.go (1)

712-717: Stale doc-comment fragment above paginateDynamoTableNames.

The comment block now has two function headers stacked together: a paragraph describing the removed parseDynamoListLimit followed (after a blank //) by the doc for paginateDynamoTableNames. This makes paginateDynamoTableNames's godoc start with text about a function that no longer exists in this file. If the historical rationale is worth preserving, consider moving it to a package-level comment near parseListLimit in list_pagination.go (where the policy actually lives) and let paginateDynamoTableNames keep a clean one-paragraph doc.

♻️ Proposed cleanup
-// parseDynamoListLimit translates the ?limit= query parameter into a
-// concrete page size. The shared parseListLimit lives in
-// list_pagination.go; this comment is preserved here only because
-// it documents the historical rationale for the default / clamp
-// policy that the shared helper inherited.
-//
 // paginateDynamoTableNames slices `names` (already lex-sorted by the
 // adapter) into a single page starting strictly after `startAfter`.
 // The second return is the opaque cursor the client should pass back
 // for the next call, or "" if this is the last page.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/admin/dynamo_handler.go` around lines 712 - 717, The doc comment
above paginateDynamoTableNames includes a stale fragment describing
parseDynamoListLimit; remove that orphan paragraph so paginateDynamoTableNames
has a single clean godoc paragraph, and if the historical rationale for the
default/clamp policy should be preserved move it into a package-level comment
near parseListLimit in list_pagination.go (or update parseListLimit's comment)
instead of leaving it above paginateDynamoTableNames.
internal/admin/s3_handler.go (1)

119-127: handleList doesn't map ErrBucketsForbiddenhandleDescribe does. Mirror the mapping for consistency.

handleDescribe (lines 147-151) maps ErrBucketsForbidden to 403, but handleList returns 500 for the same sentinel. The PR notes describe ErrBucketsForbidden as "wired" and the test TestS3Handler_DescribeBucket_ForbiddenReturns403 pins describe — asymmetry here means slice 2's role gate, when it grows a list-side check, will silently surface 500s with a misleading s3_list_failed error code until someone notices. Better to wire both paths now while the code is small.

♻️ Proposed fix
 	buckets, err := h.source.AdminListBuckets(r.Context())
 	if err != nil {
+		if errors.Is(err, ErrBucketsForbidden) {
+			writeJSONError(w, http.StatusForbidden, "forbidden",
+				"this endpoint requires a full-access role")
+			return
+		}
 		h.logger.LogAttrs(r.Context(), slog.LevelError, "admin s3 list buckets failed",
 			slog.String("error", err.Error()),
 		)
 		writeJSONError(w, http.StatusInternalServerError, "s3_list_failed",
 			"failed to list buckets; see server logs")
 		return
 	}

A test paralleling TestS3Handler_DescribeBucket_ForbiddenReturns403 for the list path would also lock this in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/admin/s3_handler.go` around lines 119 - 127, handleList currently
treats ErrBucketsForbidden from h.source.AdminListBuckets as a 500 but
handleDescribe maps the same sentinel to 403; update handleList to mirror
handleDescribe by detecting the sentinel ErrBucketsForbidden returned from
h.source.AdminListBuckets and return writeJSONError(w, http.StatusForbidden,
"forbidden", "access to buckets is forbidden") (and log at error level with the
same message) instead of the 500 path; ensure the error check uses the same
sentinel comparison (ErrBucketsForbidden) as in handleDescribe and add a unit
test analogous to TestS3Handler_DescribeBucket_ForbiddenReturns403 that asserts
a 403 is returned from handleList when the source returns ErrBucketsForbidden.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapter/s3_admin.go`:
- Around line 38-48: AdminListBuckets currently calls ScanAt once with s3MaxKeys
(1000) and returns that page, dropping any further keys; fix by looping the
ScanAt calls until exhaustion: initialize the scan start to
[]byte(s3keys.BucketMetaPrefix), call s.store.ScanAt(ctx, start,
prefixScanEnd([]byte(s3keys.BucketMetaPrefix)), s3MaxKeys, readTS) in a loop,
append each call's kvs into the accumulated result slice, and when len(kvs) <
s3MaxKeys break; otherwise advance start to the lexicographic successor of the
last returned key (use the last kvs element's Key and increment it to avoid
rereading) and continue; return the accumulated AdminBucketSummary list at the
end and keep the existing readTS/readPin handling.

In `@internal/admin/s3_handler.go`:
- Around line 87-94: Update the misleading comment to state that subpaths
containing "/" are rejected with 404 (not 405) so it matches the implemented
behavior in the strings.Contains(name, "/") branch and the
TestS3Handler_DescribeBucket_SubpathReturns404 test; also change the
writeJSONError call to include a non-empty message (e.g. "bucket does not exist"
or "not found") instead of the empty string so it follows the convention used
elsewhere (see writeJSONError and existing message on line with "bucket does not
exist").

---

Nitpick comments:
In `@internal/admin/dynamo_handler.go`:
- Around line 712-717: The doc comment above paginateDynamoTableNames includes a
stale fragment describing parseDynamoListLimit; remove that orphan paragraph so
paginateDynamoTableNames has a single clean godoc paragraph, and if the
historical rationale for the default/clamp policy should be preserved move it
into a package-level comment near parseListLimit in list_pagination.go (or
update parseListLimit's comment) instead of leaving it above
paginateDynamoTableNames.

In `@internal/admin/s3_handler.go`:
- Around line 119-127: handleList currently treats ErrBucketsForbidden from
h.source.AdminListBuckets as a 500 but handleDescribe maps the same sentinel to
403; update handleList to mirror handleDescribe by detecting the sentinel
ErrBucketsForbidden returned from h.source.AdminListBuckets and return
writeJSONError(w, http.StatusForbidden, "forbidden", "access to buckets is
forbidden") (and log at error level with the same message) instead of the 500
path; ensure the error check uses the same sentinel comparison
(ErrBucketsForbidden) as in handleDescribe and add a unit test analogous to
TestS3Handler_DescribeBucket_ForbiddenReturns403 that asserts a 403 is returned
from handleList when the source returns ErrBucketsForbidden.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 04c9b677-e6a1-4426-baa6-291eb3b36db0

📥 Commits

Reviewing files that changed from the base of the PR and between bf67fa5 and ab0b074.

📒 Files selected for processing (13)
  • adapter/s3_admin.go
  • adapter/s3_admin_test.go
  • internal/admin/buckets_source.go
  • internal/admin/dynamo_handler.go
  • internal/admin/list_pagination.go
  • internal/admin/s3_handler.go
  • internal/admin/s3_handler_test.go
  • internal/admin/server.go
  • main.go
  • main_admin.go
  • main_admin_test.go
  • main_s3.go
  • main_s3_test.go

Comment thread adapter/s3_admin.go
Comment thread internal/admin/s3_handler.go
Six findings from Codex / Gemini / Claude on PR #658:

1) Codex P1 + Gemini medium + Claude Issue 1: AdminListBuckets
   silently truncated at s3MaxKeys (1000) buckets — pages 2+ of the
   admin paginated list were unreachable, and the BucketsSource doc
   said "every bucket". Replaced the single ScanAt with a paginated
   loop mirroring scanAllByPrefixAt on the Dynamo side. The per-call
   ScanAt page stays at 1000 (named `adminBucketScanPage` so the
   intent is explicit), but the accumulator continues until the
   metadata prefix is exhausted. The defensive prefix-mismatch halt
   keeps the loop safe under a hypothetical mid-scan layout change.
   New test `TestS3Server_AdminListBuckets_PaginatesPastSinglePage`
   exercises 1100 buckets to lock the regression.

2) Gemini medium: handleList didn't translate ErrBucketsForbidden
   to 403, while handleDescribe did. Added the same mapping so both
   read paths surface a future role gate identically.

3) Claude Issue 2: dead `page == nil` branch in paginateBuckets — a
   slice expression on a non-nil slice is itself non-nil even when
   length is zero, so the guard could not fire. Dropped the dead
   branch (paginateDynamoTableNames already does the same).

4) Claude Issue 3: orphaned comment block above
   paginateDynamoTableNames referenced parseDynamoListLimit which
   this PR removed. Cleaned up.

5) Claude Issue 4: hlcPhysicalShift drift risk vs kv.hlcLogicalBits.
   Exported `kv.HLCLogicalBits` and added
   `TestAdminHLCPhysicalShiftMatchesKVLogicalBits` in the main
   package (which already imports both kv and admin) so a future
   layout change in one without the other fails CI immediately
   rather than silently rendering every CreatedAt at the wrong hour.

6) Claude Issue 5: bridge nolint:wrapcheck lost the call-site frame
   in the error chain. Replaced both bucketsBridge sites with
   errors.Wrap / errors.Wrapf so an operator debugging a 500 sees
   the bridge in the stack trace.

The internal helper `appendAdminBucketSummaries` was extracted to
keep AdminListBuckets under the cyclomatic ceiling — same shape
the inner-loop body had before, just moved to its own function so
the scan-loop reads as a control-flow skeleton rather than a
mixed-concerns block.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 26, 2026

@claude review

2636173 addresses all 6 findings from the previous review pass:

  1. Codex P1 / Gemini medium / Claude Issue 1: AdminListBuckets now walks the metadata prefix in a paginated loop (mirroring scanAllByPrefixAt on Dynamo). New test TestS3Server_AdminListBuckets_PaginatesPastSinglePage exercises 1100 buckets to pin the fix.
  2. Gemini medium: handleList now maps ErrBucketsForbidden → 403, symmetric with handleDescribe.
  3. Claude Issue 2: dropped the dead page == nil branch in paginateBuckets.
  4. Claude Issue 3: cleaned up the orphaned comment about removed parseDynamoListLimit.
  5. Claude Issue 4: exported kv.HLCLogicalBits and added TestAdminHLCPhysicalShiftMatchesKVLogicalBits in main (which imports both packages) as a drift guard.
  6. Claude Issue 5: replaced both bucketsBridge nolint:wrapcheck sites with errors.Wrap / errors.Wrapf so the bridge frame appears in the error chain.

Build / vet / lint / main + admin + adapter targeted tests all pass.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 3m 2s —— View job


Code Review — follow-up pass on commit 2636173

  • Gather context and understand the PR
  • Read changed files in the latest commit
  • Verify Issue 1 fix: AdminListBuckets pagination loop
  • Verify Issue 2 fix: ErrBucketsForbidden in handleList
  • Verify Issue 3 fix: dead page == nil branch removal
  • Verify Issue 4 fix: orphaned dynamo_handler comment
  • Verify Issue 5 fix: exported HLCLogicalBits + drift guard test
  • Verify Issue 6 fix: errors.Wrap in bucketsBridge
  • Fresh review pass on changed files

All 6 tracked fixes verified and correct. Details below, followed by two residual nits that weren't in the tracked list.


Issue 1 — AdminListBuckets pagination loop ✅

adapter/s3_admin.go:56-93

The loop structure is correct: ScanAt pages at adminBucketScanPage (1000), nextScanCursor advances the start key by appending a zero byte (same pattern as every other Dynamo/S3 scan loop in the adapter package), and the prefix-mismatch short-circuit in appendAdminBucketSummaries handles any layout-change defensive case. finaliseAdminBucketList sorts after accumulation rather than assuming each page arrives sorted — safe given that nextScanCursor is lex-monotone.

TestS3Server_AdminListBuckets_PaginatesPastSinglePage creates 1100 zero-padded buckets (numeric == lexicographic order) and asserts both count and first/last names — solid regression anchor.

One observation (not a defect): prefixScanEnd returns nil when the prefix is all-0xFF bytes. The loop already guards this with if end != nil && bytes.Compare(start, end) > 0, and ScanAt(ctx, start, nil, …) scans to end-of-store on a nil upper bound — behaviour consistent with the rest of the codebase. Not a new issue, just noting the nil-end path is exercised here for the first time in the admin bucket context.


Issue 2 — ErrBucketsForbidden in handleList ✅

internal/admin/s3_handler.go:119-136

handleList now mirrors handleDescribe exactly. The added comment is clear about the future slice-2 role-gate motivation. Good.

Residual gap: s3_handler_test.go was not touched in this commit. The CodeRabbit reviewer and CLAUDE.md's convention both ask for a test alongside the code fix. TestS3Handler_DescribeBucket_ForbiddenReturns403 exists for the describe path but there is no parallel TestS3Handler_ListBuckets_ForbiddenReturns403. The fix is live and correct, but the regression test is missing.

Fix this →


Issue 3 — Dead page == nil branch removal ✅

internal/admin/s3_handler.go:199-225

Branch removed. The replacement comment explains the Go slice semantics correctly. paginateBuckets now matches paginateDynamoTableNames structurally.


Issue 4 — Orphaned parseDynamoListLimit comment ✅

internal/admin/dynamo_handler.go:712-716

The comment block now only documents paginateDynamoTableNames. Clean.


Issue 5 — HLC drift guard ✅

kv/hlc.go:19, main_admin_forward_test.go:192-202

kv.HLCLogicalBits = 16 is exported with a clear docstring. TestAdminHLCPhysicalShiftMatchesKVLogicalBits encodes wallMillis << kv.HLCLogicalBits, passes it through admin.FormatBucketCreatedAt (which uses the local hlcPhysicalShift = 16), and asserts the exact RFC3339 string. If kv.HLCLogicalBits ever changes, the shift produces a different bit pattern and the formatter — still using the old hlcPhysicalShift — reads the wrong physical bits, causing the assertion to fail with a mismatched timestamp. The round-trip design is the right approach given the import-cycle constraint. Well done.


Issue 6 — errors.Wrap in bucketsBridge ✅

main_admin.go:193-213

Both AdminListBuckets and AdminDescribeBucket now add the bridge frame via errors.Wrap / errors.Wrapf. An operator debugging a 500 from /admin/api/v1/s3/buckets now sees admin buckets bridge: list: … in the chain. The dynamoTablesBridge methods still carry nolint:wrapcheck — that inconsistency pre-dates this PR and is out of scope here.


Residual nit not in the tracked list — stale "405" comment

internal/admin/s3_handler.go:87-90

// /buckets/{name}/acl is reserved for the next slice. Reject
// it with 405 here so a SPA bug that calls PUT /acl on this
// build sees a sensible error instead of mistakenly hitting
// the describe path.
if strings.Contains(name, "/") {
    writeJSONError(w, http.StatusNotFound, "not_found", "")  // ← 404, not 405

The comment says 405; the code correctly returns 404 (matching TestS3Handler_DescribeBucket_SubpathReturns404). CodeRabbit flagged this in the first review pass; it wasn't among the 6 tracked issues and was not touched in this commit. The code behaviour is right but the comment misleads the next reader. The empty message string ("") also departs from the convention used at line 171 ("bucket does not exist").

Fix this →


Summary

The 6 tracked fixes are all correct and well-implemented. Two items remain open:

  1. Missing test for handleList + ErrBucketsForbidden (per CLAUDE.md convention and the CodeRabbit suggestion that accompanied Issue 2's code fix).
  2. Stale "405" comment + empty message string in the subpath 404 path (cosmetic, pre-existing from the first commit).

Either of these could be addressed in a follow-up commit before merge, or documented as known items for slice 2. The rest of the PR is clean and ready to land.

Two residual nits from claude review #2 on 2636173:

1) Missing regression test for the new handleList ErrBucketsForbidden
   → 403 mapping. Added TestS3Handler_ListBuckets_ForbiddenReturns403
   mirroring TestS3Handler_DescribeBucket_ForbiddenReturns403 — both
   read endpoints are now symmetric in coverage, locking down the
   slice-2 role-gate contract.

2) Stale comment said "Reject it with 405" but the code (and the
   pinned TestS3Handler_DescribeBucket_SubpathReturns404 test)
   returns 404 — the comment was inherited from an earlier draft
   when 405 was the intended status. Updated the comment to match
   the code, and replaced the empty error-message string with
   "no admin S3 handler is registered for this path" so the
   response shape matches the convention used elsewhere in the
   file (CodeRabbit minor + claude residual nit).
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 26, 2026

@claude review

1bbd1da addresses both residual nits from your previous review:

  1. Missing test (Issue 2 follow-up): Added TestS3Handler_ListBuckets_ForbiddenReturns403 mirroring the describe-side test. Both read paths now have symmetric coverage of the slice-2 role-gate contract.
  2. Stale 405 comment (CodeRabbit minor): updated to say 404 (matches the code + the pinned TestS3Handler_DescribeBucket_SubpathReturns404 test). Replaced the empty error message with "no admin S3 handler is registered for this path" to match the convention elsewhere in the file.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 1m 23s —— View job


Code Review — follow-up pass on commit 1bbd1da

  • Gather context
  • Read s3_handler.go — verify stale 405 comment fix
  • Read s3_handler_test.go — verify TestS3Handler_ListBuckets_ForbiddenReturns403
  • Final review pass

Both residual nits verified and clean. Nothing new.

Fix 1 — Stale 405 comment (internal/admin/s3_handler.go:87-96) ✅

The comment now correctly says "404", references TestS3Handler_DescribeBucket_SubpathReturns404, and includes a one-line note about CodeRabbit catching the original error — helpful provenance for future readers. The writeJSONError call now passes "no admin S3 handler is registered for this path" which matches the convention throughout the file. Comment and code are now in sync.

Fix 2 — Missing list-path forbidden test (internal/admin/s3_handler_test.go:194-202) ✅

TestS3Handler_ListBuckets_ForbiddenReturns403 is structurally symmetric with TestS3Handler_DescribeBucket_ForbiddenReturns403:

  • Uses listErr: ErrBucketsForbidden to trigger the sentinel path
  • Asserts http.StatusForbidden (403)
  • Asserts "forbidden" in the response body (writeJSONError encodes code as the error field, so "forbidden" lands in the JSON)

The docstring accurately describes the future role-gate context and the regression it's pinning. No gap between test intent and implementation.


Overall verdict: All 6 originally tracked issues and both residual nits are resolved. The PR is clean — logic, tests, comments, and error contracts are all consistent. Ready to merge.

@bootjp bootjp merged commit 12e8c11 into main Apr 26, 2026
9 checks passed
@bootjp bootjp deleted the feat/admin-s3-readonly branch April 26, 2026 11:07
bootjp added a commit that referenced this pull request Apr 26, 2026
Replaces PR #659, which conflicted heavily after main moved (PR #649
squashed; PR #658 added S3 admin endpoints; the Approximate counters
implementation now lives directly in adapter/sqs_catalog.go).

This PR:

Backend (adapter/sqs_admin.go + internal/admin/sqs_handler.go):
- SQSServer.AdminListQueues / AdminDescribeQueue / AdminDeleteQueue
  are SigV4-bypass entrypoints, mirroring the AdminListTables /
  AdminListBuckets pattern.
- AdminDescribeQueue uses the existing scanApproxCounters from
  sqs_catalog.go (already on main) so the admin path returns the
  same Visible / NotVisible / Delayed numbers as
  GetQueueAttributes("All") would, taken at one snapshot read TS.
- sqsQueuesBridge in main_admin.go re-shapes
  adapter.AdminQueueSummary into admin.QueueSummary, keeping
  internal/admin free of the heavy adapter dependency tree —
  same pattern as dynamoTablesBridge / s3BucketsBridge.
- admin.QueuesSource is opt-in; deployments that don't run
  --sqsAddress leave /admin/api/v1/sqs/* off the wire and the
  SPA renders a soft "endpoint pending" notice on the 404.
- Role re-evaluation against the live RoleStore on DELETE so a
  downgraded key cannot keep mutating with a still-valid JWT.
- apiRouteTable.dispatch refactored: resourceHandlerFor extracted
  so the dispatcher stays under cyclop=10 as new resources land
  (Dynamo, S3, SQS, future).

Frontend (web/admin/src/pages/SqsList.tsx, SqsDetail.tsx):
- /sqs queue list with refresh + per-row link to detail.
- /sqs/:name detail showing FIFO badge, counters card (Visible /
  In-flight / Delayed), raw attributes table, and a Delete
  confirmation Modal gated by RequireFullAccess.
- api/client.ts gains listQueues / describeQueue / deleteQueue
  with the same AbortSignal pattern used for cluster / dynamo /
  s3 reads.
- Layout nav adds an SQS tab between DynamoDB and S3.

Out of scope (recorded in the SQS partial design doc §16.2):
- PurgeQueue from the SPA. Underlying purgeQueueWithRetry is on
  main; the admin entrypoint is a trivial follow-up.
- Send / Peek / CreateQueue from the SPA. Each needs its own
  adapter entrypoint and form UX; deferred to keep this PR
  focused.

Verified with go build ./..., go test -race ./internal/admin/...,
go test -race -run TestSQS ./adapter/, go test -run TestStartAdmin .,
golangci-lint run ./adapter/... ./internal/admin/... ./...
(0 issues, no //nolint), and cd web/admin && npm run build.
bootjp added a commit that referenced this pull request Apr 26, 2026
Latest claude review on #675 flagged that lines 3 and 6 still
described P2 as fully shipped:

  Line 3: "P1, **P2**, and P3 of the implementation plan have shipped"
  Line 6: "after P1 + **P2** + P3 landed"

But the status table I added in 7617daf correctly marks P2 as
🟡 partial (read-only slice 1 in #658 landed; write paths #669
and #673 still in flight). The header prose contradicting the
table is exactly the kind of inconsistency that triggers the
rename to `_partial_` in the first place — readers skimming the
top of the doc would conclude P2 was done.

Rewrote the Status line and the parenthetical on Last updated to
match the table, calling out P1 + P3 as fully shipped, P2 as the
read-only slice only, and P4 as TLS/role/CSRF live with the
operator doc landing now. Pointed readers at the per-phase status
table for the breakdown.

No other changes; the table itself was already correct.
bootjp added a commit that referenced this pull request Apr 26, 2026
Per `docs/design/README.md`'s lifecycle convention, the admin dashboard
design doc is now "partial":

- **P1** (DynamoDB CRUD + AdminForward) — shipped via #634, #635, #644,
#648
- **P2** (S3 buckets list/create/delete/ACL + DescribeTable) — shipped
via #658, with #669 + #673 in flight
- **P3** (React SPA + embed) — shipped via #649, #650
- **P4** (TLS / role / CSRF / operator docs) — TLS, role, CSRF are
already live in P1; operator docs in #674

Independent of the in-flight slice 2 PRs (#669/#673) and the docs PR
(#674) — this rename only reflects what is already on main today, plus
an "Implementation status" table mapping each phase to the PR it landed
in.

## What this PR changes

- `git mv` the design doc from `2026_04_24_proposed_admin_dashboard.md`
to `2026_04_24_partial_admin_dashboard.md` so its history follows
- Add an "Implementation status" header table indexing each phase to the
PRs that landed it
- List the outstanding open items so future readers know what is still
owed against the original proposal:
- AdminForward acceptance criterion 5 (rolling-upgrade compat flag) —
deferred
  - S3 object browser — explicitly out of scope per Section 2 Non-goals
  - TLS cert hot-reload — restart-to-rotate is the documented model

When the rolling-upgrade flag lands, the doc gets renamed once more to
`2026_04_24_implemented_admin_dashboard.md` per the README's lifecycle
convention.
bootjp added a commit that referenced this pull request Apr 26, 2026
P2 slice 2b: a follower-side S3 admin write (POST /buckets, PUT
/buckets/{name}/acl, DELETE /buckets/{name}) now hands off to the leader
transparently, completing the same end-to-end forwarding contract Dynamo
writes received in #644 + #648.

**Stacked on #669 (P2 slice 2a).** Once #669 merges, this rebases
cleanly onto main.

## Summary

- **Proto** — three new `ADMIN_OP` enum values (`CREATE_BUCKET` /
`DELETE_BUCKET` / `PUT_BUCKET_ACL`) appended after the Dynamo block so
existing wire integers stay stable. Regenerated with the pinned protoc
29.3 / protoc-gen-go 1.36.11 / protoc-gen-go-grpc 1.6.1.
- **Leader-side** — `ForwardServer.WithBucketsSource` lets deployments
wire the S3 dispatcher optionally; Dynamo-only builds keep it nil and
the new operations return 501. Three new dispatch arms mirror the
leader-direct HTTP path's payload contract (NUL-byte rejection, 64 KiB
limit, `DisallowUnknownFields`, trailing-token rejection, slash-in-name
rejection). `forwardBucketsErrorResponse` mirrors
`forwardErrorResponse`: `ErrBucketsForbidden` / `NotLeader` / `NotFound`
/ `AlreadyExists` / `NotEmpty` + `*ValidationError` each map to the same
HTTP status `writeBucketsError` produces, so forwarded and leader-direct
responses are byte-identical from the SPA's view.
- **Follower-side** — `LeaderForwarder` interface gains
`ForwardCreateBucket` / `ForwardDeleteBucket` / `ForwardPutBucketAcl`.
`ForwardPutBucketAcl` carries both the bucket name (URL path) and the
new ACL (request body) in one JSON payload — same approach
`ForwardDeleteTable` takes for the table name.
- **Handler integration** — `S3Handler.WithLeaderForwarder` +
`tryForwardCreateBucket` / `tryForwardPutBucketAcl` /
`tryForwardDeleteBucket` gated on `ErrBucketsNotLeader && forwarder !=
nil`. `writeForwardResult` / `writeForwardFailure` mirror the Dynamo
handler's pattern (nosniff + Cache-Control:no-store + Retry-After:1 on
503).
- **Wiring** — `runtimeServerRunner.start()` now creates
`*adapter.S3Server` before `startRaftServers` so the leader-side
`ForwardServer` registration sees both adapters.
`adminForwardServerDeps.buckets` is wired through
`registerAdminForwardServer`. `ServerDeps.Forwarder` now plumbs into
`S3Handler` too.

## What is NOT in this PR

- Rolling-upgrade compatibility flag (criterion 5) — still deferred
behind a cluster-version bump.

## Test plan

- [x] `go build ./...`
- [x] `go vet ./...`
- [x] `golangci-lint run` (admin + main + adapter + proto: 0 issues)
- [x] `go test ./internal/admin/ -count=1 -race` — full admin suite
passes (19 new tests across forward_server / forward_client / s3_handler
integration)
- [x] `go test . -count=1 -race` — main package passes
- [x] Targeted adapter tests pass
- [ ] Full `./adapter/` times out due to a pre-existing flake (verified
earlier on PRs #648 / #658 / #669 — unrelated to this branch)
- [ ] End-to-end smoke against a 3-node cluster — manual exercise after
#669 + this land

## Acceptance criteria coverage (Section 3.3.2 + 4.1)

| # | Criterion | Status |
|---|---|---|
| 1 | Leader direct write | ✓ for S3 (#669 / slice 2a) |
| 2 | Follower forwards transparently | ✅ for S3 (this PR) |
| 3 | Election-period 503 + retry | ✅ for S3 (this PR — same fallback as
Dynamo) |
| 4 | Leader demotes stale full role | ✅ for S3 (existing
`validatePrincipal` runs before bucket dispatch) |
| 5 | Rolling-upgrade compat flag | ⏳ deferred (cluster-version bump) |
| 6 | `forwarded_from` in audit log | ✅ for S3 (this PR — leader-side
audit log carries it) |

## Self-review (5 lenses)

1. **Data loss**: No FSM / Raft / Pebble path changes. The leader-side
dispatcher reuses the same `BucketsSource` interface that the
leader-direct HTTP path uses; the only new code path is the JSON
envelope + principal re-validation.
2. **Concurrency**: New methods on `gRPCForwardClient` reuse the
existing `forward()` helper, so connection-cache reuse and
`ErrLeaderUnavailable` signalling behave identically across resource
types. The `runtimeServerRunner.start()` reorder is safe — each adapter
listens on its own address and the raft TCP listeners are independent.
3. **Performance**: One additional gRPC round-trip on follower-side
admin writes; identical to Dynamo's path.
4. **Data consistency**: Leader-side dispatcher applies the same strict
body decode the leader-direct HTTP path does (NUL-byte rejection,
trailing-token rejection, slash-in-name rejection).
`forwardBucketsErrorResponse` produces byte-identical envelopes to
`writeBucketsError` so forwarded and leader-direct responses are
indistinguishable.
5. **Test coverage**: 19 new tests — 9 forward-server (happy paths, 501
no-source, bad-JSON, already-exists, not-empty, slash-in-name,
missing-acl, payload-too-large sweep), 4 forward-client (3 happy-path
payload shapes + ErrLeaderUnavailable for all three), 6
handler-integration (forwarded create/delete/put-acl happy +
leader-unavailable + transport-error + 3-axis non-NotLeader gate sweep).
bootjp added a commit that referenced this pull request Apr 26, 2026
)

## Summary

**Replaces #659**, which has unresolvable conflicts now that main has
moved on (PR #649 squashed into main; PR #658 added the S3 admin
endpoints; the Approximate counters implementation now lives directly in
`adapter/sqs_catalog.go` via `scanApproxCounters`). Rather than a
multi-day rebase, this PR re-applies the unique SQS admin code on a
fresh branch off current main.

What survived from #659:
- `adapter/sqs_admin.go` — `SQSServer.AdminListQueues` /
`AdminDescribeQueue` / `AdminDeleteQueue` (SigV4-bypass entrypoints,
same shape as `AdminListTables` / `AdminListBuckets`).
- `internal/admin/sqs_handler.go` — HTTP handler for
`/admin/api/v1/sqs/queues{,/{name}}` with role re-evaluation on DELETE.
- `web/admin/src/pages/SqsList.tsx` / `SqsDetail.tsx` — SPA pages for
the queues view + delete confirmation.

What changed during the re-apply:
- `AdminQueueCounters` is now `int64` (matches `sqsApproxCounters` from
main; bridge does no width conversion).
- `AdminDescribeQueue` calls main's `scanApproxCounters` instead of the
duplicate `computeApproxCounters` from the old branch — same numeric
output, single implementation.
- Dropped the `CountersTruncated` field; main's counter type doesn't
expose truncation. SPA's "truncated" pill came out with it.
- `apiRouteTable.dispatch` refactored to extract `resourceHandlerFor` so
the dispatcher stays under cyclop=10 as new resources land.

## Backend

- Re-evaluates the principal's role against the live `MapRoleStore` on
every `DELETE` so a downgraded key cannot keep mutating with a
still-valid JWT (Codex P1 pattern from earlier admin PRs).
- `admin.QueuesSource` is **opt-in**: deployments without `--sqsAddress`
leave `/admin/api/v1/sqs/*` off the wire; the SPA renders a soft
"endpoint pending" notice on the 404, mirroring the Tables / Buckets
`nil` contract.
- The bridge in `main_admin.go` (`sqsQueuesBridge`,
`convertAdminQueueSummary`, `translateAdminQueuesError`) keeps
`internal/admin` free of the heavy adapter dependency tree, same
architectural pattern as Dynamo and S3.

## Frontend

- **/sqs** queue list with refresh + per-row link to detail.
- **/sqs/:name** detail showing FIFO badge, counters card (Visible /
In-flight / Delayed), raw attributes table, and a Delete confirmation
`Modal` gated by `RequireFullAccess`.
- `api/client.ts` gains `listQueues` / `describeQueue` / `deleteQueue`
with the same `AbortSignal` pattern used for `cluster` / `dynamo` / `s3`
reads.
- Layout nav adds an SQS tab between DynamoDB and S3.

## Out of scope (recorded in
`docs/design/2026_04_24_proposed_sqs_compatible_adapter.md` Section 14,
deferred per the SQS partial doc §16.2)

- **PurgeQueue from the SPA** — the underlying `purgeQueueWithRetry`
adapter method is on main; the admin entrypoint is a trivial follow-up.
- **Send / Peek / CreateQueue from the SPA** — each needs its own
SigV4-bypass adapter entrypoint and form UX; deferred to keep this PR
focused.

## Test plan

- [x] `go build ./...` — clean
- [x] `go test -race ./internal/admin/...` — passes
- [x] `go test -race -run TestSQS ./adapter/` — passes
- [x] `go test -run TestStartAdmin .` — passes
- [x] `golangci-lint run ./adapter/... ./internal/admin/... ./...` — `0
issues.`, no `//nolint`
- [x] `cd web/admin && npm run build` — 49 modules, 199 KB JS / 61 KB
gzip + 14.7 KB CSS
- [ ] Manual smoke (after PR lands): start a node with `--sqsAddress
:4566 --adminEnabled --adminAllowInsecureDevCookie`, create a queue,
send a few messages, hit `/admin/sqs/<name>` → counters match
`GetQueueAttributes("All")`, Delete dialog returns to list.

## Self-review (5 lenses)

1. **Data loss** — Delete reuses the existing `deleteQueueWithRetry` OCC
path; counters are read-only. No new write paths.
2. **Concurrency** — Per-request leader check on Delete; counters scan
uses one snapshot read TS.
3. **Performance** — Counters bounded by main's existing
`sqsApproxCounterScanLimit`; admin reads are cheap point lookups + one
bounded scan.
4. **Data consistency** — `AdminDescribeQueue` and SigV4
`GetQueueAttributes` both call `scanApproxCounters` at a fresh
`nextTxnReadTS`, so a single point in time produces the same counters
via either surface.
5. **Test coverage** — Existing admin / SQS race suites stay green via
the new `nil` Queues argument added to `startAdminServer` call sites;
the new bridge is exercised by the cross-package build itself.

## Stacking

This PR is **independent** — branched from current `main` (which has the
merged versions of #649 / #658 / #650 / counter implementation). Closing
#659 in favour of this clean rewrite.
bootjp added a commit that referenced this pull request Apr 26, 2026
P2 slice 2a of
[docs/design/2026_04_24_proposed_admin_dashboard.md](https://github.com/bootjp/elastickv/blob/main/docs/design/2026_04_24_proposed_admin_dashboard.md).
Ships the S3 admin write endpoints so the SPA's S3 modals stop receiving
405. Slice 2b will plumb AdminForward so a follower can hand these
writes off to the leader transparently.

## Summary

- **`*adapter.S3Server.AdminCreateBucket` / `AdminPutBucketAcl` /
`AdminDeleteBucket`** — SigV4-bypass write methods with three in-method
guards: principal must be `AdminRoleFull`, the local node must be the
verified S3 leader, and bucket-name / ACL must pass the existing
validators. `AdminCreateBucket` reuses the same atomic bucket-meta + ACL
+ generation-key txn the SigV4 path does — no new code path through the
storage layer. `AdminDeleteBucket` rejects non-empty buckets (the
dashboard cannot force recursive delete by design).
- **`internal/admin` write surface** — `BucketsSource` gains the three
write methods + `CreateBucketRequest` / `PutBucketACLRequest` types with
the documented JSON shapes. `S3Handler.serveCollection` +
`servePerBucket` route POST/PUT/DELETE through dedicated handlers with
`principalForWrite` re-validating the role on every request against the
live `MapRoleStore`.
- **Strict body decoder** — `decodeAdminS3JSONBody` is generic over the
request type, applies `DisallowUnknownFields`, rejects NUL bytes,
rejects trailing tokens, and caps at 64 KiB (matches design 4.4). Used
by both POST and PUT.
- **`writeBucketsError`** translates the source-side sentinels into the
design's HTTP statuses: 403 forbidden / 503 + Retry-After:1
leader_unavailable / 404 not_found / 409 already_exists / 409
bucket_not_empty / 400 invalid_request via `*ValidationError`.
- **Bridge** — `bucketsBridge` gains write methods running through
`translateAdminBucketsError`, mirroring `translateAdminTablesError` on
the Dynamo side. Leader-churn sentinels from the kv coordinator route to
`admin.ErrBucketsNotLeader` so the SPA's retry contract stays intact.

## What is NOT in this PR

- AdminForward integration for S3 admin writes — slice 2b.
- Rolling-upgrade compatibility flag (criterion 5) — still deferred
behind a cluster-version bump.

## Test plan

- [x] `go build ./...`
- [x] `go vet ./...`
- [x] `golangci-lint run` (admin + main + adapter packages: 0 issues)
- [x] `go test ./internal/admin/ -count=1 -race` — 19 new handler tests
pass
- [x] `go test . -count=1 -race` — main package passes
- [x] `go test -run "TestS3Server_Admin" ./adapter/ -count=1 -race` — 10
new adapter tests pass
- [ ] Full `go test ./adapter/` times out at 120s due to a pre-existing
flake (verified earlier on PRs #648 / #658 against `main` — unrelated to
this branch)
- [ ] End-to-end smoke against a 3-node cluster — slice 2b first, then a
manual exercise

## Acceptance criteria coverage (Section 4.1)

| Endpoint | This PR |
|---|---|
| `GET /admin/api/v1/s3/buckets` | ✓ (#658) |
| `GET /admin/api/v1/s3/buckets/{name}` | ✓ (#658) |
| `POST /admin/api/v1/s3/buckets` | ✅ |
| `PUT /admin/api/v1/s3/buckets/{name}/acl` | ✅ |
| `DELETE /admin/api/v1/s3/buckets/{name}` | ✅ |

## Self-review (5 lenses)

1. **Data loss**: `AdminCreateBucket` reuses `s.coordinator.Dispatch`
with the same `OperationGroup` shape as the SigV4 path — bucket meta +
generation key in one txn. No new FSM / Pebble / Raft path.
`AdminDeleteBucket`'s "must be empty" guard is a SnapshotAt scan + size
check identical to the SigV4 path.
2. **Concurrency**: Writes go through `retryS3Mutation` which already
handles transient mid-dispatch leader churn. The leader check is
`isVerifiedS3Leader` — same primitive the SigV4 path uses. Role gate is
re-evaluated against the live `MapRoleStore` on every request, so a key
downgrade picked up between login and write is enforced immediately.
3. **Performance**: One additional load-bucket-meta read on PutACL /
Delete. No hot-path changes; admin writes are operator-rate, not
data-plane-rate.
4. **Data consistency**: `AdminCreateBucket` writes (BucketMetaKey,
BucketGenerationKey) atomically. `AdminPutBucketAcl` mutates only
`meta.Acl` and re-encodes the entire BucketMeta — generation is
preserved so existing object references stay valid. `AdminDeleteBucket`
removes only BucketMetaKey (BucketGenerationKey is left behind, matching
the SigV4 path's behaviour — a future re-create gets a fresh
generation).
5. **Test coverage**: 29 new tests (19 admin-package + 10 adapter-level)
covering happy paths, role gates, leader checks, validation rejections,
all four sentinel error mappings, and the cross-method missing-principal
401. The existing `TestS3Handler_DescribeBucket_SubpathReturns404` was
superseded by two more precise tests now that `/acl` is a real
sub-resource.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* S3 bucket admin endpoints: create, update ACL, and delete with
enforced write authorization, input validation, leader-forwarding, and
clear HTTP error mappings.
* **Documentation**
* New admin docs covering dashboard config, TLS/role semantics, audit
logging, and troubleshooting for admin operations.
* **Tests**
* Extensive end-to-end and unit tests for bucket lifecycle, forwarding,
auth/validation, and error scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
bootjp added a commit that referenced this pull request Apr 27, 2026
## Summary

State-refresh of `docs/design/2026_04_24_partial_admin_dashboard.md` now
that all four phases of the original P1–P4 plan have shipped.

| Phase | Was | Now |
|---|---|---|
| P1 | shipped | shipped (no change) |
| P2 | partial — read-only #658 only | **shipped** (+ #669 writes + #673
forwarding) |
| P3 | shipped | shipped (no change) |
| P4 | mostly | **shipped** (+ operator doc #674, deployment runbook
#669, script wiring #669+#678) |

Also added the **AdminDeleteBucket TOCTOU** to Outstanding open items —
coderabbitai flagged it during PR #669 review (pre-existing race,
`s3.go:deleteBucket` inherits the same shape; recorded as a code comment
on `AdminDeleteBucket` for the future fix).

Doc stays at `_partial_` rather than promoting to `_implemented_`
because two outstanding items remain: AdminForward acceptance criterion
5 (rolling-upgrade flag, explicitly deferred at design time) and the new
TOCTOU. Updated the closing rename trigger to cover both.

## Test plan

- [x] No design changes — this is a state refresh
- [x] Status table entries cross-checked against the merge commits on
main
- [x] AdminDeleteBucket TOCTOU description matches the code comment in
adapter/s3_admin.go and the operator-side workaround in
docs/admin_deployment.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant