
admin: P1 foundation (auth, router, cluster, listener) — no writes yet#623

Open
bootjp wants to merge 14 commits into main from feat/admin-dashboard-impl

@bootjp
Owner

@bootjp bootjp commented Apr 24, 2026

Summary

First PR toward the admin dashboard designed in docs/design/2026_04_24_proposed_admin_dashboard.md (merged as #611). Introduces the read-only foundation: listener wiring, auth, router, middleware, and cluster info / healthz endpoints. No write endpoints are included; per the design doc's P1 definition of done (DoD), they ship together with AdminForward and the 3.3.2 acceptance criteria in a follow-up.

What is in scope

  • internal/admin/ package: config validation, JWT (HS256 + 2-key rotation), strict-prefix router, middleware chain (body limit, session auth, role gate, CSRF double-submit, audit slog), login/logout with per-IP rate limiter, cluster + healthz handlers, Server facade.
  • main_admin.go: flag wiring, config-to-admin.Config translation, TLS and loopback enforcement, errgroup lifecycle registration.
  • Full unit tests and two in-process integration tests (plaintext listener + self-signed TLS).

What is NOT in scope (deferred to follow-up PRs)

  • AdminForward internal gRPC RPC and follower→leader forwarding (Section 3.3.2 acceptance criteria 1–6).
  • Adapter internal entrypoints taking AuthPrincipal (DynamoDB CreateTable/DeleteTable, S3 CreateBucket/DeleteBucket/PutBucketAcl).
  • Any write endpoint on the admin surface.
  • React SPA + go:embed (design P3).

This keeps the first PR focused and reviewable, and the DoD is still respected: no write endpoint ships until its acceptance criteria are green.

Security posture

  • Admin is off by default (-adminEnabled=false), default bind is loopback.
  • Non-loopback bind without TLS is a hard startup failure unless the explicit opt-out flag is set.
  • Session cookie: HttpOnly + Secure + SameSite=Strict + Path=/admin + Max-Age=3600.
  • CSRF: double-submit cookie; localStorage is never used.
  • Role overlap between read_only_access_keys and full_access_keys is a hard startup failure (no silent last-writer-wins).
  • JWT signing key is cluster-shared and rotatable (primary + previous); missing key with admin enabled is a hard startup failure.
  • Login is rate-limited to 5 req/min per IP and uses constant-time secret comparison.
  • POST/PUT bodies are capped at 64 KiB via http.MaxBytesReader.
  • Every state-changing admin request is logged with admin_audit slog attributes.

Test plan

  • go test -race ./internal/admin/... . — green
  • golangci-lint run ./... --timeout=5m — 0 issues
  • In-process boot of admin listener over plaintext; /admin/healthz returns ok
  • In-process boot with self-signed TLS; /admin/healthz over HTTPS returns ok
  • Invalid config (non-loopback without TLS, missing signing key, duplicate role assignment, wrong-length base64 key) rejected at startup with a descriptive error
  • Login happy path issues both cookies with the hardened attributes verified via regex
  • Rate-limiter test: 6th login attempt from the same IP returns 429 + Retry-After: 60; different IPs are independent
  • JWT verifier accepts tokens signed by the previous key during rotation and rejects them after rotation completes
  • Follow-up PR: AdminForward + write endpoints + acceptance-criteria test matrix

Related

Lays down the read-only slice of the admin dashboard per
docs/design/2026_04_24_proposed_admin_dashboard.md. No write endpoints
yet — the 3.3.2 acceptance criteria still block those, and they ship
together with AdminForward in a follow-up.

internal/admin:
  - Config validation: hard startup failure on missing signing key,
    non-loopback without TLS, duplicate role assignments, wrong-length
    HS256 keys.
  - JWT signer/verifier: HS256, 1h TTL, accepts primary + previous key
    for rotation, rejects expired / future-dated / unknown-role tokens.
  - Router with strict prefix ordering so /admin/api/v1/* and
    /admin/healthz are never shadowed by the SPA fallback.
  - Middleware: 64 KiB BodyLimit + MaxBytesError translation,
    SessionAuth, RequireWriteRole, CSRF double-submit, slog-backed
    admin_audit logger for write paths.
  - Login / logout: Content-Type enforcement and a per-IP 5/min rate limit,
    constant-time credential comparison, HttpOnly+Secure+SameSite=Strict
    session cookie + separate SPA-readable CSRF cookie on success,
    explicit 403 when a valid access key is not in the admin role table.
  - Cluster + healthz handlers behind a small ClusterInfoSource
    interface so the handler can be tested without main.go types.
  - Server ties everything together; NewServer returns a hard error
    on missing dependencies so misuse is caught at startup.

main_admin.go + flags:
  - New admin-* CLI flags mirroring the Config surface.
  - startAdminServer validates config, builds the signer/verifier,
    loads the existing S3 credential map as the admin credential
    store, and attaches the server lifecycle to errgroup with a clean
    shutdown path.
  - ClusterInfoSource reads leader/state from raftGroupRuntime.

Tests: happy paths + every failure branch across config, JWT,
router, middleware, auth, cluster, and an in-process main_admin
integration test that boots the real listener over plaintext and
TLS. golangci-lint clean. go test ./... green.
@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 45 minutes and 54 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 45 minutes and 54 seconds.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 959800b2-e81c-4fde-831e-f03f9f08bc3f

📥 Commits

Reviewing files that changed from the base of the PR and between 5177c5e and a4297f8.

📒 Files selected for processing (26)
  • internal/admin/auth_audit_test.go
  • internal/admin/auth_handler.go
  • internal/admin/auth_handler_test.go
  • internal/admin/cluster_handler.go
  • internal/admin/cluster_handler_test.go
  • internal/admin/config.go
  • internal/admin/config_test.go
  • internal/admin/jwt.go
  • internal/admin/jwt_test.go
  • internal/admin/logout_csrf_test.go
  • internal/admin/middleware.go
  • internal/admin/middleware_test.go
  • internal/admin/principal.go
  • internal/admin/ratelimit.go
  • internal/admin/ratelimit_test.go
  • internal/admin/router.go
  • internal/admin/router_test.go
  • internal/admin/server.go
  • internal/admin/server_nil_creds_test.go
  • internal/admin/server_test.go
  • internal/raftengine/etcd/wal_purge.go
  • internal/raftengine/etcd/wal_purge_test.go
  • internal/raftengine/etcd/wal_store_test.go
  • main.go
  • main_admin.go
  • main_admin_test.go

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new admin HTTP service for the application, including authentication via JWTs, role-based access control, rate limiting, and an audit logging middleware. The implementation includes a router for managing API endpoints, static assets, and an SPA fallback. My review identified several security and reliability concerns: the cluster handler exposes raw internal errors, the credential comparison logic is vulnerable to timing attacks, the audit middleware fails to capture actors for login/logout events, the secret key trimming may cause authentication inconsistencies, and the rate limiter's cleanup logic does not sufficiently prevent potential memory exhaustion from unbounded map growth.

Comment thread internal/admin/cluster_handler.go Outdated
}
info, err := h.source.Describe(r.Context())
if err != nil {
writeJSONError(w, http.StatusInternalServerError, "cluster_describe_failed", err.Error())

Severity: medium (security)

Exposing the raw error message from the cluster info source in the HTTP response can lead to information leakage about the internal state or configuration of the system. It is safer to log the detailed error for internal troubleshooting and return a generic error message to the client.

Comment thread internal/admin/auth_handler.go Outdated
Comment on lines +182 to +192
dummy := strings.Repeat("x", dummySecretLen)
_ = subtle.ConstantTimeCompare([]byte(req.SecretKey), []byte(dummy))
writeJSONError(w, http.StatusUnauthorized, "invalid_credentials",
"access_key or secret_key is invalid")
return AuthPrincipal{}, false
}
if subtle.ConstantTimeCompare([]byte(req.SecretKey), []byte(expected)) != 1 {
writeJSONError(w, http.StatusUnauthorized, "invalid_credentials",
"access_key or secret_key is invalid")
return AuthPrincipal{}, false
}

Severity: medium (security)

The credential comparison logic is susceptible to a timing attack that leaks the length of the secret key. subtle.ConstantTimeCompare returns 0 immediately if the lengths of the input slices differ, allowing an attacker to determine the length of the expected secret by measuring response times. To mitigate this, you should hash both the provided secret and the expected (or dummy) secret using a fixed-length cryptographic hash (e.g., SHA-256) before performing the constant-time comparison.

Comment on lines +228 to +237
principal, _ := PrincipalFromContext(r.Context())
logger.LogAttrs(r.Context(), slog.LevelInfo, "admin_audit",
slog.String("actor", principal.AccessKey),
slog.String("role", string(principal.Role)),
slog.String("method", r.Method),
slog.String("path", r.URL.Path),
slog.Int("status", rec.status),
slog.String("remote", r.RemoteAddr),
slog.Duration("duration", time.Since(start)),
)

Severity: medium

The Audit middleware does not capture the actor (AccessKey) for login and logout requests because these endpoints do not use the SessionAuth middleware to populate the principal in the request context. Consequently, the audit logs for these critical security events will have an empty actor field. Consider updating the login handler to set the principal in the context upon successful authentication (or even for failed attempts) so the middleware can record it, or implement specific auditing within the authentication handlers.

Comment thread internal/admin/auth_handler.go Outdated
Comment on lines +162 to +163
req.AccessKey = strings.TrimSpace(req.AccessKey)
req.SecretKey = strings.TrimSpace(req.SecretKey)

Severity: medium

Trimming whitespace from the SecretKey is generally discouraged as secrets should be treated as opaque credentials. If the underlying S3 or DynamoDB adapters do not perform this trimming, it creates an inconsistency where a credential might be accepted by the admin service but rejected by other protocol adapters (or vice versa), leading to potential security discrepancies and difficult-to-debug authentication issues.

Comment thread internal/admin/ratelimit.go Outdated
Comment on lines +57 to +63
if len(rl.entries) > rateLimiterMaxEntries {
for k, v := range rl.entries {
if now.Sub(v.windowStart) > rl.window {
delete(rl.entries, k)
}
}
}

Severity: medium

The rate limiter's cleanup logic only removes expired entries and does not enforce a hard limit on the total number of entries in the entries map. An attacker using a large number of distinct IP addresses could cause the map to grow indefinitely, potentially leading to memory exhaustion (OOM). Following the general rules for this repository, you should apply a fixed bound to collections that can grow from external requests and reject new requests (or drop the oldest entries) when the bound is reached.

References
  1. To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections that can grow from external requests, such as pending configuration changes. Reject new requests when the bound is reached.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 535d55c2db


Comment thread main_admin.go
Comment on lines +55 to +57
staticCreds, err := loadS3StaticCredentials(*s3CredsFile)
if err != nil {
return errors.Wrapf(err, "load static credentials for admin listener")

P2: Skip credential-file loading when admin is disabled

startAdminFromFlags always parses --s3CredentialsFile before it knows whether the admin listener is enabled. In run(), this function is called unconditionally, so a bad or missing credentials file now aborts process startup even when --adminEnabled=false (and even if S3 is disabled). This turns a disabled feature into a hard dependency and can break existing deployments that never intended to use admin auth.


Comment thread internal/admin/cluster_handler.go Outdated
}
info, err := h.source.Describe(r.Context())
if err != nil {
writeJSONError(w, http.StatusInternalServerError, "cluster_describe_failed", err.Error())

P2: Avoid exposing backend error details in cluster responses

On Describe failures, the handler returns err.Error() directly to clients. That leaks internal implementation details (for example storage, raft, or node-local error text) through a public admin API response, and it contradicts the surrounding contract comment that says only an error code should be exposed. Return a generic message here and keep the detailed error in server logs.


Addresses the 7 findings on PR #623 from gemini-code-assist and
chatgpt-codex:

- cluster_handler: stop leaking the raft/store error text to clients;
  log the full error via slog and return a generic
  "cluster_describe_failed" body. Test updated to assert the sentinel
  string does not appear in the response.
- auth_handler credential compare: hash both sides with SHA-256 before
  subtle.ConstantTimeCompare so length-based timing side channels
  cannot leak the secret size. Keep the unknown-access-key timing
  parity with a fixed placeholder digest.
- auth_handler input handling: stop trimming SecretKey. Trimming was
  accepting inputs the SigV4 data-plane adapter would reject, creating
  a cross-protocol inconsistency; AccessKey (identifier) still trims.
- auth_handler audit: login/logout now emit their own admin_audit
  slog entries. The generic Audit middleware cannot see the claimed
  actor at that point in the chain, so the Audit wrapper is dropped
  from publicAuth and replaced with explicit logging that captures
  actor, claimed_actor, remote, and status. Logout best-effort
  decodes the session cookie via the Verifier so the audit line can
  name the caller even when the session has not yet been revoked.
- ratelimit: enforce a hard cap on distinct tracked IPs. Once the
  cap is reached we sweep expired windows first; if that still
  leaves the map full we refuse new entries instead of silently
  evicting a legitimate user's in-progress state. Adds dedicated
  tests for the cap and window-reclamation paths.
- main_admin: do not load --s3CredentialsFile when admin.enabled is
  false. A bad credentials file should not abort process startup
  when the admin listener is disabled.

Tests: new auth_audit_test.go covers actor recording on login
success / failure and logout with and without a session cookie,
plus a same-length/different-length timing smoke check.
golangci-lint run ./... --new-from-rev=origin/main --timeout=5m: 0 issues.
go test -race ./internal/admin/... . green.
Comment thread internal/admin/auth_handler.go Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5006462384


Comment thread internal/admin/config.go Outdated
Comment on lines +92 to +94
tlsConfigured := strings.TrimSpace(c.TLSCertFile) != "" && strings.TrimSpace(c.TLSKeyFile) != ""
if tlsConfigured || !addressRequiresTLS(strings.TrimSpace(c.Listen)) || c.AllowPlaintextNonLoopback {
return nil

P2: Reject partial admin TLS configuration

validateTLS only treats TLS as configured when both cert and key are non-empty, but it does not fail when exactly one is provided. In loopback mode (or with allow_plaintext_non_loopback=true), this path validates successfully and startup continues in plaintext, so a typo in either TLS flag silently downgrades transport security instead of failing fast. This is especially risky because operators may believe TLS is enabled when it is not.


- auth_handler secret compare: replace naked SHA-256 with HMAC-SHA256
  keyed by a per-process random value. The construction is still only
  used to normalise inputs to a fixed-width constant-time comparator;
  using a keyed MAC satisfies CodeQL static analysis and communicates
  the intent more precisely (timing-safe comparator, not persisted
  password hash). Deliberately NOT bcrypt / argon2: nothing is
  persisted, the secret arrives in plaintext over TLS, and the rate
  limiter already bounds online guessing.
- config.validateTLS: fail startup when admin.tls.cert_file and
  admin.tls.key_file are not both set together. A typo that leaves
  one empty would previously treat TLS as disabled silently; we now
  surface the misconfiguration at startup. Added unit test
  covering the two partial-config cases.
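The keyed-HMAC comparator described in the first bullet can be sketched as follows. HMAC-SHA256 under a per-process random key maps secrets of any length to 32-byte digests. `subtle.ConstantTimeCompare` then always sees equal-length inputs, and an unknown access key hashes a fixed placeholder so it does the same HMAC work as a known one. `compareKey`, `placeholderSecret`, `digest`, and `secretMatches` are hypothetical names, and the sketch also reflects the earlier decision not to trim the secret.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// compareKey is random per process and never persisted. It exists only to
// turn arbitrary-length secrets into fixed-width 32-byte digests.
var compareKey = func() []byte {
	k := make([]byte, 32)
	if _, err := rand.Read(k); err != nil {
		panic(err)
	}
	return k
}()

// placeholderSecret is hashed when the access key is unknown, so the
// unknown-key path does the same HMAC work as the known-key path.
const placeholderSecret = "placeholder-secret-for-unknown-access-keys"

func digest(s string) []byte {
	m := hmac.New(sha256.New, compareKey)
	m.Write([]byte(s))
	return m.Sum(nil)
}

// secretMatches compares the supplied secret exactly as sent (no trimming)
// against the stored one. Both digests are always 32 bytes, so
// subtle.ConstantTimeCompare never takes its length-mismatch early return.
func secretMatches(creds map[string]string, accessKey, secret string) bool {
	expected, found := creds[accessKey]
	if !found {
		expected = placeholderSecret
	}
	equal := subtle.ConstantTimeCompare(digest(secret), digest(expected)) == 1
	return found && equal
}

func main() {
	creds := map[string]string{"AKIDEXAMPLE": "wJalrXUtnFEMI"}
	fmt.Println(secretMatches(creds, "AKIDEXAMPLE", "wJalrXUtnFEMI"))   // true
	fmt.Println(secretMatches(creds, "AKIDEXAMPLE", "wJalrXUtnFEMI "))  // false: no trimming
	fmt.Println(secretMatches(creds, "AKIDUNKNOWN", placeholderSecret)) // false: unknown key
}
```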
@bootjp
Owner Author

bootjp commented Apr 24, 2026

@claude review

Contributor

Copilot AI left a comment


Pull request overview

Adds the initial (read-only) admin listener foundation to the elastickv main binary, including config validation, JWT session auth w/ key rotation, strict router + middleware chain, login/logout with rate limiting, and basic cluster/health endpoints—plus unit and in-process integration tests.

Changes:

  • Wire an optional admin HTTP listener into main.go via new flags and lifecycle management.
  • Introduce internal/admin/ server composition (router, middleware, auth, JWT, cluster + health endpoints).
  • Add unit tests across admin subsystems plus in-process listener tests (plaintext + TLS).

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

Summary per file:

  • main.go: Adds admin-related flags and starts admin listener during run().
  • main_admin.go: Implements flag→config translation, config checks, admin server construction, and listener lifecycle wiring.
  • main_admin_test.go: Adds integration-style tests for starting the admin listener and serving /admin/healthz (HTTP + HTTPS).
  • internal/admin/server.go: Composes admin server dependencies and builds the API handler mux + router.
  • internal/admin/server_test.go: End-to-end handler tests for login/session + cluster access + routing behavior.
  • internal/admin/router.go: Implements strict-prefix admin router for API/health/assets/SPA fallback.
  • internal/admin/router_test.go: Tests router dispatch order and SPA/static behavior.
  • internal/admin/middleware.go: Adds body-limit, session auth, role gate, CSRF, and audit middleware.
  • internal/admin/middleware_test.go: Tests middleware behavior (limit, auth context, CSRF, audit, timing-safe compare).
  • internal/admin/auth_handler.go: Implements login/logout endpoints, cookie issuance, rate limiting, and audit logging.
  • internal/admin/auth_handler_test.go: Tests auth handler (happy paths, cookie hardening, rate limits, logout).
  • internal/admin/auth_audit_test.go: Validates login/logout audit logging behavior and claimed-actor logging.
  • internal/admin/jwt.go: Implements HS256 JWT signing/verifying with key rotation support.
  • internal/admin/jwt_test.go: Tests JWT signing/verifying, rotation acceptance, and malformed cases.
  • internal/admin/config.go: Adds admin listener config, validation, signing key decoding, and role indexing.
  • internal/admin/config_test.go: Tests config validation (TLS requirements, key decoding/length, role overlaps).
  • internal/admin/cluster_handler.go: Adds /admin/api/v1/cluster handler with safe error surfacing and logging.
  • internal/admin/cluster_handler_test.go: Tests cluster handler responses and error sanitization.
  • internal/admin/ratelimit.go: Implements fixed-window per-IP rate limiter with hard cap on distinct IPs.
  • internal/admin/ratelimit_test.go: Tests rate limiter cap behavior and expiry sweeping.
  • internal/admin/principal.go: Defines Role and AuthPrincipal for authorization context.

Comment on lines +161 to +165
if !s.limiter.allow(clientIP(r)) {
w.Header().Set("Retry-After", "60")
writeJSONError(w, http.StatusTooManyRequests, "rate_limited",
"too many login attempts from this source; try again later")
return false
Copilot AI Apr 24, 2026


Retry-After is hard-coded to 60, but the limiter window is configurable via AuthServiceOpts.LoginWindow. If the window is changed (even in tests), the header will be incorrect; consider deriving Retry-After from the configured window duration (seconds, clamped to at least 1).

Comment thread internal/admin/auth_handler.go Outdated
Comment on lines +291 to +298
expires := time.Now().UTC().Add(s.sessionTTL)
http.SetCookie(w, s.buildCookie(sessionCookieName, token, true))
http.SetCookie(w, s.buildCookie(csrfCookieName, csrf, false))
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.WriteHeader(http.StatusOK)
_ = json.NewEncoder(w).Encode(loginResponse{Role: principal.Role, ExpiresAt: expires})
}
Copilot AI Apr 24, 2026


issueSession uses time.Now() to compute expires_at, while the JWT exp is based on the signer's injected clock. This can make the response’s expires_at disagree with the actual token expiry when a custom clock is used (and is generally inconsistent with the injected time abstraction). Consider using the same clock source used for signing (or plumb the AuthService clock through) when computing expires_at.

Comment thread internal/admin/config.go Outdated
return errors.WithStack(errors.Newf(
"admin.listen %q is not loopback but TLS is not configured;"+
" set admin.tls.cert_file + admin.tls.key_file, or explicitly pass"+
" --admin-allow-plaintext-non-loopback (strongly discouraged)",
Copilot AI Apr 24, 2026


The TLS validation error suggests passing --admin-allow-plaintext-non-loopback, but the actual flag wired in main.go is -adminAllowPlaintextNonLoopback. This mismatch makes the startup error harder to act on; update the message to reference the real flag name (or avoid flag names entirely and point at the config field).

Suggested change
" --admin-allow-plaintext-non-loopback (strongly discouraged)",
" -adminAllowPlaintextNonLoopback (strongly discouraged)",

Comment thread internal/admin/middleware.go Outdated
Comment on lines +44 to +78
// BodyLimit wraps the request body with http.MaxBytesReader and responds
// 413 when the client exceeds the cap. It also sets
// http.MaxBytesError-aware error translation so the handler does not need
// to distinguish ordinary IO failures from overflow.
func BodyLimit(limit int64) func(http.Handler) http.Handler {
if limit <= 0 {
limit = defaultBodyLimit
}
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Body != nil {
r.Body = http.MaxBytesReader(w, r.Body, limit)
}
next.ServeHTTP(bodyLimitResponseWriter{ResponseWriter: w}, r)
})
}
}

// bodyLimitResponseWriter is a minor adapter that lets a handler translate
// its own MaxBytesError into a consistent 413 without duplicating the
// plumbing. At the time of writing, each write handler can call
// r.ParseForm / json.Decode and on error call
// `if errors.As(err, &http.MaxBytesError{}) { ... }` manually; this
// wrapper just forces the header once per request.
type bodyLimitResponseWriter struct {
http.ResponseWriter
}

// WriteMaxBytesError is called by handlers that detected a MaxBytesError.
// It is a package-level helper rather than a method so the router error
// path keeps the same JSON shape as the rest.
func WriteMaxBytesError(w http.ResponseWriter) {
writeJSONError(w, http.StatusRequestEntityTooLarge, "payload_too_large",
"request body exceeds the 64 KiB admin limit")
}
Copilot AI Apr 24, 2026


BodyLimit claims it "responds 413" / provides "error translation", but it currently only wraps the body with http.MaxBytesReader and leaves translation to individual handlers. Also, bodyLimitResponseWriter is just a pass-through wrapper with no behavior, so it doesn’t achieve the described goal. Either implement centralized translation (e.g., by intercepting http.MaxBytesError handling) or simplify the comments/types to match the current behavior.

Comment thread internal/admin/middleware.go Outdated
Comment on lines +158 to +177
// Constant-time comparison: the values are user-supplied
// and we do not want to leak length differences.
if !constantTimeEq(cookie.Value, header) {
writeJSONError(w, http.StatusForbidden, "csrf_mismatch", "CSRF token mismatch")
return
}
next.ServeHTTP(w, r)
})
}
}

func constantTimeEq(a, b string) bool {
if len(a) != len(b) {
return false
}
var diff byte
for i := 0; i < len(a); i++ {
diff |= a[i] ^ b[i]
}
return diff == 0
Copilot AI Apr 24, 2026


The CSRF middleware comment says the comparison is constant-time and "does not leak length differences", but constantTimeEq returns early when lengths differ. Either adjust the comment (since token length isn’t sensitive) or make the comparison truly constant-time across lengths (e.g., compare fixed-length MACs or use subtle.ConstantTimeCompare on padded buffers).
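With fixed-width, server-minted tokens, an explicit length check plus `subtle.ConstantTimeCompare` is sufficient, because length reveals nothing secret. One detail worth encoding: `subtle.ConstantTimeCompare` reports two empty slices as equal, so an empty token must be rejected explicitly. The PR may already guarantee this by checking for cookie presence first. The 32-byte width comes from the follow-up commit; `newCSRFToken` and `csrfTokensMatch` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// newCSRFToken mints 32 random bytes, base64url-encoded without padding (43 chars).
func newCSRFToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// csrfTokensMatch compares the cookie value with the header value.
// The early return on length mismatch leaks nothing secret because every
// legitimate token has the same fixed length. The empty check matters:
// subtle.ConstantTimeCompare reports two empty slices as equal.
func csrfTokensMatch(cookieVal, headerVal string) bool {
	if cookieVal == "" || len(cookieVal) != len(headerVal) {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(cookieVal), []byte(headerVal)) == 1
}

func main() {
	tok, err := newCSRFToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(tok), csrfTokensMatch(tok, tok), csrfTokensMatch(tok, "forged"))
}
```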

- auth_handler: derive Retry-After from AuthService.loginWindow instead
  of hard-coding 60 seconds, so tests and deployments that tune
  LoginWindow get an accurate hint; clamp to >= 1.
- auth_handler: use the signer's injected clock for the login response's
  expires_at so it cannot drift from the JWT exp claim. AuthService now
  holds the same clock passed to the signer.
- config.validateTLS: the error message pointed at
  --admin-allow-plaintext-non-loopback, but the real flag is
  -adminAllowPlaintextNonLoopback. Fix to match the wired flag name so
  operators can act on the startup error.
- middleware.BodyLimit: drop the pass-through bodyLimitResponseWriter
  wrapper (it provided no translation) and rewrite the comment to say
  plainly that handlers are responsible for detecting overflow via
  IsMaxBytesError and calling WriteMaxBytesError. Centralising that in
  the middleware would either double-write or mask downstream errors
  depending on the decoder shape.
- middleware.CSRFDoubleSubmit: replace constantTimeEq with an explicit
  length check + subtle.ConstantTimeCompare on bytes, and update the
  comment to match the actual behaviour. The short-circuit on length
  mismatch is fine here because both tokens are server-minted at a
  fixed 32-byte width, so length divergence cannot leak secret state.
@bootjp
Owner Author

bootjp commented Apr 24, 2026

@claude review

@bootjp bootjp requested a review from Copilot April 24, 2026 17:19
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Comment thread internal/admin/router.go Outdated
Comment on lines +66 to +82
switch {
case strings.HasPrefix(p, pathPrefixAPIv1):
if rt.api == nil {
rt.notFind.ServeHTTP(w, r)
return
}
rt.api.ServeHTTP(w, r)
return
case p == pathHealthz:
rt.serveHealth(w, r)
return
case strings.HasPrefix(p, pathPrefixAssets):
rt.serveAsset(w, r)
return
case p == pathPrefixAdmin || strings.HasPrefix(p, pathPrefixAdmin+"/"):
rt.serveSPA(w, r)
return
Copilot AI Apr 24, 2026

pathPrefixAPIv1 and pathPrefixAssets include a trailing slash and routing uses strings.HasPrefix. Requests to /admin/api/v1 (no trailing slash) or /admin/assets will fall through to the SPA branch and can incorrectly serve index.html instead of returning a JSON 404. Consider explicitly handling the no-trailing-slash variants (or define prefixes without the trailing slash and check segment boundaries) so API/asset roots never resolve to the SPA fallback.
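A minimal sketch of the segment-boundary check this comment suggests, assuming prefixes are defined without a trailing slash. `underPrefix` is a hypothetical helper, not part of the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// underPrefix (hypothetical) reports whether urlPath equals prefix or lies
// below it on a "/" segment boundary. prefix has no trailing slash, so
// "/admin/api/v1" matches itself and "/admin/api/v1/x" but not "/admin/api/v10".
func underPrefix(urlPath, prefix string) bool {
	return urlPath == prefix || strings.HasPrefix(urlPath, prefix+"/")
}

func main() {
	for _, p := range []string{"/admin/api/v1", "/admin/api/v1/cluster", "/admin/api/v10", "/admin/assets"} {
		fmt.Println(p, underPrefix(p, "/admin/api/v1"))
	}
}
```

Simply dropping the trailing slash and keeping strings.HasPrefix(p, "/admin/api/v1") would also accept /admin/api/v10, which is why the check appends the slash and adds an exact-match case instead.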

Comment thread main_admin.go Outdated
Comment on lines +234 to +244
return admin.ClusterInfoFunc(func(_ context.Context) (admin.ClusterInfo, error) {
groups := make([]admin.GroupInfo, 0, len(runtimes))
for _, rt := range runtimes {
if rt == nil || rt.engine == nil {
continue
}
status := rt.engine.Status()
groups = append(groups, admin.GroupInfo{
GroupID: rt.spec.id,
LeaderID: status.Leader.ID,
IsLeader: strings.EqualFold(string(status.State), "leader"),
Copilot AI Apr 24, 2026

newClusterInfoSource currently omits GroupInfo.Members entirely and also discards the provided context (func(_ context.Context)). Since admin.GroupInfo includes a Members field (and tests exercise it), the production /admin/api/v1/cluster response will always have empty/null membership even though the raft engine exposes configuration via Engine.Configuration(ctx). Consider using the passed ctx and populating Members from rt.engine.Configuration(ctx) (best-effort, e.g. leave Members empty if config fetch fails).

Suggested change
return admin.ClusterInfoFunc(func(_ context.Context) (admin.ClusterInfo, error) {
groups := make([]admin.GroupInfo, 0, len(runtimes))
for _, rt := range runtimes {
if rt == nil || rt.engine == nil {
continue
}
status := rt.engine.Status()
groups = append(groups, admin.GroupInfo{
GroupID: rt.spec.id,
LeaderID: status.Leader.ID,
IsLeader: strings.EqualFold(string(status.State), "leader"),
return admin.ClusterInfoFunc(func(ctx context.Context) (admin.ClusterInfo, error) {
groups := make([]admin.GroupInfo, 0, len(runtimes))
for _, rt := range runtimes {
if rt == nil || rt.engine == nil {
continue
}
status := rt.engine.Status()
var members []string
if cfg, err := rt.engine.Configuration(ctx); err == nil {
members = cfg.Members
}
groups = append(groups, admin.GroupInfo{
GroupID: rt.spec.id,
LeaderID: status.Leader.ID,
IsLeader: strings.EqualFold(string(status.State), "leader"),
Members: members,

Comment thread internal/admin/config.go
Comment on lines +91 to +111
func (c *Config) validateTLS() error {
certSet := strings.TrimSpace(c.TLSCertFile) != ""
keySet := strings.TrimSpace(c.TLSKeyFile) != ""
if certSet != keySet {
// A lone cert or key almost always means a typo. Silently
// treating it as "TLS off" would downgrade transport
// security while the operator thinks TLS is enabled; fail
// fast so the misconfiguration is visible at startup.
return errors.New("admin.tls.cert_file and admin.tls.key_file must be set together;" +
" partial TLS configuration is not allowed")
}
tlsConfigured := certSet && keySet
if tlsConfigured || !addressRequiresTLS(strings.TrimSpace(c.Listen)) || c.AllowPlaintextNonLoopback {
return nil
}
return errors.WithStack(errors.Newf(
"admin.listen %q is not loopback but TLS is not configured;"+
" set admin.tls.cert_file + admin.tls.key_file, or explicitly pass"+
" -adminAllowPlaintextNonLoopback (strongly discouraged)",
c.Listen,
))
Copilot AI Apr 24, 2026

Several validation errors refer to dotted config keys like admin.tls.cert_file / admin.session_signing_key, but the user-facing configuration surface in this repo is CLI flags (e.g. -adminTLSCertFile, -adminSessionSigningKey). Mixing these can make startup failures harder to action. Consider updating the messages to mention the actual flag names (or include both forms) consistently, similar to how the TLS error already references -adminAllowPlaintextNonLoopback.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08c9846381

Comment thread internal/admin/router.go
p := r.URL.Path

switch {
case strings.HasPrefix(p, pathPrefixAPIv1):

P2: Treat /admin/api/v1 as API namespace

The API dispatch only matches strings.HasPrefix(p, "/admin/api/v1/"), so GET /admin/api/v1 misses the API mux and falls through to the /admin/* SPA path. When static is configured, that request returns index.html (200) instead of a JSON 404, which violates the router’s strict-order contract that API paths should not be served by SPA fallback and can break clients/probes that omit the trailing slash.

@bootjp
Owner Author

bootjp commented Apr 24, 2026

@claude review

- router: handle bare /admin/api/v1 and /admin/assets (no trailing
  slash) explicitly so they return a JSON 404 instead of falling
  through to the SPA fallback and being answered with index.html.
  New test fixes the behaviour.
- main_admin.newClusterInfoSource: populate GroupInfo.Members from
  rt.engine.Configuration(ctx) and stop discarding the context.
  Failed Configuration fetches leave Members empty (best effort) so
  a transient raft state cannot poison the whole cluster snapshot.
- config: replace dotted config-key names (admin.listen,
  admin.tls.cert_file, admin.session_signing_key, etc.) in every
  validation error message with the actual CLI flag names
  (-adminListen, -adminTLSCertFile, -adminSessionSigningKey, ...).
  Tests updated to match.
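A rough sketch of the loopback check and flag-named errors described above. It reimplements `addressRequiresTLS` hypothetically (the PR's own version is not shown here) and assumes the key-file flag is named `-adminTLSKeyFile` by analogy with `-adminTLSCertFile`; treat both as assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// addressRequiresTLS (hypothetical reimplementation) reports whether listen
// can be reached from beyond loopback. Anything it cannot prove is loopback
// counts as non-loopback, so the caller fails closed.
func addressRequiresTLS(listen string) bool {
	host, _, err := net.SplitHostPort(listen)
	if err != nil {
		return true
	}
	if strings.EqualFold(host, "localhost") {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		// Empty host (":8080") binds every interface; other hostnames
		// could resolve to anything.
		return true
	}
	return !ip.IsLoopback()
}

// validateTLS (sketch) names the CLI flags in its errors.
// -adminTLSKeyFile is an assumed flag name.
func validateTLS(listen, certFile, keyFile string, allowPlaintext bool) error {
	certSet := strings.TrimSpace(certFile) != ""
	keySet := strings.TrimSpace(keyFile) != ""
	if certSet != keySet {
		return errors.New("-adminTLSCertFile and -adminTLSKeyFile must be set together; partial TLS configuration is not allowed")
	}
	if certSet || allowPlaintext || !addressRequiresTLS(strings.TrimSpace(listen)) {
		return nil
	}
	return fmt.Errorf("-adminListen %q is not loopback but TLS is not configured; "+
		"set -adminTLSCertFile and -adminTLSKeyFile, or pass "+
		"-adminAllowPlaintextNonLoopback (strongly discouraged)", listen)
}

func main() {
	fmt.Println(validateTLS("127.0.0.1:8080", "", "", false))        // <nil>
	fmt.Println(validateTLS("0.0.0.0:8080", "", "", false) != nil) // true
}
```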
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

Comment thread internal/admin/server.go Outdated
Comment on lines +125 to +130
// The protected chain: body limit → session auth → CSRF → audit.
protect := func(next http.Handler) http.Handler {
return BodyLimit(defaultBodyLimit)(
SessionAuth(verifier)(
CSRFDoubleSubmit()(
Audit(logger)(next),
Copilot AI Apr 24, 2026

The protected middleware chain is ordered as SessionAuth → CSRF → Audit, which means CSRF-rejected write requests will not be logged by Audit at all (Audit never runs if CSRF short-circuits). If the intent is to audit every state-changing request attempt (including CSRF failures), move Audit earlier in the chain (e.g., after SessionAuth but before CSRF), or otherwise ensure CSRF failures are still recorded.

Suggested change
// The protected chain: body limit → session auth → CSRF → audit.
protect := func(next http.Handler) http.Handler {
return BodyLimit(defaultBodyLimit)(
SessionAuth(verifier)(
CSRFDoubleSubmit()(
Audit(logger)(next),
// The protected chain: body limit → session auth → audit → CSRF.
// This ensures CSRF-rejected protected requests are still audited
// after the actor has been established by session authentication.
protect := func(next http.Handler) http.Handler {
return BodyLimit(defaultBodyLimit)(
SessionAuth(verifier)(
Audit(logger)(
CSRFDoubleSubmit()(next),

Comment thread internal/admin/server.go
}
if deps.Credentials == nil {
return nil, errMissing("Credentials")
}
Copilot AI Apr 24, 2026

NewServer() enforces several required dependencies but does not validate deps.Roles. Given the ServerDeps doc says fields are required unless noted, passing a nil Roles map should likely be treated as a startup error (or the doc should explicitly mark it optional). Otherwise, a miswire can silently make all logins fail with 403.

Suggested change
}
}
if deps.Roles == nil {
return nil, errMissing("Roles")
}

Comment thread internal/admin/auth_handler.go Outdated
Comment on lines +304 to +306
// Use the same clock the signer used so the response's
// expires_at cannot drift from the JWT's exp claim; injected
// test clocks therefore produce deterministic outputs too.
Copilot AI Apr 24, 2026

In issueSession(), the comment says the response expiry uses “the same clock the signer used”, but the code uses s.clock() (AuthService’s clock), which can differ from the Signer’s injected clock if they’re configured independently. Either align the implementation to actually use the signer’s clock, or adjust the comment to avoid implying a guarantee that isn’t enforced.

Suggested change
// Use the same clock the signer used so the response's
// expires_at cannot drift from the JWT's exp claim; injected
// test clocks therefore produce deterministic outputs too.
// Compute the response expiry from the auth service clock and the
// configured session TTL. Tests remain deterministic when this clock
// is injected, but this value is only guaranteed to match the JWT exp
// if the signer is configured to use the same clock source.

Comment thread internal/admin/router.go
Comment on lines +123 to +137
// Drop /admin/assets/ prefix → relative path under pathRootAssetsDir.
rel := strings.TrimPrefix(r.URL.Path, pathPrefixAssets)
if rel == "" || strings.Contains(rel, "..") {
rt.notFind.ServeHTTP(w, r)
return
}
name := path.Join(pathRootAssetsDir, rel)
f, err := rt.static.Open(name)
if err != nil {
if errors.Is(err, fs.ErrNotExist) {
rt.notFind.ServeHTTP(w, r)
return
}
writeJSONError(w, http.StatusInternalServerError, "internal", "failed to open asset")
return
Copilot AI Apr 24, 2026

In serveAsset(), the traversal guard strings.Contains(rel, "..") is overly broad (it will reject legitimate filenames like app..js) and it also misses other invalid/edge cases like a double-slash path producing rel that starts with /, which then makes path.Join drop the assets prefix and can lead to a 500 (fs.ErrInvalid) instead of a JSON 404. Consider normalizing with path.Clean, rejecting any .. path segments (not substrings), and treating any invalid fs path (e.g. !fs.ValidPath(...) / leading /) as not-found.

@bootjp
Owner Author

bootjp commented Apr 24, 2026

@claude review

bootjp added 2 commits April 25, 2026 03:04
- server.buildAPIMux: move Audit inside SessionAuth and before
  CSRF so CSRF-rejected protected writes are still audited. The
  previous chain (SessionAuth → CSRF → Audit) silently dropped
  csrf_missing / csrf_mismatch from the admin_audit stream — exactly
  the attack traces operators want to see.
- server.NewServer: reject a nil deps.Roles as a wiring error.
  A nil role index would have 403-ed every login silently; an
  empty (but non-nil) map is still accepted for explicit lockdown.
- auth_handler.issueSession: rewrite the expires_at comment so it
  describes what the code actually guarantees. The AuthService clock
  is what computes expires_at, and callers are expected to pass the
  same clock to Signer and AuthService (NewServer does); the comment
  no longer claims a guarantee the type system does not enforce.
- router.serveAsset: replace the naive strings.Contains(rel, "..")
  traversal guard with fs.ValidPath. The new check correctly:
    * accepts filenames with ".." as a substring (e.g. app..js),
    * rejects any ".." path segment,
    * rejects leading-slash / double-slash shapes that could
      previously confuse path.Join into returning a 500 from the
      fs backend.
  Added dedicated tests for the substring-dot-dot allowance, the
  double-slash rejection, and the bare-API-root JSON-404 behaviour.
@bootjp
Owner Author

bootjp commented Apr 24, 2026

@claude review

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.

Comment thread main_admin_test.go Outdated
Comment on lines +204 to +213
func freePort(t *testing.T) string {
t.Helper()
var lc net.ListenConfig
l, err := lc.Listen(context.Background(), "tcp", "127.0.0.1:0")
require.NoError(t, err)
addr, ok := l.Addr().(*net.TCPAddr)
require.True(t, ok, "listener did not produce a *net.TCPAddr")
_ = l.Close()
return strings.TrimSpace(filepathBase(addr.String()))
}
Copilot AI Apr 24, 2026

freePort() selects an available port by binding and immediately closing the listener, then later startAdminServer binds that port again. This pattern can be flaky under parallel test runs because another process/test can claim the port in between. Consider letting the OS choose the port (listen on 127.0.0.1:0) and then exposing the bound address from startAdminServer (e.g., return the listener or actual addr) so tests can target the correct port without races.

Comment on lines +124 to +131
// loginResponse is the JSON body the login endpoint returns on success.
// The CSRF token is also readable from the admin_csrf cookie; we include
// it here as a convenience for clients that want to avoid parsing the
// Set-Cookie header themselves.
type loginResponse struct {
Role Role `json:"role"`
ExpiresAt time.Time `json:"expires_at"`
}
Copilot AI Apr 24, 2026

The comment on loginResponse says the CSRF token is included in the JSON response body, but the struct only contains Role and ExpiresAt and issueSession() does not encode a CSRF field. Please either add the CSRF token to the response or update the comment to match the actual response payload (since clients can already read it from the admin_csrf cookie).

Comment thread internal/admin/server_test.go Outdated
Comment on lines +183 to +187
// Keep io, fmt imported to avoid drift if the test evolves.
var (
_ = io.Discard
_ = fmt.Sprint
)
Copilot AI Apr 24, 2026

This test keeps unused imports (fmt, io) alive via dummy assignments. This adds noise and makes it easier for unused dependencies to accumulate unnoticed. Please remove the unused imports and the placeholder var block; Go will enforce imports as the test evolves.

- startAdminServer now returns (listenAddr, err) so tests can bind
  the real address without a pre-allocated free-port helper. The
  previous freePort()-binds-then-closes flow had a bind/close/rebind
  race under parallel tests. Tests now pass "127.0.0.1:0" and use
  the returned address; the free-port helper is deleted. main.go
  discards the returned address.
- loginResponse comment: drop the claim that the CSRF token is
  returned in the JSON body. The token is delivered exclusively via
  the admin_csrf Set-Cookie header; the comment now says so.
- server_test.go: remove the dummy var block that kept io and fmt
  imported. Unused imports now fail to compile, which is exactly
  what Go's import hygiene is for.
@bootjp
Owner Author

bootjp commented Apr 24, 2026

@claude review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d1b3a94a1

Comment thread internal/admin/server.go Outdated
Comment on lines +68 to +69
if deps.Credentials == nil {
return nil, errMissing("Credentials")

P1: Reject typed-nil credential stores at startup

NewServer only checks deps.Credentials == nil, which misses typed-nil interfaces. In this commit, buildAdminHTTPServer passes admin.MapCredentialStore(creds), and creds is nil when --s3CredentialsFile is unset (loadS3StaticCredentials returns nil, nil). That bypasses this guard, starts the admin server, and then makes every login fail with invalid_credentials even for configured role keys, turning a startup wiring error into a runtime lockout.

- NewServer: `deps.Credentials == nil` only caught untyped nil and
  silently admitted a typed-nil interface such as
  `MapCredentialStore(nil)`. A caller that wrapped a nil map (as
  main_admin did when --s3CredentialsFile was unset) would then
  start the admin listener and reject every login with
  "invalid_credentials" at runtime. Add a reflection-based
  isNilCredentialStore check that also detects the typed-nil cases
  for every nilable reference kind. An empty-but-non-nil map is
  still accepted here because the wiring layer has context to judge
  whether that is intentional.
- main_admin.startAdminFromFlags: after loading the S3 credentials
  file, refuse to start when the resulting map is empty. Running
  admin with zero configured access keys is not a valid lockdown
  posture; it would simply drop every login. Together with the
  server-side guard this closes the lockout loophole Codex called
  out.
- Tests: dedicated table that covers typed-nil and empty-map cases
  of the new guard.
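The typed-nil pitfall and a reflection-based guard in the spirit of isNilCredentialStore can be sketched as follows. `CredentialStore`, its `Lookup` method, and this `MapCredentialStore` are hypothetical stand-ins; only the names MapCredentialStore and isNilCredentialStore come from the commit message.

```go
package main

import (
	"fmt"
	"reflect"
)

// CredentialStore is a hypothetical stand-in for the admin credential interface.
type CredentialStore interface {
	Lookup(accessKey string) (secret string, ok bool)
}

// MapCredentialStore is a hypothetical map-backed implementation.
type MapCredentialStore map[string]string

func (m MapCredentialStore) Lookup(accessKey string) (string, bool) {
	s, ok := m[accessKey] // reading a nil map is legal and always misses
	return s, ok
}

// isNilCredentialStore reports true for an untyped nil interface and for an
// interface wrapping a nil pointer, map, slice, chan or func.
func isNilCredentialStore(s CredentialStore) bool {
	if s == nil {
		return true
	}
	v := reflect.ValueOf(s)
	switch v.Kind() {
	case reflect.Ptr, reflect.Map, reflect.Slice, reflect.Chan, reflect.Func:
		return v.IsNil()
	default:
		return false
	}
}

func main() {
	var creds map[string]string // nil, as when no credentials file is given
	var store CredentialStore = MapCredentialStore(creds)
	fmt.Println(store == nil, isNilCredentialStore(store))  // false true
	fmt.Println(isNilCredentialStore(MapCredentialStore{})) // false: empty but non-nil
}
```

The plain `store == nil` comparison is false because the interface carries a type, which is exactly how the original guard was bypassed; the empty-map case stays accepted here, matching the commit's split between server-side and wiring-layer checks.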
@bootjp
Owner Author

bootjp commented Apr 24, 2026

@claude review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a3e8b699b

Comment thread internal/admin/router.go
// path.
rt.notFind.ServeHTTP(w, r)
return
case strings.HasPrefix(p, pathPrefixAPIv1):

P2: Keep /admin/api/* paths out of SPA fallback

The router only treats "/admin/api/v1" and "/admin/api/v1/" as the API namespace, so requests like /admin/api, /admin/api/v2, or /admin/api/v2/x bypass the API branch and are served by the SPA fallback with 200 HTML when static assets are present. That breaks API clients and probes that expect JSON 404s on API-like paths and reintroduces namespace confusion. The issue persists because the API guard is still limited to strings.HasPrefix(p, pathPrefixAPIv1).

Comment thread main_admin.go Outdated
continue
}
status := rt.engine.Status()
var members []string

P2: Initialize group member list before config lookup

When engine.Configuration(ctx) fails (the comment explicitly calls out leadership transitions), members remains a nil slice and is serialized as "members": null instead of an empty array. This violates the handler’s own “leave Members empty” behavior and can break dashboard/API consumers that treat members as an always-array field.

- router: widen the API-namespace guard from /admin/api/v1 to the
  whole /admin/api* prefix. Requests like /admin/api, /admin/api/v2,
  or /admin/api/v2/tables now return a JSON 404 instead of falling
  through to the SPA fallback and being served with index.html.
  Split ServeHTTP into classify/dispatch with routeKind to keep the
  logic under the cyclomatic complexity ceiling, and expanded the
  dedicated JSON-only regression test to cover /admin/api and the
  v2 shapes.
- main_admin.newClusterInfoSource: initialise GroupInfo.Members to
  an empty, non-nil slice before the best-effort Configuration()
  call so a transient raft state serialises as `"members": []`
  rather than `"members": null`. The always-array contract matches
  how the rest of the field is consumed.
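The classify/dispatch split described in the first bullet might look roughly like this. The routeKind values, helper names, the healthz path, and the choice to match /admin/api on a segment boundary (so a path like /admin/apiary is not swallowed) are assumptions for illustration. Only the ordering, with the whole API namespace claimed before the SPA fallback, comes from the PR discussion.

```go
package main

import (
	"fmt"
	"strings"
)

type routeKind int

const (
	routeOutside      routeKind = iota // not under /admin at all
	routeAPI                           // /admin/api/v1/... goes to the API mux
	routeJSONNotFound                  // rest of /admin/api, bare /admin/assets: JSON 404
	routeHealth                        // health endpoint
	routeAsset                         // /admin/assets/... is a static file
	routeSPA                           // any other /admin path gets index.html
)

var routeNames = [...]string{"outside", "api", "json-404", "health", "asset", "spa"}

func (k routeKind) String() string { return routeNames[k] }

// healthPath is hypothetical; the PR's pathHealthz value is not shown here.
const healthPath = "/admin/healthz"

func under(p, prefix string) bool { return p == prefix || strings.HasPrefix(p, prefix+"/") }

// classify decides the route before any handler runs. Order matters: every
// /admin/api path is claimed before the SPA fallback can see it.
func classify(p string) routeKind {
	switch {
	case strings.HasPrefix(p, "/admin/api/v1/"):
		return routeAPI
	case under(p, "/admin/api"):
		return routeJSONNotFound
	case p == healthPath:
		return routeHealth
	case strings.HasPrefix(p, "/admin/assets/"):
		return routeAsset
	case p == "/admin/assets":
		return routeJSONNotFound
	case under(p, "/admin"):
		return routeSPA
	default:
		return routeOutside
	}
}

func main() {
	for _, p := range []string{"/admin/api/v1/cluster", "/admin/api/v1", "/admin/api/v2/tables", "/admin/assets", "/admin/tables"} {
		fmt.Println(p, classify(p))
	}
}
```

Keeping classification as a pure function of the path makes the routing table easy to test exhaustively, while the dispatch step stays a flat switch over routeKind.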
@bootjp
Owner Author

bootjp commented Apr 24, 2026

@claude review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75aa48142f

Comment thread internal/admin/server.go Outdated
Comment on lines +159 to +160
case "/admin/api/v1/auth/logout":
publicAuth(logoutHandler).ServeHTTP(w, r)

P2: Protect logout endpoint with session/CSRF checks

The logout route is wired through publicAuth only, so any cross-site page can send POST /admin/api/v1/auth/logout and force-clear a logged-in user's cookies because HandleLogout always emits expired Set-Cookie headers even when no valid session is present. In browser deployments, this creates a practical logout-CSRF path (user gets unexpectedly logged out by visiting another site) and undermines the stated CSRF protection model for state-changing POST endpoints.

- server.buildAPIMux: route /admin/api/v1/auth/logout through the
  protected chain instead of publicAuth. The old wiring would
  accept a cross-site POST to /auth/logout and emit expired
  Set-Cookie headers unconditionally, giving an attacker a
  practical logout-CSRF path: a victim visiting a hostile site
  could be force-logged-out. Requiring both SessionAuth and the
  CSRF double-submit makes the route idempotent for legitimate
  callers and unreachable across origins (SameSite=Strict strips
  the cookies on cross-site POSTs, and the server rejects
  anything that lacks both the session cookie and the matching
  X-Admin-CSRF header).
- New tests fix the behaviour:
    * Logout without a session returns 401 and emits no cookies.
    * Logout with a session but no CSRF header returns 403.
    * Logout with session + matching CSRF header returns 204 and
      emits both expired Set-Cookie headers so the browser
      actually forgets the session.
@bootjp
Owner Author

bootjp commented Apr 24, 2026

@claude review
