Skip to content

feat(s3): expose /healthz/leader for leader-only routing#539

Merged
bootjp merged 3 commits intomainfrom
feat/s3-leader-healthz
Apr 19, 2026
Merged

feat(s3): expose /healthz/leader for leader-only routing#539
bootjp merged 3 commits intomainfrom
feat/s3-leader-healthz

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 19, 2026

Summary

  • Add /healthz/leader to the S3 server so an upstream load balancer can route port-9000 traffic to the current raft leader, mirroring the existing DynamoDB endpoint.

Why

The S3 follower-proxy re-signs requests with the original Host header but the signature validation fails at the leader (separate bug); until that's fixed, pointing all traffic at the leader via an LB is the safest workaround. The leader check has to live on port 9000 because LB active health check uses the upstream port -- it can't split health-check port from proxy port.

Test plan

  • go test ./adapter/ -run TestS3Server_LeaderHealthz
  • golangci-lint run ./adapter/...

Summary by CodeRabbit

  • New Features
    • Added S3 leader health check endpoint (/healthz/leader) supporting GET and HEAD requests that returns 200 when verified as a cluster leader, 503 otherwise. Unsupported methods receive a 405 response with appropriate headers.

The S3 server only exposed /healthz (always 200). That made it
impossible for an upstream load balancer — e.g. the Caddy in front of
dynamodb.bootjp.dev — to single out the current raft leader on S3
port 9000 the way it already does for the DynamoDB port via
/healthz/leader.

Add the symmetric endpoint that returns 200 only when the local node
IsLeader() AND VerifyLeader() succeeds, and 503 otherwise. The check
runs before maybeProxyToLeader so followers still reliably answer 503
instead of opaque-proxying the health probe. Needed to put
s3-elk.bootjp.dev behind a leader-only active health check.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 19, 2026

📝 Walkthrough

Walkthrough

A new S3 leader health check endpoint at GET/HEAD /healthz/leader is added, which returns 200 when the server is a verified Raft leader or 503 otherwise. The implementation enforces allowed HTTP methods, limits request body size, and suppresses response bodies for HEAD requests.

Changes

Cohort / File(s) Summary
S3 Leader Health Endpoint
adapter/s3.go
Added serveS3LeaderHealthz() method to handle health checks, isVerifiedS3Leader() helper to verify Raft leadership via coordinator, and new constants for the health endpoint path and request body size limit. Registered the health handler in S3Server routing before leader proxy logic.
Health Endpoint Tests
adapter/s3_test.go
Added TestS3Server_LeaderHealthz() to verify the endpoint returns 200 with "ok" body for leaders and 503 with "not leader" for followers, confirms HEAD requests omit response bodies, and validates 405 Method Not Allowed for unsupported methods with proper Allow headers.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant S3Server
    participant Handler as serveS3LeaderHealthz
    participant LeaderCheck as isVerifiedS3Leader
    participant Coordinator
    
    Client->>S3Server: GET/HEAD /healthz/leader
    S3Server->>Handler: Route to health handler
    Handler->>Handler: Validate path & method
    Handler->>Handler: Cap request body size
    Handler->>LeaderCheck: Check if verified leader
    LeaderCheck->>Coordinator: VerifyLeader()
    Coordinator-->>LeaderCheck: Return result
    alt Leader Verified
        LeaderCheck-->>Handler: true
        Handler-->>Client: 200 OK<br/>Body: "ok"
    else Not Leader
        LeaderCheck-->>Handler: false
        Handler-->>Client: 503 Service Unavailable<br/>Body: "not leader"<br/>(empty if HEAD)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A health check hops into the warren,
The leader stands tall, none to borrow,
GET and HEAD skip merrily through,
POST gets shooed with a 405 "shoo!"
Leadership verified, all is well—
200 or 503, the status bell! 🔔

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main feature: exposing a /healthz/leader endpoint for S3 with leader-only routing, which is the core objective of this PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/s3-leader-healthz

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new /healthz/leader endpoint to the S3 server, enabling upstream load balancers to route traffic directly to the verified Raft leader. The implementation includes a leadership verification check and corresponding unit tests for GET and HEAD methods. Feedback suggests enhancing the endpoint's security by limiting the request body size to prevent DoS attacks and improving protocol compliance by adding an Allow header to 405 Method Not Allowed responses.

Comment thread adapter/s3.go
Comment on lines +2338 to +2344
if r == nil || r.URL == nil || r.URL.Path != s3LeaderHealthPath {
return false
}
if r.Method != http.MethodGet && r.Method != http.MethodHead {
w.WriteHeader(http.StatusMethodNotAllowed)
return true
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In accordance with the repository's general rules, HTTP request bodies should be limited to prevent Denial of Service attacks from oversized payloads, even for health check endpoints. Additionally, per RFC 7231, a 405 Method Not Allowed response must include an Allow header specifying the supported methods.

	if r == nil || r.URL == nil || r.URL.Path != s3LeaderHealthPath {
		return false
	}
	r.Body = http.MaxBytesReader(w, r.Body, 1024)
	if r.Method != http.MethodGet && r.Method != http.MethodHead {
		w.Header().Set("Allow", "GET, HEAD")
		w.WriteHeader(http.StatusMethodNotAllowed)
		return true
	}
References
  1. When handling HTTP requests, always limit the size of the request body (e.g., using http.MaxBytesReader) to prevent Denial of Service attacks from oversized payloads.

Address gemini-code-assist review on #539:
- Cap health probe request bodies at 1 KiB via http.MaxBytesReader so
  a misbehaving client cannot stream unbounded payloads into the
  health handler. Mirrors dynamoHealthMaxRequestBodyBytes.
- Emit an Allow header with the supported methods on 405 responses
  per RFC 7231.

Existing /healthz intentionally left alone to keep this PR's scope
tight; a follow-up can retrofit the same hardening.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 19, 2026

Addressed gemini review in ad6ce70: capped the handler body at 1 KiB via http.MaxBytesReader and added Allow: GET, HEAD on 405. Also extended TestS3Server_LeaderHealthz with a POST case asserting both. Left the legacy /healthz path alone to keep this PR focused on the new endpoint.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 19, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new /healthz/leader endpoint to the S3 adapter, enabling upstream load balancers to route traffic specifically to the verified Raft leader. The implementation includes a new handler, leader verification logic using the coordinator, and comprehensive unit tests. Review feedback suggests removing trailing newlines from the health check response for consistency with existing endpoints and removing a redundant nil check on the method receiver to improve code clarity.

Comment thread adapter/s3.go Outdated
Comment on lines +2350 to +2352
status, body := http.StatusOK, "ok\n"
if !s.isVerifiedS3Leader() {
status, body = http.StatusServiceUnavailable, "not leader\n"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The response body for /healthz/leader includes a trailing newline, which is inconsistent with the existing /healthz endpoint (which returns ok without a newline). While most load balancers only check the status code, maintaining consistency across health check endpoints is better for tooling and predictability.

Suggested change
status, body := http.StatusOK, "ok\n"
if !s.isVerifiedS3Leader() {
status, body = http.StatusServiceUnavailable, "not leader\n"
status, body := http.StatusOK, "ok"
if !s.isVerifiedS3Leader() {
status, body = http.StatusServiceUnavailable, "not leader"
}

Comment thread adapter/s3.go Outdated
Comment on lines +2362 to +2364
if s == nil || s.coordinator == nil {
return false
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The check for s == nil is redundant here. Since isVerifiedS3Leader is a pointer receiver method called on s in serveS3LeaderHealthz, if s were nil, the program would have already panicked during the method dispatch in handle. Removing this unreachable code path improves clarity.

Suggested change
if s == nil || s.coordinator == nil {
return false
}
if s.coordinator == nil {
return false
}
References
  1. Remove dead or unreachable code paths, such as asynchronous handlers for tasks that are always performed synchronously, to improve code clarity and maintainability.

- Drop trailing newline from "ok" / "not leader" response bodies so
  the endpoint matches the existing /healthz byte-for-byte instead of
  introducing a third output shape.
- Drop the redundant s == nil guard in isVerifiedS3Leader; the caller
  dispatches through a pointer receiver method, so by the time we get
  here s is guaranteed non-nil. Collapse the remaining coordinator/
  IsLeader checks into a single line.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 19, 2026

Addressed gemini round 2 in 0ff20d8:

  • Dropped trailing newlines from the response bodies so /healthz/leader matches the existing /healthz byte-for-byte.
  • Removed the redundant s == nil guard in isVerifiedS3Leader and collapsed the coordinator/IsLeader branches.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 19, 2026

/gemini review

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
adapter/s3_test.go (1)

219-230: Add the VerifyLeader failure case.

The table covers leader success and follower IsLeader() == false, but not IsLeader() == true with VerifyLeader() failing. That branch is the core “verified leader” contract.

Test case to add
 	leaderSrv := NewS3Server(nil, "", store.NewMVCCStore(), newLocalAdapterCoordinator(store.NewMVCCStore()), nil)
 	followerSrv := NewS3Server(nil, "", store.NewMVCCStore(), &followerS3Coordinator{}, nil)
+	verifyFailSrv := NewS3Server(nil, "", store.NewMVCCStore(), &stubAdapterCoordinator{
+		leaderSet:       true,
+		leader:          true,
+		verifyLeaderErr: kv.ErrLeaderNotFound,
+	}, nil)
 
 	cases := []struct {
 		name   string
 		server *S3Server
 		status int
 		body   string
 	}{
 		{name: "leader", server: leaderSrv, status: http.StatusOK, body: "ok"},
 		{name: "follower", server: followerSrv, status: http.StatusServiceUnavailable, body: "not leader"},
+		{name: "verify leader failure", server: verifyFailSrv, status: http.StatusServiceUnavailable, body: "not leader"},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/s3_test.go` around lines 219 - 230, Add a test case that exercises
the branch where the coordinator reports IsLeader() == true but VerifyLeader()
fails: create an S3Server via NewS3Server using a custom coordinator type (e.g.,
a short-lived test coordinator that implements IsLeader() returning true and
VerifyLeader() returning an error) and add an entry to the cases table like
"verified-failure" with that server and the expected HTTP status and body (same
as non-leader failure), so the test covers the VerifyLeader failure path in
S3Server's leader verification logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapter/s3.go`:
- Around line 2342-2357: The MaxBytesReader wrapper around r.Body (created with
s3HealthMaxRequestBodyBytes) is ineffective because the handler never reads
r.Body; either remove the http.MaxBytesReader wrapper if you don't intend to
enforce a limit, or explicitly drain/read the body after wrapping to enforce the
cap: after creating http.MaxBytesReader, read and discard up to the limit (e.g.,
io.Copy(io.Discard, r.Body) or io.ReadAll with the wrapped body) and if the read
returns a http.ErrBodyReadAfterClose/ErrUnexpectedEOF/size-limit error respond
with http.StatusRequestEntityTooLarge (413) before continuing; keep existing
leader check via s.isVerifiedS3Leader and current response logic otherwise.

---

Nitpick comments:
In `@adapter/s3_test.go`:
- Around line 219-230: Add a test case that exercises the branch where the
coordinator reports IsLeader() == true but VerifyLeader() fails: create an
S3Server via NewS3Server using a custom coordinator type (e.g., a short-lived
test coordinator that implements IsLeader() returning true and VerifyLeader()
returning an error) and add an entry to the cases table like "verified-failure"
with that server and the expected HTTP status and body (same as non-leader
failure), so the test covers the VerifyLeader failure path in S3Server's leader
verification logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f153a1a2-b4e4-4e99-a646-8a70151b893d

📥 Commits

Reviewing files that changed from the base of the PR and between 1fa8069 and 0ff20d8.

📒 Files selected for processing (2)
  • adapter/s3.go
  • adapter/s3_test.go

Comment thread adapter/s3.go
Comment on lines +2342 to +2357
if r.Body != nil {
r.Body = http.MaxBytesReader(w, r.Body, s3HealthMaxRequestBodyBytes)
}
if r.Method != http.MethodGet && r.Method != http.MethodHead {
w.Header().Set("Allow", "GET, HEAD")
w.WriteHeader(http.StatusMethodNotAllowed)
return true
}
status, body := http.StatusOK, "ok"
if !s.isVerifiedS3Leader() {
status, body = http.StatusServiceUnavailable, "not leader"
}
w.WriteHeader(status)
if r.Method != http.MethodHead {
_, _ = io.WriteString(w, body)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Inspect serveS3LeaderHealthz and report whether the capped body is read.

python - <<'PY'
from pathlib import Path

source = Path("adapter/s3.go").read_text()
start = source.index("func (s *S3Server) serveS3LeaderHealthz")
end = source.index("func (s *S3Server) isVerifiedS3Leader")
segment = source[start:end]

print(segment)
print()
print("contains MaxBytesReader:", "http.MaxBytesReader" in segment)
print("contains body read/copy:", any(token in segment for token in [".Read(", "io.ReadAll", "io.Copy"]))
PY

Repository: bootjp/elastickv

Length of output: 793


Enforce the 1 KiB body limit or remove the wrapper.

http.MaxBytesReader only enforces its limit when the wrapped body is actually read. This handler never reads r.Body, so oversized health requests bypass the size check and return 200/503 normally.

Either remove the wrapper (if the limit is not needed) or explicitly read it to enforce the cap:

Suggested fix
 	if r.Body != nil {
 		r.Body = http.MaxBytesReader(w, r.Body, s3HealthMaxRequestBodyBytes)
+		if _, err := io.Copy(io.Discard, r.Body); err != nil {
+			var maxBytesErr *http.MaxBytesError
+			if errors.As(err, &maxBytesErr) {
+				w.WriteHeader(http.StatusRequestEntityTooLarge)
+				return true
+			}
+			writeS3InternalError(w, err)
+			return true
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/s3.go` around lines 2342 - 2357, The MaxBytesReader wrapper around
r.Body (created with s3HealthMaxRequestBodyBytes) is ineffective because the
handler never reads r.Body; either remove the http.MaxBytesReader wrapper if you
don't intend to enforce a limit, or explicitly drain/read the body after
wrapping to enforce the cap: after creating http.MaxBytesReader, read and
discard up to the limit (e.g., io.Copy(io.Discard, r.Body) or io.ReadAll with
the wrapped body) and if the read returns a
http.ErrBodyReadAfterClose/ErrUnexpectedEOF/size-limit error respond with
http.StatusRequestEntityTooLarge (413) before continuing; keep existing leader
check via s.isVerifiedS3Leader and current response logic otherwise.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new health check endpoint, /healthz/leader, to the S3 adapter. This endpoint allows upstream load balancers to identify the current Raft leader, bypassing issues with follower-proxy SigV4 paths. The implementation includes request body size limiting, method validation (GET/HEAD), and leader verification logic. Corresponding unit tests have been added to verify the endpoint's behavior for both leader and follower nodes. I have no feedback to provide.

@bootjp bootjp merged commit cb0c086 into main Apr 19, 2026
8 checks passed
@bootjp bootjp deleted the feat/s3-leader-healthz branch April 19, 2026 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant