Retry Redis ping on startup by simonsmallchua · Pull Request #386 · Good-Native/hover

simonsmallchua · 2026-05-11T22:44:26Z

Summary

Add (*broker.Client).PingWithRetry(ctx, total, perAttempt) with capped exponential backoff and per-attempt timeout.
Swap the three Ping call sites in cmd/app, cmd/worker, cmd/analysis to use it (30s budget, 3s per attempt).
Add unit coverage for the retry loop (immediate success, transient errors, budget exhaustion, context cancellation).

Why

Every PR preview spin-up generated a small burst of Sentry errors on staging: `*errors.errorString: EOF`, reported as `failed to ping Redis`, from each of the three binaries. Triaged in Sentry:

- HOVER-JX: hover-worker-pr-*, 169 occurrences since 2026-04-19.
- HOVER-MD: hover-analysis-pr-*, 158 occurrences since 2026-04-28.
- HOVER-JZ: hover-pr-*, 153 occurrences since 2026-04-20.

Review apps provision a fresh per-PR Upstash-on-Fly Redis and pass its URL as a secret. The Fly machine boots and calls `client.Ping(context.Background())` immediately. During the Upstash cold-start window the TCP connect succeeds, but the server closes the connection with EOF before answering PING. The client's built-in `MaxRetries: 3` is used up within milliseconds, all inside the same dead window; the binary then exits via Fatal, Fly restarts the machine, and the next boot succeeds. Hence the burst-per-deploy pattern, with zero prod impact (production Redis is warm).

The fix lets the binary ride out the cold-start window instead of crashing. On a healthy Redis the first ping succeeds and the helper returns immediately, so there's no production latency regression. On genuine misconfiguration the helper still exhausts its budget and Fatals — Sentry still gets one signal instead of three back-to-back.

Fixes HOVER-JX, HOVER-MD, HOVER-JZ

Test plan

- gofmt, goimports, go vet ./internal/broker/... ./cmd/app ./cmd/worker ./cmd/analysis
- go test ./internal/broker/ (full package, including the new TestPingWithRetry)
- go build ./cmd/app ./cmd/worker ./cmd/analysis
- PR review-apps deploy: confirm there are no EOF events on hover-pr-<N>, hover-worker-pr-<N>, hover-analysis-pr-<N> during boot, and that "connected to Redis" appears in all three startup logs


Summary by CodeRabbit

New Features
- Improved Redis startup resilience: services now retry connecting for a bounded window to tolerate transient connection issues.
Bug Fixes
- Prevents premature shutdown on transient Redis failures while still failing on persistent misconfiguration.
Tests
- Added tests covering retry behavior, failure handling, and cancellation scenarios.
Documentation
- Changelog updated to reflect the startup retry behavior.

coderabbitai · 2026-05-11T22:44:36Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a030b9cd-a9d1-46a6-af17-f470c6f8d567

📥 Commits

Reviewing files that changed from the base of the PR and between 4338f2e and b79c830.

📒 Files selected for processing (2)

internal/broker/redis.go
internal/broker/redis_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

internal/broker/redis.go

📝 Walkthrough

Walkthrough

Startup Redis checks now use a bounded retry loop. A new Client.PingWithRetry(ctx, total, perAttempt) retries PING with per-attempt timeouts and capped exponential backoff; tests and three command entry points (analysis, app, worker) were updated and the changelog documents the fix.

Changes

Redis Retry-Based Health Checking

Layer / File(s)	Summary
Redis Retry Health Check Interface `internal/broker/redis.go`	Introduces exported `PingWithRetry(ctx, total, perAttempt)` with docstring describing retry semantics.
Retry Logic Implementation `internal/broker/redis.go`	Implements the retry loop: absolute deadline, per-attempt timeouts, repeated PING attempts, capped exponential backoff, early exit on context cancellation, and final error return on budget exhaustion.
Retry Behavior Tests `internal/broker/redis_test.go`	Adds `TestPingWithRetry` with subtests for immediate success, eventual success after transient failures, retry budget exhaustion, context cancellation, and a regression guard for per-attempt timeout clamping; imports updated.
Startup Health Check Integration `cmd/analysis/main.go`, `cmd/app/main.go`, `cmd/worker/main.go`	Replaces single-shot `Ping` calls with `PingWithRetry(context.Background(), 30*time.Second, 3*time.Second)` at startup, preserving fatal-on-error behavior and success logging.
Changelog `CHANGELOG.md`	Adds a `### Fixed` entry under Unreleased describing the bounded-retry behavior for Redis PING at startup.

Sequence Diagram(s)

sequenceDiagram
  participant Entrypoint
  participant BrokerClient
  participant Redis
  Entrypoint->>BrokerClient: PingWithRetry(ctx, total, perAttempt)
  BrokerClient->>Redis: PING (per-attempt timeout)
  Redis-->>BrokerClient: PONG or error
  alt PONG
    BrokerClient-->>Entrypoint: success
  else error and time remaining
    BrokerClient->>BrokerClient: sleep (exponential backoff, capped)
    BrokerClient->>Redis: PING (next attempt)
  else deadline exceeded or ctx canceled
    BrokerClient-->>Entrypoint: return last error or ctx.Err()
  end

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

"I nibble on retries, patient and spry,
30 seconds of hope before I sigh,
Backoff like hops, small and then wide,
Redis wakes up — I smile with pride." 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Retry Redis ping on startup' directly and clearly summarizes the main change: replacing single Ping calls with a retry-capable PingWithRetry method across three startup entry points.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


supabase · 2026-05-11T22:44:40Z

Updates to Preview Branch (work/gallant-elbakyan-64b835)

Deployments	Status	Updated
Database	✅	Mon, 11 May 2026 22:54:00 UTC
Services	✅	Mon, 11 May 2026 22:54:00 UTC
APIs	✅	Mon, 11 May 2026 22:54:00 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks	Status	Updated
Configurations	✅	Mon, 11 May 2026 22:54:02 UTC
Migrations	✅	Mon, 11 May 2026 22:54:04 UTC
Seeding	✅	Mon, 11 May 2026 22:54:05 UTC
Edge Functions	✅	Mon, 11 May 2026 22:54:06 UTC


github-actions · 2026-05-11T22:46:27Z

Release Versions

App patch: v0.34.10 → v0.34.11

Changelog

Fixed

App, worker, and analysis binaries no longer Fatal on the first Redis PING
failure at startup. The ping is now wrapped in a bounded retry loop (30 s
total, 3 s per attempt, capped exponential backoff) so the binary rides out
the Upstash-on-Fly cold-start window that briefly closes connections with EOF
on freshly-provisioned review apps. Production behaviour is unchanged: a
healthy Redis still succeeds on the first attempt, and persistent
misconfiguration still fails once the 30 s budget is spent. Resolves the recurring EOF burst on every
PR preview deploy (Sentry: HOVER-JX, HOVER-MD, HOVER-JZ).

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

internal/broker/redis_test.go (1)

Lines 15-56: ⚡ Quick win

Add a regression case for perAttempt > total budget handling.

Nice coverage overall. Please add one subtest that asserts retry returns within the total budget when perAttempt is larger, so budget semantics stay protected.

🧪 Suggested test shape

 func TestPingWithRetry(t *testing.T) {
+	t.Run("does not exceed total budget when per-attempt timeout is larger", func(t *testing.T) {
+		start := time.Now()
+		err := pingWithRetry(context.Background(), 80*time.Millisecond, time.Second,
+			func(ctx context.Context) error {
+				<-ctx.Done()
+				return ctx.Err()
+			})
+		require.Error(t, err)
+		assert.LessOrEqual(t, time.Since(start), 200*time.Millisecond)
+	})
+
 	t.Run("immediate success", func(t *testing.T) {
 		var calls int
 		err := pingWithRetry(context.Background(), time.Second, 100*time.Millisecond,
 			func(context.Context) error { calls++; return nil })

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/broker/redis_test.go` around lines 15 - 56, Add a new subtest inside
TestPingWithRetry that calls pingWithRetry with total shorter than perAttempt
(e.g., total=100ms, perAttempt=200ms) using a stub ping function that
immediately returns a sentinel error; capture time before/after the call and
assert that the call returns the expected error and that elapsed time is <=
total (plus a tiny tolerance), ensuring pingWithRetry respects the overall
budget when perAttempt > total. Reference the pingWithRetry helper and add the
subtest under TestPingWithRetry.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/broker/redis.go`:
- Around line 90-123: In pingWithRetry the per-attempt context always uses
perAttempt which can let the final try exceed the overall deadline; compute the
remaining total budget as time.Until(deadline) and clamp the attempt timeout to
min(perAttempt, remaining) before calling context.WithTimeout. If the remaining
budget is <= 0 return the lastErr (or ctx.Err() if set) instead of starting a
timed attempt; replace the direct context.WithTimeout(ctx, perAttempt) call in
pingWithRetry with this clamped timeout logic to enforce the total budget.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 27c218b9-9459-4bc2-bbd1-0d4a1bc92a30

📥 Commits

Reviewing files that changed from the base of the PR and between 41af55a and df771e6.

📒 Files selected for processing (5)

cmd/analysis/main.go
cmd/app/main.go
cmd/worker/main.go
internal/broker/redis.go
internal/broker/redis_test.go

codecov · 2026-05-11T22:47:40Z

Codecov Report

❌ Patch coverage is 74.35897% with 10 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
internal/broker/redis.go	80.55%	6 Missing and 1 partial ⚠️
cmd/analysis/main.go	0.00%	1 Missing ⚠️
cmd/app/main.go	0.00%	1 Missing ⚠️
cmd/worker/main.go	0.00%	1 Missing ⚠️


github-actions · 2026-05-11T22:50:27Z

🐝 Review App Deployed

Homepage: https://hover-pr-386.fly.dev
Dashboard: https://hover-pr-386.fly.dev/dashboard

github-actions · 2026-05-11T22:57:23Z

🐝 Review App Deployed

Homepage: https://hover-pr-386.fly.dev
Dashboard: https://hover-pr-386.fly.dev/dashboard

github-actions · 2026-05-12T00:32:12Z

🐝 Review App Deployed

Homepage: https://hover-pr-386.fly.dev
Dashboard: https://hover-pr-386.fly.dev/dashboard

Retry Redis ping on startup

df771e6

Add changelog entry for Redis retry

4338f2e

coderabbitai Bot reviewed May 11, 2026


Comment thread internal/broker/redis.go

Clamp Redis retry to total budget

b79c830

simonsmallchua wants to merge 3 commits into main from work/gallant-elbakyan-64b835
