feat(redis): Implement exponential backoff for connection retries #405
Adds intelligent retry logic with exponential backoff and jitter to improve Redis connection reliability and prevent thundering herd problems.

Changes:
- Add RetryConfig struct with tunable parameters (MaxRetries, BaseDelay, MaxDelay, JitterFactor)
- Implement calculateBackoff() with exponential backoff and jitter formula: backoff = min(baseDelay * 2^attempt, maxDelay) * (1 ± jitter)
- Update createClient() to retry connections with backoff between attempts
- Add comprehensive logging for retry attempts and failures
- Use sensible defaults: 3 retries, 100ms-5s backoff, 30% jitter

Benefits:
- Better handling of transient network failures
- Reduced load on failing backend nodes (exponential backoff)
- Prevents thundering herd via jitter
- Configurable retry behavior
- Improved observability with debug/info logging

Testing:
- Add 4 new test functions with 14 test cases
- Test default config, backoff calculation, jitter behavior, max delay cap
- All existing tests pass
- Full test suite passes

Resolves #388
Pull request overview
This PR implements exponential backoff with jitter for Redis connection retries to improve reliability and prevent thundering herd problems when connecting to Redis nodes.
Changes:
- Added `RetryConfig` struct with configurable retry parameters (MaxRetries, BaseDelay, MaxDelay, JitterFactor)
- Implemented exponential backoff algorithm with jitter in `calculateBackoff()` function
- Updated `createClient()` to retry failed connections with exponential backoff instead of failing immediately
- Added comprehensive test coverage for retry configuration and backoff calculation
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/backends/redis/client.go | Implements retry configuration struct, exponential backoff calculation, and integrates retry logic into connection attempts |
| pkg/backends/redis/client_test.go | Adds tests for default retry config, backoff calculation with/without jitter, and updates existing tests to use retry config |
pkg/backends/redis/client.go
Outdated
    if config.JitterFactor > 0 {
        jitter := backoff * config.JitterFactor
        // Random value between (backoff - jitter) and (backoff + jitter)
        backoff = backoff - jitter + (rand.Float64() * 2 * jitter)
Copilot
AI
Jan 14, 2026
The use of rand.Float64() without seeding or using a thread-safe random source can cause issues in concurrent scenarios. Consider using math/rand.NewSource() with time.Now().UnixNano() or the thread-safe crypto/rand package to ensure proper randomization across goroutines.
Suggested change:

    - backoff = backoff - jitter + (rand.Float64() * 2 * jitter)
    + src := rand.NewSource(time.Now().UnixNano())
    + r := rand.New(src)
    + backoff = backoff - jitter + (r.Float64() * 2 * jitter)
    - func createClient(machines []string, password string, withReadTimeout bool) (*redis.Client, int, error) {
    + func createClient(machines []string, password string, withReadTimeout bool, retryConfig RetryConfig) (*redis.Client, int, error) {
          var lastErr error
Copilot
AI
Jan 14, 2026
When machines slice is empty, the function returns nil, 0, fmt.Errorf(...) with lastErr being nil, which will cause a nil pointer dereference in the error wrapping at line 146. The error message will show '%!w()' instead of a meaningful error. Consider returning a specific error for empty machines before the loop.
Suggested change:

    if len(machines) == 0 {
        return nil, 0, fmt.Errorf("no redis machines configured")
    }
    if err == nil {
        t.Error("createClient with empty machines should return error")
    }
Copilot
AI
Jan 14, 2026
The test should verify the error message content to ensure it properly indicates that no machines were provided, rather than just checking that an error exists.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| | main | #405 | +/- |
|---|---|---|---|
| Coverage | 57.41% | 57.40% | -0.01% |
| Files | 31 | 31 | |
| Lines | 3198 | 3226 | +28 |
| Hits | 1836 | 1852 | +16 |
| Misses | 1230 | 1240 | +10 |
| Partials | 132 | 134 | +2 |
Pull Request Review: Exponential Backoff for Redis Connection Retries
Summary
This PR implements exponential backoff with jitter for Redis connection retries, addressing issue #388. The implementation is well-structured and adds important reliability improvements to the Redis backend.
✅ Strengths
Code Quality
Testing
Documentation
Fixes critical and optional issues identified in code review:

Critical Fix:
- Fix non-thread-safe random number generation for jitter
  - Add package-level seeded rand.Source for proper randomization
  - Ensures different jitter values across instances (prevents thundering herd)
  - Thread-safe for concurrent calls

Improvements:
- Aggregate errors from all machines for better debugging
  - Use errors.Join() to preserve context from each failed machine
  - Return more informative error messages
- Clarify log message terminology
  - Change "attempt" to "retry" for clearer distinction
  - Update log at line 106: "retry N/M" instead of "attempt N/M"
  - Update success log at line 136: "after N retries" instead of "after N attempts"

All tests pass.
Pull Request Review - PR #405
Overall Assessment
This is a well-implemented PR that addresses issue #388 with a solid exponential backoff retry mechanism for Redis connections. The implementation is clean, well-tested, and follows Go best practices. I recommend approval with minor considerations.
✅ Strengths
1. Code Quality
2. Algorithm Implementation
3. Test Coverage
4. Observability
Summary
Implements exponential backoff with jitter for Redis connection retries to improve reliability and prevent thundering herd problems. Addresses issue #388.
Problem
Previously, `createClient()` would try each Redis machine sequentially with no delay between attempts and no retry logic.
Solution
Added intelligent retry logic with configurable exponential backoff:
New Features
RetryConfig struct - Tunable retry parameters:
- `MaxRetries` - Maximum retry attempts (default: 3)
- `BaseDelay` - Initial backoff delay (default: 100ms)
- `MaxDelay` - Maximum backoff delay (default: 5s)
- `JitterFactor` - Jitter factor to prevent thundering herd (default: 30%)

Exponential Backoff Algorithm:
Enhanced Logging:
Benefits
Testing
- `TestDefaultRetryConfig` - Validates default values
- `TestCalculateBackoff` - Tests exponential backoff (6 cases)
- `TestCalculateBackoffWithJitter` - Validates jitter range
- `TestCalculateBackoffMaxDelay` - Ensures cap is respected
- `go test ./... -short`

Example Behavior
Before: Immediate sequential attempts to all machines, no delays
After: Retries with exponential backoff
Resolves #388