Retries that don't overwhelm your servers.
A smart retry library for TypeScript/JavaScript that prevents retry amplification in distributed systems. Unlike aggressive retry libraries, polite-retry knows when to back off.
Documentation · API Reference · Examples
Based on research from *Retry Amplification in Distributed Systems: A Systematic Analysis of Retry Policies and Their Role in Cascading Failures*.
## The problem

Naive retry policies can make system failures worse. When a service experiences partial failure:
- Clients retry failed requests
- Retried requests add load to an already stressed system
- Increased load causes more failures
- More failures trigger more retries
- The cascade drives the system to collapse
This is called *retry amplification*. In a 3-tier system with a 50% failure rate and 3 retries per tier, request volume can amplify by 6.6×.
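The 6.6× figure falls out of simple expected-value arithmetic (a back-of-envelope sketch, assuming each tier retries independently and failures are uniform):

```typescript
// Back-of-envelope check of the 6.6x amplification figure (illustrative only):
const p = 0.5;                                   // per-request failure rate
const attemptsPerTier = 1 + p + p ** 2 + p ** 3; // expected attempts with <= 3 retries: 1.875
const amplification = attemptsPerTier ** 3;      // compounded across 3 tiers
console.log(amplification.toFixed(2));           // "6.59" ~= 6.6x
```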
This library provides three retry strategies with increasing sophistication:
| Strategy | Use Case | Amplification Risk |
|---|---|---|
| `retry()` | Simple retries with backoff/jitter | Medium |
| `retryWithCircuitBreaker()` | Stop retrying when the service is down | Low |
| `retryWithBudget()` | Adaptive Retry Budgeting (ARB) | Very Low |
## Installation

```bash
npm install polite-retry
```

## Quick start

### Basic retry

```typescript
import { retry } from 'polite-retry';
const data = await retry(
async () => {
const response = await fetch('https://api.example.com/data');
if (!response.ok) throw new Error(`HTTP ${response.status}`);
return response.json();
},
{
maxRetries: 3,
initialDelayMs: 100,
jitter: 'full', // Prevents synchronized retry storms
}
);
```

### With a circuit breaker

```typescript
import { retryWithCircuitBreaker, CircuitBreaker } from 'polite-retry';
const breaker = new CircuitBreaker({
failureThreshold: 0.5, // Open after 50% failure rate
windowSize: 10, // Over last 10 requests
resetTimeoutMs: 30000, // Try again after 30s
});
const data = await retryWithCircuitBreaker(
async () => fetchFromService(),
breaker,
{ maxRetries: 3 }
);
```

### With an adaptive retry budget

```typescript
import { retryWithBudget, AdaptiveRetryBudget } from 'polite-retry';
// Create a shared budget manager (one per downstream service)
const budget = new AdaptiveRetryBudget({
initialBudget: 0.2, // Allow 20% retry overhead initially
highFailureThreshold: 0.3, // Reduce budget when >30% failing
lowFailureThreshold: 0.05, // Restore budget when <5% failing
onBudgetChange: (budget, rate) => {
console.log(`Retry budget: ${(budget * 100).toFixed(1)}%, failure rate: ${(rate * 100).toFixed(1)}%`);
}
});
// Use for all requests to this service
const data = await retryWithBudget(
async () => fetchFromService(),
budget,
{ maxRetries: 3, jitter: 'full' }
);
// Get metrics
console.log(budget.getMetrics());
// { totalRequests: 150, successfulRequests: 140, failedRequests: 10,
// totalRetries: 15, failureRate: 0.08, retryAmplificationFactor: 1.11 }
// Clean up when shutting down
budget.dispose();
```

## API

### `retry()`

Basic retry with exponential backoff and jitter.
```typescript
function retry<T>(
  fn: () => Promise<T>,
  options?: RetryOptions
): Promise<T>
```

Options:
| Option | Type | Default | Description |
|---|---|---|---|
| `maxRetries` | `number` | `3` | Maximum retry attempts |
| `initialDelayMs` | `number` | `100` | Initial backoff delay (ms) |
| `maxDelayMs` | `number` | `30000` | Maximum backoff delay (ms) |
| `backoffMultiplier` | `number` | `2` | Exponential multiplier |
| `jitter` | `string` | `'full'` | Jitter strategy: `'none'`, `'full'`, `'equal'`, or `'decorrelated'` |
| `retryIf` | `function` | always | Predicate deciding whether an error should trigger a retry |
| `onRetry` | `function` | - | Callback invoked before each retry |
| `timeoutMs` | `number` | - | Timeout per attempt (ms) |
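Putting several options together (a sketch; the exact `onRetry` callback shape is an assumption here, so check the package's typings):

```typescript
import { retry } from 'polite-retry';

const user = await retry(
  async () => {
    const res = await fetch('https://api.example.com/user/42'); // illustrative endpoint
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  },
  {
    maxRetries: 4,
    initialDelayMs: 200,
    maxDelayMs: 5000,
    backoffMultiplier: 2,
    jitter: 'equal',
    timeoutMs: 2000,                                    // fail each attempt fast
    retryIf: (err) => !/HTTP 4\d\d/.test(String(err)),  // don't retry client errors
    onRetry: (err) => console.warn('retrying after:', err), // callback shape assumed
  }
);
```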
#### Jitter strategies

| Strategy | Formula | Best For |
|---|---|---|
| `none` | `delay` | Testing only (causes retry storms) |
| `full` | `random(0, delay)` | General use; best spread |
| `equal` | `delay/2 + random(0, delay/2)` | When a minimum delay matters |
| `decorrelated` | `random(base, prevDelay * 3)` | Correlated retry sequences |
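To make the formulas concrete, here is how a jittered delay can be derived from the backoff parameters (a sketch of the standard formulas above, not the library's internals):

```typescript
// Sketch: exponential backoff capped at maxDelayMs, then 'full' jitter applied.
function jitteredDelay(
  attempt: number, // 0-based retry attempt
  initialDelayMs = 100,
  maxDelayMs = 30_000,
  backoffMultiplier = 2,
): number {
  const delay = Math.min(maxDelayMs, initialDelayMs * backoffMultiplier ** attempt);
  return Math.random() * delay; // full jitter: random(0, delay)
}
```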
### `AdaptiveRetryBudget`

The ARB algorithm dynamically adjusts the retry budget based on observed failure rates.

```typescript
const budget = new AdaptiveRetryBudget({
initialBudget: 0.2, // 20% initial retry overhead
budgetIncreaseRate: 0.1, // Increase by 10% when stable
budgetDecreaseRate: 0.5, // Decrease by 50% when failing
highFailureThreshold: 0.3, // >30% failures = reduce budget
lowFailureThreshold: 0.05, // <5% failures = restore budget
adjustmentIntervalMs: 1000,
checkBackpressure: async () => {
// Optional: check if downstream is signaling overload
return false;
}
});
```
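The gating idea behind ARB is token-bucket-like: a retry is allowed only while observed retry volume stays within the current budget fraction of total traffic. A rough sketch of that rule (illustrative, not the library's actual implementation):

```typescript
// Sketch of budget gating: permit a retry only while total retries stay
// under budget * totalRequests. The real class also adjusts the budget
// up and down as the failure rate crosses the configured thresholds.
class BudgetSketch {
  private requests = 0;
  private retries = 0;
  constructor(private budget = 0.2) {}

  recordRequest(): void {
    this.requests++;
  }

  canRetry(): boolean {
    return this.retries + 1 <= this.budget * Math.max(this.requests, 1);
  }

  recordRetry(): void {
    this.retries++;
  }
}
```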
### `CircuitBreaker`

Prevents requests when a service is known to be failing.

```typescript
const breaker = new CircuitBreaker({
failureThreshold: 0.5, // 50% failure rate opens circuit
windowSize: 10, // Consider last 10 requests
resetTimeoutMs: 30000, // Wait 30s before testing
onStateChange: (state) => console.log(`Circuit: ${state}`)
});
// States: 'closed' (normal), 'open' (blocking), 'half-open' (testing)
```
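These states follow the standard circuit-breaker pattern. A condensed sketch of the transition rules (illustrative, not the library's internals):

```typescript
// closed -> open      when the windowed failure rate >= failureThreshold
// open -> half-open   after resetTimeoutMs has elapsed
// half-open -> closed when a probe request succeeds
// half-open -> open   when a probe request fails
type CircuitState = 'closed' | 'open' | 'half-open';

function nextState(
  state: CircuitState,
  event: { failureRate?: number; elapsedMs?: number; probeOk?: boolean },
  failureThreshold = 0.5,
  resetTimeoutMs = 30_000,
): CircuitState {
  if (state === 'closed' && (event.failureRate ?? 0) >= failureThreshold) return 'open';
  if (state === 'open' && (event.elapsedMs ?? 0) >= resetTimeoutMs) return 'half-open';
  if (state === 'half-open') return event.probeOk ? 'closed' : 'open';
  return state;
}
```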
## Best practices

### Always use jitter

Without jitter, clients retry at synchronized intervals, creating periodic load spikes:

```typescript
// Bad - no jitter
{ jitter: 'none' }

// Good - full jitter
{ jitter: 'full' }
```
### Share budgets per service

Share a single `AdaptiveRetryBudget` instance for all requests to the same service:

```typescript
// Good - shared budget
const paymentServiceBudget = new AdaptiveRetryBudget();
const paymentServiceBudget = new AdaptiveRetryBudget();
app.post('/checkout', async (req, res) => {
await retryWithBudget(() => paymentService.charge(), paymentServiceBudget);
});
app.post('/refund', async (req, res) => {
await retryWithBudget(() => paymentService.refund(), paymentServiceBudget);
});
```
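For contrast, the anti-pattern is constructing a fresh budget per request: each instance then sees a sample size of one and can never learn the service's failure rate (illustrative):

```typescript
// Bad - per-request budget: no shared failure history, so ARB cannot adapt
app.post('/checkout', async (req, res) => {
  const budget = new AdaptiveRetryBudget(); // sees only this one request
  await retryWithBudget(() => paymentService.charge(), budget);
});
```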
### Keep retry counts low

More than 3-5 retries rarely helps and increases amplification risk:

```typescript
// Industry guidance: 3 retries is usually sufficient
{ maxRetries: 3 }
```
### Set per-attempt timeouts

Set timeouts to fail fast rather than holding connections:

```typescript
{ timeoutMs: 5000 } // 5 second timeout per attempt
```

### Retry only retryable errors

Not all errors should trigger retries:

```typescript
{
retryIf: (error) => {
// Don't retry client errors
if (error.message.includes('400')) return false;
if (error.message.includes('401')) return false;
if (error.message.includes('403')) return false;
// Retry server errors and network issues
return true;
}
}
```

## Backpressure

Backpressure allows downstream services to tell upstream callers "I'm overloaded, stop retrying." This prevents retry amplification during failures.
```
┌──────────┐                          ┌──────────┐
│  Client  │ ─────── Request ───────► │  Server  │
│          │ ◄────── Response ─────── │          │
│          │        + Headers:        │          │
│          │  X-Backpressure: 0.85    │          │
│          │  Retry-After: 5          │          │
└──────────┘                          └──────────┘
      │                                    │
      │  If X-Backpressure > 0.8           │
      │  → Stop retrying                   │
      │  → Wait Retry-After seconds        │
      └────────────────────────────────────┘
```
### Server side (Express)

```typescript
import express from 'express';
import {
RequestCounter,
createBackpressureMiddleware
} from 'polite-retry';
const app = express();
const MAX_CONCURRENT = 100;
// Option 1: Use RequestCounter (automatic tracking)
const counter = new RequestCounter();
app.use(counter.middleware()); // Automatically tracks active requests
app.use(createBackpressureMiddleware({
getLoadLevel: () => counter.getCount() / MAX_CONCURRENT,
overloadThreshold: 0.8,
}));
// Option 2: Manual tracking (if you need more control)
let activeRequests = 0;
app.use((req, res, next) => {
  activeRequests++;
  // 'close' fires exactly once per response (finished or aborted),
  // so decrementing there avoids double counting.
  res.once('close', () => activeRequests--);
  next();
});
app.use(createBackpressureMiddleware({
getLoadLevel: () => activeRequests / MAX_CONCURRENT,
overloadThreshold: 0.8,
}));
```

This adds headers to every response:
- `X-Backpressure: 0.75` - current load level (0.0 to 1.0)
- `X-Load-Shedding: true` - sent when overloaded
- `Retry-After: 5` - suggested wait time in seconds
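If you want to inspect these signals without `BackpressureManager`, reading them straight off a `fetch` response looks like this (sketch):

```typescript
const res = await fetch('https://api.example.com/data');
const load = parseFloat(res.headers.get('X-Backpressure') ?? '0');
const shedding = res.headers.get('X-Load-Shedding') === 'true';
const retryAfterSec = parseInt(res.headers.get('Retry-After') ?? '0', 10);

if (shedding) {
  // Back off for retryAfterSec seconds instead of retrying immediately.
}
```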
### Client side

```typescript
import {
retryWithBudget,
AdaptiveRetryBudget,
BackpressureManager
} from 'polite-retry';
// Track backpressure signals from each service
const backpressure = new BackpressureManager();
// Create budget that checks backpressure before retrying
const budget = new AdaptiveRetryBudget({
checkBackpressure: () => backpressure.isOverloaded('payment-service'),
});
// Make requests and record backpressure signals
async function callPaymentService(data: PaymentRequest) {
const response = await retryWithBudget(
async () => {
const res = await fetch('https://payment-service/charge', {
method: 'POST',
body: JSON.stringify(data),
});
// Record backpressure signal from response headers
backpressure.recordFromHeaders('payment-service', res.headers);
if (!res.ok) throw new Error(`HTTP ${res.status}`);
return res.json();
},
budget,
{ maxRetries: 3 }
);
return response;
}
```

### Manual headers

If you can't use the middleware, add the headers manually:
```typescript
app.get('/api/data', (req, res) => {
const load = activeRequests / maxRequests;
// Always send load level
res.setHeader('X-Backpressure', load.toFixed(2));
// Signal overload if above 80%
if (load > 0.8) {
res.setHeader('X-Load-Shedding', 'true');
res.setHeader('Retry-After', '5');
// Optionally reject request entirely
if (load > 0.95) {
return res.status(503).json({ error: 'Service overloaded' });
}
}
// Process request...
});
```

### gRPC

For gRPC, use metadata instead of HTTP headers:
```typescript
// Server: add backpressure to trailing metadata
const metadata = new grpc.Metadata();
metadata.set('x-backpressure', loadLevel.toString());
callback(null, response, metadata);
// Client: trailing metadata arrives with the 'status' event in @grpc/grpc-js
const call = client.getData(request, (err, response) => {
  // handle response
});
call.on('status', (status) => {
  const values = status.metadata.get('x-backpressure');
  if (values.length > 0) {
    const load = parseFloat(values[0].toString());
    backpressure.recordSignal('grpc-service', {
      isOverloaded: load > 0.8,
      loadLevel: load,
    });
  }
});
```

## Monitoring

Track retry behavior to detect problems:
```typescript
const budget = new AdaptiveRetryBudget({
onBudgetChange: (budget, failureRate) => {
// Send to your metrics system
metrics.gauge('retry.budget', budget);
metrics.gauge('retry.failure_rate', failureRate);
}
});
// Periodically log metrics
setInterval(() => {
const m = budget.getMetrics();
metrics.gauge('retry.amplification_factor', m.retryAmplificationFactor);
}, 10000);
```

## License

MIT