retryify

Production-grade async retry with shared budgets, jittered backoff, and per-attempt timeouts.

The problem

Naive retry loops cause retry storms — when a dependency fails, every caller retries simultaneously, multiplying load by max_attempts. This cascading amplification turns a partial outage into a total one. retryify solves this with:

  • Shared retry budgets that cap the retry ratio across all callers
  • Jittered backoff that decorrelates retry timing
  • Rich predicates that handle real-world responses (HTTP 429, partial success)
  • Dual timeouts that distinguish "this attempt is slow" from "give up entirely"

Quick start

use retryify::*;
use std::time::Duration;

let result = retry(exponential())
    .with_full_jitter()
    .max_attempts(5)
    .per_attempt_timeout(Duration::from_secs(5))
    .total_timeout(Duration::from_secs(30))
    .run(
        |result: &Result<Response, MyError>, _attempt| match result {
            Err(e) if e.is_transient() => RetryDecision::Retry,
            _ => RetryDecision::Stop,
        },
        || async { call_service().await },
    )
    .await?;

Shared budget

use retryify::*;
use std::time::Duration;

// Create once, clone to each retry site
let budget = RetryBudget::shared()
    .ratio(0.2)           // 20 retries per 100 successes
    .min_per_second(1.0)  // floor during low traffic
    .window(Duration::from_secs(60))
    .build();

// Service A
let b = budget.clone();
retry(exponential()).with_full_jitter().budget(b).run(/* ... */).await?;

// Service B: shares the same token pool
let b = budget.clone();
retry(exponential()).with_full_jitter().budget(b).run(/* ... */).await?;

When the budget is exhausted, retries halt immediately — the strongest protection against retry storms.

Backoff strategies

Strategy             Formula                                     Use case
exponential()        base * multiplier^attempt (capped at max)   Default for network calls
linear(base, step)   base + step * attempt                       Predictable growth
constant(delay)      Fixed delay                                 Idempotent operations

// Exponential with custom parameters
exponential().base(Duration::from_millis(200)).multiplier(3.0).max(Duration::from_secs(60))

// Linear: 100ms, 200ms, 300ms, ...
linear(Duration::from_millis(100), Duration::from_millis(100))

// Constant: always 500ms
constant(Duration::from_millis(500))
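The capped exponential formula above can be sketched directly in a few lines. This is an illustrative re-implementation for intuition, not retryify's internals; `exponential_delay` is a hypothetical helper name.

```rust
use std::time::Duration;

// Capped exponential backoff: base * multiplier^attempt, clamped to `max`.
// Illustrative only -- mirrors the formula above, not retryify's code.
fn exponential_delay(base: Duration, multiplier: f64, max: Duration, attempt: u32) -> Duration {
    base.mul_f64(multiplier.powi(attempt as i32)).min(max)
}

fn main() {
    let base = Duration::from_millis(200);
    // Attempts 0..5 with multiplier 3.0: 200ms, 600ms, 1.8s, 5.4s, 16.2s
    for attempt in 0..5 {
        println!("{:?}", exponential_delay(base, 3.0, Duration::from_secs(60), attempt));
    }
}
```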

Jitter strategies

Strategy      Formula                     Notes
FullJitter    rand(0, base)               AWS recommendation; maximum decorrelation
EqualJitter   base/2 + rand(0, base/2)    Guaranteed minimum spacing
NoJitter      base                        Tests only; correlated retries in production cause cascading failures
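The two production jitter formulas are easy to sketch. `full_jitter`, `equal_jitter`, and the embedded LCG below are illustrative stand-ins rather than retryify API; real code would draw randomness from a crate such as `rand`.

```rust
use std::time::Duration;

// Tiny LCG kept inline so the sketch stays dependency-free.
// Returns a value in [0, 1).
fn next_unit(seed: &mut u64) -> f64 {
    *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    (*seed >> 11) as f64 / (1u64 << 53) as f64
}

// FullJitter: sleep = rand(0, base).
fn full_jitter(base: Duration, seed: &mut u64) -> Duration {
    base.mul_f64(next_unit(seed))
}

// EqualJitter: sleep = base/2 + rand(0, base/2).
fn equal_jitter(base: Duration, seed: &mut u64) -> Duration {
    let half = base / 2;
    half + half.mul_f64(next_unit(seed))
}

fn main() {
    let mut seed = 42;
    let base = Duration::from_secs(1);
    println!("full:  {:?}", full_jitter(base, &mut seed));
    println!("equal: {:?}", equal_jitter(base, &mut seed));
}
```

Note the trade-off visible in the formulas: full jitter can sleep for almost nothing, while equal jitter guarantees at least half the base delay between attempts.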

The builder enforces jitter selection at compile time — you cannot accidentally forget it:

// Won't compile without choosing a jitter strategy:
retry(exponential())
    .with_full_jitter()  // or .without_jitter() for tests
    .max_attempts(5)
    // ...

Timeout semantics

retryify deliberately avoids a single .timeout() method. Ambiguous timeout semantics are a common source of production incidents.

Method                    Scope              On expiry
.per_attempt_timeout(d)   Single attempt     Attempt is cancelled, retry continues
.total_timeout(d)         Entire lifecycle   Returns RetryError::DeadlineExceeded

Why this matters: A 5-second "timeout" could mean "cancel this one slow call and try again" or "give up on the entire operation." These are radically different behaviors. Making the distinction explicit prevents a class of outages where per-attempt timeouts were accidentally used as total deadlines (or vice versa).

The total timeout also clamps sleep durations: if only 2 seconds remain in the budget, a 10-second backoff is reduced to 2 seconds.
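The clamping rule can be sketched as follows; `clamped_sleep` is a hypothetical helper illustrating the behavior, not part of retryify's public API.

```rust
use std::time::{Duration, Instant};

// Never sleep past the total deadline: the backoff is capped at whatever
// remains in the overall budget. (Illustrative helper, not retryify API.)
fn clamped_sleep(backoff: Duration, deadline: Instant) -> Duration {
    let remaining = deadline.saturating_duration_since(Instant::now());
    backoff.min(remaining)
}

fn main() {
    let deadline = Instant::now() + Duration::from_secs(2);
    // A 10s backoff is clamped to the ~2s remaining in the total budget.
    println!("{:?}", clamped_sleep(Duration::from_secs(10), deadline));
}
```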

Design decisions

Why RetryDecision instead of bool

Real retry logic is richer than retry/don't-retry:

  • HTTP 429 responses include Retry-After headers
  • Rate limiters may specify exact cooldown periods
  • Some failures should use longer backoff than the default

RetryDecision::RetryAfter(Duration) captures this — the delay is max(jittered_backoff, retry_after).
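Under that rule, delay selection might look like the sketch below. The `Decision` enum and `delay_for` function are hypothetical mirrors of the behavior described, not retryify's actual types.

```rust
use std::time::Duration;

// Hypothetical mirror of the decision type, to show the delay rule.
#[derive(Debug, PartialEq)]
enum Decision {
    Stop,
    Retry,
    RetryAfter(Duration),
}

fn delay_for(decision: &Decision, jittered_backoff: Duration) -> Option<Duration> {
    match decision {
        Decision::Stop => None,
        Decision::Retry => Some(jittered_backoff),
        // Honour the server's hint, but never sleep less than the backoff.
        Decision::RetryAfter(hint) => Some(jittered_backoff.max(*hint)),
    }
}

fn main() {
    // A Retry-After of 5s overrides a 1s jittered backoff.
    println!("{:?}", delay_for(&Decision::RetryAfter(Duration::from_secs(5)), Duration::from_secs(1)));
}
```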

Why &Result<T, E> instead of &E

Many retryable conditions are not errors:

  • HTTP 429 (rate limited) — the request "succeeded" but must be retried
  • HTTP 503 (service unavailable) — valid response, retryable condition
  • Partial success responses that need full retry

The predicate sees the full result so it can match on Ok variants.

Why shared budgets

Without a budget, N callers × M max_attempts = N*M requests hitting a failing dependency. A shared budget (token bucket) ensures the total retry rate stays proportional to the success rate, regardless of how many retry sites exist. This is the single most important mechanism for preventing retry-induced cascading failures.
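The token-bucket idea can be sketched as a toy: successes deposit a fraction of a token, each retry withdraws a whole one. `Budget` here is a hypothetical illustration of the mechanism, not retryify's `RetryBudget`.

```rust
use std::sync::{Arc, Mutex};

// Toy token-bucket retry budget: each success deposits `ratio` tokens,
// each retry withdraws one. Sketch of the idea, not retryify's code.
#[derive(Clone)]
struct Budget {
    tokens: Arc<Mutex<f64>>,
    ratio: f64,
    cap: f64,
}

impl Budget {
    fn new(ratio: f64, cap: f64) -> Self {
        Budget { tokens: Arc::new(Mutex::new(cap)), ratio, cap }
    }

    // Called after every successful request.
    fn record_success(&self) {
        let mut t = self.tokens.lock().unwrap();
        *t = (*t + self.ratio).min(self.cap);
    }

    // Called before every retry; `false` means the budget is exhausted.
    fn try_retry(&self) -> bool {
        let mut t = self.tokens.lock().unwrap();
        if *t >= 1.0 {
            *t -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let budget = Budget::new(0.2, 10.0);
    // Clones share the same pool, so two call sites drain one budget.
    let (site_a, site_b) = (budget.clone(), budget.clone());
    let mut allowed = 0;
    for i in 0..12 {
        let site = if i % 2 == 0 { &site_a } else { &site_b };
        if site.try_retry() {
            allowed += 1;
        }
    }
    println!("retries allowed: {}", allowed); // 10
}
```

Because every clone shares the same `Arc`, the total retry rate stays proportional to the deposit rate from successes, no matter how many retry sites exist.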

Telemetry

// Closure hook
.on_retry(|event: &RetryEvent| {
    metrics::counter!("retries", 1, "attempt" => event.attempt.to_string());
})

// Structured tracing
.instrument()  // emits tracing::warn! at target "retryify"

Minimum supported Rust version

1.75 (for RPITIT support)

License

MIT OR Apache-2.0
