OpenAPI fetch should retry temporary errors with backoff limit #59

@eguzki

Problem

When an OpenAPI spec fetch fails, the error is persisted in the OpenAPISpecReady status condition. Because of generation-based caching, the controller does not retry the fetch until the resource's spec changes and its generation is updated.

This means:

  • Temporary failures (network timeout, server 503) require a manual spec edit to trigger a retry
  • Permanent failures (404, invalid URL) would waste resources if blindly retried

Desired Behavior

The controller should:

  1. Distinguish error types

    • Temporary: network errors, rate limiting (429), server errors (5xx)
    • Permanent: not found (404), auth failures (401/403), invalid URL, spec too large
  2. Retry temporary errors automatically

    • Without requiring spec changes
    • With exponential backoff between attempts
    • Limited to prevent infinite retries
  3. Stop retrying after limit reached

    • Either by number of attempts (e.g., 5 retries)
    • Or by time window (e.g., retry for up to 1 hour)
    • After the limit is reached, treat the failure as permanent
  4. Persist retry state in status

    • Users should see retry progress in conditions or status fields
    • Example: "FetchFailed (attempt 3/5)" or "last attempted 2 minutes ago"
  5. Reset retry state on:

    • Successful fetch
    • Spec change (generation update)
  6. Never retry permanent errors

    • Keep existing cached error behavior for these

Acceptance Criteria

  • Temporary errors retry without manual intervention
  • Retries use exponential backoff
  • Retries stop after reaching limit
  • Permanent errors don't retry
  • Retry state is visible to users
  • Tests verify temporary vs permanent error handling

Labels: enhancement

Status: Todo