Problem
When an OpenAPI spec fetch fails, the error is persisted in the OpenAPISpecReady status condition. Because of generation-based caching, the controller does not retry the fetch unless the resource's spec changes.
This means:
- Temporary failures (network timeout, server 503) require a manual spec edit to trigger a retry
- Permanent failures (404, invalid URL) would waste resources if blindly retried
Desired Behavior
The controller should:
- Distinguish error types
  - Temporary: network errors, rate limiting (429), server errors (5xx)
  - Permanent: not found (404), auth failures (401/403), invalid URL, spec too large
- Retry temporary errors automatically
  - Without requiring spec changes
  - With exponential backoff between attempts
  - Limited to prevent infinite retries
- Stop retrying after limit reached
  - Either by number of attempts (e.g., 5 retries)
  - Or by time window (e.g., retry for up to 1 hour)
  - After limit, treat as permanent failure
- Persist retry state in status
  - Users should see retry progress in conditions or status fields
  - Example: "FetchFailed (attempt 3/5)" or "last attempted 2 minutes ago"
- Reset retry state on:
  - Successful fetch
  - Spec change (generation update)
- Never retry permanent errors
  - Keep existing cached error behavior for these
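One possible shape for the temporary/permanent split is sketched below. It is only an illustration: `ErrorClass`, `HTTPStatusError`, `ErrInvalidURL`, `ErrSpecTooLarge` and `Classify` are hypothetical names, not existing types in the controller. It also assumes that any error which is neither an HTTP status nor a known permanent sentinel is a transport-level failure and therefore retryable.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrorClass tells the controller whether a failed fetch is worth retrying.
type ErrorClass int

const (
	Permanent ErrorClass = iota // never retried automatically
	Temporary                   // retried with backoff
)

func (c ErrorClass) String() string {
	if c == Temporary {
		return "Temporary"
	}
	return "Permanent"
}

// Hypothetical sentinel errors for failures that never reach the HTTP layer.
var (
	ErrInvalidURL   = errors.New("invalid URL")
	ErrSpecTooLarge = errors.New("spec too large")
)

// HTTPStatusError is a hypothetical error type carrying the response status.
type HTTPStatusError struct{ Code int }

func (e *HTTPStatusError) Error() string {
	return fmt.Sprintf("unexpected HTTP status %d", e.Code)
}

// Classify sorts a fetch error into Temporary or Permanent.
func Classify(err error) ErrorClass {
	if errors.Is(err, ErrInvalidURL) || errors.Is(err, ErrSpecTooLarge) {
		return Permanent
	}
	var statusErr *HTTPStatusError
	if errors.As(err, &statusErr) {
		code := statusErr.Code
		if code == http.StatusTooManyRequests || (code >= 500 && code <= 599) {
			return Temporary
		}
		return Permanent // 404, 401/403 and every other status
	}
	// Assumption: anything else is a transport-level failure
	// (timeout, connection reset, DNS hiccup) and may succeed later.
	return Temporary
}

func main() {
	for _, err := range []error{
		&HTTPStatusError{Code: 503},
		&HTTPStatusError{Code: 404},
		fmt.Errorf("fetch: %w", ErrSpecTooLarge),
		errors.New("dial tcp: i/o timeout"),
	} {
		fmt.Printf("%-35v -> %v\n", err, Classify(err))
	}
}
```

Defaulting unknown errors to Temporary is a judgment call. It is only safe because retries are bounded, so a misclassified permanent error costs a handful of extra attempts at most.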
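The retry, limit, persistence and reset items could fit together roughly as in the following sketch. Everything here is an assumption made for illustration: the `RetryState` struct and its field names, the limit of 5 attempts, the 30-second base delay and the 15-minute cap are not taken from the controller's actual API.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative limits; real values would be tuned or made configurable.
const (
	maxAttempts = 5
	baseDelay   = 30 * time.Second
	maxDelay    = 15 * time.Minute
)

// RetryState is a hypothetical status sub-struct that survives reconciles.
type RetryState struct {
	Attempts           int       // failed attempts for ObservedGeneration
	LastAttemptTime    time.Time // when the most recent attempt happened
	ObservedGeneration int64     // generation the counters belong to
}

// NextDelay returns the wait before the next attempt, doubling each time.
// The second value is false once the attempt limit has been reached.
func NextDelay(attempts int) (time.Duration, bool) {
	if attempts >= maxAttempts {
		return 0, false
	}
	delay := baseDelay
	for i := 1; i < attempts; i++ {
		delay *= 2
		if delay >= maxDelay {
			return maxDelay, true
		}
	}
	return delay, true
}

// RecordFailure counts one temporary failure and reports how long to wait.
// A new generation means the spec changed, so counting starts over.
func (s *RetryState) RecordFailure(generation int64, now time.Time) (time.Duration, bool) {
	if generation != s.ObservedGeneration {
		*s = RetryState{ObservedGeneration: generation}
	}
	s.Attempts++
	s.LastAttemptTime = now
	return NextDelay(s.Attempts)
}

// Reset clears the counters after a successful fetch.
func (s *RetryState) Reset() {
	*s = RetryState{ObservedGeneration: s.ObservedGeneration}
}

// Message renders the progress text for the status condition.
func (s *RetryState) Message() string {
	return fmt.Sprintf("FetchFailed (attempt %d/%d)", s.Attempts, maxAttempts)
}

func main() {
	state := &RetryState{}
	now := time.Now()
	for {
		delay, retry := state.RecordFailure(1, now)
		if !retry {
			fmt.Println(state.Message(), "- limit reached, treating as permanent")
			break
		}
		fmt.Printf("%s - retrying in %s\n", state.Message(), delay)
		now = now.Add(delay) // simulate the wait instead of sleeping
	}
}
```

If the controller is built on controller-runtime, the returned delay would typically be handed back as `RequeueAfter` in the reconcile result, which triggers the retry without any spec change. A time-window limit could replace the attempt counter by comparing `now` against a stored first-failure timestamp.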
Acceptance Criteria