Skip to content

REST: Add retry and timeout configuration for REST catalog#3418

Open
rahulsmahadev wants to merge 1 commit into
apache:mainfrom
rahulsmahadev:rahul/rest-retry-timeout
Open

REST: Add retry and timeout configuration for REST catalog#3418
rahulsmahadev wants to merge 1 commit into
apache:mainfrom
rahulsmahadev:rahul/rest-retry-timeout

Conversation

@rahulsmahadev
Copy link
Copy Markdown

Rationale for this change

The REST Catalog uses requests with no retries and no timeout configured, so transient 5xx / network failures bubble up immediately and slow servers can hang the client indefinitely. The reporting user on #2772 hit infinite-timeout hangs against a Polaris instance returning 504 from a proxy.

This change adds an optional connection: block on the catalog properties for opting in to a per-request timeout and a retry policy, as proposed in the issue body:

catalog:
  default:
    uri: http://rest-catalog/ws/
    connection:
      timeout: 60                     # seconds, applied to every HTTP call
      retry:
        total: 5
        backoff_factor: 1.0
        status_forcelist: [429, 500, 502, 503, 504]
        allowed_methods: [GET, HEAD, OPTIONS]

Changes:

  • Added connection, connection.timeout, connection.retry catalog properties.
  • Added a private _RetryTimeoutHTTPAdapter that subclasses requests.adapters.HTTPAdapter and applies a default timeout to every call (since requests does not expose a session-wide timeout).
  • The connection.retry sub-mapping is passed verbatim to urllib3.util.retry.Retry, so any kwarg the upstream class accepts can be configured.
  • Both keys are optional and opt-in. When neither is set, the default requests behavior is preserved — no behavior change for existing users.
  • Validation: negative or zero timeout raises ValueError; unknown Retry kwargs are caught and re-raised as ValueError with a clearer message.

The SigV4 adapter mounted at the catalog URI (see #3063) is a longer-prefix match and still wins for that host, so SigV4 users continue to get their existing retry behavior unchanged — this change deliberately stays out of that path.

Are these changes tested?

Yes. Added five tests in tests/catalog/test_rest.py:

  • test_session_without_connection_config_uses_default_adapter — no connection: block → no _RetryTimeoutHTTPAdapter is mounted (default requests behavior preserved).
  • test_session_with_connection_timeout_and_retry — full nested connection: block applies timeout and all Retry kwargs (total, backoff_factor, status_forcelist, allowed_methods).
  • test_session_with_connection_timeout_onlytimeout alone works without a retry block.
  • test_session_with_invalid_connection_timeout_raises — negative timeout raises ValueError.
  • test_session_with_invalid_connection_retry_kwarg_raises — unknown Retry kwarg raises ValueError.

Are there any user-facing changes?

Yes — two new optional REST catalog properties (connection.timeout, connection.retry) documented in mkdocs/docs/configuration.md under "Retry and timeout". No behavior change for users who do not set them.

Closes apache#2772.

The REST Catalog uses requests with no retries and no timeout by default,
so transient 5xx/network failures bubble up immediately and slow servers
can hang the client indefinitely (e.g. a Polaris instance returning 504
from a proxy).

Add an optional connection: block on the catalog properties:

  catalog:
    default:
      uri: http://rest-catalog/ws/
      connection:
        timeout: 60
        retry:
          total: 5
          backoff_factor: 1.0
          status_forcelist: [429, 500, 502, 503, 504]
          allowed_methods: [GET, HEAD, OPTIONS]

connection.retry is passed verbatim to urllib3.util.retry.Retry. Both
keys are optional and opt-in: when neither is set the default requests
behavior is preserved.

Signed-off-by: rahulsmahadev <rahul.mahadev@databricks.com>
@rambleraptor
Copy link
Copy Markdown
Contributor

Can you take a look at the linter failures?

'make lint' will help you run the linters locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants