Retry and timeout configuration for REST catalog #2772

@skrydal

Feature Request / Improvement

Problem statement
The REST catalog client uses the requests library to perform HTTP calls. Currently, if an intermittent network or server problem results in an error status code, pyiceberg does not retry the call. This is a common problem with out-of-the-box requests usage. In addition, no timeout is set for the calls.

Solution outline
Since retry and timeout handling is needed by most HTTP clients, off-the-shelf classes are available for it:

  • urllib3.util.retry.Retry - to define strategy
  • requests.adapters.HTTPAdapter - to use the strategy

An HTTPAdapter object can be mounted on a requests Session.
The timeout could be passed directly to each HTTP call made by the session, but a more common approach is to extend HTTPAdapter so that it applies the timeout parameter to all calls.
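
A minimal sketch of that approach, not taken from pyiceberg: TimeoutHTTPAdapter is a hypothetical name, and the retry and timeout values simply mirror the example config in this issue.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class TimeoutHTTPAdapter(HTTPAdapter):
    """Hypothetical adapter that applies a default timeout to every request."""

    def __init__(self, *args, timeout=None, **kwargs):
        # Default timeout in seconds, used when the caller does not pass one.
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.send() forwards timeout=None when the caller did not set it.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


# Retry strategy: up to 3 retries, exponential backoff, retry on HTTP 500.
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[500])
adapter = TimeoutHTTPAdapter(max_retries=retry, timeout=120)

session = requests.Session()
# Mount the adapter for both schemes so every call through the session uses it.
session.mount("http://", adapter)
session.mount("https://", adapter)
```

Note that by default Retry only retries idempotent methods on a status in status_forcelist, so POST calls would need allowed_methods to be set explicitly if they should be retried as well.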

A couple of new parameters would need to be added to the catalog config. They could look like this:

       type: 'rest'
       uri: "http://localhost:8181"
       connection:
         retry:
           backoff_factor: 0.5
           total: 3
           status_forcelist: [500]
         timeout: 120

The value of the retry key could be passed directly to the __init__ of urllib3's Retry class.
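
As a rough illustration, assuming the connection block has already been parsed into a plain dict, the mapping could look like the sketch below. build_retry and the connection variable are hypothetical names, not existing pyiceberg code.

```python
from urllib3.util.retry import Retry

# Parsed form of the proposed `connection` config block (assumed shape).
connection = {
    "retry": {"backoff_factor": 0.5, "total": 3, "status_forcelist": [500]},
    "timeout": 120,
}


def build_retry(connection_config):
    """Hypothetical helper: turn the `retry` mapping into a Retry object."""
    # A missing or empty `retry` key falls back to Retry's own defaults.
    retry_kwargs = connection_config.get("retry") or {}
    # Keys are passed through unchanged, so they must match Retry's parameters.
    return Retry(**retry_kwargs)


retry = build_retry(connection)
timeout = connection.get("timeout")  # None means no timeout is configured
```

Passing the keys through unchanged keeps the config schema aligned with Retry's parameters, at the cost of a TypeError from Retry when a key is misspelled.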
