# **Requests wrapper: Retrying API requests**

## **Introduction**

`dlt` offers a tailored Python Requests client featuring automatic retries and adjustable timeouts. It enhances the resilience of your API calls against intermittent network errors and unexpected issues, making your pipeline more robust in handling random glitches.

For most use cases, this serves as a direct replacement for the `requests` library. Instead of:

```python
import requests
```
You can use:

```python
from dlt.sources.helpers import requests
```

And proceed as you normally would with `requests`:

```python
response = requests.get(
    'https://example.com/api/contacts',
    headers={'Authorization': MY_API_KEY}
)
data = response.json()
...
```



## **Retry rules**


- The `dlt` requests client automatically sets the default user-agent header to `dlt/{DLT_VERSION_NAME}`.
- By default, failing requests are retried up to `five` times, with an exponentially increasing delay (starting at `1` second for the first retry and up to `16` seconds for the fifth).
- If all retry attempts fail, the client raises a request exception, such as `requests.HTTPError` or `requests.ConnectTimeout`.

- Retries are triggered under the following conditions:

  - **HTTP Server Errors:** For all status codes in the `500` range and `429` (Too Many Requests).
    > In cases of `429` and `503` responses, if the server includes a Retry-After header, it will override the standard retry delay.
  - **Connection and Timeout Errors:** When the server is unreachable, the connection drops unexpectedly, or the request exceeds the timeout.
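The delay schedule described in the rules above can be sketched in plain Python. This is a minimal illustration, not `dlt`'s actual implementation: the function name `backoff_delays` and its parameters are hypothetical, and the doubling formula is inferred from the 1-second and 16-second figures quoted above.

```python
from typing import List


def backoff_delays(
    retries: int = 5, factor: float = 1.0, max_delay: float = float("inf")
) -> List[float]:
    """Return the assumed wait (in seconds) before each retry.

    The n-th retry (counting from 1) waits factor * 2 ** (n - 1) seconds,
    capped at max_delay. Hypothetical helper for illustration only.
    """
    return [min(factor * 2 ** n, max_delay) for n in range(retries)]


# Default schedule: 1 second before the first retry, 16 before the fifth.
print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

A server-supplied `Retry-After` value would replace the computed delay for that retry, which this sketch does not model.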

### **Example usage**

First, install `dlt`.

In [2]:
%%capture
!pip install dlt

Import the required modules.

In [3]:
import dlt
from dlt.sources.helpers import requests



To help us observe the retry process, we'll also set up logging. This step is optional and just for demonstration purposes.

In [1]:
import logging
import sys

# Set up logging to print to the console (optional, for demonstration only)
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)
logger = logging.getLogger('dlt.sources.helpers.requests')
logger.setLevel(logging.DEBUG)


Let's see `dlt` in action by making a request to an incorrect URL. This will trigger the retry mechanism 5 times.

In [None]:
# Simulate a failure by using an incorrect URL or domain
url = "https://api.githusb.com/repos/dlt-hub/dlt/issues"  # 'githusb' will cause a failure

response = requests.get(url)

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.githusb.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (2): api.githusb.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (3): api.githusb.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (4): api.githusb.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (5): api.githusb.com:443


ConnectionError: HTTPSConnectionPool(host='api.githusb.com', port=443): Max retries exceeded with url: /repos/dlt-hub/dlt/issues (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x792b18854fa0>: Failed to resolve 'api.githusb.com' ([Errno -2] Name or service not known)"))

## **Customizing retry settings**

You can tailor the retry behavior of the `dlt` client by adjusting various settings in your `config.toml` file or by setting environment variables. This allows you to fine-tune how the client handles retries, timeouts, and delays.

```
[runtime]
request_max_attempts = 10       # Stop after 10 retry attempts instead of 5
request_backoff_factor = 1.5    # Multiplier applied to the exponential delays. Default is 1
request_timeout = 120           # Timeout in seconds
request_max_retry_delay = 30    # Cap exponential delay to 30 seconds
```
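Each key in the `[runtime]` section can also be supplied as an environment variable: the section and key are upper-cased and joined with a double underscore, as in `RUNTIME__REQUEST_MAX_ATTEMPTS`. The helper `to_env_var` below is hypothetical and only illustrates that naming pattern for the four settings shown above.

```python
def to_env_var(section: str, key: str) -> str:
    """Build the environment variable name for a config.toml key.

    Hypothetical helper: upper-case both parts and join with '__'.
    """
    return f"{section}__{key}".upper()


# The same values as in the config.toml snippet above.
settings = {
    "request_max_attempts": 10,
    "request_backoff_factor": 1.5,
    "request_timeout": 120,
    "request_max_retry_delay": 30,
}

for key, value in settings.items():
    print(f"{to_env_var('runtime', key)}={value}")
```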


### **Example configuration**

Let's customize `dlt`'s retry settings using environment variables. First, let's ensure we start with a clean slate by terminating the current execution.

In [None]:
exit()

Next, we'll define our retry settings using environment variables. We'll increase the number of retry attempts to 7 and adjust the backoff factor:

In [None]:
import os

# Define configs via environment variables
os.environ['RUNTIME__REQUEST_MAX_ATTEMPTS'] = '7'
os.environ['RUNTIME__REQUEST_BACKOFF_FACTOR'] = '0.5'

Now, let's run the same code as before, but this time with the updated settings. With these settings:

- The request will be retried up to 7 times.
- The delay between retries will be shorter, as the backoff factor is set to `0.5`. For example, the seventh retry will have a delay of `0.5 * 64 seconds = 32 seconds`.

In [None]:
import dlt
from dlt.sources.helpers import requests
import logging
import sys

# Set up logging to print to the console
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)
logger = logging.getLogger('dlt.sources.helpers.requests')
logger.setLevel(logging.DEBUG)

# Simulate a failure by using an incorrect URL or domain
url = "https://api.githusb.com/repos/dlt-hub/dlt/issues"  # 'githusb' will cause a failure

response = requests.get(url)

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.githusb.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (2): api.githusb.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (3): api.githusb.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (4): api.githusb.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (5): api.githusb.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (6): api.githusb.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (7): api.githusb.com:443


ConnectionError: HTTPSConnectionPool(host='api.githusb.com', port=443): Max retries exceeded with url: /repos/dlt-hub/dlt/issues (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7954cd1e9f90>: Failed to resolve 'api.githusb.com' ([Errno -2] Name or service not known)"))

## **Custom client**

If you need more control over how your requests are handled, you can create your own instance of `dlt.sources.helpers.requests.Client`. This approach allows you to customize the behavior of the client, including which HTTP status codes and exceptions will trigger retries.

```python
from dlt.sources.helpers import requests

http_client = requests.Client(
    status_codes=(403, 500, 502, 503),
    exceptions=(requests.ConnectionError, requests.ChunkedEncodingError)
)
```
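As a rough mental model of how the `status_codes` and `exceptions` arguments interact, the retry decision can be sketched as below. The function `should_retry` is hypothetical and simplified; it is not the client's real internals, and built-in exception types stand in for the `requests` ones so the sketch runs without any dependencies.

```python
from typing import Optional, Tuple, Type


def should_retry(
    status_code: Optional[int],
    error: Optional[BaseException],
    status_codes: Tuple[int, ...],
    exceptions: Tuple[Type[BaseException], ...],
) -> bool:
    """Simplified retry decision (illustrative assumption only)."""
    if error is not None:
        # The request raised: retry only for the listed exception types.
        return isinstance(error, exceptions)
    # A response arrived: retry only for the listed status codes.
    return status_code in status_codes


codes = (403, 500, 502, 503)
errors = (ConnectionError,)

print(should_retry(503, None, codes, errors))               # True
print(should_retry(404, None, codes, errors))               # False
print(should_retry(None, ConnectionError(), codes, errors)) # True
print(should_retry(None, ConnectionError(), codes, ()))     # False
```

Passing an empty tuple for either argument switches off that class of retries, which is what the next example relies on.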

### **Example custom client**

In some cases, you may want to handle specific HTTP status codes differently, such as retrying only when a `403 Forbidden` error occurs.

To specifically handle the `403` error, you can configure the `requests.Client` to retry only on this status code. Additionally, you can remove the default exceptions (like `ConnectionError`) from triggering retries:

In [None]:
from dlt.sources.helpers import requests

http_client = requests.Client(
    status_codes=(403,),
    exceptions=()  # Remove default retries for connection-related errors
)

Now, let's simulate a scenario where your requests hit the rate limit on GitHub's API (which allows 60 unauthenticated requests per hour). When the rate limit is exceeded, resulting in a `403 Forbidden` response, the custom client will automatically retry the request up to 7 times (based on the environment settings defined earlier).

In [None]:
# The correct endpoint
url = "https://api.github.com/repos/dlt-hub/dlt/issues"

# Simulate too many requests - GitHub allows 60 unauthenticated requests per hour
for i in range(70):
    print(f"Attempt {i + 1}")
    response = http_client.get(url)

Attempt 1
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com:443
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/issues HTTP/1.1" 200 None
Attempt 2
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/issues HTTP/1.1" 200 17578
Attempt 3
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/issues HTTP/1.1" 200 17578
Attempt 4
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/issues HTTP/1.1" 200 17578
Attempt 5
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/issues HTTP/1.1" 200 17578
Attempt 6
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/issues HTTP/1.1" 200 17578
Attempt 7
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/issues HTTP/1.1" 200 17578
Attempt 8
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/issues HT

HTTPError: 403 Client Error: rate limit exceeded for url: https://api.github.com/repos/dlt-hub/dlt/issues

If a different error occurs, such as a `404 Not Found` or a connection error, the custom client will not retry because these scenarios are not specified in the custom configuration.


In [None]:
url = "https://api.githusb.com/repos/dlt-hub/dlt/issues"  # Incorrect URL to simulate a different error
response = http_client.get(url)

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.githusb.com:443


ConnectionError: HTTPSConnectionPool(host='api.githusb.com', port=443): Max retries exceeded with url: /repos/dlt-hub/dlt/issues (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7954cd1eb0a0>: Failed to resolve 'api.githusb.com' ([Errno -2] Name or service not known)"))

## **Custom retry condition**

In some situations, particularly when working with non-standard APIs that don't use conventional HTTP error codes, you may need to define a custom retry condition. This can be done by supplying a predicate function that determines whether a request should be retried based on the response content.

```python
from typing import Optional

from dlt.sources.helpers import requests

def retry_if_error_key(response: Optional[requests.Response], exception: Optional[BaseException]) -> bool:
    """Decide whether to retry the request based on whether
    the json response contains an `error` key
    """
    if response is None:
        # Fall back on the default exception predicate.
        return False
    data = response.json()
    return 'error' in data

http_client = requests.Client(
    retry_condition=retry_if_error_key
)
```
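A predicate like this can be checked offline before wiring it into a client. The sketch below is an assumption-laden variant: `FakeResponse` is a hypothetical stand-in that only provides a `.json()` method, and the predicate additionally returns `False` when the body is not valid JSON (JSON decoding failures are `ValueError` subclasses), a case the version above would let propagate.

```python
import json
from typing import Any, Optional


class FakeResponse:
    """Hypothetical stub exposing only the .json() method of a response."""

    def __init__(self, body: str) -> None:
        self.text = body

    def json(self) -> Any:
        return json.loads(self.text)


def retry_if_error_key(
    response: Optional[FakeResponse], exception: Optional[BaseException]
) -> bool:
    """Retry only when a JSON object with an `error` key comes back."""
    if response is None:
        return False
    try:
        data = response.json()
    except ValueError:
        # Body is not JSON (e.g. an HTML error page): do not retry.
        return False
    return isinstance(data, dict) and "error" in data


print(retry_if_error_key(FakeResponse('{"error": "busy"}'), None))  # True
print(retry_if_error_key(FakeResponse('{"items": []}'), None))      # False
print(retry_if_error_key(FakeResponse("<html>oops</html>"), None))  # False
print(retry_if_error_key(None, TimeoutError()))                     # False
```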

### **Example custom condition**

First, create a function that will serve as your custom retry condition. This function should accept a response and an exception as arguments and return `True` if the request should be retried, or `False` otherwise. The example below retries if the message of the response contains either `Not Found` or `API rate limit exceeded`.

In [None]:
from typing import Optional

def retry_if_error_key(response: Optional[requests.Response], exception: Optional[BaseException]) -> bool:
    """Decide whether to retry the request based on whether
    the JSON response contains a specific error message.
    """
    if response is None:
        # Fall back on the default exception predicate.
        return False

    data = response.json()
    print(data)

    # If the message is 'Not Found' or 'API rate limit exceeded', retry the request.
    if 'Not Found' in data.get('message', '') or 'API rate limit exceeded' in data.get('message', ''):
        return True
    else:
        return False

Now, we'll create an instance of `requests.Client` with our custom retry condition. We'll also disable the default retry conditions for status codes and exceptions.

In [None]:
http_client = requests.Client(
    status_codes=(),  # Disable status code retries
    retry_condition=retry_if_error_key,  # Use the custom retry condition
    exceptions=()  # Disable exception-based retries
)

Finally, use the custom client to make a request. If the response contains the message `Not Found` or `API rate limit exceeded`, the custom client will retry the request.

In [None]:
custom_url = "https://api.github.com/repos/dlt-hub/dlt/wrong_endpoint"

response = http_client.get(custom_url)
print(response.json())


DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com:443
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/wrong_endpoint HTTP/1.1" 404 101
{'message': 'Not Found', 'documentation_url': 'https://docs.github.com/rest', 'status': '404'}
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/wrong_endpoint HTTP/1.1" 404 101
{'message': 'Not Found', 'documentation_url': 'https://docs.github.com/rest', 'status': '404'}
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/wrong_endpoint HTTP/1.1" 404 101
{'message': 'Not Found', 'documentation_url': 'https://docs.github.com/rest', 'status': '404'}
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/wrong_endpoint HTTP/1.1" 404 101
{'message': 'Not Found', 'documentation_url': 'https://docs.github.com/rest', 'status': '404'}
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repos/dlt-hub/dlt/wrong

HTTPError: 404 Client Error: Not Found for url: https://api.github.com/repos/dlt-hub/dlt/wrong_endpoint

# **Tenacity: Retrying pipeline steps**

## **Introduction**

`dlt` is designed to be a robust and flexible data loading tool, but by default, it does not automatically retry failed pipeline steps. Instead, this functionality is delegated to the included `helpers` and the `tenacity` library. `Tenacity` is a powerful Python library that provides configurable retry mechanisms, allowing you to control how and when retries should be performed.

## **Example usage**

In this example, we will load a resource with a frozen data types schema contract using `dlt`. We'll then attempt to breach that schema contract, simulating a broken pipeline. The goal is to demonstrate how `tenacity` can be used to retry the pipeline execution upon failure.

Before setting up and running the `dlt` pipeline, use the `exit()` command to clear the previous logging configurations and reset the environment.

In [None]:
exit()

Now that the environment is reset, proceed with setting up your `dlt` pipeline.

In [None]:
import dlt

# Define dlt resource that prevents any changes to the existing data types
@dlt.resource(schema_contract={"data_type": "freeze"})
def no_data_type_changes(input_data):
    yield input_data

pipeline = dlt.pipeline(
    pipeline_name="quick_start", destination="duckdb", dataset_name="mydata"
)

# Initial load with valid data
load_info = pipeline.run(no_data_type_changes([{"id": 1, "name": "Lisa", "age": 40}]), table_name="users")
print(load_info)

Pipeline quick_start load step completed in 1.23 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/quick_start.duckdb location to store data
Load package 1724323074.2368553 is LOADED and contains no failed jobs


Next, we'll try to load data that breaches the schema contract, triggering an exception. We'll use `tenacity` to make up to 3 attempts before giving up. In the end, we'll see the message `Pipeline failed after retries:...`

In [None]:
from tenacity import stop_after_attempt, retry_if_exception, Retrying, retry, wait_exponential
from dlt.pipeline.helpers import retry_load

try:
    for attempt in Retrying(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1.5, min=4, max=10), retry=retry_if_exception(retry_load(())), reraise=True):
        with attempt:
            # Attempt to load data that violates the schema contract
            load_info = pipeline.run(no_data_type_changes([{"id": "2", "name": "Anna", "age": "forty"}]), table_name="users")
            print(load_info)
except Exception as e:
    print(f"Pipeline failed after retries: {e}")

Pipeline failed after retries: Pipeline execution failed at stage normalize when processing package 1724323100.3811827 with exception:

<class 'dlt.normalize.exceptions.NormalizeJobFailed'>
Job for users.0c3e39979c.typed-jsonl failed terminally in load 1724323100.3811827 with message In schema: quick_start: In Schema: quick_start Table: users Column: age__v_text . Contract on data_type with mode freeze is violated. Trying to create new variant column age__v_text to table users but data_types are frozen..
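To get a feel for how long the `Retrying` loop above would pause between attempts, the wait can be approximated in plain Python. This sketch assumes `wait_exponential` computes `multiplier * 2 ** (attempt - 1)` and clamps the result between `min` and `max`; the function `exponential_wait` is a hypothetical stand-in, not part of `tenacity`.

```python
def exponential_wait(
    attempt: int,
    multiplier: float = 1.0,
    min_wait: float = 0.0,
    max_wait: float = 3600.0,
) -> float:
    """Approximate wait in seconds after the given attempt number (1-based).

    Assumed formula: multiplier * 2 ** (attempt - 1), clamped to
    the range [min_wait, max_wait].
    """
    raw = multiplier * 2 ** (attempt - 1)
    return max(min_wait, min(raw, max_wait))


# Settings from the example: multiplier=1.5, min=4, max=10.
for attempt in range(1, 5):
    print(attempt, exponential_wait(attempt, multiplier=1.5, min_wait=4, max_wait=10))
```

Under that assumption, the first two waits are both raised to the 4-second minimum, and from the fourth attempt onward the 10-second cap applies. With `stop_after_attempt(3)`, only the waits after attempts 1 and 2 would ever occur.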


You can also use `tenacity` to decorate functions. This example additionally retries on `extract`.

In [None]:
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1.5, min=4, max=10), retry=retry_if_exception(retry_load(("extract", "load"))), reraise=True)
def load():
    return pipeline.run(no_data_type_changes([{"id": "2", "name": "Anna", "age": "forty"}]), table_name="users")

load_info = load()
print(load_info)

PipelineStepFailed: Pipeline execution failed at stage normalize when processing package 1724323100.3811827 with exception:

<class 'dlt.normalize.exceptions.NormalizeJobFailed'>
Job for users.0c3e39979c.typed-jsonl failed terminally in load 1724323100.3811827 with message In schema: quick_start: In Schema: quick_start Table: users Column: age__v_text . Contract on data_type with mode freeze is violated. Trying to create new variant column age__v_text to table users but data_types are frozen..

# **Contact / Support**

For guidance on running custom pipelines with `dlt`, [join our community Slack](https://dlthub-community.slack.com/ssb/redirect).

Check out our [GitHub](https://github.com/dlt-hub) and perhaps bestow upon us a star ⭐. It's like knighting us in the realm of code! 🏰⚔️