# Advanced Problems – The `requests` Library

This notebook contains **advanced, realistic problems** that will push you beyond basic `requests.get(...)` calls.

Each problem is phrased as a task you might encounter when building real-world HTTP clients, followed by a **worked solution** that illustrates best practices (timeouts, error handling, sessions, streaming, auth, etc.).


## Environment & Imports

> You can run this notebook as-is, but a few cells expect network access.  
> For APIs that require authentication, you will need to provide your own tokens via environment variables.


In [1]:
import os
import time
import json
from typing import Any, Dict, Iterable, Optional

import requests
from requests import Session, Response


---

## Problem 1 – Robust `GET` with Retries and Timeouts

You are calling a flaky HTTP service where:

* some requests time out,
* some return `5xx` status codes,
* but retrying usually succeeds.

### Task

Write a function:

```python
def robust_get(
    url: str,
    *,
    params: Optional[Dict[str, Any]] = None,
    headers: Optional[Dict[str, str]] = None,
    timeout: float = 5.0,
    max_retries: int = 3,
    backoff_factor: float = 0.5,
    status_forcelist: Iterable[int] = (500, 502, 503, 504),
) -> Response:
    ...
```

**Requirements:**

1. Use a single `requests.Session()` inside the function.
2. For each failed attempt (timeout, connection error, *or* response with status in `status_forcelist`), retry **up to** `max_retries` times.
3. Wait between retries using exponential backoff:

   \[
   \text{sleep} = \text{backoff\_factor} \times 2^{\text{attempt}}
   \]

   where `attempt` starts at `0` for the first retry.
4. If all retries fail, re-raise the *last* exception or `HTTPError`.
5. Use `response.raise_for_status()` so callers immediately see an `HTTPError` for 4xx/5xx responses that are not in the retry list.
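
To make the backoff rule in requirement 3 concrete, the following sketch lists the delays it produces for a given configuration. The helper name `backoff_delays` is made up for illustration and is not part of `requests`.

```python
from typing import List


def backoff_delays(backoff_factor: float, max_retries: int) -> List[float]:
    """Delays (in seconds) before each retry: backoff_factor * 2**attempt."""
    return [backoff_factor * (2 ** attempt) for attempt in range(max_retries)]


# With the default arguments of robust_get:
print(backoff_delays(0.5, 3))  # [0.5, 1.0, 2.0]
```

With the defaults, a call that never succeeds therefore sleeps for 3.5 seconds in total, on top of the time spent waiting for each attempt.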


### Solution – Robust `GET` with Retries and Timeouts


In [2]:
def robust_get(
    url: str,
    *,
    params: Optional[Dict[str, Any]] = None,
    headers: Optional[Dict[str, str]] = None,
    timeout: float = 5.0,
    max_retries: int = 3,
    backoff_factor: float = 0.5,
    status_forcelist: Iterable[int] = (500, 502, 503, 504),
) -> Response:
    """Perform a robust GET request with basic retry + exponential backoff.

    This implementation deliberately hand-rolls the retry loop in plain Python
    (instead of using urllib3's Retry) to illustrate the ideas.
    """
    last_exc: Optional[BaseException] = None
    retry_statuses = set(status_forcelist)

    # A single session for all attempts; the context manager closes it on exit.
    with Session() as session:
        for attempt in range(max_retries + 1):
            try:
                resp = session.get(url, params=params, headers=headers, timeout=timeout)
                # Raises HTTPError for any 4xx/5xx status; otherwise we are done.
                resp.raise_for_status()
                return resp
            except (requests.Timeout, requests.ConnectionError) as exc:
                last_exc = exc
            except requests.HTTPError as exc:
                # Only statuses in the retry list are retried; any other
                # 4xx/5xx goes straight to the caller (requirement 5).
                if exc.response is None or exc.response.status_code not in retry_statuses:
                    raise
                last_exc = exc

            # If this was the last allowed attempt, stop and re-raise below
            if attempt == max_retries:
                break

            # Simple exponential backoff
            sleep_for = backoff_factor * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({last_exc}); retrying in {sleep_for:.2f} s...")
            time.sleep(sleep_for)

    # If we exit the loop without returning, all retries failed
    assert last_exc is not None
    raise last_exc


In [3]:
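# Example usage (requires network access). This endpoint is built to answer 500,
# so every attempt fails and the last HTTPError is re-raised. In the run below
# the server answered 503 instead, which is also in the retry list.
resp = robust_get("https://httpbin.org/status/500")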
print(resp.status_code)


Attempt 1 failed (503 Server Error: Service Temporarily Unavailable for url: https://httpbin.org/status/500); retrying in 0.50 s...
Attempt 2 failed (503 Server Error: Service Temporarily Unavailable for url: https://httpbin.org/status/500); retrying in 1.00 s...
Attempt 3 failed (503 Server Error: Service Temporarily Unavailable for url: https://httpbin.org/status/500); retrying in 2.00 s...


HTTPError: 503 Server Error: Service Temporarily Unavailable for url: https://httpbin.org/status/500
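
For comparison, the same kind of policy can be delegated to urllib3's `Retry` class, which `requests` accepts through `HTTPAdapter(max_retries=...)`. This is a sketch of that alternative, not the solution above. The helper name `make_retrying_session` is an assumption, and urllib3's own backoff schedule may differ from the formula in the task depending on the installed version.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(
    max_retries: int = 3,
    backoff_factor: float = 0.5,
    status_forcelist=(500, 502, 503, 504),
) -> requests.Session:
    """Return a Session whose adapters retry failed requests automatically."""
    retry = Retry(
        total=max_retries,
        backoff_factor=backoff_factor,
        status_forcelist=list(status_forcelist),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount the adapter for both schemes so every request uses the policy.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = make_retrying_session()
print(session.get_adapter("https://httpbin.org").max_retries.total)  # 3
```

A timeout still has to be passed on every call, for example `session.get(url, timeout=5.0)`. Once the retries for a status in `status_forcelist` are exhausted, `requests` raises `requests.exceptions.RetryError` rather than `HTTPError`, so callers need to catch a different exception than with `robust_get`.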

---

## Problem 2 – Small API Client Class Using `Session`

You often need to call the same API many times.  
Using a persistent `Session`:

* reuses TCP connections,
* lets you set headers (e.g., authentication) once,
* centralizes error handling.

### Task

Implement a small API client for `https://jsonplaceholder.typicode.com` with the following features:

1. A class `JSONPlaceholderClient` that:
   * stores `base_url` and a private `Session`,
   * sets a `User-Agent` header on all requests,
   * sets a default timeout (configurable via constructor).
2. Methods:
   * `get(path: str, **kwargs) -> Response` – low-level method.
   * `get_json(path: str, **kwargs) -> Any` – returns parsed JSON, raising `HTTPError` for 4xx/5xx responses.
3. A convenience method:

   ```python
   def get_post(self, post_id: int) -> Dict[str, Any]:
       ...
   ```

   which fetches `/posts/{post_id}` and returns the JSON as a dict.

Add **docstrings** and ensure that:

* `raise_for_status()` is used in one place only (inside `get_json`).
* Callers may still use `get` if they want to handle status codes manually.


### Solution – API Client Class


In [4]:
class JSONPlaceholderClient:
    """Minimal API client for jsonplaceholder.typicode.com using requests.Session."""

    def __init__(self, base_url: str = "https://jsonplaceholder.typicode.com", *, timeout: float = 5.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        self.session = Session()
        # Set headers that will be sent on every request through this session
        self.session.headers.update({
            "User-Agent": "JSONPlaceholderClient/1.0 (+https://example.com/your-app)",
            "Accept": "application/json",
        })

    def _build_url(self, path: str) -> str:
        if not path.startswith("/"):
            path = "/" + path
        return self.base_url + path

    def get(self, path: str, **kwargs) -> Response:
        """Low-level GET. Caller is responsible for checking status codes.

        Extra kwargs (params, headers, timeout, etc.) are passed directly to `Session.get`.
        """
        url = self._build_url(path)
        timeout = kwargs.pop("timeout", self.timeout)
        return self.session.get(url, timeout=timeout, **kwargs)

    def get_json(self, path: str, **kwargs) -> Any:
        """GET + raise_for_status + JSON decoding.

        If the server does not return JSON, this will raise ValueError from `response.json()`.
        """
        resp = self.get(path, **kwargs)
        resp.raise_for_status()
        return resp.json()

    def get_post(self, post_id: int) -> Dict[str, Any]:
        """Return a single post as a dict, or raise HTTPError on failure."""
        return self.get_json(f"/posts/{post_id}")


# Example usage (requires network access):
# client = JSONPlaceholderClient()
# post = client.get_post(1)
# print(post)
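
Because `get` leaves the status code untouched, a caller can treat some statuses as ordinary outcomes instead of errors. The sketch below maps a `404` to `None`. `get_post_or_none` is a hypothetical helper, and `client` is assumed to be a `JSONPlaceholderClient` (any object with a compatible `get` method would do).

```python
from typing import Any, Dict, Optional


def get_post_or_none(client: Any, post_id: int) -> Optional[Dict[str, Any]]:
    """Return the post as a dict, or None if the API answers 404."""
    resp = client.get(f"/posts/{post_id}")
    if resp.status_code == 404:
        return None
    # Any other 4xx/5xx status is still reported as an HTTPError.
    resp.raise_for_status()
    return resp.json()
```

Called as `get_post_or_none(JSONPlaceholderClient(), 1)`, this would hit the live API; the helper itself only relies on `status_code`, `raise_for_status()`, and `json()`.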


---

## Problem 3 – Handling Pagination + Simple Rate Limiting (GitHub API)

Many APIs paginate large result sets.  
GitHub's API (v3) uses query parameters like `per_page` and `page`, and also includes pagination links in the `Link` header.

### Task

Write a function that:

```python
def fetch_all_public_repos(
    user: str,
    *,
    per_page: int = 50,
    max_pages: int = 10,
    sleep_between: float = 1.0,
) -> list[dict]:
    ...
```

**Requirements:**

1. Use the endpoint: `GET https://api.github.com/users/{user}/repos`.
2. Respect pagination using `page` and `per_page` parameters.
3. After each request, `time.sleep(sleep_between)` to avoid hitting rate limits.
4. Stop when:
   * a page returns an empty list, **or**
   * you reached `max_pages`.
5. Use a personal access token from the environment variable `GITHUB_TOKEN` *if* it is present:

   * If provided, send `Authorization: Bearer <token>` header.
   * If not present, just call the API unauthenticated (with lower rate limits).
6. Use `raise_for_status()` for error handling.


### Solution – Pagination + Simple Rate Limiting


In [5]:
def fetch_all_public_repos(
    user: str,
    *,
    per_page: int = 50,
    max_pages: int = 10,
    sleep_between: float = 1.0,
) -> list[dict]:
    """Fetch public repositories for a GitHub user with naive pagination + rate limiting.

    Returns a list of repository objects (dicts) as returned by GitHub's API.
    """
    base_url = f"https://api.github.com/users/{user}/repos"
    token = os.environ.get("GITHUB_TOKEN")

    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "AdvancedRequestsNotebook/1.0",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"

    all_repos: list[dict] = []

    with Session() as session:
        # Set the headers once on the session instead of on every request
        session.headers.update(headers)

        for page in range(1, max_pages + 1):
            params = {"per_page": per_page, "page": page}
            resp = session.get(base_url, params=params, timeout=10.0)
            resp.raise_for_status()
            data = resp.json()

            # GitHub returns a list of repos; an empty list means no more data
            if not data:
                break

            all_repos.extend(data)

            # Simple rate limiting: sleep between pages
            time.sleep(sleep_between)

    return all_repos


# Example usage (will hit the live GitHub API if you run it):
# repos = fetch_all_public_repos("python")
# print(len(repos))
# print(repos[0]["full_name"])
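
The `Link` header mentioned in the problem statement offers an alternative to counting pages: `requests` parses it into `Response.links`, so a client can follow the `next` URL until none is left. A minimal sketch, assuming `session` is a `requests.Session` (or a compatible object); the function name `iter_pages_by_link` is illustrative.

```python
from typing import Any, Dict, Iterator, Optional


def iter_pages_by_link(
    session: Any,
    url: str,
    *,
    params: Optional[Dict[str, Any]] = None,
    max_pages: int = 10,
    timeout: float = 10.0,
) -> Iterator[Any]:
    """Yield the decoded JSON body of each page, following rel="next" links."""
    next_url = url
    for _ in range(max_pages):
        resp = session.get(next_url, params=params, timeout=timeout)
        resp.raise_for_status()
        yield resp.json()
        # Response.links maps each rel value to a dict that includes "url".
        next_link = resp.links.get("next")
        if not next_link:
            break
        next_url = next_link["url"]
        params = None  # the "next" URL already carries its query string
```

With a real session this could be driven as `for page in iter_pages_by_link(session, "https://api.github.com/users/python/repos", params={"per_page": 50})`. The sleep between pages from the solution is left out here and would go in the caller's loop.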


---

## Problem 4 – Streaming Download to Disk

Large file downloads should not load the entire body into memory.  
Instead, you can **stream** the response and write it chunk-by-chunk to disk.

### Task

Implement:

```python
def download_file(
    url: str,
    dest_path: str,
    *,
    chunk_size: int = 8192,
    timeout: float = 10.0,
) -> None:
    ...
```

**Requirements:**

1. Use `stream=True` and `iter_content(chunk_size=...)`.
2. Use `raise_for_status()` for HTTP errors.
3. Make sure that if an exception is raised, any partially written file is **removed**.
4. Use a context manager (`with`) for both the request and the file.


### Solution – Streaming Download


In [6]:
from pathlib import Path


def download_file(
    url: str,
    dest_path: str,
    *,
    chunk_size: int = 8192,
    timeout: float = 10.0,
) -> None:
    """Download a file via HTTP to `dest_path` using streaming.

    Ensures partial files are cleaned up on error.
    """
    dest = Path(dest_path)
    # Ensure parent directory exists
    dest.parent.mkdir(parents=True, exist_ok=True)

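    started_writing = False
    try:
        with requests.get(url, stream=True, timeout=timeout) as resp:
            resp.raise_for_status()
            with dest.open("wb") as f:
                started_writing = True
                for chunk in resp.iter_content(chunk_size=chunk_size):
                    if chunk:  # skip empty keep-alive chunks
                        f.write(chunk)
    except BaseException:
        # Clean up on any failure (including Ctrl-C), but only if this call
        # created or truncated the file: an HTTP error raised before that
        # point must not delete a file that already existed at dest_path.
        if started_writing:
            dest.unlink(missing_ok=True)
        raise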


# Example usage (downloads a small image from httpbin):
# download_file("https://httpbin.org/image/png", "downloads/httpbin_image.png")
# print("Downloaded!")
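
A complementary way to meet requirement 3 is to never expose a partial file at all: write to a temporary file next to the destination and rename it only after the last chunk has been written. The sketch below covers just the writing step. `write_chunks_atomically` is a hypothetical helper, and `chunks` is assumed to be any iterable of bytes, such as `resp.iter_content(chunk_size=8192)`.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_chunks_atomically(chunks: Iterable[bytes], dest_path: str) -> None:
    """Write chunks to a temp file, then move it into place on success."""
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                if chunk:
                    f.write(chunk)
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the temp file; an existing file at dest is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The trade-off is that a failed download leaves any previous version of the file in place, which is often what a caller wants when refreshing a cached artifact.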


---

## Problem 5 – File Upload + Extra Form Data

Sometimes you must upload files using `multipart/form-data`.  
`requests` makes this easy via the `files` parameter.

### Task

Using `https://httpbin.org/post` (which echoes back what you send):

Implement:

```python
def upload_report(
    url: str,
    report_path: str,
    *,
    metadata: Optional[Dict[str, Any]] = None,
    timeout: float = 10.0,
) -> dict:
    ...
```

**Requirements:**

1. Send the file contents from `report_path` under the field name `"report"`.
2. Send `metadata` (if given) as additional form fields using the `data` parameter.
3. Use `raise_for_status()` and return the parsed JSON body.
4. Make sure the file is opened using a `with` statement so it is always closed.


### Solution – File Upload


In [7]:
def upload_report(
    url: str,
    report_path: str,
    *,
    metadata: Optional[Dict[str, Any]] = None,
    timeout: float = 10.0,
) -> dict:
    """Upload a file + optional metadata to the given URL.

    Returns the JSON response.
    """
    metadata = metadata or {}

    with open(report_path, "rb") as f:
        files = {
            # (filename, fileobj, content_type) – content_type is optional
            "report": (Path(report_path).name, f, "application/octet-stream"),
        }
        resp = requests.post(url, files=files, data=metadata, timeout=timeout)
        resp.raise_for_status()
        return resp.json()


# Example usage (requires a real file on disk):
# response_json = upload_report(
#     "https://httpbin.org/post",
#     "example_report.txt",
#     metadata={"project": "alpha", "owner": "alice"},
# )
# print(json.dumps(response_json["files"], indent=2))
# print(json.dumps(response_json["form"], indent=2))
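
Because `metadata` is typed as `Dict[str, Any]`, it may contain values that are not plain strings. Form fields carry text, so nested structures are not turned into JSON automatically. One option, sketched here with a hypothetical helper `to_form_fields`, is to JSON-encode every non-string value before passing the result as `data`.

```python
import json
from typing import Any, Dict


def to_form_fields(metadata: Dict[str, Any]) -> Dict[str, str]:
    """Convert metadata values to strings suitable for form fields."""
    fields: Dict[str, str] = {}
    for key, value in metadata.items():
        if isinstance(value, str):
            fields[key] = value
        else:
            # Lists, dicts, numbers, booleans and None become JSON text.
            fields[key] = json.dumps(value)
    return fields


print(to_form_fields({"project": "alpha", "tags": ["a", "b"], "draft": False}))
# {'project': 'alpha', 'tags': '["a", "b"]', 'draft': 'false'}
```

The receiving server then has to decode those fields with a JSON parser, so whether this convention is acceptable depends on the API being called.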


---

## Problem 6 – Custom Authentication with `AuthBase`

Some APIs use a proprietary authentication scheme like:

```text
X-Api-Key: <key>
X-Client-Id: <client_id>
```

You can encapsulate this logic in a custom auth object by subclassing `requests.auth.AuthBase`.

### Task

1. Implement a class `HeaderTokenAuth` that:
   * takes `api_key` and `client_id` in the constructor,
   * on each request, adds the headers:
     * `X-Api-Key`
     * `X-Client-Id`
2. Demonstrate using it with `Session` to call `https://httpbin.org/headers` and show that the headers were sent.

> This pattern keeps authentication logic reusable and testable.


### Solution – Custom `AuthBase` Implementation


In [8]:
from requests.auth import AuthBase


class HeaderTokenAuth(AuthBase):
    """Attach custom headers for API key + client id authentication."""

    def __init__(self, api_key: str, client_id: str) -> None:
        self.api_key = api_key
        self.client_id = client_id

    def __call__(self, r: requests.PreparedRequest) -> requests.PreparedRequest:
        # requests calls this while preparing the request, before it is sent.
        r.headers.setdefault("X-Api-Key", self.api_key)
        r.headers.setdefault("X-Client-Id", self.client_id)
        return r


# Example usage with httpbin (it just echoes headers back):
# auth = HeaderTokenAuth(api_key="SECRET-KEY", client_id="my-client-id")
# with requests.Session() as session:
#     resp = session.get("https://httpbin.org/headers", auth=auth)
# resp.raise_for_status()
# print(json.dumps(resp.json(), indent=2))
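
The auth object can also be checked without any network access, which is what makes the pattern easy to test: preparing a `requests.Request` applies the auth callable and exposes the resulting headers. The sketch below repeats the class so that it runs on its own; the key and client id are placeholder values.

```python
import requests
from requests.auth import AuthBase


class HeaderTokenAuth(AuthBase):
    """Same class as in the solution, repeated so this snippet runs alone."""

    def __init__(self, api_key: str, client_id: str) -> None:
        self.api_key = api_key
        self.client_id = client_id

    def __call__(self, r: requests.PreparedRequest) -> requests.PreparedRequest:
        r.headers.setdefault("X-Api-Key", self.api_key)
        r.headers.setdefault("X-Client-Id", self.client_id)
        return r


auth = HeaderTokenAuth("SECRET-KEY", "my-client-id")
url = "https://httpbin.org/headers"

# Preparing a request applies the auth object without sending anything.
prepared = requests.Request("GET", url, auth=auth).prepare()
print(prepared.headers["X-Api-Key"])  # SECRET-KEY

# Because the class uses setdefault, a header set by the caller wins.
overridden = requests.Request(
    "GET", url, headers={"X-Api-Key": "OTHER"}, auth=auth
).prepare()
print(overridden.headers["X-Api-Key"])  # OTHER
```

If the credentials should always take precedence over caller-supplied headers, plain assignment (`r.headers["X-Api-Key"] = ...`) would be the alternative design.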


---

## Summary

In this notebook you have implemented and explored:

* robust `GET` requests with retries and timeouts,
* a reusable API client class using `Session`,
* paginated API calls with basic rate limiting,
* streaming downloads to disk,
* file uploads with multipart forms,
* custom authentication via `AuthBase`.

These patterns cover many real-world use cases when building HTTP clients with the `requests` library.  
Feel free to extend these utilities or adapt them to the specific APIs you work with.
