# Pagination, Authentication and dlt Configuration

## Pagination
- Pagination limits how much data an API sends in a single response
- If an endpoint supports the `per_page` query parameter, you can choose how many results to receive per request

In [6]:
import requests

# GitHub provides two query parameters:
# per_page - results per page
# page - page number to retrieve results from

response = requests.get("https://api.github.com/orgs/dlt-hub/events?per_page=10&page=1")
response.links

{'next': {'url': 'https://api.github.com/organizations/89419010/events?per_page=10&page=2',
  'rel': 'next'},
 'last': {'url': 'https://api.github.com/organizations/89419010/events?per_page=10&page=29',
  'rel': 'last'}}
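
The query string in the request above can also be assembled from a dictionary instead of being written by hand. A minimal sketch using only the standard library; the helper name `build_page_url` is hypothetical and is not part of `requests` or dlt:

```python
from urllib.parse import urlencode


def build_page_url(base_url: str, page: int, per_page: int = 10) -> str:
    """Return base_url with GitHub-style per_page and page query parameters."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{base_url}?{query}"


# Same endpoint and parameters as the request above
url = build_page_url("https://api.github.com/orgs/dlt-hub/events", page=1)
print(url)
```

With `requests`, passing the same dictionary as `params=` to `requests.get` has the same effect.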

### dlt RESTClient
- dlt provides a helper, `RESTClient`, that handles repetitive tasks such as
    - authentication
    - query parameter handling
    - pagination

In [10]:
from dlt.sources.helpers.rest_client import RESTClient
client = RESTClient(base_url="https://api.github.com")

for i, page in enumerate(client.paginate("orgs/dlt-hub/events")):
    print(page)
    if i == 4:  # stop after the first 5 pages instead of requesting all of them
        break

[{'id': '46696466465', 'type': 'IssueCommentEvent', 'actor': {'id': 40209326, 'login': 'netlify[bot]', 'display_login': 'netlify', 'gravatar_id': '', 'url': 'https://api.github.com/users/netlify[bot]', 'avatar_url': 'https://avatars.githubusercontent.com/u/40209326?'}, 'repo': {'id': 452221115, 'name': 'dlt-hub/dlt', 'url': 'https://api.github.com/repos/dlt-hub/dlt'}, 'payload': {'action': 'created', 'issue': {'url': 'https://api.github.com/repos/dlt-hub/dlt/issues/2330', 'repository_url': 'https://api.github.com/repos/dlt-hub/dlt', 'labels_url': 'https://api.github.com/repos/dlt-hub/dlt/issues/2330/labels{/name}', 'comments_url': 'https://api.github.com/repos/dlt-hub/dlt/issues/2330/comments', 'events_url': 'https://api.github.com/repos/dlt-hub/dlt/issues/2330/events', 'html_url': 'https://github.com/dlt-hub/dlt/pull/2330', 'id': 2860938017, 'node_id': 'PR_kwDOGvRYu86Ln9JU', 'number': 2330, 'title': 'small improvements to cli docs generator', 'user': {'login': 'sh-rp', 'id': 1155738, 

There are several types of pagination. In the code above, dlt inferred the type automatically, but we can also specify it explicitly:

- JSONLinkPaginator - link to the next page is included in the JSON response.
- HeaderLinkPaginator - link to the next page is included in the response headers.
- OffsetPaginator - pagination based on offset and limit query parameters.
- PageNumberPaginator - pagination based on page numbers.
- JSONResponseCursorPaginator - pagination based on a cursor in the JSON response.
- HeaderCursorPaginator - pagination based on a cursor in the response headers.
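
To make the difference between page-number and offset pagination concrete, here is a minimal sketch against a fake in-memory "API". It does not show how dlt implements these paginators; the functions `fetch_by_page`, `fetch_by_offset`, `paginate_by_page` and `paginate_by_offset` are hypothetical stand-ins.

```python
from typing import Iterator, List

ITEMS = list(range(1, 8))  # stand-in for the records held by a server


def fetch_by_page(page: int, per_page: int) -> List[int]:
    """Hypothetical endpoint: pages are numbered from 1."""
    start = (page - 1) * per_page
    return ITEMS[start:start + per_page]


def fetch_by_offset(offset: int, limit: int) -> List[int]:
    """Hypothetical endpoint: skip `offset` records, return up to `limit`."""
    return ITEMS[offset:offset + limit]


def paginate_by_page(per_page: int) -> Iterator[List[int]]:
    """Request page 1, 2, 3, ... until an empty page comes back."""
    page = 1
    while True:
        batch = fetch_by_page(page, per_page)
        if not batch:
            break
        yield batch
        page += 1


def paginate_by_offset(limit: int) -> Iterator[List[int]]:
    """Advance the offset by the number of records received so far."""
    offset = 0
    while True:
        batch = fetch_by_offset(offset, limit)
        if not batch:
            break
        yield batch
        offset += len(batch)


pages = list(paginate_by_page(per_page=3))
chunks = list(paginate_by_offset(limit=3))
print(pages)
print(chunks)
```

Both loops return the same batches here; they differ only in which query parameter the client advances between requests.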

In [14]:
from dlt.sources.helpers.rest_client.paginators import HeaderLinkPaginator

client = RESTClient(
    base_url="https://api.github.com",
    paginator=HeaderLinkPaginator()
)

### Exercise 1: Pagination with RESTClient
Question: What type of pagination should we use for the GitHub API?

In [18]:
response = requests.get("https://api.github.com/orgs/dlt-hub/events?per_page=10&page=1")
print(response.headers)

{'Date': 'Sun, 23 Feb 2025 09:08:52 GMT', 'Server': 'Varnish', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'deny', 'X-XSS-Protection': '1; mode=block', 'Content-Security-Policy': "default-src 'none'; style-src 'unsafe-inline'", 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-RateLimit-Used, X-RateLimit-Resource, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset', 'Content-Type': 'application/json; charset=utf-8', 'Referrer-Policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'X-GitHub-Media-Type': 'github.v3; format=json', 'X-RateLimit-Limit': '60', 'X-RateLimit-Remaining': '0', 'X-RateLimit-Reset': '1740304296', 'X-RateLimit-Resource': 'core', 'X-RateLimit-Used': '60', 'Content-Length': '278', 'X-GitH

The response headers include a `Link` entry that points to the next page, so `HeaderLinkPaginator` is the right choice for the GitHub API:

```
{'Date': 'Sun, 23 Feb 2025 09:09:31 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'Vary': 'Accept,Accept-Encoding, Accept, X-Requested-With', 'ETag': 'W/"b1c22a97c4cacc94cff289841fb952e9b6c9293f7838d5a87900d5e1ae651c97"', 'Last-Modified': 'Sun, 23 Feb 2025 08:48:57 GMT', 'X-Poll-Interval': '60', 'X-GitHub-Media-Type': 'github.v3; format=json', 'Link': '<https://api.github.com/organizations/89419010/events?per_page=10&page=2>; rel="next", <https://api.github.com/organizations/89419010/events?per_page=10&page=29>; rel="last"', 'x-github-api-version-selected': '2022-11-28', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '0', 'Referrer-Policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'Content-Security-Policy': "default-src 'none'", 'Content-Encoding': 'gzip', 'Server': 'github.com', 'Accept-Ranges': 'bytes', 'X-RateLimit-Limit': '60', 'X-RateLimit-Remaining': '59', 'X-RateLimit-Reset': '1740305371', 'X-RateLimit-Resource': 'core', 'X-RateLimit-Used': '1', 'Transfer-Encoding': 'chunked', 'X-GitHub-Request-Id': 'C8B6:3C37A6:1340157:2706F30:67BAE5CB'}
```
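
The `Link` value in the headers above can be split into its `next` and `last` URLs. The sketch below is a simplified parser, assuming every entry has the form `<url>; rel="name"` as in the GitHub response; it does not cover every form the header may take. The function name `parse_link_header` is hypothetical.

```python
import re
from typing import Dict


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel value (e.g. "next", "last") to its URL."""
    links = {}
    for url, rel in re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', value):
        links[rel] = url
    return links


# Link header copied from the response shown above
link_header = (
    '<https://api.github.com/organizations/89419010/events?per_page=10&page=2>; rel="next", '
    '<https://api.github.com/organizations/89419010/events?per_page=10&page=29>; rel="last"'
)
links = parse_link_header(link_header)
print(links["next"])
```

In practice `requests` already exposes this information as `response.links`, as shown in the Pagination section.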


## Authentication

In [24]:
import os

# Read the token from the environment; os.getenv returns None if GITHUB_TOKEN is not set
github_token = os.getenv("GITHUB_TOKEN")

print(github_token)

None
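
The `None` output means `GITHUB_TOKEN` is not set in this environment, so requests would go out unauthenticated. As a rough sketch of how a token could be turned into a request header, assuming the token is sent with the `Bearer` scheme in the `Authorization` header (the helper name `build_auth_headers` is hypothetical):

```python
import os
from typing import Dict, Optional


def build_auth_headers(token: Optional[str]) -> Dict[str, str]:
    """Return an Authorization header for a token, or no headers if it is missing."""
    if not token:
        return {}
    return {"Authorization": f"Bearer {token}"}


headers = build_auth_headers(os.getenv("GITHUB_TOKEN"))
# Report only whether a token was found; avoid printing the secret itself
print("authenticated" if headers else "anonymous")
```

The resulting dictionary can be passed as `headers=` to `requests.get`. dlt's `RESTClient` takes an `auth` argument for the same purpose.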
