WIP: Add missing parts for rest client docs #1397
Changes from all commits
931111a
373b8b3
8b88d65
415c3c6
0e1e7b1
7c6dcbd
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,18 @@ | ||
--- | ||
title: RESTClient | ||
description: Learn how to use the RESTClient class to interact with RESTful APIs | ||
keywords: [api, http, rest, request, extract, restclient, client, pagination, json, response, data_selector, session, auth, paginator, jsonresponsepaginator, headerlinkpaginator, offsetpaginator, jsonresponsecursorpaginator, queryparampaginator, bearer, token, authentication] | ||
keywords: | ||
[ | ||
api, http, rest, request, extract, restclient, client, | ||
pagination, json, response, data_selector, session, auth, | ||
paginator, jsonresponsepaginator, headerlinkpaginator, offsetpaginator, | ||
jsonresponsecursorpaginator, queryparampaginator, bearer, token, | ||
authentication, reverse etl, json path, openapi, swagger | ||
] | ||
--- | ||
|
||
The `RESTClient` class offers an interface for interacting with RESTful APIs, including features like: | ||
|
||
- automatic pagination, | ||
- various authentication mechanisms, | ||
- customizable request/response handling. | ||
|
@@ -72,31 +80,31 @@ For example, if the API response looks like this: | |
|
||
```json | ||
{ | ||
"posts": [ | ||
{"id": 1, "title": "Post 1"}, | ||
{"id": 2, "title": "Post 2"}, | ||
{"id": 3, "title": "Post 3"} | ||
] | ||
"posts": [ | ||
{ "id": 1, "title": "Post 1" }, | ||
{ "id": 2, "title": "Post 2" }, | ||
{ "id": 3, "title": "Post 3" } | ||
] | ||
} | ||
``` | ||
|
||
The `data_selector` should be set to `"posts"` to extract the list of posts from the response. | ||
The `data_selector` should be set to `"posts"` or `"$.posts"` to extract the list of posts from the response. | ||
|
||
For a nested structure like this: | ||
|
||
```json | ||
{ | ||
"results": { | ||
"posts": [ | ||
{"id": 1, "title": "Post 1"}, | ||
{"id": 2, "title": "Post 2"}, | ||
{"id": 3, "title": "Post 3"} | ||
] | ||
} | ||
"results": { | ||
"posts": [ | ||
{ "id": 1, "title": "Post 1" }, | ||
{ "id": 2, "title": "Post 2" }, | ||
{ "id": 3, "title": "Post 3" } | ||
] | ||
} | ||
} | ||
``` | ||
|
||
The `data_selector` needs to be set to `"results.posts"`. Read more about [JSONPath syntax](https://github.com/h2non/jsonpath-ng?tab=readme-ov-file#jsonpath-syntax) to learn how to write selectors. | ||
The `data_selector` needs to be set to `"results.posts"` or `"$.results.posts"`. Read more about [JSONPath syntax](https://github.com/h2non/jsonpath-ng?tab=readme-ov-file#jsonpath-syntax) to learn how to write selectors. | ||
And for this. Why would we need to have an alternative declaration here? |
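To make the two selector spellings under discussion concrete, here is a minimal sketch in plain Python. It is not how `RESTClient` or `jsonpath-ng` resolve selectors; the `select` helper is hypothetical and handles only simple dotted keys, with or without the leading `$.` root marker.

```python
from typing import Any


def select(data: Any, selector: str) -> Any:
    """Resolve a simple dotted selector such as "results.posts" or "$.results.posts"."""
    # Drop the optional JSONPath root marker so both spellings walk the same keys.
    path = selector[2:] if selector.startswith("$.") else selector
    current = data
    for key in path.split("."):
        current = current[key]
    return current


response = {
    "results": {
        "posts": [
            {"id": 1, "title": "Post 1"},
            {"id": 2, "title": "Post 2"},
        ]
    }
}

short_form = select(response, "results.posts")
rooted_form = select(response, "$.results.posts")
```

Under these assumptions both calls return the same list, which is the equivalence the added sentence in the diff describes.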
||
|
||
### PageData | ||
|
||
|
@@ -133,14 +141,14 @@ Suppose the API response for `https://api.example.com/posts` looks like this: | |
|
||
```json | ||
{ | ||
"data": [ | ||
{"id": 1, "title": "Post 1"}, | ||
{"id": 2, "title": "Post 2"}, | ||
{"id": 3, "title": "Post 3"} | ||
], | ||
"pagination": { | ||
"next": "https://api.example.com/posts?page=2" | ||
} | ||
"data": [ | ||
{ "id": 1, "title": "Post 1" }, | ||
{ "id": 2, "title": "Post 2" }, | ||
{ "id": 3, "title": "Post 3" } | ||
], | ||
"pagination": { | ||
"next": "https://api.example.com/posts?page=2" | ||
} | ||
} | ||
``` | ||
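A response shaped like the one above can be paginated by following `pagination.next` until it is absent. The sketch below shows only that idea, using a hypothetical in-memory `FAKE_API` dictionary in place of real HTTP calls; it is not the library's paginator implementation.

```python
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical stand-in for the remote API: maps a URL to its JSON body.
FAKE_API: Dict[str, Dict[str, Any]] = {
    "https://api.example.com/posts": {
        "data": [{"id": 1}, {"id": 2}],
        "pagination": {"next": "https://api.example.com/posts?page=2"},
    },
    "https://api.example.com/posts?page=2": {
        "data": [{"id": 3}],
        "pagination": {},
    },
}


def fetch(url: str) -> Dict[str, Any]:
    """Return the JSON body for a URL (no network access in this sketch)."""
    return FAKE_API[url]


def iterate_pages(start_url: str) -> Iterator[List[Dict[str, Any]]]:
    """Yield the "data" list of each page, following "pagination.next" links."""
    url: Optional[str] = start_url
    while url:
        body = fetch(url)
        yield body["data"]
        # A missing "next" key ends the loop.
        url = body.get("pagination", {}).get("next")


pages = list(iterate_pages("https://api.example.com/posts"))
```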
|
||
|
@@ -161,7 +169,6 @@ def get_data(): | |
yield page | ||
``` | ||
|
||
|
||
#### HeaderLinkPaginator | ||
|
||
This paginator handles pagination based on a link to the next page in the response headers (e.g., the `Link` header, as used by GitHub). | ||
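As a rough illustration of what header-based pagination involves, the sketch below extracts the `rel="next"` URL from a `Link` header value using only the standard library. The `next_link` helper is hypothetical and deliberately simplified (it splits on commas, so it would mishandle URLs that contain commas); it is not the code `HeaderLinkPaginator` uses.

```python
import re
from typing import Optional


def next_link(link_header: str) -> Optional[str]:
    """Return the URL marked rel="next" in a Link header value, or None."""
    for part in link_header.split(","):
        # Each part looks like: <https://host/path?page=2>; rel="next"
        match = re.match(r"\s*<([^>]+)>\s*;(.*)", part)
        if match and re.search(r'rel="?next"?\s*(;|$)', match.group(2)):
            return match.group(1)
    return None


header = (
    '<https://api.example.com/posts?page=2>; rel="next", '
    '<https://api.example.com/posts?page=5>; rel="last"'
)
next_url = next_link(header)

# On the last page there is no rel="next" entry, so pagination would stop.
last_page_header = '<https://api.example.com/posts?page=4>; rel="prev"'
no_next = next_link(last_page_header)
```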
|
@@ -432,6 +439,26 @@ for page in client.paginate("/protected/resource"): | |
print(page) | ||
``` | ||
|
||
## Common resource defaults | ||
|
||
In `RESTAPIConfig`, you can provide default settings via `resource_defaults`; they are then applied to all requests. | ||
|
||
```py | ||
my_params = { | ||
"from_year": 2018, | ||
"end_year": 2024, | ||
} | ||
|
||
source_config: RESTAPIConfig = { | ||
"client": {...}, | ||
"resource_defaults": { | ||
"endpoint": { | ||
"params": my_params, | ||
} | ||
} | ||
} | ||
``` | ||
|
||
Comment on lines +442 to +461
This part does not belong to this document. This is documentation for RESTClient and not rest_api.
Got it, I will remove it |
||
### API key authentication | ||
|
||
API key authentication (`ApiKeyAuth`) is an auth method where the client sends an API key either in a custom header (e.g., `X-API-Key: <key>`) or as a query parameter. | ||
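The two placements can be sketched with the standard library alone. The `apply_api_key` helper below is hypothetical and exists only to show the difference between sending the key in a header and sending it in the query string; it does not reflect how `ApiKeyAuth` is implemented.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_api_key(
    url: str,
    headers: Dict[str, str],
    api_key: str,
    name: str = "X-API-Key",
    location: str = "header",
) -> Tuple[str, Dict[str, str]]:
    """Return a (url, headers) pair with the API key attached in the chosen place."""
    headers = dict(headers)  # copy so the caller's dict is not mutated
    if location == "header":
        headers[name] = api_key
        return url, headers
    if location == "query":
        parts = urlsplit(url)
        query = parse_qsl(parts.query, keep_blank_values=True)
        query.append((name, api_key))
        return urlunsplit(parts._replace(query=urlencode(query))), headers
    raise ValueError(f"unsupported location: {location!r}")


base_url = "https://api.example.com/posts?page=2"
header_url, header_headers = apply_api_key(base_url, {}, "secret")
query_url, query_headers = apply_api_key(
    base_url, {}, "secret", name="api_key", location="query"
)
```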
|
@@ -481,11 +508,13 @@ response = client.get("/protected/resource") | |
|
||
You can implement custom authentication by subclassing the `AuthConfigBase` class and implementing the `__call__` method: | ||
|
||
**Custom bearer auth:** | ||
|
||
```py | ||
from dlt.sources.helpers.rest_client.auth import AuthConfigBase | ||
|
||
class CustomAuth(AuthConfigBase): | ||
def __init__(self, token): | ||
def __init__(self, token: str): | ||
self.token = token | ||
|
||
def __call__(self, request): | ||
|
@@ -494,6 +523,24 @@ class CustomAuth(AuthConfigBase): | |
return request | ||
``` | ||
|
||
**Custom combined auth:** | ||
Sometimes you need to pass authentication parameters via headers as well as query parameters. | ||
|
||
```py | ||
from dlt.sources.helpers.rest_client.auth import AuthConfigBase | ||
|
||
class CombinedAuth(AuthConfigBase): | ||
def __init__(self, client_id: str, client_secret: str): | ||
self.client_id = client_id | ||
self.client_secret = client_secret | ||
|
||
def __call__(self, request): | ||
# Modify the request object to include the necessary authentication headers and request params | ||
request.headers["Authorization"] = f"Bearer {self.client_secret}" | ||
request.prepare_url(request.url, {"client_id": self.client_id}) | ||
return request | ||
``` | ||
Comment on lines +526 to +542
I'm not convinced we need this example here: the difference that I see here from the previous example is that it shows that |
||
|
||
Then, you can use your custom authentication class with the `RESTClient`: | ||
|
||
```py | ||
|
@@ -518,6 +565,74 @@ client.paginate("/posts", hooks={"response": [custom_response_handler]}) | |
|
||
The handler function may raise `IgnoreResponseException` to exit the pagination loop early. This is useful for endpoints that return a 404 status code when there are no items to paginate. | ||
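The early-exit pattern can be sketched without the library. Everything below is a stand-in: `StopPagination` plays the role of `IgnoreResponseException`, and `FakeResponse` and `paginate` are hypothetical helpers that only mimic a loop running response hooks before yielding each page.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


class StopPagination(Exception):
    """Hypothetical stand-in for the library's IgnoreResponseException."""


@dataclass
class FakeResponse:
    status_code: int
    items: List[int] = field(default_factory=list)


def stop_on_404(response: FakeResponse) -> None:
    """Response hook: treat a 404 as "no more pages" rather than an error."""
    if response.status_code == 404:
        raise StopPagination()


def paginate(
    responses: Iterable[FakeResponse],
    hooks: List[Callable[[FakeResponse], None]],
) -> Iterator[List[int]]:
    """Yield page items, leaving the loop when a hook raises StopPagination."""
    for response in responses:
        try:
            for hook in hooks:
                hook(response)
        except StopPagination:
            break
        yield response.items


responses = [
    FakeResponse(200, [1, 2]),
    FakeResponse(200, [3]),
    FakeResponse(404),
    FakeResponse(200, [4]),  # never reached: the 404 ends pagination
]
pages = list(paginate(responses, hooks=[stop_on_404]))
```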
|
||
### Incremental loading | ||
|
||
You often need to load only new data based on an incremental property, such as a timestamp, an integer identifier, or a cursor value. | ||
Fortunately, `RESTClient` lets you express this behavior concisely. | ||
|
||
Let's take a slightly modified version of the example JSON response and load new posts as they appear, without reloading all the data. | ||
|
||
```json | ||
{ | ||
"data": [ | ||
{ "id": 1, "title": "Post 1", "created_at": "2010-08-21T17:11:27-0400" }, | ||
{ "id": 2, "title": "Post 2", "created_at": "2010-09-21T17:11:27-0400" }, | ||
{ "id": 3, "title": "Post 3", "created_at": "2010-10-21T17:11:27-0400" } | ||
] | ||
} | ||
``` | ||
|
||
To achieve this, add a parameter of type `incremental` to `endpoint.params`. | ||
The following examples use `id` (the primary key) and `created_at` (the creation datetime) as cursors. | ||
|
||
**Incremental loading by id** | ||
|
||
```py | ||
source_config: RESTAPIConfig = { | ||
"resources": [ | ||
{ | ||
"name": "get_posts_list", | ||
"table_name": "posts", | ||
"endpoint": { | ||
"data_selector": "$.data", | ||
"path": "/posts", | ||
"params": { | ||
"post_id": { | ||
"type": "incremental", | ||
"cursor_path": "id", | ||
"initial_value": 1, | ||
} | ||
}, | ||
}, | ||
} | ||
] | ||
} | ||
``` | ||
|
||
**Incremental loading by creation date** | ||
|
||
```py | ||
source_config: RESTAPIConfig = { | ||
"resources": [ | ||
{ | ||
"name": "get_posts_list", | ||
"table_name": "posts", | ||
"endpoint": { | ||
"data_selector": "$.data", | ||
"path": "/posts", | ||
"params": { | ||
"creation_date": { | ||
"type": "incremental", | ||
"cursor_path": "created_at", | ||
"initial_value": "2010-08-21T17:11:27-0400", | ||
} | ||
}, | ||
}, | ||
} | ||
] | ||
} | ||
``` | ||
|
||
Comment on lines +568 to +635
This is not the right place for this section: rest-client.md is only for documenting RESTClient class & relevant functionality and not rest_api source. Incremental loading is covered in rest_api here: https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#incremental-loading |
||
## Shortcut for paginating API responses | ||
|
||
The `paginate()` function provides a shorthand for paginating API responses. It takes the same parameters as the `RESTClient.paginate()` method but automatically creates a RESTClient instance with the specified base URL: | ||
|
@@ -560,7 +675,7 @@ RUNTIME__LOG_LEVEL=INFO python my_script.py | |
``` | ||
|
||
2. Use the [`PageData`](#pagedata) instance to inspect the [request](https://docs.python-requests.org/en/latest/api/#requests.Request) | ||
and [response](https://docs.python-requests.org/en/latest/api/#requests.Response) objects: | ||
and [response](https://docs.python-requests.org/en/latest/api/#requests.Response) objects: | ||
|
||
```py | ||
from dlt.sources.helpers.rest_client import RESTClient | ||
|
What's the rationale for this change?
Some people are used to writing JSONPath starting with `$.`, so this is just to relate to that.
Good point, but I think if they know JSONPath already they would know that extended syntax anyway. I would try to optimize here for those unfamiliar with JSONPath. We also link JSONPath docs twice for those who need advanced JSONPath.