Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add missing parts for rest client docs #1397

Closed
wants to merge 6 commits into from

Conversation

sultaniman
Copy link
Collaborator

@sultaniman sultaniman commented May 22, 2024

This PR improves and add more details on certain features of our rest client like

  • common parameters via resource_defaults,
  • addition data_selector path examples,
  • More TODO

Copy link

netlify bot commented May 22, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 7c6dcbd
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/664de6998c75930008442115
😎 Deploy Preview https://deploy-preview-1397--dlt-hub-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@sultaniman sultaniman requested a review from burnash May 22, 2024 12:20
Copy link
Collaborator

@burnash burnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments

@@ -80,7 +80,7 @@ For example, if the API response looks like this:
}
```

The `data_selector` should be set to `"posts"` to extract the list of posts from the response.
The `data_selector` should be set to `"posts"` or `"$.posts"` to extract the list of posts from the response.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for this change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some people are used to use JSONPath starting with $. so this is just to give relation to it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but I think if they know JSONPath already they would know that extended syntax anyway. I would try to optimize here for those unfamiliar with JSONPath. We also link JSONPath docs twice for those who need advanced JSONPath.

@@ -96,7 +96,7 @@ For a nested structure like this:
}
```

The `data_selector` needs to be set to `"results.posts"`. Read more about [JSONPath syntax](https://github.com/h2non/jsonpath-ng?tab=readme-ov-file#jsonpath-syntax) to learn how to write selectors.
The `data_selector` needs to be set to `"results.posts"` or `"$.results.posts"`. Read more about [JSONPath syntax](https://github.com/h2non/jsonpath-ng?tab=readme-ov-file#jsonpath-syntax) to learn how to write selectors.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for this. Why would we need to have an alternative declaration here?

Comment on lines +435 to +454
## Common resource defaults

In `RESTAPIConfig` you can provide via `resource_defaults` which will then be applied to all requests

```py
my_params = {
"from_year": 2018,
"end_year": 2024,
}

source_config: RESTAPIConfig = {
"client": {...},
"resource_defaults": {
"endpoint": {
"params": my_params,
}
}
}
```

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part does not belong to this document. This is documentation for RESTClient and not rest_api.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I will remove it

Comment on lines +568 to +635
### Incremental loading

It is often needed to load only the new data based on some incremental property be it timestamp, date and time, integer identifier or a cursor value.
Fortunately our `RESTClient` allows you to elegantly express this behavior.

Let's use our slightly modified example response json and we want to load new posts as they appear without complete reload of data.

```json
{
"data": [
{ "id": 1, "title": "Post 1", "created_at": "2010-08-21T17:11:27-0400" },
{ "id": 2, "title": "Post 2", "created_at": "2010-09-21T17:11:27-0400" },
{ "id": 3, "title": "Post 3", "created_at": "2010-10-21T17:11:27-0400" }
]
}
```

To achive our objective we need to use `endpoint.params` by adding the incremental type.
In the following examples we use `id` - primary key and `created_at` - creation datetime.

**Incremental loading by id**

```py
source_config: RESTAPIConfig = {
"resources": [
{
"name": "get_posts_list",
"table_name": "posts",
"endpoint": {
"data_selector": "$.data",
"path": "/posts",
"params": {
"post_id": {
"type": "incremental",
"cursor_path": "id",
"initial_value": 1,
}
},
},
}
]
}
```

**Incremental loading by creation date**

```py
source_config: RESTAPIConfig = {
"resources": [
{
"name": "get_posts_list",
"table_name": "posts",
"endpoint": {
"data_selector": "$.data",
"path": "/posts",
"params": {
"creation_date": {
"type": "incremental",
"cursor_path": "created_at",
"initial_value": "2010-08-21T17:11:27-0400",
}
},
},
}
]
}
```

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the right place for this section: rest-client.md is only for documenting RESTClient class & relevant functionality and not rest_api source. Incremental loading is covered in rest_api here: https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#incremental-loading

Comment on lines +526 to +542
**Custom combined auth:**
Sometimes you need to pass authentication parameters via headers as well as query params

```py
from dlt.sources.helpers.rest_client.auth import AuthConfigBase

class CombinedAuth(AuthConfigBase):
def __init__(self, client_id: str, client_secret: str):
self.client_id = client_id
self.client_secret = client_secret

def __call__(self, request):
# Modify the request object to include the necessary authentication headers and request params
request.headers["Authorization"] = f"Bearer {self.client_secret}"
request.prepare_url(request.url, {"client_id": self.client_id})
return request
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we need this example here: the difference that I see here from the previous example is that it shows that request is a PreparedRequest instance (because we can call prepare_url on it). I think showing that request is a PreparedRequest has a value, so instead of the example I would add explicit types in the previous example (like you did with token: str) and add a text description elaborating on what __call__ actually is receiving and linking Requests docs from PreparedRequest so the reader can quickly see what's possible.

@sultaniman sultaniman closed this May 24, 2024
@sultaniman sultaniman deleted the docs/rest-client-hackaton-notes branch May 24, 2024 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants