From 011c443594b7fc1d51d9bf246d0d2e9e575610d6 Mon Sep 17 00:00:00 2001
From: Joe Reuter
Date: Wed, 12 Apr 2023 16:33:27 +0200
Subject: [PATCH 01/12] authentication documentation

---
 .../connector-builder-ui/authentication.md | 176 ++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 docs/connector-development/connector-builder-ui/authentication.md

diff --git a/docs/connector-development/connector-builder-ui/authentication.md b/docs/connector-development/connector-builder-ui/authentication.md
new file mode 100644
index 00000000000000..ad3724f730cf2b
--- /dev/null
+++ b/docs/connector-development/connector-builder-ui/authentication.md
@@ -0,0 +1,176 @@
# Authentication

Almost every API requires some form of authentication when issuing requests so the API provider can check whether the caller (in this case the connector) has permission to fetch data. The authentication feature provides a secure way to configure authentication using a variety of methods.

The credentials themselves (e.g. username and password) are _not_ specified as part of the connector; instead, they are part of the source configuration that is specified by the end user when setting up a source based on the connector. During development, it's possible to provide testing credentials in the "Testing values" menu, but those are not saved along with the connector. Credentials that are part of the source configuration are stored in a secure way in your Airbyte instance while the connector configuration is saved in the regular database.

In the "Authentication" section on the "Global Configuration" page in the connector builder, the authentication method can be specified. This configuration is shared for all streams - it's not possible to use different authentication methods for different streams in the same connector.

If your API doesn't need authentication, leave it set to "No auth". This means the connector will be able to make requests to the API without providing any credentials, which is possible for some public APIs or private APIs only available in local networks.

## Authentication methods

Check the documentation of the API you want to integrate to find out which authentication method it uses. The following methods are supported in the connector builder:
* [Basic HTTP](#basic-http)
* [Bearer](#bearer)
* [API Key](#api-key)
* [OAuth](#oauth)
* [Session token](#session-token)

Select the matching authentication method for your API and check the sections below for more information about individual methods.

### Basic HTTP

If requests are authenticated using the Basic HTTP authentication method, the documentation page will likely contain one of the following keywords:
- "Basic Auth"
- "Basic HTTP"
- "Authorization: Basic"
- "Base64"

The Basic HTTP authentication method is a standard and doesn't require any further configuration. Username and password are set via "Testing values" in the connector builder and as part of the source configuration when configuring connections.

#### Example

The [Greenhouse API](https://developers.greenhouse.io/harvest.html#introduction) uses basic auth.

Sometimes, only a username and no password is required, like for the [Chargebee API](https://apidocs.chargebee.com/docs/api/auth?prod_cat_ver=2) - in these cases simply leave the password input empty.

In the basic auth scheme, the supplied username and password are concatenated with a colon `:` and encoded using the base64 algorithm.
For username `user` and password `passwd`, the base64-encoding of `user:passwd` is `dXNlcjpwYXNzd2Q=`.

When fetching records, this string is sent as part of the `Authorization` header:
```
curl -X GET \
  -H "Authorization: Basic dXNlcjpwYXNzd2Q=" \
  https://harvest.greenhouse.io/v1/
```

### Bearer

If requests are authenticated using Bearer authentication, the documentation will probably mention "bearer token" or "token authentication". In this scheme, the `Authorization` header of the HTTP request is set to `Bearer <token>`.

Like the Basic HTTP authentication, it does not require further configuration. The bearer token can be set via "Testing values" in the connector builder as well as part of the source configuration when configuring connections.

#### Example

The [Sendgrid API](https://docs.sendgrid.com/api-reference/how-to-use-the-sendgrid-v3-api/authentication) and the [Square API](https://developer.squareup.com/docs/build-basics/access-tokens) support Bearer authentication.

When fetching records, the token is sent along as the `Authorization` header:
```
curl -X GET \
  -H "Authorization: Bearer <bearer token>" \
  https://api.sendgrid.com/
```

### API Key

The API key authentication method is similar to the Bearer authentication but allows configuring the HTTP header in which the API key is sent as part of the request. The HTTP header name is part of the connector definition while the API key itself can be set via "Testing values" in the connector builder as well as part of the source configuration when configuring connections.

This form of authentication is often called "(custom) header authentication".

#### Example

The [CoinAPI.io API](https://docs.coinapi.io/market-data/rest-api#authorization) uses API key authentication via the `X-CoinAPI-Key` header.

When fetching records, the API key is sent along as the configured header:
```
curl -X GET \
  -H "X-CoinAPI-Key: <api key>" \
  https://rest.coinapi.io/v1/
```

### OAuth

The OAuth authentication method implements authentication using an OAuth2.0 flow with a refresh token grant type.

In this scheme the OAuth endpoint of an API is called with a long-lived refresh token provided as part of the source configuration to obtain a short-lived access token that's used to make the requests that actually extract records. If the access token expires, a new one is requested automatically.

It needs to be configured with the endpoint to call to obtain access tokens with the refresh token. OAuth client id/secret and the refresh token are provided via "Testing values" in the connector builder as well as part of the source configuration when configuring connections.

Depending on exactly how the refresh endpoint is implemented, additional configuration might be necessary to specify how to request an access token with the right permissions (configuring OAuth scopes and grant type) and how to extract the access token and the expiry date out of the response (configuring expiry date format and property name as well as the access token property name):
* Scopes - if not specified, no scopes are sent along with the refresh token request
* Grant type - if not specified, it's set to `refresh_token`
* Token expiry property name - if not specified, it's set to `expires_in`
* Token expiry property date format - if not specified, the expiry property is interpreted as the number of seconds the access token will be valid
* Access token property name - the name of the property in the response that contains the access token to do requests. If not specified, it's set to `access_token`

If the refresh token itself isn't long-lived but expires after a short amount of time and needs to be refreshed as well, or if other grant types like PKCE are required, it's not possible to use the connector builder with OAuth authentication - check out the [compatibility guide](/connector-development/config-based/connector-builder-compatibility#oauth) for more information.

Keep in mind that the OAuth authentication method does not implement a single-click authentication experience for the end user configuring the connector - it will still be necessary to obtain client id, client secret and refresh token from the API and manually enter them into the configuration form.

#### Example

The [Square API](https://developer.squareup.com/docs/build-basics/access-tokens#get-an-oauth-access-token) supports OAuth.

In this case, the authentication method has to be configured like this:
* "Token refresh endpoint" is `https://connect.squareup.com/oauth2/token`
* "Token expiry property name" is `expires_at`

When running a sync, the connector first sends client id, client secret and refresh token to the token refresh endpoint:
```
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"client_id": "<client id>", "client_secret": "<client secret>", "refresh_token": "<refresh token>", "grant_type": "refresh_token" }' \
  https://connect.squareup.com/oauth2/token
```

The response is a JSON object containing an `access_token` property and an `expires_at` property:
```
 {"access_token":"<access token>", "expires_at": "2023-12-12T00:00:00"}
```

The `expires_at` date tells the connector how long the access token can be used - if this point in time is passed, a new access token is requested automatically.

When fetching records, the access token is sent along as part of the `Authorization` header:
```
curl -X GET \
  -H "Authorization: Bearer <access token>" \
  https://connect.squareup.com/v2/
```


### Session token

The session token authentication method is the right choice if the API requires a consumer to send a request to "log in", returning a session token which can be used to send the requests that actually extract records.

Username and password are supplied as "Testing values" / as part of the source configuration and are sent to the configured "Login url" as a JSON body with a `username` and a `password` property. The "Session token response key" specifies where to find the session token in the log-in response. The "Header" specifies in which HTTP header field the session token has to be injected.

Besides specifying username and password, the session token authentication method also allows directly specifying the session token via the source configuration. In this case, the "Validate session url" is used to check whether the provided session token is still valid.

#### Example

The [Metabase API](https://www.metabase.com/learn/administration/metabase-api#authenticate-your-requests-with-a-session-token) provides a session authentication scheme.

In this case, the authentication method has to be configured like this:
* "Login url" is `session`
* "Session token response key" is `id`
* "Header" is `X-Metabase-Session`
* "Validate session url" is `user/current`

When running a sync, the connector first sends username and password to the `session`
endpoint:
```
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"username": "<username>", "password": "<password>"}' \
  http://<metabase host>/session
```

The response is a JSON object containing an `id` property:
```
 {"id":"<session token>"}
```

When fetching records, the session token is sent along as the `X-Metabase-Session` header:
```
curl -X GET \
  -H "X-Metabase-Session: <session token>" \
  http://<metabase host>/<endpoint>
```

## Reference

For detailed documentation of the underlying low code components, see [here TODO](TODO)
\ No newline at end of file

From eea60d765d9dfca762ad72ada23b47557a72b46e Mon Sep 17 00:00:00 2001
From: Joe Reuter
Date: Wed, 12 Apr 2023 16:44:48 +0200
Subject: [PATCH 02/12] some fixes

---
 .../connector-builder-ui/authentication.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/docs/connector-development/connector-builder-ui/authentication.md b/docs/connector-development/connector-builder-ui/authentication.md
index ad3724f730cf2b..ee9cea651b9758 100644
--- a/docs/connector-development/connector-builder-ui/authentication.md
+++ b/docs/connector-development/connector-builder-ui/authentication.md
@@ -1,12 +1,12 @@
 # Authentication

-Almost every API requires some form of authentication when issuing requests so the API provider can check whether the caller (in this case the connector) has permission to fetch data. The authentication feature provides a secure way to configure authentication using a variety of methods.
+Authentication is about providing some form of confidential secret (credentials, password, token, key) to the API provider that identifies the caller (in this case the connector). It allows the provider to check whether the caller is known and has sufficient permission to fetch data. The authentication feature provides a secure way to configure authentication using a variety of methods.

 The credentials themselves (e.g. username and password) are _not_ specified as part of the connector; instead, they are part of the source configuration that is specified by the end user when setting up a source based on the connector. During development, it's possible to provide testing credentials in the "Testing values" menu, but those are not saved along with the connector. Credentials that are part of the source configuration are stored in a secure way in your Airbyte instance while the connector configuration is saved in the regular database.

 In the "Authentication" section on the "Global Configuration" page in the connector builder, the authentication method can be specified. This configuration is shared for all streams - it's not possible to use different authentication methods for different streams in the same connector.

-If your API doesn't need authentication, leave it set to "No auth". This means the connector will be able to make requests to the API without providing any credentials, which is possible for some public APIs or private APIs only available in local networks.
+If your API doesn't need authentication, leave it set to "No auth".
This means the connector will be able to make requests to the API without providing any credentials, which might be the case for some public open APIs or private APIs only available in local networks.

 curl -X GET \
   -H "X-Metabase-Session: <session token>" \
   http://<metabase host>/<endpoint>
 ```

+If the session token is specified as part of the source configuration, the "Validate session url" is requested with the existing token to check for validity:
+```
+curl -X GET \
+  -H "X-Metabase-Session: <session token>" \
+  http://<metabase host>/user/current
+```
+
+If this request returns an HTTP status code that's not in the range of 400 to 600, the token is considered valid and records are fetched.
+
 ## Reference

 For detailed documentation of the underlying low code components, see [here TODO](TODO)
\ No newline at end of file

From e7f5688e00631dbc3f2b87b0e98c318d910c317c Mon Sep 17 00:00:00 2001
From: Joe Reuter
Date: Wed, 12 Apr 2023 16:48:03 +0200
Subject: [PATCH 03/12] fix broken link

---
 .../connector-builder-ui/authentication.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/connector-development/connector-builder-ui/authentication.md b/docs/connector-development/connector-builder-ui/authentication.md
index ee9cea651b9758..06faddd2763386 100644
--- a/docs/connector-development/connector-builder-ui/authentication.md
+++ b/docs/connector-development/connector-builder-ui/authentication.md
 ## Reference

-For detailed documentation of the underlying low code components, see [here TODO](TODO)
\ No newline at end of file
+For detailed documentation of the underlying low code components, see here TODO
\ No newline at end of file

From f88c88b63dfad40e948558e75d92a9fbd4090eb7 Mon Sep 17 00:00:00 2001
From: Joe Reuter
Date: Thu, 13 Apr 2023 09:50:59 +0200
Subject: [PATCH 04/12] remove session token auth

---
 .../connector-builder-ui/authentication.md | 52 ------------------
 1 file changed, 52 deletions(-)

diff --git a/docs/connector-development/connector-builder-ui/authentication.md b/docs/connector-development/connector-builder-ui/authentication.md
index 06faddd2763386..c33c7e34f0bd90 100644
--- a/docs/connector-development/connector-builder-ui/authentication.md
+++ b/docs/connector-development/connector-builder-ui/authentication.md
 Check the documentation of the API you want to integrate to find out which authentication method it uses. The following methods are supported in the connector builder:
 * [Basic HTTP](#basic-http)
 * [Bearer](#bearer)
 * [API Key](#api-key)
 * [OAuth](#oauth)
-* [Session token](#session-token)

 Select the matching authentication method for your API and check the sections below for more information about individual methods.

### Session token

The session token authentication method is the right choice if the API requires a consumer to send a request to "log in", returning a session token which can be used to send the requests that actually extract records.

Username and password are supplied as "Testing values" / as part of the source configuration and are sent to the configured "Login url" as a JSON body with a `username` and a `password` property. The "Session token response key" specifies where to find the session token in the log-in response.
The "Header" specifies in which HTTP header field the session token has to be injected.

Besides specifying username and password, the session token authentication method also allows directly specifying the session token via the source configuration. In this case, the "Validate session url" is used to check whether the provided session token is still valid.

#### Example

The [Metabase API](https://www.metabase.com/learn/administration/metabase-api#authenticate-your-requests-with-a-session-token) provides a session authentication scheme.

In this case, the authentication method has to be configured like this:
* "Login url" is `session`
* "Session token response key" is `id`
* "Header" is `X-Metabase-Session`
* "Validate session url" is `user/current`

When running a sync, the connector first sends username and password to the `session`
endpoint:
```
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"username": "<username>", "password": "<password>"}' \
  http://<metabase host>/session
```

The response is a JSON object containing an `id` property:
```
 {"id":"<session token>"}
```

When fetching records, the session token is sent along as the `X-Metabase-Session` header:
```
curl -X GET \
  -H "X-Metabase-Session: <session token>" \
  http://<metabase host>/<endpoint>
```

If the session token is specified as part of the source configuration, the "Validate session url" is requested with the existing token to check for validity:
```
curl -X GET \
  -H "X-Metabase-Session: <session token>" \
  http://<metabase host>/user/current
```

If this request returns an HTTP status code that's not in the range of 400 to 600, the token is considered valid and records are fetched.

 ## Reference

 For detailed documentation of the underlying low code components, see here TODO
\ No newline at end of file

From 08a99789e50d5a0461805a80e46f8ee117e2e172 Mon Sep 17 00:00:00 2001
From: Joe Reuter
Date: Thu, 13 Apr 2023 10:47:21 +0200
Subject: [PATCH 05/12] review comments

---
 .../connector-builder-ui/authentication.md | 41 +++++++++++++------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/docs/connector-development/connector-builder-ui/authentication.md b/docs/connector-development/connector-builder-ui/authentication.md
index c33c7e34f0bd90..0fee4bb842aff1 100644
--- a/docs/connector-development/connector-builder-ui/authentication.md
+++ b/docs/connector-development/connector-builder-ui/authentication.md
 # Authentication

-Authentication is about providing some form of confidential secret (credentials, password, token, key) to the API provider that identifies the caller (in this case the connector). It allows the provider to check whether the caller is known and has sufficient permission to fetch data. The authentication feature provides a secure way to configure authentication using a variety of methods.
+Authentication allows the connector to check whether it has sufficient permission to fetch data. The authentication feature provides a secure way to configure authentication using a variety of methods.

-The credentials themselves (e.g. username and password) are _not_ specified as part of the connector; instead, they are part of the source configuration that is specified by the end user when setting up a source based on the connector. During development, it's possible to provide testing credentials in the "Testing values" menu, but those are not saved along with the connector.
Credentials that are part of the source configuration are stored in a secure way in your Airbyte instance while the connector configuration is saved in the regular database.
+The credentials themselves (e.g. username and password) are _not_ specified as part of the connector; instead, they are part of the configuration that is specified by the end user when setting up a source based on the connector. During development, it's possible to provide testing credentials in the "Testing values" menu, but those are not saved along with the connector. Credentials that are part of the source configuration are stored in a secure way in your Airbyte instance while the connector configuration is saved in the regular database.

-In the "Authentication" section on the "Global Configuration" page in the connector builder, the authentication method can be specified. This configuration is shared for all streams - it's not possible to use different authentication methods for different streams in the same connector.
+In the "Authentication" section on the "Global Configuration" page in the connector builder, the authentication method can be specified. This configuration is shared for all streams - it's not possible to use different authentication methods for different streams in the same connector. In case your API uses multiple or custom authentication methods, you can use the [low-code CDK](/connector-development/config-based/low-code-cdk-overview) or [Python CDK](/connector-development/cdk-python/).

-If your API doesn't need authentication, leave it set to "No auth". This means the connector will be able to make requests to the API without providing any credentials, which is possible for some public APIs or private APIs only available in local networks.
+If your API doesn't need authentication, leave it set to "No auth". This means the connector will be able to make requests to the API without providing any credentials, which might be the case for some public open APIs or private APIs only available in local networks.

 ## Authentication methods

 Check the documentation of the API you want to integrate to find out which authentication method it uses. The following methods are supported in the connector builder:
 * [Basic HTTP](#basic-http)
-* [Bearer](#bearer)
+* [Bearer Token](#bearer-token)
 * [API Key](#api-key)
 * [OAuth](#oauth)

 Select the matching authentication method for your API and check the sections below for more information about individual methods.

 ### Basic HTTP

 If requests are authenticated using the Basic HTTP authentication method, the documentation page will likely contain one of the following keywords:
 - "Basic Auth"
 - "Basic HTTP"
 - "Authorization: Basic"
 - "Base64"

 The Basic HTTP authentication method is a standard and doesn't require any further configuration. Username and password are set via "Testing values" in the connector builder and as part of the source configuration when configuring connections.

 #### Example

-The [Greenhouse API](https://developers.greenhouse.io/harvest.html#introduction) uses basic auth.
+The [Greenhouse API](https://developers.greenhouse.io/harvest.html#introduction) uses basic authentication.

 Sometimes, only a username and no password is required, like for the [Chargebee API](https://apidocs.chargebee.com/docs/api/auth?prod_cat_ver=2) - in these cases simply leave the password input empty.

-In the basic auth scheme, the supplied username and password are concatenated with a colon `:` and encoded using the base64 algorithm. For username `user` and password `passwd`, the base64-encoding of `user:passwd` is `dXNlcjpwYXNzd2Q=`.
+In the basic authentication scheme, the supplied username and password are concatenated with a colon `:` and encoded using the base64 algorithm. For username `user` and password `passwd`, the base64-encoding of `user:passwd` is `dXNlcjpwYXNzd2Q=`.

-### Bearer
+### Bearer Token

 If requests are authenticated using Bearer authentication, the documentation will probably mention "bearer token" or "token authentication". In this scheme, the `Authorization` header of the HTTP request is set to `Bearer <token>`.

 Like the Basic HTTP authentication, it does not require further configuration. The bearer token can be set via "Testing values" in the connector builder as well as part of the source configuration when configuring connections.

 #### Example

 The [Sendgrid API](https://docs.sendgrid.com/api-reference/how-to-use-the-sendgrid-v3-api/authentication) and the [Square API](https://developer.squareup.com/docs/build-basics/access-tokens) support Bearer authentication.

 ### API Key

 The API key authentication method is similar to the Bearer authentication but allows configuring the HTTP header in which the API key is sent as part of the request. The HTTP header name is part of the connector definition while the API key itself can be set via "Testing values" in the connector builder as well as part of the source configuration when configuring connections.

 This form of authentication is often called "(custom) header authentication".

 #### Example

 The [CoinAPI.io API](https://docs.coinapi.io/market-data/rest-api#authorization) uses API key authentication via the `X-CoinAPI-Key` header.
-When fetching records, the api token is sent along as the configured header: +When fetching records, the api token is included in the request using the configured header: ``` curl -X GET \ -H "X-CoinAPI-Key: " \ @@ -79,12 +79,11 @@ curl -X GET \ ### OAuth -The OAuth authentication method implements authentication using an OAuth2.0 flow with a refresh token grant type. +The OAuth authentication method implements authentication using an [OAuth2.0 flow with a refresh token grant type](https://oauth.net/2/grant-types/refresh-token/). -In this scheme the OAuth endpoint of an API is called with a long-lived refresh token provided as part of the source configuration to obtain a short-lived access token that's used to make requests actually extracting records. If the access token expires, a new one is requested automatically. - -It needs to be configured with the endpoint to call to obtain access tokens with the refresh token. OAuth client id/secret and the refresh token are provided via "Testing values" in the connector builder as well as part of the source configuration when configuring connections. +In this scheme the OAuth endpoint of an API is called with a long-lived refresh token provided as part of the source configuration to obtain a short-lived access token that's used to make requests actually extracting records. If the access token expires, the connection will automatically request a new one. +The connector needs to be configured with the endpoint to call to obtain access tokens with the refresh token. OAuth client id/secret and the refresh token are provided via "Testing values" in the connector builder as well as part of the source configuration when configuring connections. Depending on how the refresh endpoint is implemented exactly, additional configuration might be necessary to specify how to request an access token with the right permissions (configuring OAuth scopes and grant type) and how to extract the access token and the expiry date out of the response (configuring expiry date format and property name as well as the access key property name): * Scopes - if not specified, no scopes are sent along with the refresh token request @@ -93,7 +92,7 @@ Depending on how the refresh endpoint is implemented exactly, additional configu * Token expire property date format - if not specified, the expiry property is interpreted as the number of seconds the access token will be valid * Access token property name - the name of the property in the response that contains the access token to do requests. If not specified, it's set to `access_token` -If the refresh token itself isn't long lived but expires after a short amount of time and needs to be refresh as well or if other grant types like PKCE are required, it's not possible to use the connector builder with OAuth authentication - check out the [compatibility guide](http://localhost:3000/connector-development/config-based/connector-builder-compatibility#oauth) for more information. +If the API uses a short-lived refresh token that expires after a short amount of time and needs to be refreshed as well or if other grant types like PKCE are required, it's not possible to use the connector builder with OAuth authentication - check out the [compatibility guide](/connector-development/config-based/connector-builder-compatibility#oauth) for more information. 
Keep in mind that the OAuth authentication method does not implement a single-click authentication experience for the end user configuring the connector - it will still be necessary to obtain client id, client secret and refresh token from the API and manually enter them into the configuration form.

#### Example

The [Square API](https://developer.squareup.com/docs/build-basics/access-tokens#get-an-oauth-access-token) supports OAuth.

In this case, the authentication method has to be configured like this:
* "Token refresh endpoint" is `https://connect.squareup.com/oauth2/token`
* "Token expiry property name" is `expires_at`

When running a sync, the connector first sends client id, client secret and refresh token to the token refresh endpoint:
```
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"client_id": "<client id>", "client_secret": "<client secret>", "refresh_token": "<refresh token>", "grant_type": "refresh_token" }' \
  https://connect.squareup.com/oauth2/token
```

The response is a JSON object containing an `access_token` property and an `expires_at` property:
```
 {"access_token":"<access token>", "expires_at": "2023-12-12T00:00:00"}
```

The `expires_at` date tells the connector how long the access token can be used - if this point in time is passed, a new access token is requested automatically.

When fetching records, the access token is sent along as part of the `Authorization` header:
```
curl -X GET \
  -H "Authorization: Bearer <access token>" \
  https://connect.squareup.com/v2/
```

### Other authentication methods

If your API is not using one of the natively supported authentication methods, it's still possible to build an Airbyte connector as described below.

#### Access token as query or body parameter

Some APIs require including the access token in different parts of the request (for example as a request parameter). For example, the [Breezometer API](https://docs.breezometer.com/api-documentation/introduction/#authentication) uses this kind of authentication. In these cases it's also possible to configure authentication manually:
* Add a user input as secret field on the "User inputs" page (e.g. named `api_key`)
* On the stream page, add a new "Request parameter"
* As key, configure the name of the query parameter the API requires (e.g. named `key`)
* As value, configure a placeholder for the created user input (e.g. `{{ config['api_key'] }}`)

The same approach can be used to add the token to the request body.

#### Custom authentication methods

Some APIs require complex custom authentication schemes involving signing requests or doing multiple requests to authenticate. In these cases, it's required to use the [low-code CDK](/connector-development/config-based/low-code-cdk-overview) or [Python CDK](/connector-development/cdk-python/).

## Reference

For detailed documentation of the underlying low code components, see here TODO
\ No newline at end of file

From bbdeb997a75240ee3d6f4fd62189e2b5269e3c Mon Sep 17 00:00:00 2001
From: Joe Reuter
Date: Thu, 13 Apr 2023 15:44:01 +0200
Subject: [PATCH 06/12] first version of tutorial

---
 .../connector-builder-ui/tutorial.md | 199 ++++++++++++++++++
 1 file changed, 199 insertions(+)
 create mode 100644 docs/connector-development/connector-builder-ui/tutorial.md

diff --git a/docs/connector-development/connector-builder-ui/tutorial.md b/docs/connector-development/connector-builder-ui/tutorial.md
new file mode 100644
index 00000000000000..194ec99d024285
--- /dev/null
+++ b/docs/connector-development/connector-builder-ui/tutorial.md
@@ -0,0 +1,199 @@
# Tutorial

Throughout this tutorial, we'll walk you through the creation of an Airbyte connector using the connector builder UI to read and extract data from an HTTP API.

We'll build a connector reading data from the Exchange Rates API, but the steps apply to other HTTP APIs you might be interested in integrating with.

The API documentation can be found [here](https://apilayer.com/marketplace/exchangerates_data-api).
In this tutorial, we will read data from the following endpoints:

- `Latest Rates Endpoint`
- `Historical Rates Endpoint`

The end goal is to implement a source connector with a single `Stream` containing exchange rates going from a base currency to many other currencies.
The output schema of our stream will look like the following:

```json
{
  "base": "USD",
  "date": "2022-07-15",
  "rates": {
    "CAD": 1.28,
    "EUR": 0.98
  }
}
```

## Exchange Rates API Setup

Before we get started, you'll need to generate an API access key for the Exchange Rates API.
This can be done by signing up for the Free tier plan on [Exchange Rates API](https://exchangeratesapi.io/):

1. Visit https://exchangeratesapi.io and click "Get free API key" on the top right
2. You'll be taken to https://apilayer.com -- finish the sign up process, signing up for the free tier
3. Once you're signed in, visit https://apilayer.com/marketplace/exchangerates_data-api#documentation-tab and click "Live Demo"
4. Inside that editor, you'll see an API key. This is your API key.

## Requirements

- An Exchange Rates API key
- An Airbyte Cloud account or an OSS Airbyte deployment version 0.43.0 or greater

## The connector builder project

When developing a connector using the connector builder UI, the current state is saved in a connector builder project. These projects are saved as part of the Airbyte workspace and separate from your source configurations and connections. In the last step of this tutorial you will publish the connector builder project to make it ready to use in connections to run syncs.

To get started, follow these steps:
* Go to the connector builder page by clicking the "Builder" item in the left hand navigation bar
* Select "Start from scratch" to start a new connector builder project
* Set the connector name to "Exchange rates"

Your connector builder project is now set up. The next steps describe how to configure your connector to extract records from the Exchange rates API.

## Global configuration

On the "global configuration" page, general settings applying to all streams are configured - the base URL requests are sent to, as well as how to authenticate with the API server.

* Set the base URL to `https://api.apilayer.com`
* Select the "API Key" authentication method
* Set the "Header" to `apikey`

The actual API Key you copied from apilayer.com will not be part of the connector itself - instead it will be set as part of the source configuration when configuring a connection based on your connector in a later step.

## Setup and test a stream

Now that you configured how to talk to the API, let's set up the stream of records that will be sent to a destination later on. To do so, click the button with the plus icon next to the "Streams" section in the side menu and fill out the form:
* Set the name to "Rates"
* Set the "URL path" to `/exchangerates_data/latest`
* Submit
* Set the "Record selector" to `rates`

Now the basic stream is configured and can be tested. To send test requests, supply testing values by clicking the "Testing values" button in the top right and pasting your API key in the input.

This form corresponds to what a user of this connector will need to provide when setting up a connection later on. The testing values are not saved along with the connector project.

Now, click the "Test" button to trigger a test read to simulate what will happen during a sync. After a little while, you should see a single record that looks like this:
```
{
  "base": "EUR",
  "date": "2023-04-13",
  "rates": {
    "AED": 4.053261,
    "AFN": 95.237669,
    "ALL": 112.964844,
    "AMD": 432.048005,
    // ...
  }
}
```

In a real sync, this record will be passed on to a destination like a warehouse.

The request/response tabs are helpful during development to see which requests and responses your connector will send and receive from the API.
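For reference, the request the connector sends during such a test read looks roughly like this - a sketch with your actual API key substituted for the placeholder:
```
curl -X GET \
  -H "apikey: <api key>" \
  https://api.apilayer.com/exchangerates_data/latest
```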
## Declaring the record schema

Each stream of a connector needs to declare what emitted records will look like (which properties they have, which data types will be used, ...). During a sync, this information will be passed on to the destination to configure it correctly - for example a SQL database destination can use it to properly set up the destination table, assigning the right type to each column. This process is called "normalization".

By default, the stream schema is set to a simple object with unspecified properties. However, the connector builder can infer the schema based on the test read you just issued. To use the inferred schema, switch to the "Detected schema" tab and click the "Import schema" button.

## Making the stream configurable

The exchange rate API supports configuring a different base currency via request parameter - let's make this part of the user inputs that can be controlled by the user of the connector when configuring a source, similar to the API key.

To do so, follow these steps:
* Scroll down to the "Request parameters" section and add a new request parameter
* Set the key to `base`
* For the value, click the user icon in the input and select "New user input"
* Set the name to "Base"
* As hint, set "A base currency like USD or EUR"

Now your connector has a second configuration input. To test it, click the "Testing values" button again and set "Base" to `USD`. Then, click the "Test" button again to issue a new test read.

The record should update to use USD as the base currency:
```
{
  "base": "USD",
  "date": "2023-04-13",
  "rates": {
    "AED": 3.6723,
    "AFN": 86.286329,
    "ALL": 102.489617,
    "AMD": 391.984204,
    // ...
  }
}
```

## Incremental reads

We now have a working implementation of a connector reading the latest exchange rates for a given currency.
In this section, we'll update the source to read historical data instead of only reading the latest exchange rates.

According to the API documentation, we can read the exchange rate for a specific date by querying the `"/exchangerates_data/{date}"` endpoint instead of `"/exchangerates_data/latest"`.

To configure your connector to request every day individually, follow these steps:
* Enable "Incremental sync" for the Rates stream
* Set the "Cursor field" to `date` - this is the property in our records to check what date got synced last
* Set the "Datetime format" to `%Y-%m-%d` to match the format of the date in the record returned from the API
* Set the "Cursor granularity" to `P1D` to tell the connector the API only supports daily increments
* Make the start time configurable by clicking the user icon in the "Start datetime" input and creating a new user input called `Start date`
* Set the "End datetime" to `{{ today_utc() }}` to always sync up to the current day
* In a lot of cases the start and end date are injected into the request body or request parameters.
However, in the case of the exchange rate API the date needs to be added to the path of the request, so disable the "Inject start/end time into outgoing HTTP request" options
* At the top of the form, change the "Path URL" input to `/exchangerates_data/{{ stream_slice.start_time }}` to inject the date to fetch data for into the path of the request
* Open the "Advanced" section and set "Step" to `P1D` to configure the connector to do one separate request per day
* Set a start date (like `2023-03-03`) in the "Testing values" menu
* Hit the "Test" button to trigger a new test read

Now, you should see a dropdown above the records view that lets you step through the daily exchange rates. Note that in the connector builder at most 5 slices are requested to speed up testing. During a proper sync the full time range between your configured start date and the current day will be synced.

When used in a connection, the connector will make sure exchange rates for the same day are not requested multiple times - the date of the last fetched record will be stored and the next scheduled sync will pick up where the previous one stopped.

## Transformations

Note that a warning icon should appear next to the "Detected schema" tab - using the per-date endpoint instead of the latest endpoint slightly changed the shape of the records by adding a `historical` property. As we don't need this property in our destination, we can remove it using a transformation.

To do so, follow these steps:
* Enable the "Transformations" section
* Add a new transformation
* Set the "Path" to `historical`
* Trigger a new test read

The `historical` property in the records tab and the schema warning should disappear.

## Publish and sync

So far, the connector is only configured as part of the connector builder project. To make it possible to use it in actual connections, you need to publish the connector. This captures the current state of the configuration and makes the connector available as a custom connector within the current Airbyte workspace.

To use the connector for a proper sync, follow these steps:
* Click the "Publish" button and publish the first version of the "Exchange rates" connector
* Go to the "Connections" page and create a new connection
* As Source type, pick the "Exchange rates" connector you just created
* Set API Key, base currency and start date for the sync - to avoid a large number of requests, set the start date to one week in the past
* Click "Set up source" and wait for the connection check to verify the provided configuration is valid
* Set up a destination - to keep things simple let's choose the "E2E Testing" destination type
* Click "Set up destination"
* Wait for Airbyte to check the record schema, then click "Set up connection" - this will create the connection and kick off the first sync
* After a short while, the sync should complete successfully

Congratulations!
You just completed the following steps:
* Configured a connector to extract currency exchange data from an HTTP-based API:
  * Configurable API key, start date and base currency
  * Incremental sync to keep the number of requests small
  * Schema declaration to enable normalization in the destination
* Tested whether the connector works correctly in the UI
* Made the working connector available to configure sources in the workspace
* Set up a connection using the published connector to sync data from the exchange rates API

## Next steps

This tutorial didn't go into depth about all features that can be used in the connector builder. Check out the concept pages for more information about certain topics:
* Authentication
* Record selection and transformation
* Pagination
* Error handling
* Incremental syncs
* Partitioning
* Schema declaration

Not every possible API can be consumed by connectors configured in the connector builder. The [compatibility guide](/connector-development/config-based/connector-builder-compatibility#oauth) can help determine whether another technology can be used to integrate an API with the Airbyte platform.
\ No newline at end of file

From 4b8adba58b0a4c42e6946f6e80c94486c097e8dd Mon Sep 17 00:00:00 2001
From: Joe Reuter
Date: Thu, 13 Apr 2023 15:44:21 +0200
Subject: [PATCH 07/12] Revert "first version of tutorial"

This reverts commit bbdeb997a75240ee3d6f4fd62189e2b5269e3c.

---
 .../connector-builder-ui/tutorial.md | 199 ------------------
 1 file changed, 199 deletions(-)
 delete mode 100644 docs/connector-development/connector-builder-ui/tutorial.md

diff --git a/docs/connector-development/connector-builder-ui/tutorial.md b/docs/connector-development/connector-builder-ui/tutorial.md
deleted file mode 100644
index 194ec99d024285..00000000000000
--- a/docs/connector-development/connector-builder-ui/tutorial.md
+++ /dev/null
@@ -1,199 +0,0 @@
# Tutorial

Throughout this tutorial, we'll walk you through the creation of an Airbyte connector using the connector builder UI to read and extract data from an HTTP API.

We'll build a connector reading data from the Exchange Rates API, but the steps apply to other HTTP APIs you might be interested in integrating with.

The API documentations can be found [here](https://apilayer.com/marketplace/exchangerates_data-api).
In this tutorial, we will read data from the following endpoints:

- `Latest Rates Endpoint`
- `Historical Rates Endpoint`

With the end goal of implementing a source connector with a single `Stream` containing exchange rates going from a base currency to many other currencies.
The output schema of our stream will look like the following:

```json
{
  "base": "USD",
  "date": "2022-07-15",
  "rates": {
    "CAD": 1.28,
    "EUR": 0.98
  }
}
```

## Exchange Rates API Setup

Before we get started, you'll need to generate an API access key for the Exchange Rates API.
This can be done by signing up for the Free tier plan on [Exchange Rates API](https://exchangeratesapi.io/):

1. Visit https://exchangeratesapi.io and click "Get free API key" on the top right
2. You'll be taken to https://apilayer.com -- finish the sign up process, signing up for the free tier
3. Once you're signed in, visit https://apilayer.com/marketplace/exchangerates_data-api#documentation-tab and click "Live Demo"
4. Inside that editor, you'll see an API key. This is your API key.
- -## Requirements - -- An Exchange Rates API key -- An Airbyte Cloud account or an OSS Airbyte deployment version 0.43.0 or greater - -## The connector builder project - -When developing a connector using the connector builder UI, the current state is saved in a connector builder project. These projects are saved as part of the Airbyte workspace and separate from your source configurations and connections. In the last step of this tutorial you will publish the connector builder project to make it ready to use in connections to run syncs. - -To get started, follow the following steps: -* Go to the connector builder page by clicking the "Builder" item in the left hand navigation bar -* Select "Start from scratch" to start a new connector builder project -* Set the connector name to "Exchange rates" - -Your connector builder project is now set up. The next steps describe how to configure your connector to extract records from the Exchange rates API - -## Global configuration - -On the "global configuration" page, general settings applying to all streams are configured - the base URL requests are sent to as well as configuration for how to authenticate with the API server. - -* Set the base URL to `https://api.apilayer.com` -* Select the "API Key" authentication method -* Set the "Header" to `apikey` - -The actual API Key you copied from apilayer.com will not be part of the connector itself - instead it will be set as part of the source configuration when configuring a connection based on your connector in a later step. - -## Setup and test a stream - -Now that you configured how to talk the API, let's set up the stream of records that will be sent to a destination later on. To do so, click the button with the plus icon next to the "Streams" section in the side menu and fill out the form: -* Set the name to "Rates" -* Set the "URL path" to `/exchangerates_data/latest` -* Submit -* Set the "Record selector" to `rates` - -Now the basic stream is configured and can be tested. To send test requests, suppliy testing values by clicking the "Testing values" button in top right and pasting your API key in the input. - -This form corresponds to what a user of this connector will need to provide when setting up a connection later on. The testing values are not saved along with the connector project. - -Now, click the "Test" button to trigger a test read to simulate what will happen during a sync. After a little while, you should see a single record that looks like this: -``` -{ - "base": "EUR", - "date": "2023-04-13", - "rates": { - "AED": 4.053261, - "AFN": 95.237669, - "ALL": 112.964844, - "AMD": 432.048005, - // ... - } -} -``` - -In a real sync, this record will be passed on to a destination like a warehouse. - -The request/response tabs are helpful during development to see which requests and responses your connector will send and receive from the API. - -## Declaring the record schema - -Each stream of a connector needs to declare how emitted records will look like (which properties do they have, what data types will be used, ...). During a sync, this information will be passed on to the destination to configure it correctly - for example a SQL database destination can use it to properly set up the destination table, assigning the right type to each column. This process is called "normalization" - -By default, the stream schema is set to a simple object with unspecified properties. However, the connector builder can infer the schema based on the test read you just issued. 
To use the infered schema, switch to the "Detected schema" tab and click the "Import schema" button. - -## Making the stream configurable - -The exchange rate API supports configuring a different base currency via request parameter - let's make this part of the user inputs that can be controlled by the user of the connector when configuring a source, similar to the API key. - -To do so, follow these steps: -* Scroll down to the "Request parameters" section and add a new request parameter -* Set the key to `base` -* For the value, click the user icon in the input and select "New user input" -* Set the name to "Base" -* As hint, set "A base currency like USD or EUR" - -Now your connector has a second configuration input. To test it, click the "Testing values" button again, set the "Base" to `USD`. Then, click the "Test" button again to issue a new test read. - -The record should update to use USD as the base currency: -``` -{ - "base": "USD", - "date": "2023-04-13", - "rates": { - "AED": 3.6723, - "AFN": 86.286329, - "ALL": 102.489617, - "AMD": 391.984204, - // ... - } -} -``` - -## Incremental reads - -We now have a working implementation of a connector reading the latest exchange rates for a given currency. -In this section, we'll update the source to read historical data instead of only reading the latest exchange rates. - -According to the API documentation, we can read the exchange rate for a specific date range by querying the `"/exchangerates_data/{date}"` endpoint instead of `"/exchangerates_data/latest"`. - -To configure your connector to request every day individually, follow these steps: -* Enable "Incremental sync" for the Rates stream -* Set the "Cursor field" to `date` - this is the property in our records to check what date got synced last -* Set the "Datetime format" to `%Y-%m-%d` to match the format of the date in the record returned from the API -* Set the "Cursor granularity" to `P1D` to tell the connector the API only supports daily increments -* Make the start time configurable by clicking the user icon in the "Start datetime" input and creating a new user input called `Start date` -* Set the "End datetime" to `{{ today_utc() }}` to always sync up to the current day -* In a lot of cases the start and end date are injected into the request body or request parameters. However in the case of the exchange rate API it needs to be added to the path of the request, so disable the "Inject start/end time into outgoing HTTP request" options -* On top of the form, change the "Path URL" input to `/exchangerates_data/{{ stream_slice.start_time }}` to inject the date to fetch data for into the path of the request -* Open the "Advanced" section and set "Step" to `P1D` to configure the connector to do one separate request per day -* Set a start date (like `2023-03-03`) in the "Testing values" menu -* Hit the "Test" button to trigger a new test read - -Now, you should see a dropdown above the records view that lets you step through the daily exchange rates. Note that in the connector builder at most 5 slices are requested to speed up testing. During a proper sync the full time range between your configured start date and the current day will be executed. - -When used in a connection, the connector will make sure exchange rates for the same day are not requested multiple times - the date of the last fetched record will be stored and the next scheduled sync will pick up where the previous one stopped. 
- -## Transformations - -Note that a warning icon should show next to the "Detected schema" tab - using the per-date endpoint instead of the latest endpoint slightly changed the shape of the records by adding a `historical` property. As we don't need this property in our destination, we can remove it using a transformation. - -To do so, follow these steps: -* Enable the "Transformations" section -* Add a new transformation -* Set the "Path" to `historical` -* Trigger a new test read - -The `historical` property in the records tab and the schema warning should disappear. - -## Publish and sync - -So far, the connector is only configured as part of the connector builder project. To make it possible to use it in actual connections, you need to publish the connector. This captures the current state of the configuration and makes the connector available as a custom connector within the current Airbyte workspace. - -To use the connector for a proper sync, follow these steps: -* Click the "Publish" button and publish the first version of the "Stock Exchange" connector -* Go to the "Connections" page and create a new connection -* As Source type, pick the "Stock Exchange" connector you just created -* Set API Key, base currency and start date for the sync - to avoid a large number of requests, set the start date to one week in the past -* Click "Set up source" and wait for the connection check to validate the provided configuration is valid -* Set up a destination - to keep things simple let's choose the "E2E Testing" destination type -* Click "Set up destination" -* Wait for Airbyte to check the record schema, then click "Set up connection" - this will create the connection and kick off the first sync -* After a short while, the sync should complete successfully - -Congratulations! You just completed the following steps: -* Configure a connector to extract currency exchange data from a HTTP-based API: - * Configurable API key, start date and base currency - * Incremental sync to keep the number of requests small - * Schema declaration to enable normalization in the destination -* Test whether the connector works correctly in the UI -* Make the working connector available to configure sources in the workspace -* Set up a connection using the published connector to sync data from the exchange rates API - -## Next steps - -This tutorial didn't go into depth about all features that can be used in the connector builder. Check out the concept pages for more information about certain topics: -* Authentication -* Record selection and transformation -* Pagination -* Error handling -* Incremental syncs -* Partitioning -* Schema declaration - -Not every possible API can be consumed by connectors configured in the connector builder. The [compatibility guide](/connector-development/config-based/connector-builder-compatibility#oauth) can help determining whether another technology can be used to integrate an API with the Airbyte platform. 
\ No newline at end of file

From e3ca9dad726bfdb09ef823e1b5dc69302cfe0684 Mon Sep 17 00:00:00 2001
From: Joe Reuter
Date: Fri, 14 Apr 2023 15:53:48 +0200
Subject: [PATCH 08/12] review comments

---
 .../connector-builder-ui/authentication.md | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/connector-development/connector-builder-ui/authentication.md b/docs/connector-development/connector-builder-ui/authentication.md
index 0fee4bb842aff1..be0bfc78e95f47 100644
--- a/docs/connector-development/connector-builder-ui/authentication.md
+++ b/docs/connector-development/connector-builder-ui/authentication.md
 # Authentication

 Authentication allows the connector to check whether it has sufficient permission to fetch data. The authentication feature provides a secure way to configure authentication using a variety of methods.

-The credentials themselves (e.g. username and password) are _not_ specified as part of the connector; instead, they are part of the configuration that is specified by the end user when setting up a source based on the connector. During development, it's possible to provide testing credentials in the "Testing values" menu, but those are not saved along with the connector. Credentials that are part of the source configuration are stored in a secure way in your Airbyte instance while the connector configuration is saved in the regular database.
+The credentials themselves (e.g. username and password) are _not_ specified as part of the connector; instead, they are part of the configuration that is specified by the end user when setting up a source based on the connector. During development, it's possible to provide testing credentials in the "Testing values" menu, but those are not saved along with the connector. Credentials that are part of the configuration are stored in a secure way in your Airbyte instance while the connector configuration is saved in the regular database.

-The Basic HTTP authentication method is a standard and doesn't require any further configuration. Username and password are set via "Testing values" in the connector builder and as part of the source configuration when configuring connections.
+The Basic HTTP authentication method is a standard and doesn't require any further configuration. Username and password are set via "Testing values" in the connector builder and by the end user when configuring this connector as a Source.

-Like the Basic HTTP authentication, it does not require further configuration. The bearer token can be set via "Testing values" in the connector builder as well as part of the source configuration when configuring connections.
+Like the Basic HTTP authentication, it does not require further configuration.
The bearer token can be set via "Testing values" in the connector builder and by the end user when configuring this connector as a Source.

 ### API Key

-The API key authentication method is similar to the Bearer authentication but allows configuring the HTTP header in which the API key is sent as part of the request. The HTTP header name is part of the connector definition while the API key itself can be set via "Testing values" in the connector builder as well as part of the source configuration when configuring connections.
+The API key authentication method is similar to the Bearer authentication but allows configuring the HTTP header in which the API key is sent as part of the request. The HTTP header name is part of the connector definition while the API key itself can be set via "Testing values" in the connector builder as well as when configuring this connector as a Source.

-This form of authentication is often called "(custom) header authentication".
+This form of authentication is often called "(custom) header authentication". It only supports setting the token in an HTTP header; for other cases, see the ["Other authentication methods" section](#access-token-as-query-or-body-parameter)

 #### Example

 The [CoinAPI.io API](https://docs.coinapi.io/market-data/rest-api#authorization) uses API key authentication via the `X-CoinAPI-Key` header.

 ### OAuth

 The OAuth authentication method implements authentication using an [OAuth2.0 flow with a refresh token grant type](https://oauth.net/2/grant-types/refresh-token/).

-In this scheme the OAuth endpoint of an API is called with a long-lived refresh token provided as part of the source configuration to obtain a short-lived access token that's used to make the requests that actually extract records. If the access token expires, the connector will automatically request a new one.
+In this scheme the OAuth endpoint of an API is called with a long-lived refresh token that's provided by the end user when configuring this connector as a Source. The refresh token is used to obtain a short-lived access token that's used to make the requests that actually extract records. If the access token expires, the connector will automatically request a new one.

-The connector needs to be configured with the endpoint to call to obtain access tokens with the refresh token. OAuth client id/secret and the refresh token are provided via "Testing values" in the connector builder as well as part of the source configuration when configuring connections.
+The connector needs to be configured with the endpoint to call to obtain access tokens with the refresh token. OAuth client id/secret and the refresh token are provided via "Testing values" in the connector builder as well as when configuring this connector as a Source.
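+Schematically, the token refresh request then looks like this - a sketch with placeholder values, since the exact endpoint and payload depend on the API:
+```
+curl -X POST \
+  -H "Content-Type: application/json" \
+  -d '{"grant_type": "refresh_token", "client_id": "<client id>", "client_secret": "<client secret>", "refresh_token": "<refresh token>"}' \
+  https://<api host>/<token refresh endpoint>
+```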
Depending on how the refresh endpoint is implemented exactly, additional configuration might be necessary to specify how to request an access token with the right permissions (configuring OAuth scopes and grant type) and how to extract the access token and the expiry date out of the response (configuring expiry date format and property name as well as the access key property name): -* Scopes - if not specified, no scopes are sent along with the refresh token request -* Grant type - if not specified, it's set to `refresh_token` -* Token expiry property name - if not specified, it's set to `expires_in` +* Scopes - the [OAuth scopes](https://oauth.net/2/scope/) the access token will have access to. If not specified, no scopes are sent along with the refresh token request +* Grant type - the OAuth grant type to request. This should be set to the string mapping to the [refresh token grant type](https://oauth.net/2/grant-types/refresh-token/). If not specified, it's set to `refresh_token`, which is the right value in most cases. +* Token expiry property name - the name of the property in the response that contains token expiry information. If not specified, it's set to `expires_in` * Token expire property date format - if not specified, the expiry property is interpreted as the number of seconds the access token will be valid * Access token property name - the name of the property in the response that contains the access token to do requests. If not specified, it's set to `access_token` From 0d76115474cef20bf5682e2c4e8a5bb75f1a118c Mon Sep 17 00:00:00 2001 From: Joe Reuter Date: Fri, 14 Apr 2023 18:49:52 +0200 Subject: [PATCH 09/12] partitioning documentation --- .../connector-builder-ui/partitioning.md | 116 ++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 docs/connector-development/connector-builder-ui/partitioning.md diff --git a/docs/connector-development/connector-builder-ui/partitioning.md b/docs/connector-development/connector-builder-ui/partitioning.md new file mode 100644 index 00000000000000..62a429d7eac963 --- /dev/null +++ b/docs/connector-development/connector-builder-ui/partitioning.md @@ -0,0 +1,116 @@ +# Partitioning + +Partitioning is required if the records of a stream are nested within multiple buckets or parent resources that need to be queried separately to extract the records. + +Sometimes records belonging to a single stream are partitioned into subsets that need to be fetched separately. In most cases, these partitions are a parent resource type of the resource type targeted by the connector. The partitioning feature can be used to configure your connector to iterate through all partitions. In API documentation, this concept shows up as mandatory parameters that need to be set in the path, query or request body of the request. + +Common API structures look like this: +* The [SurveySparrow API](https://developers.surveysparrow.com/rest-apis/response#getV3Responses) allows fetching a list of responses to surveys. For the `/responses` endpoint, the id of the survey to fetch responses for needs to be specified via the query parameter `survey_id`. The API does not allow fetching responses for all available surveys in a single request; there needs to be a separate request per survey. The surveys represent the partitions of the responses stream. +* The [Woocommerce API](https://woocommerce.github.io/woocommerce-rest-api-docs/#order-notes) includes an endpoint to fetch notes of webshop orders via the `/orders/<id>/notes` endpoint. 
The `<id>` placeholder needs to be set to the id of the order to fetch the notes for. The orders represent the partitions of the notes stream. + +There are some cases that require multiple requests to fetch all records as well, but partitioning is not the right tool to configure these in the connector builder: +* If your records are spread out across multiple pages that need to be requested individually because there are too many of them, use the Pagination feature. +* If your records are spread out over time and multiple requests are necessary to fetch all data (for example one request per day), use the Incremental sync feature. + +## Dynamic and static partition routing + +There are three possible sources for the partitions that need to be queried: the partitions can be defined in the connector itself, they can be supplied by the end user when configuring a Source based on the connector, or the API can provide the list of partitions on another endpoint (for example the Woocommerce API also includes an `/orders` endpoint that returns all orders). + +The first two options are a "static" form of partition routing (because the partitions won't change as long as the Airbyte configuration isn't changed). The API providing the partitions via one or multiple separate requests is a "dynamic" form of partition routing because the partitions can change any time. + +### List partition router + +To configure static partitioning, choose the "List" method for the partition router. The following fields have to be configured: +* The "partition values" can either be set to a list of strings, making the partitions part of the connector itself, or delegated to a user input so the end user configuring a Source based on the connector can control which partitions to fetch. +* The "Current partition value identifier" can be freely chosen and is the identifier of the variable holding the current partition value. It can for example be used in the path of the stream using the `{{ stream_slice.<identifier> }}` syntax. 
+* The "Inject partition value into outgoing HTTP request" option allows configuring how the current partition value is added to the requests + +#### Example + +To enable static partition routing for the [SurveySparrow API](https://developers.surveysparrow.com/rest-apis/response#getV3Responses) responses, the list partition router needs to be configured as follows: +* "Partition values" are set to the list of survey ids to fetch +* "Current partition value identifier" is set to `survey` (this is not used for this example) +* "Inject partition value into outgoing HTTP request" is set to `request_parameter` for the field name `survey_id` + +If the partition values are set to `123`, `456` and `789`, the following requests will be executed: +``` +curl -X GET https://api.surveysparrow.com/v3/responses?survey_id=123 +curl -X GET https://api.surveysparrow.com/v3/responses?survey_id=456 +curl -X GET https://api.surveysparrow.com/v3/responses?survey_id=789 +``` + +To enable static partitions for the [Woocommerce API](https://woocommerce.github.io/woocommerce-rest-api-docs/#order-notes) order notes, the configuration would look like this: +* "Partition values" are set to the list of order ids to fetch +* "Current partition value identifier" is set to `order` +* "Inject partition value into outgoing HTTP request" is disabled, because the order id needs to be injected into the path +* In the general section of the stream configuration, the "Path URL" is set to `/orders/{{ stream_slice.order }}/notes` + + +If the partition values are set to `123`, `456` and `789`, the following requests will be executed: +``` +curl -X GET https://example.com/wp-json/wc/v3/orders/123/notes +curl -X GET https://example.com/wp-json/wc/v3/orders/456/notes +curl -X GET https://example.com/wp-json/wc/v3/orders/789/notes +``` + +### Substream partition router + +To fetch the list of partitions (in this example surveys or orders) from the API itself, the "Substream" partition router has to be used. It allows you to select another stream of the same connector to serve as the source for partitions to fetch. Each record of the parent stream is used as a partition for the current stream. + +The following fields have to be configured to use the substream partition router: +* The "Parent key" is the property on the parent stream record that should become the partition value (in most cases this is some form of id) +* The "Current partition value identifier" can be freely chosen and is the identifier of the variable holding the current partition value. It can for example be used in the path of the stream using the `{{ stream_slice.<identifier> }}` interpolation placeholder. +* The "Parent stream" defines which stream's records should be used as partitions + +#### Example + +To enable dynamic partition routing for the [Woocommerce API](https://woocommerce.github.io/woocommerce-rest-api-docs/#order-notes) order notes, first an "orders" stream needs to be configured for the `/orders` endpoint to fetch a list of orders. Once this is done, the partition router of the notes stream has to be configured like this: +* "Parent key" is set to `id` +* "Current partition value identifier" is set to `order` +* In the general section of the stream configuration, the "Path URL" is set to `/orders/{{ stream_slice.order }}/notes` + +When triggering a sync, the connector will first fetch all records of the orders stream. The records will look like this: +``` +{ "id": 123, "currency": "EUR", "shipping_total": "12.23", ... 
} +{ "id": 456, "currency": "EUR", "shipping_total": "45.56", ... } +{ "id": 789, "currency": "EUR", "shipping_total": "78.89", ... } +``` + +To turn a record into a partition value, the "parent key" is extracted, resulting in the partition values `123`, `456` and `789`. In turn, this results in the following requests to fetch the records of the notes stream: +``` +curl -X GET https://example.com/wp-json/wc/v3/orders/123/notes +curl -X GET https://example.com/wp-json/wc/v3/orders/456/notes +curl -X GET https://example.com/wp-json/wc/v3/orders/789/notes +``` + +## Multiple partition routers + +It is possible to configure multiple partition routers on a single stream - if this is the case, all possible combinations of partition values are requested separately. + +For example, the [Google Pagespeed API](https://developers.google.com/speed/docs/insights/v5/reference/pagespeedapi/runpagespeed) allows specifying the URL and the "strategy" to run an analysis for. To allow a user to trigger an analysis for multiple URLs and strategies at the same time, two list partition routers can be used (one injecting the partition value into the `url` parameter, one injecting it into the `strategy` parameter). + +If a user configures the URLs `example.com` and `example.org` and the strategies `desktop` and `mobile`, then the following requests will be triggered: +``` +curl -X GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=example.com&strategy=desktop +curl -X GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=example.com&strategy=mobile +curl -X GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=example.org&strategy=desktop +curl -X GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=example.org&strategy=mobile +``` + +## Adding the partition value to the record + +Sometimes it's helpful to attach the partition value to the record it belongs to, so it can be used during analysis in the destination. This can be done using a transformation to add a field and the `{{ stream_slice.<identifier> }}` interpolation placeholder. + +For example when fetching the order notes via the [Woocommerce API](https://woocommerce.github.io/woocommerce-rest-api-docs/#order-notes), the order id itself is not included in the note record, which means it won't be possible to tell which note belongs to which order: +``` +{ "id": 999, "author": "Jon Doe", "note": "Great product!" 
} +``` + +However, the order id can be added by taking the following steps: +* Make sure the "Current partition value identifier" is set to `order` +* Add an "Add field" transformation with "Path" `order_id` and "Value" `{{ stream_slice.order }}` + +Using this configuration, the notes record looks like this: +``` +{ "id": 999, "author": "Jon Doe", "note": "Great product!", "order_id": 123 } +``` \ No newline at end of file From b750ef07718c10ad4ad292a12e3f25bad401932e Mon Sep 17 00:00:00 2001 From: Joe Reuter Date: Mon, 17 Apr 2023 10:16:06 +0200 Subject: [PATCH 10/12] incremental sync --- .../connector-builder-ui/incremental-sync.md | 133 ++++++++++++++++++ 1 file changed, 133 insertions(+) create mode 100644 docs/connector-development/connector-builder-ui/incremental-sync.md diff --git a/docs/connector-development/connector-builder-ui/incremental-sync.md b/docs/connector-development/connector-builder-ui/incremental-sync.md new file mode 100644 index 00000000000000..83a799dab9cd79 --- /dev/null +++ b/docs/connector-development/connector-builder-ui/incremental-sync.md @@ -0,0 +1,133 @@ +# Incremental sync + +An incremental sync is a sync which pulls only the data that has changed since the previous sync (as opposed to all the data available in the data source). + +This is especially important if there are a large number of records to sync and/or the API has tight request limits which make a full sync of all records on a regular schedule too expensive or too slow. + +Incremental syncs are usually implemented using a cursor value (like a timestamp) that delineates which data was pulled and which data is new. A very common cursor value is an `updated_at` timestamp. This cursor means that records whose `updated_at` value is less than or equal to that cursor value have been synced already, and that the next sync should only export records whose `updated_at` value is greater than the cursor value. + +To use incremental syncs, the API endpoint needs to fulfill the following requirements: +* Records contain a date/time field that defines when this record was last updated (the "cursor field") +* It's possible to filter/request records by the cursor field + +To learn more about how different modes of incremental syncs, check out the [Incremental Sync - Append](/understanding-airbyte/connections/incremental-append/) and [Incremental Sync - Deduped History](/understanding-airbyte/connections/incremental-deduped-history) pages. + +## Configuration + +To configure incremental syncs for a stream in the connector builder, you have to specify how the records specify the **"last changed" / "updated at" timestamp**, the **initial time range** to fetch records for and **how to request records from a certain time range**. + +In the builder UI, these things are specified like this: +* The "Cursor field" is the property in the record that defines the date and time when the record got changed. It's used to decide which records are synced already and which records are "new" +* The "Datetime format" specifies the [format](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) the cursor field is using to specify date and time. +* The "Cursor granularity" is the smallest time unit supported by the API to specify the time range to request records for +* The "Start datetime" is the initial start date of the time range to fetch records for. When doing incremental syncs, the second sync will overwrite this date with the last record that got synced so far. 
In most cases, it is defined by the end user when configuring a Source using your connector. +* The "End datetime" is the end date of the time range to fetch records for. In most cases it's set to the current date and time when the sync is started to sync all changes that happened so far. +* The "Inject start/end time into outgoing HTTP request" defines how to request records that got changed in the time range to sync. In most cases the start and end time is added as a request parameter or body parameter + +## Example + +The [API of The Guardian](https://open-platform.theguardian.com/documentation/search) has a `/search` endpoint that allows extracting a list of articles. + +The `/search` endpoint has a `from-date` and a `to-date` query parameter which can be used to only request data for a certain time range. + +Content records have the following form: +``` +{ +  "id": "world/2022/oct/21/russia-ukraine-war-latest-what-we-know-on-day-240-of-the-invasion", +  "type": "article", +  "sectionId": "world", +  "sectionName": "World news", +  "webPublicationDate": "2022-10-21T14:06:14Z", +  "webTitle": "Russia-Ukraine war latest: what we know on day 240 of the invasion", +  // ... +} +``` + +As this fulfills the requirements for incremental syncs, we can configure the "Incremental sync" section in the following way: +* "Cursor field" is set to `webPublicationDate` +* "Datetime format" is set to `%Y-%m-%dT%H:%M:%SZ` +* "Cursor granularity" is set to `PT1S` as this API can handle date/time values on the second level +* "Start datetime" is set to "config value" +* "End datetime" is set to "now" to fetch all articles up to the current date +* "Inject start time into outgoing HTTP request" is set to `request_parameter` with "Field" set to `from-date` +* "Inject end time into outgoing HTTP request" is set to `request_parameter` with "Field" set to `to-date` + +This API orders records by default from new to old, which is not optimal for a reliable sync as the last encountered cursor value will be the most recent date even if some older records did not get synced (for example if a sync fails halfway through). It's better to start with the oldest records and work your way up to make sure that all older records are synced already once a certain date is encountered on a record. In this case, the API can be configured to behave like this by setting an additional parameter: +* At the bottom of the stream configuration page, add a new "Request parameter" +* Set the key to `order-by` +* Set the value to `oldest` + +Setting the start date in the "Testing values" to a week in the past (`2023-04-09T00:00:00Z` at the time of writing) results in the following request: +``` +curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-04-09T00:00:00Z&to-date=2023-04-15T10:18:08Z' +``` + +The last encountered date will be saved as part of the connection - when the next sync is running, it picks up from the last record. 
Let's assume the last ecountered article looked like this: +``` +{ + "id": "business/live/2023/apr/15/uk-bosses-more-optimistic-energy-prices-fall-ai-spending-boom-economics-business-live", + "type": "liveblog", + "sectionId": "business", + "sectionName": "Business", + "webPublicationDate": "2023-04-15T07:30:58Z", +} +``` + +Then when a sync is triggered for the same connection the next day, the following request is made: +``` +curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-04-15T07:30:58Z&to-date=2023-04-16T10:18:08Z' +``` + +The `from-date` is set to the cutoff date of articles synced already and the `to-date` is set to the new "now". + +## Advanced settings + +The description above is sufficient for a lot of APIs. However there are some more subtle configurations which sometimes become relevant. + +### Step + +When incremental syncs are enabled, the connector is not fetching all records since the cutoff date at once - instead it's splitting up the time range between the cutoff date and the desired end date into intervals based on the "Step" configuration (by default it's set to 1 month) + +For example if the "Step" is set to 10 days (`P10D`) for the Guardian articles stream described above and a longer time range, then the following requests will be performed: +``` + +curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-01-01T00:00:00Z&to-date=2023-01-10T00:00:00Z' + +curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-01-10T00:00:00Z&to-date=2023-01-20T00:00:00Z' + +curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-01-20T00:00:00Z&to-date=2023-01-30T00:00:00Z' + +... +``` + +After an interval is processed, the cursor value of the last record will be saved as part of the connection as the new cutoff date. + +In most cases, the default step size is fine, there are two reasons to change it: +* **The API is unreliable** and the cutoff date should be saved more often to prevent resync of a lot of records - if the "Step" size is a day, then at most one day worth of data needs to be resync in case the sync fails halfway through. However, a smaller step size might cause more requests to the API and more load on the system. +* **The API requires the connector to fetch data in pre-specified chunks** - for example the [Exchange Rates API](https://exchangeratesapi.io/documentation/) makes the date to fetch data for part of the URL path and only allows to fetch data for a single day at a time + +### Lookback window + +The "Lookback window" specifies a duration that is subtracted from the last cutoff date before starting to sync. + +Same APIs update records over time but do not allow to filter or search by modification date, only by creation date. For example the API of The Guardian might change the title of an article after it got published, but the `webPublicationDate` still shows the original date the article got published initially. + +In these cases, there are two options: +* **Do not use incremental sync** and always sync the full set of records to always have a consistent state - depending on the amount of data this might not be feasable +* **Configure the "Lookback window"** to not only sync exclusively new records, but resync some portion of records before the cutoff date to catch changes that were made to existing records, trading off data consistency and the amount of synced records. 
In the case of the API of The Guardian, this strategy will likey work well because news articles tend to only be updated for a few days after the initial release date, so this strategy should be able to catch most updates without having to resync all articles. + +Reiterating the example from above with a "Lookback window" of 2 days configured, let's assume the last ecountered article looked like this: +``` +{ +  "id": "business/live/2023/apr/15/uk-bosses-more-optimistic-energy-prices-fall-ai-spending-boom-economics-business-live", +  "type": "liveblog", +  "sectionId": "business", +  "sectionName": "Business", +  "webPublicationDate": "2023-04-15T07:30:58Z", +} +``` + +Then when a sync is triggered for the same connection the next day, the following request is made: +``` +curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-04-13T07:30:58Z&to-date=2023-04-16T10:18:08Z' +``` \ No newline at end of file From cad5876bd6ffce98ee63d4b207ebc42e24688a69 Mon Sep 17 00:00:00 2001 From: Joe Reuter Date: Tue, 18 Apr 2023 15:41:31 +0200 Subject: [PATCH 11/12] improvements --- .../connector-builder-ui/incremental-sync.md | 78 +++++++++---------- 1 file changed, 37 insertions(+), 41 deletions(-) diff --git a/docs/connector-development/connector-builder-ui/incremental-sync.md b/docs/connector-development/connector-builder-ui/incremental-sync.md index 83a799dab9cd79..2944b22bbc9e6e 100644 --- a/docs/connector-development/connector-builder-ui/incremental-sync.md +++ b/docs/connector-development/connector-builder-ui/incremental-sync.md @@ -10,17 +10,17 @@ To use incremental syncs, the API endpoint needs to fulfill the following requir * Records contain a date/time field that defines when this record was last updated (the "cursor field") * It's possible to filter/request records by the cursor field -To learn more about how different modes of incremental syncs, check out the [Incremental Sync - Append](/understanding-airbyte/connections/incremental-append/) and [Incremental Sync - Deduped History](/understanding-airbyte/connections/incremental-deduped-history) pages. +The knowledge of a cursor value also allows the Airbyte system to automatically keep a history of changes to records in the destination. To learn more about the different modes of incremental sync, check out the [Incremental Sync - Append](/understanding-airbyte/connections/incremental-append/) and [Incremental Sync - Deduped History](/understanding-airbyte/connections/incremental-deduped-history) pages. ## Configuration -To configure incremental syncs for a stream in the connector builder, you have to specify how the records specify the **"last changed" / "updated at" timestamp**, the **initial time range** to fetch records for and **how to request records from a certain time range**. +To configure incremental syncs for a stream in the connector builder, you have to specify how the records will represent the **"last changed" / "updated at" timestamp**, the **initial time range** to fetch records for and **how to request records from a certain time range**. In the builder UI, these things are specified like this: * The "Cursor field" is the property in the record that defines the date and time when the record got changed. 
It's used to decide which records are synced already and which records are "new" * The "Datetime format" specifies the [format](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) the cursor field is using to specify date and time. -* The "Cursor granularity" is the smallest time unit supported by the API to specify the time range to request records for -* The "Start datetime" is the initial start date of the time range to fetch records for. When doing incremental syncs, the second sync will overwrite this date with the last record that got synced so far. In most cases, it is defined by the end user when configuring a Source using your connector. +* The "Cursor granularity" is the smallest time unit supported by the API to specify the time range to request records for, expressed as an [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations) +* The "Start datetime" is the initial start date of the time range to fetch records for. When doing incremental syncs, the second sync will overwrite this date with the last record that got synced so far. * The "End datetime" is the end date of the time range to fetch records for. In most cases it's set to the current date and time when the sync is started to sync all changes that happened so far. * The "Inject start/end time into outgoing HTTP request" defines how to request records that got changed in the time range to sync. In most cases the start and end time is added as a request parameter or body parameter @@ -57,28 +57,28 @@ This API orders records by default from new to old, which is not optimal for a r * Set the key to `order-by` * Set the value to `oldest` -Setting the start date in the "Testing values" to a week in the past (`2023-04-09T00:00:00Z` at the time of writing) results in the following request: -``` -curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-04-09T00:00:00Z&to-date=2023-04-15T10:18:08Z' -``` +Setting the start date in the "Testing values" to a date in the past like **2023-04-09T00:00:00Z** results in the following request: +
+curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-04-09T00:00:00Z&to-date={`now`}'
+
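+Under the hood, the connector parses the cursor field of each record using the configured datetime format and remembers the largest value it has seen - roughly like the following sketch (the function is illustrative, not the actual CDK implementation):
+```
+from datetime import datetime
+
+DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # the configured "Datetime format"
+
+def latest_cursor_value(records, cursor_field="webPublicationDate"):
+    # Parse the cursor field of every record and keep the most recent value -
+    # this value becomes the cutoff date for the next sync
+    return max(
+        (record[cursor_field] for record in records),
+        key=lambda value: datetime.strptime(value, DATETIME_FORMAT),
+    )
+```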
The last encountered date will be saved as part of the connection - when the next sync is running, it picks up from the last record. Let's assume the last encountered article looked like this: -``` -{ +
+{`{
   "id": "business/live/2023/apr/15/uk-bosses-more-optimistic-energy-prices-fall-ai-spending-boom-economics-business-live",
   "type": "liveblog",
   "sectionId": "business",
   "sectionName": "Business",
-  "webPublicationDate": "2023-04-15T07:30:58Z",
-}
-```
+  "webPublicationDate": `}"2023-04-15T07:30:58Z"{`,
+}`}
+
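+The cursor value extracted from this record is `2023-04-15T07:30:58Z`.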
Then when a sync is triggered for the same connection the next day, the following request is made: -``` -curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-04-15T07:30:58Z&to-date=2023-04-16T10:18:08Z' -``` +
+curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-04-15T07:30:58Z&to-date={`<now>`}'
+
-The `from-date` is set to the cutoff date of articles synced already and the `to-date` is set to the new "now". +The `from-date` is set to the cutoff date of articles synced already and the `to-date` is set to the current date. ## Advanced settings @@ -86,48 +86,44 @@ The description above is sufficient for a lot of APIs. However there are some mo ### Step -When incremental syncs are enabled, the connector is not fetching all records since the cutoff date at once - instead it's splitting up the time range between the cutoff date and the desired end date into intervals based on the "Step" configuration (by default it's set to 1 month) +When incremental syncs are enabled, the connector is not fetching all records since the cutoff date at once - instead it's splitting up the time range between the cutoff date and the desired end date into intervals based on the "Step" configuration (by default it's set to one month) expressed as [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations). For example if the "Step" is set to 10 days (`P10D`) for the Guardian articles stream described above and a longer time range, then the following requests will be performed: -``` - -curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-01-01T00:00:00Z&to-date=2023-01-10T00:00:00Z' - -curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-01-10T00:00:00Z&to-date=2023-01-20T00:00:00Z' - -curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-01-20T00:00:00Z&to-date=2023-01-30T00:00:00Z' - +
+curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-01-01T00:00:00Z&to-date=2023-01-10T00:00:00Z'{`\n`}
+curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-01-10T00:00:00Z&to-date=2023-01-20T00:00:00Z'{`\n`}
+curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-01-20T00:00:00Z&to-date=2023-01-30T00:00:00Z'{`\n`}
 ...
-```
+
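+The chunking logic itself is conceptually simple - a minimal sketch (the real implementation also accounts for the cursor granularity at the interval boundaries):
+```
+from datetime import datetime, timedelta
+
+def split_into_intervals(start, end, step=timedelta(days=10)):
+    # Split [start, end] into consecutive "Step"-sized time windows,
+    # each of which is requested separately
+    intervals = []
+    while start < end:
+        intervals.append((start, min(start + step, end)))
+        start += step
+    return intervals
+
+intervals = split_into_intervals(datetime(2023, 1, 1), datetime(2023, 1, 30))
+```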
After an interval is processed, the cursor value of the last record will be saved as part of the connection as the new cutoff date. -In most cases, the default step size is fine, there are two reasons to change it: -* **The API is unreliable** and the cutoff date should be saved more often to prevent resync of a lot of records - if the "Step" size is a day, then at most one day worth of data needs to be resync in case the sync fails halfway through. However, a smaller step size might cause more requests to the API and more load on the system. +In most cases, the default step size is fine, but there are two reasons to change it: +* **To protect a connection against intermittent failures** - if the "Step" size is a day, the cutoff date is saved after all records associated with a day are processed. If a sync fails halfway through because the API, the Airbyte system, the destination or the network between these components has a failure, then at most one day's worth of data needs to be resynced. However, a smaller step size might cause more requests to the API and more load on the system. Which step size is optimal depends on the expected amount of data and the load characteristics of the API, but for a lot of applications the default of one month is a good starting point. * **The API requires the connector to fetch data in pre-specified chunks** - for example the [Exchange Rates API](https://exchangeratesapi.io/documentation/) makes the date to fetch data for part of the URL path and only allows fetching data for a single day at a time ### Lookback window The "Lookback window" specifies a duration that is subtracted from the last cutoff date before starting to sync. -Same APIs update records over time but do not allow to filter or search by modification date, only by creation date. For example the API of The Guardian might change the title of an article after it got published, but the `webPublicationDate` still shows the original date the article got published initially. +Some APIs update records over time but do not allow filtering or searching by modification date, only by creation date. For example the API of The Guardian might change the title of an article after it got published, but the `webPublicationDate` still shows the original date the article got published initially. In these cases, there are two options: -* **Do not use incremental sync** and always sync the full set of records to always have a consistent state - depending on the amount of data this might not be feasable -* **Configure the "Lookback window"** to not only sync exclusively new records, but resync some portion of records before the cutoff date to catch changes that were made to existing records, trading off data consistency and the amount of synced records. In the case of the API of The Guardian, this strategy will likey work well because news articles tend to only be updated for a few days after the initial release date, so this strategy should be able to catch most updates without having to resync all articles. +* **Do not use incremental sync** and always sync the full set of records to always have a consistent state, losing the advantages of reduced load and [automatic history keeping in the destination](/understanding-airbyte/connections/incremental-deduped-history) +* **Configure the "Lookback window"** to not only sync exclusively new records, but resync some portion of records before the cutoff date to catch changes that were made to existing records, trading off data consistency and the amount of synced records. 
In the case of the API of The Guardian, news articles tend to only be updated for a few days after the initial release date, so this strategy should be able to catch most updates without having to resync all articles. -Reiterating the example from above with a "Lookback window" of 2 days configured, let's assume the last ecountered article looked like this: -``` -{ +Reiterating the example from above with a "Lookback window" of 2 days configured, let's assume the last encountered article looked like this: +
+{`{
   "id": "business/live/2023/apr/15/uk-bosses-more-optimistic-energy-prices-fall-ai-spending-boom-economics-business-live",
   "type": "liveblog",
   "sectionId": "business",
   "sectionName": "Business",
-  "webPublicationDate": "2023-04-15T07:30:58Z",
-}
-```
+  "webPublicationDate": `}{`"2023-04-15T07:30:58Z"`}{`,
+}`}
+
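+The record's cursor value is again `2023-04-15T07:30:58Z`, but this time the configured lookback window shifts the start of the next request two days further back.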
Then when a sync is triggered for the same connection the next day, the following request is made: -``` -curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-04-13T07:30:58Z&to-date=2023-04-16T10:18:08Z' -``` \ No newline at end of file +
+curl 'https://content.guardianapis.com/search?order-by=oldest&from-date=2023-04-13T07:30:58Z&to-date={`<now>`}'
+
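+The `from-date` of this request is the previous cutoff date **2023-04-15T07:30:58Z** minus the two-day lookback window, so the two days before the cutoff are synced again and updates to articles from that period are picked up.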
\ No newline at end of file From aa3cb658adf3c89dc5fe4098bfd722bfc2750f85 Mon Sep 17 00:00:00 2001 From: Joe Reuter Date: Fri, 21 Apr 2023 12:13:11 +0200 Subject: [PATCH 12/12] review comments --- .../connector-builder-ui/incremental-sync.md | 6 +++++- .../connector-development/connector-builder-ui/tutorial.mdx | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/connector-development/connector-builder-ui/incremental-sync.md b/docs/connector-development/connector-builder-ui/incremental-sync.md index 2944b22bbc9e6e..f150d19fc09201 100644 --- a/docs/connector-development/connector-builder-ui/incremental-sync.md +++ b/docs/connector-development/connector-builder-ui/incremental-sync.md @@ -47,7 +47,7 @@ As this fulfills the requirements for incremental syncs, we can configure the "I * "Cursor field" is set to `webPublicationDate` * "Datetime format" is set to `%Y-%m-%dT%H:%M:%SZ` * "Cursor granularity" is set to `PT1S` as this API can handle date/time values on the second level -* "Start datetime" is set to "config value" +* "Start datetime" is set to "user input" to allow the end user configuring this connector as a Source to specify the time to start syncing * "End datetime" is set to "now" to fetch all articles up to the current date * "Inject start time into outgoing HTTP request" is set to `request_parameter` with "Field" set to `from-date` * "Inject end time into outgoing HTTP request" is set to `request_parameter` with "Field" set to `to-date` @@ -80,6 +80,10 @@ The `from-date` is set to the cutoff date of articles synced already and the `to-date` is set to the current date. +:::info +In some cases, it's helpful to reference the start and end date of the interval that's currently synced, for example if it needs to be injected into the URL path of the current stream. In these cases, they can be referenced using the `{{ stream_interval.start_date }}` and `{{ stream_interval.end_date }}` placeholders. Check out [the tutorial](./tutorial.mdx#adding-incremental-reads) for such a case. +::: + ## Advanced settings diff --git a/docs/connector-development/connector-builder-ui/tutorial.mdx b/docs/connector-development/connector-builder-ui/tutorial.mdx index 082a06d3d54f0b..db56d8aeb97a4c 100644 --- a/docs/connector-development/connector-builder-ui/tutorial.mdx +++ b/docs/connector-development/connector-builder-ui/tutorial.mdx @@ -158,7 +158,7 @@ In this section, we'll update the source to read historical data instead of only According to the API documentation, we can read the exchange rate for a specific date range by querying the `"/exchangerates_data/{date}"` endpoint instead of `"/exchangerates_data/latest"`. 
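+In other words, the connector has to make one request per day - conceptually something like this sketch (the base URL and the `apikey` header are assumptions based on the API Layer service used in this tutorial):
+```
+import datetime
+import requests
+
+start = datetime.date(2023, 4, 10)  # the configured start date
+for offset in range((datetime.date.today() - start).days + 1):
+    day = start + datetime.timedelta(days=offset)
+    # the date to fetch data for is injected into the request path
+    requests.get(
+        f"https://api.apilayer.com/exchangerates_data/{day:%Y-%m-%d}",
+        headers={"apikey": "<api key>"},
+    )
+```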
To configure your connector to request every day individually, follow these steps: -* On top of the form, change the "Path URL" input to `/exchangerates_data/{{ stream_slice.start_time }}` to inject the date to fetch data for into the path of the request +* At the top of the form, change the "Path URL" input to `/exchangerates_data/{{ stream_interval.start_time }}` to inject the date to fetch data for into the path of the request * Enable "Incremental sync" for the Rates stream * Set the "Cursor field" to `date` - this is the property in our records to check what date got synced last * Set the "Datetime format" to `%Y-%m-%d` to match the format of the date in the record returned from the API