Commit
GitBook: [master] 161 pages and 75 assets modified
1 parent
18c1d6e
commit a18b0e8
Showing
77 changed files
with
1,322 additions
and
124 deletions.
Binary file not shown.
File renamed without changes
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# Connector Development Kit \(Python\) | ||
|
||
The Airbyte Python CDK is a framework for rapidly developing production-grade Airbyte connectors. The CDK currently offers helpers specifically for creating Airbyte source connectors for: | ||
|
||
* HTTP APIs \(REST APIs, GraphQL, etc.\) | ||
* Singer Taps | ||
* Generic Python sources \(anything not covered by the above\) | ||
|
||
The CDK improves the developer experience by providing a basic implementation structure and abstracting away low-level glue boilerplate. | ||
|
||
This document is a general introduction to the CDK. Readers should have basic familiarity with the [Airbyte Specification](https://docs.airbyte.io/architecture/airbyte-specification) before proceeding. | ||
|
||
## Getting Started | ||
|
||
Generate an empty connector using the code generator. First clone the Airbyte repository, then from the repository root run: | ||
|
||
```text | ||
cd airbyte-integrations/connector-templates/generator | ||
npm run generate | ||
``` | ||
|
||
then follow the interactive prompt. Next, find all `TODO`s in the generated project directory -- they're accompanied by lots of comments explaining what you'll need to do in order to implement your connector. Upon completing all TODOs properly, you should have a functioning connector. | ||
|
||
Additionally, you can follow [this tutorial](https://github.com/airbytehq/airbyte/tree/184dab77ebfbc00c69eea9e34b7db29c79a9e6d1/airbyte-cdk/python/docs/tutorials/http_api_source.md) for a complete walkthrough of creating an HTTP connector using the Airbyte CDK. | ||
|
||
### Concepts & Documentation | ||
|
||
See the [concepts docs](concepts/) for a tour through what the API offers. | ||
|
||
### Example Connectors | ||
|
||
**HTTP Connectors**: | ||
|
||
* [Exchangerates API](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-exchange-rates/source_exchange_rates/source.py) | ||
* [Stripe](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/source.py) | ||
* [Slack](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-slack/source_slack/source.py) | ||
|
||
**Singer connectors**: | ||
|
||
* [Salesforce](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce-singer/source_salesforce_singer/source.py) | ||
* [Github](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-github-singer/source_github_singer/source.py) | ||
|
||
**Simple Python connectors using the barebones `Source` abstraction**: | ||
|
||
* [Google Sheets](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-google-sheets/google_sheets_source/google_sheets_source.py) | ||
* [Mailchimp](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-mailchimp/source_mailchimp/source.py) | ||
|
||
## Contributing | ||
|
||
### First time setup | ||
|
||
We assume `python` points to Python 3.7 or later. | ||
|
||
Set up a virtual env: | ||
|
||
```text | ||
python -m venv .venv | ||
source .venv/bin/activate | ||
pip install -e ".[dev]" # [dev] installs development-only dependencies | ||
``` | ||
|
||
#### Iteration | ||
|
||
* Iterate on the code locally | ||
* Run tests via `pytest -s unit_tests` | ||
* Perform static type checks using `mypy airbyte_cdk`. `MyPy` configuration is in `.mypy.ini`. | ||
* The `type_check_and_test.sh` script bundles both type checking and testing in one convenient command. Feel free to use it! | ||
|
||
#### Testing | ||
|
||
All tests are located in the `unit_tests` directory. Run `pytest --cov=airbyte_cdk unit_tests/` to run them. This also presents a test coverage report. | ||
|
||
#### Publishing a new version to PyPI | ||
|
||
1. Bump the package version in `setup.py` | ||
2. Open a PR | ||
3. An Airbyte member must comment `/publish-cdk --dry-run=<true or false>`. Dry runs publish to test.pypi.org. | ||
|
||
## Coming Soon | ||
|
||
* Full OAuth 2.0 support \(including refresh token issuing flow via UI or CLI\) | ||
* Airbyte Java HTTP CDK | ||
* CDK for Async HTTP endpoints \(request-poll-wait style endpoints\) | ||
* CDK for other protocols | ||
* General CDK for Destinations | ||
* Don't see a feature you need? [Create an issue and let us know how we can help!](https://github.com/airbytehq/airbyte/issues/new/choose) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Concepts | ||
|
||
This concepts section serves as a general introduction to the Python CDK. Readers will benefit from a deeper understanding of the [Airbyte Specification](https://docs.airbyte.io/architecture/airbyte-specification) before proceeding, but we give a quick overview of it in our basic concepts guide below. | ||
|
||
## Basic Concepts | ||
|
||
If you want to learn more about the classes required to implement an Airbyte Source, head to our [basic concepts doc](basic-concepts.md). | ||
|
||
## Full Refresh Streams | ||
|
||
If you have questions or are running into issues creating your first full refresh stream, head over to our [full refresh stream doc](full-refresh-stream.md). If you have questions about implementing a `path` or `parse_response` function, this doc is for you. | ||
|
||
## Incremental Streams | ||
|
||
Having trouble figuring out how to write a `stream_slices` function, or not sure what a `cursor_field` is? Head to our [incremental stream doc](incremental-stream.md). | ||
|
||
## Practical Tips | ||
|
||
Airbyte recommends using the CDK template generator to develop with the CDK. The template generator creates all the required scaffolding, with convenient TODOs, allowing developers to focus on implementing the API. | ||
|
||
For tips on useful Python knowledge, see the [Python Concepts](python-concepts.md) page. | ||
|
||
For a complete walkthrough of implementing an HTTP source connector, see [this tutorial](https://github.com/airbytehq/airbyte/tree/4a397d25247db77a7b78783d26dae35bc3900f59/airbyte-cdk/python/docs/tutorials/http_api_source.md). | ||
|
||
## Examples | ||
|
||
Those interested in getting their hands dirty can check out implemented APIs: | ||
|
||
* [Exchange Rates API](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-exchange-rates/source_exchange_rates/source.py) \(Incremental\) | ||
* [Stripe API](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/source.py) \(Incremental and Full-Refresh\) | ||
* [Slack API](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-slack/source_slack/source.py) \(Incremental and Full-Refresh\) | ||
|
66 changes: 66 additions & 0 deletions
66
docs/contributing-to-airbyte/python/concepts/basic-concepts.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
# Basic Concepts | ||
|
||
## The Airbyte Specification | ||
|
||
As a quick recap, the Airbyte Specification requires an Airbyte Source to support 4 distinct operations: | ||
|
||
1. `Spec` - The configuration required to interact with the underlying technical system, e.g. database information, authentication information, etc. | ||
2. `Check` - Validate that the provided configuration is valid and has sufficient permissions to perform all required operations on the Source. | ||
3. `Discover` - Discover the Source's schema. This lets users select a subset of the data to sync, which is useful if they do not need all of it. | ||
4. `Read` - Perform the actual syncing process. Data is read from the Source, parsed into `AirbyteRecordMessage`s and sent to the Airbyte Destination. Depending on how the Source is implemented, this sync can be incremental or a full refresh. | ||
|
||
A core concept discussed here is the **Source**. | ||
|
||
The Source contains one or more **Streams** \(or **Airbyte Streams**\). The **Stream** is the other key concept for understanding how Airbyte models the data syncing process. A **Stream** models a logical data group within the larger **Source**. If the **Source** is an RDBMS, each **Stream** is a table. In a REST API setting, each **Stream** corresponds to one resource within the API. For example, a **Stripe Source** would have one **Stream** for `Transactions`, one for `Charges`, and so on. | ||
|
||
## The `Source` class | ||
|
||
Airbyte provides abstract base classes which make it much easier to perform certain categories of tasks, e.g. `HttpStream` makes it easy to create HTTP API-based streams. However, if those do not satisfy your use case \(for example, if you're pulling data from a relational database\), you can always directly implement the Airbyte Protocol by subclassing the CDK's `Source` class. | ||
|
||
Note that while this is the most flexible way to implement a source connector, it is also the most toilsome: you must manually manage state, validate input, conform to the Airbyte Protocol message formats, and more. We recommend using one of the CDK's more opinionated subclasses of `Source` \(such as `AbstractSource`, described below\) and implementing `Source` directly only if you cannot fulfill your use case otherwise. | ||
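
To give a feel for what implementing the protocol by hand involves, here is a minimal sketch. It deliberately does not import the CDK: `SketchSource` is a hypothetical stand-in for `Source`, its method signatures are simplified, and `InMemorySource` with its `rows` config key is invented for illustration. The real class works with Airbyte message types rather than plain dicts.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List


class SketchSource(ABC):
    """Hypothetical stand-in for the CDK's `Source`; not the real API."""

    @abstractmethod
    def spec(self) -> Dict[str, Any]:
        """Return a JSON schema describing the required configuration."""

    @abstractmethod
    def check(self, config: Dict[str, Any]) -> bool:
        """Return True if the configuration is usable."""

    @abstractmethod
    def discover(self, config: Dict[str, Any]) -> List[str]:
        """Return the names of the available streams."""

    @abstractmethod
    def read(self, config: Dict[str, Any], stream: str) -> Iterator[Dict[str, Any]]:
        """Yield records for one stream."""


class InMemorySource(SketchSource):
    def spec(self):
        return {"type": "object", "required": ["rows"], "properties": {"rows": {"type": "array"}}}

    def check(self, config):
        # Input validation is entirely manual at this level.
        return isinstance(config.get("rows"), list)

    def discover(self, config):
        return ["rows"]

    def read(self, config, stream):
        for row in config[stream]:
            # A real connector would wrap each row in an AirbyteRecordMessage.
            yield {"stream": stream, "data": row}


source = InMemorySource()
config = {"rows": [{"id": 1}, {"id": 2}]}
records = list(source.read(config, "rows")) if source.check(config) else []
```

Even in this toy version, validation and message shaping are the implementer's job, which is the toil the higher-level classes are meant to remove.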
|
||
## The `AbstractSource` Object | ||
|
||
`AbstractSource` is a more opinionated implementation of `Source`. It implements `Source`'s 4 methods as follows: | ||
|
||
`Spec` and `Check` are the `AbstractSource`'s simplest operations. | ||
|
||
`Spec` returns a checked-in JSON schema file specifying the required configuration. The `AbstractSource` looks for a file named `spec.json` in the module's root by default. Here is an [example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-exchange-rates/source_exchange_rates/spec.json). | ||
|
||
`Check` delegates to the `AbstractSource`'s `check_connection` function. The function's `config` parameter contains the user-provided configuration, in the format specified by the `spec.json` returned by `Spec`. `check_connection` uses this configuration to validate access and permissions. Here is an [example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-exchange-rates/source_exchange_rates/source.py#L90) from the same Exchange Rates API. | ||
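
As a rough sketch of the shape such a check can take, the standalone function below returns a success flag paired with an optional error message. To our understanding this mirrors what the CDK's `check_connection` method returns, but treat the exact signature as an assumption. The `api_key` config field and the `fake_probe` helper are hypothetical; a real connector would make a cheap authenticated request instead.

```python
import logging
from typing import Any, Mapping, Optional, Tuple


def fake_probe(api_key: str) -> None:
    """Stand-in for a real API call; rejects one hard-coded key."""
    if api_key == "revoked":
        raise PermissionError("credentials were rejected")


def check_connection(logger: logging.Logger, config: Mapping[str, Any]) -> Tuple[bool, Optional[str]]:
    """Return (True, None) on success or (False, reason) on failure."""
    api_key = config.get("api_key")  # hypothetical field declared in spec.json
    if not api_key:
        return False, "api_key is missing from the configuration"
    try:
        # A real connector would issue a cheap authenticated request here.
        fake_probe(api_key)
    except PermissionError as err:
        logger.warning("connection check failed: %s", err)
        return False, str(err)
    return True, None


logger = logging.getLogger("check_example")
ok_result = check_connection(logger, {"api_key": "abc123"})
bad_result = check_connection(logger, {"api_key": "revoked"})
```

Returning the reason alongside the flag, rather than raising, lets the caller surface a readable message to the user.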
|
||
### The `Stream` Abstract Base Class | ||
|
||
An `AbstractSource` also owns a set of `Stream`s. This is populated via the `AbstractSource`'s `streams` [function](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L63). `Discover` and `Read` rely on this populated set. | ||
|
||
`Discover` returns an `AirbyteCatalog` representing all the distinct resources the underlying API supports. Here is the [entrypoint](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L74) for those interested in reading the code. | ||
|
||
`Read` creates an in-memory stream reading from each of the `AbstractSource`'s streams. Here is the [entrypoint](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L90) for those interested. | ||
|
||
As the code examples show, the `AbstractSource` delegates to the set of `Stream`s it owns to fulfill both `Discover` and `Read`. Thus, implementing `AbstractSource`'s `streams` function is required when using the CDK. | ||
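
The delegation described above can be sketched without the CDK. `SketchStream`, `SketchAbstractSource` and `ShopSource` below are hypothetical stand-ins with simplified signatures and plain dicts in place of Airbyte messages; the point is only that `discover` and `read` are written once and driven entirely by whatever `streams` returns.

```python
from typing import Any, Dict, Iterable, Iterator, List, Mapping


class SketchStream:
    """Hypothetical stand-in for a CDK `Stream`."""

    def __init__(self, name: str, rows: List[Mapping[str, Any]]):
        self.name = name
        self._rows = rows

    def get_json_schema(self) -> Dict[str, Any]:
        return {"type": "object"}

    def read_records(self) -> Iterable[Mapping[str, Any]]:
        yield from self._rows


class SketchAbstractSource:
    """Hypothetical stand-in showing how discover/read delegate to streams."""

    def streams(self, config: Mapping[str, Any]) -> List[SketchStream]:
        raise NotImplementedError  # subclasses must provide their streams

    def discover(self, config: Mapping[str, Any]) -> List[Dict[str, Any]]:
        # One catalog entry per stream, built from the stream itself.
        return [{"name": s.name, "json_schema": s.get_json_schema()} for s in self.streams(config)]

    def read(self, config: Mapping[str, Any]) -> Iterator[Dict[str, Any]]:
        for stream in self.streams(config):
            for record in stream.read_records():
                yield {"stream": stream.name, "data": record}


class ShopSource(SketchAbstractSource):
    def streams(self, config):
        return [
            SketchStream("customers", [{"id": 1}]),
            SketchStream("orders", [{"id": 10}, {"id": 11}]),
        ]


source = ShopSource()
catalog = source.discover({})
records = list(source.read({}))
```

Here `ShopSource` only supplies its streams, and both the catalog and the record flow follow from that one method.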
|
||
A summary of what we've covered so far on how to use the Airbyte CDK: | ||
|
||
* A concrete implementation of the `AbstractSource` object is required. | ||
* This involves: | ||
  1. Implementing the `check_connection` function. | ||
  2. Creating the appropriate `Stream` classes and returning them in the `streams` function. | ||
  3. Placing the above-mentioned `spec.json` file in the right place. | ||
|
||
## HTTP Streams | ||
|
||
We've covered how the `AbstractSource` works with the `Stream` interface in order to fulfill the Airbyte Specification. Although developers are welcome to implement their own object, the CDK saves developers the hassle of doing so in the case of HTTP APIs with the [`HttpStream`](http-streams.md) object. | ||
|
43 changes: 43 additions & 0 deletions
43
docs/contributing-to-airbyte/python/concepts/full-refresh-stream.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
# Full Refresh Streams | ||
|
||
As mentioned in the [Basic Concepts Overview](basic-concepts.md), a `Stream` is the atomic unit for reading data from a Source. A stream can read data from anywhere: a relational database, an API, or even scrape a web page! \(although that might be stretching the limits of what a connector should do\). | ||
|
||
To implement a stream, there are two minimum requirements: | ||
|
||
1. Define the stream's schema | ||
2. Implement the logic for reading records from the underlying data source | ||
|
||
## Defining the stream's schema | ||
|
||
Your connector must describe the schema of each stream it can output using [JSONSchema](https://json-schema.org). | ||
|
||
The simplest way to do this is to describe the schema of your streams using one `.json` file per stream. You can also dynamically generate the schema of your stream in code, or you can combine both approaches: start with a `.json` file and dynamically add properties to it. | ||
|
||
The schema of a stream is the return value of `Stream.get_json_schema`. | ||
|
||
### Static schemas | ||
|
||
By default, `Stream.get_json_schema` reads a `.json` file in the `schemas/` directory whose name is equal to the value of the `Stream.name` property. In turn `Stream.name` by default returns the name of the class in snake case. Therefore, if you have a class `class EmployeeBenefits(HttpStream)` the default behavior will look for a file called `schemas/employee_benefits.json`. You can override any of these behaviors as you need. | ||
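
The naming convention can be illustrated with a small sketch. The helpers below are hypothetical and only approximate the default behavior: the regular expression is a simplified CamelCase-to-snake_case conversion, and the CDK's own implementation may treat edge cases such as acronyms differently.

```python
import re


def default_stream_name(class_name: str) -> str:
    """Convert a CamelCase class name to snake_case (simplified)."""
    # Insert an underscore before every capital letter except the first.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def default_schema_path(class_name: str) -> str:
    """Build the schema file path the default lookup would search for."""
    return f"schemas/{default_stream_name(class_name)}.json"


path = default_schema_path("EmployeeBenefits")
```

With this convention, renaming a stream class means renaming its schema file too, unless `name` is overridden.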
|
||
Important note: any objects referenced via `$ref` should be placed in the `shared/` directory in their own `.json` files. | ||
|
||
### Dynamic schemas | ||
|
||
If you'd rather define your schema in code, override `Stream.get_json_schema` in your stream class to return a `dict` describing the schema using [JSONSchema](https://json-schema.org). | ||
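
A minimal sketch of a schema built in code is shown below. `SketchStream` is a hypothetical stand-in for the CDK's `Stream` base class so that the example runs on its own, and `SurveyResponses` with its `question_ids` argument is invented to represent fields that are only known at runtime.

```python
from typing import Any, Dict, List


class SketchStream:
    """Hypothetical stand-in for the CDK's `Stream` base class."""

    def get_json_schema(self) -> Dict[str, Any]:
        raise NotImplementedError


class SurveyResponses(SketchStream):
    def __init__(self, question_ids: List[str]):
        # Hypothetical: column names are only known at runtime.
        self.question_ids = question_ids

    def get_json_schema(self) -> Dict[str, Any]:
        properties: Dict[str, Any] = {"respondent_id": {"type": "string"}}
        for qid in self.question_ids:
            # Each runtime-discovered question becomes a nullable string field.
            properties[qid] = {"type": ["null", "string"]}
        return {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "type": "object",
            "properties": properties,
        }


schema = SurveyResponses(["q1", "q2"]).get_json_schema()
```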
|
||
### Dynamically modifying static schemas | ||
|
||
Place a `.json` file in the `schemas` folder containing the basic schema as described in the static schemas section. Then, override `Stream.get_json_schema` to run the default behavior, edit the returned value, then return the edited value: | ||
|
||
```text | ||
def get_json_schema(self): | ||
    # Start from the static schema loaded from the `schemas/` folder, then extend it. | ||
    schema = super().get_json_schema() | ||
    schema['dynamically_determined_property'] = "property" | ||
    return schema | ||
``` | ||
|
||
## Reading records from the data source | ||
|
||
The only method required to implement a `Stream` is `Stream.read_records`. Given some information about how the stream should be read, this method should output an iterable object containing records from the data source. We recommend using generators, as they are very memory-efficient. | ||
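
The generator recommendation can be sketched as follows. `EmployeesSketch` is a hypothetical stand-in rather than a CDK class, its `read_records` signature is simplified (the real method takes further arguments describing how to read), and `PAGES` is made-up data standing in for a paginated remote API.

```python
from typing import Any, Dict, Iterable, List, Mapping, Optional

# Hypothetical paged data standing in for a remote API.
PAGES: List[List[Dict[str, Any]]] = [
    [{"id": 1}, {"id": 2}],
    [{"id": 3}],
]


class EmployeesSketch:
    """Stand-in stream; the real `Stream.read_records` takes further arguments."""

    def read_records(self, stream_state: Optional[Mapping[str, Any]] = None) -> Iterable[Mapping[str, Any]]:
        for page in PAGES:
            # Records are yielded one at a time; with a real API only the
            # current page would need to be held in memory.
            for record in page:
                yield record


records = EmployeesSketch().read_records()
first = next(iter(records))  # nothing beyond the first record has been produced yet
rest = list(records)         # consuming the generator pulls the remaining records
```

Because the method is a generator, the caller decides how fast records are pulled, and nothing forces the whole result set to be built up front.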
|
||
## Incremental Streams | ||
|
||
We highly recommend implementing incremental sync when feasible. See the [incremental streams page](incremental-stream.md) for more information. | ||
|