GitBook: [master] 161 pages and 75 assets modified
Marcos Marx authored and gitbook-bot committed May 16, 2021
1 parent 18c1d6e commit a18b0e8
Showing 77 changed files with 1,322 additions and 124 deletions.
Binary file removed docs/.gitbook/assets/azure_shell_vm_overview.png
Binary file not shown.
File renamed without changes
44 changes: 22 additions & 22 deletions docs/SUMMARY.md
@@ -27,7 +27,7 @@
* [Local Deployment](deploying-airbyte/local-deployment.md)
* [On AWS \(EC2\)](deploying-airbyte/on-aws-ec2.md)
* [On GCP \(Compute Engine\)](deploying-airbyte/on-gcp-compute-engine.md)
* [On Azure\(VM)](deploying-airbyte/on-azure-vm-cloud-shell.md)
* [On Azure\(VM\)](deploying-airbyte/on-azure-vm-cloud-shell.md)
* [On Kubernetes \(Alpha\)](deploying-airbyte/on-kubernetes.md)
* [On AWS ECS \(Coming Soon\)](deploying-airbyte/on-aws-ecs.md)
* [Connector Catalog](integrations/README.md)
@@ -65,7 +65,7 @@
* [MySQL](integrations/sources/mysql.md)
* [Oracle DB](integrations/sources/oracle.md)
* [Plaid](integrations/sources/plaid.md)
* [PokéAPI](integrations/sources/pokeapi.md)
* [PokéAPI](integrations/sources/pokeapi.md)
* [Postgres](integrations/sources/postgres.md)
* [Quickbooks](integrations/sources/quickbooks.md)
* [Recurly](integrations/sources/recurly.md)
@@ -95,26 +95,26 @@
* [Contributing to Airbyte](contributing-to-airbyte/README.md)
* [Code of Conduct](contributing-to-airbyte/code-of-conduct.md)
* [Developing Locally](contributing-to-airbyte/developing-locally.md)
* [Connector Development Kit (Python)](../airbyte-cdk/python/README.md)
* [Concepts](../airbyte-cdk/python/docs/concepts/README.md)
* [Basic Concepts](../airbyte-cdk/python/docs/concepts/basic-concepts.md)
* [Full Refresh Streams](../airbyte-cdk/python/docs/concepts/full-refresh-stream.md)
* [Incremental Streams](../airbyte-cdk/python/docs/concepts/incremental-stream.md)
* [HTTP-API-based Connectors](../airbyte-cdk/python/docs/concepts/http-streams.md)
* [Python Concepts](../airbyte-cdk/python/docs/concepts/python-concepts.md)
* [Stream Slices](../airbyte-cdk/python/docs/concepts/stream_slices.md)
* [Tutorials](../airbyte-cdk/python/docs/tutorials/README.md)
* [Speedrun: Creating a Source with the CDK](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-any-percent/cdk-speedrun.md)
* [Creating an HTTP API Source](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/README.md)
* [Getting Started](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/0-getting-started.md)
* [Step 1: Creating the Source](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/1-creating-the-source.md)
* [Step 2: Install Dependencies](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/2-install-dependencies.md)
* [Step 3: Define Inputs](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/3-define-inputs.md)
* [Step 4: Connection Checking](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/4-connection-checking.md)
* [Step 5: Declare the Schema](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/5-declare-schema.md)
* [Step 6: Read Data](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/6-read-data.md)
* [Step 7: Use the Connector in Airbyte](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/7-use-connector-in-airbyte.md)
* [Step 8: Test Connector](../airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/8-test-your-connector.md)
* [Connector Development Kit \(Python\)](contributing-to-airbyte/python/README.md)
* [Concepts](contributing-to-airbyte/python/concepts/README.md)
* [Basic Concepts](contributing-to-airbyte/python/concepts/basic-concepts.md)
* [Full Refresh Streams](contributing-to-airbyte/python/concepts/full-refresh-stream.md)
* [Incremental Streams](contributing-to-airbyte/python/concepts/incremental-stream.md)
* [HTTP-API-based Connectors](contributing-to-airbyte/python/concepts/http-streams.md)
* [Python Concepts](contributing-to-airbyte/python/concepts/python-concepts.md)
* [Stream Slices](contributing-to-airbyte/python/concepts/stream_slices.md)
* [Tutorials](contributing-to-airbyte/python/tutorials/README.md)
* [Speedrun: Creating a Source with the CDK](contributing-to-airbyte/python/tutorials/cdk-speedrun.md)
* [Creating an HTTP API Source](contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/README.md)
* [Getting Started](contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/0-getting-started.md)
* [Step 1: Creating the Source](contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/1-creating-the-source.md)
* [Step 2: Install Dependencies](contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/2-install-dependencies.md)
* [Step 3: Define Inputs](contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/3-define-inputs.md)
* [Step 4: Connection Checking](contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/4-connection-checking.md)
* [Step 5: Declare the Schema](contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/5-declare-schema.md)
* [Step 6: Read Data](contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/6-read-data.md)
* [Step 7: Use the Connector in Airbyte](contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/7-use-connector-in-airbyte.md)
* [Step 8: Test Connector](contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/8-test-your-connector.md)
* [Developing Connectors](contributing-to-airbyte/building-new-connector/README.md)
* [Best Practices](contributing-to-airbyte/building-new-connector/best-practices.md)
* [Java Connectors](contributing-to-airbyte/building-new-connector/java-connectors.md)
2 changes: 1 addition & 1 deletion docs/contributing-to-airbyte/developing-locally.md
@@ -84,7 +84,7 @@ VERSION=dev docker-compose up

## Run formatting automation/tests

To format code in the repo, simply run `./gradlew format` at the base of the repo.
To format code in the repo, simply run `./gradlew format` at the base of the repo.

Note: If you are contributing a Python file without imports or function definitions, place the following comment at the top of your file:

87 changes: 87 additions & 0 deletions docs/contributing-to-airbyte/python/README.md
@@ -0,0 +1,87 @@
# Connector Development Kit \(Python\)

The Airbyte Python CDK is a framework for rapidly developing production-grade Airbyte connectors. The CDK currently offers helpers specifically for creating Airbyte source connectors for:

* HTTP APIs \(REST APIs, GraphQL, etc.\)
* Singer Taps
* Generic Python sources \(anything not covered by the above\)

The CDK improves the developer experience by providing a basic implementation structure and abstracting away low-level glue boilerplate.

This document is a general introduction to the CDK. Readers should have basic familiarity with the [Airbyte Specification](https://docs.airbyte.io/architecture/airbyte-specification) before proceeding.

## Getting Started

Generate an empty connector using the code generator. First clone the Airbyte repository, then from the repository root run

```text
cd airbyte-integrations/connector-templates/generator
npm run generate
```

then follow the interactive prompt. Next, find all `TODO`s in the generated project directory -- they're accompanied by lots of comments explaining what you'll need to do in order to implement your connector. Upon completing all TODOs properly, you should have a functioning connector.
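
One way to list the remaining `TODO`s is a small script like the sketch below. It is not part of the generator; the function name `find_todos` and the set of file extensions it scans are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_todos(project_dir: str, suffixes=(".py", ".json", ".md")) -> Iterator[Tuple[str, int, str]]:
    """Yield (file path, line number, line text) for every line containing 'TODO'."""
    for path in sorted(Path(project_dir).rglob("*")):
        # Skip directories and file types we chose not to scan.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open(encoding="utf-8", errors="ignore") as f:
            for line_number, line in enumerate(f, start=1):
                if "TODO" in line:
                    yield str(path), line_number, line.strip()
```

Calling `find_todos` with the path of the generated connector directory and printing each tuple gives a checklist you can work through.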

Additionally, you can follow [this tutorial](https://github.com/airbytehq/airbyte/tree/184dab77ebfbc00c69eea9e34b7db29c79a9e6d1/airbyte-cdk/python/docs/tutorials/http_api_source.md) for a complete walkthrough of creating an HTTP connector using the Airbyte CDK.

### Concepts & Documentation

See the [concepts docs](concepts/) for a tour through what the API offers.

### Example Connectors

**HTTP Connectors**:

* [Exchangerates API](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-exchange-rates/source_exchange_rates/source.py)
* [Stripe](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/source.py)
* [Slack](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-slack/source_slack/source.py)

**Singer connectors**:

* [Salesforce](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce-singer/source_salesforce_singer/source.py)
* [Github](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-github-singer/source_github_singer/source.py)

**Simple Python connectors using the barebones `Source` abstraction**:

* [Google Sheets](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-google-sheets/google_sheets_source/google_sheets_source.py)
* [Mailchimp](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-mailchimp/source_mailchimp/source.py)

## Contributing

### First time setup

We assume `python` points to Python >= 3.7.

Set up a virtual env:

```text
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]" # [dev] installs development-only dependencies
```

#### Iteration

* Iterate on the code locally
* Run tests via `pytest -s unit_tests`
* Perform static type checks using `mypy airbyte_cdk`. `MyPy` configuration is in `.mypy.ini`.
* The `type_check_and_test.sh` script bundles both type checking and testing in one convenient command. Feel free to use it!

#### Testing

All tests are located in the `unit_tests` directory. Run `pytest --cov=airbyte_cdk unit_tests/` to run them. This also presents a test coverage report.

#### Publishing a new version to PyPI

1. Bump the package version in `setup.py`
2. Open a PR
3. An Airbyte member must comment `/publish-cdk --dry-run=<true or false>`. Dry runs publish to test.pypi.org.

## Coming Soon

* Full OAuth 2.0 support \(including refresh token issuing flow via UI or CLI\)
* Airbyte Java HTTP CDK
* CDK for Async HTTP endpoints \(request-poll-wait style endpoints\)
* CDK for other protocols
* General CDK for Destinations
* Don't see a feature you need? [Create an issue and let us know how we can help!](https://github.com/airbytehq/airbyte/issues/new/choose)

32 changes: 32 additions & 0 deletions docs/contributing-to-airbyte/python/concepts/README.md
@@ -0,0 +1,32 @@
# Concepts

This concepts section serves as a general introduction to the Python CDK. Readers will certainly benefit from a deeper understanding of the [Airbyte Specification](https://docs.airbyte.io/architecture/airbyte-specification) before proceeding, but we do a quick overview of it in our basic concepts guide below.

## Basic Concepts

If you want to learn more about the classes required to implement an Airbyte Source, head to our [basic concepts doc](basic-concepts.md).

## Full Refresh Streams

If you have questions or are running into issues creating your first full refresh stream, head over to our [full refresh stream doc](full-refresh-stream.md). If you have questions about implementing a `path` or `parse_response` function, this doc is for you.

## Incremental Streams

Having trouble figuring out how to write a `stream_slices` function, or not sure what a `cursor_field` is? Head to our [incremental stream doc](incremental-stream.md).

## Practical Tips

Airbyte recommends using the CDK template generator to develop with the CDK. The template generator creates all the required scaffolding, with convenient TODOs, allowing developers to focus on implementing the API.

For tips on useful Python knowledge, see the [Python Concepts](python-concepts.md) page.

You can find a complete walkthrough of implementing an HTTP source connector in [this tutorial](https://github.com/airbytehq/airbyte/tree/4a397d25247db77a7b78783d26dae35bc3900f59/airbyte-cdk/python/docs/tutorials/http_api_source.md).

## Examples

Those interested in getting their hands dirty can check out implemented APIs:

* [Exchange Rates API](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-exchange-rates/source_exchange_rates/source.py) \(Incremental\)
* [Stripe API](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/source.py) \(Incremental and Full-Refresh\)
* [Slack API](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-slack/source_slack/source.py) \(Incremental and Full-Refresh\)

66 changes: 66 additions & 0 deletions docs/contributing-to-airbyte/python/concepts/basic-concepts.md
@@ -0,0 +1,66 @@
# Basic Concepts

## The Airbyte Specification

As a quick recap, the Airbyte Specification requires an Airbyte Source to support 4 distinct operations:

1. `Spec` - The required configuration in order to interact with the underlying technical system, e.g. database information, authentication information etc.
2. `Check` - Validate that the provided configuration is valid with sufficient permissions for one to perform all required operations on the Source.
3. `Discover` - Discover the Source's schema. This lets users select a subset of the data to sync, which is useful if they require only part of it.
4. `Read` - Perform the actual syncing process. Data is read from the Source, parsed into `AirbyteRecordMessage`s and sent to the Airbyte Destination. Depending on how the Source is implemented, this sync can be incremental or a full-refresh.

A core concept discussed here is the **Source**.

The Source contains one or more **Streams** \(or **Airbyte Streams**\). A **Stream** is the other key concept for understanding how Airbyte models the data syncing process. A **Stream** models the logical data groups that make up the larger **Source**. If the **Source** is an RDBMS, each **Stream** is a table. In a REST API setting, each **Stream** corresponds to one resource within the API, e.g. a **Stripe Source** would have one **Stream** for `Transactions`, one for `Charges` and so on.
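
To make the relationship concrete, here is a toy model of a Source that owns several Streams. The `ToySource` and `ToyStream` classes are stand-ins invented for this sketch, not classes from the CDK, and the records are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyStream:
    """One logical group of data, e.g. a table or a REST resource."""
    name: str
    records: List[Dict] = field(default_factory=list)


@dataclass
class ToySource:
    """A Source is a named collection of Streams."""
    name: str
    streams: List[ToyStream] = field(default_factory=list)

    def stream_names(self) -> List[str]:
        return [stream.name for stream in self.streams]


# A Stripe-like source with one stream per API resource.
stripe = ToySource(
    name="stripe",
    streams=[
        ToyStream("transactions", [{"id": "txn_1", "amount": 500}]),
        ToyStream("charges", [{"id": "ch_1", "amount": 500}]),
    ],
)
print(stripe.stream_names())  # ['transactions', 'charges']
```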

## The `Source` class

Airbyte provides abstract base classes which make it much easier to perform certain categories of tasks, e.g. `HttpStream` makes it easy to create HTTP API-based streams. However, if those do not satisfy your use case \(for example, if you're pulling data from a relational database\), you can always directly implement the Airbyte Protocol by subclassing the CDK's `Source` class.

Note that while this is the most flexible way to implement a source connector, it is also the most toilsome, as you will be required to manually manage state, validate input, conform to the Airbyte Protocol message formats, and more. We recommend using one of the CDK's more opinionated subclasses of `Source`, such as `AbstractSource`, unless you cannot fulfill your use case otherwise.

## The `AbstractSource` Object

`AbstractSource` is a more opinionated implementation of `Source`. It implements `Source`'s 4 methods as follows:

`Spec` and `Check` are the `AbstractSource`'s simplest operations.

`Spec` returns a checked-in JSON Schema file specifying the required configuration. The `AbstractSource` looks for a file named `spec.json` in the module's root by default. Here is an [example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-exchange-rates/source_exchange_rates/spec.json).

`Check` delegates to the `AbstractSource`'s `check_connection` function. The function's `config` parameter contains the user-provided configuration, specified in the `spec.json` returned by `Spec`. `check_connection` uses this configuration to validate access and permissioning. Here is an [example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-exchange-rates/source_exchange_rates/source.py#L90) from the same Exchange Rates API.
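
As a rough illustration of how `Spec` and `Check` fit together, the sketch below keeps a spec as a Python dict and validates a config against it. The spec contents, the `api_key` and `base` fields, and the simplified `check_connection(config)` signature are all assumptions made for this example; they are not copied from the Exchange Rates connector.

```python
from typing import Any, Mapping, Optional, Tuple

# Hypothetical spec: a JSON Schema describing the configuration the user must supply.
SPEC = {
    "connectionSpecification": {
        "type": "object",
        "required": ["api_key"],
        "properties": {
            "api_key": {"type": "string"},
            "base": {"type": "string"},
        },
    }
}


def check_connection(config: Mapping[str, Any]) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if the config looks usable, else (False, reason)."""
    schema = SPEC["connectionSpecification"]
    missing = [key for key in schema["required"] if not config.get(key)]
    if missing:
        return False, f"Missing required config fields: {', '.join(missing)}"
    # A real connector would also call the API here to confirm the credentials work.
    return True, None
```

Returning a reason alongside the boolean lets the caller show the user why the check failed instead of a bare failure.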

### The `Stream` Abstract Base Class

An `AbstractSource` also owns a set of `Stream`s. This is populated via the `AbstractSource`'s `streams` [function](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L63). `Discover` and `Read` rely on this populated set.

`Discover` returns an `AirbyteCatalog` representing all the distinct resources the underlying API supports. Here is the [entrypoint](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L74) for those interested in reading the code.

`Read` creates an in-memory stream reading from each of the `AbstractSource`'s streams. Here is the [entrypoint](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L90) for those interested.

As the code examples show, the `AbstractSource` delegates to the set of `Stream`s it owns to fulfill both `Discover` and `Read`. Thus, implementing `AbstractSource`'s `streams` function is required when using the CDK.
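
This delegation can be sketched without the CDK. `MiniSource` and `MiniStream` below are simplified stand-ins invented for this example; the real `AbstractSource` and `Stream` classes have richer signatures and emit Airbyte protocol messages rather than plain dicts.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator, List, Mapping


class MiniStream(ABC):
    """Stand-in for a stream: it knows its name, its schema, and how to read records."""

    @property
    def name(self) -> str:
        return type(self).__name__.lower()

    @abstractmethod
    def get_json_schema(self) -> Dict[str, Any]: ...

    @abstractmethod
    def read_records(self) -> Iterable[Mapping[str, Any]]: ...


class MiniSource(ABC):
    """Stand-in for a source: discover and read both delegate to streams()."""

    @abstractmethod
    def streams(self) -> List[MiniStream]: ...

    def discover(self) -> List[Dict[str, Any]]:
        # The catalog is built from each stream's name and schema.
        return [{"name": s.name, "json_schema": s.get_json_schema()} for s in self.streams()]

    def read(self) -> Iterator[Dict[str, Any]]:
        # Records from every stream are emitted lazily, tagged with the stream name.
        for s in self.streams():
            for record in s.read_records():
                yield {"stream": s.name, "data": dict(record)}


class Rates(MiniStream):
    def get_json_schema(self):
        return {"type": "object", "properties": {"currency": {"type": "string"}}}

    def read_records(self):
        yield {"currency": "USD"}
        yield {"currency": "EUR"}


class ExampleSource(MiniSource):
    def streams(self):
        return [Rates()]
```

Because `discover` and `read` are written once against `streams()`, a concrete source in this sketch only has to say which streams it owns.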

A summary of what we've covered so far on how to use the Airbyte CDK:

* A concrete implementation of the `AbstractSource` object is required.
* This involves:
  1. Implementing the `check_connection` function.
  2. Creating the appropriate `Stream` classes and returning them in the `streams` function.
  3. Placing the above-mentioned `spec.json` file in the right place.

## HTTP Streams

We've covered how the `AbstractSource` works with the `Stream` interface in order to fulfill the Airbyte Specification. Although developers are welcome to implement their own `Stream` objects, the CDK saves developers the hassle of doing so in the case of HTTP APIs with the [`HttpStream`](http-streams.md) object.

@@ -0,0 +1,43 @@
# Full Refresh Streams

As mentioned in the [Basic Concepts Overview](basic-concepts.md), a `Stream` is the atomic unit for reading data from a Source. A stream can read data from anywhere: a relational database, an API, or even scrape a web page! \(although that might be stretching the limits of what a connector should do\).

To implement a stream, there are two minimum requirements:

1. Define the stream's schema
2. Implement the logic for reading records from the underlying data source

## Defining the stream's schema

Your connector must describe the schema of each stream it can output using [JSONSchema](https://json-schema.org).

The simplest way to do this is to describe the schema of your streams using one `.json` file per stream. You can also dynamically generate the schema of your stream in code, or you can combine both approaches: start with a `.json` file and dynamically add properties to it.

The schema of a stream is the return value of `Stream.get_json_schema`.

### Static schemas

By default, `Stream.get_json_schema` reads a `.json` file in the `schemas/` directory whose name is equal to the value of the `Stream.name` property. In turn `Stream.name` by default returns the name of the class in snake case. Therefore, if you have a class `class EmployeeBenefits(HttpStream)` the default behavior will look for a file called `schemas/employee_benefits.json`. You can override any of these behaviors as you need.
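
The naming convention can be approximated in a few lines. This is a sketch of the convention described above, not the CDK's actual implementation; `camel_to_snake` and `default_schema_path` are hypothetical helper names, and the regex does not special-case acronyms.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a CamelCase class name to snake_case (simplified)."""
    # Insert an underscore before every capital letter except the first, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def default_schema_path(stream_class: type) -> str:
    """Approximate the default lookup: schemas/<snake_case_class_name>.json."""
    return f"schemas/{camel_to_snake(stream_class.__name__)}.json"


class EmployeeBenefits:  # stands in for `class EmployeeBenefits(HttpStream)`
    pass


print(default_schema_path(EmployeeBenefits))  # schemas/employee_benefits.json
```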

Important note: any objects referenced via `$ref` should be placed in the `shared/` directory in their own `.json` files.

### Dynamic schemas

If you'd rather define your schema in code, override `Stream.get_json_schema` in your stream class to return a `dict` describing the schema using [JSONSchema](https://json-schema.org).
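
A minimal sketch of a schema defined in code might look like the following. `CustomFieldsStream` is a stand-in class invented for this example (a real stream would subclass the CDK's `Stream` or `HttpStream`), and the choice of nullable string properties is an assumption.

```python
from typing import Any, Dict, List


class CustomFieldsStream:
    """Stand-in for a stream whose columns are only known at runtime."""

    def __init__(self, field_names: List[str]):
        self.field_names = field_names

    def get_json_schema(self) -> Dict[str, Any]:
        # Build a JSON Schema object with one nullable string property per field.
        return {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "type": "object",
            "properties": {name: {"type": ["null", "string"]} for name in self.field_names},
        }
```

This approach suits sources where the set of fields depends on the account or configuration, so a checked-in `.json` file could not describe it ahead of time.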

### Dynamically modifying static schemas

Place a `.json` file in the `schemas` folder containing the basic schema as described in the static schemas section. Then, override `Stream.get_json_schema` to run the default behavior, edit the returned value, then return the edited value:

```text
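def get_json_schema(self):
    # Start from the static schema loaded from the `.json` file
    schema = super().get_json_schema()
    # Then add or change entries at runtime before returning it
    schema['dynamically_determined_property'] = "property"
    return schema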
```

## Reading records from the data source

The only method required to implement a `Stream` is `Stream.read_records`. Given some information about how the stream should be read, this method should output an iterable object containing records from the data source. We recommend using generators, as they are very efficient with regard to memory requirements.
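
The generator recommendation can be illustrated with a simplified stand-in. `EmployeesStream` and `fetch_page` are hypothetical names, the paged data is made up, and the CDK's `read_records` receives additional arguments describing how to read, which are omitted here.

```python
from typing import Any, Dict, Iterator, List


def fetch_page(page: int) -> List[Dict[str, Any]]:
    """Hypothetical data access: returns one page of records, or [] when exhausted."""
    pages = {0: [{"id": 1}, {"id": 2}], 1: [{"id": 3}]}
    return pages.get(page, [])


class EmployeesStream:
    """Stand-in stream that reads a paged data source lazily."""

    def read_records(self) -> Iterator[Dict[str, Any]]:
        page = 0
        while True:
            records = fetch_page(page)
            if not records:
                return
            # Yield one record at a time so only the current page is held in memory.
            yield from records
            page += 1


print(list(EmployeesStream().read_records()))  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Because records are yielded as they are fetched, the caller can start processing the first page before later pages are requested.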

## Incremental Streams

We highly recommend implementing Incremental when feasible. See the [incremental streams page](incremental-stream.md) for more information.
