📚 add doc to describe how CDK handles schemas (#4537)
sherifnada committed Jul 6, 2021
1 parent 3e3cf7b commit 21116ca
Showing 4 changed files with 48 additions and 1 deletion.
23 changes: 23 additions & 0 deletions airbyte-cdk/python/docs/concepts/schemas.md
@@ -0,0 +1,23 @@
# Defining your stream schemas
Your connector must describe the schema of each stream it can output using [JSONSchema](https://json-schema.org).

The simplest way to do this is to provide one `.json` file per stream. You can also generate a stream's schema dynamically in code, or combine both approaches: start with a `.json` file and add properties to it dynamically.

The schema of a stream is the return value of `Stream.get_json_schema`.

## Static schemas
By default, `Stream.get_json_schema` reads a `.json` file in the `schemas/` directory whose name matches the value of the `Stream.name` property. `Stream.name`, in turn, defaults to the name of the class in snake case. So for a class declared as `class EmployeeBenefits(HttpStream)`, the default behavior looks for a file called `schemas/employee_benefits.json`. You can override any of these behaviors as needed.

Important note: any objects referenced via `$ref` should be placed in the `shared/` directory in their own `.json` files.
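
As a rough illustration of the default lookup described above, the sketch below derives a schema path from a class name. The helpers `camel_to_snake` and `default_schema_path` are hypothetical names made up for this example; they only approximate the naming rule and are not the CDK's own implementation.

```python
import re


def camel_to_snake(class_name: str) -> str:
    # Insert "_" before every capital letter except the first, then lowercase.
    # Hypothetical helper: approximates the snake-case rule, not the CDK's code.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def default_schema_path(class_name: str) -> str:
    # Mirrors the documented convention: schemas/<stream name>.json
    return f"schemas/{camel_to_snake(class_name)}.json"


print(default_schema_path("EmployeeBenefits"))
```

If a stream overrides `name`, the file name would be expected to follow the overridden value rather than the class name.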

## Dynamic schemas
If you'd rather define your schema in code, override `Stream.get_json_schema` in your stream class to return a `dict` describing the schema using [JSONSchema](https://json-schema.org).
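
A minimal sketch of a fully dynamic schema might look like the following. The class here is a plain stand-in so the snippet runs on its own; in a real connector it would subclass the CDK's `Stream` or `HttpStream`. The field names `id` and `name` are invented for illustration.

```python
from typing import Any, Dict


class EmployeeBenefits:
    # Stand-in class: a real stream would inherit from the CDK's Stream/HttpStream.

    def get_json_schema(self) -> Dict[str, Any]:
        # Build the JSONSchema document in code instead of reading a .json file.
        return {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "type": "object",
            "properties": {
                "id": {"type": "integer"},  # hypothetical field
                "name": {"type": ["null", "string"]},  # hypothetical nullable field
            },
        }
```

Because the method returns an ordinary `dict`, the schema can be assembled from any information available at runtime, such as configuration values.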

## Dynamically modifying static schemas
Override `Stream.get_json_schema` to run the default behavior, edit the returned value, then return the edited value:
```
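def get_json_schema(self):
    # Load the static schema from the stream's .json file (the default behavior)
    schema = super().get_json_schema()
    # Declare the extra field under "properties" so it becomes part of the stream's schema
    schema["properties"]["dynamically_determined_property"] = {"type": "string"}
    return schema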
```
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -115,6 +115,7 @@
* [Connector Development Kit \(Python\)](contributing-to-airbyte/python/README.md)
* [Concepts](contributing-to-airbyte/python/concepts/README.md)
* [Basic Concepts](contributing-to-airbyte/python/concepts/basic-concepts.md)
* [Defining Stream Schemas](contributing-to-airbyte/python/concepts/schemas.md)
* [Full Refresh Streams](contributing-to-airbyte/python/concepts/full-refresh-stream.md)
* [Incremental Streams](contributing-to-airbyte/python/concepts/incremental-stream.md)
* [HTTP-API-based Connectors](contributing-to-airbyte/python/concepts/http-streams.md)
@@ -46,7 +46,7 @@ Note that while this is the most flexible way to implement a source connector, i

An `AbstractSource` also owns a set of `Stream`s. This is populated via the `AbstractSource`'s `streams` [function](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L63). `Discover` and `Read` rely on this populated set.

- `Discover` returns an `AirbyteCatalog` representing all the distinct resources the underlying API supports. Here is the [entrypoint](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L74) for those interested in reading the code.
+ `Discover` returns an `AirbyteCatalog` representing all the distinct resources the underlying API supports. Here is the [entrypoint](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L74) for those interested in reading the code. See [schemas](schemas.md) for more information on how to declare the schema of a stream.

`Read` creates an in-memory stream reading from each of the `AbstractSource`'s streams. Here is the [entrypoint](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L90) for those interested.

23 changes: 23 additions & 0 deletions docs/contributing-to-airbyte/python/concepts/schemas.md
@@ -0,0 +1,23 @@
# Defining your stream schemas
Your connector must describe the schema of each stream it can output using [JSONSchema](https://json-schema.org).

The simplest way to do this is to provide one `.json` file per stream. You can also generate a stream's schema dynamically in code, or combine both approaches: start with a `.json` file and add properties to it dynamically.

The schema of a stream is the return value of `Stream.get_json_schema`.

## Static schemas
By default, `Stream.get_json_schema` reads a `.json` file in the `schemas/` directory whose name matches the value of the `Stream.name` property. `Stream.name`, in turn, defaults to the name of the class in snake case. So for a class declared as `class EmployeeBenefits(HttpStream)`, the default behavior looks for a file called `schemas/employee_benefits.json`. You can override any of these behaviors as needed.

Important note: any objects referenced via `$ref` should be placed in the `shared/` directory in their own `.json` files.

## Dynamic schemas
If you'd rather define your schema in code, override `Stream.get_json_schema` in your stream class to return a `dict` describing the schema using [JSONSchema](https://json-schema.org).

## Dynamically modifying static schemas
Override `Stream.get_json_schema` to run the default behavior, edit the returned value, then return the edited value:
```
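def get_json_schema(self):
    # Load the static schema from the stream's .json file (the default behavior)
    schema = super().get_json_schema()
    # Declare the extra field under "properties" so it becomes part of the stream's schema
    schema["properties"]["dynamically_determined_property"] = {"type": "string"}
    return schema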
```
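
To see the override pattern run end to end without the CDK installed, here is a self-contained sketch. `BaseStream` is a stand-in that plays the role of the CDK's `Stream` and returns what would normally be loaded from the static `.json` file; `custom_fields` and the field names are hypothetical. It assumes dynamically added fields belong under the schema's `properties` key.

```python
from typing import Any, Dict, List


class BaseStream:
    # Stand-in for the CDK's Stream; pretends to load schemas/<name>.json.
    def get_json_schema(self) -> Dict[str, Any]:
        return {"type": "object", "properties": {"id": {"type": "integer"}}}


class EmployeeBenefits(BaseStream):
    # Hypothetical: field names discovered at runtime, e.g. from an API call.
    custom_fields: List[str] = ["favorite_color"]

    def get_json_schema(self) -> Dict[str, Any]:
        schema = super().get_json_schema()  # start from the static schema
        for field in self.custom_fields:
            # Add each runtime-discovered field as a nullable string property.
            schema["properties"][field] = {"type": ["null", "string"]}
        return schema


print(sorted(EmployeeBenefits().get_json_schema()["properties"]))
```

Starting from the static file keeps the stable fields reviewable in version control, while only the fields that truly vary are computed in code.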
