🎉 New source: Fauna (#15274)
* Add fauna source

* Update changelog to include the correct PR

* Improve docs (#1)

* Applied suggestions to improve docs (#2)

* Applied suggestions to improve docs

* Cleaned up the docs

* Apply suggestions from code review

Co-authored-by: Ewan Edwards <46354154+faunaee@users.noreply.github.com>

* Update airbyte-integrations/connectors/source-fauna/source_fauna/spec.yaml

Co-authored-by: Ewan Edwards <46354154+faunaee@users.noreply.github.com>

Co-authored-by: Ewan Edwards <46354154+faunaee@users.noreply.github.com>

* Flake Checker (#3)

* Run ./gradlew :airbyte-integrations:connectors:source-fauna:flakeCheck

* Fix all the warnings

* Set additionalProperties to true to adhere to acceptance tests

* Remove custom fields (#4)

* Remove custom fields from source.py

* Remove custom fields from spec.yaml

* Collections that support incremental sync are found correctly

* Run formatter

* Index values and terms are verified

* Stripped additional_columns from collection config and check()

* We now search for an index at the start of each sync

* Add default for missing data in collection

* Add a log message about the index chosen to sync an incremental stream

* Add an example for a configured incremental catalog

* Check test now validates the simplified check function

* Remove collection name from spec.yaml and CollectionConfig

* Update test_util.py to adhere to the new config

* Update the first discover test to validate that we can find indexes correctly

* Remove other discover tests, as they no longer apply

* Full refresh test now works with simplified expanded columns

* Remove unused imports

* Incremental test now adheres to the find_index_for_stream system

* Database test passes, so now all unit tests pass again

* Remove extra fields from required section

* ttl is nullable

* Data defaults to an empty object

* Update tests to reflect ttl and data select changes

* Fix expected records. All unit tests and acceptance tests pass

* Cleanup docs for find_index_for_stream

* Update setup guide to reflect multiple collections

* Add docs to install the fauna shell

* Update examples and README to conform to the removal of additional columns

Co-authored-by: Ewan Edwards <46354154+faunaee@users.noreply.github.com>
macmv and faunaee committed Sep 29, 2022
1 parent 448828b commit 65e6168
Showing 49 changed files with 4,232 additions and 0 deletions.
12 changes: 12 additions & 0 deletions airbyte-config/init/src/main/resources/icons/fauna.svg
1 change: 1 addition & 0 deletions airbyte-integrations/builds.md
@@ -33,6 +33,7 @@
| End-to-End Testing | [![source-e2e-test](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-e2e-test%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-e2e-test) |
| Exchange Rates API | [![source-exchange-rates](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-exchange-rates%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-exchange-rates) |
| Facebook Marketing | [![source-facebook-marketing](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-facebook-marketing%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-facebook-marketing) |
| Fauna | [![source-fauna](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-fauna%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-fauna) |
| Files | [![source-file](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-file%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-file) |
| Flexport | [![source-file](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-flexport%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-flexport) |
| Freshdesk | [![source-freshdesk](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-freshdesk%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-freshdesk) |
6 changes: 6 additions & 0 deletions airbyte-integrations/connectors/source-fauna/.dockerignore
@@ -0,0 +1,6 @@
*
!Dockerfile
!main.py
!source_fauna
!setup.py
!secrets
6 changes: 6 additions & 0 deletions airbyte-integrations/connectors/source-fauna/.gitignore
@@ -0,0 +1,6 @@
# Python version tools
.tool-versions
../../../.tool-versions
# emacs auto-save files
*~
*#
38 changes: 38 additions & 0 deletions airbyte-integrations/connectors/source-fauna/Dockerfile
@@ -0,0 +1,38 @@
FROM python:3.9.11-alpine3.15 as base

# build and load all requirements
FROM base as builder
WORKDIR /airbyte/integration_code

# upgrade pip to the latest version
RUN apk --no-cache upgrade \
&& pip install --upgrade pip \
&& apk --no-cache add tzdata build-base


COPY setup.py ./
# install necessary packages to a temporary folder
RUN pip install --prefix=/install .

# build a clean environment
FROM base
WORKDIR /airbyte/integration_code

# copy all loaded and built libraries to a pure basic image
COPY --from=builder /install /usr/local
# add default timezone settings
COPY --from=builder /usr/share/zoneinfo/Etc/UTC /etc/localtime
RUN echo "Etc/UTC" > /etc/timezone

# bash is installed for more convenient debugging.
RUN apk --no-cache add bash

# copy payload code only
COPY main.py ./
COPY source_fauna ./source_fauna

ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

LABEL io.airbyte.version=dev
LABEL io.airbyte.name=airbyte/source-fauna
188 changes: 188 additions & 0 deletions airbyte-integrations/connectors/source-fauna/README.md
@@ -0,0 +1,188 @@
# New Readers

If you know how Airbyte works, read [bootstrap.md](bootstrap.md) for a quick introduction to this source. If you haven't
used Airbyte before, read [overview.md](overview.md) for a longer overview of what this connector is and how to use
it.

# For Fauna Developers

## Running locally

First, start a local fauna container:
```
docker run --rm --name faunadb -p 8443:8443 fauna/faunadb
```

In another terminal, cd into the connector directory:
```
cd airbyte-integrations/connectors/source-fauna
```

Once the container is up, set up the database:
```
fauna eval "$(cat examples/setup_database.fql)" --domain localhost --port 8443 --scheme http --secret secret
```

Finally, run the connector:
```
python main.py spec
python main.py check --config examples/config_localhost.json
python main.py discover --config examples/config_localhost.json
python main.py read --config examples/config_localhost.json --catalog examples/configured_catalog.json
```

To pick up a partial failure, you need to pass in a state file. To test this by example, induce a crash via bad data (e.g. a missing required field), update `examples/sample_state_full_sync.json` to contain your emitted state, and then run:

```
python main.py read --config examples/config_localhost.json --catalog examples/configured_catalog.json --state examples/sample_state_full_sync.json
```

## Running the integration tests

First, cd into the connector directory:
```
cd airbyte-integrations/connectors/source-fauna
```

The integration tests require a secret config.json. Ping me on Slack to get this file.
Once you have this file, put it in `secrets/config.json`. A sample of this file can be
found at `examples/secret_config.json`. Once the file is created, build the connector:
```
docker build . -t airbyte/source-fauna:dev
```

Now, run the integration tests:
```
python -m pytest -p integration_tests.acceptance
```


# Fauna Source

This is the repository for the Fauna source connector, written in Python.
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.io/integrations/sources/fauna).

## Local development

### Prerequisites
**To iterate on this connector, make sure to complete this prerequisites section.**

#### Minimum Python version required `= 3.9.0`

#### Build & Activate Virtual Environment and install dependencies
From this connector directory, create a virtual environment:
```
python -m venv .venv
```

This will generate a virtualenv for this module in `.venv/`. Make sure this venv is active in your
development environment of choice. To activate it from the terminal, run:
```
source .venv/bin/activate
pip install -r requirements.txt
```
If you are in an IDE, follow your IDE's instructions to activate the virtualenv.

Note that while we are installing dependencies from `requirements.txt`, you should only edit `setup.py` for your dependencies. `requirements.txt` is
used for editable installs (`pip install -e`) to pull in Python dependencies from the monorepo and will call `setup.py`.
If this is mumbo jumbo to you, don't worry about it, just put your deps in `setup.py` but install using `pip install -r requirements.txt` and everything
should work as you expect.
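To make this concrete, here is a minimal sketch of what a connector's `requirements.txt` conventionally contains in this monorepo (the exact relative path is illustrative):
```
# Editable install of the acceptance-test base from the monorepo (illustrative path)
-e ../../bases/source-acceptance-test
# Editable install of this connector, which runs setup.py
-e .
```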

#### Building via Gradle
From the Airbyte repository root, run:
```
./gradlew :airbyte-integrations:connectors:source-fauna:build
```

#### Create credentials
**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.io/integrations/sources/fauna)
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_fauna/spec.yaml` file.
Note that the `secrets` directory is gitignored by default, so there is no danger of accidentally checking in sensitive information.
See `examples/secret_config.json` for a sample config file.

**If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source fauna test creds`
and place them into `secrets/config.json`.

### Locally running the connector
```
python main.py spec
python main.py check --config secrets/config.json
python main.py discover --config secrets/config.json
python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
```

### Locally running the connector docker image

#### Build
First, make sure you build the latest Docker image:
```
docker build . -t airbyte/source-fauna:dev
```

You can also build the connector image via Gradle:
```
./gradlew :airbyte-integrations:connectors:source-fauna:airbyteDocker
```
When building via Gradle, the docker image name and tag, respectively, are the values of the `io.airbyte.name` and `io.airbyte.version` `LABEL`s in
the Dockerfile.

#### Run
Then run any of the connector commands as follows:
```
docker run --rm airbyte/source-fauna:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-fauna:dev check --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-fauna:dev discover --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-fauna:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
```
## Testing
Make sure to familiarize yourself with [pytest test discovery](https://docs.pytest.org/en/latest/goodpractices.html#test-discovery) to know how your test files and methods should be named.
First install test dependencies into your virtual environment:
```
pip install .[tests]
```
### Unit Tests
To run unit tests locally, from the connector directory run:
```
python -m pytest unit_tests
```

### Integration Tests
There are two types of integration tests: Acceptance Tests (Airbyte's test suite for all source connectors) and custom integration tests (which are specific to this connector).
#### Custom Integration tests
Place custom tests inside the `integration_tests/` folder, then, from the connector root, run
```
python -m pytest integration_tests
```
#### Acceptance Tests
Customize `acceptance-test-config.yml` file to configure tests. See [Source Acceptance Tests](https://docs.airbyte.io/connector-development/testing-connectors/source-acceptance-tests-reference) for more information.
If your connector requires creating or destroying resources for use during acceptance tests, create fixtures for them and place them inside `integration_tests/acceptance.py`.
To run your integration tests with acceptance tests, from the connector root, run
```
python -m pytest integration_tests -p integration_tests.acceptance
```
To run your integration tests with Docker, run the `./acceptance-test-docker.sh` script from the connector root.

### Using gradle to run tests
All commands should be run from the Airbyte project root.
To run unit tests:
```
./gradlew :airbyte-integrations:connectors:source-fauna:unitTest
```
To run acceptance and custom integration tests:
```
./gradlew :airbyte-integrations:connectors:source-fauna:integrationTest
```

## Dependency Management
All of your dependencies should go in `setup.py`, NOT `requirements.txt`. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development.
We split dependencies between two groups:
* dependencies required for your connector to work go in the `MAIN_REQUIREMENTS` list.
* dependencies required for testing go in the `TEST_REQUIREMENTS` list.
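As a sketch, the split looks roughly like this in `setup.py` (the package names and version pins are illustrative, not the connector's actual pins):
```
from setuptools import find_packages, setup

# Runtime dependencies (illustrative pins)
MAIN_REQUIREMENTS = [
    "airbyte-cdk~=0.1",
    "faunadb~=4.0",
]

# Test-only dependencies, installed via `pip install .[tests]`
TEST_REQUIREMENTS = [
    "pytest~=6.1",
    "source-acceptance-test",
]

setup(
    name="source_fauna",
    packages=find_packages(exclude=("unit_tests", "integration_tests")),
    install_requires=MAIN_REQUIREMENTS,
    extras_require={"tests": TEST_REQUIREMENTS},
)
```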

### Publishing a new version of the connector
You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?
1. Make sure your changes are passing unit and integration tests.
1. Bump the connector version in `Dockerfile` -- just increment the value of the `LABEL io.airbyte.version` appropriately (we use [SemVer](https://semver.org/)).
1. Create a Pull Request.
1. Pat yourself on the back for being an awesome contributor.
1. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
45 changes: 45 additions & 0 deletions airbyte-integrations/connectors/source-fauna/acceptance-test-config.yml
@@ -0,0 +1,45 @@
# See [Source Acceptance Tests](https://docs.airbyte.io/connector-development/testing-connectors/source-acceptance-tests-reference)
# for more information about how to configure these tests
connector_image: airbyte/source-fauna:dev
tests:
spec:
- spec_path: "source_fauna/spec.yaml"
connection:
- config_path: "secrets/config.json"
status: "succeed"
- config_path: "secrets/config-deletions.json"
status: "succeed"
- config_path: "integration_tests/config/invalid.json"
status: "failed"
discovery:
- config_path: "secrets/config.json"
- config_path: "secrets/config-deletions.json"
basic_read:
- config_path: "secrets/config.json"
configured_catalog_path: "integration_tests/configured_catalog.json"
empty_streams: []
expect_records:
path: "integration_tests/expected_records.txt"
extra_fields: no
exact_order: yes
extra_records: no
- config_path: "secrets/config-deletions.json"
configured_catalog_path: "integration_tests/configured_catalog_incremental.json"
empty_streams: []
expect_records:
path: "integration_tests/expected_deletions_records.txt"
extra_fields: no
exact_order: yes
extra_records: no
incremental:
- config_path: "secrets/config.json"
configured_catalog_path: "integration_tests/configured_catalog.json"
# Note that the time in this file was generated with this fql:
# ToMicros(ToTime(Date("9999-01-01")))
future_state_path: "integration_tests/abnormal_state.json"
- config_path: "secrets/config-deletions.json"
configured_catalog_path: "integration_tests/configured_catalog_incremental.json"
future_state_path: "integration_tests/abnormal_deletions_state.json"
full_refresh:
- config_path: "secrets/config.json"
configured_catalog_path: "integration_tests/configured_catalog.json"
16 changes: 16 additions & 0 deletions airbyte-integrations/connectors/source-fauna/acceptance-test-docker.sh
@@ -0,0 +1,16 @@
#!/usr/bin/env sh

# Build latest connector image
docker build . -t $(cat acceptance-test-config.yml | grep "connector_image" | head -n 1 | cut -d: -f2-)

# Pull latest acctest image
docker pull airbyte/source-acceptance-test:latest

# Run
docker run --rm -it \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp:/tmp \
-v $(pwd):/test_input \
airbyte/source-acceptance-test \
--acceptance-test-config /test_input

56 changes: 56 additions & 0 deletions airbyte-integrations/connectors/source-fauna/bootstrap.md
@@ -0,0 +1,56 @@
# Fauna Source

[Fauna](https://fauna.com/) is a serverless "document-relational" database that users interact with via APIs. This connector delivers Fauna as an Airbyte source.

This source is implemented in the [Airbyte CDK](https://docs.airbyte.io/connector-development/cdk-python).
It also uses the [Fauna Python Driver](https://docs.fauna.com/fauna/current/drivers/python), which
allows the connector to build FQL queries in python. This driver is what queries the Fauna database.

Fauna has collections (similar to tables) and documents (similar to rows).

Every document has at least 3 fields: `ref`, `ts` and `data`. The `ref` is a unique string identifier
for every document. The `ts` is a timestamp, which is the time that the document was last modified.
The `data` is arbitrary JSON. Because there is no fixed shape to this data, we also allow users of
Airbyte to specify which fields of the document they want to export as top-level columns.

Users can also choose to export the raw `data` field itself and, in the case of incremental syncs, metadata regarding when a document was deleted.

We currently only provide a single stream, which is the collection the user has chosen. This is
because supporting incremental syncs requires an index for every collection, so it ends up being easier to have the user
set up the index and tell us the collection and index names they wish to use.
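For example, an index suitable for incremental syncs can be created in the Fauna shell with FQL along these lines (the index and collection names are placeholders; the `ts`/`ref` values are what matter, as described below):
```
CreateIndex({
  name: "incremental-sync-index",
  source: Collection("collection-name"),
  values: [{ field: ["ts"] }, { field: ["ref"] }]
})
```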

## Full sync

This source will simply call the following [FQL](https://docs.fauna.com/fauna/current/api/fql/): `Paginate(Documents(Collection("collection-name")))`.
This queries all documents in the collection in a paginated manner. The source then iterates over all the results from that query to emit records from the connector.
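A rough sketch of that loop with the Python driver (the connection settings and page size are illustrative, matching the local setup described in the README):
```
from faunadb import query as q
from faunadb.client import FaunaClient

client = FaunaClient(secret="secret", domain="localhost", port=8443, scheme="http")

# Page through every document in the collection, following the `after` cursor.
after = None
while True:
    page = client.query(
        q.paginate(q.documents(q.collection("collection-name")), size=64, after=after)
    )
    for ref in page["data"]:
        pass  # emit a record for each document ref here
    after = page.get("after")
    if after is None:
        break
```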

Docs:
[Paginate](https://docs.fauna.com/fauna/current/api/fql/functions/paginate?lang=python).
[Documents](https://docs.fauna.com/fauna/current/api/fql/functions/documents?lang=python).
[Collection](https://docs.fauna.com/fauna/current/api/fql/functions/collection?lang=python).

## Incremental sync

### Updates (uses an index over ts)

The source will call FQL similar to this: `Paginate(Range(Match(Index("index-name")), <last-sync-ts>, []))`.
The index we match against has the values `ts` and `ref`, so results are sorted by the time each document
was last modified. The `Range()` limits the query to just the documents that have been modified
since the last sync.
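A hedged sketch of the updates query with the Python driver (placeholder names and cursor value; this assumes the driver exposes a `range` builder mirroring FQL's `Range`, which may vary by driver version):
```
from faunadb import query as q
from faunadb.client import FaunaClient

client = FaunaClient(secret="secret", domain="localhost", port=8443, scheme="http")
last_sync_ts = 1661361296000000  # cursor from the previous sync (illustrative value)

# Documents modified at or after the cursor, ordered by (ts, ref) via the index.
page = client.query(
    q.paginate(q.range(q.match(q.index("index-name")), last_sync_ts, []), size=64)
)
for ts, ref in page["data"]:
    pass  # each entry carries the index values: (ts, ref)
```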

Docs:
[Range](https://docs.fauna.com/fauna/current/api/fql/functions/range?lang=python).
[Match](https://docs.fauna.com/fauna/current/api/fql/functions/match?lang=python).
[Index](https://docs.fauna.com/fauna/current/api/fql/functions/iindex?lang=python).

### Deletes (uses the events API)

If the user wants deletes, we have a separate query for that:
`Paginate(Events(Documents(Collection("collection-name"))))`. This paginates over all the events
on the documents of the collection. We also filter this to only include events since the last sync.
Using these events, we can produce a record with the "deleted at" field set, so
that users know the document has been deleted.
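A sketch of the deletions query, again assuming the driver exposes an `events` builder mirroring FQL's `Events` (names are placeholders):
```
from faunadb import query as q
from faunadb.client import FaunaClient

client = FaunaClient(secret="secret", domain="localhost", port=8443, scheme="http")

# All set events (adds and removes) for documents in the collection.
page = client.query(
    q.paginate(q.events(q.documents(q.collection("collection-name"))), size=64)
)
for event in page["data"]:
    if event["action"] == "remove":
        pass  # emit a record with the "deleted at" field set from event["ts"]
```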

Docs:
[Events](https://docs.fauna.com/fauna/current/api/fql/functions/events?lang=python).

9 changes: 9 additions & 0 deletions airbyte-integrations/connectors/source-fauna/build.gradle
@@ -0,0 +1,9 @@
plugins {
id 'airbyte-python'
id 'airbyte-docker'
id 'airbyte-source-acceptance-test'
}

airbytePython {
moduleDirectory 'source_fauna'
}
