✨ Source Apify Dataset: Migrate Python CDK to Low-code CDK (#29859)
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
3 people committed Aug 30, 2023
1 parent 19a65bf commit 05b7d01
Showing 28 changed files with 488 additions and 338 deletions.
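This commit converts the connector from hand-written Python streams to Airbyte's low-code (declarative) CDK: the connector is now driven by a YAML manifest plus a `spec.yaml`, which is why the changes below swap `spec.json` for `spec.yaml` and retag the connector `language:lowcode`. The manifest itself is below the fold of this excerpt; purely as an illustration, a declarative stream for this source might look like the following sketch, where the stream name, paths, and extractor field path are assumptions rather than the committed code:

```
# Hypothetical sketch of a low-code stream definition (manifest.yaml); values assumed.
version: "0.29.0"
type: DeclarativeSource

check:
  type: CheckStream
  stream_names: ["datasets"]

streams:
  - type: DeclarativeStream
    name: datasets
    retriever:
      type: SimpleRetriever
      requester:
        type: HttpRequester
        url_base: "https://api.apify.com/v2/"
        path: "datasets"
        http_method: "GET"
        request_parameters:
          token: "{{ config['token'] }}"
      record_selector:
        type: RecordSelector
        extractor:
          type: DpathExtractor
          field_path: ["data", "items"]
```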
airbyte-integrations/connectors/source-apify-dataset/.dockerignore

@@ -1,6 +1,5 @@
 *
 !Dockerfile
-!Dockerfile.test
 !main.py
 !source_apify_dataset
 !setup.py
36 changes: 29 additions & 7 deletions airbyte-integrations/connectors/source-apify-dataset/Dockerfile
@@ -1,16 +1,38 @@
-FROM python:3.9-slim
+FROM python:3.9.11-alpine3.15 as base
 
+# build and load all requirements
+FROM base as builder
+WORKDIR /airbyte/integration_code
+
+# upgrade pip to the latest version
+RUN apk --no-cache upgrade \
+    && pip install --upgrade pip \
+    && apk --no-cache add tzdata build-base
+
-# Bash is installed for more convenient debugging.
-RUN apt-get update && apt-get install -y bash && rm -rf /var/lib/apt/lists/*
 
+COPY setup.py ./
+# install necessary packages to a temporary folder
+RUN pip install --prefix=/install .
+
+# build a clean environment
+FROM base
 WORKDIR /airbyte/integration_code
-COPY source_apify_dataset ./source_apify_dataset
+
+# copy all loaded and built libraries to a pure basic image
+COPY --from=builder /install /usr/local
+# add default timezone settings
+COPY --from=builder /usr/share/zoneinfo/Etc/UTC /etc/localtime
+RUN echo "Etc/UTC" > /etc/timezone
+
+# bash is installed for more convenient debugging.
+RUN apk --no-cache add bash
+
+# copy payload code only
 COPY main.py ./
-COPY setup.py ./
-RUN pip install .
+COPY source_apify_dataset ./source_apify_dataset
 
 ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
 ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]
 
-LABEL io.airbyte.version=0.2.0
+LABEL io.airbyte.version=1.0.0
 LABEL io.airbyte.name=airbyte/source-apify-dataset
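The new Dockerfile is a two-stage build: dependencies are compiled against `build-base` in a builder stage, then only the installed packages and the connector code are copied into a clean Alpine image. A typical local smoke test, following the build/run convention in the README below (the `:dev` tag is the usual local convention):

```
docker build . -t airbyte/source-apify-dataset:dev
docker run --rm airbyte/source-apify-dataset:dev spec
```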
77 changes: 12 additions & 65 deletions airbyte-integrations/connectors/source-apify-dataset/README.md
@@ -1,63 +1,27 @@
 # Apify Dataset Source
 
-This is the repository for the Apify Dataset source connector, written in Python.
-For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.io/integrations/sources/apify-dataset).
-
-# About connector
-This connector allows you to download data from Apify [dataset](https://docs.apify.com/storage/dataset) to Airbyte. All you need
-is Apify dataset ID.
+This is the repository for the Apify Dataset configuration based source connector.
+For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.com/integrations/sources/apify-dataset).
 
 ## Local development
 
-### Prerequisites
-**To iterate on this connector, make sure to complete this prerequisites section.**
-
-#### Minimum Python version required `= 3.7.0`
-
-#### Build & Activate Virtual Environment and install dependencies
-From this connector directory, create a virtual environment:
-```
-python -m venv .venv
-```
-
-This will generate a virtualenv for this module in `.venv/`. Make sure this venv is active in your
-development environment of choice. To activate it from the terminal, run:
-```
-source .venv/bin/activate
-pip install -r requirements.txt
-```
-If you are in an IDE, follow your IDE's instructions to activate the virtualenv.
-
-Note that while we are installing dependencies from `requirements.txt`, you should only edit `setup.py` for your dependencies. `requirements.txt` is
-used for editable installs (`pip install -e`) to pull in Python dependencies from the monorepo and will call `setup.py`.
-If this is mumbo jumbo to you, don't worry about it, just put your deps in `setup.py` but install using `pip install -r requirements.txt` and everything
-should work as you expect.
-
 #### Building via Gradle
-From the Airbyte repository root, run:
+You can also build the connector in Gradle. This is typically used in CI and not needed for your development workflow.
+
+To build using Gradle, from the Airbyte repository root, run:
 ```
 ./gradlew :airbyte-integrations:connectors:source-apify-dataset:build
 ```
 
 #### Create credentials
-**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.io/integrations/sources/apify-dataset)
-to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_apify_dataset/spec.json` file.
-Note that the `secrets` directory is gitignored by default, so there is no danger of accidentally checking in sensitive information.
+**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.com/integrations/sources/apify-dataset)
+to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_apify_dataset/spec.yaml` file.
+Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
 See `integration_tests/sample_config.json` for a sample config file.
 
-You can get your Apify credentials from Settings > Integration [section](https://my.apify.com/account#/integrations) of the Apify app
-
 **If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source apify-dataset test creds`
 and place them into `secrets/config.json`.
 
-### Locally running the connector
-```
-python main.py spec
-python main.py check --config secrets/config.json
-python main.py discover --config secrets/config.json
-python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
-```
-
 ### Locally running the connector docker image
 
 #### Build
@@ -82,32 +46,15 @@ docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-apify-dataset:dev disc
 docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-apify-dataset:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
 ```
 ## Testing
-Make sure to familiarize yourself with [pytest test discovery](https://docs.pytest.org/en/latest/goodpractices.html#test-discovery) to know how your test files and methods should be named.
-First install test dependencies into your virtual environment:
-```
-pip install .[tests]
-```
-### Unit Tests
-To run unit tests locally, from the connector directory run:
-```
-python -m pytest unit_tests
-```
-
-### Integration Tests
-There are two types of integration tests: Acceptance Tests (Airbyte's test suite for all source connectors) and custom integration tests (which are specific to this connector).
-#### Custom Integration tests
-Place custom tests inside `integration_tests/` folder, then, from the connector root, run
-```
-python -m pytest integration_tests
-```
 #### Acceptance Tests
-Customize `acceptance-test-config.yml` file to configure tests. See [Connector Acceptance Tests](https://docs.airbyte.io/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
+Customize `acceptance-test-config.yml` file to configure tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
 If your connector requires to create or destroy resources for use during acceptance tests create fixtures for it and place them inside integration_tests/acceptance.py.
-To run your integration tests with acceptance tests, from the connector root, run
 
+To run your integration tests with Docker, run:
 ```
-python -m pytest integration_tests -p integration_tests.acceptance
+./acceptance-test-docker.sh
 ```
-To run your integration tests with docker
 
 ### Using gradle to run tests
 All commands should be run from airbyte project root.
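Putting the README's pieces together, a plausible end-to-end local check might look like this; the token and date are placeholders mirroring `integration_tests/sample_config.json` added later in this commit, and the Gradle task names follow the usual Airbyte connector convention:

```
# write a config that matches the new token + start_date spec (placeholder values)
mkdir -p secrets
cat > secrets/config.json <<'EOF'
{
  "token": "apify_api_XXXXXXXXXXXXXXXXXXXX",
  "start_date": "2023-08-25T00:00:59.244Z"
}
EOF

# from the airbyte repository root, run the connector's test suites
./gradlew :airbyte-integrations:connectors:source-apify-dataset:unitTest
./gradlew :airbyte-integrations:connectors:source-apify-dataset:integrationTest
```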
@@ -1,7 +1,3 @@
-#
-# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
-#
-
 
 def test_example_method():
     assert True
airbyte-integrations/connectors/source-apify-dataset/acceptance-test-config.yml

@@ -1,19 +1,44 @@
+# See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference)
+# for more information about how to configure these tests
 connector_image: airbyte/source-apify-dataset:dev
-tests:
+acceptance_tests:
   spec:
-    - spec_path: "source_apify_dataset/spec.json"
+    tests:
+      - spec_path: "source_apify_dataset/spec.yaml"
+        backward_compatibility_tests_config:
+          disable_for_version: 0.2.0
   connection:
-    - config_path: "secrets/config.json"
-      status: "succeed"
-    - config_path: "integration_tests/invalid_config.json"
-      status: "failed"
+    tests:
+      - config_path: "secrets/config.json"
+        status: "succeed"
+      - config_path: "integration_tests/invalid_config.json"
+        status: "failed"
   discovery:
-    - config_path: "secrets/config.json"
+    tests:
+      - config_path: "secrets/config.json"
+        backward_compatibility_tests_config:
+          disable_for_version: 0.2.0
   basic_read:
-    - config_path: "secrets/config.json"
-      configured_catalog_path: "integration_tests/configured_catalog.json"
+    tests:
+      - config_path: "secrets/config.json"
+        configured_catalog_path: "integration_tests/configured_catalog.json"
+  incremental:
+    bypass_reason: Connector doesn't use incremental sync
+    # tests:
+    #   - config_path: "secrets/config.json"
+    #     configured_catalog_path: "integration_tests/configured_catalog.json"
+    #     future_state:
+    #       future_state_path: "integration_tests/abnormal_state.json"
   full_refresh:
-    - config_path: "secrets/config.json"
-      configured_catalog_path: "integration_tests/configured_catalog.json"
+    tests:
+      - config_path: "secrets/config.json"
+        configured_catalog_path: "integration_tests/configured_catalog.json"
+        ignored_fields:
+          datasets:
+            - name: "accessedAt"
+              bypass_reason: "Change everytime"
+            - name: "stats/readCount"
+              bypass_reason: "Change everytime"
+          dataset:
+            - name: "accessedAt"
+              bypass_reason: "Change everytime"
+            - name: "stats/readCount"
+              bypass_reason: "Change everytime"
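The `ignored_fields` entries opt volatile fields out of record comparison: `accessedAt` and `stats/readCount` change on every read, and the incremental suite is bypassed because this connector only supports full refresh. With this file in place, the whole suite is driven by the wrapper script added just below:

```
# run all configured acceptance tests against the :dev image
./acceptance-test-docker.sh
```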
airbyte-integrations/connectors/source-apify-dataset/acceptance-test-docker.sh

@@ -1,2 +1,3 @@
 #!/usr/bin/env sh
+
 source "$(git rev-parse --show-toplevel)/airbyte-integrations/bases/connector-acceptance-test/acceptance-test-docker.sh"
@@ -0,0 +1,3 @@
+#
+# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
+#
airbyte-integrations/connectors/source-apify-dataset/integration_tests/abnormal_state.json

@@ -0,0 +1,16 @@
+[
+  {
+    "type": "STREAM",
+    "stream": {
+      "stream_state": { "modifiedAt": "3021-09-08T07:04:28.000Z" },
+      "stream_descriptor": { "name": "dataset" }
+    }
+  },
+  {
+    "type": "STREAM",
+    "stream": {
+      "stream_state": { "modifiedAt": "3021-09-08T07:04:28.000Z" },
+      "stream_descriptor": { "name": "datasets" }
+    }
+  }
+]
airbyte-integrations/connectors/source-apify-dataset/integration_tests/acceptance.py

@@ -11,4 +11,6 @@
 @pytest.fixture(scope="session", autouse=True)
 def connector_setup():
     """This fixture is a placeholder for external resources that acceptance test might require."""
+    # TODO: setup test dependencies if needed. otherwise remove the TODO comments
     yield
+    # TODO: clean up test dependencies
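If the acceptance tests ever did need real resources, this fixture is where a scratch dataset could be provisioned and torn down. The sketch below is illustrative only, assuming the `apify-client` package and its `datasets().get_or_create()` / `dataset(...).delete()` methods; it is not part of this commit:

```
import json

import pytest
from apify_client import ApifyClient  # assumed external dependency


@pytest.fixture(scope="session")
def scratch_dataset():
    """Create a throwaway Apify dataset for the test session, then delete it."""
    with open("secrets/config.json") as f:
        token = json.load(f)["token"]
    client = ApifyClient(token)
    dataset = client.datasets().get_or_create(name="airbyte-acceptance-scratch")
    yield dataset
    client.dataset(dataset["id"]).delete()
```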

This file was deleted.

airbyte-integrations/connectors/source-apify-dataset/integration_tests/configured_catalog.json

@@ -1,24 +1,31 @@
 {
   "streams": [
     {
       "stream": {
+        "name": "datasets",
+        "json_schema": {},
+        "supported_sync_modes": ["full_refresh"]
+      },
+      "sync_mode": "full_refresh",
+      "destination_sync_mode": "overwrite"
+    },
+    {
+      "stream": {
+        "name": "dataset",
+        "json_schema": {},
+        "supported_sync_modes": ["full_refresh"]
+      },
+      "sync_mode": "full_refresh",
+      "destination_sync_mode": "overwrite"
+    },
+    {
+      "stream": {
-        "name": "DatasetItems",
-        "supported_sync_modes": ["full_refresh"],
-        "destination_sync_mode": "overwrite",
-        "json_schema": {
-          "$schema": "http://json-schema.org/draft-07/schema#",
-          "type": "object",
-          "properties": {
-            "data": {
-              "type": "object",
-              "additionalProperties": true
-            }
-          },
-          "additionalProperties": true
-        }
-      },
+        "name": "item_collection",
+        "json_schema": {},
+        "supported_sync_modes": ["full_refresh"]
+      },
       "sync_mode": "full_refresh",
       "destination_sync_mode": "overwrite"
     }
   ]
 }
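The catalog now declares three full-refresh streams (`datasets`, `dataset`, `item_collection`) with empty inline schemas, deferring to the schema files shipped with the connector. With credentials in place it can be exercised directly, using the entrypoint invocation from the pre-migration README:

```
python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
```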
airbyte-integrations/connectors/source-apify-dataset/integration_tests/expected_records.jsonl

@@ -0,0 +1,2 @@
+{"stream": "datasets", "data": {"id":"Mxnvcv4Rspg9P9aP0","name":"my-dataset-name","userId":"YnGtyk7naKpwpousW","username":"encouraging_cliff","createdAt":"2023-08-25T19:19:33.588Z","modifiedAt":"2023-08-25T19:19:33.588Z","accessedAt":"2023-08-25T19:19:43.646Z","itemCount":0,"cleanItemCount":0,"actId":null,"actRunId":null,"schema":null,"stats":{"inflatedBytes":0,"readCount":0,"writeCount":0}}, "emitted_at": 1692990238010}
+{"stream": "dataset", "data": {"id":"Mxnvcv4Rspg9P9aP0","name":"my-dataset-name","userId":"YnGtyk7naKpwpousW","createdAt":"2023-08-25T19:19:33.588Z","modifiedAt":"2023-08-25T19:19:33.588Z","accessedAt":"2023-08-25T19:19:43.646Z","itemCount":0,"cleanItemCount":0,"actId":null,"actRunId":null,"schema":null,"stats":{"readCount":0,"writeCount":0,"storageBytes":0},"fields":[]}, "emitted_at": 1692990238010}
airbyte-integrations/connectors/source-apify-dataset/integration_tests/invalid_config.json

@@ -1,4 +1,4 @@
 {
-  "datasetId": "non_existent_dataset_id",
-  "clean": false
+  "token": "abc",
+  "start_date": "2099-08-25T00:00:59.244Z"
 }
airbyte-integrations/connectors/source-apify-dataset/integration_tests/sample_config.json

@@ -0,0 +1,4 @@
+{
+  "token": "apify_api_XXXXXXXXXXXXXXXXXXXX",
+  "start_date": "2023-08-25T00:00:59.244Z"
+}
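Before running the suites, the token itself can be sanity-checked against the Apify API; a sketch assuming the v2 list-datasets endpoint accepts the token as a query parameter (the token value is a placeholder):

```
# should return a JSON list of the account's datasets
curl -s "https://api.apify.com/v2/datasets?token=apify_api_XXXXXXXXXXXXXXXXXXXX"
```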
@@ -0,0 +1,9 @@
+[
+  {
+    "type": "STREAM",
+    "stream": {
+      "stream_state": { "modifiedAt": "3021-09-08T07:04:28.000Z" },
+      "stream_descriptor": { "name": "example" }
+    }
+  }
+]
30 changes: 18 additions & 12 deletions airbyte-integrations/connectors/source-apify-dataset/metadata.yaml
@@ -1,24 +1,30 @@
 data:
   allowedHosts:
     hosts:
       - api.apify.com
-  registries:
-    oss:
-      enabled: true
-    cloud:
-      enabled: true
   connectorSubtype: api
   connectorType: source
   definitionId: 47f17145-fe20-4ef5-a548-e29b048adf84
-  dockerImageTag: 0.2.0
+  dockerImageTag: 1.0.0
   dockerRepository: airbyte/source-apify-dataset
   githubIssueLabel: source-apify-dataset
-  icon: apify.svg
+  icon: apify-dataset.svg
   license: MIT
   name: Apify Dataset
+  registries:
+    cloud:
+      enabled: true
+    oss:
+      enabled: true
   releaseDate: 2023-08-25
   releaseStage: alpha
+  releases:
+    breakingChanges:
+      1.0.0:
+        upgradeDeadline: 2023-08-30
+        message: "Update spec to use token and ingest all 3 streams correctly"
-  supportLevel: community
   documentationUrl: https://docs.airbyte.com/integrations/sources/apify-dataset
   tags:
-    - language:python
+    - language:lowcode
+  ab_internal:
+    sl: 100
+    ql: 100
+  supportLevel: community
+metadataSpecVersion: "1.0"
