add AirbyteLib devcontainer config #33876

Closed
wants to merge 69 commits into from
69 commits
f76a0d3
base implementation
Dec 13, 2023
19eba07
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-lib
Dec 13, 2023
a960851
implement peek
Dec 13, 2023
17c17b9
prepare
Dec 13, 2023
6dd8815
various things
Dec 13, 2023
81cc214
tests and stuff
Dec 13, 2023
585631c
format
Dec 13, 2023
73b5b18
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-lib
Dec 14, 2023
7a96d61
hook into CI
Dec 14, 2023
796ca18
format
Dec 14, 2023
24b7df6
run integration_tests via airbyte-ci
alafanechere Dec 14, 2023
44bf767
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-lib
Dec 14, 2023
8734cfc
Merge branch 'flash1293/airbyte-lib' of github.com:airbytehq/airbyte …
Dec 14, 2023
4f349cc
fix CI invocation
Dec 14, 2023
aae69e5
clean up
Dec 14, 2023
90cbc7a
Merge branch 'master' into flash1293/airbyte-lib
Dec 15, 2023
ceea28d
Merge branch 'master' into flash1293/airbyte-lib
Dec 18, 2023
22f9504
Merge branch 'master' into flash1293/airbyte-lib
Dec 18, 2023
d384150
base airbyte-lib caches implementation
aaronsteers Dec 18, 2023
e5d76c9
initial commit - airbyte-lib caches implementation
aaronsteers Dec 18, 2023
6eceb7f
fix sql cache base name
aaronsteers Dec 19, 2023
38e7df8
base implementation
Dec 13, 2023
9ed829e
implement peek
Dec 13, 2023
4db998f
prepare
Dec 13, 2023
76485cf
various things
Dec 13, 2023
d432172
tests and stuff
Dec 13, 2023
2031de7
format
Dec 13, 2023
470a76d
hook into CI
Dec 14, 2023
e77fb36
format
Dec 14, 2023
9a64147
run integration_tests via airbyte-ci
alafanechere Dec 14, 2023
ba627ea
fix CI invocation
Dec 14, 2023
8906687
clean up
Dec 14, 2023
be42017
base airbyte-lib caches implementation
aaronsteers Dec 18, 2023
e341d96
initial commit - airbyte-lib caches implementation
aaronsteers Dec 18, 2023
605bc83
fix sql cache base name
aaronsteers Dec 19, 2023
6a854bf
Merge branch 'aj/airbyte-lib-caches-base' of https://github.com/airby…
aaronsteers Dec 19, 2023
e35b0c6
Merge branch 'master' into flash1293/airbyte-lib
aaronsteers Dec 19, 2023
9e003a9
airbyte-lib: Add path executor (#33600)
Dec 19, 2023
a5869e3
Merge branch 'flash1293/airbyte-lib' of github.com:airbytehq/airbyte …
Dec 19, 2023
89059c8
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-lib
Dec 19, 2023
8418dca
code format
Dec 19, 2023
6d11890
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-lib
Dec 19, 2023
babf999
mypy
Dec 19, 2023
d2e1e46
always show last log messages
Dec 19, 2023
8c86f9f
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-lib
Dec 20, 2023
e090490
add py.typed
Dec 20, 2023
4ca3a6b
add mypy checks
Dec 20, 2023
c6ec803
rename peek to read_stream
Dec 20, 2023
e69c568
refactor to match new API
Dec 20, 2023
19f675e
more refactoring
Dec 20, 2023
6d4278e
add header to registry request and fix test
Dec 20, 2023
966fe43
fix wrong variable used
Dec 20, 2023
878dbe0
rename classes, update dir structure
aaronsteers Dec 21, 2023
401e5b5
mypy fixes
aaronsteers Dec 21, 2023
9e9f61e
update from branch 'origin/flash1293/airbyte-lib'
aaronsteers Dec 21, 2023
7331d13
`poetry lock`
aaronsteers Dec 21, 2023
f210cbd
resolved all mypy issues
aaronsteers Dec 21, 2023
fa87002
add duckdb test (failing)
aaronsteers Dec 21, 2023
a60b3c0
checkpoint: working duckdb cache
aaronsteers Dec 22, 2023
21e41d9
checkpoint: all tests passing
aaronsteers Dec 23, 2023
5d51ea4
checkpoint: postgres passing test
aaronsteers Dec 23, 2023
17fcf28
update fixme comments
aaronsteers Dec 23, 2023
14b0408
add faker example script (not yet working)
aaronsteers Dec 23, 2023
7cb3e4b
set space-x example script to install-if-missing=true
aaronsteers Dec 23, 2023
d1e062e
Update airbyte-lib/airbyte_lib/file_writers/base.py
aaronsteers Dec 23, 2023
5099e58
small fixes
aaronsteers Dec 23, 2023
a3bd109
update docstring
aaronsteers Dec 23, 2023
492e254
update docstring
aaronsteers Dec 23, 2023
618282f
add devcontainer config
aaronsteers Jan 3, 2024
73 changes: 73 additions & 0 deletions .devcontainer/airbyte-lib/devcontainer.json
@@ -0,0 +1,73 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/python
{
  "name": "AirbyteLib DevContainer (Python)",

  // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
  "image": "mcr.microsoft.com/devcontainers/python:0-3.10",

  // Features to add to the dev container. More info: https://containers.dev/features.
  "features": {
    "ghcr.io/devcontainers-contrib/features/poetry:2": {},
    "ghcr.io/devcontainers/features/docker-in-docker": {},
    "ghcr.io/devcontainers/features/python:1": {
      "installJupyterlab": true
    }
  },

  "overrideFeatureInstallOrder": [
    // Deterministic order maximizes cache reuse
    "ghcr.io/devcontainers-contrib/features/poetry",
    "ghcr.io/devcontainers/features/docker-in-docker",
    "ghcr.io/devcontainers/features/python"
  ],
  // "workspaceFolder": "/workspaces/airbyte/airbyte-lib",

  // Configure tool-specific properties.
  "customizations": {
    "vscode": {
      "extensions": [
        // Python extensions:
        "charliermarsh.ruff",
        "matangover.mypy",
        "ms-python.python",
        "ms-python.vscode-pylance",
        // Toml support
        "tamasfe.even-better-toml",
        // Yaml and JSON Schema support:
        "redhat.vscode-yaml",
        // Contributing:
        "GitHub.vscode-pull-request-github"
      ],
      "settings": {
        "extensions.ignoreRecommendations": true,
        "git.autofetch": true,
        "git.openRepositoryInParentFolders": "always",
        "python.defaultInterpreterPath": "/workspaces/airbyte/airbyte-lib/.venv/bin/python",
        "python.interpreter.infoVisibility": "always",
        "python.terminal.activateEnvironment": true,
        "python.testing.pytestEnabled": true,
        "python.testing.cwd": "/workspaces/airbyte/airbyte-lib",
        "python.testing.pytestArgs": [
          "--rootdir=/workspaces/airbyte/airbyte-lib",
          "."
        ],
        "[python]": {
          "editor.defaultFormatter": "charliermarsh.ruff"
        }
      }
    }
  },
  "containerEnv": {
    "POETRY_VIRTUALENVS_IN_PROJECT": "true"
  },

  // Mark the root directory as 'safe' for git.
  "initializeCommand": "git config --add safe.directory /workspaces/airbyte",
  // Use 'postCreateCommand' to run commands after the container is created.
  "postCreateCommand": "python -m pip install jupyter notebook -U && cd airbyte-lib && poetry install"
  // Use 'forwardPorts' to make a list of ports inside the container available locally.
  // "forwardPorts": [],
  // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
  // "remoteUser": "root"
}
18 changes: 18 additions & 0 deletions .github/workflows/airbyte-ci-tests.yml
@@ -61,6 +61,9 @@ jobs:
- 'airbyte-ci/connectors/metadata_service/lib/**'
- 'airbyte-ci/connectors/metadata_service/orchestrator/**'
- '!**/*.md'
airbyte_lib:
  - 'airbyte-lib/**'
  - '!**/*.md'

- name: Run airbyte-ci/connectors/connector_ops tests
if: steps.changes.outputs.ops_any_changed == 'true'
@@ -132,3 +135,18 @@ jobs:
docker_hub_password: ${{ secrets.DOCKER_HUB_PASSWORD }}
airbyte_ci_binary_url: ${{ inputs.airbyte_ci_binary_url || 'https://connectors.airbyte.com/airbyte-ci/releases/ubuntu/latest/airbyte-ci' }}
tailscale_auth_key: ${{ secrets.TAILSCALE_AUTH_KEY }}

- name: Run airbyte-lib tests
  if: steps.changes.outputs.airbyte_lib_any_changed == 'true'
  id: run-airbyte-lib-tests
  uses: ./.github/actions/run-dagger-pipeline
  with:
    context: "pull_request"
    docker_hub_password: ${{ secrets.DOCKER_HUB_PASSWORD }}
    docker_hub_username: ${{ secrets.DOCKER_HUB_USERNAME }}
    gcs_credentials: ${{ secrets.METADATA_SERVICE_PROD_GCS_CREDENTIALS }}
    sentry_dsn: ${{ secrets.SENTRY_AIRBYTE_CI_DSN }}
    github_token: ${{ secrets.GH_PAT_MAINTENANCE_OCTAVIA }}
    subcommand: "test airbyte-lib"
    airbyte_ci_binary_url: ${{ inputs.airbyte_ci_binary_url || 'https://connectors.airbyte.com/airbyte-ci/releases/ubuntu/latest/airbyte-ci' }}
    tailscale_auth_key: ${{ secrets.TAILSCALE_AUTH_KEY }}
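The new step only runs when the `airbyte_lib` filter group reports a change, and every file this PR adds lives under the hyphenated `airbyte-lib/` directory. The sketch below approximates how an include pattern plus a `!` exclusion decide whether a group counts as changed. It uses Python's `fnmatch` as a stand-in, which is an assumption: the action's real glob matching may differ in edge cases, and `group_changed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def group_changed(changed_paths: list[str], patterns: list[str]) -> bool:
    """True if any path matches an include pattern and no '!' exclusion pattern."""
    include = [p for p in patterns if not p.startswith("!")]
    exclude = [p[1:] for p in patterns if p.startswith("!")]
    return any(
        any(fnmatchcase(path, pat) for pat in include)
        and not any(fnmatchcase(path, pat) for pat in exclude)
        for path in changed_paths
    )


code_change = ["airbyte-lib/airbyte_lib/_util.py"]
docs_change = ["airbyte-lib/README.md"]

# Underscore spelling: the path starts with "airbyte-lib/", so nothing matches.
print(group_changed(code_change, ["airbyte_lib/**", "!**/*.md"]))
# Hyphen spelling: the source file matches and is not excluded.
print(group_changed(code_change, ["airbyte-lib/**", "!**/*.md"]))
# A Markdown-only change is filtered out by the exclusion.
print(group_changed(docs_change, ["airbyte-lib/**", "!**/*.md"]))
```

Under these assumptions the three calls evaluate to `False`, `True`, and `False`, which is why the spelling of the directory in the filter matters for whether the tests ever trigger.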
1 change: 1 addition & 0 deletions airbyte-lib/.gitignore
@@ -0,0 +1 @@
.venv*
10 changes: 10 additions & 0 deletions airbyte-lib/README.md
@@ -0,0 +1,10 @@
# airbyte-lib

airbyte-lib is a library for running Airbyte syncs embedded in any Python application, without the need to run an Airbyte server.

## Development

* Make sure [Poetry is installed](https://python-poetry.org/docs/#).
* Run `poetry install`.
* For examples, check out the `examples` folder. They can be run via `poetry run python examples/<example file>`.
* Unit tests and type checks can be run via `poetry run pytest`.
13 changes: 13 additions & 0 deletions airbyte-lib/airbyte_lib/__init__.py
@@ -0,0 +1,13 @@

from .factories import get_connector, get_default_cache, new_local_cache
from .sync_results import Dataset, SyncResult
from .source import Source

__all__ = [
    "get_connector",
    "get_default_cache",
    "new_local_cache",
    "Dataset",
    "SyncResult",
    "Source",
]
66 changes: 66 additions & 0 deletions airbyte-lib/airbyte_lib/_util.py
@@ -0,0 +1,66 @@
"""Internal utility functions, especially for dealing with Airbyte Protocol."""

import datetime
from functools import lru_cache
from typing import Any, cast
from collections.abc import Iterable

from airbyte_protocol.models import (
AirbyteMessage,
AirbyteRecordMessage,
ConfiguredAirbyteCatalog,
ConfiguredAirbyteStream,
Type,
)


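def airbyte_messages_to_record_dicts(messages: Iterable[AirbyteMessage]) -> Iterable[dict[str, Any]]:
    """Convert AirbyteMessages to record dictionaries, skipping non-record messages."""
    yield from (
        cast(dict[str, Any], airbyte_message_to_record_dict(message))
        for message in messages
        # Non-record messages would convert to None, so filter them out here.
        if message is not None and message.type == Type.RECORD
    )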


def airbyte_message_to_record_dict(message: AirbyteMessage) -> dict[str, Any] | None:
    """Convert an AirbyteMessage to a dictionary.

    Return None if the message is not a record message.
    """
    if message.type != Type.RECORD:
        return None

    return airbyte_record_message_to_dict(message.record)

def airbyte_record_message_to_dict(record_message: AirbyteRecordMessage) -> dict[str, Any]:
    """Convert an AirbyteRecordMessage to a dictionary of its record data."""
    result = record_message.data

    # TODO: Add the metadata columns (this breaks tests)
    # result["_airbyte_extracted_at"] = datetime.datetime.fromtimestamp(
    #     record_message.emitted_at
    # )

    return result


def get_primary_keys_from_stream(
    stream_name: str, configured_catalog: ConfiguredAirbyteCatalog
) -> set[str]:
    """Get the primary keys from a stream in the configured catalog."""
    stream = next(
        (
            stream
            for stream in configured_catalog.streams
            if stream.stream.name == stream_name
        ),
        None,
    )
    if stream is None:
        raise ValueError(f"Stream {stream_name} not found in catalog.")

    return set(stream.stream.source_defined_primary_key or [])
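The helpers in this file do two things: keep only record messages from a message stream, and collect a stream's primary keys. The sketch below reproduces both ideas with stand-in dataclasses so it runs without `airbyte_protocol`. The stand-in types and the names `record_dicts` and `primary_key_names` are hypothetical. It also assumes primary keys arrive as a list of key paths (`list[list[str]]`), in which case each path has to be turned into a hashable value before it can go into a set; joining with `"."` is one possible choice, not necessarily the one this library makes.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Type(Enum):
    """Stand-in for the protocol's message type enum (only two members shown)."""

    RECORD = "RECORD"
    LOG = "LOG"


@dataclass
class RecordMessage:
    """Stand-in for a record payload."""

    stream: str
    data: dict[str, Any]


@dataclass
class Message:
    """Stand-in for a protocol message; `record` is set only for RECORD messages."""

    type: Type
    record: RecordMessage | None = None


def record_dicts(messages: Iterable[Message]) -> Iterator[dict[str, Any]]:
    """Yield the data dict of every record message and skip everything else."""
    for message in messages:
        if message.type is Type.RECORD and message.record is not None:
            yield message.record.data


def primary_key_names(key_paths: list[list[str]] | None) -> set[str]:
    """Flatten key paths such as [["id"], ["address", "zip"]] into dotted names."""
    return {".".join(path) for path in key_paths or []}


messages = [
    Message(Type.LOG),
    Message(Type.RECORD, RecordMessage("users", {"id": 1})),
]
print(list(record_dicts(messages)))  # [{'id': 1}]
print(sorted(primary_key_names([["id"], ["address", "zip"]])))  # ['address.zip', 'id']
```

Filtering on the message type before converting avoids ever producing `None` entries, so callers can rely on the iterator yielding dictionaries only.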
20 changes: 20 additions & 0 deletions airbyte-lib/airbyte_lib/caches/__init__.py
@@ -0,0 +1,20 @@
"""Base module for all caches."""

from airbyte_lib.caches.base import SQLCacheBase
from airbyte_lib.caches.duckdb import DuckDBCache, DuckDBCacheConfig
from airbyte_lib.caches.memory import InMemoryCache, InMemoryCacheConfig
from airbyte_lib.caches.postgres import PostgresCache, PostgresCacheConfig
from airbyte_lib.types import SQLTypeConverter


# We export these classes for easy access: `airbyte_lib.caches...`
__all__ = [
    "DuckDBCache",
    "DuckDBCacheConfig",
    "InMemoryCache",
    "InMemoryCacheConfig",
    "PostgresCache",
    "PostgresCacheConfig",
    "SQLCacheBase",
    "SQLTypeConverter",
]