Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

airbyte-ci: remove connector secrets hack for --is-local #33972

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion airbyte-ci/connectors/pipelines/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -521,7 +521,8 @@ E.G.: running `pytest` on a specific test folder:

| Version | PR | Description |
| ------- | ---------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| 3.1.1 | [#33979](https://github.com/airbytehq/airbyte/pull/33979) | Fix AssertionError on report existence again |
| 3.1.2 | [#33972](https://github.com/airbytehq/airbyte/pull/33972) | Remove secrets scrubbing hack for --is-local and other small tweaks. |
| 3.1.1 | [#33979](https://github.com/airbytehq/airbyte/pull/33979) | Fix AssertionError on report existence again |
| 3.1.0 | [#33994](https://github.com/airbytehq/airbyte/pull/33994) | Log more context information in CI. |
| 3.0.2 | [#33987](https://github.com/airbytehq/airbyte/pull/33987) | Fix type checking issue when running --help |
| 3.0.1 | [#33981](https://github.com/airbytehq/airbyte/pull/33981) | Fix issues with deploying dagster, pin pendulum version in dagster-cli install |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,6 @@
#

import asyncclick as click
from pipelines.airbyte_ci.metadata.pipeline import run_metadata_orchestrator_deploy_pipeline
from pipelines.cli.dagger_pipeline_command import DaggerPipelineCommand

# MAIN GROUP
Expand All @@ -24,6 +23,9 @@ def deploy(ctx: click.Context) -> None:
@deploy.command(cls=DaggerPipelineCommand, name="orchestrator", help="Deploy the metadata service orchestrator to production")
@click.pass_context
async def deploy_orchestrator(ctx: click.Context) -> None:
# Import locally to speed up CLI.
from pipelines.airbyte_ci.metadata.pipeline import run_metadata_orchestrator_deploy_pipeline
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shaves 4 seconds off of the execution time for airbyte-ci --help, bringing it down to under a second.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a clue why the import is so slow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not this import in particular, it's all of the transitive imports in pipeline.py and beyond.


await run_metadata_orchestrator_deploy_pipeline(
ctx.obj["is_local"],
ctx.obj["git_branch"],
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -137,45 +137,16 @@ async def get_connector_secrets(context: ConnectorContext) -> dict[str, Secret]:


async def mounted_connector_secrets(context: ConnectorContext, secret_directory_path: str) -> Callable[[Container], Container]:
# By default, mount the secrets properly as dagger secret files.
#
# This will cause the contents of these files to be scrubbed from the logs. This scrubbing comes at the cost of
# unavoidable latency in the log output, see next paragraph for details as to why. This is fine in a CI environment
# however this becomes a nuisance locally: the developer wants the logs to be displayed to them in an as timely
# manner as possible. Since the secrets aren't really secret in that case anyway, we mount them in the container as
# regular files instead.
#
# The buffering behavior that comes into play when logs are scrubbed is both unavoidable and not configurable.
# It's fundamentally unavoidable because dagger needs to match a bunch of regexes (one per secret) and therefore
# needs to buffer at least as many bytes as the longest of all possible matches. Still, this isn't that long in
# practice in our case. The real problem is that the buffering is not configurable: dagger relies on a golang
# library called transform [1] to perform the regexp matching on a stream and this library hard-codes a buffer
# size of 4096 bytes for each regex [2].
#
# Remove the special local case whenever dagger implements scrubbing differently [3,4].
#
# [1] https://golang.org/x/text/transform
# [2] https://cs.opensource.google/go/x/text/+/refs/tags/v0.13.0:transform/transform.go;l=130
# [3] https://github.com/dagger/dagger/blob/v0.6.4/cmd/shim/main.go#L294
# [4] https://github.com/airbytehq/airbyte/issues/30394
#
connector_secrets = await context.get_connector_secrets()
if context.is_local:
# Special case for local development.
# Query dagger for the contents of the secrets and mount these strings as files in the container.
contents = {}
for secret_file_name, secret in connector_secrets.items():
contents[secret_file_name] = await secret.plaintext()
"""Returns an argument for a dagger container's with_ method which mounts all connector secrets in it.

def with_secrets_mounted_as_regular_files(container: Container) -> Container:
container = container.with_exec(["mkdir", "-p", secret_directory_path], skip_entrypoint=True)
for secret_file_name, secret_content_str in contents.items():
container = container.with_new_file(
f"{secret_directory_path}/{secret_file_name}", contents=secret_content_str, permissions=0o600
)
return container
Args:
context (ConnectorContext): The context providing a connector object and its secrets.
secret_directory_path (str): Container directory where the secrets will be mounted, as files.

return with_secrets_mounted_as_regular_files
Returns:
fn (Callable[[Container], Container]): A function to pass as argument to the connector container's with_ method.
"""
connector_secrets = await context.get_connector_secrets()

def with_secrets_mounted_as_dagger_secrets(container: Container) -> Container:
container = container.with_exec(["mkdir", "-p", secret_directory_path], skip_entrypoint=True)
Expand Down
13 changes: 5 additions & 8 deletions airbyte-ci/connectors/pipelines/pipelines/helpers/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@
import anyio
import asyncclick as click
import asyncer
from dagger import Client, Config, Container, ExecError, File, ImageLayerCompression, Platform, QueryError, Secret
from dagger import Client, Config, Container, ExecError, File, ImageLayerCompression, Platform, Secret
from more_itertools import chunked

if TYPE_CHECKING:
Expand Down Expand Up @@ -101,13 +101,10 @@ async def get_file_contents(container: Container, path: str) -> Optional[str]:
Returns:
Optional[str]: The file content if the file exists in the container, None otherwise.
"""
try:
return await container.file(path).contents()
except QueryError as e:
if "no such file or directory" not in str(e):
# this error could come from a network issue
raise
return None
dir_name, file_name = os.path.split(path)
if file_name not in set(await container.directory(dir_name).entries()):
return None
return await container.file(path).contents()
Comment on lines +104 to +107
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The precondition here is that path is a file path and not a dir path, I'm not sure its enforced but I'm fine with it, Dagger will raise an exception if the path leads to a directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that raising an exception is fine in that case, get_file_contents really makes no sense if applied to a directory.



@contextlib.contextmanager
Expand Down
2 changes: 1 addition & 1 deletion airbyte-ci/connectors/pipelines/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "pipelines"
version = "3.1.1"
version = "3.1.2"
description = "Packaged maintained by the connector operations team to perform CI for connectors' pipelines"
authors = ["Airbyte <contact@airbyte.io>"]

Expand Down
Loading