
Airbyte-ci: Generate binary files for airbyte-ci #31930

Merged
merged 41 commits into master from bnchrch/ci/binary-generation-pyinstaller on Nov 14, 2023

Conversation

Contributor

@bnchrch bnchrch commented Oct 27, 2023

Goal

Provide binaries for use in CI and locally

Anti Goal(s)

  1. Updating the CI runners to use these binaries
  2. Using this in place of the pipx install step

How

  1. Adds pyinstaller as a dev dependency
  2. Adds the poetry command `poetry run poe build-release-binary`, which generates a standalone binary bundling Python and all dependencies (see the sketch after this list)
  3. Merges `dagger run` into `airbyte-ci` and removes `airbyte-ci-internal`
  4. Adds an experimental GitHub action for building binaries for different platforms
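For illustration, the build essentially boils down to a PyInstaller invocation like the sketch below, expressed through PyInstaller's programmatic entry point and using the flags that appear in the experimental GitHub Actions workflow later in this PR; the actual `build-release-binary` poe task may be configured differently:

# Sketch only: mirrors the flags used in the experimental workflow below;
# the real poe task is defined in pyproject.toml and may differ.
import PyInstaller.__main__

PyInstaller.__main__.run([
    "--collect-all", "pipelines",    # pyinstaller does not find all deps on its own
    "--collect-all", "beartype",
    "--collect-all", "dagger",
    "--hidden-import", "strawberry",
    "--name", "airbyte-ci",
    "--onefile",                     # emit a single standalone binary
    "pipelines/cli/airbyte_ci.py",
])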

Gotchas/Learnings/Deadends

  1. This took much longer than I would have liked, and I became a victim of the sunk cost fallacy
  2. Nuitka was a dead end
  3. pyinstaller does not collect dependencies well
  4. Dagger was a dead end: there was no adequate way to build a macOS binary using dagger (or at least not without some pain)

Later PRs

  1. Download inside of airbyte-ci
  2. Build on merge to master
  3. Download the appropriate binary via the makefile and place it on the path
  4. Sign macOS binaries


@bnchrch bnchrch force-pushed the bnchrch/ci/binary-generation-pyinstaller branch from 9396814 to d64044b Compare October 31, 2023 19:21
@bnchrch bnchrch marked this pull request as ready for review October 31, 2023 19:31
@bnchrch bnchrch requested review from a team, alafanechere and erohmensing October 31, 2023 19:31
Contributor

@evantahler evantahler left a comment


Nice! I like that this is an experiment too, so we can slowly roll this out

ci_gcs_credentials_secret: dagger.Secret,
ci_artifact_bucket_name: str,
image: str,
output_filename: str
Contributor

... what will the output_filename be? Ideally, it's something like airbyte-ci-{arch}-{version}

Contributor Author

Originally it would have just been airbyte-ci-debian or airbyte-ci-macos; however, I updated it to be airbyte-ci-amd64-1.2.3 and airbyte-ci-arm64-1.2.3

Contributor

@erohmensing erohmensing left a comment

I'm chock full of questions today 🤠

@@ -123,7 +123,7 @@ At this point you can run `airbyte-ci` commands.

| Option | Default value | Mapped environment variable | Description |
| --------------------------------------- | ------------------------------- | ----------------------------- | ------------------------------------------------------------------------------------------- |
| `--no-tui` | | | Disables the Dagger terminal UI. |
| `--enable-dagger-run/--disable-dagger-run` | `--enable-dagger-run` | | Disables the Dagger terminal UI. |
Contributor

Nice! I was worried about merging airbyte CI and airbyte CI internal but this works

@@ -408,6 +408,7 @@ This command runs the Python tests for a airbyte-ci poetry package.
## Changelog
| Version | PR | Description |
| ------- | ---------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- |
| 2.6.0 | [#31974](https://github.com/airbytehq/airbyte/pull/31974) | Merge airbyte-ci-internal into airbyte-ci |
Contributor

❓ was this necessary to get the binaries working? because of the scripts, or what?

Contributor

Also, we should mention the new command!

Contributor Author

Yes, it was. Basically the airbyte-ci entry point relied on airbyte-ci-internal being a separate command. With a binary we can (and technically should) only build one, so we had to merge the two into one.

Contributor

Got it, makes sense

Comment on lines 172 to 175
# by checking if sys.argv contains the dagger run command
# e.g. `dagger run aibyte-ci connectors publish`

called_with_dagger_run = os.getenv("_DAGGER_WRAP_APPLIED") == "true"
Contributor

This code does not seem to check what the comment says it does

ci_gcs_credentials: str,
ci_job_key: str,
s3_build_cache_access_key_id: str,
s3_build_cache_secret_key: str,
show_dagger_logs: bool,
): # noqa D103
ctx.ensure_object(dict)


if enable_dagger_run and not is_current_process_wrapped_by_dagger_run():
Contributor

In what case do we start out not wrapped by dagger run, if we only have the one entrypoint now?

Contributor

Actually, I take it back, looks like this is now the entrypoint, so we'll almost always not be wrapped in dagger run and we do the wrapping here. In which case... is this pretty much only to test that someone didn't append dagger run themselves?

Contributor

If we're the only ones adding that env var I'm not sure how that works either 🤔



if enable_dagger_run and not is_current_process_wrapped_by_dagger_run():
main_logger.info("Re-Running airbyte-ci with dagger run.")
Contributor

I think this log might be a bit confusing to users when it does this each time by default 🤔

@@ -88,16 +87,12 @@ def check_dagger_cli_install() -> str:
return check_dagger_cli_install()
return dagger_path


def main():
set_working_directory_to_root()
Contributor

woohoo, no longer doing this in 2 places!

Contributor

@alafanechere alafanechere left a comment

Requesting changes because I think a minor code restructuring would help with maintaining this.
I'm also not sure about the release flow you have in mind.

  • Is it supposed to be a manual release that we run locally?
  • Would it be a reasonable effort in the scope of this PR to create a workflow that calls the airbyte-ci release command when airbyte-ci's pyproject.toml changes on the master branch? (release on merge)

Comment on lines 8 to 11
PLATFORMS_TO_BUILD = {
"amd64": "python:3.10-slim",
"arm64": "arm64v8/python:3.10-slim",
}
Contributor

A mapping between platform and image should not be required.
The python:3.10-slim tag already has platform digests for amd64 and arm64.
It's the dagger client's role to figure out which digest to use according to the platform you build your container for. See my other comment on line 40


container = await (
dagger_client
.container()
Contributor

Suggested change
.container()
.container(platform=dagger.Platform(f"linux/{platform_name}"))
.from("python:3.10-slim")

Contributor Author

ah! I did not know this! Perfect!

from pipelines.dagger.actions.remote_storage import upload_to_gcs

PLATFORMS_TO_BUILD = {
"amd64": "python:3.10-slim",
Contributor

Would it be "smart" to use or python connector base image as it really has poetry?
Might not be a good idea as our base image is made to build connectors..

Contributor Author

🤔 Ideally I would like to leave it separate, but I do like the speed gains of not having to install poetry. Sure why not!

Comment on lines 71 to 72
.with_exec([f"./dist/{artifact_name}", "--version"])
.with_exec(["ls", "-la", "dist"])
Contributor

These are "checks" command, can we:

  1. await on container (without the check command): it will trigger the bin generation: if the bin gen fails an ExecError will be thrown
  2. await on container.with_exec(). If the command exec fails an ExecError will be thrown but we'll know that it's these specific check command that return a 1 status code.
  3. Upload: upload is actually executed when we know that the bin gen succedded.

It'll help avoid lazy execution surprises and simplify debugging.
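A minimal sketch of that staged execution, assuming the dagger Python SDK (Container.sync() raising dagger.ExecError on failed execs); the container and artifact_name names come from the surrounding discussion, not the actual PR code:

import dagger

async def build_check_then_upload(container: dagger.Container, artifact_name: str) -> dagger.File:
    # 1. Await the build container alone: a failure here means the binary generation itself failed.
    try:
        container = await container.sync()
    except dagger.ExecError as e:
        raise RuntimeError("airbyte-ci binary generation failed") from e
    # 2. Run the check commands in their own execs so their failures are clearly distinguishable.
    try:
        await container.with_exec([f"./dist/{artifact_name}", "--version"]).sync()
        await container.with_exec(["ls", "-la", "dist"]).sync()
    except dagger.ExecError as e:
        raise RuntimeError("airbyte-ci binary sanity checks failed") from e
    # 3. Only hand the binary over for upload once both steps above succeeded.
    return container.file(f"dist/{artifact_name}")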

"arm64": "arm64v8/python:3.10-slim",
}

DIRECTORIES_TO_MOUNT = [".git", "airbyte-ci"]
Contributor

Why do we mount .git?
Is it because when we check the bin we run --version which has to be run inside our git repo?

I would suggest not mounting the .git dir at binary generation time if it's not required, and running the checks inside platform-specific "check" containers after the bin generation.

On these "check" containers we'd mount:

  • The bin
  • The airbyte repo, not using .git but using the directory representation of our repo via the Dagger client.git method.
    It will make this airbyte-ci release command run faster locally, as .git is pretty fat locally.

CURRENT_VERSION = importlib.metadata.version("pipelines")


async def create_airbyte_ci_release(
Contributor

I'd appreciate three different functions to split responsibility (see the sketch below):

  • generate_binary(airbyte_repo: Directory, platform: dagger.Platform) -> runs the pyinstaller command
  • check_binary(airbyte_ci_binary: File, platform: dagger.Platform) -> mounts the binary in platform-specific containers with the airbyte repo and runs sanity checks. Returns a boolean indicating whether the check succeeds
  • upload_binary(airbyte_ci_binary: File, platform: dagger.Platform) -> uploads the binary to GCS

run_releases would call these functions for each platform and would first retrieve the current airbyte repo with dagger_client.git
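A signature-level sketch of that split, assuming the dagger Python SDK types; the names simply mirror the comment above and the bodies are elided:

import dagger

async def generate_binary(airbyte_repo: dagger.Directory, platform: dagger.Platform) -> dagger.File:
    """Run the pyinstaller command for the given platform and return the built binary."""
    ...

async def check_binary(airbyte_ci_binary: dagger.File, platform: dagger.Platform) -> bool:
    """Mount the binary and the airbyte repo in a platform-specific container and run sanity checks."""
    ...

async def upload_binary(airbyte_ci_binary: dagger.File, platform: dagger.Platform) -> None:
    """Upload a checked binary to the GCS artifact bucket."""
    ...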

Comment on lines 83 to 89
await upload_to_gcs(
dagger_client=dagger_client,
file_to_upload=binary_file,
key=file_path,
bucket=ci_artifact_bucket_name,
gcs_credentials=ci_gcs_credentials_secret,
)
Contributor

Can we explicitly log the public GCS URL of the uploaded binaries?

"--onefile",
"pipelines/cli/airbyte_ci.py",
])
.with_exec([f"./dist/{artifact_name}", "--version"])
Contributor

Can we check that the --version output matches the CURRENT_VERSION?
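A hedged sketch of that check, assuming the dagger Python SDK's Container.stdout() and the CURRENT_VERSION constant defined above via importlib.metadata; the function and parameter names here are hypothetical:

import importlib.metadata

import dagger

CURRENT_VERSION = importlib.metadata.version("pipelines")

async def check_binary_version(container: dagger.Container, artifact_name: str) -> None:
    # Run the freshly built binary and compare its reported version to the package version.
    output = await container.with_exec([f"./dist/{artifact_name}", "--version"]).stdout()
    if CURRENT_VERSION not in output:
        raise RuntimeError(f"binary reports {output.strip()!r}, expected version {CURRENT_VERSION}")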

Comment on lines 1 to 32
name: Connector Ops CI - Experimental Airbyte CI Release

# Note this is a workflow simply to test out if github can build using macos

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

on:
  workflow_dispatch:
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: ['ubuntu-latest', 'macos-latest']

    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.10

      - run: curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -

      - run: cd airbyte-ci/pipelines/airbyte_ci
      - run: poetry install --with dev
      - run: poetry run pyinstaller --collect-all pipelines --collect-all beartype --collect-all dagger --hidden-import strawberry --name airbyte-ci-${{ matrix.os }} --onefile pipelines/cli/airbyte_ci.py
      - uses: actions/upload-artifact@v2
        with:
          path: dist/*
Contributor

Can we remove this workflow if it's not working for macos-latest?

Contributor Author

Ah! Sorry, I could've explained this better. The reason I have this GitHub action is to test whether GitHub's macos-latest image can successfully build a macOS binary (given that our amd64 approach cannot).

airbyte-ci/connectors/pipelines/README.md (resolved)
@bnchrch bnchrch marked this pull request as ready for review November 10, 2023 00:10
@bnchrch
Contributor Author

bnchrch commented Nov 10, 2023

@alafanechere Ready for your re-review!

This has been updated to use GitHub Actions exclusively, and I've deleted the airbyte-ci release pipeline.

You can build and download the macOS binary by:

  1. Running this GHA action: https://github.com/airbytehq/airbyte/actions/workflows/airbyte-ci-release-experiment.yml
  2. Targeting this branch: bnchrch/ci/binary-generation-pyinstaller
  3. When done, downloading the macOS artifact to your machine
  4. Copying it to your airbyte repo
  5. Running ./airbyte-ci-macos-latest connectors --support-level=community list
  6. It will fail due to a permission issue! Just go to your security settings and click allow. (here's a how-to)

Contributor

@alafanechere alafanechere left a comment

I'm only requesting changes for the GHA workflow; I'd like to make sure we can't overwrite a released version from a branch.

uses: google-github-actions/upload-cloud-storage@v1
with:
  path: airbyte-ci/connectors/pipelines/dist/${{ steps.get_artifact_name.outputs.artifact_name}}
  destination: dev-airbyte-cloud-connector-metadata-service/airbyte-ci/releases/${{ steps.get_artifact_name.outputs.artifact_name}}
Contributor

Can we make the destination bucket / path a workflow input?
Would be nice to be able to safely write to pre-releases, to make sure an upload from a PR without a bumped airbyte-ci version never overwrites a previous artifact. An alternative way to avoid accidental overwrites would be to add a commit id postfix.

@@ -30,7 +30,6 @@ MANIFEST
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
Contributor

Could you share why it's not gitignored anymore?

Contributor Author

Ah, this was an artifact from attempting to use spec files as the way to define pyinstaller configuration.

will revert

@@ -56,7 +57,7 @@ def display_welcome_message() -> None:
)


def check_up_to_date() -> bool:
def check_up_to_date(throw_as_error=False) -> bool:
Contributor

nit:
I'd suggest making this function always succeed (return true or false) and implementing the raising logic at the caller level.
It'd be better to throw a ClickException, as it has a show method to cleanly output errors to stderr.
I think taking this approach would help build more environment checks in a consolidated way.
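A minimal sketch of that caller-level approach, assuming click is available; check_up_to_date here is a stand-in for the existing function:

import click

def check_up_to_date() -> bool:
    """Stand-in for the existing environment check; assumed to simply return a bool."""
    return True

def require_up_to_date() -> None:
    # The raising happens at the caller level; ClickException.show() prints cleanly to stderr.
    if not check_up_to_date():
        raise click.ClickException("Your airbyte-ci version is out of date, please reinstall it.")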

@@ -206,6 +212,38 @@ def check_local_docker_configuration():
)


def is_dagger_run_enabled_by_default() -> bool:
dagger_run_by_default = [
Contributor

Can we store it in a constant variable at module level?

@@ -72,6 +72,10 @@ def get_dagger_cli_version(dagger_path: Optional[str]) -> Optional[str]:


def check_dagger_cli_install() -> str:
Contributor

nit: shall we rename this to get_dagger_path that would wrap subfunctions:

  • Checking if the current dagger cli version is the correct one
  • Installing it if needed
  • Returning the current dagger cli path

Comment on lines +121 to +130
@functools.cache
def get_git_repo() -> git.Repo:
"""Retrieve the git repo."""
return git.Repo(search_parent_directories=True)


@functools.cache
def get_git_repo_path() -> str:
"""Retrieve the git repo path."""
return get_git_repo().working_tree_dir
Contributor

💎
Can we ensure the current git repo is the airbyte one?
We would check the remote url and fail if it's not airbyte?

remote_url = repo.remotes[0].config_reader.get("url") 
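A hedged sketch of that guard, building on the snippet above; the exact exception type and the remote URL substring to match are assumptions:

import functools

import git

@functools.cache
def get_git_repo() -> git.Repo:
    """Retrieve the git repo, failing fast if it is not the airbyte repo."""
    repo = git.Repo(search_parent_directories=True)
    remote_url = repo.remotes[0].config_reader.get("url")
    if "airbytehq/airbyte" not in remote_url:
        raise RuntimeError(f"airbyte-ci must be run from the airbyte repo, but the remote is {remote_url}")
    return repo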

Contributor

Can we explain to airbyte-ci developers how to build the binary locally in this file?

Comment on lines +103 to +110
def call_current_command_with_dagger_run():
mark_dagger_wrap()
if (os.environ.get("AIRBYTE_ROLE") == "airbyter") or (os.environ.get("CI") == "True"):
os.environ[DAGGER_CLOUD_TOKEN_ENV_VAR_NAME_VALUE[0]] = DAGGER_CLOUD_TOKEN_ENV_VAR_NAME_VALUE[1]

exit_code = 0
if len(sys.argv) > 1 and any([arg in ARGS_DISABLING_TUI for arg in sys.argv]):
command = ["airbyte-ci-internal"] + [arg for arg in sys.argv[1:] if arg != "--no-tui"]
else:
dagger_path = check_dagger_cli_install()
command = [dagger_path, "run", "airbyte-ci-internal"] + sys.argv[1:]
dagger_path = check_dagger_cli_install()
command = [dagger_path, "run"] + sys.argv
Contributor

nit: I think this change breaks the feature of running airbyte-ci from nested airbyte subfolders if the binary is not in the shell path, e.g. running:

cd airbyte-ci/connectors/pipelines
./dist/airbyte-ci connectors --name=source-faker test

Output:

 INFO     pipelines.cli.dagger_run: Running command: ['/Users/augustin/bin/dagger', 'run', './dist/airbyte-ci', 'connectors', '--name=source-faker', 'test']                       dagger_run.py:113
█ [0.00s] ERROR ./dist/airbyte-ci connectors --name=source-faker test

This is not happening after I added airbyte-ci to my path (which the local install process should do)

@alafanechere alafanechere force-pushed the bnchrch/ci/binary-generation-pyinstaller branch from 3279e55 to f72546d Compare November 13, 2023 12:15
@alafanechere alafanechere force-pushed the bnchrch/ci/binary-generation-pyinstaller branch from ab3a5b1 to 325fd2b Compare November 13, 2023 13:35
Contributor

@alafanechere alafanechere left a comment

@bnchrch I made some improvements to the workflow:

  • Make sure we always check out the same commit for the ubuntu and macOS builds (by setting ref in the checkout action)
  • Upload pre-release binaries with the commit sha, to avoid accidental overwrites of “pre-releases” when folks are iterating on different branches
  • Pre-release binaries are uploaded to the dev bucket
  • Version + latest binaries are uploaded to the prod bucket (only uploaded when the workflow runs on master)
  • On master:
    • Upload a per version artifact and a latest artifact: this can be helpful if we’d like to pin CI / users to a specific version
  • I made minor filename / path changes:
    • I changed the GCS paths to airbyte-ci/releases/<os>/<latest/version/sha>/airbyte-ci
    • I removed the -latest suffix on the os name to avoid confusing postfixes
    (screenshot of the resulting GCS paths, 2023-11-13)
  • Print uploaded artifact public url in the workflow for convenience
  • Change fetch-depth back to default to not fetch unrelated branches

You can check how the new workflow looks on this run

@alafanechere
Contributor

✅ Local install and run still works
✅ CI connector test still works
✅ The publish pipeline still works

@alafanechere alafanechere merged commit 0e47bda into master Nov 14, 2023
25 checks passed
@alafanechere alafanechere deleted the bnchrch/ci/binary-generation-pyinstaller branch November 14, 2023 08:45