Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

productionize community-ci workflow #37404

Merged

Conversation

alafanechere
Copy link
Contributor

@alafanechere alafanechere commented Apr 18, 2024

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/7315

This PR introduces a new community_ci.yml workflow to run connectors tests and format on fork PR.

How can this safely work as forks contain untrusted code ⁉️

  • We use pull_request_target triggers: it means the worfklow logic is the one hosted on the target branch. The fork can't alter the CI behavior. We can't use the pull_request event because: it does not provide access to secrets on forks and the workflow logic could be altered by the contribution.
  • We use GitHub environments with protection rule (community_ci): the execution of the test and format jobs is conditioned by a human approval from a pool of selected Airbyters. A human review should identify shady activity like secret extraction or resource exhaustion. If the code is deemed safe the Airbyte reviewer can approve the deployment to the environment, which would run CI jobs.
  • Github runners are ephemeral so there's no way of accessing artifacts produced by a previous job. We moreover created dedicated runners for community use cases.

Changes

  • I modified the community_ci.yml workflow from my spike to clean it up and make it work. This is the first thing you should review.
  • I made airbyte-ci accept a --git-repo-url option. This is to enable it to clone forked Airbyte repo to compute the list of modified files.
  • I changed the install-airbyte-ci.yml action to always install the binary when we're on a fork. This will prevent accidentally installing airbyte-ci from source if something shady is introduced to this tool.
  • I modified the other existing "non fork" workflow to early exit or skip if they're triggered by a fork. This should guarantee mutual exclusion of what's running in community_ci.yml and the other non fork workflows.

Why did you introduce a new workflow and not change the logic of the existing ones ⁉️

I felt that introducing a new workflow dedicated to Community CI was simpler to colocate all the specific (and a bit more complex) logic related to running CI on forks.
If we wanted to modify the other workflows to support running on forks it would require changing their trigger from pull_request to pull_request_target, which has some drawbacks. It would also introduce a lot more conditional expressions in workflows to check whether we're on a fork.

We can consider streamlining workflows to work on fork / non fork in the future, but I first want to unblock the community review use case in a simple and atomic manner.

Demo

Test and format correctly ran in this workflow execution on this fork.

When a push happens on the PR Github displays this button to start the approval flow:
Screenshot 2024-04-23 at 11 32 47

Copy link

vercel bot commented Apr 18, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Apr 25, 2024 8:26am

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @alafanechere and the rest of your teammates on Graphite Graphite

.github/workflows/community_ci.yml Fixed Show resolved Hide resolved
@alafanechere alafanechere marked this pull request as ready for review April 18, 2024 13:42
@alafanechere alafanechere requested a review from a team as a code owner April 18, 2024 13:42
Copy link

ellipsis-dev bot commented Apr 18, 2024

Your free trial has expired. To continue using Ellipsis, sign up at https://app.ellipsis.dev for $20/developer/month. If you have any questions, reach us at help@ellipsis.dev

@alafanechere alafanechere force-pushed the augustin/04-18-community-ci_workflow_prevent_injection branch 23 times, most recently from 13ff71d to c83d072 Compare April 22, 2024 11:18
@alafanechere alafanechere force-pushed the augustin/04-18-community-ci_workflow_prevent_injection branch 2 times, most recently from 4df719e to d2d622a Compare April 23, 2024 08:24
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worfklow was not used anywhere

@alafanechere alafanechere force-pushed the augustin/04-18-community-ci_workflow_prevent_injection branch 12 times, most recently from df29d55 to ba1b38a Compare April 23, 2024 09:36
@alafanechere alafanechere marked this pull request as ready for review April 23, 2024 10:02
@aaronsteers
Copy link
Collaborator

Regarding security, @natikgadzhi and @alafanechere , are you both confident enough in the security profile here that we can relax the GitHub setting to let workflows run freely on fork PRs? We just relaxed on PyAirbyte because this enables users constant feedback on those non-privileged workflows like lint, format, static code tests and non-cred-tests like unit tests.

Before bringing in any pull_request_target workflows, we should carefully consider if this limits our options to enable more frequent checks for users. (Today all workflows require approval, even though there is zero security risk. And we're adding a new security vulnerability here that we need to mitigate ourselves.)

@alafanechere
Copy link
Contributor Author

alafanechere commented Apr 23, 2024

Regarding security, @natikgadzhi and @alafanechere , are you both confident enough in the security profile here that we can relax the GitHub setting to let workflows run freely on fork PRs?

No I am not.

  • A. Anytime there's secret access there's a risk for secret exfiltration as we execute untrusted code.
  • B. Anytime we execute untrusted code there's a risk for resource abuse.

B can be easily mitigated with carefully picked timeouts and concurrency control.
A would require a rework of the worfklows and airbyte-ci to not require secrets for non-cred-tests. We currently run one single airbyte-ci command which runs all tests, so we pass secrets to it. We could rework the workflow for more granular execution. This is something I described in details in Phase 2 of my tech spec

In any case I believe this should be a follow up effort. We could start by not requiring secrets for format as it's a smaller lift compared to running unit tests automatically.

@alafanechere
Copy link
Contributor Author

Before bringing in any pull_request_target workflows, we should carefully consider if this limits our options to enable more frequent checks for users.

The pull_request_target trigger does not limit the checks frequency. It has the same triggering logic as pull_requests. What can throttle frequent checks is the use of Github environment with protection which requires human approval.

Copy link
Contributor

@natikgadzhi natikgadzhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving to get this moving, caveated on addressing feedback —I hope to have this in soon so we can then cleanup formatting and get @marcosmarxm to be very happy ;)

@@ -10,20 +10,28 @@ inputs:
description: "Path to airbyte-ci source"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it work without Python tho!? TIL

if [ "${{ github.event.pull_request.head.repo.fork }}" == "true" ]; then
echo "PR is from a fork. Exiting workflow..."
exit 78
fi
- name: Checkout Airbyte
uses: actions/checkout@v3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: check if new version of checkout is available. V4 I think?

permissions:
statuses: write
env:
MAIN_BRANCH_NAME: "augustin/04-18-community-ci_workflow_prevent_injection"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this needs to change before we merge?


# This will sync the .github folder of the main repo with the fork
# This allows us to use up to date actions from the main repo
- name: Pull .github folder from main repository
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so we want to ensure that they ran new versions of tests and such?

context: "pull_request"
dagger_cloud_token: ${{ secrets.DAGGER_CLOUD_TOKEN }}
dagger_cloud_token: ${{ secrets.DAGGER_CLOUD_TOKEN_2 }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the token here? How is it related? Is this one specific to community ci?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DAGGER_CLOUD_TOKEN_2 is the latest version of this secrets which works.
The previous token stopped working when we moved to GHA runners and I don't know why.
I kept both tokens as we did progressive migration from self hosted to GHA runners.

docker_hub_password: ${{ secrets.DOCKER_HUB_PASSWORD }}
docker_hub_username: ${{ secrets.DOCKER_HUB_USERNAME }}
gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }}
gcs_credentials: ${{ secrets.METADATA_SERVICE_PROD_GCS_CREDENTIALS }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this change a bit? Did we rename the variable before but didn't rename this particular input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just aligning the inputs to what is currently declared in the format_check.yml workflow.

subcommand: "connectors --modified test"
is_fork: "true"

format:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we remove this in favor of @aaronsteers's approach to formatting, given we'll clean that one up?

@alafanechere alafanechere force-pushed the augustin/04-18-community-ci_workflow_prevent_injection branch from ba1b38a to 5cc609a Compare April 25, 2024 08:22
@alafanechere alafanechere force-pushed the augustin/04-18-community-ci_workflow_prevent_injection branch from 5cc609a to 3023e79 Compare April 25, 2024 08:25
@alafanechere alafanechere enabled auto-merge (squash) April 25, 2024 08:27
@alafanechere alafanechere merged commit 71ebd4a into master Apr 25, 2024
36 checks passed
@alafanechere alafanechere deleted the augustin/04-18-community-ci_workflow_prevent_injection branch April 25, 2024 09:00
Copy link

sentry-io bot commented Apr 25, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ LiveError: Only one live display may be active at once pipelines.helpers.git in get_modified_files_in_... View Issue
  • ‼️ LiveError: Only one live display may be active at once rich.console in set_live View Issue
  • ‼️ ExecError: process "git checkout --track origin/04-26-bug_airbyte-ci_ensure_we_always_track_an_explicit_bran... pipelines.dagger.containers.git in checked_out_... View Issue

Did you find this useful? React with a 👍 or 👎

FVidalCarneiro pushed a commit to AgiData/airbyte that referenced this pull request May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants