Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create auto_merge package #38019

Merged
merged 3 commits into from
May 13, 2024
Merged

Conversation

alafanechere
Copy link
Contributor

@alafanechere alafanechere commented May 7, 2024

What

Relates to https://github.com/airbytehq/airbyte-internal-issues/issues/7554
Introduce a new internal package to automatically merge pull requests on the airbyte repo

How

Read the README.md please 🙏

User Impact

None, the tool is not used in a GHA workflow yet

Copy link

vercel bot commented May 7, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview May 13, 2024 10:13am

Copy link
Contributor Author

alafanechere commented May 7, 2024

@alafanechere alafanechere force-pushed the augustin/05-06-create_auto_merge_package branch from 2f072e3 to 4f8ed1b Compare May 7, 2024 11:38
@alafanechere alafanechere marked this pull request as ready for review May 7, 2024 11:40
@alafanechere alafanechere requested a review from a team as a code owner May 7, 2024 11:40
@alafanechere alafanechere requested a review from bleonard May 7, 2024 11:40
Copy link
Contributor

@natikgadzhi natikgadzhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, reviewing to unblock.

A few nits that I would love you to work through, but I assume there's a stack of PRs where you could clean those up.

@@ -15,3 +15,4 @@ The installation instructions for the `airbyte-ci` CLI tool cal be found here
| [`connectors_qa`](connectors/connectors_qa/) | A tool to verify connectors have sounds assets and metadata. |
| [`metadata_service`](connectors/metadata_service/) | Tools to generate connector metadata and registry. |
| [`pipelines`](connectors/pipelines/) | Airbyte CI pipelines, including formatting, linting, building, testing connectors, etc. Connector acceptance tests live here. |
| [`auto_merge`](connectors/auto_merge/) | A tool to automatically merge connector pull requests. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding it to the index!

Nit: can you also add the live testing tool to this index in here or separate quick PR?

@@ -0,0 +1,33 @@
# `auto_merge`

This python package is made to merge pull requests automatically on the Airbyte Repo. It is used in the [following workflow](TBD).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: don't forget to update this link (I'm assuming it's in the next PR?)

- All the required checks have passed


## Install and usage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prettier -w this.


[tool.airbyte_ci]
optional_poetry_groups = ["dev"]
poe_tasks = ["type_check", "lint",]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
poe_tasks = ["type_check", "lint",]
poe_tasks = ["type_check", "lint",]

@@ -0,0 +1,12 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.

from __future__ import annotations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from __future__ import annotations

I think from __future__ import annotations is not needed in Python 3.10?

AUTO_MERGE_LABEL = "auto-merge"
BASE_BRANCH = "master"
CONNECTOR_PATH_PREFIXES = {
"airbyte-integrations/connectors/",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this one has trailing /, the other two do not. I don't think this is intentional?



@contextmanager
def github_client() -> Iterator[Github]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

client.close()


def check_if_modifies_connectors_only(modified_files: Iterable[GithubFile]) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. Ideally, we'd like to verify that only files in a single connector are modified, but honestly I think this would be too granular to focus on here.

bool: True if the head commit passes all required checks, False otherwise
"""
successful_status_contexts = [commit_status.context for commit_status in head_commit.get_statuses() if commit_status.state == "success"]
successful_check_runs = [check_run.name for check_run in head_commit.get_check_runs() if check_run.conclusion == "success"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why, TIL you can get check status in Python code! This is very neat.

if not has_auto_merge_label:
logging.info(f"PR {pr.number} does not have the {AUTO_MERGE_LABEL} label")
return False
targets_main_branch = pr.base.ref == BASE_BRANCH
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you're declaring variables that are only used once on the very next line — let's just use the if statement directly?

I.e. if not pr.base.ref == BASE_BRANCH:. I don't think you will loose any readability with that.

Copy link
Contributor

@clnoll clnoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice @alafanechere! Is the idea that this will run nightly eventually? Should we send the summary to Slack?

import os

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
PRODUCTION = os.environ.get("AUTO_MERGE_PRODUCTION", "false").lower() == "true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required for this PR, but we have various ways that we define truthy in airbyte-ci. At some point we should probably try to make them consistent.

Copy link
Contributor

@bnchrch bnchrch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I dont want to get in the way so im approving now.

The only thing I'd strongly encourage is

  1. Tell users more about the auto merge label, what it is, where it comes from
  2. Consider containing our "What is a conenctor?" logic to our connector ops library.

Future footage of you merging auto merge 👇
lights-space-gif

This python package is made to merge pull requests automatically on the Airbyte Repo. It is used in the [following workflow](TBD).

A pull request is currently considered as auto-mergeable if:
- It has the `auto-merge` label
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Is this a github label? or a metadata field?

@@ -0,0 +1,33 @@
# `auto_merge`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 There could be a section on "Purpose" e.g. explain why we have automerge and that the goal is to merge connectors with as little manual airbyte intervention as possible

import os

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
PRODUCTION = os.environ.get("AUTO_MERGE_PRODUCTION", "false").lower() == "true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💎

logging.getLogger().setLevel(logging.INFO)


@contextmanager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💎 Oooh gettting fancy I see

bool: True if all modified files are in CONNECTOR_PATH_PREFIXES, False otherwise
"""
for file in modified_files:
if not any(file.filename.startswith(prefix) for prefix in CONNECTOR_PATH_PREFIXES):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Didn't we have this same logic in the connector ops package? If so should we introduce that as dependency so if we change where connectors are located we dont have to update so many places?

return required_checks.issubset(successful_contexts)


def check_if_pr_is_auto_mergeable(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Ok so here is the most important logic in the system. The business logic.

Its important that its

  1. Readable
  2. Easily extendable
  3. Contained to as few locations (files) as possible

I think youve achieved that so this is suggestion is very optional:

What if we used a validator pattern

def _validate_has_auto_merge_label((head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool, Optional[str]:
    has_auto_merge_label = any(label.name == AUTO_MERGE_LABEL for label in pr.labels)
    if not has_auto_merge_label:
        return False, f"PR {pr.number} does not have the {AUTO_MERGE_LABEL} label"

    return True, None
    
def check_if_pr_is_auto_mergeable(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool:
    validators = [
        _validate_has_auto_merge_label,
        # ...
    ]

    for pr_validator in validators:
        is_valid, error = pr_validator(head_commit, pr, required_checks)        
        if not is_valid
            logging.info(error)
            return False

(this is an example and could be improved (Named ValidationArgument|Result type, standard error messaging, etc) but the ideas there.)

Why this way?

  1. It lets us conditionally add/remove validations for a connector more easily. You can imagine that critical connectors or third party connectors might want to add or skip certain validations
  2. Validation logic is easier to test
  3. imo easier to skim and understand what validations run when/under what conditions

Anyway again very optional

logging.info(f"PR {pr.number} does not target {BASE_BRANCH}")
return False
touches_connectors_only = check_if_modifies_connectors_only(pr.get_files())
if not touches_connectors_only:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💅 Can we put some newlines space between validations?

logging.info(f"Processing PR {pr.number}")
head_commit = repo.get_commit(sha=pr.head.sha)
if check_if_pr_is_auto_mergeable(head_commit, pr, required_passing_contexts):
if not dry_run:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💎

time.sleep(github_client.rate_limiting_resettime - time.time())


def generate_job_summary_as_markdown(merged_prs: list[PullRequest]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💅 Could move to a helper file

@bleonard
Copy link
Contributor

The main thing I'm curious about: if it takes a label anyway that suggests these PRs are auto-created with a nuance or someone clicks the label button. If that's the case, why not use the auto-merge feature of github?
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/automatically-merging-a-pull-request

@alafanechere
Copy link
Contributor Author

The main thing I'm curious about: if it takes a label anyway that suggests these PRs are auto-created with a nuance or someone clicks the label button. If that's the case, why not use the auto-merge feature of github? https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/automatically-merging-a-pull-request

@bleonard I think the main nuance here is that this package gives us more flexibility as it can override the branch protection rules the normal "auto merge" GH feature follows.
For instance, it's currently required for any PR to get a review from an org member. With this package we can implement a bit more complex and flexible rules compared to Github auto merge, and decide to make the package merge the PR without review if it meets a certain set of criteria. I also think it's sane to keep the current branch protection rules we have for all PRs.

@alafanechere
Copy link
Contributor Author

Nice @alafanechere! Is the idea that this will run nightly eventually? Should we send the summary to Slack?

Yes this will run on a nightly / weekly basis. I think we could send the summary to slack. Might a logic to add to the GHA workflow and not the python package itself.

@alafanechere alafanechere force-pushed the augustin/05-06-create_auto_merge_package branch from 4f8ed1b to da21c54 Compare May 13, 2024 09:55
@alafanechere alafanechere enabled auto-merge (squash) May 13, 2024 10:13
@alafanechere alafanechere merged commit 9179ec5 into master May 13, 2024
34 checks passed
@alafanechere alafanechere deleted the augustin/05-06-create_auto_merge_package branch May 13, 2024 10:30
@bleonard
Copy link
Contributor

bleonard commented May 13, 2024

For instance, it's currently required for any PR to get a review from an org member. With this package we can implement a bit more complex and flexible rules compared to Github auto merge

The flexibility makes sense for sure. It can't hurt and the footprint is small. I was expecting, and still assume, this flexibility should be built into the actual tests that get run on the connector. I think this is the case now that long-tail connectors aren't requiringt hat review. Or we could make it so. The we'd know from metadata which connectors gets the full CAT tests (because we have sandboxes) and which ones don't. It's useful for "green" to mean something, basically. So I would assume most of the logic is in the tests that get run not in the merging tool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants