Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connectors-ci: get modified files from PR API #26889

Merged
merged 12 commits into from
Jun 1, 2023

Conversation

alafanechere
Copy link
Contributor

@alafanechere alafanechere commented Jun 1, 2023

What

In our connectors CI pipeline we originally get the list of modified files with git diff operation.
These operations are quite expensive, especially in the CI because we run the git diff operation in a dagger container that is checkouting airbyte's repo.

In a pull request context the GitHub API provides the list of modified files in the pull request.
We can use this list instead of running git diff to speed up pipelines running on PRs.

The current pull request is an attempt at doing so.
I also did an additional operation:
We previously alway ran two different git diff commands: one to get the list of modified files in branch, the other one to get the list of modified files in the last commit.
I slightly changed this logic: we get the list of modified files in the last commits only when we're on master. When we're on a branch we git diff the full branch.

Before on source-pokeapi: total run duration 3min
After change on source-pokeapi: total run duration 2mn

How

User impact

Faster test run on PR: expect 1 minute speed gain.

@alafanechere alafanechere marked this pull request as ready for review June 1, 2023 08:53
@alafanechere alafanechere force-pushed the augustin/connectors-ci/modified-files-from-pr branch from 762b28a to 04b17f5 Compare June 1, 2023 09:08
@octavia-squidington-iii octavia-squidington-iii removed the area/connectors Connector related issues label Jun 1, 2023
@alafanechere alafanechere requested a review from bnchrch June 1, 2023 09:27
@alafanechere alafanechere requested a review from a team June 1, 2023 16:08
Copy link
Contributor

@evantahler evantahler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pros - faster
Cons - our pipelines are even more resilient on Github... but they already are, so 🤷

@@ -24,6 +24,7 @@

if TYPE_CHECKING:
from ci_connector_ops.pipelines.contexts import ConnectorContext
from github import PullRequest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting some curl/API work in this PR. I guess this just handles it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lib is pretty good at wrapping the GitHub API for typed interaction.

Copy link
Contributor

@erohmensing erohmensing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dope, I didn't know that the pull request context has a list of changed files. Learn something new every day :D

git_branch: str, git_revision: str, diffed_branch: str, is_local: bool, ci_context: CIContext, pull_request: PullRequest
) -> List[str]:
"""Get the list of modified files in the current git branch.
If the current branch is master, it will return the list of modified files in the current commit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Maybe clarify that this "current commit" is the squash commit of the branch whose merge kicked off the run

The reason I mention this is that if we decided to switch from squash commits to a full history (e.g. rebase instead of merge), I'm not sure if this "last commit" logic would work anymore. Its an edge case not worth writing more code for, but a reminder of the context that allows us to do it this way may be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I improved the docstring, thank you for the suggestion.

@alafanechere
Copy link
Contributor Author

Cons - our pipelines are even more resilient on Github... but they already are, so 🤷

True, but we can easily fallback on the git diff

@alafanechere alafanechere enabled auto-merge (squash) June 1, 2023 17:07
Copy link
Contributor

@bnchrch bnchrch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love some more speed!

@alafanechere alafanechere enabled auto-merge (squash) June 1, 2023 19:37
@alafanechere alafanechere enabled auto-merge (squash) June 1, 2023 21:51
@alafanechere alafanechere merged commit 1166d41 into master Jun 1, 2023
4 checks passed
@alafanechere alafanechere deleted the augustin/connectors-ci/modified-files-from-pr branch June 1, 2023 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants