# How do I fetch all commits only in the PR branch #552
Workaround: assume that forks are never more than 300 commits and fetch 300 commits.

Also a duplicate of #520.

If you want your script to only consider the commits in the PR, then by over-fetching you'll pollute this list with commits not in the PR.

This is a necessity for most monorepo builds.

I'd like to run an automated release cron job that'd analyze commits for the past N hours and create a release.
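For that kind of time-window job, a minimal sketch (my illustration, not from the comment above): shallow-fetch only the recent history and list the commits since a cutoff. The branch name and the 24-hour window are placeholders.

```sh
# Assumes the job runs inside an already-checked-out repository (e.g. actions/checkout).
# Fetch only the recent history of the branch being released:
git fetch --shallow-since="24 hours ago" origin main

# FETCH_HEAD now points at the fetched tip of main; list the commits in the window
# for the release script to analyze:
git log --since="24 hours ago" --oneline FETCH_HEAD
```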
Without being able to do it via the checkout action, here's a workaround in GHA:

```yaml
- name: Checkout
  uses: actions/checkout@v2

- name: Checkout PR changes
  run: |
    # Un-shallow
    git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*"
    # Deepen topic branch; checkout topic branch
    git fetch origin ${{ github.event.pull_request.head.ref }} --depth=$(( ${{ github.event.pull_request.commits }} + 1 ))
    git checkout ${{ github.event.pull_request.head.ref }}
    # Fetch main for common origin
    git fetch origin main:main --depth=50

- name: Get BRANCH_POINT
  id: branch_point
  run: |
    # Find common ancestor between main and topic branch
    BRANCH_POINT=$(git merge-base "$(git rev-parse --abbrev-ref HEAD)" main)
    [[ -z "$BRANCH_POINT" ]] && echo "No branch point" && exit 1
    echo "::set-output name=REF::$BRANCH_POINT"

- name: List changed files
  run: git diff --name-only ${{ steps.branch_point.outputs.REF }}
```

(Improvements welcome)
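For comparison (my addition, not part of the workaround above), the blunt alternative is to drop the shallowness entirely so `git merge-base` can see full history; it downloads much more, which is exactly what the depth juggling above tries to avoid. "main" is a placeholder base branch.

```sh
# Assumes the default shallow clone produced by actions/checkout:
git fetch --unshallow --no-tags origin
git fetch origin "+refs/heads/main:refs/remotes/origin/main"

# With full history available, the branch point is a one-liner:
git merge-base origin/main HEAD
```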
You can use the github context to get how many commits belong to the PR, then fetch that depth.

**Example - Fetch enough commits from PR & base branch (eg: master)**

To minimize commits fetched to compare two branches, a common merge-base commit is needed. If the local commit history doesn't already have a commit from the 2nd branch being fetched, the full history (or whatever depth is requested) is fetched for that branch instead. This usually requires fetching one additional commit (the one you branched from) for the 1st branch (eg: PR branch).

Example for the PR branch to pull enough commit history to include a commit the other branch also has:

```yaml
- name: 'PR commits + 1'
  run: echo "PR_FETCH_DEPTH=$(( ${{ github.event.pull_request.commits }} + 1 ))" >> "${GITHUB_ENV}"

- name: 'Checkout PR branch and all PR commits'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.ref }}
    fetch-depth: ${{ env.PR_FETCH_DEPTH }}

- name: 'Fetch the other branch with enough history for a common merge-base commit'
  run: |
    git fetch origin ${{ github.event.pull_request.base.ref }}
```

**More Info**

It's important to set the ref to the PR head ref as above, since the default ref of this action is one extra "merge-commit" (the PR into the base branch), which will not only offset your fetch-depth by 1 additional commit needed, but possibly cause other issues (eg: with `git merge-base`).

NOTE: Merge-commits (those from the base branch into the PR branch, not the action default test merge-commit ref) contribute to the total commits from github context, and the fetch depth range would not only get N commits from your branch but also N commits associated to the history of merge commits.

You may not need the extra commit - if you have a merge commit from the base branch, then your local history will have a commit prior to the merge that it can probably use instead. Otherwise fetching an extra commit should ensure a common ancestor.

EDIT: Fetching to a common ancestor may fail when you have commits belonging to the base branch in local history which have not been merged into the PR branch. This problem occurs with the default ref (the generated test merge-commit of the PR branch into the base branch) with a fetch depth of 2 or more.

**An earlier attempt for fetch that didn't work out**

Originally I was suggesting a way to provide a more specific commit for fetch to negotiate with, but I think it usually won't make much of a difference vs the default:

```sh
# Get the oldest commit in the branch which should have no parents,
# `--first-parent` only follows the first parent encountered (should be our PR branch commits,
# not commits associated to merge commits from the base branch):
FIRST_COMMIT_IN_BRANCH=$(git rev-list --first-parent --max-parents=0 --max-count=1 HEAD)

# Use that commit hash to look up the next parent (that we don't have in history):
PARENT_COMMIT_HASH=$(git cat-file -p "${FIRST_COMMIT_IN_BRANCH}" | sed -n 's/^parent //p')

# Use that parent commit as a hint for fetch auto-depth, otherwise the default fetch increments by the
# fibonacci sequence until a commit from the base branch is found that also exists in our local commit history:
git fetch --negotiation-tip "${PARENT_COMMIT_HASH}" origin ${{ github.event.pull_request.base.ref }}

# Note this hint seems to fail if you already have a newer commit in local history from the base branch,
# such as when the default merge-commit ref with a fetch depth of 2 or more pulls in base branch commits.
# Thus not always reliable to establish a merge-base...
#
# It will also fail if the root commit (first commit in the repo) was the oldest commit found for FIRST_COMMIT_IN_BRANCH,
# due to PARENT_COMMIT_HASH not being possible to resolve.
# That can happen with too many merge commits of the base branch into the PR branch, as it bumps up the github context
# for commit count, fetching more history than needed (those merge commits don't seem to count for history depth).
```

Otherwise I found with a merge-commit, the fetch depth for +1 commits of your PR commit count could result in history being more than the expected number of commits fetched (the depth is technically correct from the commit chain associated to the merge commit).

With that lengthy command, you shouldn't need to request one more commit. But I've since found it unreliable (explained in the comments for the snippet).
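If you need the same "commits + 1" depth calculation outside of a `pull_request` event (where the `github.event.pull_request` context isn't available), here is a rough sketch of my own using the GitHub CLI; the PR number is a placeholder and the CLI is assumed to be authenticated (e.g. via `GITHUB_TOKEN`):

```sh
# Hypothetical PR number; in a real workflow you'd pass this in from the event or an input.
PR_NUMBER=123

# Ask the GitHub API (via the gh CLI) how many commits the PR has:
COMMIT_COUNT=$(gh pr view "$PR_NUMBER" --json commits --jq '.commits | length')

# Same "+1 for the branch point" logic as above, fetching the PR head ref GitHub exposes:
git fetch origin "refs/pull/${PR_NUMBER}/head" --depth=$(( COMMIT_COUNT + 1 ))
```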
@jbreckmckye if you're interested in a file-name diff (specifically files added/changed, ignoring others like renames or deletions), here's a few examples. This is nice and small, and should be fine AFAIK, see the commented version below for more details:

```yaml
- name: 'Checkout PR branch (with test merge-commit)'
  uses: actions/checkout@v3
  with:
    fetch-depth: 2

- name: 'Get a list of changed files to process'
  run: git diff-tree --no-commit-id --name-only --diff-filter 'AM' -r HEAD^1 HEAD
```

**Variants and details**

I've collapsed the original content (still useful maybe if you want to understand why a merge-base approach can be useful):

```yaml
- name: 'Checkout PR branch'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.ref }}

- name: 'Get a list of changed files to process'
  run: |
    # Fetch enough history for a common merge-base commit
    git fetch origin ${{ github.event.pull_request.head.ref }} --depth $(( ${{ github.event.pull_request.commits }} + 1 ))
    git fetch origin ${{ github.event.pull_request.base.ref }}
    # Show only files from the PR with content filtered by specific git status (Added or Modified):
    git diff-tree --name-only --diff-filter 'AM' -r \
      --merge-base origin/${{ github.event.pull_request.base.ref }} ${{ github.event.pull_request.head.ref }}
```

If you do need to avoid that failure scenario, you can look up the date of the commit you branched from the base branch, and request all commits since then:

```sh
# This should get the oldest commit in the local fetched history (which may not be the branched base commit):
BRANCHED_FROM_COMMIT=$( git rev-list --first-parent --max-parents=0 --max-count=1 ${{ github.event.pull_request.head.ref }} )
UNIX_TIMESTAMP=$( git log --format=%ct "${BRANCHED_FROM_COMMIT}" )

# Get all commits since that commit for the base branch (eg: master):
git fetch --shallow-since "${UNIX_TIMESTAMP}" origin ${{ github.event.pull_request.base.ref }}
```

If you only need the diff from the PR, you may not need the potentially lengthy commit history from the above approach and can instead use either of these:

```yaml
- name: 'Checkout PR branch (with test merge-commit)'
  uses: actions/checkout@v3
  with:
    # Merge commit + two commits (1 from each branch, the 2nd depth level)
    fetch-depth: 2

# Show only files from the PR with content filtered by specific git status (Added or Modified)
# HEAD^1 is the base branch commit, HEAD is the merge commit, no merge-base needed
- name: 'Get a list of changed files to process'
  run: git diff-tree --no-commit-id --name-only --diff-filter 'AM' -r HEAD^1 HEAD
```

Instead of using `HEAD^1` directly, you can fetch the base branch ref and diff against the merge-base:

```yaml
- name: 'Checkout PR branch (with test merge-commit)'
  uses: actions/checkout@v3
  with:
    # Merge commit + two commits (1 from each branch, the 2nd depth level)
    fetch-depth: 2

- name: 'Get a list of changed files to process'
  env:
    # This is the default ref value, not exactly the same value as `github.ref` context:
    PR_REF: refs/remotes/pull/${{ github.event.pull_request.number }}/merge
  run: |
    # No extra commits need to be fetched, the base branch ref will be set to HEAD^1 commit
    git fetch origin ${{ github.event.pull_request.base.ref }}
    # Show only files from the PR with content filtered by specific git status (Added or Modified)
    git diff-tree --name-only --diff-filter 'AM' -r \
      --merge-base origin/${{ github.event.pull_request.base.ref }} ${PR_REF}
```
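If the resulting file list feeds another tool, NUL-delimited output is safer for paths containing spaces. A small sketch of my own, using the same filter and refs as the examples above; the downstream command is a placeholder:

```sh
# -z emits NUL-separated paths instead of newline-separated ones:
git diff-tree --no-commit-id --name-only -z --diff-filter 'AM' -r HEAD^1 HEAD \
  | xargs -0 --no-run-if-empty echo   # placeholder: replace `echo` with your linter/test command
```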
@polarathene I'm finding this is really helping me optimise my PR flow. For example, I'm working on a repo that used to have a twenty-five minute PR flow for lint, unit-tests, static analysis, etc. Now that I have differential builds, I can run a differential lint and a subset of unit tests, and a PR build can now take as little as two minutes. Your examples are still helpful though - I didn't realise I could parameterise the checkout action with the fetch depth.
```yaml
# Avoid the default ref "test merge commit" generated by Github (`github.sha`),
# otherwise `git fetch` will fail to retrieve enough commits for a merge-base.
# The head ref (with the default `fetch-depth: 1`) references the latest commit from the PR branch:
- name: 'Checkout PR branch'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.ref }}

- name: 'Find the merge-base commit hash'
  env:
    branch_main: ${{ github.event.pull_request.base.ref }}
    branch_topic: ${{ github.event.pull_request.head.ref }}
  run: |
    # Fetch enough history to find a common ancestor commit (aka merge-base):
    git fetch origin ${{ env.branch_topic }} --depth $(( ${{ github.event.pull_request.commits }} + 1 ))
    git fetch origin ${{ env.branch_main }}
    BRANCH_POINT=$( git merge-base origin/${{ env.branch_main }} ${{ env.branch_topic }} )
    echo "::set-output name=REF::$BRANCH_POINT"
```

Same thing really. Just remember that won't identify the commit that the PR branched from. If there was a merge commit from the base branch into the PR branch, the last commit from the base branch becomes the merge-base target instead (which is probably what you'd want, anyway).

WARNING: While using the branch refs as shown above works most of the time, it's not as deterministic when you re-run a previous workflow job, as those branches may have new commits since then; especially for the PR branch, with a stale github context in the re-run, you risk an inaccurate fetch depth.

Example of when the derived merge-base is not the original commit you branched the PR from:

**Example graph with merge-base when merge-commits are in history**

```
git init /tmp/example && cd /tmp/example
git remote add origin https://github.com/user-name/repo-name

# PR has 6 commits (1 is a merge-commit from `main` branch), so fetch that + 1:
git fetch origin pr-branch --depth=7
git log --oneline --graph --all

* 264e38d (origin/pr-branch) PR-F
* 3a2b30b Merge branch 'main' into pr-branch <PR-E>
|\
| * 43653c4 MAIN-E
| * 1f7e88e MAIN-D
| * 50feed1 MAIN-C
* | e210a28 PR-D
* | 60f8089 PR-C
* | 6f71a7f PR-B
* | 194ad14 PR-A
|/
* d9411d4 MAIN-B <branched from here> <fetch-depth from 264e38d down PR commit chain>
* 6a18b1c (grafted) MAIN-A <fetch-depth from 264e38d down MAIN commit chain>
```

It's a bit harder to read the graph output without the colours the CLI uses, but you can see the two lines on the left. Next, fetch the `main` branch:

```
git fetch origin main
git log --oneline --graph --all

* 60b5fd6 (origin/main) MAIN-H <test merge-commit target>
* d1eeab7 MAIN-G
* b0d5074 MAIN-F
| * 264e38d (origin/pr-branch) PR-F
| * 3a2b30b Merge branch 'main' into pr-branch <PR-E>
| |\
| |/
|/|
* | 43653c4 MAIN-E <merge-base result> <current commit for ${{ github.event.pull_request.base.sha }}>
* | 1f7e88e MAIN-D
* | 50feed1 MAIN-C
| * e210a28 PR-D
| * 60f8089 PR-C
| * 6f71a7f PR-B
| * 194ad14 PR-A
|/
* d9411d4 MAIN-B <branched from here>
* 6a18b1c (grafted) MAIN-A
```

The graph output now shows commit 43653c4 (MAIN-E) as the merge-base result, rather than d9411d4 (MAIN-B), the commit the PR originally branched from.
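To see that behaviour without a remote at all, here's a small self-contained reproduction of my own (not from the comment above): it builds a throwaway repo with one merge commit and shows `git merge-base` picking the merged-in commit rather than the original branch point. Every name is illustrative.

```sh
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo in a temp directory:
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email 'ci@example.com'
git config user.name 'example'

git commit -q --allow-empty -m 'MAIN-A'
git commit -q --allow-empty -m 'MAIN-B'   # <- the PR actually branches from here

git switch -q -c pr-branch
git commit -q --allow-empty -m 'PR-A'

git switch -q main
git commit -q --allow-empty -m 'MAIN-C'

git switch -q pr-branch
git merge -q --no-edit main               # merge commit from main into the PR branch
git commit -q --allow-empty -m 'PR-B'

# Reports MAIN-C (the last commit merged in from main), not MAIN-B:
git log -1 --format='merge-base is: %s' "$(git merge-base main pr-branch)"
```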
(…er-io#13796)

## Summary & Motivation

Based off of actions/checkout#552. We should limit the fetch depth of the PR branch checkout to the bare minimum of what we need.

## How I Tested These Changes

Applied the script to a local repository, and made sure that the main branch checkout happened and the PR checkout happened with the calculated fetch depth.
I was curious so I gave this a look with your 2020 example script and adapted it to the advice I mentioned earlier in this thread about doing a shallow clone and using `--shallow-since`. Your script referenced the 4.20 kernel (which later became 5.0), so I pulled in the history since then:

```sh
# Initial setup
# Expect around 3-6GB:
git clone --single-branch --shallow-since 2019-03-03 https://github.com/torvalds/linux
cd linux
git remote add sof_remote_demo https://github.com/thesofproject/linux

# This probably could be reduced in scope? I tried the shallow since but
# this step still took ridiculously long on a 1GB RAM VPS, like hours resolving delta?
# (more RAM would probably have made a big difference)
# git fetch sof_remote_demo
git fetch --shallow-since 2020-10-01 sof_remote_demo

# Tagging commits and adding a change:
# 16th Oct 2020 (authored 9th Sep): https://github.com/thesofproject/linux/commit/b150588d227ac0
git tag _real_PR_base b150588d227ac0
git checkout _real_PR_base
touch dummyfile; git add dummyfile
git commit -m 'pull request' dummyfile
git tag _pull_request

# 30th Oct 2020: https://github.com/thesofproject/linux/commit/70fe32e776dafb
git tag _target_branch 70fe32e776dafb

# 10th Oct 2020: https://github.com/torvalds/linux/commit/86f29c7442ac4b
git tag _spurious_base 86f29c7442ac4b

# Make a local clone to use the first clone as a remote instead:
# (--bare is faster / smaller by excluding the working tree, only cloning `.git/` content)
# (single commit in PR + common branching point commit, github provides this context)
git clone --bare --single-branch --branch _pull_request --depth 2 "file://$(pwd)" /tmp/linux-local

# (switch into the new clone; `origin` there is the local file:// repo)
cd /tmp/linux-local

# Go through the commit history available to find a useful commit in common with the target branch:
# (b150588d227ac0edc933e8f7f810d4ea9453b32c aka `_real_PR_base`)
git rev-list --first-parent --max-parents=0 --max-count=1 _pull_request

# Get date of commit (2020-10-15 16:52:12 -0500):
git log --max-count=1 --date=iso8601 --format=%cd "b150588d227ac0edc933e8f7f810d4ea9453b32c"

# Fetch the commit history from the `_target_branch` since then:
git fetch origin _target_branch --shallow-since "2020-10-15 16:52:12 -0500"

# `FETCH_HEAD` is pointing to the tip of this branch, tag it:
git tag _target_branch FETCH_HEAD

# Now we can do the merge-base and get the result we expect:
# (b150588d227ac0edc933e8f7f810d4ea9453b32c aka `_real_PR_base`)
git merge-base -a _target_branch _pull_request
```

Avoids the concern of fetching over a million commits 👍 Just need to provide context of date or number of commits to get shared history between the two branches.

A similar issue (with much smaller repo size) was discussed here recently: #578 (comment) (lack of common history between the two branches caused the problem for the operation). If you don't have that extra context, you can probably clone/fetch with

I didn't quite follow your intention of the example. You chose to clone the

Once you have that, you can derive a date from a commit that should be a common ancestor between the two branches, like I had done in my linked example.
Yes of course, if you "guess right" and fetch enough, then `git merge-base` works. But that's just guessing; it does not give an automated solution that works every time, and it does not give you something that you can script and forget about.
When does it not work when you fetch commits from since 1 year ago? 5 years ago? Obviously that's not as efficient, but you're not going to need more than that, are you? If your PRs are branching off and merging back within a 3-month window, you could use that too, or if you know your PRs aren't likely to exceed 1000 commits, that works well. It depends what you're comfortable with, if for some reason you are not able to provide better context.

What is wrong with what I've shared? Github provides the commit count of a PR as context; you can use that to get the fetch depth.

With your kernel example, my approach resulted in a little over 100 commits total fetched via `--shallow-since`. My approach does ensure you get sufficient commit history pulled in without having to bring in a full clone of a branch's history (which would be wasteful with the linux kernel example if you're not truncating it with a date or depth limit).

I know you've cited niche scenarios (criss-cross), but it's not clear what your actual goal / expectation is for handling that. What would you do differently? At the criss-cross the two branches effectively have a common branch history? It's not uncommon in some projects I've contributed to for the main branch to add merge commits into a PR branch, but you're not really going to be merging a PR branch back into main and keep treating it as the same branch/PR, are you? So is that really an applicable concern?

I'm open to real-world scenarios where my approach doesn't work well, and to what you believe would be better.
It's not just "inefficient", it's extremely slow with a large repo like the Linux kernel. This is going to systematically fetch many thousands of commits when you need only a few most of the time (and a few thousand from time to time). Plus this is still guessing some semi-random number of years.

How do you compute a date from a number of commits?!?

It may be a niche scenario in Github but not in git:

Now, to be honest, I've heard many kernel developers (who invented git) say "don't use Github". Off-topic, but we've also experienced many performance issues with a kernel repo while much smaller repos in the same project were experiencing none. I think it got better.
What would you do differently?

When reproducing your 2020 example with the initial linux repo clone, I wasn't entirely sure what I needed history-wise. Here's a revised starting point if you know your actual commit/ref before and after points:

```sh
git init example && cd example
git remote add origin https://github.com/torvalds/linux
git remote add sof_remote_demo https://github.com/thesofproject/linux

# Fetch starting commit and assign a ref (branch):
git fetch sof_remote_demo --depth 1 b150588d227ac0edc933e8f7f810d4ea9453b32c:_real_PR_base

# Get commits between these two refs:
# (fetch commits since the date of _real_PR_base up to a known commit that we assign ref _target_branch)
git fetch sof_remote_demo --shallow-since "2020-10-15 16:52:12 -0500" 70fe32e776dafb4b03581d62a4569f65c2f13ada:_target_branch

# 111 commits total
git rev-list _target_branch _real_PR_base | wc -l

# 292M size
du -sh

# 1.4G size after checkout:
git switch _real_PR_base

# Now make a change and add your _pull_request ref
# Optionally pull in commits from origin to include the _spurious_base
# Continue with the exercise (but note there won't be millions of commits to pull from a subsequent clone due to how shallow this was)
```
You're setting a ceiling of what should be safe for the bulk of PRs in a project. There could be some outliers, but often you're probably not dealing with merging a PR that branched off years ago, are you? You'd probably want to rebase or similar at that point. Many projects I've engaged with don't tend to have PRs active for such long durations. There are plenty that amass open PRs that cease to get updated, but other projects also close/reject PRs that are considered stale/abandoned in favor of opening a new one if activity will continue. I'm not sure why we're debating this.

What would you do differently?

You just need a common ancestor commit between the two branches, and then take the commit date from it. Did you miss the earlier example I provided you? Or was there a misunderstanding?

For a workflow example with context and vars used, I linked you to this previously: #520 (comment)

I'm not disagreeing with you there, but:

You're referring to, I guess, long-lived branches that are merging between each other? (the criss-cross) I've not had any issues with the main/target branch being merged into the PR branch, or with rebasing the PR branch. I can't think of a scenario where I've needed a PR branch that also merges into a target branch, and still continues to be iterated on instead of a separate PR branch. I'm sure that's probably a thing as the LWN article seemed to discuss, and your project's rebase branch, but I'm not really following what you're suggesting should be done differently than the approach I've shared?

That's fine. That's a very different workflow from what most users need in their projects, and the linux kernel isn't using Github for CI jobs, right? If you've got a better solution that can be used on Github, that'd be great to know. My approach to retrieving both branches' relevant commit histories should meet the needs of most projects (it even handled your kernel PR example with `--shallow-since`). I'd also appreciate any reproducible examples of real-world scenarios where it will falter, along with a better alternative that can be used instead.
This does not make sense to me, sorry. I'm merely saying that `git merge-base` is not compatible with shallow clones, and you keep giving commands that only work in deep or "deep enough" clones for some hardcoded variants of "deep enough". A hypothetical Github Action that solves the problem stated in the description at the top of this issue should work for every PR in every repo.

Also, you keep posting very long comments, which makes it very hard to extract the bits that matter.
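As background (my addition, not from the comment above): a script can at least detect whether it is operating in a shallow clone, which is the situation where `git merge-base` can't be trusted to see the real common ancestor:

```sh
# Prints "true" in a shallow clone, "false" otherwise (available since git 2.15):
git rev-parse --is-shallow-repository

# In a shallow clone, the graft boundary commits are listed here:
cat "$(git rev-parse --git-dir)/shallow"
```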
I was already making an effort to be terse 😅
I can't meet your request to be terse and break this stuff down further for you. The command does what is needed to find an appropriate commit.
Not compatible when there is insufficient history. My approach gets you sufficient history, maybe try it out?
I hard-coded them for the examples. I mentioned that using the github context is better and linked to an example, but I wanted to try to keep it simple for you. Here's how you get the "deep enough" commits:

```yaml
- name: 'PR commits + 1'
  run: echo "PR_FETCH_DEPTH=$(( ${{ github.event.pull_request.commits }} + 1 ))" >> "${GITHUB_ENV}"

- name: 'Checkout PR branch and all PR commits'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.sha }}
    fetch-depth: ${{ env.PR_FETCH_DEPTH }}
```

**Alternative fetch example**

```yaml
- name: 'Checkout the latest commit of the PR branch (at the time this event originally triggered)'
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.sha }}

- name: 'Example - Fetch full PR commit history'
  run: |
    git fetch --no-tags --prune --no-recurse-submodules \
      --depth=$(( ${{ github.event.pull_request.commits }} + 1 )) \
      origin +${{ github.event.pull_request.head.sha }}:remotes/origin/${{ github.event.pull_request.head.ref }}
```
Cool, well until one actually exists, my Github Actions workflow example can be adapted: #520 (comment)
I never asked you to "break down" anything, quite the opposite.
Guess what: Github lets you import ANY git repo. Maybe Github itself should say "don't import the Linux kernel" and not just kernel developers?
Totally agreed. Yet an official git action should work all the time. Or at the very least be ready for all cases and fail gracefully when it can't.
This entire discussion started with you questioning my data posted in thesofproject/linux#2556 ?!?
(Counter-)examples can only show when something does not work; examples can never show that a generic solution always works. “Program testing can be used to show the presence of bugs, but never to show their absence” (Dijkstra). Yes, examples are very useful as a "simple" way to introduce concepts new to some people, but they're just noise when trying to prove that something always works. An example is never an answer to a counter-example. BTW, someone showing a counter-example is obviously not new to a topic. Finally:

Which command? You posted so many of them.
Ah, this command/approach. Thanks for finally dropping all the noise about hardcoded commits and hardcoded numbers of years.

So now I remember that I did try this approach, and I agree it does work every time to find one merge-base. It's great for the very large majority of Github users - but not for all people / a generic Github Action. That's because it's massively inefficient for git repos that use `git merge` heavily.

A LOT more actually for a project that uses git merge a lot. In this example below:

```sh
curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/thesofproject/linux/pulls/4393
# => 806 commits

cd test-repo
git init
git remote add origin https://github.com/thesofproject/linux/

# 1 commit, 45 seconds
time git fetch origin --depth 1 2f0c3e8b882303de468060e35769c0f79465f848   # (start over)

# Full clone, 7 minutes
time git fetch origin 2f0c3e8b882303de468060e35769c0f79465f848   # (start over)

# 10 minutes, LONGER than a full clone!
time git fetch origin --depth 806 2f0c3e8b882303de468060e35769c0f79465f848

git log --oneline FETCH_HEAD | wc -l
# => 1.1 million commits!
```

The more efficient approach that I did implement and already linked to is to iterate and

EDIT: correction, that implementation isn't finished yet, it's still using an inefficient count and depth for now.
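For anyone who wants to experiment with that iterative idea, here's a rough sketch of my own (not the implementation referenced above): keep deepening both branches until `git merge-base` succeeds, with a cap so unrelated histories don't loop forever. The branch names, step size, and round limit are all placeholders, and it assumes a shallow CI clone.

```sh
#!/usr/bin/env bash
set -euo pipefail

base_branch=main        # placeholder, e.g. the PR base ref
head_branch=my-feature  # placeholder, e.g. the PR head ref
step=50                 # extra commits to request per round
max_rounds=20

# Make sure both branch tips exist locally (explicit refspecs so the
# remote-tracking refs are created even in a minimal CI checkout):
git fetch --depth=1 origin \
  "+refs/heads/${base_branch}:refs/remotes/origin/${base_branch}" \
  "+refs/heads/${head_branch}:refs/remotes/origin/${head_branch}"

for (( i = 0; i < max_rounds; i++ )); do
  if merge_base=$(git merge-base "origin/${base_branch}" "origin/${head_branch}" 2>/dev/null); then
    echo "merge-base: ${merge_base}"
    exit 0
  fi
  # No common ancestor visible yet; deepen the shallow history of both branches:
  git fetch --deepen="${step}" origin \
    "+refs/heads/${base_branch}:refs/remotes/origin/${base_branch}" \
    "+refs/heads/${head_branch}:refs/remotes/origin/${head_branch}"
done

echo "No merge-base found after ${max_rounds} rounds" >&2
exit 1
```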
You said you didn't understand a command. I responded that if I elaborated, it wouldn't be terse like you requested.
There are a lot of things you can do in life; it doesn't mean you should. To remain terse (and on topic), this isn't worth debating.

One would hope for the best, yes. In my experience it doesn't matter how well funded a company is: issues lasting for years are common to encounter, and even if the project is open-source and solutions are provided via PR, it can take quite some time before action is taken.

Yes? I was curious, so I tried my approach with your example. Most of the discussion after my first response was repeating myself because you wouldn't read what was said the first time 🙄

Clearly I communicated that poorly and could have done better... or it'd still be glossed over and the variables considered noise/overhead to explain. Can't really predict that up front 🤷♂️
That was only for getting the number of commits. I shared this via a link in my very first sentence to you: #552 (comment)

The rest was about bringing in the other branch for the merge-base.

I'm not sure how you propose doing that with the PR you referenced?

**Warning: Verbosity ahead**

Here is the end of it with the base commit sha. That highlighted commit and the one below it are listed in the PR commit history after the merge commits. The commits belonging to the merges then follow afterwards and can be seen from the top of this git tree. I'm not sure what to make of that 🤷♂️ what is your starting point for older commits?

Due to all those cascading merges, there aren't many commits unique to the PR branch itself; I'm really not sure how you'd plan to only fetch the 806 individual commits without bringing in a bunch of excess? (as you can see, the original merged branches are truncated, they have more history technically)

With enough history:

```
# 806 commits:
git rev-list --count 5224191...a714334

# Wasteful excess though:
git rev-list --count --all
20879
```

However, if we fetch commits since the date of the oldest commit, we should have those commits, but something seems off:

```
# Commits sorted by timestamp, output the oldest commit:
$ git rev-list --timestamp 5224191...a714334 | sort -n | head -n 1
1680775471 430cac487400494c19a8b85299e979bb07b4671f

$ git log --max-count=1 --date=iso8601 --format=%cd 430cac487400494c19a8b85299e979bb07b4671f
2023-04-06 12:04:31 +0200

$ git init
$ git remote add origin https://github.com/thesofproject/linux

# Fetch commits this far back:
$ git fetch origin --no-tags --prune --no-recurse-submodules --shallow-since "2023-04-06 12:04:31 +0200" '+5224191371f701bbf0e10bb49bd5cd3fda9848a5:remotes/origin/merge/sound-upstream-20230530'

# More than 806?
$ git rev-list --count 5224191...a714334
1223

$ git rev-list --count --all
5049
```

This is probably something you'd know the answer to. Presumably all the commits are still there, but some are already applied / shared further back in history?

I'm not sure how your staggered fetching would handle the above experience any better? 🤔 You said you've linked to it but all I see is
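A side note from me (not from the comment above) on the counting itself: the `..` and `...` range forms count different sets, and in a shallow clone both only count what the fetched history makes visible, so the same range can give different numbers depending on how the history was fetched:

```sh
# Commits reachable from a714334 but not from 5224191 (one direction only):
git rev-list --count 5224191..a714334

# Symmetric difference: commits reachable from exactly one of the two tips
# (this is what the `...` counts above measure):
git rev-list --count 5224191...a714334
```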
So you're fetching over a million commits still, while saying my advice is inefficient?

**Fallback (niche scenario) - Only double the time to fetch vs a single commit**

The fallback advice I gave with `--shallow-since`:

```
# 1 minute (324 MiB) - Approx 20k commits fetched:
$ time git fetch origin --no-tags --prune --no-recurse-submodules --shallow-since "2023-02-01" \
  '+5224191371f701bbf0e10bb49bd5cd3fda9848a5:remotes/origin/merge/sound-upstream-20230530'

real 1m1.267s
user 0m27.861s
sys 0m3.801s

# 40 sec (237 MiB):
$ time git fetch origin --no-tags --prune --no-recurse-submodules --depth 1 \
  '+5224191371f701bbf0e10bb49bd5cd3fda9848a5:remotes/origin/merge/sound-upstream-20230530'

# 45 minutes (1GB VPS instance, memory starved by git) - Over a million commits fetched:
$ time git fetch origin --no-tags --prune --no-recurse-submodules --depth 806 \
  '+5224191371f701bbf0e10bb49bd5cd3fda9848a5:remotes/origin/merge/sound-upstream-20230530'
```

As you can see, close to 6 months isn't too much overhead; you'd probably end up with only double the time of a single-commit fetch, despite fetching tens of thousands of commits. For kernel cycles, I'd imagine a relative date offset for fetching like this is predictable enough to know whether 6 months is a large enough window for you.
How do I tell actions/checkout to fetch all the commits up to the fork point? I want all the commits up to the fork point from the base branch. I don't see any examples of how to do this in the README. I'd assume this is a common need.