maflcko
Contributor

@maflcko maflcko commented Jul 23, 2025

Fixes #228

Can be reproduced via:

git filter-repo --invert-paths --path fuzz_seed_corpus  # clear the path with the old name

git mv fuzz_corpora fuzz_corpora_backup  # backup of the new name
git commit -m 'backup'
git filter-repo --invert-paths --path fuzz_corpora
git mv fuzz_corpora_backup fuzz_corpora
git commit -m 'restore'

I've also rebased onto the very first commit. It does not need to be rewritten, so its exact commit id can be kept (this will also simplify review later on):

git rebase 52db8e0f4a2c75b0f977c808f81c0cf6b264e077

This can be reviewed by re-doing the filter, and then comparing the resulting commit history:

git range-diff 52db8e0f4a2c75b0f977c808f81c0cf6b264e077 HEAD fd7e08cd37a175b31a100f71f8a9f3fb369b4837

Or simply by comparing against current main (ignoring the history) and observing an empty diff:

$ git diff e6e82b895a44365a2faa9ff96f6d39dafe2da43e fd7e08cd37a175b31a100f71f8a9f3fb369b4837 | wc -l
0
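The empty diff works as a check because a commit's content is identified by its tree hash, independent of the history that led to it. A minimal sketch of that idea on a throwaway repository, using only plain git; the file and branch names are hypothetical and not taken from this repository:

```shell
#!/bin/sh
# Sketch: two unrelated histories that end in the same content share a tree.
# All names (file.txt, long-history, short-history) are made up for the demo.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# History A: two commits that add and then change a file.
echo one > file.txt
git add file.txt
git commit -q -m 'add file'
echo two > file.txt
git commit -q -am 'change file'
git branch long-history

# History B: a fresh root commit holding only the final content.
# `checkout --orphan` keeps the index, so the commit snapshots the same files.
git checkout -q --orphan short-history
git commit -q -m 'final content only'

tree_a=$(git rev-parse 'long-history^{tree}')
tree_b=$(git rev-parse 'short-history^{tree}')
diff_lines=$(git diff long-history short-history | wc -l | tr -d ' ')
echo "tree_a=$tree_a tree_b=$tree_b diff_lines=$diff_lines"
```

If the two tree hashes match, `git diff` between the commits prints nothing, which is what the `wc -l` check above relies on.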

@maflcko
Contributor Author

maflcko commented Jul 23, 2025

(Obviously this should not be merged, but rather force pushed to the main branch, after review)

@maflcko
Contributor Author

maflcko commented Jul 23, 2025

This should nuke 5GB of unused stuff from the .git history, bringing the .git of a fresh full clone down to ~600MB:

$ du -sh ./.git
632M	./.git
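To illustrate why dropping history shrinks `.git`, here is a small sketch on a throwaway repository using only plain git. It is not the `git filter-repo` procedure from this pull request. The file names, the branch name, and the 1 MiB blob size are assumptions made for the demo.

```shell
#!/bin/sh
# Sketch: a blob deleted from the working tree still lives in history until
# that history becomes unreachable and is pruned. Names are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a 1 MiB incompressible blob, then delete it in a later commit.
head -c 1048576 /dev/urandom > old_input.bin
echo keep > current_input.txt
git add .
git commit -q -m 'Add inputs'
git rm -q old_input.bin
git commit -q -m 'Delete stale input'
git gc -q --prune=now
size_before=$(du -sk .git | cut -f1)

# Flatten: start a new root commit with the current content only,
# drop the old branch and its reflog entries, then prune.
old_branch=$(git symbolic-ref --short HEAD)
git checkout -q --orphan flattened
git commit -q -m 'Add inputs (flattened)'
git branch -q -D "$old_branch"
git reflog expire --expire=now --all
git gc -q --prune=now
size_after=$(du -sk .git | cut -f1)

echo "before=${size_before}K after=${size_after}K"
```

The stale blob only disappears once no ref or reflog entry reaches it and `git gc --prune=now` has run, which is why the saving shows up in a fresh clone of the rewritten branch.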

@murchandamus
Contributor

I’m surprised to see a few "Add inputs" commits in that history still. Should they not all be squashed to one to get a cut-through?

@maflcko
Contributor Author

maflcko commented Jul 23, 2025

I can remove them as well, but I don't think it is going to provide a significant difference. I'll take a look tomorrow.

@murchandamus
Contributor

Ah right, if they only add inputs that are still in the current set, it would not make a big difference. From the description of what you are doing, I had assumed that all commits touching the content of the fuzz_corpora dir would be squashed, but that does not necessarily follow.

@maflcko
Contributor Author

maflcko commented Jul 24, 2025

Thanks, done. It actually went down another 50%:

$ du -sh ./.git
309M	./.git

@dergoegge
Member

I tried git clone --depth 1 --branch 2507-filter git@github.com:maflcko/bitcoin-core-qa-assets.git filtered-qa-assets but the size of the clone is still >4GB? I expected this to reflect the new size we are aiming for after a force push.

@maflcko
Contributor Author

maflcko commented Jul 24, 2025

> I tried git clone --depth 1 --branch 2507-filter git@github.com:maflcko/bitcoin-core-qa-assets.git filtered-qa-assets but the size of the clone is still >4GB? I expected this to reflect the new size we are aiming for after a force push.

For me it is 300M:

root@4445d0550fb8:/# git clone --depth 1 --branch 2507-filter https://github.com/maflcko/bitcoin-core-qa-assets.git filtered-qa-assets 
Cloning into 'filtered-qa-assets'...
remote: Enumerating objects: 184319, done.
remote: Counting objects: 100% (184319/184319), done.
remote: Compressing objects: 100% (149742/149742), done.
Receiving objects:  95% (175104/184319), 265.99 MiB | 4.26 MiB/s
remote: Total 184319 (delta 8622), reused 173457 (delta 7023), pack-reused 0 (from 0)
Receiving objects: 100% (184319/184319), 279.44 MiB | 4.20 MiB/s, done.
Resolving deltas: 100% (8622/8622), done.
Updating files: 100% (185786/185786), done.

root@4445d0550fb8:/# du -sh filtered-qa-assets/.git
309M	filtered-qa-assets/.git

Also note that this doesn't affect a clone that omits the history (--depth=1). Clones with --depth=1 will be exactly the same size before and after this. It also doesn't affect the checked-out files, because they are identical to the ones in current main and will use the same amount of storage space.

This will only affect a fresh, full clone. The goal is to drop years-old fuzz inputs from the history that are irrelevant today.
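The difference between a full and a shallow clone can be sketched on a throwaway repository. The repository layout, file name, and commit messages below are hypothetical. The `file://` URL is used because git ignores `--depth` for clones from a plain local path.

```shell
#!/bin/sh
# Sketch: a --depth 1 clone fetches only the tip commit, so rewriting older
# history does not change what it downloads. Names are made up for the demo.
set -eu
work=$(mktemp -d)
cd "$work"
git init -q src
git -C src config user.email "demo@example.com"
git -C src config user.name "Demo"

# Build a short history of three commits touching the same file.
for i in 1 2 3; do
  echo "input $i" > src/input.txt
  git -C src add input.txt
  git -C src commit -q -m "Add inputs $i"
done

git clone -q "file://$work/src" full
git clone -q --depth 1 "file://$work/src" shallow

full_commits=$(git -C full rev-list --count HEAD)
shallow_commits=$(git -C shallow rev-list --count HEAD)
echo "full=$full_commits shallow=$shallow_commits"
```

The shallow clone contains a single commit regardless of how long the source history is, so only the full clone benefits from removing old commits.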

@dergoegge
Member

Thanks, I was looking at the whole directory, not just .git 🤦

lgtm!

@maflcko
Contributor Author

maflcko commented Jul 24, 2025

@murchandamus I guess I'll wait for your review and then merge this?

@murchandamus
Contributor

I have verified that the diff between this branch and main is empty. I was curious which commit would be creating the fuzz_corpora content, but it looks like their history simply begins with them being moved back and forth. It might be cleaner to squash those two commits by resetting to the commit before them and adding fuzz_corpora as if it were new. That said, I don’t feel strongly about it being necessary.

LGTM.

@maflcko maflcko merged commit fd7e08c into bitcoin-core:main Jul 24, 2025
4 checks passed
@maflcko maflcko deleted the 2507-filter branch July 24, 2025 20:13
Development

Successfully merging this pull request may close these issues.

Flatten fuzz_corpora git history?