This project contains a collection of tools to help analyse the largest blobs (by "on disk" storage) in a repository.
Here is a sample sequence of commands showing typical usage:
- Typically, start with a clean clone of the repository that you want to analyse. It can be bare. For reasonable performance it should be cloned onto "local" disk on a reasonably fast Linux machine.
- Add these tools to your `PATH`, or use a full path to each script or executable.
- Run these tools from the repository undergoing analysis and cleaning.
- Work out a suitable threshold size by running `generate-larger-than` with experimental parameters. 50000 might be a good starting point. The size is "average bytes after compression by Git".
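The tools themselves are not shown here, but plain `git cat-file` can cross-check the idea of "bytes after compression": `%(objectsize:disk)` reports each object's on-disk size. A minimal sketch (the 50000 threshold and the awk filter are illustrative, and unlike `generate-larger-than` this does not average across versions of a file):

```shell
# List blobs whose compressed ("on disk") size is at least a threshold.
# %(objectsize:disk) is the size of the object as stored in the repository.
git cat-file --batch-all-objects \
    --batch-check='%(objecttype) %(objectname) %(objectsize:disk)' |
  awk -v min=50000 '$1 == "blob" && $3 >= min { print $2, $3 }'
```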
- Generate a sorted list of objects with file information:

      generate-larger-than 50000 | sort -k3n | add-file-info >../largeobjs.txt
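The `sort -k3n` stage orders the list numerically on its third whitespace-separated column, so the smallest qualifying objects come first:

```shell
# sort -k3n compares lines numerically starting at the third column.
printf 'a x 300\nb y 20\nc z 1000\n' | sort -k3n
# b y 20
# a x 300
# c z 1000
```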
- Make a report showing a summary of each commit, together with the paths that introduce the large objects, their uncompressed size, and file information:

      report-on-large-objects ../largeobjs.txt
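Independently of `add-file-info`, stock Git can also map blob ids back to paths: `git rev-list --objects --all` prints each reachable object id followed by a path at which it occurs. A sketch, where `large-blob-ids.txt` (one blob id per line) is a hypothetical input file, not something produced by the tools above:

```shell
# Print every reachable object id followed by the path where it appears,
# then keep only the lines matching the blob ids of interest.
# "large-blob-ids.txt" is a hypothetical file of blob ids, one per line.
git rev-list --objects --all | grep -F -f large-blob-ids.txt
```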
- Create a temporary work directory and export `RFWORK_DIR` to point to this directory (it defaults to the current directory).
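A sketch of the defaulting behaviour described above, assuming the tools use the standard shell parameter-expansion idiom (the idiom is an assumption; the tools' actual handling is not shown here):

```shell
# Use RFWORK_DIR if it is set and non-empty, otherwise fall back to
# the current directory.
RFWORK_DIR="${RFWORK_DIR:-$PWD}"
mkdir -p "$RFWORK_DIR"
echo "work files will be written to $RFWORK_DIR"
```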
- Again, run all commands from the repository being analysed.
- From the above report, edit down a list of blob ids that can be eliminated. Call this `large-objects.txt`.
- Generate a remove script:

      make-remove-blobs large-objects.txt >"$RFWORK_DIR"/remove-blobs.pl
      chmod +x "$RFWORK_DIR"/remove-blobs.pl
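The generated script itself is not reproduced here; assuming it is applied as an index filter during the rewrite, its effect is roughly the following (the paths are placeholders, not real report output):

```shell
#!/bin/sh
# Hypothetical sketch of what a generated remove script might amount to:
# delete the unwanted paths from the index during history rewriting.
# --ignore-unmatch keeps the exit status zero for commits that do not
# contain a given path.  The paths below are placeholders.
git rm -q -r --cached --ignore-unmatch -- \
    assets/huge-video.mov \
    build/output.tar.gz
```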
- Optionally, edit the remove script to filter out any other unwanted paths at the same time.
- Run the filter branch:

      run-filter-branch
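`run-filter-branch` is presumably a wrapper around `git filter-branch`; a plausible underlying invocation might look like this (the exact flags are an assumption, not taken from the script):

```shell
# Rewrite all refs, applying the remove script to each commit's index.
# --prune-empty drops commits left empty by the removal;
# --tag-name-filter cat rewrites tags to point at the new commits.
git filter-branch --prune-empty \
    --index-filter "$RFWORK_DIR/remove-blobs.pl" \
    --tag-name-filter cat -- --all
```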
- Create a new "easy rebase" script for moving work-in-progress branches from the old history to the new history:

      make-mtnh >"$RFWORK_DIR"/move-to-new-history
- Push the rewritten refs and the `rewrite-commit-map` branch to all central repositories.
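A hedged sketch of the push step, assuming a single remote named `origin` (rewritten branches need a force push because their commit ids have changed):

```shell
# Force-push every rewritten branch, then push the map branch that
# records old-to-new commit ids.
git push --force --all origin
git push origin rewrite-commit-map
```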
- Deploy `move-to-new-history` for users to use.
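The `move-to-new-history` script itself is not reproduced here, but the usual mechanism behind such an "easy rebase" is `git rebase --onto`: replay a branch's own commits onto the equivalent commit in the rewritten history. A sketch with hypothetical names (`new-base`, `old-base` and `work-branch` are placeholders; the real script is generated by `make-mtnh`):

```shell
# Replay the commits of work-branch that sit on top of old-base
# (a commit in the old history) onto new-base (the corresponding
# commit in the rewritten history).
git rebase --onto new-base old-base work-branch
```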