Strip downloads from the git repo #163

Open
KOLANICH opened this issue Jan 21, 2016 · 33 comments

@KOLANICH
Contributor

Don't store large binary files in the git repo. Every time you commit a changed binary file, git stores a new copy of it, and every old version stays in the history forever. That's why the repo is 184 MiB now. Use GitHub Releases to store downloads, and use git-lfs to store large files and blobs if they are strictly needed.

@KOLANICH KOLANICH changed the title "Strip downloads from git repo" to "Strip downloads from the git repo" on Jan 21, 2016
@berndhahnebach
Contributor

Since we have started to use GitHub Releases, this can be closed.

@KOLANICH
Contributor Author

Could you strip all the binaries from the history?

@berndhahnebach
Contributor

On the one hand this would make sense, but on the other hand it would change all commit IDs. That is not good practice for an open-source repository, as it would require a force push. But I must admit it is a problem, so I am reopening the issue.

Maybe a separate branch?

Or we retire this repo and use a new one with rewritten history.

@KOLANICH
Contributor Author

Just post a message instructing users to rebase their patches manually using git format-patch and git am.
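A minimal sketch of that workflow, run on a throwaway repository so it works standalone. The branch names `old-main` (the unstripped history a contributor forked from), `new-main` (the rewritten history), `my-feature` and `my-feature-rebased` are hypothetical. Because the rewritten commits have new IDs, the contributor's commits are exported as patch files with `git format-patch` and replayed with `git am`; this assumes the files they touch still exist after the rewrite.

```shell
#!/bin/sh
set -eu
# Throwaway identity and repository so the sketch is self-contained.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/old-main

# Old history: code plus a large binary download.
echo 'code v1' > code.txt
mkdir downloads
head -c 100000 /dev/urandom > downloads/big.tar.gz
git add . && git commit -qm 'old: code and downloads'

# A contributor's patch made on top of the old history.
git checkout -qb my-feature
echo 'feature' >> code.txt
git commit -qam 'Add feature'

# Rewritten history: same code, no downloads, unrelated commit IDs.
git checkout -q old-main
git checkout -q --orphan new-main
git rm -r -q --cached downloads
rm -rf downloads
git commit -qm 'new: code only'

# 1. Export the contributor's commits (those not in old-main) as patches.
patches=$(mktemp -d)
git format-patch -q -o "$patches" old-main..my-feature
# 2. Replay them on a branch that starts from the rewritten history.
git checkout -qb my-feature-rebased new-main
git am -q "$patches"/*.patch
git log --oneline
```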

@berndhahnebach
Contributor

berndhahnebach commented Oct 20, 2020

Do you mean rewriting the history and giving the instructions you mentioned?

BTW: there are still lots of unneeded megabytes in the repo ... https://github.com/boltsparts/BOLTS/tree/b47ae5fb53b4975320867909cfd0de2641f6bf15/output This is even the website. The new one (which looks exactly like the old one) is also generated by a BOLTS backend. It can be found here: https://github.com/boltsparts/boltsparts.github.io

@KOLANICH
Contributor Author

Do you mean rewrite history and give the instructions you mentioned ?

Yes, really. Rebasing some patches manually is a minor inconvenience; an overbloated repo is a major one.

@berndhahnebach
Contributor

berndhahnebach commented Oct 20, 2020

I am involved in the FreeCAD project. In such a project I would never even think for a second about rewriting the history of the main repo's master branch. But BOLTS is in a situation with no open PRs at the moment and little traffic. We do not have any development or release branches in the repo. You may be right. We will never get a better chance to do it.

I will keep you informed.

bernd

@berndhahnebach
Contributor

BTW: The cloned repo is 166 MB, of which the checked-out code is 94 MB and .git is 71 MB. That means we will not save very much.

@berndhahnebach
Contributor

Ahh, OK: downloads contains 61 MB of binary data. I have done BOLTS development for years and never realized this. I must admit I only noticed it a few seconds ago, and it bothers me ...

@johannes:
I would probably have done exactly the same 7 years ago with the knowledge I had at that time :-)

@berndhahnebach
Contributor

OK, the code is 33 MB, whereas the drawings are 9.5 MB and the website backend is 21.5 MB.

@johannes

Git is actually quite good at avoiding copies and merging similar objects. But yes, keeping larger files out reduces clone and push times, which is great. Unfortunately, getting files out requires rewriting history, which means all existing clones become invalid ....

Anyway, I have no idea about this project and was probably highlighted by mistake :-) (unsubscribed now, so please don't @ me again)

@berndhahnebach
Contributor

gave it a try ...

# references
https://myopswork.com/how-remove-files-completely-from-git-repository-history-47ed3e0c4c35
https://stackoverflow.com/questions/6403601/purging-file-from-git-repo-failed-unable-to-create-new-backup

# command and test (template with path_to_file, then a single file)
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch path_to_file" HEAD
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch downloads/freecad/BOLTS_FreeCAD_0.2_gpl3.tar.gz" HEAD

# **************************************************************************************************
# remove downloads/ from every commit reachable from HEAD
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch downloads/*" HEAD
# delete the backup refs: they keep the old objects alive, and the next
# filter-branch run refuses to start while they exist (alternatively pass -f)
rm -rf .git/refs/original
# same for the generated website in output/
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch output/*" HEAD
rm -rf .git/refs/original

but .git is still 75 MB in the file explorer; du even shows 100 MB ...

@berndhahnebach
Contributor

berndhahnebach commented Oct 20, 2020

deleting unreferenced blobs with

git gc --aggressive --prune=all

from https://stackoverflow.com/questions/1904860/how-to-remove-unreferenced-blobs-from-my-git-repo/14728706

shrinks .git to 65 MB, both in the file manager and according to du. That means the whole repo is still 99 MB = 33 MB code + 65 MB .git.
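After a rewrite, the old blobs often stay reachable through the backup refs in refs/original and through the reflogs, so gc keeps them. A minimal sketch on a throwaway repository, where a deleted side branch (the name `big-tmp` is hypothetical) stands in for the rewritten history: expiring all reflog entries first, a step not used in the comment above, lets `git gc --prune=now` actually drop the unreachable blob. `git count-objects -vH` then reports the size of the object store.

```shell
#!/bin/sh
set -eu
# Throwaway identity and repository so the sketch is self-contained.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
echo code > code.txt
git add code.txt && git commit -qm 'code'

# A large blob that only a (soon deleted) side branch refers to.
git checkout -qb big-tmp
head -c 200000 /dev/urandom > big.bin
git add big.bin && git commit -qm 'add big binary'
big=$(git rev-parse HEAD:big.bin)
git checkout -q main
git branch -q -D big-tmp

# No branch points at the blob any more, but HEAD's reflog still
# references that commit, so a plain gc would keep the blob.
git cat-file -e "$big" && echo "blob $big still stored"

# Expire every reflog entry, then repack and prune unreachable objects.
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now
git count-objects -vH
```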

@berndhahnebach
Contributor

Pushed it to a new repo on my GitHub ... https://github.com/berndhahnebach/stripedbolts

When I clone this one, I still have 33.8 MB of code but only 18.1 MB of .git = 51.9 MB.

LGTM. Maybe one of you can make it even smaller? We will probably never get this chance again.

@berndhahnebach
Contributor

Anyway, I have no idea about this project and was probably highlighted by mistake :-) (unsubscribed now, so please don't @ me again)

Sorry johannes. Yes, you were highlighted by mistake. The right one would have been @jreinhardt. Sorry for the inconvenience.

BTW: We are aware of what you have said, and we are discussing whether it is worth it.

cheers bernd

@jreinhardt
Collaborator

Hi,

yes, let's do this.

There might be even more to gain: when I check the largest blobs in the repo (https://stackoverflow.com/questions/10622179/how-to-find-identify-large-commits-in-git-history), many of them are JS files with literal 3D models for 3d.js. This is about 40 MB (but it probably compresses quite well in the pack files).
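One way to produce such a list, along the lines of the linked Stack Overflow answer: `git rev-list --objects --all` lists every object reachable from any ref together with its path, `git cat-file --batch-check` adds type and size, and the rest is plain filtering and sorting. The function name `largest_blobs` and the throwaway demo repository are hypothetical. The sizes are uncompressed, which is why highly compressible JS files look bigger here than they end up in the pack files.

```shell
#!/bin/sh
set -eu
# Print the N largest blobs reachable from any ref (default 10) as
# "<object-id> <size-in-bytes> <path>", biggest first.
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k2,2nr |
    head -n "${1:-10}"
}

# Throwaway identity and repository to run it on.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo small > small.txt
head -c 50000 /dev/urandom > big.bin
git add . && git commit -qm 'demo'
largest_blobs 5
```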

Also, when filter-branch is run on HEAD only, tags are not rewritten and may still reference the big blobs, keeping them from being garbage collected. So I removed all tags and branches except the main branch.
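A minimal sketch of that clean-up on a throwaway repository: every branch and tag except the kept branch is deleted with `git update-ref -d`, so nothing else keeps the old blobs reachable. The helper name `keep_only_branch` and the branch and tag names are hypothetical. Including refs/remotes and refs/original goes beyond what the comment describes and is an assumption: in a fresh clone the remote-tracking branches, and after filter-branch the backup refs, would otherwise still point at the old history.

```shell
#!/bin/sh
set -eu
# Hypothetical helper: delete every local branch, tag, remote-tracking ref
# and filter-branch backup ref except refs/heads/<keep>.
keep_only_branch() {
  keep=$1
  git checkout -q "$keep"
  git for-each-ref --format='%(refname)' \
      refs/heads refs/tags refs/remotes refs/original |
    while read -r ref; do
      if [ "$ref" != "refs/heads/$keep" ]; then
        git update-ref -d "$ref"
      fi
    done
}

# Throwaway identity and repository with an extra branch and a tag.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
echo code > code.txt
git add code.txt && git commit -qm 'code'
git tag v0.1
git branch old-feature

keep_only_branch main
git for-each-ref --format='%(refname)'
```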

Anyway, my attempt with

git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch downloads/* output/* html/3dviews/*" HEAD

gave me

19M ./.git
54M .

So I guess this is more or less the same as what Bernd achieved...
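Instead of deleting the tags, filter-branch can also rewrite them: passing `-- --all` rewrites every branch and tag rather than just HEAD, and `--tag-name-filter cat` keeps each tag's name while pointing it at the rewritten commit. A sketch on a throwaway repository using the directory names from this thread; `--prune-empty` (drop commits that become empty) is an addition not used above. `FILTER_BRANCH_SQUELCH_WARNING=1` only suppresses the warning and pause that recent git versions print before running filter-branch.

```shell
#!/bin/sh
set -eu
# Throwaway identity; skip filter-branch's warning and pause.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository: code plus downloads/ and output/, and a tag.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
echo 'code v1' > code.txt
mkdir downloads output
head -c 50000 /dev/urandom > downloads/big.tar.gz
echo '<html></html>' > output/index.html
git add . && git commit -qm 'code, downloads and generated site'
git tag v0.1
echo 'code v2' > code.txt
git commit -qam 'update code'

# Remove both directories from every commit on every branch and tag;
# commits that end up empty are dropped, tags keep their names.
git filter-branch --prune-empty \
  --index-filter 'git rm -r -q --cached --ignore-unmatch downloads output' \
  --tag-name-filter cat -- --all
```

The backups in refs/original still hold the old objects afterwards, so they have to be removed (together with the reflogs) before gc can shrink .git.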

@KOLANICH
Contributor Author

KOLANICH commented Oct 27, 2020

https://github.com/KOLANICH/strippedbolts

53M . (it was 41M, but after I downloaded it from the repo it became 53M; most likely the LFS files had just not been checked out before)
18M ./.git

I have put all the images, fonts, compiled translations and FreeCAD zips into LFS. Fonts and zips don't give any noticeable gain, but they are binary, so that is where they belong.

We may want to remove some PNGs, since they contain the same drawings as the SVGs. Removing or LFS-ing the other file types gives no noticeable gain; they were probably modified too few times.

Though LFS has an extremely large drawback: GH treats it as a driver for selling paid services, so there are quotas on it, and anything uploaded permanently eats into the quota of the parent account until the repo is deleted, losing all the issues, PRs and forks.

So IMHO it isn't worth it, at least until M$ changes its LFS policy.

@berndhahnebach
Contributor

That means we could go with the one on my GitHub.

@pwab

pwab commented Sep 13, 2021

I'm also for keeping the size of the repository as small as possible.
As those download files do not seem to correspond to the published releases, I'm not sure why they are kept in the first place. Sorry if I get something wrong here.

Also, the gh-pages branch no longer seems to be needed, as the website's repository is boltsparts.github.io.


Just for reference, this is the current repository master:
[image]

@luzpaz
Contributor

luzpaz commented Jan 25, 2022

bump

@berndhahnebach
Contributor

berndhahnebach commented Nov 2, 2022

Found a problem ... I have some branches ... https://github.com/berndhahnebach/BOLTS/branches/all They are no longer part of the git tree. But most of them have just a few commits, so cherry-picking would work. So it is not really a problem.
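Cherry-picking across the two histories needs the old branch and its base in the same clone. A minimal sketch with two throwaway repositories standing in for the old BOLTS and the stripped one; the branch names `old-main`, `old-feature` and `feature` are hypothetical. Fetching the old `main` and `feature` under new local names and picking `old-main..old-feature` replays only the branch's own commits onto the rewritten `main`, assuming the touched files are unchanged in the new history.

```shell
#!/bin/sh
set -eu
# Throwaway identity and two repositories so the sketch is self-contained.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
old=$(mktemp -d)
new=$(mktemp -d)

# Old repository: main plus a feature branch with two small commits.
git -C "$old" init -q .
git -C "$old" symbolic-ref HEAD refs/heads/main
echo 'code v1' > "$old/code.txt"
git -C "$old" add code.txt
git -C "$old" commit -qm 'old main'
git -C "$old" checkout -qb feature
echo 'fix 1' >> "$old/code.txt"
git -C "$old" commit -qam 'Fix 1'
echo 'fix 2' >> "$old/code.txt"
git -C "$old" commit -qam 'Fix 2'

# New repository: same file content, unrelated (rewritten) history.
git -C "$new" init -q .
git -C "$new" symbolic-ref HEAD refs/heads/main
echo 'code v1' > "$new/code.txt"
git -C "$new" add code.txt
git -C "$new" commit -qm 'stripped main'

# Bring the old branch and its base into the new clone, then replay only
# the branch's own commits onto the rewritten main.
cd "$new"
git fetch -q "$old" refs/heads/main:refs/heads/old-main \
  refs/heads/feature:refs/heads/old-feature
git checkout -qb feature main
git cherry-pick old-main..old-feature
```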

@berndhahnebach
Contributor

Stripped this directory too, since I have to delete it after website generation anyway to get the webpage up: backends/website/static/source/bootstrap-3.2.0/ This saves another 6.3 MB ... 46 MB

@berndhahnebach
Contributor

If I move the repository to an archive repo, all issues and PRs will move with it. But we could recreate them if needed and link to the archive repository.

@berndhahnebach
Contributor

I am curious whether more regressions will come up.

@berndhahnebach
Contributor

To make clear that it is another repo, the new repo could be named bolts instead of BOLTS. That also makes it easier to type. This way a link to an issue would never point to the wrong issue, because the new repo will have new issues.

@berndhahnebach
Contributor

Since I am moving the repo, all forks will still work. After the move I will make one last commit, in which the repo's README.md explains the change and links to this issue.

@berndhahnebach
Contributor

The default branch of the new repository will be main instead of master. This follows the new guidelines and also signals that something has changed.

@berndhahnebach
Contributor

Just tried it: repo names are not case-sensitive. So to avoid mix-ups we need to use another repo name. I will use boltsparts for the new, stripped main BOLTS repo.

@berndhahnebach
Contributor

Links are not broken; GitHub somehow seems to know that the repo name has changed.

@berndhahnebach
Contributor

a new BOLTS was born ... https://github.com/boltsparts/boltsparts

I will not close this at the moment; let's see what happens ...

@Moult

Moult commented Nov 3, 2022

awesome! Is it possible to transfer issues?

@berndhahnebach
Contributor

good question,

7 participants