Large JLD2 file has invaded our git history #509
Bah, that's annoying. |
The file is in |
But also we ignore |
Yeah, although I can't find a
Yeah so I'm not sure how the file made it in... |
Oh, that's probably why it took so long to clone the repo today... |
Sorry about that! We could probably just use BFG Repo-Cleaner to get rid of it and be more careful in the future... |
Something similar happened in FourierFlows.jl and if I recall correctly bfg-repo-cleaner is how we dealt with it. |
This has gotten really bad: our repo size is now 585 MB, which indicates that the file keeps changing. So I looked into this again, and it wasn't any of us; it was actually the documentation build including the output of the I'll open a PR that deletes JLD2 files generated by examples in |
PS: I think this issue is why Julia TagBot is failing to tag v0.15.0: JuliaRegistries/General#4989 |
IMPORTANT: @glwagner @suyashbire1 we should all delete our old clones and clone fresh. From the BFG website:

"At this point, you're ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have dirty history that you don't want to risk pushing back into your newly cleaned repo."

I used BFG Repo-Cleaner to delete all files larger than 1 MB from the git history. This deleted two files:

I have a backup of the old "dirty" repository in case we need it for any reason.

Before:
After:
BFG log:
Testing:
|
If you have a large file which exists only on a branch (and has never made it into master) it suffices to delete the branch. After that, fresh clones will not include the large file. So in this case I think you could have deleted or rewritten gh-pages and that would have fixed things with less disruption. (IIRC the way gh-pages usually works is a special branch which is detached from the rest of the history, which makes rewriting gh-pages in isolation even easier.) Note that testing branch deletion with a local git repo can be misleading for repo size - you need some extra steps to remove references to the old branch HEAD from the reflog and gc before testing the size of the .git directory (something like |
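The branch-deletion advice above, including the reflog and gc caveat, can be checked on a throwaway repository. The Python sketch below assumes `git` is on the `PATH`; the branch name `side`, the file name `output.jld2`, and the 2 MB size are hypothetical stand-ins for `gh-pages` and a generated JLD2 file. It commits a large file only on a side branch, then measures `.git` before and after deleting the branch, expiring the reflog with `git reflog expire --expire=now --all`, and running `git gc --prune=now`.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` (with a throwaway identity) and return stdout."""
    result = subprocess.run(
        ["git", "-c", "user.name=demo", "-c", "user.email=demo@example.com", *args],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout


def git_dir_size(repo: Path) -> int:
    """Total size in bytes of all files under .git (roughly what `du` reports)."""
    return sum(p.stat().st_size for p in (repo / ".git").rglob("*") if p.is_file())


def demo_branch_cleanup(big_file_bytes: int = 2_000_000) -> tuple[int, int]:
    """Return the .git size before and after removing a large file's only branch."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")
        git(repo, "symbolic-ref", "HEAD", "refs/heads/main")

        # A small commit on the main branch.
        (repo / "README.md").write_text("hello\n")
        git(repo, "add", "README.md")
        git(repo, "commit", "-q", "-m", "initial")

        # A large, incompressible file committed only on a side branch
        # (a hypothetical stand-in for gh-pages holding a generated .jld2 file).
        git(repo, "checkout", "-q", "-b", "side")
        (repo / "output.jld2").write_bytes(os.urandom(big_file_bytes))
        git(repo, "add", "output.jld2")
        git(repo, "commit", "-q", "-m", "add large output")
        git(repo, "checkout", "-q", "main")
        before = git_dir_size(repo)

        # Deleting the branch alone leaves the blob on disk: the HEAD reflog
        # still points at the old commit, so expire it and then garbage collect.
        git(repo, "branch", "-q", "-D", "side")
        git(repo, "reflog", "expire", "--expire=now", "--all")
        git(repo, "gc", "-q", "--prune=now")
        after = git_dir_size(repo)
    return before, after


if __name__ == "__main__":
    print(demo_branch_cleanup())
```

Running it should show `.git` shrinking from a bit over 2 MB to well under 1 MB. Commenting out the `reflog expire` and `gc` calls reproduces the misleading measurement: the branch is gone, but its blob is still stored in `.git`.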
@ali-ramadhan, are the 134MB due to .jld2 files again? |
If you clone only the master branch, you will see that it is quite clean:

```
$ git clone https://github.com/climate-machine/Oceananigans.jl.git --single-branch
Cloning into 'Oceananigans.jl'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 12346 (delta 0), reused 19 (delta 0), pack-reused 12327
Receiving objects: 100% (12346/12346), 10.25 MiB | 1.02 MiB/s, done.
Resolving deltas: 100% (8872/8872), done.
$ du -sh Oceananigans.jl/.git
12M Oceananigans.jl/.git
```

So only 12M of (compressed) files are downloaded. On the other hand, cloning the documentation branch:

```
$ git clone https://github.com/climate-machine/Oceananigans.jl.git --single-branch --branch=gh-pages
Cloning into 'Oceananigans.jl'...
remote: Enumerating objects: 4040, done.
remote: Total 4040 (delta 0), reused 0 (delta 0), pack-reused 4040
Receiving objects: 100% (4040/4040), 117.40 MiB | 6.62 MiB/s, done.
Resolving deltas: 100% (1819/1819), done.
```

So this is nothing to worry too much about :-) A simple non-disruptive solution is to change your hosting solution for the docs to use something other than the main git repository, and to delete the |
Thanks for finding this @navidcy! Ah, this is annoying but this wasn't actually fixed... Apparently the commit to See: https://travis-ci.com/climate-machine/Oceananigans.jl/jobs/262276959#L755-L756 I wonder if it's just better to delete the output file as part of the example, i.e. do it in Thankfully the issue is isolated in the |
Yeah, rewriting the gh-pages branch should be good enough 👍 Generally I'd only rewrite a master branch as a tool of last resort because it can be quite disruptive :-) |
This is still an issue and it's been escalating...

```
navid:/ $ git clone https://github.com/climate-machine/Oceananigans.jl.git
Cloning into 'Oceananigans.jl'...
remote: Enumerating objects: 453, done.
remote: Counting objects: 100% (453/453), done.
remote: Compressing objects: 100% (227/227), done.
remote: Total 20837 (delta 204), reused 318 (delta 120), pack-reused 20384
Receiving objects: 100% (20837/20837), 331.98 MiB | 2.05 MiB/s, done.
Resolving deltas: 100% (13268/13268), done.
navid:Research/ $ du -sh Oceananigans.jl
343M Oceananigans.jl
```
Thanks for the heads up again @navidcy! Indeed #558 didn't exactly solve the problem as it only deleted the JLD2 file after it was already pushed to I added a I just did it again and repo size is down to 53 MB now. As all the files were on Just to be safe @glwagner @suyashbire1 @sandreza we should probably Cloning just the Cloning just the PS: Thanks again for the tips @c42f and for the branch size measuring commands! |
I noticed that our git repo has ballooned in size sometime in the past week.
Someone, possibly me, committed a 52 MiB `ocean_wind_mixing_and_convection.jld2` file, possibly generated by running the example? But `docs/src/generated` is in `.gitignore`, so I'm not sure how it made it in.

Either way, I think we should scrub it because the repo size has increased by an order of magnitude...
Here's a list of files over 500 KiB:
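A list like this can be generated with git's plumbing commands: `git rev-list --objects --all` prints every object reachable from a ref (with a path for trees and blobs), and `git cat-file --batch-check` reports each object's type and size. The Python sketch below wraps that pipeline; the function names are hypothetical, and the 500 KiB default simply mirrors the threshold used here. Because it starts from refs only, objects kept alive solely by the reflog are not listed.

```python
import subprocess

# Format string for `git cat-file --batch-check`; %(rest) echoes the path
# that `git rev-list --objects` printed after each object name.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_large_blobs(batch_check_output: str, threshold_bytes: int) -> list[tuple[int, str]]:
    """Return (size, path) pairs for blobs larger than threshold_bytes, largest first."""
    large = []
    for line in batch_check_output.splitlines():
        # Split into at most 4 fields so paths containing spaces stay intact.
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob" or not parts[3]:
            continue  # skip commits, trees, and blobs without a path
        size, path = int(parts[2]), parts[3]
        if size > threshold_bytes:
            large.append((size, path))
    return sorted(large, reverse=True)


def large_blobs_in_history(repo: str, threshold_bytes: int = 500 * 1024) -> list[tuple[int, str]]:
    """Run the rev-list | cat-file pipeline inside `repo` and parse the result."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "cat-file", f"--batch-check={BATCH_FORMAT}"],
        cwd=repo, input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_large_blobs(sizes, threshold_bytes)
```

For example, `large_blobs_in_history("Oceananigans.jl", 1024 * 1024)` would list every blob over 1 MiB in a local clone, largest first. Sizes are uncompressed object sizes, so they can exceed what the blob actually occupies in a pack.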