Investigate purging files from the git repo #169

Open
jonkerz opened this Issue Oct 21, 2016 · 4 comments

Member

jonkerz commented Oct 21, 2016 edited

At 507 MB, the repo is much bigger than it should be. Checked out, all files total 9 MB (excluding screencasts).

There are at least 164 MB worth of screencasts (in a format unknown to me) and a dozen db backups, none of which should have been committed in the first place. Purging these from the repo would reduce its size a lot. Old import files probably also take up a bunch of MBs, but they should not be removed for reasons.

Counter arguments:

  • It breaks commit hashes
  • It's scary
  • Possibly not worth the effort
  • 500 MB is not that bad
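
To check which files actually account for the size before deciding anything, something like the following could be used. It is only a sketch: it assumes Python 3.7+ and a `git` binary on the PATH, the function names are made up for illustration, and the sizes it reports are uncompressed blob sizes, so they will not add up exactly to the on-disk size of the repo.

```python
import subprocess
from typing import List, Tuple


def parse_batch_check(output: str) -> List[Tuple[int, str]]:
    """Parse lines of the form '<type> <sha> <size> <path>'.

    Returns (size_in_bytes, path) for blobs only, largest first.
    """
    blobs = []
    for line in output.splitlines():
        # maxsplit=3 keeps paths that contain spaces in one piece.
        parts = line.split(" ", 3)
        # Commits and trees are not file contents, so skip them.
        if len(parts) < 4 or parts[0] != "blob":
            continue
        blobs.append((int(parts[2]), parts[3]))
    return sorted(blobs, reverse=True)


def largest_blobs(repo_dir: str = ".", top: int = 20) -> List[Tuple[int, str]]:
    """List the biggest blobs reachable from any ref in the repo."""
    # Every object in history, one per line, as '<sha> <path>'.
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    # %(rest) echoes back whatever followed the sha on the input line.
    sized = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(sized)[:top]
```

Calling `largest_blobs(".")` from inside a clone should show whether the screencasts and db backups really are the bulk of the history.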

What are the advantages of having it smaller?


Flávia Esteves

Postdoctoral Fellow
Entomology
California Academy of Sciences
Golden Gate Park
55 Music Concourse Drive
San Francisco, CA 94118, U.S.A.

I agree that it's inappropriate to have the screencasts and database
backups in the code repository.

I haven't looked at the screencasts in ages, but I think Flavia might have
done one or more that could still be relevant. More importantly, I think we
should save periodic backups of the database (like one a month) for a
long time, just to be sure that an accidental bulk change that goes
undetected can be reversed. I haven't set that up in the past because I
understood that an even greater quantity/frequency of backups was being
kept, but it really should be done.
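
The "one a month" idea could be sketched roughly like this. It assumes each backup can be reduced to the date it was taken; the function name is made up, and keeping the earliest backup of each month (rather than the latest) is an arbitrary choice for illustration.

```python
from datetime import date
from typing import Dict, List, Tuple


def monthly_keepers(backup_dates: List[date]) -> List[date]:
    """Return the backups to keep long-term: the earliest one per month."""
    first_per_month: Dict[Tuple[int, int], date] = {}
    for d in sorted(backup_dates):
        # setdefault only stores the first (earliest) date seen for a month.
        first_per_month.setdefault((d.year, d.month), d)
    return sorted(first_per_month.values())
```

Everything not returned by `monthly_keepers` would be a candidate for deletion once it is older than whatever short-term window is wanted.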


Owner

foozleface commented Oct 25, 2016

We back up on ibis-info. and I have backups there going back to 2015. I have some backups that go back to 2014 in a couple of other places. I do need to write a proper backup management script and integrate it with IT's storage; that is an outstanding task.
Joe Russack
Computer Scientist, Genomics
Center for Comparative Genomics
California Academy of Sciences
T 415.238.2529
jrussack@calacademy.org
www.calacademy.org
55 Music Concourse Drive
Golden Gate Park
San Francisco, CA 94118


On Oct 25, 2016, at 12:02 PM, Stan Blum notifications@github.com wrote:

I agree that it's inappropriate to have the screencasts and database
backups in the code repository.

I haven't looked at the screencasts in ages, but I think Flavia might have
done one (some) that could still be relevant. More important, I think it's
important to save periodic backups of the database (like one a month) for a
long-time, just to be sure that an accidental bulk change that goes
undetected can be reversed. I haven't set that up in the past because I
understood that an even greater quantity/frequency of backups was being
kept, but it really should be done.


Member

jonkerz commented Nov 19, 2016

@flaviaesteves The screencasts would be moved somewhere else, maybe to one of the Academy's servers or Dropbox. They are pretty hard to find where they are now, so putting them somewhere more accessible would be good for that reason as well. These are the ones I'm referring to. If you upload them and all other screencasts to YouTube, we could link to them in the "For editors" section on the wiki.

Purging anything from the repo would likely either happen a long time from now or not at all, because it's so complicated. Even if we never do it, having the screencasts on YouTube would still be a good thing.

And I'm sure all of you understood that "purging" is the actual technical term for "hard removing" files -- it just sounds so very horrible and mean out of context!
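
For reference, one common way to do such a purge is `git filter-branch` with an index filter. The helper below only builds the command so it can be reviewed before anyone runs it. It is a sketch, not a tested procedure for this repo: the function name and the example paths are hypothetical, and running the resulting command rewrites every affected commit (the "breaks commit hashes" point above), so it would belong on a fresh clone.

```python
import shlex
from typing import List


def build_purge_command(paths: List[str]) -> List[str]:
    """Build a git filter-branch argv that drops `paths` from all history."""
    if not paths:
        raise ValueError("no paths given")
    # Quote each path so spaces survive the shell that filter-branch spawns.
    quoted = " ".join(shlex.quote(p) for p in paths)
    index_filter = f"git rm -r --cached --ignore-unmatch -- {quoted}"
    return [
        "git", "filter-branch", "--force",
        "--index-filter", index_filter,
        "--prune-empty",             # drop commits that become empty
        "--tag-name-filter", "cat",  # rewrite tags to point at new commits
        "--", "--all",               # apply to every branch and tag
    ]
```

For example, `build_purge_command(["doc/screencasts"])` (a made-up path) returns a list that could be passed to `subprocess.run` once everyone agrees it is worth doing.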
