DESIGN: mention bloom filters.
Jeff Anderson-Lee discovered the missing information and posted to the
mailing list.  Gabriel Filion reminded me to actually update the docs :)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
apenwarr committed May 8, 2011
1 parent 64abc2f commit a76207b
Showing 1 changed file with 8 additions and 4 deletions.
12 changes: 8 additions & 4 deletions DESIGN
@@ -281,7 +281,7 @@ they're written.
 But that leads us to our next problem.


-Huge numbers of huge packfiles (midx.py, cmd/midx)
+Huge numbers of huge packfiles (midx.py, bloom.py, cmd/midx, cmd/bloom)
 ------------------------------

 Git isn't actually designed to handle super-huge repositories. Most git
@@ -354,9 +354,13 @@ You generate midx files with 'bup midx'. The downside of midx files is that
 generating one takes a while, and you have to regenerate it every time you
 add a few packs.

-(Computer Sciency observers will note that there are some interesting data
-structures out there that could help make things even better. A very
-promising sounding one is called a "bloom filter." Look it up in Wikipedia.)
+UPDATE: Brandon Low contributed an implementation of "bloom filters", which
+have even better characteristics than midx for certain uses. Look it up in
+Wikipedia. He also massively sped up both midx and bloom by rewriting the
+key parts in C. The nicest thing about bloom filters is we can update them
+incrementally every time we get a new idx, without regenerating from
+scratch. That makes the update phase much faster, and means we can also get
+away with generating midxes less often.

 midx files are a bup-specific optimization and git doesn't know what to do
 with them. However, since they're stored as separate files, they don't
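
The incremental-update property described in the UPDATE paragraph can be illustrated with a toy bloom filter. This is only a rough sketch under stated assumptions: the class name `BloomSketch`, its `nbits` and `k` parameters, and the hashing scheme are hypothetical and are not taken from bup's bloom.py or its C code. It shows that hashes from a new idx can be added to an existing filter without touching what was added before.

```python
import hashlib


class BloomSketch:
    """Toy bloom filter (hypothetical; not bup's on-disk format)."""

    def __init__(self, nbits=1 << 16, k=4):
        # nbits is assumed to be a multiple of 8 so it maps onto whole bytes.
        self.nbits = nbits
        self.k = k
        self.bits = bytearray(nbits // 8)

    def _positions(self, sha):
        # Derive k bit positions by rehashing the object id with a salt byte.
        for i in range(self.k):
            digest = hashlib.sha1(bytes([i]) + sha).digest()
            yield int.from_bytes(digest[:8], "big") % self.nbits

    def add(self, sha):
        for pos in self._positions(sha):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def add_idx(self, shas):
        # Incremental update: only the new idx's hashes are processed;
        # bits set by earlier idx files are left as they are.
        for sha in shas:
            self.add(sha)

    def __contains__(self, sha):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(sha)
        )


bloom = BloomSketch()
first_idx = [hashlib.sha1(b"object-%d" % n).digest() for n in range(3)]
second_idx = [hashlib.sha1(b"object-%d" % n).digest() for n in range(3, 6)]

bloom.add_idx(first_idx)   # initial build
bloom.add_idx(second_idx)  # later: a new idx arrives, no rebuild needed
```

A membership test such as `first_idx[0] in bloom` then answers "possibly present", while a negative answer is definitive. A bloom filter can report false positives but never false negatives, so in a design like this a positive answer would still need to be confirmed against a real index.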