Commits on Sep 7, 2010
  1. Introduce BUP_DEBUG, --debug, and tone down the log messages a lot.

    There's a new global bup option, --debug (-D) that increments BUP_DEBUG.  If
    BUP_DEBUG >=1, debug1() prints; if >= 2, debug2() prints.
    
    We change a bunch of formerly-always-printing log() messages to debug1 or
    debug2, so now a typical bup session should be a lot less noisy.
    
    This affects midx in particular, which was *way* too noisy now that 'bup
    save' and 'bup server' were running it automatically every now and then.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 6, 2010
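    The level-gated logging described above can be sketched as follows. This is a minimal illustration, not bup's actual helper code: the `environ` and `out` parameters and the `bump_debug()` helper are assumptions added so the example can run without touching the real environment.

```python
import os
import sys


def debug_level(environ=os.environ):
    """Return the verbosity stored in BUP_DEBUG (0 if unset or not a number)."""
    try:
        return int(environ.get("BUP_DEBUG", "0"))
    except ValueError:
        return 0


def bump_debug(environ=os.environ):
    """Hypothetical helper: what one --debug (-D) flag does, i.e. increment BUP_DEBUG."""
    environ["BUP_DEBUG"] = str(debug_level(environ) + 1)


def log(msg, out=sys.stderr):
    """Always print."""
    out.write(msg)
    out.flush()


def debug1(msg, environ=os.environ, out=sys.stderr):
    """Print only when BUP_DEBUG >= 1."""
    if debug_level(environ) >= 1:
        log(msg, out)


def debug2(msg, environ=os.environ, out=sys.stderr):
    """Print only when BUP_DEBUG >= 2."""
    if debug_level(environ) >= 2:
        log(msg, out)
```

    Keeping the level in an environment variable means subprocesses inherit it without any extra plumbing.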
  2. client.py,git.py: run 'bup midx -a' automatically sometimes.

    Now that 'bup midx -a' is smarter, we should run it automatically after
    creating a new index file.  This should remove the need for running it by
    hand.
    
    Thus, we also remove 'bup midx' from the lists of commonly-used subcommands.
    (While we're here, let's take out 'split' and 'join' too; you should be
    using 'index' and 'save' most of the time.)
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 6, 2010
Commits on Sep 6, 2010
  1. Rename 'bup rbackup' to 'bup on'

    'rbackup' was a dumb name but I couldn't think of anything better at the
    time.  This works nicely in a grammatical sort of way:
    
       bup on myserver save -n myserver-backup /etc
    
    Now that we've settled on a name, also add some documentation for the
    command.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 6, 2010
  2. rot13 the t/testfile* sample data files.

    They were generated by catting bunches of bup source code together, which,
    as it turns out, makes 'git grep' super annoying.  Let's rot13 them so
    grepping doesn't do anything interesting but the other characteristics are
    the same.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 6, 2010
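    For illustration, rot13 can be applied with Python's standard `rot_13` codec. The `rot13()` wrapper name is hypothetical; the commit does not say which tool was used to convert the files.

```python
import codecs


def rot13(text):
    """Rotate ASCII letters by 13 places; everything else is left untouched."""
    return codecs.encode(text, "rot_13")
```

    The transform preserves length, line structure, and non-letter characters, and applying it twice restores the original text.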
  3. cmd/midx: --auto mode can combine existing midx files now.

    Previously, --auto would *only* create a midx from not-already-midxed .idx
    files.  This wasn't optimal since you'd eventually end up with a tonne of
    .midx files, which is just as bad as a tonne of .idx files.
    
    Now we'll try to maintain a maximum number of midx files using a
    highwater/lowwater mark.  That means the number of active midx files should
    now stay between 2 and 5, and you can run 'bup midx -a' as often as you
    want.
    
    'bup midx -f' will still make sure everything is in a single .midx file,
    which is an efficient thing to run every now and then.
    
    'bup midx -af' is the same, but uses existing midx files rather than forcing
    bup to start from only .idx files.  Theoretically this should always be
    faster than, and never be worse than, 'bup midx -f'.
    
    Bonus: 'bup midx -a' now works when there's a limited number of file
    descriptors.  The previous fix only worked properly with 'bup midx -f'.
    (This was rarely a problem since 'bup midx -a' would only ever touch the
    last few .idx files, so it didn't need many file descriptors.)
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 2, 2010
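    The highwater/lowwater idea can be sketched as below. This is only an illustration of the hysteresis, not the logic in cmd/midx: the `plan_merge()` name, the smallest-first selection, and the exact thresholds (taken from the "between 2 and 5" range above) are assumptions.

```python
def plan_merge(sizes, low=2, high=5):
    """Decide which midx files to combine, given their sizes.

    Nothing happens until the count exceeds the high-water mark. Once it
    does, enough of the smallest files are merged into one that the count
    drops back to the low-water mark.
    """
    if len(sizes) <= high:
        return []
    # Merging k files into one removes k - 1 files from the total.
    n_merge = len(sizes) - low + 1
    return sorted(sizes)[:n_merge]
```

    Because nothing is merged until the count passes the high mark, calling this after every run is cheap most of the time.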
  4. Merge branch 'maint'

    * maint:
      cmd/midx: use getrlimit() to find the max open files.
    committed Sep 6, 2010
  5. cmd/midx: use getrlimit() to find the max open files.

    It turns out the default file limit on MacOS is 256, which is less than our
    default of 500.  I guess this means trouble after all, so let's auto-detect
    it.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 6, 2010
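    A minimal sketch of the auto-detection, using Python's standard `resource` module (Unix only). The `max_open_files()` name, the `reserve` margin, and the choice to cap at the old default of 500 are assumptions, not details taken from the commit.

```python
import resource


def max_open_files(default=500, reserve=10):
    """Return how many files we may open at once.

    Uses the soft RLIMIT_NOFILE limit, minus a few descriptors kept back
    for stdin/stdout/stderr and friends, and never exceeds `default`.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return default
    return max(1, min(default, soft - reserve))
```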
  6. Merge branch 'maint'

    * maint:
      index.py: handle uid/gid == -1 on cygwin
      cmd/memtest: use getrusage() instead of /proc/self/stat.
      cmd/index: catch exception for paths that don't exist.
      Don't use $(wildcard) during 'make install'.
      Don't forget to install _helpers.dll on cygwin.
    committed Sep 6, 2010
  7. cmd/margin: interpret the meaning of the margin bits.

    Maybe you were wondering how good it is when 'bup margin' returns 40 or 45.
    Well, now it'll tell you.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 5, 2010
  8. index.py: handle uid/gid == -1 on cygwin

    On cygwin, the uid or gid might be -1 for some reason.  struct.pack()
    emits a DeprecationWarning when packing a negative number into an
    unsigned int, so fix it up first.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 6, 2010
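    A sketch of one possible fix-up. The commit does not say which replacement value is used, so mapping negative ids to 0 is an assumption, as are the `pack_ids()` name and the `!II` format. In current Python 3, packing a negative number into an unsigned field raises `struct.error` rather than warning.

```python
import struct


def pack_ids(uid, gid):
    """Pack uid and gid as two unsigned 32-bit big-endian integers.

    Assumption: negative ids (such as -1 on cygwin) are replaced with 0
    before packing, since an unsigned field cannot hold them.
    """
    if uid < 0:
        uid = 0
    if gid < 0:
        gid = 0
    return struct.pack("!II", uid, gid)
```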
  9. cmd/memtest: use getrusage() instead of /proc/self/stat.

    Only Linux has /proc/self/stat, so 'bup memtest' didn't work on anything
    except Linux.  Unfortunately, getrusage() on *Linux* doesn't have a valid
    RSS field (sigh), so we have to use /proc/self/stat as a fallback if it's
    zero.
    
    Now memtest works on MacOS as well, which means 'make test' passes again.
    (It stopped passing because 'bup memtest' recently got added to one of the
    tests.)
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 5, 2010
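    The fallback described above might look like this. It is a sketch under assumptions: the function names are hypothetical, `ru_maxrss` is assumed to be the field in question, and its units are platform-dependent (kilobytes on Linux, bytes on macOS), so the two sources are reported separately rather than mixed.

```python
import os
import resource


def rss_from_proc_stat(path="/proc/self/stat"):
    """Return the resident set size in bytes, read from /proc (Linux only)."""
    with open(path) as f:
        data = f.read()
    # The second field (the command name) is parenthesised and may contain
    # spaces, so split only what follows the last ')'.
    fields = data[data.rindex(")") + 2:].split()
    rss_pages = int(fields[21])  # field 24 of the full line: rss, in pages
    return rss_pages * os.sysconf("SC_PAGE_SIZE")


def memory_report():
    """Prefer getrusage(); fall back to /proc/self/stat if it reports zero."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    if ru.ru_maxrss:
        return ("getrusage", ru.ru_maxrss)
    return ("/proc/self/stat", rss_from_proc_stat())
```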
Commits on Sep 5, 2010
  1. cmd/index: catch exception for paths that don't exist.

    Rather than aborting completely if a path specified on the command line
    doesn't exist, report it as a non-fatal error instead.
    
    (Heavily modified by apenwarr from David Roda's original patch.)
    
    Signed-off-by: David Roda <davidcroda@gmail.com>
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    davidcroda committed with Aug 31, 2010
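    A minimal sketch of the non-fatal approach: record the problem and carry on with the remaining paths. The `saved_errors`, `add_error()` and `check_paths()` names are illustrative assumptions, not necessarily what cmd/index uses.

```python
import os

saved_errors = []  # collected problems, to be reported at the end of the run


def add_error(msg):
    """Remember a non-fatal error instead of aborting."""
    saved_errors.append(msg)


def check_paths(paths):
    """Return the paths that exist; record an error for each one that doesn't."""
    good = []
    for path in paths:
        try:
            os.lstat(path)
        except OSError as e:
            add_error("%s: %s" % (path, e))
            continue
        good.append(path)
    return good
```

    A caller would typically exit non-zero at the end if `saved_errors` is non-empty, so the failure is still visible to scripts.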
  2. Documentation/*.md: add some options that we forgot to document.

    Software evolves, but documentation evolves... slower.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 5, 2010
  3. Rename Documentation/*.1.md to Documentation/*.md

    All our man pages end up in section 1 of man anyway, and it looks like that
    will probably never change.  So let's make our filenames simpler and easier
    to understand.
    
    Even if we do end up adding a page in (say) section 5 someday, it's no big
    deal; we can just add an exception to the Makefile for it or something.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 5, 2010
Commits on Sep 4, 2010
  1. Don't use $(wildcard) during 'make install'.

    It seems the $(wildcard) is evaluated once at make's startup, so any changes
    made *during* build don't get noticed.
    
    That means 'make install' would fail if you ran it without first running
    'make all', because $(wildcard cmd/bup-*) wouldn't match anything at startup
    time; the files we were copying only got created during the build.
    
    Problem reported by David Roda.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 4, 2010
  2. Don't forget to install _helpers.dll on cygwin.

    We were installing *.so, but not *$(SOEXT) like we should have.  Now we do,
    which should fix some cygwin install problems reported by David Roda.
    
    Also, when installing *.so and *.dll files, make them 0755 instead of 0644.
    This prevents permissions problems on cygwin, likewise reported by David Roda.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 4, 2010
Commits on Sep 2, 2010
  1. Merge branch 'maint'

    * maint:
      git.py: recover more elegantly if a MIDX file has the wrong version.
      cmd/midx: add a new --max-files parameter.
    
    Conflicts:
    	lib/bup/git.py
    committed Sep 2, 2010
  2. Merge branch 'guesser'

    * guesser:
      _helpers.extract_bits(): rewrite git.extract_bits() in C.
      _helpers.firstword(): a new function to extract the first 32 bits.
      git.py: when seeking inside a midx, use statistical guessing.
    committed Sep 2, 2010
  3. git.py: recover more elegantly if a MIDX file has the wrong version.

    Previously we'd throw an assertion for any too-new-format MIDX file, which
    isn't so good.  Let's recover more politely (and just ignore the file in
    question) if that happens.
    
    Noticed by Zoran Zaric who was testing my midx3 branch.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 2, 2010
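    The "ignore the file instead of asserting" behaviour can be sketched as below. The header layout (a 4-byte magic followed by a 4-byte big-endian version), the supported version number, and all names here are assumptions made for illustration.

```python
import struct

MIDX_MAGIC = b"MIDX"      # assumed magic bytes
SUPPORTED_VERSION = 2     # assumed version understood by this reader


def check_midx_header(data, warn=print):
    """Return the version if the header is usable, else None.

    An unknown version is reported through `warn` and the file is skipped,
    rather than raising an assertion and aborting the whole run.
    """
    if len(data) < 8 or data[:4] != MIDX_MAGIC:
        warn("not a midx file; ignoring")
        return None
    (version,) = struct.unpack("!I", data[4:8])
    if version != SUPPORTED_VERSION:
        warn("midx version %d not supported; ignoring" % version)
        return None
    return version
```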
  4. cmd/midx: add a new --max-files parameter.

    Zoran reported that 'bup midx -f' on his system tried to open 3000 files at
    a time and wouldn't work.  That's no good, so let's limit the maximum files
    to open; the default is 500 for now, since that ought to be usable for
    normal people.  Arguably we could use getrlimit() or something to find out
    the actual maximum, or just keep opening stuff until we get an error, but
    maybe there's no point.
    
    Unfortunately this patch isn't really perfect, because it limits the
    usefulness of midx files.  If you could merge midx files into other midx
    files, then you could at least group them all together after multiple runs,
    but that's not currently supported.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Sep 2, 2010
Commits on Aug 27, 2010
  1. _helpers.extract_bits(): rewrite git.extract_bits() in C.

    That makes our memtest run just slightly faster: 2.8 seconds instead of 3.0
    seconds, which catches us back up with the pre-interpolation-search code.
    Thus we should now be able to release this patch without feeling embarrassed
    :)
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Aug 26, 2010
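    For reference, the operation itself can be written in a few lines of Python. This is a sketch of what "extract the top bits of a hash" means, assuming a big-endian 32-bit first word; it is not the C code this commit adds.

```python
import struct


def extract_bits(buf, nbits):
    """Return the top `nbits` bits (1..32) of the first 32-bit word of `buf`."""
    (word,) = struct.unpack("!I", buf[:4])
    return (word >> (32 - nbits)) & ((1 << nbits) - 1)
```

    The top bits of a hash are what select a slot in a lookup table, which is why this sits on the hot path.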
  2. _helpers.firstword(): a new function to extract the first 32 bits.

    This is a pretty common operation in git.py and it speeds up cmd/memtest
    results considerably: from 3.7 seconds to 3.0 seconds.
    
    That gets us *almost* as fast as we were before the whole statistical
    guessing thing, but we still enjoy the improved memory usage.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Aug 26, 2010
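    In pure Python, the same operation is a single `struct` call. The sketch below assumes the word is read big-endian (network byte order); the C helper added here is what actually provides the speedup.

```python
import struct


def firstword(buf):
    """Return the first 32 bits of `buf` as an unsigned integer."""
    (word,) = struct.unpack("!I", buf[:4])
    return word
```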
  3. git.py: when seeking inside a midx, use statistical guessing.

    Instead of using a pure binary search (where we seek to the middle of the
    area and do a greater/lesser comparison) we now use an "interpolation
    search" (http://en.wikipedia.org/wiki/Interpolation_search), which means we
    seek to where we statistically *expect* the desired value to be.
    
    On my test midx, this reduces the typical number of search steps from
    8.7 steps/object to 4.8 steps/object.
    
    This reduces memory churn when using a midx: a given search region
    sometimes spans two pages, and this technique lets us eliminate one of
    the two pages more quickly, so we dirty one fewer page.
    
    Unfortunately the implementation requires some futzing, so this actually
    makes memtest run about 35% *slower*.  Will try to fix that next.
    
    The original link to this algorithm came from this article:
    http://sna-projects.com/blog/2010/06/beating-binary-search/
    
    Thanks, article!
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Aug 26, 2010
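    A generic interpolation search over a sorted list of integers looks like this. It is a textbook sketch, not the git.py implementation, which works on sha1 prefixes inside a midx; the step counter is included only to make the comparison with binary search visible.

```python
def interpolation_search(values, target):
    """Search a sorted list of ints; return (index or None, probes used)."""
    lo, hi = 0, len(values) - 1
    steps = 0
    while lo <= hi and values[lo] <= target <= values[hi]:
        steps += 1
        if values[hi] == values[lo]:
            mid = lo
        else:
            # Probe where the target would sit if values were evenly spread.
            mid = lo + (target - values[lo]) * (hi - lo) // (values[hi] - values[lo])
        if values[mid] == target:
            return mid, steps
        if values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, steps
```

    Hashes are close to uniformly distributed, which is the case where the first probe tends to land near the answer.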
  4. git.py: keep statistics on how much sha1 searching we had to do.

    And cmd/memtest prints out the results.  Unfortunately this slows down
    memtest runs by 0.126/2.526 = 5% or so.  Yuck.  Well, we can take it out
    later.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Aug 26, 2010
  5. cmd/memtest: add a --existing option to test with existing objects.

    This is useful for testing behaviour when we're looking for objects
    that *do* exist.  Of course, it just goes through the objects in order, so
    it's not actually that realistic.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Aug 26, 2010
Commits on Aug 26, 2010
  1. cmd/midx: fix SHA_PER_PAGE calculation.

    For some reason we were dividing by 200 instead of by 20, which was way off.
    Switch to 20 instead.  Suspiciously, this makes memory usage slightly worse
    in my current (smallish) set of test data, so we might need to revert it
    later...?  But if we're going to have an adjustment, we should at least make
    it clear what for, rather than hiding it in something that looks
    suspiciously like a typo.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Aug 25, 2010
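    The arithmetic behind the fix, assuming a 4096-byte page (the page size is not stated in the commit) and 20-byte sha1 hashes:

```python
PAGE_SIZE = 4096   # assumed page size in bytes
SHA_SIZE = 20      # a sha1 hash is 20 bytes

SHA_PER_PAGE = PAGE_SIZE / SHA_SIZE   # about 204.8 hashes fit in one page
OLD_VALUE = PAGE_SIZE / 200           # the mistaken divisor gave about 20.48
```

    The old value was ten times too small, which is what made it look like a typo.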
  2. cmd/margin: add a new --predict option.

    When --predict is given, it tries to guess the offset in the indexfile of
    each hash, based on the assumption that the hashes are distributed evenly
    throughout the file.  Then it prints the maximum amount by which this guess
    deviates from reality.
    
    I was hoping the results would show that the maximum deviation in a typical
    midx was less than a page's worth of hashes; that would mean the toplevel
    lookup table could be redundant, which means fewer pages hit in the
    common case.  No such luck, unfortunately; with 1.6 million objects, my
    maximum deviation was 913 hashes (about 18 kbytes, or 5 pages).
    
    By comparison, midx files should hit about 2 pages in the common case (1
    lookup table + 1 data page).  Or 3 pages if we're unlucky and the search
    spans two data pages.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Aug 25, 2010
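    The prediction check can be sketched as follows, treating each hash as its 32-bit prefix. The function names and the 4096-byte page size are assumptions; the second helper just reproduces the "913 hashes is about 18 kbytes, or 5 pages" arithmetic.

```python
import math


def max_prediction_error(prefixes, bits=32):
    """Largest gap between a hash's guessed and actual position.

    `prefixes` is a sorted list of `bits`-bit integers. Under a perfectly
    even distribution, a value v would sit at index v * n / 2**bits.
    """
    n = len(prefixes)
    worst = 0
    for actual, value in enumerate(prefixes):
        guess = (value * n) >> bits
        worst = max(worst, abs(guess - actual))
    return worst


def pages_spanned(nhashes, sha_size=20, page_size=4096):
    """How many pages a run of `nhashes` consecutive hashes can cover."""
    return math.ceil(nhashes * sha_size / page_size)
```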
  3. cmd/memtest: print per-cycle and total times.

    This makes it easier to compare output from other people or between
    machines, and also gives a clue as to swappiness.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Aug 25, 2010
Commits on Aug 23, 2010
  1. Rename _faster.so to _helpers.so.

    Okay, _faster.so wasn't a good choice of names.  Partly because not
    everything in there is just to make stuff faster, and partly because some
    *proposed* changes to it don't just make stuff faster.  So let's rename it
    one more time.  Hopefully the last time for a while!
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Aug 22, 2010
Commits on Aug 22, 2010
  1. lib/bup/ssh: Add docstrings

    Document the code with docstrings.
    
    Also add an "import sys" line since it is used by sys.argv[0] on line 6.
    
    Signed-off-by: Gabriel Filion <lelutin@gmail.com>
    lelutin committed with Aug 15, 2010
  2. lib/bup/options: Add docstrings

    Document the code with docstrings.
    
    Use one line per imported module as recommended by PEP 8 to make it
    easier to spot unused modules.
    
    Signed-off-by: Gabriel Filion <lelutin@gmail.com>
    lelutin committed with Aug 15, 2010
  3. import cleanup

    Remove unused imported modules.
    
    I started using the pyflakes.vim plugin and it automagically shows a
    bunch of problems/uncleanliness in the code. It helped me pull this out
    in 15 minutes.
    
    This change shouldn't have any impact on performance or functionality
    but it makes the code cleaner.
    
    Signed-off-by: Gabriel Filion <lelutin@gmail.com>
    lelutin committed with Aug 15, 2010
  4. cmd/ftp: don't die if we can't import the ctypes module.

    It's only needed on some rare broken versions of readline anyway.  If we
    can't find the module, chances are the system doesn't have that broken
    version of readline.
    
    Based on suggestions by Gabriel Filion and Aaron Ucko.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    committed Aug 21, 2010
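    The usual pattern for an optional module is sketched below. The `have_ctypes()` helper is hypothetical, and the readline workaround itself is not shown because the commit does not describe it.

```python
try:
    import ctypes
except ImportError:
    # Assume the broken readline is absent too, so the workaround is skipped.
    ctypes = None


def have_ctypes():
    """Return True if the ctypes-based workaround could be applied."""
    return ctypes is not None
```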
  5. lib/bup/vfs: bring back Python 2.4 support

    There is currently one test failure when running tests against Python
    2.4: a try..except..finally block that's interpreted as a syntax error.
    The commit introducing this incompatibility with 2.4 is f77a082
    
    This is a well known python 2.4 limitation and the workaround, although
    ugly, is easy.
    
    With this test passing, Python 2.4 support is back.
    
    Signed-off-by: Gabriel Filion <lelutin@gmail.com>
    lelutin committed with Aug 20, 2010
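    The workaround is to nest a try..except inside a try..finally, since Python 2.4 does not accept all three clauses on one try statement. A minimal sketch with hypothetical names (the actual vfs code is not shown here):

```python
def run_with_cleanup(action, events):
    """Equivalent of try/except/finally, written so Python 2.4 can parse it."""
    try:
        try:
            action()
        except ValueError:
            events.append("handled")
    finally:
        # Runs whether or not the inner block raised.
        events.append("cleanup")
```

    The nested form behaves the same as the combined statement on newer versions, so nothing is lost by using it everywhere.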
Commits on Aug 11, 2010
  1. lib/bup/vfs: Add docstrings

    Since the vfs module uses the function git._treeparse, it should not be
    named as if it was a private function. Rename git._treeparse to
    git.treeparse and document it (add a docstring to it).
    
    Also, transform _ChunkReader, _FileReader and Node into new-style
    classes.
    
    Finally, remove trailing spaces from lib/bup/vfs.py .
    lelutin committed with Aug 2, 2010