Commits on Mar 14, 2010
  1. PackIdxList.refresh(): remember to exclude old midx files.

    apenwarr committed Mar 14, 2010
    Previously, if you called refresh(), it would fail to consider
    the contents of already-loaded .midx files as already-loaded.  That means
    it would load all the constituent .idx files, so you'd actually lose all the
    advantages of the .midx after the first refresh().
    Thus, the midx optimization mainly worked before you filled up your first
    pack (about 1GB of data saved) or until you got an index suggestion.  This
    explains why backups would slow down significantly after running for a
    while.
    Also, get rid of the stupid forget_packs option; just automatically prune
    the packs that aren't relevant after the refresh.  This avoids the
    possibility of weird behaviour if you set forget_packs incorrectly (which we
  2. bup.client: fix freeze when suggest-index after finishing a full pack.

    apenwarr committed Mar 14, 2010
    It was just rare enough to be hard to find: if you write an entire pack full
    of stuff (1GB or more) and *then* trigger a suggest-index, the client would
    freeze because it would send a send-index command without actually
    suspending the receive-pack first.
    The whole Client/PackWriter separation is pretty gross, so it's not terribly
    surprising this would happen.
    Add a unit test to detect this case if it ever happens in the future, for
    what it's worth.
  3. main: even more fixes for signal handling.

    apenwarr committed Mar 14, 2010
    If the child doesn't die after the first SIGINT and the user presses ctrl-c
    one more time, the main bup process would die instead of forwarding it on to
    the child.  That's no good; we actually have to loop forwarding signals
    until the child is really good and dead.
    And if the child refuses to die, well, he's the one with the bug, not the
    main bup process.  So the main process should stay alive too in the name of
    not losing track of things.
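    A minimal sketch of such a forwarding loop, assuming a POSIX system and a
    parent that runs in the main thread.  This is not bup's actual code; the
    name run_and_forward() is made up for illustration.

```python
import signal
import subprocess
import sys


def run_and_forward(argv):
    """Run argv as a child and keep forwarding SIGINT/SIGTERM until it exits."""
    child = subprocess.Popen(argv)
    pending = []

    def remember(signum, frame):
        # Only record the signal here; it is forwarded from the main loop.
        pending.append(signum)

    watched = (signal.SIGINT, signal.SIGTERM)
    old_handlers = {s: signal.signal(s, remember) for s in watched}
    try:
        while True:
            try:
                # Returns the exit status as soon as the child is really dead.
                return child.wait(timeout=0.1)
            except subprocess.TimeoutExpired:
                pass
            # Forward every signal received so far, using the same signal
            # number, then go back to waiting.  The parent never gives up.
            while pending:
                child.send_signal(pending.pop(0))
    finally:
        for s, handler in old_handlers.items():
            signal.signal(s, handler)


if __name__ == "__main__":
    sys.exit(run_and_forward([sys.executable, "-c", "print('child ran')"]))
```

    However many times the user presses ctrl-c, the parent only returns once
    wait() has reported the child's exit status.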
  4. client/server: correctly handle case where receive-objects had 0 obje…

    apenwarr committed Mar 14, 2010
    Previously we'd throw a (probably harmless other than ugly output)
    exception in this case.
Commits on Mar 13, 2010
  1. cmd/{index,save}: handle ctrl-c without printing a big exception trace.

    apenwarr committed Mar 13, 2010
    It's not very exciting to look at a whole stack trace just because someone
    hit ctrl-c, especially since that's designed to work fine.  Trim it down in
    that case.
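    One way to get this behaviour, sketched under assumptions: the wrapper
    name, the message text and the 130 exit status (the shell convention of
    128 + SIGINT) are all illustrative and not taken from bup.

```python
import sys


def run_quietly(func):
    """Call func(); turn a ctrl-c into a one-line message and an exit code."""
    try:
        return func()
    except KeyboardInterrupt:
        # Skip the stack trace: being interrupted is an expected event.
        sys.stderr.write("\ninterrupted\n")
        return 130
```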
  2. git.PackWriter: avoid pack corruption if interrupted by a signal.

    apenwarr committed Mar 12, 2010
    PackWriter tries to "finish" a half-written pack in its destructor if
    interrupted.  To do this, it flushes the stream, seeks back to the beginning
    to update the sha1sum and object count, then runs git-index-pack on it to
    create the .idx file.
    However, sometimes if you were unlucky, you'd interrupt PackWriter partway
    through writing an object to the pack.  If only half an object exists at the
    end, it would have the wrong header and thus come out as corrupt when
    index-pack would run.
    Since our objects are meant to be small anyway, just make sure we write
    everything all in one file.write() operation.  The files themselves are
    buffered, so this wouldn't survive a surprise termination of the whole
    unix process, but we wouldn't run index-pack in that case anyway, so it
    doesn't matter.
    Now when I press ctrl-c in 'bup save', it consistently writes the half-saved
    objects as it should.
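    A rough illustration of the "one write() per object" idea, not the real
    PackWriter code.  The header encoding follows git's pack format (3-bit
    type plus variable-length size); the function names are hypothetical.

```python
import zlib


def pack_object_header(type_num, size):
    """Encode a git pack object header: type bits plus a varint size."""
    byte = (type_num << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)   # high bit set: more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def write_object(f, type_num, data):
    """Build the whole record in memory, then hand it to f in one write()."""
    record = pack_object_header(type_num, len(data)) + zlib.compress(data)
    f.write(record)               # a single call: never half an object
    return len(record)
```

    Because the header and the compressed body are joined before the call,
    an interruption between two write() calls can no longer leave a header
    without its body at the end of the pack.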
  3. Correctly pass along SIGINT to child processes.

    apenwarr committed Mar 12, 2010
    Ever since we introduced bup newliner, signal handling has been a little
    screwy.  The problem is that ctrl-c is passed to *all* processes in the
    process group, not just the parent, so everybody would start terminating at
    the same time, with very messy results.
    Two results were particularly annoying: git.PackWriter()'s destructor
    wouldn't always get called (so half-finished packs would be lost instead of
    kept so we don't need to back up the same stuff next time) and bup-newliner
    would exit, so the stdout/stderr of a process that *did* try to clean up
    would be lost, usually resulting in EPIPE, which killed the process while
    attempting to clean up.
    The fix is simple: when starting a long-running subprocess, give it its own
    session by calling os.setsid().  That way ctrl-c is only sent to the
    toplevel 'bup' process, who can forward it as it should.
    Next, fix bup's signal forwarding to actually forward the same signal as it
    received, instead of always using SIGTERM.
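    A small sketch of the setsid() part, assuming Python 3 on a POSIX system.
    Here subprocess's start_new_session=True makes the child call setsid()
    before exec; an older codebase would more likely pass
    preexec_fn=os.setsid.  The helper name is invented.

```python
import subprocess
import sys


def start_in_own_session(argv, **popen_kwargs):
    """Start argv in a new session so the tty's ctrl-c does not reach it."""
    return subprocess.Popen(argv, start_new_session=True, **popen_kwargs)


if __name__ == "__main__":
    # The child reports whether it leads its own session.
    child = start_in_own_session(
        [sys.executable, "-c", "import os; print(os.getsid(0) == os.getpid())"]
    )
    child.wait()
```

    With the child in its own session, only the toplevel process sees the
    terminal's SIGINT and has to pass it on explicitly.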
  4. hashsplit: use posix_fadvise(DONTNEED) when available.

    apenwarr committed Mar 12, 2010
    When reading through large disk images to back them up, we'll only end up
    reading the data once, but it still takes up space in the kernel's disk
    cache.  If you're backing up a whole disk full of stuff, that's bad news for
    anything else running on your system, which will rapidly have its stuff
    dumped out of cache to store a bunch of stuff bup will never look at again.
    The posix_fadvise() call actually lets us tell the kernel we won't be using
    this data anymore, thus greatly reducing our hit on the disk cache.
    Theoretically it improves things, anyway.  I haven't been able to come up
    with a really scientific way to test it, since of course *bup's* performance
    is expected to be the same either way (we're only throwing away stuff we're
    done using).  It really does throw things out of cache, though, so the rest
    follows logically at least.
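    For illustration only, a read loop that drops each chunk from the cache
    after use.  It assumes Python 3, where os.posix_fadvise() exists on most
    Unix systems; the hasattr() check covers platforms without it.  The
    function name and chunk size are arbitrary.

```python
import os


def read_once(path, chunk_size=1 << 20):
    """Yield a file's chunks, telling the kernel each one won't be reused."""
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            yield chunk
            if hasattr(os, "posix_fadvise"):
                try:
                    os.posix_fadvise(fd, offset, len(chunk),
                                     os.POSIX_FADV_DONTNEED)
                except OSError:
                    pass  # the call is only advice; ignore failures
            offset += len(chunk)
    finally:
        os.close(fd)
```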
  5. save-cmd: open files with O_NOATIME on OSes that support it.

    apenwarr committed Mar 12, 2010
    Backing up files normally changes their atime, which is bad for two reasons.
    First, the files haven't really been "accessed" in a useful sense; the fact
    that we backed them up isn't an indication that, say, they're any more
    frequently used than they were before.
    Secondly, when reading a file updates its atime, the kernel has to enqueue
    an atime update (disk write) for every file we back up.  For programs that
    read the same files repeatedly, this is no big deal, since the atime just
    gets flushed out occasionally (after a lot of updates).  But since bup
    accesses *every* file only once, you end up with a huge atime backlog, and
    this can wastefully bog down your disks during a big backup.
    Of course, mounting your filesystem with noatime would work too, but not
    everybody does that.  So let's help them out.
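    A hedged sketch of what "on OSes that support it" can look like in
    Python.  os.O_NOATIME only exists on Linux, and the kernel refuses it
    with EPERM unless the caller owns the file, so the sketch falls back to
    a plain open.  open_noatime() is a made-up helper name.

```python
import os


def open_noatime(path):
    """Open path read-only, without updating its atime when permitted."""
    noatime = getattr(os, "O_NOATIME", 0)   # 0 where the flag is missing
    if noatime:
        try:
            return os.open(path, os.O_RDONLY | noatime)
        except PermissionError:
            pass  # not the file's owner: retry without the flag
    return os.open(path, os.O_RDONLY)
```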
Commits on Mar 4, 2010
  1. save-cmd: oops, byte counter was checking sha_missing() too late.

    apenwarr committed Mar 4, 2010
    After validating a backed-up file, sha_missing() goes false.  So we have to
    remember the value from *before* we backed it up.  Sigh.
  2. main: fix problem when redirecting to newliner on MacOS X.

    apenwarr committed Mar 4, 2010
    It's probably just a bug in python 2.4.2, which is the version on my old
    MacOS machine.  But it seems that if you use subprocess.Popen with stdout=1
    and/or stderr=2, it ends up closing the file descriptors instead of passing
    them along.  Since those are the defaults anyway, just use None instead.
  3. save-cmd: when verbose=1, print the dirname *before* backing it up.

    apenwarr committed Mar 4, 2010
    It was really misleading showing the most-recently-completed directory, then
    spending a long time backing up files in a totally different place.
  4. save-cmd: Fix --smaller and other behaviour when files are skipped.

    apenwarr committed Mar 4, 2010
    The --smaller option now uses parse_num() so it can be something other than
    a raw number of bytes (eg. "1.5G").
    We were incorrectly marking a tree as valid when we skipped any of its
    contents for any reason; that's no good.  We can still save a tree to the
    backup, but it'll be missing some stuff, so we have to avoid marking it as
    valid.  That way it won't be skipped next time around.
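    To show the kind of input --smaller can now take, here is a stand-in
    parser.  It is not bup's parse_num(); the accepted suffixes and the use
    of 1024-based units are assumptions.

```python
import re

# Assumed multipliers; the real parse_num() may differ.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d*)?|\.\d+)\s*([kmgt]?)b?\s*")


def parse_size(text):
    """Turn strings such as '1.5G' or '4096' into a number of bytes."""
    m = _SIZE_RE.fullmatch(text.lower())
    if not m:
        raise ValueError("invalid size: %r" % text)
    number, unit = m.groups()
    return int(float(number) * _UNITS[unit])
```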
  5. save-cmd: progress meter wouldn't count identical files correctly.

    apenwarr committed Mar 4, 2010
    This one was really tricky.  If a file was IX_HASHVALID but its object
    wasn't available on the target server (eg. if you backed up to one server
    and now are backing up to a different one), we would correctly count it
    toward the total bytes we expected to back up.
    Now imagine there are two *identical* files (ie. with the same sha1sum) in
    this situation.  When that happens, we'd back up the first one, after which
    the objects for the second one *are* available.  So we'd skip it, thinking
    that we had skipped it in the first place.  The result would be that our
    backup count showed a final byte percentage less than 100%.
    The workaround isn't very pretty, but should be correct: we add a new
    IX_SHAMISSING flag, setting or clearing it during the initial index scan,
    and then we use *that* as the indicator of whether to add bytes to the count
    or not.
    We also have to decide whether to recurse into subdirectories using this
    algorithm.  If /etc/rc3.d and /etc/rc4.d are identical, and one of the files
    in them had this problem, then we wouldn't even *recurse* into /etc/rc3.d
    after backing up /etc/rc4.d.  That means we wouldn't check the IX_SHAMISSING
    flag on the file inside.  So we had to fix that up too.
    On the other hand, this is an awful lot of complexity just to make the
    progress messages more exact...
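    A toy model of the flag-based counting, with plain dicts standing in for
    index entries and a set standing in for the server.  The bit values and
    function names are placeholders, not bup's.

```python
IX_HASHVALID = 0x1    # placeholder bit values
IX_SHAMISSING = 0x2


def scan(entries, server_shas):
    """Initial scan: set or clear IX_SHAMISSING and return expected bytes."""
    total = 0
    for e in entries:
        valid = bool(e["flags"] & IX_HASHVALID)
        if valid and e["sha"] not in server_shas:
            e["flags"] |= IX_SHAMISSING
        else:
            e["flags"] &= ~IX_SHAMISSING
        if not valid or e["flags"] & IX_SHAMISSING:
            total += e["size"]
    return total


def save(entries, server_shas):
    """Upload what is missing, but count bytes using the remembered flag."""
    done = 0
    for e in entries:
        valid = bool(e["flags"] & IX_HASHVALID)
        if e["sha"] not in server_shas:
            server_shas.add(e["sha"])   # stands in for the real upload
        if not valid or e["flags"] & IX_SHAMISSING:
            done += e["size"]
    return done
```

    With two identical files, the second one is already on the server by the
    time save() reaches it, but its IX_SHAMISSING flag from the scan still
    says its bytes belong in the count, so the total reaches 100%.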
Commits on Mar 3, 2010
  1. save-cmd: byte count was missing a few files.

    apenwarr committed Mar 3, 2010
    In particular, if we tried to back up a file but couldn't open it, that
    would fail to increment the byte count.
    We also sometimes counted unmodified directories instead of ignoring them.
  2. main: initialize 'p' before the try/finally that uses it.

    apenwarr committed Mar 3, 2010
    Otherwise, if we fail to run the subprocess, the finally section doesn't
    work quite right.
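    The general pattern, sketched with a hypothetical run() helper: bind the
    name before the try so the finally block can test it even when Popen()
    itself raises.

```python
import subprocess
import sys


def run(argv):
    """Run argv and return its exit status, cleaning up if anything fails."""
    p = None                      # defined even if Popen() raises below
    try:
        p = subprocess.Popen(argv)
        return p.wait()
    finally:
        # Without the initialisation above, a failed Popen() would make this
        # block raise a NameError that hides the original exception.
        if p is not None and p.poll() is None:
            p.kill()
            p.wait()


if __name__ == "__main__":
    sys.exit(run([sys.executable, "-c", "pass"]))
```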
  3. save-cmd: don't fail an assertion when doing a backup from the root l…

    apenwarr committed Mar 3, 2010
    This wasn't caught by unit tests because "virtual" nodes weren't being
    marked as IX_EXISTS, which in the unit tests included the root, so save-cmd
    was never actually trying to back up that node.
    That made the base directories incorrectly marked as status=D (deleted) if
    you printed out the index during the tests.  So add a test for that to make
    it fail if "/" is deleted (which obviously makes no sense), then add another
    test for saving from the root level, then fix both bugs.
  4. 'make stupid' stopped working when I moved subcommands into their own…

    apenwarr committed Mar 3, 2010
    … dir.
    Remote server mode tries to add the directory of argv[0] (the
    currently-running program) to the PATH on the remote server, just in case
    bup isn't installed in the PATH there, so that it can then run 'bup server'.
    However, now that bup-save is in a different place than bup, argv[0] is the
    wrong place to look.  Instead, have the bup executable export an environment
    variable containing its location, and remote server mode can use that
    instead of argv[0].  Slightly gross, but it works.
  5. bup.options: remove reference to bup.helpers.

    apenwarr committed Mar 3, 2010
    This makes the module more easily reusable in other apps.
  6. log(): handle situations where stderr gets set to nonblocking.

    apenwarr committed Mar 3, 2010
    It's probably ssh doing this, and in obscure situations, it means log() ends
    up throwing an exception and aborting the program.
    Fix it so that we handle EAGAIN correctly if we get it when writing to
    stderr, even though this is only really necessary due to stupidity on
    (I think/hope) someone else's part.
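    A sketch of an EAGAIN-tolerant write, assuming Python 3, where EAGAIN
    surfaces as BlockingIOError.  It is not bup's log() implementation;
    write_all() is an invented helper.

```python
import os
import select
import sys


def write_all(fd, data):
    """Write all of data to fd, waiting and retrying if fd is nonblocking."""
    view = memoryview(data)
    while view:
        try:
            n = os.write(fd, view)
        except BlockingIOError:
            # EAGAIN: block in select() until fd is writable, then retry.
            select.select([], [fd], [])
            continue
        view = view[n:]           # drop the bytes that were accepted


def log(msg):
    """Write msg to stderr even if something made stderr nonblocking."""
    sys.stdout.flush()
    write_all(sys.stderr.fileno(), msg.encode())
```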
Commits on Mar 2, 2010
  1. Add man pages for random, newliner, help, memtest, ftp.

    apenwarr committed Mar 2, 2010
    Also add a 'help' command to ftp, and fix up some minor help messages.
  2. bup random: fix progress output and don't print to a tty.

    apenwarr committed Mar 2, 2010
    We were printing output using a series of dots, which interacted badly with
    bup newliner (and for good reason).  Change it to actually display the
    number of megabytes done so far.
    Also, don't print random binary data to a tty unless -f is given.  It's
    just more polite that way.
  3. clean up subprocesses dying on signal.

    apenwarr committed Mar 2, 2010
    CTRL-C didn't abort 'bup random' properly, and possibly others as well.
Commits on Mar 1, 2010
  1. Rename PackIndex->PackIdx and MultiPackIndex->PackIdxList.

    apenwarr committed Mar 1, 2010
    This corresponds to the PackMidx renaming I did earlier, and helps avoid
    confusion between the code that talks to the 'bupindex' file (and has
    nothing to do with packs) and the code that talks to packs (and has nothing
    to do with the bupindex).  Now pack indexes are always called Idx, and the
    bupindex is always Index.
    Furthermore, MultiPackIndex could easily be assumed to be the same thing as
    a Midx, which it isn't.  PackIdxList is a more accurate description of what
    it is: a list of pack indexes.  A Midx is an index of a list of packs.
  2. main: list common commands before other ones.

    apenwarr committed Mar 1, 2010
    When you just type 'bup' or 'bup help', we print a list of available
    commands.  Now we improve this list by:
    1) Listing the common commands (with one-line descriptions) before listing
    the automatically-generated list.
    2) Printing the automatically-generated list in columns, so it takes up less
    vertical space.
    This whole concept was stolen from how git does it.  I think it should be a
    bit more user friendly for beginners this way.
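    The column layout can be approximated as follows.  This is an
    illustrative sketch, not the code bup uses; columnate() and its
    parameters are assumed names.

```python
def columnate(names, width=79, prefix="    "):
    """Lay out names in columns, filling down each column first."""
    if not names:
        return ""
    names = sorted(names)
    col_width = max(len(n) for n in names) + 2
    ncols = max((width - len(prefix)) // col_width, 1)
    nrows = -(-len(names) // ncols)          # ceiling division
    lines = []
    for row in range(nrows):
        cells = names[row::nrows]            # every nrows-th name
        line = "".join(c.ljust(col_width) for c in cells)
        lines.append(prefix + line.rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(columnate(["save", "index", "split", "join", "ls", "fuse"],
                    width=24, prefix=""))
```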
Commits on Feb 28, 2010
  1. Add a 'bup help' command.

    apenwarr committed Feb 28, 2010
    It works like 'git help xxx', ie. it runs 'man bup-xxx' where xxx is the
    command name.
  2. vfs: supply ctime/mtime for the root of each commit.

    apenwarr committed Feb 28, 2010
    This makes it a little more obvious which backups were made when.
    Mostly useful with 'bup fuse'.
  3. Move cmd-*.py to cmd/*

    apenwarr committed Feb 28, 2010
    The bup-* programs shouldn't need to be installed into /usr/bin; we should
    search for them in /usr/lib somewhere.
    I could have left the names as cmd/cmd-*.py, but the cmd-* was annoying me
    because of tab completion.  Now I can type cmd/ran<tab> to get
  4. Move python library files to lib/bup/

    apenwarr committed Feb 28, 2010
    ...and update other programs so that they import them correctly from their
    new location.
    This is necessary so that the bup library files can eventually be installed
    somewhere other than wherever the 'bup' executable ends up.  Plus it's
    clearer and safer to say 'from bup import options' instead of just 'import
    options', in case someone else writes an 'options' module.
    I wish I could have named the directory just 'bup', but I can't; there's
    already a program with that name.
    Also, in the name of sanity, rename to 'bup memtest' so that it
    can get the new paths automatically.
  5. bup index --check: detect broken index entries.

    apenwarr committed Feb 28, 2010
    Entries with invalid gitmode or sha1 are actually invalid, so if
    IX_HASHVALID is set, that's a bug.  Detect it right away when it happens.
    Also clean up a bit of log output related to checking and status.
  6. cmd-index: auto-invalidate entries without a valid sha1 or gitmode.

    apenwarr committed Feb 28, 2010
    Not exactly sure where these entries came from; possibly a failed save or an
    earlier buggy version of bup.  But previously, they weren't auto-fixable
    without deleting your bupindex.
  7. Add a new 'bup newliner' that fixes progress message whitespace.

    apenwarr committed Feb 28, 2010
    If we have multiple processes producing status messages to stderr and/or
    stdout, and some of the lines ended in \r (ie. a progress message that was
    supposed to be overwritten later) they would sometimes stomp on each other
    and leave ugly bits lying around.
    Now bup automatically pipes stdout/stderr to the new 'bup newliner'
    command to fix this, but only if they were previously pointing at a tty.
    Thus, if you redirect stdout to a file, nothing weird will happen, but if
    you don't, stdout and stderr won't conflict with each other.
    Anyway, the output is prettier now.  Trust me on this.
  8. Add an options.fatal() function and use it.

    apenwarr committed Feb 28, 2010
    Every existing call to o.usage() was preceded by an error message that
    printed the exename, then the error message.  So let's add a fatal()
    function that does it all in one step.  This reduces the net number of lines
    plus improves consistency.
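    Roughly what such a helper might look like.  The Options class below is
    a simplified stand-in for bup's option parser, and the exit status of 2
    is an arbitrary choice.

```python
import sys


class Options:
    """Minimal stand-in for an option parser with usage() and fatal()."""

    def __init__(self, exe, usage_text):
        self.exe = exe
        self.usage_text = usage_text

    def usage(self):
        """Print the usage text and exit."""
        sys.stderr.write(self.usage_text)
        sys.exit(2)

    def fatal(self, msg):
        """Print 'exename: error: msg', then fall through to usage()."""
        sys.stderr.write("%s: error: %s\n" % (self.exe, msg))
        self.usage()
```

    A caller then writes o.fatal("no filenames given") instead of printing
    the error and calling o.usage() separately.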