Commits on Apr 1, 2010
  1. cmd/fsck: correctly catch nonzero return codes of 'par2 create'.

    Oops; we weren't checking the return value like we should.  Reported by
    Sitaram Chamarty.
    committed Apr 1, 2010
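    The fix treats a nonzero exit status from the external tool as a failure
    instead of ignoring it. Below is a minimal Python 3 sketch of that check.
    `run_checked` is a hypothetical helper, not bup's actual fsck code.

```python
import subprocess
import sys


def run_checked(argv):
    """Run argv and raise if it exits nonzero (hypothetical helper)."""
    rc = subprocess.call(argv)
    if rc != 0:
        # The bug was that this return value was never looked at.
        raise RuntimeError('%r returned %d' % (argv[0], rc))
    return rc


if __name__ == '__main__':
    # Stand-in for 'par2 create ...': a child that exits with status 3.
    try:
        run_checked([sys.executable, '-c', 'import sys; sys.exit(3)'])
    except RuntimeError as e:
        print('error:', e)
```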
  2. helpers.log(): run sys.stdout.flush() first.

    It's annoying when your log messages come out before stdout messages do.
    But it's equally annoying (and inefficient) to have to flush every time you
    print something.  This seems like a nice compromise.
    committed Apr 1, 2010
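    The compromise is to flush stdout only when a log message is about to be
    written, not after every print. Here is a minimal Python 3 sketch of the
    idea. It is not bup's actual helpers.log().

```python
import sys


def log(s):
    """Write s to stderr, first flushing anything still buffered on stdout."""
    # Normal output stays buffered until a log message needs to appear,
    # so stdout text printed earlier can't show up after this message.
    sys.stdout.flush()
    sys.stderr.write(s)
    sys.stderr.flush()
```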
  3. Get rid of a sha-related DeprecationWarning in python 2.6.

    hashlib is only available in python 2.5 or higher, but the 'sha' module
    produces a DeprecationWarning in python 2.6 or higher.  We want to support
    python 2.4 and above without any stupid warnings, so let's try using
    hashlib.  If it fails, switch to the old sha module.
    committed Apr 1, 2010
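    The import fallback described above can be sketched as follows. The
    try/except works on old and new Pythons alike. The `hexsha` wrapper is
    a hypothetical convenience, and the test vector uses a Python 3 bytes
    literal.

```python
try:
    # Python 2.5+ (and Python 3): no DeprecationWarning.
    from hashlib import sha1
except ImportError:
    # Python 2.4: only the old 'sha' module is available.
    from sha import new as sha1


def hexsha(data):
    """Return the hex SHA-1 digest of data (hypothetical wrapper)."""
    return sha1(data).hexdigest()
```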
Commits on Mar 25, 2010
  1. Add support for a global --bup-dir or -d argument.

    When a "--bup-dir DIR" or "-d DIR" argument is provided, act as if
    BUP_DIR=DIR is set in the environment.
    Signed-off-by: Rob Browning <>
    rlbdv committed with Mar 25, 2010
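    A minimal sketch of the idea in Python 3, assuming a hypothetical
    `parse_global_args` helper that receives argv without the program name.
    `getopt.getopt()` is a good fit here because it stops at the first
    non-option word, so the subcommand and its arguments come back untouched.

```python
import getopt
import os


def parse_global_args(argv):
    """Consume global options that precede the subcommand (hypothetical)."""
    opts, rest = getopt.getopt(argv, 'd:', ['bup-dir='])
    for opt, val in opts:
        if opt in ('-d', '--bup-dir'):
            # Act exactly as if BUP_DIR=DIR had been set in the environment.
            os.environ['BUP_DIR'] = val
    return rest
```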
  2. Add support for global command-line options (before any subcmd).

    Process global arguments via getopt before handling a subcmd, and add
    initial support for a global --help (or -?) option.
    Also support --help for subcmds by noticing and translating
      git ... subcmd --help ...
      git ... help subcmd ...
    Signed-off-by: Rob Browning <>
    rlbdv committed with Mar 25, 2010
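    The --help translation can be sketched as a pure function over argv.
    `translate_help` is a hypothetical name, and for brevity the sketch
    ignores any other global options. It only shows the global --help/-?
    case and the "SUBCMD ... --help" to "help SUBCMD" rewrite.

```python
import getopt


def translate_help(argv):
    """Rewrite 'SUBCMD ... --help ...' as 'help SUBCMD' (hypothetical).

    argv excludes the program name. getopt.getopt() stops at the first
    non-option word, so only options before the subcmd are parsed here.
    """
    opts, rest = getopt.getopt(argv, '?', ['help'])
    if any(opt in ('-?', '--help') for opt, _ in opts):
        return ['help']              # global --help (or -?)
    if rest and '--help' in rest[1:]:
        return ['help', rest[0]]     # subcmd --help  ->  help subcmd
    return rest
```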
  3. cmd/ Use BUP_MAIN_EXE to invoke the correct bup.

    Signed-off-by: Rob Browning <>
    rlbdv committed with Mar 25, 2010
Commits on Mar 23, 2010
Commits on Mar 21, 2010
  1. server: only suggest a max of one pack per receive-objects cycle.

    Since the client only handles one at a time and forgets the others anyway,
    suggesting others is a bit of a waste of time... and because of the cheating
    way we figure out which index to suggest when using a midx, suggesting packs
    is more expensive than it should be anyway.
    The "correct" fix in the long term will be to make the client accept
    multiple suggestions at once, plus make midx files a little smarter about
    figuring out which pack is the one that needs to be suggested.  But in the
    meantime, this makes things a little nicer: there are fewer confusing log
    messages from the server, and a lot less disk grinding related to looking
    into which pack to suggest, followed by finding out that we've already
    suggested that pack anyway.
    committed Mar 21, 2010
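    The rule "at most one suggestion per cycle" can be modelled with a small
    state object. This is a hypothetical sketch, not bup's server code.
    `find_pack` stands in for the expensive lookup of which pack (or midx
    constituent) already holds an object. The useful property is that the
    lookup is skipped entirely once a pack has been suggested in this cycle.

```python
class SuggestionLimiter:
    """Allow at most one pack suggestion per receive-objects cycle."""

    def __init__(self):
        self.suggested = set()   # packs already suggested to this client
        self.this_cycle = None   # the pack suggested during this cycle

    def start_cycle(self):
        self.this_cycle = None

    def maybe_suggest(self, find_pack):
        """Return a pack name to suggest now, or None."""
        if self.this_cycle is not None:
            # The client handles one suggestion at a time and forgets the
            # rest, so don't even pay for the lookup.
            return None
        pack = find_pack()
        if pack is None or pack in self.suggested:
            return None          # nothing to suggest, or already suggested
        self.this_cycle = pack
        self.suggested.add(pack)
        return pack
```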
  2. rbackup-cmd: we can now back up a *remote* machine to a *local* server.

    The -r option to split and save allowed you to back up from a local machine
    to a remote server, but that doesn't always work; sometimes the machine you
    want to back up is out on the Internet, and the backup repo is safe behind a
    firewall.  In that case, you can ssh *out* from the secure backup machine to
    the public server, but not vice versa, and you were out of luck.  Some
    people have apparently been doing this:
        ssh publicserver tar -c / | bup split -n publicserver
    (ie. running tar remotely, piped to a local bup split) but that isn't
    efficient, because it sends *all* the data from the remote server over the
    network before deduplicating it locally.  Now you can do instead:
        bup rbackup publicserver index -vux /
        bup rbackup publicserver save -n publicserver /
    And get all the usual advantages of 'bup save -r', except the server runs
    locally and the client runs remotely.
    committed Mar 21, 2010
  3. client: Extract 'bup server' connection code into its own module.

    The screwball function we use to let us run 'bup xxx' on a remote server
    after correctly setting the PATH variable is about to become useful for more
    than just 'bup server'.
    committed Mar 21, 2010
  4. options: allow user to specify an alternative to getopt.gnu_getopt.

    The most likely alternative is getopt.getopt, which doesn't rearrange
    arguments.  That would mean "-a foo -p" is considered as the option "-a"
    followed by the non-option arguments ['foo', '-p'].
    The non-gnu behaviour is annoying most of the time, but can be useful when
    you're receiving command lines that you want to pass verbatim to someone else.
    committed Mar 21, 2010
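    Python's standard getopt module shows the difference described above.
    The `parse` wrapper with a pluggable `optfunc` is a hypothetical
    illustration of "allow an alternative to getopt.gnu_getopt", not bup's
    options.py API.

```python
import getopt


def parse(argv, optfunc=getopt.gnu_getopt):
    """Parse argv for flags -a and -p with a caller-chosen parser."""
    return optfunc(argv, 'ap')


argv = ['-a', 'foo', '-p']

# GNU-style parsing keeps scanning past 'foo', so '-p' is an option:
print(parse(argv))                 # ([('-a', ''), ('-p', '')], ['foo'])

# Traditional getopt stops at the first non-option word, leaving
# ['foo', '-p'] verbatim for someone else to interpret:
print(parse(argv, getopt.getopt))  # ([('-a', '')], ['foo', '-p'])
```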
  5. save/index/drecurse: correct handling for fifos and nonexistent paths.

    When indexing a fifo, you can try to open it (for security reasons) but it
    has to be O_NDELAY just in case the fifo doesn't have anyone on the other
    end; otherwise indexing can freeze.
    In index.reduce_paths(), we weren't reporting ENOENT for reasons I can no
    longer remember, but I think they must have been wrong.  Obviously if
    someone specifies a nonexistent path on the command line, we should barf
    rather than silently not back it up.
    Add some unit tests to catch both cases.
    committed Mar 21, 2010
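    Both fixes can be sketched in a few lines of Python 3 on a POSIX system.
    The function names are hypothetical. The sketch uses `os.O_NONBLOCK`,
    which is the same flag as O_NDELAY on Linux. Opening a FIFO read-only
    with it returns immediately even when no writer exists.

```python
import os


def open_for_index(path):
    """Open path read-only without blocking if it is a FIFO with no writer."""
    # A plain blocking open() of a FIFO waits for a writer, which is how
    # indexing could freeze.
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)


def check_paths(paths):
    """Fail loudly on nonexistent paths instead of silently skipping them."""
    for p in paths:
        os.lstat(p)   # raises FileNotFoundError (ENOENT) if p is missing
    return paths
```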
  6. save-cmd: exit nonzero if any errors were encountered.

    Somehow I forgot to do this before.
    committed Mar 21, 2010
  7. don't leak a file descriptor.

    subprocess.Popen() is a little weird about when it closes the file
    descriptors you give it.  In this case, we have to dup() it because if
    stderr=2 (the default) and stdout=2 (because fix_stderr), it'll close fd 2.
    But if we dup it first, it *won't* close the dup, because stdout!=stderr.
    So we have to dup it, but then we have to close it ourselves.
    This was apparently harmless (it just resulted in an extra fd#3 getting
    passed around to subprocesses as a clone of fd#2) but it was still wrong.
    committed Mar 21, 2010
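    The dup-then-close pattern looks roughly like this in Python 3. The
    function name is hypothetical. The commit's concern about Popen closing
    fd 2 applies to the old Python it was written for. The point here is
    that the duplicate belongs to the caller, who must close it once the
    child has inherited it.

```python
import os
import subprocess


def run_with_stdout_on_stderr(argv):
    """Run argv with its stdout pointed at our stderr, without leaking an fd."""
    # Hand Popen a duplicate of fd 2 rather than fd 2 itself...
    fd = os.dup(2)
    try:
        p = subprocess.Popen(argv, stdout=fd)
    finally:
        # ...but Popen won't close the duplicate for us, so we must.
        # The child already has its own copy by now.
        os.close(fd)
    return p.wait()
```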
Commits on Mar 15, 2010
  1. cmd/ How it pains me to have to explicitly close() stuff

    If we don't explicitly close() the wr reader object while running
    update-index, the corresponding writer object won't be able to unlink
    its temporary file under Cygwin.
    lkosewsk committed Mar 15, 2010
  2. lib/bup/ mmap.mmap() objects need to be closed() for Win32.

    Not *entirely* sure why this is the case, but it appears that, through some
    refcounting weirdness, just setting the mmap variables to None in
    index.Readers doesn't cause the mmap to be freed under Cygwin.
    Naturally, this caused all sorts of pain when we attempted to unlink
    an mmapped file created while running bup index --check -u.
    an mmaped file created while running bup index --check -u.
    Fix the issue by explicitly .close()ing the mmap in Reader.close().
    lkosewsk committed Mar 15, 2010
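    A minimal sketch of a reader whose close() releases its mapping
    explicitly. The `Reader` class and its `m` attribute are hypothetical
    stand-ins for index.Reader. The test shows the file can be unlinked
    after close().

```python
import mmap


class Reader:
    """Hypothetical index reader that maps a (non-empty) file read-only."""

    def __init__(self, path):
        self.path = path
        with open(path, 'rb') as f:
            # The mapping stays valid after the file object is closed.
            self.m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def close(self):
        # Dropping the reference (self.m = None) isn't enough everywhere;
        # close the mapping explicitly so the file can be unlinked.
        if self.m is not None:
            self.m.close()
            self.m = None
```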
Commits on Mar 14, 2010
  1. PackIdxList.refresh(): remember to exclude old midx files.

    Previously, if you called refresh(), it would fail to consider
    the contents of already-loaded .midx files as already-loaded.  That means
    it would load all the constituent .idx files, so you'd actually lose all the
    advantages of the .midx after the first refresh().
    Thus, the midx optimization mainly worked before you filled up your first
    pack (about 1GB of data saved) or until you got an index suggestion.  This
    explains why backups would slow down significantly after running for a while.
    Also, get rid of the stupid forget_packs option; just automatically prune
    the packs that aren't relevant after the refresh.  This avoids the
    possibility of weird behaviour if you set forget_packs incorrectly (which we
    committed Mar 14, 2010
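    The core of the refresh fix is to treat every .idx covered by an
    already-loaded .midx as loaded too, and to prune standalone .idx files
    that a .midx now covers. The sketch below models this with plain file
    names. `refresh` and its arguments are hypothetical, not the real
    PackIdxList API.

```python
def refresh(loaded, midx_contents, idx_on_disk):
    """Return the index files to keep loaded after a refresh.

    loaded:        names of .midx/.idx files already loaded
    midx_contents: dict mapping each .midx name to the .idx names it covers
    idx_on_disk:   every .idx file currently in the pack directory
    """
    covered = set()
    for name in loaded:
        covered.update(midx_contents.get(name, ()))
    # Prune standalone .idx files that a loaded .midx already covers.
    result = [n for n in loaded if n not in covered]
    # Only load .idx files that no loaded .midx accounts for; skipping this
    # check is what loaded every constituent .idx again.
    for idx in idx_on_disk:
        if idx not in covered and idx not in result:
            result.append(idx)
    return result
```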
  2. bup.client: fix freeze when suggest-index after finishing a full pack.

    It was just rare enough to be hard to find: if you write an entire pack full
    of stuff (1GB or more) and *then* trigger a suggest-index, the client would
    freeze because it would send a send-index command without actually
    suspending the receive-pack first.
    The whole Client/PackWriter separation is pretty gross, so it's not terribly
    surprising this would happen.
    Add a unit test to detect this case if it ever happens in the future, for
    what it's worth.
    committed Mar 14, 2010
  3. main: even more fixes for signal handling.

    If the child doesn't die after the first SIGINT and the user presses ctrl-c
    one more time, the main bup process would die instead of forwarding it on to
    the child.  That's no good; we actually have to loop forwarding signals
    until the child is really good and dead.
    And if the child refuses to die, well, he's the one with the bug, not the
    main bup process, so the main process should stay alive too in the name of
    not losing track of things.
    committed Mar 14, 2010
  4. client/server: correctly handle case where receive-objects had 0 obje…

    Previously we'd throw a (probably harmless other than ugly output)
    exception in this case.
    committed Mar 14, 2010
Commits on Mar 13, 2010
  1. cmd/{index,save}: handle ctrl-c without printing a big exception trace.

    It's not very exciting to look at a whole stack trace just because someone
    hit ctrl-c, especially since that's designed to work fine.  Trim it down in
    that case.
    committed Mar 13, 2010
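    A minimal sketch of trimming the ctrl-c traceback. `run_main`, the
    "Interrupted." message and the exit status 1 are all hypothetical
    choices. The commit only says the trace is trimmed down.

```python
import sys


def run_main(main):
    """Call main(); on ctrl-c print one line instead of a stack trace."""
    try:
        return main()
    except KeyboardInterrupt:
        sys.stderr.write('\nInterrupted.\n')
        return 1   # hypothetical nonzero status


# Typical use: sys.exit(run_main(main))
```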
  2. git.PackWriter: avoid pack corruption if interrupted by a signal.

    PackWriter tries to "finish" a half-written pack in its destructor if
    interrupted.  To do this, it flushes the stream, seeks back to the beginning
    to update the sha1sum and object count, then runs git-index-pack on it to
    create the .idx file.
    However, sometimes if you were unlucky, you'd interrupt PackWriter partway
    through writing an object to the pack.  If only half an object exists at the
    end, it would have the wrong header and thus come out as corrupt when
    index-pack would run.
    Since our objects are meant to be small anyway, just make sure we write
    everything all in one file.write() operation.  The files themselves are
    buffered, so this wouldn't survive a surprise termination of the whole
    unix process, but we wouldn't run index-pack in that case anyway, so it
    doesn't matter.
    Now when I press ctrl-c in 'bup save', it consistently writes the half-saved
    objects as it should.
    committed Mar 12, 2010
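    The "one write() per object" idea can be sketched with git's documented
    pack entry format. Each entry is a type-and-size header (a base-128
    varint whose first byte also holds the 3-bit type) followed by the
    zlib-compressed content. The function names are hypothetical, not
    git.PackWriter's. The header and body are built in memory and handed to
    the file in a single call rather than written separately.

```python
import zlib

# Object type numbers from git's pack format.
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB = 1, 2, 3


def encode_obj_header(obj_type, size):
    """Encode a pack entry header: type plus size as a base-128 varint."""
    # First byte: continuation bit, 3 type bits, low 4 bits of size.
    byte = (obj_type << 4) | (size & 0x0f)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)   # more size bytes follow
        byte = size & 0x7f
        size >>= 7
    out.append(byte)
    return bytes(out)


def write_object(f, obj_type, content):
    """Append one object to an open pack file using a single write() call."""
    data = encode_obj_header(obj_type, len(content)) + zlib.compress(content)
    f.write(data)
    return len(data)
```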
  3. Correctly pass along SIGINT to child processes.

    Ever since we introduced bup newliner, signal handling has been a little
    screwy.  The problem is that ctrl-c is passed to *all* processes in the
    process group, not just the parent, so everybody would start terminating at
    the same time, with very messy results.
    Two results were particularly annoying: git.PackWriter()'s destructor
    wouldn't always get called (so half-finished packs would be lost instead of
    kept so we don't need to back up the same stuff next time) and bup-newliner
    would exit, so the stdout/stderr of a process that *did* try to clean up
    would be lost, usually resulting in EPIPE, which killed the process while
    attempting to clean up.
    The fix is simple: when starting a long-running subprocess, give it its own
    session by calling os.setsid().  That way ctrl-c is only sent to the
    toplevel 'bup' process, who can forward it as it should.
    Next, fix bup's signal forwarding to actually forward the same signal as it
    received, instead of always using SIGTERM.
    committed Mar 12, 2010
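    A minimal Python 3 sketch of both halves of the fix. The child gets its
    own session, and the parent forwards whichever signal it received. It
    uses `start_new_session=True`, which makes the child call setsid() just
    like the commit's os.setsid(). `run_forwarding` is a hypothetical name,
    and it must run in the main thread because it installs signal handlers.

```python
import os
import signal
import subprocess


def run_forwarding(argv, signums=(signal.SIGINT, signal.SIGTERM)):
    """Run argv in its own session, forwarding signals to it until it exits."""
    # In a new session, a terminal ctrl-c reaches only this parent process.
    p = subprocess.Popen(argv, start_new_session=True)

    def forward(signum, frame):
        try:
            os.kill(p.pid, signum)   # the same signal, not always SIGTERM
        except ProcessLookupError:
            pass                     # child already exited

    saved = {s: signal.signal(s, forward) for s in signums}
    try:
        # The handler stays installed for the whole wait, so a second ctrl-c
        # is forwarded too; Python 3.5+ resumes wait() after each handler.
        return p.wait()
    finally:
        for s, old in saved.items():
            if old is not None:
                signal.signal(s, old)
```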
  4. hashsplit: use posix_fadvise(DONTNEED) when available.

    When reading through large disk images to back them up, we'll only end up
    reading the data once, but it still takes up space in the kernel's disk
    cache.  If you're backing up a whole disk full of stuff, that's bad news for
    anything else running on your system, which will rapidly have its stuff
    dumped out of cache to store a bunch of stuff bup will never look at again.
    The posix_fadvise() call actually lets us tell the kernel we won't be using
    this data anymore, thus greatly reducing our hit on the disk cache.
    Theoretically it improves things, anyway.  I haven't been able to come up
    with a really scientific way to test it, since of course *bup's* performance
    is expected to be the same either way (we're only throwing away stuff we're
    done using).  It really does throw things out of cache, though, so the rest
    follows logically at least.
    committed Mar 12, 2010
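    A minimal sketch using Python 3's `os.posix_fadvise`, which exists only
    on platforms that provide the call (not on macOS, for example). The
    function name and the 1 MiB chunk size are arbitrary choices for
    illustration. After each chunk is consumed, its byte range is marked
    DONTNEED.

```python
import os

CHUNK = 1 << 20   # 1 MiB reads; arbitrary for this sketch


def read_and_drop_cache(path, consume):
    """Read path once, telling the kernel we won't need the pages again."""
    fadvise = getattr(os, 'posix_fadvise', None)
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            buf = os.read(fd, CHUNK)
            if not buf:
                break
            consume(buf)
            if fadvise:
                # Evict the range we've finished with from the page cache.
                fadvise(fd, offset, len(buf), os.POSIX_FADV_DONTNEED)
            offset += len(buf)
    finally:
        os.close(fd)
    return offset
```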
  5. save-cmd: open files with O_NOATIME on OSes that support it.

    Backing up files normally changes their atime, which is bad for two reasons.
    First, the files haven't really been "accessed" in a useful sense; the fact
    that we backed them up isn't an indication that, say, they're any more
    frequently used than they were before.
    Secondly, when reading a file updates its atime, the kernel has to enqueue
    an atime update (disk write) for every file we back up.  For programs that
    read the same files repeatedly, this is no big deal, since the atime just
    gets flushed out occasionally (after a lot of updates).  But since bup
    accesses *every* file only once, you end up with a huge atime backlog, and
    this can wastefully bog down your disks during a big backup.
    Of course, mounting your filesystem with noatime would work too, but not
    everybody does that.  So let's help them out.
    committed Mar 12, 2010
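    A minimal sketch of opening with O_NOATIME where available. One detail
    not mentioned above: Linux allows O_NOATIME only for the file's owner or
    a privileged process, and otherwise fails with EPERM. The sketch
    therefore retries with a plain open. `open_noatime` is a hypothetical
    name.

```python
import os


def open_noatime(path):
    """Open path read-only, avoiding an atime update where permitted."""
    noatime = getattr(os, 'O_NOATIME', 0)   # Linux-only flag
    if noatime:
        try:
            return os.open(path, os.O_RDONLY | noatime)
        except PermissionError:
            pass   # EPERM: we don't own the file; fall back below
    return os.open(path, os.O_RDONLY)
```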
Commits on Mar 4, 2010
  1. save-cmd: oops, byte counter was checking sha_missing() too late.

    After validating a backed-up file, sha_missing() goes false.  So we have to
    remember the value from *before* we backed it up.  Sigh.
    committed Mar 4, 2010
  2. main: fix problem when redirecting to newliner on MacOS X.

    It's probably just a bug in python 2.4.2, which is the version on my old
    MacOS machine.  But it seems that if you use subprocess.Popen with stdout=1
    and/or stderr=2, it ends up closing the file descriptors instead of passing
    them along.  Since those are the defaults anyway, just use None instead.
    committed Mar 4, 2010
  3. save-cmd: when verbose=1, print the dirname *before* backing it up.

    It was really misleading showing the most-recently-completed directory, then
    spending a long time backing up files in a totally different place.
    committed Mar 4, 2010
  4. save-cmd: Fix --smaller and other behaviour when files are skipped.

    The --smaller option now uses parse_num() so it can be something other than
    a raw number of bytes (eg. "1.5G").
    We were incorrectly marking a tree as valid when we skipped any of its
    contents for any reason; that's no good.  We can still save a tree to the
    backup, but it'll be missing some stuff, so we have to avoid marking it as
    valid.  That way it won't be skipped next time around.
    committed Mar 4, 2010
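    The commit names parse_num() but doesn't show its rules, so this is only
    a sketch under an assumption: case-insensitive k/m/g/t suffixes (with an
    optional trailing "b") multiply by powers of 1024, and fractions like
    "1.5G" are allowed.

```python
import re

# Assumed binary multipliers; the real parse_num() may differ.
_UNITS = {'': 1, 'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3, 't': 1024 ** 4}


def parse_num(s):
    """Parse sizes like '4096', '64k' or '1.5G' into a byte count."""
    m = re.match(r'^\s*([\d.]+)\s*([kmgt]?)b?\s*$', s.lower())
    if not m:
        raise ValueError('invalid number: %r' % s)
    return int(float(m.group(1)) * _UNITS[m.group(2)])
```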
  5. save-cmd: progress meter wouldn't count identical files correctly.

    This one was really tricky.  If a file was IX_HASHVALID but its object
    wasn't available on the target server (eg. if you backed up to one server
    and now are backing up to a different one), we could correctly count
    it toward the total bytes we expected to back up.
    Now imagine there are two *identical* files (ie. with the same sha1sum) in
    this situation.  When that happens, we'd back up the first one, after which
    the objects for the second one *are* available.  So we'd skip it, thinking
    that we had skipped it in the first place.  The result would be that our
    backup count showed a final byte percentage less than 100%.
    The workaround isn't very pretty, but should be correct: we add a new
    IX_SHAMISSING flag, setting or clearing it during the initial index scan,
    and then we use *that* as the indicator of whether to add bytes to the count
    or not.
    We also have to decide whether to recurse into subdirectories using this
    algorithm.  If /etc/rc3.d and /etc/rc4.d are identical, and one of the files
    in them had this problem, then we wouldn't even *recurse* into /etc/rc3.d
    after backing up /etc/rc4.d.  That means we wouldn't check the IX_SHAMISSING
    flag on the file inside.  So we had to fix that up too.
    On the other hand, this is an awful lot of complexity just to make the
    progress messages more exact...
    committed Mar 4, 2010
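    The workaround records "missing on the server" once, during the initial
    scan, and counts progress from that record rather than from a live
    check. Here is a sketch with hypothetical flag values and entries
    modelled as dicts. Only the names IX_HASHVALID and IX_SHAMISSING come
    from the commit.

```python
# Hypothetical flag values; only the names come from the commit message.
IX_HASHVALID = 0x1
IX_SHAMISSING = 0x2


def initial_scan(entries, server):
    """Set or clear IX_SHAMISSING up front; return the bytes we expect to send."""
    total = 0
    for e in entries:
        if e['flags'] & IX_HASHVALID and e['sha'] in server:
            e['flags'] &= ~IX_SHAMISSING
        else:
            e['flags'] |= IX_SHAMISSING
            total += e['size']
    return total


def save_all(entries, server):
    """Back up entries, counting bytes by the flag recorded during the scan."""
    done = 0
    for e in entries:
        if not e['flags'] & IX_SHAMISSING:
            continue
        if e['sha'] not in server:
            server.add(e['sha'])   # "send" the object
        # A live "is it missing?" check would skip the second of two
        # identical files here; the recorded flag still counts it.
        done += e['size']
    return done
```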
Commits on Mar 3, 2010
  1. save-cmd: byte count was missing a few files.

    In particular, if we tried to back up a file but couldn't open it, that
    would fail to increment the byte count.
    We also sometimes counted unmodified directories instead of ignoring them.
    committed Mar 3, 2010
  2. main: initialize 'p' before the try/finally that uses it.

    Otherwise, if we fail to run the subprocess, the finally section doesn't
    work quite right.
    committed Mar 3, 2010
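    The pattern in miniature, as a hypothetical `run` helper rather than
    bup's main. Binding `p` before the `try` means that, if Popen() itself
    raises, the `finally` block sees None instead of an unbound name. The
    original error then propagates cleanly.

```python
import subprocess


def run(argv):
    """Run argv and wait for it, reaping the child if it was started."""
    p = None   # bound before the try, so finally can always test it
    try:
        p = subprocess.Popen(argv)
        return p.wait()
    finally:
        # Without the p = None above, a failed Popen() would make this
        # line raise UnboundLocalError and hide the real error.
        if p is not None:
            p.wait()
```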