Commits on Apr 23, 2010
  1. vfs: take advantage of bup chunking to make file seeking faster.

    apenwarr committed Apr 23, 2010
    If you have a huge file, you can now seek around inside it (eg. in 'bup
    fuse') without having to read its entire contents.  Calculating the file
    size is also really fast now.
    
    This makes a bup fuse-mounted filesystem much more useful for real-time
    access.  For example, I was able to connect to an sqlite3 database and have
    it work at a reasonable speed.  (Obviously, since 'bup fuse' is written in
    python and doesn't currently support threading, the speed could still be
    improved, but at least it's no longer godawful.)
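A minimal sketch of the idea, not the actual vfs code: if the start offset of every chunk is known, a seek becomes a binary search over those offsets instead of a read of everything before the target. The `ChunkedFile` class is hypothetical, and it keeps the chunks in memory only to stay self-contained.

```python
import bisect


class ChunkedFile:
    """Hypothetical stand-in for a file stored as an ordered list of chunks."""

    def __init__(self, chunks):
        self.chunks = chunks
        # starts[i] is the byte offset at which chunk i begins.
        self.starts = []
        total = 0
        for chunk in chunks:
            self.starts.append(total)
            total += len(chunk)
        # The file size falls out of the same bookkeeping.
        self.size = total

    def read(self, offset, length):
        if offset >= self.size or length <= 0:
            return b""
        # Binary search for the chunk that contains 'offset'.
        i = bisect.bisect_right(self.starts, offset) - 1
        out = []
        while length > 0 and i < len(self.chunks):
            piece = self.chunks[i][offset - self.starts[i]:][:length]
            out.append(piece)
            offset += len(piece)
            length -= len(piece)
            i += 1
        return b"".join(out)
```

With chunks `b"abc"`, `b"defg"` and `b"hi"`, reading 4 bytes at offset 2 only touches the first two chunks.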
  2. git.CatPipe: more resilience against weird errors.

    apenwarr committed Apr 23, 2010
    Notably, MemoryErrors thrown because the file we're trying to load into
    memory is too big to load all at once.  Now the MemoryError gets thrown, but
    the main program is potentially able to recover from it because CatPipe at
    least doesn't get into an inconsistent state.
    
    Also we can recover nicely if some lamer kills our git-cat-file subprocess.
    
    The AutoFlushIter we were using for this purpose turns out to not have been
    good enough, and it's never been used anywhere but in CatPipe, so I've
    revised it further and renamed it to git.AbortableIter.
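As a rough illustration only (this is not the real git.AbortableIter, and the `onabort` callback is an assumed name), an iterator wrapper can run a cleanup hook when the underlying iterator fails, so the owner is not left in an inconsistent state while the exception still propagates:

```python
class AbortableIter:
    """Sketch: wrap an iterator and run a cleanup hook if it fails."""

    def __init__(self, it, onabort=None):
        self.it = iter(it)
        self.onabort = onabort  # hypothetical cleanup callback
        self.done = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.it)
        except StopIteration:
            # Normal exhaustion: nothing to clean up.
            self.done = True
            raise
        except BaseException:
            # Any other failure (e.g. MemoryError): clean up, then re-raise.
            self.abort()
            raise

    def abort(self):
        # Safe to call more than once; the hook runs at most one time.
        if not self.done:
            self.done = True
            if self.onabort:
                self.onabort()
```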
  3. cmd/ftp: 'ls' command should print filenames in columns.

    apenwarr committed Apr 23, 2010
    We use the columnate() function from main.py for this, now moved into
    helpers.py.
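For readers unfamiliar with the technique, here is one possible way to lay names out in columns. It is a sketch written for this note, not the columnate() from helpers.py, and the `width` parameter and two-space gutter are assumptions.

```python
def columnate(names, width=70):
    """Sketch: format names in 'ls'-style columns, filled top to bottom."""
    if not names:
        return ""
    col_width = max(len(n) for n in names) + 2  # two-space gutter (assumed)
    ncols = max(width // col_width, 1)
    nrows = (len(names) + ncols - 1) // ncols
    lines = []
    for r in range(nrows):
        # Every nrows-th name, starting at row r, lands on this line.
        row = names[r::nrows]
        lines.append("".join(n.ljust(col_width) for n in row).rstrip())
    return "\n".join(lines) + "\n"
```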
  4. cmd/ftp: if 'get' command returns an error, print it first.

    apenwarr committed Apr 23, 2010
    Previously we would print "Saving 'filename'" even if we were about to
    report that 'filename' doesn't exist or is the wrong file type.
  5. vfs: cache file sizes in the Node object.

    apenwarr committed Apr 23, 2010
    Since the filesystem is read only, there's no reason to recalculate the file
    size every time someone asks :)
  6. cmd/fuse: add missing Stat entries to appease older versions of python-fuse.

    apenwarr committed Apr 23, 2010
    python-fuse 0.2-pre3-4ubuntu1 didn't work, now it does.
    python-fuse 0.2-pre3-9 on Debian did work, still does.
  7. cmd/save: when a file is chunked, mangle its name from * to *.bup

    apenwarr committed Apr 23, 2010
    Files that are already named *.bup are renamed to *.bup.bupl, so that we can
    just always drop either .bup or .bupl from a filename if it's there, and the
    result will be the original filename.
    
    Also updated lib/bup/vfs.py to demangle the names appropriately, and treat
    git trees named *.bup as real chunked files (ie. by joining them back
    together).
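The naming rule above can be sketched as a pair of functions. These are illustrative, not the code in cmd/save or lib/bup/vfs.py. One assumption goes beyond the commit message: names already ending in .bupl also get a .bupl suffix, which is what keeps the mapping reversible.

```python
def mangle_name(name, is_chunked):
    """Sketch: choose the stored name for a file."""
    if is_chunked:
        return name + ".bup"
    # Assumption: escape *.bupl as well as *.bup so demangling is unambiguous.
    if name.endswith(".bup") or name.endswith(".bupl"):
        return name + ".bupl"
    return name


def demangle_name(name):
    """Sketch: return (original_name, is_chunked) for a stored name."""
    if name.endswith(".bupl"):
        return name[:-len(".bupl")], False
    if name.endswith(".bup"):
        return name[:-len(".bup")], True
    return name, False
```

Under these rules, demangling only ever drops one trailing .bup or .bupl, as the commit message describes.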
Commits on Apr 14, 2010
Commits on Apr 7, 2010
Commits on Apr 2, 2010
  1. doc: bup-split should mention bup-join (not git-join)

    Kirill Smelkov committed with apenwarr Apr 2, 2010
Commits on Apr 1, 2010
  1. Merge branch 'master' of /tmp/bup

    apenwarr committed Apr 1, 2010
    * 'master' of /tmp/bup:
      Add a 'make install' target.
  2. Add a 'make install' target.

    apenwarr committed Apr 1, 2010
    Also change main.py to search around in appropriate places for the installed
    library files.  By default, if your bup is in /usr/bin/bup, it'll look in
    /usr/lib/bup.  (It drops two words off the end of the filename and adds
    /lib/bup to the end.)
    
    This also makes the Debian packager at
    	http://git.debian.org/collab-maint/bup
    actually produce a usable package.
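The path rule described above (drop two words off the end, then add /lib/bup) can be sketched like this. The function name `guess_libdir` is made up for illustration and is not taken from main.py.

```python
import os


def guess_libdir(exe_path):
    """Sketch: derive the library directory from the executable's path."""
    # Dropping two path components turns /usr/bin/bup into /usr.
    prefix = os.path.dirname(os.path.dirname(os.path.abspath(exe_path)))
    return os.path.join(prefix, "lib", "bup")
```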
  3. cmd/fsck: correctly catch nonzero return codes of 'par2 create'.

    apenwarr committed Apr 1, 2010
    Oops; we weren't checking the return value like we should.  Reported by
    Sitaram Chamarty.
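The general pattern is simply to inspect the exit status of the child process. The helper below is a generic sketch, not the code in cmd/fsck, and it deliberately takes an arbitrary argv rather than guessing at par2's arguments.

```python
import subprocess


def run_checked(argv):
    """Sketch: run a command and fail loudly on a nonzero exit code."""
    rc = subprocess.call(argv)
    if rc != 0:
        raise RuntimeError("%s exited with code %d" % (argv[0], rc))
    return rc
```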
  4. helpers.log(): run sys.stdout.flush() first.

    apenwarr committed Apr 1, 2010
    It's annoying when your log messages come out before stdout messages do.
    But it's equally annoying (and inefficient) to have to flush every time you
    print something.  This seems like a nice compromise.
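A sketch of the compromise, assuming log messages go to stderr (the real helpers.log() may differ in detail): stdout is flushed only when something is about to be logged, so ordinary prints stay buffered but never appear after a later log line.

```python
import sys


def log(msg):
    """Sketch: flush pending stdout output before writing a log message."""
    sys.stdout.flush()
    sys.stderr.write(msg)
    sys.stderr.flush()
```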
  5. Get rid of a sha-related DeprecationWarning in python 2.6.

    apenwarr committed Apr 1, 2010
    hashlib is only available in python 2.5 or higher, but the 'sha' module
    produces a DeprecationWarning in python 2.6 or higher.  We want to support
    python 2.4 and above without any stupid warnings, so let's try using
    hashlib.  If it fails, switch to the old sha module.
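The fallback described above is the usual try/except import pattern. The alias name `Sha1` is an assumption made for this sketch.

```python
try:
    # Preferred: available in python 2.5 and later.
    import hashlib
    Sha1 = hashlib.sha1
except ImportError:
    # Fallback for python 2.4, where hashlib does not exist.
    import sha
    Sha1 = sha.sha
```

Callers then use `Sha1(data).hexdigest()` without caring which module provided it.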
Commits on Mar 25, 2010
  1. Add support for a global --bup-dir or -d argument.

    rlbdv committed with apenwarr Mar 25, 2010
    When a "--bup-dir DIR" or "-d DIR" argument is provided, act as if
    BUP_DIR=DIR is set in the environment.
    
    Signed-off-by: Rob Browning <rlb@defaultvalue.org>
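A minimal sketch of how such a global option might be handled, assuming getopt is used as the next commit in this list describes. The function name `handle_global_args` and the `environ` parameter are hypothetical.

```python
import getopt
import os


def handle_global_args(argv, environ=os.environ):
    """Sketch: consume global options and return the remaining arguments."""
    # getopt.getopt stops at the first non-option word, i.e. the subcommand.
    opts, rest = getopt.getopt(argv, "d:", ["bup-dir="])
    for opt, value in opts:
        if opt in ("-d", "--bup-dir"):
            # Behave as if BUP_DIR=DIR had been set in the environment.
            environ["BUP_DIR"] = value
    return rest
```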
  2. Add support for global command-line options (before any subcmd).

    rlbdv committed with apenwarr Mar 25, 2010
    Process global arguments via getopt before handling a subcmd, and add
    initial support for a global --help (or -?) option.
    
    Also support --help for subcmds by noticing and translating
    
      bup ... subcmd --help ...
    
    into
    
      bup ... help subcmd ...
    
    Signed-off-by: Rob Browning <rlb@defaultvalue.org>
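The --help translation for subcommands could look roughly like the following. It is a sketch with a made-up function name, operating on the argument list that starts at the subcommand.

```python
def translate_help(args):
    """Sketch: rewrite ['subcmd', '--help', ...] as ['help', 'subcmd', ...]."""
    if len(args) >= 2 and args[1] == "--help":
        return ["help", args[0]] + args[2:]
    return args
```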
  3. cmd/help-cmd.py: Use BUP_MAIN_EXE to invoke the correct bup.

    rlbdv committed with apenwarr Mar 25, 2010
    Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Commits on Mar 23, 2010
Commits on Mar 21, 2010
  1. server: only suggest a max of one pack per receive-objects cycle.

    apenwarr committed Mar 21, 2010
    Since the client only handles one at a time and forgets the others anyway,
    suggesting others is a bit of a waste of time... and because of the cheating
    way we figure out which index to suggest when using a midx, suggesting packs
    is more expensive than it should be anyway.
    
    The "correct" fix in the long term will be to make the client accept
    multiple suggestions at once, plus make midx files a little smarter about
    figuring out which pack is the one that needs to be suggested.  But in the
    meantime, this makes things a little nicer: there are fewer confusing log
    messages from the server, and a lot less disk grinding related to looking
    into which pack to suggest, followed by finding out that we've already
    suggested that pack anyway.
  2. rbackup-cmd: we can now backup a *remote* machine to a *local* server.

    apenwarr committed Mar 21, 2010
    The -r option to split and save allowed you to back up from a local machine
    to a remote server, but that doesn't always work; sometimes the machine you
    want to back up is out on the Internet, and the backup repo is safe behind a
    firewall.  In that case, you can ssh *out* from the secure backup machine to
    the public server, but not vice versa, and you were out of luck.  Some
    people have apparently been doing this:
    
        ssh publicserver tar -c / | bup split -n publicserver
    
    (ie. running tar remotely, piped to a local bup split) but that isn't
    efficient, because it sends *all* the data from the remote server over the
    network before deduplicating it locally.  Now you can do instead:
    
        bup rbackup publicserver index -vux /
        bup rbackup publicserver save -n publicserver /
    
    And get all the usual advantages of 'bup save -r', except the server runs
    locally and the client runs remotely.
  3. client: Extract 'bup server' connection code into its own module.

    apenwarr committed Mar 21, 2010
    The screwball function we use to let us run 'bup xxx' on a remote server
    after correctly setting the PATH variable is about to become useful for more
    than just 'bup server'.
  4. options: allow user to specify an alternative to getopt.gnu_getopt.

    apenwarr committed Mar 21, 2010
    The most likely alternative is getopt.getopt, which doesn't rearrange
    arguments.  That would mean "-a foo -p" is considered as the option "-a"
    followed by the non-option arguments ['foo', '-p'].
    
    The non-gnu behaviour is annoying most of the time, but can be useful when
    you're receiving command lines that you want to pass verbatim to someone
    else.
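The difference between the two standard-library parsers can be seen directly with the example from the message. The option string "ap" is an assumption made for the demonstration.

```python
import getopt

argv = ["-a", "foo", "-p"]

# GNU style: options may appear after non-option arguments.
gnu_opts, gnu_rest = getopt.gnu_getopt(argv, "ap")

# Plain style: parsing stops at the first non-option argument.
plain_opts, plain_rest = getopt.getopt(argv, "ap")
```

Here `gnu_rest` is `['foo']`, while `plain_rest` is `['foo', '-p']`, so the plain parser leaves the tail of the command line untouched for whoever receives it next.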
  5. save/index/drecurse: correct handling for fifos and nonexistent paths.

    apenwarr committed Mar 21, 2010
    When indexing a fifo, you can try to open it (for security reasons) but it
    has to be O_NDELAY just in case the fifo doesn't have anyone on the other
    end; otherwise indexing can freeze.
    
    In index.reduce_paths(), we weren't reporting ENOENT for reasons I can no
    longer remember, but I think they must have been wrong.  Obviously if
    someone specifies a nonexistent path on the command line, we should barf
    rather than silently not back it up.
    
    Add some unit tests to catch both cases.
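A sketch of both behaviours on a Unix-like system. The function name `open_for_index` is hypothetical and this is not the code from index.py or drecurse: a fifo is opened with O_NDELAY so the open returns even when nobody is on the other end, and a nonexistent path raises an error instead of being skipped silently.

```python
import os
import stat


def open_for_index(path):
    """Sketch: open a path read-only without blocking on an idle fifo."""
    # lstat raises OSError (ENOENT) for a nonexistent path; let it propagate.
    st = os.lstat(path)
    flags = os.O_RDONLY
    if stat.S_ISFIFO(st.st_mode):
        # Without O_NDELAY, open() waits until a writer opens the fifo.
        flags |= os.O_NDELAY
    return os.open(path, flags)
```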
  6. save-cmd: exit nonzero if any errors were encountered.

    apenwarr committed Mar 21, 2010
    Somehow I forgot to do this before.
  7. main.py: don't leak a file descriptor.

    apenwarr committed Mar 21, 2010
    subprocess.Popen() is a little weird about when it closes the file
    descriptors you give it.  In this case, we have to dup() it because if
    stderr=2 (the default) and stdout=2 (because fix_stderr), it'll close fd 2.
    But if we dup it first, it *won't* close the dup, because stdout!=stderr.
    So we have to dup it, but then we have to close it ourselves.
    
    This was apparently harmless (it just resulted in an extra fd#3 getting
    passed around to subprocesses as a clone of fd#2) but it was still wrong.
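The dup-then-close pattern can be sketched as follows. The function name is made up, and the exact closing behaviour described above belongs to the subprocess module of that era, so this only illustrates the shape of the fix: hand the child a duplicate descriptor and close that duplicate ourselves.

```python
import os
import subprocess


def run_with_stdout_on(target_fd, argv):
    """Sketch: run argv with its stdout pointed at a dup of target_fd."""
    dup_fd = os.dup(target_fd)
    try:
        return subprocess.call(argv, stdout=dup_fd)
    finally:
        # The parent owns the duplicate, so the parent must close it.
        os.close(dup_fd)
```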
Commits on Mar 15, 2010
  1. cmd/index-cmd.py: How it pains me to have to explicitly close() stuff

    lkosewsk committed Mar 15, 2010
    If we don't explicitly close() the wr reader object while running
    update-index, the corresponding writer object won't be able to unlink
    its temporary file under Cygwin.
  2. lib/bup/index.py: mmap.mmap() objects need to be closed() for Win32.

    lkosewsk committed Mar 15, 2010
    Not *entirely* sure why this is the case, but it appears that, through some
    refcounting weirdness, just setting the mmap variables to None in
    index.Readers doesn't cause the mmap to be freed under Cygwin.
    
    Naturally, this caused all sorts of pain when we attempted to unlink
    an mmaped file created while running bup index --check -u.
    
    Fix the issue by explicitly .close()ing the mmap in Reader.close().
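In outline, the fix amounts to the following. This `Reader` is a simplified, hypothetical stand-in for the class in lib/bup/index.py; the point is only that the mmap is closed explicitly rather than left to reference counting.

```python
import mmap


class Reader:
    """Sketch: a reader that maps a (non-empty) file into memory."""

    def __init__(self, filename):
        self.f = open(filename, "r+b")
        self.m = mmap.mmap(self.f.fileno(), 0)

    def close(self):
        # Close the mapping explicitly instead of just dropping the reference,
        # so the file can be unlinked afterwards on platforms that care.
        if self.m is not None:
            self.m.close()
            self.m = None
        if self.f is not None:
            self.f.close()
            self.f = None
```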
Commits on Mar 14, 2010
  1. PackIdxList.refresh(): remember to exclude old midx files.

    apenwarr committed Mar 14, 2010
    Previously, if you called refresh(), it would fail to consider
    the contents of already-loaded .midx files as already-loaded.  That means
    it would load all the constituent .idx files, so you'd actually lose all the
    advantages of the .midx after the first refresh().
    
    Thus, the midx optimization mainly worked before you filled up your first
    pack (about 1GB of data saved) or until you got an index suggestion.  This
    explains why backups would slow down significantly after running for a
    while.
    
    Also, get rid of the stupid forget_packs option; just automatically prune
    the packs that aren't relevant after the refresh.  This avoids the
    possibility of weird behaviour if you set forget_packs incorrectly (which we
    did).
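The refresh logic can be pictured with a small sketch. The function and its data layout are assumptions for illustration and are much simpler than PackIdxList: any .idx file already covered by a loaded .midx is skipped rather than loaded again.

```python
def idx_files_to_load(all_idx, loaded_midx):
    """Sketch: pick the .idx files not already covered by a loaded .midx.

    all_idx: list of .idx filenames found on disk.
    loaded_midx: dict mapping each loaded .midx name to the list of
                 .idx filenames it was built from (assumed layout).
    """
    covered = set()
    for names in loaded_midx.values():
        covered.update(names)
    return [name for name in all_idx if name not in covered]
```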
  2. bup.client: fix freeze when suggest-index after finishing a full pack.

    apenwarr committed Mar 14, 2010
    It was just rare enough to be hard to find: if you write an entire pack full
    of stuff (1GB or more) and *then* trigger a suggest-index, the client would
    freeze because it would send a send-index command without actually
    suspending the receive-pack first.
    
    The whole Client/PackWriter separation is pretty gross, so it's not terribly
    surprising this would happen.
    
    Add a unit test to detect this case if it ever happens in the future, for
    what it's worth.
  3. main: even more fixes for signal handling.

    apenwarr committed Mar 14, 2010
    If the child doesn't die after the first SIGINT and the user presses ctrl-c
    one more time, the main bup process would die instead of forwarding it on to
    the child.  That's no good; we actually have to loop forwarding signals
    until the child is really good and dead.
    
    And if the child refuses to die, well, he's the one with the bug, not
    main.py.  So main.py should stay alive too in the name of not losing track
    of things.
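The forwarding loop can be sketched like this, under the assumption that ctrl-c reaches the parent as KeyboardInterrupt. The function name is made up and this is not the code in main.py.

```python
import signal
import subprocess


def wait_forwarding_sigint(proc):
    """Sketch: wait for a child, forwarding every ctrl-c until it exits."""
    while True:
        try:
            return proc.wait()
        except KeyboardInterrupt:
            # Pass the interrupt on and keep waiting; if the child refuses
            # to die, the parent stays alive too.
            proc.send_signal(signal.SIGINT)
```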
  4. client/server: correctly handle case where receive-objects had 0 objects.

    apenwarr committed Mar 14, 2010
    Previously we'd throw a (probably harmless other than ugly output)
    exception in this case.