Jan 14, 2010

  1. Dave Coombs

    Change t/tindex.py to pass on Mac OS.

    It turns out /etc is a symlink (to /private/etc) on Mac OS, so checking
    that the realpath of t/sampledata/etc is /etc fails.  Instead we now check
    against the realpath of /etc.
    dcoombs authored
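This fix resolves symlinks on both sides of the comparison. Here is a minimal Python sketch of that check. `same_location` is a hypothetical helper name, not code from t/tindex.py.

```python
import os
import tempfile

def same_location(path, expected):
    """True if both paths resolve to the same place once symlinks are followed.

    Resolving only one side breaks on Mac OS, where /etc -> /private/etc.
    """
    return os.path.realpath(path) == os.path.realpath(expected)

# Usage: a symlinked directory compares equal to its target.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "real")
    os.mkdir(target)
    link = os.path.join(d, "link")
    os.symlink(target, link)
    print(same_location(link, target))  # True
```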

Jan 12, 2010

  1. apenwarr

    Use a PackBitmap file as a quicker way to check .idx files.

    When we receive a new .idx file, we auto-generate a .map file from it.  It's
    essentially an allocation bitmap: for each 20-bit prefix, we assign one bit
    to tell us if that particular prefix is in that particular packfile.  If it
    isn't, there's no point searching the .idx file at all, so we can avoid
    mapping in a lot of pages.  If it is, though, we then have to search the
    .idx *too*, so we suffer a bit.
    
    On the whole this reduces memory thrashing quite a bit for me, though.
    Probably the number of bits needs to be variable in order to work over a
    wider range of packfile sizes/numbers.
    authored
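The .map file in the PackBitmap commit works as a one-hash membership filter keyed on each sha1's 20-bit prefix. A clear bit proves an object is absent from that pack. A set bit only means the .idx must still be searched. Below is a minimal in-memory sketch. `PrefixBitmap`, `prefix_of` and `might_contain` are hypothetical names, and this is not the on-disk .map layout bup writes.

```python
import hashlib

PREFIX_BITS = 20  # one bit per 20-bit sha1 prefix, as in the commit message

def prefix_of(sha):
    """Top PREFIX_BITS bits of a 20-byte binary sha1."""
    return int.from_bytes(sha[:3], "big") >> (24 - PREFIX_BITS)

class PrefixBitmap:
    """Toy per-packfile prefix bitmap (not bup's .map file format)."""

    def __init__(self, shas):
        # 2**20 bits = 128 KiB, regardless of how many objects the pack holds.
        self.bits = bytearray((1 << PREFIX_BITS) // 8)
        for sha in shas:
            p = prefix_of(sha)
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, sha):
        """False: definitely not in this pack. True: go search the .idx."""
        p = prefix_of(sha)
        return bool(self.bits[p >> 3] & (1 << (p & 7)))

shas = [hashlib.sha1(b"object %d" % i).digest() for i in range(1000)]
bm = PrefixBitmap(shas)
print(len(bm.bits))  # 131072
```

Because the table has a fixed size, the more objects a pack holds, the more bits are set and the less the map saves. That is one way to read the note that the number of bits probably needs to be variable.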
  2. apenwarr

    memtest.py: a standalone program for testing memory usage in PackIndex.

    The majority of the memory usage in bup split/save is now caused by
    searching pack indexes for sha1 hashes.  The problem with this is that, in
    the common case for a first full backup, *none* of the object hashes will be
    found, so we'll *always* have to search *all* the packfiles.  With just 45
    packfiles of 200k objects each, that makes about (18-8)*45 = 450 binary
    search steps, or 100+ 4k pages that need to be loaded from disk, to check
    *each* object hash.  memtest.py lets us see how fast RSS creeps up under
    various conditions, and how different optimizations affect the result.
    authored
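The (18-8)*45 estimate in the memtest.py commit can be reproduced as follows. This sketch reads the 8 as the bits resolved by the .idx file's 256-entry fanout table. That reading is an assumption, as are the variable names.

```python
import math

objects_per_pack = 200_000
packfiles = 45
fanout_bits = 8  # assumed: the .idx fanout table resolves the first byte

# Binary search over the objects that share that first byte.
steps_per_pack = math.ceil(math.log2(objects_per_pack)) - fanout_bits
total_steps = steps_per_pack * packfiles
print(steps_per_pack, total_steps)  # 10 450
```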
  3. apenwarr

    options parser: automatically convert strings to ints when appropriate.

    If the given parameter is exactly an int (i.e. str(int(v)) == v), then convert
    it to an int automatically.  This helps avoid weird bugs in apps using the
    option parser.
    authored
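Here is a minimal sketch of the round-trip rule in the options-parser commit. `maybe_int` is a hypothetical name, not the option parser's actual function.

```python
def maybe_int(v):
    """Return int(v) if v is exactly an int's string form, else v unchanged."""
    try:
        if str(int(v)) == v:
            return int(v)
    except (TypeError, ValueError):
        pass  # not parseable as an int at all
    return v

print([maybe_int(s) for s in ("42", "-3", "042", "1.5", "abc")])
# [42, -3, '042', '1.5', 'abc']
```

Requiring an exact round trip leaves values like "042" or " 7" as strings. Only values that are unambiguously integers change type.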
  4. apenwarr

    cmd-save: if verbose==1, don't bother printing unmodified names.

    That just clutters the output; clearly what people *really* want to see is
    the list of files we're actually modifying.
    
    But if you want more, increase the verbosity and you'll get more.
    authored
  5. apenwarr

    client-server: only retrieve index files when actually needed.

    A busy server could end up with a *large* number of index files, mostly
    referring to objects from other clients.  Downloading all the indexes not only
    wastes bandwidth, but causes a more insidious problem: small clients end up
    having to mmap a huge number of large index files, which sucks lots of RAM.
    
    In general, the RAM on a machine is roughly proportional to the disk space on
    that machine.  So it's okay for larger clients to need more RAM in order
    to complete a backup.  However, it's not okay for the existence of larger
    clients to make smaller clients suffer.  Hopefully this change will settle
    it a bit.
    authored
  6. apenwarr

    Reduce default max objects per pack to 200,000 to save memory.

    After some testing, it seems each object sha1 we need to cache while writing
    a pack costs us about 83 bytes of memory.  (This isn't so great, so
    optimizing it in C later could cut this down a lot.)  The new limit of 200k
    objects takes about 16.6 megs of RAM, which nowadays is pretty acceptable.
    It also corresponds to roughly 1GB of packfile for my random selection of
    sample data, so (since the default packfile limit is about 1GB anyway), this
    *mostly* won't matter.
    
    It will have an effect if your data is highly compressible, however; an
    8192-byte object could compress down to a very small size and you'd end up
    with a large number of objects.  The previous default limit of 10 million
    objects was ridiculous, since that would take 830 megs of RAM.
    authored
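The memory figures in the max-objects commit follow directly from the measured 83 bytes per object. This assumes a "meg" means 10^6 bytes. `cache_megabytes` is a hypothetical helper.

```python
BYTES_PER_OBJECT = 83  # measured per-sha1 cost quoted in the commit message

def cache_megabytes(max_objects):
    """RAM, in 10**6-byte megabytes, to cache one sha1 per object in a pack."""
    return max_objects * BYTES_PER_OBJECT / 1_000_000

for n in (200_000, 10_000_000):
    print(f"{n:,} objects -> {cache_megabytes(n):.1f} MB")
# 200,000 objects -> 16.6 MB
# 10,000,000 objects -> 830.0 MB
```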
  7. apenwarr

    split_to_blob_or_tree was accidentally not using the 'fanout' setting.

    Thus, 'bup save' on huge files would suck lots of RAM.
    authored

Jan 11, 2010

  1. apenwarr

    cmd-server: receive-objects should return a relative, not absolute, path.
    authored
  2. apenwarr

    Update the README to reflect recent changes.

    authored
  3. apenwarr

    Merge branch 'cygwin'

    * cygwin:
      Assorted cleanups to Luke's cygwin fixes.
      Makefile: work with cygwin on different windows versions.
      .gitignore sanity.
      Makefile:  On Windows, executable files must end with .exe.
      client.py:  Windows files don't support ':', so rename cachedir.
      index.py:  os.rename() fails on Windows if dstfile already exists.
      Don't try to rename tmpfiles into existing open files.
      helpers.py:  Cygwin doesn't support `hostname -f`, use `hostname`.
      cmd-index.py:  Retry os.open without O_LARGEFILE if not supported.
      Makefile:  Build on Windows under Cygwin.
    authored
  4. apenwarr

    Assorted cleanups to Luke's cygwin fixes.

    There were a few things that weren't quite done how I would have done them,
    so I changed the implementation.  Should still work in cygwin, though.
    
    The only actual functional changes are:
     - index.Reader.close() now actually sets m=None rather than just closing it
     - removed the "if rename fails, then unlink first" logic, which is
       seemingly not needed after all.
     - rather than special-casing cygwin to use "hostname" instead of "hostname
       -f", it turns out python has a socket.getfqdn() that does what we want.
    authored
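socket.getfqdn() is part of Python's standard library. Here is a sketch of a hostname helper built on it. The `hostname` name and the caching with functools.lru_cache are assumptions, not necessarily how helpers.py does it.

```python
import functools
import socket

@functools.lru_cache(maxsize=None)
def hostname():
    """Fully qualified host name, looked up once and then cached.

    Using socket.getfqdn() avoids shelling out to `hostname -f`,
    which Cygwin does not support.
    """
    return socket.getfqdn()
```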
  5. apenwarr

    Makefile: work with cygwin on different windows versions.

    Just check the CYGWIN part; don't depend on the fact that it's NT 5.1.  (Of
    course, uname isn't supposed to report such things by default anyway... but
    that's cygwin for you.)
    authored

Jan 10, 2010

  1. Lukasz (Luke) Kosewski

    Merge branch 'master' of git://github.com/apenwarr/bup

    lkosewsk authored
  2. Lukasz (Luke) Kosewski

    .gitignore sanity.

    lkosewsk authored
  3. Lukasz (Luke) Kosewski

    Makefile: On Windows, executable files must end with .exe.

    lkosewsk authored
  4. Lukasz (Luke) Kosewski

    client.py: Windows files don't support ':', so rename cachedir.

    Cachedir was previously $host:$dir, and is now $host-$dir.
    lkosewsk authored
  5. Lukasz (Luke) Kosewski

    index.py: os.rename() fails on Windows if dstfile already exists.

    Hence, we perform an os.unlink on the dstfile if os.rename() receives
    an OSError exception, and try again.
    lkosewsk authored
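Here is a sketch of the rename-then-unlink fallback in the index.py commit. `rename_over` is a hypothetical name.

```python
import os

def rename_over(src, dst):
    """Rename src to dst, replacing dst if it already exists.

    On Windows, os.rename() raises OSError when dst exists, so unlink dst
    and retry once.
    """
    try:
        os.rename(src, dst)
    except OSError:
        # Caution: this unlinks dst whatever caused the first failure.
        os.unlink(dst)
        os.rename(src, dst)
```

On Python 3.3 and later, os.replace() overwrites an existing destination on every platform in a single call. The "Assorted cleanups to Luke's cygwin fixes" commit above later removed this fallback as unnecessary.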
  6. Lukasz (Luke) Kosewski

    Don't try to rename tmpfiles into existing open files.

    Linux and friends have no problem with this, but Windows doesn't allow
    this without some effort, which we can avoid by... not needing to write
    to an already-open file.
    
    Give index.Reader a 'close' method which identifies and closes any open
    mmaped files, and make cmd-index.py use this before trying to close an
    index.Writer instance (which renames a tmpfile into the same file the
    Reader has mmaped).
    lkosewsk authored
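Here is a sketch of the close-before-rename ordering in the tmpfile commit. `Reader` is a toy stand-in for index.Reader. `rewrite` is a hypothetical helper that plays the Writer's role of renaming a tmpfile over the indexfile.

```python
import mmap
import os
import tempfile

class Reader:
    """Toy stand-in for index.Reader: keeps an indexfile mmapped read-only."""

    def __init__(self, path):
        self.f = open(path, "rb")
        self.m = mmap.mmap(self.f.fileno(), 0, access=mmap.ACCESS_READ)

    def close(self):
        # Drop the mapping and the file handle. Windows generally refuses to
        # rename a file over one that is still open or mapped.
        if self.m is not None:
            self.m.close()
            self.m = None
        if self.f is not None:
            self.f.close()
            self.f = None

def rewrite(path, data):
    """Write data to a tmpfile next to path, then rename it over path."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "bupindex")
    rewrite(path, b"old index")
    reader = Reader(path)
    reader.close()               # release the mmap first...
    rewrite(path, b"new index")  # ...then rename the tmpfile over it
```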
  7. Lukasz (Luke) Kosewski

    helpers.py: Cygwin doesn't support `hostname -f`, use `hostname`.

    lkosewsk authored
  8. Lukasz (Luke) Kosewski

    cmd-index.py: Retry os.open without O_LARGEFILE if not supported.

    Python under Cygwin doesn't have os.O_LARGEFILE, so if we receive an
    'AttributeError' exception trying to open something, just remove
    O_LARGEFILE and try again.
    lkosewsk authored
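Here is a sketch of the O_LARGEFILE fallback. `open_for_read` is a hypothetical name. It builds the flags before calling os.open() instead of retrying the call. The effect is the same, because the AttributeError comes from looking up os.O_LARGEFILE itself.

```python
import os

def open_for_read(path):
    """os.open() a file read-only, adding O_LARGEFILE where Python provides it."""
    try:
        # Referencing os.O_LARGEFILE raises AttributeError on builds
        # (e.g. Python under Cygwin) that don't define it.
        flags = os.O_RDONLY | os.O_LARGEFILE
    except AttributeError:
        flags = os.O_RDONLY
    return os.open(path, flags)
```

getattr(os, "O_LARGEFILE", 0) is a common one-line alternative.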
  9. Lukasz (Luke) Kosewski

    Makefile: Build on Windows under Cygwin.

    - Python modules have to end with .dll instead of .so to load into Python
      via 'import'.
    - GCC under Windows builds all programs with -fPIC, and doesn't accept
      this command-line option.
    - libpython2.5.dll is found in /usr/bin under Cygwin (wtf?), so we need
      to add this to the LDFLAGS path.
    - 'make clean' should remove .dll files too.
    lkosewsk authored
  10. apenwarr

    Oops, 'bup save /' produced an invalid tree.

    Add a bunch of assertions to make sure that never happens.
    authored
  11. apenwarr

    This adds the long-awaited indexfile feature, so you no longer have to feed
    your backups through tar.
    
    Okay, 'bup save' is still a bit weak... but it could be much worse.
    
    Merge branch 'indexfile'
    
    * indexfile:
      Minor fix for python 2.4.4 compatibility.
      cmd-save: completely reimplement using the indexfile.
      Moved some reusable index-handling code from cmd-index.py to index.py.
      A bunch of wvtests for the 'bup index' command.
      Start using wvtest.sh for shell-based tests in test-sh.
      cmd-index: default indexfile path is ~/.bup/bupindex, not $PWD/index
      cmd-index: skip merging the index if nothing was written to the new one.
      cmd-index: only update if -u is given; print only given file/dirnames.
      cmd-index: correct reporting of deleted vs. added vs. modified status.
      Generalize the multi-index-walking code.
      cmd-index: indexfiles should start with a well-known header.
      cmd-index: eliminate redundant paths from index update command.
      cmd-index: some handy options.
      index: add --xdev (--one-file-system) option.
      Fix some bugs with indexing '/'
      cmd-index: basic index reader/writer/merger.
    authored
  12. apenwarr

    Minor fix for python 2.4.4 compatibility.

    authored
  13. apenwarr

    cmd-save: completely reimplement using the indexfile.

    'bup save' no longer walks the filesystem: instead it walks the indexfile
    (which is much faster) and doesn't bother opening any files that haven't had
    an attribute change, since it can just reuse their sha1 from before.  That
    makes it *much* faster in the common case.
    authored
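Here is a sketch of the "reuse the sha1 unless attributes changed" idea from the cmd-save commit, with a plain dict standing in for the indexfile. `content_sha1` is hypothetical. It keys only on mtime and size and hashes raw file contents. bup's index tracks more attributes and stores git object ids.

```python
import hashlib
import os

def content_sha1(path, cache):
    """Hex sha1 of path's contents, reusing the cached value if stat is unchanged.

    cache maps path -> ((st_mtime_ns, st_size), hexdigest).
    """
    st = os.stat(path)
    key = (st.st_mtime_ns, st.st_size)
    hit = cache.get(path)
    if hit is not None and hit[0] == key:
        return hit[1]  # attributes unchanged: don't open the file at all
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    cache[path] = (key, digest)
    return digest
```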
  14. apenwarr

    Moved some reusable index-handling code from cmd-index.py to index.py.

    authored
  15. apenwarr

    A bunch of wvtests for the 'bup index' command.

    authored
  16. apenwarr

    Start using wvtest.sh for shell-based tests in test-sh.

    This makes the output a little prettier... at least in the common case where
    it passes :)
    authored
  17. apenwarr

    cmd-index: default indexfile path is ~/.bup/bupindex, not $PWD/index

    authored
  18. apenwarr

    cmd-index: skip merging the index if nothing was written to the new one.

    authored
  19. apenwarr

    cmd-index: only update if -u is given; print only given file/dirnames.

    cmd-index now does two things:
     - it updates the index with the given names if -u is given
     - it prints the index if -p, -s, or -m are given.
    
    In both cases, if filenames are given, it operates (recursively) on the
    given filenames or directories.  If no filenames are given, -u fails (we
    don't want to default to /; it's too slow) but -p/s/m just prints the whole
    index.
    authored
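Here are the cmd-index option rules sketched as a hypothetical `index_actions` helper. It returns what would be done instead of doing it. The names and the error message are illustrative, not bup's.

```python
def index_actions(update, show, paths):
    """Decide what cmd-index does under the -u/-p/-s/-m rules.

    update: -u was given; show: any of -p, -s, -m was given;
    paths: file/dir names from the command line (walked recursively).
    """
    if update and not paths:
        # Refuse to default to '/': indexing everything is too slow.
        raise ValueError("-u requires at least one filename")
    actions = []
    if update:
        actions.append(("update", list(paths)))
    if show:
        # None means "print the whole index".
        actions.append(("print", list(paths) if paths else None))
    return actions
```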
  20. apenwarr

    cmd-index: correct reporting of deleted vs. added vs. modified status.

    A file with an all-zero sha1 is considered Added instead of Modified, since
    it has obviously *never* had a valid sha1.  (A modified file has an old
    sha1, but IX_HASHVALID isn't set.)
    
    We also now don't remove old files from the index - for now - so that we can
    report old files with a D status.  This might perhaps be useful eventually.
    
    Furthermore, we had a bug where reindexing a particular filename would
    "sometimes" cause siblings of that file to be marked as deleted.  The
    sibling entries should never be updated, because we didn't check them and
    thus have no idea of their new status.  This bug was mostly caused by the
    silly way we currently pass dirnames and filenames around...
    authored
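Here is a sketch of the status rules in the deleted/added/modified commit. `status_letter` is hypothetical. The IX_HASHVALID bit value is assumed, and so is the blank status for unchanged entries.

```python
EMPTY_SHA = b"\0" * 20
IX_HASHVALID = 0x4000  # assumed bit value; the real constant may differ

def status_letter(sha, flags, exists):
    """One-letter status for an index entry."""
    if not exists:
        return "D"  # old entry kept in the index so it can be reported
    if sha == EMPTY_SHA:
        return "A"  # never had a valid sha1
    if not flags & IX_HASHVALID:
        return "M"  # has an old sha1 that is no longer valid
    return " "      # unchanged (this letter is an assumption)
```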

Jan 09, 2010

  1. apenwarr

    Generalize the multi-index-walking code.

    Now you can walk through multiple indexes correctly from anywhere, avoiding
    the need for merging a huge index just to update a few files.
    authored
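A lazy k-way merge is one way to walk several indexes as if they were one, without writing a merged file. The sketch below uses heapq.merge. It assumes each index yields (name, entry) pairs sorted by name, and that the index listed first wins on duplicate names. Both assumptions and all names are illustrative, not bup's actual ordering or code.

```python
import heapq

def _tagged(priority, index):
    """Attach a priority to each (name, entry) so duplicates sort first-index-first."""
    for name, entry in index:
        yield name, priority, entry

def walk_merged(*indexes):
    """Walk several name-sorted indexes as one; the first-listed index wins ties."""
    streams = [_tagged(p, idx) for p, idx in enumerate(indexes)]
    last = object()  # sentinel that never equals a real name
    for name, _, entry in heapq.merge(*streams):
        if name != last:
            yield name, entry
            last = name

new = [("a", "a@new"), ("c", "c@new")]
old = [("a", "a@old"), ("b", "b@old")]
print(list(walk_merged(new, old)))
# [('a', 'a@new'), ('b', 'b@old'), ('c', 'c@new')]
```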
  2. apenwarr

    cmd-index: indexfiles should start with a well-known header.

    authored
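The header commit has no description. Here is a sketch of the idea. The magic string is made up, since bup's real header bytes are not given here, and the helper names are hypothetical.

```python
import io

# Hypothetical magic bytes for this sketch; not bup's actual header.
INDEX_HEADER = b"EXAMPLEIDX\0\1"

def write_header(f):
    f.write(INDEX_HEADER)

def check_header(f):
    """Consume and verify the header; raise ValueError for foreign files."""
    got = f.read(len(INDEX_HEADER))
    if got != INDEX_HEADER:
        raise ValueError(f"not an indexfile (header {got!r})")

buf = io.BytesIO()
write_header(buf)
buf.write(b"entries...")
buf.seek(0)
check_header(buf)  # passes; buf is now positioned at the first entry
```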