Commits on Feb 14, 2011
  1. Move .idx file writing to C

    Brandon Low authored committed
    This was a remaining CPU bottleneck in bup-dumb-server mode.  In a quick
    test, writing 10 .idx files of 100000 elements on my netbook went from
    50s to 4s.  There may be more performance available by adjusting the
    definition of the PackWriter.idx object, but list(list(tuple)) isn't
    bad.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
Commits on Feb 13, 2011
  1. main.py: fix whitespace in the usage string.

    authored
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  2. cmd/daemon: FD_CLOEXEC the listener socket and don't leak fd for the connection.

    authored
    
    Otherwise the listener gets inherited by all the child processes (mostly
    harmless) and subprograms run by bup-server inherit an extra fd for the
    connection socket (problematic since we want the connection to close as soon
    as bup-server closes).
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
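The close-on-exec idea in this commit can be sketched as follows. This is a minimal illustration in modern Python 3, not the code from the commit; the helper names `set_cloexec` and `is_cloexec` are hypothetical. Note that Python 3.4 and later already create sockets as non-inheritable, so the explicit call mostly matters on older interpreters.

```python
import fcntl
import socket


def set_cloexec(sock):
    """Mark the socket's fd close-on-exec so exec'd subprograms don't inherit it."""
    fd = sock.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def is_cloexec(sock):
    """Return True if the close-on-exec flag is set on the socket's fd."""
    return bool(fcntl.fcntl(sock.fileno(), fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


# Hypothetical listener; a daemon would go on to bind() and listen() on it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_cloexec(listener)
```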
  3. cmd/daemon: close file descriptors correctly in parent process.

    authored
    The client wasn't getting disconnected when the server died, because the
    daemon was still hanging on to its copy of the original socket, due to some
    misplaced os.dup() calls.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  4. cmd/daemon: use SO_REUSEADDR.

    authored
    Otherwise we can't re-listen on that socket until the TIME_WAIT period ends,
    under certain conditions.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
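A minimal sketch of setting SO_REUSEADDR before binding, assuming a plain TCP listener. The function names `reusable_socket` and `make_listener` are hypothetical and are not taken from the commit.

```python
import socket


def reusable_socket():
    """Create a TCP socket with SO_REUSEADDR set, so a restarted daemon
    can bind again while old connections are still in TIME_WAIT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock


def make_listener(host, port, backlog=5):
    """Bind and listen; the option must be set before bind() to have any effect."""
    sock = reusable_socket()
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```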
  5. cmd/daemon: pass extra options along to 'bup server'.

    authored
    Currently 'bup server' doesn't take any options, but that might change
    someday.
    
    Also use a '--' to separate the bup mux command from its arguments, so it
    doesn't accidentally try to parse them.  This didn't matter before (since
    none of the options we were passing along started with a dash) but if the
    user provides extra options, it might matter.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  6. cmd/daemon: correctly report socket binding/listening errors.

    authored
    We should never, ever throw away the string from an exception, because
    that's how people debug problems.  (In this case, my problem was "address
    already in use.")
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
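The point about keeping the exception string can be illustrated like this. It is a sketch only: `describe_error` is a hypothetical helper, and the message format is an assumption rather than what the daemon actually prints.

```python
def describe_error(action, exc):
    """Build an error message that keeps the exception's own text,
    since that text is usually what identifies the real problem."""
    return "%s failed: %s" % (action, exc)


# Example: the kind of OSError that bind() raises when the port is taken.
message = describe_error("bind", OSError(98, "Address already in use"))
```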
  7. main.py: use execvp() instead of subprocess.Popen() when possible.

    authored
    This avoids an extra process showing up in the 'ps' listing if we're not
    going to be using bup-newliner anyhow.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  8. _helpers.c: Remove ugly 'python' junk from argv[0] so 'ps' is prettier.

    authored
    Okay, this is pretty gross.  But the 'ps' output was looking ugly, and
    now it doesn't.  We remove the 'python' interpreter string and the expanded
    pathname of the command being run, so it now shows as (e.g.) "bup-join" instead
    of "python /blah/blah/blah/cmd/bup-join".
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  9. cmd/bloom: fix a message pluralization.

    authored
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  10. cmd/join: add a new -o (output filename) option.

    authored
    This is a helpful way to have it open and write to the given output file.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  11. cmd/ls: fix a typo causing 'bup ls foo/latest' to not work.

    authored
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  12. cmd/server: add a new 'help' command.

    authored
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
Commits on Feb 12, 2011
  1. midx4: Fix the other side of my previous nasty bug

    Brandon Low authored committed
    The previous one was a problem with midx4s generated from idx files,
    this one is similar but when they are generated from other .midx4 files.
    
    Many thanks to Aneurin Price for putting up with the awful behavior and
    prodding at bup and whatnot while I was trying to make this one
    disappear under a rug.
    
    Once again, midx4 files generated prior to this patch will want to be
    regenerated.  Once again, only smart servers which have objects not on
    the client's index cache will be affected, but they sure as hell will be
    affected.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
Commits on Feb 8, 2011
  1. midx4: Fix name offsets when generated from idx

    Brandon Low authored committed
    This was a nasty bug, glad it got found before release.  It has only affected
    the server's ability to suggest .idxs so far, but would have affected
    any attempt to have bup retrieve objects directly too.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
  2. Fix a couple of python 2.4 incompatibilities.

    authored
    Thanks to Jimmy Tang for his help testing these since I don't have python
    2.4 easily available.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  3. Remove incorrect comment

    Brandon Low authored committed
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
  4. Merge branch 'bloom'

    authored
    * bloom:
      bloom: avoid kernel disk flushes when we dirty a lot of pages.
      midx4: Properly decide whether to do progress in C
      midx4: Don't use Py_ssize_t, it's not in python2.4
      cmd/bloom: map only one .idx file at a time.
      bloom: Use truncate not writing zeros in create
      bloom: Don't use function pointers in tight loops
      Fix updating of bloom with additional files
      ShaBloom.init(): initialize members before the assert().
      cmd/bloom: actually, always use the same temp filename.
      cmd/bloom: use mkstemp() instead of NamedTemporaryFile().
      midx: Write midx4 in C rather than python
      midx4: midx2 with idx backreferences
      ShaBloom: Add k=4 support for large repositories
      ShaBloom prefilter to detect nonexistent objects
      mmap: Make closing source file optional
  5. bloom: avoid kernel disk flushes when we dirty a lot of pages.

    authored
    Based on the number of objects we'll add to the bloom, decide if we want to
    mmap() the pages as shared-writable ('immediate' write) or else map them
    private-writable for later manual writing back to the file ('delayed'
    write).
    
    A bloom table's write access pattern is such that we dirty almost all the
    pages after adding very few entries; essentially, we can expect to dirty
    about n*k/4096 pages if we add n objects to the bloom with k hashes. But the
    table is so big that dirtying *all* the pages often exceeds Linux's default
    /proc/sys/vm/dirty_ratio or /proc/sys/vm/dirty_background_ratio,
    thus causing it to start flushing the table before we're
    finished... even though there's more than enough space to
    store the bloom table in RAM.
    
    To work around that behaviour, if we calculate that we'll probably end up
    touching the whole table anyway (at least one bit flipped per memory page),
    let's use a "private" mmap, which defeats Linux's ability to flush it to
    disk.  Then we'll flush it as one big lump during close(), which doesn't
    lose any time since we would have had to flush all the pages anyway.
    
    While we're here, let's remove the readwrite=True option to
    ShaBloom.create(); nobody's going to create a bloom file that isn't
    writable.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
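The immediate-versus-delayed choice described above can be sketched with Python's `mmap` module, where `ACCESS_WRITE` gives a shared-writable mapping and `ACCESS_COPY` gives a private copy-on-write one. This is an illustration under stated assumptions, not bup's implementation: the names `open_table` and `close_table`, the fixed 4096-byte page size, and the threshold test are all hypothetical.

```python
import mmap
import os

PAGE_SIZE = 4096  # assumed page size


def open_table(path, expected_adds, k):
    """Map an existing table file, choosing the mapping type up front."""
    f = open(path, "r+b")
    pages = max(1, os.fstat(f.fileno()).st_size // PAGE_SIZE)
    # Hypothetical heuristic: if we set at least as many bits as there are
    # pages, assume nearly every page gets dirtied and delay the write-back.
    delayed = expected_adds * k >= pages
    access = mmap.ACCESS_COPY if delayed else mmap.ACCESS_WRITE
    m = mmap.mmap(f.fileno(), 0, access=access)
    return f, m, delayed


def close_table(f, m, delayed):
    """Write the table back: one big write for a private map, flush for a shared one."""
    if delayed:
        f.seek(0)
        f.write(m[:])  # private pages never reach the file unless we copy them
    else:
        m.flush()
    m.close()
    f.close()
```

With the private mapping the kernel has nothing to flush early, so the file on disk stays unchanged until `close_table()` runs.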
  6. midx4: Properly decide whether to do progress in C

    Brandon Low authored committed
    Basically just gives us a _helpers.istty to go along with helpers.istty
    and uses it to decide whether or not to write progress messages from
    midx4 generation.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
  7. midx4: Don't use Py_ssize_t, it's not in python2.4

    Brandon Low authored committed
    This also uses a slightly more error-checked conversion of input values
    to appropriate C structures.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
  8. cmd/bloom: map only one .idx file at a time.

    authored
    This massively decreases virtual memory allocation since we only ever need
    to look at a single idx at once.
    
    In theory, VM doesn't cost us anything, but on 32-bit systems we can
    actually run out of address space if we try to map all the idx files at
    once on a very large repo.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  9. bloom: Use truncate not writing zeros in create

    Brandon Low authored committed
    This lets us test more of bloom's code without writing gigabyte(s) of
    zeros to disk.  As noted in the NOTE comment, this works on all of the
    common modern unixes that I checked, but may need special handling on
    other systems.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
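A minimal sketch of creating a zero-filled file with truncate instead of writing the zeros out. The function name `create_zeroed` is hypothetical. Whether the file is actually stored sparsely depends on the platform and filesystem, as the commit message itself cautions.

```python
def create_zeroed(path, nbytes):
    """Create a file of nbytes that reads back as zeros, without writing them.

    On common Unix filesystems, extending a file with truncate() leaves a hole
    that reads as zeros; other systems may need special handling.
    """
    with open(path, "wb") as f:
        f.truncate(nbytes)
```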
  10. bloom: Don't use function pointers in tight loops

    Brandon Low authored committed
    They really just confused the code at this point and may have prevented
    GCC from doing some optimization.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
  11. Fix updating of bloom with additional files

    Brandon Low authored committed
    Make bloom add additional .idx files when it's run on a repo with an
    existing bloom filter file rather than just regenerating all the time.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
Commits on Feb 7, 2011
  1. ShaBloom.init(): initialize members before the assert().

    authored
    Otherwise __del__() throws an exception if the assert triggers, thus hiding
    the original problem.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
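The ordering problem this commit fixes can be shown with a small stand-in class. `Table` below is hypothetical and much simpler than ShaBloom; it only illustrates why members that `__del__()` touches should be assigned before any assert can fire.

```python
class Table:
    def __init__(self, path, size):
        # Assign every attribute __del__ relies on first, so a failed
        # assert below cannot leave a half-built object behind.
        self.file = None
        assert size > 0, "size must be positive"
        self.file = open(path, "wb")

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None

    def __del__(self):
        # Safe even when __init__ bailed out early: self.file always exists.
        self.close()
```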
  2. cmd/bloom: actually, always use the same temp filename.

    authored
    There's no reason to use a different temp filename every time, since we're
    going to just be overwriting the same output file anyhow.  And if we got
    interrupted, we left the temp file lying around.  Let's just always use the
    same temp filename, which means if we get interrupted, we'll clean it up
    next time.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
  3. cmd/bloom: use mkstemp() instead of NamedTemporaryFile().

    authored
    Older versions of python (I tested python 2.5) don't support the
    delete=False parameter to NamedTemporaryFile().  In any case, it's not
    actually a temporary file since we're not planning to delete it.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
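A sketch of the mkstemp() pattern: write to a uniquely named file in the target directory, then rename it into place. The helper name `write_via_temp` and the `.tmp` suffix are assumptions for illustration, and the code targets Python 3 rather than the older interpreters the commit was written for.

```python
import os
import tempfile


def write_via_temp(final_path, data):
    """Write data to a temp file next to final_path, then rename it into place."""
    dirname = os.path.dirname(os.path.abspath(final_path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, final_path)  # atomic when both paths share a filesystem
    except BaseException:
        # mkstemp() never deletes the file for us, so clean up on failure.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```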
  4. midx: Write midx4 in C rather than python

    Brandon Low authored committed
    Obviously this is dramatically faster.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
  5. midx4: midx2 with idx backreferences

    Brandon Low authored committed
    Like midx3, this adds a lookup table of 4 bytes per entry to
    reference an entry in the idxnames list.  2 bytes should be plenty, but
    disk is cheap and the table will only be referenced when bup server gets
    an object that's already in the midx.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
  6. ShaBloom: Add k=4 support for large repositories

    Brandon Low authored committed
    The comments pretty much tell the story: 3TiB is really not large enough
    for a backup system to support, so this adds k=4 support to ShaBloom, which
    lets it hold 100s of TiB without too many negative tradeoffs.  It is still
    better to use k=5 for smaller repositories, so it switches to k=4 only when
    the repository exceeds 3TiB.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
  7. ShaBloom prefilter to detect nonexistent objects

    Brandon Low authored committed
    This inserts a bloom prefilter ahead of midx for efficient checking of
    objects most of which do not exist.  As long as you have enough RAM for
    the bloom filter to stay in memory, this saves a lot of time compared to
    midx files.  The bloom filter is between 1/5th and 1/20th the size of the
    midx given the parameters I'm using so far.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
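For readers unfamiliar with the technique, here is a toy Bloom filter keyed on SHA-1 digests. It is a sketch only: the class `TinyBloom`, its in-memory bytearray, and the way bit positions are sliced out of the digest are all assumptions, and none of it reflects ShaBloom's actual on-disk format or parameters.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter: 'no' answers are definite, 'yes' answers are only probable."""

    def __init__(self, nbits, k):
        # A 20-byte SHA-1 yields at most five independent 4-byte slices.
        assert 1 <= k <= 5, "k must be between 1 and 5"
        self.nbits = nbits
        self.k = k
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, sha):
        for i in range(self.k):
            chunk = sha[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.nbits

    def add(self, sha):
        for p in self._positions(sha):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, sha):
        # False means the object is definitely absent, so the slower
        # index lookup can be skipped entirely.
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(sha))
```

A caller would consult the filter first and fall through to the full index only when `might_contain()` returns True.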
  8. mmap: Make closing source file optional

    Brandon Low authored committed
    New index file formats require this behavior (bloom, midx3, etc.)
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
  9. Merge branch 'daemon_msg' of git://github.com/leto/bup

    authored
    * 'daemon_msg' of git://github.com/leto/bup:
      Make 'bup daemon' print a message at startup regardless of debug level
Commits on Feb 6, 2011
  1. options.py: update docstrings and detail optspec

    lelutin authored committed
    The docstring on the Options class currently refers to a man page which
    does not exist, and still talks about the now-removed 'exe' parameter.
    Update this to be more accurate.
    
    Add a docstring to OptDict.
    
    Finally, the options.py file brings a concept of option spec string. Its
    construction should be documented. Since we'd like the options.py file
    to be a one-file drop-in so that it can be easily used in other
    projects, let's document the option specs in the module's docstring.
    
    Signed-off-by: Gabriel Filion <lelutin@gmail.com>