Feb 26, 2011

  1. apenwarr

    Merge commit '6f02181'

    * commit '6f02181':
      helpers: separately determine if stdout and stderr are ttys.
      cmd/newliner: restrict progress lines to the screen width.
      hashsplit: use shorter offset-filenames inside trees.
      Replace 040000 and 0100644 constants with GIT_MODE_{TREE,FILE}
      git.py: rename treeparse to tree_decode() and add tree_encode().
      hashsplit.py: remove PackWriter-specific knowledge.
      cmd/split: fixup progress message, and print -b output incrementally.
      hashsplit.py: convert from 'bits' to 'level' earlier in the sequence.
      hashsplit.py: okay, *really* fix BLOB_MAX.
      hashsplit.py: simplify code and fix BLOB_MAX handling.
      options.py: o.fatal(): print error after, not before, usage message.
      options.py: make --usage just print the usage message.
    authored
  2. Gabriel Filion

    midx/bloom: use progress() and debug1() for non-critical messages

    Some messages in these two commands indicate progress but are not
    filtered out when the command is not run under a tty. This makes bup
    return some unwanted messages when run under cron.
    
    Using progress() and debug1() instead should fix that.
    
    (Changed a few from progress() to debug1() by apenwarr.)
    
    Signed-off-by: Gabriel Filion <lelutin@gmail.com>
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    lelutin authored committed
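
The tty filtering that the midx/bloom commit above relies on can be sketched roughly as follows. This is a minimal illustration, not bup's actual helpers code: `make_progress` is a hypothetical name, and the sketch only assumes that progress output should be dropped when the target stream is not a terminal (as is the case under cron).

```python
import sys


def make_progress(stream=None):
    """Return a progress(msg) function that writes only when stream is a tty.

    Hypothetical helper for illustration; bup's own progress() may differ.
    """
    if stream is None:
        stream = sys.stderr
    # isatty() is False for pipes and regular files, which is what a job
    # started by cron normally gets for stderr.
    is_tty = hasattr(stream, "isatty") and stream.isatty()

    def progress(msg):
        if is_tty:
            stream.write(msg)
            stream.flush()
            return True   # message was shown
        return False      # message was suppressed

    return progress
```

Calling `make_progress()` once at startup and using the returned function everywhere keeps the tty check in a single place.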

Feb 20, 2011

  1. apenwarr

    helpers: separately determine if stdout and stderr are ttys.

    Previously we only cared if stderr was a tty (since we use that to determine
    if we should print progress() or not).  But we might want to check stdout as
    well, for the same reason that gzip does: we should be refusing to write
    binary data to a terminal.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  2. apenwarr

    cmd/newliner: restrict progress lines to the screen width.

    Otherwise \r won't work as expected.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  3. apenwarr

    hashsplit: use shorter offset-filenames inside trees.

    We previously zero-padded all the filenames (which are hexified versions of
    the file offsets) to 16 characters, which corresponds to a maximum file size
    that fits into a 64-bit integer.  I realized that there's no reason to
    use a fixed padding length; just pad all the entries in a particular tree to
    the length of the longest entry (to ensure that sorting
    alphabetically is still equivalent to sorting numerically).
    
    This saves a small amount of space in each tree, which is probably
    irrelevant given that gzip compression can quite easily compress extra
    zeroes.  But it also makes browsing the tree in git look a little prettier.
    
    This is backwards compatible with old versions of vfs.py, since vfs.py has
    always just treated the numbers as an ordered set of numbers, and doesn't
    care how much zero padding they have.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  4. apenwarr

    Replace 040000 and 0100644 constants with GIT_MODE_{TREE,FILE}

    Those constants were scattered in *way* too many places.  While we're there,
    fix the inconsistent usage of strings vs. ints when specifying the file
    mode; there's no good reason to be passing strings around (except that I
    foolishly did that in the original code in version 0.01).
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  5. apenwarr

    git.py: rename treeparse to tree_decode() and add tree_encode().

    tree_encode() gets most of its functionality from PackWriter.new_tree(),
    which is now just a one liner that calls tree_encode().  We will soon want
    to be able to calculate tree hashes without actually writing a tree to a
    packfile, so let's split out that functionality.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  6. apenwarr

    hashsplit.py: remove PackWriter-specific knowledge.

    Let's use callback functions explicitly instead of passing around special
    objects; that makes the dependencies a bit more clear and hopefully opens
    the way to some more refactoring for clarity.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  7. apenwarr

    cmd/split: fixup progress message, and print -b output incrementally.

    As a side effect, you can no longer combine -b with -t, -c, or -n.  But that
    was kind of a pointless thing to do anyway, because it silently enforced
    --fanout=0, which is almost certainly not what you wanted, especially if you
    were using -t, -c, or -n.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  8. apenwarr

    hashsplit.py: convert from 'bits' to 'level' earlier in the sequence.

    The hierarchy level is a more directly useful measurement than the bit count,
    although right now neither is used very heavily.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  9. apenwarr

    hashsplit.py: okay, *really* fix BLOB_MAX.

    In some conditions, we were still splitting into blobs larger than BLOB_MAX.
    Fix that too.
    
    Unfortunately adding an assertion about it in the 'bup split' main loop
    slows things down by a measurable amount, so I can't easily add that to
    prevent this from happening by accident again in the future.
    
    After implementing this, it looks like 8192 (typical blob size) times two
    isn't big enough to prevent this from kicking in in "normal" cases; let's
    use 4x instead.  In my test file, we exceed this maximum much less.  (Every
    time we exceed BLOB_MAX, it means the bupsplit algorithm isn't working, so
    we won't be deduplicating as effectively.  So we want that to be rare.)
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  10. apenwarr

    hashsplit.py: simplify code and fix BLOB_MAX handling.

    This reduces the number of lines without removing functionality.  I renamed
    a few constants to make more sense.
    
    The only functional change is that BLOB_MAX is now an actual maximum instead
    of a variable number depending on buf.used().  Previously, it might have
    been as large as BLOB_READ_SIZE = 1MB, which is much larger than BLOB_MAX =
    16k.  Now BLOB_MAX is actually the max.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  11. apenwarr

    options.py: o.fatal(): print error after, not before, usage message.

    git prints the error *before* the usage message, but the more I play with
    it, the more I'm annoyed by that behaviour.  The usage message can be pretty
    long, and the error gets lost way above the usage message.  The most
    important thing *is* the error, so let's print it last.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  12. apenwarr

    options.py: make --usage just print the usage message.

    This is a relatively common option in other programs, so let's make it work
    in case someone tries to use it.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
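
The per-tree padding described in the hashsplit offset-filename commit above can be sketched as follows. This is an illustration under stated assumptions, not the code from hashsplit.py: `offset_names` is a hypothetical helper, and the offsets are made-up values.

```python
def offset_names(offsets):
    """Hex-encode file offsets, zero-padded to the width of the longest one.

    Hypothetical helper: padding every name in one tree to the same width
    keeps alphabetical order identical to numerical order.
    """
    hexed = ["%x" % ofs for ofs in offsets]
    width = max(len(h) for h in hexed) if hexed else 0
    return [h.rjust(width, "0") for h in hexed]


# Example offsets (assumed values, already in ascending numeric order).
names = offset_names([0, 0x2000, 0x1a000])
unpadded = ["%x" % ofs for ofs in [0, 0x2000, 0x1a000]]
```

Here `names` is `['00000', '02000', '1a000']`, which sorts alphabetically in the same order as the offsets. The unpadded names `['0', '2000', '1a000']` would sort with `'1a000'` ahead of `'2000'`, which is why some padding is still needed.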

Feb 19, 2011

  1. Gabriel Filion

    doc/import-rsnapshot: small corrections and clarification

    There's a typo in the --dry-run option explanation.
    
    The form "[...] or only imports all [...]" is confusing. Turn it around
    a little bit so that the quantifiers are associated more easily to the
    right portions of the sentence.
    
    Also, add an example for using the backuptarget argument.
    
    Signed-off-by: Gabriel Filion <lelutin@gmail.com>
    lelutin authored committed

Feb 18, 2011

  1. apenwarr

    cmd/midx, git.py: all else being equal, delete older midxes first.

    Previous runs of 'bup midx -f' might have created invalid midx files with
    exactly the same length as a newer run.  bup's "prune redundant midx" logic
    would quasi-randomly choose one or the other to delete (based on
    alphabetical order of filenames, basically) and sometimes that would be the
    new one, not the old one, so the 'bup midx -f' results never actually kicked
    in.
    
    Now if the file sizes are equal we'll use the mtime as a tie breaker; newer
    is better.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  2. apenwarr

    t/test.sh: a test for the recently-uncovered midx4 problem.

    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  3. apenwarr

    _helpers.c: midx4 didn't handle multiple indexes with the same object.

    It *tried* to handle it, but would end up with a bunch of zero entries at
    the end, which prevents .exists() from working correctly in some cases.
    
    In midx2, it made sense to never include the same entry twice, because the
    only information we had about a particular entry was that it existed.  In
    midx4 this is no longer true; we might sometimes want to know *all* the idx
    files that contain a particular object (for example, when we implement
    expiry later).  So the easiest fix for this bug is to just include multiple
    entries when we have them.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  4. apenwarr

    cmd/midx: add a --check option.

    Running this on my system does reveal that some objects return
    exists()==False on my midx even though they show up during iteration.
    
    Now to actually find and fix it...
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  5. apenwarr

    Add git.shorten_hash(), printing only the first few bytes of a sha1.

    The full name is rarely needed and clutters the output.  Let's try this
    instead in a few places.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  6. apenwarr

    tclient.py: add some additional tests that objcache.refresh() is called.

    ...which it is, so no bugs were fixed here.  Aneurin is still exposing a bug
    somehow though.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
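
The mtime tie-breaker from the "delete older midxes first" commit above can be sketched as a sort key. This is a sketch only: `preference_order` and the tuple layout are hypothetical, and it assumes that larger midx files are preferred, which the commit message does not state explicitly.

```python
def preference_order(midxes):
    """Order (name, size, mtime) tuples from most to least worth keeping.

    Assumption for illustration: larger files are preferred, and among
    files of equal size the newer mtime wins, so older ones end up last
    and are the first candidates for deletion.
    """
    return sorted(midxes, key=lambda m: (m[1], m[2]), reverse=True)


files = [
    ("a.midx", 100, 10.0),  # older file, same size as b.midx
    ("b.midx", 100, 20.0),  # newer file, e.g. from a later 'bup midx -f'
    ("c.midx", 50, 30.0),
]
ordered = preference_order(files)
```

With these sample values, `b.midx` sorts ahead of `a.midx` because the file name no longer takes part in the comparison.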

Feb 17, 2011

  1. apenwarr

    cmd/server: add a debug message saying which object caused a suggestion.

    Let's use this to try to debug Aneurin's problem (and potentially others).
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  2. apenwarr

    cmd/list-idx: a quick tool for searching the contents of idx/midx files.

    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  3. Add tests around the bloom ruin and check options

    This generally improves our test coverage of bloom filter behavior and
    more specifically makes sure that check and ruin do something.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
    Brandon Low authored committed
  4. Add a bloom --ruin for testing failure cases

    This command option ruins a bloom filter by setting all of its bits to
    zero.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
    Brandon Low authored committed
  5. One more constant for header lengths

    I missed bloom header length in the last pass.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
    Brandon Low authored committed
  6. apenwarr

    Split PackMidx from git.py into a new midx.py.

    git.py is definitely too big.  It still is, but this helps a bit.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  7. apenwarr

    bloom.py: move bloom.ShaBloom.create to just bloom.create.

    I don't really like class-level functions.  Ideally we'd just move all the
    creation stuff into cmd/bloom, but tbloom.py is testing them, so it's not
    really worth it.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  8. apenwarr

    Merge branch 'bl/bloomcheck' into ap/cleanups

    * bl/bloomcheck:
      Bail out immediately instead of redownloading .idx
      Add a --check behavior to verify bloom
      Defines/preprocessor lengths > magic numbers
    
    Conflicts:
    	cmd/bloom-cmd.py
    authored
  9. apenwarr

    Move bloom-related stuff from git.py to a new bloom.py.

    No other functionality changes other than that cmd/memtest now reports the
    number of bloom steps separately from the midx/idx steps.  (This is mostly
    so they don't have to share the same global variables, but it's also
    interesting information to break out.)
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  10. apenwarr

    cmd/bloom: add a --force option to force regenerating the bloom.

    This corresponds to midx's --force option.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  11. apenwarr

    Use the new qprogress() function in more places.

    qprogress() was introduced in the last commit and has smarter default
    behaviour that automatically reduces progress output so we don't print too
    many messages per second.  Various commands/etc were doing this in various
    different ad-hoc ways, but let's centralize it all in one place.
    
    Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
    authored
  12. Bail out immediately instead of redownloading .idx

    This should make diagnosing / fixing corrupted bloom filters and midx
    files easier, and is generally more sane behavior.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
    Brandon Low authored committed
  13. Add a --check behavior to verify bloom

    This new behavior is useful when diagnosing weird behavior: it lets a
    bloom filter claiming to contain a particular idx be verified against
    that idx file.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
    Brandon Low authored committed
  14. Defines/preprocessor lengths > magic numbers

    This just changes some instances of "8", "12" and "20" to use the
    equivalent sizeof or #defined constants to make the code more readable.
    
    Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
    Brandon Low authored committed
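
To illustrate why a zeroed ("ruined") bloom filter is a useful failure case for a check to catch, here is a toy filter. It is a minimal sketch and does not reflect bup's on-disk bloom format or its API: `TinyBloom`, its parameters, and the use of sha1 slices as probe positions are all assumptions made for the example.

```python
import hashlib


class TinyBloom:
    """Toy bloom filter; not bup's format. nbits must be a multiple of 8."""

    def __init__(self, nbits=1024, k=4):
        self.nbits = nbits
        self.k = k  # number of probes; sha1 yields 20 bytes, so k <= 5
        self.bits = bytearray(nbits // 8)

    def _positions(self, key):
        digest = hashlib.sha1(key).digest()
        for i in range(self.k):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.nbits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

    def ruin(self):
        """Set every bit to zero, so exists() is False for all keys."""
        for i in range(len(self.bits)):
            self.bits[i] = 0

    def check(self, keys):
        """Return the keys that should be present but are reported missing."""
        return [key for key in keys if not self.exists(key)]
```

A healthy filter never reports a false negative, so `check()` returns an empty list for keys that were added. After `ruin()`, every added key is reported missing, which is exactly the kind of corruption a check should detect.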