Commits on Mar 20, 2011
  1. @lelutin

    Makefile: Fix 'clean' rule

    lelutin authored committed
    In commit 1df0bdd, I introduced a problem in the Makefile: the
    chmod operation that gives back some permissions on
    lib/bup/t/pybuptest.tmp dies if this directory does not exist.
    pybuptest.tmp is only created when running the tests.
    When the chmod dies, the clean rule stops before completing the
    cleanup, so we must make sure this operation is not fatal if the
    directory doesn't exist.
    Signed-off-by: Gabriel Filion <>
Commits on Mar 11, 2011
  1. _helpers.c: fix a "type punning" warning from gcc.

    _helpers.c: In function ‘bloom_contains’:
    _helpers.c:260: warning: dereferencing type-punned pointer will break strict-aliasing rules
    Whatever, let's use 1 instead of the apparently problematic Py_True.
    Signed-off-by: Avery Pennarun <>
Commits on Mar 10, 2011
  1. Add a test for previous octal/string filemode fix.

    Apparently nothing was testing symlink behaviour; add a basic test for
    symlink save/restore.
    Signed-off-by: Avery Pennarun <>
  2. @Aneurin

    Use debug1() when reporting paths skipped

    Aneurin authored committed
    Skipping paths during indexing is a normal event not indicative of any
    problems, so need not be reported in normal operation.
    Signed-off-by: Aneurin Price <>
  3. Save was using a quoted instead of octal gitmode.

    Brandon Low authored committed
    This showed up as an assertion failure on Python 2.7 for me, and I
    believe it was incorrect but functional behavior.
    Signed-off-by: Brandon Low <>
    Signed-off-by: Avery Pennarun <>
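    The class of bug fixed here can be sketched in a few lines. A raw git tree entry stores the mode as octal text derived from an int, so code that passes an already-quoted string where an int is expected goes wrong at the formatting step. The `tree_entry` helper below is purely illustrative, not bup's API:

```python
# Hypothetical sketch of the int-vs-string gitmode pitfall: a raw git
# tree entry is b"<octal mode> <name>\0<20-byte sha>", and the octal
# text must be derived from an *int* mode exactly once.
def tree_entry(mode, name, sha):
    assert isinstance(mode, int), 'gitmode must be an int, got %r' % mode
    return b'%o %s\0' % (mode, name) + sha

entry = tree_entry(0o100644, b'file', b'\0' * 20)

# Passing the pre-quoted string form trips the assert instead of
# silently producing a malformed tree:
try:
    tree_entry('100644', b'file', b'\0' * 20)
    string_mode_ok = True
except AssertionError:
    string_mode_ok = False
```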
  4. @lelutin

    Verify permissions in check_repo_or_die()

    lelutin authored committed
    Currently, if one doesn't have read or access permission up to
    repo('objects/pack'), bup exits with the following error:
    error: repo() is not a bup/git repository
    (with repo() replaced with the actual path).
    This is misleading, since there may really be a repository there
    that the user simply can't access.
    Make git.check_repo_or_die() verify that the current user has the
    permission to access repo('objects/pack'), and if not, output a
    meaningful error message.
    As a bonus, we get an error if the bup_dir path is not a directory.
    Signed-off-by: Gabriel Filion <>
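    A minimal sketch of the check described above (hypothetical names; the real `git.check_repo_or_die()` exits with the message rather than returning it):

```python
import os

def repo_error(bup_dir):
    # Return an error message, or None if the repository looks usable.
    if not os.path.isdir(bup_dir):
        return 'error: %r is not a directory' % bup_dir
    if not os.access(bup_dir, os.R_OK | os.X_OK):
        return 'error: unable to access %r (permission denied?)' % bup_dir
    packdir = os.path.join(bup_dir, 'objects', 'pack')
    if not os.path.isdir(packdir):
        return 'error: %r is not a bup/git repository' % bup_dir
    return None
```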
Commits on Feb 26, 2011
  1. Merge commit '6f02181'

    * commit '6f02181':
      helpers: separately determine if stdout and stderr are ttys.
      cmd/newliner: restrict progress lines to the screen width.
      hashsplit: use shorter offset-filenames inside trees.
      Replace 040000 and 0100644 constants with GIT_MODE_{TREE,FILE}
      rename treeparse to tree_decode() and add tree_encode().
      remove PackWriter-specific knowledge.
      cmd/split: fixup progress message, and print -b output incrementally.
      convert from 'bits' to 'level' earlier in the sequence.
      okay, *really* fix BLOB_MAX.
      simplify code and fix BLOB_MAX handling.
      o.fatal(): print error after, not before, usage message.
      make --usage just print the usage message.
  2. @lelutin

    midx/bloom: use progress() and debug1() for non-critical messages

    lelutin authored committed
    Some messages in these two commands indicate progress but are not
    filtered out when the command is not run under a tty. This makes bup
    return some unwanted messages when run under cron.
    Using progress() and debug1() instead should fix that.
    (Changed a few from progress() to debug1() by apenwarr.)
    Signed-off-by: Gabriel Filion <>
    Signed-off-by: Avery Pennarun <>
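    The message policy described above can be sketched as follows. This is a simplification with the tty/verbosity state passed in explicitly for illustration; the real helpers consult global state:

```python
import sys

# Sketch: progress() output is purely cosmetic, so it is dropped when
# stderr is not a tty (e.g. under cron); debug1() only speaks when
# extra verbosity was requested.
def progress(msg, out=None, istty=False):
    if istty:
        (out or sys.stderr).write(msg)

def debug1(msg, out=None, verbose=0):
    if verbose >= 1:
        (out or sys.stderr).write(msg)
```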
Commits on Feb 20, 2011
  1. helpers: separately determine if stdout and stderr are ttys.

    Previously we only cared if stderr was a tty (since we use that to determine
    if we should print progress() or not).  But we might want to check stdout as
    well, for the same reason that gzip does: we should be refusing to write
    binary data to a terminal.
    Signed-off-by: Avery Pennarun <>
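    The idea can be sketched like this; `binary_output_error` is a hypothetical name for the gzip-style guard, not bup's actual function:

```python
import os

# Each stream's tty-ness is determined on its own, since stdout and
# stderr can be redirected independently.
istty1 = os.isatty(1)   # stdout: gates "refuse binary to a terminal"
istty2 = os.isatty(2)   # stderr: gates progress messages

def binary_output_error(stdout_is_tty):
    # Hypothetical guard: don't dump binary data onto a terminal.
    if stdout_is_tty:
        return 'error: refusing to write binary data to a terminal'
    return None
```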
  2. cmd/newliner: restrict progress lines to the screen width.

    Otherwise \r won't work as expected.
    Signed-off-by: Avery Pennarun <>
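    The clamp can be sketched in one function (illustrative, not the newliner's actual code):

```python
# A line longer than the terminal wraps onto a second row, so the next
# '\r' returns to the start of that second row and the overwrite
# garbles the display.  Truncating to one column short of the width
# also keeps the cursor from auto-wrapping at the last column.
def fit_to_width(line, width):
    return line[:width - 1] if width > 0 else line
```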
  3. hashsplit: use shorter offset-filenames inside trees.

    We previously zero-padded all the filenames (which are hexified versions of
    the file offsets) to 16 characters, which corresponds to a maximum file size
    that fits into a 64-bit integer.  I realized that there's no reason to
    use a fixed padding length; just pad all the entries in a particular tree to
    the length of the longest entry (to ensure that sorting
    alphabetically is still equivalent to sorting numerically).
    This saves a small amount of space in each tree, which is probably
    irrelevant given that gzip compression can quite easily compress extra
    zeroes.  But it also makes browsing the tree in git look a little prettier.
    This is backwards compatible with old versions of, since has
    always just treated the numbers as an ordered set of numbers, and doesn't
    care how much zero padding they have.
    Signed-off-by: Avery Pennarun <>
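    The padding rule can be sketched as (hypothetical helper name):

```python
# Hexify the offsets, then zero-pad every name in this tree only to
# the longest entry's width, so sorting the names alphabetically is
# still equivalent to sorting the offsets numerically.
def offset_names(offsets):
    hexed = ['%x' % o for o in offsets]
    width = max(len(h) for h in hexed)
    return [h.rjust(width, '0') for h in hexed]

names = offset_names([0x0, 0x123, 0xabcd])
```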
  4. Replace 040000 and 0100644 constants with GIT_MODE_{TREE,FILE}

    Those constants were scattered in *way* too many places.  While we're there,
    fix the inconsistent usage of strings vs. ints when specifying the file
    mode; there's no good reason to be passing strings around (except that I
    foolishly did that in the original code in version 0.01).
    Signed-off-by: Avery Pennarun <>
  5. rename treeparse to tree_decode() and add tree_encode().

    tree_encode() gets most of its functionality from PackWriter.new_tree(),
    which is now just a one-liner that calls tree_encode().  We will soon want
    to be able to calculate tree hashes without actually writing a tree to a
    packfile, so let's split out that functionality.
    Signed-off-by: Avery Pennarun <>
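    A simplified sketch of what such a tree_encode() looks like (real git name ordering additionally sorts subtrees as name + '/', which this sketch skips), including the "hash a tree without writing a packfile" part:

```python
import hashlib

# Build the raw git tree body from (mode, name, sha) triples, so a
# tree's hash can be computed without writing anything anywhere.
def tree_encode(entries):
    body = b''
    for mode, name, sha in sorted(entries, key=lambda e: e[1]):
        body += b'%o %s\0' % (mode, name) + sha
    return body

def tree_hash(entries):
    body = tree_encode(entries)
    # git object hash: "tree <len>\0" header + body
    return hashlib.sha1(b'tree %d\0' % len(body) + body).hexdigest()
```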
  6. remove PackWriter-specific knowledge.

    Let's use callback functions explicitly instead of passing around special
    objects; that makes the dependencies a bit more clear and hopefully opens
    the way to some more refactoring for clarity.
    Signed-off-by: Avery Pennarun <>
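    The refactoring idea in miniature: pass the single callable the split code needs instead of a whole writer object, keeping the dependency explicit. Names here are illustrative, not bup's real API:

```python
import hashlib

def split_and_store(chunks, write_blob):
    # The caller decides what "write" means; this code only needs a
    # function that takes bytes and returns the stored blob's id.
    return [write_blob(c) for c in chunks]

shas = split_and_store([b'foo', b'bar'],
                       lambda data: hashlib.sha1(data).digest())
```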
  7. cmd/split: fixup progress message, and print -b output incrementally.

    As a side effect, you can no longer combine -b with -t, -c, or -n.  But that
    was kind of a pointless thing to do anyway, because it silently enforced
    --fanout=0, which is almost certainly not what you wanted, precisely if you
    were using -t, -c, or -n.
    Signed-off-by: Avery Pennarun <>
  8. convert from 'bits' to 'level' earlier in the sequence.

    The hierarchy level is a more directly useful measurement than the bit count,
    although right now neither is used very heavily.
    Signed-off-by: Avery Pennarun <>
  9. okay, *really* fix BLOB_MAX.

    In some conditions, we were still splitting into blobs larger than BLOB_MAX.
    Fix that too.
    Unfortunately adding an assertion about it in the 'bup split' main loop
    slows things down by a measurable amount, so I can't easily add that to
    prevent this from happening by accident again in the future.
    After implementing this, it looks like 8192 (typical blob size) times two
    isn't big enough to prevent this from kicking in in "normal" cases; let's
    use 4x instead.  In my test file, we exceed this maximum much less.  (Every
    time we exceed BLOB_MAX, it means the bupsplit algorithm isn't working, so
    we won't be deduplicating as effectively.  So we want that to be rare.)
    Signed-off-by: Avery Pennarun <>
  10. simplify code and fix BLOB_MAX handling.

    This reduces the number of lines without removing functionality.  I renamed
    a few constants to make more sense.
    The only functional change is that BLOB_MAX is now an actual maximum instead
    of a variable number depending on buf.used().  Previously, it might have
    been as large as BLOB_READ_SIZE = 1MB, which is much larger than BLOB_MAX =
    16k.  Now BLOB_MAX is actually the max.
    Signed-off-by: Avery Pennarun <>
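    The cap described in the last two entries can be sketched as a forced split (illustrative code, not bup's hashsplit internals):

```python
# Even if the rolling-checksum splitter never fires, force a split
# once BLOB_MAX bytes accumulate, instead of letting a blob grow to
# the size of the read buffer.
BLOB_MAX = 16 * 1024

def split_buf(data, find_ofs):
    # find_ofs(data) returns a split offset, or 0 for "no split found"
    blobs = []
    while data:
        ofs = find_ofs(data) or len(data)
        ofs = min(ofs, BLOB_MAX)      # BLOB_MAX is a hard cap
        blobs.append(data[:ofs])
        data = data[ofs:]
    return blobs

blobs = split_buf(b'x' * (BLOB_MAX * 2 + 100), lambda d: 0)
```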
  11. o.fatal(): print error after, not before, usage message.

    git prints the error *before* the usage message, but the more I play with
    it, the more I'm annoyed by that behaviour.  The usage message can be pretty
    long, and the error gets lost way above the usage message.  The most
    important thing *is* the error, so let's print it last.
    Signed-off-by: Avery Pennarun <>
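    The ordering change in miniature (hypothetical names; the real method lives on bup's option parser):

```python
import io

# Print the usage text first and the error last, so a long usage
# message doesn't scroll the error off the top of the terminal.
def fatal(err, usage, out):
    out.write(usage + '\n')
    out.write('error: %s\n' % err)

buf = io.StringIO()
fatal('no filename given', 'usage: frob [-x] <file>', buf)
```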
  12. make --usage just print the usage message.

    This is a relatively common option in other programs, so let's make it work
    in case someone tries to use it.
    Signed-off-by: Avery Pennarun <>
Commits on Feb 19, 2011
  1. @lelutin

    doc/import-rsnapshot: small corrections and clarification

    lelutin authored committed
    There's a typo in the --dry-run option explanation.
    The form "[...] or only imports all [...]" is confusing. Turn it around
    a little bit so that the quantifiers are associated more easily to the
    right portions of the sentence.
    Also, add an example for using the backuptarget argument.
    Signed-off-by: Gabriel Filion <>
Commits on Feb 18, 2011
  1. cmd/midx, all else being equal, delete older midxes first.

    Previous runs of 'bup midx -f' might have created invalid midx files with
    exactly the same length as a newer run.  bup's "prune redundant midx" logic
    would quasi-randomly choose one or the other to delete (based on
    alphabetical order of filenames, basically) and sometimes that would be the
    new one, not the old one, so the 'bup midx -f' results never actually kicked in.
    Now if the file sizes are equal we'll use the mtime as a tie breaker; newer
    is better.
    Signed-off-by: Avery Pennarun <>
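    The tie-break can be sketched as a ranking over candidate files (hypothetical representation; bup's actual pruning logic works over real midx files):

```python
from collections import namedtuple

Midx = namedtuple('Midx', 'name size mtime')

# Rank redundant midx files: bigger is better, and on a size tie the
# newer mtime wins, so a fresh 'bup midx -f' result survives pruning.
def ranked(files):
    return sorted(files, key=lambda f: (f.size, f.mtime), reverse=True)

old = Midx('a.midx', 100, 1000)
new = Midx('b.midx', 100, 2000)
keep, delete = ranked([old, new])
```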
  2. t/ a test for the recently-uncovered midx4 problem.

    Signed-off-by: Avery Pennarun <>
  3. _helpers.c: midx4 didn't handle multiple indexes with the same object.

    It *tried* to handle it, but would end up with a bunch of zero entries at
    the end, which prevents .exists() from working correctly in some cases.
    In midx2, it made sense to never include the same entry twice, because the
    only information we had about a particular entry was that it existed.  In
    midx4 this is no longer true; we might sometimes want to know *all* the idx
    files that contain a particular object (for example, when we implement
    expiry later).  So the easiest fix for this bug is to just include multiple
    entries when we have them.
    Signed-off-by: Avery Pennarun <>
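    The fix can be sketched in pure Python (the real code is C over mmap'd index files; this just shows the "keep one merged entry per source file" idea):

```python
# When several idx files contain the same object, emit an entry for
# each source file instead of collapsing them, so the merged list has
# exactly the expected length and no zero entries at the end.
def merge(idx_lists):
    entries = []
    for n, idx in enumerate(idx_lists):
        for sha in idx:
            entries.append((sha, n))   # remember which idx it came from
    entries.sort()
    return entries

merged = merge([[b'aa', b'bb'], [b'bb', b'cc']])
```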
  4. cmd/midx: add a --check option.

    Running this on my system does reveal that some objects return
    exists()==False on my midx even though they show up during iteration.
    Now to actually find and fix it...
    Signed-off-by: Avery Pennarun <>
  5. Add git.shorten_hash(), printing only the first few bytes of a sha1.

    The full name is rarely needed and clutters the output.  Let's try this
    instead in a few places.
    Signed-off-by: Avery Pennarun <>
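    The idea behind such a helper, sketched (the exact output format here is illustrative, not necessarily git.shorten_hash()'s):

```python
# Keep only the first few hex digits of a sha1, which is almost always
# enough to identify an object in log output.
def shorten_hash(hexsha, digits=8):
    return hexsha[:digits] + '...'

short = shorten_hash('da39a3ee5e6b4b0d3255bfef95601890afd80709')
```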
  6. add some additional tests that objcache.refresh() is called.

    ...which it is, so no bugs were fixed here.  Aneurin is still exposing a bug
    somehow though.
    Signed-off-by: Avery Pennarun <>
Commits on Feb 17, 2011
  1. cmd/server: add a debug message saying which object caused a suggestion.

    Let's use this to try to debug Aneurin's problem (and potentially others).
    Signed-off-by: Avery Pennarun <>
  2. cmd/list-idx: a quick tool for searching the contents of idx/midx files.

    Signed-off-by: Avery Pennarun <>
  3. Add tests around the bloom ruin and check options

    Brandon Low authored committed
    This generally improves our test coverage of bloom filter behavior and
    more specifically makes sure that check and ruin do something.
    Signed-off-by: Brandon Low <>
  4. Add a bloom --ruin for testing failure cases

    Brandon Low authored committed
    This command option ruins a bloom filter by setting all of its bits to
    Signed-off-by: Brandon Low <>
  5. One more constant for header lengths

    Brandon Low authored committed
    I missed bloom header length in the last pass.
    Signed-off-by: Brandon Low <>
  6. Split PackMidx from into a new

    authored is definitely too big.  It still is, but this helps a bit.
    Signed-off-by: Avery Pennarun <>
  7. move bloom.ShaBloom.create to just bloom.create.

    I don't really like class-level functions.  Ideally we'd just move all the
    creation stuff into cmd/bloom, but is testing them, so it's not
    really worth it.
    Signed-off-by: Avery Pennarun <>
  8. Merge branch 'bl/bloomcheck' into ap/cleanups

    * bl/bloomcheck:
      Bail out immediately instead of redownloading .idx
      Add a --check behavior to verify bloom
      Defines/preprocessor lengths > magic numbers