This patch checks for the presence of a colon when the --remote option is used in bup save, bup split, bup join, and bup init. Even though specifying *only* a pathname without a hostname: is perfectly valid, it's confusing to allow, because a user who specifies "-r hostname" will have the hostname treated as a path and get a confusing error message. Requiring a colon avoids this. The patch adds a few test cases to demonstrate that the code works properly. It also wraps the remote connection in a try/except to prevent a traceback on error (so far I have only seen this happen with an invalid bup dir parameter). I also added the NetBeans project folder to .gitignore.

Signed-off-by: David Roda <firstname.lastname@example.org>
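A minimal sketch of the check described above (the function name and error text are illustrative, not bup's actual code):

    def parse_remote(remote):
        # Require the host:path form; a bare hostname would otherwise be
        # silently treated as a local path.
        if ':' not in remote:
            raise ValueError('--remote argument must look like host:path, '
                             'got %r' % remote)
        host, path = remote.split(':', 1)
        return host, path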
Gabriel Filion pointed out that bup's version number (which we add to the man pages automatically) was not detected when you used a bup tarball generated by 'git archive' (or, more likely, you let GitHub call 'git archive' for you). That makes sense, since our version detection was based on having a .git directory around, which the tarball doesn't have. Instead, let's create a .gitattributes file and have it auto-substitute some version information during 'git archive'. Of course, if we actually *do* have a .git directory, continue to use that. While we're here, add a new 'bup version' command and alias "bup --version" and "bup -V" to call it, since those options are pretty standard.
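Roughly how the mechanism works (the file path here is illustrative, not necessarily bup's): mark a file with the export-subst attribute, and 'git archive' expands $Format:...$ placeholders inside it using commit metadata:

    # .gitattributes
    lib/bup/_version.py export-subst

    # lib/bup/_version.py (placeholders expanded only by 'git archive')
    COMMIT = '$Format:%H$'
    NAMES = '$Format:%d$'
    DATE = '$Format:%ci$'

In a normal .git checkout the placeholders stay literal, which is one way the code can tell it should fall back to asking git directly.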
The bup-* programs shouldn't need to be installed into /usr/bin; we should search for them in /usr/lib somewhere. I could have left the names as cmd/cmd-*.py, but the cmd-* was annoying me because of tab completion. Now I can type cmd/ran<tab> to get random-cmd.py.
The man page (bup.1) is total drivel for the moment, though. And arguably we could split up the manpages per subcommand like git does, but maybe that's overkill at this stage.
This makes it work with Fink's version of python, and possibly others too. So now we can build chashsplit.so even on Mac OS X Tiger, even though Tiger's python 2.3 is too old, by installing Fink's python24 package first.
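The relevant trick, sketched here (this isn't bup's actual build code): ask whichever python interpreter you're building against where its own headers live, instead of hardcoding system paths:

    # Works with any interpreter, e.g. Fink's /sw/bin/python2.4:
    from distutils import sysconfig
    print(sysconfig.get_python_inc())   # e.g. /sw/include/python2.4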
The majority of the memory usage in bup split/save is now caused by searching pack indexes for sha1 hashes. The problem is that, in the common case of a first full backup, *none* of the object hashes will be found, so we *always* have to search *all* the packfiles. A binary search through a 200k-entry index takes about 18 steps (2^18 ~= 262k), and the first ~8 of those land on pages that tend to stay cached, so with just 45 packfiles of 200k objects each, that makes about (18-8)*45 = 450 binary search steps, or 100+ 4k pages that need to be loaded from disk, to check *each* object hash. memtest.py lets us see how fast RSS creeps up under various conditions, and how different optimizations affect the result.
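A minimal sketch of the memtest.py idea (not the real script): sample resident memory between batches of index lookups to see how quickly pages get faulted in:

    import re

    def rss_kb():
        # Linux-specific: parse resident set size out of /proc/self/status.
        data = open('/proc/self/status').read()
        return int(re.search(r'VmRSS:\s*(\d+) kB', data).group(1))

    # Call rss_kb() before and after each batch of pack index searches to
    # watch RSS creep up as index pages are loaded from disk.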
There were a few things that weren't quite done how I would have done them, so I changed the implementation. It should still work on Cygwin, though. The only actual functional changes are:

- index.Reader.close() now actually sets m=None rather than just closing it.
- Removed the "if rename fails, then unlink first" logic, which is seemingly not needed after all.
- Rather than special-casing Cygwin to use "hostname" instead of "hostname -f", it turns out python has a socket.getfqdn() that does what we want (sketched below).
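A tiny illustration of that last point: socket.getfqdn() returns the fully-qualified hostname without shelling out to "hostname -f" at all, so it works the same everywhere, including Cygwin:

    import socket
    print(socket.getfqdn())   # e.g. 'myhost.example.com'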
Saves a complete tree by recursively iterating into subdirs, and splits large files into chunks using the same algorithm as 'bup split'. There is currently no support for special files (symlinks, etc.), and it generates the resulting git tree incorrectly (by just turning / into _ in filenames).
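A rough sketch of the recursion described above (not bup's code; the names and the size cutoff are made up for illustration):

    import os

    CHUNK_THRESHOLD = 1024 * 1024   # hypothetical cutoff for splitting

    def save_file(path):
        # Placeholder: the real code hashsplits large files into chunks.
        print('%s (%d bytes)' % (path, os.path.getsize(path)))

    def save_tree(path):
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if os.path.isdir(full) and not os.path.islink(full):
                save_tree(full)      # recurse into subdirectories
            elif os.path.isfile(full):
                save_file(full)      # symlinks etc. are skipped for now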
This lets you generate a git "tree" object with the list of hashes in a particular file, so you can treat the file as a directory as far as git is concerned. And 'bup join' knows how to take such a tree and concatenate its contents back together to reverse the operation. Also refactored a bunch of stuff in cmd-split.py.
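Conceptually, the join direction looks something like this sketch using git plumbing commands (the real bup code works differently): list the tree's entries in order and concatenate the blob contents:

    import subprocess

    def join(tree_sha, out):
        listing = subprocess.check_output(['git', 'ls-tree', tree_sha])
        for line in listing.decode().splitlines():
            # Each line looks like "<mode> <type> <sha>\t<name>".
            mode, typ, rest = line.split(None, 2)
            sha = rest.split('\t')[0]
            if typ == 'blob':
                out.write(subprocess.check_output(
                    ['git', 'cat-file', 'blob', sha]))

    # usage: join(tree_sha, open('reassembled', 'wb'))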
This is about 80x faster than the old speed (27 megs/sec instead of 330k/sec), but still quite a lot slower than the 60+ megs/sec I get *without* the checksum stuff. A few inefficiencies remain, but none as easy to fix as the earlier ones...
Useful for testing. Note that we *don't* seed the random number generator, so every time you generate the bytes, you get the same sequence. This is also vastly faster than /dev/urandom, since it doesn't try to be cryptographically secure. It generates about 200 megs/sec on my computer, which is much faster than a disk and thus useful for testing the speed of hashsplit.
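A rough sketch of the idea (not bup's actual generator): starting from a fixed state makes the output reproducible, and a simple linear congruential step is far cheaper than /dev/urandom:

    def cheap_random_bytes(n, state=1):
        # Fixed default state => repeated calls yield identical bytes.
        out = bytearray()
        for _ in range(n):
            state = (state * 1103515245 + 12345) & 0xffffffff   # LCG step
            out.append((state >> 16) & 0xff)
        return bytes(out)

    assert cheap_random_bytes(16) == cheap_random_bytes(16)   # reproducible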
The checksum algorithm is crap, and we don't actually generate the output files yet, so I'm guessing it's still junk.