Overview of Box Backup
The backup daemon, bbackupd, runs on all machines to be backed up. The store server daemon, bbstored, runs on a central server. Data is sent to the store server, which stores all data on local filesystems, that is, only on local hard drives. Tape and other archive media are not used.
The system is designed to be easy to set up and run, and cheap to use. Once set up, there should be no need for user or administrative intervention, apart from usual system maintenance.
bbackupd is configured with a list of directories to back up. It has a lazy approach to backing up data. Every so often, the directories are scanned, and new data is uploaded to the server. This new data must be over a set age before it is uploaded. This prevents rapid revisions of a file resulting in many uploads of the same file in a short period of time.
It can also operate in a snapshot mode, which behaves like traditional backup software. When instructed by an external bbackupctl program, it will upload all changed files to the server.
The daemon is always running, although it sleeps most of the time. In lazy mode it is completely self-contained: scripts run from cron jobs are not used. The objective is to keep files backed up, not to provide snapshots of the filesystem at particular points in time.
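The minimum-age rule used in lazy mode can be illustrated with a short sketch. This is not bbackupd's actual code: the names FileInfo, select_for_upload, last_sync and min_age are hypothetical, and the only behaviour assumed is the one described above, namely that a changed file is uploaded only once its latest change is older than a set age.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    mtime: float  # last modification time, in seconds since the epoch


def select_for_upload(files, last_sync, now, min_age):
    """Return paths changed since last_sync whose change is at least min_age seconds old."""
    ready = []
    for f in files:
        changed = f.mtime > last_sync        # modified since the previous scan
        old_enough = now - f.mtime >= min_age  # change has had time to settle
        if changed and old_enough:
            ready.append(f.path)
    return ready


files = [
    FileInfo("notes.txt", mtime=1000.0),  # changed well before this scan
    FileInfo("draft.txt", mtime=9990.0),  # changed 10 seconds before this scan
    FileInfo("old.txt", mtime=100.0),     # not changed since the last sync
]
ready = select_for_upload(files, last_sync=500.0, now=10000.0, min_age=3600.0)
print(ready)
```

In this example draft.txt is skipped because it is still being edited; it would be picked up by a later scan once it has been left alone for longer than min_age.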
If an old version of the file is present on the server, a modified version of the rsync algorithm is used to upload only the changed portions of the file.
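The idea behind the rsync algorithm can be sketched as block matching: blocks of the old file are indexed by checksum, and the new file is described as a mix of references to those blocks and literal bytes. The sketch below is a simplified illustration only. It recomputes the weak checksum for every window instead of rolling it, and it does not reproduce the modifications Box Backup makes; the function names and the tiny block size are assumptions chosen for readability.

```python
import hashlib
import zlib

BLOCK = 4  # tiny block size so the example is easy to follow


def block_index(old, block=BLOCK):
    """Index each block of the old file by weak checksum, then by strong hash."""
    index = {}
    for number, start in enumerate(range(0, len(old), block)):
        chunk = old[start:start + block]
        strong = hashlib.sha256(chunk).digest()
        index.setdefault(zlib.adler32(chunk), {}).setdefault(strong, number)
    return index


def make_delta(new, index, block=BLOCK):
    """Describe new as ("copy", block_number) and ("data", bytes) instructions."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        candidates = index.get(zlib.adler32(chunk))  # cheap check first
        number = None
        if candidates is not None:
            number = candidates.get(hashlib.sha256(chunk).digest())
        if number is None:
            literal.append(new[pos])  # no match here: emit one literal byte
            pos += 1
            continue
        if literal:
            delta.append(("data", bytes(literal)))
            literal = bytearray()
        delta.append(("copy", number))
        pos += len(chunk)
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old, delta, block=BLOCK):
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for op, arg in delta:
        out += old[arg * block:(arg + 1) * block] if op == "copy" else arg
    return bytes(out)


old = b"the quick brown fox"
new = b"the quick red brown fox"
delta = make_delta(new, block_index(old))
literal_bytes = sum(len(arg) for op, arg in delta if op == "data")
print(delta, literal_bytes)
```

Only the "data" instructions carry file contents, so the amount uploaded depends on how much of the file changed and not on its total size.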
After a new version is uploaded, the old version is still available (subject to disc space on the server). Similarly, a deleted file is still available. The only limit to their availability is space allocated to this account on the server.
Future versions will add the ability to mark the current state of files on the server, and restore from this mark. This will emulate the changing of tapes in a tape backup system.
Restoring files is performed using a query tool, bbackupquery. This can be used to restore entire directories, or as an 'FTP-like' tool to list and retrieve individual files. Old versions and deleted files can be retrieved using this tool for as long as they are kept on the server.
Client Resource Usage
bbackupd uses only a minimal amount of disc space to store records of uploaded files - less than 32 bytes per directory and file over a set size threshold. However, it minimises the number of queries it must make to the server by storing, in memory, a data structure which allows it to determine what data is new. It does not need to store a record of all files, essentially just the directory names and last modification times. This is not a huge amount of memory.
If there are no changes to the directories, then the client will not even connect to the server.
The files, directories, filenames and file attributes are all encrypted. By examining the stored files on the server, it is only possible to determine the approximate sizes of files and the tree structure of the disc (not names, just the number of files and subdirectories in a directory). By monitoring the actions performed by a client, it is possible to determine the frequency and approximate scope of changes to files and directories.
Stored files are encrypted using AES for file data and Blowfish for metadata. This does mean that the one thing you do need to back up off-site and look after is a 1k file containing your keys - the data on the server is useless without it. But the key never changes once generated, so that makes looking after it much easier.
The connections between the server and client are encrypted using TLS (the successor to SSL). Traffic analysis is possible to some degree, but limited in usefulness.
An attacker will not be able to recover the backed up data without the encryption keys. Of course, you won't be able to recover your files without the keys either, so you must make a conventional, secure, backup of these keys.
SSL certificates are used to authenticate clients. UNIX user accounts are not used, in order to minimise the dependence on the configuration of the operating system hosting the server.
A script is provided to run the necessary certification authority with minimal effort.
The server daemon is designed to be simple to deploy, and run on the cheapest hardware possible. To avoid the necessity to use expensive hardware RAID or software RAID with complex setup, it (optionally) stores files using RAID techniques.
It does not need to run as a privileged user.
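The text above does not say which RAID techniques the server uses. As a purely illustrative sketch of one common approach, the code below splits a file into two data stripes and an XOR parity stripe, so that the contents survive the loss of any one of the three. The function names and the stripe layout are assumptions and are not taken from bbstored.

```python
def split_with_parity(data):
    """Split data into two equal-length stripes plus an XOR parity stripe."""
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\0")  # pad so both stripes have equal length
    parity = bytes(x ^ y for x, y in zip(a, b))
    return a, b, parity, len(data)


def recover(a, b, parity, length):
    """Rebuild the data when at most one of the three stripes is missing (None)."""
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return (a + b)[:length]  # drop any padding added when splitting


data = b"backup data"
a, b, parity, length = split_with_parity(data)
print(recover(None, b, parity, length) == data)
```

Each stripe would be written to a different disc. The parity stripe is never needed for reading unless one of the data stripes is lost.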
Each account has a set amount of disc space allocated, with a soft and a hard limit. If the account exceeds the soft limit, a housekeeping process will start deleting old versions and deleted files to reduce the space used to below the soft limit. If the backup client attempts to upload a file which causes the store to exceed the hard limit, the upload will be refused.
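The interaction of the two limits can be sketched as follows. This is a minimal model under stated assumptions, not the server's implementation: the Account class, its fields, and the oldest-first deletion order are all hypothetical, and sizes are in arbitrary units.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    soft_limit: int
    hard_limit: int
    current: dict = field(default_factory=dict)    # name -> size of the live version
    removable: list = field(default_factory=list)  # (name, size) of old versions and deleted files, oldest first

    def used(self):
        return sum(self.current.values()) + sum(size for _, size in self.removable)

    def upload(self, name, size):
        """Store a new version, or refuse it if the hard limit would be exceeded."""
        if self.used() + size > self.hard_limit:
            return False
        if name in self.current:
            # The previous version is kept until housekeeping removes it.
            self.removable.append((name, self.current[name]))
        self.current[name] = size
        return True

    def housekeeping(self):
        """Delete old versions and deleted files while the soft limit is exceeded."""
        while self.used() > self.soft_limit and self.removable:
            self.removable.pop(0)


account = Account(soft_limit=100, hard_limit=150)
first = account.upload("a", 60)     # accepted, 60 used
second = account.upload("a", 60)    # accepted, old version kept, 120 used
refused = account.upload("b", 40)   # would reach 160, over the hard limit
account.housekeeping()              # removes the old version of "a", 60 used
retried = account.upload("b", 40)   # accepted, 100 used
print(first, second, refused, retried, account.used())
```

Live versions are never removed by housekeeping in this model, so an account whose current files alone exceed the soft limit stays above it.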