Easy-to-use backup and archive tool.
Arkiv can back up your data on a daily or hourly basis (you choose on which days and/or at which hours it is launched).
It is written in pure shell, so it can be used on any Unix/Linux machine.
Table of contents
- How it works
- Frequently Asked Questions
1. How it works
1.1 General idea
- Generate backup data from local files and databases.
- Store data on the local drive for a few days/weeks, in order to be able to restore fresh data very quickly.
- Store data on Amazon S3 for a few weeks/months, if you need to restore them easily.
- Store data on Amazon Glacier forever. It is very cheap storage that should be used instead of Amazon S3 for long-term retention.
Data are deleted from the local drive and Amazon S3 when the configured delays are reached.
If your data are backed up multiple times per day (not just every day), it's possible to define a fine-grained purge of the files stored on the local drive and on Amazon S3.
For example, it's possible to:
- remove half the backups after two days
- keep only 2 backups per day after 2 weeks
- keep 1 backup per day after 3 weeks
- remove all files after 2 months
The same kind of configuration can be defined for Amazon S3 archives.
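The example policy above can be sketched as a small shell function. This is only an illustration under stated assumptions, not Arkiv's actual purge code: the function name should_keep is hypothetical, backups are assumed to be hourly, "2 months" is taken as 60 days, and the choice of which hours survive (even hours, then 00 and 12, then 00) is arbitrary.

```shell
# Hypothetical purge rule for hourly backups; not Arkiv's actual implementation.
# Usage: should_keep AGE_IN_DAYS HOUR   (prints "keep" or "purge")
should_keep() (
    age=$1
    hour=${2#0}        # strip one leading zero ("08" -> "8") to avoid octal parsing
    hour=${hour:-0}    # "0" became empty above; restore it
    if [ "$age" -ge 60 ]; then
        # after 2 months (assumed to be 60 days): remove all files
        echo purge
    elif [ "$age" -ge 21 ]; then
        # after 3 weeks: keep 1 backup per day (the midnight one)
        if [ "$hour" -eq 0 ]; then echo keep; else echo purge; fi
    elif [ "$age" -ge 14 ]; then
        # after 2 weeks: keep only 2 backups per day (midnight and noon)
        case $hour in 0|12) echo keep ;; *) echo purge ;; esac
    elif [ "$age" -ge 2 ]; then
        # after 2 days: remove half the backups (keep even hours)
        if [ $((hour % 2)) -eq 0 ]; then echo keep; else echo purge; fi
    else
        echo keep
    fi
)

should_keep 3 13
should_keep 22 00
```

The two calls at the end ask about a 3-day-old backup made at 13:00 and a 22-day-old backup made at midnight.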
- Arkiv is launched every day (or every hour) by Crontab.
- It creates a directory dedicated to the backups of the day (or the backups of the hour).
- Each configured path is tar'ed and compressed, and the result is stored in the dedicated directory.
- If MySQL backups are configured, the needed databases are dumped and compressed, in a sub-directory.
- If encryption is configured, the backup files are encrypted.
- Checksums are computed for all the generated files. These checksums are useful to verify that the files are not corrupted after being transferred over a network.
- If Amazon Glacier is configured, all the generated backup files (not the checksums file) are sent to Amazon Glacier. For each one of them, a JSON file is created with the response's content; these files are important, because they contain the archiveId needed to restore the file.
- If Amazon S3 is configured, the whole directory (backup files + checksums file + Amazon Glacier JSON files) is copied to Amazon S3.
- After a configured delay, backup files are removed from the local disk drive.
- If Amazon S3 is configured, all backup files are removed from Amazon S3 after a configured delay. The checksums file and the Amazon Glacier JSON files are not removed, because they are needed to restore data from Amazon Glacier and check their integrity.
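The local part of these steps (archive, compress, checksum, verify) can be sketched as follows. This is a minimal illustration, not Arkiv's code: the function names backup_path and verify_backup, the sha256sums file name and the directory layout are all assumptions, and gzip is used only to keep the example short.

```shell
# Sketch only: names and layout are illustrative, not Arkiv's real ones.
backup_path() (
    src=$1      # directory to back up
    dest=$2     # directory dedicated to this backup run
    name=$(basename "$src")
    mkdir -p "$dest" || exit 1
    # Concatenate and compress the directory in one step.
    tar czf "$dest/$name.tar.gz" -C "$(dirname "$src")" "$name" || exit 1
    # Record the checksum next to the archive so it can be checked after a transfer.
    cd "$dest" && sha256sum "$name.tar.gz" >> sha256sums
)

verify_backup() (
    # Succeeds only if every file listed in sha256sums still matches its checksum.
    cd "$1" && sha256sum -c sha256sums >/dev/null 2>&1
)

# Demonstration on throwaway data.
work=$(mktemp -d)
mkdir "$work/data"
echo "hello" > "$work/data/file.txt"
backup_path "$work/data" "$work/backup"
verify_backup "$work/backup" && echo "checksums OK"
```

Running verify_backup on the copy of the directory fetched from Amazon S3 would perform the same check after a transfer.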
Several tools are needed by Arkiv to work correctly. They are usually installed by default on every Unix/Linux distribution.
- A not-so-old bash shell interpreter located on
- tar for files concatenation (mandatory)
- gzip, bzip2 or xz for compression (at least one)
- openssl for encryption (optional)
- sha256sum for checksums computation (mandatory)
- tput for ANSI text formatting (optional: can be manually deactivated; automatically deactivated if not installed)
To install these tools on Ubuntu:
# apt-get install tar gzip bzip2 xz-utils openssl coreutils ncurses-bin
If you want to encrypt the generated backup files (stored locally as well as the ones archived on Amazon S3 and Amazon Glacier), you need to create a symmetric encryption key.
Use this command to do it (you can adapt the destination path):
# openssl rand -out ~/.ssh/symkey.bin 32
To install mysqldump on Ubuntu:
# apt-get install mysql-client
To install xtrabackup on Ubuntu (see documentation):
# wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
# dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
# apt-get update
# apt-get install percona-xtrabackup-24
2.1.4 Amazon Web Services
If you want to archive the generated backup files on Amazon S3/Glacier, you have to do these things:
- Create a dedicated bucket on Amazon S3.
- If you want to archive on Amazon Glacier, create a dedicated vault in the same datacenter.
- Create an IAM user with read-write access to this bucket and this vault (if needed).
- Install the AWS-CLI program and configure it.
Install AWS-CLI on Ubuntu:
# apt-get install awscli
Configure the program (you will be asked for the AWS user's access key and secret key, and the datacenter to use):
# aws configure
2.2 Source Installation
Get the latest version:
# wget -O Arkiv-0.9.2.zip https://github.com/Amaury/Arkiv/archive/0.9.2.zip
# unzip Arkiv-0.9.2.zip
or
# wget -O Arkiv-0.9.2.tar.gz https://github.com/Amaury/Arkiv/archive/0.9.2.tar.gz
# tar xzf Arkiv-0.9.2.tar.gz
Then launch the configuration:
# cd Arkiv-0.9.2
# ./arkiv config
Some questions will be asked about:
- If you want a simple installation (one backup per day, everyday, at midnight).
- The local machine's name (will be used as a subdirectory of the S3 bucket).
- The compression type to use.
- If you want to encrypt the generated backup files.
- Which files must be backed up.
- Everything about MySQL backup (SQL or binary backup, which databases, host/login/password for the connection).
- Where to store the compressed files resulting from the backup.
- Where to archive data on Amazon S3 and Amazon Glacier (if you want to).
- When to purge files (locally and on Amazon S3).
Finally, the program will offer to add the Arkiv execution to the user's crontab.
3. Frequently Asked Questions
3.1 Cost and license
What is Arkiv's license?
Arkiv is licensed under the terms of the MIT License, which is a permissive open-source free software license.
More in the file
How much will I pay on Amazon S3/Glacier?
You can use the Amazon Web Services Calculator to estimate the cost depending on your usage.
How to choose the compression type?
You can use one of the three common compression tools (gzip, bzip2, xz).
Usually, you can follow these guidelines:
- gzip if you want the best compression and decompression speed.
- xz if you want the best compression ratio.
- bzip2 if you want the best portability (xz is younger and less widespread).
Here are some helpful links:
- Gzip vs Bzip2 vs XZ Performance Comparison
- Quick Benchmark: Gzip vs Bzip2 vs LZMA vs XZ vs LZ4 vs LZO
The default is xz, because a reduced file size means faster file transfers over a network.
I chose the simple mode configuration (one backup per day, every day). Why is there a directory called "00:00" in the backup directory of the day?
This directory means that your Arkiv backup process is launched at midnight.
You may think that the backed up data should have been stored directly in the directory of the day, without a sub-directory for the hour (because there is only one backup per day). But if you later change the configuration to do several backups per day, Arkiv would have trouble managing purges.
How to execute Arkiv with different configurations?
You can add the path to the configuration file as a parameter of the program on the command line.
To generate the configuration file:
# ./arkiv config --config=/path/to/config/file
or
# ./arkiv config -c /path/to/config/file
To launch Arkiv:
# ./arkiv exec --config=/path/to/config/file
or
# ./arkiv exec -c /path/to/config/file
You can modify the Crontab to add the path too.
Is it possible to use a public/private key infrastructure for the encryption functionality?
It is not possible to encrypt the backup data directly with a public key; OpenSSL's public-key encryption isn't designed for large data. Encryption is done using the 256-bit AES algorithm, which is symmetric.
To ensure that only the owner of a private key is able to decrypt the data, without transferring the private key, you have to encrypt the symmetric key using the public key, and then send the encrypted key to the private key's owner.
Here are the steps to do it (key files are usually located in
Create the symmetric key:
# openssl rand -out symkey.bin 32
Convert the public and private keys to PEM format (usually people have keys in RSA format, using them with SSH):
# openssl rsa -in id_rsa -outform pem -out id_rsa.pem
# openssl rsa -in id_rsa -pubout -outform pem -out id_rsa.pub.pem
Encrypt the symmetric key with the public key:
# openssl rsautl -encrypt -inkey id_rsa.pub.pem -pubin -in symkey.bin -out symkey.bin.encrypt
To decrypt the encrypted symmetric key using the private key:
# openssl rsautl -decrypt -inkey id_rsa.pem -in symkey.bin.encrypt -out symkey.bin
To decrypt the data file:
# openssl enc -d -aes-256-cbc -in data.tgz.encrypt -out data.tgz -pass file:symkey.bin
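The whole exchange can be tried end to end on throwaway files. The sketch below is an illustration under several assumptions, not Arkiv's own procedure: the function name encryption_roundtrip is hypothetical, a temporary RSA key pair stands in for id_rsa, the symmetric key is generated as hex text (-hex) so that the passphrase file is a single text line, the encryption command simply mirrors the decryption command shown above, and openssl pkeyutl is used in place of rsautl (which is deprecated in OpenSSL 3.0).

```shell
# Sketch only: throwaway keys and file names, created in a temporary directory.
encryption_roundtrip() (
    work=$(mktemp -d) && cd "$work" || exit 1
    # Recipient side: a throwaway RSA key pair in PEM format (stand-in for id_rsa).
    openssl genrsa -out id_rsa.pem 2048 2>/dev/null || exit 1
    openssl rsa -in id_rsa.pem -pubout -out id_rsa.pub.pem 2>/dev/null || exit 1
    # Sender side: symmetric key (hex text is an assumption of this sketch).
    openssl rand -hex -out symkey.bin 32 || exit 1
    echo "some backup data" > data.tgz
    # stderr is silenced to hide key-derivation warnings printed by recent OpenSSL versions.
    openssl enc -e -aes-256-cbc -in data.tgz -out data.tgz.encrypt -pass file:symkey.bin 2>/dev/null || exit 1
    # Encrypt the symmetric key with the public key, then drop the clear-text copies.
    openssl pkeyutl -encrypt -pubin -inkey id_rsa.pub.pem -in symkey.bin -out symkey.bin.encrypt || exit 1
    mv data.tgz data.tgz.orig && rm symkey.bin
    # Recipient side: recover the symmetric key, then decrypt the data.
    openssl pkeyutl -decrypt -inkey id_rsa.pem -in symkey.bin.encrypt -out symkey.bin || exit 1
    openssl enc -d -aes-256-cbc -in data.tgz.encrypt -out data.tgz -pass file:symkey.bin 2>/dev/null || exit 1
    # The decrypted file must be identical to the original one.
    cmp -s data.tgz data.tgz.orig && echo "round trip OK"
)

encryption_roundtrip
```

Only symkey.bin.encrypt and the encrypted data file need to travel; the private key never leaves its owner's machine.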
Why is it not possible to archive on Amazon Glacier without archiving on Amazon S3?
When you send a file to Amazon Glacier, you get back an archiveId (the file's unique identifier). Arkiv takes this information and writes it to a file; this file is then copied to Amazon S3. If the archiveId is lost, you will not be able to get the file back from Amazon Glacier, and an archived file that you can't restore is useless. Even though it's possible to get the list of archived files from Amazon Glacier, it's a slow process; it's more flexible to store archive identifiers in Amazon S3 (and the cost to store them is insignificant).
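As an illustration of why these small JSON files matter, here is a minimal sketch that reads the archiveId back from one of them. The function name extract_archive_id and the sample file content are assumptions made for the example (the values are placeholders, not real identifiers), and the sed expression is a simple approach that expects the field and its value on the same line.

```shell
# Sketch only: the JSON below is a made-up sample shaped like an upload response.
extract_archive_id() {
    # Print the value of the "archiveId" field found in the JSON file given as $1.
    sed -n 's/.*"archiveId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
{
    "location": "/123456789012/vaults/example-vault/archives/EXAMPLE-ARCHIVE-ID",
    "checksum": "EXAMPLE-CHECKSUM",
    "archiveId": "EXAMPLE-ARCHIVE-ID"
}
EOF
extract_archive_id "$sample"
```

The printed identifier is the value to give to Amazon Glacier when asking for the archive back.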
3.3 Output and log
Is it possible to execute Arkiv without any output on STDOUT and/or STDERR?
Yes, you just have to add some options on the command line:
- -o to avoid output on STDOUT
- --no-stderr (or -e) to avoid output on STDERR
You can use these options separately or together.
How to write the execution log into a file?
You can use a dedicated parameter:
# ./arkiv exec --log=/path/to/log/file
or
# ./arkiv exec -l /path/to/log/file
It will not disable output on the terminal. To do that, use the options described in the previous answer (such as --no-stderr).
How to write log to syslog?
Add the option --syslog (or -s) on the command line or in the Crontab command.
How to get pure text (without ANSI commands) in Arkiv's log file?
Add the option -n on the command line or in the Crontab command. It acts on the terminal output as well as on the log file (see the --log option above) and syslog (see the --syslog option above).
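For a log file that was already written with ANSI formatting, the escape sequences can also be removed afterwards. The function below is a minimal sketch, not part of Arkiv: the name strip_ansi is hypothetical, and it only handles the common sequences (ESC [ ... letter, and the ESC ( B sequence that tput may emit when resetting attributes).

```shell
# Sketch only: strip_ansi is a hypothetical helper, not an Arkiv command.
strip_ansi() (
    esc=$(printf '\033')
    # First expression: CSI sequences such as ESC[1m or ESC[0;31m.
    # Second expression: the character-set reset ESC(B.
    sed -e "s/${esc}\[[0-9;]*[A-Za-z]//g" -e "s/${esc}(B//g"
)

printf '\033[1mBackup done\033[0m\n' | strip_ansi
```

It reads standard input, so an existing file can be cleaned with strip_ansi < /path/to/log/file.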
I open the Arkiv log file with less, and it's full of strange characters
less doesn't interpret ANSI text formatting commands (bold, color, etc.) by default.
To enable it, you have to use the option
3.4 Database backup
What kind of database backups are available?
Arkiv can generate two kinds of database backups:
- SQL backups, generated using mysqldump.
- Binary backups, generated using xtrabackup.
There are two types of binary backups:
- Full backups; the server's files are entirely copied.
- Incremental backups; only the data modified since the last backup (full or incremental) are copied.
You must do a full backup before performing any incremental backup.
Which databases and table engines can be backed up?
If you choose binary backups (using xtrabackup), Arkiv can handle:
- MySQL (5.1 and above) or MariaDB, with InnoDB, MyISAM and XtraDB tables.
- Percona Server with XtraDB tables.
Note that MyISAM tables can't be incrementally backed up. They are copied entirely each time an incremental backup is performed.
Are binary backups prepared for restore?
No. Binary backups are done using xtrabackup --backup. The xtrabackup --prepare step is not done, to save time and space. You will have to do it when you want to restore a database (see below).
How to set up a full binary backup once per day and an incremental backup at every other hour?
You will have to create two different configuration files and add Arkiv to the Crontab twice: once for the full backup (every day at midnight, for example), and once for the incremental backups (every hour except midnight).
You need both executions to use the same LSN file. It will be written by the full backup, and read and updated by each incremental backup.
The same process can be used with any other frequency (for example: full backups once a week and incremental backups on the other days).
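As a rough sketch of the daily full + hourly incremental setup, the snippet below prints the two Crontab entries. The installation path and the configuration file names are hypothetical placeholders to adapt; only the exec -c form comes from this document.

```shell
# Sketch only: the installation path and configuration file names are hypothetical.
arkiv=/path/to/arkiv

print_cron_entries() {
    # Full binary backup every day at midnight.
    echo "0 0 * * * $arkiv exec -c /path/to/full.conf"
    # Incremental binary backup at every other hour (01:00 to 23:00).
    echo "0 1-23 * * * $arkiv exec -c /path/to/incremental.conf"
}

print_cron_entries
```

The printed lines can be pasted into the Cron table opened with crontab -e.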
How to restore a SQL backup?
Arkiv generates one SQL file per database. You have to extract the wanted file and process it in your database server:
# unxz /path/to/database_sql/database.sql.xz
# mysql -u username -p < /path/to/database_sql/database.sql
How to restore a full binary backup without subsequent incremental backups?
To restore the database, you first need to extract the data:
# tar xJf /path/to/database_data.tar.xz
or
# tar xjf /path/to/database_data.tar.bz2
or
# tar xzf /path/to/database_data.tar.gz
Then you must prepare the backup:
# xtrabackup --prepare --target-dir=/path/to/database_data
Please note that the MySQL server must be shut down, and the 'datadir' directory (usually /var/lib/mysql) must be empty. On Ubuntu:
# service mysql stop
# rm -rf /var/lib/mysql/*
Then you can restore the data:
# xtrabackup --copy-back --target-dir=/path/to/database_data
File ownership must be given back to the MySQL user (usually mysql, as in the command below):
# chown -R mysql:mysql /var/lib/mysql
Finally you can restart the MySQL daemon:
# service mysql start
How to restore a full + incremental binary backup?
Let's say you have a full backup (located in /full/database_data) and three incremental backups (located in /inc1/database_data, /inc2/database_data and /inc3/database_data), and you have already extracted the backed up files (see previous answer).
First, you must prepare the full backup with the additional --apply-log-only option:
# xtrabackup --prepare --apply-log-only --target-dir=/full/database_data
Then apply all the incremental backups in their creation order, except the last one, still with the --apply-log-only option:
# xtrabackup --prepare --apply-log-only --target-dir=/full/database_data --incremental-dir=/inc1/database_data
# xtrabackup --prepare --apply-log-only --target-dir=/full/database_data --incremental-dir=/inc2/database_data
Data preparation of the last incremental backup is done without the --apply-log-only option:
# xtrabackup --prepare --target-dir=/full/database_data --incremental-dir=/inc3/database_data
Once all the backups have been merged into the full backup directory, the process is the same as for a full backup:
# service mysql stop
# rm -rf /var/lib/mysql/*
# xtrabackup --copy-back --target-dir=/full/database_data
# chown -R mysql:mysql /var/lib/mysql
# service mysql start
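The ordering rule for the prepare steps (--apply-log-only everywhere except on the last incremental backup) is easy to get wrong with many increments. The following sketch only prints the commands, it does not run them; the function name print_prepare_commands is hypothetical, and the directories are the ones used in the example above.

```shell
# Print (without running) the prepare commands for a full backup directory
# followed by its incremental backup directories, given in creation order.
print_prepare_commands() (
    full=$1
    shift
    if [ $# -eq 0 ]; then
        # No incremental backup: a plain prepare is enough.
        echo "xtrabackup --prepare --target-dir=$full"
        exit 0
    fi
    echo "xtrabackup --prepare --apply-log-only --target-dir=$full"
    while [ $# -gt 1 ]; do
        echo "xtrabackup --prepare --apply-log-only --target-dir=$full --incremental-dir=$1"
        shift
    done
    # The last incremental backup is applied without --apply-log-only.
    echo "xtrabackup --prepare --target-dir=$full --incremental-dir=$1"
)

print_prepare_commands /full/database_data /inc1/database_data /inc2/database_data /inc3/database_data
```

Reviewing the printed list before executing it is a cheap safeguard, since a prepare step run in the wrong order cannot be undone on the same extracted files.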
In simple mode (one backup per day, every day at midnight), how to set up Arkiv to be executed at a time other than midnight?
You just have to edit the configuration file of the user's Cron table:
# crontab -e
How to execute pre- and/or post-backup scripts?
See the previous answer. You just have to add these scripts before and/or after the Arkiv program in the Cron table.
Is it possible to back up more often than every hour?
No, it's not possible.
I want to have colors in the Arkiv log file when it's launched from Crontab, as well as when it's launched from the command line
The problem comes from the Crontab environment, which is very minimal.
You have to set the TERM environment variable from the Crontab. It is also a good idea to define the MAILTO and PATH variables.
Edit the Crontab:
# crontab -e
And add these three lines at its beginning:
TERM=xterm
MAILTO=email@example.com
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
How to receive an email alert when a problem occurs?
Set the MAILTO environment variable at the beginning of your Crontab. See the previous answer.
How to report bugs?
Why is Arkiv compatible only with the Bash interpreter?
Because the read builtin command has a -s parameter for silent input (used for encryption passphrase and MySQL password input without showing them), unavailable on zsh (for example).
Arkiv looks like Backup-Manager
Yes indeed. Both of them aim to help people back up files and databases, and archive data in a secure place.
But Arkiv is different in several ways:
- It can manage hourly backups.
- It can transfer data on Amazon Glacier for long-term archiving.
- It can manage complex purge policies.
- The configuration process is simpler (you just answer questions).
- Written in pure shell, it doesn't need a Perl interpreter.
On the other hand, Backup-Manager is able to transfer to remote destinations by SCP or FTP, and to burn data on CD/DVD.