This project provides a command line interface (CLI) for checking the integrity of a folder located on a hard drive. The tool recursively walks through a path, skips some folders (those not considered valuable in a backup), and computes the CRC32C checksum of each file. The output is a text file with one path/checksum entry per line. The given root path is removed from the paths and the entries are sorted alphabetically, so that you can easily compare folders located on two different hard drives with the tool of your choice, such as `diff`, `git`, or Meld. The log contains info, warnings, and errors, and can be saved into a file in addition to being printed to the standard output.
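
To make the description above concrete, here is a minimal Python sketch of the same idea: walk a folder, compute a CRC32C per file, strip the root, and sort the entries. It is not the tool's actual implementation (which is written in Rust). The names `crc32c`, `list_checksums`, and `skipped_dirs` are hypothetical, and the sketch ignores the special markers described later in this document.

```python
import os

def _make_table():
    # Reflected Castagnoli polynomial, the one used by CRC32C.
    poly = 0x82F63B78
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = _make_table()

def crc32c(data, crc=0):
    """Update a CRC32C value with `data`; pass crc=0 for the first chunk."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def list_checksums(root, skipped_dirs=()):
    """Return sorted 'relative/path; CHECKSUM' lines for regular files under root."""
    entries = []
    for current, dirs, files in os.walk(root):
        # Prune directories not worth backing up (hypothetical list of names).
        dirs[:] = [d for d in dirs if d not in skipped_dirs]
        for name in files:
            full = os.path.join(current, name)
            if os.path.islink(full):
                continue  # this sketch simply skips symbolic links
            checksum = 0
            with open(full, "rb") as handle:
                # Read in chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    checksum = crc32c(chunk, checksum)
            # Remove the root so that two drives produce comparable paths.
            relative = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append(f"{relative}; {checksum:08X}")
    return sorted(entries)
```

A pure-Python CRC32C is slow. It is used here only to keep the sketch free of third-party packages.
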
- Even highly secure cloud file storage providers may lose data under specific conditions, such as a poor network connection. Lost files can be detected by comparing the original local folder to the fetched folder.
- Cloud file storage providers may rename or duplicate files whose names contain special characters. A copy from one drive to another may remove file metadata, such as the creation/modification date. In those cases, the checksum does not change.
- Some filesystems have no integrity checking, do not handle special characters, or may corrupt files without notice. A copied file may be lost if the hard drive was removed improperly before the data was actually flushed. Consequently, an integrity check is useful, especially on your backup, since you usually realise you have lost files only when it is too late.
- You or your applications/system may use symbolic links. If you make the mistake of backing up only those links and the actual files are lost, you will only find broken links. This tool explicitly tells you which files/folders are symbolic links so that you can act accordingly.
Notices:
- CRC32C is fast, but it is not a cryptographic hash and is not recommended for detecting malicious changes.
- For detecting intrusions on your system, AIDE is probably a good choice.
- This tool has only been tested on a GNU/Linux operating system.
- This tool does not copy anything; `rsync` can be used to selectively sync folders.
For example: `integral-drive --input /path/to/your/stuff --output integral_drive.txt --log-file backup.log`. If the `--output` option is missing, `integral_drive.txt` is saved at the root of the input path. If the `--log-file` option is missing, the log is only printed to the standard output.
The total number of entries is the total number of files and directories, excluding symbolic links and directories not considered valuable in a backup. The list of non-valuable directories is configurable in config.toml. To customise it, download this file, edit it, and append the argument `--config path/to/your/config.toml`.
The generated list of checksums is saved into `integral_drive.txt` and contains, for example:

    # Processed at: 2022-03-15 17:05:28 UTC
    # Input path: /path/to/your/stuff
    blabla.txt; D2D45B01
    copy_blabla.txt; D2D45B01
    copy_image.jpg; 8BDC85FF
    image.jpg; 8BDC85FF
    modified_blabla.txt; C5B9D87F
    nothing.txt; <EMPTY>
    recompressed_image.jpg; CFC7A135
    sub_folder/blabla.txt; D2D45B01
    sub_folder/image.jpg; <DENIED>
    sub_folder/symlink_blabla.txt; <SYMLINK>
    symlink_sub_folder; <SYMLINK>

The value `<EMPTY>` is printed if the file is empty. An empty directory is skipped. The value `<DENIED>` is printed if you don't have read access to the file (or directory); in that case you may retry after either changing the file/directory permissions or running the tool with elevated (sudo/admin) privileges. The value `<SYMLINK>` is printed if the file (or directory) is a symbolic link; the target is not read or opened, because the link could break without notice (for instance, when the drive is mounted at another mount point). Finally, the value `<UNKNOWN>` is printed if something unexpected occurred.
To sum up:
Checksum Value | Detail | Applies to File | Applies to Directory | Log Level |
---|---|---|---|---|
`<EMPTY>` | Empty file | Yes | No | None |
`<DENIED>` | Permission denied | Yes | Yes | Warn |
`<SYMLINK>` | Symbolic link | Yes | Yes | Info |
`<UNKNOWN>` | Unexpected error | Yes | Yes | Error |
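
The rules summarised above could be sketched in Python as follows. This only illustrates the decision order and is not taken from the tool's source. The function name `classify` is hypothetical, and the order in which the checks are made (symbolic link first, then permissions and emptiness) is an assumption.

```python
import os

def classify(path):
    """Return a marker such as '<SYMLINK>', or None if the entry can be checksummed."""
    try:
        if os.path.islink(path):
            return "<SYMLINK>"  # the link target is never read or opened
        if os.path.isdir(path):
            os.listdir(path)  # raises PermissionError if the directory is unreadable
            return None
        if os.path.getsize(path) == 0:
            return "<EMPTY>"
        with open(path, "rb"):  # raises PermissionError if the file is unreadable
            return None
    except PermissionError:
        return "<DENIED>"
    except OSError:
        return "<UNKNOWN>"  # anything unexpected, e.g. the file vanished
```
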
Notices:
- The output file is wiped at the beginning of the process and filled in at the very end.
- Run `detox` if your backup is missing files whose names contain exotic characters.
It is possible to provide multiple input paths, which are processed in parallel and use the full power of your multicore CPU! Let's say you just backed up your valuable files and want to compute the checksums of both the original files and your backup. You can run: `integral-drive --input /path/to/your/stuff --input /path/to/your/backup --output integral_drive_stuff.txt --output integral_drive_backup.txt --log-file backup.log`. The log file is shared. If no output file is given, an `integral_drive.txt` is written to the root of every given input path. The number of input paths is not limited.
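
Once both listings exist, a general-purpose tool such as `diff` can compare them. As an alternative, here is a small Python sketch that reports missing, extra, and changed files. The helper names `parse_listing` and `compare` are hypothetical. The sketch assumes the `path; checksum` line format shown in the sample output, and that lines starting with `#` are header comments.

```python
def parse_listing(text):
    """Map each relative path to its checksum (or marker), ignoring '#' header lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Limitation: a file whose name starts with '#' would be skipped as well.
        if not line or line.startswith("#"):
            continue
        # Split on the last '; ' so that paths containing '; ' stay intact.
        path, _, checksum = line.rpartition("; ")
        entries[path] = checksum
    return entries

def compare(original, backup):
    """Return (missing, extra, changed) lists of paths between two parsed listings."""
    missing = sorted(set(original) - set(backup))
    extra = sorted(set(backup) - set(original))
    changed = sorted(
        path for path in original.keys() & backup.keys()
        if original[path] != backup[path]
    )
    return missing, extra, changed
```

Typical usage would be to read `integral_drive_stuff.txt` and `integral_drive_backup.txt`, pass each content to `parse_listing`, and then inspect the three lists returned by `compare`.
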
This tool has no dependencies other than glibc 2.31+, which should already be on your system.
Once installed, you can get help with `integral-drive --help`, or with `man integral-drive` for more detailed and completely offline documentation generated from integral-drive.1.md.
- Download the RPM.
- (Optional) The packages are digitally signed to make sure no third party can alter the content. Download my PGP public key (fingerprint `DFFF34860D361C52`), import it with `sudo rpm --import pgp_keys.asc`, and verify the package with `rpm -K integral-drive*.rpm`; the output must be `digests signatures OK`.
- Install by running `sudo rpm -U integral-drive*.rpm`.
- Download the DEB (and associated .sig for GPG signature verification).
- (Optional) The package is digitally signed to make sure no third party can alter the content. Download my PGP public key (fingerprint `DFFF34860D361C52`), import it with `gpg --import pgp_keys.asc`, and verify the package with `gpg --verify integral-drive*.deb.asc`; the output must be `Good signature` with the email address given in the PGP public key.
- Install by running `sudo dpkg -i integral-drive*.deb`.
To build from source, first install Rust, then clone this repo and build with `cargo build --release`.
This project is licensed under the Hippocratic License 3.0. Please read the license (md/pdf) carefully, since it is not open source according to the Open Source Initiative and not free software according to the Free Software Foundation.
Please read CONTRIBUTING.md.
Please read SECURITY.md.