This is a tool to help validate the integrity of a set of files, e.g. data backups/exports.
- Checks recognized file types for errors, e.g. invalid json.
- Generates thumbnails of files when possible.
- Displays statistics about file types and unrecognized files.
It can print a report in the terminal (seeing images in the terminal requires iTerm2) or produce one as HTML.
Install:
- Install python3 and pip
pip3 install spot_check_files[imgcat]
- imgcat is optional and enables support for displaying thumbnails in iTerm2 on OS X
Run:
spotcheck PATH
This will output basic stats and any errors the tool detects in the given files/directories. If you're using iTerm2 on Mac, it will also show thumbnails of files.
Alternatively, you can generate an HTML report:
spotcheck -H PATH > out.html
The full list of options can be seen here or by running spotcheck --help.
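The HTML report step can also be scripted, for example from a backup job. The sketch below only shells out to the `spotcheck -H PATH` command shown above; it does not use the library API. The helper names `build_report_command` and `write_html_report` are hypothetical and not part of the package, and the code assumes `spotcheck` is installed and on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def build_report_command(target: str) -> List[str]:
    """Return the argv for `spotcheck -H PATH` (the HTML report mode)."""
    return ["spotcheck", "-H", target]


def write_html_report(target: str, out_file: str) -> None:
    """Run spotcheck and save its standard output as an HTML file.

    The exit code is deliberately not inspected: this sketch makes no
    assumption about what exit codes the tool uses.
    """
    result = subprocess.run(
        build_report_command(target),
        stdout=subprocess.PIPE,  # collect the HTML instead of printing it
    )
    Path(out_file).write_bytes(result.stdout)
```

Calling `write_html_report("backups", "out.html")` would be roughly equivalent to running `spotcheck -H backups > out.html` in a shell.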
This tool can also be used programmatically.
The main entry point for the library is the CheckerRunner class in spot_check_files.checker.
You can add support for new file types by subclassing the Checker class from that module.
The command-line tool currently relies entirely on file extension to determine file types.
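To make extension-based detection concrete, here is a minimal standalone sketch of the idea. It is not the package's implementation and does not use its Checker or CheckerRunner classes; the names `CHECKS_BY_EXTENSION`, `classify`, and `spot_check` are hypothetical, and only a JSON check is registered for brevity.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Optional


def check_json(path: Path) -> Optional[str]:
    """Return an error message if the file is not valid JSON, else None."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        return f"invalid json: {exc}"
    return None


# Hypothetical registry: lowercase extension -> check function.
CHECKS_BY_EXTENSION: Dict[str, Callable[[Path], Optional[str]]] = {
    ".json": check_json,
}


def classify(path: Path) -> str:
    """Decide how to treat a file from its extension alone."""
    if path.suffix.lower() in CHECKS_BY_EXTENSION:
        return "recognized"
    return "unrecognized"


def spot_check(path: Path) -> Optional[str]:
    """Run the check registered for the file's extension, if there is one."""
    check = CHECKS_BY_EXTENSION.get(path.suffix.lower())
    return check(path) if check else None
```

One consequence of this approach is that file contents are never inspected to guess a type: a JSON document saved without the `.json` extension would not receive the JSON check.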
Type | Support |
---|---|
Archive files | Recursively checks all the files in the archive (including other archives) |
CSV files | Checks that the CSV dialect can be detected and read by Python, and builds a thumbnail |
Image files | Checks that the file can be loaded by the Python imaging library Pillow, and builds a thumbnail |
JSON files: .json | Checks that the json can be parsed, and builds a thumbnail of the pretty-printed json |
Text files | Treating the file as plaintext, builds a thumbnail |
XML files: .xml | Checks that the xml can be parsed, and builds a thumbnail of the pretty-printed xml |
anything supported by OS X Quick Look (HTML, Office docs, ...) | OS X ONLY: generates thumbnails using Quick Look. This greatly increases the number of supported file types. However, it's slow. |
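As a rough illustration of the JSON, XML, and CSV checks described in the table, the following sketch uses only the Python standard library. It is an assumption about how such checks could be written, not the tool's actual code; the function names are hypothetical, and thumbnail rendering is left out.

```python
import csv
import io
import json
import xml.dom.minidom
from typing import List, Optional
from xml.parsers.expat import ExpatError


def pretty_json(text: str) -> str:
    """Parse JSON and return it pretty-printed; raises ValueError if invalid."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True)


def pretty_xml(text: str) -> str:
    """Parse XML and return it pretty-printed; raises ExpatError if invalid."""
    return xml.dom.minidom.parseString(text).toprettyxml(indent="  ")


def read_csv(text: str) -> List[List[str]]:
    """Detect the CSV dialect from a sample, then read every row with it.

    Raises csv.Error if the dialect cannot be detected or a row is malformed.
    """
    dialect = csv.Sniffer().sniff(text[:4096])
    return list(csv.reader(io.StringIO(text, newline=""), dialect))


# Hypothetical mapping from a type name to the function that validates it.
_PARSERS = {"json": pretty_json, "xml": pretty_xml, "csv": read_csv}


def find_error(kind: str, text: str) -> Optional[str]:
    """Return a short error description, or None if the text parses cleanly."""
    try:
        _PARSERS[kind](text)
    except (ValueError, ExpatError, csv.Error) as exc:
        return f"{kind}: {exc}"
    return None
```

The pretty-printed strings returned by `pretty_json` and `pretty_xml` correspond to the text that, per the table, is rendered into a thumbnail.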
Setup:
- Install python3 and pip
- Clone the repo
- I recommend creating a venv:
cd spot_check_files
python3 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install .
pip install -r requirements-dev.txt
To run tests:
PYTHONPATH=src pytest
(Overriding PYTHONPATH as shown ensures the tests run against the code in the src/ directory rather than the installed copy of the package.)
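If it is unclear which copy of the package a given environment would import, a small helper like the one below can report it. This is a generic sketch using `importlib`; the name `module_origin` is hypothetical and not part of this project.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from.

    Returns None if the module cannot be found on the current sys.path.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None
```

Printing `module_origin("spot_check_files")` from an interpreter started with `PYTHONPATH=src` should show a path under src/, whereas without the override it would typically point at the installed copy.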
To run the CLI:
PYTHONPATH=src python -m spot_check_files ...
Bug reports and pull requests are welcome on GitHub at https://github.com/brokensandals/spot_check_files.
This is available as open source under the terms of the MIT License.
This package includes and uses a copy of the Monoid font, which is also MIT-licensed.