
spot_check_files

spot_check_files is a tool that helps validate the integrity of a set of files, such as data backups or exports.

  • Checks recognized file types for errors, e.g. invalid JSON.
  • Generates thumbnails of files when possible.
  • Displays statistics about file types and unrecognized files.

It produces a report like the following in the terminal (seeing images in the terminal requires iTerm2):

[Screenshot: sample output in iTerm2]

Or as HTML:

[Screenshot: rendered sample HTML output]

Usage

Install:

  1. Install python3 and pip
  2. pip3 install spot_check_files[imgcat]
    • imgcat is optional and enables support for displaying thumbnails in iTerm2 on OS X

Run:

spotcheck PATH

This outputs basic stats and any errors the tool detects in the given files or directories. If you're using iTerm2 on a Mac, it also shows thumbnails of the files.

Alternatively, you can generate an HTML report:

spotcheck -H PATH > out.html

To see the full list of options, run spotcheck --help.
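If you want to produce the HTML report from a script (for example after a nightly backup), the command above can be wrapped in Python. This is only a sketch: the helper names build_spotcheck_command and write_html_report are made up for illustration, it uses only the PATH argument and the -H flag shown above, and it assumes the spotcheck executable is on your PATH. It makes no assumption about what the exit code means.

```python
import subprocess


def build_spotcheck_command(target, html=False):
    """Build the argument list for the spotcheck CLI.

    Only the PATH argument and the -H flag described in this README are used.
    """
    cmd = ["spotcheck"]
    if html:
        cmd.append("-H")
    cmd.append(str(target))
    return cmd


def write_html_report(target, out_file):
    """Run `spotcheck -H target` and save its stdout to out_file.

    Assumes spotcheck is installed and on PATH. Returns the exit code
    without interpreting it.
    """
    with open(out_file, "wb") as fh:
        completed = subprocess.run(
            build_spotcheck_command(target, html=True),
            stdout=fh,
            check=False,
        )
    return completed.returncode
```

Calling write_html_report("backups/", "out.html") would then be equivalent to running spotcheck -H backups/ > out.html in a shell.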

This tool can also be used programmatically. The main entry point for the library is the CheckerRunner class in spot_check_files.checker. You can add support for new file types by subclassing the Checker class from that module.

Supported file types

The command-line tool currently relies entirely on file extension to determine file types.
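To illustrate what extension-based detection means in practice, here is a minimal standalone sketch. It is not the tool's actual implementation: the function name classify_by_extension and the group labels are assumptions, and the extension sets are simply copied from the list of supported types in this README. Note how compound extensions such as .tar.gz have to be tried before their last component.

```python
from pathlib import PurePath

# Extension groups copied from the supported-types list in this README.
EXTENSION_GROUPS = {
    "archive": {".tar", ".tar.bz2", ".tar.gz", ".tar.xz", ".tbz", ".tgz", ".txz", ".zip"},
    "csv": {".csv", ".tsv"},
    "image": {".bmp", ".gif", ".icns", ".ico", ".jpg", ".jpeg", ".png", ".tiff", ".webp"},
    "json": {".json"},
    "text": {".md", ".txt"},
    "xml": {".xml"},
}


def classify_by_extension(filename):
    """Return the type group for filename, or None if it is unrecognized."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    # Try the longest compound suffix first so ".tar.gz" is matched as a
    # whole before falling back to shorter suffixes.
    for start in range(len(suffixes)):
        candidate = "".join(suffixes[start:])
        for group, extensions in EXTENSION_GROUPS.items():
            if candidate in extensions:
                return group
    return None
```

A consequence of relying on extensions alone is that a file with no extension, or a misleading one, is reported as unrecognized or checked as the wrong type.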

| Type | Support |
| --- | --- |
| Archive files: .tar, .tar.bz2, .tar.gz, .tar.xz, .tbz, .tgz, .txz, .zip | Recursively checks all the files in the archive (including other archives) |
| CSV files: .csv, .tsv | Checks that the CSV dialect can be detected and read by Python, and builds a thumbnail |
| Image files: .bmp, .gif, .icns, .ico, .jpg, .jpeg, .png, .tiff, .webp | Checks that the file can be loaded by the Python imaging library Pillow, and builds a thumbnail |
| JSON files: .json | Checks that the JSON can be parsed, and builds a thumbnail of the pretty-printed JSON |
| Text files: .md, .txt | Treating the file as plaintext, builds a thumbnail |
| XML files: .xml | Checks that the XML can be parsed, and builds a thumbnail of the pretty-printed XML |
| Anything supported by OS X Quick Look (HTML, Office docs, ...) | OS X only: generates thumbnails using Quick Look. This greatly increases the number of supported file types, but it is slow. |
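The kinds of checks listed above can be approximated with the Python standard library alone. The sketch below is an illustration under stated assumptions, not the code this package runs: check_bytes and check_zip are hypothetical names, only .json, .xml, .csv/.tsv and .zip are handled, text is assumed to be UTF-8, and thumbnail generation and the Pillow-based image check are left out entirely. It shows the recursive idea: an archive is opened in memory and each member is checked in turn, including nested archives.

```python
import csv
import io
import json
import zipfile
import xml.etree.ElementTree as ET


def check_bytes(name, data):
    """Return a list of (name, error message) pairs found in data."""
    lower = name.lower()
    try:
        if lower.endswith(".json"):
            json.loads(data.decode("utf-8"))
        elif lower.endswith(".xml"):
            ET.fromstring(data)
        elif lower.endswith((".csv", ".tsv")):
            text = data.decode("utf-8")
            dialect = csv.Sniffer().sniff(text[:4096])
            # Read every row so malformed rows surface as csv.Error.
            for _ in csv.reader(io.StringIO(text, newline=""), dialect):
                pass
        elif lower.endswith(".zip"):
            return check_zip(name, data)
    except (ValueError, ET.ParseError, csv.Error, zipfile.BadZipFile) as exc:
        # ValueError also covers JSON decode errors and invalid UTF-8.
        return [(name, f"{type(exc).__name__}: {exc}")]
    return []


def check_zip(name, data):
    """Check every member of an in-memory zip archive, recursing into nested zips."""
    errors = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member_name = f"{name}/{info.filename}"
            errors.extend(check_bytes(member_name, archive.read(info)))
    return errors
```

Reporting errors with a path like backup.zip/nested.zip/bad.json keeps it clear which archive a broken file came from. Reading members fully into memory keeps the sketch short but would not suit very large archives.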

Development

Setup:

  1. Install python3 and pip
  2. Clone the repo
  3. I recommend creating a venv:
    cd spot_check_files
    python3 -m venv venv
    source venv/bin/activate
  4. Install dependencies:
    pip install .
    pip install -r requirements-dev.txt

To run tests:

PYTHONPATH=src pytest

(Overriding PYTHONPATH as shown ensures the tests run against the code in the src/ directory rather than the installed copy of the package.)
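If you are unsure which copy of the package is being picked up, you can ask Python where it would import a module from. This is a small sketch using the standard importlib machinery; the helper name module_origin is made up for illustration.

```python
import importlib.util


def module_origin(module_name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Run with PYTHONPATH=src: the printed path should then point into src/
    # rather than into the venv's site-packages.
    print(module_origin("spot_check_files"))
```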

To run the CLI:

PYTHONPATH=src python -m spot_check_files ...

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/brokensandals/spot_check_files.

License

This tool is available as open source under the terms of the MIT License.

This package includes and uses a copy of the Monoid font, which is also MIT-licensed.
