Intuitive and extendable checksumming for python objects
- Provide a checksumming toolkit for python with out of the box support for common types
- Architect a framework for implementing customized checksumming logic
- Produce high quality checksums with extraordinarily low collision rates
- Build a toolkit for using and manipulating checksums
- Test it all with 100% coverage and support python 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12
Source code is available on github: https://github.com/QCoding
Install with conda:
# from conda forge: https://anaconda.org/conda-forge/qsum
conda install qsum
Install with pip:
# from PyPI: https://pypi.org/project/qsum/
pip install qsum
# Functional Interface
from qsum import checksum
checksum('abc')
# Class Interface
from qsum import Checksum
Checksum('abc').checksum_bytes
- QSUM CHECKSUM = TYPE PREFIX + DATA CHECKSUM
- The first two bytes of every checksum represent the type and will be referred to as the 'type prefix'
- The rest of the checksum is a digest of the byte representation of the object and will be referred to as the 'data checksum'
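The prefix-plus-digest layout described above can be sketched with the standard library alone. This is not qsum's implementation: the prefix values, the choice of SHA-256, and the helper names `to_bytes`, `sketch_checksum` and `recover_type` are all assumptions made for illustration.

```python
import hashlib

# Hypothetical two-byte type prefixes, for illustration only;
# qsum's registered prefixes are not reproduced here.
TYPE_PREFIXES = {
    str: b"\x00\x01",
    int: b"\x00\x02",
    bytes: b"\x00\x03",
}
PREFIX_TO_TYPE = {prefix: typ for typ, prefix in TYPE_PREFIXES.items()}


def to_bytes(obj):
    """Return a byte representation for the few types this sketch supports."""
    if type(obj) is bytes:
        return obj
    if type(obj) is str:
        return obj.encode("utf-8")
    if type(obj) is int:
        return str(obj).encode("ascii")
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def sketch_checksum(obj):
    """Concatenate the two-byte type prefix and a SHA-256 data checksum."""
    data_checksum = hashlib.sha256(to_bytes(obj)).digest()
    return TYPE_PREFIXES[type(obj)] + data_checksum


def recover_type(checksum_bytes):
    """Look up the type from the first two bytes of a sketch checksum."""
    return PREFIX_TO_TYPE[checksum_bytes[:2]]


value = sketch_checksum("abc")
print(len(value))           # 2 prefix bytes + 32 digest bytes = 34
print(recover_type(value))  # <class 'str'>
```

Because the prefix differs per type, `'abc'` and `b'abc'` get different checksums in this sketch even though their byte representations are identical.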
- Respect the same contract as __hash__ with regard to: 'The only required property is that objects which compare equal have the same hash value'
- Do not salt hash values (unless requested) and maintain stability in checksums throughout python sessions and versions along with releases of this package
- PYTHONHASHSEED should have no effect on checksums
- Provide significantly longer checksums than __hash__ which 'is typically 8 bytes on 64-bit builds and 4 bytes on 32-bit builds'
- Represent all checksums as bytes but provide a toolkit to view more human readable formats like hexdigests
- Base checksums on object contents and permit the calculation of checksums on mutable objects
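A few of these design points can be illustrated with a small standard-library sketch: a content-based digest of a mutable list (which the built-in `hash()` rejects with a `TypeError`), returned as bytes and viewed as a hex string. The helper `content_digest` and the use of SHA-256 are assumptions for illustration, not qsum's actual method.

```python
import hashlib


def content_digest(items):
    """Digest a list of strings from its contents (illustrative only)."""
    h = hashlib.sha256()
    for item in items:
        encoded = item.encode("utf-8")
        # Length-prefix each item so ["ab", "c"] and ["a", "bc"] differ.
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


values = ["a", "b"]
before = content_digest(values)

values.append("c")  # mutating the list changes its contents...
after = content_digest(values)  # ...and therefore its digest

print(len(before))      # 32 bytes, versus the 8 typical of __hash__
print(before.hex())     # human readable view of the same bytes
print(before != after)  # True
```

Since nothing here depends on the interpreter's string hashing, the digest is unaffected by PYTHONHASHSEED and is the same in every session.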
- By default the environment is not included in the checksum but individual package versions can be included if the package name is added via the depends_on argument
- To include the entire python environment in the checksum:
from qsum import checksum, DependsOn
checksum('abc', depends_on=DependsOn.PythonEnv)
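As a rough idea of what folding versions into a checksum means, the sketch below mixes `name==version` strings into a SHA-256 digest. It does not show how qsum implements depends_on; the function `digest_with_versions` and its `versions` mapping are hypothetical.

```python
import hashlib
import platform


def digest_with_versions(data, versions=None):
    """Digest `data`, then fold in any name -> version pairs (hypothetical helper)."""
    h = hashlib.sha256(data)
    # Sort so the result does not depend on the order the pairs were given in.
    for name, version in sorted((versions or {}).items()):
        h.update(b"\x00" + f"{name}=={version}".encode("utf-8"))
    return h.digest()


plain = digest_with_versions(b"abc")
pinned = digest_with_versions(b"abc", {"python": platform.python_version()})

print(plain != pinned)  # True: the version string changes the digest
```

With no versions supplied the digest depends on the data alone, which mirrors the default of leaving the environment out.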
- The great majority of Built-in Types including collections are checksummable
- bool, int, float, complex, str, bytes, tuple, list, dict, set, deque, etc.
- Common types have registered type prefixes which can be used to recover the type from the checksum
- Custom container classes that inherit from common python containers (e.g. tuple, list, set, dict) are checksummable
- The class name is not recoverable from the type prefix but will be added as salt to the data checksum to prevent collisions
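The idea of salting the data checksum with the class name can be sketched as follows. The class `Inventory`, the helper `container_digest` and the use of SHA-256 are assumptions for illustration; qsum's own salting scheme may differ.

```python
import hashlib


class Inventory(list):
    """Hypothetical custom container used only for this example."""


def container_digest(container):
    """Digest a list of strings, salting with the class name for subclasses."""
    h = hashlib.sha256()
    cls = type(container)
    if cls is not list:
        # Salt with the class name so a subclass and a plain list differ.
        h.update(cls.__qualname__.encode("utf-8"))
    for item in container:
        encoded = item.encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


plain = container_digest(["a", "b"])
custom = container_digest(Inventory(["a", "b"]))

print(plain != custom)  # True: same items, different class
```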
- Functions are checksummed based on a combination of their source code, attributes and module location
- Modules are checksummed simply based on the hash of their source code
- When passed an open file handle qsum will include all the bytes of the file in the checksum calculation
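Digesting every byte of an open file is commonly done by reading in chunks, so large files never have to fit in memory. The sketch below shows that pattern with `hashlib`; the helper `file_digest` and the chunk size are assumptions, and an in-memory `io.BytesIO` stands in for a real file opened in binary mode.

```python
import hashlib
import io


def file_digest(handle, chunk_size=65536):
    """Digest everything readable from a binary handle, one chunk at a time."""
    h = hashlib.sha256()
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        h.update(chunk)
    return h.digest()


handle = io.BytesIO(b"example file contents")
digest = file_digest(handle)

print(digest.hex())
```

Note that this sketch reads from the handle's current position, so a handle that has already been partially read yields a digest of the remaining bytes only.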