Large files can be handled by simply mapping them into memory.
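As a minimal sketch of memory mapping, the snippet below uses only the standard-library `mmap` module on a throwaway temporary file. The source does not show whether `DilatedHash` accepts an `mmap` object directly, so the snippet stops at obtaining the mapping; all names in it are illustrative.

```python
import mmap
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01" * 1024)
    path = tmp.name

# Map the whole file read-only; length 0 means "map the entire file".
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        size = len(mapped)       # the mapping behaves like a bytes-like object
        first_bytes = mapped[:4]  # slicing copies only the requested bytes

os.remove(path)
```

The operating system pages the file in on demand, so only the regions that are actually read need to reside in memory.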

# Generate a test buffer

In [1]:
from struct import Struct
s = Struct("<H")
testString = bytearray(b"".join(s.pack(i) for i in range(0x7fff)))

# Hashing
## Create the hasher

In [2]:
from DilatedHash import DilatedHash
hh = DilatedHash(len(testString))

## Hash the buffer

In [3]:
hh.digest(testString)

<DilatedHash.DilatedHash at 0x7f90c040e130>

## Get the hash

In [4]:
h = bytes(hh)
h

b'x\x9c\xab\x97\xbc0\x9b\x01\x08"\xb7Z\xff\x02\xd1\xf3\x7f\x84\xce\x07\xd1\xee\xb7\xff\xf0\x83\xe8rk\x1fn\x10m\xd0o \x07\xa2\xed\x0f\xfc\xd8\x05\xa2\xcf;<\x00\x8b\xef\xd7zZ\x0e\xa2\x05VK\xe8\x83hI5W[\x10\xbd\xf5\xe4\xdb\r \xba\xe6\xf9\x1e!\xb09\xbf\xd4\'\x81\xe8\x06/\xfb\x83 \xdaA\xe6\x18X\xdc\xa3\xe1\xf7q\x10\xed\xf4\xde\x93\tD\xb3(2\x80\x01\x009U"g'

# Verifying the hash
## Load the hash

In [5]:
hh = DilatedHash(len(testString))  # fresh instance, no hashes stored yet
hh.load(h)

## Verifying
`-1` means every hashed block matches.
Any non-negative number is the offset of a block whose hash does not match.


In [6]:
hh.verify(testString)

-1
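The return convention can be wrapped in a small helper for readability. `describe_verify_result` is a hypothetical function, not part of `DilatedHash`; it only encodes the convention stated in this notebook.

```python
def describe_verify_result(result):
    """Translate the integer convention of verify() into a message.

    Hypothetical helper: -1 means OK, a non-negative value is the
    offset of a block whose hash does not match.
    """
    if result == -1:
        return "OK: every hashed block matches"
    if result >= 0:
        return f"Mismatch in the block at offset {result}"
    raise ValueError(f"unexpected verify() result: {result}")


ok_message = describe_verify_result(-1)
bad_message = describe_verify_result(49520)
```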

Changes that shift the data are usually detected immediately and are reported as the last block having an incorrect hash.

In [7]:
hh.verify(b"b" + testString)

65504
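The reported offset can be cross-checked with a little arithmetic. The sketch below assumes 16-byte blocks aligned to multiples of 16, which is only inferred from the `slice(49520, 49536, None)` output later in this notebook; `BLOCK_SIZE` and the other names are illustrative.

```python
from struct import Struct

# Rebuild the test buffer to get its size: 0x7fff values, 2 bytes each.
pack_u16 = Struct("<H")
buffer_size = len(b"".join(pack_u16.pack(i) for i in range(0x7fff)))

BLOCK_SIZE = 16  # assumption inferred from the slice output, not from the API

# Number of complete blocks that fit, and the start of the last one.
full_blocks = buffer_size // BLOCK_SIZE
last_full_block_offset = (full_blocks - 1) * BLOCK_SIZE
```

Under that assumption, `last_full_block_offset` equals the `65504` returned above, which is consistent with the mismatch being reported for the last block.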

Changes that do not touch any hashed block are not detected at all.

In [8]:
hh.verify(testString + b"b")

-1
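To illustrate why such changes go unnoticed, here is a toy model of sparse block hashing. It is not the algorithm used by `DilatedHash`: the block offsets, the use of SHA-256, and all function names are assumptions chosen only to show the effect.

```python
import hashlib

BLOCK_SIZE = 16  # toy value


def toy_sparse_digest(data, offsets):
    """Hash only the blocks that start at the given offsets."""
    return [hashlib.sha256(bytes(data[o:o + BLOCK_SIZE])).digest() for o in offsets]


def toy_verify(data, offsets, digests):
    """Return the offset of the first mismatching block, or -1 if all match."""
    for offset, expected in zip(offsets, digests):
        actual = hashlib.sha256(bytes(data[offset:offset + BLOCK_SIZE])).digest()
        if actual != expected:
            return offset
    return -1


original = bytes(range(256)) * 4   # 1024 bytes
offsets = [0, 496, 1008]           # only these three blocks are hashed
digests = toy_sparse_digest(original, offsets)

# Appending a byte leaves every hashed block unchanged.
appended_result = toy_verify(original + b"b", offsets, digests)

# A change outside the hashed blocks is missed as well.
outside = bytearray(original)
outside[100] ^= 0xFF
outside_result = toy_verify(outside, offsets, digests)

# A change inside a hashed block is caught.
inside = bytearray(original)
inside[500] ^= 0xFF
inside_result = toy_verify(inside, offsets, digests)
```

In this toy model the first two checks return `-1` and the third returns `496`, mirroring the trade-off shown in the cells above: hashing fewer bytes is cheaper, but only the sampled regions are protected.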

We can get the slice covering the `i`-th hashed block:

In [9]:
s = hh.getBlockSliceByIndex(10)
s

slice(49520, 49536, None)
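The returned value is an ordinary Python `slice`, so it can index the buffer directly. The sketch below rebuilds the test buffer and applies the slice shown above as a literal, without calling `DilatedHash`; the variable names are illustrative.

```python
from struct import Struct

pack_u16 = Struct("<H")
buffer = bytearray(b"".join(pack_u16.pack(i) for i in range(0x7fff)))

block = slice(49520, 49536, None)          # copied from the output above
block_length = block.stop - block.start    # size of the block in bytes
block_bytes = buffer[block]                # the bytes covered by the block

# Each 16-bit value i sits at byte offset 2 * i, so the first value
# stored in this block is block.start // 2.
first_value = pack_u16.unpack_from(buffer, block.start)[0]
```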

Changes within that slice will likely be detected:

In [10]:
testString[s.start + 4] = ord(b"Z")
offsetOfCorruptedBlock = hh.verify(testString)
offsetOfCorruptedBlock

49520

In [11]:
s.start == offsetOfCorruptedBlock

True