# Hashing
Message -> Hash function -> Hash value (a number)

variable-length input -> fixed-length output

## Hash function properties
- deterministic behaviour - a given input always produces the same output
- fixed-length hash values - the output has the same length regardless of the input size
- avalanche effect - a small change in the message produces a large change in the hash value
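
The avalanche effect from the list above can be made visible by counting how many output bits change when the input changes slightly. This is only a sketch: it assumes SHA-256 from `hashlib` as the hash function, and `bit_difference` is a helper name made up for this illustration.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Two messages that differ only in the last character
digest1 = hashlib.sha256(b"message").digest()
digest2 = hashlib.sha256(b"messagf").digest()

changed = bit_difference(digest1, digest2)
print(f"{changed} of {len(digest1) * 8} bits differ")
```

For a hash function with a good avalanche effect, roughly half of the output bits should differ.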

E.g. the built-in `hash()` function, which Python uses to hash dict keys, is a non-cryptographic hash function. A cryptographic hash function must meet additional properties.


## Cryptographic hash function properties 
- One-way function property (preimage resistance) - given an output, it must be infeasible to find an input that produces it
- Weak collision resistance (second-preimage resistance) - given one message, it is infeasible to find a second message that produces the same hash value
- Strong collision resistance - it is infeasible to find any two distinct messages that produce the same hash value
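
Collision resistance depends heavily on the output length. The sketch below is not an attack on SHA-256: it deliberately weakens the hash by keeping only the first 2 bytes of the digest, and then finds a collision by brute force. `truncated_hash` and `find_collision` are hypothetical helper names introduced for this illustration.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, size: int = 2) -> bytes:
    """Return only the first `size` bytes of the SHA-256 digest (deliberately weak)."""
    return hashlib.sha256(message).digest()[:size]

def find_collision(size: int = 2):
    """Search for any two distinct messages with the same truncated hash."""
    seen = {}  # truncated digest -> message that produced it
    for i in count():
        message = str(i).encode()
        digest = truncated_hash(message, size)
        if digest in seen:
            return seen[digest], message
        seen[digest] = message

first, second = find_collision()
print(first, second, truncated_hash(first).hex())
```

A 16-bit output has only 65,536 possible values, so a collision must appear within 65,537 attempts. With the full 256-bit output the same search is infeasible.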


In [None]:
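# The same input gives the same value within one interpreter run
# (str hashes are salted per process, so the values change between runs)
print(hash("dupa"))
print(hash("dupa"))
# A slightly different input gives an unrelated value
print(hash("dupa2"))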

In [38]:
import hashlib

# names of the hash algorithms available in the running interpreter
print(hashlib.algorithms_available)

# names of the hash algorithms guaranteed to be available on all platforms
print(hashlib.algorithms_guaranteed)

{'sm3', 'sha384', 'sha3_224', 'ripemd160', 'sha1', 'blake2b', 'sha3_512', 'sha512_224', 'blake2s', 'sha512_256', 'sha3_384', 'shake_256', 'sha3_256', 'sha224', 'shake_128', 'md5', 'md4', 'mdc2', 'sha512', 'md5-sha1', 'whirlpool', 'sha256'}
{'blake2s', 'sha512', 'shake_128', 'md5', 'blake2b', 'sha384', 'sha3_224', 'sha3_384', 'sha1', 'shake_256', 'sha3_256', 'sha256', 'sha3_512', 'sha224'}


`MD5` and `SHA1` are no longer collision resistant, so they are not suitable for protecting data integrity.

Use `SHA2` (the current standard), `SHA3` (the newer standard) or `BLAKE2` (fast). The most common choice is `SHA256`.
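
As a quick comparison of the recommended families, the sketch below hashes one message with a member of each. The choice of `sha256`, `sha3_256` and `blake2b` and the sample message are assumptions made for illustration; all three names are in `hashlib.algorithms_guaranteed`.

```python
import hashlib

message = b"message"

# hashlib.new() selects an algorithm by name
for name in ("sha256", "sha3_256", "blake2b"):
    h = hashlib.new(name, message)
    print(name, h.digest_size, "bytes", h.hexdigest())
```

Note that `blake2b` produces a 64-byte digest by default, while the other two produce 32 bytes.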


In [37]:
import hashlib

# Python 3 strings are sequences of Unicode code points, not bytes
# The hash function argument must be bytes, so strings must be encoded first (encode() defaults to UTF-8)
hash1 = hashlib.sha256(b'duuupa')
hash2 = hashlib.sha256('duuupa'.encode())
print(hash1.digest_size, 'bytes')

# Hash value as a hexadecimal str
print(hash1.hexdigest())
print(hash2.hexdigest())

# Hash value as bytes
print(hash1.digest())
print(hash2.digest())


32 bytes
9e708c9659d9aede8542eb411f146002e2751b8e4e15cf620669c3a152d53f9d
9e708c9659d9aede8542eb411f146002e2751b8e4e15cf620669c3a152d53f9d
b'\x9ep\x8c\x96Y\xd9\xae\xde\x85B\xebA\x1f\x14`\x02\xe2u\x1b\x8eN\x15\xcfb\x06i\xc3\xa1R\xd5?\x9d'
b'\x9ep\x8c\x96Y\xd9\xae\xde\x85B\xebA\x1f\x14`\x02\xe2u\x1b\x8eN\x15\xcfb\x06i\xc3\xa1R\xd5?\x9d'


In [34]:
from hashlib import sha256

# Feeding the message in chunks with update() gives the same hash as hashing it all at once
many = sha256()
many.update(b'm')
many.update(b'e')
many.update(b's')
many.update(b's')
print(many.hexdigest())

print(sha256(b'mess').hexdigest())

7d64084076881eb259d7f09a68437e81b421a6a95057f65053815173926d1487
7d64084076881eb259d7f09a68437e81b421a6a95057f65053815173926d1487
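
The same `update()` pattern is what makes it possible to hash data too large to fit in memory. Below is a minimal sketch under stated assumptions: `hash_stream` and the 4096-byte chunk size are made up for this example, and `io.BytesIO` stands in for a real file so that the block is self-contained.

```python
import hashlib
import io

def hash_stream(stream, chunk_size: int = 4096) -> str:
    """Hash a binary stream chunk by chunk without loading it all into memory."""
    h = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        h.update(chunk)
    return h.hexdigest()

data = b"mess" * 10_000
stream = io.BytesIO(data)  # stands in for a file opened with open(path, "rb")
print(hash_stream(stream) == hashlib.sha256(data).hexdigest())
```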


Checksums (e.g. CRC, Adler-32) are fast but have insufficient collision resistance, so they are only suitable for detecting accidental errors.

Cryptographic hash functions (SHA2 family, SHA3 family) are slower but have sufficient collision resistance, so they can be used to verify data integrity.

In [42]:
import zlib

# CRC-32 checksum: two different inputs collide
print(zlib.crc32(b'gnu'))
print(zlib.crc32(b'codding'))

# Adler-32 checksum: no collision for the same two inputs
print(zlib.adler32(b'gnu'))
print(zlib.adler32(b'codding'))


1774765869
1774765869
42533195
190317273
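
For contrast, the two inputs that collide under CRC-32 above can be run through a cryptographic hash function. This sketch assumes SHA-256; the variable names `a` and `b` are arbitrary.

```python
import hashlib
import zlib

a, b = b'gnu', b'codding'

# Compare the CRC-32 checksums of the two inputs (they collide in the output above)
print(zlib.crc32(a) == zlib.crc32(b))

# Their SHA-256 hash values are different
print(hashlib.sha256(a).hexdigest())
print(hashlib.sha256(b).hexdigest())
```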


# Keyed hashing

# Symmetric encryption

# Asymmetric encryption

# Transport Layer Security