## Cryptographic Hash Functions

A useful cryptographic hash function (CHF) should satisfy the following key properties:
- Uniformity: the outputs, or *digests*, produced by a CHF should look random
- Determinism: a CHF must be deterministic, producing the same output for the same input
- Irreversibility: a CHF is a *one-way function*. It should be infeasible to invert the hashing and recover the input from its digest
- Approximate injectivity: tiny changes in the input should lead to wildly divergent digests

Because a digest does not reveal the contents of the original data, CHFs enable validation while preserving privacy: data can be checked against a known digest without the data itself being disclosed.
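The validation idea above can be sketched as follows. This is a minimal illustration rather than a prescribed workflow: it assumes Python's standard `hashlib` and `hmac` modules, and the helper names `sha256_hex` and `matches_digest`, as well as the sample document, are hypothetical.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def matches_digest(candidate: bytes, expected_hex: str) -> bool:
    """Check whether candidate hashes to the stored digest."""
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(sha256_hex(candidate), expected_hex)

# Hypothetical document; only its digest needs to be stored or published
original = b"Quarterly report, final version"
stored_digest = sha256_hex(original)

# Determinism: hashing the same bytes again reproduces the stored digest
print(matches_digest(original, stored_digest))  # True
# Any modification yields a different digest, so validation fails
print(matches_digest(b"Quarterly report, final version (edited)", stored_digest))  # False
```

The verifier only ever handles `stored_digest`, which, by the irreversibility property, should not let it reconstruct `original`.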

### Example of CHF in Python with SHA-256

In [1]:
# Begin by importing some necessary modules
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes

# Helper function that returns the number of positions at which two
# equal-length sequences (strings or bytes) differ
def char_diff(str1, str2):
    return sum(a != b for a, b in zip(str1, str2))

# Messages to be hashed
message_1 = b"Buy 10000 shares of WXYZ stock now!"
message_2 = b"Buy 10000 shares of VXYZ stock now!"

print(f"The two messages differ by { char_diff(message_1, message_2)} characters")

The two messages differ by 1 characters


In [2]:
# Create new SHA-256 hash objects, one for each message
chf_1 = hashes.Hash(hashes.SHA256(), backend=default_backend())
chf_2 = hashes.Hash(hashes.SHA256(), backend=default_backend())

# Update each hash object with the bytes of the corresponding message
chf_1.update(message_1)
chf_2.update(message_2)

# Finalize the hash process and obtain the digests
digest_1 = chf_1.finalize()
digest_2 = chf_2.finalize()

# Convert the resulting digests to hexadecimal strings for convenient printing
digest_1_str = digest_1.hex()
digest_2_str = digest_2.hex()

# Print out the digests as strings
print(f"digest-1: {digest_1_str}")
print(f"digest-2: {digest_2_str}")

print(f"The two digests differ by { char_diff(digest_1_str, digest_2_str)} characters")

digest-1: 6e0e6261b7131bd80ffdb2a4d42f9d042636350e45e184b92fcbcc9646eaf1e7
digest-2: 6b0abb368c3a1730f935b68105e3f3ae7fd43d7e786d3ed3503dbb45c74ada46
The two digests differ by 57 characters


Even though the two messages differ by only 1 character, their SHA-256 digests differ in 57 of their 64 hexadecimal characters.
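Counting differing hexadecimal characters is a fairly coarse measure. A finer view, sketched below, counts differing *bits*: for a hash whose output looks random, roughly half of the 256 digest bits would be expected to flip. The sketch assumes Python's standard `hashlib` module in place of the `cryptography` package used above, and the helper `bit_diff` is a hypothetical name introduced for illustration.

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # XOR leaves a 1 in every position where the two bytes disagree
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Same messages as in the example above
message_1 = b"Buy 10000 shares of WXYZ stock now!"
message_2 = b"Buy 10000 shares of VXYZ stock now!"

digest_1 = hashlib.sha256(message_1).digest()
digest_2 = hashlib.sha256(message_2).digest()

print(f"Input bits that differ:  {bit_diff(message_1, message_2)} of {8 * len(message_1)}")
print(f"Digest bits that differ: {bit_diff(digest_1, digest_2)} of {8 * len(digest_1)}")
```

The characters `W` and `V` differ in a single bit, so the two inputs are as close as two distinct messages can be.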

The security of a CHF is typically assessed based on resistance to two types of attacks: 

#### 1. Pre-Image Resistance

Given a digest, it should be *infeasible* to find an input that hashes to it. A good CHF is designed so that an attacker is forced to use a brute-force search, which has time complexity on the order of $2^n$ for an $n$-bit digest.
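To get a feel for the $2^n$ scaling, the following sketch runs a brute-force pre-image search against a deliberately weakened toy hash: SHA-256 truncated to its first 16 bits. The truncation, the helper names `truncated_hash` and `find_preimage`, and the choice of candidate inputs are all assumptions made for illustration; the full 256-bit function is far beyond the reach of this approach.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bits: int) -> int:
    """Toy hash: the first n_bits of the SHA-256 digest, as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - n_bits)

def find_preimage(target: int, n_bits: int):
    """Try the candidate inputs b"1", b"2", b"3", ... until one hashes to target."""
    for tries in count(1):
        candidate = str(tries).encode()
        if truncated_hash(candidate, n_bits) == target:
            return candidate, tries

N_BITS = 16
target = truncated_hash(b"secret input", N_BITS)

preimage, tries = find_preimage(target, N_BITS)
print(f"Found {preimage!r} after {tries} tries (2^{N_BITS} = {2**N_BITS})")
```

The input found will generally not be the original `b"secret input"`, only some input with the same truncated digest. Each additional digest bit doubles the expected number of tries, which is why the search becomes hopeless at realistic digest lengths.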

#### 2. Collision Resistance

It must be difficult to find two different inputs that hash to the same digest. A *cryptographic hash collision* occurs when two inputs hash to the same digest. While collisions inevitably exist given the many-to-one nature of CHFs, a good CHF makes it infeasible to locate one at will.

This prevents an attacker from producing a forgery that hashes to the same value as a legitimate input and thereby bypasses security measures that rely on the digest.

Collision resistance is a more demanding requirement than pre-image resistance: it requires digests twice as long to reach the same security level. This is because the brute-force attack known as the ***birthday attack***, which can be used to identify hash collisions, has time complexity of only about $2^{n/2}$ for an $n$-bit digest.

Thus, the security of a hash function is primarily influenced by its hash length. *All else being equal, the longer the hash, the more secure it is.*
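The $2^{n/2}$ cost of the birthday attack can be observed on a weakened toy hash. The sketch below truncates SHA-256 to its first 32 bits and remembers every digest seen so far until two inputs collide, which should take on the order of $2^{16}$ tries rather than $2^{32}$. The truncation and the helper names `truncated_hash` and `find_collision` are assumptions introduced for illustration.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bits: int) -> int:
    """Toy hash: the first n_bits of the SHA-256 digest, as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - n_bits)

def find_collision(n_bits: int):
    """Hash b"1", b"2", b"3", ... until two inputs share a truncated digest."""
    seen = {}  # truncated digest -> the input that produced it
    for tries in count(1):
        candidate = str(tries).encode()
        digest = truncated_hash(candidate, n_bits)
        if digest in seen:
            return seen[digest], candidate, tries
        seen[digest] = candidate

N_BITS = 32
input_a, input_b, tries = find_collision(N_BITS)
print(f"{input_a!r} and {input_b!r} collide after {tries} tries")
print(f"2^(n/2) = {2 ** (N_BITS // 2)}, 2^n = {2 ** N_BITS}")
```

The attack trades memory for time: the dictionary `seen` grows with every try, which is part of why doubling the digest length is an effective defence.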

