# Cryptographic Hash Functions Simulation

This notebook demonstrates the mechanics and properties of cryptographic hash functions using Python's standard library (`hashlib` and `os`).

### What is a Hash Function?
Unlike encryption, which is reversible by design (data encrypted with a key can be decrypted again), a hash function is a **one-way** mathematical process. It takes input data of *any* size (a short password, a 4GB movie file) and condenses it into a fixed-size output (256 bits for SHA-256), usually displayed as a string of hexadecimal characters. This output is called a **digest**, **fingerprint**, or simply a **hash**.
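The contrast between arbitrary-size input and fixed-size output is easy to check directly. The minimal sketch below uses `hashlib.sha256`; the three sample inputs (an empty string, a short password, and 10 MiB of zero bytes) are arbitrary choices for illustration.

```python
import hashlib

# Inputs of very different sizes: empty, a short password, and 10 MiB of zero bytes
inputs = {
    "empty": b"",
    "password": b"hunter2",
    "10 MiB": bytes(10 * 1024 * 1024),
}

digests = {}
for name, data in inputs.items():
    digests[name] = hashlib.sha256(data).hexdigest()
    # Every digest has the same length, no matter how large the input is
    print(f"{name:<8} {len(data):>9} bytes -> {len(digests[name])} hex chars: {digests[name][:16]}...")
```

Even the empty input produces a full-length digest: the output size depends only on the algorithm, never on the data.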

### Key Properties we will explore:
1.  **Deterministic:** The same input always produces the exact same hash.
2.  **Avalanche Effect:** Changing even a single bit of the input changes, on average, about half of the output bits, in a pattern that looks random.
3.  **Pre-image Resistance (One-Way):** It is computationally infeasible to reverse the process (to determine the original input from the hash alone).
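4.  **Collision Resistance:** It is computationally infeasible to find two different inputs that produce the same hash. Collisions must exist, because infinitely many possible inputs map onto a finite set of outputs, but finding one should be impractical.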
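Properties 3 and 4 are claims about how much *work* an attack takes. One way to make that concrete is to attack a deliberately weakened toy hash: the sketch below keeps only the first 4 hex characters (16 bits) of SHA-256. This truncation is an assumption made purely so the searches finish in seconds; `toy_hash`, `find_preimage`, and `find_collision` are hypothetical helpers, not part of `hashlib`.

```python
import hashlib

# Toy weakening: keep only the first N_HEX hex characters (4 bits each) of SHA-256.
# Real digests are not truncated like this; it is done here only to make attacks feasible.
N_HEX = 4  # 4 hex characters = 16-bit toy hash


def toy_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()[:N_HEX]


def find_preimage(target: str, max_tries: int = 10_000_000):
    """Brute force: try inputs until one hashes to target (about 2**16 tries expected)."""
    for i in range(max_tries):
        candidate = f"guess-{i}".encode()
        if toy_hash(candidate) == target:
            return candidate, i + 1
    return None, max_tries


def find_collision(max_tries: int = 10_000_000):
    """Birthday search: remember every toy hash until one repeats (about 2**8 tries expected)."""
    seen = {}
    for i in range(max_tries):
        candidate = f"msg-{i}".encode()
        h = toy_hash(candidate)
        if h in seen:
            return seen[h], candidate, i + 1
        seen[h] = candidate
    return None, None, max_tries


target = toy_hash(b"iot-firmware-v1.0")
preimage, preimage_tries = find_preimage(target)
msg_a, msg_b, collision_tries = find_collision()

print(f"Pre-image for {target}: {preimage!r} after {preimage_tries} tries")
print(f"Collision: {msg_a!r} and {msg_b!r} both -> {toy_hash(msg_a)} after {collision_tries} tries")
```

The pre-image found is generally *not* the original input, but any input that matches counts as breaking the property. For a 16-bit output the pre-image search needs on the order of 2^16 guesses, while the collision search needs only about 2^8 because any two matching inputs will do (the birthday paradox). Scaled to the full 256-bit SHA-256 output, the same attacks would need roughly 2^256 and 2^128 work, which is far beyond any realistic computer.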

In [None]:
import hashlib
import os

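# Helper function to print an input and its digest in aligned columns
# (width 14 fits the longest label used below, "SHA-256 Digest")
def print_hash(label, data, hex_digest):
    print(f"{'Input String':<14}: '{data.decode()}'")
    print(f"{label:<14}: {hex_digest}")
    print("-" * 80)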

## Step 1: Basic Hashing with SHA-256

**SHA-256 (Secure Hash Algorithm 256-bit)**, a member of the SHA-2 family standardized by NIST, is one of the most widely used hash functions today. It always produces a 256-bit (32-byte) output, typically represented as a 64-character hexadecimal string (each hex character encodes 4 bits).
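A `hashlib` hash object can return the result either as raw bytes (`digest()`) or as hexadecimal text (`hexdigest()`). The short sketch below confirms the sizes stated above using the object's `digest_size` attribute.

```python
import hashlib

h = hashlib.sha256(b"Hello IoT World!")

raw = h.digest()         # the 32 raw bytes of the hash
hex_str = h.hexdigest()  # the same bytes written as 64 hexadecimal characters

print(h.name, h.digest_size)   # sha256 32
print(len(raw), len(hex_str))  # 32 64
print(raw.hex() == hex_str)    # True
```

The raw form is what protocols and storage usually want; the hex form is convenient for printing and for publishing on a web page.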

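Note that hash functions operate on **bytes**, not text. In Python this means passing a `bytes` object: either a bytes literal such as `b"..."` (as in the cell below) or a `str` converted with `.encode()`.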
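Because the hash is computed over bytes, the *encoding* used to turn text into bytes becomes part of the input. The sketch below shows that `hashlib` rejects a `str` outright and that the same text encoded as UTF-8 and as Latin-1 produces different hashes. The sample string is arbitrary; in a real system both sides would need to agree on a single encoding (UTF-8 is a common choice).

```python
import hashlib

text = "café"  # a str containing one non-ASCII character

# hashlib only accepts bytes-like objects; passing a str raises TypeError
raised_type_error = False
try:
    hashlib.sha256(text)
except TypeError as exc:
    raised_type_error = True
    print("TypeError:", exc)

# The same text encoded two ways gives different bytes, hence different hashes
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'
utf8_digest = hashlib.sha256(utf8_bytes).hexdigest()
latin1_digest = hashlib.sha256(latin1_bytes).hexdigest()
print(utf8_digest == latin1_digest)  # False
```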

In [None]:
message = b"Hello IoT World!"

# Create a sha256 hash object
hasher = hashlib.sha256()

# Feed the data into the hasher
hasher.update(message)

# Read out the digest of all data fed in so far, as a hexadecimal string
digest = hasher.hexdigest()

print_hash("SHA-256 Digest", message, digest)
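The `update()` call above can be repeated: each call appends more data to the running hash, so a large message can be fed in pieces without ever holding it all in memory. This sketch checks that piecewise hashing matches hashing the joined message, and that reading the digest does not prevent further updates; the split into three chunks is arbitrary.

```python
import hashlib

chunks = [b"Hello ", b"IoT ", b"World!"]

# Feed the message piece by piece, as you would when reading a large file
incremental = hashlib.sha256()
for chunk in chunks:
    incremental.update(chunk)

one_shot = hashlib.sha256(b"".join(chunks))
same_result = incremental.hexdigest() == one_shot.hexdigest()
print(same_result)  # True

# Reading the digest does not close the object: more data can still be added
incremental.update(b"!!")
still_usable = incremental.hexdigest() == hashlib.sha256(b"Hello IoT World!!!").hexdigest()
print(still_usable)  # True
```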

## Step 2: The Avalanche Effect

This property is critical for security. If similar inputs produced similar outputs, the hash would leak information about its input: an attacker could make a guess, compare its hash with the target, and refine the guess step by step until the hashes matched.

Let's see what happens when we change just **one character** of the input. Changing `'0'` (ASCII `0x30`) to `'1'` (ASCII `0x31`) actually flips only a single bit.

In [None]:
data_original = b"iot-firmware-v1.0"
# We change only the last character from '0' to '1'
data_modified = b"iot-firmware-v1.1"

# Calculate hashes directly using the shortcut method
hash_original = hashlib.sha256(data_original).hexdigest()
hash_modified = hashlib.sha256(data_modified).hexdigest()

print("VISUALIZING THE AVALANCHE EFFECT\n")
print_hash("Original", data_original, hash_original)
print_hash("Modified", data_modified, hash_modified)

print("Notice how completely different the two resulting hashes are!")
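"Completely different" can be quantified by counting how many bits differ between the two digests. For an ideal hash, flipping one input bit should change each output bit with probability 1/2, so about 128 of the 256 output bits on average. The sketch below counts differing bits for the two firmware strings, then averages over 200 random 32-byte messages with one random bit flipped; `bit_difference` is a hypothetical helper, and the seed and trial count are arbitrary.

```python
import hashlib
import random


def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many bit positions differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


data_original = b"iot-firmware-v1.0"
data_modified = b"iot-firmware-v1.1"

input_bits_changed = bit_difference(data_original, data_modified)
output_bits_changed = bit_difference(
    hashlib.sha256(data_original).digest(),
    hashlib.sha256(data_modified).digest(),
)
print(f"Input bits changed : {input_bits_changed} of {len(data_original) * 8}")
print(f"Output bits changed: {output_bits_changed} of 256")

# Average over many random messages, each with exactly one bit flipped
rng = random.Random(0)
trials = 200
total = 0
for _ in range(trials):
    msg = bytearray(rng.getrandbits(8) for _ in range(32))
    bit = rng.randrange(len(msg) * 8)
    flipped = bytearray(msg)
    flipped[bit // 8] ^= 1 << (bit % 8)  # flip exactly one bit
    total += bit_difference(hashlib.sha256(msg).digest(), hashlib.sha256(flipped).digest())

average = total / trials
print(f"Average output bits changed over {trials} trials: {average:.1f} (ideal: 128)")
```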

## Step 3: Using Different Algorithms (SHA-3)

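SHA-3, standardized by NIST in 2015 (FIPS 202), is the newest member of the Secure Hash Algorithm family. It is based on Keccak and uses a *sponge construction*, an internal structure entirely different from that of SHA-2. SHA-3 was not introduced because SHA-2 is broken (SHA-256 remains secure); it provides an alternative design, so that a future weakness discovered in one would not automatically affect the other.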

Switching algorithms in Python only requires calling a different constructor from `hashlib`.

In [None]:
message_sha3 = b"Testing SHA-3 algorithm"

# Using SHA3-256
sha3_digest = hashlib.sha3_256(message_sha3).hexdigest()

# For comparison, let's see the SHA-256 of the same data
sha2_digest = hashlib.sha256(message_sha3).hexdigest()

print_hash("SHA3-256", message_sha3, sha3_digest)
print_hash("SHA-256", message_sha3, sha2_digest)
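Since all `hashlib` objects share the same interface, the algorithm can also be chosen by *name* with `hashlib.new()`, for example from a configuration file. The sketch below loops over a few SHA-2 and SHA-3 variants and prints their output sizes; the selection of algorithms is an arbitrary illustration.

```python
import hashlib

message = b"Testing SHA-3 algorithm"

# hashlib.new() builds a hash object from an algorithm name,
# which lets the algorithm be selected at runtime
sizes = {}
for name in ("sha256", "sha3_256", "sha512", "sha3_512"):
    h = hashlib.new(name, message)
    sizes[name] = h.digest_size * 8  # output size in bits
    print(f"{name:<9} {sizes[name]:>3}-bit  {h.hexdigest()[:32]}...")

# The named constructor and hashlib.new() produce identical results
print(hashlib.new("sha3_256", message).hexdigest() == hashlib.sha3_256(message).hexdigest())
```

SHA-256 and SHA3-256 have the same output size but, as the cell above shows, completely unrelated digests for the same input.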

## Step 4: Practical Use Case - Verifying Integrity

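In IoT, hashes are central to **Secure Boot** and **OTA (Over-The-Air) Updates**. Before installing a new firmware update, the device computes the hash of the downloaded file and compares it with the "official" hash provided by the vendor. This catches deliberate tampering only if the official hash reaches the device through a channel the attacker cannot modify: an attacker who can replace the file on the same server can simply publish a matching hash. For this reason, real update and boot systems typically protect the hash (or the image) with a digital signature.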

Let's simulate this process.

In [None]:
# --- THE VENDOR SIDE ---
# Simulate a firmware file (just some random bytes)
firmware_data = os.urandom(128)

# Calculate the official hash that will be published on the website
official_hash = hashlib.sha256(firmware_data).hexdigest()
print(f"[VENDOR] Published Official Hash: {official_hash}")

# --- THE NETWORK/ATTACKER ---
# Simulate the file getting corrupted during download or tampered with by an attacker
# We change the very first byte of the firmware file
corrupted_firmware = bytearray(firmware_data)
corrupted_firmware[0] ^= 0xFF  # XOR with 0xFF flips all 8 bits of the first byte

print("[NETWORK] The firmware file was corrupted during transit!\n")

# --- THE IOT DEVICE SIDE ---
print("[DEVICE] Download complete. Verifying integrity...")

# 1. Calculate hash of the downloaded (corrupted) data
downloaded_hash = hashlib.sha256(corrupted_firmware).hexdigest()
print(f"[DEVICE] Calculated Hash:       {downloaded_hash}")

# 2. Compare with the official hash
if downloaded_hash == official_hash:
    print("\n✅ SUCCESS: Integrity verified. Installing firmware.")
else:
    print("\n❌ DANGER: Hash mismatch! The file is corrupted or tampered with. Aborting installation.")
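A real firmware image lives in a file and may be too large to load into memory at once. The sketch below hashes a file in fixed-size chunks and compares the result with the expected digest using `hmac.compare_digest`. Because the official hash is public, timing attacks are not a real concern here; `compare_digest` is simply a cheap habit that matters when the compared values are secret. The function names, the 4096-byte chunk size, and the `firmware.bin` file name are hypothetical choices for illustration.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 4096) -> str:
    """Hash a file in fixed-size chunks so large images never need to fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_firmware(path: str, expected_hex: str) -> bool:
    """Return True only if the file's SHA-256 matches the expected hex digest."""
    actual_hex = sha256_of_file(path)
    # Normalise whitespace and case, since published hashes are sometimes upper-case
    return hmac.compare_digest(actual_hex, expected_hex.strip().lower())


# Demo: write a simulated firmware image to a temporary file
firmware = os.urandom(100_000)
expected = hashlib.sha256(firmware).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "firmware.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(firmware)
    ok_before = verify_firmware(path, expected)

    # Tamper with one byte in the middle of the image
    with open(path, "r+b") as f:
        f.seek(50_000)
        original = f.read(1)
        f.seek(50_000)
        f.write(bytes([original[0] ^ 0xFF]))
    ok_after = verify_firmware(path, expected)

print(ok_before, ok_after)  # True False
```

On Python 3.11 and later, `hashlib.file_digest()` can replace the manual chunk loop.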

## Summary

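You have seen that hash functions are tools for checking **data integrity**. They produce a compact, one-way fingerprint of the data that is unique for practical purposes: collisions exist, but for a secure hash such as SHA-256 they are infeasible to find. If even a single bit of the data changes, the fingerprint changes completely. This reliably reveals accidental corruption, and reveals deliberate tampering as long as the reference hash itself comes from a trusted source.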