In [349]:
# Import the NumPy library for efficient numerical operations on arrays and integers.
import numpy as np

# Silence NumPy overflow warnings: SHA-256 relies on uint32 addition wrapping modulo 2**32.
np.seterr(over='ignore')

{'divide': 'warn', 'over': 'ignore', 'under': 'ignore', 'invalid': 'warn'}

# Problem 1: Binary Words and Operations

## Overview

- Purpose: implement and test SHA-256 bitwise primitives and helper functions.
- Key functions: `Parity`, `Ch`, `Maj`, `Rotr`, `Shr`, `Sigma0/Sigma1`, `sigma0/sigma1`.
- Notes: Uses 32-bit NumPy types; run helper cells at top before examples.

## `Parity(x, y, z)`

- **Purpose**: Bitwise parity function used in SHA-1 (for reference).

- **Formula**: `Parity(x, y, z) = x ⊕ y ⊕ z`
- **Note**: All operations are performed as 32-bit integers using NumPy
- **Args**: Three 32-bit integers `x`, `y`, `z`
- **Behavior**: Returns 1 for each bit position where an odd number of inputs have a 1 bit
- **Returns**: 32-bit integer result of XOR operation on all three inputs
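
As a quick cross-check of the odd-count behavior described above, the sketch below enumerates all eight single-bit input combinations. It uses plain Python integers instead of NumPy, and the helper name `parity_bits` is illustrative rather than part of the notebook.

```python
from itertools import product

def parity_bits(x: int, y: int, z: int) -> int:
    """XOR of three integers, masked to 32 bits (hypothetical helper)."""
    return (x ^ y ^ z) & 0xFFFFFFFF

# Truth table over single bits: the output is 1 exactly when an odd
# number of the three inputs are 1.
for x, y, z in product((0, 1), repeat=3):
    assert parity_bits(x, y, z) == (x + y + z) % 2

result = parity_bits(0b10101010, 0b11001100, 0b11110000)
print(f"{result:#010b}")  # 0b10010110, i.e. 150
```

Because XOR acts on each bit position independently, the single-bit table is enough to describe the behavior on full 32-bit words.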

In [350]:
# Parity returns 1 if an odd number of inputs are 1, else 0.
def Parity(x, y, z):
    """
    Implements the Parity function defined in the Secure Hash Standard.
    Parity(x, y, z) = x XOR y XOR z
    All operations are done as 32-bit integers.
    """
    # Convert inputs to 32-bit integers to ensure correct bitwise operations.
    x = np.uint32(x)
    y = np.uint32(y)
    z = np.uint32(z)
    # Perform XOR operation and return the result as a 32-bit integer.
    return np.uint32(x ^ y ^ z)

In [351]:
# Example usage: Calculate the parity of three binary words.
print(Parity(0b10101010, 0b11001100, 0b11110000))

150


## `Ch(x, y, z)`

- **Purpose**: Choice function - "chooses" bits from `y` or `z` based on corresponding bit in `x`.

- **Formula**: `Ch(x, y, z) = (x ∧ y) ⊕ (¬x ∧ z)`
- **Note**: All operations are performed as 32-bit integers using NumPy
- **Args**: Three 32-bit integers `x`, `y`, `z`
- **Example**: If `x[i] = 1` then `result[i] = y[i]`, else `result[i] = z[i]`
- **Returns**: 32-bit integer where each bit is chosen from `y` (if `x` bit is 1) or `z` (if `x` bit is 0)
- **Usage**: Used in the SHA-256 compression function (section 4.1.2 of FIPS 180-4)
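
The "choose" behavior can be illustrated by comparing the formula with a multiplexer-style rewrite, `z ⊕ (x ∧ (y ⊕ z))`, which selects the same bits. This is a minimal sketch on plain Python integers; `ch` and `ch_mux` are illustrative names, not the notebook's functions.

```python
MASK32 = 0xFFFFFFFF

def ch(x: int, y: int, z: int) -> int:
    """Ch as written in the formula, masked to 32 bits."""
    return ((x & y) ^ (~x & z)) & MASK32

def ch_mux(x: int, y: int, z: int) -> int:
    """Equivalent form: start from z and flip to y wherever x has a 1 bit."""
    return (z ^ (x & (y ^ z))) & MASK32

samples = [
    (0b10101010, 0b11001100, 0b11110000),
    (0xFFFFFFFF, 0x12345678, 0x9ABCDEF0),  # x all ones  -> result is y
    (0x00000000, 0x12345678, 0x9ABCDEF0),  # x all zeros -> result is z
]
for x, y, z in samples:
    assert ch(x, y, z) == ch_mux(x, y, z)

print(ch(0b10101010, 0b11001100, 0b11110000))  # 216
```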

In [352]:
# Ch selects bits from y or z based on x: (x AND y) XOR (NOT x AND z), 32-bit ops.
def Ch(x, y, z):
    """
    Implements the Ch (Choose) function defined in the Secure Hash Standard.
    Ch(x, y, z) = (x AND y) XOR (NOT x AND z)
    All operations are done as 32-bit integers.
    """
    # Convert inputs to 32-bit integers to ensure correct bitwise operations.
    x = np.uint32(x)
    y = np.uint32(y)
    z = np.uint32(z)
    # Perform the Ch operation and return the result as a 32-bit integer.
    return np.uint32((x & y) ^ (~x & z))

In [353]:
# Example usage: Calculate the Ch function of three binary words.
print(Ch(0b10101010, 0b11001100, 0b11110000))  

216


## `Maj(x, y, z)`

- **Purpose**: Majority function - returns the majority bit value at each position.

- **Formula**: `Maj(x, y, z) = (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z)`
- **Note**: All operations are performed as 32-bit integers using NumPy
- **Args**: Three 32-bit integers `x`, `y`, `z`
- **Behavior**: For each bit position, outputs the value that appears in at least 2 of the 3 inputs
- **Returns**: 32-bit integer where each bit is 1 if at least two of the corresponding input bits are 1
- **Usage**: Used in the SHA-256 compression function (section 4.1.2 of FIPS 180-4)
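
To see that the XOR formula really computes a per-bit majority vote, the sketch below compares it with a deliberately naive version that counts the 1 bits at every position. Both helpers use plain Python integers and their names are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF

def maj(x: int, y: int, z: int) -> int:
    """Maj as written in the formula, masked to 32 bits."""
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK32

def maj_by_counting(x: int, y: int, z: int) -> int:
    """Reference version: set bit i when at least two inputs have bit i set."""
    out = 0
    for i in range(32):
        ones = ((x >> i) & 1) + ((y >> i) & 1) + ((z >> i) & 1)
        if ones >= 2:
            out |= 1 << i
    return out

for x, y, z in [(0xAA, 0xCC, 0xF0), (0xDEADBEEF, 0x01234567, 0x89ABCDEF)]:
    assert maj(x, y, z) == maj_by_counting(x, y, z)

print(maj(0b10101010, 0b11001100, 0b11110000))  # 232
```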

In [354]:
# Maj returns 1 if at least two inputs are 1, else 0.
def Maj(x, y, z):
    """
    Implements the Maj (Majority) function defined in the Secure Hash Standard.
    Maj(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z)
    All operations are done as 32-bit integers.
    """
    # Convert inputs to 32-bit integers to ensure correct bitwise operations.
    x = np.uint32(x)
    y = np.uint32(y)
    z = np.uint32(z)
    # Perform the Maj operation and return the result as a 32-bit integer.
    return np.uint32((x & y) ^ (x & z) ^ (y & z))

In [355]:
# Example usage: Calculate the Maj function of three binary words.
print(Maj(0b10101010, 0b11001100, 0b11110000))

232


## `Rotr(x, num)` and `Shr(x, num)` — Bit Rotation and Shift
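
The difference between the two operations is easiest to see on a word with a single low bit set: a rotation carries that bit around to the top of the word, while a shift discards it. The sketch below uses plain Python integers with explicit masking; `rotr32` and `shr32` are illustrative names, not the notebook's `Rotr` and `Shr`.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits; bits leaving the right re-enter on the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def shr32(x: int, n: int) -> int:
    """Shift a 32-bit value right by n bits; vacated high bits become zero."""
    return (x & MASK32) >> n

x = 0x00000001
print(f"{rotr32(x, 1):#010x}")  # 0x80000000
print(f"{shr32(x, 1):#010x}")   # 0x00000000
```

Python integers are unbounded, so the mask after the left shift is what keeps the result within 32 bits; with `np.uint32` the wraparound happens in the type itself.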

In [356]:
# Rotr rotates the bits of x to the right by num positions, 32-bit ops.
def Rotr(x, num):
    """
    Implements the Rotr (Rotate right) function defined in the Secure Hash Standard.
    Rotr(x, num) = (x right rotated by num bits); SHA-256 uses 0 < num < 32
    All operations are done as 32-bit integers.
    """
    # Convert input to a 32-bit unsigned integer so the left shift wraps at 32 bits.
    x = np.uint32(x)
    # Bits shifted out on the right re-enter on the left.
    return np.uint32((x >> num) | (x << (32 - num)))

In [357]:
# Shr shifts the bits of x to the right by num positions, 32-bit ops.
def Shr(x, num):
    """
    Implements the Shr (Shift right) function defined in the Secure Hash Standard.
    Shr(x, num) = (x right shifted by num bits)
    All operations are done as 32-bit integers.
    """
    # Convert input to a 32-bit unsigned integer; vacated high bits are filled with zeros.
    x = np.uint32(x)
    return np.uint32(x >> num)

## `Sigma0(x)` — $\Sigma_0^{\{256\}}(x)$

- **Purpose**: Upper-case Sigma-0 function used in SHA-256 compression (applies to working variables).
- **Formula**: $\Sigma_0^{\{256\}}(x) = \text{ROTR}^2(x) ⊕ \text{ROTR}^{13}(x) ⊕ \text{ROTR}^{22}(x)$

- **Args**: One 32-bit integer `x`
- **Note**: All operations are performed as 32-bit integers using NumPy
- **Returns**: 32-bit integer result of XOR-ing three different rotations of `x`
- **Operations**: Uses right rotation (ROTR) by 2, 13, and 22 bits
- **Usage**: Applied to working variable `a` in the SHA-256 compression function (section 6.2.2)
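
A worked example makes the three rotations visible. On the alternating pattern `0xAAAAAAAA`, rotating by an even amount leaves the word unchanged and rotating by an odd amount flips it to `0x55555555`, so the XOR of the three terms can be followed by hand. The sketch uses plain Python integers, and `rotr32` and `big_sigma0` are illustrative names.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits (plain-int helper for this sketch)."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def big_sigma0(x: int) -> int:
    """Upper-case Sigma-0: XOR of rotations by 2, 13, and 22 bits."""
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22)

x = 0xAAAAAAAA
parts = [rotr32(x, n) for n in (2, 13, 22)]
print([f"{p:08x}" for p in parts])  # ['aaaaaaaa', '55555555', 'aaaaaaaa']
print(f"{big_sigma0(x):08x}")       # 55555555
```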

In [358]:
# Sigma0 function as defined in the Secure Hash Standard.
def Sigma0(x):
    """
    Implements the Σ0 (Sigma0) function defined in the Secure Hash Standard.
    Σ0(x) = ROTR^2(x) XOR ROTR^13(x) XOR ROTR^22(x)
    All operations are done as 32-bit integers.
    """
    # Convert input to a 32-bit unsigned integer to ensure correct bitwise operations.
    x = np.uint32(x)
    # Perform the rotations and XOR operations.
    rotr2 = Rotr(x, 2)
    rotr13 = Rotr(x, 13)
    rotr22 = Rotr(x, 22)
    # Return the result as a 32-bit integer.
    return np.uint32(rotr2 ^ rotr13 ^ rotr22)

In [359]:
# Example usage: Calculate the Sigma0 function of a binary word.
print(Sigma0(0b10101010101010101010101010101010))

1431655765


## `Sigma1(x)` — $\Sigma_1^{\{256\}}(x)$

- **Purpose**: Upper-case Sigma-1 function used in SHA-256 compression (applies to working variables).

- **Formula**: $\Sigma_1^{\{256\}}(x) = \text{ROTR}^6(x) ⊕ \text{ROTR}^{11}(x) ⊕ \text{ROTR}^{25}(x)$
- **Note**: All operations are performed as 32-bit integers using NumPy
- **Args**: One 32-bit integer `x`
- **Operations**: Uses right rotation (ROTR) by 6, 11, and 25 bits
- **Returns**: 32-bit integer result of XOR-ing three different rotations of `x`
- **Usage**: Applied to working variable `e` in the SHA-256 compression function (section 6.2.2)

In [360]:
# Sigma1 function as defined in the Secure Hash Standard.
def Sigma1(x):
    """
    Implements the Σ1 (Sigma1) function defined in the Secure Hash Standard.
    Σ1(x) = ROTR^6(x) XOR ROTR^11(x) XOR ROTR^25(x)
    All operations are done as 32-bit integers.
    """
    # Convert input to a 32-bit unsigned integer to ensure correct bitwise operations.
    x = np.uint32(x)
    # Perform the rotations and XOR operations.
    rotr6 = Rotr(x, 6)
    rotr11 = Rotr(x, 11)
    rotr25 = Rotr(x, 25)
    # Return the result as a 32-bit integer.
    return np.uint32(rotr6 ^ rotr11 ^ rotr25)

In [361]:
# Example usage: Calculate the Sigma1 function of a binary word.
print(Sigma1(0b10101010101010101010101010101010))

2863311530


## `sigma0(x)` — $\sigma_0^{\{256\}}(x)$

- **Purpose**: Lower-case sigma-0 function used in SHA-256 message schedule (applies to message words).

- **Formula**: $\sigma_0^{\{256\}}(x) = \text{ROTR}^7(x) ⊕ \text{ROTR}^{18}(x) ⊕ \text{SHR}^3(x)$
- **Note**: All operations are performed as 32-bit integers using NumPy
- **Args**: One 32-bit integer `x`
- **Operations**: Uses right rotation (ROTR) by 7 and 18 bits, plus right shift (SHR) by 3 bits
- **Returns**: 32-bit integer result of XOR-ing two rotations and one shift of `x`
- **Usage**: Used in message schedule expansion to generate 64 words from 16 input words (section 6.2.2)

In [362]:
# sigma0 function as defined in the Secure Hash Standard.
def sigma0(x):
    """
    Implements the σ0 (sigma0) function defined in the Secure Hash Standard.
    σ0(x) = ROTR^7(x) XOR ROTR^18(x) XOR SHR^3(x)
    All operations are done as 32-bit integers.
    """
    # Convert input to a 32-bit unsigned integer to ensure correct bitwise operations.
    x = np.uint32(x)
    # Perform the rotations and shift operations.
    rotr7 = Rotr(x, 7)
    rotr18 = Rotr(x, 18)
    shr3 = Shr(x, 3)
    # Return the result as a 32-bit integer.
    return np.uint32(rotr7 ^ rotr18 ^ shr3)

In [363]:
# Example usage: Calculate the sigma0 function of a binary word.
print(sigma0(0b10101010101010101010101010101010))

3937053354


## `sigma1(x)` — $\sigma_1^{\{256\}}(x)$

- **Purpose**: Lower-case sigma-1 function used in SHA-256 message schedule (applies to message words).

- **Formula**: $\sigma_1^{\{256\}}(x) = \text{ROTR}^{17}(x) ⊕ \text{ROTR}^{19}(x) ⊕ \text{SHR}^{10}(x)$
- **Note**: All operations are performed as 32-bit integers using NumPy
- **Args**: One 32-bit integer `x`
- **Operations**: Uses right rotation (ROTR) by 17 and 19 bits, plus right shift (SHR) by 10 bits
- **Returns**: 32-bit integer result of XOR-ing two rotations and one shift of `x`
- **Usage**: Used in message schedule expansion to generate 64 words from 16 input words (section 6.2.2)
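
The two lower-case functions differ from the upper-case ones in that their third term is a shift, which discards bits instead of wrapping them around. The sketch below evaluates both on plain Python integers; `small_sigma0` and `small_sigma1` are illustrative names, and the inputs are assumed to already fit in 32 bits.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits (plain-int helper for this sketch)."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def small_sigma0(x: int) -> int:
    """Lower-case sigma-0: ROTR 7, ROTR 18, SHR 3."""
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)

def small_sigma1(x: int) -> int:
    """Lower-case sigma-1: ROTR 17, ROTR 19, SHR 10."""
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)

x = 0xAAAAAAAA
print(f"{small_sigma0(x):08x}")  # eaaaaaaa
print(f"{small_sigma1(x):08x}")  # 002aaaaa
```

For `small_sigma1` the two odd rotations both give `0x55555555` and cancel under XOR, so only the shifted term `0xAAAAAAAA >> 10` survives.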

In [364]:
# sigma1 function as defined in the Secure Hash Standard.
def sigma1(x):
    """
    Implements the σ1 (sigma1) function defined in the Secure Hash Standard.
    σ1(x) = ROTR^17(x) XOR ROTR^19(x) XOR SHR^10(x)
    All operations are done as 32-bit integers.
    """
    # Convert input to a 32-bit unsigned integer to ensure correct bitwise operations.
    x = np.uint32(x)
    # Perform the rotations and shift operations.
    rotr17 = Rotr(x, 17)
    rotr19 = Rotr(x, 19)
    shr10 = Shr(x, 10)
    # Return the result as a 32-bit integer.
    return np.uint32(rotr17 ^ rotr19 ^ shr10)

In [365]:
# Example usage: Calculate the sigma1 function of a binary word.
print(sigma1(0b10101010101010101010101010101010))

2796202


# Problem 2: Fractional Parts of Cube Roots

## Overview

- Purpose: compute fractional parts of cube roots of primes and derive SHA-256 K constants.
- Key functions: `primes(n)`, `cube_root_fractions`, `fractional32_hex`, `cube_root_constants`, `test_cube_root_constants`.
- Notes: Uses NumPy; compare results to FIPS reference values to verify correctness.

## `primes(n)`

- Purpose: Generate the first `n` prime numbers and return them as a Python list.
- Args: `n` (int) — number of primes to generate. If `n <= 0` an empty list is returned.
- Returns: `list[int]` containing the first `n` primes (e.g. `primes(4)` -> `[2, 3, 5, 7]`).
- Notes: This implementation uses simple trial division which is efficient enough for small `n` such as 64. For very large `n` a segmented sieve is preferred.
- Example: `primes(64)` produces the first 64 prime numbers required for the cube root constants.
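
For comparison with trial division, the sketch below obtains the first `n` primes with a plain (non-segmented) Sieve of Eratosthenes. It assumes the classical bound that the n-th prime is below `n (ln n + ln ln n)` for `n >= 6`; the function name `first_primes_sieve` is hypothetical.

```python
import math

def first_primes_sieve(n: int) -> list[int]:
    """Return the first n primes using a sieve sized by an upper bound on the n-th prime."""
    if n <= 0:
        return []
    if n < 6:
        limit = 13  # the first six primes are 2, 3, 5, 7, 11, 13
    else:
        # Assumed bound: p_n < n (ln n + ln ln n) for n >= 6.
        limit = int(n * (math.log(n) + math.log(math.log(n)))) + 1
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag][:n]

print(first_primes_sieve(10))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

For `n = 64` either approach finishes instantly; the sieve only starts to pay off for much larger `n`.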


In [366]:
def primes(n: int) -> list:
    """
    Return the first `n` prime numbers.

    This implementation uses simple trial division which is fine for small n
    (here n=64). It returns an empty list for n <= 0.
    """
    if not isinstance(n, int):
        raise TypeError("n must be an integer")
    if n <= 0:
        return []

    primes_list = [2]
    candidate = 3
    while len(primes_list) < n:
        is_prime = True
        for p in primes_list:
            if p * p > candidate:
                break
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            primes_list.append(candidate)
        candidate += 2
    return primes_list

In [367]:
print(primes(64)) # Example usage: print the first 64 primes.
print(primes(10)) # Example usage: print the first 10 primes.
print(primes(0))  # Example usage: n = 0 returns an empty list.


[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
[]


## Helper Functions
- `cube_root_fractions(prime_list)`: Returns fractional parts of the cube roots for the given primes as a NumPy array.
- `fractional32_hex(frac_array)`: Converts fractional parts to first 32 bits and formats each as an 8-character hex string.
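
The helpers above work in `float64`. As an independent cross-check that avoids floating point entirely, the same 32 bits can be obtained with exact integer arithmetic: the first 32 fractional bits of the cube root of `p` are the low 32 bits of the integer cube root of `p * 2**96`. This is a sketch of an alternative technique, not the notebook's method, and `icbrt` and `k_constant` are hypothetical names.

```python
def icbrt(n: int) -> int:
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    # Invariant: lo**3 <= n < hi**3
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo

def k_constant(p: int) -> int:
    """First 32 bits of the fractional part of the cube root of p."""
    # floor(cbrt(p) * 2**32) == icbrt(p * 2**96); the mask drops the integer part.
    return icbrt(p << 96) & 0xFFFFFFFF

print([f"{k_constant(p):08x}" for p in (2, 3, 5, 7)])
# ['428a2f98', '71374491', 'b5c0fbcf', 'e9b5dba5']
```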

In [368]:
# Compute the fractional parts of the cube roots for the given prime list.
def cube_root_fractions(prime_list: list) -> np.ndarray:
    """
    Compute the fractional parts of the cube roots for the given prime list.
    Returns a NumPy array of fractional parts (float64).
    """
    arr = np.array(prime_list, dtype=np.float64)
    c = np.cbrt(arr)
    frac = c - np.floor(c)

    return frac


In [369]:
# Extract the first 32 bits of each fractional part and format as 8-digit hex strings.
def fractional32_hex(frac_array: np.ndarray) -> list[str]:
    """
    Extract the first 32 bits of each fractional part and format as 8-digit hex strings.
    """
    words = (np.floor(frac_array * (2**32)).astype(np.uint64) & 0xFFFFFFFF)
    
    return [f"{int(w):08x}" for w in words]

## `cube_root_constants()`
- Purpose: Compute the SHA-256 `K` constants from the cube roots of the first 64 primes.
- Output: 64 hex strings (8 chars) — first 32 bits of each fractional cube root.
- Implementation: Uses helpers `cube_root_fractions(prime_list)` and `fractional32_hex(frac_array)`.
- Example: `cube_root_constants()` → `['428a2f98', '71374491', ...]`.

In [370]:
# Compute cube root K constants: first 32 bits of fractional part of cube roots
def cube_root_constants():
    """
    Compute cube root K constants: first 32 bits of fractional part of cube roots
    of the first 64 primes. Return list of hex strings.
    """
    # Get first 64 primes
    prime_list = primes(64)

    # Fractional parts of cube roots
    frac = cube_root_fractions(prime_list)

    # First 32 bits of fractional part formatted as hex
    computed_hex = fractional32_hex(frac)

    return computed_hex

In [371]:
print(cube_root_constants())  # Example usage: print cube root K constants.

['428a2f98', '71374491', 'b5c0fbcf', 'e9b5dba5', '3956c25b', '59f111f1', '923f82a4', 'ab1c5ed5', 'd807aa98', '12835b01', '243185be', '550c7dc3', '72be5d74', '80deb1fe', '9bdc06a7', 'c19bf174', 'e49b69c1', 'efbe4786', '0fc19dc6', '240ca1cc', '2de92c6f', '4a7484aa', '5cb0a9dc', '76f988da', '983e5152', 'a831c66d', 'b00327c8', 'bf597fc7', 'c6e00bf3', 'd5a79147', '06ca6351', '14292967', '27b70a85', '2e1b2138', '4d2c6dfc', '53380d13', '650a7354', '766a0abb', '81c2c92e', '92722c85', 'a2bfe8a1', 'a81a664b', 'c24b8b70', 'c76c51a3', 'd192e819', 'd6990624', 'f40e3585', '106aa070', '19a4c116', '1e376c08', '2748774c', '34b0bcb5', '391c0cb3', '4ed8aa4a', '5b9cca4f', '682e6ff3', '748f82ee', '78a5636f', '84c87814', '8cc70208', '90befffa', 'a4506ceb', 'bef9a3f7', 'c67178f2']


## `test_cube_root_constants()`
- Purpose: Verify computed constants against FIPS 180-4 reference values.
- Returns: Dict with `computed_hex`, `reference_hex`, and `mismatches` (list of `(index, computed, reference)`).
- Usage: `res = test_cube_root_constants(); print(res['mismatches'])` — empty list means all match.

In [372]:
def test_cube_root_constants():
    """
    Test the computed cube root K constants against FIPS 180-4 reference values.
    
    Returns a dict with computed_hex, reference_hex, and any mismatches found.
    """
    # Get computed constants
    computed_hex = cube_root_constants()
    
    # Reference K constants from FIPS 180-4 (first 64 words)
    reference_hex = [
        "428a2f98","71374491","b5c0fbcf","e9b5dba5","3956c25b","59f111f1","923f82a4","ab1c5ed5",
        "d807aa98","12835b01","243185be","550c7dc3","72be5d74","80deb1fe","9bdc06a7","c19bf174",
        "e49b69c1","efbe4786","0fc19dc6","240ca1cc","2de92c6f","4a7484aa","5cb0a9dc","76f988da",
        "983e5152","a831c66d","b00327c8","bf597fc7","c6e00bf3","d5a79147","06ca6351","14292967",
        "27b70a85","2e1b2138","4d2c6dfc","53380d13","650a7354","766a0abb","81c2c92e","92722c85",
        "a2bfe8a1","a81a664b","c24b8b70","c76c51a3","d192e819","d6990624","f40e3585","106aa070",
        "19a4c116","1e376c08","2748774c","34b0bcb5","391c0cb3","4ed8aa4a","5b9cca4f","682e6ff3",
        "748f82ee","78a5636f","84c87814","8cc70208","90befffa","a4506ceb","bef9a3f7","c67178f2",
    ]

    # Compare
    mismatches = []
    for i, (calc, ref) in enumerate(zip(computed_hex, reference_hex), start=1):
        if calc != ref:
            mismatches.append((i, calc, ref))

    # Return results
    return {
        "computed_hex": computed_hex,
        "reference_hex": reference_hex,
        "mismatches": mismatches,
    }

# Example usage: test cube_root_constants.
print(test_cube_root_constants()) 

{'computed_hex': ['428a2f98', '71374491', 'b5c0fbcf', 'e9b5dba5', '3956c25b', '59f111f1', '923f82a4', 'ab1c5ed5', 'd807aa98', '12835b01', '243185be', '550c7dc3', '72be5d74', '80deb1fe', '9bdc06a7', 'c19bf174', 'e49b69c1', 'efbe4786', '0fc19dc6', '240ca1cc', '2de92c6f', '4a7484aa', '5cb0a9dc', '76f988da', '983e5152', 'a831c66d', 'b00327c8', 'bf597fc7', 'c6e00bf3', 'd5a79147', '06ca6351', '14292967', '27b70a85', '2e1b2138', '4d2c6dfc', '53380d13', '650a7354', '766a0abb', '81c2c92e', '92722c85', 'a2bfe8a1', 'a81a664b', 'c24b8b70', 'c76c51a3', 'd192e819', 'd6990624', 'f40e3585', '106aa070', '19a4c116', '1e376c08', '2748774c', '34b0bcb5', '391c0cb3', '4ed8aa4a', '5b9cca4f', '682e6ff3', '748f82ee', '78a5636f', '84c87814', '8cc70208', '90befffa', 'a4506ceb', 'bef9a3f7', 'c67178f2'], 'reference_hex': ['428a2f98', '71374491', 'b5c0fbcf', 'e9b5dba5', '3956c25b', '59f111f1', '923f82a4', 'ab1c5ed5', 'd807aa98', '12835b01', '243185be', '550c7dc3', '72be5d74', '80deb1fe', '9bdc06a7', 'c19bf174', 'e4

# Problem 3: Padding

## Overview

- Purpose: implement SHA-256 message padding and 512-bit block parsing per FIPS 180-4.
- Key functions: `append_terminator`, `pad_to_block_size`, `append_length`, `block_parse`.
- Workflow: start from a bytes message, append 0x80, zero‑pad until `(len(padded) + 8) % 64 == 0`, then append the 64‑bit big‑endian bit length and yield 64‑byte blocks.
- Notes: Produces blocks ready for the compression function; tests below cover empty input, "abc", and the 56‑byte edge case.

## SHA-256 Message Padding (Section 5.1.1 & 5.2.1)

**Padding rules:**
1. Append a single `1` bit (0x80 byte) immediately after the message
2. Append `k` zero bits where `k` is the smallest non-negative solution to:
   - `(message_length_bits + 1 + k + 64) ≡ 0 (mod 512)`
3. Append the 64-bit big-endian representation of the original message length in bits

**Result:** The padded message length is a multiple of 512 bits (64 bytes).

**Example:** 
- Message `"abc"` (24 bits) → pad with 1 bit, 423 zero bits, then 64-bit length = 512 bits total (1 block)
- Empty message (0 bits) → pad with 1 bit, 447 zero bits, then 64-bit length = 512 bits total (1 block)
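
The number of zero bits in rule 2 follows directly from the congruence: `k = (448 - (message_length_bits + 1)) mod 512`. The sketch below evaluates it for a few lengths, including the 448-bit case where the length field no longer fits in the first block; `zero_pad_bits` is an illustrative helper name.

```python
def zero_pad_bits(message_bits: int) -> int:
    """Smallest k >= 0 with (message_bits + 1 + k + 64) divisible by 512."""
    return (448 - (message_bits + 1)) % 512

for bits in (0, 24, 440, 448):
    k = zero_pad_bits(bits)
    total = bits + 1 + k + 64
    print(f"{bits:4d} message bits -> {k:3d} zero bits, {total // 512} block(s)")
```

For 0 and 24 bits this gives the 447 and 423 zero bits quoted above; a 448-bit (56-byte) message needs 511 zero bits and therefore two blocks.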

## Helper Functions for Padding

Breaking down the padding process into modular steps for clarity and reusability.

## `append_terminator(padded)`

- **Purpose**: Append the '1' bit (0x80 byte) to mark the end of the message.
- **Args**: `padded` (bytearray) — the message being padded.
- **Modifies**: Appends 0x80 in-place.
- **FIPS Ref**: Section 5.1.1, step 1.

In [373]:
def append_terminator(padded):
    """
    Append the '1' bit (0x80 byte) to the message.
    """
    padded.append(0x80)


## `pad_to_block_size(padded)`

- **Purpose**: Append zero bytes until the message is ready for the 64-bit length field.
- **Args**: `padded` (bytearray) — the message with 0x80 already appended.
- **Modifies**: Appends zero bytes until `(len(padded) + 8) % 64 == 0`.
- **FIPS Ref**: Section 5.1.1, step 2.

In [374]:
def pad_to_block_size(padded):
    """
    Append zero bytes until room for the 64-bit length field is available.
    """
    while (len(padded) + 8) % 64 != 0:
        padded.append(0x00)


## `append_length(padded, msg_len_bits)`

- **Purpose**: Append the original message length in bits as a 64-bit big-endian integer.
- **Args**: 
  - `padded` (bytearray) — the padded message.
  - `msg_len_bits` (int) — original message length in bits.
- **Modifies**: Appends 8 bytes (64 bits) representing the message length.
- **FIPS Ref**: Section 5.1.1, step 3.

In [375]:
def append_length(padded, msg_len_bits):
    """
    Append the original message length in bits as a 64-bit big-endian integer.
    """
    padded.extend(msg_len_bits.to_bytes(8, byteorder='big'))

## `block_parse(msg)`

- Purpose: Generator function that parses a message into 512-bit (64-byte) blocks with proper SHA-256 padding.
- Args: `msg` (bytes) — the message to process.
- Yields: Each 512-bit block as a `bytes` object (64 bytes each).
- Padding: Follows FIPS 180-4 Section 5.1.1:
  1. Append 0x80 (a single 1 bit followed by seven 0 bits)
  2. Append zero bytes until `(len(padded) + 8) % 64 == 0`, where `padded` already includes the 0x80 byte
  3. Append original message length in bits as 64-bit big-endian integer
- Example: `list(block_parse(b"abc"))` yields one 64-byte block with proper padding.
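
For byte-oriented messages the three padding steps collapse into a single expression, and the number of blocks can be predicted from the message length alone. The following is a minimal sketch with hypothetical names (`pad_message`, `padded_block_count`); it is not the notebook's `block_parse`, but it should produce the same bytes.

```python
def pad_message(msg: bytes) -> bytes:
    """Return msg followed by 0x80, zero bytes, and the 64-bit big-endian bit length."""
    zeros = (55 - len(msg)) % 64  # makes len(msg) + 1 + zeros + 8 a multiple of 64
    return msg + b"\x80" + bytes(zeros) + (8 * len(msg)).to_bytes(8, "big")

def padded_block_count(msg_len_bytes: int) -> int:
    """Number of 64-byte blocks after padding: at least 9 extra bytes, rounded up."""
    return (msg_len_bytes + 9 + 63) // 64

for n in (0, 3, 55, 56, 64):
    padded = pad_message(b"a" * n)
    print(n, len(padded) // 64, padded_block_count(n))
```

A 55-byte message is the longest that still fits in one block; at 56 bytes the length field spills into a second block.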

In [376]:
def block_parse(msg):
    """
    Generator that yields 512-bit (64-byte) blocks from msg with SHA-256 padding.
    
    Implements FIPS 180-4 Section 5.1.1 (Padding) and 5.2.1 (Parsing).
    Uses helper functions to break padding into clear, modular steps.
    
    Args:
        msg (bytes): The message to process.
        
    Yields:
        bytes: Each 512-bit block (64 bytes).
    """
    if not isinstance(msg, bytes):
        raise TypeError("msg must be a bytes object")
    
    # Calculate original message length in bits
    msg_len_bits = len(msg) * 8
    
    # Start with the original message
    padded = bytearray(msg)
    
    # Apply padding steps
    append_terminator(padded)
    pad_to_block_size(padded)
    append_length(padded, msg_len_bits)
    
    # Yield 512-bit (64-byte) blocks
    for i in range(0, len(padded), 64):
        yield bytes(padded[i:i+64])


## Test Cases

Testing `block_parse()` with various message lengths to verify proper padding and block output.

In [377]:
def test_block_parse():    
    """
    Test the block_parse function with various message lengths.
    """
    # Test 1: Empty message (should produce 1 block)
    print("Test 1: Empty message")
    msg1 = b""
    blocks1 = list(block_parse(msg1))
    print(f"  Message length: {len(msg1)} bytes")
    print(f"  Number of blocks: {len(blocks1)}")
    print(f"  Block size: {len(blocks1[0])} bytes")
    print(f"  First block (hex): {blocks1[0].hex()}")
    print(f"  Last 8 bytes (length field): {blocks1[0][-8:].hex()} = {int.from_bytes(blocks1[0][-8:], 'big')} bits")
    print()

    # Test 2: "abc" (3 bytes, should produce 1 block)
    print("Test 2: Message 'abc'")
    msg2 = b"abc"
    blocks2 = list(block_parse(msg2))
    print(f"  Message length: {len(msg2)} bytes")
    print(f"  Number of blocks: {len(blocks2)}")
    print(f"  Block size: {len(blocks2[0])} bytes")
    print(f"  First 4 bytes: {blocks2[0][:4].hex()} (should be '61626380')")
    print(f"  Last 8 bytes (length field): {blocks2[0][-8:].hex()} = {int.from_bytes(blocks2[0][-8:], 'big')} bits")
    print()

    # Test 3: 56-byte message (should produce 2 blocks - just over the edge)
    print("Test 3: 56-byte message (requires 2 blocks)")
    msg3 = b"a" * 56
    blocks3 = list(block_parse(msg3))
    print(f"  Message length: {len(msg3)} bytes")
    print(f"  Number of blocks: {len(blocks3)}")
    print(f"  Block size of first block: {len(blocks3[0])} bytes")
    print(f"  Last 8 bytes of final block (length field): {blocks3[-1][-8:].hex()} = {int.from_bytes(blocks3[-1][-8:], 'big')} bits")
    print()

    print("All tests completed. Each block should be exactly 64 bytes.")

# Run the test function to validate block parsing.
test_block_parse()

Test 1: Empty message
  Message length: 0 bytes
  Number of blocks: 1
  Block size: 64 bytes
  First block (hex): 80000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
  Last 8 bytes (length field): 0000000000000000 = 0 bits

Test 2: Message 'abc'
  Message length: 3 bytes
  Number of blocks: 1
  Block size: 64 bytes
  First 4 bytes: 61626380 (should be '61626380')
  Last 8 bytes (length field): 0000000000000018 = 24 bits

Test 3: 56-byte message (requires 2 blocks)
  Message length: 56 bytes
  Number of blocks: 2
  Block size of first block: 64 bytes
  Last 8 bytes of final block (length field): 00000000000001c0 = 448 bits

All tests completed. Each block should be exactly 64 bytes.


# Problem 4: Hashes

## Overview

- Purpose: implement the core SHA-256 compression over 512-bit blocks.
- Key functions: `message_schedule`, `hash` (uses `Sigma0/Sigma1`, `sigma0/sigma1`, `Ch`, `Maj`, and K constants).
- Workflow: parse 64-byte blocks from `block_parse`, expand to 64 words (W[0..63]), run 64 rounds with K and W to update `H0..H7` modulo $2^{32}$.
- Notes: Uses NumPy `uint32` arithmetic; a quick test with the message "abc" is provided below.

## `message_schedule(block)` — Message Schedule Expansion

- **Purpose**: Expand a 512-bit (64-byte) message block into 64 32-bit words for SHA-256 compression.
- **Formula**: 
  - $W[0..15]$ = first 16 words parsed from block as big-endian 32-bit integers
  - $W[t]$ = $\sigma_1^{\{256\}}(W[t-2]) + W[t-7] + \sigma_0^{\{256\}}(W[t-15]) + W[t-16]$ (addition modulo $2^{32}$) for $t = 16..63$
- **Args**: `block` (bytes, 64 bytes) — one 512-bit message block
- **Returns**: `list[np.uint32]` of 64 words (W[0] through W[63])
- **FIPS Reference**: Section 6.2.2, step 1 (Prepare the message schedule)
- **Operations**: Uses `sigma0()` and `sigma1()` helper functions
- **Note**: All arithmetic is modulo $2^{32}$ (32-bit unsigned wraparound)
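
The expansion can be checked on the single padded block for `"abc"`. The sketch below is written with plain Python integers and `struct.unpack` instead of NumPy, so the modular wraparound is made explicit with a mask; `schedule` and the sigma helper names are assumptions made for this example.

```python
import struct

MASK32 = 0xFFFFFFFF

def rotr32(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK32

def small_sigma0(x: int) -> int:
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)

def small_sigma1(x: int) -> int:
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)

def schedule(block: bytes) -> list[int]:
    """Expand one 64-byte block into the 64-word message schedule."""
    w = list(struct.unpack(">16I", block))  # sixteen big-endian 32-bit words
    for t in range(16, 64):
        total = small_sigma1(w[t - 2]) + w[t - 7] + small_sigma0(w[t - 15]) + w[t - 16]
        w.append(total & MASK32)  # addition modulo 2**32
    return w

# Padded block for b"abc": message, 0x80, 52 zero bytes, 64-bit length (24 bits).
block = b"abc" + b"\x80" + bytes(52) + (24).to_bytes(8, "big")
w = schedule(block)
print(f"{w[0]:08x} {w[15]:08x} {w[16]:08x} {w[17]:08x}")
# 61626380 00000018 61626380 000f0000
```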

In [378]:
# Expand one 64-byte block into the 64-word message schedule W[0..63].
def message_schedule(block):
    """
    Expand a 512-bit message block into 64 32-bit words.
    
    Args:
        block: 64-byte message block (512 bits)
    
    Returns:
        List of 64 32-bit words (W[0] to W[63])
    
    From FIPS 180-4 Section 6.2.2:
    - W[0..15]: First 16 words (message block parsed as big-endian 32-bit words)
    - W[16..63]: Computed using sigma0 and sigma1 functions
    """
    # Parse block into 16 32-bit words (big-endian)
    W = []
    for i in range(16):
        word = int.from_bytes(block[i*4:(i+1)*4], byteorder='big')
        W.append(np.uint32(word))
    
    # Expand to 64 words
    for t in range(16, 64):
        s0 = sigma0(W[t-15])
        s1 = sigma1(W[t-2])
        # uint32 addition wraps modulo 2**32, so no extra mask is needed.
        W.append(np.uint32(W[t-16] + s0 + W[t-7] + s1))
    
    return W

## `hash(current, block)` — SHA-256 Compression Function

- **Purpose**: Update SHA-256 hash state by processing one 512-bit message block through 64 compression rounds.
- **Input**: 
  - `current` (list of 8 `np.uint32`) — current hash state $(H_0, H_1, ..., H_7)$
  - `block` (bytes, 64 bytes) — one 512-bit message block
- **Output**: `list[np.uint32]` of 8 updated hash words $(H'_0, H'_1, ..., H'_7)$
- **Algorithm** (FIPS 180-4 Section 6.2.2):
  1. Initialize working variables $(a, b, c, d, e, f, g, h)$ from `current` hash
  2. Expand block into 64 words using `message_schedule(block)`
  3. For each of 64 rounds $t = 0..63$:
     - $T_1 = h + \Sigma_1^{\{256\}}(e) + \text{Ch}(e, f, g) + K_t + W_t$
     - $T_2 = \Sigma_0^{\{256\}}(a) + \text{Maj}(a, b, c)$
     - Shift variables: $(h, g, f, e, d, c, b, a) \leftarrow (g, f, e, d+T_1, c, b, a, T_1+T_2)$
  4. Add working variables back to hash: $H_i' = H_i + v_i$ (mod $2^{32}$)
- **Helper Functions**: Uses `message_schedule()`, `Sigma0()`, `Sigma1()`, `Ch()`, `Maj()`, and cube root constants
- **Note**: All arithmetic modulo $2^{32}$; the function is deterministic and does not modify `current`, so each call depends only on its two arguments
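
The four algorithm steps can be cross-checked against Python's standard `hashlib`. The sketch below is an independent reimplementation on plain Python integers with explicit masking, not the notebook's NumPy code; every name in it (`compress`, `sha256_sketch`, `icbrt`, and so on) is an assumption made for this example, and the constants are derived with exact integer roots.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF

def rotr32(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32

def icbrt(n):
    """Largest integer r with r**3 <= n (binary search)."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid)
    return lo

# First 64 primes (2..311), then K from cube roots and H from square roots.
PRIMES = [p for p in range(2, 312) if all(p % d for d in range(2, math.isqrt(p) + 1))]
K = [icbrt(p << 96) & MASK32 for p in PRIMES]
H_INIT = [math.isqrt(p << 64) & MASK32 for p in PRIMES[:8]]

def compress(state, block):
    """One application of the compression function to a 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr32(w[t - 15], 7) ^ rotr32(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr32(w[t - 2], 17) ^ rotr32(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK32)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK32
        big_s0 = rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK32
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK32, c, b, a, (t1 + t2) & MASK32
    return [(s + v) & MASK32 for s, v in zip(state, (a, b, c, d, e, f, g, h))]

def sha256_sketch(msg):
    """Pad, split into blocks, and chain the state through compress()."""
    padded = msg + b"\x80" + bytes((55 - len(msg)) % 64) + (8 * len(msg)).to_bytes(8, "big")
    state = H_INIT
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return "".join(f"{word:08x}" for word in state)

print(sha256_sketch(b"abc") == hashlib.sha256(b"abc").hexdigest())  # True
```

The state returned for one block is the input state for the next, which is why a multi-block message such as 56 bytes of `"a"` is a useful second test.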

In [379]:
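# SHA-256 compression function: fold one 512-bit block into the current hash state.
# Note: this name shadows Python's built-in hash() for the rest of the notebook.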
def hash(current, block):
    """
    Compute the next SHA-256 hash value.
    
    From FIPS 180-4 Section 6.2.2 - SHA-256 Hash Computation:
    - Takes current 256-bit hash (8 words) and a 512-bit message block
    - Performs 64 rounds of compression function
    - Returns new 256-bit hash (8 words)
    
    Args:
        current: Tuple/list of 8 32-bit integers (H0-H7) representing current hash
        block: 64-byte message block (512 bits)
    
    Returns:
        List of 8 np.uint32 words representing the updated hash
    """
    # Initialize working variables from current hash
    a, b, c, d, e, f, g, h = [np.uint32(x) for x in current]
    
    # Get K constants (first 64 constants from cube roots of first 64 primes)
    K_hex = cube_root_constants()

    # Convert hex constants to 32-bit unsigned integers
    K = [np.uint32(int(k, 16)) for k in K_hex]
    
    # Get message schedule (64 words from block)
    W = message_schedule(block)
    
    # 64 compression rounds
    for t in range(64):
        T1 = np.uint32(h + Sigma1(e) + Ch(e, f, g) + K[t] + W[t])
        T2 = np.uint32(Sigma0(a) + Maj(a, b, c))
        h = g
        g = f
        f = e
        e = np.uint32(d + T1)
        d = c
        c = b
        b = a
        a = np.uint32(T1 + T2)
    
    # Add compressed chunk to current hash values
    H = [
        np.uint32(current[0] + a),
        np.uint32(current[1] + b),
        np.uint32(current[2] + c),
        np.uint32(current[3] + d),
        np.uint32(current[4] + e),
        np.uint32(current[5] + f),
        np.uint32(current[6] + g),
        np.uint32(current[7] + h),
    ]
    
    return H

In [380]:
def test_hash():
    """
    Test the SHA-256 hash function implementation with a known input.
    """
    # Test usage of the hash function
    print("\nTesting the hash function with the message 'abc':")
    test_msg = b'abc'
    padded_msg = block_parse(test_msg)

    # padded_msg is a generator of 64-byte blocks — consume it directly
    blocks = list(padded_msg)

    # Initial hash values (H0-H7) from FIPS 180-4 Section 5.3.3
    initial_hash = [ 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 ]

    # Process each block, chaining the state from one block to the next
    current_hash = initial_hash
    for block in blocks:
        # Update hash with current block
        current_hash = hash(current_hash, block)

    # Format final hash as hexadecimal strings
    hash_hex = [f"0x{value:08x}" for value in current_hash]
    print(f"Computed hash for 'abc': {hash_hex}")

# Run the hash test
test_hash()


Testing the hash function with the message 'abc':
Computed hash for 'abc': ['0xba7816bf', '0x8f01cfea', '0x414140de', '0x5dae2223', '0xb00361a3', '0x96177a9c', '0xb410ff61', '0xf20015ad']


# Problem 5: Passwords


## Overview

- Purpose: verify and (optionally) recover plaintext passwords from given SHA-256 hashes using the notebook's own NumPy-based implementation.
- Key functions: `sha256_hex`, candidate wordlist, and a matcher loop; relies on `block_parse` and `hash` from Problem 4.
- Workflow: compute SHA-256 for each candidate, compare to targets, collect matches; expand by supplying larger wordlists when needed.
- Notes: Hex digests are case-insensitive in matching here; constrain searches to avoid long runs. A quick check with "password" is provided.
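
The workflow above amounts to a dictionary lookup. As a minimal sketch, the matcher below uses the standard library's `hashlib` in place of the notebook's own SHA-256 so that it runs on its own; the function name `match_candidates` and the short candidate list are assumptions made for this example.

```python
import hashlib

def match_candidates(targets, candidates):
    """Return {digest: word} for every candidate whose SHA-256 digest is a target."""
    wanted = {t.lower() for t in targets}  # compare hex digests case-insensitively
    found = {}
    for word in candidates:
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        if digest in wanted:
            found[digest] = word
    return found

targets = ["5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"]
found = match_candidates(targets, ["letmein", "password", "qwerty"])
print(found)
```

Hashing each candidate once and testing membership in a set keeps the cost proportional to the wordlist size, regardless of how many target hashes there are.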

In [381]:
# List of SHA-256 hashed passwords.
hashed_passwords = [
    "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8",
    "873ac9ffea4dd04fa719e8920cd6938f0c23cd678af330939cff53c3d2855f34",
    "b03ddf3ca2e714a6548e7495e2a03f5e824eaac9837cd7f159c67b90fb4b7342",
]

In [382]:
# Common weak passwords to test against.
passwords = [
    "password","baseball","dragon","cheese","P@ssw0rd","123456","qwerty","letmein",
    "monkey","football","iloveyou","admin","welcome","login","abc123","starwars",
    "trustno1","sunshine","master","hello","freedom","whatever","qazwsx","654321",
    "superman","1q2w3e4r","password1","zaq1zaq1","password123", "123qwe","12345",
    "123456789","12345678", "1234","123","111111","000000","123321","121212",
    "password!","passw0rd","letmein123","football1","baseball1","welcome1",
    "admin123", "qwerty123","dragon123","iloveyou1","trustno1!","sunshine1",
]

## `sha256_hex(msg_bytes)`

- **Purpose**: Compute the SHA-256 digest of `msg_bytes` using the notebook's from-scratch implementation.
- **Args**: `msg_bytes` (bytes) — message to hash.
- **Returns**: Lowercase hexadecimal string (64 hex characters) representing the 256-bit digest.
- **Notes**: This function uses `block_parse()` and `hash()` defined earlier in the notebook, and is intended for direct comparison with values in `hashed_passwords`.

In [383]:
# SHA-256 wrapper built on block_parse() and hash() from Problem 4.
def sha256_hex(msg_bytes: bytes) -> str:
    '''
    Compute SHA-256 hash of msg_bytes and return as hex string.
    1. Initialize hash values (first 32 bits of the fractional parts of the square roots of the first 8 primes).
    2. Process each 512-bit block of the message.
    3. Produce the final hash value (concatenate H0-H7).
    '''
    # Initial hash values H0-H7 (FIPS 180-4 Section 5.3.3).
    state = [0x6a09e667,0xbb67ae85,0x3c6ef372,0xa54ff53a,0x510e527f,0x9b05688c,0x1f83d9ab,0x5be0cd19]
    # Process each 512-bit block, feeding the running state forward
    for block in block_parse(msg_bytes):
        # Update hash with current block
        state = hash(state, block)
    # Format final hash as concatenated, zero-padded hex string
    return ''.join(f"{int(x):08x}" for x in state)
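
The `08x` width in the final join matters: each 32-bit word must contribute exactly eight hex characters, otherwise a word with leading zero bits would shorten the digest. A small sketch with made-up word values (not a real hash state) illustrates the difference:

```python
# Eight hypothetical 32-bit words; the second one has leading zero bits.
words = [0x6a09e667, 0x0000beef, 0x3c6ef372, 0xa54ff53a,
         0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]

padded = ''.join(f"{w:08x}" for w in words)    # always 8 hex chars per word
unpadded = ''.join(f"{w:x}" for w in words)    # drops leading zeros

print(len(padded), len(unpadded))
```

Only the padded form is guaranteed to be 64 characters long and directly comparable with the strings in `hashed_passwords`.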

In [384]:
# Attempt to crack the given SHA-256 hashed passwords using a small candidate list.
def crack_passwords(hashed_passwords):
    '''
    Attempt to crack the given SHA-256 hashed passwords using a small candidate list.
    Prints found passwords and their hashes.
    '''    
    # Use the notebook's SHA-256 helpers to hash each candidate in the global `passwords` list.
    # Lowercase the targets so that matching ignores hex case.
    targets = set(h.lower() for h in hashed_passwords)

    found = {}
    # Check each candidate password
    for p in passwords:
        # Compute SHA-256 hash
        h = sha256_hex(p.encode("utf-8"))
        # Check if hash matches any target
        if h in targets:
            found[h] = p
            print("FOUND:", p, "->", h)

    # Report results
    if not found:
        print("No matches in candidate list. To search more, provide a larger wordlist.")
    else:
        print("Matches:", found)

crack_passwords(hashed_passwords)

FOUND: password -> 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
FOUND: cheese -> 873ac9ffea4dd04fa719e8920cd6938f0c23cd678af330939cff53c3d2855f34
FOUND: P@ssw0rd -> b03ddf3ca2e714a6548e7495e2a03f5e824eaac9837cd7f159c67b90fb4b7342


Matches: {'5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8': 'password', '873ac9ffea4dd04fa719e8920cd6938f0c23cd678af330939cff53c3d2855f34': 'cheese', 'b03ddf3ca2e714a6548e7495e2a03f5e824eaac9837cd7f159c67b90fb4b7342': 'P@ssw0rd'}


## Cracked Passwords

- Hash: 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8 → plaintext: **password**.
- Hash: 873ac9ffea4dd04fa719e8920cd6938f0c23cd678af330939cff53c3d2855f34 → plaintext: **cheese**.
- Hash: b03ddf3ca2e714a6548e7495e2a03f5e824eaac9837cd7f159c67b90fb4b7342 → plaintext: **P@ssw0rd**.

### How they were found
1. Loaded the provided SHA-256 hashes into `hashed_passwords`.
2. Used the notebook’s from-scratch SHA-256 implementation (`block_parse`, `hash`, `sha256_hex`).
3. Ran a small dictionary ("password", "baseball", "dragon", "cheese", "P@ssw0rd", etc.) through `crack_passwords`, which hashes each candidate and compares it against the targets.
4. Printed a match whenever a candidate’s digest equaled a target hash, revealing the plaintexts above.
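
The same steps can be cross-checked against the standard library. The sketch below swaps the notebook's `sha256_hex` for `hashlib.sha256` and uses a shortened candidate list; `dictionary_attack` is a hypothetical name, not a function from the notebook.

```python
import hashlib

targets = {
    "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8",
    "873ac9ffea4dd04fa719e8920cd6938f0c23cd678af330939cff53c3d2855f34",
    "b03ddf3ca2e714a6548e7495e2a03f5e824eaac9837cd7f159c67b90fb4b7342",
}
candidates = ["password", "baseball", "dragon", "cheese", "P@ssw0rd"]

def dictionary_attack(targets, candidates):
    """Map each matched target digest to the candidate that produced it."""
    found = {}
    for candidate in candidates:
        digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
        if digest in targets:
            found[digest] = candidate
    return found

matches = dictionary_attack(targets, candidates)
for digest, plaintext in matches.items():
    print(plaintext, "->", digest)
```

Agreement between this output and the notebook's `FOUND:` lines is a quick sanity check that the from-scratch implementation hashes these inputs correctly.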

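This tool was helpful for cross-checking: https://10015.io/tools/sha256-encrypt-decrypt (enter a plaintext to obtain its SHA-256 hex digest, or enter a hash to check it against a plaintext). Despite the tool's name, SHA-256 is a one-way hash, not encryption, so a digest can only be checked against guesses, not decrypted.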

## Security Improvements for Password Hashing

The attack performed above is a **dictionary attack** — testing common passwords and comparing their SHA-256 hashes to the targets. This is effective because:

- **SHA-256 is fast**: Designed for data integrity, not password storage. An attacker can test millions of candidates per second.
- **No salt**: The same plaintext always produces the same hash, enabling rainbow table attacks and making it easy to identify identical passwords.
- **No cost function**: There is no intentional delay per hash attempt; the computation is optimized for speed rather than resistance.
- **Precomputable**: Because the unsalted hash depends only on the password, an attacker can hash a wordlist once and reuse the results against every system that stores bare SHA-256.
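
A minimal sketch of the "no salt" and precomputation points, using `hashlib` and a made-up leaked table (the user names, wordlist, and the helper `sha256_hex_ref` are all hypothetical):

```python
import hashlib

def sha256_hex_ref(text: str) -> str:
    """Unsalted SHA-256 of a UTF-8 string, as lowercase hex (hashlib-based)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Precompute once: digest -> plaintext for a (tiny) wordlist.
wordlist = ["password", "cheese", "dragon", "letmein"]
lookup = {sha256_hex_ref(word): word for word in wordlist}

# Hypothetical leaked table: two users happen to share a password.
leaked = {
    "alice": sha256_hex_ref("cheese"),
    "bob": sha256_hex_ref("cheese"),
    "carol": sha256_hex_ref("correct horse battery staple"),
}

# Identical passwords show up as identical hashes.
print(leaked["alice"] == leaked["bob"])

# Each recovery is a dictionary lookup; no hashing happens at attack time.
recovered = {user: lookup.get(digest) for user, digest in leaked.items()}
print(recovered)
```

Passwords outside the precomputed wordlist (here, carol's) are not recovered, which is why wordlist size drives the success rate of this kind of attack.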

### Recommended Security Improvements

1. **Use a Cryptographic Salt** (16+ bytes of random data per password)
   - Store: `hash = SHA-256(salt || password)`
   - Benefit: Forces attackers to recompute every candidate for each salt, which defeats precomputed (rainbow) tables. Salting alone does not make each guess slower.
   - Storage: Keep the salt alongside the hash (the salt does not need to be secret).

2. **Switch to Purpose-Built Key Derivation Functions (KDFs)**
   - **bcrypt**: Includes salt, adjustable cost factor (rounds), and intentional slowness.
   - **scrypt**: Memory-hard; resistant to GPU/ASIC attacks due to high memory consumption.
   - **argon2**: Winner of the 2015 Password Hashing Competition, specified in RFC 9106; tunable time, memory, and parallelism.
   - **PBKDF2**: Iterative hashing with a configurable iteration count (hundreds of thousands in current guidance).

3. **Apply a Work Factor / Cost Parameter**
   - Make each hash computation take 0.1–1 second (not milliseconds).
   - Scales attacker cost in proportion to the work factor: at 0.1 s per hash, one million guesses cost about 28 CPU-hours, compared with roughly a second for bare SHA-256 at a million hashes per second.
   - Adjust as hardware improves to maintain resistance.

4. **Use High Iteration Counts for PBKDF2**
   - The OWASP Password Storage Cheat Sheet currently recommends 600,000 iterations for PBKDF2-HMAC-SHA256.
   - The time required per candidate grows linearly with the iteration count.

5. **Never Reuse Salts**
   - Generate a unique random salt for every password.
   - Store the salt in plaintext alongside the hash.

6. **Increase Salt Length**
   - Use at least 16 bytes (128 bits) of cryptographically random salt.
   - Larger salts make precomputation impractical.
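
The salt and iteration-count recommendations above can be sketched with the standard library alone. This is one possible arrangement, not a production design: `hash_password`, `verify_password`, and the `ITERATIONS` value are assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; tune to your hardware

def hash_password(password: str, iterations: int = ITERATIONS):
    """Return (salt, iterations, derived_key) for storage."""
    salt = secrets.token_bytes(16)  # unique 128-bit random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key

def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(key, expected)

record_a = hash_password("cheese")
record_b = hash_password("cheese")

print(record_a[2] != record_b[2])            # same password, different salts
print(verify_password("cheese", *record_a))  # correct password
print(verify_password("dragon", *record_a))  # wrong password
```

Storing the iteration count with each record makes it possible to raise the work factor later and rehash a password the next time its owner logs in.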

### Example: Using bcrypt (Python)
```python
import bcrypt

# Hashing a password
password = b"cheese"
salt = bcrypt.gensalt(rounds=12)  # Cost factor 12
hashed = bcrypt.hashpw(password, salt)

# Verifying a password
is_correct = bcrypt.checkpw(password, hashed)
```

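**Why this is secure**: The stored bcrypt string embeds the salt and cost factor, so an attacker who obtains it must pay the full bcrypt cost for every guess, and the per-password salt rules out precomputed tables. A weak password such as "cheese" can still be guessed; the cost factor only slows the search.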

### Summary Table

| Method                   | Speed          | Salt?          | Cost Factor? | Notes                                              |
|--------------------------|----------------|----------------|--------------|----------------------------------------------------|
| SHA-256 (bare)           | Very fast      | No             | No           | Vulnerable to dictionary attacks                   |
| PBKDF2 (600k iterations) | Slow (tunable) | Yes            | Yes          | Acceptable; widely supported                       |
| bcrypt                   | Slow (tunable) | Yes (built in) | Yes          | Strong; recommended for most apps                  |
| scrypt                   | Slow (tunable) | Yes            | Yes          | Memory-hard; resists GPU/ASIC attacks              |
| argon2                   | Slow (tunable) | Yes            | Yes          | Modern; Password Hashing Competition winner (2015) |

For production systems, **bcrypt** or **argon2** are preferred over bare SHA-256.
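
Of the methods in the table, scrypt (like PBKDF2) is available in the standard library, so it can be tried without installing anything. The sketch below assumes a Python build linked against OpenSSL 1.1 or newer; the parameter values and the helper name `scrypt_hash` are illustrative assumptions, not recommendations from the notebook.

```python
import hashlib
import hmac
import secrets

# Assumed parameters: n=2**14, r=8, p=1 needs about 16 MiB of memory per hash.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024, "dklen": 32}

def scrypt_hash(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the password with the memory-hard scrypt KDF."""
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)

salt = secrets.token_bytes(16)          # unique random salt per password
stored = scrypt_hash("cheese", salt)

# Verification recomputes the key with the stored salt and compares in constant time.
print(hmac.compare_digest(stored, scrypt_hash("cheese", salt)))
print(hmac.compare_digest(stored, scrypt_hash("dragon", salt)))
```

Raising `n` increases both time and memory per guess, which is what makes large-scale GPU or ASIC guessing expensive.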


## Reference
- Secure Hash Standard (FIPS 180-4): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf
- UTF-8: https://en.wikipedia.org/wiki/UTF-8
- Bytes Objects: https://realpython.com/python-bytes/
- Generators: https://realpython.com/introduction-to-python-generators/
- OWASP Password Storage Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html
- Prime numbers: https://www.geeksforgeeks.org/python/python-program-to-print-all-prime-numbers-in-an-interval/
- NumPy Bitwise Operations: https://numpy.org/doc/stable/reference/routines.bitwise.html
- BCrypt: https://pypi.org/project/bcrypt/
- Argon2 (Password Hashing Competition winner, reference implementation): https://github.com/P-H-C/phc-winner-argon2