## Problem 1: Binary Words and Operations

This problem asks us to implement several low-level bitwise functions used in SHA-256.  
They are defined in the **Secure Hash Standard (FIPS 180-4)**.  
All operations must be performed on **32-bit unsigned integers**, so I use `numpy.uint32`.

The functions implemented are:

- Parity(x, y, z)
- Ch(x, y, z)
- Maj(x, y, z)
- Σ0(x), Σ1(x)
- σ0(x), σ1(x)

Each function is introduced in its own section below.


In [1]:
import numpy as np  # For 32-bit unsigned integers (np.uint32)


def rotr(x, n):
    """
    Rotate a 32-bit value x to the right by n bits.
    Used in almost all SHA-256 logical functions.
    """
    x = np.uint32(x)
    n = n % 32
    return np.uint32((x >> n) | (x << (32 - n)))


def shr(x, n):
    """
    Logical right shift of x by n bits.
    Unlike rotation, bits shifted off the right are discarded.
    """
    x = np.uint32(x)
    return np.uint32(x >> n)
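As an independent cross-check of the two helpers, here is a minimal sketch that uses plain Python ints masked to 32 bits instead of `np.uint32`. The names `rotr_int` and `shr_int` are hypothetical and are not used elsewhere in this notebook. It shows that rotating by `n` and then by `32 - n` restores the original word, while a shift discards bits for good.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def rotr_int(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits using plain Python ints."""
    x &= MASK32
    n %= 32
    if n == 0:
        return x  # avoid shifting left by the full word width
    return ((x >> n) | (x << (32 - n))) & MASK32


def shr_int(x: int, n: int) -> int:
    """Logical right shift: bits moved off the right end are lost."""
    return (x & MASK32) >> n


word = 0x12345678
# Rotating right by 8 and then by 24 is a full turn, so nothing is lost.
print(hex(rotr_int(rotr_int(word, 8), 24)))  # 0x12345678
# Rotation moves the low byte to the top; the shift simply drops it.
print(hex(rotr_int(word, 8)))  # 0x78123456
print(hex(shr_int(word, 8)))   # 0x123456
```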


## Problem 1A: Parity(x, y, z)

**Goal:** Write a function that returns the XOR of three 32-bit numbers.

From the standard (FIPS 180-4):  
**Parity(x, y, z) = x ⊕ y ⊕ z**

This function is very simple: XOR all three values together.  
We ensure everything is cast to `np.uint32` so the result stays 32-bit.


In [2]:
def Parity(x, y, z):
    """
    SHA-256 parity function: XOR all three 32-bit words together.
    Each result bit is 1 when an odd number of the input bits are 1.
    """
    x = np.uint32(x)
    y = np.uint32(y)
    z = np.uint32(z)
    return np.uint32(x ^ y ^ z)


## Problem 1B: Ch(x, y, z)

**Goal:** Implement the SHA-256 “choose” function.

From the standard:  
**Ch(x, y, z) = (x AND y) XOR (~x AND z)**

Interpretation:  
- If a bit of **x** is 1 → choose bit from **y**  
- If a bit of **x** is 0 → choose bit from **z**  
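The bit-selection reading above can be checked directly. This sketch uses plain Python ints, and `ch_formula` and `ch_bitwise` are hypothetical helper names: it builds the result one bit at a time, choosing from `y` or `z` according to the bit of `x`, and compares it with the formula from the standard on random words.

```python
import random

MASK32 = 0xFFFFFFFF


def ch_formula(x: int, y: int, z: int) -> int:
    """Ch from FIPS 180-4, with ~x masked back to 32 bits."""
    return ((x & y) ^ (~x & MASK32 & z)) & MASK32


def ch_bitwise(x: int, y: int, z: int) -> int:
    """Literal 'choose' rule: bit i comes from y if bit i of x is 1, else from z."""
    result = 0
    for i in range(32):
        source = y if (x >> i) & 1 else z
        result |= ((source >> i) & 1) << i
    return result


rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    x, y, z = (rng.getrandbits(32) for _ in range(3))
    assert ch_formula(x, y, z) == ch_bitwise(x, y, z)

print(hex(ch_formula(0x0F0F0F0F, 0x33333333, 0xAAAAAAAA)))  # 0xa3a3a3a3
```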


In [3]:
def Ch(x, y, z):
    """
    SHA-256 choice function:
    For each bit, choose from y if x's bit is 1, else from z.
    """
    x = np.uint32(x)
    y = np.uint32(y)
    z = np.uint32(z)
    return np.uint32((x & y) ^ (~x & z))


## Problem 1C: Maj(x, y, z)

**Goal:** Implement the SHA-256 majority function.

From the standard:  
**Maj(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z)**

Interpretation:  
Bit becomes **1** if *at least two* of x, y, z have a 1 in that position.
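The XOR of the three pairwise ANDs is a compact way to write a per-bit vote. This sketch uses plain Python ints, and `maj_formula` and `maj_vote` are hypothetical names: it counts the votes bit by bit and confirms that the result agrees with the formula, and with the equivalent OR-of-ANDs form, on random 32-bit words.

```python
import random


def maj_formula(x: int, y: int, z: int) -> int:
    """Maj as written in FIPS 180-4: XOR of the three pairwise ANDs."""
    return (x & y) ^ (x & z) ^ (y & z)


def maj_vote(x: int, y: int, z: int) -> int:
    """Per-bit majority vote: bit i is 1 when at least two inputs have bit i set."""
    result = 0
    for i in range(32):
        votes = ((x >> i) & 1) + ((y >> i) & 1) + ((z >> i) & 1)
        if votes >= 2:
            result |= 1 << i
    return result


rng = random.Random(1)  # fixed seed so the check is repeatable
for _ in range(1000):
    x, y, z = (rng.getrandbits(32) for _ in range(3))
    assert maj_formula(x, y, z) == maj_vote(x, y, z)
    # An equivalent OR-based form of the same majority function.
    assert maj_formula(x, y, z) == ((x & y) | (x & z) | (y & z))

print(hex(maj_formula(0x0F0F0F0F, 0x33333333, 0xAAAAAAAA)))  # 0x2b2b2b2b
```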


In [4]:
def Maj(x, y, z):
    """
    SHA-256 majority function.
    Bit is 1 if two or more of (x, y, z) have bit 1.
    """
    x = np.uint32(x)
    y = np.uint32(y)
    z = np.uint32(z)
    return np.uint32((x & y) ^ (x & z) ^ (y & z))


## Problem 1D: Σ0(x) and Σ1(x)

**Goal:** Implement the two uppercase sigma functions from the standard.  

These use *rotations*, not shifts.

Definitions (FIPS 180-4):

- **Σ0(x) = ROTR²(x) ⊕ ROTR¹³(x) ⊕ ROTR²²(x)**
- **Σ1(x) = ROTR⁶(x) ⊕ ROTR¹¹(x) ⊕ ROTR²⁵(x)**

These are used in the SHA-256 compression function.
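To see what the rotation amounts do, this sketch (plain Python ints; `big_sigma0` and `big_sigma1` are hypothetical names) applies Σ0 and Σ1 to the word `1`, where each rotation just moves the single set bit, so the output can be read off the definition. It also checks a property that follows from using only rotations and XOR: both functions are linear over XOR, so Σ(x ⊕ y) = Σ(x) ⊕ Σ(y).

```python
import random

MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """32-bit right rotation for 0 < n < 32."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def big_sigma0(x: int) -> int:
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x: int) -> int:
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


# ROTR^n(1) = 2**(32 - n), so each output bit follows from the definition.
assert big_sigma0(1) == (1 << 30) | (1 << 19) | (1 << 10)  # 0x40080400
assert big_sigma1(1) == (1 << 26) | (1 << 21) | (1 << 7)   # 0x04200080

# Rotations and XOR are both linear over XOR.
rng = random.Random(2)
for _ in range(1000):
    x, y = rng.getrandbits(32), rng.getrandbits(32)
    assert big_sigma0(x ^ y) == big_sigma0(x) ^ big_sigma0(y)
    assert big_sigma1(x ^ y) == big_sigma1(x) ^ big_sigma1(y)

print(hex(big_sigma0(1)), hex(big_sigma1(1)))
```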


In [5]:
def Sigma0(x):
    """
    Σ0(x) = ROTR^2(x) ^ ROTR^13(x) ^ ROTR^22(x)
    """
    x = np.uint32(x)
    return np.uint32(rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22))


def Sigma1(x):
    """
    Σ1(x) = ROTR^6(x) ^ ROTR^11(x) ^ ROTR^25(x)
    """
    x = np.uint32(x)
    return np.uint32(rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25))


## Problem 1E: σ0(x) and σ1(x)

**Goal:** Implement the lowercase sigma functions used in the message schedule.  

These mix **rotation** and **logical shift**.

Definitions (FIPS 180-4):

- **σ0(x) = ROTR⁷(x) ⊕ ROTR¹⁸(x) ⊕ SHR³(x)**  
- **σ1(x) = ROTR¹⁷(x) ⊕ ROTR¹⁹(x) ⊕ SHR¹⁰(x)**  

These are used when generating the message schedule array `W[t]`.
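The SHR term is what distinguishes these from the uppercase functions: bits shifted out are discarded instead of wrapping around. This sketch (plain Python ints; `small_sigma0` and `small_sigma1` are hypothetical names) shows that for the word `1` the shift term contributes nothing, whereas for the top bit `0x80000000` it contributes a bit of its own.

```python
MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """32-bit right rotation for 0 < n < 32."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


# For x = 1, SHR^3 and SHR^10 push the only set bit out entirely.
assert small_sigma0(1) == (1 << 25) | (1 << 14)  # 0x02004000
assert small_sigma1(1) == (1 << 15) | (1 << 13)  # 0x0000a000

# For the top bit, SHR^3 keeps it (moved down to bit 28) instead of dropping it.
top = 0x80000000
assert small_sigma0(top) == (1 << 24) | (1 << 13) | (1 << 28)

print(hex(small_sigma0(1)), hex(small_sigma1(1)), hex(small_sigma0(top)))
```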

In [6]:
def sigma0(x):
    """
    σ0(x) = ROTR^7(x) ^ ROTR^18(x) ^ SHR^3(x)
    """
    x = np.uint32(x)
    return np.uint32(rotr(x, 7) ^ rotr(x, 18) ^ shr(x, 3))


def sigma1(x):
    """
    σ1(x) = ROTR^17(x) ^ ROTR^19(x) ^ SHR^10(x)
    """
    x = np.uint32(x)
    return np.uint32(rotr(x, 17) ^ rotr(x, 19) ^ shr(x, 10))


## Problem 1F: Testing all functions

To check the functions, I run them on a few fixed 32-bit example words.  
I print the results in hexadecimal because the repeating byte patterns make them easy to check by hand (for example, for Parity every byte is 0x0F ⊕ 0x33 ⊕ 0xAA = 0x96).

In [7]:
a = np.uint32(0x0F0F0F0F)
b = np.uint32(0x33333333)
c = np.uint32(0xAAAAAAAA)

print("Parity =", hex(Parity(a, b, c)))
print("Ch     =", hex(Ch(a, b, c)))
print("Maj    =", hex(Maj(a, b, c)))

x = np.uint32(0x12345678)

print("\nSigma0 =", hex(Sigma0(x)))
print("Sigma1 =", hex(Sigma1(x)))
print("sigma0 =", hex(sigma0(x)))
print("sigma1 =", hex(sigma1(x)))


Parity = 0x96969696
Ch     = 0xa3a3a3a3
Maj    = 0x2b2b2b2b

Sigma0 = 0x66146474
Sigma1 = 0x3561abda
sigma0 = 0xe7fce6ee
sigma1 = 0xa1f78649


## Problem 2: Fractional Parts of Cube Roots

**Goal:** Recreate the SHA-256 constants `K[0..63]` from the Secure Hash Standard (FIPS 180-4).

According to the standard, each constant is:

> "the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers."

So the plan is:

1. Write a function `primes(n)` that returns the first `n` prime numbers.
2. Use NumPy (`np.cbrt`) to calculate the cube root of each prime.  
   See: [NumPy cbrt docs](https://numpy.org/doc/stable/reference/generated/numpy.cbrt.html)
3. Extract the **fractional part** of each cube root.
4. Multiply the fractional part by \( 2^{32} \), take the integer part, and store it as a 32-bit value (`np.uint32`).
5. Display the result in **hexadecimal** and compare manually with the constants listed in FIPS 180-4.


In [8]:
import numpy as np  # For 32-bit unsigned integers (np.uint32)

def primes(n):
    """
    Return a list of the first n prime numbers.

    I use a simple trial division approach:
    - Start at 2.
    - For each candidate, test divisibility by earlier primes.
    - Only test up to the square root of the candidate.
    """
    # If n is zero or negative, just return an empty list.
    if n <= 0:
        return []

    result = []      # list to store primes
    candidate = 2    # first number to test for primality

    # Keep going until we have n primes.
    while len(result) < n:
        is_prime = True  # assume candidate is prime until shown otherwise

        # Check divisibility by previously found primes.
        for p in result:
            # If p^2 is greater than candidate, we can stop checking.
            if p * p > candidate:
                break
            # If candidate is divisible by p, it is not prime.
            if candidate % p == 0:
                is_prime = False
                break

        # If no divisors were found, we have a new prime.
        if is_prime:
            result.append(candidate)

        # Move on to the next integer.
        candidate += 1

    return result
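Trial division is plenty fast for 64 primes. As an alternative sketch, a sieve of Eratosthenes needs an upper bound on the n-th prime in advance; here I assume the classical bound p_n < n(ln n + ln ln n), which holds for n ≥ 6, and use a small fixed limit below that. The name `primes_sieve` is hypothetical.

```python
import math


def primes_sieve(n: int) -> list[int]:
    """Return the first n primes using a sieve of Eratosthenes."""
    if n <= 0:
        return []
    # Upper bound on the n-th prime (n(ln n + ln ln n) for n >= 6; 15 covers n < 6).
    limit = 15 if n < 6 else int(n * (math.log(n) + math.log(math.log(n)))) + 1
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Mark multiples of i, starting at i*i, as composite.
            for multiple in range(i * i, limit + 1, i):
                is_prime[multiple] = False
    found = [i for i, flag in enumerate(is_prime) if flag]
    return found[:n]


print(primes_sieve(10))    # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(primes_sieve(64)[-1])  # 311
```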


### Problem 2A: Testing the `primes(n)` function

Before using `primes(n)` for the main task, I want to quickly check that it returns
the correct small primes.

Known values:
- First 5 primes: 2, 3, 5, 7, 11
- First 10 primes: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29


In [9]:
# Quick sanity checks for primes(n).

first_5 = primes(5)
first_10 = primes(10)

print("First 5 primes: ", first_5)
print("First 10 primes:", first_10)


First 5 primes:  [2, 3, 5, 7, 11]
First 10 primes: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


### Problem 2B: Cube roots and fractional parts

Next I need to convert each prime into a 32-bit constant as follows:

1. Compute the cube root of the prime using `np.cbrt(p)`.
2. Extract the fractional part:

   \[
   \text{frac} = \text{cuberoot}(p) - \lfloor \text{cuberoot}(p) \rfloor
   \]

3. Multiply the fractional part by \( 2^{32} \).
4. Take the integer part (floor) and store it as a 32-bit unsigned integer (`np.uint32`).

This should match the way SHA-256 defines its `K` constants.
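The float64 route works here because every cube root involved is below 7, which leaves roughly 50 of the 53 significand bits for the fractional part, comfortably more than the 32 bits needed. If you want a result that does not depend on floating-point rounding at all, one option (a sketch, not something the standard prescribes) uses the identity ⌊cbrt(p) · 2^32⌋ = ⌊cbrt(p · 2^96)⌋: compute that with an integer cube root and keep only the low 32 bits, which drops the integer part. `icbrt` and `k_constant_exact` are hypothetical names.

```python
def icbrt(n: int) -> int:
    """Largest integer r with r**3 <= n, found by binary search."""
    if n < 0:
        raise ValueError("n must be non-negative")
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)  # hi**3 > n is guaranteed
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def k_constant_exact(p: int) -> int:
    """floor(frac(cbrt(p)) * 2**32), computed with integers only."""
    # cbrt(p * 2**96) = cbrt(p) * 2**32; masking keeps only the fractional bits.
    return icbrt(p << 96) & 0xFFFFFFFF


print(f"{k_constant_exact(2):08x}")    # 428a2f98
print(f"{k_constant_exact(311):08x}")  # c67178f2
```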


In [10]:
def cube_root_fraction_to_uint32(p):
    """
    Given a prime p, compute the 32-bit word defined as:
        floor( fractional_part(cuberoot(p)) * 2**32 )

    The result is returned as a NumPy uint32, to match the 32-bit word
    behaviour in the SHA-256 standard.
    """
    # Convert p to a float64 so np.cbrt can operate on it.
    x = np.float64(p)

    # Compute the cube root using NumPy.
    cbrt = np.cbrt(x)

    # Fractional part: total value minus its floor.
    frac = cbrt - np.floor(cbrt)

    # Scale the fractional part to 32 bits.
    scaled = frac * (2**32)

    # Take the integer part.
    integer_bits = int(scaled)

    # Cast to 32-bit unsigned integer for SHA-256.
    return np.uint32(integer_bits)


### Problem 2C: Generating 64 constants from the first 64 primes

Now I:

1. Use `primes(64)` to get the first 64 primes.
2. Apply `cube_root_fraction_to_uint32(p)` to each prime.
3. Store the results in an array `K`.
4. Print them as 8-digit hexadecimal numbers of the form `0x12345678`.

I can then compare these manually to the constants listed in the Secure Hash Standard.


In [11]:
# Get the first 64 primes.
first_64_primes = primes(64)

# Compute the constants from the cube root fractional parts.
K_computed = [cube_root_fraction_to_uint32(p) for p in first_64_primes]

print("Index  Hex value")
for i, k in enumerate(K_computed):
    # int(k) converts from NumPy scalar to plain Python int for formatting.
    print(f"{i:2d}    {int(k):08x}")


Index  Hex value
 0    428a2f98
 1    71374491
 2    b5c0fbcf
 3    e9b5dba5
 4    3956c25b
 5    59f111f1
 6    923f82a4
 7    ab1c5ed5
 8    d807aa98
 9    12835b01
10    243185be
11    550c7dc3
12    72be5d74
13    80deb1fe
14    9bdc06a7
15    c19bf174
16    e49b69c1
17    efbe4786
18    0fc19dc6
19    240ca1cc
20    2de92c6f
21    4a7484aa
22    5cb0a9dc
23    76f988da
24    983e5152
25    a831c66d
26    b00327c8
27    bf597fc7
28    c6e00bf3
29    d5a79147
30    06ca6351
31    14292967
32    27b70a85
33    2e1b2138
34    4d2c6dfc
35    53380d13
36    650a7354
37    766a0abb
38    81c2c92e
39    92722c85
40    a2bfe8a1
41    a81a664b
42    c24b8b70
43    c76c51a3
44    d192e819
45    d6990624
46    f40e3585
47    106aa070
48    19a4c116
49    1e376c08
50    2748774c
51    34b0bcb5
52    391c0cb3
53    4ed8aa4a
54    5b9cca4f
55    682e6ff3
56    748f82ee
57    78a5636f
58    84c87814
59    8cc70208
60    90befffa
61    a4506ceb
62    bef9a3f7
63    c67178f2


### Problem 2D: Verifying against the official SHA-256 constants

The Secure Hash Standard (FIPS 180-4) lists the 64 SHA-256 `K` constants as
32-bit hexadecimal words (see the SHA-256 section and constant table).

To fully "test the results against what is in the standard", I:

1. Hard-code the official 64 hex values from the standard into a list.
2. Convert them to integers.
3. Compare them element-wise with `K_computed`.
4. Report whether there are any mismatches.

If the implementation is correct, there should be **zero** mismatches.


In [12]:
# Official SHA-256 K constants from FIPS 180-4, written in hex.
K_official_hex = [
    "428a2f98", "71374491", "b5c0fbcf", "e9b5dba5",
    "3956c25b", "59f111f1", "923f82a4", "ab1c5ed5",
    "d807aa98", "12835b01", "243185be", "550c7dc3",
    "72be5d74", "80deb1fe", "9bdc06a7", "c19bf174",
    "e49b69c1", "efbe4786", "0fc19dc6", "240ca1cc",
    "2de92c6f", "4a7484aa", "5cb0a9dc", "76f988da",
    "983e5152", "a831c66d", "b00327c8", "bf597fc7",
    "c6e00bf3", "d5a79147", "06ca6351", "14292967",
    "27b70a85", "2e1b2138", "4d2c6dfc", "53380d13",
    "650a7354", "766a0abb", "81c2c92e", "92722c85",
    "a2bfe8a1", "a81a664b", "c24b8b70", "c76c51a3",
    "d192e819", "d6990624", "f40e3585", "106aa070",
    "19a4c116", "1e376c08", "2748774c", "34b0bcb5",
    "391c0cb3", "4ed8aa4a", "5b9cca4f", "682e6ff3",
    "748f82ee", "78a5636f", "84c87814", "8cc70208",
    "90befffa", "a4506ceb", "bef9a3f7", "c67178f2",
]

# Convert official hex strings to plain Python ints.
K_official = [int(x, 16) for x in K_official_hex]

# Build a list of mismatches (index, computed, official) if any.
mismatches = []
for i in range(64):
    computed_val = int(K_computed[i])
    official_val = K_official[i]
    if computed_val != official_val:
        mismatches.append((i, f"{computed_val:08x}", f"{official_val:08x}"))

print("Number of mismatches:", len(mismatches))
if mismatches:
    print("Mismatches found (index, computed, official):")
    for m in mismatches:
        print(m)
else:
    print("All 64 computed K constants match the official SHA-256 values.")


Number of mismatches: 0
All 64 computed K constants match the official SHA-256 values.


### Problem 2: Conclusion

- I used `primes(n)` to generate the first 64 primes.
- For each prime, I computed the cube root, extracted the fractional part,
  multiplied by \( 2^{32} \), and took the integer part as a 32-bit word (`np.uint32`).
- I displayed all 64 constants in hexadecimal.
- I then compared them directly against the official SHA-256 `K` array from
  the Secure Hash Standard ([FIPS 180-4](https://csrc.nist.gov/publications/detail/fips/180/4/final)).

The comparison showed:

> **All 64 computed K constants match the official SHA-256 values.**

This confirms that both the prime generation and the cube-root-based constant computation are correct.


## Problem 3: Padding and `block_parse(msg)` Generator

**Goal:** Implement a generator function `block_parse(msg)` that:

- takes a `bytes` message `msg`,
- pads it according to the Secure Hash Standard (SHA-256),
- yields each 512-bit block (64 bytes) as a `bytes` object.

This should follow **FIPS 180-4**:

- Section 5.1.1: padding the message (the rules shared by SHA-1, SHA-224 and SHA-256)  
- Section 5.2.1: parsing the padded message into 512-bit blocks

Reference: [FIPS 180-4 (Secure Hash Standard)](https://csrc.nist.gov/publications/detail/fips/180/4/final)

### Padding rules for SHA-256

Given a message of length `L` bits:

1. Append a single `"1"` bit.  
   In bytes, this is `0x80` = `10000000` in binary.
2. Append the smallest number k ≥ 0 of `"0"` bits such that L + 1 + k ≡ **448 (mod 512)**.  
   For a message made of whole bytes, this means appending zero bytes until the length is 56 (mod 64) bytes.
3. Append the original message length `L` as a **64-bit big-endian integer**.

So the final padded message length is always a multiple of 512 bits (64 bytes),  
and the last 8 bytes encode the original length in bits.
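Because the `0x80` byte and the 64-bit length always take 1 + 8 bytes, the number of zero bytes for a whole-byte message has a closed form: `(55 - len(msg)) % 64`. This sketch (`pad_sha256` is a hypothetical name, and it assumes the message is a whole number of bytes) builds the padded message in one expression and prints the lengths around the 56-byte boundary.

```python
def pad_sha256(msg: bytes) -> bytes:
    """SHA-256 padding for a whole-byte message, without a loop."""
    zero_bytes = (55 - len(msg)) % 64                 # brings length to 56 (mod 64)
    length_field = (8 * len(msg)).to_bytes(8, "big")  # original length in bits
    return msg + b"\x80" + b"\x00" * zero_bytes + length_field


for size in (0, 3, 55, 56, 57, 64):
    padded = pad_sha256(b"x" * size)
    print(size, "->", len(padded), "bytes,", len(padded) // 64, "block(s)")
# 0 -> 64 bytes, 1 block(s)
# 3 -> 64 bytes, 1 block(s)
# 55 -> 64 bytes, 1 block(s)
# 56 -> 128 bytes, 2 block(s)
# 57 -> 128 bytes, 2 block(s)
# 64 -> 128 bytes, 2 block(s)
```

A 55-byte message is the longest that still fits in one block, since 55 + 1 + 8 = 64.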


In [13]:
def block_parse(msg):
    """
    Generator that takes a bytes message and yields 512-bit (64-byte) blocks
    after applying SHA-256 style padding.

    Follows FIPS 180-4, section 5.1.1: "Padding the message".
    """
    # Make sure the input is bytes.
    if not isinstance(msg, (bytes, bytearray)):
        raise TypeError("msg must be a bytes-like object")

    # Original message length in bits.
    bit_len = len(msg) * 8

    # Start with a mutable copy of the message.
    padded = bytearray(msg)

    # Step 1: append the '1' bit, which is 0x80 in a whole byte.
    padded.append(0x80)

    # Step 2: append '0' bits (in whole bytes) until length ≡ 56 (mod 64).
    # 56 bytes = 448 bits, leaving 64 bits (8 bytes) for the length.
    while (len(padded) % 64) != 56:
        padded.append(0x00)

    # Step 3: append 64-bit big-endian length of original message (in bits).
    padded += bit_len.to_bytes(8, byteorder="big")

    # Now padded length should be a multiple of 64 bytes.
    # Yield each 64-byte block as a bytes object.
    for i in range(0, len(padded), 64):
        block = bytes(padded[i:i + 64])
        yield block


### Problem 3A: Basic sanity checks on padding

To check that `block_parse(msg)` is working correctly, I test it on messages
of different lengths:

- The empty message: `b""`
- A very short message: `b"a"`
- A typical short message: `b"abc"`
- Messages around the 56-byte boundary, where padding behaviour changes.

For each test:

1. Run `block_parse(msg)` and collect all blocks into a list.
2. Recombine the blocks into a single `bytes` object.
3. Check if:
   - The total length is a multiple of 64 bytes.
   - The last 8 bytes encode the original message length in bits.

In [14]:
def inspect_padded_message(msg):
    """
    Helper function for testing block_parse.
    It prints information about the padded message.
    """
    print(f"Original message: {msg!r}")
    print(f"Original length (bytes): {len(msg)}")
    print(f"Original length (bits):  {len(msg) * 8}")

    # Collect all blocks from the generator into a list
    blocks = list(block_parse(msg))

    # Number of 64-byte blocks.
    print(f"Number of 512-bit blocks: {len(blocks)}")

    # Reconstruct the full padded message from the blocks
    padded = b"".join(blocks)
    print(f"Padded length (bytes):    {len(padded)}")

    # Check that padded length is a multiple of 64
    print(f"Is padded length % 64 == 0? {len(padded) % 64 == 0}")

    # Extract the last 8 bytes: they should encode the original bit length
    last_8 = padded[-8:]
    bit_len_from_padding = int.from_bytes(last_8, byteorder="big")
    print(f"Bit length from padding:  {bit_len_from_padding}")
    print(f"Matches original bit len? {bit_len_from_padding == len(msg) * 8}")

    # Print the last block in hex
    print("Last 64-byte block (hex):")
    print(padded[-64:].hex())
    print("-" * 60)


In [15]:
# Test with empty message
inspect_padded_message(b"")

# Test with short messages
inspect_padded_message(b"a")
inspect_padded_message(b"abc")

msg_55 = b"A" * 55  # 55 bytes
msg_56 = b"B" * 56  # 56 bytes
msg_57 = b"C" * 57  # 57 bytes

inspect_padded_message(msg_55)
inspect_padded_message(msg_56)
inspect_padded_message(msg_57)


Original message: b''
Original length (bytes): 0
Original length (bits):  0
Number of 512-bit blocks: 1
Padded length (bytes):    64
Is padded length % 64 == 0? True
Bit length from padding:  0
Matches original bit len? True
Last 64-byte block (hex):
80000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
------------------------------------------------------------
Original message: b'a'
Original length (bytes): 1
Original length (bits):  8
Number of 512-bit blocks: 1
Padded length (bytes):    64
Is padded length % 64 == 0? True
Bit length from padding:  8
Matches original bit len? True
Last 64-byte block (hex):
61800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000008
------------------------------------------------------------
Original message: b'abc'
Original length (bytes): 3
Original length (bits):  24
Number of 512-bit blocks: 1
Padded le

## Problem 4: SHA-256 Hash Compression Function

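**Goal:** Write a function

```python
hash(current, block)
```

that performs one SHA-256 compression step: given the current hash value `current` (a list of eight 32-bit words) and one 512-bit message block `block`, it returns the updated hash value.

### Problem 4A: Constants and initial hash value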

For SHA-256 I need:

1. The 64 constants `K[0..63]`.
2. The initial hash value `H0`, consisting of eight 32-bit words.

Both are given in the SHA-256 section of FIPS 180-4.
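FIPS 180-4 also explains where the eight `H0` words come from: they are the first 32 bits of the fractional parts of the square roots of the first eight primes, mirroring how `K` comes from cube roots in Problem 2. Here is a minimal integer-only sketch of that derivation, using `math.isqrt` (Python 3.8+); `h0_from_primes` is a hypothetical name.

```python
import math


def h0_from_primes(primes):
    """floor(frac(sqrt(p)) * 2**32) for each prime, using an exact integer sqrt."""
    # sqrt(p * 2**64) = sqrt(p) * 2**32; masking drops the integer part.
    return [math.isqrt(p << 64) & 0xFFFFFFFF for p in primes]


first_eight_primes = [2, 3, 5, 7, 11, 13, 17, 19]
for word in h0_from_primes(first_eight_primes):
    print(f"0x{word:08x}")
```

The printed words should agree with the hard-coded `H0_hex` list in the next cell.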


In [16]:
import numpy as np
import hashlib  # used later to test against Python's SHA-256

# SHA-256 constants K[0..63] in hex, from FIPS 180-4
K_hex = [
    "428a2f98", "71374491", "b5c0fbcf", "e9b5dba5",
    "3956c25b", "59f111f1", "923f82a4", "ab1c5ed5",
    "d807aa98", "12835b01", "243185be", "550c7dc3",
    "72be5d74", "80deb1fe", "9bdc06a7", "c19bf174",
    "e49b69c1", "efbe4786", "0fc19dc6", "240ca1cc",
    "2de92c6f", "4a7484aa", "5cb0a9dc", "76f988da",
    "983e5152", "a831c66d", "b00327c8", "bf597fc7",
    "c6e00bf3", "d5a79147", "06ca6351", "14292967",
    "27b70a85", "2e1b2138", "4d2c6dfc", "53380d13",
    "650a7354", "766a0abb", "81c2c92e", "92722c85",
    "a2bfe8a1", "a81a664b", "c24b8b70", "c76c51a3",
    "d192e819", "d6990624", "f40e3585", "106aa070",
    "19a4c116", "1e376c08", "2748774c", "34b0bcb5",
    "391c0cb3", "4ed8aa4a", "5b9cca4f", "682e6ff3",
    "748f82ee", "78a5636f", "84c87814", "8cc70208",
    "90befffa", "a4506ceb", "bef9a3f7", "c67178f2",
]

# Convert hex strings to 32-bit unsigned integers
K = [np.uint32(int(x, 16)) for x in K_hex]

# Initial SHA-256 hash value H0, from FIPS 180-4
H0_hex = [
    "6a09e667", "bb67ae85", "3c6ef372", "a54ff53a",
    "510e527f", "9b05688c", "1f83d9ab", "5be0cd19",
]

# Store initial state as list of np.uint32
H0_init = [np.uint32(int(x, 16)) for x in H0_hex]

# Just to see them once
print("Initial H0 state words:")
for i, h in enumerate(H0_init):
    print(f"H0[{i}] = 0x{int(h):08x}")


Initial H0 state words:
H0[0] = 0x6a09e667
H0[1] = 0xbb67ae85
H0[2] = 0x3c6ef372
H0[3] = 0xa54ff53a
H0[4] = 0x510e527f
H0[5] = 0x9b05688c
H0[6] = 0x1f83d9ab
H0[7] = 0x5be0cd19


### Problem 4B: Building the message schedule W[0..63]

For each 512-bit block (64 bytes), SHA-256 builds an array `W[0..63]` of 32-bit words.

Steps from the standard:

1. The first 16 words come directly from the block.  
   - Each word is 4 bytes interpreted as a big-endian 32-bit integer.
2. For t from 16 to 63, each word is computed from earlier ones:

\[
W_t = \left(\sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16}\right) \bmod 2^{32}
\]

where `σ0` and `σ1` are the small sigma functions from Problem 1E.
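As a worked check of the recurrence, consider the single padded block for `b"abc"`: `W[0]` is `0x61626380` (the three message bytes followed by the `0x80` padding byte), `W[15]` is `0x18` (the length, 24 bits), and every other initial word is zero. This sketch (plain Python ints; `message_schedule` is a hypothetical name) builds that block inline and checks the first two computed words, which can be verified by hand.

```python
MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """32-bit right rotation for 0 < n < 32."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def message_schedule(block: bytes) -> list:
    """W[0..63] for one 64-byte block, using plain ints masked to 32 bits."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK32)
    return w


# Padded "abc": message, 0x80, 52 zero bytes, then the 64-bit length 24.
block = b"abc\x80" + b"\x00" * 52 + (24).to_bytes(8, "big")
W = message_schedule(block)

# W[16] = sigma1(W[14]) + W[9] + sigma0(W[1]) + W[0] = 0 + 0 + 0 + W[0]
assert W[16] == 0x61626380
# W[17] = sigma1(W[15]) + W[10] + sigma0(W[2]) + W[1] = sigma1(0x18)
assert W[17] == 0x000F0000
print(f"W[16] = {W[16]:08x}, W[17] = {W[17]:08x}")
```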


In [17]:
def build_message_schedule(block):
    """
    Build the SHA-256 message schedule W[0..63] from a 64-byte block
    """
    if len(block) != 64:
        raise ValueError("block must be exactly 64 bytes")

    # Start with a list of 64 zero words
    W = [np.uint32(0) for _ in range(64)]

    # First 16 words come from the block (big-endian)
    for i in range(16):
        start = i * 4                                       # starting index into the block
        word_bytes = block[start:start + 4]                 # take 4 bytes
        W[i] = np.uint32(int.from_bytes(word_bytes, 'big')) # convert bytes to integer

    # Remaining words are computed using the recurrence relation
    for t in range(16, 64):
        s0 = sigma0(W[t - 15])   # σ0 on W[t-15]
        s1 = sigma1(W[t - 2])    # σ1 on W[t-2]
        W[t] = np.uint32(W[t - 16] + W[t - 7] + s0 + s1)  # sum modulo 2^32 via uint32

    return W


### Problem 4C: Implementing `hash(current, block)`

Now I write the main compression step for a single 512-bit block.

Inputs:

- `current`: list of 8 32-bit words `[H0, H1, H2, H3, H4, H5, H6, H7]`
- `block`: 64-byte message block

Steps:

1. Build `W[0..63]` using `build_message_schedule`.
2. Copy `current` into working variables `a, b, c, d, e, f, g, h`.
3. For t from 0 to 63, compute

\[
\begin{aligned}
T1 &= h + \Sigma_1(e) + Ch(e, f, g) + K_t + W_t \\
T2 &= \Sigma_0(a) + Maj(a, b, c)
\end{aligned}
\]

then update the working variables in this order: `h = g`, `g = f`, `f = e`, `e = d + T1`, `d = c`, `c = b`, `b = a`, `a = T1 + T2`, with all additions modulo \( 2^{32} \).

4. Add each final working variable to the corresponding word of `current` (modulo \( 2^{32} \)) to get the new hash state.


In [18]:
def hash(current, block):
    """
    Perform one SHA-256 compression step on a single 512-bit block

    current: list of 8 np.uint32 words
    block:   64-byte block (bytes)

    Returns a new list of 8 np.uint32 words
    """
    if len(block) != 64:
        raise ValueError("block must be exactly 64 bytes")

    if len(current) != 8:
        raise ValueError("current must contain exactly 8 words")

    # Build message schedule W[0..63] for this block
    W = build_message_schedule(block)

    # Copy current state into working variables a..h
    a = np.uint32(current[0])
    b = np.uint32(current[1])
    c = np.uint32(current[2])
    d = np.uint32(current[3])
    e = np.uint32(current[4])
    f = np.uint32(current[5])
    g = np.uint32(current[6])
    h = np.uint32(current[7])

    # Main loop of 64 rounds
    for t in range(64):
        # temp1 corresponds to T1 in the standard
        temp1 = np.uint32(
            h
            + Sigma1(e)
            + Ch(e, f, g)
            + K[t]
            + W[t]
        )

        # temp2 corresponds to T2 in the standard
        temp2 = np.uint32(
            Sigma0(a)
            + Maj(a, b, c)
        )

        # Update working variables by shifting them and using temp1 and temp2
        h = g
        g = f
        f = e
        e = np.uint32(d + temp1)
        d = c
        c = b
        b = a
        a = np.uint32(temp1 + temp2)

    # Compute the new hash state by adding working vars to current state
    new_state = [
        np.uint32(current[0] + a),
        np.uint32(current[1] + b),
        np.uint32(current[2] + c),
        np.uint32(current[3] + d),
        np.uint32(current[4] + e),
        np.uint32(current[5] + f),
        np.uint32(current[6] + g),
        np.uint32(current[7] + h),
    ]

    return new_state


### Problem 4D: Building a complete SHA-256 function

To test the compression function, I build a full SHA-256 implementation.

Algorithm:

1. Start with the initial state `H0_init`.
2. Use `block_parse(msg)` from Problem 3 to iterate over the padded blocks.
3. For each block, update the state with `hash(current, block)`.
4. After all blocks, convert the final 8 words to 32 bytes (big-endian) and then to hex.

In [19]:
def sha256_custom(msg):
    """
    Compute SHA-256 digest of msg using

    - H0_init as the starting state
    - block_parse for padding
    - hash for the compression of each block

    Returns a hex string like hashlib.sha256(msg).hexdigest()
    """
    # Make a copy so we don't modify H0_init globally
    state = [np.uint32(x) for x in H0_init]

    # Process each padded 512-bit block from the message
    for block in block_parse(msg):
        state = hash(state, block)

    # Convert final state into a 32-byte digest
    digest_bytes = b""
    for word in state:
        digest_bytes += int(word).to_bytes(4, byteorder="big")

    # Return hexadecimal representation of the digest
    return digest_bytes.hex()


### Problem 4E: Testing against Python's hashlib

To check that my `hash` function and padding are correct, I compare the outputs of

- `sha256_custom(msg)`
- `hashlib.sha256(msg).hexdigest()`

on several test messages. If everything is correct, the two digests should match exactly for every message.


In [20]:
test_messages = [
    b"",                                          # empty message
    b"a",
    b"abc",
    b"hello",
    b"The quick brown fox jumps over the lazy dog",
    b"The quick brown fox jumps over the lazy dog.",
]

for m in test_messages:
    my_digest = sha256_custom(m)
    lib_digest = hashlib.sha256(m).hexdigest()

    print(f"Message: {m!r}")
    print(f"  sha256_custom: {my_digest}")
    print(f"  hashlib      : {lib_digest}")
    print(f"  Match        : {my_digest == lib_digest}")
    print("-" * 60)


Message: b''
  sha256_custom: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
  hashlib      : e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
  Match        : True
------------------------------------------------------------
Message: b'a'
  sha256_custom: ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb
  hashlib      : ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb
  Match        : True
------------------------------------------------------------
Message: b'abc'
  sha256_custom: ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
  hashlib      : ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
  Match        : True
------------------------------------------------------------
Message: b'hello'
  sha256_custom: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
  hashlib      : 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
  Match        : True
------------
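All of the test messages above are shorter than 56 bytes, so each fits in a single padded block and the chaining of the state across blocks is never exercised. As an extra cross-check, here is an independent pure-Python sketch of the same algorithm. It assumes plain ints masked to 32 bits instead of `np.uint32`, derives `K` and `H0` with exact integer roots, and compares against `hashlib` on messages spanning one, two, three and sixteen blocks. `sha256_plain`, `compress` and the other helpers are hypothetical names, separate from the notebook's own functions.

```python
import hashlib
import math

MASK = 0xFFFFFFFF  # all arithmetic is reduced modulo 2**32


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def icbrt(n):
    """Largest r with r**3 <= n (binary search on ints)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def first_primes(n):
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found if p * p <= candidate):
            found.append(candidate)
        candidate += 1
    return found


PRIMES = first_primes(64)
K = [icbrt(p << 96) & MASK for p in PRIMES]            # cube roots
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]  # square roots


def compress(state, block):
    """One SHA-256 compression step on a 64-byte block."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)

    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        # Shift the working variables and insert the new e and a.
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256_plain(msg: bytes) -> str:
    padded = (msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
              + (8 * len(msg)).to_bytes(8, "big"))
    state = H0
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return b"".join(word.to_bytes(4, "big") for word in state).hex()


# 0..55 bytes -> 1 block, 56..119 -> 2 blocks, 120 -> 3 blocks, 1000 -> 16 blocks.
for size in (0, 55, 56, 64, 119, 120, 1000):
    message = (bytes(range(256)) * 4)[:size]
    assert sha256_plain(message) == hashlib.sha256(message).hexdigest()
print("sha256_plain matches hashlib on single- and multi-block messages")
```

A similar multi-block comparison could be run against `sha256_custom` itself.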



### Problem 4: Conclusion

In this problem I:

- Built the message schedule `W[0..63]` directly from a 512-bit block.
- Implemented the SHA-256 compression function `hash(current, block)` using the
  boolean and sigma functions from Problem 1 and the constants from Problem 2.
- Combined everything into `sha256_custom(msg)` using `block_parse` from Problem 3.
- Verified the results against Python's `hashlib.sha256` for several messages.

All the test messages, each of which fits in a single 512-bit block, produced digests that match `hashlib.sha256`, so my implementation
of the compression step and the overall SHA-256 logic appears to be correct.

## Problem 5: Passwords

We are given three SHA-256 hashes of common passwords:

1. `5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8`
2. `873ac9ffea4dd04fa719e8920cd6938f0c23cd678af330939cff53c3d2855f34`
3. `b03ddf3ca2e714a6548e7495e2a03f5e824eaac9837cd7f159c67b90fb4b7342`

Each password was encoded as UTF-8 and then hashed once with SHA-256.

**Tasks**

- Recover the original passwords.
- Explain how I could find them.
- Suggest how password hashing could be improved to prevent this kind of attack.

For hashing in Python I use the standard library module `hashlib`.  
See: [Python hashlib docs](https://docs.python.org/3/library/hashlib.html)


### Problem 5A: How to find the passwords (dictionary attack idea)

In practice, an attacker would use a **dictionary attack**:

1. Take a list of common passwords (e.g. `password`, `123456`, `qwerty`).
2. For each candidate password:
   - Encode it as bytes (UTF-8).
   - Hash it with SHA-256.
   - Compare the hash with the stored hashes.
3. If a hash matches, the password has been found.

This works well here because SHA-256 is fast to compute, the hashes are unsalted (the same password always produces the same hash), and people often choose weak passwords.

In Python, I can simulate this with [`hashlib.sha256`](https://docs.python.org/3/library/hashlib.html).


In [21]:
import hashlib  # standard library for hashing

# The three given SHA-256 hashes
target_hashes = [
    "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8",
    "873ac9ffea4dd04fa719e8920cd6938f0c23cd678af330939cff53c3d2855f34",
    "b03ddf3ca2e714a6548e7495e2a03f5e824eaac9837cd7f159c67b90fb4b7342",
]

# A small "dictionary" of very common passwords to try
# In a real attack this list would be huge
candidate_passwords = [
    "password",
    "123456",
    "qwerty",
    "admin",
    "letmein",
    "cheese",
    "P@ssw0rd",
]

# Dictionary to store matches we find
found = {}  # maps hash -> plaintext password

for pwd in candidate_passwords:
    # Encode the password as UTF-8 bytes
    pwd_bytes = pwd.encode("utf-8")

    # Compute SHA-256 hash and get hex string
    hash_hex = hashlib.sha256(pwd_bytes).hexdigest()

    # If this hash is one of the targets, store the match
    if hash_hex in target_hashes:
        found[hash_hex] = pwd

# Show the results
for h in target_hashes:
    print(f"Hash: {h}")
    if h in found:
        print(f"  Recovered password: {found[h]!r}")
    else:
        print("  Not found in candidate list")


Hash: 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
  Recovered password: 'password'
Hash: 873ac9ffea4dd04fa719e8920cd6938f0c23cd678af330939cff53c3d2855f34
  Recovered password: 'cheese'
Hash: b03ddf3ca2e714a6548e7495e2a03f5e824eaac9837cd7f159c67b90fb4b7342
  Recovered password: 'P@ssw0rd'


### Problem 5B: The three passwords

Using a simple dictionary-style attack in Python (a small wordlist of common passwords), and cross-checking with online SHA-256 hash lookup tools, I found:

1. `5e884898...542d8` → `"password"`
2. `873ac9ff...5f34` → `"cheese"`
3. `b03ddf3c...b7342` → `"P@ssw0rd"`

These are all **very common or guessable passwords**.  
They appear in many public examples and password wordlists.
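The task also asks how password hashing could be improved. The attack above exploits two weaknesses: the hashes are unsalted, so one precomputed list cracks every user with the same password, and SHA-256 is fast. A common remedy is a random per-password salt combined with a deliberately slow key-derivation function. This is a minimal sketch using the standard library's `hashlib.pbkdf2_hmac`. The default of 600,000 iterations is an assumed value that should be taken from current guidance rather than from here, and `hash_password` / `verify_password` are hypothetical names. Dedicated password-hashing functions such as scrypt (`hashlib.scrypt`), bcrypt or Argon2 are alternatives.

```python
import hashlib
import hmac
import os


def hash_password(password: str, iterations: int = 600_000) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, derived_key) to store instead of a bare hash."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key


def verify_password(password: str, salt: bytes, iterations: int, key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, key)


salt, iterations, key = hash_password("P@ssw0rd")
print(verify_password("P@ssw0rd", salt, iterations, key))  # True
print(verify_password("password", salt, iterations, key))  # False

# Hashing the same password twice gives different stored keys (different salts).
print(hash_password("cheese")[2] != hash_password("cheese")[2])
```

With a salt, an attacker has to rerun the dictionary for every stored hash, and the iteration count makes each guess cost many SHA-256 evaluations instead of one.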
