# Understanding SHA-256 from First Principles

This is a project where we're going to figure out, from scratch, how to build the SHA-256 algorithm, which converts any input, such as `Hello!`, into a 256-bit (32-byte) hash: `334d016f755cd6dc58c53a86e183882f8ec14f52fb05345887c8a5edd42c87b7`.

**NOTE:** This is a beginner-friendly tutorial. I'm assuming you have _some_ Python experience but may be unfamiliar with many of the built-in functions we'll use.

### Input and Output

During this tutorial, we're going to use `Hi!` as the input message, but you can make it anything you want.

In [65]:
sha_256_input = r"Hi!"

#### SHA-256: The _EASY_ Way

The easiest way to get the SHA-256 hash is to use Python's standard-library `hashlib` module.

In [68]:
import hashlib

def quick_sha256(text):
    # Convert text to bytes
    text_bytes = text.encode('utf-8')
    
    # Create SHA-256 hash object
    sha256_hash = hashlib.sha256()
    
    # Update hash with bytes
    sha256_hash.update(text_bytes)
    
    # Get hexadecimal representation
    return sha256_hash.hexdigest()

correct_sha_256_output_hash = quick_sha256(sha_256_input)
print(f"SHA-256 INPUT: {sha_256_input}")
print("\nIf our algorithm is correct, this should be the output hash:\n")
print(f"SHA-256 OUTPUT: {correct_sha_256_output_hash}")


SHA-256 INPUT: Hi!

If our algorithm is correct, this should be the output hash:

SHA-256 OUTPUT: ca51ce1fb15acc6d69b8a5700256172fcc507e02073e6f19592e341bd6508ab8
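
If you just need the digest, the three steps in `quick_sha256()` can be chained into a single expression. This is only a compact equivalent of the same `hashlib` calls, not a different algorithm; the variable name `one_line_hash` is our own.

```python
import hashlib

sha_256_input = "Hi!"

# encode() turns the str into UTF-8 bytes, and hashlib.sha256() accepts those
# bytes directly, so create/update/hexdigest collapse into one line.
one_line_hash = hashlib.sha256(sha_256_input.encode("utf-8")).hexdigest()

print(one_line_hash)
print(len(one_line_hash))  # 64 hex characters = 32 bytes = 256 bits
```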


## Step 1: Convert Input to Binary

...But we're masochists who want to understand this at a deep level, so we're gonna do this step by step. The first thing we need to do is convert each character in the text `Hi!` to its ASCII value, then to 8-bit binary, and combine those binary numbers into a single string, like this:
- `H` -> `72` -> `01001000` 
- `i` -> `105` -> `01101001`
- `!` -> `33` -> `00100001`

Combining the three, we get our output binary string:

`01001000` + `01101001` + `00100001` ➡ `010010000110100100100001`
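
To double-check the combined string, we can run the process in reverse: slice it into 8-bit chunks, parse each chunk as a base-2 number with `int(chunk, 2)`, and turn that number back into a character with `chr()`. Both are standard Python built-ins that this tutorial doesn't otherwise rely on; the round trip is just a sanity check, not part of SHA-256.

```python
binary_message = "010010000110100100100001"

# Walk the string 8 characters (one byte) at a time.
chunks = [binary_message[i:i + 8] for i in range(0, len(binary_message), 8)]

# int(chunk, 2) parses a base-2 string; chr() maps the number back to a character.
decoded = "".join(chr(int(chunk, 2)) for chunk in chunks)

print(chunks)   # ['01001000', '01101001', '00100001']
print(decoded)  # Hi!
```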

#### `ord()`

We're going to need some lesser-known functions. The first one is `ord()`, which converts **ONE** character into its ASCII value (technically its Unicode code point, which is the same number for plain ASCII characters).

In [69]:
print('H', ord('H'))
print('i', ord('i'))
print('!', ord('!'))

H 72
i 105
! 33
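
Because `ord()` takes exactly one character, handing it a whole string fails instead of converting each character. That is why we loop over the message one character at a time.

```python
# ord() takes exactly one character and returns its numeric code.
print(ord("H"))  # 72

# Passing a longer string raises a TypeError.
try:
    ord("Hi!")
except TypeError as err:
    print("TypeError:", err)
```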


Let's do this for the entire message.

In [70]:
for char in sha_256_input:
    print(f'{char} -> {ord(char)}')

H -> 72
i -> 105
! -> 33


#### `bin()`
We also have `bin()`, which converts an integer into its binary representation.

In [71]:
print(bin(1), bin(2), bin(3), bin(4), bin(5))


0b1 0b10 0b11 0b100 0b101


Notice that:
1. The output is a string, so we can't do math on it directly.
2. The string starts with the `0b` prefix, which we don't want in our message.
3. Leading zeros are dropped, but for SHA-256 each character must be exactly 8 bits. No more, no less.
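
Before reaching for a better tool, here's one way to patch issues 2 and 3 by hand: slice off the first two characters to drop `0b`, then left-pad with zeros using the string method `zfill()`. This is only an illustrative sketch; `format()` does the same job in one step.

```python
ascii_val = ord("H")      # 72

raw = bin(ascii_val)      # '0b1001000': prefixed, and only 7 binary digits
digits = raw[2:]          # slice off the '0b' prefix -> '1001000'
padded = digits.zfill(8)  # pad with zeros on the left to 8 characters -> '01001000'

print(raw, digits, padded)  # 0b1001000 1001000 01001000
print(len(padded))          # 8
```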

In [75]:
print('H', bin(ord('H')))
print(f"String Length: {len(bin(ord('H')))}")


H 0b1001000
String Length: 9


That length of 9 comes from the `0b` prefix plus only 7 binary digits. Here's what it looks like when we convert the whole message:

In [76]:
for char in sha_256_input:
    ascii_val = ord(char)
    binary_val = bin(ascii_val)
    print(f'{char} -> {ascii_val} -> {binary_val}')

H -> 72 -> 0b1001000
i -> 105 -> 0b1101001
! -> 33 -> 0b100001


#### `format()`
So instead, we're going to use `format()`, which converts the ASCII value to binary and adds zeros on the left to make it exactly 8 bits.

`H` ➡ `72` ➡ `1001000` ➡ `01001000` (add zero to left to make 8 bits)

In [83]:
char = 'H'
ascii_val = ord(char)

# format() takes the integer ASCII value, not the character itself
# '0' means fill with zeros, '8' means a width of 8 characters, 'b' means binary
binary_val = format(ascii_val, '08b')
print(f"{char} -> {ascii_val} -> {binary_val} (length: {len(binary_val)})")

H -> 72 -> 01001000 (length: 8)
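
The same `'08b'` specification also works inside an f-string, after the colon, since both use Python's format-specification mini-language. Dropping the leading `0` shows what it does: the number is still padded to a width of 8, but with spaces instead of zeros.

```python
ascii_val = 72

# format(value, spec) and f"{value:spec}" share the same spec syntax.
via_format = format(ascii_val, "08b")
via_fstring = f"{ascii_val:08b}"

print(via_format, via_fstring)  # 01001000 01001000

# Without the '0' fill character, Python pads with spaces.
print(repr(format(ascii_val, "8b")))  # ' 1001000'
```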


### Combining the Binary Strings
Now that we know how to convert characters into 8-bit binary, we're going to put it all together to create our binary message, which comes out as a *STRING*.

In [87]:
binary_message = ''

for char in sha_256_input:
    ascii_val = ord(char)
    for bit in format(ascii_val, '08b'):
        binary_message += bit

print(binary_message, type(binary_message))

010010000110100100100001 <class 'str'>


A more elegant way to write this uses `join()` with a generator expression: the generator walks through every `char` of the input message and converts it to ASCII, then to 8-bit binary, and `join()` squishes all the pieces into one string.

In [93]:
binary_message_join = ''.join(format(ord(char), '08b') for char in sha_256_input)
print(binary_message_join == binary_message)  # True: same result as the loop version

True

#### `text_to_binary()`
Let's abstract this and turn it into a function.

In [94]:
def text_to_binary(input_text):
    return ''.join(format(ord(char), '08b') for char in input_text)

binary_message = text_to_binary(sha_256_input)
print(binary_message, type(binary_message))

010010000110100100100001 <class 'str'>
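
One caveat worth knowing: `ord()` returns a Unicode code point and `'08b'` only sets a *minimum* width, so any character above 255 produces more than 8 bits, and even characters 128-255 don't match their UTF-8 encoding. `quick_sha256()` hashes the UTF-8 bytes of the text, so for non-ASCII input a byte-based variant is the one that lines up with `hashlib`. The name `text_to_binary_utf8()` below is hypothetical and not part of the original code.

```python
def text_to_binary(input_text):
    # Original approach: one code point per character, zero-padded to at least 8 bits.
    return ''.join(format(ord(char), '08b') for char in input_text)

def text_to_binary_utf8(input_text):
    # Hypothetical variant: encode to UTF-8 first. Iterating over bytes yields
    # ints in 0-255, so every chunk is exactly 8 bits.
    return ''.join(format(byte, '08b') for byte in input_text.encode('utf-8'))

# For plain ASCII text, both give the same bits.
print(text_to_binary("Hi!") == text_to_binary_utf8("Hi!"))  # True

# '€' is code point 8364 (14 bits) but three bytes (24 bits) in UTF-8.
print(len(text_to_binary("€")))       # 14
print(len(text_to_binary_utf8("€")))  # 24
```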


## Step 2: Padding the Binary Message

### Recap
Step 1 was turning each character of the input message `Hi!` into the binary message `010010000110100100100001` using our custom `text_to_binary()` function.

In [95]:
print('SHA-256 INPUT:', sha_256_input)
print('BINARY MESSAGE:', text_to_binary(sha_256_input))

SHA-256 INPUT: Hi!
BINARY MESSAGE: 010010000110100100100001


### Now for the next step...

Now that we have our binary string, we need to pad it to make sure it fits SHA-256's requirements. Here's what needs to happen:

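1. First, we append a single '1' bit to the end of our message
2. Then we append zeros until the message length is 64 bits short of a multiple of 512 (that is, 448 bits past a multiple of 512). If the message is too long to leave that room, the zeros spill over into an extra 512-bit block
3. Finally, we append the original message length in bits (24 for `Hi!`) as a 64-bit binary number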
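
The number of zeros in step 2 can be computed directly. Calling it `k`, we want the smallest `k >= 0` such that `length + 1 + k` is 448 more than a multiple of 512. Python's `%` always returns a non-negative result for a positive modulus, so one expression covers both the short case and the spill-over case. The helper name `padded_length_bits()` is our own, used only to illustrate the arithmetic.

```python
def padded_length_bits(message_bits):
    # Number of zeros k: smallest k >= 0 with (message_bits + 1 + k) % 512 == 448,
    # which leaves exactly 64 bits at the end of the last block for the length.
    k = (448 - (message_bits + 1)) % 512
    return message_bits + 1 + k + 64

print(padded_length_bits(24))   # 512  ("Hi!": 24 + 1 + 423 + 64)
print(padded_length_bits(440))  # 512  (55 bytes: still fits in one block)
print(padded_length_bits(448))  # 1024 (56 bytes: no room left, so a second block is needed)
```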

For example, let's pad our binary message:

- Original: `010010000110100100100001` (24 bits)
- Add '1': `0100100001101001001000011` (25 bits)
- Add '0's: `0100100001101001001000011000...000` (423 zeros, until length = 448 bits)
- Add length: `0100100001101001001000011000...000` + `000...00011000` (24 written as a 64-bit number, 512 bits total)
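
The 64-bit length field at the end is just the original bit count written as a zero-padded, 64-digit binary number, most significant bit first (SHA-256 stores it big-endian). The same `format()` trick works with a width of 64:

```python
original_length = len("010010000110100100100001")  # 24 bits for "Hi!"

# '064b' = binary, zero-padded on the left to 64 characters.
length_field = format(original_length, "064b")

print(len(length_field))   # 64
print(length_field[-8:])   # 00011000  (24 in binary, at the very end)
```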

The final padded message must be a multiple of 512 bits. This padding is crucial because:
- SHA-256 processes data in 512-bit blocks
- The '1' bit ensures we can tell where the original message ends
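- Appending the length (a technique known as Merkle-Damgård strengthening) ties the hash to the exact message length, which closes off certain attacks on SHA-256's block-by-block design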
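
To see why the '1' bit matters, compare it with a hypothetical scheme that only appends zeros. A message and the same message followed by a zero byte would then pad to identical bits, and so would be guaranteed the same hash. Both helper functions below are illustrations only, not real SHA-256 padding (neither appends the length field).

```python
def text_to_binary(input_text):
    return ''.join(format(ord(char), '08b') for char in input_text)

def zero_pad_only(bits):
    # HYPOTHETICAL broken scheme: append zeros up to a multiple of 512, nothing else.
    return bits + "0" * ((-len(bits)) % 512)

def one_then_zeros(bits):
    # Same idea, but with the '1' marker appended first (still no length field).
    marked = bits + "1"
    return marked + "0" * ((-len(marked)) % 512)

a = text_to_binary("Hi!")      # 24 bits
b = text_to_binary("Hi!\x00")  # the same 24 bits followed by 8 zero bits

print(zero_pad_only(a) == zero_pad_only(b))    # True  -> two messages, one padding
print(one_then_zeros(a) == one_then_zeros(b))  # False -> the '1' marks where each message ends
```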

Let's implement this padding step...