## Step 2: Padding the Binary Message

### Recap
Step 1 converted each character of the input message `Hi!` into the binary message `010010000110100100100001` using our custom `text_to_binary()` function.

In [None]:
sha_256_input = 'Hi!'
binary_message = text_to_binary(sha_256_input)

print('SHA-256 INPUT:', sha_256_input)
print('BINARY MESSAGE:', binary_message)

SHA-256 INPUT: Hi!
BINARY MESSAGE: 010010000110100100100001


### SHA-256 Structure
Now we need to modify that binary message to fit the SHA-256 block structure.
- First comes the message `'Hi!'` in binary.
- We append a single '1' bit to mark the end of the message.
- We add zeros until we reach bit 447.
- The last 64 bits of the block (bits 448 to 511) are reserved for the length of the original message, measured in bits.

![image-2.png](attachment:image-2.png)

#### What happens if the message is longer than `'Hi!'` and the binary message takes up more than 512 bits?
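In that case, the message is spread across several 512-bit blocks, and the padding and length field go at the end of the last block. A second block is already needed once the message exceeds 447 bits, because the '1' bit and the 64-bit length field must fit as well: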

![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)
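
To get a feel for when extra blocks are needed, the number of 512-bit blocks can be worked out from the message length alone. This is a minimal sketch for illustration; the helper name `count_blocks` is hypothetical and not part of this notebook's code.

```python
def count_blocks(message_bits, block_size=512, length_field=64):
    """Number of blocks after padding (hypothetical helper for illustration)."""
    # Minimum content: the message, the single '1' bit, and the length field
    minimum_bits = message_bits + 1 + length_field
    # Round up to a whole number of blocks (ceiling division)
    return -(-minimum_bits // block_size)

for bits in (24, 447, 448, 900):
    print(f"{bits:>3}-bit message -> {count_blocks(bits)} block(s)")
```

A 447-bit message is the largest that still fits in one block; at 448 bits the length field no longer fits and a second block is required.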

#### What happens if the message length is greater than 2^64 bits?
The 64-bit length field cannot represent a length that large. That would be more than 2 exabytes of data, and SHA-256 is only defined for messages shorter than 2^64 bits.
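
The length field counts bits, so the size of this limit is plain arithmetic. The sketch below works it out; the variable names are chosen for illustration only.

```python
length_field_bits = 64

# 2^64 bits expressed in bytes, and in exbibytes (1 EiB = 2^60 bytes)
limit_bytes = 2**length_field_bits // 8
limit_eib = limit_bytes / 2**60
print("Limit in bytes:", limit_bytes)
print("Limit in EiB:", limit_eib)

# A length of exactly 2^64 needs 65 binary digits, so it overflows the field
overflow_digits = len(format(2**length_field_bits, '064b'))
print("Digits needed for 2^64:", overflow_digits)
```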

### Anyways, let's Code!
Let's pad the binary message according to the SHA-256 standard.



### Step 2.1: Add the '1' Bit
First, we simply append a '1' to our binary message:

In [None]:
padded_message = binary_message + '1'

### Step 2.2: Calculate Required Padding
Now we need to figure out how many zeros to add. And to do that, we need to remember a few things.

First, SHA-256 processes messages in 512-bit blocks.

In [None]:
BLOCK_SIZE = 512    # SHA-256 processes messages in 512-bit blocks

And there will always be *64 bits* reserved for the length field, which comes at the end of the **FINAL** block.

In [None]:
LENGTH_FIELD = 64   # Last 64 bits reserved for message length

That leaves *448 bits* of the final block for the message, the '1' bit, and the zero padding.

In [None]:
final_block_size = BLOCK_SIZE - LENGTH_FIELD
final_block_size

448

Remember, we only add zero padding to the *FINAL* block of the message. 

If the message is 900 bits for example, it will be structured like this:

- **BLOCK 1:** 512 bits allocated to Part I of message.
- **BLOCK 2:**
    - The final *388 bits* of the message
    - *1 bit* to indicate the beginning of the padding
    - *59 bits* of zeros
    - final *64 bits* to mark the message length
    - for a **TOTAL** of *512 bits*.

Considering this, we will use the modulo operator `%` to calculate how many zeros we need for the padding.

In the case of `Hi!`, the message plus the '1' bit is 25 bits long, so we need 448 - 25 = 423 zeros of padding.

In [None]:
padding_zeros = final_block_size - (len(padded_message) % BLOCK_SIZE)
padding_zeros

423
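
The subtraction above goes negative whenever the message plus the '1' bit runs past bit 448 of its final block. One way to handle that, sketched here with a hypothetical helper `zeros_for` (not part of this notebook), is to take the result modulo the block size so the shortfall wraps into the next block.

```python
def zeros_for(message_bits, block_size=512, length_field=64):
    """Zero bits needed after the '1' bit (hypothetical helper)."""
    used = message_bits + 1                  # message plus the '1' bit
    target = block_size - length_field       # 448: where the length field starts
    # Python's % returns a non-negative result for a positive modulus,
    # so a negative difference wraps around into the next block.
    return (target - used) % block_size

for bits in (24, 900, 447, 448):
    print(f"{bits:>3} message bits -> {zeros_for(bits)} zeros")
```

The 24-bit and 900-bit cases reproduce the 423 and 59 zeros discussed above, while a 448-bit message needs 511 zeros because its padding spills into a new block.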


### Step 2.3: Add the Zero Padding
This is easy. We just add 423 zeros to our message.

In [None]:
print(f"BINARY MESSAGE: {binary_message}")
print(f"PADDED MESSAGE: {padded_message}")


BINARY MESSAGE: 010010000110100100100001
PADDED MESSAGE: 0100100001101001001000011


In [None]:
padded_message_with_zeros = padded_message + '0' * padding_zeros
print(f"PADDED MESSAGE WITH ZEROS: {padded_message_with_zeros}")

PADDED MESSAGE WITH ZEROS: 0100100001101001001000011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000



### Step 2.4: Add the Length
Finally, add the original message length as a 64-bit number:

In [None]:
message_length = len(binary_message)

formatted_message_length = format(message_length, '064b')
print("FORMATTED MESSAGE LENGTH:", formatted_message_length)


FORMATTED MESSAGE LENGTH: 0000000000000000000000000000000000000000000000000000000000011000


### Step 2.5: Put it All Together.

In [None]:
#! Need to account for longer messages that span more than one block.

final_padded_block = padded_message_with_zeros + formatted_message_length
print("FINAL PADDED BLOCK:\n")

for bit in range(0, 512, 64):
    print(f"Bits {format(bit, '03d')} to {format(bit+64, '03d')}:", final_padded_block[bit:bit+64])


FINAL PADDED BLOCK:

Bits 000 to 064: 0100100001101001001000011000000000000000000000000000000000000000
Bits 064 to 128: 0000000000000000000000000000000000000000000000000000000000000000
Bits 128 to 192: 0000000000000000000000000000000000000000000000000000000000000000
Bits 192 to 256: 0000000000000000000000000000000000000000000000000000000000000000
Bits 256 to 320: 0000000000000000000000000000000000000000000000000000000000000000
Bits 320 to 384: 0000000000000000000000000000000000000000000000000000000000000000
Bits 384 to 448: 0000000000000000000000000000000000000000000000000000000000000000
Bits 448 to 512: 0000000000000000000000000000000000000000000000000000000000011000
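
As a quick sanity check, the padding can be undone by reading the length field and slicing the message back out. This is a minimal sketch that rebuilds the `Hi!` block from the values shown above so that it runs on its own; it assumes 8 bits per character, as in Step 1.

```python
binary_message = '010010000110100100100001'   # 'Hi!' from Step 1
block = binary_message + '1' + '0' * 423 + format(len(binary_message), '064b')

# The last 64 bits hold the original message length in bits
message_length = int(block[-64:], 2)

# The message occupies the first `message_length` bits; the '1' marker follows it
recovered = block[:message_length]
marker = block[message_length]

# Turn each 8-bit group back into a character
text = ''.join(chr(int(recovered[i:i+8], 2)) for i in range(0, len(recovered), 8))
print(len(block), message_length, marker, text)
```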


### Everything at once
Here's the cleaner code that pads the binary message with the correct number of zeros and adds the length of the message at the end.

In [None]:
# Start with our message
message = "Hi!"

# 1. Convert to binary
binary = ''.join(format(ord(char), '08b') for char in message)

# 2. Add the '1' bit
padded = binary + '1'

# 3. Add padding zeros
zeros_needed = 448 - (len(padded) % 512)
if zeros_needed < 0:
    zeros_needed += 512
padded += '0' * zeros_needed

# 4. Add 64-bit message length
msg_length = len(binary)
length_bits = format(msg_length, '064b')
final_padded = padded + length_bits

print(f"Original message: {message}")
print(f"As binary: {binary}")
print(f"Length: {len(binary)} bits")

# Show complete blocks with formatting
print("\nComplete 512-bit block breakdown:")
blocks = [final_padded[i:i+512] for i in range(0, len(final_padded), 512)]
for i, block in enumerate(blocks):
    print(f"\nBlock {i+1} (512 bits):")
    print("=" * 100)
    # Print in rows of 64 bits for readability
    for j in range(0, 512, 64):
        row = block[j:j+64]
        print(f"Bits {j:>3}-{j+63:<3}: {row}")
    print("=" * 100)

Original message: Hi!
As binary: 010010000110100100100001
Length: 24 bits

Complete 512-bit block breakdown:

Block 1 (512 bits):
Bits   0-63 : 0100100001101001001000011000000000000000000000000000000000000000
Bits  64-127: 0000000000000000000000000000000000000000000000000000000000000000
Bits 128-191: 0000000000000000000000000000000000000000000000000000000000000000
Bits 192-255: 0000000000000000000000000000000000000000000000000000000000000000
Bits 256-319: 0000000000000000000000000000000000000000000000000000000000000000
Bits 320-383: 0000000000000000000000000000000000000000000000000000000000000000
Bits 384-447: 0000000000000000000000000000000000000000000000000000000000000000
Bits 448-511: 0000000000000000000000000000000000000000000000000000000000011000


### Padding Message Function
Now we will refactor this into a function.

In [None]:
def pad_binary(binary_message,
               block_size=512,
               length_field=64):

    # Add the '1' bit
    padded = binary_message + '1'

    # Calculate zeros needed
    msg_size_on_final_block = len(padded) % block_size
    zeros_needed = block_size - msg_size_on_final_block - length_field

    # If the length field does not fit in this block, the padding spills into another block
    if zeros_needed < 0:
        zeros_needed += block_size

    # Append the zeros needed
    padded += '0' * zeros_needed

    # Add the original message length (in bits) as a fixed-width binary number
    msg_length = len(binary_message)
    length_bits = format(msg_length, f'0{length_field}b')
    final_padded = padded + length_bits

    return final_padded

pad_binary(binary)

'01001000011010010010000110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011000'
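
The `Hi!` example only ever produces one block, so it is worth checking message sizes on either side of the 448-bit boundary. The sketch below restates the same padding steps in a compact hypothetical helper, `pad_bits`, so that it runs on its own; it is an illustration rather than a replacement for `pad_binary()`.

```python
def pad_bits(bits, block_size=512, length_field=64):
    """Compact restatement of the padding steps (hypothetical helper)."""
    padded = bits + '1'
    # Zeros up to the start of the length field, wrapping into a new block if needed
    padded += '0' * ((block_size - length_field - len(padded)) % block_size)
    return padded + format(len(bits), f'0{length_field}b')

for n_chars in (3, 55, 56, 64, 120):
    bits = '01100001' * n_chars              # n_chars copies of the letter 'a'
    padded = pad_bits(bits)
    print(f"{n_chars:>3} chars -> {len(padded)} bits, {len(padded) // 512} block(s)")
```

A 55-character message (440 bits) still fits in a single block, while 56 characters (448 bits) already require two.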

With this `visualize_blocks()` function we can visualize the structure.

In [None]:
def visualize_blocks(binary_message, block_size=512):
    blocks = [binary_message[i:i+block_size] for i in range(0, len(binary_message), block_size)]
    for i, block in enumerate(blocks):
        print(f"\nBlock {i+1} ({block_size} bits):")
        print("=" * 100)
        # Print in rows of 64 bits for readability
        for j in range(0, block_size, 64):
            row = block[j:j+64]
            print(f"Bits {j:>3}-{j+63:<3}: {row}")
        print("=" * 100)

visualize_blocks(pad_binary(binary))


Block 1 (512 bits):
Bits   0-63 : 0100100001101001001000011000000000000000000000000000000000000000
Bits  64-127: 0000000000000000000000000000000000000000000000000000000000000000
Bits 128-191: 0000000000000000000000000000000000000000000000000000000000000000
Bits 192-255: 0000000000000000000000000000000000000000000000000000000000000000
Bits 256-319: 0000000000000000000000000000000000000000000000000000000000000000
Bits 320-383: 0000000000000000000000000000000000000000000000000000000000000000
Bits 384-447: 0000000000000000000000000000000000000000000000000000000000000000
Bits 448-511: 0000000000000000000000000000000000000000000000000000000000011000
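
For comparison, the same padding can be expressed on bytes instead of strings of '0' and '1' characters. The sketch below uses a hypothetical helper, `pad_bytes`, and assumes the message is already a `bytes` object whose length is a whole number of bytes; it is not part of this notebook's code.

```python
def pad_bytes(message: bytes) -> bytes:
    """Byte-level version of the padding steps (hypothetical helper)."""
    bit_length = len(message) * 8
    padded = message + b'\x80'                      # '1' bit followed by seven '0' bits
    padded += b'\x00' * ((56 - len(padded)) % 64)   # zeros up to byte 56 of the final block
    return padded + bit_length.to_bytes(8, 'big')   # 64-bit big-endian length

padded_bytes = pad_bytes(b'Hi!')
as_bits = ''.join(format(byte, '08b') for byte in padded_bytes)
print(len(padded_bytes), "bytes")
print(as_bits[:32])
print(as_bits[-64:])
```

Byte 56 corresponds to bit 448, so the resulting bit string matches the block produced by the string-based steps above.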
