# SHA2-256
**Implementation of the Secure Hash Algorithm 2**

*... in Python*

[SHA2](specs/FIPS-180-2_SHA-2_%282002+2004%29.pdf)

### Define and Select Test Cases

In [1]:
test_case=[["","???"],\
           ["a","???"],\
           ["abc","???"],\
           ["abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq","???"],\
           ["abcdefghbcdefghicdefghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu","???"]]
use_test_case = 2
##
message = test_case[use_test_case][0]
ref_hash = test_case[use_test_case][1]

### Step 1 - Append Padding Bits

The message to be hashed is padded so that its length is 8 bytes {64 bits} short of a multiple of 64 bytes {512 bits}. Padding is always applied, even if the message already has the desired length. The padding is a single `1` bit followed by as many `0` bits as needed: `100...000`

After padding, the message length is therefore 56 bytes {448 bits}, 120 bytes {960 bits}, 184 bytes {1472 bits}, 248 bytes {1984 bits} and so on, that is, 56 modulo 64 bytes.
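The padding rule above can be sketched as a small helper. The function name `sha256_padding_len` is an assumption made for this illustration and is not part of the notebook. The modulo is applied to the whole difference so that message lengths above 56 modulo 64 also give a positive padding length.

```python
def sha256_padding_len(message_len: int) -> int:
    """Number of padding bytes (0x80 followed by 0x00) for a message of message_len bytes."""
    padding_len = (56 - message_len) % 64
    # Padding is never empty: a message that already sits at 56 mod 64
    # receives a full 64 bytes of padding.
    return padding_len if padding_len != 0 else 64


# Print the padding length and the padded length modulo 64 for a few sizes.
for n in (0, 3, 55, 56, 57, 64, 119):
    pad = sha256_padding_len(n)
    print(n, pad, (n + pad) % 64)
```

In every row the last column is 56, which leaves exactly 8 bytes in the final block for the length field added in Step 2.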

In [2]:
message_len = len(message)
message_len_bits = message_len * 8
print("Message Length : " + str(message_len) + " bytes {" + str(message_len_bits) + " bits}")

Message Length : 3 bytes {24 bits}


In [3]:
# Encode string to bytes
message_b = message.encode('utf-8')

In [4]:
# Calculate padding length (bring the length to 56 mod 64, always adding at least one byte)
padding_len=(56-message_len)%64
padding_len=64 if (padding_len==0) else padding_len
print("Padding Length : " + str(padding_len) + " bytes {" + str(padding_len * 8) + " bits}")

Padding Length : 53 bytes {424 bits}


In [5]:
# Display Padded Message, length and calculation.
message_mod448 = message_b + b'\x80' + b'\x00' * (padding_len-1)
print("Padded Message :\n"+str(message_mod448))
print("\nlength(paddedMessage)      : "+str(len(message_mod448))+" bytes {"+str(len(message_mod448)*8)+" bits}\nlength(paddedMessage) % 64 : "+str(len(message_mod448)%64)+" bytes {"+str((len(message_mod448)%64) * 8)+" bits}" )

Padded Message :
b'abc\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

length(paddedMessage)      : 56 bytes {448 bits}
length(paddedMessage) % 64 : 56 bytes {448 bits}


### Step 2 - Append Length

The bit length of the original message is appended to the padded message, which is 64 bits short of a multiple of 512 bits. The length is appended as an 8 byte {64 bits} big endian integer.

So, a message of length 14 bytes would have a bit length of 112 bits and the appended 64 bit big endian bit length would be `0x0000000000000070` (as hex) or `b'\x00\x00\x00\x00\x00\x00\x00p'` (as a byte string). FIPS 180-2 defines SHA-256 only for messages shorter than $2^{64}$ bits; the code below nevertheless reduces the bit length modulo $2^{64}$, so only the lower 64 bits would be appended.
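The 14 byte example can be reproduced with `int.to_bytes`. This is a small sketch; the variable names are chosen for illustration only. The little endian encoding is printed alongside purely for contrast, since that byte order is the MD5 convention and not the one SHA-256 uses.

```python
message_len_bytes = 14
bit_len = message_len_bytes * 8          # 112 bits

# SHA-256 appends the bit length as a 64-bit big-endian integer.
length_field = (bit_len % 2**64).to_bytes(8, byteorder='big')
print(length_field.hex())
print(length_field)

# For contrast only: the same value in little-endian byte order.
little_endian = (bit_len % 2**64).to_bytes(8, byteorder='little')
print(little_endian.hex())
```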

In [6]:
# Append Length
processed_message=message_mod448+(message_len_bits%2**64).to_bytes(8,byteorder='big')
print("LSB64(len(unPaddedMessage)) : "+str((message_len_bits%2**64).to_bytes(8,byteorder='big')))
print("length( paddedMessage | LSB64(len(unPaddedMessage)) ) : "+str(len(processed_message))+" bytes {"+str(len(processed_message)*8)+" bits}")
print("\nPadded Message | LSB64(len(unPaddedMessage)) :\n"+str(processed_message))

LSB64(len(unPaddedMessage)) : b'\x00\x00\x00\x00\x00\x00\x00\x18'
length( paddedMessage | LSB64(len(unPaddedMessage)) ) : 64 bytes {512 bits}

Padded Message | LSB64(len(unPaddedMessage)) :
b'abc\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'


### Step 3 - Initialize Context

In [7]:
h0 = 0x6a09e667
h1 = 0xbb67ae85
h2 = 0x3c6ef372
h3 = 0xa54ff53a
h4 = 0x510e527f
h5 = 0x9b05688c
h6 = 0x1f83d9ab
h7 = 0x5be0cd19
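The eight initial values are not arbitrary: the Secure Hash Standard describes them as the first 32 bits of the fractional parts of the square roots of the first eight prime numbers. The following sketch checks that with exact integer arithmetic. The helper name `frac_sqrt_32` is an assumption made for this illustration, and `math.isqrt` requires Python 3.8 or later.

```python
import math

FIRST_8_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]


def frac_sqrt_32(p: int) -> int:
    # isqrt(p * 2**64) equals floor(sqrt(p) * 2**32); masking to the low
    # 32 bits discards the integer part and keeps the fractional bits.
    return math.isqrt(p << 64) & 0xFFFFFFFF


derived = [frac_sqrt_32(p) for p in FIRST_8_PRIMES]
expected = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
            0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]

for p, value in zip(FIRST_8_PRIMES, derived):
    print("{:2d} : {:08x}".format(p, value))
print(derived == expected)
```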

### Step 4 - Process Message in 16-Word Blocks

In [8]:
# Rotate Right
def rotr(x,s):
    return ( (x>>s) | x<<(32-s))& 0xFFFFFFFF

# Rotate Left
def rotl(x,s):
    return ( (x<<s) | x>>(32-s))& 0xFFFFFFFF

# Shift Right
def shr(x,s):
    return (x>>s) & 0xFFFFFFFF

def Ch(X, Y, Z):
    return (X&Y)^(~X&Z)

def Maj(X, Y, Z):
    return (X&Y)^(X&Z)^(Y&Z)

def SIGMA0(V):
    return rotr(V,2)^rotr(V,13)^rotr(V,22)

def SIGMA1(V):
    return rotr(V,6)^rotr(V,11)^rotr(V,25)

def sig0(V):
    return rotr(V,7)^rotr(V,18)^shr(V,3)

def sig1(V):
    return rotr(V,17)^rotr(V,19)^shr(V,10)

# Split a 64-byte block M into 16 big-endian 32-bit words (slots 16..63 are filled in by the message schedule)
def words(M):
    word_list=[0]*64
    for i in range (0,16):
        word_list[i]=int.from_bytes(M[i*4:i*4+4],byteorder='big')
    return word_list

**Function correspondence to the FIPS 180-2 document**

$\Sigma^{256}_0$ = `SIGMA0()`

$\Sigma^{256}_1$ = `SIGMA1()`

$\sigma^{256}_0$ = `sig0()`

$\sigma^{256}_1$ = `sig1()`
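A few hand-checkable inputs help confirm that the bitwise helpers behave as intended. The sketch below redefines `rotr`, `Ch` and `Maj` so that it runs on its own; the test values are chosen for illustration and do not come from the FIPS document.

```python
def rotr(x, s):
    # Rotate a 32-bit word right by s bits.
    return ((x >> s) | (x << (32 - s))) & 0xFFFFFFFF


def Ch(X, Y, Z):
    # "Choose": each output bit comes from Y where X has a 1, else from Z.
    return (X & Y) ^ (~X & Z)


def Maj(X, Y, Z):
    # "Majority": each output bit is the value held by at least two inputs.
    return (X & Y) ^ (X & Z) ^ (Y & Z)


print("{:08x}".format(rotr(0x00000001, 1)))                      # 80000000
print("{:08x}".format(rotr(0x12345678, 8)))                      # 78123456
print("{:08x}".format(Ch(0xFFFF0000, 0xAAAAAAAA, 0x55555555)))   # aaaa5555
print("{:08x}".format(Maj(0xFF00FF00, 0xFFFF0000, 0x0000FFFF)))  # ff00ff00
```

Although `~X` is a negative integer in Python, the `&` with a non-negative 32-bit word keeps the result of `Ch` within 32 bits.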

In [9]:
# SHA-256 Constant Table
K = [0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5, 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174, 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da, 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967, 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85, 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070, 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3, 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2]
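The Secure Hash Standard describes these 64 constants as the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers. As a hedged cross-check against transcription errors, the table can be regenerated with integer arithmetic. The helpers `first_primes` and `icbrt` are hypothetical names written for this sketch.

```python
def first_primes(n):
    # Trial division is sufficient for the first 64 primes.
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    # Integer cube root by binary search: the largest r with r**3 <= n.
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# icbrt(p * 2**96) equals floor(cbrt(p) * 2**32); the low 32 bits are the
# leading fractional bits of the cube root.
K_derived = [icbrt(p << 96) & 0xFFFFFFFF for p in first_primes(64)]

print(" ".join("{:08x}".format(k) for k in K_derived[:4]))
print("{:08x}".format(K_derived[63]))
```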

In [10]:
# Print functions
def print_state(t,a,b,c,d,e,f,g,h):
    print("{:2d} : {:08x} {:08x} {:08x} {:08x} {:08x} {:08x} {:08x} {:08x}".format(t,a,b,c,d,e,f,g,h))

In [11]:
# Loop through the 512-bit blocks of the processed message.
N = len(processed_message)//64
for i in range(0,N):
    M = processed_message[i*64:(i+1)*64]
    print("PROCESSING bytes "+str(i*64)+"..."+str((i+1)*64))
    print("\nMessage chunk being processed :\n"+str(M)+" \n")
    W = words(M)
    # Prepare Message Schedule
    for t in range (16,64):
        W[t] = (sig1(W[t-2]) + W[t-7] + sig0(W[t-15]) + W[t-16]) & 0xFFFFFFFF
    # Initialize state variables
    [A,B,C,D,E,F,G,H]=[h0,h1,h2,h3,h4,h5,h6,h7]
    print(" t   A        B        C        D        E        F        G        H")
    for t in range (0,64):
        T1 = (H + SIGMA1(E) + Ch(E,F,G) + K[t] + W[t]) & 0xFFFFFFFF
        T2 = (SIGMA0(A) + Maj(A,B,C)) & 0xFFFFFFFF
        [H, G, F, E, D, C, B, A] = [G, F, E, (D+T1)& 0xFFFFFFFF, C, B, A, (T1+T2)& 0xFFFFFFFF]
        print_state(t,A,B,C,D,E,F,G,H)
    [h0,h1,h2,h3,h4,h5,h6,h7]=[(h0+A)& 0xFFFFFFFF, (h1+B)& 0xFFFFFFFF, (h2+C)& 0xFFFFFFFF, (h3+D)& 0xFFFFFFFF, (h4+E)& 0xFFFFFFFF, (h5+F)& 0xFFFFFFFF, (h6+G)& 0xFFFFFFFF, (h7+H)& 0xFFFFFFFF]

    # Display updated hash state
    print("\n*** SHA Buffers after processing chunk ***\n[H0..H7] = "+"[{:08x} {:08x} {:08x} {:08x} {:08x} {:08x} {:08x} {:08x}]".format(h0,h1,h2,h3,h4,h5,h6,h7)+"\n\n")

PROCESSING bytes 0...64

Message chunk being processed :
b'abc\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18' 

 t   A        B        C        D        E        F        G        H
 0 : 5d6aebcd 6a09e667 bb67ae85 3c6ef372 fa2a4622 510e527f 9b05688c 1f83d9ab
 1 : 5a6ad9ad 5d6aebcd 6a09e667 bb67ae85 78ce7989 fa2a4622 510e527f 9b05688c
 2 : c8c347a7 5a6ad9ad 5d6aebcd 6a09e667 f92939eb 78ce7989 fa2a4622 510e527f
 3 : d550f666 c8c347a7 5a6ad9ad 5d6aebcd 24e00850 f92939eb 78ce7989 fa2a4622
 4 : 04409a6a d550f666 c8c347a7 5a6ad9ad 43ada245 24e00850 f92939eb 78ce7989
 5 : 2b4209f5 04409a6a d550f666 c8c347a7 714260ad 43ada245 24e00850 f92939eb
 6 : e5030380 2b4209f5 04409a6a d550f666 9b27a401 714260ad 43ada245 24e00850
 7 : 85a07b5f e5030380 2b4209f5 04409a6a 0c657a79 9b27a401 714260ad 43ada245
 8 :
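When a message pads to more than one block, each block starts at a byte offset that is a multiple of 64. The sketch below shows that slicing together with the split into sixteen big endian words. `iter_blocks` and `block_to_words` are hypothetical helper names, and the demo input is arbitrary bytes, not a padded message.

```python
def iter_blocks(processed: bytes):
    # Yield (byte offset, 64-byte block) pairs from an already padded message.
    if len(processed) % 64 != 0:
        raise ValueError("processed message must be a multiple of 64 bytes")
    for offset in range(0, len(processed), 64):
        yield offset, processed[offset:offset + 64]


def block_to_words(block: bytes):
    # Interpret a 64-byte block as sixteen big-endian 32-bit words.
    return [int.from_bytes(block[i:i + 4], byteorder='big')
            for i in range(0, 64, 4)]


demo = bytes(range(128))  # two blocks of easily recognisable bytes
blocks = list(iter_blocks(demo))
for offset, block in blocks:
    first_word = block_to_words(block)[0]
    print("bytes {}...{} first word {:08x}".format(offset, offset + 64, first_word))
```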

In [12]:
# Compute output hash from the hash state.
output_hash = ''.join('{:08x}'.format(x) for x in [h0,h1,h2,h3,h4,h5,h6,h7])

# The SHA-256 digest is H0..H7 concatenated, each word written in big endian byte order
print("OUTPUT      : "+output_hash)
print("REF. Hash   : 0x"+ref_hash)

OUTPUT      : ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
REF. Hash   : 0x???
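Steps 1 to 4 can be collected into a single function, which makes it easy to exercise messages that span more than one block. This is a minimal sketch rather than the notebook's own code: the name `sha256_sketch` and its internal variable names are assumptions, the constants are copied from the cells above, and the result is compared against `hashlib` for several message lengths.

```python
import hashlib

MASK = 0xFFFFFFFF
H_INIT = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
          0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]
K = [0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
     0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
     0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
     0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
     0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
     0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
     0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
     0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2]


def rotr(x, s):
    return ((x >> s) | (x << (32 - s))) & MASK


def sha256_sketch(data: bytes) -> str:
    # Steps 1 and 2: 0x80, zero bytes up to 56 mod 64, then the bit length
    # as a 64-bit big-endian integer.
    padded = (data + b'\x80' + b'\x00' * ((55 - len(data)) % 64)
              + ((len(data) * 8) % 2**64).to_bytes(8, byteorder='big'))
    # Step 3: initial hash state.
    h = list(H_INIT)
    # Step 4: process each 64-byte block.
    for offset in range(0, len(padded), 64):
        block = padded[offset:offset + 64]
        w = [int.from_bytes(block[i:i + 4], byteorder='big') for i in range(0, 64, 4)]
        for t in range(16, 64):
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            t1 = (hh + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
                  + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
            t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
                  + ((a & b) ^ (a & c) ^ (b & c))) & MASK
            a, b, c, d, e, f, g, hh = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return ''.join('{:08x}'.format(x) for x in h)


# Compare with hashlib for one-, two- and many-block messages.
for n in (0, 3, 55, 56, 64, 112, 1000):
    msg = b'a' * n
    print(n, sha256_sketch(msg) == hashlib.sha256(msg).hexdigest())
```

The lengths 55 and 56 sit on either side of the point where padding spills into a second block, so they are the most useful cases to check.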


## Compare with Python's `hashlib`

In [13]:
import hashlib

In [14]:
H = hashlib.new('SHA256')
H.update(message_b)
sha256hash=H.hexdigest()
print("Hashlib SHA256 : 0x"+sha256hash)

Hashlib SHA256 : 0xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
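The same `hashlib` call can produce reference digests for every message in the test case list, which is one way the `???` placeholders could be populated. The sketch below uses the messages from the first cell; the names `test_messages` and `reference` are chosen for illustration.

```python
import hashlib

test_messages = [
    "",
    "a",
    "abc",
    "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
    "abcdefghbcdefghicdefghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu",
]

# Map each message to its SHA-256 hex digest as computed by hashlib.
reference = {m: hashlib.sha256(m.encode('utf-8')).hexdigest() for m in test_messages}

for m, digest in reference.items():
    print("{:3d} bytes : {}".format(len(m.encode('utf-8')), digest))
```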


### References

1. [SHA1](https://www.ietf.org/rfc/rfc3174.txt)
2. [Wikipedia](https://en.wikipedia.org/wiki/SHA1)
3. [Rosetta Code](https://rosettacode.org/wiki/SHA1/Implementation#Python)
4. [Merkle Damgård construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction)