# SHA2-512
**Implementation of the Secure Hash Algorithm 2**

*... in Python*

[SHA2](specs/FIPS-180-2_SHA-2_%282002+2004%29.pdf)

### Define and Select Test Cases

In [1]:
# [PlainText, SHA512Digest]
test_case=[["","???"],\
           ["a","???"],\
           ["abc","???"],\
           ["abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq","???"],\
           ["abcdefghbcdefghicdefghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu","???"]]
use_test_case = 3

# Assign test case to variables.
message = test_case[use_test_case][0]
ref_hash = test_case[use_test_case][1]

### Step 1 - Append Padding Bits

The message to be hashed is padded so that its length is 16 bytes {128 bits} short of a multiple of 128 bytes {1024 bits}. The padding step is always performed, even if the message already has the desired length. The padding bit string is a single `1` bit followed by `0` bits: `100...000`

The padded message length is therefore 112 bytes {896 bits}, 240 bytes {1920 bits}, 368 bytes {2944 bits}, 496 bytes {3968 bits} and so on.

The padding method is similar to that of MD4, except for the sizes.
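The padding rule above can be sketched as a small helper. The function name `sha512_padding_len` is hypothetical and not part of this notebook; it only illustrates that the padding is between 1 and 128 bytes and always lands on 112 bytes modulo 128.

```python
def sha512_padding_len(message_len):
    """Number of padding bytes (0x80 followed by zero bytes) for a message of message_len bytes."""
    # At least one byte is always added, so a message that already sits at
    # 112 mod 128 receives a full 128 bytes of padding.
    return (111 - message_len) % 128 + 1

for n in (0, 3, 56, 111, 112, 120, 128):
    pad = sha512_padding_len(n)
    print("{:3d} bytes + {:3d} padding bytes = {:3d} bytes".format(n, pad, n + pad))
```

Messages of up to 111 bytes pad to 112 bytes; from 112 bytes onwards the padding spills into a second block and the padded length becomes 240 bytes.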

In [2]:
message_len = len(message)
message_len_bits = message_len * 8
print("Message Length : " + str(message_len) + " bytes {" + str(message_len_bits) + " bits}")

Message Length : 56 bytes {448 bits}


In [3]:
# Encode string to bytes
message_b = message.encode('utf-8')

In [4]:
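# Calculate padding length (always between 1 and 128 bytes).
# The modulo keeps the result positive when message_len%128 > 112.
padding_len=(112-message_len)%128
padding_len=128 if (padding_len==0) else padding_len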
print("Padding Length : " + str(padding_len) + " bytes {" + str(padding_len * 8) + " bits}")

Padding Length : 56 bytes {448 bits}


In [5]:
# Display Padded Message, length and calculation.
message_mod871 = message_b + b'\x80' + b'\x00' * (padding_len-1)
print("Padded Message :\n"+str(message_mod871))
print("\nlength(paddedMessage)      : "+str(len(message_mod871))+" bytes {"+str(len(message_mod871)*8)+" bits}\nlength(paddedMessage) % 128 : "+str(len(message_mod871)%128)+" bytes {"+str((len(message_mod871)%128) * 8)+" bits}" )

Padded Message :
b'abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

length(paddedMessage)      : 112 bytes {896 bits}
length(paddedMessage) % 128 : 112 bytes {896 bits}


### Step 2 - Append Length

The bit length of the original message is appended to this padded message, which is 128 bits short of a multiple of 1024 bits. The bit length is appended as a 16 byte {128 bits} big endian integer.

So, a message of length 56 bytes (_try test case # 3_) has a bit length of 448 bits, and the appended 128 bit big endian bit length is `0x000000000000000000000000000001c0` (as hex) or `b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xc0'` (as a byte string). FIPS 180-2 defines SHA-512 only for messages shorter than $2^{128}$ bits; if the bit length were $\geq 2^{128}$, this implementation would append only its lower 128 bits.
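As a minimal sketch of the length encoding described above, the helper below (the name `encode_bit_length` is hypothetical) builds the 16-byte field with `int.to_bytes` and contrasts it with the little endian layout used by MD4/MD5-style hashes.

```python
def encode_bit_length(message_len_bytes):
    """Return the 128-bit big endian length field for a message of the given byte length."""
    bit_len = message_len_bytes * 8
    # Keep only the lower 128 bits, as this notebook does.
    return (bit_len % 2**128).to_bytes(16, byteorder='big')

length_field = encode_bit_length(56)
print("big endian    : " + length_field.hex())
print("little endian : " + (56 * 8).to_bytes(16, byteorder='little').hex())
```

In the big endian form the significant bytes `01 c0` sit at the end of the field; in the little endian form they would come first as `c0 01`.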

In [6]:
# Append Length
processed_message=message_mod871+(message_len_bits%2**128).to_bytes(16,byteorder='big')
print("LSB128(len(unPaddedMessage)) : "+str((message_len_bits%2**128).to_bytes(16,byteorder='big')))
print("length( paddedMessage | LSB128(len(unPaddedMessage)) ) : "+str(len(processed_message))+" bytes {"+str(len(processed_message)*8)+" bits}")
print("\nPadded Message | LSB128(len(unPaddedMessage)) :\n"+str(processed_message))

LSB128(len(unPaddedMessage)) : b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xc0'
length( paddedMessage | LSB128(len(unPaddedMessage)) ) : 128 bytes {1024 bits}

Padded Message | LSB128(len(unPaddedMessage)) :
b'abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xc0'


### Step 3 - Initialize Context
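These initial hash values are defined in the standards (FIPS 180-2, RFC-xxxx). They are the first 64 bits of the fractional parts of the square roots of the first eight prime numbers.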

In [7]:
h0 = 0x6a09e667f3bcc908
h1 = 0xbb67ae8584caa73b
h2 = 0x3c6ef372fe94f82b
h3 = 0xa54ff53a5f1d36f1
h4 = 0x510e527fade682d1
h5 = 0x9b05688c2b3e6c1f
h6 = 0x1f83d9abfb41bd6b
h7 = 0x5be0cd19137e2179
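The initial values need not be taken on faith: they are the first 64 bits of the fractional parts of the square roots of the first eight primes. The sketch below re-derives them with exact integer arithmetic. The helper `first_primes` and the name `initial_hash` are hypothetical, and `math.isqrt` assumes Python 3.8 or later.

```python
from math import isqrt

MASK64 = (1 << 64) - 1

def first_primes(count):
    """Return the first `count` primes by simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

# isqrt(p << 128) equals floor(sqrt(p) * 2**64); masking to 64 bits drops the
# integer part and keeps the first 64 fractional bits.
initial_hash = [isqrt(p << 128) & MASK64 for p in first_primes(8)]
for i, value in enumerate(initial_hash):
    print("h{} = 0x{:016x}".format(i, value))
```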

### Define various constants
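These 80 round constants are also defined in the standards documents. They are the first 64 bits of the fractional parts of the cube roots of the first 80 prime numbers.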

In [8]:
# SHA-512 Constant Table
K = [0x428a2f98d728ae22, 0x7137449123ef65cd, 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc, 0x3956c25bf348b538, 0x59f111f1b605d019, 0x923f82a4af194f9b, 0xab1c5ed5da6d8118, 0xd807aa98a3030242, 0x12835b0145706fbe, 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2, 0x72be5d74f27b896f, 0x80deb1fe3b1696b1, 0x9bdc06a725c71235, 0xc19bf174cf692694, 0xe49b69c19ef14ad2, 0xefbe4786384f25e3, 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65, 0x2de92c6f592b0275, 0x4a7484aa6ea6e483, 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5, 0x983e5152ee66dfab, 0xa831c66d2db43210, 0xb00327c898fb213f, 0xbf597fc7beef0ee4, 0xc6e00bf33da88fc2, 0xd5a79147930aa725, 0x06ca6351e003826f, 0x142929670a0e6e70, 0x27b70a8546d22ffc, 0x2e1b21385c26c926, 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df, 0x650a73548baf63de, 0x766a0abb3c77b2a8, 0x81c2c92e47edaee6, 0x92722c851482353b, 0xa2bfe8a14cf10364, 0xa81a664bbc423001, 0xc24b8b70d0f89791, 0xc76c51a30654be30, 0xd192e819d6ef5218, 0xd69906245565a910, 0xf40e35855771202a, 0x106aa07032bbd1b8, 0x19a4c116b8d2d0c8, 0x1e376c085141ab53, 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8, 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb, 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3, 0x748f82ee5defb2fc, 0x78a5636f43172f60, 0x84c87814a1f0ab72, 0x8cc702081a6439ec, 0x90befffa23631e28, 0xa4506cebde82bde9, 0xbef9a3f7b2c67915, 0xc67178f2e372532b, 0xca273eceea26619c, 0xd186b8c721c0c207, 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178, 0x06f067aa72176fba, 0x0a637dc5a2c898a6, 0x113f9804bef90dae, 0x1b710b35131c471b, 0x28db77f523047d84, 0x32caab7b40c72493, 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c, 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a, 0x5fcb6fab3ad6faec, 0x6c44198c4a475817]
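A long table like this is easy to mistype, so it can help to regenerate it. The round constants are the first 64 bits of the fractional parts of the cube roots of the first 80 primes. The sketch below assumes that definition and uses two hypothetical helpers, `first_primes` and `integer_cube_root`; the result is stored in `K_derived` so it can be compared against `K`.

```python
MASK64 = (1 << 64) - 1

def first_primes(count):
    """Return the first `count` primes by simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def integer_cube_root(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    low, high = 0, 1 << (n.bit_length() // 3 + 1)
    while low < high:
        mid = (low + high + 1) // 2
        if mid ** 3 <= n:
            low = mid
        else:
            high = mid - 1
    return low

# integer_cube_root(p << 192) equals floor(cbrt(p) * 2**64); the mask keeps
# the first 64 fractional bits.
K_derived = [integer_cube_root(p << 192) & MASK64 for p in first_primes(80)]
print("K[0]  = 0x{:016x}".format(K_derived[0]))
print("K[79] = 0x{:016x}".format(K_derived[79]))
```

Comparing `K_derived == K` in the notebook is a quick way to catch a typo in the table.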

**Function correspondence to the FIPS 180-2 document**

$\Sigma^{512}_0$ = `SIGMA0()`

$\Sigma^{512}_1$ = `SIGMA1()`

$\sigma^{512}_0$ = `sig0()`

$\sigma^{512}_1$ = `sig1()`
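For reference, the SHA-384/512 functions (the ones carrying the superscript 512 in FIPS 180-2) can be written out as follows, where $ROTR^n$ is a right rotation and $SHR^n$ a right shift of a 64-bit word by $n$ bits. The rotation and shift amounts are the ones used by the code in this notebook.

$$
\begin{aligned}
Ch(x,y,z) &= (x \wedge y) \oplus (\neg x \wedge z) \\
Maj(x,y,z) &= (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z) \\
\Sigma^{512}_0(x) &= ROTR^{28}(x) \oplus ROTR^{34}(x) \oplus ROTR^{39}(x) \\
\Sigma^{512}_1(x) &= ROTR^{14}(x) \oplus ROTR^{18}(x) \oplus ROTR^{41}(x) \\
\sigma^{512}_0(x) &= ROTR^{1}(x) \oplus ROTR^{8}(x) \oplus SHR^{7}(x) \\
\sigma^{512}_1(x) &= ROTR^{19}(x) \oplus ROTR^{61}(x) \oplus SHR^{6}(x)
\end{aligned}
$$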

In [9]:
# Rotate Right
def rotr(x,s):
    return ( (x>>s) | x<<(64-s))& 0xFFFFFFFFFFFFFFFF

# Rotate Left
def rotl(x,s):
    return ( (x<<s) | x>>(64-s))& 0xFFFFFFFFFFFFFFFF

# Shift Right
def shr(x,s):
    return (x>>s) & 0xFFFFFFFFFFFFFFFF

def Ch(X, Y, Z):
    return (X&Y)^(~X&Z)

def Maj(X, Y, Z):
    return (X&Y)^(X&Z)^(Y&Z)

def SIGMA0(V):
    return rotr(V,28)^rotr(V,34)^rotr(V,39)

def SIGMA1(V):
    return rotr(V,14)^rotr(V,18)^rotr(V,41)

def sig0(V):
    return rotr(V,1)^rotr(V,8)^shr(V,7)

def sig1(V):
    return rotr(V,19)^rotr(V,61)^shr(V,6)
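Python integers are unbounded, so a rotation only behaves like a 64-bit rotation if the result is masked to exactly 64 bits. The short check below is a sketch of that idea; `rotr64` and `MASK64` are hypothetical names that mirror `rotr()` above.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # 16 hex digits = 64 bits

def rotr64(x, s):
    """Rotate the 64-bit word x right by s bits (0 < s < 64)."""
    return ((x >> s) | (x << (64 - s))) & MASK64

x = 0x0123456789ABCDEF
# The lowest bit wraps around to the highest bit.
assert rotr64(1, 1) == 1 << 63
# Rotating by s and then by 64-s restores the original word.
assert rotr64(rotr64(x, 19), 64 - 19) == x
# A rotation by one byte moves the lowest byte to the top.
print("{:016x}".format(rotr64(x, 8)))  # ef0123456789abcd
```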

In [10]:
# Split message block M into 16 64-bit words (big endian); entries 16..79 are filled in later
def words(M):
    word_list=[0]*80
    for i in range (0,16):
        word_list[i]=int.from_bytes(M[i*8:i*8+8],byteorder='big')
    return word_list
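An equivalent way to split a block, shown here only as an alternative sketch, is `struct.unpack` with the format `>16Q` (sixteen big endian unsigned 64-bit integers). The function name `words_struct` is hypothetical.

```python
import struct

def words_struct(block):
    """Split a 128-byte block into 16 big endian 64-bit words, padded with zeros to 80 entries."""
    if len(block) != 128:
        raise ValueError("expected a 128-byte block")
    return list(struct.unpack('>16Q', block)) + [0] * 64

block = bytes(range(128))
W = words_struct(block)
print("{:016x}".format(W[0]))   # 0001020304050607
print("{:016x}".format(W[15]))  # 78797a7b7c7d7e7f
```

The length check makes a mis-sliced block fail loudly instead of silently producing short words.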

In [11]:
# Print functions
def print_state(t,a,b,c,d,e,f,g,h):
    print("{:2d} : {:016x} {:016x} {:016x} {:016x}".format(t,a,b,c,d))
    print("     {:016x} {:016x} {:016x} {:016x}".format(e,f,g,h))    

### Step 4 - Process Message in 16-Word Blocks

In [12]:
# Loop through the 1024 bit {128 byte} blocks of the processed message.
N = len(processed_message)//128
for i in range(0,N):
    # Block i starts at byte offset i*128.
    M = processed_message[i*128:(i+1)*128]
    print("PROCESSING bytes "+str(i*128)+"..."+str((i+1)*128))
    print("\nMessage chunk being processed :\n"+str(M)+" \n")
    W = words(M)
    # Prepare Message Schedule. Ensure that the words stay 64 bits long (& 0xFFFFFFFFFFFFFFFF)
    for t in range (16,80):
        W[t] = (sig1(W[t-2]) + W[t-7] + sig0(W[t-15]) + W[t-16]) & 0xFFFFFFFFFFFFFFFF
    # Initialize state variables
    [A,B,C,D,E,F,G,H]=[h0,h1,h2,h3,h4,h5,h6,h7]
    print(" t   A/E              B/F              C/G              D/H")
    for t in range (0,80):
        T1 = (H + SIGMA1(E) + Ch(E,F,G) + K[t] + W[t]) & 0xFFFFFFFFFFFFFFFF
        T2 = (SIGMA0(A) + Maj(A,B,C)) & 0xFFFFFFFFFFFFFFFF
        [H, G, F, E, D, C, B, A] = [G, F, E, (D+T1)& 0xFFFFFFFFFFFFFFFF, C, B, A, (T1+T2)& 0xFFFFFFFFFFFFFFFF]
        print_state(t,A,B,C,D,E,F,G,H)
    [h0,h1,h2,h3,h4,h5,h6,h7]=[(h0+A)& 0xFFFFFFFFFFFFFFFF, (h1+B)& 0xFFFFFFFFFFFFFFFF, (h2+C)& 0xFFFFFFFFFFFFFFFF, (h3+D)& 0xFFFFFFFFFFFFFFFF, (h4+E)& 0xFFFFFFFFFFFFFFFF, (h5+F)& 0xFFFFFFFFFFFFFFFF, (h6+G)& 0xFFFFFFFFFFFFFFFF, (h7+H)& 0xFFFFFFFFFFFFFFFF]

    # Display Updated SHA Buffers
    print("\n*** SHA Buffers after processing chunk ***\n[H0..H7] = "+"[{:016x} {:016x} {:016x} {:016x} {:016x} {:016x} {:016x} {:016x}]".format(h0,h1,h2,h3,h4,h5,h6,h7)+"\n\n")

PROCESSING bytes 0...128

Message chunk being processed :
b'abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xc0' 

 t   A/E              B/F              C/G              D/H
 0 : f6afce9d1f60425a 6a09e667f3bcc908 bb67ae8584caa73b 3c6ef372fe94f82b
     58cb0218dd1883f6 510e527fade682d1 9b05688c2b3e6c1f 1f83d9abfb41bd6b
 1 : 708b4be26129a822 f6afce9d1f60425a 6a09e667f3bcc908 bb67ae8584caa73b
     26e75b12651b9748 58cb0218dd1883f6 510e527fade682d1 9b05688c2b3e6c1f
 2 : fa567898ae2e5460 708b4be26129a822 f6afce9d1f60425a 6a09e667f3bcc908
     f51f9cb6ef58b948 26e75b12651b9748 58cb0218dd1883f6 510e527fade682d1
 3 : 49bbd166c7ade22f fa567898ae2e5460 708b4be26129a822 f6afce9d1f60425a
     c903dd77323baf78

In [13]:
# Compute output hash from the SHA buffers.
# Each buffer is a 64-bit word, so it must be printed as 16 hex digits (leading zeros included).
output_hash = ''.join('{:016x}'.format(x) for x in [h0,h1,h2,h3,h4,h5,h6,h7])

# The SHA-512 digest is the big endian concatenation h0 | h1 | ... | h7
print("OUTPUT            : "+output_hash)
print("REF. Hash         : 0x"+ref_hash)

OUTPUT            : 204a8fc6dda82f0a0ced7beb8e08a41657c16ef468b228a8279be331a703c33596fd15c13b1b07f9aa1d3bea57789ca031ad85c7a71dd70354ec631238ca3445
REF. Hash         : 0x???


## Compare with Python's `hashlib`

In [14]:
import hashlib

In [15]:
H = hashlib.new('SHA512')
H.update(message_b)
sha512hash=H.hexdigest()
print("Hashlib SHA512 : 0x"+sha512hash)

Hashlib SHA512 : 0x204a8fc6dda82f0a0ced7beb8e08a41657c16ef468b228a8279be331a703c33596fd15c13b1b07f9aa1d3bea57789ca031ad85c7a71dd70354ec631238ca3445
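The same `hashlib` call can produce reference digests for several test messages at once, which is one way to obtain values for the `ref_hash` column. This is only a sketch: the names `messages` and `reference` are hypothetical, and the list repeats the first four plaintexts from the test cases above.

```python
import hashlib

messages = ["",
            "a",
            "abc",
            "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq"]

# Map each plaintext to its SHA-512 digest as a 128-character hex string.
reference = {m: hashlib.sha512(m.encode('utf-8')).hexdigest() for m in messages}

for m, digest in reference.items():
    print("{!r:>20.20} : {}".format(m, digest))
```

Each digest is 128 hex characters long, so a shorter string from the hand-written implementation points to a formatting problem such as dropped leading zeros.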


### References

1. [SHA1](https://www.ietf.org/rfc/rfc3174.txt)
2. [Wikipedia](https://en.wikipedia.org/wiki/SHA1)
3. [Rosetta Code](https://rosettacode.org/wiki/SHA1/Implementation#Python)
4. [Merkle Damgård construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction)