# SHA1
**Implementation of the RFC 3174 US Secure Hash Algorithm 1**

*... in Python*

[SHA1](https://www.ietf.org/rfc/rfc3174.txt)

### Define and Select Test Cases

In [1]:
test_case=[["","da39a3ee5e6b4b0d3255bfef95601890afd80709"],\
           ["a","34aa973cd4c4daa4f61eeb2bdbad27316534016f"],\
           ["abc","a9993e364706816aba3e25717850c26c9cd0d89d"],\
           ["abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq","84983e441c3bd26ebaae4aa1f95129e5e54670f1"],\
           ["abcdefghbcdefghicdefghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu",\
            "a49b2446a02c645bf419f995b67091253a04a259"]]
use_test_case = 3
##
message = test_case[use_test_case][0]
ref_hash = test_case[use_test_case][1]
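
Before relying on a table of reference digests, it can help to check each pair against an independent implementation. The following is a minimal sketch using `hashlib`; the `vectors` list and the helper `check_vectors` are names made up for this example and are not part of the notebook. It includes the RFC 3174 vector of one million repetitions of `a`, whose digest is `34aa973c...`.

```python
import hashlib

# Hypothetical vector list for this sketch: (message bytes, expected hex digest).
vectors = [
    (b"", "da39a3ee5e6b4b0d3255bfef95601890afd80709"),
    (b"abc", "a9993e364706816aba3e25717850c26c9cd0d89d"),
    (b"a" * 1000000, "34aa973cd4c4daa4f61eeb2bdbad27316534016f"),
]

def check_vectors(pairs):
    """Return the messages whose hashlib digest differs from the expected one."""
    return [msg for msg, expected in pairs
            if hashlib.sha1(msg).hexdigest() != expected]

mismatches = check_vectors(vectors)
print("Mismatching vectors : " + str(len(mismatches)))
```

An empty result means every reference digest in the list agrees with `hashlib`.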

### Step 1 - Append Padding Bits

The message to be hashed is padded so that its length is 8 bytes {64 bits} short of a multiple of 64 bytes {512 bits}. Padding is always performed, even if the message already has that length. The padding bit string is a single `1` bit followed by `0` bits: `100...000`

After padding, the message length is therefore 56 bytes {448 bits}, 120 bytes {960 bits}, 184 bytes {1472 bits}, 248 bytes {1984 bits} and so on.
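
The rule above can be sketched as a small function that maps a message length to its padded length. This is only an illustration: the helper name `padded_length` and the modular formula are assumptions of this sketch, chosen so that the padding is always between 1 and 64 bytes.

```python
def padded_length(message_len):
    """Length in bytes after the 0x80 byte and the zero bytes are appended
    (the 8-byte length field of Step 2 is not included)."""
    padding_len = (55 - message_len) % 64 + 1   # always in the range 1..64
    return message_len + padding_len

for n in (0, 3, 55, 56, 60, 64):
    print(str(n) + " bytes -> " + str(padded_length(n)) + " bytes")
```

Lengths 0, 3 and 55 pad to 56 bytes, while 56, 60 and 64 pad to 120 bytes, since at least one padding byte is always added.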

In [2]:
message_len = len(message)
message_len_bits = message_len * 8
print("Message Length : " + str(message_len) + " bytes {" + str(message_len_bits) + " bits}")

Message Length : 56 bytes {448 bits}


In [3]:
# Encode string to bytes
message_b = message.encode('utf-8')

In [4]:
# Calculate padding length: 1..64 bytes, so that (message_len + padding_len) % 64 == 56.
# The modulo keeps the result positive when message_len % 64 is greater than 56.
padding_len=(55-message_len)%64+1
print("Padding Length : " + str(padding_len) + " bytes {" + str(padding_len * 8) + " bits}")

Padding Length : 64 bytes {512 bits}


In [5]:
# Display Padded Message, length and calculation.
message_mod448 = message_b + b'\x80' + b'\x00' * (padding_len-1)
print("Padded Message :\n"+str(message_mod448))
print("\nlength(paddedMessage)      : "+str(len(message_mod448))+" bytes {"+str(len(message_mod448*8))+" bits}\nlength(paddedMessage) % 64 : "+str(len(message_mod448)%64)+" bytes {"+str((len(message_mod448)%64) * 8)+" bits}" )

Padded Message :
b'abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

length(paddedMessage)      : 120 bytes {960 bits}
length(paddedMessage) % 64 : 56 bytes {448 bits}


### Step 2 - Append Length

The bit length of the original message is appended to this _64 bits short of a multiple of 512 bits_ message. The bit length is appended as an 8 byte {64 bits} big endian integer.

So, a message of length 14 bytes would have a bit length of 112 bits and the appended 64 bit big endian bit length would be `0x0000000000000070` (as hex) or `b'\x00\x00\x00\x00\x00\x00\x00p'` (as a byte string). RFC 3174 defines SHA-1 only for messages shorter than $2^{64}$ bits; the code below nevertheless reduces the bit length modulo $2^{64}$ so that it always fits in 8 bytes.
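
As a quick illustration of the length field, the sketch below encodes the bit length of a hypothetical 14 byte message. SHA-1 stores this field most significant byte first, which is what `byteorder='big'` produces; the variable names are made up for the example.

```python
bit_len = 14 * 8                                  # 112 bits
encoded = bit_len.to_bytes(8, byteorder='big')    # 8 byte length field
print("Length field (bytes) : " + str(encoded))
print("Length field (hex)   : 0x" + encoded.hex())
print("Decoded again        : " + str(int.from_bytes(encoded, byteorder='big')))
```

The only non-zero byte is the last one, `0x70`, which Python displays as the ASCII character `p`.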

In [6]:
# Append Length
processed_message=message_mod448+(message_len_bits%2**64).to_bytes(8,byteorder='big')
print("LSB64(len(unPaddedMessage)) : "+str((message_len_bits%2**64).to_bytes(8,byteorder='big')))
print("length( paddedMessage | LSB64(len(unPaddedMessage)) ) : "+str(len(processed_message))+" bytes {"+str(len(processed_message)*8)+" bits}")
print("\nPadded Message | LSB64(len(unPaddedMessage)) :\n"+str(processed_message))

LSB64(len(unPaddedMessage)) : b'\x00\x00\x00\x00\x00\x00\x01\xc0'
length( paddedMessage | LSB64(len(unPaddedMessage)) ) : 128 bytes {1024 bits}

Padded Message | LSB64(len(unPaddedMessage)) :
b'abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xc0'


### Step 3 - Initialize Context

In [7]:
h0 = 0x67452301
h1 = 0xEFCDAB89
h2 = 0x98BADCFE
h3 = 0x10325476
h4 = 0xC3D2E1F0

### Step 4 - Process Message in 16-Word Blocks

In [8]:
# Auxiliary function that takes a round index t and three 32-bit words and returns one 32-bit word.

def F(t, B, C, D):
    # Chained comparisons replace the original `0<=t & t<=19` form, which only worked by operator precedence.
    if 0 <= t <= 19:
        calc = (B&C) | (D&~B)
    elif 20 <= t <= 39:
        calc = B^C^D
    elif 40 <= t <= 59:
        calc = (B&C) | (B&D) | (C&D)
    elif 60 <= t <= 79:
        calc = B^C^D
    else:
        raise ValueError('t is not in the range 0<=t<=79 !')
    return calc
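
The three distinct round functions are easier to read on single bits. The sketch below prints their truth table; the names `ch`, `parity` and `maj` are labels chosen for this example (they follow common usage), not identifiers from the notebook.

```python
def ch(b, c, d):
    """Rounds 0..19: each bit of b chooses between the bits of c and d."""
    return (b & c) | (~b & d)

def parity(b, c, d):
    """Rounds 20..39 and 60..79: bitwise XOR of the three words."""
    return b ^ c ^ d

def maj(b, c, d):
    """Rounds 40..59: each output bit is the majority of the three input bits."""
    return (b & c) | (b & d) | (c & d)

# Truth table on single bits
for b in (0, 1):
    for c in (0, 1):
        for d in (0, 1):
            print(b, c, d, "->", ch(b, c, d), parity(b, c, d), maj(b, c, d))
```

On full 32-bit words the same logic is applied to each bit position independently.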

In [9]:
# Constant Table
K = [0x5A827999]*20 + [0x6ED9EBA1]*20 + [0x8F1BBCDC]*20 +[0xCA62C1D6]*20
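
The four constants are not arbitrary: each equals the integer part of $2^{30}\sqrt{n}$ for $n$ = 2, 3, 5 and 10. The sketch below checks this with exact integer arithmetic; it assumes Python 3.8 or later for `math.isqrt`, and the name `derived` is made up for the example.

```python
import math

# floor(2**30 * sqrt(n)) == isqrt(n * 2**60), computed without floating point
derived = [math.isqrt(n << 60) for n in (2, 3, 5, 10)]
for n, value in zip((2, 3, 5, 10), derived):
    print("n = {:2d} : 0x{:08X}".format(n, value))
```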

In [10]:
# Rotate Left
def rotl(x,s):
    return ( (x<<s) | x>>(32-s))& 0xFFFFFFFF

def words(M):
    word_list=[0]*80
    for i in range (0,16):
        word_list[i]=int.from_bytes(M[i*4:i*4+4],byteorder='big')
    return word_list
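
A couple of small inputs show what the rotate and the big endian word split do. This is a self-contained sketch; `rotl32` is a hypothetical stand-alone copy of the rotate written for this example.

```python
def rotl32(x, s):
    """Rotate the 32-bit word x left by s bits (0 < s < 32)."""
    return ((x << s) | (x >> (32 - s))) & 0xFFFFFFFF

# The top bit wraps around to the bottom
print(hex(rotl32(0x80000001, 1)))
# A rotate by 8 moves the top byte to the bottom
print(hex(rotl32(0x12345678, 8)))

# Four message bytes become one word, most significant byte first
word = int.from_bytes(b'abcd', byteorder='big')
print(hex(word))
```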

In [11]:
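# NOTE: the shift and index tables in this cell are MD5 tables; none of the SHA-1 code in this notebook uses them.
# Shift Table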
R1_s=[7,12,17,22]*4
R2_s=[5, 9,14,20]*4
R3_s=[4,11,16,23]*4
R4_s=[6,10,15,21]*4

# K table (to use a sub-string of the message)
R1_k = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
R2_k = [1, 6, 11, 0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12]
R3_k = [5, 8, 11, 14, 1, 4, 7, 10, 13, 0, 3, 6, 9, 12, 15, 2]
R4_k = [0, 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9]

In [12]:
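# Reverses the byte order of the low 20 bytes of an integer; not used by the SHA-1 code in this notebook.
def bytereverse(num32):
    rev_byte=0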
    for i in range(0,20):
        #print(hex(num32)+" "+hex(rev_byte))
        rev_byte = rev_byte << 8
        
        low_order_byte = num32 & 0xFF
        rev_byte = rev_byte | low_order_byte
        
        num32 = num32 >> 8
    return rev_byte

In [13]:
# Loop through the 512 bit blocks of the padded message.
for i in range(0,len(processed_message),64):
    M = processed_message[i:i+64]
    W = words(M)
    print("PROCESSING bytes "+str(i)+"..."+str(i+64))
    print("\nMessage chunk being processed :\n"+str(M)+" \n")
    for t in range (16,80):
        W[t]=rotl(W[t-3]^W[t-8]^W[t-14]^W[t-16],1)
        print(W[t])
    [A,B,C,D,E]=[h0,h1,h2,h3,h4]
    for t in range(0,80):
        T = (rotl(A,5) + F(t,B,C,D) + E + W[t] + K[t])& 0xFFFFFFFF
        [E,D,C,B,A]=[D,C,rotl(B,30),A,T]
    [h0,h1,h2,h3,h4]=[(h0+A)& 0xFFFFFFFF, (h1+B)& 0xFFFFFFFF, (h2+C)& 0xFFFFFFFF, (h3+D)& 0xFFFFFFFF, (h4+E)& 0xFFFFFFFF]

    # Display Updated MD Buffers
    print("\n*** SHA Buffers after processing chunk ***\n[H0..H4] = "+"[{:8x} {:8x} {:8x} {:8x} {:8x}]".format(h0,h1,h2,h3,h4)+"\n\n")

PROCESSING bytes 0...64

Message chunk being processed :
b'abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq\x80\x00\x00\x00\x00\x00\x00\x00' 

168180286
3638222047
3671908032
3369252030
1869970267
1971018087
2376166768
3673094842
4210477750
2801493850
207637704
1615642668
1237378786
787008444
3486985924
2548662542
160025576
4064822270
1547751610
2832845883
2998494830
3128652215
548571762
2995451136
2328393794
681238004
1456807713
3647700946
946399096
2875082824
2131632408
3733430545
3904788144
436921938
3444969928
1190078940
1624496749
3723737710
657978857
1216915216
3533772667
2576966459
3603340547
3438678651
2086207085
4258897000
4175634548
3770301565
354523350
2067732866
938661303
2241212990
2164756694
3198200130
298312113
1633210625
1607301081
2133791462
4167618481
3605633001
4112955474
3075968460
1440325108
536254808

*** SHA Buffers after processing chunk ***
[H0..H4] = [f4286818 c37b27ae  408f581 84677148 4a566572]


PROCESSING bytes 64...128

Message chunk being process

In [14]:
# Compute output hash from the hash buffers.
output_int = h0<<128 | h1<<96 | h2 <<64 | h3 << 32 | h4

# The SHA-1 hash is h0..h4 concatenated, most significant word first (big endian).
# Zero-pad to 40 hex digits so that leading zeros are not dropped.
print("OUTPUT      : 0x{:040x}".format(output_int))
print("REF. Hash   : 0x"+test_case[use_test_case][1])

OUTPUT      : 0x84983e441c3bd26ebaae4aa1f95129e5e54670f1
REF. Hash   : 0x84983e441c3bd26ebaae4aa1f95129e5e54670f1
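
Each buffer contributes exactly 8 hex digits to the digest, so words with a leading zero need zero padding. The sketch below illustrates this with the intermediate buffer values printed after the first chunk above; the variable names are made up for the example.

```python
# Intermediate buffers after the first chunk, as printed above
h = [0xf4286818, 0xc37b27ae, 0x0408f581, 0x84677148, 0x4a566572]

# Unpadded formatting drops the leading zero of the third word
print("{:x}".format(h[2]))      # 7 digits
print("{:08x}".format(h[2]))    # 8 digits

# Concatenate the five words, most significant word first
digest_int = 0
for word in h:
    digest_int = (digest_int << 32) | word
digest_bytes = digest_int.to_bytes(20, byteorder='big')
print(digest_bytes.hex())
```

Converting to 20 bytes first and then calling `.hex()` always yields 40 hex digits, whatever the value of the first word.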


## Compare with Python's `hashlib`

In [15]:
import hashlib

In [16]:
H = hashlib.new('SHA1')
H.update(message_b)
sha1hash=H.hexdigest()
print("Hashlib SHA1 : 0x"+sha1hash)

Hashlib SHA1 : 0x84983e441c3bd26ebaae4aa1f95129e5e54670f1
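
The steps of this notebook can also be collected into one function and compared with `hashlib` over several message lengths. This is a minimal sketch rather than the notebook's own code: the names `sha1_hex` and `rotl32` are hypothetical, and the padding length uses a modular formula assumed here so that lengths just below a block boundary are handled.

```python
import hashlib

def rotl32(x, s):
    """Rotate the 32-bit word x left by s bits."""
    return ((x << s) | (x >> (32 - s))) & 0xFFFFFFFF

def sha1_hex(data):
    """Return the SHA-1 digest of the bytes object data as 40 hex digits."""
    # Steps 1 and 2: 0x80, zero bytes, then the 64-bit big endian bit length
    padding_len = (55 - len(data)) % 64 + 1
    padded = data + b'\x80' + b'\x00' * (padding_len - 1)
    padded += ((len(data) * 8) % 2**64).to_bytes(8, byteorder='big')

    # Step 3: initial context and round constants
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    k = [0x5A827999]*20 + [0x6ED9EBA1]*20 + [0x8F1BBCDC]*20 + [0xCA62C1D6]*20

    # Step 4: process each 64 byte block
    for i in range(0, len(padded), 64):
        block = padded[i:i+64]
        w = [int.from_bytes(block[j:j+4], byteorder='big') for j in range(0, 64, 4)]
        for t in range(16, 80):
            w.append(rotl32(w[t-3] ^ w[t-8] ^ w[t-14] ^ w[t-16], 1))

        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f = (b & c) | (~b & d)
            elif t < 40 or t >= 60:
                f = b ^ c ^ d
            else:
                f = (b & c) | (b & d) | (c & d)
            temp = (rotl32(a, 5) + f + e + w[t] + k[t]) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, rotl32(b, 30), c, d

        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    return ''.join('{:08x}'.format(x) for x in h)

# Compare with hashlib around the block boundaries
for n in (0, 1, 55, 56, 60, 64, 119, 120):
    msg = b'a' * n
    print(n, sha1_hex(msg) == hashlib.sha1(msg).hexdigest())
```

Lengths such as 55, 56 and 60 bytes are included because they sit on either side of the point where the padding spills into an extra block.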


### References

1. [SHA1](https://www.ietf.org/rfc/rfc3174.txt)
2. [Wikipedia](https://en.wikipedia.org/wiki/SHA1)
3. [Rosetta Code](https://rosettacode.org/wiki/SHA1/Implementation#Python)
4. [Merkle Damgård construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction)