# Custom MD5 🕵️️

In this notebook we implement the modified version of the `MD5` algorithm described in [Question 2](https://github.com/UC-IIC3253/2021/blob/main/tareas/tarea1/enunciado.pdf) of Assignment 1 of the course IIC3253 Criptografía y Seguridad Computacional (v.2021-1).

Prepared by: Vicente Merino

GitHub: [VicenteMerino](https://github.com/VicenteMerino)

## 1. Importing libraries 📚
First we import the `struct` library and the `sin` and `floor` functions from the `math` library, which will help us implement the algorithm.

In [2]:
from math import sin, floor
import struct


## 2. Useful functions for implementing the algorithm 💉

Below is a set of helper functions used by the implementation:

*   `swap32`: Changes the endianness of a number (a 32-bit unsigned integer).
*   `add_padding`: Adds part of the `MD5` padding. Specifically, it appends 0s until the message length is congruent to 448 modulo 512, and then appends the original length encoded in binary. The leading 1 bit is added in the algorithm itself.
*   `leftrotate`, `F`, `G`, `H`, `I`, `FF`, `GG`, `HH` and `II` are the auxiliary functions described in the [original specification](https://tools.ietf.org/html/rfc1321) of `MD5` (RFC 1321), which is the implementation our solution is based on.
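
To make the padding rule concrete, the following sketch computes the total length in bits of a padded message. The helper name `padded_bit_length` is hypothetical and not part of the notebook. It assumes, as the notebook does, that the leading 1 bit is appended together with seven 0 bits as the single byte `0x80`.

```python
def padded_bit_length(message_bits: int) -> int:
    """Total length in bits after MD5-style padding (hypothetical helper)."""
    length = message_bits + 8         # the 1 bit plus seven 0 bits (byte 0x80)
    length += (448 - length) % 512    # 0 bits until length % 512 == 448
    return length + 64                # 64-bit encoding of the original length

for n_bytes in (0, 55, 56, 64):
    print(n_bytes, padded_bit_length(8 * n_bytes))
# 0 512
# 55 512
# 56 1024
# 64 1024
```

A 55-byte message is the longest one that still fits in a single 512-bit block; one more byte pushes the length field into a second block.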

In [3]:
"""
Auxiliary functions to be used in the MD5 algorithm
"""

def swap32(i):
  """
  Changes the endianness of a given 32-bit unsigned integer
  """
  return struct.unpack("<I", struct.pack(">I", i))[0]

def add_padding(m: str, original_length: int) -> str:
  """
  Appends 0 bits until len(m) is congruent to 448 modulo 512, then
  appends the original length as 64 bits, low-order 32-bit word first
  """
  rest_512 = len(m) % 512
  if rest_512 <= 448:
    m += '0'*(448-rest_512)
  else:
    m += '0'*(960-rest_512)

  # Original length modulo 2**64, zero-padded to 64 binary digits
  binary_length = format(original_length % 2**64, '064b')

  # The low-order 32-bit word goes first
  m += binary_length[32:] + binary_length[:32]
  return m

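def leftrotate(x: int, c: int) -> int:
  # Rotates x left by c bits. The result is not truncated to 32 bits here;
  # the callers (FF, GG, HH, II) reduce it modulo 2**32 afterwards
  return ((x << c) | (x >> (32 - c)))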

def F(x, y, z):
  return (((x) & (y)) | ((~x) & (z)))

def G(x, y, z):
  return (((x) & (z)) | ((y) & (~z)))

def H(x, y, z):
  return ((x) ^ (y) ^ (z))

def I(x, y, z):
  return ((y) ^ ((x) | (~z)))

def FF(a, b, c, d, x, s, ac):
  result = (a + F(b, c, d) + x + (ac)) % (2**32)
  result = leftrotate(result, s)
  result = (result + b) % (2**32)
  return result

def GG(a, b, c, d, x, s, ac):
  result = (a + G(b, c, d) + x + (ac)) % (2**32)
  result = leftrotate(result, s)
  result = (result + b) % (2**32)
  return result

def HH(a, b, c, d, x, s, ac):
  result = (a + H(b, c, d) + x + (ac)) % (2**32)
  result = leftrotate(result, s)
  result = (result + b) % (2**32)
  return result

def II(a, b, c, d, x, s, ac):
  result = (a + I(b, c, d) + x + (ac)) % (2**32)
  result = leftrotate(result, s)
  result = (result + b) % (2**32)
  return result
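
As a quick sanity check of the helpers above, the sketch below re-declares `swap32` and a masked variant of the rotation so that it runs on its own. The name `rotl32` and the explicit `& 0xFFFFFFFF` mask are assumptions of this example. The notebook's `leftrotate` leaves the overflow bits in place and relies on the later `% (2**32)` in `FF`, `GG`, `HH` and `II` to discard them.

```python
import struct

def swap32(i: int) -> int:
    # Pack as big-endian, unpack as little-endian: reverses the four bytes
    return struct.unpack("<I", struct.pack(">I", i))[0]

def rotl32(x: int, c: int) -> int:
    # 32-bit left rotation with the result truncated to 32 bits
    return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF

print(hex(swap32(0x12345678)))     # 0x78563412
print(hex(rotl32(0x80000001, 1)))  # 0x3
```

Applying `swap32` twice returns the original value, which is a convenient property to test.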

## 3. Implementation of `custom_md5` 📜

We now implement the `custom_md5` algorithm. First, we read the original message and convert it to binary. Next, we add the padding described above: a 1 bit, then 0s until the length is congruent to 448 modulo 512, then the encoded original length. We then convert the constant `h0` to a 128-bit binary number, define the magic constants of `MD5` ([nothing-up-my-sleeve numbers](https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number)), and obtain the values of the variables `a0`, `b0`, `c0` and `d0` from `h0` by taking one 32-bit segment for each.

After that, we iterate over each 512-bit block of the padded message. Each block is split into 32-bit parts, which are used to compute the new values of the variables (`a0`, `b0`, `c0`, `d0`) as defined in RFC 1321. Once the whole message has been processed, we return the concatenation of the four variables. Because of how our implementation stores them, their endianness is reversed, so each variable must be byte-swapped before it is appended. The result is returned as a string containing a hexadecimal number.
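
The word layout described above can also be illustrated at the byte level. The sketch below does not use the notebook's bit-string approach: it relies on `int.from_bytes` and shifts, and the helper names `first_word` and `split_state` are hypothetical. It shows that the message bytes followed by `0x80` are read as little-endian 32-bit words, and that `h0` splits into four 32-bit words from most to least significant.

```python
def first_word(message: bytes) -> int:
    """First 32-bit word of a short message (< 4 bytes) plus the 0x80 marker."""
    chunk = (message + b"\x80").ljust(4, b"\x00")
    return int.from_bytes(chunk[:4], "little")

def split_state(h0: int) -> tuple:
    """Split a 128-bit integer into four 32-bit words, most significant first."""
    return tuple((h0 >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0))

print(hex(first_word(b"abc")))  # 0x80636261
print([hex(w) for w in split_state(0x67452301efcdab8998badcfe10325476)])
# ['0x67452301', '0xefcdab89', '0x98badcfe', '0x10325476']
```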

In [4]:
def custom_md5(m: str, h0: int) -> str:

  ## Encoded message:

  message_bytes = bytearray(m, 'utf-8')
  binary_list = [format(b, '08b') for b in message_bytes]

  original_length = 8 * len(message_bytes)
  n_words = len(binary_list) // 4
  binary_message = ''
  # Reverse the bytes inside each full 32-bit word (little-endian words)
  for i in range(n_words):
    binary_message += ''.join(reversed(binary_list[4*i:4*i+4]))

  # Last, partial word: the 0x80 marker follows the final message byte,
  # so after the byte reversal it sits in front of the remaining bytes
  tail = '10000000' + ''.join(reversed(binary_list[4*n_words:]))
  binary_message += tail.zfill(32)
  binary_message = add_padding(binary_message, original_length)
  
  binary_h0 = '0'*(128 - len(bin(h0)[2:])) + bin(h0)[2:]


  ## Nothing up my sleeve numbers:

  S11 = 7
  S12 = 12
  S13 = 17
  S14 = 22
  S21 = 5
  S22 = 9
  S23 = 14
  S24 = 20
  S31 = 4
  S32 = 11
  S33 = 16
  S34 = 23
  S41 = 6
  S42 = 10
  S43 = 15
  S44 = 21
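  # Sine-based constant table. The rounds below pass the constants as
  # hard-coded literals instead, so K itself is not referenced again
  K = list()
  for i in range(64):
    K.append(floor(2**32 * abs(sin(i + 1))))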

  a0 = int(binary_h0[:32], 2)
  b0 = int(binary_h0[32:64], 2)
  c0 = int(binary_h0[64:96], 2)
  d0 = int(binary_h0[96:], 2)

  k_length = int(len(binary_message)/512)
  
  for i in range(k_length):
    M = []
    for j in range(16):
      M.append(binary_message[i*512:(i+1)*512][j*32:(j+1)*32])

    a = a0
    b = b0
    c = c0
    d = d0

    a = FF(a, b, c, d, int(M[ 0], 2), S11, 0xd76aa478)
    d = FF(d, a, b, c, int(M[ 1], 2), S12, 0xe8c7b756)
    c = FF(c, d, a, b, int(M[ 2], 2), S13, 0x242070db)
    b = FF(b, c, d, a, int(M[ 3], 2), S14, 0xc1bdceee)
    a = FF(a, b, c, d, int(M[ 4], 2), S11, 0xf57c0faf)
    d = FF(d, a, b, c, int(M[ 5], 2), S12, 0x4787c62a)
    c = FF(c, d, a, b, int(M[ 6], 2), S13, 0xa8304613)
    b = FF(b, c, d, a, int(M[ 7], 2), S14, 0xfd469501)
    a = FF(a, b, c, d, int(M[ 8], 2), S11, 0x698098d8)
    d = FF(d, a, b, c, int(M[ 9], 2), S12, 0x8b44f7af)
    c = FF(c, d, a, b, int(M[10], 2), S13, 0xffff5bb1)
    b = FF(b, c, d, a, int(M[11], 2), S14, 0x895cd7be)
    a = FF(a, b, c, d, int(M[12], 2), S11, 0x6b901122)
    d = FF(d, a, b, c, int(M[13], 2), S12, 0xfd987193)
    c = FF(c, d, a, b, int(M[14], 2), S13, 0xa679438e)
    b = FF(b, c, d, a, int(M[15], 2), S14, 0x49b40821)

    a = GG(a, b, c, d, int(M[ 1], 2), S21, 0xf61e2562)
    d = GG(d, a, b, c, int(M[ 6], 2), S22, 0xc040b340)
    c = GG(c, d, a, b, int(M[11], 2), S23, 0x265e5a51)
    b = GG(b, c, d, a, int(M[ 0], 2), S24, 0xe9b6c7aa)
    a = GG(a, b, c, d, int(M[ 5], 2), S21, 0xd62f105d)
    d = GG(d, a, b, c, int(M[10], 2), S22,  0x2441453)
    c = GG(c, d, a, b, int(M[15], 2), S23, 0xd8a1e681)
    b = GG(b, c, d, a, int(M[ 4], 2), S24, 0xe7d3fbc8)
    a = GG(a, b, c, d, int(M[ 9], 2), S21, 0x21e1cde6)
    d = GG(d, a, b, c, int(M[14], 2), S22, 0xc33707d6)
    c = GG(c, d, a, b, int(M[ 3], 2), S23, 0xf4d50d87)
    b = GG(b, c, d, a, int(M[ 8], 2), S24, 0x455a14ed)
    a = GG(a, b, c, d, int(M[13], 2), S21, 0xa9e3e905)
    d = GG(d, a, b, c, int(M[ 2], 2), S22, 0xfcefa3f8)
    c = GG(c, d, a, b, int(M[ 7], 2), S23, 0x676f02d9)
    b = GG(b, c, d, a, int(M[12], 2), S24, 0x8d2a4c8a)

    a = HH(a, b, c, d, int(M[ 5], 2), S31, 0xfffa3942)
    d = HH(d, a, b, c, int(M[ 8], 2), S32, 0x8771f681)
    c = HH(c, d, a, b, int(M[11], 2), S33, 0x6d9d6122)
    b = HH(b, c, d, a, int(M[14], 2), S34, 0xfde5380c)
    a = HH(a, b, c, d, int(M[ 1], 2), S31, 0xa4beea44)
    d = HH(d, a, b, c, int(M[ 4], 2), S32, 0x4bdecfa9)
    c = HH(c, d, a, b, int(M[ 7], 2), S33, 0xf6bb4b60)
    b = HH(b, c, d, a, int(M[10], 2), S34, 0xbebfbc70)
    a = HH(a, b, c, d, int(M[13], 2), S31, 0x289b7ec6)
    d = HH(d, a, b, c, int(M[ 0], 2), S32, 0xeaa127fa)
    c = HH(c, d, a, b, int(M[ 3], 2), S33, 0xd4ef3085)
    b = HH(b, c, d, a, int(M[ 6], 2), S34,  0x4881d05)
    a = HH(a, b, c, d, int(M[ 9], 2), S31, 0xd9d4d039)
    d = HH(d, a, b, c, int(M[12], 2), S32, 0xe6db99e5)
    c = HH(c, d, a, b, int(M[15], 2), S33, 0x1fa27cf8)
    b = HH(b, c, d, a, int(M[ 2], 2), S34, 0xc4ac5665)

    a = II(a, b, c, d, int(M[ 0], 2), S41, 0xf4292244)
    d = II(d, a, b, c, int(M[ 7], 2), S42, 0x432aff97)
    c = II(c, d, a, b, int(M[14], 2), S43, 0xab9423a7)
    b = II(b, c, d, a, int(M[ 5], 2), S44, 0xfc93a039)
    a = II(a, b, c, d, int(M[12], 2), S41, 0x655b59c3)
    d = II(d, a, b, c, int(M[ 3], 2), S42, 0x8f0ccc92)
    c = II(c, d, a, b, int(M[10], 2), S43, 0xffeff47d)
    b = II(b, c, d, a, int(M[ 1], 2), S44, 0x85845dd1)
    a = II(a, b, c, d, int(M[ 8], 2), S41, 0x6fa87e4f)
    d = II(d, a, b, c, int(M[15], 2), S42, 0xfe2ce6e0)
    c = II(c, d, a, b, int(M[ 6], 2), S43, 0xa3014314)
    b = II(b, c, d, a, int(M[13], 2), S44, 0x4e0811a1)
    a = II(a, b, c, d, int(M[ 4], 2), S41, 0xf7537e82)
    d = II(d, a, b, c, int(M[11], 2), S42, 0xbd3af235)
    c = II(c, d, a, b, int(M[ 2], 2), S43, 0x2ad7d2bb)
    b = II(b, c, d, a, int(M[ 9], 2), S44, 0xeb86d391)
    
    a0 = (a0 + a)%2**32
    b0 = (b0 + b)%2**32
    c0 = (c0 + c)%2**32
    d0 = (d0 + d)%2**32
  
  
  # Swap the endianness of each word, format it as 8 hex digits,
  # then append them all
  return ''.join(format(swap32(w), '08x') for w in (a0, b0, c0, d0))
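
Two details of the function above can be checked in isolation. The sketch below is independent of the notebook's code, and its variable names are chosen for this example only. It rebuilds the sine-based table and prints the two entries that correspond to the literals passed to the first `FF` call and the last `II` call. It also shows that packing four state words as little-endian integers with `struct.pack` gives the same byte order as swapping each word and printing it in hexadecimal.

```python
from math import floor, sin
import struct

# Sine-based table: K[i] = floor(2**32 * |sin(i + 1)|), with i + 1 in radians
K = [floor(2**32 * abs(sin(i + 1))) for i in range(64)]
print(hex(K[0]), hex(K[63]))  # 0xd76aa478 0xeb86d391

# Serialising the state words as little-endian 32-bit integers
state = (0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476)
digest_hex = struct.pack("<4I", *state).hex()
print(digest_hex)  # 0123456789abcdeffedcba9876543210
```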

## 4. Testing 🧪

We now test our algorithm. We take several texts and their hexadecimal hash values (obtained from an `MD5` [hash generator](https://www.md5hashgenerator.com/) and from RFC 1321) and compare them with the output of our implementation. All tests use the constant `h0` (`a0`, `b0`, `c0` and `d0`) of the original algorithm.

In [5]:
"""
Testing the algorithm, md5 hash values taken from 
https://www.md5hashgenerator.com/ and https://tools.ietf.org/html/rfc1321
"""

h0 = 0x67452301efcdab8998badcfe10325476

m1 = 'The quick brown fox jumps over the lazy dog'
m2 = 'The quick brown fox jumps over the lazy dog.'
m3 = 'The quick brown fox jumps over the lazy dog. The quick brown fox jumps '\
     'over the lazy dog. The quick brown fox jumps over the lazy dog. The quick'\
     ' brown fox jumps over the lazy dog. The quick brown fox jumps over the '\
     'lazy dog. The quick brown fox jumps over the lazy dog. The quick brown f'\
     'ox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.' 

m4 = 'The quick;; brown fox jumps over the lazy dog. The quick brown fox jumps '\
     'over the la;..zy dog. The quick brown fox jumps over the lazy dog. The quick'\
     ' brown fox jum.;;;ps over the lazy dog. The quick brown fox jumps over the '\
     'lazy dog. The quic,,,cgjhfgjfk brown fox jumps over the lazy dog. The quick brown f'\
     'ox jumps over the lazy dog. T.;-__jfgjfgjhe quick brown fox jumps over the lazy dog.'
m5 = ''
m6 = 'a'
m7 = 'abc'
m8 = 'message digest'
m9 = 'abcdefghijklmnopqrstuvwxyz'
m10 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
m11 = '12345678901234567890123456789012345678901234567890123456789012345678901234567890'
h_m1 = '9e107d9d372bb6826bd81d3542a419d6'
h_m2 = 'e4d909c290d0fb1ca068ffaddf22cbd0'
h_m3 = 'a9c2ca93b6946b79fcbd898275674c61'
h_m4 = '5c50f23e577e42c1a3aaa0e80077d38c'
h_m5 = 'd41d8cd98f00b204e9800998ecf8427e'
h_m6 = '0cc175b9c0f1b6a831c399e269772661'
h_m7 = '900150983cd24fb0d6963f7d28e17f72'
h_m8 = 'f96b697d7cb7938d525a2f31aaf161d0'
h_m9 = 'c3fcd3d76192e4007dfb496cca67e13b'
h_m10 = 'd174ab98d277d9f5a5611c2c9f419d9f'
h_m11 = '57edf4a22be3c955ac49da2e2107b67a'


print(f'Test 1: {h_m1 == custom_md5(m1, h0)}')
print(f'Test 2: {h_m2 == custom_md5(m2, h0)}')
print(f'Test 3: {h_m3 == custom_md5(m3, h0)}')
print(f'Test 4: {h_m4 == custom_md5(m4, h0)}')
print(f'Test 5: {h_m5 == custom_md5(m5, h0)}')
print(f'Test 6: {h_m6 == custom_md5(m6, h0)}')
print(f'Test 7: {h_m7 == custom_md5(m7, h0)}')
print(f'Test 8: {h_m8 == custom_md5(m8, h0)}')
print(f'Test 9: {h_m9 == custom_md5(m9, h0)}')
print(f'Test 10: {h_m10 == custom_md5(m10, h0)}')
print(f'Test 11: {h_m11 == custom_md5(m11, h0)}')

Test 1: True
Test 2: True
Test 3: True
Test 4: True
Test 5: True
Test 6: True
Test 7: True
Test 8: True
Test 9: True
Test 10: True
Test 11: True


We can see that our algorithm passes all of the tests 😁
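
As an alternative to an online generator, the expected values can be taken from Python's standard `hashlib` module. The sketch below is an illustration that is not part of the original notebook: `reference_md5` and `check_against_hashlib` are hypothetical helpers, and the latter accepts any function that maps a message string to a hexadecimal digest. Here it is exercised with `hashlib` itself as a stand-in. In the notebook one would presumably pass `lambda m: custom_md5(m, h0)` instead.

```python
import hashlib

def reference_md5(message: str) -> str:
    """MD5 of the UTF-8 encoding of message, as a hex string."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()

def check_against_hashlib(hash_function, messages) -> bool:
    """True if hash_function agrees with hashlib.md5 on every message."""
    return all(hash_function(m) == reference_md5(m) for m in messages)

messages = ["", "a", "abc", "message digest"]
print(reference_md5("abc"))  # 900150983cd24fb0d6963f7d28e17f72
print(check_against_hashlib(reference_md5, messages))  # True
```

Generating the expected digests this way makes it easy to extend the tests to arbitrary or randomly generated messages.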