
# Manual implementation of some hash functions

This small [Jupyter notebook](https://www.Jupyter.org/) is a short experiment, to see if I can implement some basic [hashing functions](https://en.wikipedia.org/wiki/Hash_function), more specifically [cryptographic hashing functions](https://en.wikipedia.org/wiki/Cryptographic_hash_function), like `MD5`, `SHA1`, etc.

Then I want to compare my manual implementations with the functions implemented in [the `hashlib` module in the Python standard library](https://docs.python.org/3/library/hashlib.html).
Ideally, my implementation should work exactly like the reference one, only slower!


- *Reference*: Wikipedia pages on [Hash functions](https://en.wikipedia.org/wiki/Hash_function), [MD5](https://en.wikipedia.org/wiki/MD5), and [SHA1](https://en.wikipedia.org/wiki/SHA1), as well as [the `hashlib` module in Python standard library](https://docs.python.org/3/library/hashlib.html).
- *Date*: 13 May 2017 (first part about MD5), 19 June 2017 (second part about SHA1).
- *Author*: [Lilian Besson](https://GitHub.com/Naereen/notebooks).
- *License*: [MIT Licensed](https://LBesson.MIT-License.org/).

----
## What is a hash function?
> TL;DR : [Hash functions](https://en.wikipedia.org/wiki/Hash_function) and [cryptographic hashing functions](https://en.wikipedia.org/wiki/Cryptographic_hash_function) on Wikipedia.
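As a quick illustration, using the reference `hashlib` module only as an example: a hash function maps inputs of any length to an output of fixed size, and it is deterministic (the same input always gives the same output). The variable names below are just for this sketch.

```python
import hashlib

# Inputs of very different lengths...
short_input = b"a"
long_input = b"a" * 100_000

# ...are mapped to digests of the same fixed size (16 bytes = 32 hex digits for MD5).
short_digest = hashlib.md5(short_input).hexdigest()
long_digest = hashlib.md5(long_input).hexdigest()
print(len(short_digest), len(long_digest))  # 32 32

# Deterministic: hashing the same input again gives the same digest.
assert hashlib.md5(short_input).hexdigest() == short_digest
```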

----
## Common API for the different classes

I will copy the API proposed by [the `hashlib` module in the Python standard library](https://docs.python.org/3/library/hashlib.html), so it will be very easy to compare my implementations with the ones provided with your default [Python](https://www.Python.org/) installation.

In [1]:
class Hash(object):
    """ Common class for all hash methods.
    
    It mimics the API of the hashlib module (https://docs.python.org/3.5/library/hashlib.html).
    """
    
    def __init__(self, *args, **kwargs):
        """ Create the Hash object."""
        self.digest_size = 0  # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.digest_size
        self.block_size  = 0  # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.block_size
        self.name = self.__class__.__name__      # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.name

    def __str__(self):
        return self.name
        
    def update(self, arg):
        """ Update the hash object with the object arg, which must be interpretable as a buffer of bytes."""
        pass

    def digest(self):
        """ Return the digest of the data passed to the update() method so far. This is a bytes object of size digest_size which may contain bytes in the whole range from 0 to 255."""
        return b""

    def hexdigest(self):
        """ Like digest() except the digest is returned as a string object of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments."""
        return self.digest().hex()
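One property of this API that every subclass should ideally respect: calling `update(a)` then `update(b)` must give the same digest as a single `update(a + b)`. Here is a small check of that behaviour on the reference `hashlib.md5`, which also shows its `digest_size` and `block_size` values (in bytes).

```python
import hashlib

# Feed the message in two pieces...
h_incremental = hashlib.md5()
h_incremental.update(b"The quick brown fox ")
h_incremental.update(b"jumps over the lazy dog")

# ...or all at once: the digests must be identical.
h_once = hashlib.md5(b"The quick brown fox jumps over the lazy dog")
assert h_incremental.hexdigest() == h_once.hexdigest()

print(h_once.hexdigest())  # 9e107d9d372bb6826bd81d3542a419d6
print(h_once.digest_size, h_once.block_size)  # 16 64
```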

----
## Checking [the `hashlib` module in Python standard library](https://docs.python.org/3/library/hashlib.html)

In [2]:
import hashlib

We can check [the available algorithms](https://docs.python.org/3.5/library/hashlib.html#hashlib.algorithms_available): some of them are [guaranteed to be present on every platform](https://docs.python.org/3.5/library/hashlib.html#hashlib.algorithms_guaranteed), others are not (they depend, for instance, on the OpenSSL library Python was built with).

In [4]:
list(hashlib.algorithms_available)

['SHA1',
 'sha256',
 'dsaWithSHA',
 'SHA',
 'SHA224',
 'whirlpool',
 'SHA256',
 'sha384',
 'MD4',
 'md4',
 'sha',
 'SHA384',
 'md5',
 'SHA512',
 'ripemd160',
 'DSA',
 'RIPEMD160',
 'sha224',
 'DSA-SHA',
 'sha512',
 'dsaEncryption',
 'MD5',
 'ecdsa-with-SHA1',
 'sha1']

I will need at least these two:

In [5]:
assert 'md5' in hashlib.algorithms_available
assert 'sha1' in hashlib.algorithms_available
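An algorithm can also be selected by its name with `hashlib.new()`, which gives the same result as the dedicated constructor. A short sketch (the message is the usual Wikipedia test sentence):

```python
import hashlib

message = b"The quick brown fox jumps over the lazy dog"

# hashlib.new() takes the algorithm name as a string...
h_by_name = hashlib.new("sha1", message)
# ...and behaves like the dedicated constructor.
h_direct = hashlib.sha1(message)
assert h_by_name.hexdigest() == h_direct.hexdigest()

# Names guaranteed to be available on every platform (the exact set depends on the Python version):
print(sorted(hashlib.algorithms_guaranteed))
```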

----
## First stupid example: a stupid hashing function

This "stupid" hashing function will use a `digest_size` of 16 bytes, and compute it by ... just looking at the first 16 bytes of the input data.

This is just to check the API and how to read from a bytes buffer.

In [6]:
class HeaderHash(Hash):
    """ This "stupid" hashing function will use a `digest_size` of 16 bytes, and compute it by ... just looking at the first 16 bytes of the input data.
    """
    
    def __init__(self):
        # Common part
        self.digest_size = 16
        self.block_size  = 16
        self.name = "Header"
        # Specific part
        self._data = b""

    def update(self, arg):
        """ Update the hash object with the object arg, which must be interpretable as a buffer of bytes."""
        # Keep filling the header until block_size bytes have been seen, even across several calls
        missing = self.block_size - len(self._data)
        if missing > 0:
            self._data += bytes(arg[:missing])

    def digest(self):
        """ Return the digest of the data passed to the update() method so far. This is a bytes object of size digest_size which may contain bytes in the whole range from 0 to 255."""
        return self._data

Let us try it:

In [7]:
h1 = HeaderHash()

In [8]:
h1
print(h1)

<__main__.HeaderHash at 0x7ff83f5d9a20>

Header


Let us use some toy data, for this test and the following ones.

In [9]:
data = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 100

In [10]:
h1.update(data)
h1.digest()

b'0123456789ABCDEF'

In [11]:
h1.hexdigest()
len(_)

'30313233343536373839414243444546'

32

> Well... It seems to work, even if this first example is stupid.
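It is easy to see why it is stupid: any two inputs sharing the same first 16 bytes collide. A minimal standalone sketch, where `header_digest` is a hypothetical function reproducing the idea of `HeaderHash`:

```python
def header_digest(data: bytes, size: int = 16) -> bytes:
    """Hypothetical helper: the 'digest' is just the first `size` bytes of the data."""
    return bytes(data[:size])

m1 = b"0123456789ABCDEF" + b"first message"
m2 = b"0123456789ABCDEF" + b"a completely different one"
assert m1 != m2

# Two different messages, same digest: a trivial collision.
print(header_digest(m1) == header_digest(m2))  # True
```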

----
## First real example: the MD5 hashing function
Let us start with a simple one: [the MD5 hashing function](https://en.wikipedia.org/wiki/MD5), from Rivest in 1992.

<center><span style="font-size: large; color: red;">**Warning**: it is considered broken (practical collisions have been known since 2004), never use it for security purposes.</span></center>

### Useful functions for the MD5 algorithm
Instead of writing the complete MD5 algorithm in the class below, I prefer to first define some useful helper functions, using [bitwise operators](https://wiki.python.org/moin/BitwiseOperators).

In [12]:
def MD5_f1(b, c, d):
    """ First ternary bitwise operation."""
    return ((b & c) | ((~b) & d)) & 0xFFFFFFFF

def MD5_f2(b, c, d):
    """ Second ternary bitwise operation."""
    return ((b & d) | (c & (~d))) & 0xFFFFFFFF

def MD5_f3(b, c, d):
    """ Third ternary bitwise operation."""
    return (b ^ c ^ d) & 0xFFFFFFFF

def MD5_f4(b, c, d):
    """ Fourth ternary bitwise operation."""
    return (c ^ (b | (~d))) & 0xFFFFFFFF
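To get an intuition of what these functions do, the first one acts as a bitwise "if b then c else d": wherever a bit of `b` is 1, the result takes the bit of `c`, otherwise the bit of `d`. A small standalone sketch, where `choose` is a hypothetical name for the same formula as `MD5_f1`:

```python
def choose(b, c, d):
    """Bitwise 'if b then c else d' on 32-bit words (same formula as MD5_f1)."""
    return ((b & c) | ((~b) & d)) & 0xFFFFFFFF

b = 0xFFFF0000  # high half of b is all ones: take these bits from c; low half: from d
c = 0x12345678
d = 0x9ABCDEF0
print(hex(choose(b, c, d)))  # 0x1234def0
```

The final `& 0xFFFFFFFF` matters because Python integers are unbounded: `~b` is a negative number, and the mask brings the result back to an unsigned 32-bit word.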

In [13]:
def leftrotate(x, c):
    """ Left rotate the 32-bit number x by c bits."""
    x &= 0xFFFFFFFF
    return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF
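A rotation differs from a plain left shift: the bits pushed out on the left come back on the right, instead of being lost. A quick check, with the same `leftrotate` redefined so that the cell is self-contained:

```python
def leftrotate(x, c):
    """Left rotate the 32-bit number x by c bits."""
    x &= 0xFFFFFFFF
    return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF

# The top bit of 0x80000001 wraps around to the bottom...
print(hex(leftrotate(0x80000001, 1)))        # 0x3
# ...whereas a masked shift simply loses it.
print(hex((0x80000001 << 1) & 0xFFFFFFFF))   # 0x2
print(hex(leftrotate(0x12345678, 8)))        # 0x34567812
```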

In [14]:
from math import floor, sin

### The `MD5` class

It is a direct implementation of the pseudo-code, as given for instance on the Wikipedia page, or in the original specification by Rivest ([RFC 1321](https://tools.ietf.org/html/rfc1321)).
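The pre-processing (padding) step of the pseudo-code can be isolated and checked on its own. A sketch under the MD5 conventions (a `0x80` byte, zeros until the length is 56 mod 64, then the original length in bits as 8 little-endian bytes); `md5_pad` is a hypothetical helper name, not used by the class below:

```python
def md5_pad(message: bytes) -> bytes:
    """Hypothetical helper: MD5 padding of a message, before the 64-byte chunks are processed."""
    length_in_bits = (8 * len(message)) % 2**64
    padded = bytearray(message) + b"\x80"      # a single '1' bit, then seven '0' bits
    while len(padded) % 64 != 56:              # zeros until length ≡ 448 bits (mod 512)
        padded.append(0)
    padded += length_in_bits.to_bytes(8, byteorder="little")
    return bytes(padded)

# A 55-byte message still fits in one chunk, a 56-byte one needs two.
for n in (0, 55, 56, 64):
    print(n, len(md5_pad(b"a" * n)))
```

Note how a message of 56 bytes or more requires an extra chunk, because there must always be room for the `0x80` byte and the 8 bytes of length.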

In [15]:
class MD5(Hash):
    """MD5 hashing, see https://en.wikipedia.org/wiki/MD5#Algorithm."""
    
    def __init__(self):
        self.name = "MD5"
        self.block_size  = 64  # MD5 processes 512-bit = 64-bytes blocks
        self.digest_size = 16  # and produces a 128-bit = 16-bytes digest
        # All the data passed to update() so far
        self._message = b""
        # Internal data
        s = [0] * 64
        K = [0] * 64
        # Initialize s, s specifies the per-round shift amounts
        s[ 0:16] = [7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22]
        s[16:32] = [5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20]
        s[32:48] = [4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23]
        s[48:64] = [6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21]
        # Store it
        self._s = s
        # Use binary integer part of the sines of integers (Radians) as constants:
        for i in range(64):
            K[i] = floor(2**32 * abs(sin(i + 1))) & 0xFFFFFFFF
        # Store it
        self._K = K
        # Initial values of the four 32-bit state words:
        a0 = 0x67452301   # A
        b0 = 0xefcdab89   # B
        c0 = 0x98badcfe   # C
        d0 = 0x10325476   # D
        self._initial_pieces = [a0, b0, c0, d0]
    
    def update(self, arg):
        """ Append arg to the message, so that update(a); update(b) is equivalent to update(a + b), like in hashlib."""
        self._message += bytes(arg)

    def _hash_pieces(self):
        """ Compute the four 32-bit words A, B, C, D of the hash of the whole message seen so far."""
        s, K = self._s, self._K
        a0, b0, c0, d0 = self._initial_pieces
        # 1. Pre-processing
        data = bytearray(self._message)
        orig_len_in_bits = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF
        # 1.a. Add a single '1' bit at the end of the input bits
        data.append(0x80)
        # 1.b. Padding with zeros as long as the input bits length ≡ 448 (mod 512)
        while len(data) % 64 != 56:
            data.append(0)
        # 1.c. Append original length in bits mod (2 pow 64) to message, little-endian
        data += orig_len_in_bits.to_bytes(8, byteorder='little')
        assert len(data) % 64 == 0, "Error in padding"
        # 2. Computations
        # Process the message in successive 512-bit = 64-bytes chunks:
        for offset in range(0, len(data), 64):
            # 2.a. 512-bits = 64-bytes chunks
            chunks = data[offset : offset + 64]
            # 2.b. The chunk is read as sixteen 32-bit = 4-bytes little-endian words M[g], 0 ≤ g ≤ 15 (see below)
            A, B, C, D = a0, b0, c0, d0
            # 2.c. Main loop
            for i in range(64):
                if 0 <= i <= 15:
                    F = MD5_f1(B, C, D)
                    g = i
                elif 16 <= i <= 31:
                    F = MD5_f2(B, C, D)
                    g = (5 * i + 1) % 16
                elif 32 <= i <= 47:
                    F = MD5_f3(B, C, D)
                    g = (3 * i + 5) % 16
                elif 48 <= i <= 63:
                    F = MD5_f4(B, C, D)
                    g = (7 * i) % 16
                # Be wary of the below definitions of A, B, C, D
                to_rotate = A + F + K[i] + int.from_bytes(chunks[4*g : 4*g+4], byteorder='little')
                new_B = (B + leftrotate(to_rotate, s[i])) & 0xFFFFFFFF
                A, B, C, D = D, new_B, B, C
            # Add this chunk's hash to result so far:
            a0 = (a0 + A) & 0xFFFFFFFF
            b0 = (b0 + B) & 0xFFFFFFFF
            c0 = (c0 + C) & 0xFFFFFFFF
            d0 = (d0 + D) & 0xFFFFFFFF
        # 3. Conclusion
        return [a0, b0, c0, d0]

    def digest(self):
        """ Return the digest as a 128-bit integer (unlike hashlib, which returns a bytes object)."""
        return sum(x << (32 * i) for i, x in enumerate(self._hash_pieces()))

    def hexdigest(self):
        """ Like digest() except the digest is returned as a string object of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments."""
        digest = self.digest()
        raw = digest.to_bytes(16, byteorder='little')
        return '{:032x}'.format(int.from_bytes(raw, byteorder='big'))

We can also write a function to directly compute the hex digest from some bytes data.

In [16]:
def hash_MD5(data):
    """ Shortcut function to directly receive the hex digest from MD5(data)."""
    h = MD5()
    if isinstance(data, str):
        data = bytes(data, encoding='utf8')
    h.update(data)
    return h.hexdigest()
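As a sanity check on the constants, the first values of the table `K` computed from the sines should match the table given in RFC 1321. A standalone sketch recomputing them exactly as in `MD5.__init__`:

```python
from math import floor, sin

# Same formula as in MD5.__init__: integer part of 2^32 * |sin(i + 1)|
K = [floor(2**32 * abs(sin(i + 1))) & 0xFFFFFFFF for i in range(64)]
print([hex(k) for k in K[:4]])  # ['0xd76aa478', '0xe8c7b756', '0x242070db', '0xc1bdceee']
```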

<div style="text-align:right;"><blockquote> *Note:* [This page helped for debugging](https://rosettacode.org/wiki/MD5/Implementation#Python).</blockquote></div>

### First check on MD5

Let us try it:

In [17]:
h2 = MD5()
h2
print(h2)

<__main__.MD5 at 0x7ff83f5d0ba8>

MD5


In [18]:
h2.update(data)
h2.digest()

52666558089014014065978771967570616878

In [19]:
h2.hexdigest()

'2e224cd661b6b83e0f3a0a06cb359f27'

### A less stupid check on MD5

Let us try the examples from the [MD5 Wikipedia page](https://en.wikipedia.org/wiki/MD5#MD5_hashes):

In [20]:
hash_MD5("The quick brown fox jumps over the lazy dog")
assert hash_MD5("The quick brown fox jumps over the lazy dog") == '9e107d9d372bb6826bd81d3542a419d6'

'9e107d9d372bb6826bd81d3542a419d6'

Even a small change in the message will (with overwhelming probability) result in a mostly different hash, due to the avalanche effect. For example, adding a period to the end of the sentence:

In [21]:
hash_MD5("The quick brown fox jumps over the lazy dog.")
assert hash_MD5("The quick brown fox jumps over the lazy dog.") == 'e4d909c290d0fb1ca068ffaddf22cbd0'

'e4d909c290d0fb1ca068ffaddf22cbd0'
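The avalanche effect can be quantified by counting how many of the 128 output bits differ between the two digests; for a good hash function, roughly half of them are expected to flip. A small sketch using `hashlib` as the reference (`bit_difference` is a hypothetical helper name):

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Hypothetical helper: number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

d1 = hashlib.md5(b"The quick brown fox jumps over the lazy dog").digest()
d2 = hashlib.md5(b"The quick brown fox jumps over the lazy dog.").digest()
diff = bit_difference(d1, d2)
print(diff, "of", 8 * len(d1), "bits differ")
```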

The hash of the zero-length string is:

In [22]:
hash_MD5("")
assert hash_MD5("") == 'd41d8cd98f00b204e9800998ecf8427e'

'd41d8cd98f00b204e9800998ecf8427e'

$\implies$ We obtained the same result, OK our function works!

### Trying 1000 random examples
On a small sentence:

In [23]:
hash_MD5("My name is Zorro !")

'0ad8cb82874690906cf732223adeebbe'

In [24]:
h = hashlib.md5()
h.update(b"My name is Zorro !")
h.hexdigest()

'0ad8cb82874690906cf732223adeebbe'

It starts to look good.

In [25]:
def true_hash_MD5(data):
    h = hashlib.md5()
    if isinstance(data, str):
        data = bytes(data, encoding='utf8')
    h.update(data)
    return h.hexdigest()

On some random data:

In [26]:
import numpy.random as nr
alphabets = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def random_string(size=10000):
    return ''.join(alphabets[nr.randint(len(alphabets))] for _ in range(size))
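If one prefers not to depend on NumPy here, a possible alternative (an assumption, not what this notebook uses) is `random.choices` from the standard library; `random_string_stdlib` is a hypothetical name for this variant:

```python
import random

alphabets = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def random_string_stdlib(size=10000):
    """Hypothetical variant of random_string(), using only the standard library."""
    # random.choices draws `k` characters uniformly, with replacement
    return ''.join(random.choices(alphabets, k=size))
```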

In [27]:
random_string(10)

'K2wAP1dEGY'

In [28]:
from tqdm.notebook import tqdm

In [45]:
for _ in tqdm(range(1000)):
    x = random_string()
    assert hash_MD5(x) == true_hash_MD5(x), "Error: x = {} gave two different MD5 hashes: my implementation = {} != hashlib implementation = {}...".format(x, hash_MD5(x), true_hash_MD5(x))




----
## Second real example: the SHA1 hashing function

Let us now study and implement another hashing function, slightly harder to write but more secure: SHA1, the "Secure Hash Algorithm, version 1".
See [the SHA1 hashing function](https://en.wikipedia.org/wiki/SHA-1) on Wikipedia if needed; it was designed by the NSA and published by NIST in 1995.

<center><span style="font-size: large; color: red;">**Warning**: it is considered broken: NIST deprecated it in 2011, and a practical collision was published in 2017. Do not use it for real security purposes; SHA-2 or SHA-3 are better choices.</span></center>
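One practical difference with MD5 is the byte order: MD5 reads its 32-bit words and writes the message length in little-endian order, while SHA-1 uses big-endian order everywhere. A tiny illustration with `int.from_bytes`:

```python
data = b"abcd"  # bytes 0x61, 0x62, 0x63, 0x64

little = int.from_bytes(data, byteorder="little")  # as MD5 reads words
big = int.from_bytes(data, byteorder="big")        # as SHA-1 reads words
print(hex(little), hex(big))  # 0x64636261 0x61626364
```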

### Useful functions for the SHA-1 algorithm

They are pretty similar to the ones used for the MD5 algorithm: the first one is the same bitwise "choice" function, while SHA-1 also uses a parity (XOR) function and a majority function.

In [32]:
def SHA1_f1(b, c, d):
    """ First ternary bitwise operation (rounds 0 to 19): bitwise choice, if b then c else d."""
    return ((b & c) | ((~b) & d)) & 0xFFFFFFFF

def SHA1_f2(b, c, d):
    """ Second ternary bitwise operation (rounds 20 to 39): parity."""
    return (b ^ c ^ d) & 0xFFFFFFFF

def SHA1_f3(b, c, d):
    """ Third ternary bitwise operation (rounds 40 to 59): bitwise majority."""
    return ((b & c) | (b & d) | (c & d)) & 0xFFFFFFFF

def SHA1_f4(b, c, d):
    """ Fourth ternary bitwise operation (rounds 60 to 79): parity, like the second one."""
    return (b ^ c ^ d) & 0xFFFFFFFF

The left rotation is exactly the same as for MD5.

In [33]:
def leftrotate(x, c):
    """ Left rotate the 32-bit number x by c bits."""
    x &= 0xFFFFFFFF
    return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF

### The `SHA1` class

I will use a simple class, very similar to the class used for the MD5 algorithm (see above).
It is a direct implementation of the pseudo-code, as given for instance on the Wikipedia page. The main differences with MD5 are the big-endian byte order, the expansion of each chunk into eighty 32-bit words, and the 80 rounds.

In [40]:
class SHA1(Hash):
    """SHA1 hashing, see https://en.wikipedia.org/wiki/SHA-1#Algorithm."""
    
    def __init__(self):
        self.name        = "SHA1"
        self.block_size  = 64  # SHA-1 processes 512-bit = 64-bytes blocks
        self.digest_size = 20  # and produces a 160-bit = 20-bytes digest
        # All the data passed to update() so far
        self._message = b""
        # Initial values of the five 32-bit state words
        h0 = 0x67452301
        h1 = 0xEFCDAB89
        h2 = 0x98BADCFE
        h3 = 0x10325476
        h4 = 0xC3D2E1F0
        # Store them
        self._initial_pieces = [h0, h1, h2, h3, h4]
    
    def update(self, arg):
        """ Append arg to the message, so that update(a); update(b) is equivalent to update(a + b), like in hashlib."""
        self._message += bytes(arg)

    def _hash_pieces(self):
        """ Compute the five 32-bit words h0, ..., h4 of the hash of the whole message seen so far."""
        h0, h1, h2, h3, h4 = self._initial_pieces
        # 1. Pre-processing, like MD5 except that the length is big-endian
        data = bytearray(self._message)
        orig_len_in_bits = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF
        # 1.a. Add a single '1' bit at the end of the input bits
        data.append(0x80)
        # 1.b. Padding with zeros as long as the input bits length ≡ 448 (mod 512)
        while len(data) % 64 != 56:
            data.append(0)
        # 1.c. Append original length in bits mod (2 pow 64) to message, big-endian
        data += orig_len_in_bits.to_bytes(8, byteorder='big')
        assert len(data) % 64 == 0, "Error in padding"
        # 2. Computations
        # Process the message in successive 512-bit = 64-bytes chunks:
        for offset in range(0, len(data), 64):
            # 2.a. 512-bits = 64-bytes chunks
            chunks = data[offset : offset + 64]
            # 2.b. Break chunk into sixteen 32-bit = 4-bytes big-endian words w[i], 0 ≤ i ≤ 15
            w = [int.from_bytes(chunks[4*i : 4*i+4], byteorder='big') for i in range(16)]
            # 2.c. Extend the sixteen words into eighty 32-bit words
            for i in range(16, 80):
                w.append(leftrotate(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1))
            a, b, c, d, e = h0, h1, h2, h3, h4
            # 2.d. Main loop: 80 rounds, in four groups of 20 rounds
            for i in range(80):
                if 0 <= i <= 19:
                    f, k = SHA1_f1(b, c, d), 0x5A827999
                elif 20 <= i <= 39:
                    f, k = SHA1_f2(b, c, d), 0x6ED9EBA1
                elif 40 <= i <= 59:
                    f, k = SHA1_f3(b, c, d), 0x8F1BBCDC
                else:
                    f, k = SHA1_f4(b, c, d), 0xCA62C1D6
                temp = (leftrotate(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF
                # e = d, d = c, c = b rotated by 30 bits, b = a, a = temp
                a, b, c, d, e = temp, a, leftrotate(b, 30), c, d
            # Add this chunk's hash to result so far:
            h0 = (h0 + a) & 0xFFFFFFFF
            h1 = (h1 + b) & 0xFFFFFFFF
            h2 = (h2 + c) & 0xFFFFFFFF
            h3 = (h3 + d) & 0xFFFFFFFF
            h4 = (h4 + e) & 0xFFFFFFFF
        # 3. Conclusion
        return [h0, h1, h2, h3, h4]

    def digest(self):
        """ Return the digest as a 160-bit integer, h0 being the most significant word (unlike hashlib, which returns a bytes object)."""
        h0, h1, h2, h3, h4 = self._hash_pieces()
        return (h0 << 128) | (h1 << 96) | (h2 << 64) | (h3 << 32) | h4

    def hexdigest(self):
        """ Like digest() except the digest is returned as a string object of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments."""
        return '{:040x}'.format(self.digest())

We can also write a function to directly compute the hex digest from some bytes data.

In [41]:
def hash_SHA1(data):
    """ Shortcut function to directly receive the hex digest from SHA1(data)."""
    h = SHA1()
    if isinstance(data, str):
        data = bytes(data, encoding='utf8')
    h.update(data)
    return h.hexdigest()
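The step that has no counterpart in MD5 is the "message schedule", which expands each 64-byte chunk into eighty 32-bit words. It can be checked on its own on the single padded chunk of the empty message (a `0x80` byte followed by zeros, since the length is 0). A standalone sketch; `sha1_message_schedule` is a hypothetical name:

```python
def leftrotate(x, c):
    """Left rotate the 32-bit number x by c bits."""
    x &= 0xFFFFFFFF
    return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF

def sha1_message_schedule(chunk: bytes) -> list:
    """Hypothetical helper: expand one 64-byte chunk into the eighty 32-bit words used by SHA-1."""
    assert len(chunk) == 64
    w = [int.from_bytes(chunk[4*i : 4*i+4], byteorder="big") for i in range(16)]
    for i in range(16, 80):
        w.append(leftrotate(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1))
    return w

empty_chunk = b"\x80" + b"\x00" * 63
w = sha1_message_schedule(empty_chunk)
print(len(w), hex(w[0]), hex(w[16]))  # 80 0x80000000 0x1
```

Here `w[16]` is the XOR of `w[13]`, `w[8]`, `w[2]` and `w[0]` rotated by one bit: only `w[0] = 0x80000000` is non-zero, and its top bit wraps around to give `0x1`.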

### First check on SHA-1

Let us try it:

In [42]:
h3 = SHA1()
h3
print(h3)

<__main__.SHA1 at 0x7ff8314e69b0>

SHA1


In [43]:
h3.update(data)
assert h3.hexdigest() == hashlib.sha1(data).hexdigest()

### A less stupid check on SHA-1

Let us try the examples from the [SHA-1 Wikipedia page](https://en.wikipedia.org/wiki/SHA-1#SHA-1_hashes):

In [20]:
hash_SHA1("The quick brown fox jumps over the lazy dog")
assert hash_SHA1("The quick brown fox jumps over the lazy dog") == '2fd4e1c67a2d28fced849ee1bb76e7391b93eb12'

'2fd4e1c67a2d28fced849ee1bb76e7391b93eb12'

Even a small change in the message will (with overwhelming probability) result in a mostly different hash, due to the avalanche effect. For example, changing `dog` to `cog`:

In [21]:
hash_SHA1("The quick brown fox jumps over the lazy cog")
assert hash_SHA1("The quick brown fox jumps over the lazy cog") == 'de9f2c7fd25e1b3afad3e85a0bd17d9b100db4b3'

'de9f2c7fd25e1b3afad3e85a0bd17d9b100db4b3'

The hash of the zero-length string is:

In [44]:
hash_SHA1("")
assert hash_SHA1("") == 'da39a3ee5e6b4b0d3255bfef95601890afd80709'

'da39a3ee5e6b4b0d3255bfef95601890afd80709'

$\implies$ We obtained the same result, OK our function works!

### Trying 1000 random examples
On a small sentence:

In [23]:
assert hash_SHA1("My name is Zorro !") == hashlib.sha1(b"My name is Zorro !").hexdigest()

It starts to look good.

In [25]:
def true_hash_SHA1(data):
    h = hashlib.sha1()
    if isinstance(data, str):
        data = bytes(data, encoding='utf8')
    h.update(data)
    return h.hexdigest()

On some random data:


In [29]:
for _ in tqdm(range(1000)):
    x = random_string()
    assert hash_SHA1(x) == true_hash_SHA1(x), "Error: x = {} gave two different SHA1 hashes: my implementation = {} != hashlib implementation = {}...".format(x, hash_SHA1(x), true_hash_SHA1(x))




----
## Conclusion

Well, it was fun and interesting to implement these hashing functions manually.
Using [Python](https://www.Python.org) made it easy!

[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)
[![made-with-jupyter](https://img.shields.io/badge/Made%20for-Jupyter%20notebook-1f425f.svg)](https://www.jupyter.org/)
[![GitHub license](https://img.shields.io/github/license/Naereen/notebooks.svg)](https://github.com/Naereen/notebooks/blob/master/LICENSE)
[![ForTheBadge built-with-science](http://ForTheBadge.com/images/badges/built-with-science.svg)](https://GitHub.com/Naereen/)
[![ForTheBadge powered-by-electricity](http://ForTheBadge.com/images/badges/powered-by-electricity.svg)](http://ForTheBadge.com)

> See [my GitHub `notebooks` project](https://GitHub.com/Naereen/notebooks/) for other notebooks.