# Hash Functions

**Hash functions map data of any size to a fixed-size value, called a hash.**

**For example, every character is stored in a computer as a number. Historically this was an ASCII value; today it is a Unicode code point, since ASCII cannot represent the characters of most languages.**

In [2]:
print(ord("a"))

print(ord("b"))

print(ord("z"))

97
98
122


In [3]:
data = [
    ("orange", "a sweet, orange, citrus fruit"), 
    ("apple", "good for making cider"), 
    ("lemon", "a sour, yellow citrus fruit"), 
    ("grape", "a small, sweet fruit growing in bunches"), 
    ("melon", "sweet and juicy")
]

**In theory, you could use the `ord()` function to convert characters to integers in a simple hashing function:**

In [4]:
def my_hash(s):
    # Start with first char in string
    basic_hash = ord(s[0])
    # Take the remainder after dividing by 10 (limits the result to the range 0-9)
    return basic_hash % 10

In [5]:
my_hash("my old piano")

9

In [6]:
for key, value in data:
    h = my_hash(key)
    print(key, h)

orange 1
apple 7
lemon 8
grape 3
melon 9


In [14]:
# Create list of empty strings

keys = [""] * 10

values = keys.copy()

print(values)

['', '', '', '', '', '', '', '', '', '']


In [15]:
for key, value in data:
    h = my_hash(key)
    print(key, h)
    keys[h] = key
    values[h] = value

orange 1
apple 7
lemon 8
grape 3
melon 9


In [17]:
print(keys)
print()
print(values)

['', 'orange', '', 'grape', '', '', '', 'apple', 'lemon', 'melon']

['', 'a sweet, orange, citrus fruit', '', 'a small, sweet fruit growing in bunches', '', '', '', 'good for making cider', 'a sour, yellow citrus fruit', 'sweet and juicy']


**The keys list is created by using the hash value (1, 7, 8, 3, 9) as indexes to insert the keys.**

**The values list is created by inserting the values at the same index locations.**
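
**One caveat: because `my_hash()` only looks at the first character, two different keys can land on the same index (a collision). The sketch below is a self-contained illustration; the extra key `mango` and its description are made up for the example.**

```python
def my_hash(s):
    # Same simple hash as above: code point of the first character, modulo 10
    return ord(s[0]) % 10

keys = [""] * 10
values = [""] * 10

# "melon" and "mango" both start with "m" (ord 109), so both map to index 9
for key, value in [("melon", "sweet and juicy"), ("mango", "a tropical stone fruit")]:
    h = my_hash(key)
    if keys[h]:
        print(f"collision at index {h}: {key!r} overwrites {keys[h]!r}")
    keys[h] = key
    values[h] = value

print(keys[9], "->", values[9])
```

**After the loop, the entry for `melon` has been lost, which is why real hash tables need a strategy for handling collisions.**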

**Define a new function to get the value for each key, if it exists (otherwise `None`).**

In [18]:
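# Function will crash if you input an empty string, because my_hash() reads s[0]

def get_value(k):
    hash_code = my_hash(k)

    # Compare the stored key so a different key with the same hash is not matched
    if keys[hash_code] == k:
        return values[hash_code]
    else:
        return None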

In [20]:
value = get_value("grape")

print(value)

a small, sweet fruit growing in bunches


In [21]:
value = get_value("tomato")

print(value)

None


**All of this is a simple demonstration of how hash codes act as indexes into a table, where each index stores a unit of information that the computer can look up directly, i.e. a hash table. Python's real implementation is more sophisticated, but dictionaries use hash tables in the same way to look up keys and values very quickly.**
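
**For comparison, here is a short sketch of the same lookups done with a built-in dictionary. It uses a shortened copy of the fruit data, and the variable name `fruit` is just an illustrative choice.**

```python
data = [
    ("orange", "a sweet, orange, citrus fruit"),
    ("apple", "good for making cider"),
    ("grape", "a small, sweet fruit growing in bunches"),
]

# dict() accepts an iterable of (key, value) pairs
fruit = dict(data)

print(fruit.get("grape"))   # existing key
print(fruit.get("tomato"))  # missing key: dict.get() returns None by default
```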

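**A real hash function uses all the characters in a string, not just the first. Python has a built-in `hash()` function, which shows what hashing a string looks like in practice (string hashes are randomized per process by default, so your values will differ from those below):**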

In [9]:
for key, value in data:
    h = hash(key)
    print(key, h)

orange 1950750167032421488
apple -3915572360825357569
lemon -191648817672587419
grape 20133853204347415
melon 1097979451567495088
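
**To make the idea of using all characters concrete, here is a minimal sketch of a polynomial string hash. It is an illustration only and is not the algorithm behind Python's `hash()`; the function name `all_chars_hash`, the multiplier 31 and the 32-bit limit are arbitrary choices made for this example.**

```python
def first_char_hash(s):
    # The simple hash from earlier: only the first character counts
    return ord(s[0]) % 10

def all_chars_hash(s):
    h = 0
    for ch in s:
        # Multiplying the running value by 31 before adding the next
        # character makes the order of the characters matter
        h = (h * 31 + ord(ch)) % 2**32
    return h

for word in ["melon", "mango", "lemon"]:
    print(word, first_char_hash(word), all_chars_hash(word))
```

**Here `melon` and `mango` share a first-character hash, and `melon` and `lemon` contain the same letters, yet `all_chars_hash()` gives each word a different value.**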


**Hash functions are used a lot in cybersecurity, where passwords and emails are often stored as hash codes rather than plain text, making the original values harder to recover if the data is stolen. Several hashing algorithms are available in Python's `hashlib` module.**

**NOTE: Another term for a hash is a 'digest'.**

In [22]:
import hashlib

In [23]:
# Constants available in module to name algorithms you can use

print(sorted(hashlib.algorithms_guaranteed))
print()
print(sorted(hashlib.algorithms_available))

['blake2b', 'blake2s', 'md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'sha512', 'shake_128', 'shake_256']

['blake2b', 'blake2s', 'md4', 'md5', 'md5-sha1', 'mdc2', 'ripemd160', 'sha1', 'sha224', 'sha256', 'sha384', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'sha512', 'sha512_224', 'sha512_256', 'shake_128', 'shake_256', 'sm3', 'whirlpool']
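
**Any of these names can be passed to `hashlib.new()` to select an algorithm at run time. The sketch below hashes a made-up sample message with three of the guaranteed algorithms and prints each digest's size in bytes.**

```python
import hashlib

message = b"hash me"  # sample input, chosen arbitrarily

# hashlib.new() looks an algorithm up by name; these three appear in
# hashlib.algorithms_guaranteed
for name in ["sha256", "sha512", "blake2b"]:
    h = hashlib.new(name, message)
    # digest_size is the length of the resulting digest in bytes
    print(f"{name:8} {h.digest_size:2} bytes  {h.hexdigest()}")
```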


## Hashing an email message

**Generate a hash code for a larger string, including punctuation. The hash functions available, e.g. `sha256()`, work on bytes rather than strings, so the text must first be encoded, here as UTF-8.**

In [30]:
python_program = """for i in range(10):
print(i)
"""

print(python_program)

for i in range(10):
print(i)



In [31]:
for b in python_program.encode('utf8'):
    print(b, chr(b))

102 f
111 o
114 r
32  
105 i
32  
105 i
110 n
32  
114 r
97 a
110 n
103 g
101 e
40 (
49 1
48 0
41 )
58 :
10 

112 p
114 r
105 i
110 n
116 t
40 (
105 i
41 )
10 



**As you can see, the number 32 represents a space character, 10 is a newline, 102 is the letter f, etc.**

**NOTE: The `chr()` function returns the character that a Unicode code point represents.**
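
**One subtlety: calling `chr()` on each encoded byte only reproduces the original characters for ASCII text. The sketch below, using a made-up sample string, shows that a non-ASCII character takes more than one byte in UTF-8, and that `hashlib` rejects a string that has not been encoded.**

```python
import hashlib

text = "caf\u00e9"  # "café", written with an escape so the source stays ASCII
encoded = text.encode("utf8")

# 4 characters but 5 bytes: the accented letter takes two bytes in UTF-8
print(len(text), len(encoded))
print(list(encoded))

# Decoding the bytes as a whole recovers the original text
print(encoded.decode("utf8") == text)

# Passing a str instead of bytes raises TypeError
try:
    hashlib.sha256(text)
except TypeError as err:
    print("TypeError:", err)
```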

In [32]:
original_hash = hashlib.sha256(python_program.encode('utf8'))

print(f"SHA256: {original_hash.hexdigest()}")

SHA256: 5033b46b90e4250ce294d67078ad62b516b4b65964488e4605c7d216263c1565


**The `hexdigest()` method returns a hexadecimal representation of the secure hash: 64 hex digits encoding a 256-bit (32-byte) number. You can use this value to check whether the data has been modified in any way, i.e. tampered with.**
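
**The relationship between the raw digest and its hexadecimal form can be checked directly. This is a small sketch on arbitrary sample data; `digest()` is the companion method that returns the same value as raw bytes.**

```python
import hashlib

h = hashlib.sha256(b"example data")

raw = h.digest()          # bytes object
hex_text = h.hexdigest()  # str of hexadecimal digits

print(len(raw))       # length in bytes
print(len(hex_text))  # two hex digits per byte

# Both forms carry the same value
print(bytes.fromhex(hex_text) == raw)
```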

In [33]:
python_program += "print('code change')"

print(python_program)

for i in range(10):
print(i)
print('code change')


In [35]:
new_hash = hashlib.sha256(python_program.encode('utf8'))

print(f"SHA256: {new_hash.hexdigest()}")

SHA256: 0b4bd931add7a7164314f8feb7927ebca575417f8650b1f6069ba783a3a0f9e2


**You can easily write a program that uses these functions to check hashes for you:**

In [36]:
if new_hash.hexdigest() == original_hash.hexdigest():
    print("The code has not been changed")
else:
    print("The code has been modified")

The code has been modified
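
**The same check can be wrapped in a small reusable helper. This is a sketch under assumptions: the function names `sha256_hex` and `is_unmodified` are hypothetical, and `hmac.compare_digest()` is used in place of `==` only because it is the standard library's constant-time comparison, a common habit when comparing digests.**

```python
import hashlib
import hmac

def sha256_hex(text):
    # Encode the text to bytes, then return the SHA-256 digest as hex
    return hashlib.sha256(text.encode("utf8")).hexdigest()

def is_unmodified(text, expected_hex):
    # True when the text still hashes to the digest recorded earlier
    return hmac.compare_digest(sha256_hex(text), expected_hex)

original = "for i in range(10):\nprint(i)\n"
stored = sha256_hex(original)  # digest recorded while the text was trusted

print(is_unmodified(original, stored))
print(is_unmodified(original + "print('code change')", stored))
```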
