# Introduction to Hash Functions

## Built-In Hash Function

Python has a [built-in hash function](https://docs.python.org/3/library/functions.html#hash) which is used internally by [sets](https://docs.python.org/3/tutorial/datastructures.html#sets) and [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries). However, this is not a secure or cryptographic hash but rather a convenience function that makes the fast lookups of [hash tables](https://en.wikipedia.org/wiki/Hash_table) possible.

In [2]:
hash("Hello World!")

3973709272401957730

The result is an integer. The function is designed so that numeric objects which compare equal have the same hash, even when they are of different types (for example, the `int` 1 and the `float` 1.0).

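In [12]:
# Bind the values to names first: comparing two literals with `is` triggers a
# SyntaxWarning at compile time, which warnings.catch_warnings() cannot suppress.
integer_one, float_one = 1, 1.0
print(f"Are 1 and 1.0 the same object? {integer_one is float_one}")
print(f"Does 1 have the same hash as 1.0? {hash(1) == hash(1.0)}")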

Are 1 and 1.0 the same object? False
Does 1 have the same hash as 1.0? True
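
The same invariant extends beyond `int` and `float`. The following sketch uses `fractions.Fraction` and `decimal.Decimal` purely as illustrative choices (they do not appear in the original notebook) to show that equal numeric values share a hash and therefore behave as a single dictionary key:

```python
from decimal import Decimal
from fractions import Fraction

# Four different types, all representing the value one
values = [1, 1.0, Fraction(1, 1), Decimal(1)]
hashes = {hash(value) for value in values}
print(f"Number of distinct hashes: {len(hashes)}")

# Equal value plus equal hash means a dict treats them as the same key
lookup = {1: "one"}
print(f"lookup[1.0] -> {lookup[1.0]}")
print(f"lookup[Fraction(1, 1)] -> {lookup[Fraction(1, 1)]}")
```

This is what lets a dictionary keyed by `1` be queried with `1.0`, and it is a property of a lookup-oriented hash rather than a security feature.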


The built-in function is designed for fast lookups, not for collision resistance, so it SHOULD NOT be used for any cryptographic work. Instead, the [hashlib module](https://docs.python.org/3/library/hashlib.html) should be used.
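
A further reason to avoid the built-in function for anything persisted or shared: by default CPython salts the hashes of `str` and `bytes` objects with a per-process random value, so the integer shown earlier will generally differ between interpreter runs. The following is a minimal sketch, assuming a CPython interpreter that honours the `PYTHONHASHSEED` environment variable; the helper name `string_hash_in_fresh_interpreter` is hypothetical:

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(seed: str) -> int:
    """Return hash('Hello World!') as computed by a new Python process."""
    # Copy the current environment and pin the hash seed for the child process
    environment = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('Hello World!'))"],
        env=environment,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


first_run = string_hash_in_fresh_interpreter("1")
second_run = string_hash_in_fresh_interpreter("1")
other_seed_run = string_hash_in_fresh_interpreter("2")

print(f"Same seed, same hash?       {first_run == second_run}")
print(f"Different seed, same hash?  {first_run == other_seed_run}")
```

Without a fixed seed, each process picks its own salt, which is why the value cannot serve as a stable identifier.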

## Hashlib Module

The [hashlib module](https://docs.python.org/3/library/hashlib.html) exposes several cryptographic hash functions, typically backed by the [OpenSSL](https://www.openssl.org/) library under the hood.

In [21]:
import hashlib
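
Which algorithms are exposed depends on the interpreter build. As a small sketch, the module's `algorithms_guaranteed` and `algorithms_available` attributes can be inspected to see what is on offer; the exact contents of the second set will vary between installations:

```python
import hashlib

# Algorithms every Python build is expected to provide
guaranteed = sorted(hashlib.algorithms_guaranteed)
print(f"Guaranteed: {guaranteed}")

# Algorithms offered by this particular interpreter (may include extra ones)
extra = sorted(hashlib.algorithms_available - hashlib.algorithms_guaranteed)
print(f"Additionally available here: {extra}")
```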

### Hashing Data

#### Function Call

In [33]:
hash_object = hashlib.sha256(b"Hello World!")
print(f"Bytes Digest: {hash_object.digest()}")
print(f"Hex Digest: {hash_object.hexdigest()}")

Bytes Digest: b'\x7f\x83\xb1e\x7f\xf1\xfcS\xb9-\xc1\x81H\xa1\xd6]\xfc-K\x1f\xa3\xd6w(J\xdd\xd2\x00\x12m\x90i'
Hex Digest: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069


#### Object Instantiation

In [30]:
hasher = hashlib.new('sha256')
hasher.update(b"Hello World!")
print(f"Bytes Digest: {hasher.digest()}")
print(f"Hex Digest: {hasher.hexdigest()}")

Bytes Digest: b'\x7f\x83\xb1e\x7f\xf1\xfcS\xb9-\xc1\x81H\xa1\xd6]\xfc-K\x1f\xa3\xd6w(J\xdd\xd2\x00\x12m\x90i'
Hex Digest: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069
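
One practical reason to prefer the object form is that `update` can be called repeatedly, so large inputs can be hashed piece by piece. A minimal sketch, in which the chunk list is a hypothetical stand-in for blocks read from a file or socket:

```python
import hashlib

# Hypothetical chunks, standing in for blocks read from a file or socket
chunks = [b"Hello", b" ", b"World!"]

incremental_hasher = hashlib.sha256()
for chunk in chunks:
    # Each call feeds more data into the same running hash state
    incremental_hasher.update(chunk)

incremental_digest = incremental_hasher.hexdigest()
one_shot_digest = hashlib.sha256(b"Hello World!").hexdigest()

print(f"Incremental: {incremental_digest}")
print(f"   One shot: {one_shot_digest}")
print(f"      Equal: {incremental_digest == one_shot_digest}")
```

Feeding the data in several calls is equivalent to hashing the concatenation of all the chunks in a single call.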


## Appendix: The Bytes Data Type

The functions in the hashlib module require [bytes](https://docs.python.org/3/library/stdtypes.html#bytes-objects) objects. Strings can be converted with the [encode](https://docs.python.org/3/library/stdtypes.html#str.encode) and [decode](https://docs.python.org/3/library/stdtypes.html#bytes.decode) methods. For custom objects, two possible approaches are:

1. Converting the object to JSON and then the JSON string to bytes
1. Implementing a `__bytes__` method

In practice, hashes are usually computed over plain types such as strings, which are easily converted to bytes.
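
To make the bytes requirement concrete, the sketch below passes a `str` directly to `hashlib.sha256`, which raises a `TypeError`, and then repeats the call with the encoded value. The exact wording of the error message differs between Python versions, so it is only printed here:

```python
import hashlib

try:
    hashlib.sha256("Hello World!")  # a str, not bytes
    raised_type_error = False
except TypeError as error:
    raised_type_error = True
    print(f"TypeError: {error}")

# Encoding the string first gives the hash function the bytes it needs
digest = hashlib.sha256("Hello World!".encode("utf-8")).hexdigest()
print(f"Digest after encoding: {digest}")
```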

#### Data Conversions

In [74]:
data_string = "Hello World!"
data_bytes = data_string.encode("utf-8")
data_hex = data_bytes.hex()
data_decoded = data_bytes.decode("utf-8")
data_hex_bytes = bytes.fromhex(data_hex)

print(f"     Original String: {data_string}")
print(f"From String to Bytes: {data_bytes}")
print(f"   From Bytes to Hex: {data_hex}")
print(f"From Bytes to String: {data_decoded}")
print(f"   From Hex to Bytes: {data_hex_bytes}")

     Original String: Hello World!
From String to Bytes: b'Hello World!'
   From Bytes to Hex: 48656c6c6f20576f726c6421
From Bytes to String: Hello World!
   From Hex to Bytes: b'Hello World!'
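
The choice of encoding matters as soon as the text contains non-ASCII characters, because a hash is computed over the bytes and not over the characters. A short sketch, using the hypothetical sample string `"café"` and comparing UTF-8 with UTF-16:

```python
import hashlib

text = "café"  # hypothetical sample containing one non-ASCII character

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")

utf8_digest = hashlib.sha256(utf8_bytes).hexdigest()
utf16_digest = hashlib.sha256(utf16_bytes).hexdigest()

print(f" UTF-8 bytes: {utf8_bytes}")
print(f"UTF-16 bytes: {utf16_bytes}")
print(f"Same digest?  {utf8_digest == utf16_digest}")
```

Both sides of any hash comparison therefore need to agree on the encoding; UTF-8 is the one used throughout this notebook.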


#### Data Conversions with binascii

The [binascii module](https://docs.python.org/3/library/binascii.html) exposes two utility functions, one to convert from bytes to hex called [hexlify](https://docs.python.org/3/library/binascii.html#binascii.hexlify) and another to do the reverse conversion called [unhexlify](https://docs.python.org/3/library/binascii.html#binascii.unhexlify).

**Important note:** the hexlify function returns a bytes object whereas the .hex() method of bytes returns a string.

In [77]:
import binascii

data_string = "Hello World!"
data_bytes = data_string.encode("utf-8")
data_hex = binascii.hexlify(data_bytes)
data_hex_bytes = binascii.unhexlify(data_hex)

print(f"     Original String: {data_string}")
print(f"From String to Bytes: {data_bytes}")
print(f"   From Bytes to Hex: {data_hex}")
print(f"   From Hex to Bytes: {data_hex_bytes}")

     Original String: Hello World!
From String to Bytes: b'Hello World!'
   From Bytes to Hex: b'48656c6c6f20576f726c6421'
   From Hex to Bytes: b'Hello World!'
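
The difference in return types can be checked directly. The sketch below compares the two hex conversions and shows that decoding the `hexlify` result as ASCII yields the same string as `.hex()`:

```python
import binascii

data_bytes = b"Hello World!"

hex_from_binascii = binascii.hexlify(data_bytes)  # bytes object
hex_from_method = data_bytes.hex()                # str object

print(f"hexlify returns: {type(hex_from_binascii).__name__}")
print(f"   .hex returns: {type(hex_from_method).__name__}")

# The two only compare equal after converting one to the other's type
print(f"Equal as-is?     {hex_from_binascii == hex_from_method}")
print(f"Equal decoded?   {hex_from_binascii.decode('ascii') == hex_from_method}")
```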


#### Example with plain strings

In [45]:
data = "Hello World!"
data_bytes = data.encode("utf-8")
data_decoded = data_bytes.decode("utf-8")
data_hashed = hashlib.sha256(data_bytes).hexdigest()
print(f"Original: {data}")
print(f" Encoded: {data_bytes}")
print(f" Decoded: {data_decoded}")
print(f"  Hashed: {data_hashed}")

Original: Hello World!
 Encoded: b'Hello World!'
 Decoded: Hello World!
  Hashed: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069


#### Example with Custom objects

In [46]:
from dataclasses import dataclass, asdict
import json

@dataclass
class Person:
    first_name: str
    last_name: str
    
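    def __bytes__(self):
        # Serialise via JSON; sort_keys keeps the bytes (and thus the hash) independent of key order
        dictionary_representation = asdict(self)
        json_representation = json.dumps(dictionary_representation, sort_keys=True)
        return json_representation.encode("utf-8")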
    
    @classmethod
    def from_bytes(cls, bytes_object):
        string_representation = bytes_object.decode("utf-8")
        dictionary_representation = json.loads(string_representation)
        return cls(**dictionary_representation)
    
person = Person("John", "Doe")
person_bytes = bytes(person)
person_decoded = Person.from_bytes(person_bytes)
person_hashed = hashlib.sha256(person_bytes).hexdigest()

print(f"Original: {person}")
print(f" Encoded: {person_bytes}")
print(f" Decoded: {person_decoded}")
print(f"  Hashed: {person_hashed}")

Original: Person(first_name='John', last_name='Doe')
 Encoded: b'{"first_name": "John", "last_name": "Doe"}'
 Decoded: Person(first_name='John', last_name='Doe')
  Hashed: fee485b19074e0b0b2856ae5f27fcdd67ff12204cbff73c5eaa10b1aac887042
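
The JSON approach only yields a stable hash if the serialisation itself is deterministic. As a sketch of the pitfall, the two dictionaries below hold the same data in a different key order; the helper `digest_of` is hypothetical and simply forwards its keyword options to `json.dumps`:

```python
import hashlib
import json

record_a = {"first_name": "John", "last_name": "Doe"}
record_b = {"last_name": "Doe", "first_name": "John"}  # same content, different order


def digest_of(record, **dumps_options):
    """Hash the JSON form of a record; the options are passed to json.dumps."""
    json_representation = json.dumps(record, **dumps_options)
    return hashlib.sha256(json_representation.encode("utf-8")).hexdigest()


default_match = digest_of(record_a) == digest_of(record_b)
sorted_match = digest_of(record_a, sort_keys=True) == digest_of(record_b, sort_keys=True)

print(f"Same digest with default options? {default_match}")
print(f"Same digest with sort_keys=True?  {sorted_match}")
```

Passing `sort_keys=True` is one way to make equal records hash equally. Whitespace and separator settings affect the bytes in the same way, so they should also be kept fixed.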
