In [None]:
import hashlib
import string
import secrets

# **Cryptographic Hash**

**A hash function maps arbitrary strings of data to a fixed-length bit array. The function is deterministic and public, but the mapping should seem random. Hash functions do not have a secret key. Since there are no secrets and the function itself is public, anyone can evaluate the function.**

**The algorithms operate on bytes, so both alphanumeric and non-alphanumeric characters can be hashed once they are encoded. The resulting bit array can be returned as raw bytes or as a hexadecimal string.**

<br/>

|   | Input Characters | Output in Hexadecimal Form  |
|---|---|---|
| **MD5 Algorithm**  | "Test"  | "0cbc6611f5540bd0809a388dc95a615b"  |
| **SHA-256 Algorithm**  | "Test"  | "532eaabd9574880dbf76b9b8cc00832c20a6ec113d682299550d7a6e0f345e25"  |

<br/>

<sup>Source: Lecture - [Cryptography: Hashing from MIT OpenCourseWare](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-design-and-analysis-of-algorithms-spring-2015/lecture-notes/MIT6_046JS15_lec21.pdf)</sup>
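
**As a quick illustration of the two output formats described above, the sketch below hashes one input and reads the result once as raw bytes and once as hexadecimal text. The variable names are illustrative only.**

```python
import hashlib

# hash functions operate on bytes, so the text is encoded first
message = 'Test'.encode('UTF-8')

digest_bytes = hashlib.md5(message).digest()    # raw bytes
digest_hex = hashlib.md5(message).hexdigest()   # same value written in hexadecimal

print(digest_bytes)
print(digest_hex)

# each byte is written as two hexadecimal characters
print(len(digest_bytes), len(digest_hex))

# the function is deterministic: hashing the same input again gives the same value
print(hashlib.md5(message).hexdigest() == digest_hex)
```

**The hexadecimal form is usually the one shown to people, while the raw bytes are convenient when the digest is fed into further computation.**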

## **Properties of Cryptographic Hashes**

- **Pre-Image Resistance: For essentially all pre-specified outputs, it is computationally infeasible to find any input which hashes to that output. This means that a hash can be computed relatively easily for a given string(s), but inverting the output to find the original string(s) is difficult.**

- **Second Pre-Image Resistance: It is computationally infeasible to find any second input which has the same output as any specified input. This means given a certain string input, it should be difficult to find another input that produces the same hash. Also known as Weak Collision Resistance.**

- **Collision Resistance: It is computationally infeasible to find any two distinct inputs which hash to the same output. This means it should be difficult to find two different strings that create the same hash.**


<br/>

<sup>Source: Lecture - [Cryptographic Hash Functions](https://www.cs.purdue.edu/homes/ssw/cs355/hash.pdf) by William R. Speirs</sup>
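
**To get a feel for what Collision Resistance means, the sketch below searches for a collision on a deliberately weakened hash: only the first 4 hexadecimal characters (16 bits) of a SHA-256 digest. The truncation is an assumption made purely so the search finishes quickly, and `truncated_sha256` and `find_truncated_collision` are hypothetical helper names.**

```python
import hashlib

def truncated_sha256(data: bytes, hex_chars: int = 4) -> str:
    """Return only the first hex_chars characters of the SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]

def find_truncated_collision(hex_chars: int = 4):
    """Try the inputs b'0', b'1', ... until two of them share a truncated digest."""
    seen = {}  # truncated digest -> first input that produced it
    counter = 0
    while True:
        candidate = str(counter).encode('UTF-8')
        short_hash = truncated_sha256(candidate, hex_chars)
        if short_hash in seen:
            return seen[short_hash], candidate, short_hash
        seen[short_hash] = candidate
        counter += 1

first, second, shared = find_truncated_collision()
print(f'{first} and {second} both truncate to {shared}')
```

**With only 16 bits of output a collision turns up almost immediately. The same search against a full-length digest of a sound hash function is what is meant by computationally infeasible.**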

# **MD5 (Message Digest 5)**

**MD5 is a hash function that takes a given input and produces a 128-bit output, which is 32 characters long when written in hexadecimal.**

**The algorithm was developed in the 1990s and has since been broken. It should not be used for security purposes. Note that hashing is not encryption: a hash function has no key and its output is not meant to be decrypted.**

<br/>

<sup>Source: Article - [MD5 Homepage, Montana State University](https://www.cs.montana.edu/paxton/classes/ireland/presentations/alex/md5Home.html)</sup>

In [None]:
#needs to be in bytes, not unicode
'hello'.encode('UTF-8') == b'hello'

True

In [None]:
md5_hash_hex = hashlib.md5(b'hello').hexdigest()
print(f'The MD5 hexadecimal hash value for "hello" is: {md5_hash_hex}\nThe length of the hash is: {len(md5_hash_hex)} characters')

The MD5 hexadecimal hash value for "hello" is: 5d41402abc4b2a76b9719d911017c592
The length of the hash is: 32 characters


In [None]:
#can update the hash with chunks
md5_hash = hashlib.md5()

for letter in 'hello':
  md5_hash.update(letter.encode('UTF-8'))
  print(md5_hash.hexdigest())

2510c39011c5be704182423e3a695e91
6f96cfdfe5ccc627cadf24b41725caa4
46356afe55fa3cea9cbe73ad442cad47
4229d691b07b13341da53f17ab9f2416
5d41402abc4b2a76b9719d911017c592
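
**The final value printed above matches the hash of `b'hello'` computed in one call, which is what makes chunked updates useful for data too large to hold in memory. The sketch below applies the same idea to a binary stream. `md5_in_chunks` is a hypothetical helper, and `io.BytesIO` stands in for a real file opened in binary mode.**

```python
import hashlib
import io

def md5_in_chunks(stream, chunk_size: int = 4) -> str:
    """Feed a binary stream to MD5 a few bytes at a time."""
    hasher = hashlib.md5()
    # read() returns b'' at the end of the stream, which ends the loop
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        hasher.update(chunk)
    return hasher.hexdigest()

data = b'hello world, hashed piece by piece'
chunked = md5_in_chunks(io.BytesIO(data))

# the chunked result equals the hash of the whole input computed in one call
print(chunked == hashlib.md5(data).hexdigest())
```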


## **Avalanche Effect**

**Within cryptography, the Avalanche Effect is when a small change to the input of a function, such as a hash function, leads to a significant change in its output.**

In [None]:
md5_hash_1 = hashlib.md5(b'hi').hexdigest()
md5_hash_2 = hashlib.md5(b'Hi').hexdigest()

overlap = 0
for i,j in zip(md5_hash_1, md5_hash_2):
  print(i, j)
  if i == j:
    overlap += 1

print(f'\nOut of {len(md5_hash_1)} characters, only {overlap} overlap between the two hashes.')

4 c
9 1
f a
6 5
8 2
a 9
5 8
c f
8 9
4 3
9 9
3 e
e 8
c 7
2 e
c 8
0 f
b 9
f 6
4 2
8 a
9 5
8 e
2 d
1 f
c c
2 2
1 0
f 6
c 9
3 1
b 8

Out of 32 characters, only 3 overlap between the two hashes.
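
**Comparing hexadecimal characters is a coarse measure, because two different characters may still share some bits. A minimal sketch of a bit-level comparison follows. `differing_bits` is a hypothetical helper name.**

```python
import hashlib

def differing_bits(first: bytes, second: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    xor = int.from_bytes(first, 'big') ^ int.from_bytes(second, 'big')
    return bin(xor).count('1')

digest_1 = hashlib.md5(b'hi').digest()
digest_2 = hashlib.md5(b'Hi').digest()

changed = differing_bits(digest_1, digest_2)
total = len(digest_1) * 8  # digest length in bits
print(f'{changed} of {total} bits differ')
```

**For a hash function with a strong Avalanche Effect, roughly half of the output bits are expected to flip when the input changes slightly.**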


## **Vulnerabilities in MD5**

**The code below shows that the MD5 algorithm is broken because it does not provide Collision Resistance. A famous cryptography paper by Xiaoyun Wang and Hongbo Yu shows that they were able to break Collision Resistance for MD5 with the two inputs below, which are written here as hexadecimal strings. Even though the inputs are different, they produce the same hash. Over time, cryptographers were able to find more examples of violations of Collision Resistance within the MD5 algorithm.**

In [None]:
string_1 = 'd131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70'
string_2 = 'd131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70'
print(f'Check to see if the strings are the same: {string_1 == string_2}')

#convert the hexadecimal strings into bytes
bytes_1 = bytes.fromhex(string_1)
bytes_2 = bytes.fromhex(string_2)

#this is an example of a collision where MD5 fails
print(f'Using the MD5 algorithm, we see that this is a Collision and that the algorithm fails: {hashlib.md5(bytes_1).hexdigest() == hashlib.md5(bytes_2).hexdigest()}')


Check to see if the strings are the same: False
Using the MD5 algorithm, we see that this is a Collision and that the algorithm fails: True


<sup>Source: Code - [MD5 proving collision for the famous hexadecimal blocks](https://crypto.stackexchange.com/questions/41411/md5-proving-collision-for-the-famous-hexadecimal-blocks) post from Cryptography Stack Exchange</sup>

<sup>Source: Journal Article - [How to Break MD5 and Other Hash Functions](https://www.researchgate.net/publication/225230142_How_to_Break_MD5_and_Other_Hash_Functions) by Xiaoyun Wang and Hongbo Yu</sup>

# **SHA-256 (Secure Hash Algorithm 256)**

**SHA-256 is a hash function that takes a given input and produces a 256-bit output, which is 64 characters long when written in hexadecimal.**

<br/>

<sup>Source: Website - [Decryptionary](https://decryptionary.com/dictionary/secure-hash-algorithm-256/)</sup>

In [None]:
sha256_hash = hashlib.sha256(b'hello').hexdigest()
print(f'The SHA-256 hexadecimal hash value for "hello" is: {sha256_hash}\nThe length of the hash is: {len(sha256_hash)} characters')

The SHA-256 hexadecimal hash value for "hello" is: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
The length of the hash is: 64 characters
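
**The relationship between output size in bits and length in hexadecimal characters can be checked directly. The sketch below uses `hashlib.new` and the `digest_size` attribute, which reports the size in bytes. The `sizes` dictionary is only there to collect the results.**

```python
import hashlib

sizes = {}
for name in ('md5', 'sha256'):
    hasher = hashlib.new(name, b'hello')
    bits = hasher.digest_size * 8         # digest_size is measured in bytes
    hex_chars = len(hasher.hexdigest())   # each hexadecimal character encodes 4 bits
    sizes[name] = (bits, hex_chars)
    print(f'{name}: {bits} bits, {hex_chars} hexadecimal characters')
```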


In [None]:
string_1 = 'd131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70'
string_2 = 'd131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70'
print(f'Check to see if the strings are the same: {string_1 == string_2}')

#convert the hexadecimal strings into bytes
bytes_1 = bytes.fromhex(string_1)
bytes_2 = bytes.fromhex(string_2)

#the same inputs that collide under MD5 produce different SHA-256 hashes
print(f'Do the two inputs collide under SHA-256? {hashlib.sha256(bytes_1).hexdigest() == hashlib.sha256(bytes_2).hexdigest()}')

# **Password Storage and Hashing**
**When you create a password for a website, the site usually stores a hash of the password rather than the password itself. This is done for security: if an attacker gets into the system but only obtains the password hashes, gaining access to accounts is much harder because the original passwords are not known.**

## **Dictionary Attack**

**A dictionary attack is a type of brute-force attack in which an attacker tries to access an account by iterating through a dictionary of common words and phrases. Dictionaries range in size from hundreds of thousands of password variations to billions. A dictionary containing over 1 billion unique words takes up only 15 gigabytes.**

<br/>

<sup>Source: Website - [CrackStation](https://crackstation.net/crackstation-wordlist-password-cracking-dictionary.htm)</sup>

In [None]:
#dictionary that we will use to try to break a password
data_dict_attack = {'12345':hashlib.sha256(b'12345').hexdigest(), 'football':hashlib.sha256(b'football').hexdigest(),
                    '123456':hashlib.sha256(b'123456').hexdigest(), 'test1':hashlib.sha256(b'test1').hexdigest(),
                    'password':hashlib.sha256(b'password').hexdigest(), 'asdf':hashlib.sha256(b'asdf').hexdigest(),
                     '123456789':hashlib.sha256(b'123456789').hexdigest(), 'iloveyou':hashlib.sha256(b'iloveyou').hexdigest(),
                    'qwerty':hashlib.sha256(b'qwerty').hexdigest(), 'basketball':hashlib.sha256(b'basketball').hexdigest()}

In [None]:
intercepted_hash = '6382deaf1f5dc6e792b76db4a4a7bf2ba468884e000b25e7928e621e27fb23cb'

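#loop through the dictionary until a hash matches the intercepted hash
for password, password_hash in data_dict_attack.items():
  if password_hash == intercepted_hash:
    #we were able to crack the password
    print(password)
    break
else:
  #runs only if the loop finished without finding a match
  print('No password found')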
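
**An attacker with a large dictionary would typically not loop over it for every stolen hash. A minimal sketch of the alternative is to invert the mapping once, so that each stolen hash is resolved with a single lookup. The word list here is a hypothetical stand-in for a real cracking dictionary.**

```python
import hashlib

# hypothetical, tiny word list standing in for a real cracking dictionary
wordlist = ['12345', 'football', 'password', 'qwerty']

# map each hash back to the word that produced it
hash_to_password = {
    hashlib.sha256(word.encode('UTF-8')).hexdigest(): word
    for word in wordlist
}

stolen_hash = hashlib.sha256(b'football').hexdigest()

# a single dictionary lookup replaces the loop over every candidate
print(hash_to_password.get(stolen_hash, 'No password found'))
```

**Because the table is built once and reused for every stolen hash, unsalted hashes of common passwords offer very little protection.**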

## **Salt Value**

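**A salt value is a randomly generated string of characters that is combined with a password before hashing (in this notebook it is appended). The salt is stored alongside the hash. Because each password gets its own salt, identical passwords produce different hashes, and a dictionary of precomputed hashes no longer matches.**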

In [None]:
#prompt user to set their password
password = input('Set your password\n')

#alphanumeric characters that can be used for the salt value
char_list = string.ascii_letters + string.digits

#create a 15-character salt from cryptographically secure random choices
salt = ''.join(secrets.choice(char_list) for _ in range(15))

#concatenate the original password and the salt
salted_password = password + salt

print(f'The salt is: {salt}\nThe new password is: {salted_password}')
salted_hash = hashlib.sha256(salted_password.encode('UTF-8'))
print(salted_hash.hexdigest())
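
**A salted hash is only useful if a later login attempt can be checked against it, which means the salt has to be stored next to the hash. The sketch below shows one way that check could look, using the same SHA-256 and appended-salt approach as the cell above. `make_salt`, `salted_hash` and `verify` are hypothetical helper names, and the passwords are example values.**

```python
import hashlib
import hmac
import secrets
import string

def make_salt(length: int = 15) -> str:
    """Build a random alphanumeric salt."""
    alphabet = string.ascii_letters + string.digits
    return ''.join(secrets.choice(alphabet) for _ in range(length))

def salted_hash(password: str, salt: str) -> str:
    """Hash the password with the salt appended."""
    return hashlib.sha256((password + salt).encode('UTF-8')).hexdigest()

# registration: the site stores the salt and the hash, never the password
salt = make_salt()
stored_hash = salted_hash('football', salt)

def verify(attempt: str) -> bool:
    """Rehash the attempt with the stored salt and compare the results."""
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(salted_hash(attempt, salt), stored_hash)

print(verify('football'))
print(verify('basketball'))
```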

### **Dictionary Attack on Salt Adjusted Password**

In [None]:
#no luck on cracking the hash adjusted with a salt value
intercepted_salt_hash = '4aa66103fe6a334268439e70dab79f9e'

for password, password_hash in data_dict_attack.items():
  if password_hash == intercepted_salt_hash:
    print(password)
    break
else:
  #runs only if no hash in the dictionary matched
  print('No password found')
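
**The sketch below illustrates why the precomputed dictionary stops working once a salt is involved, and what an attacker who also obtained the salt would have to do instead. The word list and the salt are hypothetical example values, and `sha256_hex` is a hypothetical helper.**

```python
import hashlib

wordlist = ['12345', 'football', 'password', 'qwerty']  # hypothetical word list

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of a text string."""
    return hashlib.sha256(text.encode('UTF-8')).hexdigest()

# table built in advance, without knowledge of any salt
precomputed = {sha256_hex(word): word for word in wordlist}

salt = 'x7Qp2LmZ9aB4cD1'  # hypothetical 15-character salt
stolen_hash = sha256_hex('football' + salt)

# the precomputed table does not contain the salted hash
table_hit = precomputed.get(stolen_hash)
print(table_hit)

# an attacker who also stole the salt has to rehash every candidate with it
recovered = next((word for word in wordlist if sha256_hex(word + salt) == stolen_hash), None)
print(recovered)
```

**The salt removes the benefit of precomputation, since the work has to be repeated for every salt. It does not make a weak password strong: a common password can still be found by rehashing the dictionary with the known salt.**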

# **Other Cryptographic Hashing Applications**

- **Digital Signatures**
- **Data Integrity**
- **Cryptocurrency (Proof of work)**
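
**As a rough illustration of the proof-of-work idea from the list above, the toy sketch below searches for a number (a nonce) that, when appended to some data, yields a SHA-256 hash starting with a chosen prefix. The `find_nonce` helper, the `'000'` prefix and the `b'block data'` message are all assumptions made for the example and do not reflect how any particular cryptocurrency works.**

```python
import hashlib

def find_nonce(message: bytes, prefix: str = '000'):
    """Increase a counter until the hash of message + counter starts with prefix."""
    nonce = 0
    while True:
        digest = hashlib.sha256(message + str(nonce).encode('UTF-8')).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = find_nonce(b'block data')
print(f'nonce: {nonce}\nhash: {digest}')

# checking the result takes a single hash, while finding it took many
print(hashlib.sha256(b'block data' + str(nonce).encode('UTF-8')).hexdigest() == digest)
```

**Finding the nonce requires many attempts because the hash output cannot be predicted, while verifying it requires only one hash. A longer required prefix makes the search exponentially more expensive.**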

# **References and Additional Learning**

## **Passwords to avoid at all costs!**

- **[Ranked: The World’s Top 100 Worst Passwords](https://www.forbes.com/sites/daveywinder/2019/12/14/ranked-the-worlds-100-worst-passwords/?sh=301d78469b41) by Davey Winder**

## **Online Courses**

- **[Master Modern Security and Cryptography by Coding in Python](https://www.udemy.com/course/learn-modern-security-and-cryptography-by-coding-in-python/), Udemy course by Rune Thomsen**

## **Textbooks**
- **[Implementing Cryptography Using Python](https://www.amazon.com/Implementing-Cryptography-Using-Python-Shannon/dp/1119612209/ref=sr_1_1?dchild=1&keywords=Implementing+Cryptography+Using+Python&qid=1609360861&s=books&sr=1-1) by Shannon Bray**
- **[Practical Cryptography in Python: Learning Correct Cryptography by Example](https://www.amazon.com/Practical-Cryptography-Python-Learning-Correct/dp/1484248996/ref=sr_1_1?crid=1GKREMIFL2A0Y&dchild=1&keywords=practical+cryptography+in+python&qid=1609360771&s=books&sprefix=Practical+Cryptography+in+Python%2Cstripbooks%2C134&sr=1-1) by  Seth James Nielson and Christopher Monson**
- **[Black Hat Python](https://www.amazon.com/Black-Hat-Python-Programming-Pentesters/dp/1593275900) by Justin Seitz**

## **Websites**
- **[hashlib Documentation](https://docs.python.org/3/library/hashlib.html)**
- **[Hashes.com](https://hashes.com/en/decrypt/hash) - Check the strings behind hashes**
- **[Bits, Bytes, Hexadecimals, and ASCII](http://www.c-jump.com/bcc/common/Talk2/Cxx/BitByteHexASCII/BitByteHexASCII.html) - How information is stored digitally**

## **Podcasts**

- **Malicious Life Episode 30: [The Ashley Madison Hack Part 1](https://malicious.life/episode/episode-30/) with Ran Levi talks about how the company saved passwords as MD5 hashes, which led to their site being compromised**

# **Connect**
- **Join [TUDev](https://docs.google.com/forms/d/e/1FAIpQLSdsJbBbza_HsqhGM_5YjaSo-XnWug2KNCXv9CYQcXW4qtCQsw/viewform) and check out our [website](https://tudev.org/)!**

- **Feel free to connect with Adrian on [YouTube](https://www.youtube.com/channel/UCPuDxI3xb_ryUUMfkm0jsRA), [LinkedIn](https://www.linkedin.com/in/adrian-dolinay-frm-96a289106/), [Twitter](https://twitter.com/DolinayG) and [GitHub](https://github.com/ad17171717). Happy coding!**