
In [4]:
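import hashlib

# Create a SHA-256 hash object
hash_object = hashlib.sha256()
input_string = "Hello"
# Hash functions operate on bytes, so encode the string first
encoded_string = input_string.encode('utf-8')
hash_object.update(encoded_string)
# hexdigest() returns the digest as a hexadecimal string
hex_digest = hash_object.hexdigest()
print(hex_digest)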

185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969


# **Hashing**

**A hashing algorithm converts input data, such as a number or string, into a hash value. It uses a hash function to map a variable-sized input to a fixed-size output.**

**For example, we can take the string "hello" and apply the SHA-256 hashing algorithm, which outputs a 256-bit value written as 64 hexadecimal characters: "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824". If we capitalize the "h" to get "Hello", the hash changes completely to "185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969".**

**There are numerous hashing algorithms. The table below lists well-known hashing algorithms that are available in the `hashlib` module, which is part of Python's standard library.**

**The digest size is the output size of a hashing algorithm, usually expressed in bits. For example, the SHA-256 output is always 256 bits (32 bytes), regardless of the input size.**

**The block size is the size of the chunks the hash function processes at a time, with a compression function applied to each block over a series of internal rounds. SHA-256 has a block size of 512 bits. Data longer than one block is split into multiple blocks. The message is always padded before hashing: SHA-256 appends a single 1 bit, then 0 bits, then the message length as a 64-bit number, so that the padded length is an exact multiple of the block size.**

**The final column of the table indicates whether a collision is known for the algorithm. A collision occurs when two different inputs produce the same hash output. Algorithms with known collisions are considered insecure for uses such as digital signatures, because a signature over one document's hash is equally valid for any other document with the same hash.**

<sup>Source: [Computer Science 161: Integrity, Hashes & "Random" Numbers](https://inst.eecs.berkeley.edu/~cs161/fa17/lectures/lec6_crypto_integrity.pdf) by Nicholas Weaver from inst.eecs.berkeley.edu</sup>

<sup>Source: [Secure Hash Standard](https://cis.temple.edu/~giorgio/cis307/readings/sha1.html) by Giorgio Ingargiola from cis.temple.edu</sup>
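
**The padding rule for SHA-256 can be sketched in a few lines. The helper name `sha256_padded_length` is hypothetical and only illustrates the arithmetic: one mandatory padding byte (`0x80`) plus an 8-byte length field are appended, and the total is rounded up to a multiple of the 64-byte (512-bit) block size.**

```python
def sha256_padded_length(message_length: int) -> int:
    """Return the length in bytes of a message after SHA-256 padding.

    Hypothetical helper for illustration; it only computes sizes and
    does not perform any hashing.
    """
    block_bytes = 64         # 512-bit block size
    length_field_bytes = 8   # 64-bit message length stored at the end
    # The message, one 0x80 padding byte and the length field must all fit.
    minimum = message_length + 1 + length_field_bytes
    # Round up to the next whole number of blocks (ceiling division).
    blocks = -(-minimum // block_bytes)
    return blocks * block_bytes


for n in (0, 5, 55, 56, 64):
    print(f"{n:3} byte message -> {sha256_padded_length(n)} bytes after padding")
```

**Note that a 56-byte message already needs two blocks, because the padding byte and length field no longer fit in the first one.**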

| Algorithm Name | Digest Size (in bits) | Block Size (in bits) | Known Collision |
|---------------|----------------------|---------------------|----------------|
| SHA-1        | 160                  | 512                 | True           |
| SHA-256      | 256                  | 512                 | False          |
| SHA-512      | 512                  | 1024                | False          |
| MD5          | 128                  | 512                 | True           |
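
**The digest and block sizes in the table can be checked against `hashlib` itself. This is a small sketch: `hashlib` reports `digest_size` and `block_size` in bytes, so the values are multiplied by 8 to get bits. It assumes all four algorithms are enabled in your Python build (MD5 may be disabled on some restricted systems). The `sizes` dictionary is just an illustrative name.**

```python
import hashlib

# Map each algorithm name to (digest size in bits, block size in bits).
sizes = {}
for name in ("sha1", "sha256", "sha512", "md5"):
    hasher = hashlib.new(name)
    # hashlib exposes both sizes in bytes; convert to bits for the table.
    sizes[name] = (hasher.digest_size * 8, hasher.block_size * 8)

for name, (digest_bits, block_bits) in sizes.items():
    print(f"{name:8} digest = {digest_bits:4} bits   block = {block_bits:4} bits")
```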

## **File Integrity with Hashing**

**Hashing is used to verify the digital integrity of a file. Cryptographic hash functions exhibit the Avalanche Effect: a small change to the input produces a significantly different output. When a file is first created, it can be hashed and the output saved. If a change to the file is suspected, the file can be hashed again and the current hash output compared with the initial one.**

<sup>Source: [Cryptographic Hash Functions](https://ics.uci.edu/~alfchen/teaching/cs134-2019-Fall/slides/LEC5-134.pdf) by Alfred Chen from ics.uci.edu</sup>
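
**One way to see the Avalanche Effect is to count how many of the 256 output bits differ between the SHA-256 hashes of two nearly identical inputs. The helper `differing_bits` below is a hypothetical name used for this sketch. For a well-behaved hash function, roughly half of the bits are expected to differ.**

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 digests of a and b differ."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 in every position where the two digests disagree.
    return bin(digest_a ^ digest_b).count("1")


changed = differing_bits(b"hello", b"Hello")
print(f"{changed} of 256 bits differ between the two hashes")
```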

### **Example: Checking the File Integrity of a PDF Contract**

**A contract has been signed between two parties. After signing, a hash value of the file is generated using Python. Later, a malicious actor makes a change to the contract. If we suspect that the file has been tampered with, we can compute the current hash of the file and compare it to the initial hash.**

### **Python Function to Hash a File**

In [None]:
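def hash_file(file_path: str, algorithm: str='sha256', chunk_size: int=8192) -> str:
  """Return the hex digest of the file at file_path using the given hashlib algorithm."""
  hasher = hashlib.new(algorithm)
  # Read in binary mode, chunk by chunk, so large files never need to fit in memory
  with open(file_path, 'rb') as f:
    # iter() with a sentinel keeps calling f.read() until it returns b'' (end of file)
    for chunk in iter(lambda: f.read(chunk_size), b''):
      hasher.update(chunk)
  hash_output = hasher.hexdigest()

  return hash_output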
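
**As a quick sanity check, the sketch below confirms that hashing a file in chunks gives the same digest as hashing all of its bytes at once. It is self-contained, so it repeats the chunked approach under the hypothetical name `hash_file_chunked` and writes random bytes to a temporary file instead of relying on a real document.**

```python
import hashlib
import os
import tempfile


def hash_file_chunked(path: str, algorithm: str = "sha256", chunk_size: int = 8192) -> str:
    """Hash a file chunk by chunk and return the hex digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# 20,000 random bytes, deliberately larger than a single 8192-byte chunk.
data = os.urandom(20_000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    tmp_path = tmp.name

try:
    chunked = hash_file_chunked(tmp_path)
    one_shot = hashlib.sha256(data).hexdigest()
    print(chunked == one_shot)  # prints True
finally:
    os.remove(tmp_path)  # clean up the temporary file
```

**Feeding the data to `update()` in pieces is equivalent to hashing the concatenated bytes, which is why the chunk size affects only memory use and speed, not the result.**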

### **Retrieve the Hashes of the Files**

In [None]:
fp_initial = 'Contract.pdf'
initial_hash = hash_file(fp_initial)
print(f'The initial SHA256 hash of the file {fp_initial} is\n{initial_hash}')

The initial SHA256 hash of the file Contract.pdf is
d0b4461e42704e4eec8aec655939a2f629711c1f5b8830107ca1b5f7180b94bf


In [None]:
fp_second = 'Contract - Copy.pdf'
second_hash = hash_file(fp_second)
print(f'The second SHA256 hash of the file {fp_second} is\n{second_hash}')

The second SHA256 hash of the file Contract - Copy.pdf is
fa6b2da9dc3d28ca4c78943ce97cef5e2a372d1b2e401c78e2bc8a7f52e377ec


### **Compare the Hashes of the Files**

In [None]:
if initial_hash == second_hash:
  print('The hashes match! We can confirm the digital integrity of the file.')
else:
  print('The hashes do not match... We cannot confirm the digital integrity of the file.')

The hashes do not match... We cannot confirm the digital integrity of the file.
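
**The same check can be turned into a small save-then-verify workflow. The sketch below is one possible approach, not part of the original notebook: the function names `sha256_of`, `save_baseline` and `verify`, and the JSON baseline file, are all assumptions made for illustration. It uses `hmac.compare_digest` from the standard library for the comparison and a temporary directory with a stand-in file in place of a real contract.**

```python
import hashlib
import hmac
import json
import os
import tempfile


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def save_baseline(path: str, baseline_path: str) -> None:
    """Record the current hash of a file in a JSON baseline (hypothetical format)."""
    with open(baseline_path, "w", encoding="utf-8") as f:
        json.dump({"file": os.path.basename(path), "sha256": sha256_of(path)}, f)


def verify(path: str, baseline_path: str) -> bool:
    """Return True if the file's current hash matches the saved baseline."""
    with open(baseline_path, encoding="utf-8") as f:
        expected = json.load(f)["sha256"]
    return hmac.compare_digest(sha256_of(path), expected)


with tempfile.TemporaryDirectory() as folder:
    contract = os.path.join(folder, "contract.txt")   # stand-in for a real file
    baseline = os.path.join(folder, "baseline.json")

    with open(contract, "w", encoding="utf-8") as f:
        f.write("Party A pays Party B 1000 dollars.")
    save_baseline(contract, baseline)
    ok_before = verify(contract, baseline)

    # Simulate tampering: change a single character in the file.
    with open(contract, "w", encoding="utf-8") as f:
        f.write("Party A pays Party B 9000 dollars.")
    ok_after = verify(contract, baseline)

print(ok_before, ok_after)  # prints True False
```

**A check like this is only as trustworthy as the saved baseline, so the baseline should be kept somewhere the file's potential attacker cannot modify.**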


# **References and Additional Learning**

## **Classes**

- **[Computer Science 161: Integrity, Hashes & "Random" Numbers](https://inst.eecs.berkeley.edu/~cs161/fa17/lectures/lec6_crypto_integrity.pdf) by Nicholas Weaver from inst.eecs.berkeley.edu**

- **[Cryptographic Hash Functions](https://ics.uci.edu/~alfchen/teaching/cs134-2019-Fall/slides/LEC5-134.pdf) by Alfred Chen from ics.uci.edu**

- **[Secure Hash Standard](https://cis.temple.edu/~giorgio/cis307/readings/sha1.html) by Giorgio Ingargiola from cis.temple.edu**

# **Connect**
- **Feel free to connect with Adrian on [YouTube](https://www.youtube.com/channel/UCPuDxI3xb_ryUUMfkm0jsRA), [LinkedIn](https://www.linkedin.com/in/adrian-dolinay-frm-96a289106/), [X](https://twitter.com/DolinayG), [GitHub](https://github.com/ad17171717), [Medium](https://adriandolinay.medium.com/) and [Odysee](https://odysee.com/@adriandolinay:0). Happy coding!**
