# Hashing for integrity

## 1. Introduction

### 1.1. Abstract

A hash function maps bit strings of arbitrary finite length to strings of fixed length. Computing the hash must be efficient regardless of the size of the input. The hash value, also called a message digest, serves as a compact representative image of the input. Hash functions are used for data integrity in conjunction with digital signature schemes (Van Oorschot, Paul C., Alfred J. Menezes, and Scott A. Vanstone. Handbook of applied cryptography. CRC press, 1996). <br>
This notebook introduces the basic ideas behind the practical use of some popular hash algorithms.
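
The fixed-length property described above can be checked with a minimal sketch that uses only the standard `hashlib` module. The input sizes are arbitrary illustration values, not taken from the notebook's sample files.

```python
import hashlib

# Inputs of very different lengths (sizes chosen only for illustration).
inputs = [b"", b"a", b"a" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # A SHA-256 digest is 256 bits, which is 64 hexadecimal characters.
    print(f"input bytes: {len(data):>9}  digest length (hex chars): {len(digest)}")

# Collect the distinct digest lengths seen across all inputs.
digest_lengths = {len(hashlib.sha256(data).hexdigest()) for data in inputs}
print(digest_lengths)
```

Whatever the input size, the set of digest lengths contains a single value.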

### 1.2. Algorithms used

<b>MD5: </b>A cryptographic hash function that produces a 128-bit hash value. Practical collision attacks against it exist, so it is considered broken for security purposes, but it is still widely used.<br>
<b>SHA-2 (Secure Hash Algorithm 2): </b>A set of cryptographic hash functions designed by the NSA. It consists of six hash functions with digests (hash values) of 224, 256, 384 or 512 bits: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256. <br>
In this notebook, we use MD5, SHA-256 and SHA-384.
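
As a quick check of the digest sizes listed above, the following sketch asks `hashlib` for the digest size of each algorithm. It covers MD5 and the four SHA-2 variants that `hashlib` always provides. The truncated variants SHA-512/224 and SHA-512/256 are left out because their availability depends on the local OpenSSL build.

```python
import hashlib

# Names as understood by hashlib.new(); the truncated SHA-512 variants are omitted.
algorithm_names = ["md5", "sha224", "sha256", "sha384", "sha512"]

# digest_size is reported in bytes, so multiply by 8 to get bits.
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in algorithm_names}

for name, bits in digest_bits.items():
    print(f"{name:>7}: {bits} bits")
```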

<i>Source and Image: Wikipedia. Link: </i> https://en.wikipedia.org/wiki/Cryptographic_hash_function#/media/File:Cryptographic_Hash_Function.svg

### 1.3. In this notebook

The notebook starts with a basic example of hashing a string, continues with hashing a file, and finishes with a performance comparison of the mentioned algorithms on files of different sizes.

## 2. Hashing

### 2.1. A simple example of hashing a String

#### Libraries

In [ ]:
import os
from os import system
import pandas as pd
import datetime as dt
import hashlib
from simple_file_checksum import get_checksum

#### Hashing a String with the different algorithms

In [ ]:
theOriginalString='Hello Hash!'
print('The Original String: ',theOriginalString)
hashMD5Original=hashlib.md5(theOriginalString.encode()).hexdigest()
print('MD5   : ',hashMD5Original)
hashSHA256Original=hashlib.sha256(theOriginalString.encode()).hexdigest()
print('SHA256: ',hashSHA256Original)
hashSHA384Original=hashlib.sha384(theOriginalString.encode()).hexdigest()
print('SHA384: ',hashSHA384Original)

#### Hashing a modified String with the different algorithms

In [ ]:
theModifiedString='Hello Hash!!'
print('The Modified String: ',theModifiedString)
hashMD5Modified=hashlib.md5(theModifiedString.encode()).hexdigest()
print('MD5 Original  : ',hashMD5Original)
print('MD5 Modified  : ',hashMD5Modified)
hashSHA256Modified=hashlib.sha256(theModifiedString.encode()).hexdigest()
print('SHA256 Original  : ',hashSHA256Original)
print('SHA256 Modified  : ',hashSHA256Modified)
hashSHA384Modified=hashlib.sha384(theModifiedString.encode()).hexdigest()
print('SHA384 Original  : ',hashSHA384Original)
print('SHA384 Modified  : ',hashSHA384Modified)
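
To make the difference between the original and modified digests measurable, the sketch below counts how many bits change when a single character is appended. The helper `differing_bits` is a hypothetical name introduced for this illustration. It is not part of `hashlib`.

```python
import hashlib


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = "Hello Hash!".encode()
modified = "Hello Hash!!".encode()

for name in ("md5", "sha256", "sha384"):
    digest_original = hashlib.new(name, original).hexdigest()
    digest_modified = hashlib.new(name, modified).hexdigest()
    total_bits = len(digest_original) * 4  # each hex character encodes 4 bits
    changed = differing_bits(digest_original, digest_modified)
    print(f"{name}: {changed} of {total_bits} bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to change, even for a one-character edit.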

### 2.2. A simple example of hashing a file

In [ ]:
originalFile='samples/test1B.txt' # Original: A file just containing a "1"
notOriginalFile='samples/test1BModified.txt' # Modified: the original file with "1" replaced by "2"
originalHashed=get_checksum(originalFile, algorithm='MD5')
print('Hash of original file: ',originalHashed)
notOriginalHashed=get_checksum(notOriginalFile, algorithm='MD5')
print('Hash of file modified: ',notOriginalHashed)
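
For readers without `simple_file_checksum`, a file checksum can be sketched with `hashlib` alone by reading the file in chunks. This is an illustration of the general technique, not a description of how `get_checksum` is implemented. The function `file_checksum`, its `chunk_size` default and the temporary file are assumptions made for this example.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="md5", chunk_size=65536):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Build a throwaway file that contains just "1", like the original sample file.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "wb") as f:
        f.write(b"1")
    file_digest = file_checksum(sample_path)

print(file_digest)
```

Hashing the file chunk by chunk gives the same digest as hashing its full contents in one call.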

### 2.3. Comparison of performance of different algorithms

In [ ]:
# Init
size=[]
timeMD5=[]
timeSHA256=[]
timeSHA384=[]
df=pd.DataFrame()
algorithms=['MD5','sha256','sha384']
folder="samples"
files=os.listdir(folder)
files.sort()
# Main
# 1. Size:
for i in range(len(files)):
    archi=str(folder)+"/"+str(files[i])
    size.append(os.stat(archi).st_size)
# 2. Times of obtaining hash:
# Function
def hashing(archi,algorithm):
    init=dt.datetime.now()
    hashed=get_checksum(archi, algorithm)
    end=dt.datetime.now()
    delta=end-init
    return delta
# Main Process
for i in range(len(files)):
    archi=str(folder)+"/"+str(files[i])
    # total_seconds() gives the elapsed time as a float, so long runs are not truncated
    timeMD5.append(hashing(archi,'MD5').total_seconds())
    timeSHA256.append(hashing(archi,'sha256').total_seconds())
    timeSHA384.append(hashing(archi,'sha384').total_seconds())
df['size(bytes)']=size
df['timeMD5']=timeMD5
df['timeSHA256']=timeSHA256
df['timeSHA384']=timeSHA384
print(df)
# To download df as a CSV file, uncomment the next line
#df.to_csv("reportHash.csv", index = False)
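
Single wall-clock measurements like the ones above can be noisy. As a hedged alternative, the sketch below times the hash of in-memory buffers with `time.perf_counter` and keeps the best of several runs. The helper `time_hash`, the buffer sizes and the repeat count are assumptions for illustration, and the buffers stand in for the notebook's sample files.

```python
import hashlib
import time


def time_hash(algorithm, data, repeats=5):
    """Return the fastest of several timed runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return best


sizes = [1_000, 100_000, 1_000_000]  # buffer sizes in bytes, chosen for illustration
timings = {}
for size in sizes:
    data = bytes(size)  # a buffer of zero bytes of the requested size
    for algorithm in ("md5", "sha256", "sha384"):
        timings[(size, algorithm)] = time_hash(algorithm, data)
        print(f"{size:>9} bytes  {algorithm:>6}: {timings[(size, algorithm)]:.6f} s")
```

Taking the minimum of repeated runs reduces the influence of other activity on the machine. Absolute numbers still depend on the hardware and the OpenSSL build.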

# RSA encryption, decryption, and digital signing

## Step No.1

### Import Cryptography Libraries

In [1]:
# Importing the default_backend function from the cryptography.hazmat.backends module for cryptographic operations
from cryptography.hazmat.backends import default_backend

# Importing the RSA module from cryptography.hazmat.primitives.asymmetric for RSA asymmetric encryption and key generation
from cryptography.hazmat.primitives.asymmetric import rsa

# Importing the serialization module from cryptography.hazmat.primitives for serializing and deserializing keys
from cryptography.hazmat.primitives import serialization


The code snippet is for setting up essential components from the `cryptography` library in Python, specifically for working with cryptographic keys using RSA, a widely used algorithm for public-key cryptography. The goal of this code can be broken down based on each import statement:

1. **Importing the Default Backend**:
   - `from cryptography.hazmat.backends import default_backend`
   - The `default_backend` function refers to the default cryptographic backend that provides cryptographic algorithm implementations. It is used in various cryptographic operations such as encryption, decryption, and key generation. The backend abstracts the implementation details of these cryptographic algorithms.

2. **Importing RSA for Asymmetric Cryptography**:
   - `from cryptography.hazmat.primitives.asymmetric import rsa`
   - This line imports the RSA module from the `cryptography.hazmat.primitives.asymmetric` package. RSA (Rivest-Shamir-Adleman) is one of the first public-key cryptosystems and is widely used for secure data transmission. Importing RSA allows the generation of private and public keys, encrypting data with the public key, and decrypting it with the private key, among other things.

3. **Importing Serialization Tools**:
   - `from cryptography.hazmat.primitives import serialization`
   - This line imports the serialisation module for serialising keys in various formats. Serialisation is converting a data structure or object into a format easily stored or transmitted (like PEM or DER formats) and then reconstructing it later. In cryptographic operations, we often need to serialise keys for storage or transmit them over a network.


> In summary, this code aims to import necessary functionalities from the `cryptography` library for performing RSA-based cryptographic operations, including key generation, encryption/decryption, and key serialisation/deserialisation. This forms the foundation for implementing secure communication or data storage systems.

### Generate an RSA Private Key

In [2]:
private_key = rsa.generate_private_key(
    public_exponent=65537,
    key_size=2048,
    backend=default_backend()
)


- rsa.generate_private_key is a function to generate a private key for RSA encryption.

- The public_exponent parameter is set to 65537, a prime number that is the usual choice for RSA because it balances security and performance.

- key_size=2048 specifies the size of the key. A key size of 2048 bits is generally considered secure and is a common choice.

- backend=default_backend() specifies the backend cryptographic provider.
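
The parameters described above can be read back from the generated key. The following is a minimal sketch that assumes a recent release of `cryptography`, in which the `backend` argument may be omitted.

```python
from cryptography.hazmat.primitives.asymmetric import rsa

# Assumption: a recent cryptography release, where backend= is optional.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# The public numbers hold the modulus n and the public exponent e.
public_numbers = private_key.public_key().public_numbers()

print("key size (bits):  ", private_key.key_size)
print("public exponent:  ", public_numbers.e)
print("modulus bit length:", public_numbers.n.bit_length())
```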


### Print the Private Key in PEM Format

In [3]:
print(private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.TraditionalOpenSSL,
    encryption_algorithm=serialization.NoEncryption(),
))


b'-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAKCAQEAuyYZlZy0t4XitJfX2cnJE97/7F0WuluHh/VW8yN35Uti4cz1\nUIkhzgaUGOfewylgWMzH17Acd2LHL6R8uDBUXZARmZshLtgw/v9aisWKt7sie2Mb\nPBjXsMqW+QKVWNKWdlBG+Y8gQ4477UGnZ6VFE8fPALYYRHF30lLpvI52t7Yaxaqc\ne2S1nbRkW5xHfq0Ra/GrxXw6ax4qFfpsTTuVNAPtHdXs/lPMxksO460TOVUFVP1H\ndNn6o81Qx0pkRtfxogNNXHwsyTUhq+Y7rIBcKAYxSP71PmQUSLsBKkYrNxy7YMfu\nnjqxfjXXtDUbTeKsFA4HyB3PH7pQvL80Bt0NDwIDAQABAoIBAFLI0bxqq04bPWNh\nX6wJJJdTp6Wor+sTneo4TpQS9nBJXp4/iaxsXLXEFzLFLrbp0KK3QxdX4d+1pCKh\nAkJ/rnIMzpxCEPWl0FacIjMMmwYXE3O9LUjyPEcJ9qqDyAiYbtI7RIoUE9OOUVfs\nGN8yLlJHqnvIEQgFoVk6MAamhkFQ3K+qBFLq4wu0pfbFWvyUkC/L5McWcSVbZ3Cb\nVOzYgROmkvPcYhKvTazWdCflQ+gud0i6WO6jqH0MyqXc4BAwscAgJP6kkxLBpvD1\nrxe4X7AP06Q6KuT7yQmqPJ5gasT5H9bcuCT1v87ACJvgClC6t5GctN8KrMkTbazI\nzLsOMqECgYEA5sgA5Ql1x9oqkhyO+XqcWEiZc7Da5q/+5QQE/EYxR+XULqffUP/i\nrwToaf23+JaSmD5Mf3mYWASmKsbir4hk8ynuQ4wao/6gvHRIQlPlgyoK1qqAdBIl\nLmUVqwP800MHJQGuRXSr35lKbPBIlTpmmPUM5BbMikO2G6SfLopqaVcCgYEAz5l/\n7h22tpo7O0hvpRGdkKM6pieR6B9l31vdQ3Lrlvgbz

- This section converts the private key into a byte format and prints it.

- The encoding parameter is set to serialization.Encoding.PEM, which means the key is encoded in the PEM (Privacy Enhanced Mail) format, a standard textual format for storing keys.

- format=serialization.PrivateFormat.TraditionalOpenSSL indicates the traditional format used by OpenSSL for private keys.

- encryption_algorithm=serialization.NoEncryption() means the private key is not encrypted and will be output in plain text. It is crucial to handle and store such keys securely.
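
Where an unencrypted key is not acceptable, the library can protect the serialized key with a passphrase. The sketch below is one possible variant, not the notebook's method. It uses the PKCS8 format with `BestAvailableEncryption`, and the passphrase is a hypothetical placeholder. It assumes a recent `cryptography` release, in which `backend` may be omitted.

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
passphrase = b"example-passphrase"  # hypothetical value, for illustration only

# Serialize the key with passphrase-based encryption instead of NoEncryption().
encrypted_pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.BestAvailableEncryption(passphrase),
)
print(encrypted_pem.splitlines()[0])

# Loading the key back requires the same passphrase.
restored_key = serialization.load_pem_private_key(encrypted_pem, password=passphrase)
same_key = (
    restored_key.public_key().public_numbers()
    == private_key.public_key().public_numbers()
)
print("restored key matches:", same_key)
```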


### Deriving the Public Key from the Private Key

In [4]:
public_key = private_key.public_key()


- This line derives the public key associated with the generated private key.

- In RSA, the public and private keys are mathematically linked. The public key can be safely shared and is used to encrypt data that only the corresponding private key can decrypt.


> In summary, this code generates an RSA private key, prints it in a readable (PEM) format, and then derives the corresponding public key. This is a fundamental process in asymmetric cryptography, where the private key is kept secret, and the public key is shared for encryption purposes.

### Save the RSA Private key in PEM format

In [5]:
with open("Vahid_private_key.pem", "wb") as f:
    f.write(private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.TraditionalOpenSSL,
        encryption_algorithm=serialization.NoEncryption(),
    ))


- This block of code opens a file named "Vahid_private_key.pem" in binary write mode ("wb").

- It then writes the private key to this file in PEM format. The key is not encrypted (serialization.NoEncryption()), which means it will be stored in plain text.

- The PEM format is widely used for storing cryptographic keys and certificates. It is a Base64 encoded version of the key with specific header and footer lines.
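
The statement that PEM is a Base64 encoding wrapped in header and footer lines can be verified directly. The sketch below strips those lines, decodes the body, and compares the result with the binary DER serialization of the same key. It works on a freshly generated key in memory rather than on the saved file, and assumes a recent `cryptography` release, in which `backend` may be omitted.

```python
import base64

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)


def serialize(encoding):
    """Serialize the key unencrypted in the traditional OpenSSL format."""
    return private_key.private_bytes(
        encoding=encoding,
        format=serialization.PrivateFormat.TraditionalOpenSSL,
        encryption_algorithm=serialization.NoEncryption(),
    )


pem = serialize(serialization.Encoding.PEM)
der = serialize(serialization.Encoding.DER)

# Split the PEM text into header line, Base64 body, and footer line.
lines = pem.decode("ascii").strip().splitlines()
header, body, footer = lines[0], "".join(lines[1:-1]), lines[-1]
decoded = base64.b64decode(body)

print(header)
print(footer)
print("Base64 body decodes to the DER bytes:", decoded == der)
```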


### Print the RSA Public Key in PEM Format

In [6]:
print(public_key.public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
))


b'-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAuyYZlZy0t4XitJfX2cnJ\nE97/7F0WuluHh/VW8yN35Uti4cz1UIkhzgaUGOfewylgWMzH17Acd2LHL6R8uDBU\nXZARmZshLtgw/v9aisWKt7sie2MbPBjXsMqW+QKVWNKWdlBG+Y8gQ4477UGnZ6VF\nE8fPALYYRHF30lLpvI52t7Yaxaqce2S1nbRkW5xHfq0Ra/GrxXw6ax4qFfpsTTuV\nNAPtHdXs/lPMxksO460TOVUFVP1HdNn6o81Qx0pkRtfxogNNXHwsyTUhq+Y7rIBc\nKAYxSP71PmQUSLsBKkYrNxy7YMfunjqxfjXXtDUbTeKsFA4HyB3PH7pQvL80Bt0N\nDwIDAQAB\n-----END PUBLIC KEY-----\n'


- This part of the code prints the public key in PEM format.

- The public_bytes method is used to serialize the public key.

- The encoding is set to serialization.Encoding.PEM, indicating that the output should be in PEM format.

- The format is set to serialization.PublicFormat.SubjectPublicKeyInfo, which is a standard format for public keys.


> In summary, the code so far stores the private key in a file named "Vahid_private_key.pem" and outputs the associated public key to the console. The private key is saved unencrypted, so it is important to store this file securely and control access to it. The public key, which can be shared, is displayed in a format that can be easily used for encryption or for sharing with others who need to verify signatures or encrypt data intended for the private key holder.

### Save the Public key in PEM format

In [7]:
# Open a file named "Vahid_public_key.pem" in write-binary mode
with open("Vahid_public_key.pem", "wb") as f:
    # Write the public key to this file in PEM format
    f.write(public_key.public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    ))


1. Opening the File:
   - with open("Vahid_public_key.pem", "wb") as f:
   
   - This line opens a file named Vahid_public_key.pem in write-binary ("wb") mode. Using with ensures that the file is properly closed after its suite finishes, even if an error is raised. This is a best practice for file handling in Python.

2. Writing the Public Key in PEM Format:
   - f.write(public_key.public_bytes(...))
   
   - The public_bytes method of the public_key object is used to serialize the public key.
   
   - encoding=serialization.Encoding.PEM specifies that the serialization should be in PEM format, a widely used textual format for representing cryptographic keys.
   
   - format=serialization.PublicFormat.SubjectPublicKeyInfo specifies the format for the serialized public key. SubjectPublicKeyInfo is a standard format for public keys within X.509 certificates and is generally used for RSA public keys.

> In summary, this code saves the RSA public key to a file named "Vahid_public_key.pem" in PEM format. The public key can be shared with others for encrypting data or verifying digital signatures, and storing it in a file makes it easier to distribute or use in applications that require public key encryption.

## Step No.2

### Import Cryptography Libraries

This set of import statements from the cryptography library in Python is designed for handling various aspects of cryptographic operations, particularly in the context of asymmetric encryption and key management.

#### Importing the Default Backend

In [8]:
from cryptography.hazmat.backends import default_backend


This line imports default_backend, a function that returns the default provider of cryptographic algorithms and methods. Many cryptographic operations accept it as a backend argument that selects the underlying cryptographic library. Recent releases of the library treat that argument as optional.

#### Importing Padding for Asymmetric Encryption

In [9]:
from cryptography.hazmat.primitives.asymmetric import padding


This imports the padding module, which contains padding algorithms. In asymmetric encryption such as RSA, padding is crucial both for security and for encryption to work correctly. The module includes padding schemes such as PKCS#1 v1.5 and OAEP, which are standards for encrypting data using RSA.

#### Importing Hash Algorithms


In [10]:
from cryptography.hazmat.primitives import hashes


This statement imports the hashes module, providing various hash algorithms such as SHA-256, SHA-3, and others. Hash functions are essential in cryptography for creating a fixed-size hash value from data, commonly used in digital signatures and data integrity checks.
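
To connect the `hashes` module with the `hashlib` examples from the hashing notebook, the sketch below computes a SHA-256 digest through `hashes.Hash` and compares it with the `hashlib` result. It assumes a recent `cryptography` release, in which `backend` may be omitted.

```python
import hashlib

from cryptography.hazmat.primitives import hashes

message = b"Hello Hash!"

# Build the digest incrementally: create a context, feed data, then finalize.
hasher = hashes.Hash(hashes.SHA256())
hasher.update(message)
digest = hasher.finalize()

print(digest.hex())

# Both libraries implement the same algorithm, so the digests should agree.
matches_hashlib = digest == hashlib.sha256(message).digest()
print("matches hashlib:", matches_hashlib)
```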

#### Importing Functions for Loading PEM-formatted Keys

In [11]:
from cryptography.hazmat.primitives.serialization import load_pem_private_key
from cryptography.hazmat.primitives.serialization import load_pem_public_key


These functions load private and public keys from PEM (Privacy-Enhanced Mail) formatted data. load_pem_private_key is for deserializing a private key from a PEM file or string, while load_pem_public_key is for a public key. PEM format is widely used for representing encoded keys and certificates.
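
A round trip through PEM shows what the loading functions return. The sketch below keeps everything in memory instead of reading the notebook's `.pem` files, and assumes a recent `cryptography` release, in which `backend` may be omitted.

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.serialization import load_pem_public_key

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Serialize the public key to PEM bytes, as the notebook does before writing a file.
public_pem = public_key.public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

# Deserialize the PEM bytes back into a key object.
loaded_public_key = load_pem_public_key(public_pem)

is_rsa_key = isinstance(loaded_public_key, rsa.RSAPublicKey)
same_numbers = loaded_public_key.public_numbers() == public_key.public_numbers()
print("loaded an RSA public key:", is_rsa_key)
print("same modulus and exponent:", same_numbers)
```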

#### Importing the Serialization Module

In [12]:
from cryptography.hazmat.primitives import serialization


This import brings in the serialization module, which includes tools for serializing and deserializing cryptographic objects like keys. It supports various formats (like PEM and DER) and functionalities, such as converting keys to a format suitable for storage or transmission and vice versa.

> These imports provide a comprehensive toolkit for handling asymmetric encryption (like RSA), including generating and loading keys, applying padding schemes, utilizing hash functions, and serializing/deserializing cryptographic objects. This setup is essential for encrypting/decrypting data, signing messages, and verifying signatures securely.

### Defining the Plaintext Message

In [14]:
plaintextMessage = b'Hello Andy class .'


This line creates a byte string plaintextMessage with the content "Hello Andy class .". In Python, the b prefix before the quotes marks the literal as a byte string, which is the format the encryption function requires.

### Loading the Public Key

In [15]:
with open('Vahid_public_key.pem', 'rb') as f:  # the with block closes the file after reading
    VahidPubKey = load_pem_public_key(f.read(), default_backend())


- Here, the load_pem_public_key function loads a public key from a file named 'Vahid_public_key.pem'.

- The file is opened in binary read mode ('rb'), and its contents are read and passed to the load_pem_public_key function.

- default_backend() specifies the cryptographic backend that implements the cryptographic algorithms.


### Encrypting the Message

In [16]:
ciphertext = VahidPubKey.encrypt(
    plaintextMessage,
    padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None
    )
)


- This section encrypts the plaintextMessage using the public key VahidPubKey.

- The encryption uses OAEP (Optimal Asymmetric Encryption Padding) with MGF1 (a mask generation function) and SHA256 as the hashing algorithm. OAEP is a padding scheme recommended for secure RSA encryption.

- The label is set to None, which is typical for most applications.
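
OAEP padding consumes part of each RSA block, which limits how long a message can be. With a key of k bytes and a hash of h bytes, the limit is k - 2h - 2 bytes, which is 256 - 64 - 2 = 190 bytes for a 2048-bit key with SHA-256. The sketch below checks that limit on a freshly generated key. It assumes a recent `cryptography` release, in which `backend` may be omitted.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

key_bytes = public_key.key_size // 8            # 2048 bits -> 256 bytes
hash_bytes = hashes.SHA256().digest_size        # 32 bytes
max_message_bytes = key_bytes - 2 * hash_bytes - 2

# A message of exactly the maximum length encrypts to one full RSA block.
ciphertext = public_key.encrypt(b"x" * max_message_bytes, oaep)

# One byte more no longer fits and is rejected by the library.
try:
    public_key.encrypt(b"x" * (max_message_bytes + 1), oaep)
    too_long_rejected = False
except ValueError:
    too_long_rejected = True

print("maximum message length:", max_message_bytes)
print("ciphertext length:", len(ciphertext))
print("longer message rejected:", too_long_rejected)
```

This limit is one reason RSA is typically used to encrypt short values, such as symmetric keys, rather than large payloads.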


### Print the Ciphertext in Hexadecimal Format

In [17]:
print(ciphertext.hex())


8147862dc4df21d058f6e2e4cf363edf08a17ae166b6c51578b9ad98d18dce14219118fb89bd89acee98fb05166d29b5fc44107abf6a8feef9e4bc66d3e38e0c2b82225e9bc318a9cb8c001128e19e175a292c694559924725a0f84edce1a913c0b440f385d58013c560bfe624481c63d03f468ffe5fbf9cced0523c0b6f9ca4e44952353173cd432b84d97002ff39d490be2719d6e64395fe8f56d045db44ab5a1f6839f998e8f3edffbf59f9bb4cc86f06564dd7ca98c5a4e70a7f12974f572063a30f0a57f813ae07ca3a776e6f9701e7ca14c9a725a879b77b96b3a35cf8153cd00030d319cd6ec9b8a4ed8d3971369ba292c875df84a811a78d6404338a


Finally, the encrypted message (ciphertext) is converted to a hexadecimal string using .hex() and printed. This representation is often used for easier readability and storage of binary data.

> In summary, this code encrypts the plaintext message "Hello Andy class ." using RSA public-key encryption with OAEP padding and SHA256 hashing, and then prints the encrypted message in hexadecimal format. The public key is loaded from a file, a common way of distributing keys. The resulting ciphertext can only be decrypted by someone with the corresponding private key.
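
The matching decryption step can be sketched as follows. The example generates its own key pair in memory rather than loading the notebook's `.pem` files, and assumes a recent `cryptography` release, in which `backend` may be omitted. It also encrypts the same message twice to show that OAEP is randomized.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Decryption must use the same padding parameters as encryption.
oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

message = b'Hello Andy class .'

# OAEP mixes in fresh randomness, so two encryptions of one message differ.
ciphertext_1 = public_key.encrypt(message, oaep)
ciphertext_2 = public_key.encrypt(message, oaep)

# Only the private key can recover the plaintext.
recovered = private_key.decrypt(ciphertext_1, oaep)

print("ciphertexts differ:", ciphertext_1 != ciphertext_2)
print("recovered message: ", recovered)
```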