## HW2
### Juan Francisco Cisneros, Randall Mencías, Christian Santamaria

# Part 1: Research on Cryptographic Hash Functions

## 1. Fundamentals Research Questions:

### What are cryptographic hash functions?

Cryptographic hash functions are mathematical functions that take an input, or "message," of any length and output a fixed-length string of bytes. This output, frequently called the "hash" or "digest," acts as a fingerprint of the input: changing even one bit of the input produces, with overwhelming probability, a completely different hash. Unlike encryption, hash functions are one-way; there is no key that reverses a hash to recover the original input.

### What are the main properties of cryptographic hash functions?

1.	Deterministic: The same input will always produce the same hash.
2.	Fixed Output Size: Regardless of the input size, the output will always be a fixed length.
3.	Efficiency: The function should be fast to compute, even for large inputs.
4.	Pre-image resistance: It should be computationally infeasible to determine the original input based on the hash output.
5.	Second pre-image resistance: Given an input and its hash, it should be hard to find a different input that has the same hash.
6.	Collision resistance: It should be computationally infeasible to find two different inputs that result in the same hash output.
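The first properties in this list can be observed directly. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `sha256_hex` and the sample messages are illustrative choices, not part of any standard.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

short = sha256_hex(b"a")
long_ = sha256_hex(b"a" * 1_000_000)
original = sha256_hex(b"transfer $100 to Bob")
modified = sha256_hex(b"transfer $900 to Bob")

# Deterministic: hashing the same bytes again gives the same digest.
print(sha256_hex(b"a") == short)
# Fixed output size: the digest length does not depend on the input length.
print(len(short), len(long_))
# A one-character change in the input yields an unrelated-looking digest.
differing = sum(x != y for x, y in zip(original, modified))
print(f"{differing} of {len(original)} hex characters differ")
```

The resistance properties (items 4 to 6) cannot be demonstrated this way, since they are claims about what an attacker cannot feasibly compute.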

    
### Pre-image resistance

This property guarantees that, given only a hash output, it is computationally infeasible to find any input that produces it. In other words, the hash cannot be inverted to uncover the original message. Because a hash by itself reveals no useful information about an input that is hard to guess, this property is essential for safeguarding sensitive data.

### Second pre-image resistance

Second pre-image resistance makes it infeasible to find another input that generates the same hash when you already know an input and its hash. This property thwarts deliberate attempts to substitute forged data for a specific original while keeping its hash unchanged.

### Collision resistance

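Collision resistance means that it is computationally infeasible to find any two different inputs that hash to the same output. Collisions must exist, because inputs can be arbitrarily long while outputs have a fixed length, but finding one should be impractical. If a hash function is prone to collisions, data security and integrity are at risk, because attackers could replace one piece of data with another undetected.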
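To see why output length matters for collision resistance, the sketch below deliberately weakens SHA-256 by keeping only its first 24 bits and then searches for a collision. The function names and the 24-bit truncation are assumptions made for illustration. By the birthday bound, a collision on an n-bit hash is expected after roughly 2^(n/2) attempts, so a few thousand attempts usually suffice here, whereas the same search against the full 256-bit output is far out of reach.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int = 24) -> int:
    """Keep only the top `bits` bits of SHA-256 (a deliberately weak hash)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int = 24) -> tuple[bytes, bytes, int]:
    """Hash successive counter values until two inputs share a truncated hash."""
    seen: dict[int, bytes] = {}
    for i in count():
        msg = str(i).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1  # the two inputs and the attempts used
        seen[h] = msg

first, second, attempts = find_collision()
print(first, second, attempts)
```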


### How do these properties contribute to security?

Because any alteration to the data produces a different hash, these properties allow data integrity to be verified, which greatly strengthens cryptographic systems. They also support authenticity, since attackers cannot forge different data that matches an existing hash. Additionally, because recovering the original data from the hash is infeasible, stored hashes do not directly expose the data they were computed from.


https://www.cs.columbia.edu/~smb/classes/f23/Files/hash.pdf

https://crypto.stanford.edu/~mironov/papers/hash_survey.pdf

https://ocw.mit.edu/courses/6-046j-design-and-analysis-of-algorithms-spring-2015/6741d65be662edac7c169c4081b3bd9a_MIT6_046JS15_lec21.pdf



## 2: Common Hash Functions Research Questions

### *What are the most commonly used hash functions?*
#### *MD5 (Message Digest Algorithm 5)*

It's an older cryptographic hash function that produces a 128-bit hash value. While once widely used, it is no longer considered secure due to vulnerabilities and collisions that have been discovered. It's primarily used for historical purposes or in legacy systems.
   - **Algorithm Structure**: MD5 takes an input message of any length, pads it to a multiple of 512 bits, and outputs a fixed 128-bit hash.
   - **Padding and Pre-processing**:
     - The input message is padded to 448 bits modulo 512 (so it’s 64 bits shy of a multiple of 512), then a 64-bit representation of the original message length is appended.
     - This ensures the total message length is a multiple of 512 bits.
   - **Initialization**:
     - MD5 initializes four 32-bit variables (A, B, C, D) with specific constants.
   - **Processing**:
     - The algorithm processes each 512-bit block through **64 operations**, divided into four rounds. In each round, MD5 applies a nonlinear function that mixes the bits of the message block and the variables.
     - These functions involve logical operations (AND, OR, XOR, NOT), and each step uses one of 64 predetermined constants.
     - After all 64 steps, the resulting values of A, B, C, and D are added to the values they held before the block was processed, using addition modulo 2^32.
   - **Output**:
     - After processing all blocks, the final values of A, B, C, and D are concatenated to produce the 128-bit hash.
   - **Example**: Hashing "Hello, World!" with MD5 yields `65a8e27d8879283831b664bd8b7f0ad4`.
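The padding rule described above can be sketched in a few lines. This is a minimal illustration of the padding and length-appending steps from RFC 1321 only, not a full MD5 implementation; the function name `md5_pad` is a hypothetical helper.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Return `message` with MD5-style padding appended."""
    bit_length = (len(message) * 8) % 2**64
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Add zero bytes until the length is 56 bytes (448 bits) modulo 64 bytes.
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Append the original length in bits as a 64-bit little-endian integer.
    padded += struct.pack("<Q", bit_length)
    return padded

print(len(md5_pad(b"Hello, World!")))  # one 64-byte (512-bit) block
print(len(md5_pad(b"a" * 56)))         # spills into a second block
```

SHA-1 and SHA-256 use the same layout, except that they encode the length as a big-endian integer.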

#### *SHA-1 (Secure Hash Algorithm 1)*
It is another older cryptographic hash function, producing a 160-bit hash value. Though more secure than MD5, it too has been compromised and is not recommended for new applications.
   - **Algorithm Structure**: SHA-1 processes a message in 512-bit chunks and produces a 160-bit hash output.
   - **Padding and Pre-processing**:
     - Similar to MD5, SHA-1 pads the input so it becomes a multiple of 512 bits, adding a single '1' bit, followed by '0' bits, and finally a 64-bit representation of the original length.
   - **Initialization**:
     - SHA-1 initializes five 32-bit variables with fixed constants (A, B, C, D, E).
   - **Processing**:
     - SHA-1 expands each 512-bit block into 80 32-bit words using a **message schedule**.
     - The algorithm has **80 rounds**, and each round combines a bitwise function of B, C, and D with bitwise rotations and addition modulo 2^32.
     - The round function changes every 20 rounds (choice, parity, majority, then parity again), and each 20-round stage uses its own fixed constant, for four constants in total.
     - The values of A, B, C, D, and E are updated and transformed based on the current 32-bit word.
   - **Output**:
     - The final hash is the concatenation of A, B, C, D, and E.
   - **Example**: Hashing "Hello, World!" with SHA-1 produces `0a0a9f2a6772942557ab5355d76af442f8f65e01`.
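The message schedule mentioned above expands the 16 words of a block into 80 words. The following sketch shows only that expansion step, under the assumption that the block has already been padded to 64 bytes; `rotl32` and `sha1_message_schedule` are illustrative names.

```python
import struct

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_message_schedule(block: bytes) -> list[int]:
    """Expand one 64-byte block into the 80 words used by the SHA-1 rounds."""
    if len(block) != 64:
        raise ValueError("SHA-1 processes 64-byte (512-bit) blocks")
    w = list(struct.unpack(">16I", block))  # sixteen big-endian 32-bit words
    for t in range(16, 80):
        # Each new word is the XOR of four earlier words, rotated left by 1.
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w

schedule = sha1_message_schedule(bytes(range(64)))
print(len(schedule))
```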

#### *SHA-2 Family*
A suite of cryptographic hash functions, including SHA-224, SHA-256, SHA-384, and SHA-512. These functions offer varying levels of security and hash output lengths. They are widely used in various security protocols and applications and are generally considered secure, although specific vulnerabilities may emerge over time.
   - **Algorithm Structure**: SHA-2 has multiple variants that output hashes of different sizes (224, 256, 384, and 512 bits), with SHA-256 and SHA-512 being the most common.
   - **Padding and Pre-processing**:
     - Like SHA-1, SHA-2 pads the input message to a multiple of the block size (512 or 1024 bits, depending on the variant).
   - **Initialization**:
     - Each SHA-2 variant initializes eight variables, with SHA-256 using 32-bit words and SHA-512 using 64-bit words. These variables are set to predefined constants.
   - **Processing**:
     - SHA-2 expands each message block into 64 words for SHA-256 and 80 for SHA-512 through a message schedule.
     - The words are consumed one per round by a **compression function** involving bitwise operations (AND, OR, XOR, NOT) and modular addition. SHA-2 uses a unique constant for each round.
     - Each round mixes the variables with the choice and majority functions plus two rotation-based functions (`Σ0` and `Σ1`); the message schedule uses two further functions (`σ0` and `σ1`) that combine rotations with shifts. This mixing increases resistance to cryptanalysis.
   - **Output**:
     - After processing all blocks, the final values of the variables are concatenated to produce the hash (and truncated for SHA-224 and SHA-384).
   - **Example**: Hashing "Hello, World!" with SHA-256 produces 

#### *SHA-3 (Secure Hash Algorithm 3)*
It is a newer cryptographic hash function designed to be resistant to future attacks. It's based on a different design principle (sponge construction) than previous SHA algorithms and offers various hash output lengths. SHA-3 is considered secure and is widely adopted in modern cryptographic applications.

   - **Algorithm Structure**: SHA-3 uses the **sponge construction** based on the Keccak algorithm, which differs significantly from SHA-1 and SHA-2.
   - **Padding and Pre-processing**:
     - SHA-3 pads the message using a different scheme, known as **multi-rate padding**.
     - It uses a "rate" parameter, determining how much of the state is used to absorb input or squeeze output, and a "capacity" parameter, which controls security strength.
   - **Initialization**:
     - SHA-3 initializes a **state matrix** with 1600 bits, divided into 25 64-bit words.
   - **Processing**:
     - SHA-3 alternates between two phases:
       1. **Absorption**: The padded message is split into chunks matching the rate. Each chunk is XORed into the rate portion of the state matrix, and the entire matrix then passes through the Keccak-f[1600] permutation, which consists of 24 **rounds**. Each round applies the steps θ (theta), ρ (rho), π (pi), χ (chi), and ι (iota), which mix the bits.
       2. **Squeezing**: After all input chunks are processed, the state matrix is transformed further, and the rate portion is output as the hash. This step is repeated if more output bits are needed.
   - **Output**:
     - The hash output is a fixed number of bits (224, 256, 384, or 512, depending on the SHA-3 variant).
   - **Example**: Hashing "Hello, World!" with SHA3-256 yields `d0e86a1c9e2d3b40c872d7ef7b1997be212bc083d30d582f63b603ba75c130a0`.
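Example digests like the ones quoted in this section can be reproduced with Python's standard `hashlib` module. The sketch below assumes the message is the ASCII string "Hello, World!" with no trailing newline; a different capitalization or an extra newline produces entirely different digests.

```python
import hashlib

message = b"Hello, World!"
digests = {
    name: hashlib.new(name, message).hexdigest()
    for name in ("md5", "sha1", "sha256", "sha512", "sha3_256")
}
for name, digest in digests.items():
    # Each hex character encodes 4 bits of the digest.
    print(f"{name:>8}  {len(digest) * 4:>3} bits  {digest}")
```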

### *What are the differences between MD5, SHA-1, and SHA-2 family?*
- **Digest Size**: MD5 generates 128-bit hashes, SHA-1 creates 160-bit hashes, while SHA-2 provides flexibility with 224, 256, 384, and 512-bit outputs, depending on the variant.
- **Security**: MD5 and SHA-1 are considered insecure against collision attacks (finding two different inputs with the same hash). SHA-2 has no known practical collision attacks and remains secure.
- **Algorithmic Structure**: MD5, SHA-1, and SHA-2 all use the Merkle–Damgård construction. SHA-2 differs in having a larger internal state, a more complex message schedule, and a stronger compression function, which avoid the weaknesses that led to practical collisions in MD5 and SHA-1.

### *Why have some hash functions been deprecated?*
Hash functions are deprecated when they are no longer considered secure, which happens as computing power and cryptanalysis techniques advance. MD5 and SHA-1 are deprecated because they are susceptible to collision attacks: researchers have demonstrated practical collisions in both (for MD5 in 2004 and for SHA-1 in 2017), which breaks their security for uses such as digital signatures and certificates. Practical preimage attacks on them are not known, but the collision weaknesses alone are disqualifying. NIST and other standards bodies now recommend SHA-2 or SHA-3 for secure applications.

### *What makes SHA-3 different from previous hash functions?*
SHA-3 diverges significantly in structure from its predecessors:

- **Underlying Algorithm**: SHA-3 is based on Keccak, a sponge construction that differs from SHA-2’s Merkle–Damgård structure. This makes it resistant to length extension attacks, a known vulnerability in older hash functions.
- **Flexibility**: SHA-3 can adapt to various levels of security and digest lengths without being susceptible to the same kinds of attacks as SHA-1 and MD5.
- **Backup Security**: SHA-3 serves as a backup algorithm, designed to be a robust alternative in case SHA-2 is found vulnerable in the future.
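The flexibility of the sponge construction is easiest to see with SHAKE128, an extendable-output function standardized alongside the fixed-length SHA-3 variants. It is not discussed above and is used here only as an illustration: the caller chooses how many bytes to squeeze out of the sponge.

```python
import hashlib

message = b"Hello, World!"
short_output = hashlib.shake_128(message).hexdigest(16)  # squeeze 16 bytes
long_output = hashlib.shake_128(message).hexdigest(64)   # squeeze 64 bytes

print(short_output)
print(long_output)
# Squeezing more output continues the same stream, so the longer
# output begins with the shorter one.
print(long_output.startswith(short_output))
```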


## 3: Application Research Questions


### How are hash functions used in digital signatures?

To understand how hash functions are used in digital signatures, it is important to first understand the need for and purpose of signatures. A digital signature validates the authenticity and integrity of a message, piece of software, or digital document, much as a handwritten signature does for a physical document: it shows who produced the content and that it has not been tampered with.

When a document is signed, a hash of the document is computed, and that hash is then signed (often described as "encrypted") with the signer's private key. The resulting signature is attached to the document. The recipient uses the signer's public key to recover or check the original hash, computes a fresh verifying hash from the received document, and compares the two. If the hashes match, the document is valid and has not been altered. Hashing first means that only a short, fixed-length value has to be signed, regardless of the document's size.
 
https://learn.microsoft.com/en-us/devsecops/playbook/capabilities/security/signing
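The hash-then-sign flow can be sketched with textbook RSA using only the Python standard library. Everything about the key here is an assumption made for illustration: the modulus is built from two small, publicly known Mersenne primes and no padding scheme is applied, so this is insecure and only shows the order of operations. A real system would use a vetted library with a scheme such as RSASSA-PSS or ECDSA.

```python
import hashlib

# Toy RSA key. INSECURE: tiny, publicly known primes, for illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent

def digest_as_int(document: bytes) -> int:
    """Hash the document and map the digest into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N

def sign(document: bytes) -> int:
    """Signer: apply the private exponent to the document's hash."""
    return pow(digest_as_int(document), D, N)

def verify(document: bytes, signature: int) -> bool:
    """Verifier: recover the signed hash with the public key and compare."""
    return pow(signature, E, N) == digest_as_int(document)

signature = sign(b"contract v1")
print(verify(b"contract v1", signature))
print(verify(b"contract v2", signature))
```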

### What role do hash functions play in password storage?

Password storage is a necessary function for many kinds of software, browsers, and especially companies. If a password table were compromised and the passwords were stored unprotected, the opportunity for rapid misuse would be much higher than if they were protected. Hash functions are used for exactly this purpose because they are one-way and deterministic. When a user creates a password, the password is hashed and only the hash is stored in the database. When the user logs in, the submitted password is hashed and compared to the stored hash. If the hashes match, the user is authenticated. This way, the password is never stored in plain text, and even if the database is compromised, the passwords are not directly exposed.

Another important aspect of hash functions in password storage is the use of salt. Salting is the process of adding a random string of characters to the password before hashing it. This way if two users have the same password, the hashed passwords will still be different due to the salt. This makes it harder for attackers to use precomputed hash tables to guess passwords as even common passwords will have different hashes.


https://www.php.net/manual/en/faq.passwords.php
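A salted password hash can be sketched with the standard library's PBKDF2 implementation. The function names and the iteration count are assumptions for illustration; unlike general-purpose hashing, password hashing is made deliberately slow, and the work factor should be tuned to the deployment (dedicated algorithms such as bcrypt or Argon2 are common alternatives).

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor; tune for real hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the password."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)  # constant-time comparison

salt_a, stored_a = hash_password("correct horse")
salt_b, stored_b = hash_password("correct horse")
print(stored_a != stored_b)  # same password, different salts, different hashes
print(verify_password("correct horse", salt_a, stored_a))
```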

### How are hash functions used in blockchain technology?

A blockchain stores information, usually transaction records, in block-like containers and links them in a dependent, cascading chain. This linking is done with hash functions. When a block is created, all of its contents are hashed, and that hash is stored in the next block as a reference to its predecessor. An attacker might try to alter a single block, but any change to a block's contents changes its hash, so the reference stored in the next block no longer matches. This breaks the chain, invalidates the tampered block, and lets the network detect the attempted change. All of this protects the data integrity of the blockchain and supports its use as a secure and trustworthy ledger.

In proof-of-work blockchains, new blocks are validated through a process called "mining," a very resource-intensive search for a value (a nonce) that makes the block's hash meet a certain criterion, such as falling below a target value. Finding such a hash is what validates the block and the transactions within it.


https://www.investopedia.com/terms/h/hash.asp

https://academy.bit2me.com/que-es-hash/
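Hash linking and mining can be sketched as a toy chain. The block layout, the JSON serialization, and the difficulty of three leading zero hex digits are all assumptions chosen so the example runs quickly; real blockchains use binary block headers and far higher difficulty.

```python
import hashlib
import json

DIFFICULTY = 3  # required number of leading zero hex digits (toy value)

def block_hash(block: dict) -> str:
    """Hash the block's contents, including the previous block's hash."""
    fields = {k: block[k] for k in ("index", "data", "prev_hash", "nonce")}
    return hashlib.sha256(json.dumps(fields, sort_keys=True).encode()).hexdigest()

def mine_block(index: int, data: str, prev_hash: str) -> dict:
    """Try nonces until the block's hash meets the difficulty target."""
    block = {"index": index, "data": data, "prev_hash": prev_hash, "nonce": 0}
    while not block_hash(block).startswith("0" * DIFFICULTY):
        block["nonce"] += 1
    block["hash"] = block_hash(block)
    return block

def chain_is_valid(chain: list[dict]) -> bool:
    """Check every block's own hash, its difficulty, and its back-reference."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if not block["hash"].startswith("0" * DIFFICULTY):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = [mine_block(0, "genesis", "0" * 64)]
chain.append(mine_block(1, "Alice pays Bob 5", chain[-1]["hash"]))
chain.append(mine_block(2, "Bob pays Carol 2", chain[-1]["hash"]))
print(chain_is_valid(chain))

chain[1]["data"] = "Alice pays Bob 500"  # tamper with a past block
print(chain_is_valid(chain))
```

To hide the change, an attacker would have to re-mine the altered block and every block after it, which is what makes tampering expensive.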

### What is a HMAC and why is it important?


HMAC stands for Hash-based Message Authentication Code. It is similar in purpose to a digital signature: it ensures the integrity of a message and the authenticity of its sender. It does so by combining a cryptographic hash function with a secret key shared by both parties. The sender computes the HMAC of the message with the key and sends it along with the message. When the message is received, the recipient recomputes the HMAC with the same key and compares the two values. If they match, the message has not been tampered with and must have come from someone who holds the key. Unlike a digital signature, an HMAC does not provide non-repudiation, because both parties share the same key.

https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-224.ipd.pdf
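Computing and checking an HMAC takes only a few lines with Python's standard `hmac` module. The helper names, the choice of SHA-256, and the sample message are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

def make_tag(key: bytes, message: bytes) -> bytes:
    """Sender: compute the HMAC-SHA-256 tag for a message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recipient: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

shared_key = secrets.token_bytes(32)  # known only to sender and recipient
message = b"pay Bob 10"
tag = make_tag(shared_key, message)

print(verify_tag(shared_key, message, tag))          # untouched message
print(verify_tag(shared_key, b"pay Bob 1000", tag))  # tampered message
```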



## Part 2: Secure Communication System Design 

### Design Requirements 
Create a design document for a secure communication system between two parties (Alice and Bob) that ensures: 
- Confidentiality 
- Integrity 
- Authenticity 
- Non-repudiation 

### System Components 
#### 1. Key Management Design considerations: 
- How will keys be generated? 
- How will keys be exchanged? 
- How will keys be stored? 
- How often should keys be rotated? 


Effective key management is essential for secure communication between Alice and Bob, covering key generation, exchange, storage, and rotation. For identity and authentication, they generate asymmetric key pairs (e.g., RSA or ECC) and use symmetric keys (e.g., AES-256) for message encryption in each session. A secure key exchange protocol like Diffie-Hellman (DH) or Elliptic-Curve Diffie-Hellman (ECDH) helps establish session keys, and public key infrastructure (PKI) or a certificate authority (CA) verifies the authenticity of public keys.

For security, private keys are stored in hardware security modules (HSMs) or secure enclaves, while public keys are stored in a signed, verified public directory. Key rotation adds further protection: session keys are replaced every session, or periodically based on message volume, while asymmetric keys are rotated less frequently, on a schedule set by risk assessment, because generating and redistributing them is more costly.
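The session-key agreement described above can be sketched as a toy Diffie-Hellman exchange using only the standard library. The 127-bit prime modulus, the base 3, and the function names are assumptions chosen for readability, and the parameters are far too small for real use; a deployment would rely on a vetted library with a standardized group or curve (for example X25519), a proper key derivation function, and public values authenticated through the PKI.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (INSECURE size, illustration only)
G = 3           # arbitrary base chosen for illustration

def generate_keypair() -> tuple[int, int]:
    """Return (private, public) for one party, fresh for each session."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    return private, pow(G, private, P)

def derive_session_key(own_private: int, peer_public: int) -> bytes:
    """Combine our private value with the peer's public value."""
    shared = pow(peer_public, own_private, P)
    # Hash the shared secret down to 32 bytes, the size of an AES-256 key.
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

alice_private, alice_public = generate_keypair()
bob_private, bob_public = generate_keypair()

alice_key = derive_session_key(alice_private, bob_public)
bob_key = derive_session_key(bob_private, alice_public)
print(alice_key == bob_key)
```

Only the public values cross the network; both parties arrive at the same session key without ever transmitting it.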


Flow Diagram for the key management process:


![image](./ImagesHW2/keymanagmentFlow.png)



#### 2. Message Flow Design: 
- Step-by-step message encryption process 
- Step-by-step message decryption process 
- Error handling procedures 
- Session management 

In a secure message exchange process between Alice and Bob, they initiate a session by exchanging public keys if not already known. Alice generates a session key, encrypts it with Bob’s public key, and sends it to him. For each message, Alice encrypts it with the session key using AES-256, signs the message hash with her private key for integrity and non-repudiation, and sends both the encrypted message and signature to Bob. Upon receiving these, Bob decrypts the session key with his private key, decrypts the message, and verifies Alice’s signature to confirm its authenticity. Error handling involves discarding messages with failed signature checks and possibly requesting retransmission, while expired or compromised sessions require new keys. To prevent replay attacks, messages are timestamped and assigned unique IDs, discarding duplicates or outdated messages. Session management involves assigning a unique ID to each session, enforcing timeouts to invalidate old sessions and prompt key rotation, and securely ending the session once communication is complete.

Flow Diagram for the message exchange process:

![image](./ImagesHW2/message_flow_design.png)

#### 3. Security Features Include: 

**Digital Signatures**

Digital signatures are a critical feature in secure communication, serving two main purposes: authentication and non-repudiation. Authentication ensures that Alice and Bob can confirm each other’s identity by verifying the digital signature on each message. In practice, this involves signing a message hash with the sender's private key. Upon receiving the message, the recipient (e.g., Bob) can use the sender's public key (e.g., Alice's) to verify the signature, confirming that the message genuinely came from Alice and hasn’t been tampered with. This verification process also ensures non-repudiation, meaning Alice cannot later deny having sent the message because only her private key could have generated the signature. By using digital signatures, the system guarantees that each message is legitimate and originates from the expected sender, upholding the integrity and authenticity of communication.

**Message Integrity Checks**

To detect any alterations or tampering, message integrity checks are implemented through hashing mechanisms, such as SHA-256. When Alice sends a message, she generates a hash of the message content, which acts as a digital fingerprint for that specific data. This hash is sent along with the message, protected by Alice’s digital signature so that an attacker cannot simply replace both the message and its hash. When Bob receives the message, he computes his own hash of the received data and compares it with the transmitted hash. If both hashes match, it confirms that the message has not been modified in transit. Even a slight change in the message content would result in a different hash, alerting Bob to possible tampering. By including integrity checks with each message, the system ensures that any unauthorized modifications are easily detectable, preserving the trustworthiness of the communication.

**Replay Attack Prevention**

To prevent replay attacks, where an adversary intercepts and resends a message to trick the recipient into thinking it is new, each message includes both a **timestamp** and a **nonce** (a unique, randomly generated number). The timestamp indicates when the message was sent, while the nonce ensures that each message is uniquely identifiable within a specific timeframe. When Bob receives a message, he checks the timestamp to ensure it falls within an acceptable window of time, preventing stale messages from being accepted. Additionally, Bob verifies that the nonce is unique and has not been used previously, ensuring that each message is processed only once. These measures prevent attackers from resending old messages to manipulate the system, protecting the integrity and relevance of the communication.
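The timestamp and nonce checks can be sketched as a small in-memory filter on Bob's side. The class name `ReplayGuard`, the 30-second window, and the injected `now` parameter (used to make the example deterministic) are assumptions for illustration. The sketch also assumes the timestamp and nonce are covered by Alice's signature, since otherwise an attacker could simply rewrite them.

```python
import secrets
import time
from typing import Optional

class ReplayGuard:
    """Accept each (nonce, timestamp) pair at most once within a time window."""

    def __init__(self, window_seconds: float = 30.0):
        self.window = window_seconds
        self.seen: dict[bytes, float] = {}  # nonce -> message timestamp

    def accept(self, nonce: bytes, timestamp: float,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget nonces whose timestamps are now too old to be accepted anyway.
        self.seen = {n: t for n, t in self.seen.items() if now - t <= self.window}
        if abs(now - timestamp) > self.window:
            return False  # stale, or too far in the future (clock drift limit)
        if nonce in self.seen:
            return False  # this nonce was already used: replayed message
        self.seen[nonce] = timestamp
        return True

guard = ReplayGuard(window_seconds=30.0)
nonce = secrets.token_bytes(16)
print(guard.accept(nonce, timestamp=1000.0, now=1000.0))  # fresh message
print(guard.accept(nonce, timestamp=1000.0, now=1005.0))  # replay of the same message
```

Because old nonces are discarded once their timestamps fall outside the window, the memory needed stays bounded by the message rate times the window length.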

#### Considerations

Security Analysis:

Private keys are kept confidential through secure storage solutions like HSMs or secure enclaves, minimizing the risk of unauthorized decryption or forgery. PKI verifies the authenticity of public keys, preventing impersonation and Man-in-the-Middle (MITM) attacks, while digital signatures provide message integrity by allowing each party to confirm the message’s authenticity and detect tampering. Perfect Forward Secrecy (PFS) is achieved when each session key is derived from a fresh, ephemeral key exchange (for example, ephemeral ECDH), so that compromising a long-term private key later does not expose previously recorded sessions, and compromising one session key does not expose any other session. Transporting a session key encrypted only under the recipient’s long-term public key does not by itself provide PFS. Key rotation policies ensure that session keys are refreshed frequently, further limiting the exposure risk, and strong access control policies with auditing enforce authorized use of private keys only.

The message encryption system for Alice and Bob ensures confidentiality, integrity, authenticity, and protection against replay attacks. By encrypting each message with AES-256, it safeguards confidentiality, while Alice’s digital signature on the message hash allows Bob to verify integrity and origin, providing non-repudiation. The session key is securely exchanged by encrypting it with Bob’s public key, which only he can decrypt, protecting against man-in-the-middle attacks, assuming key authenticity. Replay attacks are mitigated through timestamps and unique message identifiers, though synchronized clocks are essential to prevent valid messages from being rejected. Session timeouts and key rotation further enhance security by limiting the exposure of session keys. Potential weaknesses include reliance on secure key storage, as private key compromise would undermine confidentiality and integrity. Ensuring secure key management, synchronization, and public key authenticity are therefore crucial to the overall effectiveness of this encryption system.

Implementation Considerations:

Implementing effective key management requires careful selection of cryptographic libraries, which should be well maintained and regularly updated to protect against new vulnerabilities. Private keys are securely stored in HSMs or secure enclaves for enhanced tamper resistance, while PKI infrastructure, supported by a Certificate Authority (CA), is used to validate public keys through digital certificates. Key rotation is automated for session keys, while a defined policy dictates when asymmetric keys are rotated, ensuring operational continuity with minimal disruption. Comprehensive error handling processes manage issues like failed key exchanges, expired certificates, or session timeouts, terminating sessions when necessary and notifying users to retry or re-establish secure connections.

For secure message handling between Alice and Bob, each message should be encrypted with AES-256 using the session key to maintain confidentiality, and Alice should sign the message hash with her private key to ensure integrity and authenticity. To prevent replay attacks, each message includes a timestamp and a unique identifier, enabling Bob to detect and discard outdated or duplicate messages while accounting for slight clock drift. Upon receipt, Bob verifies the digital signature to confirm the message’s origin and integrity; if the signature is invalid, the message is discarded, and a retransmission may be requested if needed. Error handling is essential for managing transmission issues, and optimizations in encryption, decryption, and signature verification processes can help balance security with processing efficiency, ensuring the message exchange remains both secure and performant.

Potential Vulnerabilities and Mitigations:

Man-in-the-Middle (MITM) attacks pose a threat during key exchange, where an attacker could intercept or replace keys. Using PKI and verifying certificates helps mitigate this risk, ensuring public keys are authenticated before starting sessions. Key compromise is a significant risk if private keys are exposed, as it allows an attacker to decrypt messages or impersonate users; secure storage using HSMs, strict access control, and multi-factor authentication help prevent this. Key storage vulnerabilities are addressed by encrypting keys at rest with AES-256 and implementing strict access control, while using outdated cryptographic algorithms is prevented by enforcing the use of strong, modern algorithms like RSA-2048 and SHA-256, with regular reviews to maintain compliance. Finally, inadequate key rotation and expiry can allow compromised keys to remain valid; automated session key rotation and defined asymmetric key expiry ensure compromised keys don’t linger, enforcing reauthentication upon key rotation to strengthen security.

The message exchange process between Alice and Bob faces several vulnerabilities, including replay attacks, message tampering, and denial of service (DoS). Replay attacks, where an attacker resends captured messages, are mitigated by including a timestamp and unique identifier with each message, allowing Bob to detect duplicates and outdated messages within a set tolerance for clock differences. To prevent message tampering, Alice signs each message so that Bob can verify its integrity and authenticity; any message failing the signature check is immediately discarded. Finally, DoS attacks, where attackers flood Bob with excessive messages, are mitigated by rate limiting, accepting only messages that carry a recognized session ID, and quickly discarding invalid messages to prevent resource exhaustion.
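Rate limiting can take many forms; one common approach is a token bucket, sketched below. The class name, the rate and burst values, and the injected `now` parameter are assumptions for illustration, and the design above does not prescribe a particular algorithm.

```python
import time
from typing import Optional

class TokenBucket:
    """Allow short bursts but cap the sustained message rate per sender."""

    def __init__(self, rate_per_second: float, burst: int,
                 now: Optional[float] = None):
        self.rate = rate_per_second
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # process the message
        return False      # drop it before doing any expensive crypto

bucket = TokenBucket(rate_per_second=1.0, burst=3, now=0.0)
print([bucket.allow(now=0.0) for _ in range(4)])  # burst of 3, then rejected
print(bucket.allow(now=1.0))                      # one token refilled after 1 s
```

Checking the bucket before signature verification keeps the cost of rejecting a flood low, since the expensive public-key operation is skipped for dropped messages.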

#### 4. System Architecture:

![image](./ImagesHW2/system_architecture.png)

### Deliverables 
Design Document including: 
- System architecture 
- Flow diagrams 
- Security analysis 
- Implementation considerations 
- Potential vulnerabilities and mitigations 