# Exercise 1

### a)
- The probability that a single machine does not fail during the time period $T$ is $1 - p$.
- For $n$ machines, assuming failures are independent, the probability that none of the $n$ machines fails is $(1 - p)^n$.
- The probability of at least one machine failing is the complement of this event: <br> $P(at\ least\ one\ failure) = 1 - P(no\ failures) = 1 - (1 - p)^n$

### b)
The probability $p_k$ of exactly $k$ machines failing follows the binomial distribution, since each machine independently either fails or does not fail.<br><br>
$p_k = \binom{n}{k} p^k (1 - p)^{n-k}$ <br><br>
Where:<br>
- $\binom{n}{k} = \frac{n!}{k!(n-k)!}$: The binomial coefficient, i.e. the number of ways to choose $k$ failing machines out of $n$.
- $p^k$: Probability that $k$ specific machines all fail.
- $(1 - p)^{n - k}$: Probability of the remaining $n - k$ machines not failing.
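
As a minimal sketch, the formula can be evaluated directly in Python. The function name `binomial_pmf` and the example values `n = 10`, `p = 0.1` are illustrative assumptions, not part of the exercise.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n independent machines fail."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example values: 10 machines, each failing with probability 0.1.
n, p = 10, 0.1
for k in range(3):
    print(k, round(binomial_pmf(k, n, p), 4))
```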

### c)
The outcomes "exactly $0$ failures" through "exactly $n$ failures" are mutually exclusive and cover every case, so the binomial probabilities of all possible outcomes sum to $1$.<br> <br>
$p_0 + p_1 + \ldots + p_n = 1$ <br><br>
Here: <br>
- $p_0 = (1 - p)^n$: probability of no failures.
- $p_1 + p_2 + \ldots + p_n$: probability of at least one failure.

Therefore:<br><br>
$p_1 + p_2 + \ldots + p_n = 1 - p_0 = 1 - (1 - p)^n$
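
The identity can also be checked numerically. The sketch below uses exact rational arithmetic so that rounding does not blur the comparison; the helper name `failure_distribution` and the values `n = 10`, `p = 1/10` are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb

def failure_distribution(n: int, p: Fraction) -> list:
    """Return [p_0, p_1, ..., p_n] for n independent machines."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Hypothetical example values.
n, p = 10, Fraction(1, 10)
probs = failure_distribution(n, p)

print(sum(probs) == 1)                      # True: all outcomes sum to 1
print(sum(probs[1:]) == 1 - (1 - p) ** n)   # True: matches the result from a)
```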

# Exercise 2

### SHA-256 (Secure Hash Algorithm 256-bit)
- Converts input data into a fixed-length 256-bit hash.
- Commonly used in cryptographic applications like digital signatures and, when combined with a salt and a key-stretching scheme, password storage.
### MD5 (Message Digest Algorithm 5)
- Produces a 128-bit hash value.
- Historically used for password hashing, but now considered insecure: practical collision attacks exist, and it is fast enough to make brute-force attacks cheap.
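
The fixed output lengths can be observed with Python's standard `hashlib` module. This is only a small illustration; the helper `digest_bits` and the example message are assumptions made for this sketch.

```python
import hashlib

def digest_bits(algorithm: str, data: bytes) -> int:
    """Length of the digest in bits for the given hashlib algorithm name."""
    return hashlib.new(algorithm, data).digest_size * 8

message = b"correct horse battery staple"  # arbitrary example input
print(digest_bits("sha256", message))  # 256
print(digest_bits("md5", message))     # 128

# The digest length does not depend on the input length.
print(digest_bits("sha256", b"") == digest_bits("sha256", message))  # True
```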

### Connection of Hash Functions to Password Security
Hash functions are vital for securely storing passwords because:
1. They transform a password into a fixed-size hash from which the original password cannot feasibly be recovered.
2. Good hash functions are designed to be:
    - __Deterministic__: The same input always produces the same output.
    - __Non-reversible__: It's computationally infeasible to derive the original input from the hash.
    - __Collision-resistant__: It's computationally infeasible to find two different inputs that produce the same hash.

However, if basic hash functions (like MD5 or SHA-1) are used without additional security measures such as salting, they become vulnerable to rainbow table attacks.

### Rainbow Table
A rainbow table is a precomputed table of hashes for a large set of possible passwords. It is used to reverse-engineer hashed passwords by looking up their hash values:
- Trade-off: Reduces computational cost of cracking hashes by using storage space.
- Threat: Enables attackers to match hash values quickly to their plaintext equivalents, especially for commonly used passwords.

### Salt
Salt is a random string of data added to a password before hashing it. Its purpose is to:
1. Ensure that even if two users have the same password, their stored hashes differ.
2. Defeat precomputed rainbow tables, as the salt must also be known to match the hash.
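
The effect of a salt can be sketched as follows. Note the assumptions: the dictionary below is a plain precomputed lookup table, a simplification of a real rainbow table (which stores hash chains built with reduction functions to save space), and salted SHA-256 is used only for illustration; production systems would typically use a deliberately slow scheme such as PBKDF2, bcrypt, scrypt, or Argon2. The function names and the password list are hypothetical.

```python
import hashlib
import os

def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def salted_hash(password: str, salt: bytes) -> str:
    # The salt is prepended to the password before hashing.
    return hashlib.sha256(salt + password.encode()).hexdigest()

# Attacker side: hashes of common passwords, computed once in advance.
common_passwords = ["123456", "password", "qwerty"]
lookup = {unsalted_hash(pw): pw for pw in common_passwords}

# An unsalted hash of a common password is reversed by a single lookup.
print(lookup.get(unsalted_hash("qwerty")))  # qwerty

# With random per-user salts, equal passwords give different hashes,
# and the precomputed table no longer contains them.
salt_a, salt_b = os.urandom(16), os.urandom(16)
hash_a = salted_hash("qwerty", salt_a)
hash_b = salted_hash("qwerty", salt_b)
print(hash_a != hash_b)    # True
print(lookup.get(hash_a))  # None
```

The salt is stored next to the hash, so verification simply recomputes `salted_hash(candidate, salt)` and compares.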

# Exercise 3

### Internal Data Structure
1.  Buckets
    -   The HashMap uses an array (`Node<K,V>[] table`) to store key-value pairs. Each element in the array represents a bucket.
2.  Node Class
    -   Each bucket contains a `Node` object that stores:
        -   `key`: The key of the mapping.
        -   `value`: The associated value.
        -   `hash`: The hash code of the key.
        -   `next`: A reference to the next node in the bucket (for collision handling via chaining).
3.  Tree Nodes
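    -   If the number of entries in a bucket exceeds the `TREEIFY_THRESHOLD` (8), the bucket switches from a linked list to a red-black tree (in OpenJDK only once the table has at least `MIN_TREEIFY_CAPACITY` = 64 buckets; smaller tables are resized instead). This improves the worst-case cost of operations on that bucket from $O(n)$ to $O(\log n)$, where $n$ is the number of entries in the bucket.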
### Adding a `<Key, Value>` Pair
1. Hash Computation
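    -   A key's hash code is computed using `hash(key)`, which XORs the upper 16 bits of `key.hashCode()` into the lower 16 bits so that the high-order bits also influence the bucket index and collisions are reduced.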
2. Bucket Index
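    - The bucket index is calculated using `index = (hash & (table.length - 1))`. Because the table length is always a power of two, this is equivalent to taking the hash modulo the table length.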
3.  Collision Handling:
    -   If the bucket at the computed index is empty, a new node is placed directly.
    -   If the bucket already has entries:
        - The chain of nodes is traversed using `Node.next` to find an existing key or reach the end.
        -   If the key exists (determined using `equals()`), its value is updated.
        -   Otherwise, a new node is appended at the end of the chain.
    -   If the chain length exceeds `TREEIFY_THRESHOLD`, the chain is converted into a red-black tree.
4. Rehashing
    -   If the number of entries exceeds the threshold (capacity * load factor), the table is resized (doubled), and the entries are rehashed to redistribute them across the new buckets.
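
The hash spreading and index computation described above can be modelled in a few lines. This is a Python approximation of the Java logic, not the JDK code itself: the names `spread` and `bucket_index` are made up for this sketch, and the masking with `0xFFFFFFFF` stands in for Java's 32-bit `int` and its unsigned shift `>>>`.

```python
def spread(hash_code: int) -> int:
    """Mimic Java's (h ^ (h >>> 16)) on a 32-bit hash code."""
    h = hash_code & 0xFFFFFFFF  # treat the value as an unsigned 32-bit integer
    return h ^ (h >> 16)

def bucket_index(hash_code: int, capacity: int) -> int:
    """Index into a table whose capacity is a power of two."""
    return spread(hash_code) & (capacity - 1)

# Hash codes that differ only in their upper bits.
codes = [0x10000, 0x20000, 0x30000]
print([c & 15 for c in codes])               # [0, 0, 0] without spreading
print([bucket_index(c, 16) for c in codes])  # [1, 2, 3] with spreading

# After doubling from 16 to 32 buckets, an entry either keeps its
# index i or moves to i + 16, depending on one additional hash bit.
for c in range(1000):
    old, new = bucket_index(c, 16), bucket_index(c, 32)
    assert new in (old, old + 16)
```
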
### Retrieving an Entry
1. Hash Computation and Index Lookup:
    -   The key's hash is computed, and the bucket index is determined.
2. Bucket Traversal:
    -   The bucket at the computed index is traversed:
        - If the bucket is a linked list, the nodes are scanned sequentially, comparing the key using `equals()`.
        - If the bucket is a tree, a tree search is performed based on the key's hash and order (if keys are comparable).
3. Return Value:
    - If a matching key is found, its value is returned.
    - If no match is found, `null` is returned.
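
To make the insert and lookup paths concrete, here is a deliberately simplified Python model of a chained hash map. It is a sketch under several assumptions rather than a port of `java.util.HashMap`: the class names `Node` and `ChainedMap` are hypothetical, Python's built-in `hash()` and `==` stand in for `hashCode()` and `equals()`, tree bins are omitted, and resizing simply re-inserts every entry.

```python
class Node:
    def __init__(self, hash_, key, value):
        self.hash, self.key, self.value, self.next = hash_, key, value, None

class ChainedMap:
    def __init__(self, capacity=16, load_factor=0.75):
        self.table = [None] * capacity  # capacity is assumed to be a power of two
        self.load_factor = load_factor
        self.size = 0

    @staticmethod
    def _hash(key):
        h = hash(key) & 0xFFFFFFFF      # Python's hash() stands in for hashCode()
        return h ^ (h >> 16)

    def put(self, key, value):
        h = self._hash(key)
        i = h & (len(self.table) - 1)
        node = self.table[i]
        if node is None:
            self.table[i] = Node(h, key, value)   # empty bucket: place directly
        else:
            while True:
                if node.hash == h and node.key == key:
                    node.value = value            # existing key: update value
                    return
                if node.next is None:
                    break
                node = node.next
            node.next = Node(h, key, value)       # append at the end of the chain
        self.size += 1
        if self.size > self.load_factor * len(self.table):
            self._resize()

    def get(self, key):
        h = self._hash(key)
        node = self.table[h & (len(self.table) - 1)]
        while node is not None:                   # scan the chain sequentially
            if node.hash == h and node.key == key:
                return node.value
            node = node.next
        return None                               # no match found

    def _items(self):
        for node in self.table:
            while node is not None:
                yield node.key, node.value
                node = node.next

    def _resize(self):
        entries = list(self._items())
        self.table = [None] * (2 * len(self.table))  # double the table
        self.size = 0
        for key, value in entries:                   # redistribute all entries
            self.put(key, value)

m = ChainedMap(capacity=4)
for i in range(20):
    m.put(i, i * i)
print(m.get(7), m.get(100), len(m.table))  # 49 None 32
```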


# Exercise 4

1. MurmurHash3:
    -   Description: A non-cryptographic hash function that provides high-quality, uniformly distributed hash values.
    - Justification: MurmurHash3 is widely used for its speed and high-quality hash distribution. It provides good mixing properties and is robust against clustering, making it ideal for Bloom filters.
2. FNV-1a (Fowler-Noll-Vo):
    -   Description: A lightweight hash function that processes the input byte by byte, XORing each byte into the hash and then multiplying by a fixed prime (the FNV prime) for dispersion.
    - Justification: FNV-1a is simple and fast, and its prime multiplier gives good dispersion. Its design complements MurmurHash3 in terms of computation and output.
3. CityHash:
    -   Description: A hash function designed by Google for fast processing of strings, optimized for low-latency applications.
    - Justification: CityHash offers a good tradeoff between speed and uniformity of hash values. Its implementation uses techniques like hashing chunks of data efficiently, which differ significantly from MurmurHash3 and FNV-1a.

Each hash function employs a distinct approach for combining and distributing input data (bit-mixing, prime-based linear methods and block hashing). Their differing internal designs minimize the likelihood of correlated outputs for similar inputs and therefore reduce false positives.
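
Of the three, FNV-1a is small enough to sketch in full. The following is a minimal 32-bit variant using the published FNV offset basis and prime; the helper `bloom_index` and the filter size used in the example are assumptions for illustration. MurmurHash3 and CityHash are not shown because they are not part of the Python standard library.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # 2166136261
FNV_PRIME_32 = 0x01000193         # 16777619

def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result within 32 bits
    return h

def bloom_index(data: bytes, m: int) -> int:
    """Map an element to one of m bit positions (hypothetical helper)."""
    return fnv1a_32(data) % m

print(hex(fnv1a_32(b"a")))         # 0xe40c292c
print(bloom_index(b"hello", 1024)) # some position in range(1024)
```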

# Exercise 5

### a)
The probability formula for a random element hashing to a specific bit in the Bloom filter is: $P(hit) = \frac{1}{n}$

Where $n = 5$ (number of bits in the array). Thus, the probability that a random element gets hashed to a given bit is: $P(hit) = \frac{1}{5} = 20\%$.

This probability arises because each bit in the bit array is equally likely to be chosen by the hash function of a random element, assuming the hash function distributes values uniformly.

### b)
$h_1(x) = x\ mod\ 5$<br>
$h_2(x) = (2x + 3)\ mod\ 5$


For $x = 4$:


$h_1(4) = 4\ mod\ 5 = 4$<br>
$h_2(4) = 11\ mod\ 5 = 1$<br>

|  0   |  1   |  2   |  3   |  4   |
|-----|-----|-----|-----|-----|
| 0 | 1 | 0 | 0 | 1 |


For $x = 1$:


$h_1(1) = 1\ mod\ 5 = 1$<br>
$h_2(1) = 5\ mod\ 5 = 0$<br>

|  0   |  1   |  2   |  3   |  4   |
|-----|-----|-----|-----|-----|
| 1 | 1 | 0 | 0 | 1 |

Every bit is equally likely to be hit by the two hash functions if the input values are uniformly distributed.
- $h_1(x)$ simply maps $x$ to its remainder when divided by 5. Since it cycles through all possible values as $x$ increases, it uniformly distributes across the 5 bits if $x$ itself is uniformly distributed.
- $h_2(x)$ also distributes the values uniformly since the GCD of 2 and 5 is 1 (two distinct primes are always relatively prime). This property makes multiplication by 2 a bijection on the residues modulo 5, so the mapping $2x\ mod\ 5$ generates a complete cycle over all residues before repeating. Adding 3 doesn't change the uniformity but merely shifts the sequence cyclically.
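
The insertions and the uniformity argument can be reproduced with a short sketch. The two hash functions and the filter size are taken from the exercise; the helper names `insert` and `might_contain` are assumptions.

```python
M = 5  # number of bits in the filter

def h1(x: int) -> int:
    return x % M

def h2(x: int) -> int:
    return (2 * x + 3) % M

def insert(bits: list, x: int) -> None:
    for h in (h1, h2):
        bits[h(x)] = 1

def might_contain(bits: list, x: int) -> bool:
    return all(bits[h(x)] == 1 for h in (h1, h2))

bits = [0] * M
insert(bits, 4)
print(bits)  # [0, 1, 0, 0, 1]
insert(bits, 1)
print(bits)  # [1, 1, 0, 0, 1]

# Over one full cycle of inputs, each hash function hits every bit exactly once.
print(sorted(h1(x) for x in range(M)))  # [0, 1, 2, 3, 4]
print(sorted(h2(x) for x in range(M)))  # [0, 1, 2, 3, 4]

print(might_contain(bits, 4), might_contain(bits, 2))  # True False
```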


### c)
$P_{false\ positive} = (1 - e^{-\frac{k*n}{m}})^k$

Where:
- $k = 2$ (number of hash functions)
- $n = 2$ (number of elements inserted)
- $m = 5$ (size of the bit array)

The expected fraction of 1s in the bit array is $1 - e^{-\frac{k*n}{m}} = 1 - e^{-0.8} \approx 0.551$. Thus, approximately $55.1\%$ of the bits are expected to be set, resulting in $P_{false\ positive} = (1 - e^{-0.8})^2 \approx 0.303$.

The false positive probability is approximately $30.3\%$. This means that for any random number checked against the Bloom filter, there is about a $30.3\%$ chance it will falsely appear to be in the set.
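
The calculation can be reproduced as a quick check. The function names are assumptions; the second function is a variant that skips the exponential approximation and uses $(1 - (1 - \frac{1}{m})^{k*n})^k$ instead, still assuming independent, uniformly distributed hash functions. For a filter as small as $m = 5$ the two values differ noticeably.

```python
import math

def false_positive_rate(k: int, n: int, m: int) -> float:
    """Approximation (1 - e^(-k*n/m))^k used in the calculation above."""
    return (1 - math.exp(-k * n / m)) ** k

def false_positive_rate_no_exp(k: int, n: int, m: int) -> float:
    """Same quantity without replacing (1 - 1/m)^(k*n) by e^(-k*n/m)."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k

k, n, m = 2, 2, 5
print(round(1 - math.exp(-k * n / m), 3))           # expected fraction of set bits
print(round(false_positive_rate(k, n, m), 3))       # approximated false positive rate
print(round(false_positive_rate_no_exp(k, n, m), 3))
```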