# Cryptography (CC4017) -- Week 2

Recall that a probability distribution $D$ over a set $S$ can be seen as a deterministic function mapping random coins $C sampled uniformly at random from a set $C$ to $S$. In this case, the probability mass function is defined, for all $S' ∈ S$, as:

$$Pr[S=S':S←\$D] = Pr[S=S':C←\$C;S←D(C)] = \frac{\#\{C:D(C) =S'\}}{|C|}$$

We abbreviate this, when clear from the context, to Pr[S'].

Recall also that the entropy of such a distribution is given by:

$$∑_{S'∈ S} −Pr[S']·log2(Pr[S'])$$

For example, the entropy associated with a perfect coin flip is $\frac{-1}{2}·log_2(\frac{1}{2}) + (−\frac{1}{2}·log_2(\frac{1}{2})) = 1$

**Answer the following questions**

1. Consider S the set of integers in the range 0..250 and note p= 251 is a prime number. Take C to be the set of all bit strings of length 8. Let the distribution D to be defined by the function $D(C) :=C (mod \ p)$, i.e. takes the remainder of coins C divided by p.
    
    1.1. Calculate the probability of each value in S to be produced by D.

    1.2. Repeat the above considering now the set C to be the set of all bit strings of length 64.
    
    1.3. Are these distributions uniform? If not, can you think of a way to quantify how distant they are from uniform?

> 
>1.1.
>
> S = [0,250]
>
> p = 251
>
> c = $2^8$ = 256
>
>$$
>\Pr[S'] = \begin{cases} 
>\frac{2}{256}, & S' \in \{0, 1, 3, 4\} \\
>\frac{1}{256}, & \text{otherwise} 
>\end{cases}
>$$
>
> 1.2.
>
> c = $\{2^{64}-1\}$ (set of bit strings of length 8)
>
>$$
>b = \frac{2^{64}}{251} \approx 7.35 * 10^{16} 
>$$ 
>
> - b represents the number of times each value in S appears when we map  all C elements in S using $D(C) = (C) \ mod \ 251$.
>
> - Para alguns valores, este valor será arredondado para cima, então valores em S podem aparecer b ou b−1 vezes.
>
> **Probability**
>
> - For the first m values ​​(where m is the number of values ​​that appear exactly b times), we have:
>
> $$Pr[S'] = \frac{b}{2^{64}}$$
>
> - For the remaing values in S, we have:
>
> $$Pr[S'] = \frac{b-1}{2^{64}}$$
>
> 1.3.
>
> The perfectly uniform distribution where $Pr[S'] = \frac{1}{251}$,  $\forall S' ∈ S $ teh entropy would be:
>
>$$
>H = -251 \times \frac{1}{251} \times \log_2 \left( \frac{1}{251} \right) = \log_2(251) \approx 7.97
>$$
>
> The distributions are not uniform. We can use entropy to quantify how distant they are from uniform
>
> $$
> 
>H = -\left( 4 \cdot \frac{2}{256} \cdot \log_2\left(\frac{2}{256}\right) + 247 \cdot \frac{1}{256} \cdot \log_2\left(\frac{1}{256}\right) \right) \approx 7.94
>$$
>
><br>
>
>$$
>H = -\left( \sum_{i=0}^{m-1} \frac{b}{2^{64}} \cdot \log_2\left( \frac{b}{2^{64}} \right) + \sum_{i=m}^{250} \frac{b - 1}{2^{64}} \cdot \log_2\left( \frac{b - 1}{2^{64}} \right) \right) \approx 7.98
>$$

2. Repeat question #1 but take $p = 2^8$, i.e., a power of 2.

> 2.1
> 
> S = [0,250]
>
> p = $2^8$ = 256
>
> c = $2^8$ = 256
>
>$$
>\Pr[S'] = \frac{1}{256}, S' \in [0,...,250]
>$$
>
> 2.2
>
> c = $2^{64}$
>
> $b = \frac{2^{64}}{251}$
>
> $
> Pr[S'] = \frac{b-1}{2^{64}}
> $
>
> 2.3
>
> If the length of bit strings of c is a multiple of p, then the entropy value will always be the same.
>
> $
> H1 = -251 \cdot \frac{1}{256} \cdot \log_2\left( \frac{1}{256} \right) \approx 7.84375
> $
>
> $
> H2 = -251 \cdot \frac{b-1}{2^{64}} \cdot \log_2\left( \frac{b-1}{2^{64}} \right) \approx 7.84375
> $


3. Use **Sage** to compute the entropy of the two distributions referred in questions #1 and #2. Compute also the entropy of the uniform distribution over S.

4. Generalize the computations from question #3 in **Sage** to compute the entropy of distribution D when C is the set of bit strings of length k. Check (approximately) what is the smallest k for which the entropy computed in Sage for D which matches the entropy of the uniform distribution over S

In [3]:
import math

def D(c, p):
    return c % p

# Function to help calculate the entropy values 
def entropy_aux(p):
    if p <= 0:
        return 0
    entropy_aux_value = -p * math.log2(p) 
    return entropy_aux_value

print("QUESTION 3")
print("-- 1 --")
# Entropy for first distribution
print(f"Entropy for first distribution: {5 * entropy_aux(2/256) + 246 * entropy_aux(1/256)}")

# Entropy for second distribution
blocks = ((2**64)/251)
blocks_plus_one_occurrences = D((2**64)-1, 251)
blocks_occurrences = 251-blocks_plus_one_occurrences
print(f"Entropy for second distribution: {blocks_plus_one_occurrences * entropy_aux((blocks + 1)/(2**64)) + blocks_occurrences * entropy_aux(blocks/(2**64))}")

# Entropy for the uniform distribution over S
max_entropy =  251*entropy_aux(1/251)
print(f"Entropy for the uniform distribution over S: {max_entropy}") 


print("-- 2 --")
print(f"Entropy for first distribution: {251 * entropy_aux(1/256)}")

blocks = ((2**64)/(2**8))
print(f"Entropy for second distribution: {251 * entropy_aux((blocks + 1)/(2**64))}")

print()
print("QUESTION 4")
ks = range(40, 64)
for k in ks:
    b = int((2**k)/251)
    b_p1_occurrences = int(D((2**k)-1, 251))
    b_occurrences = 251-b_p1_occurrences
    h = b_p1_occurrences * entropy_aux((b+1)/(2**k)) + b_occurrences * entropy_aux(b/(2**k))
    print(f"Entropy for {k} :{h}")
    if h == max_entropy:
        print(f"True for {k}")

QUESTION 3
-- 1 --
Entropy for first distribution: 7.9609375
Entropy for second distribution: 7.971543553950772
Entropy for the uniform distribution over S: 7.971543553950772
-- 2 --
Entropy for first distribution: 7.84375
Entropy for second distribution: 7.84375

QUESTION 4
Entropy for 40 :7.971543553944834
Entropy for 41 :7.971543553947803
Entropy for 42 :7.971543553949287
Entropy for 43 :7.9715435539500294
Entropy for 44 :7.971543553950402
Entropy for 45 :7.971543553950587
Entropy for 46 :7.97154355395068
Entropy for 47 :7.971543553950726
Entropy for 48 :7.971543553950749
Entropy for 49 :7.9715435539507595
Entropy for 50 :7.971543553950767
Entropy for 51 :7.971543553950769
Entropy for 52 :7.971543553950771
Entropy for 53 :7.971543553950772
True for 53
Entropy for 54 :7.971543553950772
True for 54
Entropy for 55 :7.971543553950772
True for 55
Entropy for 56 :7.971543553950772
True for 56
Entropy for 57 :7.971543553950772
True for 57
Entropy for 58 :7.971543553950772
True for 58
Entro

5. **hexdump** can be used to extract randomness from **/dev/urandom**. Explain what the following command is doing,and how the different flags influence its behavior.

    hexdump -n 32 -e'1/4 "%0X" 1 "\n"'/dev/urandom

    Implement an alternative command that uses/dev/urandom to create a file with random bytes.

    **HINT:** use the shellddcommand.
    
    Use openSSL to do exactly the same.
    
    **HINT:** look at commandrand.

> This command is extracting 32 bytes of random data from /dev/unrandom.
>- n specifies the number of bytes to read.
>- e specifies the format of the string
>
> dd if = /dev/urandom bs = 32 count = 8
>
> openssl rand -hex 32


6 - Use openSSL to generate a key pair where private key is protected with a password. 

openssl genrsa 4096

See what happens when you increase/decrease the key size.

Investigate how openSSL converts the passphrase into a cryptography key for encryption/wrapping

> Effect of increased / decrease in key size:
>
> Larger key sizes (for example, 4096 bits) increase safety, but also make operations (encryption and decription) slower.
> Smaller key sizes are faster but offer less security.
>
> OpenSSL takes the secret phrase supplied and derives a cryptographic key to it. He used a key derivation function (KDF), typically using algorithms like PBKDF2 to make the process safe against brute force attacks


7 - Use openSSL to generate random Diffie-Hellman parameters.

openssl dh param 2048

See what happens when you increase/decrease the key size. Compare to the previous case.

> The similarity of RSA, increased key size in DH parameters increases safety, but makes the key change process slower. Comparing a 2048-bit DH key with a smaller one (such as 1024 bits) will show that the largest key provides stronger security but requires more computational power for the exchange process.