## Exercise 05.1 (random numbers)

- Using the `randint` function from the `random` module (https://docs.python.org/3/library/random.html#random.randint),
  develop a function `dice_roll` that emulates the roll of a dice with $n$ sides.

- For $n=6$, devise and implement a test to check that it is a fair dice.

#### (a) Dice roll code:

In [14]:
import random

# YOUR CODE HERE
def dice_roll(n):
    "Emulate the roll of a dice with n sides, returning an integer from 1 to n inclusive"
    return random.randint(1, n)

In [15]:
for n in range(1, 20):
    for j in range(100):
        value = dice_roll(n)
        assert 1 <= value <= n

#### (b) Test for fairness

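Test the dice for $n = 6$ using $k$ rolls.

Increment a counter for the frequency $f_i$ of each face $i$.

Calculate the standard deviation of the relative frequencies $\frac{f_i}{k}$.

Compare it against the standard deviation of the discrete uniform distribution, $\sqrt{{Var}_{actual}}$ with ${Var}_{actual} = \frac{b^2 - 1}{12}$ and $b = 6$.

NOTE: The SD comparison does not work because the two numbers measure different things. The standard deviation of the $\frac{f_i}{k}$ measures how far the face frequencies stray from $\frac{1}{6}$ and tends to 0 for a fair dice, whereas $\sqrt{\frac{b^2 - 1}{12}} \approx 1.71$ is the standard deviation of the rolled values themselves. A like-for-like check would compare the sample mean and variance of the rolled values with $E(x) = \frac{b + 1}{2} = 3.5$ and ${Var}_{actual}$.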
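
A like-for-like version of this comparison can be sketched by working with the rolled values rather than the face frequencies. The sketch below assumes a fixed seed for repeatability and a roll count `k` of 100,000; `dice_roll` is redefined so the block runs on its own, and the variable names are illustrative rather than part of the exercise.

```python
import random

def dice_roll(n):
    "Emulates roll of dice"
    return random.randint(1, n)

random.seed(0)  # assumption: fixed seed so the run is repeatable
n = 6
k = 100000      # assumption: number of rolls, chosen for a quick run

rolls = [dice_roll(n) for _ in range(k)]

# Sample mean and (population) variance of the rolled values
mean_observed = sum(rolls) / k
var_observed = sum((x - mean_observed) ** 2 for x in rolls) / k

# Mean and variance of a discrete uniform distribution on 1..n
mean_expected = (n + 1) / 2
var_expected = (n ** 2 - 1) / 12

print("Mean observed {:.4f}, expected {:.4f}".format(mean_observed, mean_expected))
print("Variance observed {:.4f}, expected {:.4f}".format(var_observed, var_expected))
```

Matching the mean and variance is a necessary condition for fairness but not a sufficient one, since a non-uniform dice can share both values.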


In [24]:
import numpy as np

# YOUR CODE HERE
k = 1000000
n = 6
freq = np.zeros([n,1])

for i in range(k):
    index = dice_roll(n) - 1  # convert face value (1..n) to a 0-based index
    freq[index] += 1

freq /= k    

print(freq)
print("SD Observed {}".format(np.std(freq)))  # Standard deviation of the relative frequencies across the faces
print("SD Actual {}".format(np.sqrt(35/12)))  # Standard deviation of a single roll: sqrt((6**2 - 1)/12)

[[0.166648]
 [0.166364]
 [0.166298]
 [0.166755]
 [0.167064]
 [0.166871]]
SD Observed 0.00026935086576598436
SD Actual 1.707825127659933
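
The printed frequencies can also be checked with a Pearson chi-squared goodness-of-fit test. This is a standard alternative, not the method used in the cell above. The sketch below takes the counts implied by the printed output (frequency times $k = 10^6$) and compares the statistic with 11.07, the 5% critical value of the chi-squared distribution with 5 degrees of freedom. The variable names are illustrative.

```python
# Counts implied by the printed relative frequencies (frequency * k)
k = 1000000
observed = [166648, 166364, 166298, 166755, 167064, 166871]
assert sum(observed) == k

# A fair six-sided dice gives the same expected count for every face
expected = k / len(observed)

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
chi2 = sum((o - expected) ** 2 / expected for o in observed)

# 5% critical value for 6 - 1 = 5 degrees of freedom
critical_value = 11.07

print("Chi-squared statistic: {:.3f}".format(chi2))
print("Consistent with a fair dice at the 5% level:", chi2 < critical_value)
```

For these counts the statistic is roughly 2.6, well under the critical value, so the data give no evidence against fairness.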


## Exercise 05.2 (data compression)

For devices with limited memory, data compression can be important. Data compression is
a field of its own, but with libraries we can compress (and decompress) data easily without being an expert in
the details.

Below is a program that compresses a passage from Hamlet, by Shakespeare.

In [25]:
# Import the compression module
import zlib

# Create a string that we wish to compress
text = """
Welcome, dear Rosencrantz and Guildenstern!
Moreover that we much did long to see you,
The need we have to use you did provoke
Our hasty sending. Something have you heard
Of Hamlet's transformation; so call it,
Sith nor the exterior nor the inward man
Resembles that it was. What it should be,
More than his father's death, that thus hath put him
So much from the understanding of himself,
I cannot dream of: I entreat you both,
That, being of so young days brought up with him,
And sith so neighbour'd to his youth and havior,
That you vouchsafe your rest here in our court
Some little time: so by your companies
To draw him on to pleasures, and to gather,
So much as from occasion you may glean,
Whether aught, to us unknown, afflicts him thus,
That, open'd, lies within our remedy."""

# Convert Python string to bytes and check type
text_bytes = text.encode("utf-8")
print(type(text_bytes))

# Get number of bytes used to store string
print("Number of bytes for uncompressed string:", len(text_bytes))

# Compress string and get number of bytes used for compressed string
text_comp = zlib.compress(text_bytes)
print("Number of bytes for compressed string:", len(text_comp))

# Display the compression efficiency
print("Compression efficiency: ", len(text_comp)/len(text_bytes))

# Decompress the string
text_decomp = zlib.decompress(text_comp)

# Check that original and decompressed string are the same (more on assert later)
if text != text_decomp.decode("utf-8"):
    print("Problem: original and decompressed string differ.")

<class 'bytes'>
Number of bytes for uncompressed string: 785
Number of bytes for compressed string: 466
Compression efficiency:  0.5936305732484076


Using the above as a guide, examine the compression efficiency of

1. Compressing one large string made up of the passage by Shakespeare repeated 100 times; and
2. Compressing a random string of the same length as the repeated Shakespeare passage.

To help you, the function below generates a random string of length `N`:

In [26]:
import random
import string

def random_string(N):
    return ''.join([random.choice(string.ascii_letters + string.digits) for n in range(N)])

print(random_string(8))

9HXvJRP9
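
For long strings, the same idea can be sketched with `random.choices`, which draws all `N` characters in a single call. The name `random_string_fast` is hypothetical and not part of the exercise; it is assumed here only as a drop-in alternative to `random_string`.

```python
import random
import string

# Same alphabet as random_string: 52 ASCII letters plus 10 digits
ALPHABET = string.ascii_letters + string.digits

def random_string_fast(N):
    "Hypothetical alternative to random_string: draw N characters in one call"
    return ''.join(random.choices(ALPHABET, k=N))

print(random_string_fast(8))
```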


### Solution

In [27]:
# Create a string
text = """
Welcome, dear Rosencrantz and Guildenstern!
Moreover that we much did long to see you,
The need we have to use you did provoke
Our hasty sending. Something have you heard
Of Hamlet's transformation; so call it,
Sith nor the exterior nor the inward man
Resembles that it was. What it should be,
More than his father's death, that thus hath put him
So much from the understanding of himself,
I cannot dream of: I entreat you both,
That, being of so young days brought up with him,
And sith so neighbour'd to his youth and havior,
That you vouchsafe your rest here in our court
Some little time: so by your companies
To draw him on to pleasures, and to gather,
So much as from occasion you may glean,
Whether aught, to us unknown, afflicts him thus,
That, open'd, lies within our remedy."""

Import the necessary modules:

In [28]:
import random
import string
import zlib

Repeat the Shakespeare string 100 times, and compress:

In [29]:
# Create string of Shakespeare passage repeated 100 times
# YOUR CODE HERE
text *= 100
text_bytes = text.encode("utf-8")
%time text_comp = zlib.compress(text_bytes)

# Get number of bytes used to store string
print("Number of bytes for uncompressed string:", len(text_bytes))

CPU times: user 981 µs, sys: 0 ns, total: 981 µs
Wall time: 586 µs
Number of bytes for uncompressed string: 78500


Create random string and compress:

In [30]:
# YOUR CODE HERE
num_of_chars = len(text)
ran_text = random_string(num_of_chars)
ran_text_bytes = ran_text.encode("utf-8")
%time ran_text_comp = zlib.compress(ran_text_bytes)

# Get number of bytes used to store string
print("Number of bytes for uncompressed string:", len(ran_text_bytes))

CPU times: user 9.57 ms, sys: 0 ns, total: 9.57 ms
Wall time: 9.58 ms
Number of bytes for uncompressed string: 78500


Compare compression efficiency:

In [31]:
# YOUR CODE HERE

# Display the compression efficiency
print("Compression efficiency for repeated text: ", len(text_comp)/len(text_bytes))
print("Compression efficiency for random text: ", len(ran_text_comp)/len(ran_text_bytes))

Compression efficiency for repeated text:  0.01178343949044586
Compression efficiency for random text:  0.7521273885350318
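
The figure for the random string can be sanity-checked with a short calculation. Each character is drawn uniformly from 62 symbols (52 letters plus 10 digits), so it carries $\log_2 62$ bits of information but is stored in 8 bits. Assuming ideal coding of independent, uniformly distributed characters, the best achievable ratio is $\log_2 62 / 8$. The sketch below evaluates that bound; it is an estimate for comparison, not a description of how `zlib` works internally.

```python
import math
import string

# Alphabet used by random_string: ASCII letters and digits
alphabet_size = len(string.ascii_letters + string.digits)

# Information per character (bits) versus storage per character
# (each ASCII character occupies 8 bits when encoded as UTF-8)
bits_per_char = math.log2(alphabet_size)
ideal_ratio = bits_per_char / 8

print("Alphabet size:", alphabet_size)
print("Ideal compression ratio: {:.3f}".format(ideal_ratio))
```

The bound is about 0.744, close to the measured 0.752, which suggests that the random string is already compressed almost as far as possible. The repeated passage, by contrast, is highly redundant, which is consistent with its much smaller ratio.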
