# Self-study try-it activity 4.1: Identifying the Python functions used to calculate MLE

In [None]:
#Importing the necessary libraries
import numpy as np
from scipy.stats import bernoulli
from scipy.stats import binom
import random
import math
from matplotlib import pyplot as plt
plt.rcParams["figure.figsize"] = plt.rcParamsDefault["figure.figsize"]


## MLE and coin flips

### Question 1: How can you simulate 100 flips of a loaded coin that shows 'H' with a probability of θ = 0.6? What are the different ways of doing so?

**Answer 1.1**: Let’s start with a simple case: simulating 100 coin flips one at a time and recording the outcome of each flip.

In [None]:
np.random.seed(15) #A fixed random seed so that everyone gets the same results
a = np.zeros(100) #Outcomes: 1 for "H", 0 for "T"
for i in range(100):
    flip = np.random.uniform(0,1) #Sample a number uniformly from [0,1)
    if flip <= 0.6: #A value at or below 0.6 occurs 60% of the time, so count it as heads
        a[i] = 1    #you see "H"!!

In [None]:
#Now let's see some basic statistics from your sequence
print("The fraction of Heads we see is", np.mean(a))

Isn't it interesting that although $\theta = 0.6$, the estimate from the sample is $\hat{\theta} = 0.69$? You'll explore why this happens in this notebook.
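The loop above can also be written without an explicit `for`: draw all 100 uniform numbers in one call and compare them with $0.6$. A minimal vectorised sketch, assuming the same legacy seed (`np.random.seed(15)`); because it consumes the same uniform draws in the same order, it should reproduce the loop's sequence exactly.

```python
import numpy as np

theta = 0.6  # probability of heads used in this notebook

# Loop version, as in the cell above
np.random.seed(15)
a_loop = np.zeros(100)
for i in range(100):
    if np.random.uniform(0, 1) <= theta:
        a_loop[i] = 1

# Vectorised version: draw all 100 uniforms at once, then threshold them
np.random.seed(15)
a_vec = (np.random.uniform(0, 1, size=100) <= theta).astype(float)

print("Same sequence:", np.array_equal(a_loop, a_vec))
print("Fraction of heads:", np.mean(a_vec))
```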

**Answer 1.2**: You can simplify the code above with `scipy.stats.bernoulli`, whose `rvs` method draws a $1$ ('H') with probability $\theta$ and a $0$ ('T') otherwise.

In [None]:
np.random.seed(15) #A fixed random seed so that everyone gets the same results
a = np.zeros(100) #An array of 100 zeros to store the outcomes
for i in range(100):
    a[i] = bernoulli.rvs(0.6) #This will generate a value of 1 with probability 0.6, and "0" otherwise

In [None]:
print("The fraction of Heads we see is", np.mean(a))

**Answer 1.3**: You can even reduce the code to a single line. Most random-sampling functions in NumPy and SciPy accept a `size` parameter that draws many samples in one call.

In [None]:
np.random.seed(15) #A fixed random seed so that everyone gets the same results
a = bernoulli.rvs(0.6, size = 100) #Generate 100 flips: each is 1 ("H") with probability 0.6, and 0 ("T") otherwise

In [None]:
print("The fraction of Heads we see is", np.mean(a))
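NumPy's newer `Generator` API offers yet another way to simulate the same flips. A sketch, assuming NumPy 1.17 or later (where `np.random.default_rng` is available); the generator uses a different bit generator from `np.random.seed`, so the exact flips, and hence the fraction of heads, will differ from the cells above.

```python
import numpy as np

rng = np.random.default_rng(15)  # independent generator with its own seed

# Option 1: 100 Bernoulli(0.6) draws, written as Binomial(1, 0.6)
flips_binomial = rng.binomial(1, 0.6, size=100)

# Option 2: threshold uniform draws on [0, 1)
flips_uniform = (rng.random(100) < 0.6).astype(int)

print("Fraction of heads (binomial):", flips_binomial.mean())
print("Fraction of heads (uniform):", flips_uniform.mean())
```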

### Question 2: How do you explain the fact that the fraction of heads is $\hat{\theta} = 0.69$, while $\theta = 0.6$?

Plot the results to demonstrate the law of large numbers: as the number of coin tosses increases, the estimated value $\hat{\theta}$ converges to the true value $\theta$.

**Answer 2.1**: As the number of flips increases, the estimate $\hat{\theta}$ becomes more accurate; the chances of it being far off from the true value $\theta$ also decrease dramatically.
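One way to quantify "far off" is the standard deviation of $\hat{\theta}(n)$, which for independent flips equals $\sqrt{\theta(1-\theta)/n}$. The sketch below compares that formula with a simulation; the sample sizes and the number of repeated experiments are arbitrary choices.

```python
import numpy as np
from scipy.stats import binom

theta = 0.6
repeats = 20000  # simulated experiments per sample size (arbitrary)
np.random.seed(15)

results = {}
for n in [10, 100, 1000, 10000]:
    # Each binomial draw is the number of heads in n flips; dividing by n gives hat_theta(n)
    hat_thetas = binom.rvs(n, theta, size=repeats) / n
    simulated_sd = hat_thetas.std()
    formula_sd = np.sqrt(theta * (1 - theta) / n)
    results[n] = (simulated_sd, formula_sd)
    print(f"n = {n:>5}: simulated sd = {simulated_sd:.4f}, formula sd = {formula_sd:.4f}")
```

Each tenfold increase in $n$ shrinks the spread of $\hat{\theta}(n)$ by a factor of about $\sqrt{10} \approx 3.16$.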

**Answer 2.2**: Let's check this with a much larger sample of $10^6$ flips.

In [None]:
np.random.seed(15) #A fixed random seed so that everyone gets the same results
a = bernoulli.rvs(0.6, size = pow(10,6))
print("The fraction of Heads we see is", round(np.mean(a),4))

**Answer 2.3**: Let $\hat{\theta}(n)$ represent the estimate of $\theta$ based on the outcomes of $n$ coin tosses. By plotting $\hat{\theta}(n)$ for $n = 1,\ldots, 10^3$, you can observe how the estimate converges to $\theta$. This is an empirical illustration of the law of large numbers.

In [None]:
ratios = np.cumsum(a) / (np.arange(1,pow(10,6)+ 1)) #Running ratios: ratios[k] is hat_theta(k+1), e.g. ratios[9] gives hat_theta(10)
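To see what the running-ratio line computes, here is the same expression applied to a tiny hand-made sequence of four flips (the values are made up for illustration). Entry `k` (0-based) is the fraction of heads among the first `k + 1` flips, i.e. $\hat{\theta}(k+1)$.

```python
import numpy as np

toy = np.array([1, 0, 1, 1])  # H, T, H, H (illustrative only)
toy_ratios = np.cumsum(toy) / np.arange(1, len(toy) + 1)
print(toy_ratios)  # hat_theta(1), hat_theta(2), hat_theta(3), hat_theta(4)
```

Here `toy_ratios` holds $1$, $0.5$, $2/3$ and $0.75$.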

In [None]:
plot_limit = 1000 #Plot how the ratio changes as n increases
x = np.arange(1, plot_limit + 1)
y = ratios[:plot_limit]
plt.title(r"$\hat{\theta}(n)$ as a function of $n=1,\ldots$")
plt.xlabel("n = number of coin tosses")
plt.ylabel(r"$\hat{\theta}(n)$")
plt.axhline(y = 0.6, color = 'k', linestyle='--', alpha = 0.6, label = r"true value of $\theta$")
plt.plot(x,y)
plt.xlim(-5,plot_limit)
plt.ylim(0.4,0.8)
plt.legend()
plt.show()

### Question 3: If you repeat the experiment of tossing the loaded coin $100$ times, say $2,000$ times over, do you expect to see a $0.69$ fraction of heads every time?

**Answer 3:** No. The value $0.69$ came from a single experiment; another run of $100$ tosses can just as easily give a fraction below $0.6$. Averaged over many experiments, the fractions settle around $0.6$. Let's check.

In [None]:
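np.random.seed(15) #A fixed random seed so that everyone gets the same results
n_experiments = 2000
hat_thetas = np.zeros(n_experiments)
for i in range(n_experiments):
    a = bernoulli.rvs(0.6, size = 100) #One experiment: 100 tosses
    hat_thetas[i] = np.mean(a)
    if i < 10: #Print only the first 10 experiments
        print("Experiment #", i+1, ": hat_theta =", hat_thetas[i])
print("Average hat_theta over", n_experiments, "experiments:", round(np.mean(hat_thetas), 4))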
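How surprising was $0.69$ in the first place? As Question 4 explains, the number of heads in $100$ tosses follows a binomial distribution, so SciPy's `binom.sf` (the survival function, $P(X > k)$) gives the exact probability of seeing $69$ or more heads when $\theta = 0.6$. A short sketch:

```python
from scipy.stats import binom

n, theta = 100, 0.6
# P(X >= 69) = P(X > 68) for an integer-valued head count X
p_at_least_69 = binom.sf(68, n, theta)
print("P(at least 69 heads in 100 tosses) =", round(p_at_least_69, 4))
```

A small but non-zero probability means $0.69$ is an unlucky draw rather than an impossible one.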

### Question 4: Is there a simpler way to simulate the number of heads than sampling each Bernoulli flip?

**Answer 4:** There is indeed a simpler way. Recall that the number of 'H' outcomes in $n$ independent tosses of a coin follows a binomial distribution with parameters $n$ and $\theta$. In SciPy you can sample it with:

`binom.rvs(n, p, size=...)`

where:

- `n` = number of trials (coin tosses in one experiment)
- `p` = probability of success ('H') on each trial
- `size` = number of times the experiment is repeated

In [None]:
print("I am easily simulating an example of 100 coin tosses and the number of 'H' is:",  binom.rvs(100, 0.6))

In [None]:
print("I am doing the same, but this time repeating my experiment three times and here are the results",  binom.rvs(100, 0.6, size = 3))
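To check that the binomial shortcut agrees with the Bernoulli approach, you can count the heads in many 100-toss experiments both ways and compare them with the theoretical mean $n\theta$ and variance $n\theta(1-\theta)$. A sketch; the number of repeated experiments is an arbitrary choice.

```python
import numpy as np
from scipy.stats import bernoulli, binom

n, theta, repeats = 100, 0.6, 20000
np.random.seed(15)

# Way 1: simulate every flip, then count the heads in each experiment (row)
counts_bernoulli = bernoulli.rvs(theta, size=(repeats, n)).sum(axis=1)

# Way 2: draw the head counts directly from Binomial(n, theta)
counts_binom = binom.rvs(n, theta, size=repeats)

print("Theory:    mean =", binom.mean(n, theta), " var =", binom.var(n, theta))
print("Bernoulli: mean =", counts_bernoulli.mean(), " var =", counts_bernoulli.var())
print("Binomial:  mean =", counts_binom.mean(), " var =", counts_binom.var())
```

Both ways should give means near $60$ and variances near $24$; the binomial version simply skips generating the individual flips.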

### Question 5: An adversary hands you a sequence of coin tosses, stored in `sequence`. They claim this sequence was generated by flipping a possibly loaded coin $200$ times. You’re told the same coin was used throughout with no switching midway. 

### They also mention a hidden `magic_number`, which is the true value of $\theta$, the probability of getting 'H', but you're not allowed to see it.

### Then, they pose a challenge: 'I will flip the same coin 1,000 times. Each time it lands on "H", you win 60. Each time it lands on "T", you lose 120.' Would you take the bet?

In [None]:
np.random.seed(85) #A fixed random seed so that everyone gets the same results
magic_number = np.random.uniform(0,1) #Do not print this number
sequence = bernoulli.rvs(magic_number, size = 200)

**Answer 5:** At first glance the bet looks bad: each 'T' costs you $120$, while each 'H' only earns you $60$. Why would you even consider it? But wait: how many heads appeared in the sequence you were given?

In [None]:
print("There were", np.sum(sequence), "H's in the given sequence of", len(sequence), "tosses.")

Remember that in this programme, you learned that the MLE of the true $\theta$ (i.e. the probability that heads will appear) is the fraction of heads in the sequence: the number of heads divided by the number of tosses. In other words:

In [None]:
hat_theta = np.mean(sequence)
print("The maximum likelihood estimate (MLE) of the true theta is", hat_theta)
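The claim that the MLE is the fraction of heads can be checked numerically. For $k$ heads in $n$ tosses, the log-likelihood is $\ell(\theta) = k\log\theta + (n-k)\log(1-\theta)$; setting $\ell'(\theta) = k/\theta - (n-k)/(1-\theta)$ to zero gives $\hat{\theta} = k/n$. The sketch below evaluates $\ell(\theta)$ on a grid and compares the maximiser with the sample mean. It uses its own synthetic 200-toss sequence with an assumed $\theta = 0.7$, chosen only for illustration, so `magic_number` stays hidden.

```python
import numpy as np

rng = np.random.default_rng(0)
seq = rng.binomial(1, 0.7, size=200)  # synthetic sequence; theta = 0.7 is an assumption
n, k = len(seq), seq.sum()

grid = np.linspace(0.001, 0.999, 999)  # candidate theta values, step 0.001
log_lik = k * np.log(grid) + (n - k) * np.log(1 - grid)

theta_grid = grid[np.argmax(log_lik)]
print("Grid maximiser of the log-likelihood:", round(theta_grid, 3))
print("Fraction of heads k / n:", seq.mean())
```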

Now that you have estimated `hat_theta`, you can simulate the game by using it as the probability of heads:

In [None]:
np.random.seed(85)
flips = bernoulli.rvs(hat_theta, size = 1000) #Simulate 1,000 flips of a coin with P(H) = hat_theta
#Profit per flip: +60 for each head (flips = 1) and -120 for each tail (flips = 0)
total_money = np.sum(flips*60 - (1-flips)*120)
print("Hmm, you will make $", total_money, "! Nice.")

The simulation suggests that you would make a profit by taking this bet. However, a single simulation can be misleading, so you run it many more times and take the average.

In [None]:
number_of_simulations = 10000 #You simulate this much
simulations = np.zeros(number_of_simulations)
for sim in range(number_of_simulations):
    flips = bernoulli.rvs(hat_theta, size = 1000) #Flipping a coin
    simulations[sim] = np.sum(flips*60 - (1-flips)*120)
print("Even over multiple simulations, on average you make $", np.mean(simulations), "! Nice.")

Did you really need all those simulations? For one round of this game, the expected profit is:

$$\begin{align}
\mathbb{E}[\text{profit}] &=  \mathbb{E}[\text{profit} | H]\cdot\theta + \mathbb{E}[\text{profit} | T]\cdot(1 - \theta)  \\
& = 60 \theta - 120 (1 - \theta).
\end{align}$$

But you don't know $\theta$, so you plug in the estimate $\hat{\theta}$ instead:

$$\begin{align}
\mathbb{E}[\text{estimated profit}] &= 60 \hat{\theta} - 120 (1 - \hat{\theta}) \\
& = 60 \cdot 0.69 - 120 \cdot 0.31 \\
& = 4.2
\end{align}$$

If you play this game $1,000$ times, the estimated expected profit is $1,000 \times 4.2 = \$4,200$. The average from the previous simulation was close to this number, but you did not need to simulate to get it.
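The expected-profit formula is easy to wrap in a small helper. The sketch below (the function name `expected_profit` is hypothetical, not part of the notebook) reproduces the $\$4,200$ figure for $\hat{\theta} = 0.69$ and solves $60\theta - 120(1-\theta) = 0$ for the break-even probability of heads.

```python
def expected_profit(theta, rounds=1000, win=60, loss=120):
    """Expected total profit over `rounds` flips: rounds * (win*theta - loss*(1 - theta))."""
    return rounds * (win * theta - loss * (1 - theta))

# Break-even: win*theta - loss*(1 - theta) = 0  =>  theta = loss / (win + loss)
break_even = 120 / (60 + 120)

print("Estimated expected profit at hat_theta = 0.69:", round(expected_profit(0.69), 2))
print("Break-even probability of heads:", round(break_even, 4))
```

The break-even value is $2/3$: on average the bet only pays off if the coin shows heads more than two-thirds of the time.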

Let's try the game now.

In [None]:
np.random.seed(15)
flips = bernoulli.rvs(magic_number, size = 100) #flipping a coin
np.sum(flips*60 - (1-flips)*120)

You started making money in the first $100$ rounds. Let's wait until the game is finished.

In [None]:
np.random.seed(15)
flips = bernoulli.rvs(magic_number, size = 1000) #flipping a coin
profit = np.sum(flips*60 - (1-flips)*120)

In [None]:
print("Overall, our profit is $", profit)
if profit < 0:
    print("This shows we actually lost money.")
else:
    print("This time we made money.")
# The adversary reveals the true value of theta:
print("The true value of theta, which we estimated as", hat_theta, "is:", round(magic_number,2))

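Now you realise why you lost money: the $200$-toss sequence you were given was not representative of the coin's true probability of heads. From the expected-profit formula, the bet is only profitable on average when $60\theta - 120(1-\theta) > 0$, i.e. when $\theta > 2/3$. An estimate of $0.69$ sits just above that threshold, so even a modest estimation error can turn a good-looking bet into a losing one.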
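One way to make "not representative" concrete is an approximate 95% confidence interval for $\theta$. The sketch below uses the normal-approximation (Wald) interval $\hat{\theta} \pm 1.96\sqrt{\hat{\theta}(1-\hat{\theta})/n}$, a standard but approximate choice, with $\hat{\theta} = 0.69$ and $n = 200$ from the worked example above.

```python
import numpy as np

hat_theta_example, n = 0.69, 200  # values from the worked example
se = np.sqrt(hat_theta_example * (1 - hat_theta_example) / n)  # estimated standard error
lower, upper = hat_theta_example - 1.96 * se, hat_theta_example + 1.96 * se

print(f"Approximate 95% interval for theta: [{lower:.3f}, {upper:.3f}]")
print("Break-even value 2/3 inside the interval:", lower < 2 / 3 < upper)
```

Because the interval contains values on both sides of $2/3$, $200$ tosses were not enough to tell whether the bet was favourable.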

This raises a few important questions:

1. What if the adversary gave you a sequence of $1,000,000$ coin tosses? Would you be more or less confident about playing this game?  

2. How would you approach this game if the given sequence contained only $1$ coin toss?

3. Please explain why MLE is favourable in the case above.

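**Answer 1:** More confident. By the law of large numbers, $\hat{\theta}$ computed from $1,000,000$ tosses would almost certainly be very close to the true $\theta$, so your decision about whether to take the bet would be far more reliable. A longer sequence does not make the coin itself any more favourable, though; it only tells you more precisely whether the bet is worth taking.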

**Answer 2:** With just one coin toss, there's significant uncertainty about the coin's bias. Making decisions based on such limited data is risky; it's generally better to collect more evidence before committing.
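To see how the length of the given sequence affects the decision, suppose (purely hypothetically) that the true $\theta$ were $0.64$, just below the break-even value $2/3$ implied by the expected-profit formula. You would wrongly take the bet whenever the observed fraction of heads exceeds $2/3$. Since the head count is binomial, `binom.sf` gives that probability exactly for a few sequence lengths:

```python
from scipy.stats import binom

theta_true = 0.64  # hypothetical true probability of heads (below 2/3)
lengths = [1, 200, 1_000_000]

wrong_bet_probs = []
for n in lengths:
    # hat_theta > 2/3  <=>  heads > 2n/3  <=>  heads > (2n)//3, since heads is an integer
    p_wrong = binom.sf((2 * n) // 3, n, theta_true)
    wrong_bet_probs.append(p_wrong)
    print(f"n = {n:>9}: P(hat_theta > 2/3) = {p_wrong:.4g}")
```

With a single toss, one head pushes $\hat{\theta}$ to $1$, so the wrong decision happens $64\%$ of the time; with a million tosses it essentially never happens.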

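**Answer 3:** MLE is consistent (the estimate converges to the true $\theta$ as the sample grows), asymptotically efficient and easy to compute; for a coin, it is simply the fraction of heads. With very small samples, however, its estimates can be highly uncertain: after a single toss, $\hat{\theta}$ is either $0$ or $1$. MLE works best when there is enough data, and its strengths show as the sample size increases.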