<h2> Exploring estimators </h2>

In this week's notebook, we're going to explore various estimators, their biases, and their efficiencies. 

<h4> An estimator for an exponential parameter </h4>

Let's draw a dataset from an exponential distribution $\operatorname{Exp}(\lambda)$, which has mean $1/\lambda$. We know that if we build a dataset $\{x_1, x_2, ..., x_n\}$ by sampling from this distribution, then the sample mean is an unbiased estimator for $1/\lambda$. On your homework, you found that
$$T_n = \frac{n}{x_1 + x_2 + \cdots + x_n}$$
is *not* an unbiased estimator for $\lambda$; for example, $E[T_1] = \infty$. On the other hand, an easy-to-compute biased estimator with low variance is sometimes preferable to a hard-to-compute unbiased estimator with high variance. We'll explore what happens as $n \to \infty$ in the **following questions**:

* Suppose that $\lambda = 0.5$ and that $n = 2$. Make an estimate for $E[T_2]$ using an appropriate number of simulations. As an estimator for $\lambda$, is it biased positively or negatively?
* Repeat the previous part with $n = 10$ and $n = 100$. Can you make a conjecture for the behavior of $E[T_n]$ as $n \to \infty$?

<h4> Quantifying efficiency of an estimator </h4>

Suppose that we know data is drawn from a uniform distribution $\operatorname{Unif}(-\theta, \theta)$, where $\theta$ is unknown. At this point, we have three different estimators for $\theta$; they are

* $A = 3 X_1^2,$ which is unbiased,
* $B = 2 X_1$, which is unbiased (from the homework!), and
* $C = \max\{X_1, X_2, ..., X_n\}$, which has expectation $\frac{n}{n + 1} \theta$ (from the homework!).

One tool for quantifying the quality of an estimator is the *mean squared error*, or MSE. It's defined as the mean of the squared error:
$$\operatorname{MSE}(T) = E[(T - \theta)^2].$$
For an unbiased estimator, this is exactly the variance of $T$ itself; naturally, this means that smaller values are generally better. We'll explore the MSE for each of the three estimators for $\theta$ in the **following questions**:

* Suppose that you know $\theta = 1$. Estimate the MSE of $A$ using an appropriate number of trials.
* Estimate the MSE of $B$ under the same conditions.
* Estimate the MSE of $C$ using $n = 1$, $n = 10$, and $n = 100$. Make a conjecture for what happens as $n \to \infty$.
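
The MSE definition above translates directly into a Monte Carlo estimate. The following is a minimal sketch rather than part of the assignment: the helper name `monte_carlo_mse`, the seed, and the trial count are assumptions made for illustration. As a test case it uses the sample mean of $\operatorname{Exp}(\lambda)$ draws as an estimator for $1/\lambda$, and it also reports the empirical bias and variance, which satisfy $\operatorname{MSE} = \text{variance} + \text{bias}^2$.

```python
from math import log
from random import random, seed

def monte_carlo_mse(estimator, sampler, theta, n, trials=10_000):
    """Simulate `trials` datasets of size n; return (mse, bias, variance)."""
    estimates = [estimator([sampler() for _ in range(n)]) for _ in range(trials)]
    mean_est = sum(estimates) / trials
    mse = sum((t - theta) ** 2 for t in estimates) / trials
    bias = mean_est - theta
    variance = sum((t - mean_est) ** 2 for t in estimates) / trials
    return mse, bias, variance

seed(0)  # fixed seed (an assumption) so that reruns are repeatable
lamb = 0.5

def draw():
    # Inverse-transform sampling; 1 - random() lies in (0, 1], so log never sees 0
    return -log(1.0 - random()) / lamb

def sample_mean(xs):
    return sum(xs) / len(xs)

mse, bias, variance = monte_carlo_mse(sample_mean, draw, theta=1 / lamb, n=10)
print("MSE:", mse)
print("bias:", bias)
print("variance + bias^2:", variance + bias ** 2)
```

Because the sample mean is unbiased for $1/\lambda$, the reported bias should be close to zero and the MSE close to the variance.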

In [82]:
# To get you started: here's the exponential distribution again:
from random import random
from math import log

def Exp(lamb):
    return -log(random()) / lamb


In [12]:
# Answers to the questions are here; the code is below.
# T2:
# When n = 2, T2 = 2 / (X1 + X2), where X1 and X2 are two independent exponential random variables with parameter A = 0.5.
# E[T2] is NOT 2 / (E[X1] + E[X2]) = 0.5: the expectation of a reciprocal is not the reciprocal of the expectation.
# Because 1/x is convex, Jensen's inequality gives E[T2] >= 2 / E[X1 + X2] = 0.5, so T2 is biased positively.
# The simulation below agrees: its estimate of E[T2] is close to 1, about twice the true value A = 0.5.
# The estimate is noisy from run to run because T2 has infinite variance.
# E[Tn] as n → ∞: a conjecture using the law of large numbers.
# The law states that as the sample size increases, the sample mean converges to the true mean 1/A = 1/(0.5) = 2.
# Tn is the reciprocal of the sample mean, so Tn itself converges to A = 0.5.
# Conjecture: Tn is biased positively for every finite n, but E[Tn] decreases toward A = 0.5 as n → ∞ (asymptotically unbiased).

# Conjecture for the behavior of the MSE as n → ∞ for Estimator C (and for A and B):
# Estimator C = max{X1, X2, ..., Xn} is a nonlinear function of the n samples.
# As n increases, the sample maximum approaches the upper endpoint of the distribution, which is theta.
# Both the bias and the variance of C shrink as n increases, and MSE = variance + bias^2.
# Therefore, conjecture that as n → ∞ the MSE of Estimator C decreases and approaches zero.
# A and B use only X1, so their MSE does not depend on n and does not shrink.

In [6]:
from random import random
from math import log

def Exp(lamb):
    return -log(random()) / lamb

n = 2  # Number of exponential random variables to sum
A = 0.5  # Parameter for the exponential distribution
num_simulations = 1000
total = 0

for _ in range(num_simulations):
    y_total = 0
    for i in range(n):
        y = Exp(A)
        y_total += y

    T_n = n / y_total
    total += T_n

mean = total / num_simulations
print("E[T2] estimate:", mean)


E[T2] estimate: 0.9772121744216277
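
The estimate of about 0.98 is consistent with a closed form for $E[T_n]$. As a sketch, assuming the standard fact that the sum $S_n = x_1 + \cdots + x_n$ of $n$ independent $\operatorname{Exp}(\lambda)$ draws has a $\operatorname{Gamma}(n, \lambda)$ density (the symbol $S_n$ is introduced here for convenience), for $n \ge 2$:
$$E[T_n] = n\,E\!\left[\frac{1}{S_n}\right] = n \int_0^\infty \frac{1}{s} \cdot \frac{\lambda^n s^{n-1} e^{-\lambda s}}{(n-1)!}\, ds = \frac{n \lambda^n}{(n-1)!} \cdot \frac{(n-2)!}{\lambda^{n-1}} = \frac{n}{n-1}\,\lambda.$$
With $\lambda = 0.5$ and $n = 2$ this gives $E[T_2] = 1$, so $T_2$ overestimates $\lambda$ on average, and the factor $n/(n-1)$ tends to 1 as $n \to \infty$.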


In [9]:
from random import random
from math import log

def Exp(lamb):
    return -log(random()) / lamb

A = 0.5  # Parameter for the exponential distribution
n = 2  # Number of exponential random variables to sum
num_simulations = [1, 10, 100]  # Vary the number of simulations

for num_sim in num_simulations:
    total = 0

    for _ in range(num_sim):
        y_total = 0
        for i in range(n):
            y = Exp(A)
            y_total += y

        T_n = n / y_total
        total += T_n

    mean = total / num_sim
    print(f'E[T2] estimate with {num_sim} simulations:', mean)


E[T2] estimate with 1 simulations: 0.43742752366261317
E[T2] estimate with 10 simulations: 0.6916104469428148
E[T2] estimate with 100 simulations: 1.0284968089843567
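
The cell above varies the number of simulations while keeping $n = 2$. To repeat the experiment with $n = 10$ and $n = 100$ as the second question asks, something along these lines could be used. This is a sketch only: the function name `estimate_E_Tn`, the seed, and the choice of 10,000 simulations are assumptions, not part of the original notebook.

```python
from math import log
from random import random, seed

def Exp(lamb):
    # Inverse-transform sampling; 1 - random() lies in (0, 1], so log never sees 0
    return -log(1.0 - random()) / lamb

def estimate_E_Tn(n, lamb, num_simulations=10_000):
    """Average T_n = n / (x_1 + ... + x_n) over many simulated datasets."""
    total = 0.0
    for _ in range(num_simulations):
        total += n / sum(Exp(lamb) for _ in range(n))
    return total / num_simulations

seed(1)  # fixed seed (an assumption) so that reruns are repeatable
lamb = 0.5
estimates = {n: estimate_E_Tn(n, lamb) for n in (10, 100)}
for n, est in estimates.items():
    print(f"E[T_{n}] estimate: {est:.4f}")
```

Here the sample size $n$ changes while the number of simulations stays fixed, so the printed values show how the bias of $T_n$ behaves as $n$ grows.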


In [7]:
from random import random

def simulate_uniform(theta=1):
    # One draw from Unif(-theta, theta)
    return theta * (2 * random() - 1)

n = 1000  # Number of trials
true_value = 1  # True value of theta

# Estimator A = 3 * X^2
mse_A = 0
for _ in range(n):
    X = simulate_uniform(true_value)
    estimator_A = 3 * X**2
    mse_A += (estimator_A - true_value)**2

mse_A /= n

# Estimator B = 2 * X
mse_B = 0
for _ in range(n):
    X = simulate_uniform(true_value)
    estimator_B = 2 * X
    mse_B += (estimator_B - true_value)**2

mse_B /= n

# With theta = 1 and the estimators taken exactly as written above,
# the exact values are MSE(A) = 4/5 and MSE(B) = 7/3.
print("MSE for Estimator A:", mse_A)
print("MSE for Estimator B:", mse_B)


In [8]:
from random import random

def simulate_uniform(theta=1):
    # One draw from Unif(-theta, theta)
    return theta * (2 * random() - 1)

def estimator_C(sample):
    return max(sample)

n = 1000  # Number of trials
true_value = 1  # True value of theta

# Sample sizes for Estimator C, as asked in the question
for num_samples in (1, 10, 100):
    mse_C = 0

    for _ in range(n):
        samples = [simulate_uniform(true_value) for _ in range(num_samples)]
        estimator_C_value = estimator_C(samples)
        mse_C += (estimator_C_value - true_value)**2

    mse_C /= n

    print(f"MSE for Estimator C with {num_samples} samples:", mse_C)


In [13]:
import random
import numpy as np

# Note: this cell samples from Unif(0, theta), not Unif(-theta, theta)
def Uniform(theta, n):
    return [random.uniform(0, theta) for _ in range(n)]

# 1 / (sample mean); the sample mean is near theta / 2, so this is near 2 / theta, not theta
def estimate_unbiased_1n(samples):
    return 1 / np.mean(samples)

# (n + 1) / (sample sum); also near 2 / theta for large n
def estimate_unbiased_1n1(samples):
    return (n + 1) / np.sum(samples)

# n / (n + 1) * (sample maximum); slightly below theta on average
def estimate_biased_1n1(samples):
    return n / (n + 1) * np.max(samples)

# Number of trials
num_simulations = 1000
true_theta = 10
# Sample size
n = 100
# Lists to store the squared errors for each estimator
mse_unbiased_1n = []
mse_unbiased_1n1 = []
mse_biased_1n1 = []
for _ in range(num_simulations):
    uniform_samples = Uniform(true_theta, n)

    # Estimate
    theta_hat_unbiased_1n = estimate_unbiased_1n(uniform_samples)
    theta_hat_unbiased_1n1 = estimate_unbiased_1n1(uniform_samples)
    theta_hat_biased_1n1 = estimate_biased_1n1(uniform_samples)

    # Calculate squared errors
    se_unbiased_1n = (theta_hat_unbiased_1n - true_theta)**2
    se_unbiased_1n1 = (theta_hat_unbiased_1n1 - true_theta)**2
    se_biased_1n1 = (theta_hat_biased_1n1 - true_theta)**2

    mse_unbiased_1n.append(se_unbiased_1n)
    mse_unbiased_1n1.append(se_unbiased_1n1)
    mse_biased_1n1.append(se_biased_1n1)
# Calculate the mean squared error
mse_mean_unbiased_1n = np.mean(mse_unbiased_1n)
mse_mean_unbiased_1n1 = np.mean(mse_unbiased_1n1)
mse_mean_biased_1n1 = np.mean(mse_biased_1n1)
print("MSE for 1 / mean(samples):", mse_mean_unbiased_1n)
print("MSE for (n + 1) / sum(samples):", mse_mean_unbiased_1n1)
print("MSE for n / (n + 1) * max(samples):", mse_mean_biased_1n1)


MSE for 1 / mean(samples): 96.02097989805965
MSE for (n + 1) / sum(samples): 95.98159911624448
MSE for n / (n + 1) * max(samples): 0.04769239512697997
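
The first two estimators in the cell above sit near $2/\theta = 0.2$ rather than $\theta = 10$, which is why their MSE is close to $(10 - 0.2)^2 \approx 96$. If the intent was to estimate $\theta$ for $\operatorname{Unif}(0, \theta)$, two standard unbiased choices are twice the sample mean and the rescaled maximum $\frac{n+1}{n}\max\{X_1, \ldots, X_n\}$. The sketch below compares them; it is a guess at what the cell was aiming for, and the function and variable names are illustrative only.

```python
import random

def compare_uniform_estimators(theta=10.0, n=100, trials=2000, seed=0):
    """Monte Carlo MSE of 2 * mean and (n + 1) / n * max for Unif(0, theta) data."""
    rng = random.Random(seed)  # local generator so the global random state is untouched
    se_mean = 0.0
    se_max = 0.0
    for _ in range(trials):
        xs = [rng.uniform(0, theta) for _ in range(n)]
        twice_mean = 2 * sum(xs) / n        # unbiased: E[mean] = theta / 2
        scaled_max = (n + 1) / n * max(xs)  # unbiased: E[max] = n / (n + 1) * theta
        se_mean += (twice_mean - theta) ** 2
        se_max += (scaled_max - theta) ** 2
    return se_mean / trials, se_max / trials

mse_twice_mean, mse_scaled_max = compare_uniform_estimators()
print("MSE for 2 * mean:", mse_twice_mean)
print("MSE for (n + 1) / n * max:", mse_scaled_max)
```

Both estimators are unbiased, so each MSE equals the variance: $\theta^2/(3n)$ for twice the mean and $\theta^2/(n(n+2))$ for the rescaled maximum. The maximum-based estimator is therefore far more efficient at $n = 100$.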
