In [56]:
import numpy as np
from RNG.ContinuousRV import ParetoInverse

# Exercise 8 - task 1)

The main issue with calculating the given probability is getting the unknown mean.

We know that the expected value of our empirical CDF of our bootstrapped samples follows the unknown distribution, and we can therefore estimate this unknown mean using our empirical CDFs.

After having derived a series of empirical means $\bar{X}$, we know from the law of large numbers, that as the sample size tends to infinity $E[\bar{X}] = E[X] = \mu$.  We know that the MSE of the boostrapped estimation of the mean is the empirical variance over n-squared, so we will also print this.


# Exercise 8 - task 2)

See the code below.

In [68]:
def bootStrap(data_vector):
    U_disc = np.random.randint(0,len(data_vector),len(data_vector))
    return [data_vector[u] for u in U_disc]

n = 10
X_i = np.array([56,101,78,67,93,87,64,72,80,69])
X_bar = []
MSE_sample = []

for r in range(100):
    X_sample = bootStrap(X_i)
    X_bar += [np.mean(X_sample)]
    MSE_sample += [np.sum((np.array(X_sample) - np.mean(X_sample))**2)/n**2]


mu = np.mean(X_bar)
print(f"This is the expectation of our MSE: {np.mean(MSE_sample)}")


X = (X_i.sum()/n - mu)
p = ((X>-5) & (X<5)).sum()/n
print(f"This is our empirical estimate of the given probability:{p}")

This is the expectation of our MSE: 15.18222
This is our empirical estimate of the given probability:0.1


# Exercise 8 - task 2)

For each sample, we approximate a sample mean and based off of that we calculate a simple example of a sample variance.

Then we take the expected value of the variance as our best estimate of the variance. We know that the sample variance is an unbiased estimator.



In [69]:
X_i = [5,4,9,6,21,17,11,20,7,10,21,15,13,16,8]
X_bar = []
S_bar = []

for r in range(100):
    X_sample = bootStrap(X_i)
    S_bar += [np.sum((np.array(X_sample) - np.mean(X_sample))**2)/(n-1)]

print(f"Our best estimate of Var[X]: {np.mean(S_bar)}")

Our best estimate of Var[X]: 49.137777777777785


# Exercise 8 - Task 3)

Code and printouts should be sufficient.

In [70]:
pareto_sample = ParetoInverse(np.random.uniform(0,1,200),1.05,1) # Pareto>=Beta
sample_mean = np.mean(pareto_sample)
theoretical_mean = 1*(1.05/(1.05-1))
sample_med = np.median(pareto_sample)

In [71]:
X_bar = []
MSE_sample = []

for r in range(100):
    X_sample = bootStrap(pareto_sample)
    X_bar += [np.mean(X_sample)]
    MSE_sample += [np.sum((np.array(X_sample) - np.mean(X_sample))**2)/n**2]

print(f"Bootstrap estimate of Var[X_bar]: {np.var(X_bar)}")

Bootstrap estimate of Var[X_bar]: 7.641023635022513


In [67]:
X_med = []
MSE_sample = []

for r in range(100):
    X_sample = bootStrap(pareto_sample)
    X_med += [np.median(X_sample)]
    MSE_sample += [np.sum((np.array(X_sample) - np.mean(X_sample))**2)/n**2]

print(f"Bootstrap estimate of Var[X_bar]: {np.var(X_med)}")

Bootstrap estimate of Var[X_bar]: 0.013167737197817641


# Task 3,d)

We see that the variance of the mean is much larger than that of the median. We believe this is due to the way that the Pareto distribution is heavy-tailed, such that the median will typically be at around 0 or beta (depending on the domain of the distribution,) whereas the mean can vary quite a bit based off of which observations we see.

That is, when we sample, we rarely see samples from the far right of the distribution's domain, so therefore if we do see such a sample in one of the bootstrapped samples, it will change the overall mean.