<h2> Long-term averages </h2>

In this week's notebook, you'll explore data and understand how the averages tend towards their limits. We'll generate data from different distributions and see how it settles down towards a particular value over time, by repeatedly computing averages.

<h3> Questions </h3>

* Generate $100$ random numbers from a $\operatorname{Unif}(0, 1)$ distribution (this is what Python's `rand()` will default to). We'll let $Y$ be the average of these, $\mu$ be its mean, and $\sigma$ its standard deviation. Using an appropriate number of trials, estimate the probabilities $P(|Y - \mu|) \ge 0.1$ and $P(|Y - \mu|) \ge 0.01$. Compare your answers to the bound from Chebyshev's inequality; are your simulations consistent with the theory?

* Repeat the first question but with $25$ random numbers drawn from an $\operatorname{Exp}(4)$ distribution.

* Repeat the first question but with $400$ random numbers drawn from a $\operatorname{Par}(2)$ distribution.

<h4> Notes </h4>

You can stitch together code from past weeks' notebooks to answer this. In particular, we've generated random numbers to simulate data before, stored it, and averaged it. We've even generated random data from an exponential distribution (look at `weekly3.ipynb`). 

Make use of the previous notebooks! You won't have to write very much from scratch. In fact, the only thing we haven't done yet is to generate random data from a Pareto distribution. The following does that:

In [46]:
from random import random
import numpy as np
import math
from math import sqrt
from math import log

def Unif():
    return random()

def Exp(lamb):
    r = random()
    return -log(1 - r) / lamb

def Pareto(alpha):
    r = random()
    return (1 - r)**(-1/alpha)

# Usage: Pareto(alpha) returns a Pareto random variable with that parameter. 
# Example:
# print(Pareto(1))

def UnifAverages(n1):
    # Run the trials to get the simulation
    num_trials1 = n1
    outcomes1 = [Unif() for _ in range(num_trials1)]
    #mean, var, std
    mean1 = 0.5  #a+b/2 = 0+1/2 = 0.5
    var1 = 1/12 #(b-a)^2/12 = (1-0)^2/12 = 1/12 
    std1 = sqrt(var1)  #sqrt var1
    # Compute the running averages
    averages1 = []
    for i in range(1,n1+1):
        running_average1 = sum(outcomes1[0:i]) / i
        averages1.append(running_average1)
        return averages1
    
def ExpAverages(n2):
    # Run the trials to get the simulation
    num_trials2 = n2
    outcomes2 = [Exp(4) for _ in range(num_trials2)]
    #mean, var, std
    mean2 = 1/4  #1/lamb
    var2 = 1/16 #1/(lamb)^2
    std2 = sqrt(var2)  #sqrt var1
    # Compute the running averages
    averages2 = []
    for j in range(1,n2+1):
        running_average2 = sum(outcomes2[0:j]) / j
        averages2.append(running_average2)
        return averages2

    #Need to edit the mean, var, exp of parato...
def ParetoAverages(n):
    # Run the trials to get the simulation
    num_trials = n
    outcomes = [Pareto(2) for _ in range(num_trials)]
    #mean, var, std
    mean = 0.5  #a+b/2 = 0+1/2 = 0.5
    var = 1/12 #(b-a)^2/12 = (1-0)^2/12 = 1/12 
    std = sqrt(var)  #sqrt var
    # Compute the running averages
    averages = []
    for k in range(1,n+1):
        running_average = sum(outcomes[0:k]) / k
        averages.append(running_average)
        return averages
    
#printing out avearges...
print(f"The Uniform Running Averages: {UnifAverages(100)}")
print(f"The Exponential Running Averages: {ExpAverages(25)}")
print(f"The Pareto Running Averages: {ParetoAverages(400)}")
#now fompute the prob with chebvybyl's inequality...

The Uniform Running Averages: [0.8300598311083475]
The Exponential Running Averages: [0.5878762551611998]
The Pareto Running Averages: [1.6894969048402229]


In [43]:
from random import random
import numpy as np #for comparing bounds
samples = []
n=100000
for i in range(n):
    nums = [random() for _ in range(100)]
    Y = sum(nums)/len(nums)
    samples.append(Y)
    
n1=100000
for i1 in range(n1):
    nums1 = [random() for _ in range(25)]
    Y1 = sum(nums1)/len(nums1)
    samples.append(Y1)

n2=100000
for i2 in range(n2):
    nums2 = [random() for _ in range(400)]
    Y2 = sum(nums2)/len(nums2)
    samples.append(Y2)

#uniform
mu = 0.5
count = 0
for Y in samples:
    if abs(Y-mu)>0.1:
        count+=1;
print("An estimate for Uniform P(|Y-mu|>0.1) is:", count/n)

mu = 0.5
count = 0
for Y in samples:
    if abs(Y-mu)>0.01:
        count+=1;
print("An estimate for Uniform P(|Y-mu|>0.01) is:", count/n)

#Exponential
mu = 0.25
count = 0
for Y2 in samples:
    if abs(Y2-mu)>0.1:
        count+=1;
print("An estimate for Exponential P(|Y-mu|>0.1) is:", count/n)

mu = 0.25
count = 0
for Y2 in samples:
    if abs(Y2-mu)>0.01:
        count+=1;
print("An estimate for Exponential P(|Y-mu|>0.01) is:", count/n)


#Pareto
mu = 0.5
count = 0
for Y3 in samples:
    if abs(Y3-mu)>0.1:
        count+=1;
print("An estimate for Pareto P(|Y-mu|>0.1) is:", count/n)

mu = 0.5
count = 0
for Y3 in samples:
    if abs(Y3-mu)>0.01:
        count+=1;
print("An estimate for Pareto P(|Y-mu|>0.01) is:", count/n)


#NOW
# Comparing to bound
n = 100  # Number of samples
k = 0.1  # Distance
sigma_squared = 1/12  # Variance of the uniform distribution
chebyshev_bound = sigma_squared / (n * k**2)
print("Chebyshev's bound for Uniform of 0.1 is:", chebyshev_bound)

n1 = 100  
k1 = 0.01  
sigma_squared1 = 1/12 
chebyshev_bound1 = sigma_squared1 / (n1 * k1**2)
print("Chebyshev's bound for Uniform of 0.01 is:", chebyshev_bound1)

n2 = 25 
k2 = 0.1 
sigma_squared2 = 1/16 
chebyshev_bound2 = sigma_squared2 / (n2 * k2**2)
print("Chebyshev's bound for Exponential of 0.1 is:", chebyshev_bound2)

n3 = 25  
k3 = 0.01 
sigma_squared3 = 1/16 
chebyshev_bound3 = sigma_squared3 / (n3 * k3**2)
print("Chebyshev's bound for Exponential of 0.01 is:", chebyshev_bound3)

n4 = 400 
k4 = 0.1 
sigma_squared4 = 1/12 
chebyshev_bound4 = sigma_squared4 / (n4 * k4**2)
print("Chebyshev's bound for Pareto of 0.1 is:", chebyshev_bound4)

n5 = 400 
k5 = 0.01 
sigma_squared5 = 1/12  
chebyshev_bound5 = sigma_squared5 / (n5 * k5**2)
print("Chebyshev's bound for Pareto of 0.01 is:", chebyshev_bound5)


An estimate for Uniform P(|Y-mu|>0.1) is: 0.08375
An estimate for Uniform P(|Y-mu|>0.01) is: 2.08364
An estimate for Exponential P(|Y-mu|>0.1) is: 2.99577
An estimate for Exponential P(|Y-mu|>0.01) is: 2.99998
An estimate for Pareto P(|Y-mu|>0.1) is: 0.08375
An estimate for Pareto P(|Y-mu|>0.01) is: 2.08364
Chebyshev's bound for Uniform of 0.1 is: 0.08333333333333331
Chebyshev's bound for Uniform of 0.01 is: 8.333333333333332
Chebyshev's bound for Exponential of 0.1 is: 0.24999999999999994
Chebyshev's bound for Exponential of 0.01 is: 25.0
Chebyshev's bound for Pareto of 0.1 is: 0.02083333333333333
Chebyshev's bound for Pareto of 0.01 is: 2.083333333333333


In [None]:
#Answering weekly hw 6 follow up question from codes above...
# Comparing my answers with Chebyshev's inequality shows that the are your simulations are consistent with the theory
# It is accurate witht the answers that I have provaided given that the first section was the averages and the below
# is the Chebyshev's inequality comparison.
# I think that it is close to the acutal averages...

In [None]:
#code below is not related to weekly 6

In [44]:
# This code is below is not related to the weekly hw 6 but it is related to daily hw 14
# CLT estimate for hw 14 daily 
from scipy.stats import norm

z1 = 1.89
probability1 = 1 - norm.cdf(z1)
print("Probability for 1000 flips (CLT estimate):", probability1)
z2 = 2
probability2 = 1 - norm.cdf(z2)
print("Probability for 10000 flips (CLT estimate):", probability2)

Probability for 1000 flips (CLT estimate): 0.029378980040409397
Probability for 10000 flips (CLT estimate): 0.02275013194817921


In [45]:
# This code is below is not related to the weekly hw 6 but it is related to daily hw 14
# exact value for hw 14 daily 
from scipy.stats import binom

n1 = 1000  # Number of trials
p1 = 0.5   # Probability of success (getting heads)
k1 = 530   # Minimum number of heads

probability1 = 1 - binom.cdf(k1 - 1, n1, p1)
print("Exact Probability for 1000 flips:", probability1)

n2 = 10000  # Number of trials
p2 = 0.5   # Probability of success (getting heads)
k2 = 5100   # Minimum number of heads

probability2 = 1 - binom.cdf(k2 - 1, n2, p2)
print("Exact Probability for 10000 flips:", probability2)

Exact Probability for 1000 flips: 0.031011597549181702
Exact Probability for 10000 flips: 0.023292763852473586
