<a href="https://colab.research.google.com/github/RoetGer/decisions-under-uncertainty/blob/main/solved_problems.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Ex**: Assume 100 i.i.d. samples from a Poisson distribution with mean 1. What is the probability that the sum of the samples is below 90?

Solution approach:

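*   X_i ~ Pois(lambda), i = 1, ..., n, i.i.d., with n = 100 and lambda = 1
*   Let Y = sum(X_i) denote the sum of the samples
*   Note that Y = n*mean(X)
*   P(Y <= 90) = P(n*mean(X) <= 90) = P(mean(X) <= 90/n)
*   By the central limit theorem, mean(X) is approximately N(lambda, lambda/n), because a Poisson variable has both mean and variance lambda
*   Standardize: P(sqrt(n)(mean(X) - lambda)/sqrt(lambda) <= sqrt(n)(90/n - lambda)/sqrt(lambda)) ≈ Phi(sqrt(n)(90/n - lambda)/sqrt(lambda)), where Phi is the standard normal CDF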
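Plugging in n = 100 and lambda = 1, the standardized threshold is exactly -1, so the approximation reduces to a single value of the standard normal CDF Phi:

$$
P(Y \le 90) \approx \Phi\left(\frac{\sqrt{n}\,(90/n - \lambda)}{\sqrt{\lambda}}\right) = \Phi\left(\frac{\sqrt{100}\,(0.9 - 1)}{\sqrt{1}}\right) = \Phi(-1) \approx 0.1587
$$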



In [6]:
import numpy as np
from scipy.stats import norm

In [10]:
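n = 100                # number of samples
val_to_compare = 90    # threshold for the sum
pois_lambda = 1        # Poisson mean (equal to its variance)

# Standardize the threshold for the sample mean: sqrt(n) * (90/n - lambda) / sqrt(lambda)
stand_90 = np.sqrt(n)*(val_to_compare/n - pois_lambda)/np.sqrt(pois_lambda)

# Standard normal CDF at the standardized threshold
norm.cdf(stand_90, loc=0, scale=1)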

0.15865525393145707

In [11]:
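# Same probability via the CLT distribution of the sample mean: mean(X) ~ N(lambda, lambda/n)
norm.cdf(val_to_compare/n, loc=pois_lambda, scale=np.sqrt(pois_lambda/n))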

0.15865525393145707

In [13]:
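# Same probability via the approximate distribution of the sum: Y ~ N(n*lambda, n*lambda)
norm.cdf(val_to_compare, loc=n*pois_lambda, scale=np.sqrt(n*pois_lambda))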

0.15865525393145707

A quick simulation study to check the results ;)

In [22]:
# 100,000 replications of n Poisson draws; fraction of sums strictly below 90
samples = np.random.poisson(lam=pois_lambda, size=(100000, n))
np.mean(samples.sum(axis=1) < val_to_compare)

0.14677
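Before attributing the gap between 0.1587 and 0.1468 to the approximation, it helps to rule out Monte Carlo noise. A minimal sketch, assuming the usual binomial standard error sqrt(p(1-p)/N) for a proportion estimated from N = 100000 independent replications; the two probabilities are copied from the outputs above:

```python
import numpy as np

# Values copied from the notebook outputs above
p_clt = 0.15865525393145707   # normal approximation of P(Y <= 90)
p_sim = 0.14677               # simulated estimate of P(Y < 90)
n_reps = 100000               # number of simulated sums

# Binomial standard error of a simulated proportion
se = np.sqrt(p_sim * (1 - p_sim) / n_reps)

# Distance between the two estimates, measured in standard errors
z_gap = (p_clt - p_sim) / se
print(f"standard error: {se:.5f}, gap in standard errors: {z_gap:.1f}")
```

The gap is roughly ten standard errors, so simulation noise alone does not explain it.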

The difference arises because the sum Y is a discrete random variable: the event Y = 90 has positive probability. The normal (CLT) approximation is continuous and assigns probability 0 to any single value, so it cannot distinguish Y < 90 from Y <= 90. The question asks for a sum *below* 90, i.e. P(Y < 90) = P(Y <= 89), whereas the CDF evaluated at 90 approximates P(Y <= 90), which also includes the probability mass at 90.
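Because the X_i are independent Poisson variables, their sum is itself exactly Poisson, Y ~ Pois(n*lambda), so the size of the point mass at 90 can be checked directly. A minimal sketch using `scipy.stats.poisson` (the frozen distribution `y` and the variable names are illustrative, not from the original notebook):

```python
from scipy.stats import poisson

n, pois_lambda = 100, 1
# A sum of n i.i.d. Pois(lambda) variables is exactly Pois(n * lambda)
y = poisson(mu=n * pois_lambda)

p_at_90 = y.pmf(90)   # probability mass at exactly 90
p_le_90 = y.cdf(90)   # P(Y <= 90)
p_lt_90 = y.cdf(89)   # P(Y < 90) = P(Y <= 89)

print(f"P(Y = 90)  = {p_at_90:.4f}")
print(f"P(Y <= 90) = {p_le_90:.4f}")
print(f"P(Y < 90)  = {p_lt_90:.4f}")
```

The difference P(Y <= 90) - P(Y < 90) is exactly the mass at 90 that the continuous approximation cannot see.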

In [23]:
# Normal approximation of P(Y <= 89), which equals P(Y < 90) for the integer-valued sum
norm.cdf(89, loc=n*pois_lambda, scale=np.sqrt(n*pois_lambda))

0.13566606094638267
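Evaluating at 89 now undershoots the simulated value. If each integer k is thought of as the interval [k - 0.5, k + 0.5] under the normal density, cutting at 89 drops the upper half of the mass for 89. The standard textbook remedy is the continuity correction, evaluating the normal CDF at 89.5. A sketch, comparing it with the exact Poisson value (the correction is a swapped-in technique, not part of the original notebook):

```python
import numpy as np
from scipy.stats import norm, poisson

n, pois_lambda = 100, 1
mu, sigma = n * pois_lambda, np.sqrt(n * pois_lambda)

# Continuity correction: P(Y < 90) = P(Y <= 89), approximated by P(N <= 89.5)
p_cc = norm.cdf(89.5, loc=mu, scale=sigma)

# Exact value for comparison, since Y ~ Pois(n * lambda)
p_exact = poisson.cdf(89, mu)

print(f"continuity-corrected normal: {p_cc:.4f}, exact Poisson: {p_exact:.4f}")
```

The corrected value, Phi(-1.05) ≈ 0.1469, lands close to the simulated 0.14677.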

**Ex2**: Conduct a t-test in Python.

In [24]:
import statsmodels.api as sm


In [None]:
sm.stats.
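The cell above is unfinished. As a hedged sketch of one way to run a two-sample t-test, the example below uses `scipy.stats.ttest_ind` on hypothetical simulated data (group names, sizes and means are illustrative assumptions) and recomputes the pooled-variance t statistic by hand to show what the function computes. statsmodels offers a comparable `ttest_ind` in `statsmodels.stats.weightstats`, which also returns the degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two independent groups (not from the original notebook)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=0.5, scale=1.0, size=50)

# Two-sided two-sample t-test assuming equal variances (pooled)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)

# Same statistic by hand: pooled variance and standard error of the mean difference
n_a, n_b = len(group_a), len(group_b)
dof = n_a + n_b - 2
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / dof
se_diff = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_manual = (group_a.mean() - group_b.mean()) / se_diff

# Two-sided p-value from the t distribution with n_a + n_b - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=dof)

print(f"scipy:  t = {t_stat:.4f}, p = {p_value:.4g}")
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4g}")
```

Passing `equal_var=False` instead gives Welch's t-test, which drops the equal-variance assumption.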