#### Problem 1:
A stock share in the company Temple Inc. has a price $Y_n$ on the $n$th business day of the year. You observe that the price change $X_n = Y_{n+1} - Y_n$ appears to be a random variable with mean 0 and variance 1/4. If $Y_1 = 30$, find a lower bound for the following probabilities (under the assumption that the $X_n$'s are mutually independent):

(a) $Pr(25 < Y_2 < 35)$

(b) $Pr(25 < Y_{11} < 35)$

(c) $Pr(25 < Y_{101} < 35)$

Let $C_n$ be the total change in the stock price over the first $n$ price changes. Because the $X_i$ are independent, their means and variances add:

$$C_n = \sum_{i=1}^n X_i$$

$$E(C_n) = 0$$

$$Var(C_n) = n/4$$

$$Std(C_n) = \sqrt{n}\ /\ 2$$

$$Y_{n+1} = 30 + C_n$$


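By Chebyshev's inequality, $Pr(|C_n| < k\sigma) \ge 1 - \frac{1}{k^2}$ with $\sigma = Std(C_n)$:

$$Pr(25 < Y_2 < 35)  = Pr(-5 < C_1 < 5) = Pr(-10\sigma < C_1 < 10\sigma) \ge 1 - \frac{1}{100}\ =\ 0.99$$

$$Pr(25 < Y_{11} < 35)\ =\ Pr(-5 < C_{10} < 5)\ =\ Pr(-\sqrt{10}\ \sigma < C_{10} < \sqrt{10}\ \sigma)\ \ge\ 1 - \frac{1}{10}\ =\ 0.9$$

$$Pr(25 < Y_{101} < 35) = Pr(-5 < C_{100} < 5) = Pr(-1\sigma < C_{100} < 1\sigma) \ge 1 - \frac{1}{1}\ =\ 0$$

For $n = 100$, Chebyshev's inequality gives only the trivial bound 0.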
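All three bounds come from the same expression, $Pr(|C_n| < 5) \ge 1 - Var(C_n)/5^2$ (clipped at 0), so they can be checked with a few lines of Python. This is a minimal sketch; the helper name `chebyshev_lower_bound` and its default arguments are our own choices, not part of the problem:

```python
# Chebyshev lower bound: Pr(|C_n| < a) >= 1 - Var(C_n) / a^2, clipped at 0
def chebyshev_lower_bound(n, a=5.0, step_var=0.25):
    var = n * step_var          # Var(C_n) = n * Var(X_i) for independent steps
    return max(0.0, 1.0 - var / a**2)

for label, n in [("a", 1), ("b", 10), ("c", 100)]:
    print(f"part [{label}]: n = {n:3d}, lower bound = {chebyshev_lower_bound(n):.2f}")
```

The bound holds for any distribution of the price changes with this mean and variance, which is why it is much weaker than the Normal results in Problem 2.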

#### Problem 2: 
Repeat Problem 1, but assume that the price changes are Normally distributed (with the same mean and variance: 0 and 1/4, respectively).

As before, $C_n$ has $E(C_n) = 0$ and $Std(C_n) = \sqrt{n}\ /\ 2$. Because a sum of independent Normal variables is itself Normal, $C_n \sim N(0, n/4)$ exactly, so we integrate the Normal density from $-5$ to $5$ to get each probability:

```python
import numpy as np
import scipy.stats as stats
```

```python
def std(n):
    """Standard deviation of C_n, the sum of n independent changes with variance 1/4."""
    return np.sqrt(n)/2

TB = "\t"

# Pr(-5 < C_n < 5) for n = 1, 10, 100 price changes (parts a, b, c)
for part, n in [("a", 1), ("b", 10), ("c", 100)]:
    nn = stats.norm(loc=0, scale=std(n))
    intg = nn.cdf(5) - nn.cdf(-5)
    print(f"part [{part}]:" + TB + f"{intg:0.3f}")
```


```
part [a]:	1.000
part [b]:	0.998
part [c]:	0.683
```
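As a rough sanity check (not part of the original solution), one can simulate the price paths directly and count how often the final price lands in $(25, 35)$. The sketch below assumes Normal daily changes with standard deviation 0.5; the seed, the number of paths and the helper name `simulated_prob` are arbitrary choices, and the estimates should only agree with the integrated values to about two decimal places:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
n_paths = 50_000                 # number of simulated price paths (arbitrary)

def simulated_prob(n, start=30.0):
    # Simulate n independent N(0, 1/4) changes (sd = 0.5) for every path
    steps = rng.normal(loc=0.0, scale=0.5, size=(n_paths, n))
    y_final = start + steps.sum(axis=1)      # Y_{n+1} = 30 + C_n
    # Fraction of paths whose final price lies strictly between 25 and 35
    return np.mean((y_final > 25) & (y_final < 35))

for part, n in [("a", 1), ("b", 10), ("c", 100)]:
    print(f"part [{part}]:\t{simulated_prob(n):0.3f}")
```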


#### Problem 3: 
At a widget factory, the machines produce about 5 percent defective widgets even when properly adjusted. The widgets are then packed in crates containing 1900 widgets each. A crate is examined and found to contain 115 defective widgets. What is the approximate probability of finding at least this many defective widgets if the machine is properly adjusted?

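The number of defective widgets in a crate is binomial, like counting heads in $n = 1900$ flips of a coin with $p = 0.05$. By the Central Limit Theorem it is approximately Normal with mean $np = 95$ and standard deviation $\sqrt{np(1-p)} = 9.5$, so we integrate that Normal density from 115 to infinity: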

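```python
# Defectives ~ Binomial(n, p), approximated by a Normal with the same mean and variance
n, p = 1900, 0.05
nn = stats.norm(loc=n*p, scale=np.sqrt(n*p*(1-p)))
intg = 1 - nn.cdf(115)    # Pr(X > 115) under the Normal approximation
print("Problem 3:" + TB + f"{intg:0.3f}")
```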


```
Problem 3:	0.018
```
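The calculation above integrates from exactly 115, with no continuity correction. As a hedged comparison (not in the original solution), scipy can also give the continuity-corrected Normal value and the exact binomial tail; `stats.norm.sf` and `stats.binom.sf` are scipy's survival functions, $1 - \text{cdf}$:

```python
import numpy as np
import scipy.stats as stats

n, p, k = 1900, 0.05, 115
mu, sd = n * p, np.sqrt(n * p * (1 - p))   # 95 and 9.5

# Normal approximation as in the solution above: Pr(X > 115)
plain = stats.norm.sf(k, loc=mu, scale=sd)
# Same approximation with a continuity correction: Pr(X >= 115) ~ Pr(Normal > 114.5)
corrected = stats.norm.sf(k - 0.5, loc=mu, scale=sd)
# Exact binomial tail: Pr(X >= 115) = Pr(X > 114)
exact = stats.binom.sf(k - 1, n, p)

print(f"normal:              {plain:0.4f}")
print(f"normal + correction: {corrected:0.4f}")
print(f"exact binomial:      {exact:0.4f}")
```

All three are small, so the conclusion (115 defectives is unusual for a properly adjusted machine) does not depend on which one is used.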


#### Problem 4: 
In an opinion poll it is assumed that an unknown proportion p of the people are in favor of a proposed new law and a proportion 1 − p are against it. A sample of n people is taken to obtain their opinion. The proportion $\bar{p}$ in favor in the sample is taken as an estimate of p. Using the Central Limit Theorem, determine how large a sample will ensure that the estimate will, with probability .95, be correct to within .01.

By the Central Limit Theorem, the estimate $\bar{p}$ is approximately Normally distributed with $E(\bar{p}) = p$   and   $Var(\bar{p}) = \frac{p(1-p)}{n}$

The estimation error $\bar{p} - p$ is therefore approximately Normal with mean 0 and the same variance, so the probability that the error lies within $\pm 0.01$ can be computed as:

```python
# Pr(|p_bar - p| < 0.01) for given values of p and n (set p and n before running)
nn = stats.norm(loc=0, scale=np.sqrt(p*(1-p)/n))
intg = nn.cdf(0.01) - nn.cdf(-0.01)
```

... for given values of $n$ and $p$. Since $p$ is unknown, we take the worst case, $p = 0.5$, which maximizes the variance $p(1-p)$. Then we can increase $n$ one step at a time until the probability reaches 0.95:

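```python
p = 0.5     # worst case: maximizes p(1-p)
intg = 0
n = 10

# Increase n one at a time until Pr(|p_bar - p| < 0.01) reaches 0.95
while intg < 0.95:
    n = n + 1
    nn = stats.norm(loc=0, scale=np.sqrt(p*(1-p)/n))
    intg = nn.cdf(0.01) - nn.cdf(-0.01)

print("Problem 4:" + TB + "n = ", n)
```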

```
Problem 4:	n =  9604
```


You can also get a good estimate by noting that for a Normal distribution, about 95% of the probability lies within $\pm 2\sigma$ of the mean (more precisely, within $\pm 1.96\sigma$). Therefore we want $2\sigma = 0.01$. Knowing that $\sigma^2 = p(1-p)\ /\ n$, we can work out:

$$ \left(\frac{0.01}{2}\right)^2 = \frac{0.5\cdot0.5}{n}$$

$$n = \frac{0.5\cdot0.5}{(0.005)^2} = 10{,}000$$

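This is close to the value of 9604 found iteratively; the difference comes from using $2\sigma$ instead of the more precise $1.96\sigma$.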
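Instead of iterating, the condition can also be solved directly: $n \ge (z/0.01)^2\, p(1-p)$ with $z = \Phi^{-1}(0.975) \approx 1.96$. A minimal sketch of that closed form (the function name `min_sample_size` and its parameters are our own, hypothetical helper):

```python
import math
import scipy.stats as stats

def min_sample_size(p, eps=0.01, conf=0.95):
    """Smallest n with Pr(|p_bar - p| < eps) >= conf under the Normal approximation."""
    z = stats.norm.ppf(0.5 + conf / 2)     # two-sided critical value, about 1.96 for 95%
    return math.ceil((z / eps) ** 2 * p * (1 - p))

print(min_sample_size(0.5))   # prints 9604, matching the iterative search
```

Rounding these values up to the next multiple of 10 should reproduce the step-of-10 table that follows.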

We can confirm that p=0.5 is the worst case by repeating Problem 4 for other values of p. In each of those cases, a smaller sample is enough to reach the required precision. (Here I'm increasing n in steps of 10 instead of 1 just to make it run faster, so each required sample size is accurate to within 10 people.)

```python
print("p" + TB + "sample size")

for p in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    intg, n = 0, 0

    # Step n up by 10 until Pr(|p_bar - p| < 0.01) reaches 0.95
    while intg < 0.95:
        n = n + 10
        nn = stats.norm(loc=0, scale=np.sqrt(p*(1-p)/n))
        intg = nn.cdf(0.01) - nn.cdf(-0.01)

    print(p, TB, n)
```
    

```
p	sample size
0.1 	 3460
0.2 	 6150
0.3 	 8070
0.4 	 9220
0.5 	 9610
0.6 	 9220
0.7 	 8070
0.8 	 6150
0.9 	 3460
```
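The shape of this table follows from the variance term $p(1-p)$. Under the same Normal approximation, with $z \approx 1.96$ and $\varepsilon = 0.01$, a short derivation:

$$n(p) = \left(\frac{z}{\varepsilon}\right)^2 p(1-p), \qquad \frac{d}{dp}\,p(1-p) = 1 - 2p = 0 \;\Rightarrow\; p = \tfrac{1}{2}$$

$$p(1-p) = \tfrac{1}{4} - \left(p - \tfrac{1}{2}\right)^2 \le \tfrac{1}{4}$$

Since $p(1-p)$ is unchanged when $p$ is replaced by $1-p$, the required sample sizes are symmetric about $p = 0.5$, where they are largest.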
