# Problem Set 2, Part Two: Due Tuesday, February 4 by 8am Eastern Standard Time

## Name:

**Show your work on all problems!** Be sure to give credit to any
collaborators or outside sources used in solving the problems. Note
that if you use an outside source to do a calculation, you should use it
as a reference for the method and actually carry out the calculation
yourself; it’s not sufficient to quote the results of a calculation
contained in an outside source.

Fill in your solutions in the notebook below, inserting markdown and/or code cells as needed.  Try to do reasonably well with the typesetting, but don't feel compelled to replicate my formatting exactly.  **You do NOT need to make random variables blue!**

In [None]:
%matplotlib inline

In [None]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (8.0,5.0)
plt.rcParams['font.size'] = 14

### Bonus for Correct Filename

Your submitted version of the notebook should have a filename `ps02_2_lastname.ipynb` where `lastname` should be replaced by your last name, in all lowercase letters.  You'll get a bonus point here if this was done correctly.

### Estimating the Power Curve for a Two-Tailed $t$-test

Consider a sample of size $n=40$ drawn from a normal distribution of
mean $\mu$ and variance $\sigma^2$. A two-tailed $t$-test of
significance $\alpha=0.09$ rejects the null hypothesis $H_0$: $\mu=0$ in
favor of the alternative hypothesis $\mu\ne 0$ when
$$\left\lvert\frac{\overline{x}}{s/\sqrt{n}}\right\rvert \ge t_{n-1,0.955}$$
(here $0.955=1-\alpha/2$). Determine the power curve numerically, for the
case where the sampling distribution is normal, as follows.

**(a)**  Generate $N=10^4$ samples of $n=40$ points each from a standard
    normal distribution $N(0,1)$ as in last week’s problem set, and
    determine the sample mean and sample standard deviation of each
    (which should be stored in $N$-point vectors `xbar_I` and `s_I`).

**(b)** Create a vector of $\mu$ values with

In [None]:
mu_m = np.linspace(-1,1,101)

**(c)**  Explain why we can use a sample drawn from $N(0,1)$ as a “stand-in”
    for a sample drawn from $N(\mu,1)$ by making the transformation
    ${{\overline{x}}}\rightarrow{{\overline{x}}}+\mu$ and
    $s\rightarrow s$.  (We sketched out the $\overline{x}$ transformation in lesson 02.1, but be sure to explain the $s$ transformation.) This means we won’t have to re-generate ten
    thousand $40$-point samples for each value of $\mu$; we can just
    adjust the ten thousand ${{\overline{x}}}$ and $s$ values and use those
    to construct the test statistic.
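
A quick numerical sanity check of this transformation can be sketched as follows. It is not a substitute for the explanation requested above. The seed, the shift `mu = 0.7`, and the variable names `z`, `x`, `xbar_z`, `s_z`, `xbar_x`, `s_x` are arbitrary choices made for illustration and do not appear elsewhere in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n = 40
mu = 0.7                         # hypothetical shift, for illustration only

z = rng.standard_normal(n)       # one sample drawn from N(0,1)
x = z + mu                       # the same points shifted by mu

# Sample mean and sample standard deviation (ddof=1) before and after the shift
xbar_z, s_z = z.mean(), z.std(ddof=1)
xbar_x, s_x = x.mean(), x.std(ddof=1)

print(np.isclose(xbar_x, xbar_z + mu))   # the mean moves by mu
print(np.isclose(s_x, s_z))              # the standard deviation is unchanged
```

Both checks agree to floating-point precision, which is what the transformation above claims.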

**(d)**  Produce a $101\times 10^4$ array of $t=\frac{{{\overline{x}}}}{s/\sqrt{n}}$ values using vectorization with:

In [None]:
t_mI = (mu_m[:,None] + xbar_I[None,:]) / (s_I[None,:]/np.sqrt(n))

(This command will only work if you've defined `n` and constructed `xbar_I` and `s_I` correctly above.)
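
To see how the indexing with `None` produces a two-dimensional array, here is a small sketch with toy inputs. The three $\mu$ values and the four sample means and standard deviations are made up for illustration; they are not the arrays you should build in parts (a) and (b).

```python
import numpy as np

n = 40
mu_m = np.array([-1.0, 0.0, 1.0])          # 3 toy mu values
xbar_I = np.array([0.1, -0.2, 0.0, 0.3])   # 4 made-up sample means
s_I = np.array([1.0, 0.9, 1.1, 1.0])       # 4 made-up sample standard deviations

# mu_m[:,None] has shape (3,1); xbar_I[None,:] and s_I[None,:] have shape (1,4).
# Broadcasting combines them into one row per mu value and one column per sample.
t_mI = (mu_m[:, None] + xbar_I[None, :]) / (s_I[None, :] / np.sqrt(n))

print(t_mI.shape)   # (3, 4)
```

With the real arrays the same expression gives one row for each of the 101 $\mu$ values and one column for each of the $10^4$ samples.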

**(e)**  For each of the 101 $\mu$ values, find the fraction of $t$ scores
    which lie in the critical region
    ${\left\lvert t\right\rvert}\ge t_{n-1,0.955}$, using a construction like (you'll have to use the appropriate command to define `tcrit` to be $t_{n-1,0.955}$)

In [None]:
gamma_m = np.mean(np.abs(t_mI) >= tcrit,axis=-1)
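
The construction works because the comparison yields a boolean array, and averaging booleans along the last axis gives the fraction of `True` entries in each row. A toy sketch follows; the $t$ values and the threshold `tcrit_toy` are made up, and `tcrit_toy` is NOT the critical value $t_{n-1,0.955}$.

```python
import numpy as np

# Two rows (two hypothetical mu values), four made-up t scores each
t_mI = np.array([[0.5, -2.5, 3.0, 1.0],
                 [2.1, 2.2, -0.1, -3.0]])
tcrit_toy = 2.0   # made-up threshold for illustration, not t_{n-1,0.955}

in_critical_region = np.abs(t_mI) >= tcrit_toy      # boolean array, shape (2, 4)
gamma_m = np.mean(in_critical_region, axis=-1)      # fraction of True per row

print(gamma_m)
```

Here two of the four scores in the first row and three of the four in the second row fall in the toy critical region, so the fractions are 0.5 and 0.75.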

**(f)**  Plot $\gamma(\mu)$ versus $\mu$, and verify that $\gamma(0)=\alpha$.

### Confidence Interval for Proportion

Consider the Clopper-Pearson confidence interval for population
proportion, as tabulated in Table A4 of Conover and calculated by

In [None]:
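def ClopperPearsonCI(CL,n,x):
    """Clopper-Pearson interval (lower,upper) for x successes in n trials at confidence level CL."""
    # Split the probability outside the interval equally between the two tails
    tailprob = 0.5*(1.-CL)
    # The endpoints are quantiles of beta distributions
    lower = stats.beta.ppf(tailprob,x,n-x+1)
    upper = stats.beta.isf(tailprob,x+1,n-x)
    # x=0 makes the first beta parameter zero, so ppf returns nan; the lower limit is 0
    lowernan = np.isnan(lower)
    if isinstance(lowernan,np.ndarray):
        lower[lowernan] = 0.
    elif lowernan:
        lower = 0.
    # x=n makes the second beta parameter zero, so isf returns nan; the upper limit is 1
    uppernan = np.isnan(upper)
    if isinstance(uppernan,np.ndarray):
        upper[uppernan] = 1.
    elif uppernan:
        upper = 1.

    return (lower,upper)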

The second half of the function, with all of the `if` statements, makes sure that the function behaves correctly when one of the ends of the confidence interval is $0$ or $1$ (that is, when $x=0$ or $x=n$), whether `x` is a single number or an array.
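
The reason this handling is needed can be seen by calling the beta quantile functions directly at the edge cases. The sketch below assumes, as the function above does, that SciPy returns `nan` when a beta shape parameter is zero. The names `lower_raw` and `upper_raw` and the choices `n = 30`, `tailprob = 0.05` and `x = [0, 15, 30]` are for illustration only.

```python
import numpy as np
from scipy import stats

n = 30
tailprob = 0.05               # illustrative tail probability
x = np.array([0, 15, 30])     # includes both edge cases x=0 and x=n

# Raw endpoints, before any nan replacement
lower_raw = stats.beta.ppf(tailprob, x, n - x + 1)
upper_raw = stats.beta.isf(tailprob, x + 1, n - x)

print(np.isnan(lower_raw))    # nan expected only where x = 0
print(np.isnan(upper_raw))    # nan expected only where x = n
```

Replacing those `nan` entries with $0$ and $1$ respectively is what the `if` statements in the function do.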

**(a)**  Suppose we have a binomial experiment with $n=30$ trials. For what
    values of $x$, the number of successes, does the 90% confidence
    interval contain $p=0.20$?

**(b)** Suppose that the true value of $p$ is in fact $0.20$.  What is the total probability that the observed value of $x$ will be one of those listed in part (a)?

Compare this actual confidence level to the requested confidence level of 90%.

**(c)**  Repeat the calculations in parts (a) and (b) for a confidence level of 97% and a true
    proportion of $p=0.35$. (You’ll have to use software for this, since
    these values are not in the tables.)