# Chapter 10 Notes


One way of defining <b>hypothesis testing</b> is as the process of determining whether there is sufficient evidence against a <b>null hypothesis</b> $H_0: \theta \in \Theta_0$. Here $\theta$ is the parameter of the (unknown) distribution of $X$, and $\Theta_0$ is one element of a two-set partition of the parameter space $\Theta$ (so that $\theta \in \Theta$, $\Theta_0 \cap \Theta_1 = \emptyset$ and $\Theta_0 \cup \Theta_1 = \Theta$).

In short, the problem is to either retain the <b>null hypothesis</b> $H_0$ or reject it in favor of the <b>alternative hypothesis</b> $H_1$: 
    
$$ 
H_0 : \theta \in \Theta_0 \\
H_1 : \theta \in \Theta_1
$$



The rejection process usually takes the form:  

$\quad\text{ Reject } H_0$ if $X \in R$ where  $R = \{x : T(x) > c\}$<br> 

$R$ is referred to as the <b>rejection region</b>. <br> 
The crux of the problem is to choose the right <b>test statistic</b> $T(X)$ and the right <b>critical value</b> $c$ so that both false rejections of $H_0$ (type I errors) and false retentions of $H_0$ (type II errors) are kept small. 


For a given test and a model with density $f(x;\theta)$, the <b>power function</b> $\beta(\theta)$ is defined as: 
$$\beta(\theta) = \mathbb{P}_\theta(X \in R)$$

The <b>size</b> of the test is the largest probability of committing a type I error: 
$$\alpha = \sup_{\theta\in\Theta_0} \beta(\theta)$$

Tests with high power for $\theta\in\Theta_1$ are desirable, but they can be difficult to find. This chapter covers four hypothesis tests: (1) the Wald test, (2) the $\chi^2$ test, (3) the permutation test and (4) the likelihood ratio test.


### The Wald test ###

The Wald test is useful when $\theta$ is a scalar and the hypotheses have the form $H_0: \theta = \theta_0$ vs. $H_1: \theta \neq \theta_0$. <br> 

To perform the Wald test, both an (asymptotically Normal) estimator $\hat\theta$ and its estimated standard error $\hat{se}$ must be calculated. <br> 
The Wald statistic is $W = (\hat\theta - \theta_0 )/\hat{se}$, and the null hypothesis is rejected according to the criterion: 

$\quad$Reject $H_0$ if $|W| > z_{\alpha/2}$ 

Asymptotically, the Wald test has size $\alpha$: <br> 
Under $H_0$, $W \rightsquigarrow N(0,1)$, so the probability of rejecting $H_0$ is: <br> 
$\mathbb{P}_{\theta_0}(  |\hat\theta - \theta_0 |/\hat{se} > z_{\alpha/2} ) = \mathbb{P}_{\theta_0}(  |W| > z_{\alpha/2} ) \to \mathbb{P}( |Z| > z_{\alpha/2} )  = \alpha$
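
A minimal sketch of the Wald test in Python, assuming the estimate and its standard error have already been computed. The helper name `wald_test` and the example numbers are hypothetical; `scipy.stats.norm.ppf` supplies $z_{\alpha/2}$.

```python
from scipy.stats import norm


def wald_test(theta_hat, theta_0, se_hat, alpha=0.05):
    """Return (W, reject) for H0: theta = theta_0 versus H1: theta != theta_0."""
    w = (theta_hat - theta_0) / se_hat   # Wald statistic
    z_crit = norm.ppf(1 - alpha / 2)     # z_{alpha/2}: upper alpha/2 quantile of N(0, 1)
    return w, abs(w) > z_crit            # reject H0 when |W| > z_{alpha/2}


# Hypothetical numbers: estimate 0.52, standard error 0.011, theta_0 = 0.5
w, reject = wald_test(0.52, 0.5, 0.011)
print(w, reject)
```

Here $|W| \approx 1.82 < 1.96$, so $H_0$ is not rejected at $\alpha = 0.05$.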

#### Theorem ####
Suppose the true value is $\theta^* \neq \theta_0$, so the null hypothesis is false. The power of the Wald test, i.e. the probability $\beta(\theta^*)$ of correctly rejecting the null hypothesis, is approximately: <br> 
$$\beta(\theta^*) \approx 1 - \Phi\Big(\dfrac{\theta_0 - \theta^*}{\hat{se}} + z_{\alpha/2}\Big) + \Phi\Big(\dfrac{\theta_0 - \theta^*}{\hat{se}} - z_{\alpha/2}\Big) $$
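
The power formula can be evaluated directly. This is a sketch with a hypothetical helper `wald_power` and an arbitrary standard error; at $\theta^* = \theta_0$ it returns exactly the size $\alpha$, and it grows as $\theta^*$ moves away from $\theta_0$.

```python
from scipy.stats import norm


def wald_power(theta_star, theta_0, se, alpha=0.05):
    """Approximate power beta(theta_star) of the size-alpha Wald test."""
    z = norm.ppf(1 - alpha / 2)
    delta = (theta_0 - theta_star) / se
    return 1 - norm.cdf(delta + z) + norm.cdf(delta - z)


# Power grows as the true value moves away from theta_0 = 0.5 (se = 0.011 is hypothetical)
for theta_star in [0.50, 0.52, 0.55]:
    print(theta_star, wald_power(theta_star, 0.5, 0.011))
```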



#### p-value ####
The p-value is the smallest <b>test size</b> $\alpha$ at which the null hypothesis would be rejected, given the observed data. 

$$ \text{p-value} = \inf\{\alpha : T(X) \in R_\alpha\}$$

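Informally, the p-value measures the evidence against $H_0$: the smaller the p-value, the stronger the evidence. However, <b>a large p-value is not strong evidence in favor of $H_0$</b>: $H_0$ may be true, or the test may simply have low power. Also, the p-value $\neq \mathbb{P}(H_0 | \text{Data})$.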

#### Theorem 10.12 ####

Suppose that a test has the form: $\text{reject } H_0 \text{ iff } T(X^n) \geq c_\alpha$. Then the p-value is $\sup_{\theta\in\Theta_0}\mathbb{P}_\theta(T(X^n) \geq T(x^n))$, where $x^n$ is the observed data. <br> 
If $H_0$ is simple, i.e. $\Theta_0 = \{\theta_0\}$, then: <br> 
p-value $ = \mathbb{P}_{\theta_0}(T(X^n) \geq T(x^n))$ <br> 
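
When the null distribution of $T$ is awkward to derive, the simple-null formula can be approximated by simulating $T(X^n)$ under $\theta_0$. A sketch assuming, for illustration, the setting of Exercise 10.5: $X_i \sim \text{Uniform}(0, \theta_0)$ with $\theta_0 = 1/2$, $n = 20$, $T(X^n) = \max_i X_i$ and observed value $t_{obs} = 0.48$. In this case the exact answer $1 - (t_{obs}/\theta_0)^n$ is available for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_0, n, t_obs = 0.5, 20, 0.48
n_sims = 200_000

# Simulate the test statistic T = max(X_1, ..., X_n) under H0
samples = rng.uniform(0.0, theta_0, size=(n_sims, n))
t_null = samples.max(axis=1)

# Monte Carlo estimate of P_theta0(T >= t_obs), and the closed form for this example
p_value_mc = np.mean(t_null >= t_obs)
p_value_exact = 1 - (t_obs / theta_0) ** n
print(p_value_mc, p_value_exact)
```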

#### Theorem 10.13 ####

For the Wald test with observed statistic $w$, the p-value is approximately: 
$\mathbb{P}(|W| > |w|) \approx \mathbb{P}(|Z| > |w|) = 2\Phi(-|w|)$ 
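
In code the two-sided Wald p-value is one call to the standard Normal CDF (note $\Phi$, not $\Phi^{-1}$). The helper name `wald_p_value` is illustrative; the value $w = 1.71335$ is the observed statistic from Exercise 10.6.

```python
from scipy.stats import norm


def wald_p_value(w):
    """Two-sided Wald p-value: P(|Z| > |w|) = 2 * Phi(-|w|)."""
    return 2 * norm.cdf(-abs(w))


print(wald_p_value(1.71335))   # observed statistic from Exercise 10.6
```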

#### theorem 10.14 ####

If the test statistic has a continuous distribution, then under $H_0: \theta = \theta_0$ the p-value has a Uniform(0,1) distribution. Therefore, if $H_0$ is rejected whenever the p-value is less than $\alpha$, the probability of a type I error is $\alpha$. 
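
A quick simulation sketch of Theorem 10.14. It assumes, for illustration only, $X_i \sim N(\theta_0, 1)$ with known variance, so that $W = \sqrt{n}(\bar X - \theta_0)$ is exactly $N(0,1)$ under $H_0$. The simulated p-values should look Uniform(0,1): about 10% of them in each tenth of $[0,1]$, and a fraction close to $\alpha$ below $\alpha$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta_0, n, n_sims, alpha = 0.0, 30, 100_000, 0.05

# Draw n_sims data sets of size n under H0 and compute each Wald statistic
x_bar = rng.normal(theta_0, 1.0, size=(n_sims, n)).mean(axis=1)
w = np.sqrt(n) * (x_bar - theta_0)
p_values = 2 * norm.cdf(-np.abs(w))

type_i_rate = np.mean(p_values < alpha)                                # should be near alpha
bin_freq = np.histogram(p_values, bins=10, range=(0, 1))[0] / n_sims   # roughly 0.1 per bin
print(type_i_rate)
print(bin_freq)
```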


### The $\chi^2$ test ###

Pearson's $\chi^2$ test is used for multinomial data. Recall that if $X = (X_1, \dots, X_k)$ has a multinomial$(n,p)$ distribution, then the MLE of $p$ is $\hat p = (\hat p_1, \dots, \hat p_k) = (X_1/n, \dots, X_k/n)$. Let $p_0 = (p_{01}, \dots, p_{0k})$ and suppose we want to test the hypothesis: 

$H_0 : p = p_0$ versus $H_1 : p \neq p_0$ <br> 

Pearson's $\chi^2$ statistic is defined as: <br> 
$$T = \sum^k_{j=1}\dfrac{(X_j - np_{0j})^2}{np_{0j}} = \sum_{j=1}^k \dfrac{(X_j - E_j)^2}{E_j}$$
where $E_j = np_{0j}$ is the expected count in cell $j$ under $H_0$.

#### Theorem 10.17 #### 
Under $H_0$, $T\rightsquigarrow \chi^2_{k-1}$. Hence the test that rejects $H_0$ if $T > \chi^2_{k-1,\alpha}$ has asymptotic level $\alpha$. The p-value is $\mathbb{P}(\chi^2_{k-1} > t)$, where $t$ is the observed value of the test statistic. 
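
A sketch of Pearson's $\chi^2$ test on hypothetical counts (not from the text): $k = 4$ cells, $n = 100$ and $p_0 = (1/4, 1/4, 1/4, 1/4)$. The statistic is computed by hand from the formula above and cross-checked against `scipy.stats.chisquare`, which uses the same statistic and the $\chi^2_{k-1}$ reference distribution.

```python
import numpy as np
from scipy.stats import chi2, chisquare

x = np.array([18, 22, 30, 30])            # hypothetical observed counts X_j
p0 = np.array([0.25, 0.25, 0.25, 0.25])   # null cell probabilities p_{0j}
n, k = x.sum(), len(x)

expected = n * p0                          # E_j = n * p_{0j}
t = np.sum((x - expected) ** 2 / expected)
p_value = chi2.sf(t, df=k - 1)             # P(chi^2_{k-1} > t)

# Cross-check with SciPy's implementation
res = chisquare(f_obs=x, f_exp=expected)
print(t, p_value, res.statistic, res.pvalue)
```

Here $T = 4.32$ and the p-value is roughly 0.23, so these counts give little evidence against $p_0$.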

### The permutation test ###



The permutation test is a non-parametric test of whether two samples come from the same distribution: <br> 
Suppose $X_1, \dots, X_n \sim F_X$ and $Y_1, \dots, Y_m \sim F_Y$. The null and alternative hypotheses are: <br> 
$H_0 : F_X = F_Y$ versus $H_1 : F_X \neq F_Y$  

Let $T(x_1, \dots, x_n, y_1, \dots, y_m)$ be some test statistic, for example $|\bar{X}_n - \bar{Y}_m|$. The $N = n+m$ pooled observations can be permuted in $(n+m)!$ ways. In principle, the permutation test evaluates the statistic on every permutation to obtain the <b>permutation distribution</b> of $T$. 

Then, the p-value can be obtained as follows: <br> 

$$\text{p-value} = \mathbb{P}(T > t_{obs}) = \dfrac{1}{N!}\sum_{j=1}^{N!}I(T_j > t_{obs})$$

where: <br> 
$N = n+m$ <br> 
$t_{obs}$ is the statistic evaluated on the original (unpermuted) data. <br> 


In practice, evaluating all $N!$ permutations is usually infeasible, so the p-value is approximated using $B$ random permutations: replace $N!$ with $B$ in the previous equation. 
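
A minimal sketch of the random-permutation approximation, assuming the statistic $T = |\bar X - \bar Y|$ (one possible choice). The function name `permutation_test` and its parameters are hypothetical; `n_perm` plays the role of $B$, and the strict inequality matches the p-value formula above.

```python
import numpy as np


def permutation_test(x, y, n_perm=10_000, rng=None):
    """Monte Carlo permutation p-value P(T > t_obs) with T = |mean(x) - mean(y)|,
    approximated with B = n_perm random permutations of the pooled sample."""
    rng = np.random.default_rng() if rng is None else rng
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    n = len(x)
    t_obs = abs(x.mean() - y.mean())     # statistic on the unpermuted data
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)   # random relabelling of the N = n + m observations
        t_j = abs(perm[:n].mean() - perm[n:].mean())
        count += int(t_j > t_obs)
    return count / n_perm


rng = np.random.default_rng(0)
print(permutation_test([10, 11, 12, 13], [0, 1, 2, 3], n_perm=2000, rng=rng))
```

For these two clearly separated samples no relabelling gives a larger mean difference than the observed one, so the estimated p-value is 0.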


### The likelihood ratio test ###


The likelihood ratio statistic is defined as: 


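$$ \lambda = 2\log\Big(\dfrac{\sup_{\theta\in\Theta}\mathcal{L}(\theta)}{\sup_{\theta\in\Theta_0}\mathcal{L}(\theta)}\Big) = 2\log\Big(\dfrac{\mathcal{L}(\hat\theta)}{\mathcal{L}(\hat\theta_0)}\Big)$$

where $\hat\theta$ is the MLE and $\hat\theta_0$ is the MLE when $\theta$ is restricted to $\Theta_0$.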



#### Theorem 10.22 ####
Suppose that $\theta = (\theta_1, \dots, \theta_q, \theta_{q+1}, \dots, \theta_r)$ and let $\Theta_0 = \{\theta : (\theta_{q+1}, \dots, \theta_r) = (\theta_{0,q+1}, \dots, \theta_{0,r})\}$.


Let $\lambda$ be the likelihood ratio test statistic. Under $H_0 : \theta \in \Theta_0$, 

$\lambda(x^n) \rightsquigarrow \chi^2_{r-q}$
where $r-q$ is the dimension of $\Theta$ minus the dimension of $\Theta_0$. The p-value for the test is $\mathbb{P}(\chi^2_{r-q} > \lambda)$.
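
As an illustration of Theorem 10.22, consider an assumed example that is not in the text: $X_1,\dots,X_n \sim \text{Poisson}(\lambda)$ with $H_0: \lambda = \lambda_0$. Here $r = 1$ and $q = 0$, and the log-likelihood $\ell(\lambda) = n\bar X\log\lambda - n\lambda + \text{const}$ gives $\lambda_{LR} = 2n\left[\bar X \log(\bar X/\lambda_0) - (\bar X - \lambda_0)\right]$, compared with a $\chi^2_1$ distribution. The helper `poisson_lrt` and the data are hypothetical.

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import chi2


def poisson_lrt(x, lambda_0):
    """Likelihood ratio statistic and p-value for H0: lambda = lambda_0 (Poisson model)."""
    x = np.asarray(x)
    n, x_bar = len(x), x.mean()
    # 2 * (l(lambda_hat) - l(lambda_0)); xlogy treats x_bar = 0 as 0 * log(0) = 0
    stat = 2 * n * (xlogy(x_bar, x_bar / lambda_0) - (x_bar - lambda_0))
    p_value = chi2.sf(stat, df=1)   # dim(Theta) - dim(Theta_0) = 1 - 0 = 1
    return stat, p_value


print(poisson_lrt([0, 2, 1, 3, 1, 0, 2, 1], lambda_0=1.0))
```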











# Chapter 10 Exercises




### Exercise 10.1

Let $\theta^*$ be the true parameter and assume $\hat\theta \approx \text{N}(\theta^*, \hat{se}^2)$, i.e. $\hat\theta \approx \theta^* + \hat{se}\,Z$ with $Z \sim \text{N}(0,1)$. Then $W = (\hat\theta - \theta_0)/\hat{se} \approx Z - (\theta_0-\theta^*)/\hat{se}$.


$\beta(\theta^*) = \mathbb{P}_{\theta^*}(X^n \in R) = \mathbb{P}_{\theta^*}(|W| > z_{\alpha/2}) 
= \mathbb{P}_{\theta^*}\big(\dfrac{|\hat\theta - \theta_0|}{\hat{se}}> z_{\alpha/2}\big) 
\approx \mathbb{P}\big(\big|Z - \dfrac{\theta_0-\theta^*}{\hat{se}}\big| > z_{\alpha/2}\big)$ <br> 
$ =  \mathbb{P}\big(Z < \dfrac{\theta_0-\theta^*}{\hat{se}} - z_{\alpha/2}\big) + \mathbb{P}\big(Z > \dfrac{\theta_0-\theta^*}{\hat{se}} + z_{\alpha/2}\big) $ <br> 

$ =  \Phi\big(\dfrac{\theta_0-\theta^*}{\hat{se}} - z_{\alpha/2}\big) + 1 - \Phi\big(\dfrac{\theta_0-\theta^*}{\hat{se}} + z_{\alpha/2}\big) $
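
The closed-form power can be checked by simulation. This sketch assumes, for illustration, that $\hat\theta$ is exactly $N(\theta^*, \hat{se}^2)$ with a known standard error; the values of `theta_0`, `theta_star` and `se` are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
theta_0, theta_star, se, alpha = 0.5, 0.53, 0.012, 0.05
z = norm.ppf(1 - alpha / 2)

# Closed-form power: Phi(delta - z) + 1 - Phi(delta + z)
delta = (theta_0 - theta_star) / se
power_formula = norm.cdf(delta - z) + 1 - norm.cdf(delta + z)

# Simulated power: fraction of draws of theta_hat for which |W| > z_{alpha/2}
theta_hat = rng.normal(theta_star, se, size=200_000)
power_sim = np.mean(np.abs((theta_hat - theta_0) / se) > z)
print(power_formula, power_sim)
```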



### Exercise 10.2

Under $H_0: \theta = \theta_0$, p-value $= \mathbb{P}_{\theta_0}(T(X^n) \geq T(x^n))$, where $x^n$ is the observed data.

Let $Y = T(X^n)$ and let $F_Y$ be its CDF under $H_0$, assumed continuous. Then the p-value equals $1 - F_Y(T(x^n))$. Under $H_0$ the observed data are themselves a draw from $f(x;\theta_0)$, so the p-value, viewed as a random variable, is distributed as $1 - F_Y(Y)$.

By the probability integral transform, $F_Y(Y) \sim \text{Uniform}(0,1)$: for $u \in (0,1)$ (and continuous, strictly increasing $F_Y$), $\mathbb{P}(F_Y(Y) \leq u) = \mathbb{P}(Y \leq F_Y^{-1}(u)) = F_Y(F_Y^{-1}(u)) = u$. Hence $1 - F_Y(Y) \sim \text{Uniform}(0,1)$ as well.

A proof that $F_Y(Y)$ is uniformly distributed for any continuous distribution: https://stats.stackexchange.com/questions/161635/why-is-the-cdf-of-a-sample-uniformly-distributed


### Exercise 10.3

The Wald test does not reject the null hypothesis if: <br> 

$|W| < z_{\alpha/2}$ <br> 
$ - z_{\alpha/2} < W < z_{\alpha/2}$ <br> 
$ - z_{\alpha/2} <  \dfrac{\hat\theta - \theta_0}{\hat{se}} < z_{\alpha/2}$ <br>
$ - z_{\alpha/2}\hat{se}-\hat\theta <  -\theta_0 < z_{\alpha/2}\hat{se}-\hat\theta$ <br> 
$  -z_{\alpha/2}\hat{se}+\hat\theta < \theta_0 <z_{\alpha/2}\hat{se}+\hat\theta $ <br> 
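i.e. $\theta_0 \in C$ where $C = \big(\hat\theta - z_{\alpha/2}\hat{se}, \hat\theta + z_{\alpha/2}\hat{se}\big)$ is the $1-\alpha$ Wald confidence interval. Hence the size-$\alpha$ Wald test rejects $H_0$ if and only if $\theta_0 \notin C$.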
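
The equivalence can be checked numerically on a grid of null values. The helper names `wald_rejects` and `wald_interval` and the numbers `theta_hat = 0.52`, `se = 0.011` are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def wald_rejects(theta_hat, theta_0, se, alpha=0.05):
    """Size-alpha Wald test decision for H0: theta = theta_0."""
    return abs((theta_hat - theta_0) / se) > norm.ppf(1 - alpha / 2)


def wald_interval(theta_hat, se, alpha=0.05):
    """1 - alpha Wald confidence interval theta_hat +/- z_{alpha/2} * se."""
    z = norm.ppf(1 - alpha / 2)
    return theta_hat - z * se, theta_hat + z * se


# "reject H0" should coincide with "theta_0 lies outside C" for every theta_0 on the grid
theta_hat, se = 0.52, 0.011
lo, hi = wald_interval(theta_hat, se)
grid = np.linspace(0.45, 0.59, 141)
agree = all(wald_rejects(theta_hat, t0, se) == (not lo < t0 < hi) for t0 in grid)
print(agree)
```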

### Exercise 10.4

Recall that the size of a test is $\alpha = \sup\limits_{\theta\in\Theta_0}\beta(\theta) = \sup\limits_{\theta\in\Theta_0}\mathbb{P}_\theta(T(X^n) \in R_\alpha)$. For a test that rejects when $T(X^n) \geq c$, write the size as a function of the critical value:

$g(c) = \sup\limits_{\theta\in\Theta_0}\mathbb{P}_\theta(T(X^n) \geq c)$, which is non-increasing in $c$, with $\alpha = g(c_\alpha)$.

p-value $ = \inf\big\{\alpha : T(x^n)\in R_\alpha \big\}$ (definition of the p-value)<br> 
$= \inf\big\{g(c_\alpha) : T(x^n) \geq c_\alpha\big\}$ (form of the test assumed in the problem statement)<br>
The term $g(c_\alpha)$ decreases as $c_\alpha$ increases. However, $c_\alpha$ is upper-bounded by $T(x^n)$. Therefore the infimum is obtained by setting $c_\alpha = T(x^n)$: <br>
$= g(T(x^n))$ <br>
$= \sup\limits_{\theta\in\Theta_0}\mathbb{P}_{\theta}(T(X^n) \geq T(x^n))$ <br>


The second case is a special case of the first: when $\Theta_0 = \{\theta_0\}$, the supremum is taken over a single point, $\sup\limits_{\theta\in\Theta_0}\mathbb{P}_\theta(T(X^n) \geq c) = \mathbb{P}_{\theta_0}(T(X^n) \geq c)$, which yields: 


p-value $ =\mathbb{P}_{\theta_0}(T(X^n) \geq T(x^n))$ <br>

### Exercise 10.5


(a)<br>
$\beta(\theta) = \mathbb{P}_{\theta}(T(X_n) \in R) = \mathbb{P}_{\theta}(\max\{X_1\dots X_n\} > c) = 1 - \mathbb{P}_{\theta}(\max\{X_1\dots X_n\} \leq c)$ which yields: 

$$
 \beta(\theta) =
  \begin{cases} 
      0 & \text{if } \theta \leq c \\
      1 - \big(c/\theta\big)^n & \text{if } \theta > c \\
  \end{cases}
$$

(b)<br> 
To make this a size 0.05 test, $c$ must be chosen such that $\sup\limits_{\theta\in\Theta_0}\beta(\theta) = \beta(1/2) = 0.05$ (the supremum is attained at $\theta = 1/2$ because $\beta(\theta)$ is non-decreasing in $\theta$).

$0.05 = 1 - \big(c/(1/2)\big)^n = 1 - (2c)^n$<br>
$\implies c = (1/2)(1-0.05)^{1/n}$


(c)<br> 
$(1/2)(1-0.05)^{1/20} = c = 0.4987193$ <br> 
$Y = 0.48 < c = 0.4987$, so the null hypothesis is not rejected. <br>  
The p-value associated with this result is $\mathbb{P}_{\theta_0}(Y > 0.48) = 1 - (0.48/0.5)^{20} = 0.55799756$. <br> 
Since the p-value is large, there is no evidence against $H_0$ (which is not the same as evidence in favor of $H_0$). 

(d)<br> 
$(1/2)(1-0.05)^{1/20} = c = 0.4987193$ <br> 
$Y = 0.52 > c = 0.4987$, so the null hypothesis is rejected. <br> 
The p-value associated with this result is $\sup_{\theta \leq 1/2}\mathbb{P}_\theta(Y > 0.52) = 0$. <br> 
Since the p-value is exactly zero, the null hypothesis is certainly false. This makes sense: if $\theta \leq 1/2$, every $X_i \sim\text{Uniform}(0,\theta)$ is at most $1/2$, so $Y = \max_i X_i$ cannot equal 0.52. 
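
The numbers in parts (b) to (d) can be reproduced in a few lines. The helper `p_value` is a hypothetical name; it evaluates $\sup_{\theta \le 1/2}\mathbb{P}_\theta(Y > y)$, which is attained at $\theta = 1/2$.

```python
n, alpha, theta_0 = 20, 0.05, 0.5

# Critical value from 1 - (2c)^n = alpha  =>  c = (1/2) * (1 - alpha)^(1/n)
c = theta_0 * (1 - alpha) ** (1 / n)


def p_value(y, n=n, theta_0=theta_0):
    """sup over theta <= 1/2 of P(max X_i > y), attained at theta = 1/2."""
    return 1 - min(y / theta_0, 1.0) ** n


print(c)                             # critical value, about 0.4987193
print(p_value(0.48), p_value(0.52))  # part (c) and part (d)
```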


### Exercise 10.6

$H_0 : \theta = 1/2$ and $H_1: \theta \neq 1/2$

The Wald test is used since the test is two-sided and the MLE of a binomial proportion is asymptotically normal. 

For $X\sim\text{Binomial}(n, \theta)$, the MLE is $\hat\theta = X/n$ and its estimated standard error is $\hat{se} = \sqrt{\hat\theta(1-\hat\theta)/n}$. 

$n = 997 + 922 = 1919$ <br> 
$\hat\theta = 997 / 1919 = 0.519541$ <br> 
$\hat{se} = \sqrt{0.519541(1-0.519541)/1919} = 0.01140513$ <br> 



$w = (\hat\theta - \theta_0)/\hat{se} =(0.519541-0.5)/0.01140513 = 1.71335$ <br> 
p-value = $2\Phi(-|w|) = 0.0866$ <br> 
The p-value indicates weak evidence against the null hypothesis (based on the table shown in page 157).  <br>
A 95% confidence interval for $\theta$ is: <br>  
$C_n = \hat\theta \pm \Phi^{-1}(0.975)\hat{se} = 0.5195 \pm (1.960)(0.01140) = \big(0.497, 0.542\big)$ <br>
The interval contains $\theta_0 = 0.5$, which is consistent with not rejecting $H_0$ at the 5% level.
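
The computation above can be reproduced with NumPy and SciPy; all inputs come from the exercise.

```python
import numpy as np
from scipy.stats import norm

after, before = 997, 922
n = after + before
theta_hat = after / n                              # MLE of theta
se_hat = np.sqrt(theta_hat * (1 - theta_hat) / n)  # estimated standard error
w = (theta_hat - 0.5) / se_hat                     # Wald statistic for theta_0 = 1/2
p_value = 2 * norm.cdf(-abs(w))                    # two-sided p-value
z = norm.ppf(0.975)
ci = (theta_hat - z * se_hat, theta_hat + z * se_hat)
print(n, theta_hat, se_hat, w, p_value, ci)
```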




### Exercise 10.7

(a) <br> 
Non-parametric plug-in estimator for Mark Twain 3-word frequency mean: <br> 
$\hat\mu = \sum_{i=1}^n X_i/n = 1.855 / 8 = 0.23187$

Non-parametric plug-in estimator for Mark Twain 3-word frequency standard deviation: <br>
$\hat\sigma = \sqrt{\sum_{i=1}^n (X_i - \bar{X})^2/n} = \sqrt{0.0014849/8} = 0.013624$

If 10 other essays by Mark Twain were sampled (the same number as the Snodgrass essays), their mean 3-word frequency should also be about 0.23187, and the standard error of that mean would be $\hat{se} = \hat\sigma/\sqrt{10} = 0.004308$.


Two-sided hypothesis to test, where $\theta$ is the mean 3-word frequency of the Snodgrass essays: 
$H_0: \theta = 0.23187$ and $H_1: \theta \neq 0.23187$  


$ w = (\hat\theta - \theta_0)/\hat{se} = (0.2097 - 0.23187)/0.004308 \approx -5.15$ <br> 
p-value $ = 2\Phi(-|w|) \approx 3 \times 10^{-7}$, which corresponds to very strong evidence against the null hypothesis.  <br> 
95% confidence interval: $0.23187 \pm (1.960)(0.004308) = \big( 0.2234, 0.2403 \big)$, which does not contain the Snodgrass mean 0.2097.
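
A sketch that recomputes the plug-in quantities for this one-sample approach: Twain's plug-in mean is used as the null value, and the standard error is that of a 10-essay mean. `np.std` with its default `ddof=0` is exactly the plug-in estimator that divides by $n$.

```python
import numpy as np
from scipy.stats import norm

twain = np.array([.225, .262, .217, .240, .230, .229, .235, .217])
snodgrass = np.array([.209, .205, .196, .210, .202, .207, .224, .223, .220, .201])

mu_hat = twain.mean()                          # plug-in mean of Twain's essays
sigma_hat = twain.std()                        # plug-in standard deviation (divides by n)
se_hat = sigma_hat / np.sqrt(len(snodgrass))   # standard error of a 10-essay mean
w = (snodgrass.mean() - mu_hat) / se_hat       # Wald statistic
p_value = 2 * norm.cdf(-abs(w))
z = norm.ppf(0.975)
print(mu_hat, sigma_hat, se_hat, w, p_value, (mu_hat - z * se_hat, mu_hat + z * se_hat))
```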


(b) <br> 
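The permutation test's p-value is approximately 0.0004 (0.000432 in the run below), which is very strong evidence against the null hypothesis. The test below is one-sided: it counts the permutations whose difference in means (Twain minus Snodgrass) exceeds the observed one.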



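```python
import numpy as np

def T(X):
    # difference in mean 3-word frequency: first 8 entries (Twain) minus last 10 (Snodgrass)
    return np.mean(X[0:8]) - np.mean(X[8:18])

A = [.225, .262, .217, .240, .230, .229, .235, .217]              # Mark Twain
B = [.209, .205, .196, .210, .202, .207, .224, .223, .220, .201]  # Snodgrass
C = np.concatenate((A, B), axis=None)
baseline_S = T(C)                   # observed statistic t_obs

trials = 1000000                    # number B of random permutations
count = 0
for i in range(0, trials):
    np.random.shuffle(C)            # random relabelling of the pooled data (in place)
    S = T(C)
    if S > baseline_S:              # one-sided: count permutations with T_j > t_obs
        count += 1

print(count / trials)
```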




0.000432


### Exercise 10.8


(a) <br> 
$\alpha = \sup\limits_{\theta \in \Theta_0}\beta(\theta) = \beta(0) = \mathbb{P}_{\theta_0}(T(X^n) > c) $ <br>

Under $H_0$, $T(X^n) = n^{-1}\sum{X_i}$ with $X_i \sim \text{N}(0,1)$, so <br> 
$\mathbb{E}(T(X^n)) = 0, \mathbb{V}(T(X^n)) = 1/n$


$\alpha =\mathbb{P}_{\theta_0}(T(X^n)n^{1/2} > c n^{1/2}) = 1-\Phi(c n^{1/2})\quad\quad$ (since $T(X^n)n^{1/2}\sim \text{N}(0,1)$)<br> 
$\implies c = \Phi^{-1}(1-\alpha)/n^{1/2}$


(b) <br> 
$\beta(1) = \mathbb{P}_{\theta_1}(T(X^n) > c) = \mathbb{P}_{\theta_1}\big(\dfrac{T(X^n)-1 }{n^{-1/2}}> \dfrac{c-1 }{n^{-1/2}}\big) 
= 1 - \Phi\big( \dfrac{c-1 }{n^{-1/2}}\big)
= 1 - \Phi\big( \dfrac{\Phi^{-1}(1-\alpha)/n^{1/2}-1 }{n^{-1/2}}\big)
= 1 - \Phi\big( \Phi^{-1}(1-\alpha)-n^{1/2} \big)
$ <br>

(c) <br>
$\lim_{n\to\infty}1 - \Phi\big( \Phi^{-1}(1-\alpha)-n^{1/2} \big)$ <br>
$1 -\lim_{n\to\infty} \Phi\big( \Phi^{-1}(1-\alpha)-n^{1/2} \big) = 1$ <br> 

Proof: <br> 

The limit follows from the composition of limits:<br>
Let $L(n) = \Phi^{-1}(1-\alpha)-n^{1/2}$

$\lim_{n\to\infty} L(n) = -\infty$ <br> 
$\lim_{x\to-\infty} \Phi(x) = 0 $ <br> 
Since $L(n) \to -\infty$ and $\Phi(x) \to 0$ as $x \to -\infty$, it follows that: 

$\lim_{n\to\infty} \Phi(L(n)) = 0 $ <br>
$1 - \lim_{n\to\infty} \Phi(L(n)) = 1$ <br>
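
The formulas from parts (a) and (b) can be tabulated to see the power approach 1. The helper names `critical_value` and `power_at_1` are illustrative; $\alpha = 0.05$ is an arbitrary choice.

```python
import numpy as np
from scipy.stats import norm


def critical_value(n, alpha=0.05):
    """c = Phi^{-1}(1 - alpha) / sqrt(n), from part (a)."""
    return norm.ppf(1 - alpha) / np.sqrt(n)


def power_at_1(n, alpha=0.05):
    """beta(1) = 1 - Phi(Phi^{-1}(1 - alpha) - sqrt(n)), from part (b)."""
    return 1 - norm.cdf(norm.ppf(1 - alpha) - np.sqrt(n))


for n in [1, 4, 9, 25, 100]:
    print(n, critical_value(n), power_at_1(n))
```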



### Exercise 10.9

Let $\theta_1 \neq \theta_0$ be the true parameter value and let $\hat{se} = (nI(\hat\theta))^{-1/2}$. <br> 
$R = \{ x^n : |W| > z_{\alpha/2}\}$ with $W = (\hat\theta - \theta_0)/\hat{se}$ <br> 
$\beta(\theta_1) = \mathbb{P}_{\theta_1}(X^n \in R) = \mathbb{P}_{\theta_1}\big(\big|\dfrac{\hat\theta - \theta_0}{(nI(\hat\theta))^{-1/2}}\big| > z_{\alpha/2}\big)$ <br> 
$= 1 - \mathbb{P}_{\theta_1}\big(-z_{\alpha/2} \leq \dfrac{\hat\theta - \theta_0}{(nI(\hat\theta))^{-1/2}} \leq z_{\alpha/2}\big)$ <br> 
$= 1 - \mathbb{P}_{\theta_1}\big(-z_{\alpha/2}n^{-1/2} \leq \dfrac{\hat\theta - \theta_0}{I(\hat\theta)^{-1/2}} \leq z_{\alpha/2}n^{-1/2}\big)$ <br> 
Under $\theta_1$ the MLE is consistent, $\hat\theta \xrightarrow{P} \theta_1$, so (assuming $I$ is continuous and positive) the middle term converges in probability to $(\theta_1 - \theta_0)I(\theta_1)^{1/2} \neq 0$, while both bounds shrink to 0. Hence the probability of the middle event goes to 0: <br> 
$\lim\limits_{n\to\infty}\beta(\theta_1) = 1 - 0 = 1$ <br> 

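
A simulation sketch of this result, assuming for illustration a Bernoulli model with $\theta_0 = 0.5$ and true value $\theta_1 = 0.6$, where $(nI(\hat\theta))^{-1/2} = \sqrt{\hat\theta(1-\hat\theta)/n}$. The helper `simulated_power` is hypothetical; the estimated power should climb toward 1 as $n$ grows.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
theta_0, theta_1, alpha, reps = 0.5, 0.6, 0.05, 20_000
z = norm.ppf(1 - alpha / 2)


def simulated_power(n):
    """Fraction of Bernoulli(theta_1) samples of size n for which the Wald test rejects theta_0."""
    theta_hat = rng.binomial(n, theta_1, size=reps) / n   # MLE in each replicate
    se_hat = np.sqrt(theta_hat * (1 - theta_hat) / n)     # (n I(theta_hat))^{-1/2}
    with np.errstate(divide="ignore"):
        w = (theta_hat - theta_0) / se_hat                # se_hat = 0 gives |w| = inf (reject)
    return np.mean(np.abs(w) > z)


for n in [20, 200, 2000]:
    print(n, simulated_power(n))
```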
### Exercise 10.10

A quick inspection of the data suggests that the number of deaths in both groups increased after the Chinese Harvest Moon Festival, with a larger increase in the Chinese group.  

For each group, let's test the null hypothesis that the number of deaths has not increased. Let's also test whether the ratio of Chinese deaths to Jewish deaths has increased. 

$H_0: \theta = 0$ and $H_1: \theta > 0$

where $\theta = \mu_{\text{after}} - \mu_\text{before}$

The permutation test can be used here since it is non-parametric.  

```python
import itertools

import numpy as np

A = np.array([55, 33, 70, 49])       # Chinese women: first two entries before, last two after
B = np.array([141, 145, 139, 161])   # Jewish women: same layout
ratio = A / B                        # ratio between Chinese and Jewish women's deaths


def T(X):
    # mean number of deaths after the festival minus mean before
    return np.mean(X[2:4]) - np.mean(X[0:2])


data_sets = [A, B, ratio]
data_set_names = ['Chinese', 'Jewish', 'Ratio']

for name, data_set in zip(data_set_names, data_sets):
    t_obs = T(data_set)

    # Permutation distribution: evaluate T on all 4! = 24 orderings of the data
    stat = np.array([T(perm) for perm in itertools.permutations(data_set)])

    # One-sided p-value: fraction of permutations with T_j > t_obs
    p_value = np.mean(stat > t_obs)
    print(f"{name}: p-value = {p_value}")
```


Chinese: p-value = 0.16666666666666666
Jewish: p-value = 0.3333333333333333
Ratio: p-value = 0.16666666666666666


The p-value for the Chinese group is 1/6, which is not evidence against the null hypothesis at conventional levels. With only four observations, each split of the weeks into a "before" pair and an "after" pair occurs in 4 of the 24 permutations, so the p-value is always a multiple of 1/6; more data would be needed to detect an increase with this test.

The p-value for the Jewish group (1/3) suggests there is no evidence against the null hypothesis. The same comment regarding the small data set applies. 

The p-value for the ratio data set (1/6) also suggests there is no evidence against the null hypothesis.



### Exercise 10.11

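The null hypothesis that an individual drug is ineffective can be tested with the Wald test. Each drug's nausea incidence $\hat{p}$ is compared with the placebo incidence, which is treated as the fixed null value $p_0 = 0.5625$. In the table below, $\hat{se}$ is the standard error of each drug's own estimate and $w = (\hat{p} - p_0)/\hat{se}$. The MLE $\hat{p}$ and standard error $\hat{se}$ for the incidence of nausea in the placebo group are: <br> 

$\hat{p} = X/n = 45/80 = 0.5625$ <br> 
$\hat{se} = \sqrt{\hat{p}(1-\hat{p})/n} = 0.05546$ <br>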

(a) <br> 

|  drug                  | $\hat{p}$     | $\hat{se}$  |   w     |p-value   | $\hat{p}/p_0$ |Conclusion                        |
|----------------------- | ------        |-------------| ------  |------    | ------    | ---------------------------------|
| Chlorpromazine         | 0.346666      | 0.05495     | -3.9278 | 0.0000857| 0.6162    | p<0.05, reject  $H_0$            |
| Dimenhydrinate         | 0.611764      | 0.05286     |  0.9319 | 0.35138  | 1.0875    | p>0.05, cannot reject $H_0$      |
| Pentobarbital (100 mg) | 0.522388      | 0.06102     | -0.6573 |  0.51098 | 0.9286    | p>0.05, cannot reject $H_0$      | 
| Pentobarbital (150 mg) | 0.435294      | 0.05377     | -2.3657 |  0.01799 | 0.7738    | p<0.05, reject  $H_0$            |  
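
The table can be recomputed with the one-sample Wald test described above. The patient counts below are inferred from the $\hat p$ and $\hat{se}$ columns (for example $26/75 = 0.346667$ with $\hat{se} = 0.05495$), so treat them as a reconstruction rather than quoted data.

```python
import numpy as np
from scipy.stats import norm

p_0 = 45 / 80   # placebo incidence, treated as the fixed null value
# (patients with nausea, patients treated), inferred from the p_hat and se columns
drugs = {
    "Chlorpromazine": (26, 75),
    "Dimenhydrinate": (52, 85),
    "Pentobarbital (100 mg)": (35, 67),
    "Pentobarbital (150 mg)": (37, 85),
}

results = {}
for name, (nausea, n) in drugs.items():
    p_hat = nausea / n
    se_hat = np.sqrt(p_hat * (1 - p_hat) / n)   # se of the drug group's estimate only
    w = (p_hat - p_0) / se_hat
    p_value = 2 * norm.cdf(-abs(w))
    results[name] = (p_hat, se_hat, w, p_value, p_hat / p_0)
    print(f"{name:24s} {p_hat:.6f} {se_hat:.5f} {w:8.4f} {p_value:.7f} {p_hat / p_0:.4f}")
```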

(b) <br>

<b> Bonferroni method </b>: <br> 
p-value rejection threshold = 0.05 / 4 = 0.0125 <br> 
Only Chlorpromazine has a p-value below 0.0125, so only its null hypothesis is rejected; the null hypotheses of the other three drugs are not rejected.  

<b> FDR method </b>: <br> 


|  drug                  | $P_{(i)}$ (sorted) | $l_i = i\alpha/m$ | $P_{(i)} < l_i$ |Conclusion |
|----------------------- | ------        |--------- | ------------  | ------------         |
| Chlorpromazine         | .0000857      | 0.0125   | true          |  reject  $H_0$       |
| Pentobarbital (150 mg) |0.01799        | 0.0250   | true          |  reject  $H_0$       |
| Dimenhydrinate         | 0.35138       | 0.0375   | false         |  don't reject  $H_0$ |
| Pentobarbital (100 mg) |  0.51098      | 0.0500   | false         |  don't reject  $H_0$ |

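$R = \max\{i : P_{(i)} < l_i\} = 2$ and $T = P_{(R)} = 0.01799$, so every null hypothesis with p-value $\leq T$ is rejected: Chlorpromazine and Pentobarbital (150 mg).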
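
Both corrections can be written as short functions. This is a sketch with hypothetical helper names; `benjamini_hochberg` implements the step-up rule with $C_m = 1$ and $l_i = i\alpha/m$, as in the table above.

```python
import numpy as np


def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i < alpha / m."""
    p = np.asarray(p_values)
    return p < alpha / len(p)


def benjamini_hochberg(p_values, alpha=0.05):
    """BH procedure: R = max{i : P_(i) < i * alpha / m}, T = P_(R); reject H0_i when p_i <= T."""
    p = np.asarray(p_values)
    m = len(p)
    p_sorted = np.sort(p)
    thresholds = np.arange(1, m + 1) * alpha / m   # l_i = i * alpha / m
    below = np.nonzero(p_sorted < thresholds)[0]
    if below.size == 0:
        return np.zeros(m, dtype=bool)             # nothing is rejected
    T = p_sorted[below.max()]
    return p <= T


# Chlorpromazine, Dimenhydrinate, Pentobarbital (100 mg), Pentobarbital (150 mg)
p_vals = [0.0000857, 0.35138, 0.51098, 0.01799]
print(bonferroni(p_vals))
print(benjamini_hochberg(p_vals))
```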



### Exercise 10.12
The MLE is $\hat\lambda = n^{-1}\sum_{i=1}^nX_i$. The Fisher information $I(\lambda)$ for the Poisson distribution is $\lambda^{-1}$, so the standard error of the MLE is approximately $\hat{se} \approx (nI(\hat\lambda))^{-1/2} = (\hat\lambda/n)^{1/2}$. 

(a) <br> 
The Wald test criterion for the rejection of $H_0$ is: $|\dfrac{\hat\lambda-\lambda_0}{ (\hat\lambda/n)^{1/2}}| > z_{\alpha/2}$ or equivalently $|\dfrac{\bar{X}-\lambda_0}{ (\bar{X}/n)^{1/2}}| > z_{\alpha/2}$



```python
import numpy as np
from scipy.stats import poisson, norm

n = 20
lambda_0 = 1
alpha = 0.05
w_limit = norm.ppf(1 - alpha / 2)      # z_{alpha/2}

trials_n = 100000
rejection_count = 0


for i in range(0, trials_n):
    X = poisson.rvs(mu=lambda_0, size=n)                        # sample under H0
    w = abs((np.mean(X) - lambda_0) / (np.mean(X) / n) ** 0.5)  # |Wald statistic|
    if w > w_limit:
        rejection_count += 1

print(rejection_count / trials_n)      # empirical type I error rate
```



0.05269

The empirical type I error rate (about 0.053) is close to the nominal size $\alpha = 0.05$.




### Exercise 10.13



The log-likelihood for the normal distribution is derived in the book (page 124): <br> 
$\mathcal{l}(\mu,\sigma) = -n\log\sigma - \dfrac{n(n^{-1}\sum(X_i - \bar{X})^2)}{2\sigma^2}- \dfrac{n(\bar{X} - \mu)^2}{2\sigma^2}$



The likelihood ratio statistic for $H_0: \mu = \mu_0$ (with $\sigma$ known) is:

$\lambda = 2\log\big(\dfrac{\mathcal{L}(\hat\mu)}{\mathcal{L}(\hat\mu_0)}\big) 
= 2(\log\mathcal{L}(\hat\mu) - \log\mathcal{L}(\mu_0)) 
= 2(\mathcal{l}(\hat\mu) -\mathcal{l}(\mu_0))
$ $\quad$(Notice $\hat\mu_0 = \mu_0$ since the null hypothesis consists of a single point.) 

The formula can be expanded using the log-likelihood function provided in the book, which results in: <br> 
$
= 2(-n\log\sigma - \dfrac{n(n^{-1}\sum(X_i - \bar{X})^2)}{2\sigma^2}- \dfrac{n(\bar{X} - \hat\mu)^2}{2\sigma^2}) - 2(-n\log\sigma - \dfrac{n(n^{-1}\sum(X_i - \bar{X})^2)}{2\sigma^2}- \dfrac{n(\bar{X} - \mu_0)^2}{2\sigma^2})
$ <br> 
The first and second term cancel out with the fourth and fifth terms. Also, recall $\hat\mu = \bar{X}$ which makes the third term vanish.<br> 
$
= \dfrac{n(\bar{X} - \mu_0)^2}{\sigma^2} = \big(\dfrac{(\bar{X} - \mu_0)}{\sigma/n^{1/2}}\big)^2
$ <br>

Reject $H_0$ at level $\alpha$ if $\lambda > \chi^2_{1,\alpha}$, or equivalently if the p-value $\mathbb{P}(\chi^2_1 > \lambda)$ is below $\alpha$ (for example, $\alpha = 0.02$). <br> 

Since $\chi^2_{1,\alpha} = z_{\alpha/2}^2$ and $\hat{se} = \sigma/n^{1/2}$, the rejection regions of the two tests coincide, so the likelihood ratio test is equivalent to the Wald test:  <br> 
$\mathbb{P}(\lambda > \chi^2_{1,\alpha}) = \mathbb{P}(\big(\dfrac{\bar{X} - \mu_0}{\sigma/n^{1/2}}\big)^2 > z_{\alpha/2}^2) 
= \mathbb{P}(\big|\dfrac{\bar{X} - \mu_0}{\sigma/n^{1/2}}\big| > z_{\alpha/2}) = \mathbb{P}(\big|\dfrac{\bar{X} - \mu_0}{\hat{se}}\big| > z_{\alpha/2})$
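
A numerical check that $\lambda = W^2$ and that the two p-values coincide. The data are simulated for illustration only (true mean 0.5, known $\sigma = 2$).

```python
import numpy as np
from scipy.stats import chi2, norm

rng = np.random.default_rng(4)
mu_0, sigma, n = 0.0, 2.0, 25                     # sigma treated as known
x = rng.normal(0.5, sigma, size=n)                # illustrative data

w = (x.mean() - mu_0) / (sigma / np.sqrt(n))      # Wald statistic
lam = n * (x.mean() - mu_0) ** 2 / sigma ** 2     # likelihood ratio statistic derived above

p_wald = 2 * norm.cdf(-abs(w))
p_lrt = chi2.sf(lam, df=1)
print(lam, w ** 2, p_wald, p_lrt)
```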


### Exercise 10.14

The Fisher information $I(\sigma)$ for the normal distribution is $2/\sigma^2$ (refer to example 9.26). Therefore, $\hat{se} = (nI(\sigma))^{-1/2} \approx (nI(\hat\sigma))^{-1/2} = \hat\sigma/\sqrt{2n}$


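The likelihood ratio statistic for $H_0: \sigma = \sigma_0$ (treating $\mu$ as known; if $\mu$ is unknown, replace $\mu$ by $\bar X$ throughout, since the MLE of $\mu$ does not depend on $\sigma$) is: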

$\lambda = 2\log\big(\dfrac{\mathcal{L}(\hat\sigma)}{\mathcal{L}(\hat\sigma_0)}\big)$
$ = 2\log\big(\dfrac{\prod\hat{\sigma}^{-1}\exp{(-(X_i -\mu)^2/(2\hat\sigma^2))}}{\prod\sigma_0^{-1}\exp(-(X_i -\mu)^2/(2\sigma_0^2))}   \big)$ <br> 
$ = 2\log\big((\sigma_0/\hat{\sigma})^n\prod\exp{((X_i -\mu)^2(\dfrac{-1}{2\hat\sigma^2} + \dfrac{1}{2\sigma_0^2}))}   \big)$ <br> 
$ = 2n\log(\sigma_0/\hat{\sigma}) + 2\sum{((X_i -\mu)^2(\dfrac{-1}{2\hat\sigma^2} + \dfrac{1}{2\sigma_0^2}))}  $<br> 
$ = 2n\log(\sigma_0/\hat{\sigma}) + (\dfrac{-1}{\hat\sigma^2} + \dfrac{1}{\sigma_0^2})\sum{(X_i -\mu)^2}  $<br> 
$ = 2n\log(\sigma_0/\hat{\sigma}) + (\dfrac{-1}{\hat\sigma^2} + \dfrac{1}{\sigma_0^2})n\hat\sigma^2  $ $\quad$(using $n\hat\sigma^2 = \sum(X_i-\mu)^2$) <br> 
$ = 2n\log(\sigma_0/\hat{\sigma}) + (\hat\sigma^2 /\sigma_0^2 - 1)n  $

Reject $H_0$ if the p-value $\mathbb{P}(\chi^2_1 > \lambda)$ is below the test level $\alpha$ (equivalently, if $\lambda > \chi^2_{1,\alpha}$). 

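The Wald test rejection criterion looks different: <br> 
$\big|\dfrac{\hat\sigma - \sigma_0}{\hat\sigma/\sqrt{2n}}\big| > z_{\alpha/2}$ <br>
$\big|\sqrt{2n}\,(1 - \sigma_0/\hat\sigma)\big| > z_{\alpha/2}$ <br>
Expanding around $\hat\sigma = \sigma_0$, both $\lambda$ and $W^2$ are approximately $2n(\hat\sigma/\sigma_0 - 1)^2$, so the two tests agree asymptotically under $H_0$ but differ in finite samples.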
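
A sketch comparing the two statistics, with $\mu$ treated as known as in the derivation. The helper names and the simulated data (true $\sigma = 1.3$) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2, norm


def lrt_stat(sigma_hat, sigma_0, n):
    """lambda = 2n log(sigma_0 / sigma_hat) + n (sigma_hat^2 / sigma_0^2 - 1)."""
    return 2 * n * np.log(sigma_0 / sigma_hat) + n * (sigma_hat**2 / sigma_0**2 - 1)


def wald_stat(sigma_hat, sigma_0, n):
    """W = (sigma_hat - sigma_0) / (sigma_hat / sqrt(2n))."""
    return np.sqrt(2 * n) * (1 - sigma_0 / sigma_hat)


rng = np.random.default_rng(5)
mu, sigma_0, n = 0.0, 1.0, 50                 # mu treated as known
x = rng.normal(mu, 1.3, size=n)               # illustrative data
sigma_hat = np.sqrt(np.mean((x - mu) ** 2))   # MLE of sigma when mu is known

lam, w = lrt_stat(sigma_hat, sigma_0, n), wald_stat(sigma_hat, sigma_0, n)
print("LRT: ", lam, chi2.sf(lam, df=1))
print("Wald:", w, 2 * norm.cdf(-abs(w)))
```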


### Exercise 10.15 


The likelihood ratio for $X \sim \text{Binomial}(n, p)$, with MLE $\hat{p} = X/n$, is: <br> 
$\lambda = 2\log\big(\dfrac{\mathcal{L}(\hat{p})}{\mathcal{L}(\hat{p}_0)}\big)$
$ = 2\log\big(\dfrac{{n \choose X}\hat{p}^{X}(1-\hat{p})^{n-X}}{{n \choose X}p_0^{X}(1-p_0)^{n-X}}\big)$ 
$ = 2\log\big(\dfrac{\hat{p}^{X}(1-\hat{p})^{n-X}}{p_0^{X}(1-p_0)^{n-X}}\big)$ <br>
$ = 2\log\big(      (\hat{p}/p_0)^{X}((1-\hat{p})/(1-p_0))^{n-X}   \big)$ <br>
$ = 2\big(      X\log(\hat{p}/p_0) + (n-X)\log((1-\hat{p})/(1-p_0))   \big)$ <br>
$ = 2\big(      n\hat{p}\log(\hat{p}/p_0) + n(1-\hat{p})\log((1-\hat{p})/(1-p_0))   \big)$ <br>
$ = 2n\log\big( (\frac{\hat{p}}{p_0})^{\hat{p}}(\dfrac{1-\hat{p}}{1-p_0})^{1-\hat{p}}\big) $ <br> 

Reject $H_0$ if the p-value $\mathbb{P}(\chi^2_1 > \lambda)$ is below the test level $\alpha$ (equivalently, if $\lambda > \chi^2_{1,\alpha}$). 

The Wald test rejection criterion looks different: <br> 
The MLE is $\hat{p} = X/n$, and the Fisher information of $X \sim \text{Binomial}(n,p)$ is $I_n(p) = \dfrac{n}{p(1-p)}$, hence $\hat{se}(\hat{p}) = I_n(\hat{p})^{-1/2} = \sqrt{\hat{p}(1-\hat{p})/n}$ <br> 
The Wald test is then:

Reject $H_0$ if 
$\big|\dfrac{\hat{p} - p_0}{\hat{se}(\hat{p})} \big| = \big|\dfrac{\sqrt{n}(\hat{p} - p_0)}{(\hat{p}(1-\hat{p}))^{1/2}} \big| > z_{\alpha/2}$
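
The two statistics can be compared on the counts from Exercise 10.6 ($X = 997$, $n = 1919$, $p_0 = 1/2$). The helper names are illustrative, and the LRT formula as written assumes $0 < X < n$ so that both logarithms are finite.

```python
import numpy as np
from scipy.stats import chi2, norm


def binomial_lrt(x, n, p_0):
    """lambda = 2 [x log(p_hat/p_0) + (n - x) log((1 - p_hat)/(1 - p_0))], assumes 0 < x < n."""
    p_hat = x / n
    return 2 * (x * np.log(p_hat / p_0) + (n - x) * np.log((1 - p_hat) / (1 - p_0)))


def binomial_wald(x, n, p_0):
    """W = (p_hat - p_0) / sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    return (p_hat - p_0) / np.sqrt(p_hat * (1 - p_hat) / n)


x, n, p_0 = 997, 1919, 0.5   # counts from Exercise 10.6
lam, w = binomial_lrt(x, n, p_0), binomial_wald(x, n, p_0)
print(lam, w ** 2, chi2.sf(lam, df=1), 2 * norm.cdf(-abs(w)))
```

The statistics are close ($\lambda \approx 2.93$, $W^2 \approx 2.94$) and both p-values are near 0.087, as expected for a large sample.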

### Exercise 10.16

$\lambda = 2\log\big(\dfrac{\mathcal{L}(\hat\theta)}{\mathcal{L}(\hat\theta_0)}\big) 
= 2(\log\mathcal{L}(\hat\theta) - \log\mathcal{L}(\theta_0)) 
= 2(\mathcal{l}(\hat\theta) -\mathcal{l}(\theta_0))$ <br> 


The Taylor expansion of the log-likelihood $\mathcal{l}(\theta_0)$ about $\hat\theta$: <br> 

$\mathcal{l}(\theta_0) \approx  \mathcal{l}(\hat\theta) + \mathcal{l}'(\hat\theta)(\theta_0-\hat\theta) + \dfrac{\mathcal{l}''(\hat\theta)}{2!}(\theta_0-\hat\theta)^2 + \dfrac{\mathcal{l}'''(\hat\theta)}{3!}(\theta_0-\hat\theta)^3 \dots$ <br>
Set $\mathcal{l}'(\hat\theta) = 0$ (definition of MLE) and neglect higher order terms: 
<br> 
$\mathcal{l}(\theta_0) - \mathcal{l}(\hat\theta)\approx \dfrac{\mathcal{l}''(\hat\theta)}{2!}(\theta_0-\hat\theta)^2$<br> 
$2(\mathcal{l}(\hat\theta) - \mathcal{l}(\theta_0)) \approx -\mathcal{l}''(\hat\theta)(\hat\theta-\theta_0)^2 $ <br> 
$2(\mathcal{l}(\hat\theta) - \mathcal{l}(\theta_0)) = \lambda  \approx \big(-\dfrac{1}{n}\mathcal{l}''(\hat\theta)\big)\big(\sqrt{n}(\hat\theta-\theta_0)\big)^2 $ <br> 

Another approximation can be made: <br> 
$\quad-\mathcal{l}''(\theta) \approx -\mathbb{E}_\theta(\dfrac{\partial^2 \log \prod_if(x;\theta)}{\partial\theta^2}) = 
-\mathbb{E}_\theta(\dfrac{\partial^2 l(\theta)}{\partial\theta^2})
= I_n(\theta)$ <br> 
$\quad\implies -\dfrac{1}{n}\mathcal{l}''(\hat\theta) \approx \dfrac{1}{n}I_n(\hat\theta) = \dfrac{1}{n\hat{se}(\hat\theta)^2} $

Consequently: <br> 
$\lambda = \big(\dfrac{(\hat\theta-\theta_0)}{\hat{se}(\hat\theta)}\big)^2 +$ H.O.T <br> 

$W = \big(\dfrac{(\hat\theta-\theta_0)}{\hat{se}(\hat\theta)}\big)$

$W^2 / \lambda = \big(\dfrac{(\hat\theta-\theta_0)}{\hat{se}(\hat\theta)}\big)^2 / \big(\dfrac{(\hat\theta-\theta_0)}{\hat{se}(\hat\theta)}\big)^2 \xrightarrow{P} 1$ 
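
The convergence $W^2/\lambda \to 1$ can be seen numerically. This sketch assumes a Poisson model with $\theta_0 = 2$ (an illustrative choice) and evaluates both statistics at a typical deviation $\hat\theta = \theta_0 + n^{-1/2}$, so the ratio reflects only the neglected higher-order Taylor terms.

```python
import numpy as np

theta_0 = 2.0   # illustrative Poisson null value


def poisson_lambda(x_bar, n):
    """LRT statistic 2[l(x_bar) - l(theta_0)] for a Poisson sample with mean x_bar."""
    return 2 * n * (x_bar * np.log(x_bar / theta_0) - (x_bar - theta_0))


def poisson_w(x_bar, n):
    """Wald statistic with se_hat = sqrt(x_bar / n)."""
    return (x_bar - theta_0) / np.sqrt(x_bar / n)


ratios = []
for n in [10, 100, 1_000, 10_000]:
    x_bar = theta_0 + 1 / np.sqrt(n)   # a typical O(n^{-1/2}) deviation from theta_0
    ratios.append(poisson_w(x_bar, n) ** 2 / poisson_lambda(x_bar, n))
    print(n, ratios[-1])
```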