This notebook is an informal reference on hypothesis testing for engineers. All content here is taken from Montgomery and Runger, Applied Statistics and Probability for Engineers, 7th ed.

<h1>Hypothesis Testing</h1>

<b>Hypothesis testing</b> involves making statements about population parameters $\theta$. These statements are called hypotheses. In many instances, a sample will be drawn from a population and the data within the sample will be used to make statements about the population. Just as in constructing confidence intervals for estimation, hypothesis testing uses the concepts and properties of sampling distributions to make statements about population parameters with certain degrees of probability. Similar to confidence intervals, these probabilities are represented by the chosen <b>significance level</b> $\alpha$.


<h2>Context</h2>

The questions hypothesis testing tries to answer are usually of the form, "Is the mean of the population not equal to a certain value?" or, "Is the mean of the population greater or less than a certain value?". This context is expressed through a null hypothesis $H_0$ and an alternative hypothesis $H_1$. The <b>null hypothesis</b> $H_0$ is usually an equality - that the population parameter is equal to some value $\theta_0$. It represents the status quo that needs evidence to disprove. In the case that the parameter in question is the population mean:<br><br>
<font size = "4">
    $$H_0: \mu = \mu_0$$
</font>

The <b>alternative hypothesis</b> can either be:
<ul>
    <li>One-sided, upper-bound - The value of the population parameter is greater than some value $\mu_0$, so the null hypothesis is rejected only when the sample statistic exceeds an upper critical value:</li><br>
    <font size = "4">
        $$H_1: \mu > \mu_0$$
    </font><br>
    <li>One-sided, lower-bound - The value of the population parameter is less than some value $\mu_0$, so the null hypothesis is rejected only when the sample statistic falls below a lower critical value:</li><br>
    <font size = "4">
        $$H_1: \mu < \mu_0$$
    </font><br>
    <li>Two-sided - The value of the population parameter is not equal to some value $\mu_0$:</li><br>
    <font size = "4">
        $$H_1: \mu \ne \mu_0$$
    </font>
</ul>

The alternative hypothesis is accepted over the null hypothesis if there is evidence from sample data to reject the null hypothesis.

<h2>Procedure for a Known Variance</h2>

<ul>
    <li>State the null and alternative hypotheses.</li>
    <font size = "4">
        $$H_0: \mu = \mu_0$$<br>
        $$H_1: \mu \ne \mu_0$$
    </font><br>
    <li>Specify a significance level $\alpha$. A commonly-used value is $\alpha=0.05$. The appropriate value depends on the context and domain in which the test is being performed. For example, consumer electronics may tolerate a larger $\alpha$ than pharmaceuticals, where a stricter (smaller) $\alpha$ may be required.</li>
    <li>Draw a sample of size $n$.</li>
    <li>Calculate the sample mean $\overline{x}.$</li>
    <li>Since the population standard deviation $\sigma$ is known, the <b>test statistic</b> we use is $Z$, which follows the standard normal distribution when $H_0$ is true. Recall that $Z$ is simply the sampling distribution of sample means, centered at 0 by subtracting $\mu_0$ and scaled by the standard deviation of the sampling distribution, $\frac{\sigma}{\sqrt{n}}$.</li>
    <font size="4">
        $$Z=\frac{\overline{X}-\mu_0}{\frac{\sigma}{\sqrt{n}}}$$
    </font><br>
    <li>Using the standard normal distribution, calculate values of $Z$ that define the <b>acceptance</b> and <b>critical</b> or <b>rejection</b> regions:
    <ul>
        <li>The values $z_{lo}$ and $z_{hi}$ can be calculated using tables or computer software. In Python, use scipy.stats.norm.ppf().</li>
        <li>If the hypothesis test is two-sided, calculate $z_{lo}$ and $z_{hi}$ such that:</li><br>
        <font size = "4">
            \begin{align}
            z_{lo} &= \mathrm{ppf}\left(\frac{\alpha}{2}\right); P(Z \le z_{lo}) = \frac{\alpha}{2}\\\\
            z_{hi} &= \mathrm{ppf}\left(1 - \frac{\alpha}{2}\right); P(Z \ge z_{hi}) = \frac{\alpha}{2}
            \end{align}
        </font><br>
        <li>If the hypothesis test is one-sided and lower-bound, calculate $z_{lo}$ such that:</li><br>
        <font size = "4">
            \begin{align}
            z_{lo} &= \mathrm{ppf}\left(\alpha\right); P(Z \le z_{lo}) = \alpha
            \end{align}
        </font><br>
        <li>If the hypothesis test is one-sided and upper-bound, calculate $z_{hi}$ such that:</li><br>
        <font size = "4">
            \begin{align}
            z_{hi} &= \mathrm{ppf}\left( 1 - \alpha\right); P(Z \le z_{hi}) = 1 - \alpha
            \end{align}
        </font><br>
    </ul>
    <li>Using the null value $\mu_0$, $z_{lo}$, and $z_{hi}$, calculate the critical values $x_{lo}$ and $x_{hi}$.</li><br>
    <font size = "4">
        \begin{align}
        x_{lo} &= \mu_0 + z_{lo} \frac{\sigma}{\sqrt{n}}\\\\
        x_{hi} &= \mu_0 + z_{hi} \frac{\sigma}{\sqrt{n}} 
        \end{align}
    </font><br>
    <ul>
        <li>Note that at this point, for the two-sided case, all we've done is construct an interval with confidence level $1 - \alpha$ centered around the null value $\mu_0$.</li>
        <li>$x_{lo}$ and $x_{hi}$ are <b>critical values</b>. They represent the boundaries between the acceptance region and the critical/rejection region.</li>
        <ul>
            <li>If the hypothesis test is two-sided, the acceptance region consists of all values between $[x_{lo}, x_{hi}]$.</li>
            <li>If the hypothesis test is one-sided and lower-bound, the acceptance region consists of all values $\ge x_{lo}$.</li>
            <li>If the hypothesis test is one-sided and upper-bound, the acceptance region consists of all values $\le x_{hi}$.</li>
            <li>The rejection region is any range of values not inside the acceptance region.</li>
        </ul>
        <li>In other words, the rejection region is anything outside the confidence interval that was just constructed.</li>
    </ul>
    <li>If the sample mean $\overline{x}$ is in the acceptance region, then we fail to reject the null hypothesis $H_0$. This is equivalent to saying that there is not enough evidence in the sample to favor the alternative hypothesis $H_1$ over the null hypothesis $H_0$.</li>
    <li>If the sample mean $\overline{x}$ is in the critical region, then we reject the null hypothesis $H_0$ in favor of the alternative hypothesis $H_1$. This is equivalent to saying that we have enough evidence to think that the alternative hypothesis $H_1$ might be true and that the null hypothesis $H_0$ might be false.</li>
</ul>
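
The known-variance procedure above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: it uses the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the same role as `scipy.stats.norm.ppf`. The function names, the `kind` labels, and the numbers in the usage lines are assumptions made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_critical_values(mu0, sigma, n, alpha=0.05, kind="two-sided"):
    """Return (x_lo, x_hi), the bounds of the acceptance region for the sample mean.

    `kind` is a hypothetical label: "two-sided", "lower" (acceptance region
    is bounded below by x_lo) or "upper" (bounded above by x_hi).
    """
    std_norm = NormalDist()      # standard normal: mean 0, standard deviation 1
    se = sigma / sqrt(n)         # standard deviation of the sampling distribution
    x_lo, x_hi = float("-inf"), float("inf")
    if kind == "two-sided":
        x_lo = mu0 + std_norm.inv_cdf(alpha / 2) * se
        x_hi = mu0 + std_norm.inv_cdf(1 - alpha / 2) * se
    elif kind == "lower":
        x_lo = mu0 + std_norm.inv_cdf(alpha) * se
    elif kind == "upper":
        x_hi = mu0 + std_norm.inv_cdf(1 - alpha) * se
    else:
        raise ValueError(f"unknown kind: {kind}")
    return x_lo, x_hi


def reject_null(x_bar, x_lo, x_hi):
    """True when the sample mean falls outside the acceptance region."""
    return not (x_lo <= x_bar <= x_hi)


# Made-up numbers: H0: mu = 50, sigma = 2 (known), n = 25, observed mean 51.3.
x_lo, x_hi = z_test_critical_values(mu0=50.0, sigma=2.0, n=25, alpha=0.05)
print(x_lo, x_hi)
print(reject_null(51.3, x_lo, x_hi))
```

With these made-up numbers the acceptance region is roughly $[49.22, 50.78]$, so an observed mean of 51.3 lands in the rejection region.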

<h2>Procedure for an Unknown Variance</h2>

<ul>
    <li>State the null and alternative hypotheses.</li>
    <font size = "4">
        $$H_0: \mu = \mu_0$$<br>
        $$H_1: \mu \ne \mu_0$$
    </font><br>
    <li>Specify a significance level $\alpha$. A commonly-used value is $\alpha=0.05$.</li>
    <li>Draw a sample of size $n$.</li>
    <li>Calculate the sample mean $\overline{x}.$</li>
    <li>Calculate the sample standard deviation $s$ with $n - 1$ degrees of freedom.</li>
    <li>Since the population standard deviation $\sigma$ is unknown, we need to use the $T$ statistic. Recall that if we keep drawing samples from the population and calculating different values of $T$, we will find that the probability distribution of $T$ follows Student's t distribution.</li><br>
    <font size="4">
        $$T=\frac{\overline{X}-\mu_0}{\frac{s}{\sqrt{n}}}$$
    </font><br>
    <li>Using the $t$ distribution, calculate values of $T$ that define the acceptance and rejection regions:</li>
    <ul>
        <li>If the hypothesis test is two-sided, using $n - 1$ degrees of freedom, calculate $t_{lo}$ and $t_{hi}$ such that:</li><br>
        <font size = "4">
            \begin{align}
            t_{lo} &= \mathrm{ppf}\left( \frac{\alpha}{2} \right); P(T \le t_{lo}) = \frac{\alpha}{2}\\\\
            t_{hi} &= \mathrm{ppf}\left( 1 - \frac{\alpha}{2} \right); P(T \ge t_{hi}) = \frac{\alpha}{2}
            \end{align}
        </font><br>
        <li>If the hypothesis test is one-sided and lower-bound, using $n - 1$ degrees of freedom, calculate $t_{lo}$ such that:</li><br>
        <font size = "4">
            \begin{align}
            t_{lo} &= \mathrm{ppf}\left( \alpha \right); P(T \le t_{lo}) = \alpha
            \end{align}
        </font><br>
        <li>If the hypothesis test is one-sided and upper-bound, using $n - 1$ degrees of freedom, calculate $t_{hi}$ such that:</li><br>
        <font size = "4">
            \begin{align}
            t_{hi} &= \mathrm{ppf}\left( 1 - \alpha \right); P(T \ge t_{hi}) = \alpha
            \end{align}
        </font><br>
    </ul>
    <li>Using the null value $\mu_0$, $t_{lo}$, and $t_{hi}$, calculate the critical values $x_{lo}$ and $x_{hi}$.</li><br>
    <font size = "4">
        \begin{align}
        x_{lo} &= \mu_0 + t_{lo} \frac{s}{\sqrt{n}}\\\\
        x_{hi} &= \mu_0 + t_{hi} \frac{s}{\sqrt{n}} 
        \end{align}
    </font><br>
    <li>The critical values define the acceptance and rejection regions. Evaluate whether the sample mean $\overline{x}$ is in the acceptance or rejection region, then reject or fail to reject the null hypothesis accordingly.</li>
</ul>
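
The unknown-variance procedure above might look like the following in Python. This is a sketch under stated assumptions: it relies on `scipy.stats.t.ppf` for the quantiles of Student's t distribution, and the sample values and the null value are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t

# Made-up sample and null value, for illustration only.
sample = [49.8, 50.6, 51.2, 50.1, 50.9, 51.4, 50.3, 50.7]
mu0 = 50.0
alpha = 0.05

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)            # sample standard deviation (n - 1 in the denominator)
se = s / sqrt(n)             # estimated standard deviation of the sample mean
df = n - 1                   # degrees of freedom

# Two-sided critical values of T, then mapped back to the scale of the sample mean.
t_lo = t.ppf(alpha / 2, df)
t_hi = t.ppf(1 - alpha / 2, df)
x_lo = mu0 + t_lo * se
x_hi = mu0 + t_hi * se

reject = not (x_lo <= x_bar <= x_hi)
print(x_bar, (x_lo, x_hi), reject)
```

Comparing $\overline{x}$ against $[x_{lo}, x_{hi}]$ is the same decision as comparing the observed $T$ against $[t_{lo}, t_{hi}]$; the sketch uses the former to mirror the steps listed above.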

<h2>Procedure for Proportion of a Population</h2>

The number of items $X$ in a sample of size $n$ that belong to a class of interest follows a binomial distribution with parameters $n$, the sample size, and $p$, the proportion of the population belonging to the class of interest. The sample proportion is $\hat{P} = \frac{X}{n}$. For sufficiently large $n$, this binomial distribution can be approximated by the normal distribution. The approximating normal distribution has the following mean and variance:
<font size="4">
    $$\mu=np$$<br>
    $$\sigma^2=np\left( 1 - p \right)$$
</font><br>

Thus, the following $Z$ statistic will follow the standard normal distribution:<br><br>
<font size="4">
    $$Z=\frac{X-np}{\sqrt{np\left(1-p\right)}}$$
</font><br>

<ul>
    <li>State the null and alternative hypotheses.</li>
    <font size = "4">
        $$H_0: p = p_0$$<br>
        $$H_1: p \ne p_0$$
    </font><br>
    <li>Specify a significance level $\alpha$. A commonly-used value is $\alpha=0.05$.</li>
    <li>Draw a sample of size $n$.</li>
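    <li>Count the number $x$ of items in the sample that belong to the class of interest. Under $H_0$, the expected count is $np_0$.</li>
    <li>Under $H_0$, the standard deviation of the count is known. By using the normal approximation to the binomial distribution, it is equal to $\sqrt{np_0\left(1 - p_0\right)}$.</li>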
    <li>Since the population standard deviation is known, we can use the $Z$ test statistic.</li><br>
    <font size="4">
        $$Z_0 = \frac{X-np_0}{\sqrt{np_0\left(1 - p_0\right)}}$$
    </font><br>
    <li>Using the standard normal distribution, calculate values of $Z$ that define the acceptance and rejection regions:</li>
    <ul>
        <li>If the hypothesis test is two-sided, calculate $z_{lo}$ and $z_{hi}$ such that:</li><br>
        <font size = "4">
            \begin{align}
            z_{lo} &= \mathrm{ppf}\left( \frac{\alpha}{2} \right); P(Z \le z_{lo}) = \frac{\alpha}{2}\\\\
            z_{hi} &= \mathrm{ppf}\left( 1 - \frac{\alpha}{2} \right); P(Z \ge z_{hi}) = \frac{\alpha}{2}
            \end{align}
        </font><br>
        <li>If the hypothesis test is one-sided and lower-bound, calculate $z_{lo}$ such that:</li><br>
        <font size = "4">
            \begin{align}
            z_{lo} &= \mathrm{ppf}\left( \alpha \right); P(Z \le z_{lo}) = \alpha
            \end{align}
        </font><br>
        <li>If the hypothesis test is one-sided and upper-bound, calculate $z_{hi}$ such that:</li><br>
        <font size = "4">
            \begin{align}
            z_{hi} &= \mathrm{ppf}\left( 1 - \alpha \right); P(Z \ge z_{hi}) = \alpha
            \end{align}
        </font><br>
    </ul>
    <li>Using the null value $p_0$, $z_{lo}$, and $z_{hi}$, calculate the critical values $x_{lo}$ and $x_{hi}$.</li><br>
    <font size = "4">
        \begin{align}
        x_{lo} &= np_0 + z_{lo} \sqrt{np_0\left(1 - p_0\right)}\\\\
        x_{hi} &= np_0 + z_{hi} \sqrt{np_0\left(1 - p_0\right)} 
        \end{align}
    </font><br>
    <li>The critical values define the acceptance and rejection regions. Evaluate whether the observed count $x$ is in the acceptance or rejection region, then reject or fail to reject the null hypothesis accordingly.</li>
</ul>
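
As a rough illustration of the proportion procedure above, the sketch below runs a two-sided test on an observed count. It assumes the normal approximation is adequate for the chosen $n$ and $p_0$; the function name and the example numbers are hypothetical, and `statistics.NormalDist().inv_cdf` stands in for `scipy.stats.norm.ppf`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_test(x, n, p0, alpha=0.05):
    """Two-sided test of H0: p = p0 using the normal approximation to the binomial.

    Returns the observed Z statistic, the critical counts (x_lo, x_hi),
    and whether H0 is rejected.
    """
    std_norm = NormalDist()
    mean0 = n * p0                      # expected count under H0
    sd0 = sqrt(n * p0 * (1 - p0))       # standard deviation of the count under H0
    z0 = (x - mean0) / sd0
    x_lo = mean0 + std_norm.inv_cdf(alpha / 2) * sd0
    x_hi = mean0 + std_norm.inv_cdf(1 - alpha / 2) * sd0
    reject = not (x_lo <= x <= x_hi)
    return z0, x_lo, x_hi, reject


# Made-up numbers: 60 items of interest in a sample of 400, with H0: p = 0.10.
z0, x_lo, x_hi, reject = proportion_test(x=60, n=400, p0=0.10)
print(z0, (x_lo, x_hi), reject)
```

Here the expected count under $H_0$ is 40 with a standard deviation of 6, so an observed count of 60 sits more than three standard deviations above the null value.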

<h2>Error Types</h2>

We can never know for sure, with 100% certainty, the value of a population parameter like $\mu$ without knowing all the values of a population. We can only make statements about the population parameter based on evidence represented by the sample drawn from the same population.

<h3>Type 1 Error</h3>

Note that the procedure defines an acceptance region and a rejection region. In essence, what we are saying is, "Well, based on what we know about sampling distributions, encountering a sample mean in the critical region is quite unlikely if the null hypothesis is true. So, we don't think the null hypothesis is true and will choose to believe the alternative hypothesis instead." But even when the null hypothesis $H_0$ is true, there is still a small probability that the sample mean $\overline{x}$ falls in the critical region. In other words, some percentage of the time, we will incorrectly reject the null hypothesis $H_0$. This probability is the significance level $\alpha$. That is, we commit a <b>Type 1 Error</b>, incorrectly rejecting the null hypothesis $H_0$ when it is true, with a probability equal to $\alpha$.

<h3>Type 2 Error</h3>

In contrast, a <b>Type 2 Error</b> is failing to reject the null hypothesis $H_0$ when it is false. Note the following about Type 2 Errors:
<ul>
    <li>It pertains to failing to reject the null hypothesis. So, the probability of committing a Type 2 Error is the probability that the sample mean falls within the acceptance region.</li>
    <li>By asserting that $H_0$ is false, one needs to suspect a true value for $\theta$, in this case, $\mu$.</li>
</ul>

In the case when the population standard deviation $\sigma$ is known, the $Z$ variable centered around $\mu_0$ is used:<br><br>
<font size="4">
    $$Z_0 = \frac{\overline{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}}$$
</font><br>

However, how does the suspected true value of $\mu$ affect the distribution of $Z_0$? We know from the Central Limit Theorem that the sampling distribution will be approximately normal if the sample size $n$ is reasonably large, $n \ge 30$. We also know that the sampling distribution will have a standard deviation of $\frac{\sigma}{\sqrt{n}}$. Since $Z$ is simply the mean-centering and scaling of the sampling distribution, we can expect the true sampling distribution of $\overline{X}$ to be a normal distribution with variance $\frac{\sigma^2}{n}$, centered around the suspected true mean $\mu_a$.

To find the probability $\beta$ of committing a Type 2 Error, one would have to:
<ul>
    <li>Determine the acceptance region depending on the type of hypothesis test.</li>
    <ul>
        <li>For a two-sided hypothesis test, this means determining $x_{lo}$ and $x_{hi}$.</li><br>
        <font size="4">
            $$z_{lo} = \mathrm{ppf}\left( \frac{\alpha}{2} \right)$$<br>
            $$z_{hi} = \mathrm{ppf}\left( 1 - \frac{\alpha}{2} \right)$$<br>
            $$x_{lo} = \mu_0 + z_{lo}\frac{\sigma}{\sqrt{n}}$$<br>
            $$x_{hi} = \mu_0 + z_{hi}\frac{\sigma}{\sqrt{n}}$$<br>
        </font>
        <li>For a one-sided lower bound hypothesis test, this means determining $x_{lo}$.</li><br>
        <font size="4">
            $$z_{lo} = \mathrm{ppf}\left( \alpha \right)$$<br>
            $$x_{lo} = \mu_0 + z_{lo}\frac{\sigma}{\sqrt{n}}$$<br>
        </font>
        <li>For a one-sided upper bound hypothesis test, this means determining $x_{hi}$.</li><br>
        <font size="4">
            $$z_{hi} = \mathrm{ppf}\left( 1 - \alpha \right)$$<br>
            $$x_{hi} = \mu_0 + z_{hi}\frac{\sigma}{\sqrt{n}}$$<br>
        </font>
    </ul>
    <li>Then, with a normal distribution centered around $\mu_a$, calculate the probability of observing a sample mean $\overline{x}$ within the acceptance region.</li><br>
    <ul>
        <li> For convenience, we define the suspectedly true mean $\mu_a$ as an offset from $\mu_0$:</li><br>
        <font size="4">
            $$\mu_a=\mu_0 + \delta$$
        </font><br>
        <li>For a two-sided hypothesis test:</li><br>
        <font size="4">
            \begin{align}
            \beta&=P\left(x_{lo} \le \overline{X} \le x_{hi}\right)\\
            &=P\left(\overline{X} \le x_{hi}\right) - P\left(\overline{X} \le x_{lo}\right)\\
            &=P \left(Z \le \frac{x_{hi} - \mu_a}{\frac{\sigma}{\sqrt{n}}}\right) - P \left(Z \le \frac{x_{lo} - \mu_a}{\frac{\sigma}{\sqrt{n}}}\right)\\
            &=P \left(Z \le \frac{\mu_0 + z_{hi}\frac{\sigma}{\sqrt{n}} - \mu_a}{\frac{\sigma}{\sqrt{n}}}\right) - P \left(Z \le \frac{\mu_0 + z_{lo}\frac{\sigma}{\sqrt{n}} - \mu_a}{\frac{\sigma}{\sqrt{n}}}\right)\\
            &=P \left(Z \le \frac{\mu_0 + z_{hi}\frac{\sigma}{\sqrt{n}} - \mu_0 - \delta}{\frac{\sigma}{\sqrt{n}}}\right) - P \left(Z \le \frac{\mu_0 + z_{lo}\frac{\sigma}{\sqrt{n}} - \mu_0 - \delta}{\frac{\sigma}{\sqrt{n}}}\right)\\
            &=P \left(Z \le \frac{z_{hi}\frac{\sigma}{\sqrt{n}} - \delta}{\frac{\sigma}{\sqrt{n}}}\right) - P \left(Z \le \frac{z_{lo}\frac{\sigma}{\sqrt{n}} - \delta}{\frac{\sigma}{\sqrt{n}}}\right)\\
            &=P \left(Z \le z_{hi} - \frac{\delta}{\frac{\sigma}{\sqrt{n}}}\right) - P \left(Z \le z_{lo} - \frac{\delta}{\frac{\sigma}{\sqrt{n}}}\right)\\
            &=P \left(Z \le z_{hi} - \frac{\delta\sqrt{n}}{\sigma}\right) - P \left(Z \le z_{lo} - \frac{\delta\sqrt{n}}{\sigma}\right)\\
            &=P \left(Z \le \mathrm{ppf}\left(1 - \frac{\alpha}{2}\right) - \frac{\delta\sqrt{n}}{\sigma}\right) - P \left(Z \le \mathrm{ppf}\left(\frac{\alpha}{2}\right) - \frac{\delta\sqrt{n}}{\sigma}\right)
            \end{align}
        </font><br>
        <li>For a one-sided lower-bound hypothesis test:</li><br>
        <font size="4">
            \begin{align}
            \beta&=P \left( \overline{X} \ge x_{lo} \right)\\
            &=1 - P\left(\overline{X} \le x_{lo}\right)\\
            &=1 - P \left(Z \le \frac{x_{lo} - \mu_a}{\frac{\sigma}{\sqrt{n}}} \right)\\
            &=1 - P \left(Z \le \frac{\mu_0 + z_{lo}\frac{\sigma}{\sqrt{n}} - \mu_a}{\frac{\sigma}{\sqrt{n}}} \right)\\
            &=1 - P \left(Z \le \frac{\mu_0 + z_{lo}\frac{\sigma}{\sqrt{n}} - \mu_0 - \delta}{\frac{\sigma}{\sqrt{n}}} \right)\\
            &=1 - P \left(Z \le \frac{z_{lo}\frac{\sigma}{\sqrt{n}} - \delta}{\frac{\sigma}{\sqrt{n}}} \right)\\
            &=1 - P \left(Z \le z_{lo} - \frac{\delta}{\frac{\sigma}{\sqrt{n}}} \right)\\
            &=1 - P \left(Z \le z_{lo} - \frac{\delta\sqrt{n}}{\sigma} \right)\\
            &=1 - P \left(Z \le \mathrm{ppf}\left(\alpha\right) - \frac{\delta\sqrt{n}}{\sigma} \right)
            \end{align}
        </font><br>
        <li>For a one-sided upper-bound hypothesis test:</li><br>
        <font size="4">
            \begin{align}
            \beta&=P \left( \overline{X} \le x_{hi} \right)\\
            &=P \left( Z \le \frac{x_{hi} - \mu_a}{\frac{\sigma}{\sqrt{n}}} \right)\\
            &=P \left( Z \le \frac{\mu_0 + z_{hi}\frac{\sigma}{\sqrt{n}} - \mu_a}{\frac{\sigma}{\sqrt{n}}} \right)\\
            &=P \left( Z \le \frac{\mu_0 + z_{hi}\frac{\sigma}{\sqrt{n}} - \mu_0 - \delta}{\frac{\sigma}{\sqrt{n}}} \right)\\
            &=P \left( Z \le z_{hi} - \frac{\delta}{\frac{\sigma}{\sqrt{n}}} \right)\\
            &=P \left( Z \le z_{hi} - \frac{\delta\sqrt{n}}{\sigma} \right)\\
            &=P \left( Z \le \mathrm{ppf}\left(1 - \alpha\right) - \frac{\delta\sqrt{n}}{\sigma} \right)
            \end{align}
        </font>
    </ul>
</ul>
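
The final lines of the three derivations above translate almost directly into code. The following is a minimal sketch, assuming a known $\sigma$ and a normal sampling distribution; the function name and the `kind` labels are hypothetical, and `NormalDist().cdf` and `NormalDist().inv_cdf` from the standard library stand in for `scipy.stats.norm.cdf` and `scipy.stats.norm.ppf`.

```python
from math import sqrt
from statistics import NormalDist


def type2_error(delta, sigma, n, alpha=0.05, kind="two-sided"):
    """Probability beta of a Type 2 Error when the true mean is mu_0 + delta."""
    std_norm = NormalDist()
    shift = delta * sqrt(n) / sigma     # the term delta * sqrt(n) / sigma
    if kind == "two-sided":
        z_lo = std_norm.inv_cdf(alpha / 2)
        z_hi = std_norm.inv_cdf(1 - alpha / 2)
        return std_norm.cdf(z_hi - shift) - std_norm.cdf(z_lo - shift)
    if kind == "lower":                 # acceptance region is x_bar >= x_lo
        return 1 - std_norm.cdf(std_norm.inv_cdf(alpha) - shift)
    if kind == "upper":                 # acceptance region is x_bar <= x_hi
        return std_norm.cdf(std_norm.inv_cdf(1 - alpha) - shift)
    raise ValueError(f"unknown kind: {kind}")


# Made-up numbers: true mean one unit above mu_0, sigma = 2, n = 25.
print(type2_error(delta=1.0, sigma=2.0, n=25))
print(type2_error(delta=1.0, sigma=2.0, n=25, kind="upper"))
```

A quick sanity check on the formulas: with $\delta = 0$ the two-sided result is $1 - \alpha$, because the "true" distribution then coincides with the null distribution.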

<h2>Controlling $\beta$ by Changing Sample Size</h2>

It is possible to compute an appropriate sample size that will result in a desired probability for committing Type 2 Errors. See hypothesis_testing.py for better visual aids.

<h3>For Two-Sided Hypothesis Tests</h3>

The probability of Type 2 Errors is:
<font size="4">
    $$\beta = P\left( x_{lo} \le \overline{X} \le x_{hi} \right)$$
</font><br>

For $\delta > 0$:
<font size="4">
    $$\beta = P\left( \overline{X} \le x_{hi} \right)  - P\left( \overline{X} \le x_{lo} \right)$$<br>
    $$P \left(\overline{X} \le x_{lo} \right) \approx 0$$<br>
    $$\beta \approx P \left( \overline{X} \le x_{hi} \right)$$<br>
    $$\beta \approx P \left( Z \le \frac{x_{hi} - \mu_a}{\frac{\sigma}{\sqrt{n}}} \right)$$<br>
    $$\beta \approx P \left( Z \le \frac{\mu_0 + z_{hi}\frac{\sigma}{\sqrt{n}} - \mu_0 - \delta}{\frac{\sigma}{\sqrt{n}}} \right)$$<br>
    $$\beta \approx P \left( Z \le \frac{z_{hi}\frac{\sigma}{\sqrt{n}} - \delta}{\frac{\sigma}{\sqrt{n}}} \right)$$<br>
    $$\beta \approx P \left( Z \le z_{hi} - \frac{\delta}{\frac{\sigma}{\sqrt{n}}} \right)$$<br>
    $$\beta \approx P \left( Z \le z_{hi} - \frac{\delta\sqrt{n}}{\sigma} \right)$$<br>
    $$\mathrm{ppf}\left(\beta \right) \approx z_{hi} - \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$\mathrm{ppf}\left(\beta \right) - z_{hi} \approx - \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$z_{hi} - \mathrm{ppf}\left(\beta \right) \approx \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$\sigma\left(z_{hi} - \mathrm{ppf}\left(\beta \right)\right) \approx \delta\sqrt{n}$$<br>
    $$\sqrt{n} \approx \frac{\sigma\left(z_{hi} - \mathrm{ppf}\left(\beta \right)\right)}{\delta}$$<br>
    $$n \approx \left( \frac{\sigma\left(z_{hi} - \mathrm{ppf}\left(\beta \right)\right)}{\delta} \right)^2$$
    $$n \approx \left( \frac{\sigma\left(\mathrm{ppf}\left(1 - \frac{\alpha}{2}\right) - \mathrm{ppf}\left(\beta \right)\right)}{\delta} \right)^2$$
</font><br>

For $\delta < 0$:
<font size = "4">
    $$\beta = P\left( \overline{X} \ge x_{lo}\right) - P\left( \overline{X} \ge x_{hi}\right)$$<br>
    $$P\left( \overline{X} \ge x_{hi}\right) \approx 0$$<br>
    $$\beta \approx P\left( \overline{X} \ge x_{lo}\right)$$<br>
    $$\beta \approx 1 - P\left( \overline{X} \le x_{lo}\right)$$<br>
    $$\beta \approx 1 - P\left( Z \le \frac{x_{lo} - \mu_a}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta \approx 1 - P\left( Z \le \frac{\mu_0 + z_{lo}\frac{\sigma}{\sqrt{n}} - \mu_0 - \delta}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta \approx 1 - P\left( Z \le \frac{z_{lo}\frac{\sigma}{\sqrt{n}} - \delta}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta \approx 1 - P\left( Z \le z_{lo} - \frac{\delta}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta \approx 1 - P\left( Z \le z_{lo} - \frac{\delta\sqrt{n}}{\sigma}\right)$$<br>
    $$\beta - 1\approx - P\left( Z \le z_{lo} - \frac{\delta\sqrt{n}}{\sigma}\right)$$<br>
    $$1 - \beta \approx P\left( Z \le z_{lo} - \frac{\delta\sqrt{n}}{\sigma}\right)$$<br>
    $$\mathrm{ppf}\left(1 - \beta\right) \approx z_{lo} - \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$\mathrm{ppf}\left(1 - \beta\right) - z_{lo}\approx - \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$z_{lo} - \mathrm{ppf}\left(1 - \beta\right) \approx \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$\sigma \left(z_{lo} - \mathrm{ppf}\left(1 - \beta\right)\right) \approx \delta\sqrt{n}$$<br>
    $$\delta\sqrt{n} \approx \sigma \left(z_{lo} - \mathrm{ppf}\left(1 - \beta\right)\right)$$<br>
    $$\sqrt{n} \approx \frac{\sigma \left(z_{lo} - \mathrm{ppf}\left(1 - \beta\right)\right)}{\delta}$$<br>
    $$n \approx \left(\frac{\sigma \left(\mathrm{ppf}\left(\frac{\alpha}{2}\right) - \mathrm{ppf}\left(1 - \beta\right)\right)}{\delta}\right)^2$$
</font><br>
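
The two approximations above can be wrapped in a small helper. This is a sketch only: the function name is hypothetical, and rounding the result up to the next whole number is an assumption added here because a sample size has to be an integer.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_sided(delta, sigma, alpha=0.05, beta=0.10):
    """Approximate n for a two-sided z-test with Type 2 Error probability beta."""
    std_norm = NormalDist()
    if delta > 0:
        # n ~ (sigma * (ppf(1 - alpha/2) - ppf(beta)) / delta)^2
        z_diff = std_norm.inv_cdf(1 - alpha / 2) - std_norm.inv_cdf(beta)
    elif delta < 0:
        # n ~ (sigma * (ppf(alpha/2) - ppf(1 - beta)) / delta)^2
        z_diff = std_norm.inv_cdf(alpha / 2) - std_norm.inv_cdf(1 - beta)
    else:
        raise ValueError("delta must be non-zero")
    return ceil((sigma * z_diff / delta) ** 2)


# Made-up numbers: detect a shift of one unit with sigma = 2.
print(sample_size_two_sided(delta=1.0, sigma=2.0))
print(sample_size_two_sided(delta=-1.0, sigma=2.0))
```

Because the standard normal distribution is symmetric, both branches return the same $n$ for shifts of equal magnitude, which is a useful check on the two derivations.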

<h3>For One-Sided Lower-bound Hypothesis Tests</h3>

<font size="4">
    $$\beta = P\left( \overline{X} \ge x_{lo}\right)$$<br>
    $$\beta = 1 - P\left( \overline{X} \le x_{lo}\right)$$<br>
    $$\beta = 1 - P\left( Z \le \frac{x_{lo} - \mu_a}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta = 1 - P\left( Z \le \frac{\mu_0 + z_{lo}\frac{\sigma}{\sqrt{n}} - \mu_0 - \delta}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta = 1 - P\left( Z \le \frac{z_{lo}\frac{\sigma}{\sqrt{n}} - \delta}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta = 1 - P\left( Z \le z_{lo} - \frac{\delta}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta = 1 - P\left( Z \le z_{lo} - \frac{\delta\sqrt{n}}{\sigma}\right)$$<br>
    $$\beta - 1= - P\left( Z \le z_{lo} - \frac{\delta\sqrt{n}}{\sigma}\right)$$<br>
    $$1 - \beta = P\left( Z \le z_{lo} - \frac{\delta\sqrt{n}}{\sigma}\right)$$<br>
    $$\mathrm{ppf}\left(1 - \beta\right) = z_{lo} - \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$\mathrm{ppf}\left(1 - \beta\right) - z_{lo}= - \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$z_{lo} - \mathrm{ppf}\left(1 - \beta\right) = \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$\sigma \left(z_{lo} - \mathrm{ppf}\left(1 - \beta\right)\right) = \delta\sqrt{n}$$<br>
    $$\delta\sqrt{n} = \sigma \left(z_{lo} - \mathrm{ppf}\left(1 - \beta\right)\right)$$<br>
    $$\sqrt{n} = \frac{\sigma \left(z_{lo} - \mathrm{ppf}\left(1 - \beta\right)\right)}{\delta}$$<br>
    $$n = \left(\frac{\sigma \left(\mathrm{ppf}\left(\alpha\right) - \mathrm{ppf}\left(1 - \beta\right)\right)}{\delta}\right)^2$$
</font><br>

<h3>For One-Sided Upper-bound Hypothesis Tests</h3>

<font size="4">
    $$\beta = P\left( \overline{X} \le x_{hi}\right)$$<br>
    $$\beta = P\left( Z \le \frac{x_{hi}-\mu_a}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta = P\left( Z \le \frac{\mu_0 + z_{hi}\frac{\sigma}{\sqrt{n}} - \mu_a}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta = P\left( Z \le \frac{\mu_0 + z_{hi}\frac{\sigma}{\sqrt{n}} - \mu_0 - \delta}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta = P\left( Z \le \frac{z_{hi}\frac{\sigma}{\sqrt{n}} - \delta}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta = P\left( Z \le z_{hi} - \frac{\delta}{\frac{\sigma}{\sqrt{n}}}\right)$$<br>
    $$\beta = P\left( Z \le z_{hi} - \frac{\delta\sqrt{n}}{\sigma}\right)$$<br>
    $$\mathrm{ppf}\left(\beta\right) = z_{hi} - \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$\mathrm{ppf}\left(\beta \right) - z_{hi} = - \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$z_{hi} - \mathrm{ppf}\left(\beta \right) = \frac{\delta\sqrt{n}}{\sigma}$$<br>
    $$\sigma\left(z_{hi} - \mathrm{ppf}\left(\beta \right)\right) = \delta\sqrt{n}$$<br>
    $$\sqrt{n} = \frac{\sigma\left(z_{hi} - \mathrm{ppf}\left(\beta \right)\right)}{\delta}$$<br>
    $$n = \left( \frac{\sigma\left(\mathrm{ppf}\left(1 - \alpha\right) - \mathrm{ppf}\left(\beta \right)\right)}{\delta} \right)^2$$
</font><br>
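
A possible sketch of the two one-sided formulas above follows. The function name and `kind` labels are hypothetical. The sign check on $\delta$ and the rounding up to a whole number are assumptions added here: the formulas square the result, which would otherwise hide a shift in the direction the test cannot detect.

```python
from math import ceil
from statistics import NormalDist


def sample_size_one_sided(delta, sigma, alpha=0.05, beta=0.10, kind="upper"):
    """n for a one-sided z-test with Type 2 Error probability beta."""
    std_norm = NormalDist()
    if kind == "lower":
        if delta >= 0:
            raise ValueError("a lower-bound test needs delta < 0")
        # n = (sigma * (ppf(alpha) - ppf(1 - beta)) / delta)^2
        z_diff = std_norm.inv_cdf(alpha) - std_norm.inv_cdf(1 - beta)
    elif kind == "upper":
        if delta <= 0:
            raise ValueError("an upper-bound test needs delta > 0")
        # n = (sigma * (ppf(1 - alpha) - ppf(beta)) / delta)^2
        z_diff = std_norm.inv_cdf(1 - alpha) - std_norm.inv_cdf(beta)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return ceil((sigma * z_diff / delta) ** 2)


# Made-up numbers: sigma = 2 and a shift of one unit in either direction.
print(sample_size_one_sided(delta=1.0, sigma=2.0, kind="upper"))
print(sample_size_one_sided(delta=-1.0, sigma=2.0, kind="lower"))
```

For the same $\alpha$, $\beta$, and $|\delta|$, the one-sided tests need fewer observations than the two-sided test because all of $\alpha$ sits in a single tail.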

<h3>For Two-Sided Hypothesis Tests on the Proportion of a Population</h3>

Recall that the sampling distribution used to approximate the binomial distribution is the normal distribution with mean $np$ and standard deviation $\sqrt{np\left(1-p\right)}$. Instead of assuming a $\delta$, we simply assume that the suspectedly true value for the population proportion is $p_a$.

For $\mu_a > \mu_0 \rightarrow np_a > np_0$:
<font size="4">
    $$\beta = P\left( x_{lo} \le \overline{X} \le x_{hi}\right)$$<br>
    $$\beta = P\left(\overline{X} \le x_{hi} \right) - P\left(\overline{X} \le x_{lo} \right)$$<br>
    $$P\left(\overline{X} \le x_{lo} \right) \approx 0$$<br>
    $$\beta \approx P\left(\overline{X} \le x_{hi} \right)$$<br>
    $$\beta \approx P\left(Z \le \frac{x_{hi} - np_a}{\sqrt{np_a\left(1 - p_a\right)}} \right)$$<br>
    $$\beta \approx P\left(Z \le \frac{np_0 + z_{hi}\sqrt{np_0\left(1 - p_0\right)} - np_a}{\sqrt{np_a\left(1 - p_a\right)}} \right)$$<br>
    $$\mathrm{ppf}\left(\beta\right) \approx \frac{np_0 + z_{hi}\sqrt{np_0\left(1 - p_0\right)} - np_a}{\sqrt{np_a\left(1 - p_a\right)}}$$<br>
    $$\mathrm{ppf}\left(\beta\right) \sqrt{np_a\left(1 - p_a\right)} \approx np_0 + z_{hi}\sqrt{np_0\left(1 - p_0\right)} - np_a$$<br>
    $$\mathrm{ppf}\left(\beta\right) \sqrt{np_a\left(1 - p_a\right)} - z_{hi}\sqrt{np_0\left(1 - p_0\right)} \approx n\left(p_0 - p_a\right)$$<br>
    $$\sqrt{n}\left(\mathrm{ppf}\left(\beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{hi}\sqrt{p_0\left(1 - p_0\right)} \right) \approx n\left(p_0 - p_a\right)$$<br>
    $$\frac{\sqrt{n}}{n}\left(\mathrm{ppf}\left(\beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{hi}\sqrt{p_0\left(1 - p_0\right)} \right) \approx \left(p_0 - p_a\right)$$<br>
    $$\frac{1}{\sqrt{n}}\left(\mathrm{ppf}\left(\beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{hi}\sqrt{p_0\left(1 - p_0\right)} \right) \approx \left(p_0 - p_a\right)$$<br>
    $$\frac{1}{\sqrt{n}} \approx \frac{\left(p_0 - p_a\right)}{\left(\mathrm{ppf}\left(\beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{hi}\sqrt{p_0\left(1 - p_0\right)} \right)}$$<br>
    $$\sqrt{n} \approx \frac{\left(\mathrm{ppf}\left(\beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{hi}\sqrt{p_0\left(1 - p_0\right)} \right)}{\left(p_0 - p_a\right)}$$<br>
    $$n \approx \left(\frac{\left(\mathrm{ppf}\left(\beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{hi}\sqrt{p_0\left(1 - p_0\right)} \right)}{\left(p_0 - p_a\right)}\right)^2$$<br>
    $$n \approx \left(\frac{\left(\mathrm{ppf}\left(\beta\right) \sqrt{p_a\left(1 - p_a\right)} - \mathrm{ppf}\left(1 - \frac{\alpha}{2}\right)\sqrt{p_0\left(1 - p_0\right)} \right)}{\left(p_0 - p_a\right)}\right)^2$$<br>
</font>

For $\mu_a < \mu_0 \rightarrow np_a < np_0$:
<font size="4">
    $$\beta = P\left( x_{lo} \le \overline{X} \le x_{hi}\right)$$<br>
    $$\beta = P\left(\overline{X} \ge x_{lo} \right) - P\left(\overline{X} \ge x_{hi}\right)$$<br>
    $$P\left(\overline{X} \ge x_{hi}\right) \approx 0$$<br>
    $$\beta \approx P\left(\overline{X} \ge x_{lo} \right)$$<br>
    $$\beta \approx 1 - P\left(\overline{X} \le x_{lo} \right)$$<br>
    $$\beta \approx 1 - P\left(Z \le \frac{x_{lo} - np_a}{\sqrt{np_a\left(1 - p_a\right)}} \right)$$<br>
    $$\beta \approx 1 - P\left(Z \le \frac{np_0 + z_{lo}\sqrt{np_0\left(1 - p_0\right)} - np_a}{\sqrt{np_a\left(1 - p_a\right)}} \right)$$<br>
    $$\beta - 1 \approx - P\left(Z \le \frac{np_0 + z_{lo}\sqrt{np_0\left(1 - p_0\right)} - np_a}{\sqrt{np_a\left(1 - p_a\right)}} \right)$$<br>
    $$1 - \beta \approx P\left(Z \le \frac{np_0 + z_{lo}\sqrt{np_0\left(1 - p_0\right)} - np_a}{\sqrt{np_a\left(1 - p_a\right)}} \right)$$<br>
    $$\mathrm{ppf}\left(1 - \beta\right) \approx \frac{np_0 + z_{lo}\sqrt{np_0\left(1 - p_0\right)} - np_a}{\sqrt{np_a\left(1 - p_a\right)}}$$<br>
    $$\mathrm{ppf}\left(1 - \beta\right) \sqrt{np_a\left(1 - p_a\right)} \approx np_0 + z_{lo}\sqrt{np_0\left(1 - p_0\right)} - np_a$$<br>
    $$\mathrm{ppf}\left(1 - \beta\right) \sqrt{np_a\left(1 - p_a\right)} - z_{lo}\sqrt{np_0\left(1 - p_0\right)} \approx n\left(p_0 - p_a\right)$$<br>
    $$\sqrt{n}\left(\mathrm{ppf}\left(1 - \beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{lo}\sqrt{p_0\left(1 - p_0\right)} \right) \approx n\left(p_0 - p_a\right)$$<br>
    $$\frac{\sqrt{n}}{n}\left(\mathrm{ppf}\left(1 - \beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{lo}\sqrt{p_0\left(1 - p_0\right)} \right) \approx \left(p_0 - p_a\right)$$<br>
    $$\frac{1}{\sqrt{n}} \approx \frac{\left(p_0 - p_a\right)}{\mathrm{ppf}\left(1 - \beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{lo}\sqrt{p_0\left(1 - p_0\right)}}$$<br>
    $$\sqrt{n} \approx \frac{\mathrm{ppf}\left(1 - \beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{lo}\sqrt{p_0\left(1 - p_0\right)}}{\left(p_0 - p_a\right)}$$<br>
    $$n \approx \left(\frac{\mathrm{ppf}\left(1 - \beta\right) \sqrt{p_a\left(1 - p_a\right)} - z_{lo}\sqrt{p_0\left(1 - p_0\right)}}{\left(p_0 - p_a\right)}\right)^2$$<br>
    $$n \approx \left(\frac{\mathrm{ppf}\left(1 - \beta\right) \sqrt{p_a\left(1 - p_a\right)} - \mathrm{ppf}\left(\frac{\alpha}{2}\right)\sqrt{p_0\left(1 - p_0\right)}}{\left(p_0 - p_a\right)}\right)^2$$<br>
</font>

<h3>For One-Sided Lower-Bound Hypothesis Tests on the Proportion of a Population</h3>

The derivation is exactly the same as for the approximation of a two-sided hypothesis test where $np_a < np_0$. So:<br><br>
<font size="4">
    $$n = \left(\frac{\mathrm{ppf}\left(1 - \beta\right) \sqrt{p_a\left(1 - p_a\right)} - \mathrm{ppf}\left(\alpha\right)\sqrt{p_0\left(1 - p_0\right)}}{\left(p_0 - p_a\right)}\right)^2$$<br>
</font><br>

Where:
<ul>
    <li>$\beta$ is the chosen probability of Type 2 Errors</li><br>
    <li>$p_a$ is the suspectedly true population proportion</li><br>
    <li>$p_0$ is the null value for the population proportion</li><br>
    <li>$\alpha$ is the chosen significance level</li><br>
</ul>

<h3>For One-Sided Upper-Bound Hypothesis Tests on the Proportion of a Population</h3>

The derivation is exactly the same as for the approximation of a two-sided hypothesis test where $np_a > np_0$. So:<br><br>
<font size="4">
    $$n \approx \left(\frac{\left(\mathrm{ppf}\left(\beta\right) \sqrt{p_a\left(1 - p_a\right)} - \mathrm{ppf}\left(1 - \alpha\right)\sqrt{p_0\left(1 - p_0\right)} \right)}{\left(p_0 - p_a\right)}\right)^2$$<br>
</font><br>

Where:
<ul>
    <li>$\beta$ is the chosen probability of Type 2 Errors</li><br>
    <li>$p_a$ is the suspectedly true population proportion</li><br>
    <li>$p_0$ is the null value for the population proportion</li><br>
    <li>$\alpha$ is the chosen significance level</li><br>
</ul>
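
The sample-size formulas for proportions above, two-sided and one-sided, differ only in whether $\frac{\alpha}{2}$ or $\alpha$ goes into the quantile for $p_0$. The sketch below folds them into one hypothetical helper; the `two_sided` flag, the function name, and rounding up to a whole number are assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def proportion_sample_size(p0, pa, alpha=0.05, beta=0.10, two_sided=True):
    """Approximate n for a test on a proportion with Type 2 Error probability beta."""
    std_norm = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    sd0 = sqrt(p0 * (1 - p0))    # per-item standard deviation under H0
    sda = sqrt(pa * (1 - pa))    # per-item standard deviation under the suspected p_a
    if pa > p0:
        numerator = std_norm.inv_cdf(beta) * sda - std_norm.inv_cdf(1 - tail) * sd0
    elif pa < p0:
        numerator = std_norm.inv_cdf(1 - beta) * sda - std_norm.inv_cdf(tail) * sd0
    else:
        raise ValueError("pa must differ from p0")
    return ceil((numerator / (p0 - pa)) ** 2)


# Made-up numbers.
print(proportion_sample_size(p0=0.5, pa=0.6))                      # two-sided
print(proportion_sample_size(p0=0.05, pa=0.03, two_sided=False))   # one-sided, lower-bound
```

For a one-sided test the direction is implied by the sign of $p_a - p_0$: $p_a > p_0$ corresponds to the upper-bound formula and $p_a < p_0$ to the lower-bound formula.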

<h2>Power</h2>

The <b>power</b> of a statistical test is the probability of correctly rejecting the null hypothesis $H_0$ when the alternative hypothesis $H_1$ is true. In other words, this is the probability of <b>not committing a Type 2 Error</b>. It is calculated as:<br><br>
<font size="4">
    $$\mathrm{Power} = 1 - \beta$$
</font>

In terms of the sampling distributions centered at $\mu_0$ and $\mu_a$ when discussing Type 2 Error, Power is the probability represented by the area of the sampling distribution centered at $\mu_a$ that does not overlap with the acceptance region of the sampling distribution centered at $\mu_0$.

Note again that Power is relative to some assumed-to-be-true value of the population parameter $(\mu_a)$.
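
To see how power depends on the sample size, the sketch below evaluates $1 - \beta$ for a two-sided test with known $\sigma$ at a few values of $n$. The function name and all numbers are made up for illustration, and the symmetry $z_{lo} = -z_{hi}$ of the standard normal distribution is used to shorten the code.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(delta, sigma, n, alpha=0.05):
    """Power = 1 - beta for a two-sided z-test when the true mean is mu_0 + delta."""
    std_norm = NormalDist()
    shift = delta * sqrt(n) / sigma
    z_hi = std_norm.inv_cdf(1 - alpha / 2)
    # z_lo = ppf(alpha / 2) equals -z_hi by symmetry.
    beta = std_norm.cdf(z_hi - shift) - std_norm.cdf(-z_hi - shift)
    return 1 - beta


# Power grows with the sample size for a fixed shift delta.
for n in (10, 25, 50, 100):
    print(n, round(power_two_sided(delta=1.0, sigma=2.0, n=n), 3))
```

With $\delta = 0$ the function returns $\alpha$, which matches the definition of a Type 1 Error: the only way to reject a true $H_0$ is by landing in the critical region.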

<h2>P-values</h2>

P-values are an alternative way of reporting the results of a hypothesis test. Instead of defining a rejection region and an acceptance region, we ask, "What is the smallest value of the significance level $\alpha$ that would lead us to reject the null hypothesis $H_0$?" The answer to this question is simply finding the significance level represented by the sample mean. To find this value:
<ul>
    <li>If the alternative hypothesis is two-sided, $H_1: \mu \ne \mu_0$, and the observed sample mean $\overline{x}$ is greater than the null value $\mu_0$, compute the probability, assuming $H_0$ is true, of observing sample means that are at least as large. Then double this probability to account for the other symmetric tail of the sampling distribution:</li><br>
    <font size="4">
        \begin{align}
        p = 2P(\overline{X} \ge \overline{x})
        \end{align}
    </font><br>
    <li>If the alternative hypothesis is two-sided, $H_1: \mu \ne \mu_0$, and the observed sample mean $\overline{x}$ is less than the null value $\mu_0$, compute the probability, assuming $H_0$ is true, of observing sample means that are at most the value of the observed sample mean $\overline{x}$. Then double this probability to account for the other symmetric tail of the sampling distribution:</li><br>
    <font size="4">
        \begin{align}
        p = 2P(\overline{X} \le \overline{x})
        \end{align}
    </font><br>
    <li>If the alternative hypothesis is $H_1: \mu > \mu_0$, then find the probability, assuming $H_0$ is true, of observing sample means greater than the observed sample mean $\overline{x}$:</li><br>
    <font size="4">
        \begin{align}
        p = P(\overline{X} \ge \overline{x})
        \end{align}
    </font><br>
    <li>If the alternative hypothesis is $H_1: \mu < \mu_0$, then find the probability, assuming $H_0$ is true, of observing sample means less than the observed sample mean $\overline{x}$:</li><br>
    <font size="4">
        \begin{align}
        p = P(\overline{X} \le \overline{x})
        \end{align}
    </font><br>
</ul>

Once the p-value $p$ is computed, deciding on whether to reject the null hypothesis is easy. If $p < \alpha$, we reject the null hypothesis $H_0$ in favor of the alternative hypothesis $H_1$. Because of the way $p$ is computed, $p<\alpha$ always corresponds to the observed sample mean $\overline{x}$ being in the critical region.
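
The p-value rules above can be sketched for the known-variance case, where the sampling distribution under $H_0$ is normal. The function name and the `alternative` labels are hypothetical, and the numbers are made up; for an unknown variance the same logic would apply with the t distribution in place of the normal.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(x_bar, mu0, sigma, n, alternative="two-sided"):
    """p-value of a z-test with known sigma, computed under H0: mu = mu0."""
    std_norm = NormalDist()
    z0 = (x_bar - mu0) / (sigma / sqrt(n))
    if alternative == "greater":        # H1: mu > mu0
        return 1 - std_norm.cdf(z0)
    if alternative == "less":           # H1: mu < mu0
        return std_norm.cdf(z0)
    if alternative == "two-sided":      # H1: mu != mu0, double the tail beyond |z0|
        return 2 * (1 - std_norm.cdf(abs(z0)))
    raise ValueError(f"unknown alternative: {alternative}")


# Made-up numbers: H0: mu = 50, sigma = 2, n = 25, observed mean 51.3.
alpha = 0.05
p = z_test_p_value(x_bar=51.3, mu0=50.0, sigma=2.0, n=25)
print(p, p < alpha)
```

Using $|z_0|$ in the two-sided branch covers both of the two-sided cases listed above, since doubling the tail beyond $|z_0|$ gives the same value whether $\overline{x}$ is above or below $\mu_0$.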