# DATA 5600: Introduction to Regression and Machine Learning for Analytics

## __Topic: Some (Very) Brief Notes on Hypothesis Testing__ <br>

Author:  Tyler J. Brough <br>
Updated: October 18, 2021 <br>

---

<br>

In [None]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [10, 8]

In [None]:
np.random.seed(7)

---

<br>

## __Elements of Hypothesis Testing__

<br>

These notes are based upon readings from the following books:

* _Introduction to Probability and Mathematical Statistics_ by Bain & Engelhardt

* _Mathematical Statistics with Applications_ by Wackerly, Mendenhall, and Scheaffer

* _Statistics for Business and Economics_ by McClave, Benson, and Sincich

<br>


### __Statistical Hypothesis__

---

A statistical __hypothesis__ is a statement about the numerical value of a population parameter.

---

<br>
<br>

### __Null Hypothesis__

---

The __null hypothesis__, denoted $H_{0}$, represents the hypothesis that is assumed to be true unless the data provide convincing evidence that it is false.
This usually represents the "status quo" or some claim about the population parameter that the researcher wants to test.

---

<br>
<br>

### __Alternative Hypothesis__

---

The __alternative (research) hypothesis__, denoted $H_{a}$, represents the hypothesis that will be accepted only if the data provide convincing evidence
of its truth. This usually represents the values of a population parameter that the researcher wants to gather evidence to support.

---

<br>
<br>

### __Test Statistic__

---

The __test statistic__ is a sample statistic, computed from information provided in the sample, that the researcher uses to decide between the null and alternative hypotheses.

---

<br>
<br>

### __Type I Error__

---

A __Type I error__ occurs if the researcher rejects the null hypothesis in favor of the alternative hypothesis when, in fact, $H_{0}$ is true. The probability
of committing a Type I error is denoted by $\alpha$. 

This type of error is commonly referred to as a false positive. 

---
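
The statement that $\alpha$ is the probability of a Type I error can be checked by simulation. The following is a minimal sketch, assuming a hypothetical normal population with known $\sigma$ and a two-tailed $z$ test; the population values, sample size, number of simulations, and seed are illustrative choices and do not come from the readings.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
alpha = 0.05
mu0, sigma, n = 0.0, 1.0, 50     # hypothetical population and sample size
n_sims = 20_000

# Draw many samples from a population in which H0 (mu = mu0) is actually true.
samples = rng.normal(loc=mu0, scale=sigma, size=(n_sims, n))
z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))

# Two-tailed rejection rule at level alpha.
z_crit = stats.norm.ppf(1 - alpha / 2)

# Every rejection here is a false positive, because H0 is true by construction.
type_i_rate = np.mean(np.abs(z) > z_crit)
print(type_i_rate)
```

With this many simulated samples, the fraction of false rejections should land close to the chosen $\alpha$.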

<br>
<br>

### __Rejection Region__

---

The __rejection region__ of a statistical test is the set of possible values of the test statistic for which the researcher will reject $H_{0}$ in favor
of $H_{a}$.

---


<br>
<br>

### __Type II Error__

---

A __Type II error__ occurs if the researcher accepts the null hypothesis when, in fact, $H_{0}$ is false. The probability of committing
a Type II error is denoted $\beta$.

---
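
Unlike $\alpha$, the value of $\beta$ depends on which alternative value of the parameter is actually true. As a rough sketch, assuming an upper-tailed $z$ test with known $\sigma$ and one specific alternative mean (all numbers below are hypothetical), $\beta$ can be computed directly from the normal distribution:

```python
import numpy as np
from scipy import stats

# Hypothetical setup: H0: mu = 100 versus Ha: mu > 100, with known sigma.
mu0, mu_a = 100.0, 103.0   # null value and one specific alternative value
sigma, n = 10.0, 36        # assumed population std. dev. and sample size
alpha = 0.05

se = sigma / np.sqrt(n)               # standard error of the sample mean
z_crit = stats.norm.ppf(1 - alpha)    # upper-tailed critical value

# If mu = mu_a, the z statistic is normal with mean (mu_a - mu0) / se and
# unit variance, so beta = P(z <= z_crit | mu = mu_a).
shift = (mu_a - mu0) / se
beta = stats.norm.cdf(z_crit - shift)
power = 1 - beta                      # probability of correctly rejecting H0
print(beta, power)
```

Moving `mu_a` farther from `mu0`, or increasing `n`, makes `shift` larger and therefore makes $\beta$ smaller.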


<br>
<br>

<br>

__NB:__ this setup is conceptually similar to the judgement in a court trial. 

* The null hypothesis corresponds to the position of the defendant (presumed innocent until proven guilty)

* The alternative hypothesis corresponds to the position against the defendant

* The null: the absence of a difference, or the absence of an association

* Type I error corresponds to convicting an innocent defendant

* Type II error corresponds to acquitting a guilty defendant

<br>

### __Elements of a Test of Hypothesis__

---

__1.__ _Null hypothesis_ ($H_{0}$): A theory about the specific values of one or more population parameters. The theory generally represents the status quo, which we adopt until it is proven false. The theory is always stated as $H_{0}: \mbox{parameter} = \mbox{value}$. 

__2.__ _Alternative (research) hypothesis_ ($H_{a}$): A theory that contradicts the null hypothesis. The theory generally represents that which we will adopt only when sufficient evidence exists to establish its truth.

__3.__ _Test statistic:_ A sample statistic used to decide whether to reject the null hypothesis.

__4.__ _Rejection region:_ The numerical values of the test statistic for which the null hypothesis will be rejected. The rejection region is chosen so that the 
probability is $\alpha$ that it will contain the test statistic when the null hypothesis is true, thereby leading to a Type I error. The value of $\alpha$ is 
usually chosen to be small (e.g. .01, .05, or .10) and is referred to as the __level of significance__ of the test. 

__5.__ _Assumptions:_ Clear statement(s) of any assumptions made about the population(s) being sampled. 

__6.__ _Experiment and calculation of test statistic:_ Performance of the sampling experiment and determination of the numerical value of the test statistic.

__7.__ _Conclusion:_ 

* __a.__ If the numerical value of the test statistic falls in the rejection region, we reject the null hypothesis and conclude that the alternative hypothesis is true. We know that the hypothesis-testing process will lead to this conclusion incorrectly (Type I error) only $100\alpha\%$ of the time when $H_{0}$ is true.
      
* __b.__ If the test statistic does not fall in the rejection region, we do not reject $H_{0}$. Thus, we reserve judgement about which hypothesis is true. We do not conclude that the null hypothesis is true because we do not (in general) know the probability $\beta$ that our test procedure will lead to an incorrect acceptance of $H_{0}$ (Type II error).

### __One-Tailed Test__

---

A __one-tailed test__ of hypothesis is one in which the alternative hypothesis is directional and includes the symbol "<" or ">". Some key words that help you identify the direction are: 

* _Upper-tailed_ (>): "greater than", "larger", "above"

* _Lower-tailed_ (<): "less than", "smaller", "below"

---

<br>
<br>

### __Two-Tailed Test__

---

A __two-tailed test__ of hypothesis is one in which the alternative hypothesis does not specify departure from $H_{0}$ in a particular direction and is written with the symbol "$\ne$". Some key words that help you identify this nondirectional nature are: 

* _Two-tailed_ ($\ne$): "not equal to", "differs from"

---
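
The form of the alternative hypothesis determines where the rejection region sits. The sketch below assumes a $z$ test statistic; `rejection_region` is a hypothetical helper written for illustration (it is not a SciPy function) that returns the critical value(s) for each of the three forms.

```python
from scipy import stats

def rejection_region(alpha, tail):
    """Return the critical z value(s) for a z test at level alpha.

    tail: "upper" (Ha uses >), "lower" (Ha uses <), or "two" (Ha uses !=).
    """
    if tail == "upper":
        # Reject H0 when z exceeds this value.
        return (stats.norm.ppf(1 - alpha),)
    if tail == "lower":
        # Reject H0 when z falls below this value.
        return (stats.norm.ppf(alpha),)
    if tail == "two":
        # Reject H0 when z falls outside (-hi, hi); alpha is split in half.
        hi = stats.norm.ppf(1 - alpha / 2)
        return (-hi, hi)
    raise ValueError("tail must be 'upper', 'lower', or 'two'")

for tail in ("upper", "lower", "two"):
    print(tail, rejection_region(0.05, tail))
```

For the same $\alpha$, the two-tailed critical values lie farther from zero than the one-tailed value because only $\alpha/2$ is placed in each tail.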


<br>
<br>

### __Steps for Selecting the Null and Alternative Hypotheses__

---

__1.__ Select the _alternative hypothesis_ as that which the sampling experiment is intended to establish. The alternative hypothesis will assume one of three forms:

- __a.__ One-tailed, __upper-tailed__      (e.g., $H_{a}: \mu > 2,400$)


- __b.__ One-tailed, __lower-tailed__      (e.g., $H_{a}: \mu < 2,400$)


- __c.__ Two-tailed                        (e.g., $H_{a}: \mu \ne 2,400$)


<br>

__2.__ Select the _null hypothesis_ as the status quo, that which will be presumed true unless the sampling experiment conclusively establishes the alternative hypothesis. The null hypothesis will be specified as that parameter value closest to the alternative in one-tailed tests and as the complementary (or only unspecified) value in two-tailed tests.

$$
(\mbox{e.g.,  } H_{0}: \mu = 2,400)
$$

---


<br>
<br>

## __Example Problems__

<br>

<u><b>Example 1</b></u>


A metal lathe is checked periodically by quality-control inspectors to determine whether or not it is producing machine bearings with a mean diameter of 0.5 inch. If the mean diameter of the bearings is larger or smaller than 0.5 inch, then the process is out of control and must be adjusted. Formulate the null and alternative hypotheses for a test to determine whether the bearing production process is out of control.

<br>

<u><b>Solution</b></u>

The hypotheses must be stated in terms of a population parameter. Here, we define $\mu$ as the true mean diameter (in inches) of all bearings produced by the metal lathe. If either $\mu > 0.5$ or $\mu < 0.5$, then the lathe's production process is out of control. Because the inspectors want to be able to detect either possibility (indicating that the process is in need of adjustment), these values of $\mu$ represent the alternative (or research) hypothesis. Conversely, because $\mu = 0.5$ represents an in-control process (the status quo), this represents the null hypothesis. Therefore, we want to conduct the two-tailed test:

<br>

$$
\large{
\begin{aligned}
H_{0}: & \mu = 0.5 \quad \mbox{(i.e., the process is in control)} \\
& \\
H_{a}: & \mu \ne 0.5 \quad \mbox{(i.e., the process is out of control)}
\end{aligned}
}
$$

<br>

__NB:__ Here, the alternative hypothesis is not necessarily the hypothesis that the quality-control inspectors desire to support. However, they will make adjustments to the metal lathe settings only if there is strong evidence to indicate that the process is out of control. Consequently, $\mu \ne 0.5$ must be stated as the alternative hypothesis.

<br>
<br>

<u><b>Example 2</b></u>


A manufacturer of cereal wants to test the performance of one of its filling machines. The machine is designed to discharge a mean amount of 12 ounces per box, and the manufacturer wants to detect any departure from this setting. This quality study calls for randomly sampling 100 boxes from today's production run and determining whether or not the mean fill for the run is 12 ounces per box. Set up a test of hypothesis for this study, using $\alpha = 0.01$.

<br>

<u><b>Solution</b></u>

__Step 1.__ First, we identify the _parameter_ of interest. The key word _mean_ in the statement of the problem implies that the target parameter is $\mu$, the mean amount of cereal discharged into each box.

__Step 2.__ Next, we set up the _null and alternative hypotheses_. Because the manufacturer wishes to detect a departure from the setting $\mu = 12$ in either direction, $\mu < 12$ or $\mu > 12$, we conduct a two-tailed statistical test. Following the procedure for selecting the null and alternative hypotheses, we specify as the alternative hypothesis that the mean differs from 12 ounces because detecting the machine's departure from specifications is the purpose of the quality-control study. The null hypothesis is the presumption that the filling machine is operating properly unless the sample data indicate otherwise. Thus,

<br>

$$
\large{
\begin{aligned}
H_{0}:  & \mu = 12 \quad \mbox{(Population mean fill amount is 12 ounces)} \\
& \\
H_{a}:  & \mu \ne 12 \quad \mbox{(i.e., $\mu < 12$ or $\mu > 12$; machine is under or overfilling each box)} 
\end{aligned}
}
$$

__Step 3.__ Now we specify the _test statistic_. The test statistic measures the number of standard deviations between the observed value of $\bar{x}$ and the null hypothesized value $\mu = 12$:

<br>

$$
\large{\mbox{Test statistic  } z = \frac{\bar{x} - 12}{\sigma_{\bar{x}}}}
$$

<br>

__Step 4.__ Next, we determine the _rejection region_. The rejection region must be designated to detect a departure from $\mu = 12$ in _either_ direction, so we will reject $H_{0}$ for values of $z$ that are either too small (negative) or too large (positive). To determine the precise values of $z$ that comprise the rejection region, we first select $\alpha$, the probability that the test will lead to incorrect rejection of the null hypothesis. Then we divide $\alpha$ equally between the lower and upper tails of the distribution of $z$. In this example, $\alpha = 0.01$, so $\frac{\alpha}{2} = 0.005$ is placed in each tail. The areas in the tails correspond to $z = -2.576$ and $z = 2.576$, respectively.

<br>

$$
\large{\mbox{Rejection region:  } z < -2.576 \quad \mbox{ or } \quad z > 2.576}
$$

<br>

In [None]:
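# Two-tailed test: split α equally between the lower and upper tails
α = 0.01
lo = α / 2.0
hi = 1. - α / 2.0
# Critical z values that bound the rejection region
stats.norm.ppf([lo, hi])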

<br>

__Step 5.__ Finally, we list any _assumptions_ about the data necessary for the validity of the test. Because the sample size of the experiment is large enough ($n > 30$), the CLT will apply, and no assumptions need be made about the population of fill measurements. The sampling distribution of the sample mean fill of 100 boxes will be approximately normal regardless of the distribution of the individual boxes' fills.


__NB:__ Note that the test is set up _before_ the sampling experiment is conducted. The data are not used to develop the test. Evidently, the manufacturer does not want to disrupt the filling process to adjust the machine unless the sample data provide very convincing evidence that it is not meeting specifications, because the value of $\alpha$ has been set quite low at $0.01$. If the sample evidence results in rejection of $H_{0}$, the manufacturer will confidently conclude that the machine needs adjustment because there is only a $0.01$ probability of Type I error.
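
Once the test has been set up, carrying it out only requires the sample summary. The example above does not report any data, so the sketch below uses a purely hypothetical sample mean and standard deviation to show how Steps 3 and 4 would be applied; with $n = 100$, it also assumes that $s / \sqrt{n}$ is an adequate stand-in for $\sigma_{\bar{x}}$.

```python
import numpy as np
from scipy import stats

# Hypothetical sample summary (NOT given in the example): n = 100 boxes.
n = 100
x_bar = 11.85    # assumed sample mean fill, in ounces
s = 0.50         # assumed sample standard deviation, in ounces
mu0 = 12.0       # null hypothesized mean fill
alpha = 0.01

# Step 3: test statistic, using s / sqrt(n) to approximate sigma_xbar.
z = (x_bar - mu0) / (s / np.sqrt(n))

# Step 4: two-tailed rejection region, with alpha / 2 in each tail.
z_crit = stats.norm.ppf(1 - alpha / 2)
reject = abs(z) > z_crit

print(z, z_crit, reject)
```

Under these made-up numbers the sample mean sits three standard errors below 12 ounces, which falls in the lower tail of the rejection region.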

<br>
<br>

### __The Observed Significance Level__ or __$p$-Value__

---

The __observed significance level__, or __$p$-value__, for a specific statistical test is the probability (assuming $H_{0}$ is true) of observing a value of the test statistic that is at least as contradictory to the null hypothesis, and supportive of the alternative hypothesis, as the actual one computed from the sample data.

---

<br>
<br>

### __Steps for Calculating the $p$-Value for a Test of Hypothesis__

---

__1.__ Determine the value of the test statistic $z$ corresponding to the result of the sampling experiment.


__2. a.__ If the test is one-tailed, the $p$-value is equal to the tail area beyond $z$ in the same direction as the alternative hypothesis. Thus, if the alternative hypothesis is of the form $>$, the $p$-value is the area to the right of, or above, the observed $z$-value. Conversely, if the alternative hypothesis is of the form $<$, the $p$-value is the area to the left of, or below, the observed $z$-value.


__2. b.__ If the test is two-tailed, the $p$-value is equal to twice the tail area beyond the observed $z$-value in the direction of the sign of $z$ - that is, if $z$ is positive, the $p$-value is twice the area to the right of, or above, the observed $z$-value. Conversely, if $z$ is negative, the $p$-value is twice the area to the left of, or below, the observed $z$-value.

---
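
The steps above can be sketched as a small function. `p_value` is a hypothetical helper name used only for illustration; it assumes the test statistic follows a standard normal distribution under $H_{0}$ and uses `scipy.stats.norm.sf` (the upper-tail area, $1 - \mbox{cdf}$).

```python
from scipy import stats

def p_value(z, tail):
    """p-value of an observed z statistic for the given form of Ha.

    tail: "upper" (Ha uses >), "lower" (Ha uses <), or "two" (Ha uses !=).
    """
    if tail == "upper":
        return stats.norm.sf(z)            # area to the right of z
    if tail == "lower":
        return stats.norm.cdf(z)           # area to the left of z
    if tail == "two":
        # Twice the tail area beyond z, in the direction of the sign of z.
        return 2 * stats.norm.sf(abs(z))
    raise ValueError("tail must be 'upper', 'lower', or 'two'")

print(p_value(1.44, "upper"))
print(p_value(-1.44, "lower"))
print(p_value(1.44, "two"))
```

Because the standard normal distribution is symmetric, the two-tailed $p$-value for a given $|z|$ is exactly double the corresponding one-tailed value.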


<br>
<br>

<u><b>Example 3</b></u>

Consider the one-tailed test of hypothesis:

<br>

$$
\large{
\begin{aligned}
H_{0}: & \mu = 100 \\
& \\
H_{a}: & \mu > 100
\end{aligned}
}
$$

<br>

__a.__ Assume that $z = 1.44$. Find the $p$-value of the test and the rejection region for the test when $\alpha = 0.05$. Then show that the conclusion using the rejection region approach will be identical to the conclusion based on the $p$-value.

__b.__ Now suppose the test statistic is $z = 3.01$; find the $p$-value and rejection region for the test when $\alpha = 0.05$. Again, show that the conclusion using the rejection region approach will be identical to the conclusion based on the $p$-value.

<br>

<u><b>Solution</b></u>

__a.__ The $p$-value for the test is the probability of observing a test statistic at least as contradictory to the null hypothesis as the value $z = 1.44$. Since we are conducting an upper-tailed test ($H_{a}: \mu > 100$), the probability we seek is:

<br>

$$
\large{\mbox{$p$-value} = P(z > 1.44) = 1 - P(z < 1.44)}
$$

In [None]:
1. - stats.norm.cdf(1.44)

<br>

Since $\alpha = 0.05$ and the test is upper-tailed, the rejection region for the test is $z > 1.645$.

<br>

In [None]:
α = 0.05
stats.norm.ppf(1. - α)

<br>

Observe that the test statistic $z = 1.44$ falls outside the rejection region, implying that we fail to reject $H_{0}$. Also, $\alpha = 0.05$ is less than the $p$-value of $0.075$. This also implies that we should fail to reject $H_{0}$. Consequently, both rules agree - there is insufficient evidence to reject $H_{0}$.

<br>

__b.__ For $z = 3.01$, the observed significance level of the test is: 

<br>

$$
\large{\mbox{$p$-value} = P(z > 3.01) = 1 - P(z < 3.01)}
$$

<br>

In [None]:
1. - stats.norm.cdf(3.01)

<br>

Again, for $\alpha = 0.05$ and an upper-tailed test, the rejection region is $z > 1.645$. The test statistic ($z = 3.01$) falls within the rejection region, leading us to reject $H_{0}$. And $\alpha = 0.05$ now exceeds the $p$-value ($0.0013$), which also implies that we should reject $H_{0}$. Once again, both decision rules agree - and they always will if the same value of $\alpha$ is used to make the decision.
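
The agreement between the two decision rules is not specific to $z = 1.44$ and $z = 3.01$. As a quick check, the sketch below applies both rules to a grid of hypothetical $z$ values for an upper-tailed test at $\alpha = 0.05$; the grid itself is an arbitrary choice made for illustration.

```python
import numpy as np
from scipy import stats

alpha = 0.05
z_values = np.linspace(-4, 4, 801)   # grid of hypothetical test statistics

# Rule 1: reject H0 when z falls in the upper-tailed rejection region.
reject_by_region = z_values > stats.norm.ppf(1 - alpha)

# Rule 2: reject H0 when the upper-tailed p-value is smaller than alpha.
reject_by_p = stats.norm.sf(z_values) < alpha

# The two rules should give the same decision at every grid point.
agree = np.array_equal(reject_by_region, reject_by_p)
print(agree)
```

The rules coincide because the $p$-value drops below $\alpha$ exactly when $z$ passes the critical value that cuts off an upper-tail area of $\alpha$.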

<br>

## __Small Sample Hypothesis Testing__

<br>

Just as with confidence intervals, when the population standard deviation $\sigma$ (and hence $\sigma_{\bar{x}}$) is unknown, which is the typical case, we need to estimate it from the sample and plug in the estimate. When we are working with a small sample, we will want to use Student's $t$ distribution.
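
As an illustration, the sketch below runs a two-tailed one-sample $t$ test on a small sample. The data values and the null value `mu0` are made up for this example, and the test assumes the population is approximately normal. The critical value comes from Student's $t$ distribution with $n - 1$ degrees of freedom and is compared with the corresponding normal critical value.

```python
import numpy as np
from scipy import stats

# Hypothetical small sample (values invented for illustration).
x = np.array([101.2, 98.7, 103.4, 99.9, 104.1,
              100.8, 102.5, 97.6, 103.0, 101.7])
mu0 = 100.0      # null hypothesized mean
alpha = 0.05

n = x.size
s = x.std(ddof=1)                            # sample standard deviation
t_stat = (x.mean() - mu0) / (s / np.sqrt(n))

# Two-tailed critical values: t with n - 1 degrees of freedom versus normal.
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
z_crit = stats.norm.ppf(1 - alpha / 2)

# Two-tailed p-value from the t distribution.
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print(t_stat, p_val)
print(t_crit, z_crit)
```

The $t$ critical value is larger than the $z$ critical value, which reflects the extra uncertainty from estimating $\sigma$ with $s$ in a small sample. The same statistic and $p$-value can be obtained from `stats.ttest_1samp(x, mu0)`.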