# 6.1 An Introduction to Hypothesis Testing

## Objectives

- Compute probabilities using the Central Limit Theorem and demonstrate the ability to interpret sampling distributions of both population proportions and means.
- Analyze a hypothesis-testing problem, apply the correct techniques, and reach a conclusion about a claim concerning a population proportion or mean, using appropriate levels of statistical significance and $p$-values and identifying what would constitute a type I or type II error.
- Analyze an application in business, social sciences, psychology, life sciences, health science, or education, and use the correct statistical processes to arrive at a solution.

## Introduction to Hypothesis Testing
Sometimes we want to test whether a claim made about a population parameter is true or not. For instance, a car dealer advertises that its new small truck gets 35 miles per gallon, on average. A tutoring service claims that its method of tutoring helps 90% of its students get an A or a B. A company says that women managers in their company earn an average of \\$53,000 per year.

It's often impractical to survey every member of a population to confirm the claim or **hypothesis**. Instead, statisticians sample a few members of the population and calculate the point estimate of the parameter in question. Then they use the Central Limit Theorem to determine how likely it is that a random sample would produce a point estimate at least that extreme, **assuming the hypothesis is true**. If such a point estimate would be exceptionally unlikely under the hypothesis, we can conclude with some confidence that the claim is probably not true.

In summary, the fundamental components of a **hypothesis test** are as follows:

1. We want to test if a claim about a population parameter is true.
2. We randomly sample members of the population to find a point estimate of the population parameter.
3. Assuming the claim about the population parameter *is* true, we use the Central Limit Theorem to determine the probability of obtaining a point estimate at least as extreme as the one from our random sample.
4. If the probability of obtaining the point estimate is exceptionally small, we reject the claim.
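
The four components above can be sketched in R for the case of a claim about a mean with a known population standard deviation. This is only an illustrative sketch: the helper name `tail_prob_mean`, the made-up numbers, and the 5% cutoff are assumptions introduced here, not part of the text.

```r
# Hypothetical helper (not from the text): probability of a sample mean
# at or below xbar, assuming the claimed mean mu_claim is true.
tail_prob_mean = function(xbar, mu_claim, sigma, n) {
  se = sigma / sqrt(n)          # standard deviation of sample means
  z = (xbar - mu_claim) / se    # z-score of the point estimate
  pnorm(q = z, lower.tail = TRUE)
}

# Made-up numbers for illustration only
prob = tail_prob_mean(xbar = 98, mu_claim = 100, sigma = 10, n = 25)
prob

# Assumed cutoff of 5%; where to draw the line is discussed later
reject = prob < 0.05
reject
```

Here the $z$-score is $-1$, so the probability is not small and the made-up claim would not be rejected under the assumed cutoff.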

We will formalize these ideas in the coming chapter and return to them throughout the rest of the course. For now, let's go over a few examples focusing on the big idea and the fundamental components.

***


### Example 1.1
A company says that women managers in their company earn an average of \\$53,000 per year with a standard deviation of \\$5,000. Janice wants to test the claim. She surveys 25 women managers and obtains the following data on income per year:

48230, 53491, 52926, 54832, 48080, 47101, 50126, 47433, 57753, 55107, 57205, 46297, 61249, 66936, 5463, 44429, 52696, 51243, 49008, 51522, 46658, 51788, 60244, 46164, 51103

Do you think Janice should conclude that the claim is not true?

#### Solution
##### Step 1: State the Hypothesis
The claim or hypothesis is that the average annual income of women managers at the company is \\$53,000. Written mathematically, the hypothesis is

$$ \mu = 53,000. $$

##### Step 2: Determine the Features of the Distribution of Point Estimates Using the Central Limit Theorem
We are testing the mean $\mu$ and we are told the population standard deviation $\sigma$. Then by the Central Limit Theorem, sample means are normally distributed with distribution mean

$$\mu_{\overline{X}} = \mu = 53,000 $$

and distribution standard deviation

$$ \sigma_{\overline{X}} = \frac{\sigma}{\sqrt{n}} = \frac{5000}{\sqrt{25}} = 1000. $$
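
As a quick check, the same calculation can be done in R. The sketch below also tries two larger sample sizes (100 and 400, which are hypothetical and not part of Janice's survey) to show how the standard deviation of sample means shrinks as $n$ grows.

```r
sigma = 5000                 # population standard deviation from the claim
n = 25                       # Janice's sample size

# Standard deviation of the distribution of sample means
se = sigma / sqrt(n)
se

# Hypothetical larger samples, for comparison only
se_by_n = sigma / sqrt(c(25, 100, 400))
se_by_n
```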

##### Step 3: Assuming the Claim is True, Find the Probability of Obtaining the Point Estimate

First, we calculate the sample mean $\bar{x}$, which is the point estimate of the population mean $\mu$.


In [1]:
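# Annual incomes reported by the 25 women managers Janice surveyed
x = c(48230, 53491, 52926, 54832, 48080, 47101, 50126, 47433, 57753, 55107, 57205, 46297, 61249, 66936, 5463, 44429, 52696, 51243, 49008, 51522, 46658, 51788, 60244, 46164, 51103)
n = length(x)  # sample size

# Sample mean: the point estimate of the population mean
xbar = sum(x)/n
xbar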

The sample mean is $\bar{x} = 50283.36$.

The $z$-score associated with the sample mean $\bar{x}$ is

$$z = \frac{\bar{x} - \mu_{\overline{X}}}{\sigma_{\overline{X}}} = \frac{50283.36 - 53000}{1000} = -2.7166. $$

We want to find $P(\bar{x} \leq 50283.36) = P(z \leq -2.7166)$. Let's use R to find this.

In [2]:
pnorm(q = -2.7166, lower.tail = TRUE)

So $P(\bar{x} \leq 50283.36) = P(z \leq -2.7166) = 0.0033$. That is, if the claim were true, there is only a 0.33% chance that a random sample of women managers would yield a sample mean salary of \\$50,283.36 or less.
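
As an alternative sketch, `pnorm` also accepts the distribution mean and standard deviation directly through its `mean` and `sd` arguments, so the $z$-score does not have to be computed by hand. The variable names below are chosen for illustration.

```r
xbar = 50283.36        # sample mean from Janice's data
mu_claim = 53000       # mean under the claim
se = 5000 / sqrt(25)   # standard deviation of sample means

# P(sample mean <= xbar), assuming the claim is true
prob = pnorm(q = xbar, mean = mu_claim, sd = se, lower.tail = TRUE)
round(prob, 4)
```

This should agree with the probability found from the $z$-score, up to rounding.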

##### Step 4: Make a Conclusion About the Claim
In this case, it is very unlikely that we would obtain the point estimate we did if the claim were true. In fact, it is so unlikely (less than a 1% chance) that we feel confident in concluding that the claim is not true. 

We conclude that the average salary of women managers at the company is not \\$53,000.

***


### Example 1.2
A tutoring service claims that 90% of its students get an A or a B. Mike wants to test the claim. He surveys 45 students who used the tutoring service and finds that 37 of them got an A or a B. Do you think Mike should reject the claim?

#### Solution
##### Step 1: State the Hypothesis
The claim we are testing is that the proportion of the tutoring service's students who get an A or a B is 90%. Mathematically, the hypothesis is

$$p = 0.90. $$

##### Step 2: Determine the Features of the Distribution of Point Estimates Using the Central Limit Theorem
By the Central Limit Theorem, we know that sample proportions are normally distributed with distribution mean

$$ \mu_{P'} = p = 0.90 $$

and distribution standard deviation

$$ \sigma_{P'} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.90(1 - 0.90)}{45}} = 0.0447. $$
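
The arithmetic above can be reproduced in R. This is a minimal sketch; the variable names are chosen here for illustration.

```r
p = 0.90   # proportion under the claim
n = 45     # number of students Mike surveyed

# Standard deviation of the distribution of sample proportions
sd_phat = sqrt(p * (1 - p) / n)
round(sd_phat, 4)
```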

##### Step 3: Assuming the Claim is True, Find the Probability of Obtaining the Point Estimate

First note that the sample proportion, the point estimate of the population proportion, is

$$ p' = \frac{x}{n} = \frac{37}{45} = 0.8222. $$

The $z$-score associated with the sample proportion $p'$ is

$$z = \frac{p' - \mu_{P'}}{\sigma_{P'}} = \frac{0.8222 - 0.90}{0.0447} = -1.7405.$$

We want to find $P(p' \leq 0.8222) = P(z \leq -1.7405)$. We calculate this using R.

In [1]:
pnorm(q = -1.7405, lower.tail = TRUE)

Then $P(p' \leq 0.8222) = P(z \leq -1.7405) = 0.0409$. So if the population proportion is actually $p = 0.90$, there is a 4.09% chance that a random sample gives a sample proportion of $p' \leq 0.8222$.
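
The whole calculation can also be sketched in R without rounding the intermediate values. Because nothing is rounded along the way, the $z$-score comes out slightly different from the hand calculation (about $-1.739$), and the probability is about 0.041. Variable names are chosen here for illustration.

```r
p = 0.90             # proportion under the claim
n = 45               # sample size
phat = 37 / n        # sample proportion

sd_phat = sqrt(p * (1 - p) / n)   # standard deviation of sample proportions
z = (phat - p) / sd_phat          # z-score of the sample proportion
z

# P(sample proportion <= phat), assuming the claim is true
prob = pnorm(q = z, lower.tail = TRUE)
round(prob, 3)
```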

##### Step 4: Make a Conclusion About the Claim
There is about a 4% chance of obtaining a sample proportion of $p' \leq 0.8222$ if the claim were true. This is somewhat unlikely, but it wouldn't be outlandish. 

Ultimately, whether we reject a hypothesis or not is a choice. Would you reject the hypothesis if there were only a 1% chance of getting the point estimate we did? What if it were a 4% chance? An 8% chance? The statistician must ultimately decide where to draw the line. In a later section, we will cover some guidelines on making this choice.
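
To see how the decision depends on where the line is drawn, the sketch below compares the probability from this example against three cutoffs. The cutoffs 1%, 5%, and 10% are assumptions chosen for illustration; the text has not yet recommended any particular value.

```r
prob = 0.0409                    # probability found in Step 3
cutoffs = c(0.01, 0.05, 0.10)    # assumed cutoffs, for illustration only

# TRUE means the claim would be rejected at that cutoff
decisions = prob < cutoffs
data.frame(cutoff = cutoffs, reject = decisions)
```

With these assumed cutoffs, the same evidence leads to rejecting the claim at 5% and 10% but not at 1%.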

So what do you think? In this particular case, would you reject the claim that 90% of students of the tutoring service get an A or a B? Or do you think the evidence isn't strong enough to reject the claim?

***


### Example 1.3
A statistics instructor claims that the class average on an exam is 65 points. Richard thinks the class average is higher. He collects the following scores from 9 students in the class:

84, 67, 60, 55, 51, 64, 66, 70, 83

Should Richard reject his instructor's claim?

#### Solution
##### Step 1: State the Hypothesis
The hypothesis is that the class average is 65 points. Mathematically, we write

$$\mu = 65. $$

##### Step 2: Determine the Features of the Distribution of Point Estimates Using the Central Limit Theorem
We want to test the population mean $\mu$, but we are *not* given the population standard deviation $\sigma$. We will need to use the sample standard deviation $s$ and a $t$-distribution with degrees of freedom

$$ df = n-1 = 9-1 = 8. $$

To find the $t$-score, we will need the distribution mean

$$ \mu_{\overline{X}} = \mu = 65. $$

The distribution standard deviation is given by the formula $\sigma_{\overline{X}} = \frac{s}{\sqrt{n}}$, so we will need to find the sample standard deviation $s$ first.

To find the sample standard deviation $s$, first find the sample mean $\bar{x}$.

In [1]:
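# Exam scores Richard collected from 9 students
x = c(84, 67, 60, 55, 51, 64, 66, 70, 83)
n = length(x)  # sample size

# Sample mean: the point estimate of the population mean
xbar = sum(x)/n
xbar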

The sample mean is $\bar{x} = 66.6667$.

Next, we find the sample standard deviation.

In [2]:
# Sample standard deviation: divide the sum of squared deviations by n - 1
s = sqrt(sum( (x - xbar)^2 )/(n - 1))
s

So the sample standard deviation is $s = 11.2472$. We can now calculate the standard deviation of the distribution of sample means:

$$ \sigma_{\overline{X}} = \frac{s}{\sqrt{n}} = \frac{11.2472}{\sqrt{9}} = 3.7491. $$ 
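
As a cross-check, R's built-in `mean` and `sd` functions give the same values as the formulas above (`sd` uses the $n - 1$ denominator). This is a short sketch with variable names chosen for illustration.

```r
x = c(84, 67, 60, 55, 51, 64, 66, 70, 83)

xbar = mean(x)             # sample mean
s = sd(x)                  # sample standard deviation (n - 1 denominator)
se = s / sqrt(length(x))   # standard deviation of sample means

round(c(xbar = xbar, s = s, se = se), 4)
```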

##### Step 3: Assuming the Claim is True, Find the Probability of Obtaining the Point Estimate
We calculated the point estimate in step 2: the sample mean is $\bar{x} =  66.6667$. The $t$-score associated with $\bar{x} = 66.6667$ is

$$t = \frac{\bar{x} - \mu_{\overline{X}}}{\sigma_{\overline{X}}} = \frac{66.6667 - 65}{3.7491} = 0.4446. $$

We want to find $P(\bar{x} \geq 66.6667) = P(t \geq 0.4446)$. We calculate this using R.

In [3]:
pt(q = 0.4446, df = 8, lower.tail = FALSE)

So $P(\bar{x} \geq 66.6667) = P(t \geq 0.4446) = 0.3342$. That is, assuming the claim is true, there is a 33.42% chance that our sample would have a sample mean of $66.6667$ or greater.
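
R's built-in `t.test` function can carry out the same calculation in one call. The sketch below assumes the upper-tail probability is the one of interest (because Richard thinks the average is higher), which corresponds to `alternative = "greater"`.

```r
x = c(84, 67, 60, 55, 51, 64, 66, 70, 83)

# One-sample t-test of the claim mu = 65, upper tail
result = t.test(x, mu = 65, alternative = "greater")

result$statistic   # t-score
result$parameter   # degrees of freedom
result$p.value     # P(t >= t-score), assuming the claim is true
```

The reported values should match the hand calculation up to rounding.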

##### Step 4: Make a Conclusion About the Claim
If the claim is true, there is a 33.42% chance that a random sample of the population has a sample mean at least as large as $\bar{x} = 66.6667$. This isn't a small probability; a sample mean like ours is fairly likely when the claim is true. We usually don't reject a claim unless this probability is, at the very least, less than 10%. So we cannot conclude that the mean exam score is not 65 points.


***

### Example 1.4

In [None]:
#**VID=yJWkbN-91xc**#

***

### Example 1.5

In [None]:
#**VID=0Vj0LibwID8**#

***

### Example 1.6

In [None]:
#**VID=z2rIPx5p7lI**#

***

<small style="color:gray"><b>License:</b> This work is licensed under a [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) license.</small>

<small style="color:gray"><b>Author:</b> Taylor Baldwin, Mt. San Jacinto College</small>

<small style="color:gray"><b>Adapted From:</b> <i>Introductory Statistics</i>, by Barbara Illowsky and Susan Dean. Access for free at [https://openstax.org/books/introductory-statistics/pages/1-introduction](https://openstax.org/books/introductory-statistics/pages/1-introduction).</small>