In [1]:
import math
import scipy.stats as st

import ipywidgets as widgets
from ipywidgets import interact

# Chapter 9: Hypothesis Testing with One Sample

Whereas confidence intervals allow us to estimate a population parameter, **hypothesis testing** allows us to make a _decision_ about a parameter.

In this chapter, you will conduct hypothesis tests on **single means** and **single proportions**. You will also learn about the **errors** associated with these tests.

Hypothesis testing consists of two contradictory hypotheses or statements, a decision based on the data, and a conclusion. To perform a hypothesis test, a statistician will:
1. Set up two contradictory hypotheses.
2. Collect sample data (in homework problems, the data or summary statistics will be given to you).
3. Determine the correct distribution to perform the hypothesis test.
4. Analyze sample data by performing the calculations that ultimately will allow you to reject or decline to reject the null hypothesis.
5. Make a decision and write a meaningful conclusion.

## Null and Alternative Hypotheses
The actual test begins by considering two **hypotheses**.  They are called the **null hypothesis** and the **alternative hypothesis**.  These hypotheses contain opposing viewpoints.

$H_0$: **The null hypothesis**: It is a statement of no difference between the variables; they are not related. This can often be considered the _status quo_, so if the null hypothesis cannot be retained, some action is required.

$H_a$: **The alternative hypothesis**: It is a claim about the population that is contradictory to $H_0$ and what we conclude when we reject $H_0$. <span style="color:yellow">This is usually what the researcher is trying to prove.</span>

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a **decision**. There are two options for a decision. They are:
* "reject $H_0$" if the sample information favors the alternative hypothesis
* "do not reject $H_0$" or "decline to reject $H_0$" if the sample information is insufficient to reject the null hypothesis.

Mathematical Symbols Used in $H_0$ and $H_a$:

|$H_0$|$H_a$|
|--|--|
|equal(=)|not equal($\ne$) **or** greater than ($\gt$) **or** less than ($\lt$)|
|greater than or equal to ($\geq$)|less than ($\lt$)|
|less than or equal to ($\leq$)|more than ($\gt$)|

> Note: H0 always has a symbol with an equal in it. Ha never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.

<span style="color:orange">Example 9.2</span>

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). 

The null and alternative hypotheses are:
* $H_0$: μ = 2.0
* $H_a$: μ ≠ 2.0

<span style="color:orange">Example 9.3</span>

We want to test if college students take less than five years to graduate from college, on the average. 

The null and alternative hypotheses are:
* $H_0$: μ ≥ 5
* $H_a$: μ < 5

<span style="color:orange">Example 9.4</span>

In an issue of U. S. News and World Report, an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6%. State the null and alternative hypotheses.
* $H_0$: p ≤ 0.066
* $H_a$: p > 0.066

## Outcomes and the Type I and Type II Errors
When you perform a hypothesis test, there are <span style="color:pink">four possible outcomes</span> depending on the actual truth (or falseness) of the null hypothesis $H_0$ and the decision to reject or not. The outcomes are summarized in the following table:

|**ACTION**|**$H_0$ is actually true**|**$H_0$ is actually false**|
|--|--|--|
|**Do not reject $H_0$**|Correct outcome|Type II error|
|**Reject $H_0$**|Type I error|Correct outcome|

The four possible outcomes in the table are:
1. The decision is **not to reject $H_0$** when **$H_0$ is true (correct decision)**.
2. The decision is to **reject $H_0$** when **$H_0$ is true** (incorrect decision known as a **Type I error**).
3. The decision is **not to reject $H_0$** when, in fact, **$H_0$ is false** (incorrect decision known as a **Type II error**).
4. The decision is to **reject $H_0$** when **$H_0$ is false** (**correct decision** whose probability is called the **Power of the Test**).

Each of the errors occurs with a particular probability. The Greek letters $\alpha$ and $\beta$ represent the probabilities.
* $\alpha$ = probability of a Type I error = **P(Type I error)** = probability of rejecting the null hypothesis when the null hypothesis is true.

* $\beta$ = probability of a Type II error = **P(Type II error)** = probability of not rejecting the null hypothesis when the null hypothesis is false.

<span style="color:yellow">$\alpha$ and $\beta$ should be as small as possible because they are probabilities of errors. They are rarely zero.</span>

The Power of the Test is $1-\beta$. Ideally, we want a high power that is as close to one as possible. Increasing the sample size can increase the Power of the Test.
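
To make the link between sample size and power concrete, the sketch below computes the power of a right-tailed z-test for a few sample sizes. The numbers (null mean 15, true mean 15.2, $\sigma = 0.5$, $\alpha = 0.05$) and the function name `power_right_tailed_z` are illustrative assumptions, not values taken from this chapter.

```python
import math
import scipy.stats as st

def power_right_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a right-tailed z-test of H0: mu <= mu0 when the true mean is mu_true."""
    se = sigma / math.sqrt(n)
    # Smallest sample mean that leads to rejecting H0 at level alpha
    critical_mean = mu0 + st.norm.ppf(1 - alpha) * se
    # Power = P(sample mean > critical_mean) when the true mean is mu_true
    return st.norm.sf(critical_mean, loc=mu_true, scale=se)

# Hypothetical values chosen only for illustration
powers = {n: power_right_tailed_z(15, 15.2, 0.5, n) for n in (5, 20, 80)}
for n, power in powers.items():
    print(f"n = {n:3d}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With everything else held fixed, the printed power grows with $n$, and $\beta = 1 - \text{power}$ shrinks accordingly.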

The following are examples of Type I and Type II errors.

<span style="color:orange">Example 9.5</span>

Suppose the null hypothesis, $H_0$, is: Frank's rock climbing equipment is safe.

(So, Frank is trying to prove that his equipment is not safe)

* **Type I error**: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.
    * **$\alpha =$ probability** that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe.
* **Type II error**: Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.
    * **$\beta =$ probability** that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)

<span style="color:orange">Example 9.6</span>

Suppose the null hypothesis, $H_0$, is: The victim of an automobile accident is alive when he arrives at the emergency room of a hospital.

* **Type I error**: The emergency crew thinks that the victim is dead when, in fact, the victim is alive.
    * **$\alpha =$** probability that the emergency crew thinks the victim is dead when, in fact, he is really alive = P(Type I error).
* **Type II error**: The emergency crew does not know if the victim is alive when, in fact, the victim is dead.
    * **$\beta =$** probability that the emergency crew does not know if the victim is alive when, in fact, the victim is dead = P(Type II error).

The error with the greater consequence is the Type I error. (If the emergency crew thinks the victim is dead, they will not treat him.)

<span style="color:orange"> Example 9.7 </span>

It’s a Boy Genetic Labs claims to be able to increase the likelihood that a pregnancy will result in a boy being born. Statisticians want to test the claim. Suppose that the null hypothesis, $H_0$, is: It’s a Boy Genetic Labs has no effect on gender outcome.

* **Type I error**: This results when a true null hypothesis is rejected. In the context of this scenario, we would state that we believe that It’s a Boy Genetic Labs influences the gender outcome, when in fact it has no effect. The probability of this error occurring is denoted by the Greek letter alpha, α.

* **Type II error**: This results when we fail to reject a false null hypothesis. In context, we would state that It’s a Boy Genetic Labs does not influence the gender outcome of a pregnancy when, in fact, it does. The probability of this error occurring is denoted by the Greek letter beta, β.

The error of greater consequence would be the Type I error since couples would use the It’s a Boy Genetic Labs product in hopes of increasing the chances of having a boy.

<span style="color:orange">Example 9.8</span>

A certain experimental drug claims a cure rate of at least 75% for males with prostate cancer. Describe both the Type I and Type II errors in context. Which error is the more serious?

* **Type I**: A cancer patient believes the cure rate for the drug is less than 75% when it actually is at least 75%.

* **Type II**: A cancer patient believes the experimental drug has at least a 75% cure rate when it has a cure rate that is less than 75%.

In this scenario, the Type II error contains the more severe consequence. If a patient believes the drug works at least 75% of the time, this most likely will influence the patient’s (and doctor’s) choice about whether to use the drug as a treatment option.

## Distribution Needed for Hypothesis Testing

Recall particular distributions are associated with hypothesis testing.

**Perform tests of a population mean** using a:
* normal distribution, or a
* Student's t-distribution
    * use a Student's t-distribution when:
        * the population `standard deviation` is unknown and
        * the distribution of the sample mean is approximately normal.

**Perform tests of a proportion** using a normal distribution (usually $n$ is large).

Writing a test for a `single population mean`, the distribution for the test is for **means**:
* $\bar{X}\sim N\bigl(\mu_X, \frac{\sigma_X}{\sqrt{n}}\bigr)$ or $t_{df}$
    * The population parameter is $\mu$
    * The estimated value (point estimate) for $\mu$ is $\bar{x}$, the sample mean

Writing a test for a `single population proportion`, the distribution for the test is for **proportions** or **percentages**:
* $P^\prime \sim N \bigl(p, \sqrt{\frac{p \cdot q}{n}}\bigr)$
    * The population parameter is $p$
    * The estimated value (point estimate) for $p$ is $p^\prime$.
        * $p^\prime = \frac{x}{n}$
            * $x$ is the number of successes and
            * $n$ is the sample size

### Assumptions

* When you perform a **hypothesis test of a single population mean $\mu$** using a **Student's t-distribution** (t-test):
    * Assumptions that need to be met for the test to work properly:
        * Data should be a **simple random sample** that comes from a population which is approximately **normally distributed**
        * use the sample **standard deviation** to approximate the population standard deviation.
        * (Note, if sample size is sufficiently large, a t-test will work even if the population is not approximately normally distributed)
* When you perform a **hypothesis test of a single population mean $\mu$** using a **normal distribution**:
    * Assumptions that need to be met for the test to work properly:
        * take a simple random sample from the population
        * the population is normally distributed _or_ your sample size is sufficiently large
        * You know the value of the population standard deviation (rarely known).
* When you perform a **hypothesis test of a single population proportion $p$**:
    * Assumptions that need to be met for the test to work properly:
        * take a simple random sample from the population
        * you must meet the conditions for a **binomial distribution**:
            * there are a certain number $n$ of independent trials
            * the outcomes of any trial are success or failure
            * each trial has the same probability of a success $p$
        * the shape of the binomial distribution needs to be similar to the shape of the normal distribution.
            * to ensure this, the quantities $np$ and $nq$ must both be greater than five ($np \gt 5$ and $nq \gt 5$)
            * the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with:
                * $\mu = p$
                * $\sigma = \sqrt{\frac{pq}{n}}$
                * Remember that $q=1-p$
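
The conditions for the normal approximation can be checked in a few lines. This is a minimal sketch: the helper name `normal_approx_ok` is hypothetical, the first call uses $n = 100$ and $p = 0.50$ (the values that appear in Example 9.17 later in this chapter), and the second call uses made-up values where the condition fails.

```python
import math

def normal_approx_ok(n, p):
    """Return (ok, mean, std_error) for the sampling distribution of P'."""
    q = 1 - p
    ok = n * p > 5 and n * q > 5          # both np and nq must exceed five
    std_error = math.sqrt(p * q / n)      # sigma = sqrt(pq / n)
    return ok, p, std_error

ok, mean, se = normal_approx_ok(100, 0.50)
print(ok, mean, se)

# Hypothetical case where np = 2, so the approximation should not be used
small_ok, _, _ = normal_approx_ok(20, 0.10)
print(small_ok)
```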

<span style="color:yellow">How to go about a hypothesis test:</span>
* <span style="color:yellow">Establish the type of distribution.</span>
* <span style="color:yellow">Establish the sample size</span>
* <span style="color:yellow">Known or unknown standard deviation can help you figure out how to go about a hypothesis test.</span>

<span style="color:yellow">Other items to consider:</span>
* **Rare Events**

### Using the Sample to Test the Null Hypothesis
Use the sample data to calculate the actual probability of getting the test result, called the **$p$-value**
* $p$-value: the probability that, if the null hypothesis is true, the results from another randomly selected sample will be as extreme as or more extreme than the results obtained from the given sample.
* A **large** $p$-value calculated from the data indicates that we should **not reject** the **null hypothesis**.
* The **smaller** the $p$-value, the more unlikely the outcome and the stronger the evidence is **against the null hypothesis**.
* We would **reject** the null hypothesis if the evidence is strongly against it.

<span style="color:orange">Example 9.9</span>
Suppose a baker claims that his bread height is more than 15 cm, on average. Several of his customers do not believe him. To persuade his customers that he is right, the baker decides to do a hypothesis test. He bakes 10 loaves of bread. The mean height of the sample loaves is 17 cm. The baker knows from baking hundreds of loaves of bread that the standard deviation for the height is 0.5 cm. and the distribution of heights is normal.

$\therefore$
* $n=10$
* **$\sigma$ is known**, $\sigma = 0.5$ cm (for the height)
* $\bar{x} = 17$ cm
* distribution is _normal_

The null hypothesis could be $H_0 : \mu \leq 15$

The alternate hypothesis is $H_a : \mu \gt 15$

$\therefore$
* $\mu = 15$
* $\frac{\sigma}{\sqrt{n}}=\frac{0.5}{\sqrt{10}}=0.16$

The words **"is more than"** translates as a "$\gt$" so "$\gt 15$" goes into the alternate hypothesis.  The null hypothesis must _contradict_ the alternate hypothesis.

Suppose the null hypothesis is true (the mean height of the loaves is no more than 15 cm). Then is the mean height (17 cm) calculated from the sample unexpectedly large? <span style="color:pink">The hypothesis test works by asking the question how **unlikely** the sample mean would be if the null hypothesis were true</span>. The graph shows how far out the sample mean is on the normal curve.

The $p$-value is the probability that, _if we were to take other samples, any other sample mean would fall at least as far out as 17 cm._

**The $p$-value, then, is the probability that a sample mean is the same or greater than 17 cm. when the population mean is, in fact, 15 cm**. We can calculate this probability using the normal distribution for means.

![image.png](attachment:a316d27b-c72f-4d51-a5be-0f6c460b24ac.png)

$p\text{-value}=P(\bar{x} \gt 17)$ which is approximately zero.

In [2]:
# scale must be the standard error of the mean, sigma / sqrt(n), not sigma itself
1 - st.norm.cdf(17, loc=15, scale=0.5/math.sqrt(10))

0.0

A $p$-value of approximately zero tells us that it is highly unlikely that the loaves rise no more than 15 cm, on average. That is, almost 0% of all samples of 10 loaves would have a mean height of at least 17 cm **purely by CHANCE** had the population mean height really been 15 cm. Because the outcome of 17 cm is so **unlikely** (meaning it is NOT happening by chance alone), we conclude that the evidence is strongly against the null hypothesis (the mean height is at most 15 cm). There is sufficient evidence that the true mean height for the population of the baker's loaves of bread is greater than 15 cm.
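
The same right-tail probability can be written in terms of a standardized test statistic. The sketch below uses the baker's numbers from this example; the variable names are illustrative. `st.norm.sf` is SciPy's survival function (one minus the CDF), which avoids the rounding to zero that can occur when a CDF value very close to 1 is subtracted from 1.

```python
import math
import scipy.stats as st

n, sigma, x_bar, mu0 = 10, 0.5, 17, 15

std_error = sigma / math.sqrt(n)      # standard deviation of the sample mean
z = (x_bar - mu0) / std_error         # number of standard errors above mu0
p_value = st.norm.sf(z)               # right-tail area, P(Z > z)

print(f"standard error = {std_error:.4f}, z = {z:.2f}, p-value = {p_value:.3g}")
```

A sample mean more than twelve standard errors above the hypothesized mean is why the $p$-value is, for practical purposes, zero.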

### Decision and Conclusion
A systematic way to make a decision of whether to reject or not reject the **null hypothesis** is to compare the $p$-value and a **preset or preconceived $\alpha$ (also called a "significance level")**.

A preset $\alpha$ is the probability of a **Type I error** (rejecting the null hypothesis when the null hypothesis is true).  It may or may not be given to you at the beginning of the problem.

When you make a **decision** to reject or not reject $H_0$, do as follows:
* If $\alpha \gt p$-value, reject $H_0$. The results of the sample data are significant. There is sufficient evidence to conclude that $H_0$ is an incorrect belief and that the **alternative hypothesis**, $H_a$, may be correct.
* If $\alpha \leq p$-value, do not reject $H_0$. The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis, $H_a$, may be correct.
* When you "do not reject $H_0$", it does not mean that you should believe that $H_0$ is true. It simply means that the sample data have failed to provide sufficient evidence to cast serious doubt about the truthfulness of $H_0$.
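
The decision rule is mechanical enough to express as a small function. This is only a sketch; the name `decide` and the sample inputs are illustrative.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when alpha > p-value."""
    if alpha > p_value:
        return "reject H0"
    return "do not reject H0"

print(decide(0.0187, alpha=0.05))    # small p-value relative to alpha
print(decide(0.1323, alpha=0.025))   # large p-value relative to alpha
```

Note that the function returns a decision only. Writing the conclusion in the context of the problem is still up to the analyst.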

**Conclusion**: After you make your decision, write a thoughtful **conclusion** about the hypotheses in terms of the given problem.

<span style="color:orange">Example 9.10</span>

When using the $p$-value to evaluate a hypothesis test, it is sometimes useful to use the following memory device

If the $p$-value is low, the null must go.

If the $p$-value is high, the null must fly.

This memory aid relates a $p$-value less than the established alpha (the $p$ is low) as rejecting the null hypothesis and, likewise, relates a $p$-value higher than the established alpha (the $p$ is high) as not rejecting the null hypothesis.

Fill in the blanks.

Reject the null hypothesis when `the p-value is less than the established alpha value`. The results of the sample data `support the alternative hypothesis`.

Do not reject the null hypothesis when `the p-value is greater than the established alpha value`. The results of the sample data `do not support the alternative hypothesis`.

## Additional Information and Full Hypothesis Test Examples

* In a **hypothesis test** problem, you may see words such as "the level of significance is 1%." The "1%" is the preconceived or preset $\alpha$.
* The statistician setting up the hypothesis test selects the value of α to use **before** collecting the sample data.
* **If no level of significance is given, a common standard to use is $\alpha = 0.05$**.
* When you calculate the p-value and draw the picture, the $p$-value is the area in the left tail, the right tail, or split evenly between the two tails. For this reason, we call the hypothesis test left, right, or two tailed.
* The **alternative hypothesis**, $H_a$, tells you if the test is left, right, or two-tailed. It is the **key** to conducting the appropriate test.
* $H_a$ never has a symbol that contains an equal sign.
* Thinking about the meaning of the $p$-value: A data analyst (and anyone else) should have more confidence in the decision to reject the null hypothesis with a smaller $p$-value (for example, 0.001 as opposed to 0.04), even if using the 0.05 level for alpha. Similarly, for a large $p$-value such as 0.4, as opposed to a $p$-value of 0.056 (alpha = 0.05 is less than either number), a data analyst should have more confidence in the decision not to reject the null hypothesis. This makes the data analyst use judgment rather than mindlessly applying rules.

The following examples illustrate a left-, right-, and two-tailed test.

<span style="color:orange">Example 9.11</span>

* $H_0: \mu=5$
* $H_a: \mu \lt 5$

Test of a single population mean.

$H_a$ tells you the test is **left-tailed**.

The picture of the $p$-value is as follows:

![image.png](attachment:2e293e47-c3e7-47db-b1a8-cc2a7c29d149.png)

<span style="color:orange">Example 9.12</span>

* $H_0: p \leq 0.2$
* $H_a: p \gt 0.2$

This is a test of a single population proportion.

$H_a$ tells you the test is **right-tailed**.

The picture of the p-value is as follows:

![image.png](attachment:7b485cb9-deda-4423-b969-f6f7c95ad0f4.png)

<span style="color:orange">Example 9.13</span>

* $H_0: \mu = 50$
* $H_a: \mu \ne 50$

This is a test of a single population mean.

$H_a$ tells you the test is **two-tailed**.

The picture of the $p$-value is as follows.

![image.png](attachment:69346d08-8e09-4571-bd3b-11764d9fe368.png)
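
The three tail cases pictured above differ only in which area under the curve is taken as the $p$-value. A minimal sketch, assuming the test statistic has already been standardized to a z-score; the helper name `p_value_from_z` and the value $z = -1.5$ are hypothetical.

```python
import scipy.stats as st

def p_value_from_z(z, tail):
    """p-value for a standardized test statistic z; tail is 'left', 'right', or 'two'."""
    if tail == "left":       # H_a uses <
        return st.norm.cdf(z)
    if tail == "right":      # H_a uses >
        return st.norm.sf(z)
    if tail == "two":        # H_a uses !=, so the area is split evenly between both tails
        return 2 * st.norm.sf(abs(z))
    raise ValueError("tail must be 'left', 'right', or 'two'")

z = -1.5  # hypothetical test statistic
for tail in ("left", "right", "two"):
    print(tail, round(p_value_from_z(z, tail), 4))
```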

### Full Hypothesis Test Examples

<span style="color:orange">Example 9.14</span>

Jeffrey, as an eight-year old, **established a mean time of 16.43 seconds** for swimming the 25-yard freestyle, with a **standard deviation of 0.8 seconds**. His dad, Frank, thought that Jeffrey could swim the 25-yard freestyle faster using goggles. Frank bought Jeffrey a new pair of expensive goggles and timed Jeffrey for **15 25-yard freestyle swims**. For the 15 swims, **Jeffrey's mean time was 16 seconds. Frank thought that the goggles helped Jeffrey to swim faster than the 16.43 seconds**. Conduct a hypothesis test using a preset α = 0.05. Assume that the swim times for the 25-yard freestyle are normal.

**Set up the hypothesis test**:

swimming 25-yard freestyle


Since the problem is about a mean, this is a test of a **single population mean**.
* $H_0: \mu = 16.43 $
* $H_a : \mu \lt 16.43$

For Jeffrey to swim faster, his time will be less than 16.43 seconds. The "<" tells you this is left-tailed.

Determine the distribution needed:

**Random Variable**: $\bar{X}=$ the mean time to swim the 25-yard freestyle.

**Distribution for the test**: $\bar{X}$ is normal (population **standard deviation** is known: $\sigma = 0.8$)
* $\bar{X} \sim N \bigl( \mu , \frac{\sigma_x}{\sqrt{n}} \bigr)$ Therefore, $\bar{X} \sim N \bigl(16.43, \frac{0.8}{\sqrt{15}} \bigr)$

* $\mu = 16.43 $ seconds, comes from $H_0$ and not the data.
* $\sigma = 0.8$ seconds

With new goggles:
* $n = 15$
* $\bar{x} = 16$ seconds

Calculate the $p$-value using the normal distribution for a mean:

$p$-value $= P(\bar{x} \lt 16) = 0.0187$ where the sample mean in the problem is given as 16.

Father thinks with goggles Jeffrey swam faster than the 16.43 seconds.
* preset $\alpha = 0.05$
* Assume swim times for 25-yard freestyle are normal.


In [3]:
st.norm.cdf(16, loc=16.43, scale=0.8/math.sqrt(15))

0.018683635713606015

![image.png](attachment:d7be6a7e-21e3-4d72-931c-e39a44874efd.png)

$p$-value = 0.0187 (This is called the **actual level of significance**.) The $p$-value is the area to the left of the sample mean, which is given as 16.

$\mu$ = 16.43 comes from $H_0$. Our assumption is $\mu = 16.43$.

**Interpretation of the $p$-value: If $H_0$ is true**, there is a 0.0187 probability (1.87%) that Jeffrey's mean time to swim the 25-yard freestyle is 16 seconds or less. Because a 1.87% chance is small, the mean time of 16 seconds or less is unlikely to have happened randomly. It is a rare event.

<span style="color:lightblue">Compare $\alpha$ and the $p$-value:</span>

$\alpha$ = 0.05 $p$-value = 0.0187 $\alpha$ > $p$-value

**Make a decision**: Since $\alpha$ > $p$-value, reject $H_0$.

This means that you reject $\mu = 16.43$. In other words, you do not think Jeffrey swims the 25-yard freestyle in 16.43 seconds but faster with the new goggles.

**Conclusion**: At the 5% significance level, we conclude that Jeffrey swims faster using the new goggles. The sample data show there is sufficient evidence that Jeffrey's mean time to swim the 25-yard freestyle is less than 16.43 seconds.

The $p$-value can easily be calculated.

The Type I and Type II errors for this problem are as follows:

The Type I error is to conclude that Jeffrey swims the 25-yard freestyle, on average, in less than 16.43 seconds when, in fact, he actually swims the 25-yard freestyle, on average, in 16.43 seconds. (Reject the null hypothesis when the null hypothesis is true.)

The Type II error is that there is not evidence to conclude that Jeffrey swims the 25-yard free-style, on average, in less than 16.43 seconds when, in fact, he actually does swim the 25-yard free-style, on average, in less than 16.43 seconds. (Do not reject the null hypothesis when the null hypothesis is false.)
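
The whole left-tailed test for Jeffrey's swim times can be condensed into a few lines by standardizing the sample mean first. This is a sketch using the numbers from this example; the variable names are illustrative.

```python
import math
import scipy.stats as st

mu0, sigma, n, x_bar, alpha = 16.43, 0.8, 15, 16, 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))   # standardized test statistic
p_value = st.norm.cdf(z)                     # left-tail area because H_a uses <
decision = "reject H0" if alpha > p_value else "do not reject H0"

print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```

Standardizing gives the same $p$-value as passing `loc` and `scale` to `st.norm.cdf`, and it makes the size of the effect visible: the sample mean sits about two standard errors below 16.43.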

<span style="color:orange">Example 9.15</span>

A college football coach records the mean weight that his players can bench press as **275 pounds**, with a **standard deviation of 55 pounds**. Three of his players thought that the mean weight was **more than** that amount. They asked **30** of their teammates for their estimated maximum lift on the bench press exercise. The data ranged from 205 pounds to 385 pounds. The actual different weights were (frequencies are in parentheses) 205(3); 215(3); 225(1); 241(2); 252(2); 265(2); 275(2); 313(2); 316(5); 338(2); 341(1); 345(2); 368(2); 385(1).
Conduct a hypothesis test using a 2.5% level of significance to determine if the bench press mean is **more than 275 pounds**.

In [4]:
sample_data = [
    205, 205, 205, 215, 215, 215, 225,
    241, 241, 252, 252, 265, 265, 275,
    275, 313, 313, 316, 316, 316, 316,
    316, 338, 338, 341, 345, 345, 368,
    368, 385
]

In [5]:
len(sample_data)

30

In [6]:
sum(sample_data)/len(sample_data)

286.1666666666667

Because the problem is about a mean weight, this is a **test of a single population mean**.

Calculate the distribution needed:

Random variable: $\bar{X}$ = the mean weight, in pounds, lifted by the football players.

Distribution for the test: It is normal because $\sigma$ is known.

$\bar{X} \sim N \bigl( 275, \frac{55}{\sqrt{30}} \bigr)$

So,
* $\mu = 275$ pounds
* $\sigma = 55$ pounds

Hypotheses:
* $H_0: \mu = 275$
* $H_a: \mu \gt 275$
* $\alpha = 0.025$

This is a right-tailed test.

* $n=30$
* $\bar{x} = 286.166$ (derived from the data)

In [7]:
1 - st.norm.cdf(286.166, loc=275, scale=55/math.sqrt(30))

0.13307415356649321

![image.png](attachment:fd203265-8687-4004-b7e8-79a2ff86824e.png)

$p\text{-value} = P(\bar{x} \gt 286.2) = 0.1323$ (the slightly different 0.1331 computed above comes from using $\bar{x} = 286.166$ rather than the rounded 286.2)

**Interpretation of the $p$-value**: If $H_0$ is true, then there is a 0.1323 probability (13.23%) that the football players can lift a mean weight of 286.2 pounds or more. Because a 13.23% chance is large enough, a mean weight lift of 286.2 pounds or more is not a rare event.

Compare $\alpha$ and the $p$-value:

$\alpha$ = 0.025 $p$-value = 0.1323

**Make a decision**: Since $\alpha$ < $p$-value, do not reject $H_0$.

**Conclusion**: At the 2.5% level of significance, from the sample data, there is not sufficient evidence to conclude that the true mean weight lifted is more than 275 pounds.
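
The test can also be run directly from the frequency table in the problem statement, without typing out every observation. This is a sketch with illustrative variable names; it also compares the test statistic with the critical z-value for $\alpha = 0.025$, which is an equivalent way to reach the same decision.

```python
import math
import scipy.stats as st

# (weight, frequency) pairs as listed in the problem statement
weights = [(205, 3), (215, 3), (225, 1), (241, 2), (252, 2), (265, 2), (275, 2),
           (313, 2), (316, 5), (338, 2), (341, 1), (345, 2), (368, 2), (385, 1)]

n = sum(freq for _, freq in weights)
x_bar = sum(w * freq for w, freq in weights) / n

mu0, sigma, alpha = 275, 55, 0.025
z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_value = st.norm.sf(z)                 # right-tail area because H_a uses >
critical_z = st.norm.ppf(1 - alpha)     # reject H0 only if z exceeds this value

print(f"n = {n}, mean = {x_bar:.3f}, z = {z:.3f}, critical z = {critical_z:.3f}")
print(f"p-value = {p_value:.4f}")
```

Because the unrounded sample mean is used here, the $p$-value comes out near 0.133 rather than the 0.1323 obtained with the rounded mean of 286.2. The decision is the same either way.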

The $p$-value can easily be calculated.

<span style="color:orange">Example 9.16</span>

Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 65; 65; 70; 67; 66; 63; 63; 68; 72; 71. He performs a hypothesis test using a 5% level of significance. The data are assumed to be from a normal distribution.

In [8]:
sample_data = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]

In [9]:
sum(sample_data)/len(sample_data)

67.0

* The test is for a single population mean
* **We do not know $\sigma$**, the population standard deviation, therefore we must use a Student's t-test
* $n = 10$
* $\mu = 65$
* $\bar{x} = 67$
* $\alpha = 0.05$
* $H_0: \mu = 65$
* $H_a: \mu \gt 65$

Since the instructor thinks the average score is higher, use a ">". The ">" means the test is right-tailed.

* $t_{df} = t_{10-1} = t_9$

Calculate the p-value using the Student's t-distribution:

$p$-value = $P(\bar{x} \gt 67)=0.0396$ where the sample mean and sample standard deviation are calculated as 67 and 3.1972 from the data.

In [10]:
st.tstd(sample_data)

3.197221015541813

In [11]:
# t-distribution for the sample mean: scale must be the standard error s / sqrt(n)
round(1 - st.t.cdf(67, loc=65, scale=3.1972/math.sqrt(10), df=9), 4)

0.0396

**Interpretation of the $p$-value**: If the null hypothesis is true, then there is a 0.0396 probability (3.96%) that the sample mean is 67 or more.

![image.png](attachment:fc5cb45d-39a2-4fd9-af9b-247e86410cf7.png)

Compare $\alpha$ and the $p$-value:

Since $\alpha = 0.05$ and $p$-value = 0.0396, $\alpha$ > $p$-value.

Make a decision: Since $\alpha$ > $p$-value, reject $H_0$.

This means you reject $\mu = 65$. In other words, you believe the average test score is more than 65.

Conclusion: At a 5% level of significance, the sample data show sufficient evidence that the mean (average) test score is more than 65, just as the math instructor thinks.
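
The $p$-value of about 0.0396 quoted in this example can be cross-checked in two ways: by building the t statistic by hand from the standard error, and by calling SciPy's one-sample t-test. This is a sketch with illustrative variable names; the `alternative` keyword of `st.ttest_1samp` assumes SciPy 1.6 or newer.

```python
import math
import statistics
import scipy.stats as st

scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]
mu0 = 65

# Manual route: t statistic built from the standard error s / sqrt(n)
n = len(scores)
x_bar = statistics.mean(scores)
s = statistics.stdev(scores)              # sample standard deviation (n - 1 divisor)
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_manual = st.t.sf(t_stat, df=n - 1)      # right-tail area because H_a uses >

# Library route: one-sample t-test with a right-tailed alternative
result = st.ttest_1samp(scores, popmean=mu0, alternative="greater")

print(f"t = {t_stat:.4f}, manual p-value = {p_manual:.4f}, "
      f"ttest_1samp p-value = {result.pvalue:.4f}")
```

Both routes divide the sample standard deviation by $\sqrt{n}$. Leaving that step out is a common source of $p$-values that do not match the textbook answer.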

The p-value can easily be calculated.

<span style="color:orange">Example 9.17</span>

Joon believes that 50% of first-time brides in the United States are younger than their grooms. She performs a hypothesis test to determine if the percentage is **the same or different from 50%**. Joon samples **100 first-time brides** and **53** reply that they are younger than their grooms. For the hypothesis test, she uses a 1% level of significance.

##### Set up the hypothesis test:
The 1% level of significance means that:
* $\alpha = 0.01$

This is a **test of a single population proportion**
* $H_0: p = 0.50$
* $H_a: p \ne 0.50$

The words **"is the same or different from"** tell you this is a two-tailed test.

##### Calculate the distribution needed:
* **Random variable**: $P^\prime$ = the percent of first-time brides who are younger than their grooms.
* **Distribution for the test**: The problem contains no mention of a mean.  The information is given in terms of percentages.  Use the distribution for $P^\prime$, the estimated proportion.
* $p=0.50$
* $q=1-p=0.50$
* $n=100$
* $P^\prime \sim N \bigl (p, \sqrt{\frac{p \cdot q}{n}} \bigr) \therefore, P^\prime \sim N \bigl (0.5, \sqrt{\frac{0.5 \cdot 0.5}{100}} \bigr)$

##### Calculate the p-value using the normal distribution for proportions:
* $p\text{-value } = P(p^\prime \lt 0.47 \text{ or } p^\prime \gt 0.53) = 0.5485$
* where
* $x=53$
* $p^\prime = \frac{x}{n}=\frac{53}{100}=0.53$

![image.png](attachment:5965703a-eb2d-4bd0-ac0c-135f364d4003.png)

**Interpretation of the $p$-value**: If the null hypothesis is true, there is 0.5485 probability (54.85%) that the sample (estimated) proportion $p^\prime$ is 0.53 or more OR 0.47 or less.

$\mu = p = 0.50$ comes from $H_0$, the null hypothesis.

$p^\prime = 0.53$. Since the curve is symmetrical and the test is two-tailed, the $p^\prime$ for the left tail is equal to 0.50 – 0.03 = 0.47 where $\mu = p = 0.50$. (0.03 is the difference between 0.53 and 0.50.)

Compare $\alpha$ and the $p$-value:

Since $\alpha = 0.01$ and $p$-value = 0.5485, $\alpha \lt p\text{-value}$.

**Make a decision**:
* Since $\alpha \lt p\text{-value}$, you cannot reject $H_0$.

**Conclusion**:
* At the 1% level of significance, the sample data do not show sufficient evidence that the percentage of first-time brides who are younger than their grooms is different from 50%.
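
The two-tailed $p$-value for this proportion test can be reproduced by standardizing $p^\prime$ and doubling one tail. This is a sketch using the numbers from this example; the variable names are illustrative.

```python
import math
import scipy.stats as st

p0, n, x, alpha = 0.50, 100, 53, 0.01

p_prime = x / n                                # sample (estimated) proportion
std_error = math.sqrt(p0 * (1 - p0) / n)       # uses p from H0, not p'
z = (p_prime - p0) / std_error
p_value = 2 * st.norm.sf(abs(z))               # both tails, because H_a uses !=

decision = "reject H0" if alpha > p_value else "do not reject H0"
print(f"p' = {p_prime}, z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```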

The $p$-value can easily be calculated.

<span style="color:orange">Example 9.18</span>

Suppose a consumer group suspects that the proportion of households that have three cell phones is 30%. A cell phone company has reason to believe that the proportion is not 30%. Before they start a big advertising campaign, they conduct a hypothesis test. Their marketing people survey 150 households with the result that 43 of the households have three cell phones.

a. The value that helps determine the $p$-value is $p^\prime$. Calculate $p^\prime$.

b.  What is a **success** for this problem?

c. What is the level of significance?

d. Draw the graph for this problem. Draw the horizontal axis. Label and shade appropriately.
Calculate the $p$-value.

e. Make a decision. _____________(Reject/Do not reject) $H_0$ because____________.

#### The Hypothesis Test:
* $H_0: p = 0.30$
* $H_a: p \ne 0.30$

#### Determine distribution needed:
* The **random variable** is $P^\prime = $ proportion of households that have three cell phones.
* the **distribution** for the hypothesis test is $P^\prime \sim N \bigl ( 0.30, \sqrt{\frac{(0.30) \cdot (0.70)}{150}} \bigr)$

a. The value that helps determine the $p$-value is $p^\prime$. Calculate $p^\prime$.

* $p^\prime = \frac{x}{n}$ where x is the number of successes and n is the total number in the sample.
* $x=43$
* $n=150$
* $p^\prime = \frac{43}{150} = 0.287$

b.  What is a **success** for this problem?

A success is having three cell phones in a household.

c. What is the level of significance?

The level of significance is the preset $\alpha$.  Since $\alpha$ is not given, assume that $\alpha = 0.05$.

d. Draw the graph for this problem. Draw the horizontal axis. Label and shade appropriately.
Calculate the $p$-value.

$p\text{-value } = 0.7216$

e. Make a decision. (Reject/Do not reject) $H_0$ because.

Assuming that $\alpha = 0.05$, $\alpha \lt p\text{-value}$. The decision is do not reject $H_0$ because there is not sufficient evidence to conclude that the proportion of households that have three cell phones is not 30%.

In [14]:
(.30, math.sqrt((0.30*0.70)/150))

(0.3, 0.03741657386773942)

In [15]:
43/150

0.2866666666666667

In [16]:
# two-tailed p-value: double the left-tail area below p' (use cdf, not ppf)
round(2 * st.norm.cdf(43/150, loc=.30, scale=0.03741657386773942), 4)

0.7216

<span style="color:red">Example 9.19</span>

My dog has so many fleas,
They do not come off with ease.
As for shampoo, I have tried many types
Even one called Bubble Hype,
Which only killed 25% of the fleas,
Unfortunately I was not pleased.

I've used all kinds of soap,
Until I had given up hope
Until one day I saw
An ad that put me in awe.

A shampoo used for dogs
Called GOOD ENOUGH to Clean a Hog
Guaranteed to kill more fleas.

I gave Fido a bath
And after doing the math
His number of fleas
Started dropping by 3's!

Before his shampoo
I counted 42.
At the end of his bath,
I redid the math
And the new shampoo had killed 17 fleas.
So now I was pleased.

Now it is time for you to have some fun
With the level of significance being .01,
You must help me figure out
Use the new shampoo or go without?

#### Set up the Hypothesis test:
* $H_0: p \leq 0.25$
* $H_a: p \gt 0.25$

#### Determine the distribution needed:
* $P^\prime$ = The proportion of fleas that are killed by the shampoo
* **Normal**: $N \bigl ( 0.25, \sqrt{\frac{(0.25)(1-0.25)}{42}} \bigr)$
* **Test Statistic**: $z=2.3163$
* $p$-value using normal distribution for proportions:
    * $p\text{-value } = 0.0103$
* **Decision**: Since $\alpha = 0.01 \lt p\text{-value} = 0.0103$, do not reject $H_0$.
* **Conclusion**: At the 1% level of significance, the sample data do not show sufficient evidence that the new shampoo kills more than 25% of the fleas.

In [17]:
17/42

0.40476190476190477

In [18]:
(0.25, math.sqrt((0.25*(1-0.25))/42))

(0.25, 0.0668153104781061)

In [19]:
# right-tailed p-value: area above p' = 0.40476 (use cdf, not ppf)
round(1 - st.norm.cdf(0.40476, loc=0.25, scale=0.06681531), 4)

0.0103

In [20]:
math.sqrt((0.25*(1-0.25))/42)

0.0668153104781061
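The flea shampoo test can also be run end to end in one cell. The following is a sketch rather than the only way to do it; the variable names are illustrative, and the values ($x = 17$, $n = 42$, $p = 0.25$, $\alpha = 0.01$) are taken from the poem:

```python
import math
import scipy.stats as st

x, n, p0, alpha = 17, 42, 0.25, 0.01   # fleas killed, fleas counted, H0 proportion, significance level
p_prime = x / n                        # sample proportion of fleas killed
sd = math.sqrt(p0 * (1 - p0) / n)      # standard deviation of P' under H0

z = (p_prime - p0) / sd                # test statistic
p_value = st.norm.sf(z)                # right-tailed test: area above z

print(round(z, 4), round(p_value, 4))
print("reject H0" if p_value < alpha else "do not reject H0")
```

Standardizing first and then calling `st.norm.sf(z)` is equivalent to passing `loc` and `scale` to the normal distribution directly; it just makes the test statistic visible.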

<span style="color:orange">Example 9.20</span>

The National Institute of Standards and Technology provides exact data on conductivity properties of materials. Following are conductivity measurements for 11 randomly selected pieces of a particular type of glass.
1.11; 1.07; 1.11; 1.07; 1.12; 1.08; .98; .98; 1.02; .95; .95
Is there convincing evidence that the average conductivity of this type of glass is greater than one? Use a significance level of 0.05. Assume the population is normal.

In [21]:
sample_data = [1.11, 1.07, 1.11, 1.07, 1.12, 1.08, 0.98, 0.98, 1.02, 0.95, 0.95]

In [22]:
len(sample_data)

11

In [23]:
sum(sample_data)/len(sample_data)

1.04

In [24]:
st.tstd(sample_data)

0.06587867636800247

#### Set up the Hypothesis test:
* $H_0: \mu \leq 1$
* $H_a: \mu \gt 1$

* $\alpha = 0.05$
* $n=11$
* $\bar{x} = 1.04$
* $s_x = 0.06587867636800247$
* Testing a mean without a known population standard deviation, therefore use a <span style="color:pink">**t-test**</span>
* $t_{df} = t_{(11 - 1)} = t_{10}$
* The test is right-tailed, so the full $\alpha = 0.05$ sits in the right tail (it is not split into $\frac{\alpha}{2}$)
* p-value = $P(\bar{x} \gt 1.04)$, computed assuming $\mu = 1$

In [25]:
# calculate the t-value; from the problem, the assumption is mu = 1
(1.04-1)/(0.065878676368/math.sqrt(11))

2.0137774303956264

$\therefore \text{t-value}=2.013777$

In [26]:
1 - st.t.cdf(2.013, df=10)

0.035907194095100015

$\therefore p\text{-value}=0.0359$

**State the Conclusions**: Since the p-value (p = 0.036) is less than our alpha value, we will reject the null hypothesis. It is reasonable to state that the data supports the claim that the average conductivity level is greater than one.
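As a cross-check on the hand calculation, SciPy can run the same right-tailed test directly on the raw data. This sketch assumes SciPy 1.6 or later, where `ttest_1samp` accepts the `alternative` argument:

```python
import scipy.stats as st

sample_data = [1.11, 1.07, 1.11, 1.07, 1.12, 1.08, 0.98, 0.98, 1.02, 0.95, 0.95]

# Right-tailed one-sample t-test of H0: mu <= 1 against Ha: mu > 1
result = st.ttest_1samp(sample_data, popmean=1, alternative="greater")
print(round(result.statistic, 3), round(result.pvalue, 3))
```

The statistic and $p$-value should agree with the $t$-value and $p$-value computed step by step above, up to rounding.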

<span style="color:orange">Example 9.21</span>

In a study of 420,019 cell phone users, 172 of the subjects developed brain cancer. Test the claim that cell phone users developed brain cancer at a greater rate than that for non-cell phone users (the rate of brain cancer for non-cell phone users is 0.0340%). Since this is a critical issue, use a 0.005 significance level. Explain why the significance level should be so low in terms of a Type I error.

**Hypothesis**:
* $H_0: p \leq 0.00034$
* $H_a: p \gt 0.00034$
> If we commit a Type I error, we are essentially accepting a false claim. Since the claim describes cancer-causing environments, we want to minimize the chances of incorrectly identifying causes of cancer.

**Distribution Type**:
* Proportions
* $x = 172$
* $n = 420,019$
* $p = 0.00034$ (that is, 0.0340%)
* $np = 420,019(0.00034) = 142.8$
* $nq = 420,019(0.99966) = 419,876.2$
* Two possible outcomes for each subject, independent subjects, and a fixed probability of success $p=0.00034$

In [27]:
172/420019

0.00040950528428475855

In [36]:
1 - st.norm.cdf(0.0004095, loc=0.00034, scale=math.sqrt((0.00034*(1-0.00034)/420019)))

0.007279442156423843

Since the $p\text{-value} = 0.0073$ is greater than our alpha value = 0.005, we cannot reject the null. Therefore, we conclude that there is not enough evidence to support the claim of higher brain cancer rates for the cell phone users.
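The conditions and the test for this example can be checked together. This is a sketch under the usual rule of thumb that the normal approximation is reasonable when $np$ and $nq$ both exceed 5; the variable names are illustrative:

```python
import math
import scipy.stats as st

x, n, p0, alpha = 172, 420019, 0.00034, 0.005
q0 = 1 - p0

# Rule-of-thumb check for using the normal approximation to the binomial
assert n * p0 > 5 and n * q0 > 5

p_prime = x / n                               # sample proportion
z = (p_prime - p0) / math.sqrt(p0 * q0 / n)   # test statistic
p_value = st.norm.sf(z)                       # right-tailed test

print(round(n * p0, 1), round(z, 2), round(p_value, 4))
print("reject H0" if p_value < alpha else "do not reject H0")
```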

<span style="color:red">Example 9.22</span>

According to the US Census there are approximately 268,608,618 residents aged 12 and older. Statistics from the Rape, Abuse, and Incest National Network indicate that, on average, 207,754 rapes occur each year (male and female) for persons aged 12 and older. This translates into a percentage of sexual assaults of 0.078%. In Daviess County, KY, there were reported 11 rapes for a population of 37,937. Conduct an appropriate hypothesis test to determine if there is a statistically significant difference between the local sexual assault percentage and the national sexual assault percentage. Use a significance level of 0.01.

**Hypothesis**:
* $H_0: p = 0.078$%
* $H_a: p \ne 0.078$%
> We need to test whether the proportion of sexual assaults in Daviess County, KY is significantly different from the national average.

**Distribution**:
* Two-tailed test of a single proportion
* $x = 11$
* $n = 37,937$
* $p = 0.00078$ (that is, 0.078%)

In [37]:
11/37937

0.00028995439808102906

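In [45]:
# two-tailed p-value; the standard deviation uses the county sample size n = 37,937
round(2 * st.norm.cdf(0.000289954, loc=0.00078, scale=math.sqrt((0.00078*(1-0.00078)/37937))), 4)

0.0006

Since the $p\text{-value} \approx 0.0006$ is less than $\alpha = 0.01$, we reject the null hypothesis. At the 1% level of significance, the sample data show sufficient evidence that the sexual assault percentage in Daviess County, KY differs from the national percentage of 0.078%.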

## Chapter Summary

### 9.3 Distribution Needed for Hypothesis Testing
If there is no given preconceived $\alpha$, then use $\alpha = 0.05$.

**Types of Hypothesis Tests**
* Single population mean, **known** population variance (or standard deviation): **Normal test**.
* Single population mean, **unknown** population variance (or standard deviation): **Student's t-test**.
* Single population proportion: Normal test.
* For a **single population mean**, we may use a normal distribution with the following mean and standard deviation:
    * Means: $\mu=\mu_{\bar{x}}$ and $\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$
* For a **single population proportion**, we may use a normal distribution with the following mean and standard deviation:
    * Proportions: $\mu=p$ and $\sigma = \sqrt{\frac{pq}{n}}$
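The choice of distribution and tail can be captured in a small helper. This is only a sketch: `one_sample_p_value` and its `tail` labels are hypothetical names invented here, and it assumes the test statistic has already been standardized so that the reference distribution is symmetric about zero.

```python
import math
import scipy.stats as st

def one_sample_p_value(stat, dist, tail):
    """Return the p-value of a standardized test statistic.

    dist: a frozen scipy distribution, e.g. st.norm() or st.t(df=10)
    tail: "left", "right", or "two" (labels chosen for this sketch)
    """
    if tail == "left":
        return dist.cdf(stat)
    if tail == "right":
        return dist.sf(stat)
    if tail == "two":
        return 2 * dist.sf(abs(stat))
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Mean with unknown sigma: Student's t with n - 1 degrees of freedom (Example 9.20)
t_stat = (1.04 - 1) / (0.06587867636800247 / math.sqrt(11))
p_mean = one_sample_p_value(t_stat, st.t(df=10), "right")

# Proportion: standard normal (Example 9.18)
z_stat = (43 / 150 - 0.30) / math.sqrt(0.30 * 0.70 / 150)
p_prop = one_sample_p_value(z_stat, st.norm(), "two")

print(round(p_mean, 3), round(p_prop, 4))
```

Passing the distribution in as an argument keeps the tail logic in one place, so the same function serves the normal test and the Student's t-test.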