# Content

### The Four-Step Process for a Significance Test

Carrying out a significance test is a structured process. We'll follow four steps: **STATE, PLAN, DO, CONCLUDE**.

#### **Big Picture: What is a Significance Test?**
A confidence interval's goal is to *estimate* a population parameter. A significance test's goal is to *assess the evidence provided by data against a claim* about that parameter. We are asking: "Is the observed result statistically significant (unlikely to have happened by random chance alone), or could it have plausibly occurred by chance?"

---

### Step 1: STATE

This step involves defining the hypotheses and the significance level.

**Hypotheses:**
Every test has two hypotheses that compete against each other.
*   **The Null Hypothesis (H₀):** This is the "status quo" or "no effect" hypothesis. It's a statement of no difference or no change. It always includes a statement of equality. For a proportion, it takes the form:
    *   **H₀: p = p₀** (where `p₀` is the hypothesized, or claimed, value)
*   **The Alternative Hypothesis (Hₐ):** This is the claim we are trying to find evidence *for*. It can take one of three forms:
    *   **Hₐ: p > p₀** (a one-sided test, looking for an increase)
    *   **Hₐ: p < p₀** (a one-sided test, looking for a decrease)
    *   **Hₐ: p ≠ p₀** (a two-sided test, looking for any difference)

**Significance Level (α):**
This is the "threshold of surprise." It's the probability of rejecting the null hypothesis when it's actually true (a Type I error). We choose `α` *before* collecting data. Common levels are `α = 0.05` and `α = 0.01`. If the probability of getting a result at least as extreme as ours, assuming H₀ is true, is less than `α`, we consider the result statistically significant.
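
To make the Type I error interpretation of `α` concrete, here is a small simulation sketch. The settings (`p0 = 0.70`, `n = 200`, the seed, and the number of repetitions) are arbitrary choices for illustration. It draws many samples from a population where H₀ really is true, runs a one-sided test on each one, and counts how often H₀ is wrongly rejected. Expect a rate near, but not exactly equal to, `α`: the sample count is discrete and the z-test relies on a normal approximation.

```python
import numpy as np
from scipy.stats import norm

# Illustrative settings (assumptions, not from a real study)
p0 = 0.70        # H0 is TRUE in this simulation: the real proportion equals p0
n = 200          # sample size
alpha = 0.05     # significance level
reps = 20_000    # number of simulated samples

rng = np.random.default_rng(seed=1)

# Draw many samples from a population where H0 holds (p = p0)
successes = rng.binomial(n, p0, size=reps)
p_hat = successes / n

# Run the one-sided test of Ha: p < p0 on every simulated sample
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
p_values = norm.cdf(z)

# Fraction of samples where H0 is (wrongly) rejected: an estimate of the Type I error rate
type_i_rate = np.mean(p_values < alpha)
print(f"Estimated Type I error rate: {type_i_rate:.3f}")
```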

---

### Step 2: PLAN

This step involves naming the test and checking the conditions for its validity.

1.  **Name the test:** "We will perform a **one-sample z-test for a population proportion `p`**."
2.  **Check Conditions:** These are the same three conditions as for confidence intervals, with one crucial difference in the "Large Counts" check.
    *   **Random:** The data must come from a random sample or randomized experiment.
    *   **Independent (10% Rule):** When sampling without replacement, the sample size `n` must be no more than 10% of the population size `N` (`n ≤ 0.10N`).
    *   **Large Counts (Normality):** Since our entire test is based on the premise "assuming the null hypothesis is true," we use the hypothesized proportion `p₀` from the null hypothesis to check this condition.
        *   **n * p₀ ≥ 10** and **n * (1 - p₀) ≥ 10**
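
The numeric conditions translate directly into simple checks. The helper names `ten_percent_ok` and `large_counts_ok` are hypothetical, written only for this sketch. The Random condition depends on how the data were collected, so it cannot be checked from the numbers alone.

```python
def ten_percent_ok(n, N):
    """Independence check for sampling without replacement: n <= 0.10 * N."""
    return n <= 0.10 * N

def large_counts_ok(n, p0):
    """Large Counts check for a test: uses p0 from H0, not the sample proportion."""
    return n * p0 >= 10 and n * (1 - p0) >= 10

# Numbers from the free-response example in this section
print(ten_percent_ok(200, 20_000))   # True: 200 <= 2,000
print(large_counts_ok(200, 0.70))    # True: 140 and 60 are both >= 10

# A hypothetical case that fails: n * (1 - p0) = 50 * 0.10 = 5 < 10
print(large_counts_ok(50, 0.90))     # False
```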

---

### Step 3: DO

This is the calculation step where we find our test statistic and the P-value.

#### **1. Calculating a z statistic in a test about a proportion**

**Theory:** The z-statistic measures how many standard deviations our sample statistic (`p̂`) is from the hypothesized parameter (`p₀`). It tells us how "surprising" our result is.

The formula is:
**z = (p̂ - p₀) / √[p₀(1-p₀)/n]**
*   `p̂` = Your sample proportion (`x/n`).
*   `p₀` = The hypothesized proportion from H₀.
*   `√[p₀(1-p₀)/n]` = The standard deviation of the sampling distribution of `p̂`, assuming the null hypothesis is true.
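
A minimal Python sketch of the formula, evaluated with the numbers from the free-response example in this section. The function name `one_prop_z` is our own, not a library function.

```python
import math

def one_prop_z(x, n, p0):
    """z statistic for H0: p = p0, using the null standard deviation sqrt(p0(1-p0)/n)."""
    p_hat = x / n                              # sample proportion
    sd_null = math.sqrt(p0 * (1 - p0) / n)     # SD of p-hat if H0 is true
    return (p_hat - p0) / sd_null              # distance from p0 in SD units

z = one_prop_z(128, 200, 0.70)
print(round(z, 2))  # -1.85
```

A negative `z` means the sample proportion fell below `p₀`; a positive `z` means it fell above.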

#### **2. Calculating a P-value given a z statistic**

**Theory:** The **P-value** is the heart of the test. It answers the question: **"Assuming the null hypothesis is true, what is the probability of getting a sample result at least as extreme as the one we actually observed?"**
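
One way to see this definition in action is to simulate it directly: generate many samples assuming H₀ is true and count how often the result is at least as extreme as the observed one. The sketch below borrows the numbers from the free-response example in this section (H₀: p = 0.70, Hₐ: p < 0.70, 128 successes out of 200); the seed and number of repetitions are arbitrary. Because it counts simulated binomial outcomes instead of using the normal curve, its estimate will typically differ somewhat from the z-test P-value.

```python
import numpy as np

# Numbers borrowed from the free-response example
p0, n, x_observed = 0.70, 200, 128
reps = 100_000

rng = np.random.default_rng(seed=0)

# Simulate sample counts assuming H0 is true (p = p0)
simulated_counts = rng.binomial(n, p0, size=reps)

# "At least as extreme" in the direction of Ha: p < p0 means counts <= 128
simulated_p_value = np.mean(simulated_counts <= x_observed)
print(f"Simulated P-value: {simulated_p_value:.4f}")
```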

The calculation depends on the alternative hypothesis:
*   If **Hₐ: p > p₀**, the P-value is the area to the **right** of your z-statistic. `P-value = P(Z ≥ z)`.
*   If **Hₐ: p < p₀**, the P-value is the area to the **left** of your z-statistic. `P-value = P(Z ≤ z)`.
*   If **Hₐ: p ≠ p₀**, the P-value is the area in **both tails**. You find the area in one tail and **double it**. `P-value = 2 * P(Z ≥ |z|)`.
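
The three cases can be wrapped in one helper using `scipy.stats.norm`. The function name and the labels `'greater'`, `'less'`, and `'two-sided'` are choices made for this sketch. `norm.sf(z)` is the survival function, `1 - norm.cdf(z)`, which SciPy evaluates more accurately far out in the upper tail.

```python
from scipy.stats import norm

def p_value_from_z(z, alternative):
    """P-value for a one-sample z test.

    alternative: 'greater' (Ha: p > p0), 'less' (Ha: p < p0),
    or 'two-sided' (Ha: p != p0). These labels are this sketch's own convention.
    """
    if alternative == "greater":
        return norm.sf(z)            # area to the right: P(Z >= z)
    if alternative == "less":
        return norm.cdf(z)           # area to the left: P(Z <= z)
    if alternative == "two-sided":
        return 2 * norm.sf(abs(z))   # double the tail area beyond |z|
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value_from_z(-1.85, alt), 4))
```

With the same `z`, the two-sided P-value is twice the one-sided P-value in the direction of the data, which is why a two-sided test needs stronger evidence to reach the same `α`.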

---

### Step 4: CONCLUDE

This final step involves making a decision and interpreting it in the context of the problem.

#### **3. Making conclusions in a test about a proportion**

There are two steps to a good conclusion:
1.  **Make a formal decision:** Compare your P-value to your significance level `α`.
    *   If **P-value < α**, your result is statistically significant. You **reject the null hypothesis (H₀)**. There is convincing evidence for the alternative hypothesis (Hₐ).
    *   If **P-value ≥ α**, your result is not statistically significant. You **fail to reject the null hypothesis (H₀)**. There is *not* convincing evidence for the alternative hypothesis (Hₐ).

2.  **State your conclusion in context:** Write a sentence or two explaining what your decision means in terms of the original problem.
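
The formal decision in step 1 is a single comparison. This hypothetical `decide` helper follows the rule stated above, including the edge case where a P-value exactly equal to `α` is *not* treated as significant. It only produces the formal decision; the conclusion in context still has to be written for the specific problem.

```python
def decide(p_value, alpha=0.05):
    """Formal decision: reject H0 only when the P-value is strictly below alpha."""
    if p_value < alpha:
        return "Reject H0: convincing evidence for Ha."
    return "Fail to reject H0: not convincing evidence for Ha."

print(decide(0.0322, alpha=0.05))  # Reject H0: convincing evidence for Ha.
print(decide(0.0322, alpha=0.01))  # Fail to reject H0: not convincing evidence for Ha.
```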

**Crucial Nuance:** "Failing to reject H₀" does not mean you have proven H₀ is true. It simply means you lack sufficient evidence to say it's false. Think of a court case: a "not guilty" verdict doesn't mean the person is innocent, just that the prosecution failed to prove guilt beyond a reasonable doubt.

---

### 4. Significance test for a proportion: Free Response Example

**Scenario:** A nationwide study showed that 70% of college students have a streaming subscription. A professor at a large state university (over 20,000 students) believes the proportion at her university is *lower*. She takes a random sample of 200 students and finds that 128 of them have a subscription. Does this provide convincing evidence for her belief at the `α = 0.05` significance level?

#### **STATE**
*   **Parameter:** `p` = the true proportion of all students at this university who have a streaming subscription.
*   **Null Hypothesis (H₀):** `p = 0.70` (The proportion at this university is the same as the national average).
*   **Alternative Hypothesis (Hₐ):** `p < 0.70` (The professor's belief that the proportion is lower).
*   **Significance Level (α):** `α = 0.05`.

#### **PLAN**
*   **Test:** We will perform a one-sample z-test for a population proportion `p`.
*   **Check Conditions:**
    *   **Random:** The problem states a "random sample" was taken.
    *   **Independent (10% Rule):** The sample size `n = 200` is less than 10% of the 20,000+ students at the university (0.10 × 20,000 = 2,000).
    *   **Large Counts (Normality):** We use `p₀ = 0.70`.
        *   `n * p₀ = 200 * 0.70 = 140` (which is ≥ 10).
        *   `n * (1 - p₀) = 200 * 0.30 = 60` (which is ≥ 10).
    *   Conditions are met.

#### **DO**
1.  **Calculate the sample proportion (`p̂`):**
    *   `p̂ = 128 / 200 = 0.64`
2.  **Calculate the z-statistic:**
    *   `z = (0.64 - 0.70) / √[0.70(1-0.70)/200]`
    *   `z = -0.06 / √[0.21/200] = -0.06 / √0.00105 ≈ -0.06 / 0.0324`
    *   **z ≈ -1.85**
3.  **Calculate the P-value:**
    *   Since Hₐ is `p < 0.70`, we need the area to the left of our z-statistic.
    *   `P-value = P(Z ≤ -1.85)`
    *   Using a Z-table with `z = -1.85`, this probability is **≈ 0.0322**. (Software that keeps the unrounded `z ≈ -1.8516` gives ≈ 0.0320; the conclusion is the same.)
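
Z-table lookups use `z` rounded to two decimal places, while software keeps full precision, so the two P-values can differ slightly. A quick check with `scipy.stats.norm`:

```python
import math
from scipy.stats import norm

# z statistic from the example, kept at full precision
z_unrounded = (0.64 - 0.70) / math.sqrt(0.70 * 0.30 / 200)
z_rounded = round(z_unrounded, 2)  # the two-decimal value used with a z-table

print(f"z (unrounded) = {z_unrounded:.4f}, P-value = {norm.cdf(z_unrounded):.4f}")
print(f"z (rounded)   = {z_rounded:.2f}, P-value = {norm.cdf(z_rounded):.4f}")
```

Either way the P-value is well below `α = 0.05`, so the decision does not depend on the rounding.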

#### **CONCLUDE**
1.  **Formal Decision:** Our P-value (0.0322) is less than our significance level `α` (0.05). Therefore, we **reject the null hypothesis (H₀)**.
2.  **Conclusion in Context:** We have convincing statistical evidence to support the professor's belief that the proportion of students at her university with a streaming subscription is lower than the national average of 70%.

***

### Python Code Illustration



```python
import numpy as np
from scipy.stats import norm

# --- Setup from the Free Response Example ---
p0 = 0.70      # Hypothesized proportion from H₀
n = 200        # Sample size
x = 128        # Number of successes in sample
alpha = 0.05   # Significance level

# --- DO Step ---
# 1. Calculate sample proportion (p-hat)
p_hat = x / n

# 2. Calculate the standard deviation of the null distribution (standard error)
standard_error = np.sqrt(p0 * (1 - p0) / n)

# 3. Calculate the z-statistic
z_statistic = (p_hat - p0) / standard_error

# 4. Calculate the P-value
# Since H_a is p < p0 (a "less than" test), we want the area to the left of z
# The norm.cdf() function gives the cumulative area to the left, which is exactly what we need.
p_value = norm.cdf(z_statistic)

print("--- Calculation (DO) Step ---")
print(f"Sample Proportion (p̂): {p_hat:.4f}")
print(f"Z-statistic: {z_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
print("-" * 30)

# --- CONCLUDE Step ---
print("--- Conclusion Step ---")
print(f"Significance Level (α): {alpha}")
print(f"Is P-value ({p_value:.4f}) < alpha ({alpha})? {p_value < alpha}")

if p_value < alpha:
    print("\nDecision: Reject the null hypothesis (H₀).")
    print("Conclusion: There is convincing evidence that the true proportion of students with a subscription at this university is less than 70%.")
else:
    print("\nDecision: Fail to reject the null hypothesis (H₀).")
    print("Conclusion: There is not convincing evidence that the true proportion of students with a subscription at this university is less than 70%.")
```
