# Content

*   The Four-Step Process for a Significance Test
*   Step 1: STATE
*   Step 2: PLAN
*   Step 3: DO
*   Step 4: CONCLUDE
*   Confidence Interval for the Difference Between Two Proportions
*   Hypothesis Test for a Difference in Proportions
*   Using a Confidence Interval to Make a Conclusion for a Test


### The Four-Step Process for a Significance Test

Carrying out a significance test is a structured process. We'll use the acronym **STATE, PLAN, DO, CONCLUDE**.

#### **Big Picture: What is a Significance Test?**
A confidence interval's goal is to *estimate* a population parameter. A significance test's goal is to *assess the evidence provided by data against a claim* about that parameter. We are asking: "Is the observed result statistically significant (unlikely to have happened by random chance alone), or could it have plausibly occurred by chance?"

---

### Step 1: STATE

This step involves defining the hypotheses and the significance level.

**Hypotheses:**
Every test has two hypotheses that compete against each other.
*   **The Null Hypothesis (H₀):** This is the "status quo" or "no effect" hypothesis. It's a statement of no difference or no change. It always includes a statement of equality. For a proportion, it takes the form:
    *   **H₀: p = p₀** (where `p₀` is the hypothesized, or claimed, value)
*   **The Alternative Hypothesis (Hₐ):** This is the claim we are trying to find evidence *for*. It can take one of three forms:
    *   **Hₐ: p > p₀** (a one-sided test, looking for an increase)
    *   **Hₐ: p < p₀** (a one-sided test, looking for a decrease)
    *   **Hₐ: p ≠ p₀** (a two-sided test, looking for any difference)

**Significance Level (α):**
This is the "threshold of surprise": the probability of rejecting the null hypothesis when it is actually true (a Type I error). We choose `α` *before* collecting data. Common levels are `α = 0.05` and `α = 0.01`. If the P-value (computed in Step 3) is less than `α`, we consider the result statistically significant.

---

### Step 2: PLAN

This step involves naming the test and checking the conditions for its validity.

1.  **Name the test:** "We will perform a **one-sample z-test for a population proportion `p`**."
2.  **Check Conditions:** These are the same three conditions as for confidence intervals, with one crucial difference in the "Large Counts" check.
    *   **Random:** The data must come from a random sample or randomized experiment.
    *   **Independent (10% Rule):** When sampling without replacement, the sample size `n` must be no more than 10% of the population size `N`.
    *   **Large Counts (Normality):** Since our entire test is based on the premise "assuming the null hypothesis is true," we use the hypothesized proportion `p₀` from the null hypothesis to check this condition.
        *   **n * p₀ ≥ 10** and **n * (1 - p₀) ≥ 10**
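The three PLAN checks can be sketched in Python. This is a minimal illustration, assuming a hypothetical helper named `check_one_sample_conditions` (not from any library). The Random condition is passed in as a flag because it can only be judged from how the data were collected, and `population_size=None` stands for "the population is known to be at least 10 times the sample size".

```python
EPS = 1e-9  # small tolerance so floating-point products like 100 * (1 - 0.9) still count as 10

def check_one_sample_conditions(n, p0, population_size=None, is_random=True):
    """Hypothetical helper: evaluate the PLAN conditions for a one-sample z-test.

    population_size=None means the population is known to be at least 10 times n.
    """
    return {
        # Random: must be judged from the study design, so it is passed in
        "random": is_random,
        # 10% rule: n must be no more than 10% of the population size N
        "independent_10pct": population_size is None or n <= 0.10 * population_size,
        # Large Counts: uses the null value p0, because the test assumes H0 is true
        "large_counts": n * p0 >= 10 - EPS and n * (1 - p0) >= 10 - EPS,
    }

# Professor example from later in this section: n = 200, p0 = 0.70, "over 20,000 students"
conditions = check_one_sample_conditions(n=200, p0=0.70, population_size=20_000)
print(conditions)
print("All conditions met:", all(conditions.values()))
```

Returning a dictionary rather than a single True/False makes it easy to see *which* condition failed when one does.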

---

### Step 3: DO

This is the calculation step where we find our test statistic and the P-value.

#### **1. Calculating a z statistic in a test about a proportion**

**Theory:** The z-statistic measures how many standard deviations our sample statistic (`p̂`) is from the hypothesized parameter (`p₀`). It tells us how "surprising" our result is.

The formula is:
**z = (p̂ - p₀) / √[p₀(1-p₀)/n]**
*   `p̂` = Your sample proportion (`x/n`).
*   `p₀` = The hypothesized proportion from H₀.
*   `√[p₀(1-p₀)/n]` = The standard deviation of the sampling distribution, assuming the null hypothesis is true.
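The formula translates directly into code. A minimal sketch with the hypothetical helper name `one_prop_z`, using the numbers from the streaming-subscription example later in this section (`x = 128`, `n = 200`, `p₀ = 0.70`):

```python
import math

def one_prop_z(x, n, p0):
    """Hypothetical helper: z statistic for a one-sample test of a proportion."""
    p_hat = x / n                               # sample proportion
    sd_null = math.sqrt(p0 * (1 - p0) / n)      # SD of p-hat assuming H0: p = p0 is true
    return (p_hat - p0) / sd_null               # how many null SDs p-hat lies from p0

z = one_prop_z(x=128, n=200, p0=0.70)
print(f"z = {z:.4f}")   # z = -1.8516
```

Note that the standard deviation uses `p₀`, not `p̂`, because every calculation in the test is carried out under the assumption that H₀ is true.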

#### **2. Calculating a P-value given a z statistic**

**Theory:** The **P-value** is the heart of the test. It's the answer to the question: **"Assuming the null hypothesis is true, what is the probability of getting a sample result as extreme or more extreme than the one we actually observed?"**

The calculation depends on the alternative hypothesis:
*   If **Hₐ: p > p₀**, the P-value is the area to the **right** of your z-statistic. `P-value = P(Z ≥ z)`.
*   If **Hₐ: p < p₀**, the P-value is the area to the **left** of your z-statistic. `P-value = P(Z ≤ z)`.
*   If **Hₐ: p ≠ p₀**, the P-value is the area in **both tails**. You find the area in one tail and **double it**. `P-value = 2 * P(Z ≥ |z|)`.
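The three tail rules can be written with SciPy's `norm.cdf` (area to the left) and `norm.sf` (area to the right, equal to `1 - cdf`). A sketch, assuming a hypothetical helper `p_value_from_z`; the `'smaller'`/`'larger'`/`'two-sided'` labels are this sketch's own choice.

```python
from scipy.stats import norm

def p_value_from_z(z, alternative):
    """Hypothetical helper: P-value for a z statistic, given the form of Ha."""
    if alternative == "larger":       # Ha: p > p0 -> area to the right of z
        return norm.sf(z)             # sf(z) = 1 - cdf(z)
    if alternative == "smaller":      # Ha: p < p0 -> area to the left of z
        return norm.cdf(z)
    if alternative == "two-sided":    # Ha: p != p0 -> both tails, so double one tail
        return 2 * norm.sf(abs(z))
    raise ValueError("alternative must be 'larger', 'smaller', or 'two-sided'")

for alt in ("smaller", "larger", "two-sided"):
    print(f"{alt:>9}: {p_value_from_z(-1.85, alt):.4f}")
```

With `z = -1.85`, the left-tail value agrees with the table value of about 0.0322; the two-sided value is exactly twice it.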

---

### Step 4: CONCLUDE

This final step involves making a decision and interpreting it in the context of the problem.

#### **3. Making conclusions in a test about a proportion**

There are two steps to a good conclusion:
1.  **Make a formal decision:** Compare your P-value to your significance level `α`.
    *   If **P-value < α**, your result is statistically significant. You **reject the null hypothesis (H₀)**. There is convincing evidence for the alternative hypothesis (Hₐ).
    *   If **P-value ≥ α**, your result is not statistically significant. You **fail to reject the null hypothesis (H₀)**. There is *not* convincing evidence for the alternative hypothesis (Hₐ).

2.  **State your conclusion in context:** Write a sentence or two explaining what your decision means in terms of the original problem.

**Crucial Nuance:** "Failing to reject H₀" does not mean you have proven H₀ is true. It simply means you lack sufficient evidence to say it's false. Think of a court case: a "not guilty" verdict doesn't mean the person is innocent, just that the prosecution failed to prove guilt beyond a reasonable doubt.

---

### 4. Significance test for a proportion: Free Response Example

**Scenario:** A nationwide study showed that 70% of college students have a streaming subscription. A professor at a large state university (over 20,000 students) believes the proportion at her university is *lower*. She takes a random sample of 200 students and finds that 128 of them have a subscription. Does this provide convincing evidence for her belief at the `α = 0.05` significance level?

#### **STATE**
*   **Parameter:** `p` = the true proportion of all students at this university who have a streaming subscription.
*   **Null Hypothesis (H₀):** `p = 0.70` (The proportion at this university is the same as the national average).
*   **Alternative Hypothesis (Hₐ):** `p < 0.70` (The professor's belief that the proportion is lower).
*   **Significance Level (α):** `α = 0.05`.

#### **PLAN**
*   **Test:** We will perform a one-sample z-test for a population proportion `p`.
*   **Check Conditions:**
    *   **Random:** The problem states a "random sample" was taken.
    *   **Independent (10% Rule):** The sample size `n=200` is less than 10% of the 20,000+ students at the university.
    *   **Large Counts (Normality):** We use `p₀ = 0.70`.
        *   `n * p₀ = 200 * 0.70 = 140` (which is ≥ 10).
        *   `n * (1 - p₀) = 200 * 0.30 = 60` (which is ≥ 10).
    *   Conditions are met.

#### **DO**
1.  **Calculate the sample proportion (`p̂`):**
    *   `p̂ = 128 / 200 = 0.64`
2.  **Calculate the z-statistic:**
    *   `z = (0.64 - 0.70) / √[0.70(1-0.70)/200]`
    *   `z = -0.06 / √[0.21/200] = -0.06 / √0.00105 ≈ -0.06 / 0.0324`
    *   **z ≈ -1.85**
3.  **Calculate the P-value:**
    *   Since Hₐ is `p < 0.70`, we need the area to the left of our z-statistic.
    *   `P-value = P(Z ≤ -1.85)`
    *   Using a Z-table or calculator, this probability is **≈ 0.0322**. (Software that keeps the unrounded z ≈ -1.8516 reports ≈ 0.0320; the difference is only rounding.)

#### **CONCLUDE**
1.  **Formal Decision:** Our P-value (0.0322) is less than our significance level `α` (0.05). Therefore, we **reject the null hypothesis (H₀)**.
2.  **Conclusion in Context:** We have convincing statistical evidence to support the professor's belief that the proportion of students at her university with a streaming subscription is lower than the national average of 70%.

***

### Python Code Illustration



```python
import numpy as np
from scipy.stats import norm

# --- Setup from the Free Response Example ---
p0 = 0.70      # Hypothesized proportion from H₀
n = 200        # Sample size
x = 128        # Number of successes in sample
alpha = 0.05   # Significance level

# --- DO Step ---
# 1. Calculate sample proportion (p-hat)
p_hat = x / n

# 2. Calculate the standard deviation of the null distribution (standard error)
standard_error = np.sqrt(p0 * (1 - p0) / n)

# 3. Calculate the z-statistic
z_statistic = (p_hat - p0) / standard_error

# 4. Calculate the P-value
# Since H_a is p < p0 (a "less than" test), we want the area to the left of z
# The norm.cdf() function gives the cumulative area to the left, which is exactly what we need.
p_value = norm.cdf(z_statistic)

print("--- Calculation (DO) Step ---")
print(f"Sample Proportion (p̂): {p_hat:.4f}")
print(f"Z-statistic: {z_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
print("-" * 30)

# --- CONCLUDE Step ---
print("--- Conclusion Step ---")
print(f"Significance Level (α): {alpha}")
print(f"Is P-value ({p_value:.4f}) < alpha ({alpha})? {p_value < alpha}")

if p_value < alpha:
    print("\nDecision: Reject the null hypothesis (H₀).")
    print("Conclusion: There is convincing evidence that the true proportion of students with a subscription at this university is less than 70%.")
else:
    print("\nDecision: Fail to reject the null hypothesis (H₀).")
    print("Conclusion: There is not convincing evidence that the true proportion of students with a subscription at this university is less than 70%.")
```


```text
--- Calculation (DO) Step ---
Sample Proportion (p̂): 0.6400
Z-statistic: -1.8516
P-value: 0.0320
------------------------------
--- Conclusion Step ---
Significance Level (α): 0.05
Is P-value (0.0320) < alpha (0.05)? True

Decision: Reject the null hypothesis (H₀).
Conclusion: There is convincing evidence that the true proportion of students with a subscription at this university is less than 70%.
```


### The Context of Errors in Hypothesis Testing

When we make a decision in a significance test (either "reject H₀" or "fail to reject H₀"), we are doing so with incomplete information (a sample, not the whole population). Our decision might be correct, or it might be wrong. There are four possible outcomes, which can be summarized in a table:

| | **Reality: H₀ is True** | **Reality: H₀ is False** |
| :--- | :--- | :--- |
| **Our Decision: Reject H₀** | **Type I Error** (False Alarm) | **Correct Decision (Power)** |
| **Our Decision: Fail to Reject H₀** | **Correct Decision** | **Type II Error** (Missed Effect) |

---

### 1. Introduction to Type I and Type II Errors

#### Theory
*   A **Type I Error** occurs when we **reject a null hypothesis that is actually true**.
    *   **In short:** A "false positive" or "false alarm." We conclude there is an effect when, in reality, there isn't one.
    *   **Probability:** The probability of making a Type I error is denoted by the Greek letter **α (alpha)**, which is our significance level. Choosing `α = 0.05` means accepting a 5% chance of rejecting H₀ in situations where H₀ is actually true (approximately 5% for the z-tests here, since they rely on a Normal approximation).

*   A **Type II Error** occurs when we **fail to reject a null hypothesis that is actually false**.
    *   **In short:** A "false negative" or "missed opportunity." We fail to detect an effect that actually exists.
    *   **Probability:** The probability of making a Type II error is denoted by the Greek letter **β (beta)**. The value of `β` is not chosen directly; it depends on several factors, including sample size and the true size of the effect.
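The link between α and the Type I error rate can be checked by simulation. A sketch, assuming the professor's left-tailed test (`p₀ = 0.70`, `n = 200`, `α = 0.05`): draw many samples from a population where H₀ is true and record how often the test rejects. Because sample counts are whole numbers and the test uses a Normal approximation, the rejection rate lands near α rather than exactly on it; the seed is arbitrary.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=1)   # arbitrary seed, for reproducibility
p0, n, alpha, reps = 0.70, 200, 0.05, 100_000

# Simulate many samples in a world where H0 is TRUE (p really equals p0)
x = rng.binomial(n, p0, size=reps)
z = (x / n - p0) / np.sqrt(p0 * (1 - p0) / n)
p_values = norm.cdf(z)                # Ha: p < p0, so left-tail P-values

# Every rejection here is, by construction, a Type I error
type_i_rate = np.mean(p_values < alpha)
print(f"Fraction of true-H0 samples rejected: {type_i_rate:.4f} (alpha = {alpha})")
```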

#### Identifying Type I and Type II Errors in Context
The key is to write out the consequence of each error in the specific context of the problem.

Let's use our previous example: A professor tests if the proportion of students with a streaming subscription (`p`) at her university is less than the national average of 0.70.
*   **H₀: p = 0.70**
*   **Hₐ: p < 0.70**

*   **How to describe a Type I Error:**
    1.  Start with the conclusion for rejecting H₀: "The professor concludes that the proportion of students with a subscription is less than 70%..."
    2.  Add the reality for H₀ being true: "...when in reality, the proportion is equal to 70%."
    *   **Full Description:** A Type I error would be concluding that the proportion of students with a subscription at this university is less than 70%, when in fact it is 70%. (She finds a "significant" result that is just due to random chance).

*   **How to describe a Type II Error:**
    1.  Start with the conclusion for failing to reject H₀: "The professor concludes that there is not enough evidence to say the proportion of students with a subscription is less than 70%..."
    2.  Add the reality for H₀ being false: "...when in reality, the proportion is actually less than 70%."
    *   **Full Description:** A Type II error would be failing to find convincing evidence that the proportion of students with a subscription is less than 70%, when in truth it really is lower. (Her study failed to detect a real difference).

---

### 2. Power in Significance Tests

#### Theory
**Power** is the probability that a test will **correctly reject a false null hypothesis**. It is the probability of avoiding a Type II error.

*   **Formula:** **Power = 1 - β**
*   **In short:** It's the probability of finding an effect if there really is an effect to be found. The higher the power, the less likely the test is to miss a real effect.
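Power can also be estimated by simulation: generate many samples from a world where a specific alternative is true and count how often the test rejects H₀. A sketch, assuming the professor's test (`p₀ = 0.70`, `n = 200`, `α = 0.05`) and an illustrative true proportion of 0.62. Because counts are whole numbers, the estimate will be close to, but not exactly equal to, a Normal-approximation power calculation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=2)   # arbitrary seed, for reproducibility
p0, p_true, n, alpha, reps = 0.70, 0.62, 200, 0.05, 100_000

# Simulate samples in a world where H0 is FALSE: the true proportion is p_true
x = rng.binomial(n, p_true, size=reps)
z = (x / n - p0) / np.sqrt(p0 * (1 - p0) / n)   # the test still uses the null SD
rejected = norm.cdf(z) < alpha                  # left-tailed test, Ha: p < p0

power = rejected.mean()    # fraction of samples that correctly reject H0
beta = 1 - power           # fraction that commit a Type II error
print(f"Estimated power: {power:.3f}, estimated beta: {beta:.3f}")
```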

We want tests with high power. The power of a significance test depends on three main factors:

1.  **Sample Size (`n`):** Usually the most practical way to increase power. More data shrinks the standard deviation of `p̂`, which makes it easier to distinguish a real effect from random noise. **Larger `n` = Higher Power**.
2.  **Effect Size:** This is the magnitude of the difference between the hypothesized value (`p₀`) and the true value (`p`). It's much easier to detect a large difference than a small one. **Larger Effect Size = Higher Power**.
3.  **Significance Level (`α`):** Increasing `α` (e.g., from 0.01 to 0.05) lowers the standard for rejecting H₀. This makes you more likely to reject H₀, which in turn means you are more likely to *correctly* reject H₀ if it's false. This creates a direct trade-off: **Higher `α` = Higher Power = Higher risk of Type I Error**.
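The sample-size factor can be turned around to ask how large `n` must be to reach a target power. A sketch using the Normal-approximation power of a left-tailed test (the helper names `power_smaller` and `smallest_n_for_power` are hypothetical), with the professor's `p₀ = 0.70`, an assumed true `p = 0.62`, and a target of 80% power:

```python
import numpy as np
from scipy.stats import norm

def power_smaller(n, p0, p_true, alpha):
    """Hypothetical helper: Normal-approximation power of a left-tailed one-proportion z-test."""
    # Boundary of the rejection region, computed under H0
    p_hat_crit = p0 - norm.ppf(1 - alpha) * np.sqrt(p0 * (1 - p0) / n)
    # Spread of p-hat if the true proportion is p_true
    se_alt = np.sqrt(p_true * (1 - p_true) / n)
    # Probability that p-hat falls in the rejection region when p = p_true
    return norm.cdf((p_hat_crit - p_true) / se_alt)

def smallest_n_for_power(p0, p_true, alpha, target=0.80, n_max=10_000):
    """Hypothetical helper: first n (searching upward) whose power reaches the target."""
    for n in range(10, n_max + 1):
        if power_smaller(n, p0, p_true, alpha) >= target:
            return n
    return None

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: smallest n for 80% power = {smallest_n_for_power(0.70, 0.62, alpha)}")
```

Comparing the two α values shows the trade-off from factor 3: a stricter α requires a larger sample to reach the same power.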

---

### 3. Consequences of Errors and Significance

#### Theory
The choice of a significance level `α` should be based on the real-world consequences of making each type of error. You must ask: **"Which error is worse?"**

#### Real-Life Example: Medical Drug Trial
A company is testing a new drug to treat a disease. The current treatment has a 30% success rate. The company hopes the new drug is better.
*   **H₀: p = 0.30** (The new drug is no better than the old one).
*   **Hₐ: p > 0.30** (The new drug is better).

*   **Consequence of a Type I Error (False Positive):**
    *   **What happens:** The company concludes the new drug is better when it's not.
    *   **The cost:** Patients are prescribed an expensive new drug that is no more effective than the old one. They might suffer new side effects for no benefit. The company invests millions in producing a useless drug. This is a very bad outcome.

*   **Consequence of a Type II Error (Missed Opportunity):**
    *   **What happens:** The study fails to detect that the new drug is, in fact, better.
    *   **The cost:** A genuinely superior treatment is abandoned. Patients continue to receive the less effective old treatment. The company misses out on a successful product. This is also a very bad outcome.

*   **Choosing `α` and Significance:**
    *   Because the consequence of a Type I error (releasing a useless drug) is potentially very harmful to the public, medical trials often use a very small significance level, like **`α = 0.01`**. They are willing to decrease the power of the test (making a Type II error more likely) to minimize the chance of a false alarm.
    *   In other fields, where the consequences are less dire (e.g., testing a new website layout), a researcher might be comfortable with `α = 0.05` or even `α = 0.10` to increase the power to detect a small but potentially profitable improvement.

***

### Python Code Illustration

This code manually calculates the power of a test to demonstrate the concepts. It shows how power is affected by sample size, effect size, and the significance level.





```python
import numpy as np
from scipy.stats import norm

def calculate_power_one_sample_proportion(n, p0, p_true, alpha, alternative='smaller'):
    """
    Calculates the power of a one-sample z-test for a proportion manually,
    using the Normal approximation.

    Args:
        n: Sample size
        p0: Proportion under the null hypothesis
        p_true: A specific true proportion under the alternative hypothesis
        alpha: Significance level
        alternative: 'smaller', 'larger', or 'two-sided'

    Returns:
        The power of the test (float).
    """
    if alternative not in ('smaller', 'larger', 'two-sided'):
        raise ValueError("alternative must be 'smaller', 'larger', or 'two-sided'")

    # 1. Find the critical value (z-score) in the null distribution
    if alternative == 'two-sided':
        critical_z = norm.ppf(1 - alpha / 2)
    else:
        critical_z = norm.ppf(1 - alpha)

    # 2. Find the critical sample proportion (p-hat_crit) that marks the rejection region
    se_null = np.sqrt(p0 * (1 - p0) / n)
    if alternative == 'smaller':
        p_hat_crit = p0 - critical_z * se_null
    elif alternative == 'larger':
        p_hat_crit = p0 + critical_z * se_null
    else:  # two-sided needs two critical values
        p_hat_crit_lower = p0 - critical_z * se_null
        p_hat_crit_upper = p0 + critical_z * se_null

    # 3. Calculate the standard error of the *alternative* distribution
    se_alternative = np.sqrt(p_true * (1 - p_true) / n)

    # 4. Find the probability of being in the rejection region, from the perspective
    #    of the alternative distribution. This is the power.
    if alternative == 'smaller':
        z_for_power = (p_hat_crit - p_true) / se_alternative
        power = norm.cdf(z_for_power)
    elif alternative == 'larger':
        z_for_power = (p_hat_crit - p_true) / se_alternative
        power = 1 - norm.cdf(z_for_power)
    else:  # two-sided
        z_lower = (p_hat_crit_lower - p_true) / se_alternative
        z_upper = (p_hat_crit_upper - p_true) / se_alternative
        power = norm.cdf(z_lower) + (1 - norm.cdf(z_upper))

    return power


# --- Setup from the professor example ---
p0 = 0.70
n_original = 200
alpha_original = 0.05
p_true_original = 0.62  # A specific, true alternative we want to detect

# --- 1. Calculate the power for the original test ---
power_original = calculate_power_one_sample_proportion(n_original, p0, p_true_original, alpha_original, alternative='smaller')

print("--- Power Calculation ---")
print(f"The power of the test to detect a true p of {p_true_original} is: {power_original:.4f}")
print(f"This means there's a ~{power_original:.0%} chance the professor's test will correctly reject H₀ if the true proportion is {p_true_original:.0%}.")
print(f"The probability of a Type II Error (β) would be 1 - {power_original:.2f} = {1-power_original:.2f}\n")
print("-" * 60)


# --- 2. How does sample size affect power? ---
power_n400 = calculate_power_one_sample_proportion(400, p0, p_true_original, alpha_original, alternative='smaller')
print("--- The Effect of Sample Size ---")
print(f"Power with n=200: {power_original:.4f}")
print(f"Power with n=400: {power_n400:.4f}")
print("Conclusion: Doubling the sample size substantially increased the power of the test.\n")
print("-" * 60)

# --- 3. How does effect size affect power? ---
# What if the true proportion was even lower, say p_true = 0.60? (A larger effect)
power_large_effect = calculate_power_one_sample_proportion(n_original, p0, 0.60, alpha_original, alternative='smaller')
print("--- The Effect of Effect Size ---")
print(f"Original power (for p_true=0.62): {power_original:.4f}")
print(f"Power for larger effect (p_true=0.60): {power_large_effect:.4f}")
print("Conclusion: It's much easier to detect a larger difference, so the power is higher.\n")
print("-" * 60)

# --- 4. How does significance level affect power? ---
power_alpha_01 = calculate_power_one_sample_proportion(n_original, p0, p_true_original, 0.01, alternative='smaller')
print("--- The Effect of Alpha ---")
print(f"Power with α=0.05: {power_original:.4f}")
print(f"Power with α=0.01: {power_alpha_01:.4f}")
print("Conclusion: Using a stricter alpha (making it harder to reject H₀) reduces the power of the test.")
```

```text
--- Power Calculation ---
The power of the test to detect a true p of 0.62 is: 0.7817
This means there's a ~78% chance the professor's test will correctly reject H₀ if the true proportion is 62%.
The probability of a Type II Error (β) would be 1 - 0.78 = 0.22

------------------------------------------------------------
--- The Effect of Sample Size ---
Power with n=200: 0.7817
Power with n=400: 0.9594
Conclusion: Doubling the sample size substantially increased the power of the test.

------------------------------------------------------------
--- The Effect of Effect Size ---
Original power (for p_true=0.62): 0.7817
Power for larger effect (p_true=0.60): 0.9112
Conclusion: It's much easier to detect a larger difference, so the power is higher.

------------------------------------------------------------
--- The Effect of Alpha ---
Power with α=0.05: 0.7817
Power with α=0.01: 0.5535
Conclusion: Using a stricter alpha (making it harder to reject H₀) reduces the power of the test.
```



### Confidence Interval for the Difference Between Two Proportions

The goal here is to **estimate** the true difference (`p₁ - p₂`) between two population proportions.

#### Theory
Just like with a single proportion, our estimate will be built around our sample data.
*   **Point Estimate:** Our best guess for the true difference `p₁ - p₂` is the difference we observed in our samples, `p̂₁ - p̂₂`.
*   **The Formula:** We use the same standard structure:
    **Point Estimate ± (Critical Value * Standard Error)**
    **`(p̂₁ - p̂₂) ± z* * √[ (p̂₁(1-p̂₁)/n₁) + (p̂₂(1-p̂₂)/n₂) ]`**

Let's break down the **Standard Error (SE)** part, as it's the most complex piece:
*   Because the two samples are independent, the variance of a difference is the **sum of the individual variances**:
*   `Var(p̂₁ - p̂₂) = Var(p̂₁) + Var(p̂₂)`
*   `Var(p̂₁ - p̂₂) = [p₁(1-p₁)/n₁] + [p₂(1-p₂)/n₂]`
*   The true proportions `p₁` and `p₂` are unknown, so we substitute the sample proportions `p̂₁` and `p̂₂`. The square root of this estimated variance is the standard error.
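The "add the variances" rule can be checked numerically. A simulation sketch, assuming illustrative true proportions of 0.75 and 0.55 with sample sizes 80 and 100 (chosen to mirror the teen/adult example in this section): draw many pairs of independent samples and compare the spread of `p̂₁ - p̂₂` with the formula.

```python
import numpy as np

rng = np.random.default_rng(seed=3)   # arbitrary seed, for reproducibility
p1, n1 = 0.75, 80                     # assumed true values (illustrative only)
p2, n2 = 0.55, 100
reps = 200_000

# Two INDEPENDENT samples per repetition
p_hat1 = rng.binomial(n1, p1, size=reps) / n1
p_hat2 = rng.binomial(n2, p2, size=reps) / n2

simulated_var = np.var(p_hat1 - p_hat2)
formula_var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2   # Var(p̂1) + Var(p̂2)
print(f"Simulated Var(p̂1 - p̂2): {simulated_var:.6f}")
print(f"Formula   Var(p̂1 - p̂2): {formula_var:.6f}")
```

The two numbers should agree closely; if the samples were dependent, the simple sum would no longer be guaranteed to hold.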

#### Conditions for a Valid Confidence Interval
The conditions must hold for **both** samples independently.
1.  **Random:** Both samples must be selected randomly or come from a randomized experiment.
2.  **Independent (10% Rule):** Both sample sizes (`n₁` and `n₂`) must be no more than 10% of their respective population sizes (`N₁` and `N₂`).
3.  **Large Counts (Normality):** The number of successes and failures must be at least 10 in **both** samples.
    *   `n₁p̂₁ ≥ 10`, `n₁(1-p̂₁) ≥ 10`,  **and**
    *   `n₂p̂₂ ≥ 10`, `n₂(1-p̂₂) ≥ 10`
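These checks can be sketched for the interval version of the procedure. A minimal illustration, assuming a hypothetical helper `check_two_sample_conditions`; since `n₁p̂₁` is just the observed success count `x₁` (and `n₁(1-p̂₁)` is `n₁ - x₁`), working with integer counts avoids floating-point round-off. The Random condition still has to be judged from the study design.

```python
def check_two_sample_conditions(x1, n1, x2, n2, pop1=None, pop2=None):
    """Hypothetical helper: Independent and Large Counts checks for a two-proportion CI.

    pop1/pop2 = None means that population is known to be at least 10 times its sample.
    """
    # 10% rule, applied to each sample and its own population
    independent = (pop1 is None or n1 <= 0.10 * pop1) and (pop2 is None or n2 <= 0.10 * pop2)
    # For a confidence interval, successes and failures are the observed counts
    counts = [x1, n1 - x1, x2, n2 - x2]
    large_counts = all(c >= 10 for c in counts)
    return {"independent_10pct": independent, "large_counts": large_counts, "counts": counts}

# Teen/adult smartphone example: 60 of 80 teens, 55 of 100 adults
print(check_two_sample_conditions(60, 80, 55, 100))
```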

#### Example: Constructing and Interpreting a Confidence Interval
**Scenario:** A tech company wants to know if there's a difference in smartphone preference between teenagers and adults. They survey a random sample of 80 teenagers and find that 60 prefer Brand A. They survey a random sample of 100 adults and find that 55 prefer Brand A. Construct a 95% confidence interval for the difference.

1.  **STATE:** We want to estimate `p₁ - p₂` at a 95% confidence level, where `p₁` is the true proportion of all teenagers who prefer Brand A and `p₂` is the true proportion of all adults who prefer Brand A.

2.  **PLAN:** We will construct a two-sample z-interval for a difference in proportions.
    *   **Check Conditions:**
        *   `p̂₁` (teens) = 60/80 = 0.75
        *   `p̂₂` (adults) = 55/100 = 0.55
        *   **Random:** Stated in the problem.
        *   **Independent:** The 80 teenagers are less than 10% of all teenagers, and the 100 adults are less than 10% of all adults.
        *   **Large Counts:**
            *   Teens: `80*0.75=60`, `80*0.25=20`. Both ≥ 10.
            *   Adults: `100*0.55=55`, `100*0.45=45`. Both ≥ 10.
        *   Conditions are met.

3.  **DO:**
    *   `p̂₁ - p̂₂ = 0.75 - 0.55 = 0.20`
    *   `z*` for 95% confidence is `1.96`.
    *   `SE = √[ (0.75*0.25/80) + (0.55*0.45/100) ] = √[ 0.00234 + 0.002475 ] = √0.004815 ≈ 0.0694`
    *   **Interval:** `0.20 ± 1.96 * 0.0694 = 0.20 ± 0.136`
    *   Interval = (0.064, 0.336)

4.  **CONCLUDE:** We are 95% confident that the interval from 0.064 to 0.336 captures the true difference `p₁ - p₂` (teenagers minus adults) in the proportions who prefer Brand A. Because the entire interval lies above 0, the data suggest that the proportion who prefer Brand A is higher among teenagers than among adults.
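"95% confident" describes the method: if the sampling were repeated many times, about 95% of the intervals built this way would capture `p₁ - p₂`. A simulation sketch, assuming illustrative true proportions of 0.75 (teens) and 0.55 (adults) with the example's sample sizes; in a real study the true values are unknown.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=4)   # arbitrary seed, for reproducibility
p1, n1, p2, n2 = 0.75, 80, 0.55, 100  # assumed true proportions (illustrative only)
true_diff = p1 - p2
reps = 20_000
z_star = norm.ppf(0.975)              # 95% confidence

# Build one interval per simulated pair of samples, exactly as in the DO step
p_hat1 = rng.binomial(n1, p1, size=reps) / n1
p_hat2 = rng.binomial(n2, p2, size=reps) / n2
se = np.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
lower = (p_hat1 - p_hat2) - z_star * se
upper = (p_hat1 - p_hat2) + z_star * se

coverage = np.mean((lower <= true_diff) & (true_diff <= upper))
print(f"Fraction of 95% intervals that captured p1 - p2: {coverage:.3f}")
```

Because the interval relies on a Normal approximation, the observed coverage may sit slightly below or above 95%.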

***

### Hypothesis Test for a Difference in Proportions

The goal here is to **assess evidence for a claim** about the difference `p₁ - p₂`.

#### Constructing Hypotheses
*   **Null Hypothesis (H₀):** The null hypothesis always states that there is **no difference** between the two groups.
    *   **H₀: p₁ - p₂ = 0**  or simply  **H₀: p₁ = p₂**
*   **Alternative Hypothesis (Hₐ):** This is the claim we want to find evidence for.
    *   **Hₐ: p₁ - p₂ > 0** (Proportion for group 1 is larger)
    *   **Hₐ: p₁ - p₂ < 0** (Proportion for group 1 is smaller)
    *   **Hₐ: p₁ - p₂ ≠ 0** (The proportions are simply different)

#### The Critical Difference: Pooled (Combined) Proportions
When we perform a significance test, our foundational assumption is that **the null hypothesis is true**. If `p₁ = p₂`, then we shouldn't use two separate estimates (`p̂₁` and `p̂₂`) for what we assume is the *same* underlying proportion.

Instead, we **pool** the data to get one, better estimate called the **combined sample proportion (`p̂_c`)**.
**`p̂_c = (Total successes in both samples) / (Total individuals in both samples) = (x₁ + x₂) / (n₁ + n₂)`**

We use this `p̂_c` for two things:
1.  **Checking the Large Counts condition for tests:** `n₁p̂_c ≥ 10`, `n₁(1-p̂_c) ≥ 10`, etc.
2.  **Calculating a pooled Standard Error for the test statistic.**
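Both uses of `p̂_c` fit in a few lines. A sketch, assuming a hypothetical helper `pooled_two_prop` and the smartphone counts used below (60 of 80 teens, 55 of 100 adults). Factoring `p̂_c(1-p̂_c)` out of the two terms in the pooled SE formula gives the equivalent form `p̂_c(1-p̂_c)(1/n₁ + 1/n₂)` used here.

```python
import math

def pooled_two_prop(x1, n1, x2, n2):
    """Hypothetical helper: combined proportion, Large Counts values, and pooled SE."""
    p_c = (x1 + x2) / (n1 + n2)          # one shared estimate, since H0 says p1 = p2
    # Expected successes/failures in each sample under H0
    counts = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
    se_pooled = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return p_c, counts, se_pooled

p_c, counts, se_pooled = pooled_two_prop(60, 80, 55, 100)
print(f"p̂_c = {p_c:.4f}")
print("Large Counts values:", [round(c, 1) for c in counts])
print(f"Pooled SE = {se_pooled:.4f}")
```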

#### The Full Test Procedure
Let's use the same smartphone data to test if the proportions are different.

1.  **STATE:**
    *   **H₀: p₁ - p₂ = 0**
    *   **Hₐ: p₁ - p₂ ≠ 0** (We want to know if there's *any* difference).
    *   `α = 0.05`.

2.  **PLAN:**
    *   **Test:** Two-sample z-test for a difference in proportions.
    *   **Conditions:** Random and Independent are the same. For Large Counts, we use `p̂_c`.
        *   `p̂_c = (60 + 55) / (80 + 100) = 115 / 180 ≈ 0.639`
        *   `n₁*p̂_c = 80*0.639=51.1`, `n₁*(1-p̂_c) = 80*0.361=28.9`. Both ≥ 10.
        *   `n₂*p̂_c = 100*0.639=63.9`, `n₂*(1-p̂_c) = 100*0.361=36.1`. Both ≥ 10.
        *   Conditions are met.

3.  **DO:**
    *   **Standard Error (Pooled):** `SE_pooled = √[ (0.639*0.361/80) + (0.639*0.361/100) ] = √[ 0.00288 + 0.00231 ] ≈ 0.072`
    *   **Test Statistic (z):** `z = (Point Estimate - Null Value) / SE = ( (p̂₁ - p̂₂) - 0 ) / SE_pooled`
        *   `z = (0.75 - 0.55) / 0.072 = 0.20 / 0.072 ≈ 2.78`
    *   **P-value:** Since Hₐ is `≠`, this is a two-sided test. We find the area in the tail and double it.
        *   `P(Z > 2.78) ≈ 0.0027`.
        *   `P-value = 2 * 0.0027 = 0.0054`. (Software that keeps the unrounded z ≈ 2.7759 reports ≈ 0.0055; the difference is only rounding.)

4.  **CONCLUDE:**
    *   **Comparing P-value to `α`:** Our P-value (0.0054) is less than our significance level `α` (0.05).
    *   **Decision:** We **reject the null hypothesis (H₀)**.
    *   **Interpretation:** There is convincing statistical evidence that the true proportions of teenagers and adults who prefer Brand A smartphones are different.

***

### Using a Confidence Interval to Make a Conclusion for a Test

This connects the two procedures. A confidence interval provides a range of plausible values for the true difference.

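*   **The Duality Principle:** A two-sided hypothesis test at significance level `α` and a confidence interval at confidence level `C = 1 - α` usually lead to the same conclusion. For proportions this agreement is not guaranteed: the test uses the pooled standard error (built from `p̂_c`) while the interval uses the unpooled one, so the two can disagree when the P-value is very close to `α`.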
*   **The Rule:**
    *   If the null value (in this case, 0) is **INSIDE** your confidence interval, it is a plausible value. Therefore, you **fail to reject H₀**.
    *   If the null value (0) is **OUTSIDE** your confidence interval, it is not a plausible value. Therefore, you **reject H₀**.
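The rule can be checked side by side with the test. A sketch, assuming a hypothetical helper `decisions` that returns both conclusions for two-sided procedures at the same α: the test uses the pooled SE and the interval uses the unpooled SE, as in the worked examples. Scanning other possible teen counts (with the adult sample held at 55 of 100) shows whether, and where, the two rules part ways for these sample sizes.

```python
import math
from scipy.stats import norm

def decisions(x1, n1, x2, n2, alpha=0.05):
    """Hypothetical helper: (reject by two-sided test, reject by (1 - alpha) CI)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Significance test: pooled SE, because the test assumes H0: p1 = p2
    p_c = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    p_value = 2 * norm.sf(abs(diff) / se_pooled)
    reject_by_test = bool(p_value < alpha)
    # Confidence interval: unpooled SE, confidence level C = 1 - alpha
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = norm.ppf(1 - alpha / 2)
    lower, upper = diff - z_star * se_unpooled, diff + z_star * se_unpooled
    reject_by_ci = not (lower <= 0 <= upper)   # reject when the null value 0 is outside
    return reject_by_test, reject_by_ci

n1, x2, n2 = 80, 55, 100
print("Smartphone data (x1 = 60):", decisions(60, n1, x2, n2))
# Vary the teen count, keeping at least 10 successes and 10 failures among the 80 teens
mismatches = [x1 for x1 in range(10, 71) if len(set(decisions(x1, n1, x2, n2))) > 1]
print("Teen counts where the two rules disagree:", mismatches)
```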

**Applying to our example:**
*   Our 95% confidence interval was **(0.064, 0.336)**.
*   Does this interval contain 0? **No.**
*   Therefore, based on the confidence interval, we would **reject H₀**. This matches the conclusion from our significance test.

**Note:** The confidence intervals in this section are two-sided, so they pair most naturally with a two-sided test. An interval also shows a range of plausible effect sizes, which a P-value alone does not.

***

### Python Code Illustration
This code will perform both the confidence interval calculation and the significance test on our smartphone data.


```python
import numpy as np
from scipy.stats import norm

# --- Setup from the Smartphone Example ---
# Group 1: Teenagers
n1 = 80
x1 = 60
p_hat1 = x1 / n1

# Group 2: Adults
n2 = 100
x2 = 55
p_hat2 = x2 / n2

# --- Part 1: 95% Confidence Interval Calculation ---
print("--- 1. Confidence Interval for p1 - p2 ---")
# Point estimate
diff_p_hat = p_hat1 - p_hat2

# Standard Error for a CI (un-pooled)
se_ci = np.sqrt( (p_hat1 * (1 - p_hat1) / n1) + (p_hat2 * (1 - p_hat2) / n2) )

# Critical value z* for 95% confidence
z_star = norm.ppf(0.975) # 0.975 because 2.5% is in each tail

# Margin of Error and Interval
margin_of_error = z_star * se_ci
lower_bound = diff_p_hat - margin_of_error
upper_bound = diff_p_hat + margin_of_error

print(f"Difference in sample proportions (p̂1 - p̂2): {diff_p_hat:.4f}")
print(f"Standard Error for CI: {se_ci:.4f}")
print(f"95% Confidence Interval: ({lower_bound:.4f}, {upper_bound:.4f})\n")


# --- Part 2: Hypothesis Test ---
print("--- 2. Two-Sample z-test for p1 - p2 ---")
# H₀: p1 - p2 = 0
# Hₐ: p1 - p2 ≠ 0

# Calculate the combined (pooled) proportion
p_hat_combined = (x1 + x2) / (n1 + n2)

# Calculate the Standard Error for the test (pooled)
se_test = np.sqrt( (p_hat_combined * (1 - p_hat_combined) / n1) + (p_hat_combined * (1 - p_hat_combined) / n2) )

# Calculate the z-statistic
z_stat = (diff_p_hat - 0) / se_test

# Calculate the P-value for a two-sided test
p_value = 2 * (1 - norm.cdf(abs(z_stat)))

print(f"Combined Proportion (p̂_c): {p_hat_combined:.4f}")
print(f"Standard Error for Test: {se_test:.4f}")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}\n")

# --- Part 3: Drawing Conclusions ---
print("--- 3. Drawing Conclusions ---")
alpha = 0.05
# Conclusion from the P-value
print("Conclusion from Significance Test:")
if p_value < alpha:
    print(f"P-value ({p_value:.4f}) < α ({alpha}). We reject H₀.")
else:
    print(f"P-value ({p_value:.4f}) ≥ α ({alpha}). We fail to reject H₀.")

# Conclusion from the CI
print("\nConclusion from Confidence Interval:")
if lower_bound > 0 or upper_bound < 0:
    print(f"The null value (0) is NOT in the interval ({lower_bound:.4f}, {upper_bound:.4f}). We reject H₀.")
else:
    print(f"The null value (0) IS in the interval ({lower_bound:.4f}, {upper_bound:.4f}). We fail to reject H₀.")
```


```text
--- 1. Confidence Interval for p1 - p2 ---
Difference in sample proportions (p̂1 - p̂2): 0.2000
Standard Error for CI: 0.0694
95% Confidence Interval: (0.0639, 0.3361)

--- 2. Two-Sample z-test for p1 - p2 ---
Combined Proportion (p̂_c): 0.6389
Standard Error for Test: 0.0720
Z-statistic: 2.7759
P-value: 0.0055

--- 3. Drawing Conclusions ---
Conclusion from Significance Test:
P-value (0.0055) < α (0.05). We reject H₀.

Conclusion from Confidence Interval:
The null value (0) is NOT in the interval (0.0639, 0.3361). We reject H₀.
```
