#### Types of Hypothesis Testing

**Concepts covered**
- Chi-square test (for categorical data)
- ANOVA (for comparing more than two means)
- Two-Tailed Tests
- Proportion Tests

**Chi-square test (for categorical data)**

The **Chi-square test** is a statistical method used to determine whether there's a significant association between two categorical variables. It compares the observed frequencies in each category to the expected frequencies if there were no relationship.

- **Types of Chi-Square Tests**
    - **Chi-square Test of Independence**
        - **Purpose:** Tests whether two categorical variables are independent.
        - **Example:** Is gender related to voting preference?
    - **Chi-square Goodness-of-Fit Test**
        - **Purpose:** Tests whether a single categorical variable fits a specific distribution.
        - **Example:** Do dice rolls follow a uniform distribution?
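
As a minimal sketch of the goodness-of-fit case, the snippet below computes the Chi-square statistic for a die. The roll counts are hypothetical values made up for illustration, and the critical value 11.07 is the standard table entry for 5 degrees of freedom at $\alpha = 0.05$.

```python
# Hypothetical counts for faces 1..6 from 60 rolls (illustrative, not real data).
observed = [8, 12, 9, 11, 10, 10]

n_rolls = sum(observed)
# Under H0 (a fair die) every face is expected equally often.
expected = [n_rolls / len(observed)] * len(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1   # number of categories - 1
critical_value = 11.07   # chi-square table value for df = 5, alpha = 0.05

print(f"chi2 = {chi2_stat:.2f}, df = {df}")
print("reject H0" if chi2_stat > critical_value else "fail to reject H0")
```

With these made-up counts the statistic is well below the critical value, so there is no evidence against a uniform distribution.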

**How it works**
1. **Set up the hypotheses:**
    - **Null hypothesis ($H_0$):** There is no association between the variables (or the data fit the expected distribution).
    - **Alternative hypothesis ($H_1$):** There is an association (or the data do not fit the expected distribution).
2. **Create a contingency table (for the independence test):**
    - Rows = categories of one variable
    - Columns = categories of the other
3. **Calculate the expected frequencies:**
    - For each cell:

    $$E = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}}$$

4. **Compute the Chi-square statistic:**

    $$ \chi^2 = \sum\frac{(O - E)^2}{E} $$

    - Where:
        - $O$ = observed frequency
        - $E$ = expected frequency

5. **Compare with the critical value:**
    - Use the degrees of freedom and the significance level (usually 0.05) to determine whether the result is statistically significant.

**Note:** Once you have the **Chi-square statistic** and the **degrees of freedom**, you can:
- Look up the **critical value** in a chi-square table.
- Then decide whether to **reject the null hypothesis**: reject $H_0$ if the statistic exceeds the critical value.

**In summary:** The Chi-square test helps answer questions like:
- Is there a relationship between two categorical variables?
- Are the observed outcomes significantly different from what we'd expect by chance?

It's based on the idea that if two variables are independent, the distribution of one shouldn't depend on the other. The test checks whether the difference between the observed and expected frequencies is large enough to reject that assumption.

**Example:** Voting Preference by Age Group

Suppose we want to know whether **voting preference** is related to **age group**. We survey 300 people and record the following:

| **Age Group** | **Voted Party A** | **Voted Party B** | **Total** |
|---------------|-------------------|-------------------|-----------|
| 18 - 35       | 90                | 60                | 150       |
| 36 - 60       | 30                | 60                | 90        |
| 60+       | 10                | 50                | 60        |
| **Total**       | **130**               | **170**               | **300**       |

**Step 1: Hypotheses**
- $H_0$ (null): Voting preference is independent of age group.
- $H_1$ (alternative): Voting preference depends on age group.

**Step 2: Calculate Expected Frequencies**
$$E = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}}$$

Expected frequency for the 18 - 35 group voting Party A:
$$ E = \frac{150\times 130}{300} = \frac{19500}{300} = 65 $$

Expected frequency for the 18 - 35 group voting Party B:
$$ E = \frac{150\times 170}{300} = \frac{25500}{300} = 85 $$

Expected frequency for the 36 - 60 group voting Party A:
$$ E = \frac{90\times 130}{300} = \frac{11700}{300} = 39 $$

Expected frequency for the 36 - 60 group voting Party B:
$$ E = \frac{90\times 170}{300} = \frac{15300}{300} = 51 $$

Expected frequency for the 60+ group voting Party A:
$$ E = \frac{60\times 130}{300} = \frac{7800}{300} = 26 $$

Expected frequency for the 60+ group voting Party B:
$$ E = \frac{60\times 170}{300} = \frac{10200}{300} = 34 $$

| **Age Group** | **Voted Party A (Expected)** | **Voted Party B (Expected)**|
|---------------|------------------------------|-----------------------------|
| $18 - 35$     | $65$                         | $85$                        |
| $36 - 60$     | $39$                         | $51$                        |
| $60+$         | $26$                         | $34$                        |
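
The expected-frequency table can be reproduced from the observed counts alone. The sketch below uses the survey table from this example; the variable names are just illustrative choices.

```python
# Observed counts from the survey table.
# Rows are age groups; columns are [Party A, Party B].
observed = {
    "18 - 35": [90, 60],
    "36 - 60": [30, 60],
    "60+": [10, 50],
}

row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# E = (row total * column total) / grand total, for every cell.
expected = {
    group: [row_totals[group] * c / grand_total for c in col_totals]
    for group in observed
}

for group, values in expected.items():
    print(group, values)
```

Each printed row should match the corresponding row of the expected-frequency table (65 and 85, 39 and 51, 26 and 34).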

**Step 3: Compute Chi-Square Statistic**
Using:
$$ \chi^2 = \sum\frac{(O - E)^2}{E} $$

Computing for each cell:
- 18–35, Party A: $\frac{(90 - 65)^2}{65} = \frac{625}{65} \approx 9.62$
- 18–35, Party B: $\frac{(60 - 85)^2}{85} = \frac{625}{85} \approx 7.35$
- 36–60, Party A: $\frac{(30 - 39)^2}{39} = \frac{81}{39} \approx 2.08$
- 36–60, Party B: $\frac{(60 - 51)^2}{51} = \frac{81}{51} \approx 1.59$
- 60+, Party A: $\frac{(10 - 26)^2}{26} = \frac{256}{26} \approx 9.85$
- 60+, Party B: $\frac{(50 - 34)^2}{34} = \frac{256}{34} \approx 7.53$

**Total Chi-square statistic:**
$$ \chi^2 \approx 9.62 + 7.35 + 2.08 + 1.59 + 9.85 + 7.53 = 38.02 $$
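
The same sum can be checked in a few lines of Python, using the observed counts and the expected frequencies from Step 2. Note that summing the unrounded terms gives about 38.01; the 38.02 above comes from adding terms already rounded to two decimals, and the difference does not affect the conclusion.

```python
# Observed and expected counts, rows in the order 18-35, 36-60, 60+.
observed = [[90, 60], [30, 60], [10, 50]]
expected = [[65, 85], [39, 51], [26, 34]]

# One (O - E)^2 / E term per cell of the table.
contributions = [
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
]
chi2_stat = sum(contributions)

print([round(c, 2) for c in contributions])
print(round(chi2_stat, 2))
```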

**Step 4: Degrees of Freedom**
- Number of rows = 3 (age groups)
- Number of columns = 2 (party choices)
$$ df = (\text{rows} - 1) \times (\text{columns} - 1) = (3 - 1)(2 - 1) = 2 $$

**Note:** Why 1 is subtracted
- Once the **row totals** are fixed, only $\text{rows} - 1$ of the row categories can vary independently; the last one is determined automatically.
- The same goes for the columns.

This ensures that the test accounts for the structure of the data without overestimating variability.

**Step 5: Comparing with Critical Value**
At $df = 2$ and significance level $\alpha = 0.05$, the critical value is **5.99.**

Since $\chi^2 = 38.02 > 5.99$, we **reject the null hypothesis.**
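
The decision can be cross-checked without a table in this particular case. The sketch below relies on a closed form that holds only for $df = 2$, where the Chi-square survival function is $P(X > x) = e^{-x/2}$; for other degrees of freedom a table or a statistics library would be needed.

```python
import math

chi2_stat = 38.02            # statistic from Step 3
alpha = 0.05
df = (3 - 1) * (2 - 1)       # (rows - 1) * (columns - 1)

# The closed form below is valid for df = 2 only.
assert df == 2
critical_value = -2 * math.log(alpha)   # solves exp(-x / 2) = alpha
p_value = math.exp(-chi2_stat / 2)      # P(X > chi2_stat) under H0

print(f"critical value = {critical_value:.2f}")
print(f"p-value = {p_value:.2e}")
print("reject H0" if chi2_stat > critical_value else "fail to reject H0")
```

The computed critical value rounds to 5.99, matching the table lookup, and the p-value is far below 0.05.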

**Conclusion**
There is a statistically significant relationship between age group and voting preference. In this sample, voting preference appears to vary with age group.







**ANOVA (for comparing more than two means)**

**ANOVA (Analysis of Variance)** is a powerful statistical method used when you're comparing **three or more group means** to see whether at least one is significantly different from the others.

**ANOVA** tests the hypotheses:
- **Null Hypothesis ($H_0$):** All group means are equal
- **Alternative Hypothesis ($H_1$):** At least one group mean is different

It helps answer: "Are the differences between group averages due to real effects or just random chance?"

**When to Use ANOVA**
ANOVA can be used when:
- You have **one independent variable** with **three or more levels/groups**
- Your dependent variable is **continuous**
- Data is approximately **normally distributed**
- Groups have **similar variances** (homogeneity of variance)

**How ANOVA Works**
ANOVA compares **two types of variability:**

| **Type of Variability** | **Description** |
|-------------------------|-----------------|
| **Between-group**       | Differences among group means |
| **Within-group**        | Variability within each group |

It calculates an **F-statistic:**

$$ F = \frac{\text{Between-group variance}}{\text{Within-group variance}} $$
- A **large F-value** suggests group means differ more than you'd expect by chance.
- A **small F-value** suggests differences are likely due to random variation.

**Example:** Testing fertilizers (A, B, C) on plant growth:
- Group A: Fertilizer A
- Group B: Fertilizer B
- Group C: Fertilizer C

Plant height is measured after 30 days. ANOVA then tells us whether the average height differs significantly across the three fertilizers.
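
To make the F-statistic concrete, here is a minimal one-way ANOVA sketch for the fertilizer setup. The plant heights are hypothetical numbers invented for illustration, and the between-group and within-group variances are computed as mean squares (sum of squares divided by degrees of freedom).

```python
from statistics import mean

# Hypothetical plant heights in cm after 30 days (illustrative values only).
groups = {
    "A": [20, 21, 19, 22, 18],
    "B": [24, 25, 23, 26, 22],
    "C": [21, 22, 20, 23, 19],
}

all_values = [x for heights in groups.values() for x in heights]
grand_mean = mean(all_values)

# Between-group sum of squares: distance of each group mean from the grand mean.
ss_between = sum(len(h) * (mean(h) - grand_mean) ** 2 for h in groups.values())
# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum((x - mean(h)) ** 2 for h in groups.values() for x in h)

df_between = len(groups) - 1                 # number of groups - 1
df_within = len(all_values) - len(groups)    # observations - number of groups

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

With 2 and 12 degrees of freedom, the tabulated critical value at $\alpha = 0.05$ is about 3.89, so these made-up data would lead to rejecting $H_0$.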

**What Happens After ANOVA?**

If ANOVA shows a significant result (p-value < 0.05), you **reject the null hypothesis**. But it doesn't tell you which groups differ. For that, use **post hoc tests** like:
- **Tukey's HSD**
- **Bonferroni correction**
- **Scheffe test**
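
As a rough illustration of the Bonferroni idea, the sketch below divides the overall significance level by the number of pairwise comparisons. The pairwise p-values are hypothetical placeholders, not results computed from real data.

```python
from itertools import combinations

groups = ["A", "B", "C"]
alpha = 0.05

pairs = list(combinations(groups, 2))
# Bonferroni: divide the overall alpha by the number of comparisons.
alpha_per_test = alpha / len(pairs)

# Hypothetical pairwise p-values (placeholders for illustration).
pairwise_p = {("A", "B"): 0.004, ("A", "C"): 0.300, ("B", "C"): 0.012}

for pair in pairs:
    verdict = "differs" if pairwise_p[pair] < alpha_per_test else "no clear difference"
    print(pair, verdict)
```

With three groups there are three comparisons, so each one is tested at roughly 0.0167 instead of 0.05.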

**Variants of ANOVA**

| **Type** | **Use Case** |
|----------|--------------|
| **One-way ANOVA** | One independent variable (e.g., fertilizer type) |
| **Two-way ANOVA** | Two independent variables (e.g., fertilizer and sunlight) |
| **Repeated Measures ANOVA** | Same subjects tested under different conditions |

**Two-Tailed Tests**

A **two-tailed test** is a type of hypothesis test used in statistics when you're checking for an effect in **either direction**: not specifically higher or lower, just **different**.

**What it Tests**

It asks: is the sample mean significantly different from the population mean, whether higher or lower?

**Example**

Suppose a company claims its battery lasts 10 hours. You test a sample and check whether the actual average is **different** from 10 hours; a difference in either direction (longer or shorter) counts.
- **Null Hypothesis ($H_0$):** $\mu = 10$ (no difference)
- **Alternative Hypothesis ($H_1$):** $\mu \ne 10$ (there is a difference)

**How it works**
- Calculate a test statistic (like a z-score or t-score).
- Compare it to the **critical values** on both ends of the distribution.
- If the test statistic falls in **either tail** (extreme low or high), reject the null hypothesis.

**Significance Level**

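If the significance level is $\alpha = 0.05$, then:
- Each tail gets 0.025.
- Reject $H_0$ if the test statistic falls below the 2.5th percentile or above the 97.5th percentile of its distribution under $H_0$ (for a z-test, $|z| > 1.96$). Equivalently, reject $H_0$ if the two-sided p-value is **less than 0.05**.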
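
The battery example can be sketched as a two-tailed z-test. The claimed mean of 10 hours comes from the example; the sample mean, sample size, and known population standard deviation are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 10.0          # claimed battery life in hours (from the example)
sample_mean = 9.7    # hypothetical sample mean
sigma = 0.9          # hypothetical known population standard deviation
n = 36               # hypothetical sample size
alpha = 0.05

# z-score: how many standard errors the sample mean is from the claim.
z = (sample_mean - mu_0) / (sigma / sqrt(n))

std_normal = NormalDist()
z_critical = std_normal.inv_cdf(1 - alpha / 2)   # alpha / 2 in each tail
p_value = 2 * (1 - std_normal.cdf(abs(z)))       # probability in both tails

print(f"z = {z:.2f}, critical = +/-{z_critical:.2f}, p = {p_value:.4f}")
print("reject H0" if abs(z) > z_critical else "fail to reject H0")
```

Comparing $|z|$ with the critical value and comparing the two-sided p-value with $\alpha$ always give the same decision.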

**When to Use it**
- When you are testing for **any change**, not just an increase or a decrease.
- When you **don't have a directional prediction.**

**Proportion Tests**

A **proportion test** is a type of hypothesis test used to determine whether the proportion of successes in a sample differs significantly from a known or hypothesized population proportion.

It's commonly used when dealing with **categorical data** like yes/no, success/failure, male/female, where you're interested in the **percentage** or **rate** of a particular outcome.

**When to use it**
- You have a binary outcome (e.g., pass/fail).
- You are comparing a sample proportion to a known value (one-sample test).
- You are comparing proportions between two groups (two-sample test).

**Types of Proportion Tests**
1. **One-Proportion Z-Test:** Tests whether a sample proportion is significantly different from a known population proportion.

**Example**: You believe 60% of Lagos residents support a new policy. You survey 100 people and find that 70 support it. Is this significantly different?

2. **Two-Proportion Z-Test:** Compares proportions between two independent groups.

**Example**: You want to compare the proportion of voters supporting a candidate in Lagos vs Abuja.

**Hypotheses Setup**

For a one-proportion test:
- **Null Hypothesis ($H_0$):** $p = p_0$ (no difference)
- **Alternative Hypothesis ($H_1$):** $p \neq p_0$, $p > p_0$, or $p < p_0$, depending on the test direction

For a two-proportion test:
- $H_0$: $p_1 = p_2$
- $H_1$: $p_1 \neq p_2$, $p_1 > p_2$, or $p_1 < p_2$

**Test Statistic Formula**

For a one-proportion Z-test:

$$ Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} $$

Where:
- $\hat{p}$ = sample proportion
- $p_0$ = hypothesized proportion
- $n$ = sample size
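
Applying the formula to the Lagos policy example (hypothesized proportion 0.60, 70 supporters out of 100) might look like the sketch below. The two-sided p-value is an assumption about the test direction, since the example only asks whether the proportion is "different".

```python
from math import sqrt
from statistics import NormalDist

p_0 = 0.60        # hypothesized proportion
n = 100           # sample size
successes = 70    # respondents who support the policy

p_hat = successes / n
# One-proportion z statistic: the standard error uses p_0, not p_hat.
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, p = {p_value:.3f}")
```

Here $z$ is slightly above 1.96, so at $\alpha = 0.05$ the sample proportion is significantly different from 60%.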

**Assumptions**
- Random sampling
- Binary outcomes
- Sample size is large enough for the normal approximation (typically $np_0 \geq 5$ and $n(1 - p_0) \geq 5$)
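
For the two-proportion comparison (Lagos vs Abuja), a common choice is the pooled z-statistic, $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p}$ is the combined proportion across both samples. This formula is not given in the section above, so treat the sketch as one standard option rather than the only one. The survey counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical survey counts (illustrative only).
lagos_yes, lagos_n = 120, 200
abuja_yes, abuja_n = 90, 200

p_1 = lagos_yes / lagos_n
p_2 = abuja_yes / abuja_n
# Under H0 (p_1 = p_2) both samples estimate one common proportion.
p_pooled = (lagos_yes + abuja_yes) / (lagos_n + abuja_n)

# Rough large-sample check, mirroring the "at least 5" rule of thumb.
large_enough = all(
    count >= 5
    for n in (lagos_n, abuja_n)
    for count in (n * p_pooled, n * (1 - p_pooled))
)

standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / lagos_n + 1 / abuja_n))
z = (p_1 - p_2) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided

print(f"z = {z:.2f}, p = {p_value:.4f}, large enough: {large_enough}")
```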

<!-- **Example**

Let's say you want to test whether more than 50% of students at a university prefer online classes. You survey 200 students, and 120 say yes.
- $\hat{p} = 120/200 = 0.6$
- p_0 = 0.5
- Use the formula to calculate Z
- Compare Z to critical or use p-value to decide -->