## CLASS TASK 8

**Types of Hypothesis Testing**

- Chi-Square Test (for categorical data)
- ANOVA (for comparing the means of three or more groups)
- Two-Tailed Tests
- Proportion Tests



# Chi-Square Test

## What is the Chi-Square Test?

The **Chi-Square test** is a statistical method used to determine whether there is a **significant difference** between the **observed frequencies** and the **expected frequencies** in categorical data.
It helps determine whether those differences are likely due to **chance** alone or reflect a real **relationship** between variables.


## Formula

The Chi-Square statistic is calculated as:

$$
\chi^2 = \sum \frac{(O - E)^2}{E}
$$

Where:

* $O$ = Observed frequency
* $E$ = Expected frequency

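### Degrees of Freedom

For a test of independence or homogeneity on an $r \times c$ contingency table:

$$
df = (r - 1) \times (c - 1)
$$

Where:

* $r$ = number of rows
* $c$ = number of columns

For a goodness-of-fit test with $k$ categories, $df = k - 1$.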



## Types of Chi-Square Tests

1. **Goodness-of-Fit Test**

   * Checks if the sample data fits a theoretical distribution.
   * Example: Testing if a die is fair.

2. **Test of Independence**

   * Tests whether two categorical variables are related.
   * Example: Checking if movie genre preference is related to snack purchase at a cinema.

3. **Test of Homogeneity**

   * Compares distributions of a categorical variable across different populations.
   * Example: Comparing food preferences across three cities.
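As an illustration of the goodness-of-fit test, here is a minimal Python sketch that checks whether a six-sided die is fair. The observed counts are hypothetical, and 11.070 is the standard chi-square table critical value for $\alpha = 0.05$ with $df = 5$ (for goodness-of-fit, $df = k - 1$ with $k$ categories). A library such as SciPy (`scipy.stats.chisquare`) could report a p-value instead.

```python
# Chi-square goodness-of-fit sketch: is a six-sided die fair?
# The observed counts below are hypothetical, for illustration only.

def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [22, 17, 20, 26, 15, 20]  # hypothetical counts for faces 1..6 (120 rolls)
n = sum(observed)
expected = [n / 6] * 6               # fair die: each face expected n / 6 times

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1               # goodness-of-fit: df = k - 1
CRITICAL_05_DF5 = 11.070             # chi-square table value, alpha = 0.05, df = 5

print(f"chi2 = {chi2:.2f}, df = {df}")
if chi2 >= CRITICAL_05_DF5:
    print("Reject H0: the die does not appear fair")
else:
    print("Fail to reject H0: no evidence the die is unfair")
```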



## Steps to Perform a Chi-Square Test

1. **State the hypotheses**

   * $H_0$: No association between the variables.
   * $H_1$: There is an association.

2. **Calculate the Expected Frequencies**

$$
E = \frac{(\text{Row Total} \times \text{Column Total})}{\text{Grand Total}}
$$

3. **Compute Chi-Square Statistic**

$$
\chi^2 = \sum \frac{(O - E)^2}{E}
$$

4. **Find Degrees of Freedom**

$$
df = (r - 1)(c - 1)
$$

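5. **Decision Rule**

   * Compare the $\chi^2$ statistic with the critical value from the chi-square table at the computed $df$; reject $H_0$ if $\chi^2$ is greater than or equal to the critical value.
   * Alternatively, compare the p-value with the significance level (commonly $\alpha = 0.05$):

$$
p \leq 0.05 \;\;\Rightarrow\;\; \text{Reject } H_0
$$

$$
p > 0.05 \;\;\Rightarrow\;\; \text{Fail to reject } H_0
$$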
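The five steps can be sketched in plain Python for a test of independence. The 2×3 table (snack purchase by preferred movie genre) is hypothetical, and 5.991 is the chi-square table critical value for $\alpha = 0.05$ with $df = 2$. If SciPy is installed, `scipy.stats.chi2_contingency` performs the same calculation and also returns a p-value (note that it applies Yates' continuity correction by default when $df = 1$).

```python
# Chi-square test of independence, following the steps above.
# Hypothetical table: rows = bought a snack (yes / no),
# columns = preferred genre (action / comedy / drama).

table = [
    [30, 20, 10],  # bought a snack
    [20, 30, 40],  # did not buy a snack
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Step 2: E = (row total * column total) / grand total for every cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Step 3: chi-square statistic summed over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

# Step 4: degrees of freedom
df = (len(table) - 1) * (len(table[0]) - 1)

# Step 5: compare with the table critical value for alpha = 0.05, df = 2
CRITICAL_05_DF2 = 5.991
print(f"chi2 = {chi2:.3f}, df = {df}")
print("Reject H0" if chi2 >= CRITICAL_05_DF2 else "Fail to reject H0")
```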

## Examples

* **Goodness-of-Fit**: Testing if dice rolls are equally likely.
* **Independence**: Checking if gender is related to political party preference.
* **Homogeneity**: Testing if menu preferences are the same across multiple cities.




# Analysis of Variance (ANOVA)

## What is ANOVA?

**Analysis of Variance (ANOVA)** is a statistical test used to compare the **means of three or more groups** to determine if observed differences are statistically significant or simply due to random variation.

It extends the **t-test**, which only compares two groups, by analyzing variance between and within groups simultaneously.


## Formula

The ANOVA statistic is based on the **F-ratio**:

$$
F = \frac{MS_T}{MS_E}
$$

Where:

* $F$ = ANOVA test statistic
* $MS_T$ = Mean Square for Treatments (between groups)
* $MS_E$ = Mean Square for Error (within groups)


## Types of ANOVA

1. **One-Way ANOVA**

   * Uses **one independent variable (factor)**.
   * Example: Comparing exam scores across three different schools.

2. **Two-Way ANOVA**

   * Uses **two independent variables (factors)** and tests for **interaction effects**.
   * Example: Comparing exam scores across schools **and** gender.

3. **MANOVA** (Multivariate ANOVA)

   * Extension of ANOVA for **multiple dependent variables**.



## Steps to Perform ANOVA

1. **State the hypotheses**

   * $H_0$: All group means are equal.
   * $H_1$: At least one group mean differs.

2. **Partition Variance**

   * Between-groups variance ($MS_T$)
   * Within-groups variance ($MS_E$)

3. **Compute the F-statistic**

$$
F = \frac{MS_T}{MS_E}
$$

4. **Find Degrees of Freedom**

   * Numerator: $df_1 = k - 1$
   * Denominator: $df_2 = N - k$

   Where $k$ = number of groups and $N$ = total sample size.


5. **Decision Rule**

   * Compare $F$ with the critical value from the F-table at $(df_1, df_2)$, or use the p-value.
   * If $p \leq 0.05$, reject $H_0$.
   * If $p > 0.05$, fail to reject $H_0$.
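A minimal pure-Python sketch of the one-way ANOVA steps, using hypothetical test scores for three teaching methods (five students each). The critical value 3.885 is the F-table value for $\alpha = 0.05$ with $df_1 = 2$ and $df_2 = 12$. If SciPy is available, `scipy.stats.f_oneway` returns the same F statistic along with a p-value.

```python
# One-way ANOVA sketch on hypothetical scores (illustrative data only).

groups = {
    "method_a": [70, 72, 68, 75, 65],
    "method_b": [80, 78, 82, 85, 75],
    "method_c": [75, 74, 76, 73, 77],
}

scores = [x for g in groups.values() for x in g]
k = len(groups)        # number of groups
N = len(scores)        # total sample size
grand_mean = sum(scores) / N

# Step 2: partition the variation into between-group and within-group parts
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)

# Step 4 is computed first here because the mean squares divide by the df
df1, df2 = k - 1, N - k

ms_t = ss_between / df1   # mean square between groups
ms_e = ss_within / df2    # mean square within groups

# Step 3: F-ratio
f_stat = ms_t / ms_e

# Step 5: compare with the F-table critical value for alpha = 0.05, df = (2, 12)
F_CRIT_05_2_12 = 3.885
print(f"F = {f_stat:.2f} with df = ({df1}, {df2})")
print("Reject H0" if f_stat >= F_CRIT_05_2_12 else "Fail to reject H0")
```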



## Example

Suppose a researcher wants to compare the average performance of **three teaching methods** on students’ test scores.

* **One-way ANOVA** would test if the mean scores differ across methods.
* **Two-way ANOVA** could include another factor, e.g., gender, to see if there’s an **interaction** between teaching method and gender.



# Two-Tailed Test

## What is a Two-Tailed Test?

A **two-tailed test** checks whether a sample mean is **significantly different** from a hypothesized population mean in **either direction**, greater or less.  
It evaluates both sides (tails) of the probability distribution.  

This makes it useful when **any deviation** from the hypothesized value matters, not just an increase or just a decrease.  

### Explanation  

Suppose we want to check whether a machine puts **exactly 50g of candy** in each bag. We are not only worried if it produces **less than 50g**; we also care if it produces **more than 50g**.  

So instead of checking **one side** of the distribution (just “less” or just “greater”), we check **both sides** (both tails).  

That’s why it’s called a **two-tailed test**: we are looking at the two extreme ends of the probability curve.  

## Hypotheses

For a population mean $ \mu $:

* Null hypothesis:  
  $$
  H_0 : \mu = \mu_0
  $$

* Alternative hypothesis:  
  $$
  H_1 : \mu \neq \mu_0
  $$


## Test Statistic (Z-test or t-test)

The test statistic uses the following quantities:

* Sample mean: $\bar{x}$  
* Hypothesized population mean: $\mu_0$  
* Population standard deviation: $\sigma$  
* Sample size: $n$


$$
Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
$$

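If the population standard deviation is unknown, we use the **t-test** instead, replacing $\sigma$ with the sample standard deviation $s$:

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
$$

and compare $t$ with critical values from the t-distribution with $n - 1$ degrees of freedom.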
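A minimal sketch of the t statistic $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ for the case where $\sigma$ is unknown. The bag weights are hypothetical, the helper name `t_statistic` is illustrative, and 2.365 is the two-tailed t-table critical value for $\alpha = 0.05$ with $df = 7$.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), where s is the sample standard deviation."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD (divides by n - 1)
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical bag weights in grams (illustrative data, not from the notes)
weights = [50.2, 49.8, 50.5, 50.9, 49.6, 50.4, 50.7, 50.1]
t = t_statistic(weights, mu0=50)
df = len(weights) - 1

T_CRIT_025_DF7 = 2.365  # t-table value, two-tailed, alpha = 0.05, df = 7
print(f"t = {t:.3f}, df = {df}")
print("Reject H0" if abs(t) >= T_CRIT_025_DF7 else "Fail to reject H0")
```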


## Decision Rule

At significance level $ \alpha $ (commonly 0.05):

* Reject $ H_0 $ if:  
  $$
  Z \leq -Z_{\alpha/2} \quad \text{or} \quad Z \geq Z_{\alpha/2}
  $$  

* Otherwise, fail to reject $ H_0 $.



## Example

Suppose a factory claims that the mean weight of candy bags is 50g.  
A sample of 100 bags gives:

* $ \bar{x} = 50.8 $  
* $ \sigma = 2 $  
* $ n = 100 $  

Test at $ \alpha = 0.05 $:

$$
Z = \frac{50.8 - 50}{2 / \sqrt{100}} = \frac{0.8}{0.2} = 4
$$

Since $ Z = 4 $ is greater than the critical value $ Z_{0.025} = 1.96 $,  
we **reject $ H_0 $** and conclude the mean weight is significantly different from 50g.
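A short Python sketch of the candy example using only the standard library. `statistics.NormalDist().inv_cdf` gives the critical value $Z_{\alpha/2}$, and the two-tailed p-value doubles the area in one tail beyond $|Z|$.

```python
import math
from statistics import NormalDist

# Values from the candy-bag example above
x_bar, mu0, sigma, n = 50.8, 50, 2, 100
alpha = 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}, about 1.96 for alpha = 0.05
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed: area in both tails beyond |Z|

print(f"Z = {z:.2f}, critical values = ±{z_crit:.2f}, p = {p_value:.2e}")
print("Reject H0" if abs(z) >= z_crit else "Fail to reject H0")
```

The p-value here is far below 0.05, which agrees with the critical-value decision.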


# Proportion Tests

## What is a Proportion Test?

A **proportion test** is a statistical method used to check if the percentage (proportion) of a certain outcome in a sample is **different from a claimed or expected percentage** in the population.  

It can also compare **two groups** to see if their proportions are different.  

Example:  
- A company claims **60% of customers** are satisfied.  
- We test if the actual satisfaction proportion in a survey is **really 60%** or not.  

## Types of Proportion Tests

1. **One-Sample Proportion Test**  
   Checks if a sample proportion ($ \hat{p} $) is different from a hypothesized population proportion ($ p_0 $).  

2. **Two-Sample Proportion Test**  
   Compares the proportions of **two independent samples** (e.g., male vs. female voters).  
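The notes do not give a formula for the two-sample case. The sketch below assumes the common pooled-proportion Z statistic, $Z = (\hat{p}_1 - \hat{p}_2) / \sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}$, where $\hat{p}$ is the pooled proportion of both samples. The voter counts are hypothetical and the function name is illustrative.

```python
import math
from statistics import NormalDist

def two_sample_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical survey: 120 of 200 male and 90 of 200 female voters support a candidate
z = two_sample_proportion_z(120, 200, 90, 200)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
print(f"Z = {z:.2f}, p = {p_value:.4f}")
```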

## Hypotheses (One-Sample)

For a population proportion $ p $:

* Null hypothesis (no difference):  
  $$
  H_0 : p = p_0
  $$

* Alternative hypothesis (difference exists):  
  $$
  H_1 : p \neq p_0
  $$

## Test Statistic

We use the **Z-test for proportions**.  

For a one-sample test:  

$$
Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}
$$

Where:  
- $ \hat{p} $ = sample proportion  
- $ p_0 $ = claimed (hypothesized) population proportion  
- $ n $ = sample size  

## Decision Rule

At significance level $ \alpha $ (commonly 0.05):  

* Reject $ H_0 $ if:
  $$
  Z \leq -Z_{\alpha/2} \quad \text{or} \quad Z \geq Z_{\alpha/2}
  $$
* Otherwise, **fail to reject $ H_0 $** (no evidence of a difference).  

## Example

A company claims that **60% of customers** are satisfied.  
A survey of **100 customers** finds that **54 are satisfied**.  
We want to test if the true proportion is **really 60%** at $ \alpha = 0.05 $.  


### Step 1: Sample Proportion  
$$
\hat{p} = \frac{54}{100} = 0.54
$$


### Step 2: Test Statistic  
$$
Z = \frac{0.54 - 0.60}{\sqrt{\frac{0.60 (1 - 0.60)}{100}}}
$$

$$
Z = \frac{-0.06}{\sqrt{0.0024}}
= \frac{-0.06}{0.049} \approx -1.22
$$

### Step 3: Decision  
- Critical values at $ \alpha = 0.05 $ (two-tailed) are $ \pm 1.96 $.  
- Our $ Z = -1.22 $ is **between -1.96 and 1.96**.  

Therefore, we **fail to reject $ H_0 $**.  

### Conclusion
The data does **not provide enough evidence** to say the proportion of satisfied customers is different from **60%**.  
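The worked example can be checked with Python's standard library. This sketch reproduces $Z \approx -1.22$ and also computes a two-tailed p-value with `statistics.NormalDist`; the helper name `one_sample_proportion_z` is illustrative.

```python
import math
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p0):
    """Z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Values from the customer-satisfaction example above
z = one_sample_proportion_z(successes=54, n=100, p0=0.60)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value

print(f"Z = {z:.2f}, p = {p_value:.3f}")
print("Reject H0" if abs(z) >= z_crit else "Fail to reject H0")
```

A p-value of roughly 0.22 is well above 0.05, consistent with failing to reject $H_0$.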
