<h1 style="font-size: 1.6rem; font-weight: bold">Module 6 - Topic 1: Statistics</h1>
<p style="margin-top: 5px; margin-bottom: 5px;">Monash University Australia</p>
<p style="margin-top: 5px; margin-bottom: 5px;">ITO 4001: Foundations of Computing</p>
<p style="margin-top: 5px; margin-bottom: 5px;">Jupyter Notebook by: Tristan Sim Yook Min</p>
References: Images and Diagrams from Monash Faculty of Information Technology

---

### **Z-Tests: Statistical Inference for Normal Populations with Known Variance**

A statistical hypothesis represents a claim or assumption about the parameters of a population distribution. We call it a hypothesis because its truth remains uncertain until tested. The fundamental challenge in hypothesis testing is creating a systematic method to evaluate whether observed sample data supports or contradicts our initial assumption about the population.

### **Example: Testing Population Means When Variance is Known**

Consider a random sample $X_1, X_2, \ldots, X_n$ drawn from a normal distribution with unknown mean $\mu$ but known variance $\sigma^2$. Our goal is to evaluate the null hypothesis:

$H_0: \mu = \mu_0$

against the competing alternative hypothesis:

$H_1: \mu \neq \mu_0$

where $\mu_0$ represents a specific value we want to test against.

#### Building the Test Statistic

The sample mean $\bar{X} = \frac{\sum_{i=1}^n X_i}{n}$ serves as our natural estimator for the population mean $\mu$. Intuitively, we should not reject the null hypothesis $H_0$ when $\bar{X}$ falls reasonably close to $\mu_0$, and should reject it when $\bar{X}$ lies far from $\mu_0$. This logic leads us to define a rejection (critical) region:

$C = \{(X_1, \ldots, X_n) : |\bar{X} - \mu_0| > c\}$

where $c$ represents a threshold value we need to determine.

#### Finding the Critical Threshold

To construct a test with significance level $\alpha$, we must choose $c$ such that the probability of Type I error equals $\alpha$. This means finding $c$ where:

$P_{\mu_0}\{|\bar{X} - \mu_0| > c\} = \alpha$

The notation $P_{\mu_0}$ indicates we calculate this probability assuming the null hypothesis is true (i.e., $\mu = \mu_0$).

Under the null hypothesis, $\bar{X}$ follows a normal distribution with mean $\mu_0$ and variance $\sigma^2/n$. Therefore, we can standardize using:

$Z \equiv \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

This standardized variable $Z$ follows a standard normal distribution.

#### Deriving the Decision Rule

The condition $P_{\mu_0}\{|\bar{X} - \mu_0| > c\} = \alpha$ can be rewritten as:

$P\left\{|Z| > \frac{c\sqrt{n}}{\sigma}\right\} = \alpha$

Due to the symmetry of the standard normal distribution:

$2P\left\{Z > \frac{c\sqrt{n}}{\sigma}\right\} = \alpha$

Since we know that $P\{Z > z_{\alpha/2}\} = \alpha/2$ for a standard normal variable, we can set:

$\frac{c\sqrt{n}}{\sigma} = z_{\alpha/2}$

Solving for $c$ gives us:

$c = \frac{z_{\alpha/2}\sigma}{\sqrt{n}}$

### **Two-Tailed Test Decision Framework**

With our critical value established, the significance level $\alpha$ test follows this decision rule:

- **Reject** $H_0$ when $\frac{\sqrt{n}}{\sigma}|\bar{X} - \mu_0| > z_{\alpha/2}$
- **Fail to reject** $H_0$ when $\frac{\sqrt{n}}{\sigma}|\bar{X} - \mu_0| \leq z_{\alpha/2}$

This approach, testing $\mu = \mu_0$ against $\mu \neq \mu_0$, is termed a **two-tailed test**: sufficiently large positive and sufficiently large negative deviations of the sample mean from $\mu_0$ both count as evidence against the null hypothesis.
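The two-tailed decision rule can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `z_test_two_tailed` and the sample data are hypothetical, and `statistics.NormalDist` supplies $z_{\alpha/2}$ and the standard normal CDF.

```python
import math
from statistics import NormalDist, mean


def z_test_two_tailed(sample, mu0, sigma, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 vs H1: mu != mu0 with sigma known.

    Hypothetical helper for illustration; assumes a normal population.
    """
    n = len(sample)
    xbar = mean(sample)
    z = math.sqrt(n) * (xbar - mu0) / sigma       # standardized statistic Z
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}: P{Z > z_{alpha/2}} = alpha/2
    c = z_crit * sigma / math.sqrt(n)             # threshold c on |xbar - mu0|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # P{|Z| > |z|} when H0 is true
    return z, z_crit, c, p_value, abs(z) > z_crit


# Hypothetical data: 8 measurements, known sigma = 0.4, testing mu0 = 5
sample = [5.2, 4.8, 5.6, 5.1, 5.4, 4.9, 5.3, 5.7]
z, z_crit, c, p_value, reject = z_test_two_tailed(sample, mu0=5, sigma=0.4)
print(f"z = {z:.3f}, z_crit = {z_crit:.3f}, c = {c:.3f}, p = {p_value:.4f}, reject = {reject}")
```

Here $\bar{X} = 5.25$ and $z \approx 1.77 < 1.96$, so $H_0$ is not rejected at $\alpha = 0.05$. Comparing $|\bar{X} - \mu_0| = 0.25$ with $c \approx 0.277$ gives the same decision, because the two forms of the rule are algebraically equivalent.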

### **One-Tailed Tests**

When we specifically want to determine if the population mean is greater than or less than $\mu_0$ (rather than simply different from it), we employ **one-tailed tests**.

### **Upper-Tail Testing**

Consider testing the directional hypothesis:

$H_0: \mu \leq \mu_0 \text{ versus } H_1: \mu > \mu_0$

Logic dictates that we should reject $H_0$ when our sample mean $\bar{X}$ substantially exceeds $\mu_0$. This leads to a rejection region:

$C = \{(X_1, \ldots, X_n): \bar{X} - \mu_0 > c\}$

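Because $H_0$ is composite, the Type I error probability depends on which $\mu \leq \mu_0$ is true, and it is largest at the boundary $\mu = \mu_0$. To keep the Type I error rate at most $\alpha$, we therefore choose the critical value $c$ to satisfy: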

$P_{\mu_0}\{\bar{X} - \mu_0 > c\} = \alpha$

Using our standardization $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, which follows a standard normal distribution under $H_0$:

$P\left\{Z > \frac{c\sqrt{n}}{\sigma}\right\} = \alpha$

Since $P\{Z > z_\alpha\} = \alpha$ for a standard normal variable, we obtain:

$c = \frac{z_\alpha \sigma}{\sqrt{n}}$

### **One-Tailed Test Decision Framework**

The upper-tail hypothesis test follows this decision rule:

- **Fail to reject** $H_0$ when $\frac{\sqrt{n}}{\sigma}(\bar{X} - \mu_0) \leq z_\alpha$
- **Reject** $H_0$ when $\frac{\sqrt{n}}{\sigma}(\bar{X} - \mu_0) > z_\alpha$
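A minimal standard-library sketch of the upper-tail rule follows. The helper name `z_test_upper_tail` and the summary numbers are hypothetical; the cutoff $z_\alpha$ and the p-value are both evaluated at the boundary value $\mu = \mu_0$.

```python
import math
from statistics import NormalDist


def z_test_upper_tail(xbar, n, mu0, sigma, alpha=0.05):
    """Upper-tail z-test of H0: mu <= mu0 vs H1: mu > mu0 with sigma known.

    Hypothetical helper; probabilities are computed at mu = mu0.
    """
    z = math.sqrt(n) * (xbar - mu0) / sigma
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # P{Z > z_alpha} = alpha
    p_value = 1 - NormalDist().cdf(z)          # P{Z > z} when mu = mu0
    return z, z_alpha, p_value, z > z_alpha


# Hypothetical summary data: xbar = 5.25 from n = 8 observations, sigma = 0.4
z, z_alpha, p_value, reject = z_test_upper_tail(5.25, 8, mu0=5, sigma=0.4)
print(f"z = {z:.3f}, z_alpha = {z_alpha:.3f}, p = {p_value:.4f}, reject = {reject}")
```

With $z \approx 1.77 > z_{0.05} \approx 1.645$, the upper-tail test rejects $H_0$ at $\alpha = 0.05$, even though $|z| \le 1.96$ would not be enough for a two-tailed test at the same level: placing all of $\alpha$ in one tail lowers the cutoff.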

---

### **T-Tests: Statistical Inference When Population Variance is Unknown**

While z-tests are powerful when population variance is known, real-world scenarios often involve unknown variances. The t-test addresses this limitation by using sample variance to estimate the unknown population variance, making it one of the most practical tools in statistical inference.

### **Single Sample T-Test: Testing Population Mean with Unknown Variance**

#### **The Problem Setup**

When the population variance is unknown as well as the mean, the z statistic $\sqrt{n}(\bar{X} - \mu_0)/\sigma$ cannot be computed, so the standard normal test above no longer applies directly. Consider testing:

$$H_0: \mu = \mu_0$$

against the alternative:

$$H_1: \mu \neq \mu_0$$

Note that this null hypothesis is **composite** rather than simple, since it doesn't specify the variance value.

#### **Estimating the Unknown Variance**

Since the population variance $\sigma^2$ is unknown, we estimate it using the sample variance:

$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$$

Our intuition suggests rejecting $H_0$ when the standardized difference is large:

$$\left|\frac{\bar{X} - \mu_0}{S/\sqrt{n}}\right|$$

#### **The T-Distribution**

To establish the critical values, we need the distribution of our test statistic. When $H_0$ is true, the statistic:

$$T = \frac{\sqrt{n}(\bar{X} - \mu_0)}{S}$$

follows a **t-distribution with $(n-1)$ degrees of freedom**.

#### **Probability Statement**

Under the null hypothesis:

$$P_{\mu_0}\left\{-t_{\alpha/2, n-1} \leq \frac{\sqrt{n}(\bar{X} - \mu_0)}{S} \leq t_{\alpha/2, n-1}\right\} = 1 - \alpha$$

where $t_{\alpha/2, n-1}$ represents the $100(\alpha/2)$ upper percentile of the t-distribution with $(n-1)$ degrees of freedom.

By definition: $P\{T_{n-1} \geq t_{\alpha/2, n-1}\} = P\{T_{n-1} \leq -t_{\alpha/2, n-1}\} = \alpha/2$

### **Decision Framework for Single Sample T-Test**

The significance level $\alpha$ test follows this rule:

- **Fail to reject** $H_0$ if: $\left|\frac{\sqrt{n}(\bar{X} - \mu_0)}{S}\right| \leq t_{\alpha/2, n-1}$

- **Reject** $H_0$ if: $\left|\frac{\sqrt{n}(\bar{X} - \mu_0)}{S}\right| > t_{\alpha/2, n-1}$

### **P-Value Calculation**

If $t$ represents the observed value of our test statistic $T = \sqrt{n}(\bar{X} - \mu_0)/S$, then:

**p-value** = Probability that $|T|$ would exceed $|t|$ when $H_0$ is true

This equals the probability that the absolute value of a t-random variable with $(n-1)$ degrees of freedom exceeds $|t|$.
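The test statistic and its p-value can be sketched without statistical tables. Python's standard library has no t-distribution, so this sketch assumes the usual t density and integrates it numerically with Simpson's rule, using the symmetry $P\{|T| > |t|\} = 1 - 2P\{0 < T < |t|\}$. All function names and the data are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t, df, steps=2000):
    """P{|T_df| > |t|}, using symmetry and Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P{0 < T_df < |t|}
    return 1 - 2 * area


def one_sample_t_test(sample, mu0):
    """Return (t, df, two-sided p-value) for H0: mu = mu0 with sigma unknown."""
    n = len(sample)
    t = math.sqrt(n) * (mean(sample) - mu0) / stdev(sample)  # stdev divides by n - 1
    return t, n - 1, t_two_sided_p(t, n - 1)


sample = [5.2, 4.8, 5.6, 5.1, 5.4, 4.9, 5.3, 5.7]  # hypothetical data
t, df, p = one_sample_t_test(sample, mu0=5)
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}")
```

Here $t \approx 2.236$ with 7 degrees of freedom, which is below $t_{0.025,7} \approx 2.365$, so $H_0$ is not rejected at $\alpha = 0.05$ (the p-value lies between 0.05 and 0.10). If SciPy is installed, `scipy.stats.ttest_1samp(sample, 5)` should report the same statistic and p-value.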

### **Two-Sample Tests: Comparing Means of Two Populations**

#### **Two-Sample Z-Test (Known Variances)**

When comparing two populations with **known variances**, suppose we have independent samples $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ from normal populations with unknown means $\mu_x, \mu_y$ but known variances $\sigma_x^2, \sigma_y^2$.

**Hypotheses:**
$$H_0: \mu_x = \mu_y \text{ versus } H_1: \mu_x \neq \mu_y$$

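**Distribution of Difference:**
Because the two samples are independent, whether or not $H_0$ holds:

$$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y, \frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}\right)$$

**Standardized Test Statistic:**
$$\frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}} \sim N(0,1)$$

Under $H_0$ we have $\mu_x - \mu_y = 0$, so the test statistic reduces to $\frac{\bar{X} - \bar{Y}}{\sqrt{\sigma_x^2/n + \sigma_y^2/m}}$, which is standard normal when $H_0$ is true.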

**Decision Rule:**
- **Fail to reject** $H_0$ if: $\frac{|\bar{X} - \bar{Y}|}{\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}} \leq z_{\alpha/2}$

- **Reject** $H_0$ if: $\frac{|\bar{X} - \bar{Y}|}{\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}} > z_{\alpha/2}$
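A minimal sketch of the two-sample z-test from summary statistics follows. The helper name `two_sample_z_test` and all the numbers are hypothetical; only the formula comes from the rule above.

```python
import math
from statistics import NormalDist


def two_sample_z_test(xbar, ybar, var_x, var_y, n, m, alpha=0.05):
    """Two-tailed z-test of H0: mu_x = mu_y with known variances (hypothetical helper)."""
    se = math.sqrt(var_x / n + var_y / m)         # standard deviation of Xbar - Ybar
    z = (xbar - ybar) / se                        # statistic with mu_x - mu_y = 0 under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, abs(z) > z_crit


# Hypothetical summaries: n = 20, m = 25, sigma_x^2 = 16, sigma_y^2 = 9
z, p_value, reject = two_sample_z_test(52.0, 50.0, 16.0, 9.0, 20, 25)
print(f"z = {z:.3f}, p = {p_value:.4f}, reject = {reject}")
```

With these inputs $z = 2/\sqrt{1.16} \approx 1.86 \leq 1.96$, so $H_0$ is not rejected at $\alpha = 0.05$.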

### **Two-Sample T-Test (Unknown Equal Variances)**

More realistically, when all parameters are unknown, we test:

$$H_0: \mu_x = \mu_y \text{ versus } H_1: \mu_x \neq \mu_y$$

**Key Assumption:** The unknown variances are equal: $\sigma^2 = \sigma_x^2 = \sigma_y^2$

### **Sample Variance Calculations**

Define the individual sample variances:

$$S_x^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$$

$$S_y^2 = \frac{\sum_{i=1}^m (Y_i - \bar{Y})^2}{m-1}$$

### **Pooled Variance Estimator**

The **pooled estimator** of the common variance $\sigma^2$ is:

$$S_p^2 = \frac{(n-1)S_x^2 + (m-1)S_y^2}{n + m - 2}$$

This combines information from both samples to estimate the shared variance.

### **Test Statistic Distribution**

Under $H_0$ (when $\mu_x - \mu_y = 0$):

$$T \equiv \frac{\bar{X} - \bar{Y}}{\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}} \sim t_{n+m-2}$$

This follows a t-distribution with $(n + m - 2)$ degrees of freedom.

### **Decision Framework for Two-Sample T-Test**

- **Fail to reject** $H_0$ if: $|T| \leq t_{\alpha/2, n+m-2}$

- **Reject** $H_0$ if: $|T| > t_{\alpha/2, n+m-2}$

where $t_{\alpha/2, n+m-2}$ is the upper $100(\alpha/2)$ percentile point of a t-distribution with $(n+m-2)$ degrees of freedom, i.e. $P\{T_{n+m-2} > t_{\alpha/2, n+m-2}\} = \alpha/2$.
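The pooled statistic can be sketched with the standard library. The helper name `pooled_t_statistic` and both samples are hypothetical; the critical value $t_{0.025,10} = 2.228$ is taken from the t critical-value table in this section.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Return (T, df, S_p^2) for H0: mu_x = mu_y, assuming equal unknown variances."""
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)  # pooled S_p^2
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2, sp2


# Hypothetical samples of size 6 each, so df = 10
x = [12, 14, 11, 15, 13, 13]
y = [10, 9, 12, 11, 8, 10]
t, df, sp2 = pooled_t_statistic(x, y)

T_CRIT = 2.228  # t_{0.025,10} from the t table in this notebook (alpha = 0.05, two-tailed)
print(f"T = {t:.3f}, df = {df}, S_p^2 = {sp2}, reject = {abs(t) > T_CRIT}")
```

Both samples have variance 2, so $S_p^2 = 2$ and $T = 3/\sqrt{2/3} \approx 3.674 > 2.228$, leading to rejection of $H_0$ at the 5% level.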

### **Critical Values Reference Table**

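Here's a reference table of upper-tail critical values $t_{\alpha}$ for common degrees of freedom, where $P\{T_{df} > t_{\alpha}\} = \alpha$ (for a two-tailed test at level $\alpha$, use the $t_{\alpha/2}$ column):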

| df | $t_{0.10}$ | $t_{0.05}$ | $t_{0.025}$ | $t_{0.01}$ | $t_{0.005}$ |
|---|---|---|---|---|---|
| 1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 |
| 2 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 |
| 3 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 |
| 4 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 |
| 5 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 |
| 10 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 |
| 15 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947 |
| 20 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845 |
| 25 | 1.316 | 1.708 | 2.060 | 2.485 | 2.787 |
| 30 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750 |
| $\infty$ | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 |

**Note:** As degrees of freedom approach infinity, t-values converge to z-values (standard normal).
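The $\infty$ row is simply the set of standard normal quantiles, which can be reproduced with `statistics.NormalDist` from Python's standard library (a small check, assuming Python 3.8 or later):

```python
from statistics import NormalDist

# Upper-tail areas alpha used as the column headers of the t table
alphas = [0.10, 0.05, 0.025, 0.01, 0.005]

# z_alpha satisfies P{Z > z_alpha} = alpha, i.e. z_alpha = Phi^{-1}(1 - alpha)
z_row = [round(NormalDist().inv_cdf(1 - a), 3) for a in alphas]
print(z_row)  # [1.282, 1.645, 1.96, 2.326, 2.576]
```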

### **Comprehensive Test Summary**

| Test Type | Conditions | Test Statistic | Degrees of Freedom | Decision Rule |
|-----------|------------|----------------|-------------------|---------------|
| One-sample t-test | $\sigma^2$ unknown | $\frac{\sqrt{n}(\bar{X} - \mu_0)}{S}$ | $n-1$ | Reject if $\|T\| > t_{\alpha/2, n-1}$ |
| Two-sample z-test | $\sigma_x^2, \sigma_y^2$ known | $\frac{\bar{X} - \bar{Y}}{\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}}$ | N/A (use $z_{\alpha/2}$) | Reject if $\|Z\| > z_{\alpha/2}$ |
| Two-sample t-test | $\sigma_x^2 = \sigma_y^2$ unknown | $\frac{\bar{X} - \bar{Y}}{\sqrt{S_p^2(\frac{1}{n} + \frac{1}{m})}}$ | $n+m-2$ | Reject if $\|T\| > t_{\alpha/2, n+m-2}$ |


---

### **Chi-Square Tests: Distribution Theory and Applications**

### **Definition and Construction**

The chi-square distribution emerges naturally from the sum of squared standard normal variables. If $Z_1, Z_2, \ldots, Z_n$ are independent standard normal random variables, then:

$$X = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$

follows a **chi-square distribution with $n$ degrees of freedom**. We denote this as:

$$X \sim \chi_n^2$$

### **Key Properties**

**Additive Property:** The chi-square distribution has a useful additive property. If $X_1$ and $X_2$ are independent chi-square random variables with $n_1$ and $n_2$ degrees of freedom respectively, then:

$$X_1 + X_2 \sim \chi_{n_1 + n_2}^2$$

This property follows directly from the definition, since $X_1 + X_2$ represents the sum of squares of $(n_1 + n_2)$ independent standard normal variables.
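The definition and the additive property can be checked by simulation. The sketch below is a minimal Monte Carlo illustration using only Python's `random` module; it relies on the standard facts (not stated above) that $\chi_n^2$ has mean $n$ and variance $2n$, and the seed, degrees of freedom and number of draws are arbitrary choices.

```python
import random


def chi_square_draw(df, rng):
    """One draw of Z_1^2 + ... + Z_df^2 with independent standard normal Z_i."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


rng = random.Random(42)  # fixed seed so the run is reproducible
N = 50_000
x1 = [chi_square_draw(3, rng) for _ in range(N)]  # X_1 ~ chi-square_3
x2 = [chi_square_draw(5, rng) for _ in range(N)]  # X_2 ~ chi-square_5, independent of X_1
total = [a + b for a, b in zip(x1, x2)]           # should behave like chi-square_8

mean_total = sum(total) / N
var_total = sum((t - mean_total) ** 2 for t in total) / (N - 1)
print(f"mean = {mean_total:.3f} (theory 8), variance = {var_total:.3f} (theory 16)")
```

The simulated mean and variance of $X_1 + X_2$ should land close to $8$ and $16$, the values for $\chi_8^2$.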

### **Critical Values**

For any chi-square random variable $X$ with $n$ degrees of freedom and significance level $\alpha \in (0,1)$, the critical value $\chi_{\alpha,n}^2$ is defined such that:

$$P\{X \geq \chi_{\alpha,n}^2\} = \alpha$$

This critical value is essential for hypothesis testing applications.

### **Types of Chi-Square Tests**

Chi-square tests are versatile tools for analyzing categorical data and testing various hypotheses:

### 1. Chi-Square Goodness of Fit Test
- **Purpose:** Determines whether a single categorical variable follows a specified distribution
- **Use case:** Testing if observed frequencies match expected theoretical frequencies
- **Data requirement:** One categorical variable

### 2. Chi-Square Test of Independence  
- **Purpose:** Determines whether two categorical variables are independent
- **Use case:** Testing if there's an association between two categorical variables
- **Data requirement:** Two categorical variables

**Key Assumptions for Chi-Square Tests:**
- Observations must be random and independent
- Categories must be mutually exclusive
- Expected frequencies should be sufficiently large (typically ≥ 5)
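The list above does not give the test statistic. The sketch below assumes Pearson's statistic $\sum (O_i - E_i)^2 / E_i$ for the goodness-of-fit test, compared with $\chi_{\alpha,k-1}^2$ when the expected counts are fully specified in advance. The helper name, the die-roll counts and the table value $\chi_{0.05,5}^2 = 11.070$ are illustrative assumptions.

```python
import warnings


def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic and its degrees of freedom.

    Assumes the expected counts are fully specified (no parameters estimated
    from the data), so df = number of categories - 1.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if min(expected) < 5:
        warnings.warn("some expected counts are below 5; the chi-square approximation may be poor")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Hypothetical: 60 rolls of a die; a fair die gives 10 expected counts per face
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
stat, df = chi_square_gof(observed, expected)
CHI2_CRIT = 11.070  # chi-square_{0.05,5} from a standard chi-square table
print(f"chi2 = {stat:.2f}, df = {df}, reject = {stat > CHI2_CRIT}")
```

Here the statistic is $4.2$ on 5 degrees of freedom, well below $11.070$, so the fair-die hypothesis is not rejected at $\alpha = 0.05$.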

### **Testing Equality of Variances: The F-Test**

#### **Problem Statement**

Consider independent samples from two normal populations:
- Sample 1: $X_1, \ldots, X_n$ from $N(\mu_x, \sigma_x^2)$
- Sample 2: $Y_1, \ldots, Y_m$ from $N(\mu_y, \sigma_y^2)$

We want to test whether the population variances are equal:

$$H_0: \sigma_x^2 = \sigma_y^2 \text{ versus } H_1: \sigma_x^2 \neq \sigma_y^2$$

### **Sample Variance Calculations**

Define the sample variances:

$$S_x^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$$

$$S_y^2 = \frac{\sum_{i=1}^m (Y_i - \bar{Y})^2}{m-1}$$

### **Theoretical Foundation**

From sampling theory, we know that:
- $\frac{(n-1)S_x^2}{\sigma_x^2} \sim \chi_{n-1}^2$
- $\frac{(m-1)S_y^2}{\sigma_y^2} \sim \chi_{m-1}^2$

These chi-square statistics are independent, leading us to the F-distribution.

### **The F-Distribution Connection**

The ratio of scaled chi-square variables follows an F-distribution:

$$\frac{(S_x^2/\sigma_x^2)}{(S_y^2/\sigma_y^2)} \sim F_{n-1, m-1}$$

Under the null hypothesis $H_0: \sigma_x^2 = \sigma_y^2$, this simplifies to:

$$\frac{S_x^2}{S_y^2} \sim F_{n-1, m-1}$$

### **Probability Statement**

Under $H_0$, the test statistic falls within the acceptance region with probability $(1-\alpha)$:

$$P_{H_0}\left\{F_{1-\alpha/2, n-1, m-1} \leq \frac{S_x^2}{S_y^2} \leq F_{\alpha/2, n-1, m-1}\right\} = 1 - \alpha$$

#### **Decision Framework for F-Test**

The significance level $\alpha$ test follows this rule:

- **Fail to reject** $H_0$ if: $F_{1-\alpha/2, n-1, m-1} < \frac{S_x^2}{S_y^2} < F_{\alpha/2, n-1, m-1}$

- **Reject** $H_0$ otherwise

#### **P-Value Calculation**

For an observed test statistic value $v = \frac{S_x^2}{S_y^2}$:

**p-value** = $2 \min\left(P\{F_{n-1,m-1} < v\}, 1 - P\{F_{n-1,m-1} < v\}\right)$

This two-tailed p-value accounts for extreme values in either direction.
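To evaluate this p-value without tables, the F CDF can be written with the regularized incomplete beta function: $P\{F_{d_1,d_2} \le v\} = I_{d_1 v/(d_1 v + d_2)}(d_1/2, d_2/2)$. The sketch below evaluates $I_x$ with a standard continued-fraction expansion (modified Lentz method). All function names and data are hypothetical, and in practice a library routine such as SciPy's `scipy.stats.f.cdf` would normally be preferred.

```python
import math
from statistics import variance


def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        # even step of the continued fraction
        aa = k * (b - k) * x / ((qam + k2) * (a + k2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + k) * (qab + k) * x / ((a + k2) * (qap + k2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_cf(a, b, x) / a
    # use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where that converges faster
    return 1.0 - math.exp(log_front) * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(v, d1, d2):
    """P{F_{d1,d2} <= v}, via I_{d1 v / (d1 v + d2)}(d1/2, d2/2)."""
    if v <= 0:
        return 0.0
    return reg_inc_beta(d1 / 2, d2 / 2, d1 * v / (d1 * v + d2))


def variance_ratio_test(x, y):
    """Two-sided F-test of H0: sigma_x^2 = sigma_y^2; returns (v, p-value)."""
    v = variance(x) / variance(y)           # observed S_x^2 / S_y^2
    cdf = f_cdf(v, len(x) - 1, len(y) - 1)  # P{F_{n-1,m-1} < v}
    return v, 2 * min(cdf, 1 - cdf)


x = [10, 14, 8, 16, 12, 12]  # hypothetical sample, S_x^2 = 8
y = [10, 9, 12, 11, 8, 10]   # hypothetical sample, S_y^2 = 2
v, p = variance_ratio_test(x, y)
print(f"v = {v}, p = {p:.4f}")
```

Here $v = 4$ on $(5, 5)$ degrees of freedom, which lies between $F_{0.10,5,5}$ and $F_{0.05,5,5}$, so the two-sided p-value falls between 0.10 and 0.20 and $H_0$ is not rejected at $\alpha = 0.05$.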

### **Chi-Square Distribution Examples**

#### **Sample Calculations**

Here are some illustrative probability calculations for chi-square distributions:

**Critical Values:**
- $\chi_{0.99}^2 = 4.2$ (for some degrees of freedom)

**Cumulative Probabilities:**
- $P\{\chi_{16}^2 < 14.3\} \approx 0.424$
- $P\{\chi_{11}^2 < 17.1875\} \approx 0.8976$

These examples demonstrate how to work with chi-square tables and probability calculations.
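Cumulative chi-square probabilities like these can also be computed directly, so the listed values can be checked numerically. The sketch below assumes the standard series for the regularized lower incomplete gamma function, since $P\{\chi_n^2 < x\} = P(n/2,\, x/2)$; the function name `chi2_cdf` is hypothetical.

```python
import math


def chi2_cdf(x, df):
    """P{chi-square_df < x} via the series for the regularized lower incomplete gamma.

    Uses P(s, z) = z^s e^{-z} / Gamma(s) * sum_{k>=0} z^k / (s (s+1) ... (s+k))
    with s = df/2 and z = x/2; adequate for moderate x (roughly x < 1000).
    """
    if x <= 0:
        return 0.0
    s, z = df / 2, x / 2
    term = 1.0 / s
    total = term
    k = 1
    while term > total * 1e-16:
        term *= z / (s + k)
        total += term
        k += 1
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total


print(round(chi2_cdf(14.3, 16), 4))
print(round(chi2_cdf(17.1875, 11), 4))
```

For even degrees of freedom the series can be cross-checked against a closed form, for example $P\{\chi_2^2 < x\} = 1 - e^{-x/2}$.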

---

### **F-Tests: Distribution Theory and Variance Comparison**

### **Definition and Construction**

The F-distribution arises naturally from the ratio of two independent chi-square variables, each divided by their respective degrees of freedom. If $\chi_n^2$ and $\chi_m^2$ are independent chi-square random variables with $n$ and $m$ degrees of freedom respectively, then:

$$F_{n,m} = \frac{\chi_n^2/n}{\chi_m^2/m}$$

is said to have an **F-distribution with $n$ and $m$ degrees of freedom**.
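The construction can be illustrated by simulation: build each chi-square variable from squared standard normals, divide by its degrees of freedom, and take the ratio. This is a minimal Monte Carlo sketch with an arbitrary seed; it assumes the standard fact that $E[F_{n,m}] = m/(m-2)$ for $m > 2$, and it compares the upper tail with $F_{0.05,5,10} = 3.33$ from the F critical-value table in this notebook.

```python
import random


def f_draw(n, m, rng):
    """One draw of (chi2_n / n) / (chi2_m / m) built from standard normals."""
    chi_n = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    chi_m = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))
    return (chi_n / n) / (chi_m / m)


rng = random.Random(7)  # fixed seed so the run is reproducible
draws = [f_draw(5, 10, rng) for _ in range(40_000)]

mean_f = sum(draws) / len(draws)                  # theory: m / (m - 2) = 1.25
tail = sum(d > 3.33 for d in draws) / len(draws)  # theory: about 0.05
print(f"mean = {mean_f:.3f}, P(F > 3.33) = {tail:.4f}")
```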

#### **Critical Values and Notation**

For any significance level $\alpha \in (0,1)$, the critical value $F_{\alpha,n,m}$ is defined such that:

$$P\{F_{n,m} > F_{\alpha,n,m}\} = \alpha$$

This notation is fundamental for hypothesis testing with the F-distribution.

#### **Key Properties of F-Distribution**

**Shape Characteristics:**
- Always non-negative (since it is a ratio of non-negative quantities)
- Right-skewed distribution
- Becomes more concentrated around 1, and more nearly symmetric, as both degrees of freedom increase
- Has two parameters: numerator df ($n$) and denominator df ($m$)

**Relationship to Other Distributions:**
- Connected to chi-square through its definition
- Related to t-distribution: if $T_n$ has a t-distribution with $n$ degrees of freedom, then $T_n^2$ has an $F_{1,n}$ distribution
- Converges to chi-square: as $m \to \infty$, $n F_{n,m}$ converges in distribution to $\chi_n^2$
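The t/F relationship listed above links the two critical-value tables in this notebook. Since $|T_n| > t_{\alpha/2,n}$ exactly when $T_n^2 > t_{\alpha/2,n}^2$, the upper-$\alpha$ point of $F_{1,n}$ is the square of the two-tailed t critical value. A quick check with $n = 10$, using values read from those tables:

$$F_{\alpha,1,n} = t_{\alpha/2,n}^2$$

$$t_{0.025,10}^2 = 2.228^2 \approx 4.964 \approx F_{0.05,1,10} = 4.96, \qquad t_{0.005,10}^2 = 3.169^2 \approx 10.043 \approx F_{0.01,1,10} = 10.04$$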

### **F-Test for Comparing Two Population Variances**

#### **Purpose and Application**

The F-test is specifically designed to determine whether two population variances are equal. This test is fundamental because:

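- It can be used to check the equal-variance assumption behind procedures such as the pooled two-sample t-test
- The F-distribution is the reference distribution for ANOVA (Analysis of Variance)
- F-statistics are used in regression analysis, for example to test overall model significance
- It gives a formal way to compare variability between two groups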

#### **Theoretical Foundation**

**Basic Principle:** If two populations have equal variances, the ratio of their sample variances should be close to 1. The F-test evaluates how far this ratio deviates from 1.

**Mathematical Setup:**
When testing $H_0: \sigma_1^2 = \sigma_2^2$, the test statistic:

$$F = \frac{S_1^2}{S_2^2}$$

follows an F-distribution with $(n_1-1, n_2-1)$ degrees of freedom under the null hypothesis.

### **Essential Assumptions for F-Tests**

#### **Required Conditions**

1. **Normal Populations:** Both populations must be normally distributed
   - F-test is sensitive to departures from normality
   - Consider alternative tests (Levene's test) for non-normal data

2. **Independence:** 
   - Samples must be independent of each other
   - Observations within each sample must be independent
   - No pairing or matching between samples

3. **Random Sampling:** Both samples should be randomly selected from their respective populations

### **Practical Considerations**

**Sample Size Effects:**
- Larger samples provide more reliable results
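- Unlike the t-test, the F-test does not become robust to non-normality as sample sizes grow; larger samples add power but not protection against non-normal data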
- Unequal sample sizes are acceptable but affect power

**Variance Arrangement:**
- Conventionally, place larger sample variance in numerator
- This creates a right-tailed test, simplifying calculations
- Always results in $F \geq 1$ when following this convention

## Step-by-Step F-Test Procedure

#### Step 1: Formulate Hypotheses

**Two-tailed test (most common):**
- $H_0: \sigma_1^2 = \sigma_2^2$ (variances are equal)
- $H_1: \sigma_1^2 \neq \sigma_2^2$ (variances are different)

**One-tailed test:**
- $H_0: \sigma_1^2 \leq \sigma_2^2$ vs $H_1: \sigma_1^2 > \sigma_2^2$

#### Step 2: Calculate the F-Statistic

$$F = \frac{S_1^2}{S_2^2}$$

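where, for the two-tailed test:
- $S_1^2$ = larger sample variance (numerator)
- $S_2^2$ = smaller sample variance (denominator)
- This ensures $F \geq 1$

For the one-tailed test, $S_1^2$ must instead be the sample variance from the population that $H_1$ claims has the larger variance, even if it is not the larger of the two sample variances.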

**Sample Variance Formula:**
$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$$

#### Step 3: Determine Degrees of Freedom

- **Numerator degrees of freedom:** $df_1 = n_1 - 1$
- **Denominator degrees of freedom:** $df_2 = n_2 - 1$

where $n_1$ and $n_2$ are the respective sample sizes.

#### Step 4: Find Critical Value

**For two-tailed test:**
- Use $\alpha/2$ to find $F_{\alpha/2, df_1, df_2}$
- Critical region: $F > F_{\alpha/2, df_1, df_2}$

**For one-tailed test:**
- Use $\alpha$ to find $F_{\alpha, df_1, df_2}$
- Critical region: $F > F_{\alpha, df_1, df_2}$

#### Step 5: Make Decision

**Decision Rule:**
- **Reject** $H_0$ if $F_{\text{calculated}} > F_{\text{critical}}$
- **Fail to reject** $H_0$ if $F_{\text{calculated}} \leq F_{\text{critical}}$
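Steps 2 to 5 of the two-tailed procedure can be sketched as two small hypothetical helpers. The critical value still comes from a table (Step 4), because the standard library has no F quantile function. The inputs below are the two-machine figures from the practical example in this notebook, deliberately passed with the smaller variance first to show the swap.

```python
def f_statistic(var_a, n_a, var_b, n_b):
    """Steps 2-3: put the larger sample variance on top; return (F, df1, df2).

    Hypothetical helper for the two-tailed convention, so F >= 1.
    """
    if var_a < var_b:  # swap so the larger variance is the numerator
        var_a, n_a, var_b, n_b = var_b, n_b, var_a, n_a
    return var_a / var_b, n_a - 1, n_b - 1


def decide(f_calculated, f_critical):
    """Step 5: reject H0 when F_calculated > F_critical."""
    return "reject H0" if f_calculated > f_critical else "fail to reject H0"


# Smaller variance passed first on purpose, to show the swap
f, df1, df2 = f_statistic(16.8, 21, 25.6, 16)
# Step 4: F_{0.025,15,20} = 2.57 from an F table (two-tailed test, alpha = 0.05)
print(f"F = {f:.3f}, df = ({df1}, {df2}):", decide(f, 2.57))
```

For a one-tailed test the swap should be skipped, since the numerator must be the sample named in $H_1$.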

### **Using F-Distribution Tables**

### **Required Information**

To use F-distribution tables effectively, you need:

1. **Numerator degrees of freedom** ($df_1$)
2. **Denominator degrees of freedom** ($df_2$)  
3. **Significance level** ($\alpha$)

### **Table Structure**

F-tables are typically organized as:
- Columns represent numerator degrees of freedom
- Rows represent denominator degrees of freedom
- Multiple tables for different $\alpha$ levels (0.05, 0.01, etc.)

### **Critical Value Examples**

| $df_1$ | $df_2$ | $F_{0.05}$ | $F_{0.01}$ |
|--------|--------|------------|------------|
| 1 | 10 | 4.96 | 10.04 |
| 5 | 10 | 3.33 | 5.64 |
| 10 | 10 | 2.98 | 4.85 |
| 20 | 20 | 2.12 | 2.94 |

### **P-Value Approach**

### **Calculating P-Values**

For an observed F-statistic value $f$:

**Two-tailed test** (with the larger sample variance in the numerator, so $f \geq 1$):
$$p\text{-value} = 2 \times P(F_{df_1,df_2} > f)$$

**One-tailed test:**
$$p\text{-value} = P(F_{df_1,df_2} > f)$$

### Interpretation

- **Small p-value** (< $\alpha$): Strong evidence against $H_0$
- **Large p-value** (≥ $\alpha$): Insufficient evidence to reject $H_0$

### **Applications Beyond Variance Testing**

### **Analysis of Variance (ANOVA)**

The F-test is central to ANOVA, where it tests:
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$

**F-statistic in ANOVA:**
$$F = \frac{\text{Mean Square Between}}{\text{Mean Square Within}}$$
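The ANOVA F-statistic can be sketched with the standard library. The text does not spell out the mean squares, so this assumes the usual one-way definitions $\text{MSB} = \sum_i n_i(\bar{X}_i - \bar{X})^2/(k-1)$ and $\text{MSW} = \sum_i \sum_j (X_{ij} - \bar{X}_i)^2/(N-k)$, with $F$ compared against $F_{\alpha,k-1,N-k}$. The helper name and the data are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) with F = MS_between / MS_within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    ss_within = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical data
f, df1, df2 = one_way_anova_f(groups)
print(f"F = {f}, df = ({df1}, {df2})")  # F = 27.0, df = (2, 6)
```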

### **Regression Analysis**

In regression, F-tests evaluate:
- **Overall model significance:** Tests whether at least one predictor has a nonzero coefficient
- **Nested model comparison:** Compares models with different numbers of parameters

### **Quality Control**

F-tests help monitor:
- Process variability changes
- Consistency between production batches
- Equipment calibration verification

### **Practical Example Framework**

### **Sample Calculation Setup**

Consider testing if two machines have equal variance in production:

**Sample Data:**
- Machine 1: $n_1 = 16$, $S_1^2 = 25.6$
- Machine 2: $n_2 = 21$, $S_2^2 = 16.8$

**Test Statistic:**
$$F = \frac{25.6}{16.8} = 1.524$$

**Degrees of Freedom:**
- $df_1 = 16 - 1 = 15$
- $df_2 = 21 - 1 = 20$

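**Decision Process:**
Compare $F = 1.524$ with $F_{0.025,15,20} = 2.57$ for $\alpha = 0.05$ (two-tailed). Since $1.524 < 2.57$, we fail to reject $H_0$: at the 5% level, the data do not provide significant evidence that the two machines differ in variance.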

### **Alternative Tests for Variance Comparison**

### **When F-Test Assumptions Are Violated**

**Levene's Test:**
- More robust to non-normality than the F-test
- Applies an ANOVA F-test to the absolute deviations from each group's mean
- Works for two or more groups

**Brown-Forsythe Test:**
- Modification of Levene's test
- Uses absolute deviations from the group median instead of the mean
- Particularly robust for skewed distributions

**Bartlett's Test:**
- Extension for multiple groups
- Very sensitive to normality assumption
- More powerful when normality holds
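Levene's statistic and the Brown-Forsythe variant can be sketched as a one-way ANOVA F computed on absolute deviations from each group's center. The helper name `levene_statistic` and the data are hypothetical; the result $W$ is compared with $F_{\alpha,k-1,N-k}$.

```python
from statistics import mean, median


def levene_statistic(groups, center="mean"):
    """Levene-type W: one-way ANOVA F computed on |x - group center|.

    center="mean" gives Levene's original test; center="median" gives the
    Brown-Forsythe variant. Compare W with F_{alpha, k-1, N-k}.
    """
    locate = mean if center == "mean" else median
    centers = [locate(g) for g in groups]
    z = [[abs(x - c) for x in g] for g, c in zip(groups, centers)]  # absolute deviations
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = sum(sum(g) for g in z) / n_total
    zbar = [sum(g) / len(g) for g in z]
    ss_between = sum(len(g) * (zb - grand) ** 2 for g, zb in zip(z, zbar))
    ss_within = sum((v - zb) ** 2 for g, zb in zip(z, zbar) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


groups = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]  # hypothetical data
w_mean = levene_statistic(groups, center="mean")
w_median = levene_statistic(groups, center="median")
print(f"Levene W = {w_mean:.4f}, Brown-Forsythe W = {w_median:.4f}")
```

For these symmetric samples the means and medians coincide, so both statistics equal $3.6/1.75 \approx 2.057$. If SciPy is available, `scipy.stats.levene(*groups, center="mean")` and `center="median"` (its default) compute these statistics together with p-values.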