<font color = "red" size =12>**ANOVA (Analysis of Variance)**</font>

<font color = "red">**F-Distribution: Key Concepts & Applications**</font>

The **F-distribution** is a **continuous probability distribution** commonly used in **statistical hypothesis testing**, especially in **ANOVA (Analysis of Variance)** and comparing **variances between samples**. Below is a detailed breakdown of its key properties and applications.  



**1. Continuous Probability Distribution**  
- The **F-distribution** is used in hypothesis testing to analyze **variance ratios** from sample data.
- Most commonly applied in **ANOVA** and regression analysis.



**2. Fisher-Snedecor Distribution**  
- Named after **Ronald Fisher** and **George Snedecor**, two influential statisticians.
- Often referred to as the **Fisher-Snedecor distribution** in statistical literature.



**3. Degrees of Freedom (df)**  
- Defined by **two parameters**:  
  - \( df_1 \) → Degrees of freedom for the **numerator** (between-group variance).  
  - \( df_2 \) → Degrees of freedom for the **denominator** (within-group variance).  
- These **df values** determine the **shape** of the F-distribution.



**4. Shape: Positively Skewed & Bounded**  
- The F-distribution is **positively skewed**, meaning it has a long tail on the right.
- **Lower bound = 0**, since variance **cannot** be negative.
- The shape depends on the **degrees of freedom** (higher df leads to a more symmetrical curve).



**5. Testing Equality of Variances**  
- Used for testing if **two population variances are equal**.
- Example: **Levene's Test** refers its statistic to the F-distribution; **Bartlett's Test** checks the same hypothesis but uses a chi-square reference distribution.



**6. Comparing Statistical Models**  
- Helps **compare the fit of statistical models** by checking whether including additional variables **significantly improves a model**.
- Used in **ANOVA** to test if group means differ significantly.



**7. F-Statistic Calculation**  
- The **F-statistic** is computed as:

  $$
  F = \frac{\text{Variance Between Groups}}{\text{Variance Within Groups}}
  $$

- A **large F-statistic** suggests a **significant difference between groups**.
- Compared against **critical values** from the F-distribution table to determine significance.



**8. Applications of the F-Distribution**  
- Used in **many fields**, including:
  - **Psychology & Education** → Evaluating experimental designs.
  - **Economics** → Comparing different financial models.
  - **Social & Natural Sciences** → Analyzing variance in real-world data.
- Essential for **hypothesis testing, regression analysis, and variance analysis**.



**Key Takeaway**  
The **F-distribution** is **essential** in **ANOVA**, hypothesis testing, and model comparisons. By evaluating **variance ratios**, it helps **determine statistical significance**, ensuring data-driven decision-making in various research and business applications.

The **F-distribution** arises as the ratio of two **independent chi-square random variables**, each divided by its degrees of freedom. Its shape depends on **two degrees of freedom (df₁, df₂)**, one for the numerator and one for the denominator.

**How to Create the F-Distribution**
1. **Generate Two Independent Chi-Square Distributed Variables**  
   - Let $ X_1 $ and $ X_2 $ be independent **Chi-Square distributed** random variables with degrees of freedom **df₁** and **df₂**, respectively.

2. **Divide Each Chi-Square Variable by Its Degrees of Freedom**  
   - Compute:  
     $$
     \frac{X_1}{df_1} \quad \text{and} \quad \frac{X_2}{df_2}
     $$
   - This scales each value relative to its degrees of freedom.

3. **Compute the Ratio to Obtain the F-Statistic**  
   - Take the ratio of the two scaled variables:  
     $$
     F = \frac{\left( \frac{X_1}{df_1} \right)}{\left( \frac{X_2}{df_2} \right)}
     $$
   - This gives the **F-statistic**, which follows an F-distribution.
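
The three construction steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: the degrees of freedom (2 and 12), the seed, and the helper name `sample_f` are chosen here for the example, and the chi-square draws rely on the fact that a chi-square variable with `df` degrees of freedom is a Gamma variable with shape `df/2` and scale 2.

```python
import random

def sample_f(df1, df2, rng):
    """Draw one F(df1, df2) value as a ratio of scaled chi-square draws."""
    # Step 1: chi-square(df) is Gamma(shape=df/2, scale=2)
    x1 = rng.gammavariate(df1 / 2, 2)
    x2 = rng.gammavariate(df2 / 2, 2)
    # Steps 2 and 3: scale each by its df, then take the ratio
    return (x1 / df1) / (x2 / df2)

rng = random.Random(0)            # fixed seed so the run is repeatable
df1, df2 = 2, 12                  # illustrative degrees of freedom
draws = [sample_f(df1, df2, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
theoretical_mean = df2 / (df2 - 2)    # valid only when df2 > 2

print(f"sample mean      = {sample_mean:.3f}")
print(f"theoretical mean = {theoretical_mean:.3f}")
print(f"smallest draw    = {min(draws):.6f}")   # never below 0
```

The sample mean should land close to `df2 / (df2 - 2)`, and no draw can be negative, which matches the lower bound at 0 described in these notes.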

**Shape of the F-Distribution**
- **Positively Skewed (Right-Skewed):**  
  - The distribution has a long tail to the **right**, meaning most values are near zero, but extreme values can be large.
  
- **Lower Bound at 0:**  
  - Since variance cannot be negative, F-values **cannot be less than 0**.

- **Shape Varies by Degrees of Freedom:**  
  - **Higher df₁ & df₂ → more symmetric, approaching normal distribution**.
  - **Lower df₁ & df₂ → heavily skewed right**.

![F-distribution probability density function](images/F-distribution_pdf.svg.png)


<font color = "red">**One-Way ANOVA**</font>

One-Way ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more independent groups to determine if there are any significant differences between them. It extends the t-test, which is used for comparing two groups, and helps analyze the effect of one independent variable with multiple levels.

**Key Steps in One-Way ANOVA**
1. **Define Hypotheses:**
   - **Null Hypothesis (H₀):** All group means are equal.
   - **Alternative Hypothesis (H₁):** At least one group mean is significantly different.

2. **Calculate Means:**
   - Compute the overall mean (**grand mean**) of all groups combined.
   - Compute the individual mean for each group.

3. **Calculate Sum of Squares (SS):**
   - **Between-Group Sum of Squares (SSB):** Measures variability between groups.
   - **Within-Group Sum of Squares (SSW):** Measures variability within each group.

4. **Find Degrees of Freedom (df):**
   - **Between-group df:** $ df_B = k - 1 $ (where $ k $ is the number of groups).
   - **Within-group df:** $ df_W = N - k $ (where $ N $ is the total number of observations).

5. **Calculate Mean Squares (MS):**
   - **Between-group mean square:** $ MSB = \frac{SSB}{df_B} $.
   - **Within-group mean square:** $ MSW = \frac{SSW}{df_W} $.

6. **Compute the F-statistic:**
   - $ F = \frac{MSB}{MSW} $.

7. **Compare F-statistic with Critical Value:**
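   - Look up the critical value in an F-distribution table for $ (df_B, df_W) $ at the chosen significance level $ \alpha $, or compute the $ p $-value.
   - If $ F $ exceeds the critical value (equivalently, $ p $-value < $ \alpha $), reject $ H_0 $, meaning at least one group mean differs.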
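
Steps 2 to 6 translate almost line for line into code. The sketch below is a minimal, standard-library version; the function name `one_way_anova` and the three small groups of scores are hypothetical, made up only to keep the arithmetic easy to follow.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of groups (lists of numbers)."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total observations N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-group SS: group size times squared distance of group mean from grand mean
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group SS: squared distance of each observation from its own group mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_b, df_w = k - 1, n_total - k
    msb, msw = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "df_B": df_b, "df_W": df_w,
            "MSB": msb, "MSW": msw, "F": msb / msw}

# Hypothetical scores for three groups, made up for illustration
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in result.items():
    print(f"{name:5s} = {value:.3f}")
```

Step 7 (the comparison with a critical value or the p-value) still needs an F-table or a statistics library; it is left out here on purpose.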


**One-Way ANOVA Summary Table**
| **Source of Variation**   | **Sum of Squares (SS)**                  | **Degrees of Freedom (d.f.)** | **Mean Square (MS)** (SS ÷ d.f.) | **F-ratio** (MS Between ÷ MS Within) |
|--------------------------|-----------------------------------------|------------------------------|----------------------------------|--------------------------------------|
| **Between Groups**       | $ SS_B = \sum n_j (\bar{X}_j - \bar{\bar{X}})^2 $     | $ df_B = k - 1 $           | $ MS_B = SS_B / df_B $        | $ F = MS_B / MS_W $              |
| **Within Groups**        | $ SS_W = \sum (X_{ij} - \bar{X}_j)^2 $     | $ df_W = N - k $           | $ MS_W = SS_W / df_W $        |                                      |
| **Total**               | $ SS_T = \sum (X_{ij} - \bar{\bar{X}})^2 $       | $ df_T = N - 1 $           |                                  |                                      |

### **Notes:**
- $ SS_B $: Variability due to differences between group means.
- $ SS_W $: Variability within each group.
- $ SS_T $: Total variability in the dataset.
- $ df_B $: Degrees of freedom for between-group variation.
- $ df_W $: Degrees of freedom for within-group variation.
- $ df_T $: Total degrees of freedom.
- $ MS_B $ & $ MS_W $: Mean squares calculated by dividing SS by respective degrees of freedom.
- $ F $: The test statistic used to compare group means.




### **Additional Notes on One-Way ANOVA**
- **Assumptions of One-Way ANOVA**:
  1. **Independence** – The observations within each group should be independent.
  2. **Normality** – The data should be approximately normally distributed within each group.
  3. **Homogeneity of Variances** – The variance across groups should be roughly equal (checked using Levene's test).
  
- **Interpretation of Results**:
  - If the F-ratio is large and the corresponding **p-value** is less than the chosen significance level (typically 0.05), we reject the null hypothesis, meaning at least one group mean is significantly different.
  - If the **p-value** is greater than 0.05, we fail to reject the null hypothesis, indicating no significant difference between group means.



It's important to note that one-way ANOVA only determines if there is a significant difference
between the group means; it does not identify which specific groups have significant
differences. To determine which pairs of groups are significantly different, post-hoc tests, such
as Tukey's HSD or Bonferroni, are conducted after a significant ANOVA result.


<Font color = "Red" size = 8 >Example - 1</font>


**Problem Statement**

A researcher wants to study the effect of different diets on weight loss. Three groups of participants follow different diet plans for 6 weeks:
- **Group A:** Low-carb diet
- **Group B:** Mediterranean diet
- **Group C:** Vegan diet

At the end of the study, their weight losses (in kg) are recorded as follows:

| **Group A (Low-carb)** | **Group B (Mediterranean)** | **Group C (Vegan)** |
|------------------------|----------------------------|----------------------|
| 4.2 | 3.8 | 2.5 |
| 3.9 | 4.1 | 3.2 |
| 4.4 | 3.5 | 3.1 |
| 4.1 | 4.0 | 2.8 |
| 3.7 | 3.6 | 2.9 |

**Objective:** Determine if there is a significant difference in the average weight loss among the three diet groups using **One-Way ANOVA**.



**Step 1: Define Hypotheses**
- **Null Hypothesis ($H_0$)**: There is no significant difference between the mean weight losses of the three diet plans. $ \mu_A = \mu_B = \mu_C $.
- **Alternative Hypothesis ($H_1$)**: At least one group has a significantly different mean weight loss.



**Step 2: Compute Group Means and Grand Mean**
Calculate the mean weight loss for each group:

- **Mean of Group A** ($\bar{X}_A$): $ \frac{4.2 + 3.9 + 4.4 + 4.1 + 3.7}{5} = 4.06 $
- **Mean of Group B** ($\bar{X}_B$): $ \frac{3.8 + 4.1 + 3.5 + 4.0 + 3.6}{5} = 3.8 $
- **Mean of Group C** ($\bar{X}_C$): $ \frac{2.5 + 3.2 + 3.1 + 2.8 + 2.9}{5} = 2.9 $
- **Grand Mean** ($\bar{X}$): $ \frac{5(4.06 + 3.8 + 2.9)}{15} = \frac{53.8}{15} \approx 3.59 $
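
The means in Step 2 are easy to confirm in code. The sketch below pools all 15 observations to get the grand mean; the dictionary name `diets` and its keys are just labels chosen for this example. Pooling gives the same answer as averaging the three group means only because every group has the same size.

```python
from statistics import mean

diets = {
    "A (Low-carb)":      [4.2, 3.9, 4.4, 4.1, 3.7],
    "B (Mediterranean)": [3.8, 4.1, 3.5, 4.0, 3.6],
    "C (Vegan)":         [2.5, 3.2, 3.1, 2.8, 2.9],
}

group_means = {name: mean(values) for name, values in diets.items()}
# Grand mean over all 15 observations pooled together
grand_mean = mean(x for values in diets.values() for x in values)

for name, m in group_means.items():
    print(f"Group {name}: {m:.2f}")
print(f"Grand mean: {grand_mean:.4f}")
```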



**Step 3: Compute Sum of Squares (SS)**

**Between-Group Sum of Squares ($SS_B$)**:
$$
SS_B = \sum n_j (\bar{X}_j - \bar{X})^2
$$
$$
= 5(4.06 - 3.59)^2 + 5(3.8 - 3.59)^2 + 5(2.9 - 3.59)^2
$$
$$
= 5(0.47)^2 + 5(0.21)^2 + 5(-0.69)^2
$$
$$
= 5(0.2209) + 5(0.0441) + 5(0.4761)
$$
$$
= 1.1045 + 0.2205 + 2.3805 = 3.7055
$$

**Within-Group Sum of Squares ($SS_W$)**:
$$
SS_W = \sum (X_{ij} - \bar{X}_j)^2
$$

Using each observation:

$$
SS_W = \sum_{A} (X_i - \bar{X}_A)^2 + \sum_{B} (X_i - \bar{X}_B)^2 + \sum_{C} (X_i - \bar{X}_C)^2
$$

$$
= (4.2 - 4.06)^2 + (3.9 - 4.06)^2 + (4.4 - 4.06)^2 + (4.1 - 4.06)^2 + (3.7 - 4.06)^2
$$
$$
+ (3.8 - 3.8)^2 + (4.1 - 3.8)^2 + (3.5 - 3.8)^2 + (4.0 - 3.8)^2 + (3.6 - 3.8)^2
$$
$$
+ (2.5 - 2.9)^2 + (3.2 - 2.9)^2 + (3.1 - 2.9)^2 + (2.8 - 2.9)^2 + (2.9 - 2.9)^2
$$

$$
= 0.0196 + 0.0256 + 0.1156 + 0.0016 + 0.1296 + 0 + 0.09 + 0.09 + 0.04 + 0.04 + 0.16 + 0.09 + 0.04 + 0.01 + 0
$$

$$
= 0.292 + 0.26 + 0.30 = 0.852
$$



**Step 4: Compute Degrees of Freedom (df)**
- **Between-group $ df_B = k - 1 = 3 - 1 = 2 $**
- **Within-group $ df_W = N - k = 15 - 3 = 12 $**



**Step 5: Compute Mean Squares (MS)**
$$
MS_B = \frac{SS_B}{df_B} = \frac{3.7055}{2} = 1.85275
$$
$$
MS_W = \frac{SS_W}{df_W} = \frac{0.852}{12} = 0.071
$$



**Step 6: Compute F-Ratio**
$$
F = \frac{MS_B}{MS_W} = \frac{1.85275}{0.071} \approx 26.1
$$



**Step 7: Compare F-Statistic with Critical Value**
Using an F-table with **df (2, 12)** and **α = 0.05**, the critical value is approximately **3.89**.

Since **26.1 > 3.89**, we **reject the null hypothesis** and conclude that there is a **significant difference in weight loss** among the three diets.
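
As a cross-check on the hand arithmetic, the whole example can be recomputed from the raw data. The sketch below is standard-library only. For the p-value it uses a shortcut that is an assumption specific to this example: when the numerator degrees of freedom equal 2, the upper tail of the F-distribution has the closed form $P(F > f) = (1 + 2f/df_W)^{-df_W/2}$. For any other $df_B$ a statistics library or an F-table is needed instead.

```python
groups = [
    [4.2, 3.9, 4.4, 4.1, 3.7],   # Group A (Low-carb)
    [3.8, 4.1, 3.5, 4.0, 3.6],   # Group B (Mediterranean)
    [2.5, 3.2, 3.1, 2.8, 2.9],   # Group C (Vegan)
]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total   # unrounded grand mean
means = [sum(g) / len(g) for g in groups]

ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
df_b, df_w = k - 1, n_total - k
f_stat = (ssb / df_b) / (ssw / df_w)

# Special case only: closed-form upper tail of F when df_b == 2.
# This shortcut is NOT valid for other numerator degrees of freedom.
assert df_b == 2
p_value = (1 + 2 * f_stat / df_w) ** (-df_w / 2)

print(f"SSB = {ssb:.4f}, SSW = {ssw:.4f}")
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, p = {p_value:.2e}")
```

Because the code keeps the grand mean unrounded, its sums of squares can differ from a hand calculation in the third or fourth decimal place.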



**Step 8: Post-Hoc Analysis (If Required)**
Since we rejected $H_0$, we may use **Tukey's HSD** or the **Bonferroni correction** to determine which groups significantly differ.



### **Final Conclusion**
Based on One-Way ANOVA, there is enough evidence to conclude that different diet plans lead to different weight loss results. A deeper analysis (post-hoc tests) can determine which diets significantly differ.




<font color = "red">**📘 Post Hoc Tests (After ANOVA)**</font>

**🔍 What Are Post Hoc Tests?**

* Post hoc = *"after this"*. These tests are used **after running ANOVA** (Analysis of Variance).
* If ANOVA shows a **significant difference** between group means, it tells us **that at least two groups differ**, but **not which ones**.
* That's where **post hoc tests** come in. They help us compare **all group pairs** to see **which specific pairs** differ significantly.



**🧠 Why Can't We Just Run Multiple t-tests?**

Imagine you have 4 groups: A, B, C, and D.

You want to compare:

* A vs B  
* A vs C  
* A vs D  
* B vs C  
* B vs D  
* C vs D  

That's **6 comparisons**!

If you use the typical significance level of Œ± = 0.05 for **each**, there's a high chance of **making a Type I error** (false positive) just by luck. This is called the **family-wise error rate (FWER)** problem.
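
To put a number on "a high chance", the sketch below counts the pairwise comparisons for 4 groups and estimates the family-wise error rate as $1 - (1 - \alpha)^m$. That formula assumes the $m$ tests are independent, which pairwise tests that share groups are not, so treat the result as a rough approximation rather than an exact rate.

```python
from math import comb

groups = 4
alpha = 0.05
m = comb(groups, 2)          # number of pairwise comparisons

# Assumes independent tests; pairwise comparisons that share a group are
# correlated, so this is an approximation of the true FWER.
fwer = 1 - (1 - alpha) ** m
print(f"{m} comparisons -> FWER ≈ {fwer:.3f}")
```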



**🎯 Purpose of Post Hoc Tests**

* **Control the Family-Wise Error Rate (FWER)**: Make sure we don't accidentally claim there's a difference when there isn't.
* **Adjust significance levels** for multiple comparisons.
* **Tell you exactly which group means are different** after an ANOVA tells you that *some difference* exists.


**🔧 Common Post Hoc Tests**

**1. 📏 Bonferroni Correction**

* It's simple and conservative.
* Adjust the significance level by dividing it by the number of comparisons.

🔍 Formula:

$$
\alpha_{adjusted} = \frac{\alpha}{m}
$$

Where:

* $\alpha$ = original significance level (e.g., 0.05)  
* $m$ = number of pairwise comparisons (for $k$ groups, $m = \frac{k(k-1)}{2}$)

**✅ Use When:**

* You want a **safe**, straightforward method.  
* You're okay with being conservative (i.e., might miss some real differences).

**⚠️ Downside:**

* It becomes **too strict** if you have many comparisons → **lower power** to detect true differences.

🧪 Example:

You have 4 groups (A, B, C, D) → 6 comparisons.

If $\alpha = 0.05$, then:

$$
\alpha_{adjusted} = \frac{0.05}{6} \approx 0.0083
$$

Now, **only p-values less than 0.0083** are considered significant.
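
The same rule can be applied mechanically to a set of p-values. In the sketch below the six p-values are hypothetical, invented purely to show which comparisons survive the adjusted threshold; they do not come from any dataset in these notes.

```python
from itertools import combinations

alpha = 0.05
labels = ["A", "B", "C", "D"]
pairs = list(combinations(labels, 2))       # the 6 pairwise comparisons
alpha_adjusted = alpha / len(pairs)

# Hypothetical p-values, one per pair, invented purely for illustration
p_values = [0.001, 0.020, 0.300, 0.007, 0.049, 0.150]

significant = [pair for pair, p in zip(pairs, p_values) if p < alpha_adjusted]
print(f"adjusted alpha = {alpha_adjusted:.4f}")
print("significant pairs:", significant)
```

Note that 0.020 and 0.049 would count as significant at the unadjusted 0.05 level but fail the Bonferroni threshold, which is the loss of power mentioned above.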



**2. 🍗 Tukey's HSD (Honestly Significant Difference)**

* It compares all possible group pairs, controlling the **FWER** just like Bonferroni.
* It's **less conservative** than Bonferroni, so **more power** to detect real differences.
* Requires:
  * **Equal sample sizes** in each group (the Tukey-Kramer variant relaxes this)  
  * **Equal variances** (assumption of homogeneity of variance)

**🔍 What Do We Mean by "Less Conservative"? (Bonferroni vs. Tukey's HSD)**

When we say **Tukey's HSD is less conservative than Bonferroni**, we mean it is **less strict** in determining statistical significance. It balances between detecting **real differences** and avoiding **false positives**.



**🧪 Example to Understand This:**

Let's say you're comparing **6 group pairs**:  
(A vs B, A vs C, A vs D, B vs C, B vs D, C vs D)



**📏 Bonferroni Correction:**

* Adjusts significance level like this:
  
  $$
  \alpha_{adjusted} = \frac{0.05}{6} \approx 0.0083
  $$

* You only accept **p-values < 0.0083** as significant.
* ✅ Very cautious → Protects against false positives (Type I error).
* ❌ But it may **miss real differences** → Lower power.



**🍗 Tukey's HSD:**

* Also adjusts for multiple comparisons.
* Uses a **studentized range distribution** to control FWER.
* Has no fixed adjusted α; when all pairs are compared, its effective per-comparison threshold is less strict than Bonferroni's (the exact value depends on the number of groups and the error degrees of freedom).
* ✅ **More power** to detect real differences.
* ❌ Slightly higher chance of false positives than Bonferroni, but still controlled.

**✅ In Summary:**

| Method           | Adjusted α | Strictness | Power to Detect Real Difference |
|------------------|------------|------------|----------------------------------|
| Bonferroni       | 0.05 / 6 ≈ 0.0083 | Very High  | Lower                            |
| Tukey's HSD      | Not fixed; set by the studentized range | High, but below Bonferroni | Higher |

**Tukey's HSD is less conservative**, meaning it rejects more readily than Bonferroni when all pairs are compared, so it is better at **finding true effects** while still keeping the family-wise error rate under control.



**🔍 Formula (FYI):**

$$
\text{HSD} = q \cdot \sqrt{\frac{MS_{within}}{n}}
$$

Where:

* $q$ = studentized range statistic (depends on number of groups and degrees of freedom)  
* $MS_{within}$ = mean square within groups (from ANOVA)  
* $n$ = sample size per group
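
To make the HSD formula concrete, the sketch below applies it to the diet data from Example 1 (three groups of five). The one value that cannot be computed with the standard library is $q$: the figure 3.77 used here is an assumption, read from a printed studentized range table for α = 0.05, 3 groups and 12 error degrees of freedom, and should be checked against your own table or software.

```python
from itertools import combinations
from math import sqrt

diets = {
    "A": [4.2, 3.9, 4.4, 4.1, 3.7],
    "B": [3.8, 4.1, 3.5, 4.0, 3.6],
    "C": [2.5, 3.2, 3.1, 2.8, 2.9],
}
n = 5                                    # observations per group (equal sizes)
means = {g: sum(v) / len(v) for g, v in diets.items()}

# MS_within = pooled within-group sum of squares / (N - k)
ssw = sum((x - means[g]) ** 2 for g, v in diets.items() for x in v)
ms_within = ssw / (len(diets) * n - len(diets))

# Assumption: q for alpha = 0.05, 3 groups, 12 df, taken from a table (about 3.77)
q = 3.77
hsd = q * sqrt(ms_within / n)

results = {}
for g1, g2 in combinations(diets, 2):
    diff = abs(means[g1] - means[g2])
    results[(g1, g2)] = diff > hsd       # True when the pair differs by more than HSD
    verdict = "differs" if results[(g1, g2)] else "no clear difference"
    print(f"{g1} vs {g2}: |diff| = {diff:.2f}, HSD = {hsd:.2f} -> {verdict}")
```

A pair is flagged only when the absolute difference between its two means exceeds the single HSD threshold, which is why equal group sizes matter for this form of the test.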

**✅ Use When:**

* You meet assumptions (equal sample size, equal variance)
* You want to know **which group means differ**

**🧪 Example:**

Let's say you ran a one-way ANOVA with 3 groups:

* Group A (mean = 50)  
* Group B (mean = 60)  
* Group C (mean = 55)

ANOVA says: ✅ "There's a significant difference"

Now you run Tukey's HSD:

* A vs B → significant (if difference > HSD threshold)  
* A vs C → maybe not significant  
* B vs C → maybe not significant  

It tells you **exactly which groups are different** in a way that controls for errors.


**✅ Summary Table for Revision**

| Test        | Adjusts for Multiple Comparisons | Assumptions                         | Conservative? | Power                        | Use When...                              |
| ----------- | -------------------------------- | ----------------------------------- | ------------- | ---------------------------- | ---------------------------------------- |
| Bonferroni  | Yes                              | No strong assumptions               | Yes           | Low (many groups = stricter) | You want simplicity and safety           |
| Tukey's HSD | Yes                              | Equal sample sizes, equal variances | Medium        | Higher                       | You meet assumptions and want more power |



**🔁 Final Recap**

1. **Run ANOVA** → Checks if *any* groups are different.  
2. If significant → Use **Post Hoc Tests** to find **which** groups differ.  
3. **Bonferroni**: Divide α by number of tests, very strict.  
4. **Tukey's HSD**: Good balance between error control and power, but needs equal sample sizes and variance.


<font color = "Red" size = 27>**Two - Way ANOVA**</font>

Coming Soon