### 🎯 **What is Hypothesis Testing?**

Hypothesis testing is like being a detective for data! 🕵️‍♂️ It’s a statistical method to decide whether a claim (hypothesis) about a population is supported by sample data. Think of it as answering: **“Is there enough evidence to believe this claim?”**


### 🌟 **Key Components of Hypothesis Testing:**

1. **🎯 Null Hypothesis ($H_0$)**:  
   - The starting assumption or default claim.  
   - Typically states **"no effect"** or **"no difference."**  
   - Example: **"The average test score is 75."**

2. **🔍 Alternative Hypothesis ($H_a$)**:  
   - The statement we aim to find evidence for.  
   - Opposite of the null hypothesis.  
   - Example: **"The average test score is NOT 75."**

3. **📏 Significance Level ($\alpha$)**:  
   - The threshold to decide if the results are statistically significant.  
   - Common values: **5% (0.05)** or **1% (0.01).**  
   - It’s the risk of rejecting $H_0$ when it’s actually true (Type I error).

4. **📊 Test Statistic**:  
   - A number calculated from the data.  
   - Shows how far the sample data is from $H_0$.  
   - Examples: **$z$-score, $t$-score.**

5. **🎲 P-Value**:  
   - The probability of observing the test results (or more extreme ones) assuming $H_0$ is true.  
   - **Low P-value (< $\alpha$)**: Reject $H_0$.  
   - **High P-value (> $\alpha$)**: Fail to reject $H_0$.

6. **✅ Conclusion**:  
   - Decide whether to reject or fail to reject $H_0$ based on the P-value.



### 🧪 **Steps in Hypothesis Testing**:

1. **Define Hypotheses**:  
   Example:  
   - **$H_0$**: The average cookie weight is 50 grams. 🍪  
   - **$H_a$**: The average cookie weight is NOT 50 grams.

2. **Set Significance Level ($\alpha$)**:  
   - Example: Choose **$\alpha = 0.05$ (5%)**.

3. **Collect Data**:  
   - Example: Weigh a sample of cookies (say, 30 cookies). 🍪

4. **Compute Test Statistic**:  
   - Calculate how different the sample data is from $H_0$.

5. **Find the P-Value**:  
   - Use statistical software or tables to find the probability of a test statistic at least as extreme as the one observed, assuming $H_0$ is true.

6. **Make a Decision**:  
   - **If P-value ≤ $\alpha$:** Reject $H_0$ 🎉  
   - **If P-value > $\alpha$:** Fail to reject $H_0$ 🤔  

7. **Draw a Conclusion**:  
   - Example: “The cookies likely don’t weigh 50 grams on average.”
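
The decision rule in step 6 can be sketched in a few lines of Python. This is a minimal illustration, not a full test: the function name `decide` and the sample p-values are assumptions made up for the example.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical p-values, chosen only to show both outcomes.
print(decide(0.036))  # below alpha = 0.05
print(decide(0.20))   # above alpha = 0.05
```

Note that the second outcome is worded "fail to reject", not "accept": the rule never confirms $H_0$.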



### 🔁 **Types of Hypothesis Tests:**

1. **👉 One-Tailed Test**:  
   Tests if a parameter is **greater than** or **less than** a certain value.  
   Example:  
   - $H_0$: "Mean ≤ 75."  
   - $H_a$: "Mean > 75."

2. **🔄 Two-Tailed Test**:  
   Tests if a parameter is **different (either higher or lower)** from a certain value.  
   Example:  
   - $H_0$: "Mean = 75."  
   - $H_a$: "Mean ≠ 75."



### 🍪 **Example in Layman’s Terms: The Cookie Detective!**

You claim: **"My cookies weigh 50 grams on average."** A skeptical customer decides to test your claim.

1. **Null Hypothesis ($H_0$)**: Your claim is true (cookies weigh 50g).  
2. **Alternative Hypothesis ($H_a$)**: Your claim is false (cookies don’t weigh 50g).  
3. **Significance Level ($\alpha$)**: The customer sets $\alpha = 0.05$.  
4. **Data Collection**: The customer weighs 30 cookies.  
5. **Test Statistic**: A calculation based on the weights.  
6. **P-Value**: Probability of getting weights at least this far from 50g if $H_0$ is true.  
7. **Conclusion**:  
   - **If P ≤ 0.05**: The customer rejects $H_0$, saying, “Your cookies don’t weigh 50g!”  
   - **If P > 0.05**: The customer fails to reject $H_0$: the data don’t contradict the claim, though they don’t prove it either.

### 🎨 **Visual Summary:**

| **Component**          | **Explanation**                           | **Example**                   |
|-------------------------|-------------------------------------------|-------------------------------|
| **Null Hypothesis ($H_0$)** | The default claim                     | "Cookies weigh 50g." 🍪        |
| **Alternative Hypothesis ($H_a$)** | What we seek evidence for         | "Cookies don’t weigh 50g."    |
| **Significance Level ($\alpha$)** | Risk of false rejection            | 5% (0.05)                     |
| **P-Value**             | Evidence against $H_0$                 | P = 0.036 (Reject $H_0$)    |
| **Decision**            | Reject or Fail to Reject $H_0$         | "Cookies likely ≠ 50g." 🎉    |




### 💡 **Common Mistakes to Avoid:**

1. **Confusing "Failing to Reject $H_0$" with "Accepting $H_0$"**:  
   - Failing to reject $H_0$ doesn’t mean $H_0$ is true! It just means the evidence isn’t strong enough.

2. **Ignoring Sample Size**:  
   - Small sample sizes can lead to unreliable results. 📉

3. **Misinterpreting P-Values**:  
   - A low P-value doesn’t mean $H_a$ is 100% true, and it isn’t the probability that $H_0$ is true. It means data this extreme would be unlikely if $H_0$ were true.

---

### 🚦 **Rejection Region Approach in Hypothesis Testing**

The **Rejection Region Approach** is a classic method used in hypothesis testing to decide whether to reject the null hypothesis ($H_0$). It’s like drawing a boundary on a graph 📉—if the test statistic falls in the **"Rejection Region"**, we reject $H_0$.  

Let’s break it down step by step with a splash of color 🌈!



### 🌟 **Key Concepts in Rejection Region Approach**

1. **Rejection Region** 🎯:  
   - This is the set of test statistic values, in the tail(s) of the distribution, that would be unlikely to occur if the null hypothesis ($H_0$) were true.  
   - If your test statistic lands here, it’s a signal to reject $H_0$. 🚫  

2. **Critical Value** 🚩:  
   - The cutoff point(s) that define the boundary of the rejection region.  
   - Determined based on the **significance level ($\alpha$)**.  

3. **Test Statistic** 📊:  
   - A value calculated from the sample data.  
   - If this value is more extreme than the critical value, it’s in the rejection region.

4. **Decision Rule** ✅:  
   - **If Test Statistic is in the Rejection Region:** Reject $H_0$.  
   - **If Test Statistic is outside the Rejection Region:** Fail to reject $H_0$.  



### 🧪 **Steps in Rejection Region Approach**

1. **State the Hypotheses** 🎯:  
   - Define $H_0$ and $H_a$ based on the problem.  

2. **Choose the Significance Level ($\alpha$)** 📏:  
   - Example: $\alpha = 0.05$.  

3. **Determine the Critical Value(s)** 🚩:  
   - Use statistical tables (e.g., $z$-table, $t$-table) or software to find the cutoff.  

4. **Calculate the Test Statistic** 📊:  
   - Compute the test statistic based on the sample data.  

5. **Compare Test Statistic to Critical Value(s)** 🆚:  
   - Check if the test statistic falls within the rejection region.  

6. **Make a Decision** ✅:  
   - **In Rejection Region:** Reject $H_0$.  
   - **Outside Rejection Region:** Fail to reject $H_0$.  



### 🔄 **Types of Rejection Regions**

1. **One-Tailed Test (Left-Tail or Right-Tail)** 👉:  
   - **Left-Tailed Test:** Reject $H_0$ if the test statistic is less than the critical value.  
   - **Right-Tailed Test:** Reject $H_0$ if the test statistic is greater than the critical value.  

   **Example:** Testing if the mean is less than a specific value.

   **Graph**:  
   ```
   |Rejection Region          | (One tail)
           ^
       Critical Value
   ```

2. **Two-Tailed Test** 🔄:  
   - Reject $H_0$ if the test statistic is **too low** or **too high** compared to the critical values.  

   **Example:** Testing if the mean is **not equal** to a specific value.

   **Graph**:  
   ```
   Rejection Region |-|-Rejection Region (Two tails)
                            ^
                      Critical Values
   ```



### 🍪 **Example: Cookie Weights**

Imagine you claim your cookies weigh **50g on average**. Someone doubts it and conducts a test:  

1. **Null Hypothesis ($H_0$)**: Cookies weigh 50g ($\mu = 50$).  
2. **Alternative Hypothesis ($H_a$)**: Cookies don’t weigh 50g ($\mu \neq 50$).  

#### **Step-by-Step Using Rejection Region:**

1. **Choose $\alpha = 0.05$** (5% significance level).  
2. **Two-Tailed Test** → Divide $\alpha$ into two tails: $0.025$ in each.  
3. **Find Critical Values**:  
   - Using a $z$-table with $\alpha/2 = 0.025$ in each tail, the critical values are $-1.96$ and $+1.96$.  
4. **Calculate Test Statistic**:  
   - Suppose the test statistic is $z = -2.5$.  
5. **Compare Test Statistic to Critical Values**:  
   - $z = -2.5$ falls in the **left rejection region** ($z < -1.96$).  
6. **Conclusion**:  
   - Reject $H_0$! The cookies don’t weigh 50g on average.  
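
The steps above can be sketched with Python's standard library. This is a minimal sketch: `NormalDist` supplies the standard normal distribution, and the variable names are illustrative assumptions.

```python
from statistics import NormalDist

alpha = 0.05
z_stat = -2.5  # test statistic assumed in the example above

# Two-tailed test: put alpha/2 in each tail of the standard normal distribution.
upper_critical = NormalDist().inv_cdf(1 - alpha / 2)
lower_critical = -upper_critical

# The rejection region is everything beyond the two critical values.
in_rejection_region = z_stat < lower_critical or z_stat > upper_critical

print(f"critical values: {lower_critical:.2f} and {upper_critical:.2f}")
print("reject H0" if in_rejection_region else "fail to reject H0")
```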



### 🎨 **Visual Representation**  

#### Two-Tailed Test  
```
Rejection Region |   Fail to Reject H0   | Rejection Region
               -1.96         0         +1.96
        ^
 Test Statistic (-2.5)
```

#### One-Tailed Test (Right-Tail)  
```
                       |Rejection Region
                                    +1.645
                                      ^
                            Test Statistic (+2.0)
```



### 🔔 **Key Points to Remember**

1. **Rejection Region Depends on $\alpha$:**  
   - Smaller $\alpha$ → Narrower rejection region.  
   - Larger $\alpha$ → Wider rejection region.

2. **Critical Values Depend on the Test Type:**  
   - **$z$-test:** Standard normal distribution.  
   - **$t$-test:** Student’s $t$-distribution (used when $\sigma$ is unknown and estimated from the sample; the difference from $z$ matters most for small samples).

3. **Rejection ≠ Proof of $H_a$:**  
   - Rejecting $H_0$ means the data provides strong evidence against it, not absolute proof of $H_a$.
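
To see how the rejection region changes with $\alpha$, here is a small sketch that computes two-tailed $z$ critical values for a few significance levels. The helper name `two_tailed_critical_z` is an assumption for illustration.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha: float) -> float:
    """Positive critical value that leaves alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# A smaller alpha pushes the critical value outward, shrinking the rejection region.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> reject H0 if |z| > {two_tailed_critical_z(alpha):.3f}")
```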

---

![](rejection_region.png)

---

### 🎯 **One-Sided vs. Two-Sided Tests in Hypothesis Testing**

When performing hypothesis testing, you must decide whether you’re looking for a specific direction in your results (**one-sided test**) or any significant difference, regardless of direction (**two-sided test**). Let’s explore the differences between the two types of tests with a colorful, interactive explanation! 🌈



### 🌟 **One-Sided Test**

A **one-sided test** is used when you want to check for a difference in a specific direction—**either greater than** or **less than** a given value.  

#### 💡 Key Features:
- Focuses on **one tail** of the distribution.
- The entire significance level ($\alpha$) is concentrated in that one tail.
- Typically used when there’s a strong reason to expect a change in one direction.

#### 🧮 **Example Scenarios:**
- Testing if a new drug **increases** recovery rates compared to the standard treatment.
- Determining if a machine produces **less than** 500 defective items per batch.

#### 🔢 **Hypotheses for a One-Sided Test**:
1. **Right-tailed Test** (Checking if the value is **greater**):  
   $$
   H_0: \mu \leq \mu_0 \quad \text{vs.} \quad H_a: \mu > \mu_0
   $$
2. **Left-tailed Test** (Checking if the value is **less**):  
   $$
   H_0: \mu \geq \mu_0 \quad \text{vs.} \quad H_a: \mu < \mu_0
   $$

#### 🎨 **Graph for One-Sided Test**  
For a **right-tailed test** ($H_a: \mu > \mu_0$):  
```
        |---------------------------|--- Rejection Region
                                    ^
                                Critical Value
```

For a **left-tailed test** ($H_a: \mu < \mu_0$):  
```
Rejection Region ---|---------------------------|
                    ^
                Critical Value
```



### 🌟 **Two-Sided Test**

A **two-sided test** is used when you want to check for any **difference** (either higher or lower) from a given value.  

#### 💡 Key Features:
- Focuses on **both tails** of the distribution.
- The significance level ($\alpha$) is split equally between the two tails.
- Typically used when there’s no specific direction expected in the change.

#### 🧮 **Example Scenarios:**
- Testing if a new teaching method changes the average test scores (could be higher or lower).
- Determining if a batch of products has a mean weight **different** from 1 kg.

#### 🔢 **Hypotheses for a Two-Sided Test**:
$$
H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \neq \mu_0
$$

#### 🎨 **Graph for Two-Sided Test**  
```
Rejection Region ---|--------------------|--- Rejection Region
                  Lower                Upper
                 Critical             Critical
                  Value                Value
```



### 🔑 **Key Differences Between One-Sided and Two-Sided Tests**

| Feature             | One-Sided Test                           | Two-Sided Test                        |
|---------------------|------------------------------------------|---------------------------------------|
| **Direction**        | Tests in **one specific direction**.     | Tests for **any difference** (higher or lower). |
| **Significance Level** | Entire $\alpha$ is in **one tail**.    | $\alpha$ is **split between two tails**. |
| **Critical Region**  | Located in **one tail** of the distribution. | Located in **both tails**.            |
| **Hypotheses**       | Focus on $>$ or $<$.                 | Focus on $\neq$.                   |
| **Sensitivity**      | More **sensitive** to detecting differences in the specified direction. | Less sensitive but captures differences in both directions. |
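
The "Sensitivity" row can be illustrated with a short sketch. The test statistic `z_stat = 1.8` is a hypothetical value chosen because it lands between the one-sided and two-sided cutoffs.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05
z_stat = 1.8  # hypothetical test statistic

# One-sided (right-tailed): all of alpha sits in the upper tail.
right_tailed_critical = std_normal.inv_cdf(1 - alpha)
# Two-sided: alpha is split, so each tail gets alpha/2.
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)

reject_one_sided = z_stat > right_tailed_critical
reject_two_sided = abs(z_stat) > two_tailed_critical

print(f"one-sided cutoff {right_tailed_critical:.3f}: reject = {reject_one_sided}")
print(f"two-sided cutoff {two_tailed_critical:.3f}: reject = {reject_two_sided}")
```

The same data rejects $H_0$ in the one-sided test but not in the two-sided one, which is why the direction should be chosen before looking at the data.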



### 🧪 **Example: Cookie Weights**

Let’s say you’re testing if your cookies weigh **exactly 50g** or not.

#### **Scenario 1: One-Sided Test (Right-Tailed)**  
You believe your cookies weigh **more than** 50g.

- $H_0: \mu \leq 50$ (Cookies weigh 50g or less).  
- $H_a: \mu > 50$ (Cookies weigh more than 50g).  

If the test statistic is **greater than the critical value**, you reject $H_0$.

#### **Scenario 2: Two-Sided Test**  
You suspect your cookies weigh **different** from 50g, but you’re unsure if it’s higher or lower.

- $H_0: \mu = 50$ (Cookies weigh exactly 50g).  
- $H_a: \mu \neq 50$ (Cookies weigh something other than 50g).  

If the test statistic falls in **either tail** (too high or too low), you reject $H_0$.



### 🎨 **Visual Representation**

#### **One-Sided Test (Right-Tailed)**
```
                          |------------ Rejection Region
Critical Value --->     ^
```

#### **Two-Sided Test**
```
Rejection Region ---|---|-----------------|---|--- Rejection Region
                   Lower                  Upper
                   Critical               Critical
                   Value                  Value
```



### 🛠️ **When to Use Which Test?**

1. **Use One-Sided Test When**:  
   - You have a specific direction in mind.
   - Example: Checking if a new drug increases recovery rates.

2. **Use Two-Sided Test When**:  
   - You’re open to differences in **any direction**.  
   - Example: Checking if a new drug has a different effect, regardless of whether it’s better or worse.

---

![](one_sided.png)

---

### 🌟 **What is a Z-Test? (Simplified!)**

Think of a z-test as a way to **check if a claim about a group of things makes sense**. Imagine you’re testing if cookies from a bakery really weigh 50g on average, as the bakery claims. A z-test helps you figure out if your measurements (sample data) agree with the bakery's claim about the population mean.



### 🔑 **Key Ideas Behind a Z-Test**

1. **Population**: The whole group you’re studying (e.g., all cookies in the bakery).
2. **Sample**: A smaller group you test (e.g., 30 cookies you randomly pick).
3. **Null Hypothesis ($H_0$)**: The bakery’s claim is correct (cookies = 50g).
4. **Alternative Hypothesis ($H_a$)**: The bakery’s claim is wrong (cookies ≠ 50g).



### 🛠️ **How Does a Z-Test Work?**

It compares:
1. **The claim (population mean)** against
2. **What you actually measured (sample mean)**.



### 🚀 **When Do You Use a Z-Test?**

Use it when:
- 🧮 The data is **normally distributed** (or your sample size is large, $n \geq 30$).
- 📊 You **know the population standard deviation** ($\sigma$).



### 🧪 **How to Conduct a Z-Test**

1. **State Your Hypotheses**:
   - Null Hypothesis ($H_0$): “The claim is true.”
   - Alternative Hypothesis ($H_a$): “The claim is false.”

2. **Decide the Significance Level ($\alpha$)**:
   - Example: $\alpha = 0.05$ (you accept a 5% chance of rejecting $H_0$ when it is actually true).

3. **Calculate the Z-Statistic**:
   Use the formula:
   $$
   Z = \frac{\text{Sample Mean} - \text{Population Mean}}{\text{Standard Error}}
   $$
   **Standard Error**:
   $$
   \text{Standard Error} = \frac{\sigma}{\sqrt{n}}
   $$

4. **Find the Critical Value**:
   - Use a z-table or standard chart.
   - For $\alpha = 0.05$ (two-tailed test): Critical values are $-1.96$ and $1.96$.

5. **Make a Decision**:
   - If the z-statistic lies in the rejection region (beyond critical values), reject $H_0$.



### 🍪 **Let’s Simplify with a Cookie Example**

#### Problem:
You think the bakery’s cookies don’t weigh 50g on average. You measure 30 cookies and find:
- Sample Mean ($\bar{X}$) = 48g
- Claimed Population Mean ($\mu$) = 50g
- Population Standard Deviation ($\sigma$) = 4g (assumed known)

Test this at a 5% significance level ($\alpha = 0.05$).



#### Solution:

1. **State Hypotheses**:
   - $H_0$: Cookies weigh 50g ($\mu = 50$).
   - $H_a$: Cookies don’t weigh 50g ($\mu \neq 50$).

2. **Significance Level**: $\alpha = 0.05$.

3. **Calculate Z-Statistic**:
   $$
   Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}
   $$
   Substituting values:
   $$
   Z = \frac{48 - 50}{\frac{4}{\sqrt{30}}} = \frac{-2}{0.73} \approx -2.74
   $$

4. **Find Critical Values**:
   - For $\alpha = 0.05$ (two-tailed): Critical values are $-1.96$ and $1.96$.

5. **Decision**:
   - $Z = -2.74$, which is less than $-1.96$.
   - **Reject $H_0$**. The cookies don’t weigh 50g on average.
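
The arithmetic in this solution can be checked with a short sketch. The function name `z_statistic` is an assumption for illustration; the numbers are the ones given in the problem.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean: float, claimed_mean: float, sigma: float, n: int) -> float:
    """Z = (sample mean - claimed mean) / (sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - claimed_mean) / standard_error


z = z_statistic(sample_mean=48, claimed_mean=50, sigma=4, n=30)
critical = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed, alpha = 0.05

print(f"z = {z:.2f}, critical values = ±{critical:.2f}")
print("reject H0" if abs(z) > critical else "fail to reject H0")
```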

---

![](z-test.png)

---

### 🌟 **What is a P-Value?** 🌟

A **p-value** is a probability that helps us decide whether the evidence we have from a sample is strong enough to reject the null hypothesis in statistical hypothesis testing. 

To put it simply, the p-value is **how likely** you would see your results (or something more extreme) **if the null hypothesis were true**.



### 🔑 **Key Concept: Null Hypothesis and P-Value**

1. **Null Hypothesis ($H_0$)**: This is a statement we want to test. It often represents the status quo or a claim we want to challenge. For example, "The average weight of cookies is 50g."

2. **P-Value**: The probability of obtaining results at least as extreme as the ones you actually observed, **given that the null hypothesis is true**.



### 📊 **What Does the P-Value Tell Us?**

- **Small p-value (< 0.05)**: If the p-value is smaller than the significance level $ \alpha $ (typically 0.05), it suggests that the observed data is **unlikely** under the null hypothesis, **so we reject $H_0$**.
  
- **Large p-value (≥ 0.05)**: If the p-value is larger than the significance level $ \alpha $, it suggests that the observed data is **consistent** with the null hypothesis, **so we fail to reject $H_0$**.



### 🚀 **Interpreting P-Value in Action**

Let's use a **cookie example** to visualize it:

#### Problem:
You are testing the claim that cookies weigh 50g on average, and you have a sample of cookies with a sample mean of 48g. You want to check if this evidence is strong enough to reject the bakery’s claim at a 5% significance level ($ \alpha = 0.05 $).

- Null Hypothesis ($H_0$): Cookies weigh 50g on average ($ \mu = 50 $).
- Alternative Hypothesis ($H_a$): Cookies do not weigh 50g on average ($ \mu \neq 50 $).

#### Solution:

1. **Calculate the Z-Statistic**: 
   $$
   Z = \frac{\bar{X} - \mu}{\text{SE}} = \frac{48 - 50}{0.73} = -2.74
   $$

2. **Look up the p-value**: For this two-tailed test, the p-value is twice the tail area beyond the z-statistic: the area to the left of **z = -2.74**, doubled (for a positive z, double the area to the right). 

   - From the z-table or using a calculator, you’ll find the **p-value ≈ 0.006**.



### 🔍 **What Does the P-Value of 0.006 Mean?**

Since **0.006** is **less than 0.05**, it means the observed sample mean of 48g is **highly unlikely** if the true average weight of the cookies were 50g.

So, **we reject the null hypothesis ($H_0$)** and conclude that the cookies **do not** weigh 50g on average.



### 🌈 **Visualizing the P-Value**

Let’s visualize what happens with a z-test and the p-value:

1. **Z-Distribution Curve**: This shows the distribution of the test statistic (Z-value).
2. **Rejection Region**: If the p-value is smaller than $\alpha$, the test statistic falls in the tail (rejection region).
3. **P-Value Area**: The p-value is the tail area beyond the test statistic. The smaller that area, the stronger the evidence against $H_0$.

---

Let's walk through the process of **calculating the p-value** step by step using the z-statistic we found earlier.

### Step-by-Step Guide to Calculate P-Value

1. **Z-Statistic Calculation** (we already calculated this):
   $$
   Z = \frac{\bar{X} - \mu}{\text{Standard Error}} = \frac{48 - 50}{0.73} = -2.74
   $$
   So, the z-statistic is **-2.74**.

2. **Understanding the P-Value**:
   - The p-value corresponds to the **area under the standard normal curve** that is **more extreme** than the z-statistic.
   - Since we're doing a **two-tailed test**, we want to find the probability (p-value) that the test statistic is **either less than -2.74** or **greater than +2.74**.



### How to Find the P-Value:

- **Step 1**: Look up the z-statistic **-2.74** in a **z-table** or use a statistical function to get the cumulative probability for this value.
  
  In the **z-table**:
  - If the table lists negative z-values, read the **area to the left** of **z = -2.74** directly: approximately **0.0031**.
  - If the table lists only positive z-values, look up **2.74**: the area to the left is approximately **0.9969**, so the area to the right is **1 - 0.9969 = 0.0031**. By symmetry, this equals the area to the left of **-2.74**.

- **Step 2**: Either way, the area to the left of **z = -2.74** is **0.0031**, which represents the probability of getting a value less than **-2.74**.

- **Step 3**: Multiply that probability by **2** (because it's a two-tailed test) to account for both sides of the distribution:
  $$
  \text{p-value} = 2 \times 0.0031 = 0.0062
  $$
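
As a cross-check on the table lookup, the same p-value can be computed with Python's standard library. This is a minimal sketch; the helper name `two_tailed_p_value` is an assumption for illustration.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """Probability of a standard normal value at least as far from 0 as z."""
    one_tail = NormalDist().cdf(-abs(z))  # area beyond |z| in one tail
    return 2 * one_tail


p = two_tailed_p_value(-2.74)
print(f"p-value = {p:.4f}")
```

The unrounded result is about 0.006, marginally below the hand-calculated 0.0062 because the table value 0.0031 is itself rounded.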



### Why is the P-Value ~0.006?

- The p-value of **0.0062** means there's a **0.62% chance** of observing a sample mean at least as far from 50g as **48g** if the true population mean were 50g.
- Since **0.0062** is **smaller than 0.05** (our significance level), it tells us the evidence **strongly suggests** that the true mean is **not** 50g, so we **reject the null hypothesis**.



### TL;DR
- The **z-statistic** of **-2.74** corresponds to a **cumulative probability of 0.0031** (for the left tail).
- Multiply by 2 for the two-tailed test: **p-value ≈ 0.006**.
- A **small p-value (< 0.05)** means we reject the null hypothesis.

---

### 🌟 **What is a T-Test?** 🌟

A **t-test** is a statistical test used to compare the means of two groups or a sample mean with a population mean. It's particularly useful when the **sample size is small** and the **population standard deviation is unknown** (which is often the case in real-life situations). The t-test helps us determine if there is a significant difference between the groups being compared.



### 🔑 **Key Concepts in T-Test** 🔑

1. **Null Hypothesis ($H_0$)**: The idea that there is **no significant difference** between the groups or the sample mean and the population mean. For example, "The average height of students in class A is the same as class B."
2. **Alternative Hypothesis ($H_a$)**: The idea that there **is a significant difference** between the groups or the sample mean and the population mean. For example, "The average height of students in class A is different from class B."



### 🧪 **Types of T-Tests** 🧪

1. **One-Sample T-Test**: 
   - Compares the sample mean to a known population mean.
   - Example: You want to check if the average weight of cookies in your sample is different from the claimed 50g.

2. **Independent Two-Sample T-Test**: 
   - Compares the means of two independent groups.
   - Example: You compare the average height of male and female students in a class.

3. **Paired Sample T-Test**: 
   - Compares the means from the **same group** at two different times or conditions.
   - Example: You test if students’ test scores improve after attending a study session.



### 🛠️ **How Does a T-Test Work?** 🛠️

The **t-test** involves calculating the **t-statistic**, which is a measure of the difference between the sample mean and the population mean (or the means of two groups), **normalized by the variation in the data**.

The formula for the **t-statistic** is:

$$
t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}}
$$

Where:
- **$\bar{X}$** = sample mean
- **$\mu$** = hypothesized population mean. (This is the one-sample form; a two-sample t-test instead divides the difference between the two group means by the standard error of that difference.)
- **$s$** = sample standard deviation
- **$n$** = sample size
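
Here is a minimal sketch of the one-sample formula applied to raw data. The list `weights` is hypothetical data invented for the example, and `one_sample_t` is an illustrative name, not a library function.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample: list[float], mu0: float) -> float:
    """t = (sample mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev divides by n - 1
    return (mean(sample) - mu0) / standard_error


# Hypothetical cookie weights in grams.
weights = [48.2, 51.0, 47.5, 49.1, 46.8, 50.3, 47.9, 48.6]
t = one_sample_t(weights, mu0=50)
print(f"t = {t:.2f} with df = {len(weights) - 1}")
```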



### 🚀 **When Do You Use a T-Test?** 🚀

Use a **t-test** when:
- The **sample size** is small (typically $n < 30$); the test remains valid for larger samples, where it gives results close to a z-test.
- The **population standard deviation** is **unknown**.
- You want to compare means between **one or two groups**.



### 📊 **Steps to Perform a T-Test:**

1. **State Your Hypotheses**:
   - **Null Hypothesis ($H_0$)**: There is no difference (e.g., the means are equal).
   - **Alternative Hypothesis ($H_a$)**: There is a difference (e.g., the means are not equal).

2. **Choose the Significance Level ($\alpha$)**: 
   - Common choices are **0.05** (5% risk of Type I error) or **0.01** (1% risk).

3. **Calculate the T-Statistic**: 
   - Using the formula above, calculate the t-statistic for your data.

4. **Find the Critical Value**:
   - Use a **t-distribution table** or a calculator to find the critical t-value for your chosen significance level ($\alpha$) and degrees of freedom ($df = n - 1$).

5. **Make a Decision**:
   - Compare your **calculated t-statistic** with the **critical t-value**:
     - If the **absolute value of the t-statistic** is **greater than the critical value**, reject the **null hypothesis**.
     - Otherwise, fail to reject the null hypothesis.



### 🍪 **T-Test with a Cookie Example**

#### Problem:
You have a batch of **30 cookies** and you want to check if the **average weight** is different from the bakery’s claim of **50g**. After measuring the cookies, you find:
- **Sample Mean ($\bar{X}$) = 48g**
- **Sample Standard Deviation ($s$) = 4g**
- **Sample Size ($n$) = 30**

You’ll test the null hypothesis $H_0$: "The cookies weigh 50g on average."

#### Solution:

1. **State Hypotheses**:
   - $H_0$: Cookies weigh 50g ($\mu = 50$).
   - $H_a$: Cookies do not weigh 50g ($\mu \neq 50$).

2. **Significance Level**: $\alpha = 0.05$.

3. **Calculate T-Statistic**:
   Using the formula for the t-statistic:
   $$
   t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}} = \frac{48 - 50}{\frac{4}{\sqrt{30}}} = \frac{-2}{0.73} \approx -2.74
   $$

4. **Find Critical Value**:
   - For a **two-tailed test** with **30 - 1 = 29** degrees of freedom and a **5% significance level**, you can use a t-table or statistical software to find the **critical t-value ≈ ±2.045**.

5. **Decision**:
   - Since **t = -2.74** and the critical value is **±2.045**, the **t-statistic falls in the rejection region** (since $-2.74 < -2.045$), so we **reject the null hypothesis**.

**Conclusion**: The cookies do not weigh 50g on average.
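
The cookie calculation can be reproduced with a short sketch. Python's standard library has no t-distribution, so the critical value is hard-coded from the t-table rather than computed; the variable names are illustrative assumptions.

```python
from math import sqrt

# Summary statistics from the problem statement.
sample_mean, claimed_mean = 48.0, 50.0
sample_sd, n = 4.0, 30

t_stat = (sample_mean - claimed_mean) / (sample_sd / sqrt(n))
df = n - 1

# Two-tailed critical value for alpha = 0.05 and df = 29, copied from a t-table.
t_critical = 2.045

reject = abs(t_stat) > t_critical
print(f"t = {t_stat:.2f}, df = {df}, critical = ±{t_critical}")
print("reject H0" if reject else "fail to reject H0")
```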

---

![](t-test.png)

---

Below is an explanation of how to read and use the **t-table** for different scenarios:

### Key Points:
- **Degrees of Freedom (df)**: For a **t-test**, degrees of freedom is calculated as **n - 1**, where **n** is the sample size.
- **One-Tailed and Two-Tailed**: You need to choose between these two based on your hypothesis. In a **two-tailed test**, we check both ends of the distribution.

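Here is an excerpt of a **t-table**. Each column header is a cumulative probability; the one-tail area is 1 minus that value (so the **0.95** column gives the one-tailed critical value for α = 0.05, and the **0.975** column gives the two-tailed critical value for α = 0.05):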

### Table: How to Check Values for t-test

| **df** | **0.50 (one-tail α = 0.50)** | **0.75 (one-tail α = 0.25)** | **0.80 (one-tail α = 0.20)** | **0.85 (one-tail α = 0.15)** | **0.90 (one-tail α = 0.10)** | **0.95 (one-tail α = 0.05)** | **0.975 (one-tail α = 0.025)** | **0.99 (one-tail α = 0.01)** | **0.995 (one-tail α = 0.005)** | **0.999 (one-tail α = 0.001)** | **0.9995 (one-tail α = 0.0005)** |
|--------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|---------------------|--------------------|--------------------|--------------------|----------------------|
| **1**  | 0.000              | 1.000              | 1.376              | 1.963              | 3.078              | 6.314              | 12.71               | 31.82              | 63.66              | 318.31             | 636.62               |
| **2**  | 0.000              | 0.816              | 1.061              | 1.386              | 1.886              | 2.920              | 4.303               | 6.965              | 9.925              | 22.327             | 31.599              |
| **3**  | 0.000              | 0.765              | 0.978              | 1.250              | 1.638              | 2.353              | 3.182               | 4.541              | 5.841              | 10.215             | 12.924              |
| **4**  | 0.000              | 0.741              | 0.941              | 1.190              | 1.533              | 2.132              | 2.776               | 3.747              | 4.604              | 7.173              | 8.610               |
| **5**  | 0.000              | 0.727              | 0.920              | 1.156              | 1.476              | 2.015              | 2.571               | 3.365              | 4.032              | 5.893              | 6.869               |
| **6**  | 0.000              | 0.718              | 0.906              | 1.134              | 1.440              | 1.943              | 2.447               | 3.143              | 3.707              | 5.208              | 5.959               |
| **29** | 0.000              | 0.683              | 0.854              | 1.055              | 1.311              | 1.699              | 2.045               | 2.462              | 2.756              | 3.396              | 3.659               |

### **How to Read the t-Table:**

#### **Steps:**
1. **Identify Degrees of Freedom (df):**  
   This depends on your **sample size (n)**. The formula for **df** is **n - 1**.
   
   For example, if you have **30** data points, then **df = 30 - 1 = 29**.
   
2. **Select One-Tailed or Two-Tailed Test:**
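   - For **one-tailed tests**, use the column whose one-tail area equals **α** (e.g., the **0.95** column for α = 0.05).
   - For **two-tailed tests**, use the column whose one-tail area equals **α/2** (e.g., the **0.975** column for α = 0.05).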
   
3. **Find the Significance Level (α)**:  
   The significance level (e.g., **0.05**, **0.01**) will help you identify the critical values. For example, for a **95% confidence level**, we use **α = 0.05** (split into two tails for a two-tailed test).

4. **Find the Corresponding t-Value:**
   - If you have **α = 0.05** for a **two-tailed test**, you'll look for the critical t-value that corresponds to **α = 0.025** on each side of the distribution.
   
5. **Compare the Calculated t-Statistic with Critical t-Value:**
   If the absolute value of your **calculated t-statistic** is greater than the critical value (for a two-tailed test), you reject the null hypothesis.



### **Highlighting Key Values:**
If you're working with a **two-tailed test** and **α = 0.05**:
- You would use the column whose one-tail area is **0.025** (cumulative probability **0.975**), since α is split across the two tails.
- **Critical t-value** for **df = 29** and **α = 0.05** (two-tailed test) would be **±2.045** (from the table).

For **one-tailed tests**, use the column whose one-tail area equals α. For example, for **α = 0.05** (the **0.95** column), the **critical t-value** would be **1.699** for **df = 29**.



### **Visual Representation for Two-Tailed Test:**

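| **df** | **Critical t-Value (Two-Tails, α=0.05)** | **t-statistic Range for Failing to Reject H₀** | **t-statistic Range for Rejecting H₀** |
|--------|----------------------------------------|--------------------------------------|--------------------------------------|
| **1**  | ±12.71                                  | $-12.71 \leq t \leq 12.71$ | $t < -12.71$ or $t > 12.71$ |
| **2**  | ±4.303                                  | $-4.303 \leq t \leq 4.303$ | $t < -4.303$ or $t > 4.303$ |
| **3**  | ±3.182                                  | $-3.182 \leq t \leq 3.182$ | $t < -3.182$ or $t > 3.182$ |
| **4**  | ±2.776                                  | $-2.776 \leq t \leq 2.776$ | $t < -2.776$ or $t > 2.776$ |
| **5**  | ±2.571                                  | $-2.571 \leq t \leq 2.571$ | $t < -2.571$ or $t > 2.571$ |
| **29** | ±2.045                                  | $-2.045 \leq t \leq 2.045$ | $t < -2.045$ or $t > 2.045$ |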

For **df = 29**, you can look at the value **±2.045** for **α = 0.05 (two-tailed)**.
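
A table lookup like this can be mirrored in code with a plain dictionary. This is a sketch under stated assumptions: the values are copied from the two-tailed table above, only the listed degrees of freedom are supported, and the names `CRITICAL_T_TWO_TAILED_05` and `reject_h0` are made up for the example.

```python
# Two-tailed critical t-values for alpha = 0.05, keyed by degrees of freedom.
CRITICAL_T_TWO_TAILED_05 = {1: 12.71, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 29: 2.045}


def reject_h0(t_stat: float, n: int) -> bool:
    """Compare |t| with the tabulated critical value for df = n - 1."""
    df = n - 1
    if df not in CRITICAL_T_TWO_TAILED_05:
        raise KeyError(f"df = {df} is not in this excerpt of the table")
    return abs(t_stat) > CRITICAL_T_TWO_TAILED_05[df]


print(reject_h0(-2.74, n=30))  # cookie example: df = 29
print(reject_h0(2.5, n=6))     # hypothetical small sample: df = 5
```

With only 6 observations, a t-statistic of 2.5 is not enough to reject $H_0$, because the critical value grows as the degrees of freedom shrink.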

---

## **Chi-Square Test**

The **Chi-Square test** is a statistical test used to determine if there is a significant association between categorical variables or if a dataset matches an expected distribution. It is commonly used in hypothesis testing to compare observed and expected frequencies.



### Types of Chi-Square Tests
1. **Chi-Square Test of Independence**: 
   - Used to test if two categorical variables are independent of each other.
   - Example: Is there a relationship between gender and voting preference?

2. **Chi-Square Goodness-of-Fit Test**:
   - Used to determine whether sample data match an expected distribution.
   - Example: Do observed dice rolls match the expected probabilities of a fair die?



### Formula for Chi-Square Test
$$
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}
$$
Where:
- $ O_i $: Observed frequency for category $ i $.
- $ E_i $: Expected frequency for category $ i $.



### Steps to Perform a Chi-Square Test
#### 1. **Set Up Hypotheses**
- **Null Hypothesis ($ H_0 $)**:
  - For independence: The variables are independent.
  - For goodness-of-fit: The observed data matches the expected distribution.
- **Alternative Hypothesis ($ H_a $)**:
  - For independence: The variables are not independent.
  - For goodness-of-fit: The observed data does not match the expected distribution.

#### 2. **Calculate Expected Frequencies**
- **For Test of Independence**:
  $$
  E_{ij} = \frac{\text{Row Total}_i \times \text{Column Total}_j}{\text{Grand Total}}
  $$
  Where $ E_{ij} $ is the expected frequency for cell $ (i, j) $.
- **For Goodness-of-Fit**:
  $$
  E_i = N \times P_i
  $$
  Where $ P_i $ is the expected proportion, and $ N $ is the total sample size.

#### 3. **Compute the Chi-Square Statistic**
Use the formula:
$$
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}
$$
Sum across all categories (for a test of independence, across all cells of the contingency table).

#### 4. **Determine Degrees of Freedom**
- **For Test of Independence**:
  $$
  df = (R - 1)(C - 1)
  $$
  Where $ R $ is the number of rows, and $ C $ is the number of columns in the contingency table.
- **For Goodness-of-Fit**:
  $$
  df = k - 1
  $$
  Where $ k $ is the number of categories.

#### 5. **Find the Critical Value**
- Use a Chi-Square distribution table with the degrees of freedom and significance level ($ \alpha $, usually 0.05).

#### 6. **Make a Decision**
- If $ \chi^2 $ (calculated) > $ \chi^2 $ (critical value): Reject $ H_0 $.
- Otherwise: Fail to reject $ H_0 $.
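
The six steps above can be sketched in code for a test of independence. This is an illustrative sketch, not a reference implementation: the function name `chi_square_independence` and the 2×2 table are hypothetical, and only NumPy and SciPy's `chi2.ppf` are assumed.

```python
import numpy as np
from scipy import stats


def chi_square_independence(observed, alpha=0.05):
    """Run the steps above on a contingency table of counts."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1)
    col_totals = observed.sum(axis=0)
    grand_total = observed.sum()

    # Step 2: expected count per cell = row total * column total / grand total
    expected = np.outer(row_totals, col_totals) / grand_total

    # Step 3: sum (O - E)^2 / E over every cell
    chi2_stat = ((observed - expected) ** 2 / expected).sum()

    # Step 4: df = (R - 1)(C - 1)
    n_rows, n_cols = observed.shape
    df = (n_rows - 1) * (n_cols - 1)

    # Step 5: critical value = point with area alpha to its right
    critical = stats.chi2.ppf(1 - alpha, df)

    # Step 6: reject H0 when the statistic exceeds the critical value
    return chi2_stat, df, critical, bool(chi2_stat > critical)


# Hypothetical 2x2 table of counts, made up for illustration only
table = [[30, 10],
         [20, 40]]
chi2_stat, df, critical, reject = chi_square_independence(table)
print(f"chi2 = {chi2_stat:.3f}, df = {df}, critical = {critical:.3f}, reject H0: {reject}")
```

This sketch applies no continuity correction, so for 2×2 tables its result can differ from library functions that apply one by default.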

### Example: Chi-Square Test of Independence
**Question**: Is there a relationship between gender and preference for a product?

|              | Product A | Product B | Product C | Total |
|--------------|-----------|-----------|-----------|-------|
| **Male**     | 20        | 15        | 25        | 60    |
| **Female**   | 30        | 25        | 35        | 90    |
| **Total**    | 50        | 40        | 60        | 150   |



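1. **Calculate Expected Frequencies**:
   For **Male, Product A**:
   $$
   E_{11} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}} = \frac{60 \times 50}{150} = 20
   $$
   Repeating this for all cells gives:
   - **Male**: 20 (Product A), 16 (Product B), 24 (Product C)
   - **Female**: 30 (Product A), 24 (Product B), 36 (Product C)

2. **Calculate $ \chi^2 $**:
   Apply the formula to all six cells:
   $$
   \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(20-20)^2}{20} + \frac{(15-16)^2}{16} + \frac{(25-24)^2}{24} + \frac{(30-30)^2}{30} + \frac{(25-24)^2}{24} + \frac{(35-36)^2}{36} \approx 0.174
   $$

3. **Degrees of Freedom**:
   $$
   df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 2
   $$

4. **Compare with Critical Value**:
   At $ \alpha = 0.05 $, critical value for $ df = 2 $ is 5.991.

5. **Decision**:
   - Rule: If $ \chi^2 $ > 5.991, reject $ H_0 $ (gender and product preference are not independent). Otherwise, fail to reject $ H_0 $.
   - Here, $ \chi^2 \approx 0.174 < 5.991 $, so we fail to reject $ H_0 $: the data show no significant association between gender and product preference.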
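
The same example can be run in code. The sketch below assumes SciPy's `chi2_contingency`, which takes the table of observed counts and returns the statistic, the p-value, the degrees of freedom, and the expected counts; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

# Observed counts from the table above (rows: Male, Female; columns: Product A, B, C)
observed = np.array([[20, 15, 25],
                     [30, 25, 35]])

# Returns the statistic, p-value, degrees of freedom, and expected counts
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

print("Expected counts:")
print(expected)
print(f"chi2 = {chi2_stat:.3f}, df = {dof}, p-value = {p_value:.3f}")

alpha = 0.05
if p_value < alpha:
    print("Reject H0: gender and product preference appear to be associated.")
else:
    print("Fail to reject H0: no significant evidence of an association.")
```

Comparing the p-value with $ \alpha $ leads to the same decision as comparing $ \chi^2 $ with the critical value.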



### Example: Chi-Square Goodness-of-Fit Test
**Question**: Is a die fair?  
Observed frequencies: [16, 14, 18, 12, 20, 10]  
Expected frequencies (for a fair die): [15, 15, 15, 15, 15, 15]

1. **Calculate $ \chi^2 $**:
   $$
   \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(16-15)^2 + (14-15)^2 + (18-15)^2 + (12-15)^2 + (20-15)^2 + (10-15)^2}{15} = \frac{70}{15} \approx 4.67
   $$

2. **Degrees of Freedom**:
   $$
   df = k - 1 = 6 - 1 = 5
   $$

3. **Critical Value**:
   At $ \alpha = 0.05 $, critical value for $ df = 5 $ is 11.07.

4. **Decision**:
   - Rule: If $ \chi^2 $ > 11.07, reject $ H_0 $ (the die is not fair). Otherwise, fail to reject $ H_0 $.
   - Here, $ \chi^2 \approx 4.67 < 11.07 $, so we fail to reject $ H_0 $: there is no significant evidence that the die is unfair.
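
The die example can be checked in code as well. This is a small sketch assuming SciPy's `chisquare` for the statistic and `chi2.ppf` for the critical value; the variable names are illustrative.

```python
from scipy import stats

observed = [16, 14, 18, 12, 20, 10]   # counts for faces 1 to 6 (90 rolls in total)
expected = [15] * 6                    # a fair die: 90 / 6 rolls per face

chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)

# Critical value for alpha = 0.05 with df = k - 1 = 5
critical = stats.chi2.ppf(0.95, df=len(observed) - 1)

print(f"chi2 = {chi2_stat:.2f}, critical = {critical:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if chi2_stat > critical else "Fail to reject H0")
```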



### Key Assumptions
1. Observations are independent.
2. Expected frequencies in each category should be at least 5.

---



### 🔎 **What is the Chi-Square Test?**
It’s a test to answer questions like:
1. Are two things related? (e.g., Does gender affect voting preference? 🤔)
2. Does my data look like what I expected? (e.g., Are the colors in a bag of M&Ms evenly distributed? 🍫)



### 🎲 **How Does It Work?**
1. **Collect Your Observed Data (O)**:
   These are the actual numbers you see in your data.

2. **Calculate the Expected Data (E)**:
   This is what you think the data should look like if your guess (hypothesis) is true.

3. **Compare Observed vs. Expected**:
   If they’re very different, something might be going on!



### 🧮 **The Formula**:
$$
\chi^2 = \sum \frac{(O - E)^2}{E}
$$

Think of it as:
1. Take the difference between observed and expected: $ O - E $.
2. Square it to make everything positive: $ (O - E)^2 $.
3. Divide by the expected value to account for size: $ \frac{(O - E)^2}{E} $.
4. Add it all up for every category. 🎉



### **Types of Chi-Square Tests**
#### 1. **Test of Independence** (Are two things related? 🤝)
   - Example: Does **gender** affect whether someone prefers Product A or Product B? 🛍️
   - You use a table of counts (contingency table) to check for relationships.

#### 2. **Goodness-of-Fit Test** (Does my data fit the expected distribution? 🎨)
   - Example: Does a die roll give equal numbers for 1, 2, 3, 4, 5, 6? 🎲
   - You compare your observed data to an expected distribution.

### 🌈 **Let’s Make It Fun With an Example!**
Imagine you have a bag of Skittles 🍬, and you want to see if the colors are evenly distributed.

#### 🛠️ Your Data:
| Color     | Observed (O) | Expected (E) |
|-----------|--------------|--------------|
| Red       | 12           | 10           |
| Green     | 15           | 10           |
| Blue      | 8            | 10           |
| Yellow    | 5            | 10           |
| Purple    | 10           | 10           |



#### 🎯 Step 1: Compute $ \chi^2 $:
For each color:
$$
\text{Chi-Square for each color} = \frac{(O - E)^2}{E}
$$
- Red: $ \frac{(12 - 10)^2}{10} = 0.4 $
- Green: $ \frac{(15 - 10)^2}{10} = 2.5 $
- Blue: $ \frac{(8 - 10)^2}{10} = 0.4 $
- Yellow: $ \frac{(5 - 10)^2}{10} = 2.5 $
- Purple: $ \frac{(10 - 10)^2}{10} = 0.0 $

Total $ \chi^2 $:
$$
\chi^2 = 0.4 + 2.5 + 0.4 + 2.5 + 0.0 = 5.8
$$



#### 🎯 Step 2: Find Degrees of Freedom ($ df $):
$$
df = k - 1
$$
Where $ k $ is the number of categories. Here:
$$
df = 5 - 1 = 4
$$



#### 🎯 Step 3: Compare With Critical Value:
- Look up a **Chi-Square Table** for $ df = 4 $ and $ \alpha = 0.05 $ (5% significance level). Critical value = **9.488**.
- If $ \chi^2 $ < 9.488, there is no significant evidence of an uneven distribution.
- If $ \chi^2 $ > 9.488, the observed counts differ significantly from the expected ones.

Here:
$$
\chi^2 = 5.8 < 9.488
$$
So, we **don’t reject the null hypothesis**. 🎉 There is no significant evidence that the Skittles colors are unevenly distributed.
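
To see which colors drive the statistic, the per-color terms can be computed one by one. This is a sketch with NumPy, using SciPy's `chi2.ppf` for the critical value; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

colors = ["Red", "Green", "Blue", "Yellow", "Purple"]
observed = np.array([12, 15, 8, 5, 10])
expected = np.full(5, 10.0)

# Each color's contribution to the statistic: (O - E)^2 / E
contributions = (observed - expected) ** 2 / expected
for color, value in zip(colors, contributions):
    print(f"{color:<7} {value:.1f}")

chi2_stat = contributions.sum()
df = len(colors) - 1
critical = stats.chi2.ppf(0.95, df)  # alpha = 0.05

print(f"chi2 = {chi2_stat:.1f}, df = {df}, critical = {critical:.3f}")
```

Green and Yellow contribute the most because their counts are furthest from the expected 10.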



### 🖍️ **Key Points to Remember**
1. A big $ \chi^2 $ value means your data and expectations are quite different. 🚨
2. Small $ \chi^2 $ means your data matches your expectations well. ✅
3. Always check degrees of freedom (df) and the Chi-Square table.

```python
import scipy.stats as stats

# Observed data (O) - number of Skittles of each color
observed = [12, 15, 8, 5, 10]

# Expected data (E) - assuming equal distribution
expected = [10, 10, 10, 10, 10]

# Perform Chi-Square Test
chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)

# Results
print(f"Chi-Square Statistic: {chi2}")
print(f"P-Value: {p_value}")

# Decision based on significance level (alpha = 0.05)
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The distribution is not as expected.")
else:
    print("Fail to reject the null hypothesis: No significant departure from the expected distribution.")

```

---

## 🎯 **What is ANOVA (Analysis of Variance)?**

- **ANOVA** is a statistical test that compares the means of **two or more groups** to see if they are **significantly different** from each other.  
- It answers: **"Do these groups have the same average?"**  
  Think of comparing the exam scores of students in three different classes. 🏫



## **Key Concepts**

1. **Null Hypothesis ($ H_0 $)**:  
   All group means are equal.  
   Example: The average exam scores of the three classes are the same.

2. **Alternative Hypothesis ($ H_1 $)**:  
   At least one group mean is different.  
   Example: At least one class has a different average exam score.

3. **F-Statistic**:  
   Measures how much the group means differ relative to the variability within the groups.  
   Larger $ F $-values indicate more difference between group means.

4. **P-Value**:  
   If $ p $-value < 0.05, reject $ H_0 $: There’s a significant difference.



## **How ANOVA Works (Simple Steps)**

1. **Calculate Group Means**: Find the average of each group.
2. **Measure Variability**:  
   - **Between Groups**: How much group means differ from the overall mean.  
   - **Within Groups**: How much individual data points differ within each group.
3. **Calculate F-Statistic**:  
   $ F = \frac{\text{Variance Between Groups}}{\text{Variance Within Groups}} $
4. **Compare P-Value to 0.05**: Decide if differences are significant.



## 🔢 **Types of ANOVA**

1. **One-Way ANOVA**:  
   Compares means of **one factor** (e.g., scores across 3 classes).  
2. **Two-Way ANOVA**:  
   Compares means of **two factors** (e.g., scores by class and gender).  
3. **Repeated Measures ANOVA**:  
   Measures the same group multiple times (e.g., performance before and after training).



## 📊 **Python Code for One-Way ANOVA**

Here’s a simple example with students’ scores from three classes.

```python
import scipy.stats as stats

# Scores from three classes
class_a = [85, 88, 92, 95, 90]
class_b = [78, 82, 80, 84, 79]
class_c = [92, 94, 89, 96, 91]

# Perform One-Way ANOVA
f_stat, p_value = stats.f_oneway(class_a, class_b, class_c)

# Results
print(f"F-Statistic: {f_stat:.2f}")
print(f"P-Value: {p_value:.4f}")

# Decision
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: At least one group mean is different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between group means.")
```



### 🧪 **Example Output**
```
F-Statistic: 21.14
P-Value: 0.0001
Reject the null hypothesis: At least one group mean is different.
```



## 🌟 **Key Takeaways**

- ANOVA helps you determine if group means differ, but it doesn’t tell **which groups** are different. For that, we use **post-hoc tests** like Tukey’s test.
- ANOVA assumes:
  - Data is normally distributed.
  - Variances are similar across groups.
  - Observations are independent.
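
As a follow-up to a significant ANOVA result, a post-hoc comparison might look like the sketch below. It assumes `scipy.stats.tukey_hsd`, which is only available in recent SciPy versions, and reuses the class scores from the one-way ANOVA example; the class labels are illustrative.

```python
from scipy import stats

# Same class scores as in the one-way ANOVA example
class_a = [85, 88, 92, 95, 90]
class_b = [78, 82, 80, 84, 79]
class_c = [92, 94, 89, 96, 91]

# Tukey's HSD compares every pair of groups while controlling the overall error rate
result = stats.tukey_hsd(class_a, class_b, class_c)

names = ["A", "B", "C"]
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        p = result.pvalue[i, j]
        verdict = "different" if p < 0.05 else "not clearly different"
        print(f"Class {names[i]} vs Class {names[j]}: p = {p:.4f} ({verdict})")
```

`result.pvalue` is a matrix with one entry per pair of groups, so entry `[i, j]` is the p-value for comparing group `i` with group `j`.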

---

## 🎯 **What is ANOVA (Super Simple Explanation)?**

Imagine 🍕 **you have 3 pizza brands**—A, B, and C—and you want to know:  
**"Do people like one brand more than the others?"**

1. 🍕 People taste pizza from **Brand A**, **Brand B**, and **Brand C**.
2. They give scores (ratings) for each brand.
3. You collect the scores and now you ask:  
   **"Are these scores different enough to say people prefer one brand?"**



## 🚦 **How ANOVA Works (The Pizza Story)**

To answer this question, ANOVA checks **two things**:

1. **How far apart are the average scores for the brands?**  
   (This is **Between-Group Variance**—how the brands differ.)

2. **How much variation is there within each brand’s scores?**  
   (This is **Within-Group Variance**—how people differ in scoring the same brand.)



### 🍕 **Key Idea**: Compare **Between-Group Variance** with **Within-Group Variance**  
- If the **differences between groups** are much bigger than the **variations within groups**, then the groups are **different**.
- If not, the groups are probably **similar**.



## 📊 **Simple Steps for ANOVA**

1. 🧮 **Calculate the group averages (means):**  
   For each pizza brand, find the average score.

2. 📈 **Measure Variations:**
   - **Between Groups:** How different the group averages are.
   - **Within Groups:** How much people’s scores differ **within the same group**.

3. 🎯 **Calculate the F-Statistic:**  
   This tells us if the differences between groups are **significant**.

4. 🤔 **Check P-Value:**  
   - If $ p $-value < 0.05, the group means are **significantly different**.
   - If $ p $-value >= 0.05, there is **not enough evidence** that the groups differ.



## 🐍 **Python Example (Pizza Ratings)**

Let’s say these are the scores for each brand:

- Brand A: [4, 5, 3, 4, 5]  
- Brand B: [2, 3, 2, 3, 2]  
- Brand C: [5, 5, 4, 5, 4]

### Code:
```python
import scipy.stats as stats

# Pizza scores
brand_a = [4, 5, 3, 4, 5]
brand_b = [2, 3, 2, 3, 2]
brand_c = [5, 5, 4, 5, 4]

# Perform One-Way ANOVA
f_stat, p_value = stats.f_oneway(brand_a, brand_b, brand_c)

# Results
print(f"F-Statistic: {f_stat:.2f}")
print(f"P-Value: {p_value:.4f}")

# Interpretation
if p_value < 0.05:
    print("The groups are significantly different. 🍕 Someone has a favorite!")
else:
    print("The groups are not significantly different. 🍕 No clear favorite!")
```



### 🧪 **Output**  
```
F-Statistic: 15.85
P-Value: 0.0004
The groups are significantly different. 🍕 Someone has a favorite!
```



## 🌟 **What Happens Behind the Scenes?**

- **Group Averages**: Calculate the mean for each group (e.g., Brand A, B, C).  
- **Variability**: Check how scores differ within and between groups.  
- **F-Test**: Combine these numbers to see if group differences are real or just random.
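
These behind-the-scenes steps can be written out by hand and cross-checked against SciPy. The sketch below uses the pizza ratings from the example and assumes the standard one-way ANOVA recipe: each sum of squares is divided by its degrees of freedom ($ k - 1 $ between groups, $ N - k $ within groups) before taking the ratio. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

groups = {
    "Brand A": [4, 5, 3, 4, 5],
    "Brand B": [2, 3, 2, 3, 2],
    "Brand C": [5, 5, 4, 5, 4],
}
data = [np.array(scores, dtype=float) for scores in groups.values()]

all_scores = np.concatenate(data)
grand_mean = all_scores.mean()
k = len(data)               # number of groups
n_total = all_scores.size   # total number of ratings

# Between-group sum of squares: group size * (group mean - grand mean)^2
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in data)
# Within-group sum of squares: squared distance of each rating from its group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in data)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within

# Upper-tail area of the F distribution gives the p-value
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

# Cross-check against SciPy's built-in one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(*data)
print(f"Manual: F = {f_manual:.2f}, p = {p_manual:.4f}")
print(f"SciPy:  F = {f_scipy:.2f}, p = {p_scipy:.4f}")
```

Both lines should print the same values, which confirms that `f_oneway` performs this calculation.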



## 🥳 **Key Takeaways (Pizza Style):**

- ANOVA checks if **group averages** (means) are significantly different.  
- If $ p $-value < 0.05: The differences are **statistically significant**!  
- If $ p $-value ≥ 0.05: The differences could be **just random variation**.  

---

## 🎯 **Example: Testing Donut Shop Sales**

Three donut shops—**Shop A**, **Shop B**, and **Shop C**—track their daily sales (in dollars) for five days.  
You want to know:  
**"Do the shops have significantly different average sales?"**



### 🛍️ **Data Collection**  
Here’s the sales data:  

- **Shop A**: \$100, \$120, \$110, \$115, \$105  
- **Shop B**: \$80, \$85, \$90, \$95, \$85  
- **Shop C**: \$130, \$140, \$125, \$135, \$145  



### 🔢 **Step 1: Calculate Group Averages**

1. Calculate the average sales for each shop:
   - Shop A: $ (100 + 120 + 110 + 115 + 105) / 5 = 110 $
   - Shop B: $ (80 + 85 + 90 + 95 + 85) / 5 = 87 $
   - Shop C: $ (130 + 140 + 125 + 135 + 145) / 5 = 135 $

2. The group means are:
   - **Shop A**: 110  
   - **Shop B**: 87  
   - **Shop C**: 135  



### 🧮 **Step 2: Calculate Overall Mean**

- Combine all sales data:
  $$
  \text{Overall Mean} = \frac{100 + 120 + 110 + 115 + 105 + 80 + 85 + 90 + 95 + 85 + 130 + 140 + 125 + 135 + 145}{15} = \frac{1660}{15} \approx 110.67
  $$



### 📊 **Step 3: Measure Variations**

1. **Between-Group Variation**:  
   How much do the group means (Shop A, B, C) differ from the overall mean?  
   Use this formula:  
   $$
   SS_{\text{between}} = n \cdot \sum (\text{Group Mean} - \text{Overall Mean})^2
   $$  
   Where $ n $ is the number of data points per group (this form assumes every group has the same size).  

   $$
   SS_{\text{between}} = 5 \cdot \left[(110 - 110.67)^2 + (87 - 110.67)^2 + (135 - 110.67)^2\right]
   $$  
   Using the unrounded overall mean:  
   $$
   SS_{\text{between}} \approx 5 \cdot \left[0.44 + 560.11 + 592.11\right] \approx 5763.33
   $$

2. **Within-Group Variation**:  
   How much do individual data points vary within each group?  
   Use this formula:  
   $$
   SS_{\text{within}} = \sum (\text{Data Point} - \text{Group Mean})^2
   $$  

   For Shop A:  
   $$
   (100 - 110)^2 + (120 - 110)^2 + (110 - 110)^2 + (115 - 110)^2 + (105 - 110)^2 = 250
   $$  
   Similarly, calculate for Shop B and Shop C:
   - Shop B: $ 49 + 4 + 9 + 64 + 4 = 130 $
   - Shop C: $ 25 + 25 + 100 + 0 + 100 = 250 $

   $$
   SS_{\text{within}} = 250 + 130 + 250 = 630
   $$



### 🧪 **Step 4: Calculate the F-Statistic**

The F-statistic compares the **Between-Group Variation** and the **Within-Group Variation**:  
$$
F = \frac{\text{Mean Square Between Groups}}{\text{Mean Square Within Groups}}
$$  
Where:  
$$
\text{Mean Square Between Groups} = \frac{SS_{\text{between}}}{k - 1} \quad \text{and} \quad \text{Mean Square Within Groups} = \frac{SS_{\text{within}}}{N - k}
$$  
- $ k = $ Number of groups = 3  
- $ N = $ Total data points = 15  

1. **Mean Square Between Groups**:
   $$
   MS_{\text{between}} = \frac{5763.33}{3 - 1} = \frac{5763.33}{2} \approx 2881.67
   $$

2. **Mean Square Within Groups**:
   $$
   MS_{\text{within}} = \frac{630}{15 - 3} = \frac{630}{12} = 52.5
   $$

3. **F-Statistic**:
   $$
   F = \frac{2881.67}{52.5} \approx 54.89
   $$



### 📊 **Step 5: Check the P-Value**

- Use the F-distribution table or Python to find the p-value for $ F = 54.89 $ with $ df_{\text{between}} = 2 $ and $ df_{\text{within}} = 12 $.
- The p-value is **very small** ($ p < 0.001 $, well below 0.05), meaning the group means are significantly different.
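
The table lookup in this step can be sketched with SciPy's F distribution. The example below assumes `stats.f.sf` (right-tail area) for the p-value and `stats.f.ppf` for the critical value at $ \alpha = 0.05 $; the variable names are illustrative.

```python
from scipy import stats

shop_a = [100, 120, 110, 115, 105]
shop_b = [80, 85, 90, 95, 85]
shop_c = [130, 140, 125, 135, 145]

f_stat, p_from_anova = stats.f_oneway(shop_a, shop_b, shop_c)

k = 3                      # number of groups
n_total = 15               # total data points
df_between = k - 1
df_within = n_total - k

# p-value: area under the F distribution to the right of the observed F
p_from_table = stats.f.sf(f_stat, df_between, df_within)

# Critical value: the F that leaves 5% of the area in the right tail
f_critical = stats.f.ppf(0.95, df_between, df_within)

print(f"F = {f_stat:.2f}, critical F = {f_critical:.2f}")
print(f"p-value = {p_from_table:.2e}")
print("Reject H0" if f_stat > f_critical else "Fail to reject H0")
```

Comparing $ F $ with the critical value and comparing the p-value with $ \alpha $ are two views of the same decision.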



### 🐍 **Python Code**

```python
import scipy.stats as stats

# Data
shop_a = [100, 120, 110, 115, 105]
shop_b = [80, 85, 90, 95, 85]
shop_c = [130, 140, 125, 135, 145]

# Perform One-Way ANOVA
f_stat, p_value = stats.f_oneway(shop_a, shop_b, shop_c)

# Results
print(f"F-Statistic: {f_stat:.2f}")
print(f"P-Value: {p_value:.4f}")

# Interpretation
if p_value < 0.05:
    print("The groups are significantly different. 🎉")
else:
    print("The groups are not significantly different. 🤔")
```



### 🥳 **Key Takeaway from the Donut Story**  
- If $ p $-value < 0.05, **Yes, the shops have different average sales!**
- If $ p $-value ≥ 0.05, **there is not enough evidence that the shops’ average sales differ.**

---