# Statistics Advance Part 2

#     Theory Section

**Q 1.   What is a random variable in probability theory?**

**Ans.** A **random variable** is a function that assigns a **numerical value** to each possible outcome in a **probability experiment**.

### 📌 Key Points:

- It **quantifies outcomes** of a random process (e.g., coin flips, dice rolls).
- It is called "random" because the value it takes depends on **chance**.
- It allows us to use **mathematics and statistics** to describe uncertainty.

---

### 🔄 Types of Random Variables:

1. **Discrete Random Variable**  
   - Takes **countable** values  
   - Example: Number of heads in 5 coin tosses → {0, 1, 2, 3, 4, 5}

2. **Continuous Random Variable**  
   - Takes **uncountably infinite** values in a range  
   - Example: Temperature in a city → could be 26.1°C, 26.12°C, etc.
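The two types can be contrasted with a small simulation. This is only a sketch: the function names are hypothetical, and modelling temperature as uniform on 20 to 35 °C is an assumption made purely for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def heads_in_tosses(n_tosses=5):
    """Discrete random variable: number of heads in n_tosses fair coin flips."""
    return sum(random.random() < 0.5 for _ in range(n_tosses))

def temperature_reading(low=20.0, high=35.0):
    """Continuous random variable: any real value in [low, high] (assumed uniform)."""
    return random.uniform(low, high)

discrete_draws = [heads_in_tosses() for _ in range(1000)]
continuous_draws = [temperature_reading() for _ in range(1000)]

# Only the integers 0 through 5 can ever appear among the discrete draws.
print(sorted(set(discrete_draws)))
# The continuous draws are arbitrary decimals within the range.
print(continuous_draws[:3])
```

The discrete variable can only land on a handful of countable values, while the continuous one can take any value in its interval.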

---

### 🧮 Example:

- Let $X$ be the number shown when you roll a die  
- Possible values: $X \in \{1, 2, 3, 4, 5, 6\}$  
- $X$ is a **discrete random variable**

---

**Q 2.  What is the null hypothesis, and how does it differ from the alternative hypothesis?**

**Ans.**

In **statistical hypothesis testing**, we test two opposing ideas:

### 📌 **Null Hypothesis (H₀)**

- The **default assumption** that there is **no effect**, **no difference**, or **no relationship**.
- It represents the idea of "no change" or "status quo".

**Example:**  
> H₀: The new drug has no effect on blood pressure.

---

### 📌 **Alternative Hypothesis (H₁ or Ha)**

- The **claim** we want to test or support.
- It suggests there **is** an effect, difference, or relationship.

**Example:**  
> H₁: The new drug **reduces** blood pressure.

---

### 🔄 Key Differences

| Feature                | Null Hypothesis (H₀)            | Alternative Hypothesis (H₁)         |
|------------------------|----------------------------------|--------------------------------------|
| Assumes               | No effect / No difference       | There is an effect / difference      |
| Purpose               | To be tested and possibly rejected | To be supported if H₀ is rejected |
| Outcome               | Retained unless the data provide sufficient evidence against it | Supported when H₀ is rejected |
| Symbol                | H₀                              | H₁ or Ha                             |

---

**Q 3.  What is the significance level in hypothesis testing, and why is it important?**

**Ans.** The **significance level**, denoted by **α (alpha)**, is a **threshold** used in hypothesis testing to determine whether to **reject the null hypothesis (H₀)**.

---

### 📌 Definition:

- The **significance level** is the **maximum probability of making a Type I error** — rejecting a true null hypothesis.

- Common values are:  
  - $\alpha = 0.05$ (5%)  
  - $\alpha = 0.01$ (1%)  
  - $\alpha = 0.10$ (10%)  

---

### 🔍 Why Is It Important?

1. **Controls False Positives**:  
   Limits the chance of **incorrectly rejecting H₀** when it is actually true.

2. **Sets the Decision Boundary**:  
   It's the cutoff for the **p-value** comparison:  
   - If **p-value ≤ α**, reject H₀  
   - If **p-value > α**, fail to reject H₀

3. **Balances Risk and Sensitivity**:  
   - A **lower α** reduces false positives but increases false negatives (Type II error risk).  
   - A **higher α** makes it easier to detect effects but increases false positives.
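The trade-off in point 3 can be made concrete with a short calculation. The sketch below assumes a right-tailed Z-test and a hypothetical true effect that sits 2.5 standard errors above the null value; `type_ii_error` and `shift` are illustrative names, not part of the original text.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def type_ii_error(alpha, shift):
    """Beta for a right-tailed Z-test when the true mean lies `shift`
    standard errors above the value stated in H0 (hypothetical setup)."""
    z_critical = std_normal.inv_cdf(1 - alpha)
    # Under this alternative Z ~ N(shift, 1), so beta = P(Z <= z_critical).
    return std_normal.cdf(z_critical - shift)

betas = {alpha: type_ii_error(alpha, shift=2.5) for alpha in (0.10, 0.05, 0.01)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

Under these assumptions, lowering α pushes the critical value outward, so the same true effect is missed more often.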

---

**Q 4.  What does a P-value represent in hypothesis testing?**

**Ans.** A **P-value** is the **probability** of observing your data, or something more extreme, **assuming that the null hypothesis (H₀) is true**.

**📌 *Interpretation*:**

- A **small P-value** (typically ≤ 0.05) → **strong evidence against H₀** → **reject H₀**  
- A **large P-value** (> 0.05) → **weak evidence against H₀** → **fail to reject H₀**

---

**Q 5.  How do you interpret the P-value in hypothesis testing?** 

**Ans.** The **P-value** is the probability of obtaining your observed results (or more extreme ones) **assuming the null hypothesis is true**. It is not the probability that the null hypothesis itself is true.

### 📌 Interpretation Guide:

- **P-value ≤ α** → **Reject the null hypothesis (H₀)**  
  → The result is **statistically significant**

- **P-value > α** → **Fail to reject the null hypothesis**  
  → Not enough evidence to claim a significant effect

Where $\alpha$ is the **significance level**, commonly 0.05.
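The interpretation guide can be sketched in a few lines. The example assumes the test statistic follows a standard normal distribution under H₀ and uses a two-tailed p-value; the function names and the two z-values are hypothetical.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """P(|Z| >= |z|) under H0, where Z is standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

for z in (0.8, 2.5):  # hypothetical test statistics
    p = two_tailed_p_value(z)
    print(f"z = {z}: p-value = {p:.4f} -> {decide(p)}")
```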


**Q 6. What are Type 1 and Type 2 errors in hypothesis testing?**

**Ans.** In hypothesis testing, errors can occur when making a decision about the null hypothesis $H_0$.

### **Type I Error (False Positive)**

- **Definition**: Rejecting the null hypothesis $H_0$ when it is actually **true**.
- **Symbol**: Its probability is denoted $\alpha$ (alpha), the **significance level**.
- **Example**: Concluding a medicine is effective when it actually isn't.

> 🧠 Think of it as: **"Crying wolf" when there is no wolf.**

---

### **Type II Error (False Negative)**

- **Definition**: Failing to reject the null hypothesis $H_0$ when it is actually **false**.
- **Symbol**: Its probability is denoted $\beta$ (beta).
- **Example**: Concluding a medicine is ineffective when it actually works.

> 🧠 Think of it as: **"Missing the wolf" when the wolf is really there.**

---

### **Summary Table**

|                          | **$H_0$ is True**            | **$H_0$ is False**            |
|--------------------------|------------------------------|-------------------------------|
| **Reject $H_0$**         | Type I Error ($\alpha$)      | ✅ Correct Decision            |
| **Fail to Reject $H_0$** | ✅ Correct Decision           | Type II Error ($\beta$)       |
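Both error rates in the table can be estimated by simulation. The sketch below is illustrative only: it assumes a two-tailed Z-test with known σ, and the sample size, true means, and number of trials are all hypothetical choices.

```python
import random
from statistics import NormalDist, mean

random.seed(42)  # fixed seed for repeatability
SIGMA, N, ALPHA, MU_0 = 1.0, 25, 0.05, 0.0  # hypothetical settings
Z_CRITICAL = NormalDist().inv_cdf(1 - ALPHA / 2)

def rejects_h0(true_mean):
    """Draw one sample and run a two-tailed Z-test of H0: mu = MU_0."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU_0) / (SIGMA / N ** 0.5)
    return abs(z) > Z_CRITICAL

TRIALS = 2000
# H0 true: every rejection is a Type I error.
type_i_rate = sum(rejects_h0(0.0) for _ in range(TRIALS)) / TRIALS
# H0 false (true mean 0.5): every non-rejection is a Type II error.
type_ii_rate = sum(not rejects_h0(0.5) for _ in range(TRIALS)) / TRIALS

print(f"Type I rate  ~ {type_i_rate:.3f} (should be near alpha = {ALPHA})")
print(f"Type II rate ~ {type_ii_rate:.3f}")
```

The Type I rate is fixed by the choice of α, whereas the Type II rate depends on how far the true mean is from the null value and on the sample size.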

---

**Q 7.  What is the difference between a one-tailed and a two-tailed test in hypothesis testing?**

**Ans.** In hypothesis testing, the **type of test** (one-tailed or two-tailed) depends on the research question and how the alternative hypothesis is defined.

---

### ✅ One-Tailed Test

- **Definition**: Tests whether a population parameter is **greater than** or **less than** a specified value.
- **Direction**: Only **one direction** (either left or right tail) is considered.
- **Hypotheses**:
  - Null hypothesis: $H_0: \mu = \mu_0$
  - Alternative hypothesis (right-tailed): $H_1: \mu > \mu_0$
  - Alternative hypothesis (left-tailed): $H_1: \mu < \mu_0$
- **When to Use**: If you are only interested in detecting an **increase or decrease**, not both.

> 📌 Example: Testing if a new drug **increases** recovery rate compared to the standard.

---

### ✅ Two-Tailed Test

- **Definition**: Tests whether a population parameter is **different** (either higher or lower) than a specified value.
- **Direction**: Considers **both directions** (both tails).
- **Hypotheses**:
  - Null hypothesis: $H_0: \mu = \mu_0$
  - Alternative hypothesis: $H_1: \mu \ne \mu_0$
- **When to Use**: If you are testing for **any significant difference**, not just in one direction.

> 📌 Example: Testing if a new drug has **any effect** (better or worse) on recovery rate.

---

### 📊 Summary Table

| Test Type     | Direction Tested         | Alternative Hypothesis         | Critical Region Location         |
|---------------|--------------------------|---------------------------------|----------------------------------|
| **One-Tailed**| One side (left or right) | $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$ | Left or right tail only          |
| **Two-Tailed**| Both sides               | $H_1: \mu \ne \mu_0$             | Both tails (split $\alpha/2$)    |
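The critical regions in the table can be computed directly from the standard normal distribution. The sketch below uses α = 0.05 and a hypothetical test statistic of 1.8 to show a case where the two tests disagree.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05

one_tailed_critical = std_normal.inv_cdf(1 - alpha)      # all of alpha in the right tail
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail

z = 1.8  # hypothetical test statistic
print(f"one-tailed critical value: {one_tailed_critical:.3f}")
print(f"two-tailed critical value: {two_tailed_critical:.3f}")
print("right-tailed test rejects H0:", z > one_tailed_critical)
print("two-tailed test rejects H0:  ", abs(z) > two_tailed_critical)
```

Because the two-tailed test splits α across both tails, its cutoff is further from zero, so the same statistic can be significant in a one-tailed test but not in a two-tailed one.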

---

### 🎯 Key Note:
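- At the same $\alpha$, a **one-tailed test** has more power to detect an effect in the hypothesized direction, but it cannot detect an effect in the opposite direction.
- A **two-tailed test** is more **conservative** and is used when the direction of the effect is not specified in advance.

---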

**Q 8.  What is the Z-test, and when is it used in hypothesis testing?**

**Ans.** A **Z-test** is a type of statistical hypothesis test used to determine whether there is a significant difference between:

- A sample mean and a population mean, or  
- Two sample means, or  
- Sample and population proportions,

**when the population standard deviation is known** and/or the **sample size is large**.

---

### ✅ When to Use a Z-test:

Use a Z-test when the following conditions are met:

1. The sample size is large: $n \ge 30$
2. The population standard deviation $\sigma$ is known
3. The data is approximately normally distributed (or $n$ is large enough for normal approximation)
4. The data is quantitative and observations are independent

---

### 🔍 Common Types of Z-tests

| **Z-test Type**             | **Use Case**                                                  |
|----------------------------|---------------------------------------------------------------|
| One-sample Z-test          | Compare a sample mean to a known population mean              |
| Two-sample Z-test          | Compare the means of two independent samples                  |
| Z-test for proportions     | Compare a sample proportion to a known population proportion  |

---

### 🧠 Example:

You want to test if the average height of students in a college differs from the national average of 170 cm, assuming $\sigma$ is known.

- Null hypothesis: $H_0: \mu = 170$
- Alternative hypothesis: $H_1: \mu \ne 170$

Since $\sigma$ is known and $n \ge 30$, you would use a **two-tailed Z-test**.

---

### 🔑 Key Formula (One-Sample Z-test):

$$
Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
$$

Where:

- $\bar{x}$ = sample mean  
- $\mu$ = population mean  
- $\sigma$ = population standard deviation  
- $n$ = sample size
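The formula translates into a short function. This is a minimal sketch rather than a library implementation; the function name and the survey numbers (64 students, sample mean 172 cm, σ = 8 cm) are hypothetical values chosen to match the height example.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu_0, sigma, n):
    """Return (z, two-tailed p-value) for H0: mu = mu_0 with known sigma."""
    z = (sample_mean - mu_0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical survey: 64 students, mean height 172 cm, sigma assumed to be 8 cm.
z, p_value = one_sample_z_test(sample_mean=172, mu_0=170, sigma=8, n=64)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these made-up numbers the standard error is 8 / √64 = 1 cm, so the sample mean lies two standard errors above 170 cm.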

---

**Q 9.  How do you calculate the Z-score, and what does it represent in hypothesis testing?**

**Ans.**

### ✅ Z-score Formula (for Sample Mean):

$$
Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
$$

Where:

- $\bar{x}$ = sample mean  
- $\mu$ = population mean (under the null hypothesis $H_0$)  
- $\sigma$ = population standard deviation  
- $n$ = sample size  

---

### 📌 What Does the Z-score Represent?

- The **Z-score** indicates how many **standard errors** the sample mean $\bar{x}$ is from the population mean $\mu$ under $H_0$.
- It places the result on the scale of the **standard normal distribution** (mean $= 0$, standard deviation $= 1$), which the test statistic follows when $H_0$ is true.

---

### ✅ Interpretation in Hypothesis Testing:

- Compare the calculated $Z$ value to a **critical value** from the standard normal distribution (e.g., $Z_{\text{critical}} = 1.96$ for $\alpha = 0.05$ in a two-tailed test, so the rejection region is $Z < -1.96$ or $Z > 1.96$):

  - If $|Z| > Z_{\text{critical}}$: **Reject** the null hypothesis $H_0$.
  - If $|Z| \le Z_{\text{critical}}$: **Fail to reject** the null hypothesis $H_0$.
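
As a worked illustration with made-up numbers (not taken from any dataset), suppose $\bar{x} = 103$, $\mu = 100$, $\sigma = 15$, and $n = 36$:

$$
Z = \frac{103 - 100}{15 / \sqrt{36}} = \frac{3}{2.5} = 1.2
$$

Since $|1.2| \le 1.96$, we would fail to reject $H_0$ at $\alpha = 0.05$ in a two-tailed test.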

> 📌 A **larger** absolute Z-score corresponds to a smaller p-value, that is, stronger evidence against $H_0$.

---

**Q 10.  What is the T-distribution, and when should it be used instead of the normal distribution?**

**Ans.** The **T-distribution** (also called **Student's t-distribution**) is a probability distribution used in statistical inference when:

- The **sample size** is **small** ($n < 30$), and  
- The **population standard deviation** ($\sigma$) is **unknown**.

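It is similar in shape to the **normal distribution** but has **heavier tails**, which account for the extra uncertainty introduced by estimating $\sigma$ with the sample standard deviation $s$. This extra uncertainty matters most in small samples.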

---

### ✅ When to Use the T-distribution:

Use the **T-distribution** instead of the normal distribution when:

1. The sample size is small ($n < 30$)
2. The population standard deviation $\sigma$ is unknown
3. The data is approximately **normally distributed**
4. You are estimating a population mean using the **sample standard deviation** ($s$)

---

### 🔍 T-statistic Formula:

$$
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
$$

Where:

- $\bar{x}$ = sample mean  
- $\mu$ = population mean under the null hypothesis  
- $s$ = sample standard deviation  
- $n$ = sample size  

---

### 📊 Key Differences: T-distribution vs Normal Distribution

| Feature                    | T-distribution                            | Normal Distribution                    |
|----------------------------|--------------------------------------------|----------------------------------------|
| Shape                      | Bell-shaped, **heavier tails**             | Bell-shaped                            |
| Use Case                   | $\sigma$ unknown, small $n$                | $\sigma$ known, large $n$              |
| Variability                | Higher (due to sample estimate)            | Lower                                  |
| As $n \to \infty$          | Approaches normal distribution             | —                                      |
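The "heavier tails" and "approaches normal" rows can be checked numerically. The sketch below evaluates the standard Student's t density at a point in the tail (x = 3) for several degrees of freedom; `t_pdf` is a hand-written helper for illustration, not a library function.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma keeps the gamma-function ratio from overflowing for large df.
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

normal_tail = NormalDist().pdf(3.0)
for df in (2, 5, 30, 1000):
    print(f"df = {df:>4}: t density at 3 = {t_pdf(3.0, df):.5f}")
print(f"normal density at 3 = {normal_tail:.5f}")
```

The tail density shrinks toward the normal value as the degrees of freedom grow, which is the convergence described in the last row of the table.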

---

**Q 11.  What is the difference between a Z-test and a T-test?**

**Ans.** Both **Z-tests** and **T-tests** are used in hypothesis testing to determine if there is a statistically significant difference between means. The choice depends on sample size and whether the population standard deviation is known.

---

### 📊 Comparison Table

| Feature                       | **Z-test**                                         | **T-test**                                         |
|-------------------------------|----------------------------------------------------|----------------------------------------------------|
| Population standard deviation | Known ($\sigma$ is known)                          | Unknown (use sample standard deviation $s$)        |
| Sample size                   | Large ($n \ge 30$)                                 | Small ($n < 30$)                                   |
| Distribution used             | Normal distribution                                | T-distribution                                     |
| Formula                       | $Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$     | $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$          |
| Distribution shape            | Bell-shaped, fixed                                 | Bell-shaped, heavier tails                         |
| Application example           | Comparing to known population mean with large $n$  | Small sample mean comparison with unknown $\sigma$ |

---

### ✅ When to Use

- **Use a Z-test when:**
  - The population standard deviation $\sigma$ is **known**
  - The sample size is **large** ($n \ge 30$)

- **Use a T-test when:**
  - The population standard deviation $\sigma$ is **unknown**
  - The sample size is **small** ($n < 30$)
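
The rule of thumb above can be written as a small helper. This is a hypothetical sketch: the two cases listed above are taken from the text, while the handling of the mixed cases (known σ with small n, unknown σ with large n) is an assumption added for completeness.

```python
def choose_test(sigma_known, n, large_n=30):
    """Suggest a test for a population mean (rule-of-thumb sketch)."""
    if sigma_known and n >= large_n:
        return "Z-test"
    if not sigma_known and n < large_n:
        return "T-test"
    if sigma_known:
        # Assumption: small n with known sigma still uses Z,
        # provided the data are approximately normal.
        return "Z-test (requires approximately normal data)"
    # Assumption: large n with unknown sigma; the T-test stays valid and
    # gives nearly the same answer as a Z-test at this sample size.
    return "T-test (close to Z-test for large n)"

print(choose_test(sigma_known=True, n=50))
print(choose_test(sigma_known=False, n=12))
print(choose_test(sigma_known=False, n=200))
```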

---

**Q 12.  What is the T-test, and how is it used in hypothesis testing?**

**Ans.** The **T-test** is a statistical test used to compare **means** and determine if the difference is **statistically significant**.

It is commonly used when:

- The sample size is **small** ($n < 30$)
- The population standard deviation ($\sigma$) is **unknown**
- The data is approximately **normally distributed**

The T-test is based on the **T-distribution**, which has **heavier tails** than the normal distribution to account for extra uncertainty.

---

### ✅ Types of T-tests

| **T-test Type**            | **Purpose**                                                             |
|----------------------------|--------------------------------------------------------------------------|
| One-sample T-test          | Compares a sample mean to a known or hypothesized population mean       |
| Two-sample T-test          | Compares means from **two independent groups**                          |
| Paired sample T-test       | Compares means from the **same group** at two different times/conditions|

---

### 🔍 T-test Formula (One-sample):

$$
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
$$

Where:

- $\bar{x}$ = sample mean  
- $\mu$ = population mean (under $H_0$)  
- $s$ = sample standard deviation  
- $n$ = sample size  

---

### ✅ How It’s Used in Hypothesis Testing:

1. **State the hypotheses**:
   - Null hypothesis: $H_0: \mu = \mu_0$
   - Alternative hypothesis: $H_1: \mu \ne \mu_0$, $H_1: \mu > \mu_0$, or $H_1: \mu < \mu_0$

2. **Compute** the t-statistic using the formula above.

3. **Calculate degrees of freedom**: $df = n - 1$

4. **Find the critical t-value** from the T-distribution table or compute the **p-value**.

5. **Make a decision**:
   - If $|t| > t_{\text{critical}}$, **reject** $H_0$
   - If $|t| \le t_{\text{critical}}$, **fail to reject** $H_0$
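
The five steps can be sketched with the standard library alone. The measurements below are hypothetical, and because the standard library has no t-distribution, the critical value 2.262 (two-tailed, α = 0.05, df = 9) is copied from a standard t-table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t_statistic(sample, mu_0):
    """Return (t, df) for H0: mu = mu_0 using the sample standard deviation."""
    n = len(sample)
    t = (mean(sample) - mu_0) / (stdev(sample) / sqrt(n))
    return t, n - 1

# Hypothetical measurements; H0: mu = 12.0, two-tailed, alpha = 0.05.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
t, df = one_sample_t_statistic(sample, mu_0=12.0)

T_CRITICAL = 2.262  # two-tailed, alpha = 0.05, df = 9, from a standard t-table
decision = "reject H0" if abs(t) > T_CRITICAL else "fail to reject H0"
print(f"t = {t:.3f}, df = {df}, decision: {decision}")
```

Note that `statistics.stdev` divides by n − 1, which matches the sample standard deviation $s$ in the formula.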

---

**Q 13. What is the relationship between Z-test and T-test in hypothesis testing?**

**Ans.** Both the **Z-test** and the **T-test** are used to test hypotheses about population means. They are closely related but differ in terms of assumptions and when they should be applied.

---

### 📌 Key Relationship

| **Aspect**                   | **Z-test**                                             | **T-test**                                              |
|-----------------------------|--------------------------------------------------------|----------------------------------------------------------|
| Population Std. Deviation   | Known ($\sigma$ is known)                              | Unknown (use sample std. dev. $s$)                       |
| Sample Size                 | Large ($n \ge 30$)                                     | Small ($n < 30$)                                         |
| Distribution Used           | Standard normal distribution                           | T-distribution                                           |
| Formula                     | $Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$         | $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$                |
| Shape of Distribution       | Fixed bell curve                                       | Bell curve with heavier tails                           |
| Convergence                 | —                                                      | Approaches the standard normal as $n \to \infty$         |

---

### 🧠 Theoretical Relationship:

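- The **Z-test** can be viewed as the **limiting case** of the **T-test**: the T-test replaces the known $\sigma$ with the sample estimate $s$ and accounts for that extra uncertainty through the T-distribution.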
- As the sample size increases ($n \to \infty$), the **T-distribution** becomes almost identical to the **standard normal distribution**.
- This means for large $n$, the **T-test and Z-test produce nearly identical results**.
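
The convergence is visible in the critical values themselves. In the sketch below, the t critical values (two-tailed, α = 0.05) are copied from a standard t-table, and the normal critical value is computed for comparison.

```python
from statistics import NormalDist

# Two-tailed critical values at alpha = 0.05, copied from a standard t-table.
T_CRITICAL_BY_DF = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}
z_critical = NormalDist().inv_cdf(0.975)

for df, t_crit in T_CRITICAL_BY_DF.items():
    gap = t_crit - z_critical
    print(f"df = {df:>3}: t critical = {t_crit:.3f}, gap to z = {gap:.3f}")
```

The gap to the normal critical value shrinks steadily as the degrees of freedom increase, so the two tests lead to nearly the same decision for large samples.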

---