# Statistics Advance Part 1

---

## 1. What is a random variable in probability theory?

A **random variable** is a variable whose value is determined by the outcome of a random experiment. Formally, it is a function that maps each outcome in the sample space to a real number.

**Example**:  
In tossing a coin:  
Let \( X \) be a random variable:  
- \( X = 1 \) if heads  
- \( X = 0 \) if tails
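As a minimal sketch of the coin example, the random variable can be written as a plain function from outcomes to numbers. The names `SAMPLE_SPACE` and `X` and the seed are illustrative choices, not part of any library.

```python
import random

# Sample space of a single coin toss (illustrative encoding: "H" = heads, "T" = tails)
SAMPLE_SPACE = ("H", "T")

def X(outcome):
    """Random variable: maps each outcome to a real number (1 for heads, 0 for tails)."""
    return 1 if outcome == "H" else 0

# Simulate a few tosses with a fixed seed so the run is reproducible
rng = random.Random(42)
tosses = [rng.choice(SAMPLE_SPACE) for _ in range(10)]
values = [X(t) for t in tosses]
print(tosses)
print(values)
```

The randomness lives in the experiment (`rng.choice`); `X` itself is deterministic.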

---

## 2. What are the types of random variables?

There are two main types:

1. **Discrete Random Variable** – Takes countable values (finite or countably infinite).  
   - Example: Number of goals in a football match
2. **Continuous Random Variable** – Takes uncountably many values: any real number within an interval.  
   - Example: Temperature at a given time of day

---

## 3. What is the difference between discrete and continuous distributions?

| Feature | Discrete Distribution | Continuous Distribution |
|--------|------------------------|--------------------------|
| Type of variable | Discrete | Continuous |
| Values taken | Countable (0, 1, 2...) | Uncountable (real numbers) |
| Probability at exact value | Can be > 0 | Always 0 |
| Function used | PMF (Probability Mass Function) | PDF (Probability Density Function) |
| Examples | Binomial, Poisson | Normal, Exponential |
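The "probability at exact value" row can be checked numerically. This sketch uses only the Python standard library. The helper `binom_pmf` and the chosen numbers are illustrative. It contrasts a binomial PMF value with normal interval probabilities that shrink toward 0 as the interval narrows.

```python
from math import comb
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable; positive for each k in 0..n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Discrete: an exact value can carry positive probability
print(binom_pmf(2, 4, 0.5))  # 0.375

# Continuous: P(-h <= Z <= h) for a standard normal Z shrinks toward 0 as h -> 0,
# so P(Z = 0) is 0 even though the density at 0 is positive
Z = NormalDist(0, 1)
interval_probs = [Z.cdf(h) - Z.cdf(-h) for h in (0.1, 0.01, 0.001)]
print(interval_probs)
```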

---

## 4. What are probability distribution functions (PDF)?

A **Probability Density Function (PDF)** describes the **relative likelihood** for a **continuous** random variable to take on a value.

- PDF is non-negative: \( f(x) \geq 0 \)
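- The total area under the curve equals 1: \( \int_{-\infty}^{\infty} f(x)\,dx = 1 \)
- The value \( f(x) \) at a single point is **not** a probability (it can even exceed 1); probabilities come from areas over intervals: \( P(a \le X \le b) = \int_a^b f(x)\,dx \)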
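A small numerical sketch of these properties, using a normal density written out by hand and a midpoint-rule integral. The helper names `normal_pdf` and `integrate`, the integration limits, and the step count are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x); its value at a point is a density, not a probability."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=10_000):
    """Approximate the area under f on [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

total_area = integrate(normal_pdf, -10, 10)   # close to 1
p_interval = integrate(normal_pdf, -1, 1)     # P(-1 <= X <= 1), about 0.6827
peak_narrow = normal_pdf(0, sigma=0.1)        # about 3.99: a density can exceed 1
print(total_area, p_interval, peak_narrow)
```

Truncating the integral at plus or minus 10 is a practical shortcut; the normal tail beyond that is negligible.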

---

## 5. How do cumulative distribution functions (CDF) differ from probability distribution functions (PDF)?

| Feature | PDF | CDF |
|--------|-----|-----|
| Stands for | Probability Density Function | Cumulative Distribution Function |
| Type | Rate of change (density) | Accumulated probability |
| Meaning | Probability per unit | Total probability up to a point |
| Formula | \( f(x) = \frac{d}{dx}F(x) \) | \( F(x) = P(X \le x) \) |
| Graph | Depends on the distribution (bell-shaped for the normal) | Non-decreasing from 0 to 1 (S-shaped for the normal) |
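The relation \( f(x) = \frac{d}{dx}F(x) \) can be checked with a central finite difference on the standard normal CDF from Python's `statistics.NormalDist`. The step size `h` and the helper `numeric_derivative` are illustrative assumptions.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)

def numeric_derivative(F, x, h=1e-5):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (-1.0, 0.0, 1.5):
    # The slope of the CDF at x should match the PDF at x
    print(x, numeric_derivative(dist.cdf, x), dist.pdf(x))
```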

---

## 6. What is a discrete uniform distribution?

A **discrete uniform distribution** gives **equal probability** to all values in a finite set.

**Example**: Rolling a fair die (\( n = 6 \), so each face has probability \( 1/6 \))  
\[
P(X = x) = \frac{1}{n}, \quad x \in \{1, 2, \ldots, n\}
\]
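A minimal sketch of the fair-die PMF using exact fractions, so the probabilities sum to exactly 1. The dictionary `pmf` is just an illustrative representation.

```python
from fractions import Fraction

n = 6
faces = range(1, n + 1)
pmf = {x: Fraction(1, n) for x in faces}  # P(X = x) = 1/n for every face

print(pmf[3])             # 1/6
print(sum(pmf.values()))  # 1
```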

---

## 7. What are the key properties of a Bernoulli distribution?

- Two possible outcomes: 1 (success), 0 (failure)
- Probability of success = \( p \), failure = \( 1 - p \)
- PMF:  
  \[
  P(X = x) = p^x (1 - p)^{1 - x}, \quad x = 0 \text{ or } 1
  \]
- Mean: \( \mu = p \)
- Variance: \( \sigma^2 = p(1 - p) \)
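The mean and variance formulas can be recovered directly from the PMF. This sketch uses exact fractions and an illustrative success probability \( p = 3/10 \); `bernoulli_pmf` is a hypothetical helper name.

```python
from fractions import Fraction

def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}; 0 for any other x."""
    if x not in (0, 1):
        return Fraction(0)
    return p**x * (1 - p) ** (1 - x)

p = Fraction(3, 10)
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
print(mean, variance)  # 3/10 21/100
```

Here \( p(1 - p) = \frac{3}{10} \cdot \frac{7}{10} = \frac{21}{100} \), matching the computed variance.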

---

## 8. What is the binomial distribution, and how is it used in probability?

The **binomial distribution** models the number of successes in \( n \) independent Bernoulli trials, each with probability \( p \) of success.

- PMF:  
  \[
  P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \ldots, n
  \]
- Applications: quality control, opinion polls, medical trials.
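A short sketch of the binomial PMF with `math.comb`, using the illustrative case of 10 fair coin flips. It also checks that the probabilities sum to 1 and that the mean equals \( np \).

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative example: 10 fair coin flips, probability of exactly 5 heads
n, p = 10, 0.5
print(binomial_pmf(5, n, p))  # 0.24609375 (= 252/1024)

probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(probs))
print(sum(probs), mean)  # 1.0 and n * p = 5.0
```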

---

## 9. What is the Poisson distribution and where is it applied?

The **Poisson distribution** models the number of times an event occurs in a **fixed interval of time or space**, given a known constant mean rate \( \lambda \).

- PMF:  
  \[
  P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots
  \]
- Applications: calls arriving at a call center per hour, emails received per day, accidents per day.
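A minimal sketch of the Poisson PMF, assuming an illustrative rate of \( \lambda = 3 \) events per interval. Summing the first 50 terms is a practical cutoff; the remaining tail is negligible for this rate.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) * lam^k / k! for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Illustrative rate: on average 3 calls per minute
lam = 3.0
print(poisson_pmf(0, lam))                          # e^-3, about 0.0498
print(sum(poisson_pmf(k, lam) for k in range(50)))  # very close to 1
```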

---

## 10. What is a continuous uniform distribution?

A distribution where all values in the interval \([a, b]\) are **equally likely**, in the sense that the density is constant, so subintervals of equal length have equal probability.

- PDF:  
  \[
  f(x) = \frac{1}{b - a} \text{ for } a \le x \le b, \quad f(x) = 0 \text{ otherwise}
  \]
- Example: Picking a random number between 0 and 1.
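A sketch of the continuous uniform PDF and CDF. The helper names are illustrative, and the interval \([0.2, 0.5]\) is an arbitrary example. The probability of an interval is its length divided by \( b - a \).

```python
def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) on [a, b], 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """F(x) = P(X <= x): 0 below a, linear on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Probability that a random number in [0, 1] falls in [0.2, 0.5]
a, b = 0.0, 1.0
print(uniform_cdf(0.5, a, b) - uniform_cdf(0.2, a, b))  # 0.3 (up to float rounding)
```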

---

## 11. What are the characteristics of a normal distribution?

- Bell-shaped, symmetric curve
- Mean = Median = Mode
- Fully described by mean \( \mu \) and standard deviation \( \sigma \)
- Total area under curve = 1
- 68-95-99.7 (empirical) rule:  
  - About 68% of values lie within \( \mu \pm 1\sigma \)  
  - About 95% within \( \mu \pm 2\sigma \)  
  - About 99.7% within \( \mu \pm 3\sigma \)
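The 68-95-99.7 figures come straight from the normal CDF. This sketch uses `statistics.NormalDist` with illustrative parameters \( \mu = 100, \sigma = 15 \); any other choice gives the same shares.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # illustrative parameters; the rule holds for any mu and sigma
dist = NormalDist(mu, sigma)

shares = {}
for k in (1, 2, 3):
    shares[k] = dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    print(f"within {k} sigma: {shares[k]:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```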

---

## 12. What is the standard normal distribution, and why is it important?

- A special case of the normal distribution where:
  - Mean \( \mu = 0 \)
  - Standard deviation \( \sigma = 1 \)
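- The variable is usually denoted \( Z \), and a value on this scale is called a **Z-score**
- Used for:
  - Hypothesis testing
  - Finding probabilities for any normal variable (after standardizing it)
  - Confidence intervals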
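Standardizing is why one standard normal table covers every normal distribution. The sketch below uses an illustrative \( X \sim N(50, 10^2) \) and compares the standard normal lookup with the direct calculation.

```python
from statistics import NormalDist

# Illustrative problem: X ~ N(50, 10^2); find P(X <= 65)
mu, sigma, x = 50, 10, 65
z = (x - mu) / sigma                     # 1.5
p_standard = NormalDist().cdf(z)         # lookup on the standard normal N(0, 1)
p_direct = NormalDist(mu, sigma).cdf(x)  # same probability without standardizing
print(z, round(p_standard, 4), round(p_direct, 4))  # 1.5 0.9332 0.9332
```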

---

## 13. What is the Central Limit Theorem (CLT), and why is it critical in statistics?

> The **Central Limit Theorem** states that the distribution of the sample mean of a large number of independent and identically distributed random variables with finite variance approaches a normal distribution, regardless of the shape of the original population.

- Enables using the normal distribution for inference
- Crucial for constructing confidence intervals and hypothesis tests

---

## 14. How does the Central Limit Theorem relate to the normal distribution?

- Even if the original data is **not normal**, the distribution of the sample means **approaches normality** as sample size increases.
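- If \( n \) is large (a common rule of thumb is \( n \geq 30 \)), then approximately:
  \[
  \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)
  \]
  where \( \mu \) and \( \sigma^2 \) are the population mean and variance.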
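A simulation sketch of the CLT, assuming an exponential population with rate 1 (strongly right-skewed, with \( \mu = 1 \) and \( \sigma = 1 \)). The sample size, number of repetitions, and seed are illustrative choices. The sample means should center near \( \mu \), spread about \( \sigma/\sqrt{n} \), and put roughly 95% of their mass within 1.96 standard errors of \( \mu \), as a normal distribution would.

```python
import math
import random
import statistics

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1
rng = random.Random(0)
n = 40               # size of each sample
num_samples = 5_000  # number of repeated samples

sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

se = 1 / math.sqrt(n)  # sigma / sqrt(n)
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
# Share of sample means within 1.96 standard errors of mu (about 0.95 if X-bar is near normal)
share_within = sum(abs(m - 1) <= 1.96 * se for m in sample_means) / num_samples

print(round(mean_of_means, 3), round(sd_of_means, 3), round(se, 3), round(share_within, 3))
```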

---

## 15. What is the application of Z statistics in hypothesis testing?

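- Z-statistics are used to test hypotheses about a population mean when the population standard deviation \( \sigma \) is known and either the population is normal or the sample size is large.
- Z formula:
  \[
  Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}
  \]
  where \( \mu_0 \) is the population mean under the null hypothesis.
- Applications: A/B testing, medical studies, comparing a sample mean with a known population mean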
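A minimal sketch of a one-sample, two-sided z-test with known \( \sigma \), where `mu0` is the mean under the null hypothesis. The function name `z_test` and the numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """One-sample z-test with known population sigma; returns (z, two-sided p-value)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Illustrative numbers: n = 100, sample mean 52, hypothesized mean 50, known sigma 10
z, p = z_test(52, 50, 10, 100)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
```

With a 5% significance level, \( p \approx 0.0455 < 0.05 \), so this illustrative sample would lead to rejecting the null hypothesis.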

---

## 16. How do you calculate a Z-score, and what does it represent?

- Formula:
  \[
  Z = \frac{X - \mu}{\sigma}
  \]
- It measures **how many standard deviations** a value \( X \) is away from the mean.
- Helps standardize different distributions for comparison.
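Applying the formula to a whole dataset gives values with mean 0 and standard deviation 1. This sketch treats the illustrative `scores` list as the entire population (hence `pstdev`); for a sample, `statistics.stdev` would be the usual choice.

```python
import statistics

def z_scores(data):
    """Standardize each value: z = (x - mean) / standard deviation."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    return [(x - mu) / sigma for x in data]

scores = [60, 70, 80, 90, 100]
zs = z_scores(scores)
print([round(z, 3) for z in zs])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```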

---

## 17. What are point estimates and interval estimates in statistics?

- **Point estimate**: A single number used to estimate a population parameter (e.g., sample mean).
- **Interval estimate**: A range of values (with confidence level) that is likely to contain the parameter.

---

## 18. What is the significance of confidence intervals in statistical analysis?

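- A **confidence interval (CI)** gives a range of plausible values for a population parameter, together with a confidence level describing how reliable the estimation procedure is.
- Example: "We are 95% confident the true mean lies within this range," meaning that about 95% of intervals built this way from repeated samples would contain the true mean.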
- Formula (for known \( \sigma \)):
  \[
  \bar{X} \pm Z \cdot \frac{\sigma}{\sqrt{n}}
  \]
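The "95% confident" interpretation can be checked by simulation. This sketch assumes an illustrative normal population \( N(50, 10^2) \) with known \( \sigma \), draws many samples, builds the interval \( \bar{X} \pm 1.96\,\sigma/\sqrt{n} \) for each, and counts how often it contains the true mean. `z_interval` is a hypothetical helper.

```python
import math
import random
import statistics

def z_interval(sample, sigma, z=1.96):
    """Confidence interval for the mean with known population sigma."""
    x_bar = statistics.fmean(sample)
    margin = z * sigma / math.sqrt(len(sample))
    return x_bar - margin, x_bar + margin

# Illustrative setup: population N(50, 10^2), samples of size 25
rng = random.Random(1)
true_mu, sigma, n, trials = 50.0, 10.0, 25, 2_000

hits = 0
for _ in range(trials):
    sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
    low, high = z_interval(sample, sigma)
    if low <= true_mu <= high:
        hits += 1

print(hits / trials)  # close to 0.95
```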

---

## 19. What is the relationship between a Z-score and a confidence interval?

- The **Z-score** determines how far to extend the confidence interval from the sample mean.
  - 95% CI → Z = 1.96  
  - 99% CI → Z = 2.576  
- Used in:
  \[
  \text{CI} = \bar{X} \pm Z \cdot \frac{\sigma}{\sqrt{n}}
  \]
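The critical values above are quantiles of the standard normal: for confidence level \( 1 - \alpha \), \( Z = \Phi^{-1}(1 - \alpha/2) \). A sketch using `NormalDist().inv_cdf`; `z_critical` is an illustrative name.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```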

---

## 20. How are Z-scores used to compare different distributions?

- Z-scores **standardize** values from different distributions, making them comparable.
- Example: Comparing test scores from different grading systems or scales.
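A small sketch of that comparison with hypothetical exam statistics: the raw scores 85 and 90 come from exams with different means and spreads, so the Z-scores, not the raw scores, show who did better relative to their group.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical exams with different means and spreads
z_a = z_score(85, mu=70, sigma=10)  # Student A on exam A
z_b = z_score(90, mu=80, sigma=5)   # Student B on exam B
print(z_a, z_b)  # 1.5 2.0
```

Student B is 2 standard deviations above their exam mean versus 1.5 for Student A, so B performed better relative to their class.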

---

## 21. What are the assumptions for applying the Central Limit Theorem?

1. Observations must be **independent**  
2. Observations must be **identically distributed**  
3. Sample size should be **large** (usually \( n \geq 30 \))  
4. The population mean and variance must be **finite**

---

## 22. What is the concept of expected value in a probability distribution?

- The **expected value** is the long-run average outcome of a random variable.
- Discrete:
  \[
  E(X) = \sum_i x_i \cdot P(X = x_i)
  \]
- Continuous:
  \[
  E(X) = \int_{-\infty}^{\infty} x \cdot f(x) \, dx
  \]
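Both formulas can be evaluated directly. This sketch uses a fair die for the discrete case and an illustrative Uniform(0, 2) density \( f(x) = 1/2 \) for the continuous case, approximating the integral with the midpoint rule. `expected_value_continuous` is a hypothetical helper.

```python
from fractions import Fraction

# Discrete: fair die, E(X) = sum of x * P(X = x)
die_ev = sum(x * Fraction(1, 6) for x in range(1, 7))
print(die_ev)  # 7/2

def expected_value_continuous(pdf, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of x * f(x) over [a, b]."""
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width
        total += x * pdf(x)
    return total * width

# Continuous: Uniform(0, 2) has f(x) = 1/2 on [0, 2] and 0 elsewhere
print(expected_value_continuous(lambda x: 0.5, 0.0, 2.0))  # about 1.0
```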

---

## 23. How does a probability distribution relate to the expected outcome of a random variable?

- The **probability distribution** lists all possible values and their probabilities.
- The **expected value** is the weighted average of those values — the "center" of the distribution.

---
