# Statistics Advance 1 - Probability Distributions Theory 

### **1. What is a random variable in probability theory?**

A **random variable** is a variable whose possible values are numerical outcomes of a random phenomenon.
- It maps outcomes of a random experiment to numerical values.
- Random variables can be either *discrete* (having countable outcomes) or *continuous* (uncountable outcomes like real numbers).

### **2. What are the types of random variables?**

There are two primary types of random variables:
1. **Discrete Random Variable** — takes countable values (e.g., number of heads in coin tosses).
2. **Continuous Random Variable** — takes values from an uncountably infinite set (e.g., time, temperature).

### **3. What is the difference between discrete and continuous distributions?**

- **Discrete distributions** describe the probability of outcomes of a *discrete* random variable.
  - Examples: Bernoulli, Binomial, Poisson
- **Continuous distributions** describe the behavior of *continuous* random variables.
  - Examples: Normal, Exponential, Uniform (continuous)

### **4. What are probability distribution functions (PDF)?**

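A **probability distribution function** describes how probability is spread over the values of a random variable. The abbreviation **PDF** usually refers to the *probability density function* of a continuous variable:
- For **discrete variables**, the function is called a **probability mass function (PMF)** and gives $P(X = x)$ for each value $x$.
- For **continuous variables**, the PDF gives a density rather than an exact probability: $P(X = x) = 0$ for any single value, and probabilities come from integrating the density over a range.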

### **5. How do cumulative distribution functions (CDF) differ from probability distribution functions (PDF)?**

- A **CDF** gives the probability that a random variable $X$ is less than or equal to a particular value $x$:  
  $$F(x) = P(X \le x)$$
- A **PDF** describes the relative likelihood for the variable to take on a specific value (only for continuous variables).
- The **CDF** is obtained by integrating the PDF (in the continuous case) or summing the PMF (in the discrete case).
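
As a rough illustration of the integration step, the sketch below approximates a CDF by numerically integrating a PDF and compares it with a library CDF. The standard normal distribution, the helper name `cdf_by_integration`, the lower limit of -10, and the step count are all illustrative assumptions, not part of the notes above.

```python
from statistics import NormalDist


def cdf_by_integration(pdf, lower, x, steps=10_000):
    """Approximate the integral of pdf from lower to x (trapezoidal rule)."""
    width = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, steps):
        total += pdf(lower + i * width)
    return total * width


z = NormalDist()  # standard normal: mean 0, standard deviation 1

# -10 stands in for minus infinity; the density below it is negligible.
approx = cdf_by_integration(z.pdf, -10.0, 1.0)
exact = z.cdf(1.0)

print(round(approx, 4), round(exact, 4))  # both about 0.8413
```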

### **6. What is a discrete uniform distribution?**

A **discrete uniform distribution** assigns equal probability to all outcomes in a finite set.
- Example: Rolling a fair die,  
  $$P(X = x) = 1/6 \quad \quad \text{for} \quad x = 1,2,...,6$$
- Each value has the same probability:  

$$P(X = x) = \frac{1}{n} \quad \text{for} \quad x \in \{x_1, x_2, \ldots, x_n\}$$
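
A minimal sketch of this PMF, assuming the values passed in are distinct. The helper name `discrete_uniform_pmf` is hypothetical, and `Fraction` is used only to keep the probabilities exact.

```python
from fractions import Fraction


def discrete_uniform_pmf(values):
    """Map each of the n distinct values to probability 1/n."""
    n = len(values)
    return {x: Fraction(1, n) for x in values}


die = discrete_uniform_pmf([1, 2, 3, 4, 5, 6])

print(die[4])             # 1/6
print(sum(die.values()))  # 1
```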

### **7. What are the key properties of a Bernoulli distribution?**

- A **Bernoulli distribution** models a binary outcome: success (1) or failure (0).
- Probability mass function:
  $$P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}$$
- Expected value: $E[X] = p$
- Variance: $Var(X) = p(1 - p)$
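
The mean and variance above can be checked directly from the PMF. This is a small sketch with an arbitrarily chosen $p = 0.3$; the function name `bernoulli_pmf` is illustrative.

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p**x * (1 - p) ** (1 - x)


p = 0.3  # illustrative success probability

# Apply the definitions of mean and variance to the two outcomes.
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(mean, p)                # mean matches p
print(variance, p * (1 - p))  # variance matches p(1 - p), up to rounding
```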

### **8. What is the binomial distribution, and how is it used in probability?**

- The **binomial distribution** models the number of successes in $n$ independent Bernoulli trials, each with success probability $p$.
- PMF:  
  $$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \ldots, n$$
- Applications: Coin tosses, quality control, yes/no surveys
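
As a worked example of the PMF, the sketch below computes the probability of exactly 5 heads in 10 fair coin tosses. The values $n = 10$ and $p = 0.5$ are chosen only for illustration, and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for the number of successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5

prob_five = binomial_pmf(5, n, p)
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(prob_five)  # 252 / 1024 = 0.24609375
print(total)      # probabilities over k = 0..n sum to 1 (up to rounding)
```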

### **9. What is the Poisson distribution and where is it applied?**

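- The **Poisson distribution** models the number of times an event occurs in a fixed interval of time or space, assuming events occur independently at a constant average rate $\lambda$.
- PMF:  
  $$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$
- Mean and variance: both equal $\lambda$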
- Applications: arrival of calls, radioactive decay, typing errors
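
The claim that the mean and variance coincide can be checked numerically. The sketch below assumes $\lambda = 3$ and truncates the infinite sum at $k = 59$, where the remaining tail is negligible; both choices are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with rate lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 3.0
ks = range(60)  # truncation point; the tail beyond it is negligible here

mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(round(mean, 6), round(variance, 6))  # both close to 3.0
```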

### **10. What is a continuous uniform distribution?**

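A **continuous uniform distribution** has constant probability density over an interval $[a, b]$.
- PDF:  
  $$f(x) = \frac{1}{b - a}, \quad a \le x \le b \quad (\text{and } 0 \text{ otherwise})$$
- CDF:  
  $$F(x) = \frac{x - a}{b - a}, \quad a \le x \le b \quad (\text{with } F(x) = 0 \text{ for } x < a \text{ and } F(x) = 1 \text{ for } x > b)$$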
- Mean: $\mu = \frac{a + b}{2}$
- Variance: $\sigma^2 = \frac{(b - a)^2}{12}$
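
A small sketch of these formulas for an assumed interval $[2, 10]$. The function names are illustrative, and the values outside the interval follow the usual convention (density 0, CDF 0 below $a$ and 1 above $b$).

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """P(X <= x) for the continuous uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


a, b = 2.0, 10.0  # illustrative interval

mean = (a + b) / 2
variance = (b - a) ** 2 / 12

print(uniform_pdf(5.0, a, b))  # 0.125
print(uniform_cdf(5.0, a, b))  # 0.375
print(mean, variance)          # 6.0 and about 5.333
```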

### **11. What are the characteristics of a normal distribution?**

- Bell-shaped and symmetric about the mean
- Defined by mean $\mu$ and standard deviation $\sigma$
- PDF:  
  $$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
- Area under the curve is 1
- Used in natural and social sciences for real-valued random variables
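
The density formula can be evaluated directly and compared with Python's `statistics.NormalDist`. The parameters $\mu = 100$ and $\sigma = 15$ are assumed purely for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density evaluated from the formula above."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


mu, sigma = 100.0, 15.0  # illustrative parameters

manual = normal_pdf(110.0, mu, sigma)
library = NormalDist(mu, sigma).pdf(110.0)
print(manual, library)  # the two values agree up to rounding

# Symmetry about the mean: points equally far from mu have equal density.
left = normal_pdf(mu - 10.0, mu, sigma)
right = normal_pdf(mu + 10.0, mu, sigma)
print(left, right)
```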

### **12. What is the standard normal distribution, and why is it important?**

- A **standard normal distribution** is a normal distribution with:
  - Mean: $\mu = 0$
  - Standard deviation: $\sigma = 1$
- Denoted by $Z \sim N(0, 1)$
- Used for computing z-scores and statistical inference (hypothesis testing, confidence intervals)
- Simplifies calculations and helps standardize different datasets
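
One way to see the simplification: a probability for any normal variable can be read off the standard normal CDF after standardizing. The sketch below assumes $\mu = 100$, $\sigma = 15$, and $x = 115$ as example values.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # illustrative parameters
x = 115.0

# Standardize x, then look up the probability on the standard normal.
z = (x - mu) / sigma
via_standard = NormalDist().cdf(z)

# Same probability computed directly on the original scale.
direct = NormalDist(mu, sigma).cdf(x)

print(z)                     # 1.0
print(via_standard, direct)  # both about 0.8413
```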

### **13. What is the Central Limit Theorem (CLT), and why is it critical in statistics?**

The **Central Limit Theorem (CLT)** states that *the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the population has a finite variance*.

- It allows us to make inferences about population parameters using sample statistics.

- The CLT justifies the use of normal distribution in hypothesis testing and confidence intervals.
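
A simulation can make the theorem concrete. The sketch below is one possible demonstration, not a proof: it draws repeated samples from a strongly skewed exponential population (an assumed example) and tracks the skewness of the sample means, which should shrink toward 0 (the value for a normal distribution) as $n$ grows. The seed, trial count, and sample sizes are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so repeated runs give the same numbers


def sample_means(n, trials=5000, rate=1.0):
    """Means of `trials` samples of size n from an exponential population
    (population mean 1/rate, population standard deviation 1/rate)."""
    return [
        sum(random.expovariate(rate) for _ in range(n)) / n
        for _ in range(trials)
    ]


def skewness(data):
    """Sample skewness: third central moment divided by sd cubed."""
    centre = mean(data)
    spread = pstdev(data)
    return sum((v - centre) ** 3 for v in data) / len(data) / spread**3


results = {}
for n in (2, 30, 200):
    means = sample_means(n)
    results[n] = (mean(means), pstdev(means), skewness(means))
    # Columns: n, mean of sample means, their sd, theoretical sd, skewness
    print(n, round(results[n][0], 3), round(results[n][1], 3),
          round(1 / sqrt(n), 3), round(results[n][2], 3))
```

The mean of the sample means stays near the population mean of 1, their spread tracks $\sigma / \sqrt{n}$, and the skewness falls as $n$ increases.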

### **14. How does the Central Limit Theorem relate to the normal distribution?**

- According to the CLT, the sampling distribution of the mean becomes **approximately normal** when the sample size is large enough (a common rule of thumb is $n \geq 30$).

- Even if the population distribution is not normal, the **distribution of sample means** will be approximately normal for large $n$.

- This enables the application of **normal-based inference techniques** like z-tests.

### **15. What is the application of Z statistic in hypothesis testing?**

- The **Z-statistic** is used when the population standard deviation $\sigma$ is known and the sampling distribution of the mean is normal (or approximately normal by the CLT).

- Z-tests are used to determine whether to **reject the null hypothesis**.

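- Formula, where $\bar{X}$ is the sample mean, $\mu$ is the population mean under the null hypothesis, and $n$ is the sample size: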
  $$Z = \frac {\sqrt{n} (\bar{X} - \mu)}{\sigma}$$
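
A minimal sketch of a one-sample Z-test using the statistic above. The two-sided alternative, the 5% significance level, and the sample figures are assumptions made for illustration. `z_test` is a hypothetical helper, and `mu0` stands for the mean under the null hypothesis.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n):
    """Return the Z-statistic and a two-sided p-value."""
    z = sqrt(n) * (sample_mean - mu0) / sigma
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative numbers: n = 100 observations with known sigma = 10.
z, p_value = z_test(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)

print(z)                  # 2.0
print(round(p_value, 4))  # 0.0455
print(p_value < 0.05)     # True: reject the null at the assumed 5% level
```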

### **16. How do you calculate a Z-score, and what does it represent?**

- A **Z-score** measures how many standard deviations an individual data point is away from the mean.

- Formula:
  $$Z = \frac{X - \mu}{\sigma}$$

- Interpretation:
  - $Z = 0$: Value is at the mean
  - $Z > 0$: Value above the mean
  - $Z < 0$: Value below the mean
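
The three cases can be illustrated with made-up numbers (mean 70, standard deviation 10); `z_score` is an illustrative helper name.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


mu, sigma = 70.0, 10.0  # illustrative mean and standard deviation

print(z_score(85.0, mu, sigma))  # 1.5  (above the mean)
print(z_score(70.0, mu, sigma))  # 0.0  (at the mean)
print(z_score(55.0, mu, sigma))  # -1.5 (below the mean)
```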

### **17. What are point estimates and interval estimates in statistics?**

- A **point estimate** is a single value estimate of a population parameter (e.g., sample mean).

- An **interval estimate** provides a range within which the parameter is expected to lie (e.g., confidence interval).

### **18. What is the significance of confidence intervals in statistical analysis?**

- A **confidence interval (CI)** gives a range of plausible values for a population parameter.

- It reflects the **uncertainty** due to sampling error.

- A 95% confidence level means that if the sampling procedure were repeated many times, about 95% of the intervals constructed this way would contain the true parameter.

### **19. What is the relationship between a Z-score and a confidence interval?**

- The critical value $Z_{\alpha/2}$ of the standard normal distribution determines the width of the confidence interval when the population standard deviation is known.

- For example, a 95% CI uses $Z_{\alpha/2} = 1.96$:
  $$CI = \bar{x} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
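
A short sketch of this interval, assuming a known $\sigma$ and made-up sample figures. The critical value is taken from `NormalDist().inv_cdf`, and `z_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Illustrative numbers: sample mean 50, sigma 10, n = 100.
low, high = z_confidence_interval(50.0, 10.0, 100)

print(round(low, 2), round(high, 2))  # 48.04 51.96
```

A higher confidence level gives a larger critical value and a wider interval; a larger $n$ narrows it.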

### **20. How are Z-scores used to compare different distributions?**

- **Z-scores** standardize different distributions to a common scale.

- Allow comparison of values from distributions with different means and standard deviations.

- Example: Compare test scores from different exams with different scales.
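
To make the exam example concrete, the sketch below compares two scores on different scales. All figures (the exam means, standard deviations, and scores) are invented for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


# Hypothetical exams scored on different scales.
math_z = z_score(82.0, mu=70.0, sigma=8.0)         # exam out of 100
physics_z = z_score(650.0, mu=500.0, sigma=120.0)  # exam out of 800

print(math_z)     # 1.5
print(physics_z)  # 1.25

# The higher Z-score marks the stronger result relative to its own exam.
better = "math" if math_z > physics_z else "physics"
print(better)  # math
```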

### **21. What are the assumptions for applying the Central Limit Theorem?**

- Random and independent sampling

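- Observations drawn from the same distribution (identically distributed) with finite variance

- Sufficiently large sample size ($n \geq 30$ is a common rule of thumb)

- With small $n$, the population should not be strongly skewed or contain extreme outliers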

### **22. What is the concept of expected value in a probability distribution?**

- The **expected value** is the long-term average or mean value of a random variable.

- For discrete $X$:
  $$E[X] = \sum_i x_i \, P(X = x_i)$$

- For continuous $X$:
  $$E[X] = \int_{-\infty}^{\infty} x f(x) \, dx$$
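
For the discrete case, the sum can be evaluated exactly and compared with a long-run average from simulation. The fair die, the seed, and the number of simulated rolls are illustrative assumptions.

```python
import random
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in faces}  # fair die

# Exact expected value from the definition: sum of x * P(X = x).
expected = sum(x * pmf[x] for x in faces)
print(expected)  # 7/2

# Long-run average of many simulated rolls (seeded for reproducibility).
random.seed(0)
rolls = [random.choice(faces) for _ in range(100_000)]
long_run_average = sum(rolls) / len(rolls)
print(long_run_average)  # close to 3.5
```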

### **23. How does a probability distribution relate to the expected outcome of a random variable?**

- A **probability distribution** defines the likelihood of all possible values a random variable can take.

- The **expected value** summarizes the center of that distribution, representing the average outcome if the experiment is repeated many times.