# **Advanced Statistics Part 1**

---

## 1. What is a random variable in probability theory?

A **random variable** is a function that assigns a real number to each outcome in a sample space. It represents numerical outcomes of random phenomena.

There are two main types:
- **Discrete random variable**: Takes countable values (e.g., 0, 1, 2, ...)
- **Continuous random variable**: Takes any value in a continuum (e.g., any real number between 0 and 1)

---

## 2. What are the types of random variables?

1. **Discrete Random Variable**: Takes values from a countable set, which may be finite or countably infinite. Example: Number of heads in 3 coin tosses.
2. **Continuous Random Variable**: Takes any value within an interval of real numbers. Example: Height of a person.
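
To make the idea of a random variable as a function concrete, here is a minimal sketch for the coin-toss example. The names `sample_space`, `num_heads`, and `pmf` are illustrative choices, and the coin is assumed to be fair.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 2**3 = 8 equally likely sequences of three fair coin tosses.
sample_space = list(product("HT", repeat=3))


def num_heads(outcome):
    """The random variable X: maps one outcome (a tuple of 'H'/'T') to a number."""
    return outcome.count("H")


# Distribution of X: count the outcomes per value, then divide by the total.
counts = Counter(num_heads(outcome) for outcome in sample_space)
pmf = {value: Fraction(count, len(sample_space)) for value, count in sorted(counts.items())}

for value, prob in pmf.items():
    print(f"P(X = {value}) = {prob}")
```

The variable is discrete because it can only take the values 0, 1, 2, or 3.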

---

## 3. What is the difference between discrete and continuous distributions?

| Feature     | Discrete Distribution                    | Continuous Distribution                                  |
|-------------|------------------------------------------|----------------------------------------------------------|
| Values      | Countable (finite or countably infinite) | Uncountably many (an interval of real numbers)           |
| Example     | Number of students                       | Temperature                                              |
| Probability | $ P(X = x) $, given by a PMF             | $ P(a < X < b) = \int_a^b f(x) \, dx $; $ P(X = x) = 0 $ |

---

## 4. What are probability distribution functions (PDF)?

For a **continuous random variable**, the **Probability Density Function (PDF)** is a function \( f(x) \) such that:

- $ f(x) \geq 0 $
- $ \int_{-\infty}^{\infty} f(x) dx = 1 $

The probability that \( X \) lies between \( a \) and \( b \) is:

$$
P(a \leq X \leq b) = \int_a^b f(x) \, dx
$$
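
As a rough numerical check of these properties, the sketch below integrates a toy density with the midpoint rule. The density $f(x) = 2x$ on $[0, 1]$ and the helper `integrate` are assumptions chosen for illustration, not part of the definition.

```python
def pdf(x):
    """Toy density f(x) = 2x on [0, 1], zero elsewhere (an assumed example)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


total = integrate(pdf, 0.0, 1.0)   # should be close to 1
prob = integrate(pdf, 0.25, 0.5)   # P(0.25 <= X <= 0.5) = 0.5**2 - 0.25**2
print(round(total, 6), round(prob, 6))
```

Note that $f(x)$ itself is not a probability: here $f(1) = 2$, which is allowed because only areas under the curve are probabilities.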

---

## 5. How do cumulative distribution functions (CDF) differ from PDFs?

The **Cumulative Distribution Function (CDF)** gives the probability that a variable takes a value less than or equal to $ x $:

$$
F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t) \, dt
$$

So, for a continuous random variable, the PDF is the **derivative** of the CDF wherever the CDF is differentiable:

$$
f(x) = \frac{d}{dx}F(x)
$$
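
The derivative relationship can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library and a central finite difference; the evaluation point `x` and step `h` are arbitrary choices.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)
x, h = 0.7, 1e-5

# Central difference: slope of the CDF at x approximates the PDF at x.
slope = (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)

print(slope, dist.pdf(x))
```

The two printed values should agree to several decimal places.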

---

## 6. What is a discrete uniform distribution?

A **discrete uniform distribution** is one in which all outcomes are equally likely. If there are \( n \) outcomes:

$$
P(X = x_i) = \frac{1}{n}, \quad i = 1, 2, \dots, n
$$

Example: Rolling a fair die.
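
A minimal sketch of the die example, using exact fractions. The event choices (at most two, even) are only illustrations.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# Every face gets the same probability 1/n.
pmf = {face: Fraction(1, len(faces)) for face in faces}

# Probabilities of events are sums of the individual probabilities.
p_at_most_two = sum(p for face, p in pmf.items() if face <= 2)
p_even = sum(p for face, p in pmf.items() if face % 2 == 0)

print(p_at_most_two, p_even)  # 1/3 1/2
```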

---

## 7. What are the key properties of a Bernoulli distribution?

- Two outcomes: Success (1) and Failure (0)
- One trial
- Probability mass function:

$$
P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}
$$

- Mean: $ \mu = p $
- Variance: $ \sigma^2 = p(1 - p) $
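
The mean and variance above can be recovered directly from the PMF. A short sketch, with $p = 0.3$ as an arbitrary example value:

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p**x * (1 - p) ** (1 - x)


p = 0.3  # assumed success probability, for illustration only

# Mean and variance computed from the definition of expectation.
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(mean, variance)  # approximately p and p * (1 - p)
```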

---

## 8. What is the binomial distribution, and how is it used in probability?

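The **binomial distribution** models the number of successes \( X \) in \( n \) independent Bernoulli trials, each with the same success probability \( p \).

$$
P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n
$$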

- Mean: $ \mu = np $
- Variance: $ \sigma^2 = np(1 - p) $
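
A sketch of the binomial PMF using `math.comb`, with $n = 10$ and $p = 0.5$ as assumed example values (ten fair coin tosses):

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # illustrative parameters

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(pmf[5])          # probability of exactly 5 successes
print(mean, variance)  # should match n*p and n*p*(1 - p)
```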

---

## 9. What is the Poisson distribution and where is it applied?

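The **Poisson distribution** models the number of occurrences of an event in a fixed interval of time or space, assuming events occur independently at a constant average rate.

$$
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots
$$

- $ \lambda $ = average number of occurrences per interval
- Mean and variance: both equal $ \lambda $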

Applications: Call centers, website traffic, accidents, etc.
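
A minimal sketch of the Poisson PMF. The rate $\lambda = 3$ is an assumed example (say, three calls per minute on average), and the infinite sums are truncated at $k = 49$, where the remaining tail is negligible for this rate.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


lam = 3.0        # illustrative average rate
ks = range(50)   # truncation of k = 0, 1, 2, ...

total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(poisson_pmf(0, lam))  # probability of no events, equal to exp(-lam)
print(total, mean, variance)
```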

---

## 10. What is a continuous uniform distribution?

The density is constant on the interval $[a, b]$, so subintervals of equal length are equally probable.

PDF:

$$
f(x) = \begin{cases}
\frac{1}{b - a} & \text{if } a \leq x \leq b \\
0 & \text{otherwise}
\end{cases}
$$
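
Because the density is flat, interval probabilities reduce to a ratio of lengths. A short sketch, where the helper names and the interval $[2, 10]$ are assumptions for illustration:

```python
def uniform_pdf(x, a, b):
    """Density of Uniform(a, b): 1 / (b - a) inside [a, b], zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ Uniform(a, b): overlap length over total length."""
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) / (b - a)


a, b = 2.0, 10.0  # illustrative interval

print(uniform_pdf(5.0, a, b))        # 0.125
print(uniform_prob(4.0, 6.0, a, b))  # 0.25
print(uniform_prob(0.0, 4.0, a, b))  # 0.25, only [2, 4] overlaps the support
```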

---

## 11. What are the characteristics of a normal distribution?

- Bell-shaped and symmetric
- Defined by mean \( \mu \) and standard deviation \( \sigma \)
- PDF:

$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{ -\frac{(x - \mu)^2}{2\sigma^2} }
$$
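
The PDF can be coded directly from the formula and compared against `statistics.NormalDist` from the standard library. The parameters $\mu = 100$, $\sigma = 15$ are assumed example values.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density written out term by term from the formula."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)


mu, sigma = 100.0, 15.0  # illustrative parameters
reference = NormalDist(mu, sigma)

for x in (70.0, 100.0, 115.0):
    print(x, normal_pdf(x, mu, sigma), reference.pdf(x))

# Symmetry about the mean: equal distances on either side give equal density.
left, right = normal_pdf(mu - 15.0, mu, sigma), normal_pdf(mu + 15.0, mu, sigma)
```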

---

## 12. What is the standard normal distribution, and why is it important?

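A normal distribution with:

- Mean $\mu = 0$
- Standard deviation $\sigma = 1$

Any normal variable $X \sim N(\mu, \sigma^2)$ can be standardized to it with $Z = (X - \mu)/\sigma$, so a single Z-table gives probabilities for every normal distribution:

$$
Z \sim N(0, 1)
$$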
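
The sketch below shows why one table is enough: a probability for $X \sim N(\mu, \sigma^2)$ equals a standard normal probability at $z = (x - \mu)/\sigma$. Here `NormalDist().cdf` plays the role of the Z-table, and the values $\mu = 100$, $\sigma = 15$, $x = 130$ are assumptions for illustration.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1

mu, sigma = 100.0, 15.0  # illustrative parameters
x = 130.0

z = (x - mu) / sigma                     # standardize: 2.0
p_via_z = standard.cdf(z)                # "Z-table" lookup
p_direct = NormalDist(mu, sigma).cdf(x)  # same probability without standardizing

print(z, p_via_z, p_direct)
```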

---

## 13. What is the Central Limit Theorem (CLT), and why is it critical in statistics?

The CLT states:

> Given a sufficiently large sample size \( n \), the sampling distribution of the mean of independent, identically distributed observations with finite variance is approximately normal, regardless of the shape of the original distribution.

This is foundational in hypothesis testing and confidence intervals.
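
A small simulation can illustrate the theorem. This sketch draws many samples from an exponential distribution (strongly skewed, with mean 1 and standard deviation 1) and looks at the distribution of their means. The sample size, replication count, and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 50                   # size of each sample
replications = 5_000     # number of sample means to collect

sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

center = fmean(sample_means)  # should be near the population mean, 1
spread = stdev(sample_means)  # should be near sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal distribution this share would be about 0.95.
se = 1.0 / sqrt(n)
within = sum(abs(m - 1.0) <= 1.96 * se for m in sample_means) / replications

print(center, spread, se, within)
```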

---

## 14. How does the Central Limit Theorem relate to the normal distribution?

- The **sampling distribution** of the sample mean approaches a **normal distribution** with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$ as $n$ grows.
- This justifies using the normal distribution for inference about the mean even when the data are not normally distributed.

---

## 15. What is the application of Z statistics in hypothesis testing?

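- Used when the population variance is known (or the sample is large enough for the sample variance to stand in for it)
- Standardizes the sample statistic so it can be compared to the standard normal distribution, e.g. $ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $ for a test about a mean
- Used to test hypotheses and calculate **p-values**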
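
A minimal sketch of a two-sided one-sample Z-test for a mean, assuming the population standard deviation is known. The function name `z_test` and the numbers in the example call are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample Z-test; returns (z, p_value)."""
    # Standardize the sample mean under the null hypothesis mean mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: probability of a result at least this extreme.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: sample mean 52 from n = 100, testing mu0 = 50 with sigma = 10.
z, p_value = z_test(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)
print(z, p_value)
```

With these numbers the p-value falls just below 0.05, so the null hypothesis would be rejected at the 5% level but not at the 1% level.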

---

## 16. How do you calculate a Z-score, and what does it represent?

Z-score formula:

$$
Z = \frac{x - \mu}{\sigma}
$$

Equivalently, $x = \mu + Z \cdot \sigma$. The Z-score tells how many standard deviations a data point $x$ lies above (positive) or below (negative) the mean.
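
A short sketch of the standard Z-score, $z = (x - \mu)/\sigma$, and its inverse. The helper names and the example values are assumptions for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def from_z(z, mu, sigma):
    """Inverse transformation: recover the raw value from a Z-score."""
    return mu + z * sigma


z = z_score(85.0, mu=70.0, sigma=10.0)    # 1.5 standard deviations above the mean
x_back = from_z(z, mu=70.0, sigma=10.0)   # back to 85.0
below = z_score(55.0, mu=70.0, sigma=10.0)  # negative: below the mean

print(z, x_back, below)  # 1.5 85.0 -1.5
```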

---

## 17. What are point estimates and interval estimates in statistics?

- **Point Estimate**: Single value estimate of a population parameter (e.g., sample mean $\bar{x}$)
- **Interval Estimate**: Range of values with a confidence level (e.g., 95% confidence interval)

---

## 18. What is the significance of confidence intervals in statistical analysis?

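- Gives a range of plausible values for the population parameter; under repeated sampling, the stated proportion of such intervals (the confidence level) would contain the true value
- Common levels: 90%, 95%, 99%
- For the same data, a higher confidence level requires a wider interval; a wider interval reflects more uncertainty about the parameter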

---

## 19. What is the relationship between a Z-score and a confidence interval?

Confidence intervals use Z-scores to determine the margin of error:

$$
\text{CI} = \bar{x} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}
$$

Where $Z_{\alpha/2}$ is the Z-value for the desired confidence level (about 1.96 for 95%) and $\sigma$ is the known population standard deviation.
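
The interval can be computed with `NormalDist().inv_cdf` supplying the critical value. This is a sketch assuming a known population standard deviation; the function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / sqrt(n)             # margin of error
    return sample_mean - margin, sample_mean + margin


# Hypothetical data: sample mean 50, sigma 10, n = 100.
low, high = z_confidence_interval(50.0, sigma=10.0, n=100)
low99, high99 = z_confidence_interval(50.0, sigma=10.0, n=100, confidence=0.99)

print(round(low, 2), round(high, 2))      # 48.04 51.96
print(round(low99, 2), round(high99, 2))  # wider interval at 99%
```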

---

## 20. How are Z-scores used to compare different distributions?

Z-scores normalize data, allowing comparison across different distributions:

- Scores with different means and variances can be compared on the same scale.
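
For instance, scores from two tests on different scales can be compared once each is expressed as a Z-score. All test names, means, and standard deviations below are made-up values for illustration.

```python
def z_score(x, mu, sigma):
    """Standardize x relative to its own distribution."""
    return (x - mu) / sigma


# Hypothetical results for one student on two differently scaled tests.
math_z = z_score(82.0, mu=70.0, sigma=8.0)        # 1.5
verbal_z = z_score(650.0, mu=600.0, sigma=100.0)  # 0.5

better = "math" if math_z > verbal_z else "verbal"
print(math_z, verbal_z, better)  # 1.5 0.5 math
```

Although 650 is the larger raw score, the math result is further above its own mean in standard-deviation units.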

---

## 21. What are the assumptions for applying the Central Limit Theorem?

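1. Sample size $ n $ should be large ($ n \geq 30 $ is a common rule of thumb; heavily skewed populations may need more)
2. Observations must be independent and drawn from the same distribution
3. Population must have finite variance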

---

## 22. What is the concept of expected value in a probability distribution?

The expected value (mean) of a random variable is its probability-weighted average; it matches the **long-run average** over many repetitions of the experiment.

For discrete:

$$
E[X] = \sum_i x_i \, P(X = x_i)
$$

For continuous:

$$
E[X] = \int_{-\infty}^{\infty} x f(x) \, dx
$$
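
Both formulas can be sketched in a few lines. The discrete case uses a fair die; the continuous case uses an assumed toy density $f(x) = 2x$ on $[0, 1]$, integrated with the midpoint rule, whose exact expected value is $2/3$.

```python
from fractions import Fraction

# Discrete: E[X] = sum of x_i * P(X = x_i) for a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
expected_die = sum(x * p for x, p in die_pmf.items())


def pdf(x):
    """Toy density f(x) = 2x on [0, 1], zero elsewhere (an assumed example)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def expected_continuous(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of x * f(x) over [a, b]."""
    width = (b - a) / steps
    midpoints = (a + (i + 0.5) * width for i in range(steps))
    return sum(x * f(x) for x in midpoints) * width


expected_x = expected_continuous(pdf, 0.0, 1.0)

print(expected_die)          # 7/2
print(round(expected_x, 6))  # close to 2/3
```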

---

## 23. How does a probability distribution relate to the expected outcome of a random variable?

The **probability distribution** defines all possible outcomes and their probabilities, which are used to compute the **expected value** (average outcome over the long run).

---
