# Inferential Statistics: Comprehensive Guide

Inferential statistics allows analysts to make generalizations about a population based on sample data. Below are the key concepts you need to know, explained in detail with equations.

---

## 1. Probability Basics
Probability quantifies how likely an event is, on a scale from 0 (impossible) to 1 (certain).

### Key Concepts:
- **Rule of Total Probability** (for events $B_i$ that partition the sample space):  
  $P(A) = \sum_{i} P(A \cap B_i) = \sum_{i} P(A|B_i) P(B_i)$
  
- **Conditional Probability** (defined when $P(B) > 0$):  
  $P(A|B) = \frac{P(A \cap B)}{P(B)}$
  
- **Bayes' Theorem**:  
  $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
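
As a rough illustration of how the three rules above combine, the sketch below works through a diagnostic-test example. The prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen only for illustration, and the variable names are assumptions, not part of any library.

```python
# Hypothetical inputs, chosen only for illustration.
p_disease = 0.01            # P(A): prior probability of the condition
p_pos_given_disease = 0.95  # P(B|A): test sensitivity
p_pos_given_healthy = 0.05  # P(B|not A): false-positive rate

# Rule of total probability over the partition {A, not A}:
# P(B) = P(B|A) P(A) + P(B|not A) P(not A).
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(B)   = {p_pos:.4f}")                # 0.0590
print(f"P(A|B) = {p_disease_given_pos:.4f}")  # 0.1610
```

Even with a fairly accurate test, the posterior probability stays low here because the assumed prior $P(A)$ is small.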

### Common Distributions:
- **Normal Distribution**:  
  $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
  
- **Binomial Distribution**:  
  $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
  
- **Poisson Distribution**:  
  $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
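
The three formulas above can be evaluated directly with Python's standard library. This is a minimal sketch; the function names `normal_pdf`, `binomial_pmf`, and `poisson_pmf` are assumptions introduced here for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


print(normal_pdf(0.0))          # peak of the standard normal density
print(binomial_pmf(2, 4, 0.5))  # 6/16 = 0.375
print(poisson_pmf(0, 2.0))      # e^{-2}
```

A quick sanity check on either discrete function is that its probabilities sum to 1 over all possible values of $k$.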

---

## 2. Sampling
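Sampling is the process of selecting a subset of data from a population. Statistics computed from the sample, such as the mean and variance below, serve as estimates of the corresponding population parameters.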

### Key Concepts:
- **Sample Mean**:  
  $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  
- **Sample Variance** (dividing by $n-1$ rather than $n$ makes it an unbiased estimator of $\sigma^2$):  
  $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
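
The sketch below computes both statistics from their definitions and compares them with Python's `statistics` module. The data values are made up for illustration.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

# Sample mean: sum of the observations divided by n.
x_bar = sum(data) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
s_squared = sum((x - x_bar) ** 2 for x in data) / (n - 1)

print(x_bar)      # 5.0
print(s_squared)  # 32/7, about 4.571

# The standard library uses the same n - 1 definition.
print(statistics.mean(data), statistics.variance(data))
```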

---

## 3. Confidence Intervals
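A confidence interval is a range of plausible values for a population parameter, constructed so that a stated proportion of such intervals (the confidence level, e.g., 95%) would contain the true parameter over repeated sampling.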

### Equation for a Confidence Interval:
- For the mean ($\mu$):  
  $CI = \bar{x} \pm z \frac{\sigma}{\sqrt{n}}$
  
  Where:
  - $\bar{x}$: Sample mean
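  - $z$: Critical value of the standard normal distribution for the desired confidence level (e.g., 1.96 for 95%)
  - $\sigma$: Population standard deviation (assumed known; when it is unknown, substitute the sample standard deviation $s$ and use a t critical value with $n-1$ degrees of freedom)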
  - $n$: Sample size
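
A minimal sketch of the interval above, assuming $\sigma$ is known. The helper name `z_confidence_interval` and the input values are hypothetical; `statistics.NormalDist` supplies the critical value $z$.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) for the mean, assuming a known population sigma."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical summary statistics.
lower, upper = z_confidence_interval(x_bar=50.0, sigma=10.0, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (48.04, 51.96)
```

Because the margin shrinks with $\sqrt{n}$, quadrupling the sample size halves the width of the interval.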

---

## 4. Hypothesis Testing
Hypothesis testing assesses whether sample data provide enough evidence to reject a claim about a population parameter.

### Steps:
1. Formulate the **Null Hypothesis ($H_0$)** and **Alternative Hypothesis ($H_a$)**.
2. Select a significance level ($\alpha$).
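3. Compute the test statistic. For z-tests (when $\sigma$ is known):  
   $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
   
   For t-tests (when $\sigma$ is unknown and is estimated by the sample standard deviation $s$, with $n-1$ degrees of freedom):  
   $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$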
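4. Compare the test statistic to the critical value for $\alpha$, or compute the p-value; reject $H_0$ when the statistic falls in the rejection region (equivalently, when the p-value is at most $\alpha$).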
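
The steps above can be sketched as a two-sided one-sample z-test. The function name `one_sample_z_test` and the numbers passed to it are hypothetical, and the sketch assumes $\sigma$ is known.

```python
import math
from statistics import NormalDist


def one_sample_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu_0, assuming a known population sigma."""
    # Step 3: test statistic.
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    # Step 4: two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value <= alpha


# Hypothetical summary statistics.
z, p_value, reject = one_sample_z_test(x_bar=52.0, mu_0=50.0, sigma=10.0, n=100)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
# z = 2.00, p = 0.0455, reject H0: True
```

When $\sigma$ is unknown, the same structure applies with $s$ in place of $\sigma$ and a t distribution in place of the normal; the standard library has no t distribution, so that case would need a third-party package.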

---

## 5. Correlation vs Causation
Correlation measures the strength and direction of the linear relationship between two variables, but a strong correlation does not by itself imply that one variable causes the other.

### Pearson Correlation Coefficient:
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
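
The coefficient always lies between $-1$ and $1$. A minimal sketch of the formula above follows; the function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson_r([1, 2, 3], [3, 2, 1]))        # perfectly linear, decreasing
```

Note that the function fails with a division by zero if either variable is constant, since the denominator is then zero and $r$ is undefined.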

---

## 6. Central Limit Theorem
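The CLT states that the sampling distribution of the mean approaches a normal distribution as the sample size increases, regardless of the shape of the population's distribution, provided the observations are independent and identically distributed with finite variance.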

### Implications:
- Sampling distribution mean:  
  $\mu_{\bar{x}} = \mu$
  
- Sampling distribution standard deviation (the standard error of the mean):  
  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
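
A small simulation can make these two properties concrete. The sketch below draws repeated samples from an exponential population, which is strongly skewed, and compares the mean and standard deviation of the sample means with $\mu$ and $\sigma / \sqrt{n}$. The population, sample size, replicate count, and seed are all arbitrary choices made for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

n = 50             # size of each sample
replicates = 5000  # number of samples drawn

# Exponential population with rate 1: skewed, with mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# One sample mean per replicate.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(replicates)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.4f} (theory: {mu:.4f})")
print(f"sd of sample means:   {sd_of_means:.4f} (theory: {sigma / math.sqrt(n):.4f})")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the underlying population is far from normal.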

---

## 7. Regression Analysis
Regression models the relationship between a dependent variable and one or more independent variables, and the fitted model can then be used to predict the dependent variable.

### Simple Linear Regression:
$y = \beta_0 + \beta_1 x + \epsilon$

Where:
- $y$: Dependent variable
- $x$: Independent variable
- $\beta_0$: Intercept
- $\beta_1$: Slope
- $\epsilon$: Error term
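
One common way to estimate $\beta_0$ and $\beta_1$ is ordinary least squares, which this guide does not cover in detail. For the simple model the estimates have the closed form $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The sketch below applies those formulas; the function name `fit_simple_linear_regression` and the data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return least-squares estimates (beta_0, beta_1) for y = beta_0 + beta_1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_1 = s_xy / s_xx               # slope
    beta_0 = y_bar - beta_1 * x_bar    # intercept
    return beta_0, beta_1


# Hypothetical data lying close to the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

beta_0, beta_1 = fit_simple_linear_regression(xs, ys)
print(f"intercept = {beta_0:.2f}, slope = {beta_1:.2f}")  # intercept = 0.05, slope = 1.99

# Predicted value for a new x under the fitted line.
print(beta_0 + beta_1 * 6.0)
```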
