# Probability Distributions in Data Science

A probability distribution describes how likely each value, or range of values, of a random variable is. Understanding distributions is essential for statistical modeling, hypothesis testing, and machine learning.

This notebook covers the essentials of the most commonly used distributions.

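## Index

1. Discrete Distributions
2. Continuous Distributions
3. Other Notable Distributions
4. Where to Find Each Type of Distribution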

## 1. Discrete Distributions

Discrete distributions describe variables that take on a finite or countably infinite number of values.

### 1.1 Bernoulli Distribution
A Bernoulli distribution models a single experiment with two possible outcomes: success (1), with probability $p$, or failure (0), with probability $1 - p$.

**Applications:**
- Binary classification problems.
- Coin flips.
- Success/failure experiments.
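
As a minimal sketch of the definition above, the snippet below writes out the Bernoulli probability mass function and simulates repeated trials using only the Python standard library. The function names, the value `p = 0.3`, and the seed are illustrative assumptions, not part of the original notebook.

```python
import random


def bernoulli_pmf(x: int, p: float) -> float:
    """Return P(X = x) for a Bernoulli(p) variable; x should be 0 or 1."""
    if x not in (0, 1):
        return 0.0
    return p if x == 1 else 1.0 - p


def bernoulli_sample(p: float, rng: random.Random) -> int:
    """Draw one Bernoulli(p) outcome: 1 (success) or 0 (failure)."""
    return 1 if rng.random() < p else 0


p = 0.3                 # assumed success probability, for illustration only
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [bernoulli_sample(p, rng) for _ in range(10_000)]

# The sample mean estimates p, which is the mean of a Bernoulli(p) variable.
sample_mean = sum(draws) / len(draws)
print("P(X=1):", bernoulli_pmf(1, p), "P(X=0):", bernoulli_pmf(0, p))
print("fraction of successes:", sample_mean)
```

With enough draws, the fraction of successes should settle near the assumed `p`.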

### 1.2 Binomial Distribution
Models the number of successes in `n` independent Bernoulli trials that share the same success probability `p`.

**Applications:**
- A/B testing in marketing.
- Number of defective items in a batch.
- Polling statistics.
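
The sketch below computes the Binomial probability mass function, $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, and checks numerically that the mean is $np$ and the variance is $np(1-p)$. The helper name `binomial_pmf` and the values `n = 10`, `p = 0.5` are assumptions chosen for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k): probability of exactly k successes in n trials."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


n, p = 10, 0.5  # assumed: 10 fair coin flips
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Moments computed directly from the PMF.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print("total probability:", sum(pmf))
print("mean:", mean, "variance:", variance)
```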

### 1.3 Poisson Distribution
Models the number of events occurring in a fixed interval of time or space, assuming events happen independently at a constant average rate.

**Applications:**
- Call center queue modeling.
- Website traffic predictions.
- Number of earthquakes in a year.
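
To make the call center example concrete, the following sketch evaluates the Poisson probability mass function, $P(X = k) = \lambda^k e^{-\lambda} / k!$, where $\lambda$ is the average number of events per interval. The rate `lam = 4.0` (calls per minute) and the helper name are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson variable with average rate lam."""
    if k < 0:
        return 0.0
    return lam**k * exp(-lam) / factorial(k)


lam = 4.0  # assumed average of 4 calls per minute

p_zero = poisson_pmf(0, lam)                                  # no calls at all
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))      # 0, 1 or 2 calls
mean = sum(k * poisson_pmf(k, lam) for k in range(60))        # truncated sum

print("P(no calls):", p_zero)
print("P(at most 2 calls):", p_at_most_2)
print("mean (should be close to lam):", mean)
```

For a Poisson variable both the mean and the variance equal $\lambda$, which is why a single rate parameter is enough to specify it.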

## 2. Continuous Distributions
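Continuous distributions describe variables that can take any value within a range. Because there are uncountably many possible values, probabilities are assigned to intervals through a probability density function (PDF) rather than to individual points.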

### 2.1 Uniform Distribution
All values in an interval $[a, b]$ have the same probability density, so the probability of landing in a sub-interval depends only on its length.

**Applications:**
- Random number generation.
- Monte Carlo simulations.
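
A short sketch of the continuous Uniform distribution on an interval $[a, b]$: the density is $1/(b-a)$ inside the interval, and the probability of a sub-interval is its length divided by $b - a$. The endpoints `a = 2.0`, `b = 6.0`, the seed, and the function names are assumptions for illustration.

```python
import random


def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of Uniform(a, b): constant inside [a, b], zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x: float, a: float, b: float) -> float:
    """Return P(X <= x) for Uniform(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 2.0, 6.0  # assumed interval

# Probability of falling in [3, 4]: length 1 out of a total length of 4.
prob_3_to_4 = uniform_cdf(4.0, a, b) - uniform_cdf(3.0, a, b)

# Simulated draws; their average should be close to the midpoint (a + b) / 2.
rng = random.Random(42)
samples = [rng.uniform(a, b) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)

print("P(3 <= X <= 4):", prob_3_to_4)
print("sample mean:", sample_mean)
```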


### 2.2 Normal (Gaussian) Distribution
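The most widely used continuous distribution, known as the bell curve. It is symmetric about its mean $\mu$, and its spread is controlled by the standard deviation $\sigma$.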

**Applications:**
- Heights and weights of individuals.
- Stock price fluctuations.
- Measurement errors in experiments.
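
The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) to check the familiar rule of thumb that about 68% of values fall within one standard deviation of the mean and about 95% within two. The height parameters (mean 170 cm, standard deviation 10 cm) are made-up values for illustration.

```python
from statistics import NormalDist

# Hypothetical population of heights in centimetres.
heights = NormalDist(mu=170.0, sigma=10.0)

# Probability mass within one and two standard deviations of the mean.
within_1_sigma = heights.cdf(180.0) - heights.cdf(160.0)
within_2_sigma = heights.cdf(190.0) - heights.cdf(150.0)

# Two-sided 95% critical value of the standard normal distribution.
z_95 = NormalDist().inv_cdf(0.975)

print("within 1 sigma:", round(within_1_sigma, 4))
print("within 2 sigma:", round(within_2_sigma, 4))
print("z for 95% interval:", round(z_95, 3))
```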


### 2.3 Exponential Distribution
Models the time between events in a Poisson process.

**Applications:**
- Time until failure of machinery.
- Customer arrival times in service systems.
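
As an illustration, the sketch below evaluates the Exponential survival function $P(T > t) = e^{-\lambda t}$ and checks the memoryless property, $P(T > s + t \mid T > s) = P(T > t)$. The rate `lam = 0.5` (events per hour), the times `s` and `t`, and the seed are assumptions.

```python
import random
from math import exp


def exponential_survival(t: float, lam: float) -> float:
    """Return P(T > t) for an Exponential variable with rate lam."""
    return exp(-lam * t) if t >= 0 else 1.0


lam = 0.5        # assumed rate: 0.5 events per hour, so the mean wait is 2 hours
s, t = 3.0, 2.0  # already waited s hours; what about t more?

# Memoryless property: having waited s hours does not change the outlook.
conditional = exponential_survival(s + t, lam) / exponential_survival(s, lam)
unconditional = exponential_survival(t, lam)

# Simulated gaps between events; the average should be close to 1 / lam.
rng = random.Random(1)
gaps = [rng.expovariate(lam) for _ in range(20_000)]
mean_gap = sum(gaps) / len(gaps)

print("P(T > s+t | T > s):", conditional)
print("P(T > t):", unconditional)
print("mean simulated gap:", mean_gap)
```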

## 3. Other Notable Distributions

### 3.1 Gamma Distribution
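The Gamma distribution generalizes the Exponential distribution. For an integer shape parameter $k$, it models the waiting time until the $k$-th event in a Poisson process with rate $\lambda$; with $k = 1$ it reduces to the Exponential distribution.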

**Properties:**

- PDF: $f(x) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, \quad x > 0$, with shape $k > 0$ and rate $\lambda > 0$. For integer $k$, $\Gamma(k) = (k-1)!$.
- Mean: $\mu = k/\lambda$
- Variance: $\sigma^2 = k/\lambda^2$

**Applications:**
- Modeling insurance claims.
- Reliability analysis of systems.
- Queuing theory.
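
The following sketch codes the Gamma density in the shape/rate form $f(x) = \lambda^k x^{k-1} e^{-\lambda x} / \Gamma(k)$, confirms that shape $k = 1$ gives the Exponential density, and checks the mean $k/\lambda$ by simple numerical integration. The parameter values and the midpoint-rule grid are illustrative assumptions.

```python
from math import exp, gamma


def gamma_pdf(x: float, k: float, lam: float) -> float:
    """Gamma density with shape k and rate lam; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return lam**k * x ** (k - 1) * exp(-lam * x) / gamma(k)


def exponential_pdf(x: float, lam: float) -> float:
    """Exponential density with rate lam; zero for x < 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0


k, lam = 2.5, 1.5  # assumed shape and rate (k need not be an integer)

# Midpoint-rule integration on [0, 40]; the tail beyond 40 is negligible here.
dx = 0.001
grid = [(i + 0.5) * dx for i in range(40_000)]
total = sum(gamma_pdf(x, k, lam) for x in grid) * dx
mean = sum(x * gamma_pdf(x, k, lam) for x in grid) * dx

print("integral of the PDF:", total)
print("numerical mean:", mean, "theoretical mean:", k / lam)
print("k = 1 matches Exponential:", gamma_pdf(0.7, 1.0, lam), exponential_pdf(0.7, lam))
```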

### 3.2 Beta Distribution
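The Beta distribution is a flexible distribution on the interval $[0, 1]$, controlled by two shape parameters $\alpha > 0$ and $\beta > 0$. It is widely used in Bayesian statistics to model uncertainty about a probability or proportion.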

**Properties:**
- PDF: $f(x) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \leq x \leq 1$
- Mean: $\mu = \frac{\alpha}{\alpha + \beta}$
- Variance: $\sigma^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$

**Applications:**
- Bayesian statistics.
- Proportion and probability estimations.
- Machine learning hyperparameter tuning.
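
One common Bayesian use is estimating a success probability: with a Beta($\alpha$, $\beta$) prior and Bernoulli or Binomial data, the posterior is again a Beta distribution, Beta($\alpha$ + successes, $\beta$ + failures). The sketch below applies that update and evaluates the mean and variance formulas listed above. The prior Beta(1, 1) and the counts (7 successes, 3 failures) are made-up values.

```python
def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha: float, beta: float) -> float:
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total**2 * (total + 1))


# Assumed prior: Beta(1, 1), which is uniform on [0, 1].
prior_alpha, prior_beta = 1.0, 1.0

# Hypothetical data: 7 successes and 3 failures.
successes, failures = 7, 3

# Conjugate update: add the observed counts to the prior parameters.
post_alpha = prior_alpha + successes
post_beta = prior_beta + failures

print("prior mean:", beta_mean(prior_alpha, prior_beta))
print("posterior mean:", beta_mean(post_alpha, post_beta))
print("posterior variance:", beta_variance(post_alpha, post_beta))
```

The posterior variance is smaller than the prior variance, reflecting that the data reduced uncertainty about the proportion.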


### 3.3 Cauchy Distribution
The Cauchy distribution is a heavy-tailed distribution with undefined mean and variance.

**Properties:**
- PDF: $f(x) = \frac{1}{\pi \gamma \left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^2 \right] }, \quad -\infty < x < \infty$
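- Mean and variance: undefined (the defining integrals do not converge). The median and mode both equal the location parameter $x_0$, and $\gamma > 0$ is the scale parameter.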

**Applications:**
- Modeling data with extreme outliers.
- Physics (resonance behavior).
- Financial modeling with extreme values.
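
To illustrate the heavy tails, the sketch below draws Cauchy samples by inverse-CDF sampling, $x = x_0 + \gamma \tan(\pi (u - 1/2))$ with $u$ uniform on $[0, 1)$. The sample median stays near $x_0$, while the running mean is disturbed by occasional huge values. The parameters, seed, and function names are assumptions for illustration.

```python
import math
import random
import statistics


def cauchy_pdf(x: float, x0: float, scale: float) -> float:
    """Cauchy density with location x0 and scale parameter (gamma) scale."""
    return 1.0 / (math.pi * scale * (1.0 + ((x - x0) / scale) ** 2))


def cauchy_sample(x0: float, scale: float, rng: random.Random) -> float:
    """Draw one Cauchy value by inverting the CDF at a uniform random number."""
    return x0 + scale * math.tan(math.pi * (rng.random() - 0.5))


x0, scale = 0.0, 1.0  # assumed location and scale (standard Cauchy)
rng = random.Random(7)
samples = [cauchy_sample(x0, scale, rng) for _ in range(100_000)]

sample_median = statistics.median(samples)     # robust: stays close to x0
largest = max(abs(s) for s in samples)         # heavy tails produce huge values
sample_mean = sum(samples) / len(samples)      # does not settle as n grows

print("sample median:", sample_median)
print("largest absolute value:", largest)
print("sample mean (unstable across seeds):", sample_mean)
```

Re-running with different seeds should leave the median almost unchanged while the mean moves around noticeably, which is the practical meaning of "undefined mean".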

### 3.4 Chi-Square Distribution
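The Chi-Square distribution with $k$ degrees of freedom is the distribution of the sum of squares of $k$ independent standard normal variables. It is used extensively in hypothesis testing and inferential statistics.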

**Properties:**
- PDF: $f(x) = \frac{x^{(k/2)-1} e^{-x/2}}{2^{k/2} \Gamma(k/2)}, \quad x > 0$
- Mean: $\mu = k$
- Variance: $\sigma^2 = 2k$

**Applications:**
- Goodness-of-fit tests.
- Estimation of population variance.
- Independence tests in categorical data.
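
As a sketch of a goodness-of-fit test, the snippet below computes Pearson's statistic, $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, for a die that is assumed fair under the null hypothesis. The observed counts are invented for illustration, and the critical value 11.070 is the standard table value for 5 degrees of freedom at the 5% significance level.

```python
# Hypothetical counts for faces 1..6 from 120 rolls of a die.
observed = [18, 22, 16, 25, 19, 20]

# Under the null hypothesis of a fair die, every face is equally likely.
n_rolls = sum(observed)
expected = [n_rolls / len(observed)] * len(observed)

# Pearson's chi-square statistic.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus one.
degrees_of_freedom = len(observed) - 1

# Table value: 0.95 quantile of the chi-square distribution with 5 degrees of freedom.
critical_value = 11.070
reject_fair_die = statistic > critical_value

print("chi-square statistic:", statistic)
print("degrees of freedom:", degrees_of_freedom)
print("reject the fair-die hypothesis at 5%:", reject_fair_die)
```

With these made-up counts the statistic is well below the critical value, so the data give no evidence against a fair die.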

## 4. Where to Find Each Type of Distribution

| Distribution Type  | Common Applications |
|--------------------|----------------------------------|
| **Bernoulli**      | Coin flips, success/failure trials |
| **Binomial**       | A/B testing, quality control |
| **Poisson**        | Event counts over time, call arrivals |
| **Uniform**        | Random number generation, lotteries |
| **Normal**         | Heights, stock prices, exam scores |
| **Exponential**    | Machine failure times, service queues |
| **Gamma**          | Insurance claims, waiting times |
| **Beta**           | Bayesian statistics, probability estimations |
| **Cauchy**         | Outlier modeling, physical resonances |
| **Chi-Square**     | Hypothesis testing, independence tests |
