### Normal Distribution

The Gaussian distribution, also known as the normal distribution, plays a fundamental role in machine learning. It is a key concept used to model the distribution of real-valued random variables and is essential for understanding various statistical methods and algorithms.

It is a continuous probability distribution that is symmetric about the mean, with most of the data (about 68%) falling within one standard deviation of the mean. Its density has a characteristic bell-shaped curve.

Gaussian Distribution Curve:-

The curve is symmetric and bell-shaped, and it represents the probability distribution of a continuous random variable. The Gaussian distribution is characterized by two parameters: the mean (μ) and the standard deviation (σ), which determine the location and the spread of the curve.

1. Mean = Median = Mode
2. 50% of the data lie below the mean and 50% lie above it.

![alt text](images\Normal-Distribution-Curve.png)

1. Standard deviations are used to subdivide the area under the normal curve. Each subdivided section corresponds to the percentage of data that falls into that region of the graph.

2. Analysis : A smaller standard deviation results in a narrower and taller bell curve, indicating that data points are clustered closely around the mean. Conversely, a larger standard deviation leads to a wider and shorter bell curve, suggesting that data points are more spread out from the mean.

3. The Empirical Rule, also known as the 68-95-99.7 rule, quantifies the proportion of data falling within certain intervals around the mean in a normal distribution. It provides a quick way to estimate the spread of data without performing detailed calculations.

   - Within one standard deviation of the mean (Mean ± 1 SD), approximately 68% of the data is expected to fall.
   - Within two standard deviations of the mean (Mean ± 2 SD), approximately 95% of the data is expected to fall.
   - Within three standard deviations of the mean (Mean ± 3 SD), approximately 99.7% of the data is expected to fall.

A Gaussian distribution table, also known as a standard normal distribution table or z-table, is a tabulated form that provides values of the cumulative distribution function (CDF) for the standard normal distribution.

The standard normal distribution has a mean (central value) of 0 and a standard deviation of 1.
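
The 68-95-99.7 rule and the entries of a z-table can be reproduced numerically. The sketch below is a minimal illustration using only Python's standard library; the function names `standard_normal_cdf` and `prob_within` are illustrative choices, not part of the original notes.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for Z ~ N(0, 1), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(k: float) -> float:
    """P(-k <= Z <= k): probability mass within k standard deviations of the mean."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {prob_within(k):.4f}")
```

The three printed values should land close to 0.68, 0.95 and 0.997, which is where the rule gets its name.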

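For a general normal distribution with mean μ and standard deviation σ, the probability density function (PDF) is:

$$
f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$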

    x: The variable or data point.
    μ: The mean (average) of the distribution.
    σ: The standard deviation of the distribution.
    σ²: The variance.
    1/(σ√(2π)): Normalization factor that ensures the total area under the curve is 1.
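
To make the density formula concrete, here is a small sketch that evaluates it directly and checks numerically that the area under the curve is close to 1. The helper `area_under_pdf` and its integration range of mean ± 8 standard deviations are assumptions made for illustration.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2), written term by term as in the formula above."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # normalization factor
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under_pdf(mu: float, sigma: float, steps: int = 20_000) -> float:
    """Trapezoidal estimate of the area over mu ± 8*sigma (assumed wide enough)."""
    lo, hi = mu - 8.0 * sigma, mu + 8.0 * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    total += sum(normal_pdf(lo + i * h, mu, sigma) for i in range(1, steps))
    return total * h


print(normal_pdf(0.0))             # peak height of the standard normal
print(area_under_pdf(10.0, 2.0))   # should be very close to 1
```

Evaluating `normal_pdf(mu, mu, sigma)` for a small and a large `sigma` also shows the narrower-and-taller versus wider-and-shorter behaviour described earlier.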

### Z - Score :- 
The z-score is a statistical measurement that describes how far a particular data point is from the mean of a dataset, measured in terms of standard deviations. It helps standardize data, making it easier to compare values from different datasets or distributions.

$$
    z = \frac{x - \mu}{\sigma}
$$

Where:

    z: The z-score.
    x: The individual data point.
    μ: The mean of the dataset.
    σ: The standard deviation of the dataset.

Interpretation:

    z=0: The data point is exactly at the mean.
    Positive z: The data point is above the mean.
    Negative z: The data point is below the mean.
    Magnitude of z: The number indicates how many standard deviations the data point is from the mean.

Applications:

    Outlier Detection: Identifying data points that are unusually far from the mean.
    Standardization: Converting data to a standard scale (e.g., in machine learning).
    Probability Calculations: Using z-scores to find probabilities under the normal distribution.
    Comparison Across Datasets: Comparing scores from datasets with different scales.


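Note:- 
    Computing the z-score of every value in a column gives a column with mean 0 and standard deviation 1. If the original data are normally distributed, the z-score column follows the standard normal distribution; standardizing rescales the data but does not change the shape of its distribution.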
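
As a minimal sketch of standardization, the snippet below converts a column to z-scores. The data values are made up for illustration, and using the population standard deviation (`statistics.pstdev`) is an assumption; the sample standard deviation is an equally common choice.

```python
import statistics


def z_scores(values):
    """Return (x - mean) / standard deviation for every x in values."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population SD (assumption, see lead-in)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in values]


heights_cm = [150.0, 160.0, 165.0, 170.0, 180.0]  # hypothetical data
z = z_scores(heights_cm)
print(z)
print(statistics.fmean(z), statistics.pstdev(z))  # close to 0 and 1
```

The standardized column has mean 0 and standard deviation 1 whatever the original units were, which is what makes values from different datasets comparable.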





### Skewness
Skewness is the degree of asymmetry observed in a probability distribution. When data points on a bell curve are not distributed symmetrically to the left and right sides of the median, the bell curve is skewed.

Skewness in statistics measures how far the probability distribution of a random variable departs from symmetry. A symmetric distribution, such as the normal distribution, has zero skewness.

There are 2-types of skewness:-

    Positive Skewness
    Negative Skewness

A distribution can be positively (right-) skewed or negatively (left-) skewed. A normal distribution exhibits zero skewness.

![sk1.webp](attachment:sk1.webp)

##### Why is Skewness Important :- 
    First, linear models tend to work best when the distributions of the independent variables and the target variable are similar and roughly symmetric. Therefore, knowing the skewness of the data helps us create better linear models.

    Secondly, let’s take a look at the below distribution. It is the distribution of horsepower of cars:

![sk2.webp](attachment:sk2.webp)

    You can clearly see that the above distribution is positively skewed. Now, let’s say you want to use this as a feature for the model that will predict the mpg (miles per gallon) of a car.

    Since our data is positively skewed, it means that more data points have low values, such as cars with less horsepower. So when we train our model on this data, it will perform better at predicting the mpg of cars with lower horsepower as compared to those with higher horsepower.

    Also, skewness tells us about the direction of outliers. You can see that our distribution is positively skewed, and most outliers appear on the right side of the distribution.

#### Right (Positive) Skewed Distribution :- 
![sk8.webp](attachment:sk8.webp)
![sk9.webp](attachment:sk9.webp)

#### Left (Negative) Skewed Distribution :-
![sk11.webp](attachment:sk11.webp)
![sk12.webp](attachment:sk12.webp)

#### Skewness Formula :- 

$$
    \text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{s} \right)^3
$$

- n: The number of observations in the dataset.
- $\bar{x}$: The mean of the dataset.
$$
    \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i
$$
- s: The sample standard deviation.
$$
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}
$$
- $x_i$: Each individual observation.
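
The formula above can be sketched in a few lines of standard-library Python. The function name `sample_skewness` and the example data are illustrative. This adjusted form should agree with what `scipy.stats.skew(x, bias=False)` reports, although that equivalence is stated here as an expectation rather than something shown in these notes.

```python
import statistics


def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed_sum = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_sum


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # one large value stretches the right tail
print(sample_skewness(symmetric))
print(sample_skewness(right_skewed))
```

A long right tail gives a positive value, and mirroring the data (negating every value) flips the sign.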

#### Practical Range of Interpretation :-

| **Skewness Value Range** | **Interpretation**                      |
|--------------------------|-----------------------------------------|
| -0.5 to 0.5              | Approximately symmetrical distribution  |
| -1 to -0.5               | Moderately negatively skewed            |
| -1 or less               | Highly negatively skewed                |
| 0.5 to 1                 | Moderately positively skewed            |
| 1 or more                | Highly positively skewed                |
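
The ranges above translate directly into a small lookup helper. This is only a sketch: the function name is hypothetical, and assigning the exact boundary values ±0.5 to the "approximately symmetrical" bucket is an assumption, since the ranges overlap at their endpoints.

```python
def interpret_skewness(value: float) -> str:
    """Map a skewness value onto the rule-of-thumb ranges listed above."""
    if value <= -1.0:
        return "highly negatively skewed"
    if value < -0.5:
        return "moderately negatively skewed"
    if value <= 0.5:  # boundary values +/-0.5 treated as symmetric (assumption)
        return "approximately symmetrical"
    if value < 1.0:
        return "moderately positively skewed"
    return "highly positively skewed"


for v in (-1.4, -0.7, 0.1, 0.8, 2.3):
    print(v, "->", interpret_skewness(v))
```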

### CDF of Normal Distribution :- 
The Cumulative Distribution Function (CDF) of a normal distribution gives the probability that a normally distributed random variable $X$ takes a value less than or equal to $x$. The formula is:

$$
    F(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^x e^{-\frac{(t - \mu)^2}{2\sigma^2}} \, dt
$$

<h3>CDF of Normal Distribution Terms</h3>

<ul>
    <li><b>F(x)</b>: CDF value at <i>x</i></li>
    <li><b>&mu;</b>: Mean of the normal distribution</li>
    <li><b>&sigma;<sup>2</sup></b>: Variance of the normal distribution</li>
    <li><b>&sigma;</b>: Standard deviation (<i>&sigma; = &radic;&sigma;<sup>2</sup></i>)</li>
    <li><b>t</b>: Dummy variable of integration</li>
</ul>

![Normal_Distribution_CDF.svg.png](attachment:Normal_Distribution_CDF.svg.png)
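
The integral in the CDF has no elementary closed form, but it can be written with the error function or approximated numerically. The sketch below compares both. Truncating the lower limit at mean − 10 standard deviations is an assumption made so the integral is finite, and the function names are illustrative.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """F(x) for N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_cdf_numeric(x: float, mu: float = 0.0, sigma: float = 1.0,
                       steps: int = 20_000) -> float:
    """Trapezoidal approximation of the integral of the PDF from -infinity to x."""
    lo = mu - 10.0 * sigma  # stand-in for -infinity (assumption)
    if x <= lo:
        return 0.0

    def pdf(t: float) -> float:
        return math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(
            2.0 * math.pi * sigma ** 2)

    h = (x - lo) / steps
    total = 0.5 * (pdf(lo) + pdf(x))
    total += sum(pdf(lo + i * h) for i in range(1, steps))
    return total * h


print(normal_cdf(1.0), normal_cdf_numeric(1.0))
```

The two results should agree to several decimal places, and `normal_cdf(mu, mu, sigma)` is 0.5 for any parameters because half of the area lies below the mean.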

## Kurtosis:-

## ✅ What is Kurtosis?

Kurtosis is a statistical measure that describes the **shape** of a distribution's tails in relation to its overall shape. It tells us whether the data have **heavy tails** or **light tails** compared to a normal distribution.

- **High Kurtosis (> 3)**: Heavy tails, more outliers.
- **Low Kurtosis (< 3)**: Light tails, fewer outliers.
- **Normal Kurtosis (= 3)**: Normal distribution (excess kurtosis = 0).

### 🔎 Types of Kurtosis:
1. **Mesokurtic (Kurtosis ≈ 3)**  
   ➡️ Normal distribution.
   
2. **Leptokurtic (Kurtosis > 3)**  
   ➡️ Peaked with heavy tails → more extreme outliers.
   
3. **Platykurtic (Kurtosis < 3)**  
   ➡️ Flatter peak with light tails → fewer extreme outliers.

![](images\kurtosis.webp)

---

## 🚩 How to Use Kurtosis to Find Outliers?

- **High Kurtosis** indicates that your data likely has many outliers.
- You can calculate kurtosis using libraries like `scipy.stats`.

If kurtosis is high (kurtosis > 3, i.e. excess kurtosis > 0), you should:
- Investigate the data’s tails.
- Use outlier detection methods like the **IQR method** or **Z-score method** focusing on the extremes.
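
A minimal sketch of the calculation, using only the standard library: kurtosis is computed here as the fourth central moment divided by the squared variance (the population form, so a normal distribution gives about 3). The two datasets are made up. When using `scipy.stats.kurtosis` instead, note that by default it reports excess kurtosis (kurtosis − 3).

```python
import statistics


def kurtosis(values):
    """Population kurtosis: fourth central moment / variance squared."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2


def excess_kurtosis(values):
    """Kurtosis relative to the normal distribution's value of 3."""
    return kurtosis(values) - 3.0


flat = list(range(1, 11))                 # evenly spread values, light tails
with_outliers = [0] * 18 + [-10, 10]      # mostly identical, two extreme points
print(kurtosis(flat), excess_kurtosis(flat))
print(kurtosis(with_outliers), excess_kurtosis(with_outliers))
```

The evenly spread data comes out platykurtic (below 3), while the dataset dominated by two extreme points comes out strongly leptokurtic.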

#### 🎯 Where Kurtosis Helps? (Real-World Uses)

Finance

    Analyzing stock returns: High kurtosis means extreme gains/losses are more likely (fat tails).

Quality Control

    Detecting when manufacturing processes produce too many extreme defective items.

Machine Learning & Data Cleaning

    During Exploratory Data Analysis (EDA), kurtosis helps check if data is prone to outliers before feeding it into models.

Risk Management

    In insurance or credit scoring, high kurtosis flags datasets with rare but high-impact risks.

# 📊 Understanding **Excess Kurtosis**

---

#### ✅ What is Excess Kurtosis?

- **Excess Kurtosis = Kurtosis - 3**
- A **normal distribution** has kurtosis = 3, so excess kurtosis shows **how much** your data differs from normal.
- It tells if your data has **heavier** or **lighter** tails compared to normal distribution.

---

##### 🔥 Formula:
$$
\text{Excess Kurtosis} = \text{Kurtosis} - 3
$$


---

##### 🎯 Types Based on Excess Kurtosis

| **Type**        | **Excess Kurtosis Value**  | **Shape**                         |
|-----------------|----------------------------|-----------------------------------|
| **Mesokurtic**  | 0                          | Normal tails (like normal dist.)  |
| **Leptokurtic** | > 0 (positive excess)      | Fat tails → more outliers          |
| **Platykurtic** | < 0 (negative excess)      | Thin tails → fewer outliers        |

---

##### ✅ Why Excess Kurtosis Matters?

- **Excess Kurtosis = 0** ➡️ Normal distribution.
- **Excess Kurtosis > 0** ➡️ More prone to outliers (heavy tails).
- **Excess Kurtosis < 0** ➡️ Less prone to outliers (light tails).

---

##### 🔥 Simple Rule to Remember

> **Excess Kurtosis = 0 ➡️ Normal. Positive ➡️ More outliers. Negative ➡️ Fewer outliers.**

---


## Quantile - Quantile Plot (QQ-Plot)

### What is QQ - Plot ?
QQ Plot: A Visual Tool for Comparing Distributions

A Quantile-Quantile (QQ) plot is a graphical technique used to determine if two datasets come from populations with a common distribution. It plots the quantiles of one dataset against the quantiles of a second dataset (or against the theoretical quantiles of a specific distribution).

##### What is a Quantile ?
Before we go further, let's understand quantiles. A quantile is a point in a probability distribution such that a specified fraction of the values is less than or equal to that point.

    Quartiles: Divide the data into four equal parts (25%, 50%, 75%).
    
    Deciles: Divide the data into ten equal parts (10%, 20%, ..., 90%).

    Percentiles: Divide the data into one hundred equal parts (1%, 2%, ..., 99%). 

In a QQ plot, we're essentially comparing the values at the same "rank" or proportion in both datasets.

##### How to Construct QQ - plot ?

1. Order the Data: Arrange both datasets in ascending order. Let's call them Dataset A and Dataset B.
2. Calculate Empirical Quantiles: For each data point in Dataset A, determine its empirical quantile (its position in the sorted list as a fraction).
3. Find Corresponding Quantile in the Other Dataset: For each empirical quantile calculated for Dataset A, find the data point in Dataset B that corresponds to the same quantile. This might involve interpolation if the exact quantile doesn't fall directly on a data point in Dataset B.
4. Plot the Points: Plot the quantiles of Dataset A on the x-axis and the corresponding quantiles of Dataset B on the y-axis.
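
The four steps above can be sketched without any plotting library by producing the (x, y) pairs that would be drawn. The quantile convention used here (position `i / (n - 1)` with linear interpolation) is one of several in use and is an assumption, as are the function names.

```python
def interpolated_quantile(sorted_values, p):
    """Quantile at fraction p in [0, 1], linearly interpolating between neighbours."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = p * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def qq_points(data_a, data_b):
    """Pair each sorted value of A with the same-fraction quantile of B."""
    a, b = sorted(data_a), sorted(data_b)        # step 1: order both datasets
    n = len(a)
    points = []
    for i, x in enumerate(a):
        p = i / (n - 1) if n > 1 else 0.5        # step 2: empirical quantile of x
        y = interpolated_quantile(b, p)          # step 3: matching quantile in B
        points.append((x, y))                    # step 4: A on x-axis, B on y-axis
    return points


dataset_a = [3, 1, 4, 1, 5]                      # hypothetical data
dataset_b = [10, 20, 30, 40, 50, 60, 70]
print(qq_points(dataset_a, dataset_b))
```

Passing the same dataset twice yields points on the y = x line, and adding a constant to every value of B shifts all points vertically by that constant, which matches the interpretation rules for QQ plots.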

##### How to Interpret a QQ-Plot ?
The pattern of the points on the QQ plot reveals information about the similarity of the distributions:

    If the two datasets come from the same distribution: The points on the QQ plot will roughly lie on a straight line with a slope of 1 and an intercept of 0 (the y = x line). Deviations from this line suggest differences in the distributions.

    If the two datasets come from distributions with the same shape but different locations (means): The points will still form a straight line, but the intercept will be non-zero.

    If the two datasets come from distributions with the same shape but different scales (variances): The points will form a straight line, but the slope will not be equal to 1. A slope greater than 1 indicates that the distribution on the y-axis has a larger spread, and a slope less than 1 indicates a smaller spread.

    If the two datasets have different shapes: The points will deviate systematically from a straight line.

        Curvature: Indicates differences in the skewness or kurtosis of the distributions.


You can also create a QQ plot by comparing the quantiles of your sample data against the theoretical quantiles of a specific distribution (e.g., normal, exponential, chi-squared).
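
For the theoretical case, a normal QQ plot can be sketched with `statistics.NormalDist`. The plotting positions `(i + 0.5) / n` are one common convention and are an assumption here, as are the helper names and the sample values. If the sample is roughly normal, the fitted slope and intercept approximate its standard deviation and mean.

```python
from statistics import NormalDist, fmean


def normal_qq_points(sample):
    """Pairs of (theoretical standard-normal quantile, ordered sample value)."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


def fit_line(points):
    """Least-squares slope and intercept through the QQ points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


sample = [48.2, 51.7, 39.9, 55.3, 60.1, 44.8, 50.4, 47.0, 53.6, 57.9]  # made-up data
slope, intercept = fit_line(normal_qq_points(sample))
print(slope, intercept)
```

Points that bend away from the fitted line at the ends suggest heavier or lighter tails than the normal distribution, and a consistent curve suggests skewness.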

![](images\QQ.png)













### Uniform Distribution :- 

Uniform distribution is one of the simplest probability distributions in statistics. It’s called "uniform" because each outcome is equally likely.

There are 2 types of Uniform Distribution :- 
1. Discrete Uniform Distribution   --> Finite outcomes with equal probability (like rolling a fair die).

        Rolling a fair die : outcomes = {1,2,3,4,5,6}
        Probability of each = 1/6
2. Continuous Uniform Distribution --> Continuous range of values where every interval of the same length has the same probability (like choosing a random number between 0 and 1).

        Picking a number randomly from 2 to 6.
        Every number in that interval (e.g. 2.1, 3.9, 5.9999) has the same probability density.


#### PDF of Uniform Distribution :- 
The probability density function (PDF) of a continuous Uniform distribution over the interval $[a, b]$ is given by:

$$
f(x) = 
\begin{cases}
\frac{1}{b - a}, & \text{for } a \leq x \leq b \\
0, & \text{otherwise}
\end{cases}
$$

<p align="center">
  <img src="images\uniform.png" width="500">
</p>

<p align="center">
  a = lower bound <br> b = upper bound<Br>x = value for which PDF is being evaluated
</p>

📉 Graph of Continuous Uniform Distribution
- The graph of the PDF is a rectangle (a flat line between a and b).
- Y-axis is the constant probability density: $\frac{1}{b - a}$
- X-axis is the range from a to b.
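
As a small sketch of the PDF, the functions below evaluate the density and the probability of landing in a sub-interval, which is the area of a rectangle with height 1/(b − a). The function names are illustrative, and the interval [2, 6] reuses the earlier example.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the continuous uniform distribution on [a, b]."""
    if not a < b:
        raise ValueError("require a < b")
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_interval_probability(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo <= X <= hi): overlap of [lo, hi] with [a, b], times the density."""
    if not a < b:
        raise ValueError("require a < b")
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)


print(uniform_pdf(3.9, 2, 6))                       # same density everywhere in [2, 6]
print(uniform_interval_probability(3, 4, 2, 6))     # an interval of length 1
print(uniform_interval_probability(5, 6, 2, 6))     # another interval of length 1
```

Intervals of equal length inside [a, b] receive equal probability, which is the defining property of the continuous uniform distribution.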

### 📌 Real-Life Examples of Uniform Distribution
---

| **Application**             | **Explanation**                                                       |
|----------------------------|------------------------------------------------------------------------|
| Simulation                 | Randomly generating values in games or models.                         |
| Random Number Generation   | Computers use uniform distribution to generate random values.          |
| Manufacturing              | If a machine cuts wires randomly between 10cm and 20cm.                |
| Data Augmentation          | Uniform noise added to image pixels in deep learning.                  |


### 🧠 Intuition Check (Why It’s Important in Data Science)
---

    Sampling – When generating random samples (e.g., bootstrapping), you often start with uniform.

    Normalization – In some cases, features may follow uniform-like distributions.

    Simulations/Monte Carlo Methods – Uniform is used as a base to simulate other distributions.

    Probability Intuition – A lot of other distributions (like triangular, beta, etc.) can be derived or connected from uniform.

### 📘 CDF of Continuous Uniform Distribution
---
The cumulative distribution function (CDF) of a continuous uniform distribution over the interval $[a, b]$ is given by:

$$
F(x) = 
\begin{cases}
0, & x < a \\
\frac{x - a}{b - a}, & a \leq x \leq b \\
1, & x > b
\end{cases}
$$
<p align="center">
  <img src="images\uniform_cdf.png" width="500">
</p>
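
The piecewise CDF maps directly onto a three-branch function. A minimal sketch, with an illustrative function name and the interval [2, 6] from the earlier example:

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for the continuous uniform distribution on [a, b]."""
    if not a < b:
        raise ValueError("require a < b")
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


for x in (1, 2, 3, 4, 6, 7):
    print(x, uniform_cdf(x, 2, 6))
```

The CDF rises linearly from 0 at a to 1 at b, so the midpoint (a + b)/2 is both the median and the mean.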

### Skewness:-
---
The skewness of the uniform distribution is 0, because it is symmetric about its midpoint (a + b)/2.



### Log Normal Distribution:- 
In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln X has a normal distribution. Equivalently, if Y has a normal distribution, then the exponential function of Y, X = exp(Y), has a log-normal distribution.

A log-normal random variable takes only positive values, because the logarithm is not defined for zero or negative numbers.

It characterises data with positive values that show right-skewed patterns, which makes it suitable for various real-world scenarios like stock prices, income, resource reserves, social media, etc. Understanding the log-normal distribution helps in risk assessment, portfolio optimisation, and decision-making in fields like finance, economics, and resource management.
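
The definition X = exp(Y) suggests a direct way to simulate log-normal data. The sketch below draws normal values and exponentiates them; the parameters, sample size, seed and function name are arbitrary choices for illustration. Python's standard library also offers `random.lognormvariate` for the same purpose.

```python
import math
import random
import statistics


def sample_lognormal(mu: float, sigma: float, n: int, seed: int = 0):
    """Draw n log-normal values by exponentiating normal draws Y ~ N(mu, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]


samples = sample_lognormal(mu=0.0, sigma=1.0, n=10_000)
logs = [math.log(x) for x in samples]

print(min(samples) > 0)                                     # all values are positive
print(statistics.fmean(logs), statistics.pstdev(logs))      # close to mu and sigma
print(statistics.fmean(samples), statistics.median(samples))
```

The mean of the samples sits above the median, which is the signature of a right-skewed distribution, while the logarithms of the samples look normal again.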

### Examples :- 
1. The length of comments posted on Internet / social-media discussion forums follows a log-normal distribution.
2. The time each user spends reading an article is another example of a log-normal distribution.

### 📘 PDF of Log-Normal Distribution

If a random variable $X$ follows a log-normal distribution, then its probability density function (PDF) is given by:

$$
f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \cdot \exp\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right), \quad x > 0
$$

**Where:**
- $ x $: the variable (must be positive)
- $ \mu $: the mean of the logarithm of the variable (i.e., $ \mu = \mathbb{E}[\ln(X)] $)
- $ \sigma $: the standard deviation of the logarithm of the variable
- $ \exp $: the exponential function $ e^x $

This means:
- $ \ln(X) \sim \mathcal{N}(\mu, \sigma^2) $ → Logarithm of $ X $ is normally distributed.
- The PDF only applies for $ x > 0 $ because you can't take the logarithm of zero or a negative number.
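
A short sketch of the density above, written term by term. It also checks the relationship to the normal density of ln x, namely f(x) = φ(ln x) / x, using `statistics.NormalDist`. The function name and the test point are illustrative assumptions.

```python
import math
from statistics import NormalDist


def lognormal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the log-normal distribution; zero for x <= 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x <= 0:
        return 0.0
    coeff = 1.0 / (x * sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2))


x, mu, sigma = 2.0, 0.5, 0.8                                 # arbitrary test point
direct = lognormal_pdf(x, mu, sigma)
via_normal = NormalDist(mu, sigma).pdf(math.log(x)) / x      # normal density of ln(x), divided by x
print(direct, via_normal)
```

The extra factor 1/x compared with the normal density comes from the change of variables y = ln x.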

<p align="center">
  <img src="images\Log-normal-pdfs.png" width="500">
</p>

The log-normal distribution is **right-skewed** and is useful for modeling **positive-valued data** such as:
- Stock prices
- Income
- Time to complete a task
- Biological measurements

<p align="center">
  <img src="images\Log-normal-cdfs.png" width="500">
</p>


### Pareto Distribution :- 
The Pareto distribution, named after Italian economist Vilfredo Pareto, originally modeled the distribution of wealth (e.g., 80% of wealth is owned by 20% of people → 80/20 rule).

It is a power-law distribution, meaning it models phenomena where a small number of occurrences account for the majority of the effect.

This applies in :- 

- Wealth distribution
- File sizes on the internet
- City populations
- Insurance claims
- Stock returns

## 📘 Pareto Distribution Formulas

### 1. Probability Density Function (PDF)

For $ x \ge x_m $:

$$
f(x) = \frac{\alpha \cdot x_m^\alpha}{x^{\alpha + 1}}
$$

<p align="center">
  <img src="images\Pareto_Pdf.png" width="500">
</p>

✅ 1. α (alpha): Shape Parameter

    Type: Positive real number, α > 0

    Role: Controls the shape and tail heaviness of the distribution.

🔧 Think of it as:

    A measure of inequality or "rich-get-richer" strength.

    Smaller α = more skewed, fatter tail → more extreme values.

    Larger α = thinner tail → values closer to the minimum.

🔍 Real-World Interpretation:

    In wealth distributions, a small α means a few people hold most of the wealth.

    In internet traffic, a small α implies a few users generate most of the traffic.

✅ 2. x_m (x-min): Scale Parameter / Minimum Value

    Type: Any positive real number, x_m > 0

    Role: Represents the minimum possible value the variable x can take.

🔧 Think of it as:

    The starting point of the distribution.

    No value of x can exist below x_m.

🔍 Real-World Examples:

    If you’re modeling wealth, and the poorest person has ₹10,000, then x_m = 10,000.

    If you’re modeling file sizes, and the smallest file is 1 MB, then x_m = 1.
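
The Pareto PDF and the roles of its two parameters can be sketched as follows. The function name and the evaluation points are illustrative choices.

```python
def pareto_pdf(x: float, alpha: float, x_m: float) -> float:
    """Pareto density: alpha * x_m^alpha / x^(alpha + 1) for x >= x_m, else 0."""
    if alpha <= 0 or x_m <= 0:
        raise ValueError("alpha and x_m must be positive")
    if x < x_m:
        return 0.0  # no probability mass below the minimum value
    return alpha * x_m ** alpha / x ** (alpha + 1)


# Density far out in the tail (x = 10) for a small and a large shape parameter.
print(pareto_pdf(10.0, alpha=1.0, x_m=1.0))
print(pareto_pdf(10.0, alpha=3.0, x_m=1.0))
```

With the smaller shape parameter the density at x = 10 is noticeably larger, which is the heavier tail described above. At x = x_m the density equals α / x_m, its maximum.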

---

### 2. Cumulative Distribution Function (CDF)

$$
F(x) = 1 - \left( \frac{x_m}{x} \right)^\alpha \quad \text{for } x \ge x_m
$$

<p align="center">
  <img src="images\Pareto_cdf.png" width="500">
</p>
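
Because the Pareto CDF can be inverted in closed form, it also gives a simple way to draw samples: solving F(x) = u gives x = x_m / (1 − u)^(1/α). The sketch below implements the CDF and this inverse-transform sampler. The sampler is an addition for illustration, not something stated in these notes, and the names, seed and sample size are arbitrary.

```python
import random


def pareto_cdf(x: float, alpha: float, x_m: float) -> float:
    """F(x) = 1 - (x_m / x)^alpha for x >= x_m, else 0."""
    if alpha <= 0 or x_m <= 0:
        raise ValueError("alpha and x_m must be positive")
    if x < x_m:
        return 0.0
    return 1.0 - (x_m / x) ** alpha


def pareto_sample(alpha: float, x_m: float, rng: random.Random) -> float:
    """Inverse-transform sampling: x = x_m / u^(1/alpha) with u uniform on (0, 1]."""
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    return x_m / u ** (1.0 / alpha)


rng = random.Random(1)
draws = [pareto_sample(2.0, 1.0, rng) for _ in range(10_000)]
empirical = sum(d <= 2.0 for d in draws) / len(draws)
print(empirical, pareto_cdf(2.0, 2.0, 1.0))   # empirical vs. theoretical P(X <= 2)
```

The empirical fraction of draws at or below 2 should be close to the theoretical CDF value, and no draw can fall below x_m.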

---

### Effect of $\alpha$ in the Pareto Distribution
The shape parameter $\alpha$ controls the "heaviness" of the tail and the spread of the distribution.

1. Smaller $\alpha$ --> Heavier Tail

    If α is small, the tail is heavy, meaning:

        Large values of x are more likely.

        The distribution is more spread out.

        There's a higher chance of extreme values (outliers).

    Visual intuition:

        Slowly decreasing curve

        Long tail

2. Larger $\alpha$ --> Thinner Tail

    If α is large, the tail drops quickly, meaning:

        Large values are less likely.

        The distribution becomes more concentrated around the minimum value x_m.

    Visual intuition:

        Sharp peak near x_m

        Tail quickly goes to zero
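
The effect of the shape parameter can be quantified with the tail probability P(X > x) = (x_m / x)^α, which follows from the CDF. The sketch also uses a standard result not derived in these notes: for α > 1, the share of the total held by the top fraction p of the population is p^(1 − 1/α). With α = log 5 / log 4 (about 1.16), this reproduces the 80/20 rule. Function and constant names are illustrative.

```python
import math


def pareto_tail_probability(x: float, alpha: float, x_m: float) -> float:
    """P(X > x) = (x_m / x)^alpha for x >= x_m, else 1."""
    if alpha <= 0 or x_m <= 0:
        raise ValueError("alpha and x_m must be positive")
    return (x_m / x) ** alpha if x >= x_m else 1.0


def top_share(p: float, alpha: float) -> float:
    """Share of the total held by the top fraction p (requires alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1")
    return p ** (1.0 - 1.0 / alpha)


ALPHA_80_20 = math.log(5) / math.log(4)   # shape parameter matching the 80/20 rule

for alpha in (1.0, 2.0, 3.0):
    print(alpha, pareto_tail_probability(10.0, alpha, 1.0))
print(top_share(0.2, ALPHA_80_20))
```

As α grows, the probability of exceeding ten times the minimum drops quickly, and the share held by the top 20% moves away from 80% toward a more even split.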