## 1. What Are Mean, Variance, and Standard Deviation?

| Concept | Description | NumPy Code Example |
| --- | --- | --- |
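| Mean | The average of the numbers: their sum divided by their count. | `np.mean(data)` |
| Variance | The average squared distance of the values from the mean; measures how spread out the data is. | `np.var(data)` |
| Standard Deviation | The square root of the variance; the typical distance of the values from the mean, in the same units as the data. | `np.std(data)` |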

## 2. Why Are They Important in Machine Learning?

* The mean is used to center data around zero.
* Variance and standard deviation measure how spread out your data is.
* Together, they support feature scaling, outlier detection, and data normalization.
* They are used in algorithms such as PCA (Principal Component Analysis) and Gaussian Naive Bayes.

## 3. First, Import NumPy

In [1]:
import numpy as np

## 4. Mean (Average)

**Formula:**

$$ \text{Mean} = \frac{1}{n}\sum_{i=1}^{n} x_{i} $$

In [2]:
data = np.array([10, 20, 30, 40, 50])
mean_value = np.mean(data)
print("Mean:", mean_value)

Mean: 30.0


**Note:**
  
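* In ML, you often subtract the mean from each feature so that its values are centered at zero.
* This is called **mean centering**. Dividing the centered values by the range or the standard deviation as well gives **mean normalization** or standardization.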
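
Subtracting the mean can be sketched as follows. This is a minimal illustration that reuses the same `data` array; the variable name `centered` is chosen here for illustration only.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Subtract the mean from every value so the result is centered at zero
centered = data - np.mean(data)

print("Centered data:", centered)
print("Mean after centering:", np.mean(centered))
```

The shifted values keep their original spacing; only their location changes, so the new mean is zero.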

## 5. Variance

**Formula:**

$$ \text{Variance} = \frac{1}{n}\sum_{i=1}^{n}\left( x_{i} - \text{Mean} \right)^{2} $$

**Example:**

In [4]:
data = np.array([10, 20, 30, 40, 50])
variance = np.var(data)
print("Variance:", variance)

Variance: 200.0
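
To connect the formula to the result, the same value can be computed step by step. The sketch below also shows the `ddof` parameter of `np.var`: the default (`ddof=0`) divides by n as in the formula above, while `ddof=1` divides by n - 1, which gives the sample variance. The variable names are illustrative.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Apply the variance formula one step at a time
mean = data.sum() / data.size             # (1/n) * sum of x_i
squared_deviations = (data - mean) ** 2   # (x_i - Mean)^2 for each value
manual_variance = squared_deviations.sum() / data.size

print("Manual variance:", manual_variance)        # same value as np.var(data)
print("Sample variance:", np.var(data, ddof=1))   # divides by n - 1 instead of n
```

Which divisor is appropriate depends on whether the data is treated as the whole population or as a sample drawn from it.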


**Note:**
  
* High variance = the data is spread out.
* Low variance = the data is concentrated around the mean.
* Variance is used in feature selection: a feature whose variance is close to zero is almost constant, so it carries little information for a model.
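
One way variance can support feature selection is to drop features that are (almost) constant. The sketch below is only an illustration: the feature matrix `X` and the cutoff `threshold` are made-up values, and `axis=0` computes one variance per column (feature).

```python
import numpy as np

# Hypothetical feature matrix: 4 samples (rows) x 3 features (columns)
X = np.array([[1.0, 5.0, 100.0],
              [2.0, 5.0, 200.0],
              [3.0, 5.0, 300.0],
              [4.0, 5.0, 400.0]])

threshold = 1e-8  # assumed cutoff; pick a value that suits your data

# Variance of each column
feature_variances = np.var(X, axis=0)

# Keep only the columns whose variance exceeds the cutoff
keep_mask = feature_variances > threshold
X_selected = X[:, keep_mask]

print("Variances:", feature_variances)
print("Kept columns:", keep_mask)
print("Shape after selection:", X_selected.shape)
```

Because variance depends on the units of each feature, raw variances are best compared across features only for spotting near-constant columns, as done here.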

## 6. Standard Deviation

**Formula:**

$$ \text{Std Dev} = \sqrt{\text{Variance}} $$

In [5]:
data = np.array([10, 20, 30, 40, 50])
std_deviation = np.std(data)
print("Standard Deviation:", std_deviation)

Standard Deviation: 14.142135623730951


**Note:**
  
* Standard deviation is easier to interpret than variance because it is in the same units as the data.
* It is the scaling factor used in data normalization and Z-score calculations.
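
As a rough sketch of how Z-scores can be used for outlier detection, the example below flags values that lie more than a chosen number of standard deviations from the mean. The sample values and the cutoff of 2.0 are assumptions made for illustration, not a recommended setting.

```python
import numpy as np

# Hypothetical measurements with one unusually large value
data = np.array([10, 12, 11, 13, 12, 11, 10, 12, 11, 50])

mean = np.mean(data)
std = np.std(data)

# Z-score: how many standard deviations each value is from the mean
z_scores = (data - mean) / std

cutoff = 2.0  # assumed threshold for this small example
outliers = data[np.abs(z_scores) > cutoff]

print("Z-scores:", np.round(z_scores, 2))
print("Outliers:", outliers)
```

With very small samples, a single extreme value inflates the standard deviation itself, so Z-scores cannot become very large; the cutoff should be chosen with the sample size in mind.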

## 7. Practical Use Case in Machine Learning

**Example:** Normalizing Features (Zero Mean, Unit Variance)

In [7]:
data = np.array([10, 20, 30, 40, 50])

mean = np.mean(data)
std = np.std(data)

normalized_data = (data - mean) / std
print("Normalized Data:", normalized_data)

Normalized Data: [-1.41421356 -0.70710678  0.          0.70710678  1.41421356]


**Why Important:**

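* Many ML algorithms perform better, or converge faster, when all features are on a comparable scale.
* Neural networks (trained with gradient-based optimization) and SVMs (based on distances and kernels) are sensitive to feature scales.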
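
As a quick check on the example above, the normalized values should have a mean of approximately 0 and a standard deviation of approximately 1. A minimal sketch using the same `data` array:

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])
normalized_data = (data - np.mean(data)) / np.std(data)

# After standardization: mean ~ 0 and standard deviation ~ 1
# (tiny floating-point rounding errors are possible)
print("Mean after normalization:", np.mean(normalized_data))
print("Std after normalization:", np.std(normalized_data))
```

Note that this division fails for a constant feature, whose standard deviation is zero; such features need separate handling.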

## 8. Multidimensional Data

In [8]:
data_2d = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

# Mean of each column (axis=0 collapses the rows)
print("Mean (axis=0):", np.mean(data_2d, axis=0))

# Mean of each row (axis=1 collapses the columns)
print("Mean (axis=1):", np.mean(data_2d, axis=1))

Mean (axis=0): [4. 5. 6.]
Mean (axis=1): [2. 5. 8.]


**Note:**

* axis=0 → column-wise: one value per column (per feature, when each column is a feature)
* axis=1 → row-wise: one value per row (per sample, when each row is a sample)
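
The same `axis` argument works with `np.var` and `np.std`, which makes it possible to standardize every feature of a 2D array at once. The sketch below assumes that rows are samples and columns are features; the variable names are illustrative.

```python
import numpy as np

data_2d = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

# One mean and one standard deviation per column (feature)
feature_means = np.mean(data_2d, axis=0)
feature_stds = np.std(data_2d, axis=0)

# Broadcasting applies each column's mean and std to every row
standardized = (data_2d - feature_means) / feature_stds

print("Std (axis=0):", feature_stds)
print("Standardized data:")
print(standardized)
```

After this step, each column has a mean of approximately 0 and a standard deviation of approximately 1.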