# How do you use the mean and variance to standardize data?

❓ Question: How many standard deviations away from the mean is a given value of x?

![image.png](attachment:image.png)

❗ Intuition: Instead of expressing distances in absolute values, express them in units of standard deviations.

## Standardizing data

❓ If $\overline{x}$ is the mean and $s$ is the standard deviation of a data set, then find the point which is one standard deviation above the mean.
> $x = \overline{x} + s$

❓ If $\overline{x}$ is the mean and $s$ is the standard deviation of a data set, then find the point which is two standard deviations above the mean.
> $x = \overline{x} + 2 \cdot s$

❓ If $\overline{x}$ is the mean and $s$ is the standard deviation of a data set, then find the point which is **z** standard deviations away from the mean.
> $x = \overline{x} + z \cdot s$

A positive $z$ gives a point above the mean and a negative $z$ gives a point below it.

We can express any point in the data as

$$x_i = \overline{x} + z_i \cdot s$$

where

$$z_i = \frac{x_i - \overline{x}}{s}$$

**$z_i$ is called the z-score of $x_i$ and tells us the number of standard deviations that the point is away from the mean.**
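
As a minimal sketch of the formula above, the snippet below computes z-scores with NumPy and then rebuilds the original points from them. The sample values are hypothetical and chosen only for illustration; `ddof=1` is used so that the standard deviation has $n-1$ in the denominator, matching the sample standard deviation $s$ used in these notes.

```python
import numpy as np

# Hypothetical sample, chosen only for illustration
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

x_bar = x.mean()        # sample mean
s = x.std(ddof=1)       # sample standard deviation (n - 1 in the denominator)

z = (x - x_bar) / s     # z-score of every point

# Recover the original points from their z-scores: x_i = x_bar + z_i * s
x_reconstructed = x_bar + z * s

print(z)
print(np.allclose(x, x_reconstructed))
```

Points below the mean get negative z-scores and points above it get positive ones, so the sign carries the direction and the magnitude carries the distance in units of $s$.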

![image.png](attachment:image-2.png)

## Standardizing data (Usage in ML)

![image.png](attachment:image-3.png)

Income is included as a parameter because, with diseases of affluence, lifestyle may affect a person's health. Since income values can be much larger in magnitude than the other parameters, we need to standardize the income parameter.

* Standardizing the data is a common practice in machine learning.
* Standardizing the data is necessary because a parameter with a much larger scale (such as income) might otherwise dominate the other parameters and affect the decision boundary.
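
The effect of scale on distance-based reasoning can be sketched as follows. The data, the column meanings (age and income), and the helper name `standardize` are all hypothetical and used only for illustration; this is not a definitive preprocessing pipeline.

```python
import numpy as np

# Hypothetical data: columns are [age in years, annual income]
X = np.array([
    [25.0, 30_000.0],
    [60.0, 32_000.0],
    [27.0, 90_000.0],
    [45.0, 55_000.0],
])

def standardize(X):
    """Return column-wise z-scores using the sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

Z = standardize(X)

# Distance between person 0 and person 1 (35-year age gap, similar income)
raw_dist = np.linalg.norm(X[0] - X[1])   # driven almost entirely by income
std_dist = np.linalg.norm(Z[0] - Z[1])   # both columns now contribute

print(raw_dist, std_dist)
```

Before standardizing, the 35-year age gap barely changes the distance because the income difference is numerically far larger. After standardizing, both columns are measured in standard deviations, so neither dominates purely because of its units. Note that some libraries, for example scikit-learn's `StandardScaler`, divide by the population standard deviation ($n$ in the denominator) rather than $n-1$, so their output can differ slightly from this sketch.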

![image.png](attachment:image-4.png)

❓ What are the mean and standard deviation of the standardized data?

![image.png](attachment:image-5.png)

### Proof that mean of the standardized data is 0

**Proof:**
$$ z_i = \frac{x_i - \overline{x}}{s}$$
$$ \overline{z} = \frac{1}{n} \sum_{i=1}^n z_i$$
$$ \overline{z} = \frac{1}{n} \sum_{i=1}^n \frac{x_i -\overline{x}}{s} \quad (\text{substituting the value of } z_i)$$
$$ \overline{z} = \frac{1}{s}\left(\frac{1}{n} \sum_{i=1}^n x_i - \frac{1}{n} \sum_{i=1}^n \overline{x}\right) $$
$$ \overline{z} = \frac{1}{s}\left(\overline{x} - \frac{1}{n} \cdot n \cdot \overline{x}\right) $$
$$ \overline{z} = \frac{1}{s}(\overline{x} - \overline{x}) $$
$$ \overline{z} = 0 $$

### Proof that standard deviation of the standardized data is 1

**Proof:**
$$ z_i = \frac{x_i - \overline{x}}{s}$$
$$ s_z^2 = \frac{1}{n-1} \sum_{i=1}^n (z_i - \overline{z})^2$$
$$ s_z^2 = \frac{1}{n-1} \sum_{i=1}^n z_i^2 \quad (\because \overline{z} = 0)$$
$$ s_z^2 = \frac{1}{n-1} \sum_{i=1}^n \frac{(x_i - \overline{x})^2}{s^2} \quad (\text{substituting the value of } z_i)$$
$$ s_z^2 = \frac{1}{s^2} \cdot \frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2$$
$$ s_z^2 = \frac{1}{s^2} \cdot s^2$$
$$ s_z^2 = 1 \implies s_z = 1$$
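
Both proofs can be checked numerically. The sketch below draws a hypothetical random sample (the seed, distribution, and size are arbitrary assumptions), standardizes it, and confirms that the mean is 0 and the sample standard deviation is 1 up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(0)                     # fixed seed for repeatability
x = rng.normal(loc=50.0, scale=12.0, size=1_000)   # hypothetical sample

# Standardize with the sample standard deviation (n - 1 in the denominator)
z = (x - x.mean()) / x.std(ddof=1)

z_mean = z.mean()
z_std = z.std(ddof=1)

# Compare against 0 and 1 with a tolerance, since floats are not exact
print(np.isclose(z_mean, 0.0, atol=1e-12), np.isclose(z_std, 1.0))
```

The standard deviation comes out as 1 only when the same convention ($n-1$ here) is used both for standardizing and for measuring the result, which mirrors the step in the proof where the sum is recognized as $s^2$.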

# Summary:

## Measures of spread

![image.png](attachment:image-2.png)

## Effect of transformations

![image.png](attachment:image.png)

## Standardizing data

![image.png](attachment:image-3.png)

After standardizing the data, the mean of the standardized data is 0 and the standard deviation of the standardized data is 1.