# Standard Scaler (Z-Score Normalization)

## Formula

$$z = \frac{x - \mu}{\sigma}$$

Where:
- **z** = standardized value (output)
- **x** = original value (input)
- **μ** (mu) = mean of the feature
- **σ** (sigma) = standard deviation of the feature

## What It Does

Standard Scaler transforms each feature (column) independently so that it has:
- **Mean (μ) = 0**
- **Standard Deviation (σ) = 1**

This process is called **standardization** or **z-score normalization**.

## How It Works

1. **Calculate the mean**: μ = (x₁ + x₂ + ... + xₙ) / n
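2. **Calculate the standard deviation**: σ = √[Σ(xᵢ - μ)² / n] (the population form, dividing by n rather than n - 1; this is what scikit-learn's `StandardScaler` uses)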
3. **Transform each value**: z = (x - μ) / σ
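
The three steps above can be sketched for a table with several features, standardizing each column on its own. This is only a minimal illustration, assuming NumPy is available; the function name `standardize` and the age/salary values are made up for the example, and leaving constant columns merely centered is a choice made for this sketch.

```python
import numpy as np

def standardize(X):
    """Standardize each column of X to mean 0 and population std 1."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)       # step 1: per-feature mean
    sigma = X.std(axis=0)     # step 2: population std (divides by n)
    # Assumption for this sketch: constant columns (sigma == 0) are only centered
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (X - mu) / sigma   # step 3: z = (x - mu) / sigma

# Hypothetical data: column 0 = age, column 1 = salary
X = np.array([[20, 30000],
              [30, 45000],
              [40, 60000],
              [50, 75000],
              [60, 90000]])

Z = standardize(X)
print(Z.round(2))
```

Although the two columns differ in magnitude by three orders, both end up on the same scale after the transform, which is the point of standardizing features with different units.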

## When to Use Standard Scaler

**Use when:**
- Features have different units or scales (e.g., age vs. salary)
- Using algorithms sensitive to feature magnitude:
  - Gradient Descent-based algorithms (Linear/Logistic Regression, Neural Networks)
  - Support Vector Machines (SVM)
  - K-Nearest Neighbors (KNN)
  - Principal Component Analysis (PCA)
- Data is approximately normally distributed
- You want outliers to stay visible as large |z| values (standardization does not clip or bound them, although they still influence μ and σ)

**Avoid when:**
- Using tree-based algorithms (Decision Trees, Random Forest, XGBoost): they split on thresholds within each feature, so scaling is unnecessary (though harmless)
- Data has many outliers (consider RobustScaler instead)
- You need values in a specific range [0, 1] (use MinMaxScaler instead)
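
To make the trade-offs in the list above more concrete, the following sketch runs the three scalers mentioned there on the same small feature. It assumes scikit-learn is installed; the data, including the single extreme value 1000, is invented purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler

# Hypothetical feature with one extreme value (1000)
x = np.array([[1], [2], [3], [4], [5], [1000]], dtype=float)

results = {}
for scaler in (StandardScaler(), RobustScaler(), MinMaxScaler()):
    name = type(scaler).__name__
    results[name] = scaler.fit_transform(x).flatten()
    print(f"{name:>14}: {np.round(results[name], 2)}")
```

With `StandardScaler`, the one extreme value inflates σ, so the five ordinary values end up squeezed close together. `RobustScaler` centers on the median and scales by the interquartile range, so those values stay spread out, while `MinMaxScaler` maps everything into [0, 1].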

In [1]:
Data=[1,2,3,4,5,6,7,8,9,10]
print(f"Data:{Data}")

Data:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [7]:
Mean=sum(Data)/len(Data)
print(f"Mean of Data:{Mean}")

Mean of Data:5.5


In [9]:
# Population variance: divide by n, not n - 1
Variance = sum((x - Mean) ** 2 for x in Data) / len(Data)
print(f"Variance:{Variance}")

Variance:8.25


In [10]:
Sd=Variance**0.5
print(f"Standard deviation:{Sd}")

Standard deviation:2.8722813232690143


In [11]:
# Apply z = (x - mean) / std to every value
Scaled_data = [(x - Mean) / Sd for x in Data]
print(f"Scaled Data:{Scaled_data}")

Scaled Data:[-1.5666989036012806, -1.2185435916898848, -0.8703882797784892, -0.5222329678670935, -0.17407765595569785, 0.17407765595569785, 0.5222329678670935, 0.8703882797784892, 1.2185435916898848, 1.5666989036012806]


In [22]:
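import numpy as np
# Compare with a tolerance; rounding to the nearest integer would also accept a mean of 0.4
print("Scaled data mean ≈ 0:", np.isclose(np.mean(Scaled_data), 0))
print("Scaled data std ≈ 1:", np.isclose(np.std(Scaled_data), 1))

Scaled data mean ≈ 0: True
Scaled data std ≈ 1: True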


## Example

Given a dataset of ages: **[20, 30, 40, 50, 60]**

### Step 1: Calculate Mean
μ = (20 + 30 + 40 + 50 + 60) / 5 = 40

### Step 2: Calculate Standard Deviation
σ = √[((-20)² + (-10)² + 0² + 10² + 20²) / 5] = √(1000/5) = √200 ≈ 14.14

### Step 3: Transform Each Value

| Original (x) | Calculation | Standardized (z) |
|--------------|-------------|------------------|
| 20 | (20 - 40) / 14.14 | -1.41 |
| 30 | (30 - 40) / 14.14 | -0.71 |
| 40 | (40 - 40) / 14.14 | 0.00 |
| 50 | (50 - 40) / 14.14 | +0.71 |
| 60 | (60 - 40) / 14.14 | +1.41 |



## Python Implementation

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Create data
data = np.array([[20], [30], [40], [50], [60]])

# Initialize and fit scaler
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

print("Mean:", scaler.mean_)
print("Std:", scaler.scale_)
print("Scaled data:", scaled_data.flatten())
```
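
A fitted scaler keeps the learned `mean_` and `scale_`, so it can be applied to values it has not seen and the transform can be undone. The sketch below shows this with `transform` and `inverse_transform`; the two "new" ages are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[20], [30], [40], [50], [60]], dtype=float)
new = np.array([[25], [70]], dtype=float)  # hypothetical unseen ages

scaler = StandardScaler().fit(train)       # learns mean_ and scale_ from train only
new_scaled = scaler.transform(new)         # applies (x - mean_) / scale_
restored = scaler.inverse_transform(new_scaled)  # maps z-scores back to ages

print("New scaled:", new_scaled.flatten().round(2))
print("Restored:", restored.flatten())
```

Note that new values are scaled with the training mean and standard deviation, so their z-scores are not constrained to the range seen during fitting.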

In [16]:
import numpy as np
from sklearn.preprocessing import StandardScaler

In [14]:
Data=np.array([[1],[2],[3],[4],[5],[6],[7],[8],[9],[10]])
print(f"Data:{Data}")

Data:[[ 1]
 [ 2]
 [ 3]
 [ 4]
 [ 5]
 [ 6]
 [ 7]
 [ 8]
 [ 9]
 [10]]


In [17]:
# fit_transform learns the mean and std from Data, then scales it
scaler = StandardScaler()
Scaled_data = scaler.fit_transform(Data)

In [18]:
print("Mean:", scaler.mean_)
print("Std:", scaler.scale_)
print("Scaled data:", Scaled_data.flatten())

Mean: [5.5]
Std: [2.87228132]
Scaled data: [-1.5666989  -1.21854359 -0.87038828 -0.52223297 -0.17407766  0.17407766
  0.52223297  0.87038828  1.21854359  1.5666989 ]


## Verifying Standard Scaler Results

To check that Standard Scaler worked correctly, verify that the scaled data has mean ≈ 0 and standard deviation ≈ 1:

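In [21]:
# Compare with a tolerance; rounding to the nearest integer would also accept a mean of 0.4
print("Scaled data mean ≈ 0:", np.isclose(np.mean(Scaled_data), 0))
print("Scaled data std ≈ 1:", np.isclose(np.std(Scaled_data), 1))

Scaled data mean ≈ 0: True
Scaled data std ≈ 1: True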
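
One detail that can make this check appear to fail is which standard deviation is measured. The sketch below, which assumes only NumPy and recomputes the scaling by hand, shows that the population standard deviation (`ddof=0`, NumPy's default) of the scaled values is 1, while the sample standard deviation (`ddof=1`, the default of pandas' `.std()`) is slightly larger.

```python
import numpy as np

data = np.arange(1, 11, dtype=float)
scaled = (data - data.mean()) / data.std()   # population std, ddof=0

# Tolerance-based checks on the scaled values
mean_ok = np.isclose(scaled.mean(), 0.0)
std_ok = np.isclose(scaled.std(), 1.0)
print("mean ≈ 0:", mean_ok)
print("population std ≈ 1:", std_ok)

# The sample std of the same values equals sqrt(n / (n - 1)), not 1
sample_std = scaled.std(ddof=1)
print("sample std:", sample_std)
```

For n = 10 the sample standard deviation works out to √(10/9), so a result a little above 1 from a `ddof=1` computation does not indicate a scaling error.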
