## 🧠 What is Feature Scaling?

**Feature Scaling** is the process of transforming your input features to a **similar scale**, so that no single feature **dominates** the others.

Features often have very **different ranges**, for example:
- `age` ranges from **0 to 100**
- `salary` ranges from **0 to 100,000**

➡️ Without scaling, the feature with the larger range (`salary` here) dominates distance calculations and gradient updates, so models like **Neural Networks**, **KNN**, and **SVM** can **struggle to learn efficiently**.
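
To make the "dominates" point concrete, here is a minimal sketch using three made-up people (the values and the simple divide-by-range rescaling are assumptions for illustration, not part of the dataset used later). On the raw values, the Euclidean distance is decided almost entirely by `salary`; after rescaling, `age` matters again.

```python
import numpy as np

# Three hypothetical people described as [age, salary]
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 50_500.0])  # very different age, similar salary
c = np.array([26.0, 52_000.0])  # similar age, somewhat different salary


def euclidean(u, v):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((u - v) ** 2)))


# Raw features: the salary difference swamps the age difference
d_ab_raw = euclidean(a, b)
d_ac_raw = euclidean(a, c)

# Rescale each feature by its assumed range (0-100 and 0-100,000)
ranges = np.array([100.0, 100_000.0])
d_ab_scaled = euclidean(a / ranges, b / ranges)
d_ac_scaled = euclidean(a / ranges, c / ranges)

print("raw:   ", d_ab_raw, d_ac_raw)        # a looks closer to b
print("scaled:", d_ab_scaled, d_ac_scaled)  # a looks closer to c
```

On the raw data `a` appears nearer to `b` despite a 35-year age gap; once both features share a scale, `c` becomes the nearer neighbor.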

## 🧪 Standardization in Feature Scaling

### 📘 What is Standardization?

**Standardization** is a scaling technique that transforms features such that they have:
- **Mean = 0**
- **Standard Deviation = 1**

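📌 It centers each feature at zero and rescales it by its standard deviation.  
This helps models that are sensitive to feature scale, such as neural networks, SVM, and regularized logistic regression. Note that these models do not require normally distributed inputs, and standardization does not make the data normal; it only shifts and rescales it.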

---

## 🧪 Example (Using `StandardScaler`)

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Fit on training data and scale it
X_train_scaled = scaler.fit_transform(X_train)

# Scale the test data using same training parameters
X_test_scaled = scaler.transform(X_test)
```

### 🔍 What Happens Here?

#### 🔹 `fit_transform(X_train)`

- Calculates the **mean** and **standard deviation** of `X_train`
- Then applies the formula **X_scaled = (X - mean) / std** to each feature (column)


#### 🔹 `transform(X_test)`

- Uses the **same mean and std** calculated from `X_train`
- Ensures **no data leakage** from test data into training
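
The two steps above can be checked by hand. This is a small sketch on toy arrays (both `X_train` and `X_test` here are hypothetical values chosen for illustration): after fitting, `StandardScaler` stores the training mean in `mean_` and the training standard deviation in `scale_`, and `transform` reuses exactly those numbers on the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, one feature
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_test = np.array([[0.0], [6.0]])

scaler = StandardScaler()
scaler.fit(X_train)  # learns mean_ and scale_ from the training data only

# Apply the formula manually with the stored training statistics
manual_test = (X_test - scaler.mean_) / scaler.scale_

# Let the scaler do the same thing
sk_test = scaler.transform(X_test)

print(scaler.mean_, scaler.scale_)
print(np.allclose(manual_test, sk_test))
```

Because the statistics come only from `X_train`, the scaled test set will generally not have mean 0 and standard deviation 1 itself, which is expected.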

---

### 📊 Output Example (Standardization)

If:
```python
X_train = [[1], [2], [3], [4], [5]]
```

### ▶️ Then (Result after Standardization):
```python
X_train_scaled = 
[[-1.41]
 [-0.71]
 [ 0.00]
 [ 0.71]
 [ 1.41]]
```

### ✅ Now:

- Mean ≈ 0
- Standard Deviation ≈ 1
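
The numbers above can be reproduced with a short check. One detail worth knowing: the values match only when the standard deviation is the population version (`ddof=0` in NumPy), which is what `StandardScaler` uses. The variable names below are just for this sketch.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1], [2], [3], [4], [5]], dtype=float)

X_train_scaled = StandardScaler().fit_transform(X_train)
rounded = np.round(X_train_scaled.ravel(), 2)

# Population std (divide by n) vs. sample std (divide by n - 1)
population_std = X_train.std(ddof=0)
sample_std = X_train.std(ddof=1)

print(rounded)
print(population_std, sample_std)
print(X_train_scaled.mean(), X_train_scaled.std())
```

Dividing by the sample standard deviation instead (as pandas' `.std()` does by default) would give slightly smaller scaled values.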


## 🔬 Normalization in Feature Scaling

---

### 📘 What is Normalization?

**Normalization** is a technique to scale input features so they all fall within a **specific range**, usually **[0, 1]** or **[-1, 1]**.

📌 It’s especially useful when:
- Features have **different units or scales**
- You need to bring all values to a **common range**
- You're using models that are **sensitive to feature scale**: distance-based models like **KNN** and **SVM**, or gradient-trained models like **Neural Networks**

📎 Formula:
**X_scaled = (X - X_min) / (X_max - X_min)**


## 🧪 Example (Using `MinMaxScaler`)

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Fit and transform the training data
X_train_scaled = scaler.fit_transform(X_train)

# Transform the test data using same scaler
X_test_scaled = scaler.transform(X_test)
```

### 🔍 What Happens Here?

#### 🔹 `fit_transform(X_train)`
- Finds the **minimum** and **maximum** values in `X_train`
- Applies the formula **X_scaled = (X - min) / (max - min)**, which maps the training values into the range [0, 1]


#### 🔹 `transform(X_test)`
- Uses the **same min and max** values from `X_train`
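- Ensures **no data leakage**: nothing about the test data influences the scaler, so test values outside the training range can end up outside [0, 1]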
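
A small sketch of what reusing the training min and max means in practice. The toy arrays are hypothetical; the test set deliberately contains values below and above the training range. After fitting, `MinMaxScaler` keeps the training extremes in `data_min_` and `data_max_`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data, one feature
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_test = np.array([[0.0], [3.0], [6.0]])  # 0 and 6 lie outside the training range

scaler = MinMaxScaler()
scaler.fit(X_train)  # learns data_min_ = 1 and data_max_ = 5

X_test_scaled = scaler.transform(X_test)

# Same computation by hand, using the stored training min and max
manual = (X_test - scaler.data_min_) / (scaler.data_max_ - scaler.data_min_)

print(X_test_scaled.ravel())
print(np.allclose(manual, X_test_scaled))
```

Values such as -0.25 or 1.25 in the scaled test set are not a bug; they simply show that the test point fell outside what the scaler saw during fitting.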

---

### 📊 Output Example (Normalization)

If:
```python
X_train = [[1], [2], [3], [4], [5]]
```

### ▶️ Then (Result after Normalization):

```python
X_train_scaled = 
[[0.00]
 [0.25]
 [0.50]
 [0.75]
 [1.00]]
```

✅ **Now:**

- All values are **scaled between 0 and 1**
- The **range is normalized**, but **mean and standard deviation are not fixed**
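
To see that the mean and standard deviation are not fixed, here is a quick NumPy-only check on the same toy data (a sketch that applies the formula directly rather than using `MinMaxScaler`):

```python
import numpy as np

X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Min-max formula applied per column
x_min = X_train.min(axis=0)
x_max = X_train.max(axis=0)
X_train_scaled = (X_train - x_min) / (x_max - x_min)

mean_after = float(X_train_scaled.mean())
std_after = float(X_train_scaled.std())

print(X_train_scaled.ravel())
print(mean_after, std_after)
```

Here the mean comes out at 0.5 and the standard deviation at roughly 0.35; with differently distributed data both would change, while the range would still be [0, 1].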

---

### 🎯 When to Use Normalization?

| Use Case                          | ✅ Use Normalization |
|----------------------------------|----------------------|
| Pixel/Image data (0–255)         | ✅ Yes               |
| K-Nearest Neighbors (KNN)        | ✅ Yes               |
| Support Vector Machines (SVM)    | ✅ Yes               |
| Neural Networks                  | ✅ Often             |
| Features are bounded             | ✅ Best choice       |
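
For the pixel row in the table, the bounds are known in advance (0 and 255), so the min-max formula reduces to a single division and no scaler needs to be fitted. A minimal sketch with a made-up 2×3 "image":

```python
import numpy as np

# Hypothetical 8-bit grayscale pixel values
pixels = np.array([[0, 64, 128],
                   [191, 255, 32]], dtype=np.uint8)

# (X - 0) / (255 - 0): convert to float first to avoid integer arithmetic
pixels_scaled = pixels.astype(np.float32) / 255.0

print(pixels_scaled.min(), pixels_scaled.max())
```

Using the fixed bounds rather than the observed min and max keeps the scaling identical for every image, including ones that happen not to contain pure black or pure white.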


In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.read_csv('/content/Social_Network_Ads.csv')

In [None]:
df = df.iloc[:,2:]
df.head()

In [None]:
import seaborn as sns

In [None]:
# Recent seaborn versions require x and y as keyword arguments
sns.scatterplot(x=df.iloc[:,0], y=df.iloc[:,1])

In [None]:
X = df.iloc[:,0:2]
y = df.iloc[:,-1]

In [None]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)

In [None]:
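import tensorflow as tf
from tensorflow import keras
# Import from tensorflow.keras consistently instead of mixing in standalone keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense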

In [None]:
model = Sequential()

model.add(Dense(128,activation='relu',input_dim=2))
model.add(Dense(1,activation='sigmoid'))

In [None]:
model.summary()

In [None]:
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
history = model.fit(X_train,y_train,validation_data=(X_test,y_test),epochs=100)

In [None]:
import matplotlib.pyplot as plt
plt.plot(history.history['val_accuracy'])

In [None]:
# Applying scaling

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [None]:
X_train_scaled

In [None]:
# Recent seaborn versions require x and y as keyword arguments
sns.scatterplot(x=X_train_scaled[:,0], y=X_train_scaled[:,1])

In [None]:
model = Sequential()

model.add(Dense(128,activation='relu',input_dim=2))
model.add(Dense(1,activation='sigmoid'))

model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])

history = model.fit(X_train_scaled,y_train,validation_data=(X_test_scaled,y_test),epochs=100)

In [None]:
import matplotlib.pyplot as plt
plt.plot(history.history['val_accuracy'])