
---

# ✨ FEATURE TRANSFORMATION

## 🔧 Why Feature Scaling?

Feature scaling ensures that **all features** in a dataset are on a **similar scale**, which helps:

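* 📈 Better results from **scale-sensitive** models such as k-NN, SVM, k-means, PCA and regularized linear models (tree-based models are largely unaffected)
* ⚡ Faster, more stable convergence for models trained with **gradient descent**
* ✅ Prevents features with larger numeric ranges from **dominating** distance calculations and regularization penalties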
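A small sketch of the "dominating" effect. The two customers, their `income`/`age` values and the feature ranges below are made up for illustration. In the unscaled Euclidean distance, a $1,000 income gap outweighs a 35-year age gap. After min-max scaling, the age gap carries almost all of the distance.

```python
import numpy as np

# Two made-up customers described by [income in dollars, age in years]
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 60.0])

# Unscaled Euclidean distance: the income gap outweighs the 35-year age gap
raw_dist = np.linalg.norm(a - b)

# Min-max scale both points with assumed feature ranges (hypothetical min/max values)
feat_min = np.array([20_000.0, 18.0])
feat_max = np.array([120_000.0, 80.0])
a_scaled = (a - feat_min) / (feat_max - feat_min)
b_scaled = (b - feat_min) / (feat_max - feat_min)
scaled_dist = np.linalg.norm(a_scaled - b_scaled)

# Share of the squared distance contributed by income, before and after scaling
income_share_raw = (a[0] - b[0]) ** 2 / raw_dist ** 2
income_share_scaled = (a_scaled[0] - b_scaled[0]) ** 2 / scaled_dist ** 2

print(f"raw distance {raw_dist:.1f}, income share {income_share_raw:.1%}")
print(f"scaled distance {scaled_dist:.3f}, income share {income_share_scaled:.1%}")
```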

---

## 📏 SCALING METHODS

### 1️⃣ **Min-Max Scaling**

🔹 **What it does:**
Linearly rescales each feature to a **fixed range**, \[0, 1] by default.

🔹 **Good for:**
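Features that need a **bounded range** (for example, neural-network inputs or pixel intensities) and data **without extreme outliers**. A single extreme value sets min(x) or max(x) and squeezes every other value into a narrow band. Because it is a linear transform, it keeps the **shape** of the original distribution.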

🔹 **Formula:**

$$
x_{\text{scaled}} = \frac{x - \min(x)}{\max(x) - \min(x)}
$$
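The formula can be checked against scikit-learn's `MinMaxScaler`, which the notebook uses later. This is a minimal sketch on a made-up four-value feature:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One made-up feature as a column vector (rows = samples)
x = np.array([[10.0], [20.0], [25.0], [40.0]])

# Min-Max formula applied directly, column by column
x_min = x.min(axis=0)
x_max = x.max(axis=0)
manual = (x - x_min) / (x_max - x_min)   # values 0, 1/3, 0.5, 1

# scikit-learn's MinMaxScaler defaults to feature_range=(0, 1)
sklearn_scaled = MinMaxScaler().fit_transform(x)

print(np.allclose(manual, sklearn_scaled))  # True
```

To target a different interval, pass for example `MinMaxScaler(feature_range=(-1, 1))`. The hand-written formula would divide by zero on a constant feature (max = min), while the scaler guards against that case.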

---

### 2️⃣ **Standardization (Z-Score Normalization)**

🔹 **What it does:**
Shifts each feature to a **mean of 0** and rescales it to a **standard deviation of 1**.

🔹 **Good for:**
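Models that work best with **zero-centered** inputs of comparable variance, such as PCA, SVMs, and linear or neural models trained by gradient descent. It does **not** require normally distributed data, and it does **not** make data normal. Like Min-Max scaling, it keeps the distribution's shape. Its output is not forced into a fixed interval, so a few extreme values usually distort it less than they distort Min-Max scaling. They still pull on μ and σ, though.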

🔹 **Formula** (where $\mu$ is the feature's mean and $\sigma$ its standard deviation):

$$
z = \frac{x - \mu}{\sigma}
$$
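A minimal sketch that applies the z-score formula by hand and compares it with scikit-learn's `StandardScaler`. The eight-value feature is made up so that μ = 5 and σ = 2. `StandardScaler` uses the population standard deviation (`ddof=0`), so the sketch does the same.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature with mean 5 and population standard deviation 2
x = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])

mu = x.mean(axis=0)
sigma = x.std(axis=0)          # ddof=0 (population std), the estimator StandardScaler uses
manual = (x - mu) / sigma      # z-scores: -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2

sklearn_scaled = StandardScaler().fit_transform(x)

print(np.allclose(manual, sklearn_scaled))  # True
```

`pandas.DataFrame.std`, and therefore `describe()`, defaults to `ddof=1`. On small samples, a standardized column's printed std therefore comes out slightly above 1. With about 3 million rows, the difference does not show.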

**📈 Z-Score Meaning:**

The z-score is also known as the **standard score**.

It is a **statistical measure** of **how many standard deviations** a data point lies **away from the mean** of the dataset.

The sign gives the direction: a positive z-score lies above the mean and a negative one lies below it. For example, z = 2 means the value is two standard deviations above the mean.
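Because z-scores are unit-free, they let you compare values measured on different scales. This sketch uses made-up exam results: a raw 85 in one class and a raw 90 in another. It converts each to a z-score against that class's hypothetical mean and standard deviation, and also inverts the formula.

```python
def z_score(x: float, mean: float, std: float) -> float:
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std


def from_z_score(z: float, mean: float, std: float) -> float:
    """Invert the z-score: recover the original value."""
    return mean + z * std


# Hypothetical exam results: a raw score plus that exam's class mean and std
math_z = z_score(85, mean=70, std=10)      # 1.5
physics_z = z_score(90, mean=80, std=5)    # 2.0

# Relative to each class, the physics result is further above average
better = "physics" if physics_z > math_z else "math"
print(math_z, physics_z, better)  # 1.5 2.0 physics
```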


---

## ✅ Summary

| Method          | Range             | Preserves Shape | Mean = 0? | Std = 1? |
| --------------- | ----------------- | --------------- | --------- | -------- |
| Min-Max Scaling | \[0, 1] (default) | ✅               | ❌         | ❌        |
| Standardization | Unbounded         | ✅               | ✅         | ✅        |
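Both methods are affine rescalings with a positive slope, so neither changes the shape of a distribution. One way to check the "Preserves Shape" property is to compare skewness before and after scaling. This sketch uses a made-up, right-skewed feature drawn from an exponential distribution. Neither scaler makes such a feature normal; that takes a non-linear transform such as a log or scikit-learn's `PowerTransformer`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# A right-skewed, made-up feature drawn from an exponential distribution
rng = np.random.default_rng(seed=0)
raw = pd.Series(rng.exponential(scale=3.0, size=1_000), name="feature")

# Both scalers expect a 2-D input, hence to_frame()
minmax = MinMaxScaler().fit_transform(raw.to_frame()).ravel()
standard = StandardScaler().fit_transform(raw.to_frame()).ravel()

# Skewness is a shape statistic; an affine map with positive slope leaves it unchanged
skews = {
    "raw": raw.skew(),
    "min-max": pd.Series(minmax).skew(),
    "standardized": pd.Series(standard).skew(),
}
print(skews)  # the three values agree up to floating-point rounding
```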

---



```python
# LOADING DATA:
import pandas as pd

DATA = pd.read_csv(r"C:\Users\Nagesh Agrawal\OneDrive\Desktop\EDA\DATA\OUTLIER_FREE_DATA.csv")
```

```python
DATA
```

|         | event_time              | event_type       | product_id | category_id         | brand     | price | user_id   | user_session                         | quantity | ANOMALY_SCORE |
| ------- | ----------------------- | ---------------- | ---------- | ------------------- | --------- | ----- | --------- | ------------------------------------ | -------- | ------------- |
| 0       | 2019-12-01 00:00:00 UTC | remove_from_cart | 5712790    | 1487580005268456287 | f.o.x     | 6.27  | 576802932 | 51d85cb0-897f-48d2-918b-ad63965c12dc | 1        | 1             |
| 1       | 2019-12-01 00:00:00 UTC | view             | 5764655    | 1487580005411062629 | cnd       | 29.05 | 412120092 | 8adff31e-2051-4894-9758-224bfa8aec18 | 1        | 1             |
| 2       | 2019-12-01 00:00:05 UTC | view             | 5848413    | 1487580007675986893 | freedecor | 0.79  | 348405118 | 722ffea5-73c0-4924-8e8f-371ff8031af4 | 1        | 1             |
| 3       | 2019-12-01 00:00:07 UTC | view             | 5824148    | 1487580005511725929 | missing   | 5.56  | 576005683 | 28172809-7e4a-45ce-bab0-5efa90117cd5 | 1        | 1             |
| 4       | 2019-12-01 00:00:09 UTC | view             | 5773361    | 1487580005134238553 | runail    | 2.62  | 560109803 | 38cf4ba1-4a0a-4c9e-b870-46685d105f95 | 1        | 1             |
| ...     | ...                     | ...              | ...        | ...                 | ...       | ...   | ...       | ...                                  | ...      | ...           |
| 3013840 | 2019-12-31 23:59:35 UTC | view             | 5784043    | 1487580005754995573 | missing   | 4.92  | 420652863 | 546f6af3-a517-4752-a98b-80c4c5860711 | 1        | 1             |
| 3013841 | 2019-12-31 23:59:37 UTC | view             | 5834173    | 2151191070908613477 | runail    | 2.62  | 595411904 | 74ca1cd5-5381-4ffe-b00b-a258b390db77 | 1        | 1             |
| 3013842 | 2019-12-31 23:59:39 UTC | view             | 5683350    | 1487580005671109489 | masura    | 2.84  | 536812729 | e4a2d47c-a956-4c46-8176-745f52ea664b | 1        | 1             |
| 3013843 | 2019-12-31 23:59:52 UTC | view             | 5775982    | 1783999063314661546 | missing   | 11.90 | 397780878 | 7e8a2b85-153a-44eb-a71f-b748fde14fcc | 1        | 1             |


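```python
from sklearn.preprocessing import StandardScaler, MinMaxScaler

STD = StandardScaler()
MM = MinMaxScaler()

# fit_transform learns each numeric column's statistics (mean/std or min/max)
# and rescales the column; the result is a NumPy array, not a DataFrame
STANDARD_DATA = STD.fit_transform(DATA.select_dtypes("number"))
MINMAX_DATA = MM.fit_transform(DATA.select_dtypes("number"))
```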

```python
# Wrap the NumPy array back into a DataFrame that keeps the numeric column names
STANDARD_DATA = pd.DataFrame(STANDARD_DATA, columns=DATA.select_dtypes("number").columns)
```

```python
STANDARD_DATA.describe()
```

|       | product_id    | category_id   | price         | user_id       | quantity      | ANOMALY_SCORE |
| ----- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| count | 3013845.0     | 3013845.0     | 3013845.0     | 3013845.0     | 3013845.0     | 3013845.0     |
| mean  | -3.992164e-16 | -7.696738e-15 | 3.206707e-16  | 1.45756e-16   | 6.275542e-16  | 0.0           |
| std   | 1.0           | 1.0           | 1.0           | 1.0           | 1.0           | 0.0           |
| min   | -8.62855      | -0.3729159    | -7.53138      | -5.592801     | -0.1729129    | 0.0           |
| 25%   | 0.03724435    | -0.3729159    | -0.4520713    | -0.4241566    | -0.1729129    | 0.0           |
| 50%   | 0.1398399     | -0.3729159    | -0.272111     | 0.409434      | -0.1729129    | 0.0           |
| 75%   | 0.210187      | -0.3729159    | -0.0373802    | 0.7150614     | -0.1729129    | 0.0           |
| max   | 0.2938065     | 4.645183      | 9.99432       | 0.8713139     | 68.38216      | 0.0           |
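In the standardized summary above, every ANOMALY_SCORE statistic except the count is 0, and its scaled min and max are both 0. That means the raw column is constant. For a zero-variance (or zero-range) column, scikit-learn replaces the divisor with 1 instead of dividing by zero, so the column maps to 0 rather than NaN. Such a column carries no information for a model and could be dropped before scaling. This sketch reproduces the behaviour on a made-up frame, where `flag` is a hypothetical constant column. It also prints the statistics each scaler learns (`mean_`, `scale_`, `data_min_`, `data_max_`).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up frame: "flag" is constant, like ANOMALY_SCORE in the outlier-free data
df = pd.DataFrame({"price": [2.0, 4.0, 6.0, 8.0], "flag": [1.0, 1.0, 1.0, 1.0]})

std = StandardScaler().fit(df)
mm = MinMaxScaler().fit(df)

# Learned statistics per column
print(std.mean_, std.scale_)       # means 5 and 1; scale_ is sqrt(5) and 1
print(mm.data_min_, mm.data_max_)  # mins 2 and 1; maxes 8 and 1

# Zero variance / zero range is replaced by 1, so the constant column maps to 0, not NaN
std_flag = std.transform(df)[:, 1]
mm_flag = mm.transform(df)[:, 1]
print(std_flag, mm_flag)
```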


```python
# Wrap the NumPy array back into a DataFrame that keeps the numeric column names
MINMAX_DATA = pd.DataFrame(MINMAX_DATA, columns=DATA.select_dtypes("number").columns)
```

```python
MINMAX_DATA.describe()
```

|       | product_id | category_id  | price      | user_id   | quantity    | ANOMALY_SCORE |
| ----- | ---------- | ------------ | ---------- | --------- | ----------- | ------------- |
| count | 3013845.0  | 3013845.0    | 3013845.0  | 3013845.0 | 3013845.0   | 3013845.0     |
| mean  | 0.9670708  | 0.07431419   | 0.4297335  | 0.8652075 | 0.002522249 | 0.0           |
| std   | 0.112078   | 0.1992787    | 0.05705907 | 0.1547003 | 0.01458681  | 0.0           |
| min   | 0.0        | 0.0          | 0.0        | 0.0       | 0.0         | 0.0           |
| 25%   | 0.971245   | 1.267357e-09 | 0.4039387  | 0.7995904 | 0.0         | 0.0           |
| 50%   | 0.9827437  | 4.620807e-09 | 0.4142071  | 0.9285471 | 0.0         | 0.0           |
| 75%   | 0.9906281  | 1.165295e-08 | 0.4276006  | 0.9758277 | 0.0         | 0.0           |
| max   | 1.0        | 1.0          | 1.0        | 1.0       | 1.0         | 0.0           |
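In the Min-Max summary above, quantity's 25th, 50th and 75th percentiles are all 0.0 while its maximum is 1.0. Price's middle half sits between roughly 0.404 and 0.428. Both patterns follow from the formula: the extreme values define min(x) and max(x), so the bulk of the data gets squeezed into a narrow band. This minimal sketch with made-up order quantities reproduces the effect.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up order quantities: mostly 1 to 3 items, plus one bulk order of 100
q = np.array([[1.0], [1.0], [2.0], [1.0], [3.0], [1.0], [100.0]])

scaled = MinMaxScaler().fit_transform(q).ravel()

# The single large order defines max(x), so the six ordinary orders land in [0, 2/99]
print(scaled.round(3))
print(f"largest ordinary value: {scaled[:-1].max():.4f}")  # 0.0202
```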
