# 🏡 Min-Max Normalization Workshop

## Team Name: Group 6

## Team Members: 

### Eris Leksi

### Reham Abuarqoub

### Erica Holden


## ❗ Why We Normalize: The Problem with Raw Feature Scales

In housing data, features sit on very different scales: `Price` runs into the hundreds of thousands and `Lot_Size` into the thousands, while `Num_Bedrooms` ranges from 1 to 5. This creates problems for algorithms that depend on numeric magnitudes.

---

### ⚠️ What Goes Wrong Without Normalization

---

### 1. 🧭 K-Nearest Neighbors (KNN)

KNN typically measures similarity with the **Euclidean distance**:

$$
d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + \cdots}
$$

**Example:**

- $ \text{Price}_1 = 650{,}000, \quad \text{Price}_2 = 250{,}000 $
- $ \text{Bedrooms}_1 = 3, \quad \text{Bedrooms}_2 = 2 $

Now compute squared differences:

$$
(\text{Price}_1 - \text{Price}_2)^2 = (650{,}000 - 250{,}000)^2 = (400{,}000)^2 = 1.6 \times 10^{11}
$$
$$
(\text{Bedrooms}_1 - \text{Bedrooms}_2)^2 = (3 - 2)^2 = 1
$$

➡️ **Price dominates the distance calculation**: its squared difference is eleven orders of magnitude larger, so `Bedrooms` has almost no influence on which neighbors are chosen.
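
To make the imbalance concrete, here is a minimal sketch of the same calculation in Python. The helper name `euclidean` is only illustrative, and the two houses reuse the example values above.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# (Price, Bedrooms) for the two example houses
house_1 = (650_000, 3)
house_2 = (250_000, 2)

full_distance = euclidean(house_1, house_2)        # Price and Bedrooms
price_only = euclidean(house_1[:1], house_2[:1])   # Price alone

print(full_distance)
print(price_only)
# The gap between the two is what Bedrooms adds to the distance.
print(full_distance - price_only)
```

Dropping `Bedrooms` entirely changes the distance by only about a millionth of a unit, which is why KNN on raw values behaves as if `Price` were the only feature.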

---

### 2. 📉 Linear Regression

Linear regression estimates:

$$
y = \beta_1 \cdot \text{Price} + \beta_2 \cdot \text{Bedrooms} + \beta_3 \cdot \text{Lot\_Size} + \epsilon
$$

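When the model is fit with gradient descent, each coefficient's gradient is proportional to the values of its feature. If `Price` has very large values:
- Gradient updates for $ \beta_1 $ will be **much larger**
- Gradient updates for $ \beta_2 $ (Bedrooms) will be **very small** by comparison

➡️ No single learning rate suits both coefficients, so training converges slowly or diverges, and the fitted coefficients are hard to compare across features.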
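
The gradient imbalance can be sketched in a few lines. This assumes a mean-squared-error loss, $\frac{1}{2n}\sum(\hat{y} - y)^2$, which the workshop text does not specify. The three rows and their targets are hypothetical toy values, and `mse_gradient` is an illustrative helper, not part of any library.

```python
# Hypothetical toy data: (Price, Bedrooms) rows with arbitrary targets.
rows = [(650_000, 3), (250_000, 2), (480_000, 4)]
targets = [1.0, 0.0, 1.0]

def mse_gradient(rows, targets, betas):
    """Gradient of (1/2n) * sum((y_hat - y)^2) with respect to each beta_j."""
    n = len(rows)
    grads = [0.0] * len(betas)
    for features, y in zip(rows, targets):
        y_hat = sum(b * x for b, x in zip(betas, features))
        error = y_hat - y
        for j, x in enumerate(features):
            # Each term is scaled by the feature value itself.
            grads[j] += error * x / n
    return grads

grad_price, grad_bedrooms = mse_gradient(rows, targets, betas=[0.0, 0.0])
print(grad_price, grad_bedrooms)
print(abs(grad_price) / abs(grad_bedrooms))  # ratio of the two gradient magnitudes
```

Because every gradient term is multiplied by the feature value, the `Price` gradient is larger than the `Bedrooms` gradient by roughly the ratio of their scales.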

---

### 3. 🧠 Neural Networks

A single neuron computes:

$$
z = w_1 \cdot \text{Price} + w_2 \cdot \text{Bedrooms} + w_3 \cdot \text{Lot\_Size}
$$

If:

- $ \text{Price} = 650{,}000 $
- $ \text{Bedrooms} = 3 $
- $ \text{Lot\_Size} = 8{,}000 $

Then:

$$
z \approx w_1 \cdot 650{,}000 + w_2 \cdot 3 + w_3 \cdot 8{,}000
$$

➡️ Even with equal weights, `Price` contributes **almost all of $z$**, making it difficult for the network to learn from other features.
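
As a rough check on this claim, the sketch below computes each feature's share of $z$ for the example house. The equal weight of `0.01` is a hypothetical choice; any equal positive weight gives the same shares.

```python
features = {"Price": 650_000, "Bedrooms": 3, "Lot_Size": 8_000}
weights = {name: 0.01 for name in features}  # hypothetical equal weights

# Per-feature term w_i * x_i, and the weighted sum z
contributions = {name: weights[name] * value for name, value in features.items()}
z = sum(contributions.values())

# Fraction of z that each feature is responsible for
shares = {name: c / z for name, c in contributions.items()}
for name, share in shares.items():
    print(f"{name}: {share:.4%} of z")
```

With these values `Price` accounts for more than 98% of $z$, so changes in `Bedrooms` or `Lot_Size` barely move the neuron's input.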

---

### ✅ Solution: Min-Max Normalization

We apply the transformation:

$$
x_{\text{normalized}} = \frac{x - x_{\text{min}}}{x_{\text{max}} - x_{\text{min}}}
$$

This maps every feature onto the same range, $[0, 1]$: the column minimum becomes 0 and the column maximum becomes 1.

| Feature      | Raw Value | Min     | Max     | Normalized Value |
|--------------|-----------|---------|---------|------------------|
| Price        | 650,000   | 250,000 | 800,000 | 0.727            |
| Bedrooms     | 3         | 1       | 5       | 0.500            |
| Lot_Size     | 8,000     | 3,000   | 10,000  | 0.714            |

➡️ Now **every feature is on a comparable scale**, so none dominates model training or distance comparisons simply because of its units.
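
The table values can be reproduced with a small helper. This is a minimal sketch: `min_max` is an illustrative name, and raising an error for a constant column is one possible design choice, not something the formula itself prescribes.

```python
def min_max(x, x_min, x_max):
    """Scale x into [0, 1] given the feature's observed minimum and maximum."""
    if x_max == x_min:
        # A constant feature has no range to scale; the formula would divide by zero.
        raise ValueError("x_max and x_min must differ")
    return (x - x_min) / (x_max - x_min)

# (raw value, min, max) taken from the table above
table = {
    "Price": (650_000, 250_000, 800_000),
    "Bedrooms": (3, 1, 5),
    "Lot_Size": (8_000, 3_000, 10_000),
}

normalized = {name: min_max(*row) for name, row in table.items()}
for name, value in normalized.items():
    print(f"{name}: {value:.3f}")
# Price: 0.727
# Bedrooms: 0.500
# Lot_Size: 0.714
```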

---

## 📌 Use Case: Housing Data
We are normalizing features from a real estate dataset to prepare it for machine learning analysis.

In [1]:
# 🔢 Load and display dataset
import pandas as pd
df = pd.read_csv('housing_data.csv')
print(df.head())
print(df.describe())
df.info()  # info() prints its summary directly and returns None, so it needs no print()
print(df.shape)
print(df.columns)

  House_ID   Price  Area_sqft  Num_Bedrooms  Num_Bathrooms  Year_Built  \
0  H100000  574507       1462             3              3        2002   
1  H100001  479260       1727             2              2        1979   
2  H100002  597153       1403             5              2        1952   
3  H100003  728454       1646             5              2        1992   
4  H100004  464876        853             1              1        1956   

   Lot_Size  
0      4878  
1      4943  
2      5595  
3      9305  
4      7407  
              Price    Area_sqft  Num_Bedrooms  Num_Bathrooms   Year_Built  \
count  2.000000e+03  2000.000000   2000.000000    2000.000000  2000.000000   
mean   5.068961e+05  1796.453000      2.983500       1.966000  1985.689500   
std    1.478786e+05   502.185109      1.409333       0.825945    21.159536   
min    1.000000e+05   400.000000      1.000000       1.000000  1950.000000   
25%    4.066002e+05  1445.000000      2.000000       1.000000  1967.000000   
50%

### 🔎 Step 1 — Implement Min-Max Normalization on the Housing Dataset

In [2]:
# Columns to normalize
columns_to_normalize = ['Price', 'Area_sqft', 'Num_Bedrooms', 'Num_Bathrooms', 'Lot_Size']

# Manually apply Min-Max normalization, one column at a time.
# The arithmetic is vectorized over the whole column, so no per-row lambda is needed.
for col in columns_to_normalize:
    min_val = df[col].min()
    max_val = df[col].max()
    df[col] = (df[col] - min_val) / (max_val - min_val)

# Preview the result
df.head()

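|   | House_ID | Price    | Area_sqft | Num_Bedrooms | Num_Bathrooms | Year_Built | Lot_Size |
|---|----------|----------|-----------|--------------|---------------|------------|----------|
| 0 | H100000  | 0.485226 | 0.315789  | 0.5          | 1.0           | 2002       | 0.320814 |
| 1 | H100001  | 0.387827 | 0.394588  | 0.25         | 0.5           | 1979       | 0.326191 |
| 2 | H100002  | 0.508384 | 0.298246  | 1.0          | 0.5           | 1952       | 0.380129 |
| 3 | H100003  | 0.642651 | 0.370503  | 1.0          | 0.5           | 1992       | 0.687045 |
| 4 | H100004  | 0.373119 | 0.134701  | 0.0          | 0.0           | 1956       | 0.53003  |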
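
As a sanity check on this step, the sketch below applies the same formula to several columns at once and confirms that each one spans exactly $[0, 1]$. It uses only the five preview rows printed earlier, so the minimum and maximum come from those five rows rather than all 2,000, and the scaled values will not match the full-dataset output. The names `sample` and `scaled` are illustrative.

```python
import pandas as pd

# Five preview rows copied from the df.head() output (not the full dataset)
sample = pd.DataFrame({
    "Price": [574507, 479260, 597153, 728454, 464876],
    "Num_Bedrooms": [3, 2, 5, 5, 1],
    "Lot_Size": [4878, 4943, 5595, 9305, 7407],
})

# pandas aligns the per-column min and max with the matching columns
col_min = sample.min()
col_max = sample.max()
scaled = (sample - col_min) / (col_max - col_min)

print(scaled)
print(scaled.min())  # every column's minimum should be 0.0
print(scaled.max())  # every column's maximum should be 1.0
```

Checking that every column's minimum is 0 and maximum is 1 is a quick way to confirm the normalization was applied to all intended columns.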


### 🔎 Talking Point 1 — [Insert your review comment here]

Reviewed by:
- Name
- Name
- Name