# 🏡 Min-Max Normalization Workshop
## Team Name: 
## Team Members: 
---

## ❗ Why We Normalize: The Problem with Raw Feature Scales

In housing data, a feature like `Price` can reach the hundreds of thousands and `Lot_Size` the thousands, while `Num_Bedrooms` ranges only from 1 to 5. Algorithms that depend on numeric magnitudes then give the large-scale features disproportionate influence.

---

### ⚠️ What Goes Wrong Without Normalization

---

### 1. 🧭 K-Nearest Neighbors (KNN)

KNN commonly measures similarity with the **Euclidean distance**:

$$
d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + \cdots}
$$

**Example:**

- $ \text{Price}_1 = 650{,}000, \quad \text{Price}_2 = 250{,}000 $
- $ \text{Bedrooms}_1 = 3, \quad \text{Bedrooms}_2 = 2 $

Now compute squared differences:

$$
(\text{Price}_1 - \text{Price}_2)^2 = (650{,}000 - 250{,}000)^2 = (400{,}000)^2 = 1.6 \times 10^{11}
$$
$$
(\text{Bedrooms}_1 - \text{Bedrooms}_2)^2 = (3 - 2)^2 = 1
$$

➡️ **Price dominates the distance calculation**, making small-scale features like `Bedrooms` effectively irrelevant.
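The arithmetic above can be checked with a minimal sketch in plain Python. The names `house_1`, `house_2`, and `euclidean_distance` are illustrative and not part of the workshop code; only the two features from the example are used.

```python
import math

# Two houses from the example above, as (Price, Bedrooms) tuples
house_1 = (650_000, 3)
house_2 = (250_000, 2)

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Squared difference per feature, in the same order as the tuples
squared_terms = [(x - y) ** 2 for x, y in zip(house_1, house_2)]

# Fraction of the squared distance that comes from Price alone
price_share = squared_terms[0] / sum(squared_terms)

distance = euclidean_distance(house_1, house_2)
print(squared_terms)  # [160000000000, 1]
print(distance)
print(price_share)
```

The distance is essentially the price gap of 400,000; changing the bedroom counts barely moves it.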

---

### 2. 📉 Linear Regression

Linear regression estimates:

$$
y = \beta_1 \cdot \text{Price} + \beta_2 \cdot \text{Bedrooms} + \beta_3 \cdot \text{Lot\_Size} + \epsilon
$$

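If the model is fit with gradient descent and `Price` has very large values:
- Gradient updates for $ \beta_1 $ will be **much larger**, because each gradient component is proportional to the values of its feature
- Gradient updates for $ \beta_2 $ (Bedrooms) will be **very small** by comparison

➡️ Training becomes ill-conditioned: a learning rate small enough to keep $ \beta_1 $ stable makes $ \beta_2 $ converge very slowly. The closed-form least-squares solution yields the same predictions at any feature scale, so this is an optimization problem rather than overfitting.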
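To make the difference in gradient size concrete, here is a small sketch that computes the mean-squared-error gradient for a two-feature model at $ \beta_1 = \beta_2 = 0 $. The three rows, the target values, and the function name `mse_gradient` are all made up for illustration; they are not taken from the housing dataset.

```python
# Hypothetical toy rows: (Price, Bedrooms), with a made-up target y per row
rows = [(650_000, 3), (250_000, 2), (800_000, 5)]
targets = [1.0, 0.0, 1.0]

def mse_gradient(rows, targets, betas):
    """Gradient of mean squared error for y_hat = sum(beta_j * x_j), no intercept."""
    n = len(rows)
    grads = [0.0] * len(betas)
    for features, y in zip(rows, targets):
        # Prediction error for this row
        error = sum(b * x for b, x in zip(betas, features)) - y
        # Each component is scaled by the value of its own feature
        for j, x in enumerate(features):
            grads[j] += 2 * error * x / n
    return grads

grad_price, grad_bedrooms = mse_gradient(rows, targets, betas=[0.0, 0.0])
ratio = abs(grad_price) / abs(grad_bedrooms)
print(grad_price, grad_bedrooms, ratio)
```

On this toy data the `Price` gradient is more than 100,000 times larger than the `Bedrooms` gradient, so no single learning rate suits both coefficients.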

---

### 3. 🧠 Neural Networks

A single neuron computes:

$$
z = w_1 \cdot \text{Price} + w_2 \cdot \text{Bedrooms} + w_3 \cdot \text{Lot\_Size}
$$

If:

- $ \text{Price} = 650{,}000 $
- $ \text{Bedrooms} = 3 $
- $ \text{Lot\_Size} = 8{,}000 $

Then:

$$
z \approx w_1 \cdot 650{,}000 + w_2 \cdot 3 + w_3 \cdot 8{,}000
$$

➡️ Even with equal weights, `Price` contributes **almost all of the weighted sum $z$**, making it difficult for the network to learn from other features.
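A quick sketch shows how lopsided the weighted sum is. The weight value `0.01` is an arbitrary assumption chosen only to make all weights equal; the feature values are the ones listed above.

```python
# Feature values from the example above
features = {"Price": 650_000, "Bedrooms": 3, "Lot_Size": 8_000}

# Assumed: every weight is equal (0.01 is an arbitrary illustrative value)
weights = {name: 0.01 for name in features}

# Per-feature contribution to z = w1*Price + w2*Bedrooms + w3*Lot_Size
contributions = {name: weights[name] * value for name, value in features.items()}
z = sum(contributions.values())

# Share of z attributable to each feature
shares = {name: c / z for name, c in contributions.items()}
for name, share in shares.items():
    print(f"{name}: {share:.4%}")
```

With equal weights, `Price` accounts for roughly 98.8% of $z$, and `Bedrooms` for well under 0.01%.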

---

### ✅ Solution: Min-Max Normalization

We apply the transformation:

$$
x_{\text{normalized}} = \frac{x - x_{\text{min}}}{x_{\text{max}} - x_{\text{min}}}
$$

This scales every feature to the same range, $[0, 1]$: the column minimum maps to 0 and the maximum to 1.

| Feature      | Raw Value | Min     | Max     | Normalized Value |
|--------------|-----------|---------|---------|------------------|
| Price        | 650,000   | 250,000 | 800,000 | 0.727            |
| Bedrooms     | 3         | 1       | 5       | 0.500            |
| Lot_Size     | 8,000     | 3,000   | 10,000  | 0.714            |

➡️ Now all features share the same scale, so **each contributes comparably** to model training and distance comparisons.
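The formula can be applied to the raw, min, and max values from the table with a few lines of plain Python. This is a minimal sketch; the helper name `min_max` and the `table` dictionary are illustrative and not part of the workshop code.

```python
def min_max(x, x_min, x_max):
    """Scale x to [0, 1] given the feature's observed min and max."""
    if x_max == x_min:
        # A constant feature would cause division by zero
        raise ValueError("min and max are equal; the feature is constant")
    return (x - x_min) / (x_max - x_min)

# (raw value, min, max) for each feature, copied from the table above
table = {
    "Price": (650_000, 250_000, 800_000),
    "Bedrooms": (3, 1, 5),
    "Lot_Size": (8_000, 3_000, 10_000),
}

normalized = {name: round(min_max(*row), 3) for name, row in table.items()}
print(normalized)  # {'Price': 0.727, 'Bedrooms': 0.5, 'Lot_Size': 0.714}
```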

---

## 📌 Use Case: Housing Data
We are normalizing features from a real estate dataset to prepare it for machine learning analysis.

In [None]:
# 🔢 Load and display dataset
import pandas as pd
df = pd.read_csv('data/housing_data.csv')
df.head()

### 🔎 Step 1: Implement Min-Max Normalization on the Housing Dataset

In [1]:
# ✍️ Implement Min-Max Normalization manually here (no sklearn/numpy)
# Normalize: Price, Area_sqft, Num_Bedrooms, Num_Bathrooms, Lot_Size
import pandas as pd

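class MinMaxNormalizer:
    """Min-max normalizer that works on a private copy of a DataFrame."""

    def __init__(self, df):
        # Copy so the caller's DataFrame is never modified
        self.df = df.copy()

    def normalize_column(self, column_name, new_column_name=None):
        """Add a min-max scaled copy of `column_name` and return both columns."""
        if column_name not in self.df.columns:
            raise ValueError(f"Column '{column_name}' does not exist in the DataFrame.")

        col_min = self.df[column_name].min()
        col_max = self.df[column_name].max()

        # A constant column would cause division by zero
        if col_max == col_min:
            raise ValueError(f"Cannot normalize column '{column_name}' because it has a constant value.")

        if not new_column_name:
            new_column_name = f"{column_name}_MinMax"

        # x_normalized = (x - min) / (max - min)
        self.df[new_column_name] = (self.df[column_name] - col_min) / (col_max - col_min)
        return self.df[[column_name, new_column_name]]

    def get_dataframe(self):
        """Return the working copy, including any normalized columns."""
        return self.df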

In [2]:
# Load your data
df = pd.read_csv("data/housing_data.csv")

# Create an instance of the normalizer
normalizer = MinMaxNormalizer(df)

# Normalize the 'Price' column
normalized_price = normalizer.normalize_column('Price')

# Display normalized values
print(normalized_price.head())


    Price  Price_MinMax
0  574507      0.485226
1  479260      0.387827
2  597153      0.508384
3  728454      0.642651
4  464876      0.373119


### 🔎 Talking Point 1: [Insert your review comment here]

Reviewed by:
- Name
- Name
- Name