In [28]:
import numpy as np
import pandas as pd

In [29]:
# Dataset
data = {
    'Area': [1500, 1800, 2400, 3000, 3500, 4000],
    'Bedrooms': [3, 3, 4, 4, 5, 5],
    'Price': [400000, 450000, 600000, 650000, 700000, 750000]
}

In [30]:
df = pd.DataFrame(data)

In [31]:
df

Unnamed: 0,Area,Bedrooms,Price
0,1500,3,400000
1,1800,3,450000
2,2400,4,600000
3,3000,4,650000
4,3500,5,700000
5,4000,5,750000


In [32]:
# Features and target
X = df[['Area', 'Bedrooms']].values
y = df['Price'].values

In [33]:
X

array([[1500,    3],
       [1800,    3],
       [2400,    4],
       [3000,    4],
       [3500,    5],
       [4000,    5]])

In [34]:
y

array([400000, 450000, 600000, 650000, 700000, 750000])

# Why Do We Need Normalization?
**Imagine You Have Two Things to Compare:**

Let's say you have two things: the size of your house (in square feet) and the number of bedrooms. The size of the house might be something like 1500 square feet, while the number of bedrooms might be 3.
The two numbers live on very different scales: 1500 is hundreds of times larger than 3. During gradient descent the large-valued feature (Area) dominates the updates, so the model struggles to learn a sensible weight for the small-valued one (Bedrooms).

**Making Everything the Same Size:**

Normalization (here, standardization) rescales every feature so that it has a mean of 0 and a standard deviation of 1. With all features on the same scale, one learning rate works well for every parameter.
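
To make the scale problem concrete, the sketch below compares the first gradient-descent gradient (at θ = 0) on the raw data and on standardized data. It rebuilds the same six houses; the helper name `initial_gradient` and the variable names are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Same six houses as in the dataset above
X_raw = np.array([[1500, 3], [1800, 3], [2400, 4],
                  [3000, 4], [3500, 5], [4000, 5]], dtype=float)
y_raw = np.array([400000, 450000, 600000, 650000, 700000, 750000], dtype=float)

def initial_gradient(X, y):
    """Gradient of the cost (1/2m) * sum((X @ theta - y)**2) at theta = 0 (hypothetical helper)."""
    m = len(y)
    Xb = np.c_[np.ones(m), X]          # prepend the bias column
    theta = np.zeros(Xb.shape[1])
    return Xb.T @ (Xb @ theta - y) / m

grad_raw = initial_gradient(X_raw, y_raw)

# Standardize features and target (population std, NumPy's default)
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
y_scaled = (y_raw - y_raw.mean()) / y_raw.std()
grad_scaled = initial_gradient(X_scaled, y_scaled)

# Ratio of the Area component to the Bedrooms component of the gradient
ratio_raw = abs(grad_raw[1] / grad_raw[2])
ratio_scaled = abs(grad_scaled[1] / grad_scaled[2])
print(f"raw ratio: {ratio_raw:.1f}, standardized ratio: {ratio_scaled:.2f}")
```

On the raw data the Area component is several hundred times larger than the Bedrooms component; after standardization the two are of comparable size.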

In [35]:
X_mean, X_std = X.mean(axis=0), X.std(axis=0)
X = (X - X_mean) / X_std

In [36]:
X

array([[-1.34726614, -1.22474487],
       [-1.0104496 , -1.22474487],
       [-0.33681653,  0.        ],
       [ 0.33681653,  0.        ],
       [ 0.89817743,  1.22474487],
       [ 1.45953832,  1.22474487]])

In [37]:
X_mean

array([2700.,    4.])

In [38]:
X_std

array([8.90692614e+02, 8.16496581e-01])

In [39]:
y_mean, y_std = y.mean(), y.std()
y = (y - y_mean) / y_std

In [50]:
y_mean

591666.6666666666

In [51]:
y_std

127202.81268728123

### **Example:**  

Let's say you have these numbers for **Area** and **Bedrooms**:  

---  

**Area:** [1500, 1800, 2400]  
**Bedrooms:** [3, 3, 4]  

---  

### **1️⃣ Calculate the Mean:**  
The mean (average) helps us find the center of the data.  

- **X_mean for Area** = (1500 + 1800 + 2400) / 3 = **1900**  
- **X_mean for Bedrooms** = (3 + 3 + 4) / 3 = **3.33**  

---  

### **2️⃣ Calculate the Standard Deviation:**  
The standard deviation (spread) shows how much the values vary from the mean. These are population standard deviations (divide by n), which is what `X.std()` computes by default.  

- **X_std for Area** = √((400² + 100² + 500²) / 3) ≈ **374.17**  
- **X_std for Bedrooms** ≈ **0.4714**  

---  

### **3️⃣ Normalize the Numbers:**  
We use the formula:  

$X_{\text{normalized}} = \frac{X - \text{mean}}{\text{std}}$

**For Area:**  

- (1500 - 1900) / 374.17 ≈ **-1.07**  
- (1800 - 1900) / 374.17 ≈ **-0.27**  
- (2400 - 1900) / 374.17 ≈ **1.34**  

**For Bedrooms:**  

- (3 - 3.333) / 0.4714 ≈ **-0.71**  
- (3 - 3.333) / 0.4714 ≈ **-0.71**  
- (4 - 3.333) / 0.4714 ≈ **1.41**  
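
The three-value example can be checked with a few lines of NumPy. This is a minimal sketch that assumes the population standard deviation (`ddof=0`), which is NumPy's default and what the notebook's `X.std(axis=0)` uses; passing `ddof=1` would give the sample standard deviation instead.

```python
import numpy as np

area = np.array([1500, 1800, 2400], dtype=float)
bedrooms = np.array([3, 3, 4], dtype=float)

for name, values in [("Area", area), ("Bedrooms", bedrooms)]:
    mean = values.mean()
    std = values.std()               # population std (ddof=0)
    z = (values - mean) / std        # standardized values
    print(f"{name}: mean={mean:.2f}, std={std:.4f}, normalized={np.round(z, 2)}")

# After standardization each column has mean 0 and std 1
area_z = (area - area.mean()) / area.std()
print(np.isclose(area_z.mean(), 0.0), np.isclose(area_z.std(), 1.0))
```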

---

### **🔹 Why Do We Do This?**  
Now, all the numbers are **around the same size**, making it easier for the computer to work with them.  

---

### **🔹 Why Do We Normalize the Target (y)?**  
The same idea applies to the **target variable (Price)**.  
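By normalizing **y**, the target is on the same scale as the standardized features, so the parameters stay small and a single learning rate works for all of them.  
This helps gradient descent **converge faster**. The model's predictions then come out in normalized units and must be converted back to prices using `y_std` and `y_mean`.  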
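
As a small illustration of the round trip, the sketch below normalizes the six prices and then converts them back. The variable names are illustrative and chosen so they do not overwrite the notebook's `y`, `y_mean`, or `y_std`.

```python
import numpy as np

prices = np.array([400000, 450000, 600000, 650000, 700000, 750000], dtype=float)

p_mean, p_std = prices.mean(), prices.std()
prices_norm = (prices - p_mean) / p_std       # what the model is trained on
prices_back = prices_norm * p_std + p_mean    # undo the scaling for reporting

print(np.round(prices_norm, 3))
print("round trip recovers the prices:", np.allclose(prices_back, prices))
```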

---

### **📌 Summary:**  
✅ **Normalization makes all the numbers similar in size.**  
✅ **It helps the model compare different features easily.**  
✅ **It improves the learning process and speeds up training.**  

🚀 **By normalizing, the model can learn faster and make better predictions!**

# Add ones column for bias term

In [41]:
X = np.c_[np.ones(len(X)), X]

In [42]:
X

array([[ 1.        , -1.34726614, -1.22474487],
       [ 1.        , -1.0104496 , -1.22474487],
       [ 1.        , -0.33681653,  0.        ],
       [ 1.        ,  0.33681653,  0.        ],
       [ 1.        ,  0.89817743,  1.22474487],
       [ 1.        ,  1.45953832,  1.22474487]])

💡 **Why do we add ones (1) to X?**  

In **linear regression**, the formula to predict the price is:  

**Price = θ₀ + θ₁ × Area + θ₂ × Bedrooms**  

### **Here:**  
- **θ₀** is a special number called the **intercept (or bias)**.  
- **θ₁** and **θ₂** are numbers that help the model learn the relationship between price, area, and bedrooms.  

To make this work in **matrix form**, we **add a column of ones (1) to X**, so the equation becomes:  

**Price = 1 × θ₀ + Area × θ₁ + Bedrooms × θ₂**  

This ensures the model correctly learns **θ₀ (the starting point)** while adjusting for Area and Bedrooms. 🚀



---



💡 **Simple Analogy 🎈**  

Think of **θ₀ (theta zero)** like a **starting point in a race**.  

- If you **don't add the column of ones**, the model has no **θ₀**, so every prediction is forced to be 0 when Area and Bedrooms are both 0.  
- By adding the ones, you tell the model, **“Hey, start from here!”** before adjusting for Area and Bedrooms.  

This helps the model make better predictions by including the **intercept (bias)** in the learning process. 🚀
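
The sketch below shows that multiplying the ones-augmented matrix by θ gives the same result as writing the intercept out by hand. The feature values and `theta_demo` are made-up numbers used only for illustration.

```python
import numpy as np

# Illustrative standardized features (two rows) and assumed parameters
features = np.array([[-1.0, -1.2],
                     [ 0.9,  1.2]])
theta_demo = np.array([0.5, 0.7, 0.3])   # [theta_0, theta_1, theta_2]

X_with_bias = np.c_[np.ones(len(features)), features]

pred_matrix = X_with_bias @ theta_demo                       # matrix form
pred_explicit = theta_demo[0] + features @ theta_demo[1:]    # theta_0 written out

print(pred_matrix)
print(pred_explicit)
```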

# Gradient Descent Function

In [43]:
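def gradient_descent(X, y, theta, alpha, iterations):
    # Batch gradient descent for linear regression.
    # Note: theta is updated in place, so the array passed in is modified.
    m = len(y)  # number of training examples
    for _ in range(iterations):
        # Step against the gradient of the cost (1/2m) * sum((X @ theta - y)**2)
        theta -= (alpha / m) * X.T @ (X @ theta - y)
    return theta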

### **Gradient Descent Example**

#### **Introduction**
Gradient descent is an optimization algorithm used to minimize a cost function. In this example, we will use gradient descent to fit a linear regression model to a simple dataset. The goal is to find the best parameters (coefficients) for the model that minimize the difference between the predicted values and the actual values.

---

#### **Dataset**
We have a small dataset with two features (Feature 1 and Feature 2) and one target variable (Target):

| Feature 1 (x1) | Feature 2 (x2) | Target (y) |
|----------------|----------------|------------|
| 1              | 2              | 3          |
| 4              | 5              | 6          |
| 7              | 8              | 9          |

---

### **Steps**

#### **1. Initialize Variables:**

- **Feature matrix X** (one row per data point, columns x1 and x2):

\[
X = \begin{bmatrix} 1 & 2 \\ 4 & 5 \\ 7 & 8 \end{bmatrix}
\]

- **Target vector y:**

\[
y = \begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix}
\]

- **Initial model parameters θ** (one per column of **X** once the bias column from step 2 is included):

\[
\theta = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

- **Learning rate α** (the step size for each update):

\[
\alpha = 0.01
\]

- **Number of iterations** (how many times **theta** is updated):

\[
\text{iterations} = 1000
\]

---

#### **2. Add Bias Term:**

We add a column of ones to the **X** matrix to account for the bias term (θ₀):

\[
X = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 4 & 5 \\ 1 & 7 & 8 \end{bmatrix}
\]

Now **X** has an extra column that represents the bias term.

---

#### **3. Gradient Descent Formula:**

The **gradient_descent** function will iteratively update **theta** using the formula:

\[
\theta = \theta - \frac{\alpha}{m} X^T (X\theta - y)
\]

Where:

- **m = 3** is the number of data points.
- **α = 0.01** is the learning rate.
- **X** is the feature matrix (with the bias column added).
- **y** is the target vector.
- **θ** are the parameters (weights) we are trying to learn.

---

#### **4. Example Calculation:**

Let's go through one iteration of the gradient descent process.

##### **Initial Prediction:**

We calculate the initial prediction using the current **theta** (which is `[0, 0, 0]` initially):

\[
X\theta = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 4 & 5 \\ 1 & 7 & 8 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

So, the initial prediction is `0` for each data point.

##### **Error:**

Now we calculate the error (the difference between the predicted value and the actual target values):

\[
X\theta - y = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix} = \begin{bmatrix} -3 \\ -6 \\ -9 \end{bmatrix}
\]

##### **Gradient:**

Next, we calculate the gradient, which tells us how much we need to adjust the parameters:

\[
X^T (X\theta - y) = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 4 & 5 \\ 1 & 7 & 8 \end{bmatrix}^T \begin{bmatrix} -3 \\ -6 \\ -9 \end{bmatrix}
\]

Each entry is the dot product of one column of **X** with the error vector:

\[
\text{gradient} = \begin{bmatrix} (1)(-3) + (1)(-6) + (1)(-9) \\ (1)(-3) + (4)(-6) + (7)(-9) \\ (2)(-3) + (5)(-6) + (8)(-9) \end{bmatrix} = \begin{bmatrix} -18 \\ -90 \\ -108 \end{bmatrix}
\]

##### **Update Theta:**

Now, we update the values of **theta** using the gradient and learning rate **α**:

\[
\theta = \theta - \frac{\alpha}{m} \text{gradient}
\]

\[
\theta = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} - \frac{0.01}{3} \begin{bmatrix} -18 \\ -90 \\ -108 \end{bmatrix}
\]

\[
\theta = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0.06 \\ 0.30 \\ 0.36 \end{bmatrix}
\]

\[
\theta = \begin{bmatrix} 0.06 \\ 0.30 \\ 0.36 \end{bmatrix}
\]

After one iteration, **theta** is updated to `[0.06, 0.30, 0.36]`.
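
The single iteration can be reproduced in NumPy. This is a minimal sketch that builds the matrix from the dataset table (x1, x2) with a leading column of ones; the `_toy` variable names are illustrative so they do not clash with the housing variables.

```python
import numpy as np

# Bias column followed by (x1, x2) from the dataset table
X_toy = np.array([[1, 1, 2],
                  [1, 4, 5],
                  [1, 7, 8]], dtype=float)
y_toy = np.array([3, 6, 9], dtype=float)

theta_toy = np.zeros(3)
alpha_toy = 0.01
m = len(y_toy)

error = X_toy @ theta_toy - y_toy                 # predictions minus targets
gradient = X_toy.T @ error                        # X^T (X theta - y)
theta_toy = theta_toy - (alpha_toy / m) * gradient

print("error:   ", error)       # [-3. -6. -9.]
print("gradient:", gradient)    # [ -18.  -90. -108.]
print("theta:   ", theta_toy)   # approximately [0.06 0.30 0.36]
```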

---

### **Summary:**

- **Gradient Descent:** The function iteratively updates **theta** to minimize the error (difference between predicted and actual values).
- **Learning Rate (α):** Controls the size of the steps taken towards minimizing the error. If it's too large, we may overshoot; if it's too small, the process will be slow.
- **Iterations:** The number of times the function will update **theta**. More iterations bring **theta** closer to the minimum, with diminishing returns once it has converged.
- By running the **gradient_descent** function for enough iterations, **theta** will converge to values that minimize the error and make the predictions as accurate as possible.
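
The effect of the learning rate can be seen by recording the cost J(θ) = (1/2m) Σ (Xθ − y)² after every update. The sketch below is a hypothetical variant of `gradient_descent` (the name `gradient_descent_with_history` and the choice of α = 1.5 as a "too large" value are assumptions made for illustration), run on the standardized housing data.

```python
import numpy as np

def gradient_descent_with_history(X, y, alpha, iterations):
    """Hypothetical variant of gradient_descent that also records the cost."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    costs = []
    for _ in range(iterations):
        theta = theta - (alpha / m) * X.T @ (X @ theta - y)
        residual = X @ theta - y
        costs.append(residual @ residual / (2 * m))   # J = (1/2m) * sum(residual**2)
    return theta, costs

# Standardized housing data with a bias column
houses = np.array([[1500, 3], [1800, 3], [2400, 4],
                   [3000, 4], [3500, 5], [4000, 5]], dtype=float)
prices = np.array([400000, 450000, 600000, 650000, 700000, 750000], dtype=float)
X_demo = np.c_[np.ones(len(prices)),
               (houses - houses.mean(axis=0)) / houses.std(axis=0)]
y_demo = (prices - prices.mean()) / prices.std()

_, costs_small = gradient_descent_with_history(X_demo, y_demo, alpha=0.1, iterations=50)
_, costs_large = gradient_descent_with_history(X_demo, y_demo, alpha=1.5, iterations=50)

print(f"alpha=0.1: cost {costs_small[0]:.4f} -> {costs_small[-1]:.4f}")
print(f"alpha=1.5: cost {costs_large[0]:.4f} -> {costs_large[-1]:.4e}")
```

With α = 0.1 the cost falls at every step; with α = 1.5 each update overshoots and the cost grows instead.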

---

### **Making Predictions**

Once the model parameters are optimized, you can use them to make predictions on new data points. For example, if you have a new data point with Feature 1 = 3 and Feature 2 = 4, you can predict the target value using the optimized parameters.
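
A minimal sketch of that prediction on the toy dataset follows. Every row of the table satisfies y = x1 + 2, so a well-fitted model should predict a value close to 5 for x1 = 3, x2 = 4. Two caveats: the iteration count of 20000 is an assumption, chosen because the unscaled toy data converges slowly with α = 0.01, and because x2 = x1 + 1 in every row, many parameter vectors fit the data equally well (the prediction for a point on the same pattern is still well defined).

```python
import numpy as np

# Bias column followed by (x1, x2) from the dataset table
X_toy = np.array([[1, 1, 2],
                  [1, 4, 5],
                  [1, 7, 8]], dtype=float)
y_toy = np.array([3, 6, 9], dtype=float)

theta_toy = np.zeros(3)
for _ in range(20000):   # assumed iteration count, larger than the 1000 used in the text
    theta_toy -= (0.01 / len(y_toy)) * X_toy.T @ (X_toy @ theta_toy - y_toy)

new_point = np.array([1.0, 3.0, 4.0])   # leading 1 multiplies the bias term
prediction = new_point @ theta_toy
print(f"Predicted target for x1=3, x2=4: {prediction:.3f}")
```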


# Initialize theta, learning rate, and iterations

In [44]:

theta = np.zeros(X.shape[1])
alpha = 0.1  # Adjusted learning rate
iterations = 1000

In [45]:
# Train model
theta = gradient_descent(X, y, theta, alpha, iterations)

In [46]:
theta

array([2.82814137e-16, 6.95395195e-01, 2.93667900e-01])
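
One way to sanity-check the trained parameters is to compare them with a closed-form least-squares fit. The sketch below uses `np.linalg.lstsq` purely as a cross-check; it is not the method used in this notebook. It rebuilds the standardized data under separate names so that the notebook's `X`, `y`, and `theta` are left untouched.

```python
import numpy as np

data = np.array([[1500, 3, 400000], [1800, 3, 450000], [2400, 4, 600000],
                 [3000, 4, 650000], [3500, 5, 700000], [4000, 5, 750000]], dtype=float)
feats, price = data[:, :2], data[:, 2]

# Standardize and add the bias column, as in the cells above
X_chk = np.c_[np.ones(len(price)), (feats - feats.mean(axis=0)) / feats.std(axis=0)]
y_chk = (price - price.mean()) / price.std()

# Gradient descent with the notebook's settings (alpha = 0.1, 1000 iterations)
theta_gd = np.zeros(X_chk.shape[1])
for _ in range(1000):
    theta_gd -= (0.1 / len(y_chk)) * X_chk.T @ (X_chk @ theta_gd - y_chk)

# Closed-form least squares for comparison
theta_ls, *_ = np.linalg.lstsq(X_chk, y_chk, rcond=None)

print("gradient descent:", np.round(theta_gd, 4))
print("least squares:   ", np.round(theta_ls, 4))
print("max abs gap:     ", np.abs(theta_gd - theta_ls).max())
```

The two sets of parameters should agree to within about 0.01. The remaining gap reflects that Area and Bedrooms are strongly correlated in this dataset, which slows gradient descent along one direction, so 1000 iterations gets close to the least-squares solution without fully reaching it.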

In [47]:
# Prediction function
def predict(area, bedrooms):
    # Standardize input using training mean/std
    area = (area - X_mean[0]) / X_std[0]
    bedrooms = (bedrooms - X_mean[1]) / X_std[1]

    # Compute predicted value (denormalized)
    normalized_price = theta[0] + theta[1] * area + theta[2] * bedrooms
    return normalized_price * y_std + y_mean


---

### **Prediction Example**

#### **Given Values:**
- **area = 5000** (square feet)
- **bedrooms = 6**
- **X_mean = [2700, 4]** (mean of area and bedrooms)
- **X_std = [890.692614, 0.816496581]** (standard deviation of area and bedrooms)
- **y_mean = 591666.6666666666** (mean of the target variable, price)
- **y_std = 127202.81268728123** (standard deviation of the target variable, price)
- **theta = [2.82814137e-16, 6.95395195e-01, 2.93667900e-01]** (model coefficients for the intercept, area, and bedrooms)

---

### **Step 1: Standardization of Inputs**

We first standardize the inputs (area and bedrooms) using **X_mean** and **X_std**.

- **Standardized area**:

\[
\text{standardized area} = \frac{5000 - 2700}{890.692614} = \frac{2300}{890.692614} \approx 2.5823
\]

- **Standardized bedrooms**:

\[
\text{standardized bedrooms} = \frac{6 - 4}{0.816496581} = \frac{2}{0.816496581} \approx 2.4495
\]

---

### **Step 2: Compute the Predicted Value (Normalized)**

Now, we compute the **normalized price** using the **theta** values:

\[
\text{normalized price} = \theta[0] + \theta[1] \times \text{standardized area} + \theta[2] \times \text{standardized bedrooms}
\]
\[
\text{normalized price} = 2.82814137 \times 10^{-16} + 0.695395195 \times 2.5823 + 0.2936679 \times 2.4495
\]
\[
\text{normalized price} \approx 0 + 1.7957 + 0.7193
\]
\[
\text{normalized price} \approx 2.5150
\]

---

### **Step 3: Denormalization**

Finally, we denormalize the **normalized price** using **y_mean** and **y_std**:

\[
\text{price} = (\text{normalized price} \times y_{\text{std}}) + y_{\text{mean}}
\]
\[
\text{price} = (2.5150 \times 127202.81268728123) + 591666.6666666666
\]
\[
\text{price} \approx 319915.07 + 591666.67 = 911581.74
\]

---

### **Final Predicted Price:**

The predicted price for an area of **5000 sqft** and **6 bedrooms** is approximately **$911,582**. The `predict` call in the next code cell prints $911,585.27; the difference of a few dollars comes from rounding the intermediate values to four decimal places here.
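
The three steps can also be traced in code with the intermediate values printed. This sketch copies the constants from the notebook outputs above into new, illustrative variable names rather than reusing the notebook's own variables.

```python
import numpy as np

# Values copied from the notebook outputs above
x_mean = np.array([2700.0, 4.0])
x_std = np.array([890.692614, 0.816496581])
price_mean, price_std = 591666.6666666666, 127202.81268728123
theta_fit = np.array([2.82814137e-16, 6.95395195e-01, 2.93667900e-01])

house = np.array([5000.0, 6.0])                     # area, bedrooms
house_z = (house - x_mean) / x_std                  # step 1: standardize inputs
price_z = theta_fit[0] + house_z @ theta_fit[1:]    # step 2: normalized price
price = price_z * price_std + price_mean            # step 3: denormalize

print("standardized inputs:", np.round(house_z, 4))
print("normalized price:   ", round(float(price_z), 4))
print(f"predicted price:     ${price:.2f}")
```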

---


In [48]:
# Predict for 5000 sqft, 6 bedrooms
predicted_price = predict(5000, 6)
print(f"Predicted price for 5000 sqft and 6 bedrooms: ${predicted_price:.2f}")

Predicted price for 5000 sqft and 6 bedrooms: $911585.27
