**Multiple Linear Regression on a Small Toy Dataset**

In this notebook, we will:

- Create a small, easy-to-understand dataset
- Fit a Multiple Linear Regression (MLR) model to it
- Learn the intercept and coefficients of the equation:

\[
y = a + b_1 x_1 + b_2 x_2 + b_3 x_3
\]

- Predict values for **new inputs** where y is unknown
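
As a quick illustration of the equation above, the sketch below evaluates it for a single input. The helper name `linear_prediction` and the coefficient values are made up for this example; they are not the values learned later in the notebook.

```python
def linear_prediction(a, b, x):
    """Evaluate y = a + b_1*x_1 + ... + b_n*x_n for one input row."""
    if len(b) != len(x):
        raise ValueError("b and x must have the same length")
    return a + sum(b_i * x_i for b_i, x_i in zip(b, x))


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
a_demo = 10.0
b_demo = [2.0, 3.0, -1.0]
x_demo = [1.0, 2.0, 3.0]

y_demo = linear_prediction(a_demo, b_demo, x_demo)
print(y_demo)  # 10 + 2*1 + 3*2 - 1*3 = 15.0
```

Fitting a model means choosing `a` and the `b` values so that this expression comes as close as possible to the known `y` values.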


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


In [4]:
# Features for 8 houses, one row each: size (1000 sqft), bedrooms, age (years)
X = np.array([
    [1.0, 2, 10],
    [1.5, 3, 5],
    [2.0, 3, 2],
    [2.5, 4, 1],
    [3.0, 4, 15],
    [3.5, 5, 20],
    [4.0, 4, 8],
    [4.5, 5, 12]
])

# Target: price in thousands of dollars, one value per row of X
y = np.array([150, 200, 250, 300, 320, 330, 360, 380])

df = pd.DataFrame(X, columns=["Size (1000 sqft)", "Bedrooms", "Age (years)"])
df["Price (1000$)"] = y

df


,Size (1000 sqft),Bedrooms,Age (years),Price (1000$)
0,1.0,2.0,10.0,150
1,1.5,3.0,5.0,200
2,2.0,3.0,2.0,250
3,2.5,4.0,1.0,300
4,3.0,4.0,15.0,320
5,3.5,5.0,20.0,330
6,4.0,4.0,8.0,360
7,4.5,5.0,12.0,380


In [5]:
X = df[["Size (1000 sqft)", "Bedrooms", "Age (years)"]]
y = df["Price (1000$)"]

print("X shape:", X.shape)
print("y shape:", y.shape)


X shape: (8, 3)
y shape: (8,)


In [6]:
# Ordinary least squares fit on all 8 rows (no train/test split)
model = LinearRegression()
model.fit(X, y)

a = model.intercept_      # intercept
b1, b2, b3 = model.coef_  # coefficients

print("Intercept (a):", a)
print("Coefficient for Size (b1):", b1)
print("Coefficient for Bedrooms (b2):", b2)
print("Coefficient for Age (b3):", b3)

print("\nLearned equation:")
print(f"Price = {a:.2f} + {b1:.2f} * Size + {b2:.2f} * Bedrooms + {b3:.2f} * Age")


Intercept (a): 78.43180896513351
Coefficient for Size (b1): 50.121188686541004
Coefficient for Bedrooms (b2): 22.769798081729242
Coefficient for Age (b3): -1.6878707572170897

Learned equation:
Price = 78.43 + 50.12 * Size + 22.77 * Bedrooms + -1.69 * Age
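
The same coefficients can be recovered without scikit-learn. The sketch below, assuming only NumPy, prepends a column of ones to the feature matrix (so the intercept becomes one more coefficient) and solves the least-squares problem with `np.linalg.lstsq`. The names `X_raw`, `y_raw`, `X_design` and `theta` are illustrative and not part of the notebook.

```python
import numpy as np

# Same toy data as above: size (1000 sqft), bedrooms, age (years).
X_raw = np.array([
    [1.0, 2, 10],
    [1.5, 3, 5],
    [2.0, 3, 2],
    [2.5, 4, 1],
    [3.0, 4, 15],
    [3.5, 5, 20],
    [4.0, 4, 8],
    [4.5, 5, 12],
], dtype=float)
y_raw = np.array([150, 200, 250, 300, 320, 330, 360, 380], dtype=float)

# Design matrix: a leading column of ones lets the intercept be estimated
# together with the three slopes.
X_design = np.column_stack([np.ones(len(X_raw)), X_raw])

# Least-squares solution of X_design @ theta ≈ y_raw.
theta, *_ = np.linalg.lstsq(X_design, y_raw, rcond=None)
a_ls, b1_ls, b2_ls, b3_ls = theta

print(f"a = {a_ls:.2f}, b1 = {b1_ls:.2f}, b2 = {b2_ls:.2f}, b3 = {b3_ls:.2f}")
```

The printed values should agree with the intercept and coefficients reported by `LinearRegression` to within floating-point error, since both minimise the same sum of squared residuals.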


In [7]:
y_pred = model.predict(X)

comparison_df = pd.DataFrame({
    "Size (1000 sqft)": X["Size (1000 sqft)"],
    "Bedrooms": X["Bedrooms"],
    "Age (years)": X["Age (years)"],
    "Actual Price": y,
    "Predicted Price": y_pred
})

comparison_df

,Size (1000 sqft),Bedrooms,Age (years),Actual Price,Predicted Price
0,1.0,2.0,10.0,150,157.213886
1,1.5,3.0,5.0,200,213.483632
2,2.0,3.0,2.0,250,243.607839
3,2.5,4.0,1.0,300,293.126102
4,3.0,4.0,15.0,320,294.556506
5,3.5,5.0,20.0,330,333.947545
6,4.0,4.0,8.0,360,356.49279
7,4.5,5.0,12.0,380,397.571699
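
To see where the model fits well and where it does not, it can help to look at the residuals (actual minus predicted). The sketch below copies the two price columns from the comparison table above as plain lists; the variable names are illustrative.

```python
# Actual and predicted prices (1000$), copied from the comparison table.
actual = [150, 200, 250, 300, 320, 330, 360, 380]
predicted = [157.213886, 213.483632, 243.607839, 293.126102,
             294.556506, 333.947545, 356.49279, 397.571699]

# Residual = actual - predicted; a positive value means the model under-predicted.
residuals = [y_i - p_i for y_i, p_i in zip(actual, predicted)]

for i, r in enumerate(residuals):
    print(f"row {i}: residual = {r:+.2f}")

# Index of the row with the largest absolute error.
worst = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
print("Largest absolute residual at row", worst)

# With an intercept in the model, least-squares residuals sum to roughly zero
# (not exactly here, because the table values are rounded).
print("Sum of residuals:", round(sum(residuals), 3))
```

Row 4 (3.0 thousand sqft, 4 bedrooms, 15 years old) stands out: the model predicts about 294.56 against an actual price of 320.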


In [8]:
# Scored on the same rows used for fitting, so these are training-set metrics
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

print("Mean Squared Error (MSE):", mse)
print("R² Score:", r2)

Mean Squared Error (MSE): 163.24729093772712
R² Score: 0.9708405620429402
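
Both metrics follow directly from the residuals. The sketch below recomputes them with NumPy from the actual and predicted prices in the comparison table, and adds the root mean squared error (RMSE), which is not computed in the notebook but is easier to read because it is in the same units as the price. Variable names are illustrative.

```python
import numpy as np

# Actual and predicted prices (1000$), copied from the comparison table.
actual = np.array([150, 200, 250, 300, 320, 330, 360, 380], dtype=float)
predicted = np.array([157.213886, 213.483632, 243.607839, 293.126102,
                      294.556506, 333.947545, 356.49279, 397.571699])

ss_res = np.sum((actual - predicted) ** 2)      # residual sum of squares
ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares around the mean

mse_manual = ss_res / len(actual)   # mean squared error, in (1000$)^2
rmse_manual = np.sqrt(mse_manual)   # root mean squared error, in 1000$
r2_manual = 1 - ss_res / ss_tot     # share of variance explained by the model

print("MSE :", mse_manual)
print("RMSE:", rmse_manual)
print("R²  :", r2_manual)
```

Because the table values are rounded to six decimals, the results match the `mean_squared_error` and `r2_score` outputs above only up to small rounding differences.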


In [9]:
X_new = pd.DataFrame({
    "Size (1000 sqft)": [2.0, 3.5],
    "Bedrooms": [3, 4],
    "Age (years)": [5, 10]
})

y_new_pred = model.predict(X_new)

new_results = X_new.copy()
new_results["Predicted Price (1000$)"] = y_new_pred

new_results

,Size (1000 sqft),Bedrooms,Age (years),Predicted Price (1000$)
0,2.0,3,5,238.544227
1,3.5,4,10,328.056454


In [10]:
x1 = X_new.iloc[0]["Size (1000 sqft)"]
x2 = X_new.iloc[0]["Bedrooms"]
x3 = X_new.iloc[0]["Age (years)"]

# Manual calculation using y = a + b1*x1 + b2*x2 + b3*x3
y_manual = a + b1 * x1 + b2 * x2 + b3 * x3

print("From model.predict():", y_new_pred[0])
print("From manual formula a + b1*x1 + b2*x2 + b3*x3:", y_manual)

From model.predict(): 238.5442267973178
From manual formula a + b1*x1 + b2*x2 + b3*x3: 238.54422679731778
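
The manual check above handles one row at a time. A vectorised version, sketched below with NumPy only, evaluates the same equation for every new row with a single matrix-vector product. The intercept and coefficients are copied from the printed model output; the names `a_fit`, `b_fit` and `X_new_arr` are illustrative.

```python
import numpy as np

# Intercept and coefficients as printed by the fitted model above.
a_fit = 78.43180896513351
b_fit = np.array([50.121188686541004, 22.769798081729242, -1.6878707572170897])

# The two new houses: size (1000 sqft), bedrooms, age (years).
X_new_arr = np.array([
    [2.0, 3, 5],
    [3.5, 4, 10],
])

# One matrix-vector product applies a + b1*x1 + b2*x2 + b3*x3 to each row.
y_vectorized = a_fit + X_new_arr @ b_fit
print(y_vectorized)
```

The two values should match the `Predicted Price (1000$)` column returned by `model.predict(X_new)`, up to floating-point rounding.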
