# 🧠 Linear Regression from Scratch using NumPy

This project demonstrates how to build and evaluate a **Linear Regression model** using only **NumPy**, without any external ML libraries.

We use a realistic housing dataset (`housing.csv`) with four features:
- Size (in sq. ft)
- Rooms
- Age of the house
- Garage (0 or 1)

We predict the house **Price** with the closed-form **Normal Equation**, which solves for the coefficients directly instead of iterating with gradient descent.
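
For reference, here is a short sketch of where the Normal Equation comes from. The symbols are chosen for illustration: $X$ is the feature matrix with a leading column of ones, $y$ is the vector of prices, $m$ is the number of rows, and $\theta$ is the coefficient vector the notebook calls `theta`. Setting the gradient of the squared-error cost to zero gives:

$$
J(\theta) = \frac{1}{2m}\,\lVert X\theta - y \rVert^2,
\qquad
\nabla_\theta J = \frac{1}{m}\,X^\top (X\theta - y) = 0
\;\Longrightarrow\;
\theta = (X^\top X)^{-1} X^\top y
$$

The last step assumes $X^\top X$ is invertible, which holds when no feature column is a linear combination of the others.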


In [20]:
import numpy as np


In [21]:
# Load CSV file (skip header)
data = np.genfromtxt("../data/housing.csv", delimiter=",", skip_header=1)

# Split into features (X) and target (y)
X = data[:, :-1]  # All columns except last
y = data[:, -1]   # Last column is the target
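
The loading step can be tried without the real file. The sketch below feeds a tiny in-memory CSV to `np.genfromtxt`. The header names and all values are made up for illustration, and the NaN check is an optional addition rather than part of the original notebook.

```python
import io
import numpy as np

# Tiny in-memory CSV standing in for housing.csv (hypothetical header and values).
csv_text = """Size,Rooms,Age,Garage,Price
1500,3,20,1,210000
2000,4,5,1,310000
900,2,40,0,120000
"""
data_demo = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# genfromtxt turns missing or non-numeric fields into NaN, so check before fitting.
if np.isnan(data_demo).any():
    raise ValueError("dataset contains missing or non-numeric values")

X_demo = data_demo[:, :-1]  # all columns except the last
y_demo = data_demo[:, -1]   # last column is the target
print(X_demo.shape, y_demo.shape)
```

Checking for NaN early matters here because a single NaN would propagate through the mean, the standard deviation, and every fitted coefficient.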


In [22]:
def normalize(X):
    # Z-score each feature column; return mean and std so new data can reuse them
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    return (X - mean) / std, mean, std

X_norm, mean, std = normalize(X)
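
One edge case is worth noting: a feature column with zero variance (for example, if every house had a garage) makes `std` zero and the division produces NaN. A minimal sketch of one possible guard follows. The function name `normalize_safe` and the `eps` threshold are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def normalize_safe(X, eps=1e-12):
    """Z-score each column; constant columns are centered but left unscaled."""
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    # Replace (near-)zero standard deviations with 1 to avoid dividing by zero.
    std_safe = np.where(std < eps, 1.0, std)
    return (X - mean) / std_safe, mean, std_safe

# Toy feature matrix whose last column is constant (made-up data).
X_demo = np.array([[1000.0, 2.0, 1.0],
                   [1500.0, 3.0, 1.0],
                   [2000.0, 4.0, 1.0]])
X_demo_norm, demo_mean, demo_std = normalize_safe(X_demo)
print(X_demo_norm)
```

With this guard, a constant column becomes all zeros after centering. Together with the bias column, such a feature would make the Normal Equation singular, so in practice it is usually dropped.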


In [23]:
def add_bias(X):
    return np.c_[np.ones(X.shape[0]), X]  # Add column of 1s

X_bias = add_bias(X_norm)


In [24]:
def normal_equation(X, y):
    # Solve (X^T X) theta = X^T y directly; this is more stable than forming the inverse
    return np.linalg.solve(X.T @ X, X.T @ y)

theta = normal_equation(X_bias, y)
print("Model Coefficients (Theta):\n", theta)


Model Coefficients (Theta):
 [222541.0145      77357.14229981  22113.08821539  -3965.78708361
   6101.29576169]
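
The closed-form solution can be computed in several equivalent ways. The sketch below uses synthetic data, with made-up coefficients and noise level, to compare the textbook inverse formula with `np.linalg.solve` and `np.linalg.lstsq`. It is an illustration of the alternatives, not the notebook's own results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the design matrix: a bias column plus 4 standardized features.
X_demo = np.c_[np.ones(50), rng.normal(size=(50, 4))]
true_theta = np.array([200000.0, 75000.0, 20000.0, -4000.0, 6000.0])  # made-up values
y_demo = X_demo @ true_theta + rng.normal(scale=1000.0, size=50)

# Textbook Normal Equation with an explicit inverse.
theta_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Same normal equations solved as a linear system (no explicit inverse).
theta_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Least-squares solver; it also returns a solution when X.T @ X is singular.
theta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(np.allclose(theta_inv, theta_solve), np.allclose(theta_inv, theta_lstsq))
```

On well-conditioned data like this, the three agree to floating-point precision. The differences show up only when features are nearly collinear, where `lstsq` tends to be the most robust choice.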


In [25]:
y_pred = X_bias @ theta


In [26]:
def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - (ss_res / ss_tot)

mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R² Score: {r2:.4f}")


Mean Squared Error: 95369413.20
R² Score: 0.9856
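
MSE is expressed in squared dollars, which makes it hard to read. Taking the square root gives the RMSE in the same units as the price. The sketch below adds a hypothetical helper, `root_mean_squared_error`, and applies the square root to the MSE value printed above. The toy arrays are made up.

```python
import numpy as np

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error, in the units of the target."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Toy check with made-up numbers: the errors are -10, 10 and 0.
y_true_demo = np.array([100.0, 200.0, 300.0])
y_pred_demo = np.array([110.0, 190.0, 300.0])
rmse_demo = root_mean_squared_error(y_true_demo, y_pred_demo)

# Convert the MSE reported by the notebook into an RMSE.
reported_mse = 95369413.20
reported_rmse = np.sqrt(reported_mse)
print(f"Toy RMSE: {rmse_demo:.3f}")
print(f"RMSE from reported MSE: {reported_rmse:,.2f}")
```

Both metrics here are computed on the same rows used for fitting, so they describe the training fit rather than performance on unseen houses.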


In [None]:
# Example new house: [Size, Rooms, Age, Garage]
new_house = np.array([[1800, 3, 10, 1]])

# Normalize with training mean and std
new_house_norm = (new_house - mean) / std

# Add bias term
new_house_bias = add_bias(new_house_norm)

# Predict
predicted_price = new_house_bias @ theta
print(f"Predicted Price for {new_house[0]}: ${predicted_price[0]:,.2f}")


Predicted Price for [1800    3   10    1]: $228,621.34
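
Because the model was fitted on standardized features, each coefficient in `theta` is a price change per standard deviation, not per square foot or per room. The sketch below shows one way to map such coefficients back to raw units. The helper `to_raw_units` and all numbers in it are made up for illustration and are not the notebook's fitted values.

```python
import numpy as np

def to_raw_units(theta, mean, std):
    """Map coefficients fitted on z-scored features back to raw feature units."""
    weights = theta[1:] / std                      # price change per one raw unit
    intercept = theta[0] - np.sum(weights * mean)  # absorb the centering term
    return intercept, weights

# Made-up statistics and coefficients, for illustration only.
demo_mean = np.array([1500.0, 3.0, 20.0, 0.5])
demo_std = np.array([400.0, 1.0, 10.0, 0.5])
demo_theta = np.array([200000.0, 80000.0, 20000.0, -4000.0, 6000.0])

house = np.array([1800.0, 3.0, 10.0, 1.0])  # [Size, Rooms, Age, Garage]
intercept, weights = to_raw_units(demo_theta, demo_mean, demo_std)

# Both routes should give the same prediction.
pred_raw = intercept + house @ weights
pred_norm = demo_theta[0] + ((house - demo_mean) / demo_std) @ demo_theta[1:]
print(pred_raw, pred_norm)
```

The raw-unit weights are easier to interpret (for example, dollars per square foot), while the standardized coefficients are easier to compare with each other.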


## ✅ Summary

- Built a linear regression model from scratch using NumPy.
- Trained on a realistic housing dataset with 100 entries.
- Achieved R² = 0.9856 (MSE ≈ 9.54 × 10⁷) on the training data with the closed-form Normal Equation; no held-out test set was used.
- No scikit-learn or pandas used, only NumPy.
