# Multiple Linear Regression: House Price Prediction

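This notebook fits a **multiple linear regression** model to a house price dataset. Categorical features would normally be encoded as **dummy variables**, but the two categorical columns here (`Material` and `Locality`) each hold a single value, so they are dropped instead. The remaining features are expanded with degree-2 polynomial terms, and the fitted model is evaluated with R² and RMSE on both the train and test data.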
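
For reference, dummy-variable encoding of a categorical column that does vary might look like the sketch below. The toy DataFrame, its values, and the `Area` column are made up for illustration and are not taken from `house.csv`; only the `Material` and `Price` column names come from the notebook.

```python
import pandas as pd

# Hypothetical toy data (not from house.csv): "Material" varies here,
# unlike in the notebook's dataset where it is constant.
toy = pd.DataFrame({
    "Area": [1200, 1500, 900, 1100],
    "Material": ["Brick", "Wood", "Brick", "Concrete"],
    "Price": [250000, 310000, 180000, 220000],
})

# One 0/1 column per category; drop_first=True removes one level so the
# dummies are not perfectly collinear with the model's intercept.
encoded = pd.get_dummies(toy, columns=["Material"], drop_first=True, dtype=int)

# Columns with a single unique value carry no information and would yield
# no dummy columns after drop_first, so they can simply be dropped.
constant_cols = [c for c in toy.columns if toy[c].nunique() == 1]

print(encoded.columns.tolist())
print(constant_cols)
```

Running the same `nunique()` check on the real data is one way to confirm that a categorical column is constant before dropping it.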

---

## Step 1: Import Libraries, Fit, and Evaluate the Model


In [10]:
# Load required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt


# Load the dataset
df = pd.read_csv("house.csv")

# Drop the categorical columns: each holds one constant value, so neither can help the model
df = df.drop(columns=["Material", "Locality"])

# Split into features and target
X = df.drop(columns=["Price"])
y = df["Price"]

# Hold out 40% of the rows for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

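# Add degree-2 terms (squares and pairwise products); fit the transformer on the
# training set only, then apply the same transformation to the test set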
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train the model
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict on train and test
y_train_pred = model.predict(X_train_poly)
y_test_pred = model.predict(X_test_poly)

# Evaluate with RMSE (in Price units) and R² on both splits
train_rmse = np.sqrt(mean_squared_error(y_train, y_train_pred))
test_rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))
train_r2 = r2_score(y_train, y_train_pred)
test_r2 = r2_score(y_test, y_test_pred)

# Prepare results
results = {
    "Train RMSE": train_rmse,
    "Test RMSE": test_rmse,
    "Train R2": train_r2,
    "Test R2": test_r2
}

results


{'Train RMSE': np.float64(19304.938712453604),
 'Test RMSE': np.float64(20768.254670610087),
 'Train R2': 0.4638738628190221,
 'Test R2': 0.4205031518199972}
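
To make the two metrics concrete, the sketch below computes RMSE and R² directly from their definitions with NumPy and compares them against scikit-learn. The arrays are made-up values for illustration, not the notebook's predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up targets and predictions, for illustration only.
y_true = np.array([200000.0, 250000.0, 300000.0, 350000.0])
y_pred = np.array([210000.0, 240000.0, 320000.0, 330000.0])

residuals = y_true - y_pred

# RMSE: square root of the mean squared residual, in the target's units.
rmse = np.sqrt(np.mean(residuals ** 2))

# R²: 1 minus (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# The hand-computed values should match scikit-learn's.
print(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
print(r2, r2_score(y_true, y_pred))
```

Read this way, the test R² of about 0.42 above means the model accounts for roughly 42% of the variance in test-set prices, and the test RMSE of about 20,768 is expressed in the same units as `Price`.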