Regression is a supervised learning technique that predicts a continuous numerical value from one or more input features by modeling the relationship between those features and the output.

## Types of Regression Algorithms

1. **Linear Regression**
2. **Ridge Regression**
3. **Lasso Regression**
4. **Polynomial Regression**

## Dataset

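We will use the Boston Housing dataset from `scikit-learn`. It describes 506 Boston-area neighborhoods with 13 features each, and the target is the median home value in thousands of dollars, which we will try to predict. Note that `load_boston` was removed in scikit-learn 1.2, so the code below requires an earlier release.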


In [5]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns

In [6]:
boston = load_boston()

In [7]:
boston.data.shape

(506, 13)

In [8]:
X = boston.data
y = boston.target

In [9]:
feature_names = boston.feature_names

In [11]:
df = pd.DataFrame(data=X, columns=feature_names)
df['PRICE'] = y

In [12]:
df.head(3)

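| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |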


## Linear Regression
Linear Regression models the dependent variable as a weighted sum of the independent variables plus an intercept, choosing the weights that minimize the sum of squared differences between predicted and actual values.
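
As a rough illustration of what "minimizing the squared differences" means, the sketch below solves a small least-squares problem directly with NumPy. The data is synthetic and noise-free, and the names `X_demo`, `y_demo`, and `w` are made up for this example; it is not how `scikit-learn` is required to be used, only a view of the underlying computation.

```python
import numpy as np

# Synthetic data generated from y = 2*x1 - 3*x2 + 5 (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + 5.0

# Append a column of ones so the intercept is estimated as one more weight
A = np.hstack([X_demo, np.ones((X_demo.shape[0], 1))])

# Find the weights w that minimize ||A @ w - y_demo||^2
w, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
print(w)  # two slopes followed by the intercept
```

Because the synthetic target contains no noise, the recovered weights match the generating coefficients up to floating-point error. On real data the fit is only the best linear approximation.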

In [13]:
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# initializing and training the model
model = LinearRegression()
model.fit(X_train, y_train)

# making predictions
y_pred = model.predict(X_test)

# evaluating the model
print("Linear Regression")
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))


Linear Regression
Mean Squared Error: 27.195965766883127
R^2 Score: 0.6733825506400205


Mean Squared Error (MSE): 27.20

Meaning: On average, the squared difference between the predicted and actual housing prices is 27.20 (in thousands of dollars, squared). Taking the square root gives a root mean squared error (RMSE) of about 5.22, so a typical prediction is off by roughly $5,200.


R² Score: 0.673

Meaning: The model explains 67.3% of the variance in housing prices on the test set. This is a reasonable fit, but about a third of the variance remains unexplained, so there is room for improvement.
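
To make the two metrics concrete, here is a small sketch that computes MSE, RMSE, and R² by hand. The values in `y_true` and `y_hat` are made up for illustration and are not taken from the housing data; only `mse_reported` is copied from the output above.

```python
import math

# Hypothetical actual and predicted values (illustrative only)
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]
n = len(y_true)

# MSE: mean of the squared residuals
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
mse = ss_res / n

# RMSE: square root of MSE, expressed in the same units as the target
rmse = math.sqrt(mse)

# R^2: 1 - (residual sum of squares / total sum of squares around the mean)
mean_y = sum(y_true) / n
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot
print(mse, rmse, r2)

# The same square-root step applied to the MSE reported for the housing model
mse_reported = 27.195965766883127
rmse_reported = math.sqrt(mse_reported)
print(rmse_reported)  # in thousands of dollars
```

R² compares the model against a baseline that always predicts the mean, which is why it can become negative when a model does worse than that baseline.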

## Ridge Regression
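Ridge Regression adds an L2 penalty (the sum of squared coefficients, scaled by `alpha`) to the least-squares loss. The penalty shrinks all coefficients toward zero without eliminating any of them, which reduces variance and helps prevent overfitting, especially when features are correlated.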

In [14]:
from sklearn.linear_model import Ridge

model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Ridge Regression")
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))


Ridge Regression
Mean Squared Error: 27.762224592166543
R^2 Score: 0.6665819091486687
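
The shrinking effect of the L2 penalty can be seen in a minimal NumPy sketch of the closed-form ridge solution. This is an illustration under simplifying assumptions: the data is synthetic, no intercept is fitted (unlike `Ridge` with its default settings), and `ridge_coefficients` is a helper written for this example rather than a library function.

```python
import numpy as np

# Synthetic data with known coefficients plus a little noise (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
y_demo = X_demo @ np.array([4.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)

def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution without an intercept:
    w = (X^T X + alpha * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

norms = []
for alpha in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_coefficients(X_demo, y_demo, alpha)
    norms.append(np.linalg.norm(w))
    print(alpha, w, norms[-1])
```

With `alpha=0.0` the result is ordinary least squares. As `alpha` grows, the length of the coefficient vector decreases, but no individual coefficient is forced to exactly zero.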


## Lasso Regression
Lasso Regression uses L1 regularization, which can reduce some coefficients to zero, effectively performing feature selection.

In [15]:
from sklearn.linear_model import Lasso

model = Lasso(alpha=0.1)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Lasso Regression")
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))


Lasso Regression
Mean Squared Error: 28.87575946788067
R^2 Score: 0.6532086050344971
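
Why L1 regularization produces exact zeros while L2 does not can be sketched with the soft-thresholding operator. In the special case of orthonormal features, and up to how the penalty is scaled, the lasso solution is the soft-thresholded least-squares solution, while ridge only rescales it. The coefficient values and the threshold below are hypothetical, chosen only to show the effect.

```python
import numpy as np

def soft_threshold(z, threshold):
    """Move z toward zero by `threshold`; values with |z| <= threshold become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - threshold, 0.0)

# Hypothetical least-squares coefficients (illustrative values)
ols_coefs = np.array([3.0, -0.05, 0.8, 0.02])
penalty = 0.1

# L1-style update: small coefficients are set to exactly zero
l1_coefs = soft_threshold(ols_coefs, penalty)

# L2-style update: every coefficient is scaled down, none becomes zero
l2_coefs = ols_coefs / (1.0 + penalty)

print(l1_coefs)
print(l2_coefs)
```

The two coefficients smaller in magnitude than the threshold are removed by the L1 update, which is the feature-selection behavior described above.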


## Polynomial Regression
Polynomial Regression extends linear regression by adding polynomial terms, allowing it to model non-linear relationships.

In [16]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

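# The pipeline performs the degree-2 expansion itself, so it is fitted on the
# raw features; transforming the data beforehand would expand it twice.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Polynomial Regression")
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))


The output below was recorded from a run in which `X_train` and `X_test` were first transformed with a separate `PolynomialFeatures(degree=2)` and then passed to the pipeline, so the expansion was applied twice:

Polynomial Regression
Mean Squared Error: 363647.7459760244
R^2 Score: -4366.327870401539


Mean Squared Error (MSE): 363,647.75

Meaning: The average squared error in predicting housing prices is very high, indicating poor performance on the test set.

R² Score: -4366.33

Meaning: A negative R² means the model performs much worse than always predicting the average price. The cause is excessive complexity: expanding the 13 features to 105 columns and then expanding those 105 columns again yields 5,671 columns for only 354 training rows, so the model overfits severely. These numbers describe the doubly expanded model, not a plain degree-2 fit.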
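
To give a sense of how quickly a polynomial expansion grows, the sketch below counts the columns produced by a full expansion (all powers and interaction terms, plus a bias column) of 13 input features. The helper `n_poly_terms` is made up for this example; it uses the standard count of monomials of total degree at most d in n variables, C(n + d, d).

```python
from math import comb

def n_poly_terms(n_inputs, degree):
    """Number of monomials of total degree <= `degree` in `n_inputs` variables,
    including the constant (bias) term."""
    return comb(n_inputs + degree, degree)

# Column counts for the 13 housing features at increasing degrees
counts = {degree: n_poly_terms(13, degree) for degree in range(1, 5)}
for degree, n_terms in counts.items():
    print(degree, n_terms)
```

With a few hundred training rows, the number of columns overtakes the number of samples between degree 2 and degree 3, at which point an unregularized linear fit can reproduce the training data exactly and generalize poorly.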