# CYBR 486 - Lab #4: Polynomial Regression, Ridge Regression, and Regularization

This notebook builds and evaluates three kinds of regression models on the Boston Housing dataset, predicting the `medv` column:
1. Linear Regression
2. Polynomial Regression (degrees 2, 4, and 6)
3. Ridge Regression (regularization strength chosen by cross-validation)

Each model is scored on a held-out test set using Root Mean Square Error (RMSE) and the R² score.
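
For reference, both metrics can be computed by hand from a set of predictions. The following is a minimal sketch on made-up values: the arrays `y_true_demo` and `y_pred_demo` are illustrative and are not taken from the housing data. It follows the standard definitions, RMSE as the square root of the mean squared residual and R² as one minus the ratio of residual to total sum of squares.

```python
import numpy as np

# Illustrative values only (not from the housing dataset)
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_demo = np.array([2.0, 5.0, 8.0, 9.0])

residuals = y_true_demo - y_pred_demo

# RMSE: square root of the mean squared residual
rmse_demo = np.sqrt(np.mean(residuals ** 2))

# R²: 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_demo = 1.0 - ss_res / ss_tot

print(f"RMSE = {rmse_demo:.4f}, R2 = {r2_demo:.4f}")
```

R² is negative whenever the residual sum of squares exceeds the total sum of squares, that is, when a model predicts worse than always guessing the mean of the test targets.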


## 1. Import Required Libraries


In [8]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score


## 2. Load and Split the Housing Dataset


In [11]:
# Load the dataset
df = pd.read_csv("BostonHousing.csv") 
# Inspect the dataset structure
print("Dataset Preview:")
print(df.head())

# Separate features (X) and target variable (y)
X = df.drop(columns=["medv"])  
y = df["medv"]

# Split the dataset into 80% training and 20% testing subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("\nDataset successfully loaded and split.")
print(f"Training set size: {X_train.shape[0]} samples")
print(f"Testing set size: {X_test.shape[0]} samples")


Dataset Preview:
      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3   
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8   
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8   
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7   
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7   

        b  lstat  medv  
0  396.90   4.98  24.0  
1  396.90   9.14  21.6  
2  392.83   4.03  34.7  
3  394.63   2.94  33.4  
4  396.90   5.33  36.2  

Dataset successfully loaded and split.
Training set size: 404 samples
Testing set size: 102 samples


## 3. Train and Evaluate the Linear Regression Model


In [13]:
# Train the Linear Regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Predict and evaluate the Linear Regression model
linear_predictions = linear_model.predict(X_test)
linear_rmse = np.sqrt(mean_squared_error(y_test, linear_predictions))
linear_r2 = r2_score(y_test, linear_predictions)

print("Linear Regression Results:")
print(f"Root Mean Square Error (RMSE): {linear_rmse:.2f}")
print(f"R2 Score: {linear_r2:.2f}")


Linear Regression Results:
Root Mean Square Error (RMSE): 4.93
R2 Score: 0.67


## 4. Train and Evaluate the Polynomial Regression Model


In [15]:
# Function to train and evaluate Polynomial Regression with varying degrees
def evaluate_polynomial_model(degree):
    # Transform features to polynomial
    poly_transformer = PolynomialFeatures(degree=degree)
    poly_X_train = poly_transformer.fit_transform(X_train)
    poly_X_test = poly_transformer.transform(X_test)

    # Train a new Linear Regression model on transformed data
    poly_model = LinearRegression()
    poly_model.fit(poly_X_train, y_train)

    # Predictions and evaluations
    poly_predictions = poly_model.predict(poly_X_test)
    poly_rmse = np.sqrt(mean_squared_error(y_test, poly_predictions))
    poly_r2 = r2_score(y_test, poly_predictions)

    return poly_rmse, poly_r2

# Evaluate Polynomial Regression for degrees 2, 4, and 6
for degree in [2, 4, 6]:
    poly_rmse, poly_r2 = evaluate_polynomial_model(degree)
    print(f"Polynomial Regression (Degree {degree}) Results:")
    print(f"Root Mean Square Error (RMSE): {poly_rmse:.2f}")
    print(f"R² Score: {poly_r2:.2f}")


Polynomial Regression (Degree 2) Results:
Root Mean Square Error (RMSE): 3.77
R² Score: 0.81
Polynomial Regression (Degree 4) Results:
Root Mean Square Error (RMSE): 73.60
R² Score: -72.88
Polynomial Regression (Degree 6) Results:
Root Mean Square Error (RMSE): 178.72
R² Score: -434.54
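
One likely reason for the collapse at degrees 4 and 6 is the number of generated features relative to the number of training rows. With `n` input features, `PolynomialFeatures(degree=d)` with its default bias column produces `C(n + d, d)` output columns. The sketch below assumes 13 input features (the preview columns minus `medv`) and the 404 training samples reported in Section 2; the helper name `n_poly_terms` is illustrative.

```python
from math import comb

n_features = 13   # columns of X after dropping "medv"
n_train = 404     # training set size reported in Section 2


def n_poly_terms(n, d):
    """Column count of a full polynomial expansion of degree d, bias included."""
    return comb(n + d, d)


for d in (2, 4, 6):
    print(f"degree {d}: {n_poly_terms(n_features, d)} columns "
          f"vs {n_train} training rows")
```

At degree 2 the expansion has 105 columns, still fewer than the 404 training rows. At degrees 4 and 6 it has 2380 and 27132 columns, so an unregularized least-squares fit has far more parameters than samples and can fit the training data almost arbitrarily closely, which is consistent with the negative test R² values above.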


## 5. Train and Evaluate the Ridge Regression Model


In [16]:
# Ridge Regression Model with Cross-Validation
alpha_values = [0.001, 0.01, 0.1, 1, 10]
ridge_model = RidgeCV(alphas=alpha_values)
ridge_model.fit(X_train, y_train)

ridge_predictions = ridge_model.predict(X_test)
ridge_rmse = np.sqrt(mean_squared_error(y_test, ridge_predictions))
ridge_r2 = r2_score(y_test, ridge_predictions)

print("Ridge Regression Results:")
print(f"Root Mean Square Error (RMSE): {ridge_rmse:.2f}")
print(f"R2 Score: {ridge_r2:.2f}")


Ridge Regression Results:
Root Mean Square Error (RMSE): 4.93
R2 Score: 0.67
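
The Ridge model above is fit on the original 13 features. A possible follow-up, which is not part of the original lab, is to apply the penalty to the expanded polynomial features instead. The sketch below shows one way to wire that up as a pipeline. It runs on synthetic data so that it is self-contained: `X_demo`, `y_demo`, the choice of degree 4, and the `StandardScaler` step are all assumptions made for illustration, and no claim is made about how this would score on the housing data.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): a quadratic signal plus a little noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (1.5 * X_demo[:, 0] ** 2 - 2.0 * X_demo[:, 1]
          + rng.normal(scale=0.1, size=200))

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Expand features, put them on a common scale, then fit Ridge with
# the same candidate alphas used in the cell above
alphas = [0.001, 0.01, 0.1, 1, 10]
poly_ridge = make_pipeline(
    PolynomialFeatures(degree=4, include_bias=False),
    StandardScaler(),
    RidgeCV(alphas=alphas),
)
poly_ridge.fit(Xd_train, yd_train)

chosen_alpha = poly_ridge.named_steps["ridgecv"].alpha_
demo_r2 = r2_score(yd_test, poly_ridge.predict(Xd_test))
print(f"Selected alpha: {chosen_alpha}, test R2: {demo_r2:.2f}")
```

Scaling before the Ridge step matters because the penalty acts on coefficient magnitudes, and high-degree polynomial terms can differ in scale by many orders of magnitude.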


## 6. Compare Model Performances and Observations


In [18]:
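print("Model Comparison:")
print(f"Linear Regression: RMSE = {linear_rmse:.2f}, R2 = {linear_r2:.2f}")
# `degree`, `poly_rmse`, and `poly_r2` still hold the values from the last
# iteration of the loop in Section 4, so this line reports degree 6 only.
print(f"Polynomial Regression (Degree {degree}): RMSE = {poly_rmse:.2f}, R2 = {poly_r2:.2f}")
print(f"Ridge Regression: RMSE = {ridge_rmse:.2f}, R2 = {ridge_r2:.2f}")

# Observations
print("""
- Linear Regression is the baseline, with RMSE 4.93 and R2 0.67 on the test set.
- Polynomial Regression at degree 2 gives the best result (RMSE 3.77, R2 0.81); degrees 4 and 6 overfit severely and produce negative R2 on the test set.
- Ridge Regression on the original features matches Linear Regression to two decimals, so the penalty has no measurable effect on this feature set.
""")


Model Comparison:
Linear Regression: RMSE = 4.93, R2 = 0.67
Polynomial Regression (Degree 6): RMSE = 178.72, R2 = -434.54
Ridge Regression: RMSE = 4.93, R2 = 0.67

- Linear Regression is the baseline, with RMSE 4.93 and R2 0.67 on the test set.
- Polynomial Regression at degree 2 gives the best result (RMSE 3.77, R2 0.81); degrees 4 and 6 overfit severely and produce negative R2 on the test set.
- Ridge Regression on the original features matches Linear Regression to two decimals, so the penalty has no measurable effect on this feature set.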
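
The comparison cell relies on variables left over from the loop in Section 4, which only retain the last degree evaluated. One alternative is to collect each model's metrics in a dictionary as it is evaluated and build the comparison from that. The sketch below hard-codes the test-set values printed in the earlier outputs so that it runs on its own; the names `results` and `best_name` are illustrative.

```python
# (RMSE, R2) pairs copied from the outputs of Sections 3 to 5
results = {
    "Linear Regression": (4.93, 0.67),
    "Polynomial (degree 2)": (3.77, 0.81),
    "Polynomial (degree 4)": (73.60, -72.88),
    "Polynomial (degree 6)": (178.72, -434.54),
    "Ridge Regression": (4.93, 0.67),
}

# Print one aligned row per model
print(f"{'Model':<24}{'RMSE':>10}{'R2':>10}")
for name, (rmse, r2) in results.items():
    print(f"{name:<24}{rmse:>10.2f}{r2:>10.2f}")

# Pick the model with the lowest test RMSE
best_name = min(results, key=lambda name: results[name][0])
print(f"Lowest test RMSE: {best_name}")
```

In the notebook itself, the dictionary would be filled from `linear_rmse`, `ridge_rmse`, and the return values of `evaluate_polynomial_model` rather than from literals.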