(activity7)=

# Activity 7: Polynomial Features and Ridge Regression

**2026-02-17**

# Imports and setup

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

In [None]:
def rmse(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Root mean squared error."""
    assert y_hat.shape == y.shape
    return np.sqrt(np.mean((y_hat - y)**2))
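
As a quick sanity check of the formula $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2}$, here is a tiny hand-checkable example. The arrays are made up purely for illustration:

```python
import numpy as np

def rmse(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Root mean squared error (same definition as above)."""
    assert y_hat.shape == y.shape
    return np.sqrt(np.mean((y_hat - y)**2))

# made-up predictions and targets: only the last prediction is off, by 2
y_hat = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 5.0])

# squared errors are [0, 0, 4], their mean is 4/3, so RMSE = sqrt(4/3), about 1.1547
print(rmse(y_hat, y))
```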

# Data loading

We'll use the same California housing dataset and 80/20 split from Activity 6:

In [None]:
# Load data and split (same as Activity 6)
housing_df = pd.read_csv("~/COMSC-335/data/housing_data.csv")

housing_train = housing_df[:4000]
housing_test = housing_df[4000:]

X_train = housing_train.drop(columns=["MedHouseVal"])
X_test = housing_test.drop(columns=["MedHouseVal"])

# standardize all features (fit on the training data only); this helps with numerical stability
# we'll talk more about preprocessing in future lectures!
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

y_train = housing_train["MedHouseVal"].to_numpy()
y_test = housing_test["MedHouseVal"].to_numpy()

print(f"Training set: {X_train.shape[0]} examples, {X_train.shape[1]} features")
print(f"Test set: {X_test.shape[0]} examples, {X_test.shape[1]} features")
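
The standardization step above follows a fit-on-train, transform-both pattern. Here is a small sketch on made-up one-feature data (not the housing CSV) showing what `StandardScaler` learns, and why the test set is scaled with the *training* mean and standard deviation rather than its own:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# made-up 1-feature data, purely for illustration
X_tr = np.array([[1.0], [2.0], [3.0], [4.0]])
X_te = np.array([[5.0], [6.0]])

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learns mean_ and scale_ from X_tr only
X_te_scaled = scaler.transform(X_te)      # reuses the training mean_ and scale_

print(scaler.mean_, scaler.scale_)                        # training mean and (population) std
print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))  # about 0 and 1
print(X_te_scaled.ravel())  # not centered: the test values lie above the training mean

# the transform is just (x - mean) / std, using the training statistics
manual = (X_te - X_tr.mean(axis=0)) / X_tr.std(axis=0)
print(np.allclose(X_te_scaled, manual))
```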

# Part 2

`PolynomialFeatures` is what is known as a "transformer" in scikit-learn (not the neural network kind). The standard workflow is:

- Initialize the transformer object with any hyperparameters
- Call `fit()` on the training data: if the transformer has parameters, it will learn them from the training data
- Call `transform()` on both the training and test data: use the learned parameters to transform the data

In [None]:
# generates all polynomial terms up to degree 2 from every input feature:
# a bias column, the original features, their squares, and all pairwise products
poly_features = PolynomialFeatures(degree=2)

# fit the transformer to the training data
poly_features.fit(X_train)

# transform both the training and test data
X_poly_train = poly_features.transform(X_train)
X_poly_test = poly_features.transform(X_test)

# Examine the shape of the transformed data
print(X_poly_train.shape)
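
The number of columns printed above grows quickly with the degree. With `include_bias=True` (the default), `PolynomialFeatures` produces every monomial of total degree at most $k$ in $d$ input features, which is $\binom{d+k}{k}$ columns. The sketch below checks that count against scikit-learn on random data; `n_poly_features` is a hypothetical helper for illustration, not a scikit-learn function. If the CSV holds the usual 8 California housing features (an assumption; check `X_train.shape[1]`), that is 45 columns at degree 2 and 165 at degree 3.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def n_poly_features(n_features: int, degree: int) -> int:
    """Hypothetical helper: number of columns PolynomialFeatures outputs with
    include_bias=True, i.e. all monomials of total degree <= `degree`."""
    return comb(n_features + degree, degree)

# compare the formula with what scikit-learn actually produces on random data
rng = np.random.default_rng(0)
for d in [2, 8]:
    for k in [1, 2, 3]:
        X = rng.normal(size=(5, d))
        n_out = PolynomialFeatures(degree=k).fit_transform(X).shape[1]
        print(f"d={d}, degree={k}: formula={n_poly_features(d, k)}, sklearn={n_out}")
```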

We'll combine polynomial features with a ridge-regularized linear regression model: `Ridge(alpha=a)`.

The `Ridge` model operates exactly like all of our other scikit-learn models, with a `fit()` method and a `predict()` method.

We need to set the regularization hyperparameter `alpha` (called $\lambda$ in the lecture slides and in most other ML literature) to a positive value if we want to use regularization.
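
For reference, the objective that scikit-learn's `Ridge` minimizes (with the default `fit_intercept=True`) is the sum of squared errors plus `alpha` times the squared L2 norm of the weights $w$; the intercept $b$ is not penalized. For training examples $(x_i, y_i)$, $i = 1, \dots, n$:

$$
\min_{w,\, b} \; \sum_{i=1}^{n} \left(y_i - w^\top x_i - b\right)^2 + \alpha \lVert w \rVert_2^2
$$

The lecture slides may scale the squared-error term differently (for example by $1/n$), which changes which numerical value of $\lambda$ gives a particular amount of regularization but not the shape of the problem.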

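If we set `alpha=0`, we get back the unregularized linear regression model (ordinary least squares). For numerical reasons, the scikit-learn documentation recommends using `LinearRegression` directly in that case rather than `Ridge(alpha=0)`.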
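
A quick sketch on synthetic data (made up for illustration, with known true weights) of both ends of the `alpha` range: a tiny positive `alpha` gives essentially the same weights as `LinearRegression`, while a large `alpha` pulls them toward zero:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# synthetic regression data: 3 features, true weights [2, -1, 0.5], intercept 3
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=200)

ols = LinearRegression().fit(X, y)
ridge_tiny = Ridge(alpha=1e-8).fit(X, y)  # tiny alpha: essentially no penalty
ridge_big = Ridge(alpha=100.0).fit(X, y)  # large alpha: strong shrinkage

print("OLS:        ", ols.coef_)
print("Ridge 1e-8: ", ridge_tiny.coef_)  # nearly identical to OLS
print("Ridge 100:  ", ridge_big.coef_)   # pulled toward zero
```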

In [None]:
# initialize the Ridge model with alpha = 1.0
ridge_model = Ridge(alpha=1.0)
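
Because `Ridge` is a regular scikit-learn estimator, `fit()` and `predict()` work as usual. As a sketch on made-up random data, the fitted weights can be checked against the closed-form ridge solution $w = (X^\top X + \alpha I)^{-1} X^\top y$. This check uses `fit_intercept=False` (an assumption for this example only; the notebook keeps the default intercept) so that no intercept term enters the formula:

```python
import numpy as np
from sklearn.linear_model import Ridge

# synthetic data, made up for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
alpha = 1.0

# with fit_intercept=False the objective is exactly ||y - Xw||^2 + alpha * ||w||^2
model = Ridge(alpha=alpha, fit_intercept=False)
model.fit(X, y)            # same fit()/predict() interface as other estimators
y_pred = model.predict(X)

# closed-form ridge solution: solve (X^T X + alpha * I) w = X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

print(np.allclose(model.coef_, w_closed))
print(np.allclose(y_pred, X @ w_closed))
```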

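Let's see how large the weights get without regularization (`alpha=0`). Rerun the cell with a positive `alpha` to compare the largest weight with and without ridge regularization: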

In [None]:
deg = 3

poly_features = PolynomialFeatures(degree=deg)

# fit the transformer to the training data
poly_features.fit(X_train)

X_poly_train = poly_features.transform(X_train)
X_poly_test = poly_features.transform(X_test)

print("Degree", deg, "polynomial features:", X_poly_train.shape[1])

ridge_model = Ridge(alpha=0)

ridge_model.fit(X_poly_train, y_train)

# we can access the fitted weights using the `coef_` attribute
print(ridge_model.coef_)

# np.abs() returns the absolute value of each element in the array
# np.max() returns the maximum value in the array   
np.max(np.abs(ridge_model.coef_))
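
To see the shrinkage effect across several values at once, here is a sketch on synthetic data (a made-up nonlinear target built from three standardized random features, not the housing data) that refits degree-3 ridge models for increasing `alpha`, starting from a tiny positive value instead of exactly 0. Ridge directly penalizes $\lVert w \rVert_2^2$, so the L2 norm of the fitted weights cannot increase as `alpha` grows; the largest single weight usually shrinks too, but that is not guaranteed at every step.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# synthetic data (made up): 3 standardized features and a nonlinear target
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(300, 3)))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

X_poly = PolynomialFeatures(degree=3).fit_transform(X)

# record the size of the fitted weights for increasing alpha
alphas = [1e-6, 1e-2, 1.0, 100.0]
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_poly, y)
    norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:g}: max|w|={np.max(np.abs(model.coef_)):.4f}, ||w||_2={norms[-1]:.4f}")
```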

By convention, we typically try out regularization hyperparameters on a grid of powers of 10, e.g.:

$$
[10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^2, 10^3]
$$

There is a convenient [np.logspace()](https://numpy.org/doc/stable/reference/generated/numpy.logspace.html) function that generates values evenly spaced on a log scale, like the grid above:

In [None]:
alphas = np.logspace(-3, 3, 7)
print(alphas)
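
`np.logspace(start, stop, num)` returns `num` values $10^t$ with $t$ evenly spaced from `start` to `stop` (base 10 by default), so the call above is the same grid as writing out the powers of 10 by hand. Increasing `num` gives a finer grid, as this sketch shows:

```python
import numpy as np

# np.logspace(start, stop, num): num points 10**t, t evenly spaced in [start, stop]
alphas = np.logspace(-3, 3, 7)

# the same grid written out by hand: 10^-3, 10^-2, ..., 10^3
by_hand = 10.0 ** np.arange(-3, 4)
print(np.allclose(alphas, by_hand))

# a finer grid: 13 points gives two values per decade (10^-3, 10^-2.5, 10^-2, ...)
fine = np.logspace(-3, 3, 13)
print(fine[:3])
```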

Below is starter code for training a ridge model and evaluating its train and test RMSE. Complete the code to:

- Transform the training and test data using `PolynomialFeatures`
- Iterate over the range of alphas, fitting a `Ridge` model on the polynomial training data for each one

Experiment with higher values of the degree. Compare with folks around you which values of the regularization hyperparameter and degree give the best test RMSE, and submit your best test RMSE to PollEverywhere:

https://pollev.com/tliu

Discuss with folks around you: from what you're seeing, is a higher or lower `alpha` needed to get most of the benefits of regularization?

In [None]:
degree = 1

poly_features = None # TODO initialize the transformer

# TODO fit the transformer on the training data


# TODO transform both the training and test data
X_poly_train = None
X_poly_test = None

print("Degree", degree, "polynomial features:", X_poly_train.shape[1])

# TODO iterate over alphas here
alphas = []

for alpha in alphas:

    # TODO initialize the model with the current alpha
    ridge_model = None

    # TODO fit the model on the polynomial training data
    

    # compute the train and test RMSE (provided for you)
    train_rmse = rmse(ridge_model.predict(X_poly_train), y_train)
    test_rmse = rmse(ridge_model.predict(X_poly_test), y_test)

    print(f'  alpha={alpha:.4f} | Train: {train_rmse:.4f} | Test: {test_rmse:.4f}')
