<span style="font-family: 'JetBrains Mono', monospace; font-size:16px;">

# Problem Set: Cost Function Fun
In this problem set, we examine how regularization (Ridge and Lasso) helps prevent overfitting by shrinking model weights, and how the regularization strength affects training error, test error, and the bias-variance trade-off. Using a synthetic dataset, we fit Ridge and Lasso models with different regularization values, examine the effect on coefficient norms, and use cross-validation to pick the best model and evaluate it.
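
For reference, the two penalized cost functions can be written as below. This is a sketch in generic notation: the symbols $w$ (weights), $x_i$ and $y_i$ (the $i$-th sample and target), $n$ (number of samples) and $p$ (number of features) are assumptions introduced here, the intercept is omitted for brevity, and libraries may scale the terms differently (scikit-learn, for instance, names the strength `alpha`).

$$
J_{\text{ridge}}(w) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - w^\top x_i\right)^2 + \lambda \sum_{j=1}^{p} w_j^2
$$

$$
J_{\text{lasso}}(w) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - w^\top x_i\right)^2 + \lambda \sum_{j=1}^{p} \lvert w_j \rvert
$$

Setting $\lambda = 0$ recovers ordinary least squares in both cases; a larger $\lambda$ penalizes large weights more heavily, which is what shrinks them.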

## Regularization Hyperparameter Questions
1. What is the difference between a model's parameters and a model's hyperparameters?
- **Parameter:** something the model learns from the data. For example, the weight coefficients in linear regression.
- **Hyperparameter:** something we choose before training. For example, the regularization strength λ (called `alpha` in scikit-learn), the learning rate, or the number of trees in a random forest.

2. As we increase λ from 0, how will the training MSE (mean-squared error) change? How will the test MSE change? Sketch the bias-variance trade-off.
As we increase λ from 0, the training MSE increases because stronger regularization forces the model to fit the training data less closely. The test MSE typically first decreases (less overfitting) and then increases (underfitting), forming a U-shaped curve. In bias-variance terms, increasing λ raises bias and lowers variance, and the best λ sits near the bottom of the U, where the two balance.

3. If training and validation errors are both high and almost equal, should you increase or decrease λ?
We should decrease λ, because high and nearly equal training and validation errors mean the model is underfitting. Underfitting happens when λ is too large; lowering λ gives the model more freedom to learn patterns in the data.
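
The answers above can be checked empirically. The following is a minimal sketch, not a definitive experiment: it reuses the dataset settings from this problem set, while the alpha grid and the names `train_mse` and `test_mse` are assumptions chosen for illustration. Whether the test curve shows a clear dip depends on the data and the split.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same synthetic data and split as the rest of this problem set
X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=23)

# Illustrative grid (an assumption): alpha is the hyperparameter we choose
alphas = [0.01, 0.1, 1, 10, 100, 1000, 10000]
train_mse, test_mse = [], []

for alpha in alphas:
    model = Pipeline([
        ("scaler", StandardScaler()),
        ("ridge", Ridge(alpha=alpha)),
    ])
    # Fitting learns the parameters (the ridge coefficients) for this alpha
    model.fit(X_train, y_train)
    train_mse.append(mean_squared_error(y_train, model.predict(X_train)))
    test_mse.append(mean_squared_error(y_test, model.predict(X_test)))

for alpha, tr, te in zip(alphas, train_mse, test_mse):
    print(f"alpha={alpha:>8}: train MSE={tr:10.2f}  test MSE={te:10.2f}")
```

Here `alpha` is a hyperparameter fixed before each fit, while the coefficients stored in `model.named_steps["ridge"].coef_` are parameters learned from the training data. The training MSE should never decrease as `alpha` grows, and for very large `alpha` both errors become high, which is the underfitting case from question 3.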

## The Different Kinds of Regression

</span>

In [None]:

""" 

    - generate a regression dataset
    - split data into 80% for training and 20% for testing
    - fit Ridge over a range of alpha values and record the L2 norm of the coefficients


"""

# sklearn imports
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# other relevant imports
import numpy as np

X, y = make_regression(n_samples=200, n_features=50, n_informative=10, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=23)

# range of alpha values: 0.0, 0.1, ..., 99.9 (alpha=0 corresponds to ordinary least squares)
alphas = np.arange(0, 100, 0.1)
coefficient_norms = []

for alpha in alphas:

    model = Pipeline([
        
        # scale the data and apply ridge
        ("scaler", StandardScaler()),
        ("ridge", Ridge(alpha=alpha))
    ])

    # fitting the model
    model.fit(X_train, y_train)

    # extracting coefficients
    coefficients = model.named_steps["ridge"].coef_

    # find L2 norm
    norm = np.linalg.norm(coefficients)
    coefficient_norms.append(norm)


