Concepts and Technologies of AI.
Linear Models, Regularization and Cross-Validation

Import Required Libraries

In [None]:
# Numerical computation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt

# Dataset loading
from sklearn.datasets import fetch_openml, load_breast_cancer

# Model selection and evaluation
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, accuracy_score

# Models
from sklearn.linear_model import LinearRegression, Ridge, Lasso, LogisticRegression

# Feature scaling
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline


PART 1: REGRESSION (CALIFORNIA HOUSING)

Load Dataset (OpenML Workaround)

---



In [None]:
# Load California Housing dataset from OpenML
X, y = fetch_openml(
    name="california_housing",
    version=1,
    as_frame=True,
    return_X_y=True
)

# Combine features and target into a single DataFrame
df = X.copy()
df["MedHouseVal"] = y

# Create average-based features (feature engineering)
df["AveRooms"]  = df["total_rooms"] / df["households"]
df["AveBedrms"] = df["total_bedrooms"] / df["households"]
df["AveOccup"]  = df["population"] / df["households"]

# Select only relevant columns
df = df[
    [
        "median_income",
        "housing_median_age",
        "AveRooms",
        "AveBedrms",
        "population",
        "AveOccup",
        "latitude",
        "longitude",
        "MedHouseVal"
    ]
]

# Remove rows with missing values (reassign instead of mutating a column subset in place)
df = df.dropna()

# Split features and target
X = df.drop("MedHouseVal", axis=1)
y = df["MedHouseVal"]

# Train-test split (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


Step 1: Baseline Linear Regression (No Regularization)

Purpose:

* Establish a baseline

* Observe the coefficients

* Measure overfitting risk (the gap between training and test MSE)

In [31]:
# Create Linear Regression model (no regularization)
linear_model = LinearRegression()

# Train model on training data
linear_model.fit(X_train, y_train)

# Predict on training and test data
y_train_pred = linear_model.predict(X_train)
y_test_pred = linear_model.predict(X_test)

# Compute Mean Squared Error
train_mse = mean_squared_error(y_train, y_train_pred)
test_mse = mean_squared_error(y_test, y_test_pred)

print("Baseline Linear Regression")
print("Training MSE:", train_mse)
print("Test MSE:", test_mse)

# Display model coefficients
print("Model Coefficients:")
print(linear_model.coef_)


Baseline Linear Regression
Training MSE: 0.051601906634910176
Test MSE: 0.0641088624702943
Model Coefficients:
[ 1.97130218e-01 -2.79472278e-03 -2.27758664e-02 -3.28622398e-04
  4.11490191e-01  5.00171192e+00 -1.00587030e+00 -4.91570446e+00
  3.38393701e-01 -5.81425644e+00 -4.32261922e-01  1.26325368e-02
  8.24736376e-03  1.24507529e-03 -1.80785086e+01  2.20798677e+00
  4.27375913e+00 -1.81589526e+01  1.19449435e+00  3.01203668e+00
 -2.14438989e-01 -9.61718848e-03  8.71176397e-03  9.61253395e-04
 -1.32384962e-01 -7.62670138e-01 -6.15742798e-01  1.32619828e+00
 -1.02113249e+00 -1.27363832e+00]

Note: the output above lists 30 coefficients, while the feature matrix built in the loading cell has only 8 columns. The printed values therefore appear to come from a session in which X_train still held a 30-feature dataset (the Breast Cancer data used in Part 2 has 30 features). Re-run the notebook from top to bottom to regenerate the Part 1 outputs.


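Explanation (Baseline Model):

* Linear Regression minimizes the residual sum of squares.

* With no penalty on the weights, the fit has low bias but can have high variance.

* Large coefficients (two in the output above are close to -18) make predictions sensitive to small changes in the inputs, which raises the risk of overfitting.

* The baseline serves as the reference point for Ridge and Lasso.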
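
To make "minimizes the residual sum of squares" concrete, the sketch below solves the same least-squares problem directly with NumPy and compares the result with LinearRegression. The synthetic data and the names X_demo, y_demo and true_coef are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): 3 features, known weights, small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_coef + 3.0 + rng.normal(scale=0.1, size=200)

# Append a column of ones so the intercept is estimated along with the weights
X_aug = np.column_stack([X_demo, np.ones(len(X_demo))])

# Least-squares solution: the w that minimizes ||y - X_aug w||^2
w, *_ = np.linalg.lstsq(X_aug, y_demo, rcond=None)

model = LinearRegression().fit(X_demo, y_demo)

print("lstsq weights:   ", w[:-1], "intercept:", w[-1])
print("sklearn weights: ", model.coef_, "intercept:", model.intercept_)
```

Both routes should agree up to floating-point error, since neither applies any penalty to the weights.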

Hyperparameter Tuning (Ridge & Lasso)

Why Hyperparameter Tuning?

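Regularization strength (alpha) controls:

* Bias: increases as alpha grows

* Variance: decreases as alpha grows

* Model complexity: a larger alpha gives smaller (or, for Lasso, fewer non-zero) coefficients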
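
For reference, the objectives minimized by the three estimators used in Part 1 can be written as below, following the forms given in the scikit-learn documentation. The symbols are notation introduced here rather than taken from the notebook: $w$ is the coefficient vector, $n$ the number of training samples, and the intercept is omitted for brevity.

```latex
\begin{aligned}
\text{Linear regression:}\quad & \min_{w}\ \lVert y - Xw \rVert_2^2 \\
\text{Ridge (L2):}\quad & \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Lasso (L1):}\quad & \min_{w}\ \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
\end{aligned}
```

Setting $\alpha = 0$ in either penalized form recovers the unpenalized least-squares fit. Note that the Lasso data term is scaled by $1/(2n)$ while the Ridge term is not, so the same numeric alpha does not mean the same penalty strength for the two models.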

Why Scaling?

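The penalty acts on coefficient magnitudes, and those depend on the units of each feature, so the features are standardized with StandardScaler before the penalized model is fit. Placing the scaler inside a Pipeline also means it is re-fit on the training portion of every cross-validation fold.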
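
The effect of feature units on a penalized fit can be checked with a small experiment. This is a minimal sketch on synthetic data (X_demo, y_demo and the helper ridge_predictions are hypothetical names introduced here): the same feature is expressed in two different units, and Ridge is fit with and without standardization.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = X_demo[:, 0] + X_demo[:, 1] + rng.normal(scale=0.1, size=100)

# Same information, but the first feature is expressed in a unit 1000x smaller
X_rescaled = X_demo.copy()
X_rescaled[:, 0] *= 1000

def ridge_predictions(X, scale):
    """Fit Ridge(alpha=10) on (X, y_demo), optionally after standardizing, and predict on X."""
    steps = [("ridge", Ridge(alpha=10))]
    if scale:
        steps.insert(0, ("scaler", StandardScaler()))
    return Pipeline(steps).fit(X, y_demo).predict(X)

# Largest change in any prediction caused purely by the change of units
plain_gap = np.max(np.abs(ridge_predictions(X_demo, False) - ridge_predictions(X_rescaled, False)))
scaled_gap = np.max(np.abs(ridge_predictions(X_demo, True) - ridge_predictions(X_rescaled, True)))

print("Without scaling, predictions change by up to:", plain_gap)
print("With scaling, predictions change by up to:   ", scaled_gap)
```

Without scaling, the rescaled feature needs a much smaller coefficient and so is penalized far less, which changes the fitted model. With the scaler in the pipeline, the change of units should make no difference beyond floating-point noise.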

Step 2: Ridge Regression (L2) with GridSearchCV

In [32]:
# Pipeline: scaling + Ridge regression
ridge_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("ridge", Ridge())
])

# Hyperparameter grid
ridge_params = {
    "ridge__alpha": [0.01, 0.1, 1, 10, 100]
}

# Grid search with 5-fold cross-validation
ridge_grid = GridSearchCV(
    ridge_pipeline,
    ridge_params,
    cv=5,
    scoring="neg_mean_squared_error"
)

# Train GridSearch
ridge_grid.fit(X_train, y_train)

print("Best Ridge Alpha:", ridge_grid.best_params_)


Best Ridge Alpha: {'ridge__alpha': 1}


Step 2: Lasso Regression (L1) with GridSearchCV

In [33]:
# Pipeline: scaling + Lasso regression
lasso_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("lasso", Lasso(max_iter=10000))
])

# Hyperparameter grid
lasso_params = {
    "lasso__alpha": [0.001, 0.01, 0.1, 1, 10]
}

# Grid search
lasso_grid = GridSearchCV(
    lasso_pipeline,
    lasso_params,
    cv=5,
    scoring="neg_mean_squared_error"
)

lasso_grid.fit(X_train, y_train)

print("Best Lasso Alpha:", lasso_grid.best_params_)


Best Lasso Alpha: {'lasso__alpha': 0.001}
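
best_params_ reports only the winning value. The full cross-validation table is available in cv_results_, which is worth inspecting when the selected alpha sits at the edge of the grid, as 0.001 does here. The sketch below uses synthetic data from make_regression so that it runs on its own; demo_grid and cv_table are names introduced for illustration.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem (illustrative only)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=5.0, random_state=0
)

demo_grid = GridSearchCV(
    Pipeline([("scaler", StandardScaler()), ("lasso", Lasso(max_iter=10000))]),
    {"lasso__alpha": [0.001, 0.01, 0.1, 1, 10]},
    cv=5,
    scoring="neg_mean_squared_error",
)
demo_grid.fit(X_demo, y_demo)

# One row per alpha: mean and spread of the score across the 5 folds
cv_table = pd.DataFrame(demo_grid.cv_results_)[
    ["param_lasso__alpha", "mean_test_score", "std_test_score", "rank_test_score"]
].copy()

# The scorer returns negative MSE (higher is better), so flip the sign to read it as MSE
cv_table["mean_cv_mse"] = -cv_table["mean_test_score"]

print(cv_table.sort_values("rank_test_score"))
```

If the best value is the smallest or largest in the grid, extending the grid in that direction is a reasonable next step.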


Step 3: Ridge (L2) vs Lasso (L1) Evaluation

In [36]:
# Best models
best_ridge = ridge_grid.best_estimator_
best_lasso = lasso_grid.best_estimator_

# Predictions
ridge_pred = best_ridge.predict(X_test)
lasso_pred = best_lasso.predict(X_test)

# MSE comparison
print("Ridge Test MSE:", mean_squared_error(y_test, ridge_pred))
print("Lasso Test MSE:", mean_squared_error(y_test, lasso_pred))

# Coefficient sparsity
ridge_coef = best_ridge.named_steps["ridge"].coef_
lasso_coef = best_lasso.named_steps["lasso"].coef_

print("Non-zero Ridge coefficients:", np.sum(ridge_coef != 0))
print("Non-zero Lasso coefficients:", np.sum(lasso_coef != 0))


Ridge Test MSE: 0.062053145746705035
Lasso Test MSE: 0.05837456845081737
Non-zero Ridge coefficients: 30
Non-zero Lasso coefficients: 23


Theory: Bias–Variance Tradeoff (Regression)



---

| Method | Effect |
|---|---|
| Ridge (L2) | Shrinks coefficients, reduces variance |
| Lasso (L1) | Shrinks coefficients and sets some to zero |
| Too much regularization | High bias, underfitting |
| No regularization | Low bias, high variance |
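
The effects listed above can be observed by refitting both models over a range of alpha values. The sketch below is an illustration on synthetic data, not a reproduction of the notebook's results. The data and the names ridge_norms and lasso_nonzero are assumptions introduced here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, only 3 of which carry signal (illustrative only)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=10.0, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

alphas = [0.01, 1, 100]
ridge_norms = []    # L2 norm of the Ridge coefficient vector for each alpha
lasso_nonzero = []  # number of non-zero Lasso coefficients for each alpha

for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X_demo, y_demo)
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X_demo, y_demo)
    ridge_norms.append(float(np.linalg.norm(ridge.coef_)))
    lasso_nonzero.append(int(np.sum(lasso.coef_ != 0)))

for alpha, norm, count in zip(alphas, ridge_norms, lasso_nonzero):
    print(f"alpha={alpha}: Ridge coefficient norm={norm:.2f}, non-zero Lasso coefficients={count}")
```

Ridge keeps every coefficient but shrinks the vector as alpha grows, whereas Lasso tends to drop features outright.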



---



PART 2: CLASSIFICATION (BREAST CANCER)

Task 1: Load and Split Dataset

In [None]:
# Load Breast Cancer dataset
X, y = load_breast_cancer(return_X_y=True)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training samples:", X_train.shape)
print("Testing samples:", X_test.shape)


Training samples: (455, 30)
Testing samples: (114, 30)


Step 1: Baseline Logistic Regression

In [37]:
# Logistic Regression (default = L2 regularization)
log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train, y_train)

# Predictions
y_train_pred = log_reg.predict(X_train)
y_test_pred = log_reg.predict(X_test)

# Accuracy
print("Baseline Logistic Regression")
print("Train Accuracy:", accuracy_score(y_train, y_train_pred))
print("Test Accuracy:", accuracy_score(y_test, y_test_pred))

# Coefficients
print("Model Coefficients:")
print(log_reg.coef_)


Baseline Logistic Regression
Train Accuracy: 0.9582417582417583
Test Accuracy: 0.956140350877193
Model Coefficients:
[[ 1.0274368   0.22145051 -0.36213488  0.0254667  -0.15623532 -0.23771256
  -0.53255786 -0.28369224 -0.22668189 -0.03649446 -0.09710208  1.3705667
  -0.18140942 -0.08719575 -0.02245523  0.04736092 -0.04294784 -0.03240188
  -0.03473732  0.01160522  0.11165329 -0.50887722 -0.01555395 -0.016857
  -0.30773117 -0.77270908 -1.42859535 -0.51092923 -0.74689363 -0.10094404]]


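Baseline Logistic Regression:

* Applies the sigmoid function to a linear combination of the features

* Outputs the probability of the positive class

* Uses L2 regularization by default (C=1.0)

* The penalty keeps the weights from growing without bound, which can otherwise happen when the classes are linearly separable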
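
The link between the sigmoid and the predicted probability can be verified by hand. The sketch below refits a logistic regression on standardized Breast Cancer features and recomputes the positive-class probability from coef_ and intercept_. The names X_bc, clf and manual_proba are introduced here for illustration, and the scaler is fit on the full dataset only because nothing is being evaluated.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_bc = StandardScaler().fit_transform(X_bc)

clf = LogisticRegression(max_iter=10000).fit(X_bc, y_bc)

# Linear score z = w . x + b for every sample
z = X_bc @ clf.coef_.ravel() + clf.intercept_[0]

# Sigmoid maps the score to a probability in (0, 1)
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is the probability of the class labelled 1
sklearn_proba = clf.predict_proba(X_bc)[:, 1]

print("Max difference:", np.max(np.abs(manual_proba - sklearn_proba)))
```

predict then simply labels a sample 1 when this probability exceeds 0.5, which is the same as z being positive.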

Step 2: Logistic Regression with L1/L2 and GridSearchCV (Tuning C and Penalty)

In [38]:
# Logistic regression pipeline
log_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(
        solver="liblinear",
        max_iter=10000
    ))
])

# Hyperparameter grid
log_params = {
    "logreg__penalty": ["l1", "l2"],
    "logreg__C": [0.01, 0.1, 1, 10, 100]
}

# Grid search
log_grid = GridSearchCV(
    log_pipe,
    log_params,
    cv=5,
    scoring="accuracy"
)

log_grid.fit(X_train, y_train)

print("Best Logistic Params:", log_grid.best_params_)


Best Logistic Params: {'logreg__C': 0.1, 'logreg__penalty': 'l2'}


Step 3: L1 vs L2 Final Evaluation

In [39]:
# Best logistic model
best_log = log_grid.best_estimator_

# Test accuracy
y_test_pred = best_log.predict(X_test)
print("Optimized Test Accuracy:", accuracy_score(y_test, y_test_pred))

# Coefficient sparsity
coef = best_log.named_steps["logreg"].coef_
print("Non-zero coefficients:", np.sum(coef != 0))


Optimized Test Accuracy: 0.9912280701754386
Non-zero coefficients: 30


Theory: Bias–Variance Tradeoff (Classification)

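1. L1 Regularization

* Feature selection: drives some coefficients exactly to zero

* Sparse model

* Higher bias, lower variance than an unpenalized fit

2. L2 Regularization

* Keeps all features, shrinking their coefficients toward zero

* Stable predictions when features are correlated

* Trades a small increase in bias for lower variance

3. C parameter (inverse of regularization strength)

* Small C → strong regularization

* Large C → weak regularization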
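
The interaction between C and the L1 penalty can be seen by counting non-zero coefficients at several values of C. This is a minimal sketch on the Breast Cancer data. The variable names are introduced here, and the scaler is fit on the full dataset only because the goal is to count coefficients, not to estimate accuracy.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_bc_scaled = StandardScaler().fit_transform(X_bc)

nonzero_by_C = {}
for C in [0.01, 0.1, 1, 10]:
    # liblinear supports the L1 penalty, matching the solver used in the grid search
    clf = LogisticRegression(penalty="l1", C=C, solver="liblinear", max_iter=10000)
    clf.fit(X_bc_scaled, y_bc)
    nonzero_by_C[C] = int(np.sum(clf.coef_ != 0))

for C, count in nonzero_by_C.items():
    print(f"C={C}: {count} of {X_bc.shape[1]} coefficients are non-zero")
```

Smaller C should leave fewer features in the model. With the L2 penalty the count would stay at 30, consistent with the "Non-zero coefficients: 30" reported for the selected L2 model.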




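Conclusion:

* In the recorded run, the baseline linear regression shows a train-test gap (MSE 0.0516 vs 0.0641), while the baseline logistic regression does not (accuracy 0.958 vs 0.956)

* The tuned, regularized pipelines generalized better: test MSE fell to 0.0621 (Ridge) and 0.0584 (Lasso), and test accuracy rose to 0.991 for logistic regression, whose pipeline also adds feature scaling

* Ridge is better when many features matter

* Lasso is useful for feature selection (here it kept 23 of 30 coefficients non-zero)

* Cross-validation gives a more robust hyperparameter choice than a single train/validation split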