Lab Assignment 5
Machine Learning (UML501)

KRISH KHAJURIA (102317023)

Q1
(Based on Step-by-Step Implementation of Ridge Regression using Gradient
Descent Optimization)
Generate a dataset with at least seven highly correlated columns and a target variable.
Implement Ridge Regression using Gradient Descent Optimization. Try different
values of the learning rate (such as 0.0001, 0.001, 0.01, 0.1, 1, 10) and the regularization
parameter (10^-15, 10^-10, 10^-5, 10^-3, 0, 1, 10, 20). Choose the parameters for which the ridge
regression cost function is minimum and the R2 score is maximum.
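
The ridge_gd and ridge_cost functions in the cell that follows correspond to the cost function and update rule written out here as a reading aid. The symbols are notation chosen for this sketch: η is the learning rate (lr in the code), λ is the regularization parameter (lam), and w_0 is the bias, which is left out of the penalty.

```latex
\[
J(w) = \frac{1}{2n}\sum_{i=1}^{n}\bigl(\hat{y}_i - y_i\bigr)^2
       + \frac{\lambda}{2}\sum_{j=1}^{p} w_j^2,
\qquad
\hat{y}_i = w_0 + \sum_{j=1}^{p} w_j x_{ij}
\]
\[
\frac{\partial J}{\partial w_0} = \frac{1}{n}\sum_{i=1}^{n}\bigl(\hat{y}_i - y_i\bigr),
\qquad
\frac{\partial J}{\partial w_j} = \frac{1}{n}\sum_{i=1}^{n}\bigl(\hat{y}_i - y_i\bigr)x_{ij} + \lambda w_j
\quad (j = 1,\dots,p)
\]
\[
w \leftarrow w - \eta\,\nabla J(w)
\]
```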

In [7]:
import numpy as np, pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

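np.random.seed(5)
n = 1200
# Seven feature columns drawn independently from a standard normal.
# Note: independent draws are not highly correlated with one another.
x = np.random.randn(n, 7)
true_w = np.array([2.4, -1.8, 3.1, 0.7, 0.0, 1.0, -2.0])
# Target: linear combination of the features plus Gaussian noise (sd 0.5).
y = x @ true_w + np.random.normal(0, 0.5, n)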

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=4)
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

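def ridge_gd(x, y, lr, lam, steps):
    """Fit ridge regression by batch gradient descent; returns [bias, weights]."""
    n, p = x.shape
    x = np.c_[np.ones((n, 1)), x]      # prepend a column of ones for the bias
    w = np.zeros(p + 1)
    for _ in range(steps):
        y_hat = x @ w
        e = y_hat - y                  # residuals
        g = (x.T @ e) / n              # gradient of the squared-error term
        g[1:] += lam * w[1:]           # L2 penalty gradient; the bias is not penalized
        w -= lr * g
    return w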

def ridge_cost(x, y, w, lam):
    """Ridge cost: half the mean squared error plus (lam / 2) * sum(w[1:] ** 2)."""
    n = x.shape[0]
    x = np.c_[np.ones((n, 1)), x]
    e = x @ w - y
    return 0.5 * np.mean(e ** 2) + 0.5 * lam * np.sum(w[1:] ** 2)

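# Grid search over a subset of the learning rates and penalties listed in the task.
# The best pair is the one with the highest test-set R2; its training cost is stored too.
best = None
for lr in [1e-4, 1e-3, 1e-2, 0.1]:
    for lam in [0, 0.001, 0.01, 0.1, 1, 10]:
        w = ridge_gd(x_train, y_train, lr, lam, 3000)
        cost = ridge_cost(x_train, y_train, w, lam)
        y_pred = np.c_[np.ones((x_test.shape[0], 1)), x_test] @ w
        r2 = r2_score(y_test, y_pred)
        if best is None or r2 > best['r2']:
            best = dict(lr=lr, lam=lam, r2=r2, cost=cost)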

print("Best:", best)


Best: {'lr': 0.1, 'lam': 0.001, 'r2': 0.9890421683608686, 'cost': np.float64(0.13372750464682248)}
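
The task asks for highly correlated columns, while np.random.randn(n, 7) draws each column independently. One possible way to build correlated features is sketched below: every column shares a single latent factor and adds a little independent noise. The latent-factor construction, the 0.1 noise scale, and the names rng, latent, and min_offdiag are assumptions made for this sketch, not part of the original cell.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 1200, 7

# One shared latent factor drives every column; the small independent
# noise keeps the columns distinct while leaving them strongly correlated.
latent = rng.standard_normal((n, 1))
X = latent + 0.1 * rng.standard_normal((n, p))

# Same true weights as in the cell above.
true_w = np.array([2.4, -1.8, 3.1, 0.7, 0.0, 1.0, -2.0])
y = X @ true_w + rng.normal(0, 0.5, n)

# Check the smallest pairwise correlation among the seven columns.
corr = np.corrcoef(X, rowvar=False)
min_offdiag = corr[~np.eye(p, dtype=bool)].min()
print("smallest pairwise correlation:", round(float(min_offdiag), 3))
```

Feeding X and y from this sketch into the same train/test split and grid search would exercise ridge regression under the multicollinearity that the task describes.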


Q2
Load the Hitters dataset from the following link
https://drive.google.com/file/d/1qzCKF6JKKMB0p7ul_lLy8tdmRk3vE_bG/view?usp=sharing
Pre-process the data (null values, noise, categorical to numerical encoding)
Separate input and output features and perform scaling
Fit a Linear, Ridge (use regularization parameter as 0.5748), and LASSO (use
regularization parameter as 0.5748) regression function on the dataset.
Evaluate the performance of each trained model on the test set. Which model
performs best, and why?

In [8]:
import pandas as pd, numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score, mean_squared_error



df = pd.read_csv('/content/Hitters.csv')
df = df.dropna()                                        # drop rows with missing values
y = df.select_dtypes(include=[np.number]).iloc[:, -1]   # target: last numeric column
x = df.drop(columns=[y.name])
x = pd.get_dummies(x, drop_first=True)                  # one-hot encode categorical columns

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=11)
# with_mean=False scales each feature to unit variance without centering it.
sc = StandardScaler(with_mean=False)
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

lin = LinearRegression().fit(x_train, y_train)
rig = Ridge(alpha=0.5748).fit(x_train, y_train)
las = Lasso(alpha=0.5748, max_iter=20000).fit(x_train, y_train)

models = {
    "Linear": lin,
    "Ridge(0.5748)": rig,
    "Lasso(0.5748)": las
}

for name, m in models.items():
    p = m.predict(x_test)
    print(name, "R2:", round(r2_score(y_test, p), 4), "MSE:", round(mean_squared_error(y_test, p), 4))


Linear R2: 0.1487 MSE: 184380.2415
Ridge(0.5748) R2: 0.2158 MSE: 169859.6296
Lasso(0.5748) R2: 0.2091 MSE: 171309.4014
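
The preprocessing in this question picks the target as "the last numeric column", which depends on the column order of the file. A minimal sketch of the same steps with the target named explicitly is shown below. The miniature DataFrame, its column names, its values, and the target name "Salary" are all assumptions made for illustration; they are not read from Hitters.csv.

```python
import numpy as np
import pandas as pd

# Made-up miniature frame standing in for Hitters.csv.
# Column names and values are assumptions for illustration only.
df = pd.DataFrame({
    "AtBat":  [300, 410, 250, 520, 380],
    "Hits":   [80, 120, 60, 150, 95],
    "League": ["A", "N", "A", "N", "N"],
    "Salary": [np.nan, 450.0, 200.0, 700.0, 320.0],
})

target = "Salary"                 # assumed target column name
df = df.dropna()                  # drop rows with missing values, as in the cell above
y = df[target]

# One-hot encode text columns; drop_first avoids a redundant dummy column.
X = pd.get_dummies(df.drop(columns=[target]), drop_first=True, dtype=float)
print(X.columns.tolist())
```

Naming the target keeps the split between inputs and output stable even if the file gains or reorders numeric columns.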


Q3
Cross Validation for Ridge and Lasso Regression
Explore the Ridge Cross Validation (RidgeCV) and Lasso Cross Validation (LassoCV)
functions in Python. Implement both on the Boston House Prediction Dataset (load_boston
from sklearn.datasets).

In [9]:
import numpy as np, pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.metrics import r2_score, mean_squared_error

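# The task names load_boston, which was removed from scikit-learn in version 1.2;
# this cell uses the California housing dataset instead.
data = fetch_california_housing()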
x = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=5)
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

rid = RidgeCV(alphas=np.logspace(-4, 3, 30), cv=5).fit(x_train, y_train)
las = LassoCV(alphas=np.logspace(-4, 1, 30), cv=5, max_iter=30000).fit(x_train, y_train)

pred_r = rid.predict(x_test)
pred_l = las.predict(x_test)

print("RidgeCV α:", rid.alpha_, "R2:", round(r2_score(y_test, pred_r), 4), "MSE:", round(mean_squared_error(y_test, pred_r), 4))
print("LassoCV α:", las.alpha_, "R2:", round(r2_score(y_test, pred_l), 4), "MSE:", round(mean_squared_error(y_test, pred_l), 4))


RidgeCV α: 35.622478902624444 R2: 0.6101 MSE: 0.5338
LassoCV α: 0.003562247890262444 R2: 0.6102 MSE: 0.5336
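
To make explicit what RidgeCV does with cv=5, the sketch below runs the search by hand: it scores Ridge at every candidate alpha with 5-fold cross-validation and keeps the alpha with the best mean score. It uses synthetic data from make_regression as a stand-in (an assumption, not the housing data above) and scores with R2, which is the default score of a Ridge estimator. The names cv_r2, best_alpha, and auto_alpha are introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; not the housing data used above).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

alphas = np.logspace(-4, 3, 30)

# Mean 5-fold validation R2 for each candidate alpha.
cv_r2 = [
    cross_val_score(Ridge(alpha=a), X, y, cv=5, scoring="r2").mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(cv_r2))]

# RidgeCV searches the same grid of alphas with the same number of folds.
auto_alpha = RidgeCV(alphas=alphas, cv=5).fit(X, y).alpha_
print("manual search:", best_alpha, "RidgeCV:", auto_alpha)
```

The two values would normally be expected to agree; a mismatch would suggest that the scoring rule or the fold assignment differs between the manual loop and RidgeCV.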


Q4
Multiclass Logistic Regression: Implement Multiclass Logistic Regression (step-by-step)
on the Iris dataset using the one-vs-rest strategy.

In [10]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

d = load_iris()
x = d.data
y = d.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=4, stratify=y)
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

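def sig(z):
    """Logistic sigmoid."""
    return 1 / (1 + np.exp(-z))

def gd(x, y, lr=0.1, lam=0.0, steps=4000):
    """Fit a binary logistic regression by batch gradient descent."""
    n, p = x.shape
    x = np.c_[np.ones((n, 1)), x]      # prepend a column of ones for the bias
    w = np.zeros(p + 1)
    for _ in range(steps):
        z = x @ w
        p1 = sig(z)                    # predicted probability of the positive class
        g = (x.T @ (p1 - y)) / n       # gradient of the mean log-loss
        g[1:] += lam * w[1:]           # optional L2 penalty; the bias is not penalized
        w -= lr * g
    return w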

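# One-vs-rest: train one binary classifier per class (class c against all others).
classes = np.unique(y_train)
w_all = []
for c in classes:
    y_bin = (y_train == c).astype(int)
    w_all.append(gd(x_train, y_bin, lr=0.2, steps=5000))
W = np.vstack(w_all)                   # one row of weights per class

def predict(x, W):
    """Return the class whose binary classifier gives the highest probability."""
    x = np.c_[np.ones((x.shape[0], 1)), x]
    return np.argmax(sig(x @ W.T), axis=1)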

pred = predict(x_test, W)
print("Accuracy:", round(accuracy_score(y_test, pred), 4))


Accuracy: 0.9474
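
As a sanity check on the hand-written one-vs-rest classifier, the same split can be passed to scikit-learn's OneVsRestClassifier wrapped around LogisticRegression. This is a sketch for comparison only. LogisticRegression applies L2 regularization by default (C=1.0), whereas the gd function above was run with lam=0, so the two accuracies need not match exactly. The names ovr, ovr_pred, and ovr_acc are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Same data, split, and scaling as the cell above.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=4, stratify=y)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# One binary logistic regression per class, combined by highest score.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, y_train)
ovr_pred = ovr.predict(X_test)
ovr_acc = accuracy_score(y_test, ovr_pred)
print("OvR accuracy (scikit-learn):", round(ovr_acc, 4))
```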
