Q1. (Based on Step-by-Step Implementation of Ridge Regression using Gradient
Descent Optimization)
Generate a dataset with atleast seven highly correlated columns and a target variable.Implement Ridge Regression using Gradient Descent Optimization. Take different values of learning rate (such as 0.0001,0.001,0.01,0.1,1,10) and regularization parameter (10-15,10-10,10-5 ,10-3 ,0,1,10,20). Choose the best parameters for which ridge regression cost function is minimum and R2_score is maximum. 

In [13]:
import numpy as np
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler

x = np.random.rand(20, 7)
y = 3 * x[:, 0] + 2 * x[:, 1] + np.random.randn(20) * 0.1 

# Add bias column (for intercept)
X = np.c_[np.ones(x.shape[0]), x]

scaler = StandardScaler()
x[:, 1:] = scaler.fit_transform(x[:, 1:])

# Ridge Regression using Gradient Descent
def ridge_regression(x, y, lr, lam, epochs=1000):
    m, n = x.shape
    w = np.zeros(n) #model weights
    for _ in range(epochs):
        y_pred = x.dot(w)
        grad = (-2/m) * x.T.dot(y - y_pred) + 2 * lam * w
        grad[0] -= 2 * lam * w[0]   # no regularization for bias term
        w -= lr * grad
        
        if np.any(np.isnan(w)) or np.any(np.isinf(w)):
            return np.zeros(n), np.inf
    cost = np.mean((y - x.dot(w))**2) + lam * np.sum(w**2)
    return w, cost

lrs = [0.0001, 0.001, 0.01, 0.1, 1, 10]
lams = [1e-15, 1e-10, 1e-5, 1e-3, 0, 1, 10, 20]
best = (None, float('inf'), -1)

for lr in lrs:
    for lam in lams:
        w, cost = ridge_regression(x, y, lr, lam)
        r2 = r2_score(y, x.dot(w))
        if cost < best[1] and r2 > best[2]:
            best = (w, cost, r2)

print("Best weights:", best[0])
print("Min Cost:", best[1])
print("Max R2 Score:", best[2])


Best weights: [ 4.36362647  0.72743825 -0.15426604 -0.19621603  0.14231416  0.06998212
 -0.10744382]
Min Cost: 0.0804300886414354
Max R2 Score: 0.9113768066479556


  grad = (-2/m) * x.T.dot(y - y_pred) + 2 * lam * w
  grad = (-2/m) * x.T.dot(y - y_pred) + 2 * lam * w
  grad = (-2/m) * x.T.dot(y - y_pred) + 2 * lam * w
  w -= lr * grad
  w -= lr * grad
  grad = (-2/m) * x.T.dot(y - y_pred) + 2 * lam * w
  grad = (-2/m) * x.T.dot(y - y_pred) + 2 * lam * w
  w -= lr * grad


Q2. Load the Hitters dataset from the following link
https://drive.google.com/file/d/1qzCKF6JKKMB0p7ul_lLy8tdmRk3vE_bG/view?usp=sharing
(a) Pre-process the data (null values, noise, categorical to numerical encoding)
(b) Separate input and output features and perform scaling
(c) Fit a Linear, Ridge (use regularization parameter as 0.5748), and LASSO (use
regularization parameter as 0.5748) regression function on the dataset.
(d) Evaluate the performance of each trained model on test set. Which model
performs the best and Why? 

Unable to download dataset

Q3. Cross Validation for Ridge and Lasso Regression
Explore Ridge Cross Validation (RidgeCV) and Lasso Cross Validation (LassoCV)
function of Python. Implement both on Boston House Prediction Dataset (load_boston
dataset from sklearn.datasets). 

Using California Housing dataset as boston not available

In [15]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler

data = fetch_california_housing()
x, y = data.data, data.target

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

ridge = RidgeCV(alphas=[0.1, 1, 10]).fit(x_train, y_train)
lasso = LassoCV(alphas=[0.1, 1, 10]).fit(x_train, y_train)

print("Ridge best alpha:", ridge.alpha_)
print("Ridge R2:", r2_score(y_test, ridge.predict(x_test)))
print("Lasso best alpha:", lasso.alpha_)
print("Lasso R2:", r2_score(y_test, lasso.predict(x_test)))


Ridge best alpha: 1.0
Ridge R2: 0.5758157428925871
Lasso best alpha: 0.1
Lasso R2: 0.4813611325029077


Q4. Multiclass Logistic Regression: Implement Multiclass Logistic Regression (step-by step)
on Iris dataset using one vs. rest strategy?

In [17]:
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np

x, y = load_iris(return_X_y=True)
x = StandardScaler().fit_transform(x)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

#logistic regression for one-vs-rest
def train_lr(x, y, lr=0.1, epochs=1000):
    m, n = x.shape
    w = np.zeros(n)
    for _ in range(epochs):
        z = x.dot(w)
        h = sigmoid(z)
        grad = (1/m) * x.T.dot(h - y)
        w -= lr * grad
    return w

weights = []
for cls in np.unique(y):
    y_bin = (y_train == cls).astype(int)
    weights.append(train_lr(x_train, y_bin))

def predict(x):
    probs = [sigmoid(x.dot(w)) for w in weights]
    return np.argmax(probs, axis=0)

y_pred = predict(x_test)
acc = np.mean(y_pred == y_test)
print("Accuracy:", acc)


Accuracy: 0.9
