<a href="https://colab.research.google.com/github/baldevoli/git_test/blob/main/CMSC478_Homework2_part1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Consider a logistic regression model whose hypothesis function includes linear, quadratic, and cubic terms in the features:

$$
h_\theta(x) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + w_2 x_2^2 + w_3 x_3^3)}}
$$

where:

- \(x_1\) is the first feature
- \(x_2\) is the second feature
- \(x_3\) is the third feature
- \(w_0, w_1, w_2, w_3\) are the model parameters.
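
As a minimal sketch, the hypothesis above can be evaluated directly from raw features. The function name `hypothesis` and the example values are illustrative assumptions and are not part of the assignment; the notebook's own code works with a feature matrix `X` and `np.dot(X, w)` instead.

```python
import numpy as np

def hypothesis(x, w):
    """Evaluate h_theta(x) for the hypothesis as written above.

    x: array of three raw features (x_1, x_2, x_3)
    w: array of four parameters (w_0, w_1, w_2, w_3)
    """
    x1, x2, x3 = x
    # Linear term in x_1, quadratic term in x_2, cubic term in x_3
    z = w[0] + w[1] * x1 + w[2] * x2**2 + w[3] * x3**3
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only (assumed, not taken from the assignment)
x_example = np.array([1.0, 2.0, 0.5])
w_example = np.array([0.0, 0.1, 0.2, 0.3])
p = hypothesis(x_example, w_example)
print("Predicted probability:", p)
```

The result is always strictly between 0 and 1, and with all parameters set to zero it is exactly 0.5.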



Log-Loss (Logistic Loss Function):
The log-loss or binary cross-entropy loss function measures the performance of the classification model. This loss function penalizes wrong predictions and is based on the predicted probabilities output by the logistic regression model.

$$
\text{loss}(w) = - \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(h_\theta(x_i)) + (1 - y_i) \log(1 - h_\theta(x_i)) \right]
$$

where:

- \(m\) is the number of training examples.
- \(y_i\) is the true label for the \(i\)-th training example, where \(y_i \in \{0, 1\}\).
- \(h_\theta(x_i)\) is the predicted probability for the \(i\)-th training example.


You are given the following Python code that implements logistic regression without regularization.

Part A: Run the code and print the loss

In [None]:
import numpy as np

# Hypothesis function for logistic regression
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

In [None]:

# Logistic regression loss function
def compute_loss(w, X, y):
    m = len(y)
    h = sigmoid(np.dot(X, w))
    loss = (-1/m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    return loss

# Example dataset
X = np.array([[1, 2, 9], [2, 3, 16], [3, 4, 25]])  # Features
y = np.array([0, 1, 0])  # Labels
w = np.array([0.1, 0.2, 0.3])  # Initial weights

# Calculate loss
loss = compute_loss(w, X, y)
print("Loss:", loss)


Loss: 3.9479428218126125
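
The loss above takes `np.log(h)` and `np.log(1 - h)` directly, which can produce `-inf` when a prediction saturates at exactly 0 or 1. One possible alternative, sketched here under the assumption that the same `X`, `y`, and `w` are used, computes the loss from the logits with `np.logaddexp`. The name `compute_loss_stable` is hypothetical.

```python
import numpy as np

def compute_loss_stable(w, X, y):
    """Log-loss computed from the logits z = Xw without forming log(h) directly."""
    z = np.dot(X, w)
    # -log(h)     = log(1 + exp(-z)) = logaddexp(0, -z)
    # -log(1 - h) = log(1 + exp(z))  = logaddexp(0, z)
    return np.mean(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))

# Same example dataset as in the notebook
X = np.array([[1, 2, 9], [2, 3, 16], [3, 4, 25]])
y = np.array([0, 1, 0])
w = np.array([0.1, 0.2, 0.3])

stable_loss = compute_loss_stable(w, X, y)
print("Stable loss:", stable_loss)
```

On this dataset the value agrees with the loss printed above up to floating-point rounding, and it stays finite even for very large weights.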


Next, extend the loss function to include L2 regularization.

L2 Regularization (Ridge Regularization):
In order to reduce overfitting, L2 regularization adds a penalty term to the loss function. The penalty is proportional to the sum of the squares of the model weights. This discourages large weight values and simplifies the model.

Here, L2 regularization is represented as:

$$
\text{Regularization Term} = \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2
$$

where:

- \(\lambda\) is the regularization parameter, controlling the strength of the penalty.
- \( m \) is the number of training examples.
- \( n \) is the number of feature weights; the sum starts at \( j = 1 \), so \( w_0 \) is not penalized.
- \( w_j \) is the weight for the \( j \)-th feature.

### Regularized Logistic Loss Function:

When you combine the logistic loss with the L2 regularization term, you get the regularized loss function:

$$
\text{loss}_{\text{reg}}(w) = - \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(h_\theta(x_i)) + (1 - y_i) \log(1 - h_\theta(x_i)) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2
$$

In this equation:

- The first part is the standard logistic loss function, which penalizes incorrect predictions.
- The second part is the regularization term, which penalizes large values of the weights \( w_j \) for \( j = 1, \dots, n \), where \( n \) is the number of feature weights.


Part B: Now modify the compute_loss function to include L2 regularization (you can try different regularization parameters to see the difference)

In [None]:
def compute_loss_with_l2(w, X, y, lambda_):
    m = len(y)
    h = sigmoid(np.dot(X, w))
    loss = (-1/m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

    #YOUR CODE HERE

    # L2 penalty; w[1:] skips the first weight, matching the sum from j = 1
    regularization_term = (lambda_ / (2 * m)) * np.sum(w[1:] ** 2)

    # Add the penalty to the unregularized loss
    loss += regularization_term

    return loss

# Regularization parameter
lambda_ = 0.1  # You can adjust this value

# Calculate regularized loss
regularized_loss = compute_loss_with_l2(w, X, y, lambda_)
print("Loss with L2 regularization:", regularized_loss)


Loss with L2 regularization: 3.950109488479279
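
To see the effect of different regularization parameters, the sketch below evaluates the regularized loss for a few values of `lambda_` at the same fixed weights. The list of values is an arbitrary choice for illustration. Because the weights do not change, the penalty grows linearly in `lambda_`: each unit of `lambda_` adds \( \frac{1}{2m}(0.2^2 + 0.3^2) = \frac{0.13}{6} \approx 0.0217 \) to the loss.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_loss_with_l2(w, X, y, lambda_):
    m = len(y)
    h = sigmoid(np.dot(X, w))
    loss = (-1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    # Penalty on all weights except the first, as in the notebook
    return loss + (lambda_ / (2 * m)) * np.sum(w[1:] ** 2)

# Same example dataset and initial weights as in the notebook
X = np.array([[1, 2, 9], [2, 3, 16], [3, 4, 25]])
y = np.array([0, 1, 0])
w = np.array([0.1, 0.2, 0.3])

# Illustrative regularization strengths (assumed values)
lambdas = [0.0, 0.1, 1.0, 10.0]
losses = [compute_loss_with_l2(w, X, y, lam) for lam in lambdas]
for lam, value in zip(lambdas, losses):
    print(f"lambda = {lam:>5}: loss = {value:.6f}")
```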


Part C: Compute the gradient function with L2 regularization, and explain how the regularization affects the weights 𝑤

### Gradient Descent with L2 Regularization in Logistic Regression

To calculate the updated weights in logistic regression with L2 regularization, follow these steps:

#### 1. **Gradient of the logistic loss function without regularization:**

The gradient of the logistic loss function with respect to the weights \( w_j \) (for \( j = 0, 1, 2, \dots \)) is:

$$
\frac{\partial}{\partial w_j} \text{loss}(w) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right) x_{ij}
$$

where \( h_\theta(x_i) \) is the predicted probability for the \( i \)-th example, and \( x_{ij} \) is the \( j \)-th feature of the \( i \)-th example.

#### 2. **L2 regularization term:**

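Differentiating the L2 penalty with respect to \( w_j \) adds a term proportional to that weight. For \( j \geq 1 \):

$$
\frac{\partial}{\partial w_j} \text{loss}_{\text{reg}}(w) = \frac{\partial}{\partial w_j} \text{loss}(w) + \frac{\lambda}{m} w_j
$$

The penalty sum starts at \( j = 1 \), so the gradient for \( w_0 \) has no extra term.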

#### 3. **Gradient Descent Update Rule:**

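To update the weights, we perform gradient descent with learning rate \( \alpha \), updating each weight \( w_j \) with \( j \geq 1 \) as follows (for \( w_0 \), the \( \frac{\lambda}{m} w_j \) term is omitted):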

$$
w_j := w_j - \alpha \left( \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right) x_{ij} + \frac{\lambda}{m} w_j \right)
$$

This update rule penalizes each weight in proportion to its current value, which shrinks the weights toward zero at every step.
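
Grouping the terms in \( w_j \) in the update rule above makes the shrinkage explicit. This is only a rearrangement of the same rule, for \( j \geq 1 \):

$$
w_j := w_j \left( 1 - \frac{\alpha \lambda}{m} \right) - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right) x_{ij}
$$

Each step first scales the weight by the factor \( 1 - \frac{\alpha \lambda}{m} \) and then applies the usual unregularized gradient step. With the values used in this notebook (\( \alpha = 0.01 \), \( \lambda = 0.1 \), \( m = 3 \)), the factor is \( 1 - \frac{0.001}{3} \approx 0.99967 \), so the shrinkage per step is small but accumulates over many iterations.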


In [None]:
# Gradient function with L2 regularization
def compute_gradient_with_l2(w, X, y, lambda_):
    m = len(y)
    h = sigmoid(np.dot(X, w))

    # Gradient of the loss function
    gradient = (1/m) * np.dot(X.T, (h - y))

    #YOUR CODE HERE
    # Add the L2 penalty term to every gradient component except the first (w[0] is not regularized)
    gradient[1:] += (lambda_ / m) * w[1:]

    return gradient


In [None]:
# Gradient Descent Update with L2 regularization
def gradient_descent(X, y, w, alpha, lambda_, num_iters):
    # Work on a float copy so the caller's array is not modified in place
    w = np.array(w, dtype=float)
    for _ in range(num_iters):
        # Compute the gradient with L2 regularization
        grad = compute_gradient_with_l2(w, X, y, lambda_)
        # Update the weights
        w -= alpha * grad
    return w

In [None]:
alpha = 0.01  # Learning rate
lambda_ = 0.1  # Regularization parameter # You can adjust this value
num_iters = 1000  # Number of iterations. # You can adjust this value

# Perform gradient descent to update weights
updated_w = gradient_descent(X, y, w, alpha, lambda_, num_iters)
print("Updated weights after regularization:", updated_w)


Updated weights after regularization: [ 0.33101155  0.2095138  -0.11602928]
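
One way to sanity-check a hand-derived gradient is to compare it against central finite differences of the loss. The sketch below does this for the regularized loss and gradient described above. The helper names (`loss_l2`, `grad_l2`, `numerical_grad`) and the step size `eps` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_l2(w, X, y, lambda_):
    m = len(y)
    h = sigmoid(X @ w)
    data_loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    return data_loss + (lambda_ / (2 * m)) * np.sum(w[1:] ** 2)

def grad_l2(w, X, y, lambda_):
    m = len(y)
    grad = X.T @ (sigmoid(X @ w) - y) / m
    grad[1:] += (lambda_ / m) * w[1:]  # first weight is not penalized
    return grad

def numerical_grad(f, w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(w, dtype=float)
    for j in range(len(w)):
        step = np.zeros_like(w, dtype=float)
        step[j] = eps
        grad[j] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

# Same example dataset and initial weights as in the notebook
X = np.array([[1, 2, 9], [2, 3, 16], [3, 4, 25]])
y = np.array([0, 1, 0])
w_init = np.array([0.1, 0.2, 0.3])
lambda_ = 0.1

analytic = grad_l2(w_init, X, y, lambda_)
numeric = numerical_grad(lambda v: loss_l2(v, X, y, lambda_), w_init)
max_diff = np.max(np.abs(analytic - numeric))
print("Analytic gradient:", analytic)
print("Numerical gradient:", numeric)
print("Largest absolute difference:", max_diff)
```

If the two gradients differ by much more than rounding error, the analytic gradient and the loss are likely inconsistent with each other.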


### Explain how the regularization affects the weights 𝑤

L2 regularization helps prevent overfitting by penalizing large weight values. The regularized loss grows with the squared weights, so minimizing it favors smaller weights. In the gradient, this shows up as the extra term
$$
 \frac{\lambda}{m} w_j
$$
which is proportional to the weight itself: the larger the weight, the harder each update pulls it back toward zero. Shrinking the weights increases bias and decreases variance, so \(\lambda\) controls the balance between the two. In this way, regularization limits model complexity and can improve generalization.
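
The effect described above can be explored empirically by running the same gradient descent with several values of `lambda_` and comparing the size of the resulting weights. This is a sketch under assumptions: the list of `lambda_` values is arbitrary, and the helper names `gradient_with_l2` and `run_descent` are hypothetical. A larger `lambda_` would typically be expected to give a smaller norm for the penalized weights, but no specific output is claimed here.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_with_l2(w, X, y, lambda_):
    m = len(y)
    grad = X.T @ (sigmoid(X @ w) - y) / m
    grad[1:] += (lambda_ / m) * w[1:]  # w[0] is left unpenalized, as in the notebook
    return grad

def run_descent(X, y, w_init, alpha, lambda_, num_iters):
    w = w_init.astype(float)  # astype returns a copy, so w_init is untouched
    for _ in range(num_iters):
        w -= alpha * gradient_with_l2(w, X, y, lambda_)
    return w

# Same example dataset and initial weights as in the notebook
X = np.array([[1, 2, 9], [2, 3, 16], [3, 4, 25]])
y = np.array([0, 1, 0])
w_init = np.array([0.1, 0.2, 0.3])

results = {}
for lam in [0.0, 0.1, 1.0, 10.0]:  # illustrative values
    w_final = run_descent(X, y, w_init, alpha=0.01, lambda_=lam, num_iters=1000)
    results[lam] = w_final
    norm = np.linalg.norm(w_final[1:])
    print(f"lambda = {lam:>5}: w = {w_final}, ||w[1:]|| = {norm:.4f}")
```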

