# Implement Lasso Regression using Gradient Descent

In this problem, you need to implement the Lasso Regression algorithm using Gradient Descent. Lasso Regression (L1 Regularization) adds a penalty proportional to the sum of the absolute values of the coefficients to the loss function. Your task is to update the weights and bias iteratively using the gradient of the loss function and the L1 penalty.

The objective function of Lasso Regression is:

$$J(w, b) = \frac{1}{2n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} |w_j|$$
 
Where:

- $y_i$ is the actual value for the $i$-th sample
- $\hat{y}_i = \sum_{j=1}^{p} X_{ij} w_j + b$ is the predicted value for the $i$-th sample
- $w_j$ is the weight associated with the $j$-th feature
- $\alpha$ is the regularization parameter
- $b$ is the bias
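
To make the objective concrete, here is a minimal sketch that evaluates it with NumPy. The function name `lasso_objective` is hypothetical, and the sketch assumes the squared-error term is scaled by $\frac{1}{2n}$, which is the scaling consistent with the $\frac{1}{n}$ factor in the gradients used for the gradient descent updates in this document.

```python
import numpy as np

def lasso_objective(X, y, weights, bias, alpha):
    """Evaluate the Lasso objective (hypothetical helper, assumes 1/(2n) scaling)."""
    n_samples = X.shape[0]
    # Predicted values: y_hat = X w + b
    y_pred = X @ weights + bias
    # Data-fit term: squared error scaled by 1 / (2n)
    data_term = np.sum((y - y_pred) ** 2) / (2 * n_samples)
    # L1 penalty on the weights only (the bias is not penalized)
    l1_term = alpha * np.sum(np.abs(weights))
    return data_term + l1_term

X = np.array([[0, 0], [1, 1], [2, 2]])
y = np.array([0, 1, 2])

# Objective at the all-zero starting point
j_zero = lasso_objective(X, y, np.zeros(2), 0.0, alpha=0.1)
# Objective at weights that fit the data exactly
j_fit = lasso_objective(X, y, np.array([0.5, 0.5]), 0.0, alpha=0.1)
```

With all-zero parameters only the data term contributes (here $5/6$), while for weights that reproduce `y` exactly only the penalty remains (here $0.1$).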
 
Your task is to use the L1 penalty to shrink the coefficients of less informative features toward zero during gradient descent, thereby helping with feature selection.

Example
```python
import numpy as np

X = np.array([[0, 0], [1, 1], [2, 2]])
y = np.array([0, 1, 2])

alpha = 0.1
weights, bias = l1_regularization_gradient_descent(X, y, alpha=alpha, learning_rate=0.01, max_iter=1000)

# Expected output, as a (weights, bias) tuple:
# (array([0.42371644, 0.42371644]), 0.15385068459377865)
```
  
## Understanding Lasso Regression and L1 Regularization

Lasso Regression is a type of linear regression that applies L1 regularization to the model. It adds a penalty equal to the sum of the absolute values of the coefficients, encouraging some of them to be exactly zero. This makes Lasso Regression particularly useful for feature selection, as it can shrink the coefficients of less important features to zero, effectively removing them from the model.
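
As a small illustration of the penalty term, the sketch below computes $\alpha \sum_j |w_j|$ together with $\alpha \cdot \text{sign}(w_j)$, the quantity gradient descent uses as the penalty's contribution to the gradient. The weight values are made up for illustration and do not come from a fitted model.

```python
import numpy as np

alpha = 0.1
w = np.array([0.5, -2.0, 0.0])  # illustrative values only

# L1 penalty: alpha times the sum of absolute weight values
l1_penalty = alpha * np.sum(np.abs(w))

# Subgradient of the penalty: alpha * sign(w_j) for each weight.
# np.sign returns 0 for an exactly-zero entry, so that weight gets no push.
l1_subgradient = alpha * np.sign(w)
```

Every nonzero weight receives a push of the same magnitude $\alpha$ toward zero regardless of how large it is, which is what distinguishes the L1 penalty from a squared (L2) penalty.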

## Steps to Implement Lasso Regression using Gradient Descent

- Initialize Weights and Bias: Start with the weights and bias set to zero.
- Make Predictions: Use the formula:

$$\hat{y}_i = \sum_{j=1}^{p} X_{ij} w_j + b$$

where $\hat{y}_i$ is the predicted value for the $i$-th sample.

- Compute Residuals: Find the difference between the predicted values $\hat{y}_i$ and the actual values $y_i$. These residuals, $\hat{y}_i - y_i$, are the errors in the model.
- Update the Weights and Bias: Update the weights and bias using the gradient of the loss function with respect to the weights and bias:

    - For weights $w_j$:
    
    $$\frac{\partial J}{\partial w_j} = \frac{1}{n} \sum_{i=1}^{n} X_{ij} (\hat{y}_i - y_i) + \alpha \cdot \text{sign}(w_j)$$

    - For bias $b$ (without the regularization term):

    $$\frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$
    
    - Update the weights and bias:
 
    $$w_j = w_j - \eta \cdot \frac{\partial J}{\partial w_j}$$

    $$b = b - \eta \cdot \frac{\partial J}{\partial b}$$
 
- Check for Convergence: The algorithm stops when the L1 norm of the gradient with respect to the weights becomes smaller than a predefined threshold $\text{tol}$, or when the maximum number of iterations is reached:

$$\| \nabla_w J \|_1 = \sum_{j=1}^{p} \left| \frac{\partial J}{\partial w_j} \right| < \text{tol}$$

- Return the Weights and Bias: Once the algorithm converges, return the optimized weights and bias.
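
The steps above can be traced by hand for a single iteration. The following is a minimal sketch on the example data from the problem statement, assuming the error is taken as prediction minus target so that subtracting the gradient moves the parameters downhill. It is a walkthrough of one update, not a full implementation.

```python
import numpy as np

X = np.array([[0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0, 1, 2], dtype=float)
alpha, learning_rate = 0.1, 0.01
n_samples, n_features = X.shape

# Initialize weights and bias to zero
weights = np.zeros(n_features)
bias = 0.0

# Make predictions: y_hat = X w + b
y_pred = X @ weights + bias

# Compute residuals (prediction minus target)
error = y_pred - y

# Gradients: data term plus L1 subgradient for the weights, data term only for the bias
grad_w = X.T @ error / n_samples + alpha * np.sign(weights)
grad_b = np.sum(error) / n_samples

# Gradient descent update
weights = weights - learning_rate * grad_w
bias = bias - learning_rate * grad_b

# Convergence quantity: L1 norm of the weight gradient
grad_norm = np.sum(np.abs(grad_w))
```

Starting from zeros, each weight gradient is $-5/3$ and the bias gradient is $-1$, so one step moves each weight to about $0.0167$ and the bias to $0.01$. The gradient's L1 norm is $10/3$, far above a typical tolerance such as $10^{-4}$, so the loop would continue.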

```python
import numpy as np

def l1_regularization_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float = 0.1, learning_rate: float = 0.01, max_iter: int = 1000, tol: float = 1e-4) -> tuple:
    n_samples, n_features = X.shape
    # Initialize weights and bias to zero
    weights = np.zeros(n_features)
    bias = 0.0

    for _ in range(max_iter):
        # Predict values
        y_pred = np.dot(X, weights) + bias
        # Calculate error (prediction minus target)
        error = y_pred - y
        # Gradient for weights: data term plus L1 subgradient
        grad_w = (1 / n_samples) * np.dot(X.T, error) + alpha * np.sign(weights)
        # Gradient for bias (no penalty for bias)
        grad_b = (1 / n_samples) * np.sum(error)

        # Update weights and bias
        weights -= learning_rate * grad_w
        bias -= learning_rate * grad_b

        # Stop once the L1 norm of the weight gradient falls below tol
        if np.linalg.norm(grad_w, ord=1) < tol:
            break

    return weights, bias
```


```python
import numpy as np

# Test case 1
X = np.array([[0, 0], [1, 1], [2, 2]])
y = np.array([0, 1, 2])
alpha = 0.1
output = l1_regularization_gradient_descent(X, y, alpha=alpha, learning_rate=0.01, max_iter=1000)
passed = np.allclose(output[0], np.array([0.42371644, 0.42371644])) and np.isclose(output[1], 0.15385068459377865)
print('Test Case 1: Accepted' if passed else 'Test Case 1: Failed')
print('Input:')
print('import numpy as np\nX = np.array([[0, 0], [1, 1], [2, 2]])\ny = np.array([0, 1, 2])\nalpha = 0.1\noutput = l1_regularization_gradient_descent(X, y, alpha=alpha, learning_rate=0.01, max_iter=1000)\nprint(output)')
print()
print('Output:')
print(output)
print()
print('Expected:')
print('(array([0.42371644, 0.42371644]), 0.15385068459377865)')
print()
print()

# Test case 2
X = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([1, 2, 3, 4, 5])
alpha = 0.1
output = l1_regularization_gradient_descent(X, y, alpha=alpha, learning_rate=0.01, max_iter=1000)
passed = np.allclose(output[0], np.array([0.27280148, 0.68108784])) and np.isclose(output[1], 0.4082863608718005)
print('Test Case 2: Accepted' if passed else 'Test Case 2: Failed')
print('Input:')
print('import numpy as np\nX = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])\ny = np.array([1, 2, 3, 4, 5])\nalpha = 0.1\noutput = l1_regularization_gradient_descent(X, y, alpha=alpha, learning_rate=0.01, max_iter=1000)\nprint(output)')
print()
print('Output:')
print(output)
print()
print('Expected:')
print('(array([0.27280148, 0.68108784]), 0.4082863608718005)')
```

```
Test Case 1: Accepted
Input:
import numpy as np
X = np.array([[0, 0], [1, 1], [2, 2]])
y = np.array([0, 1, 2])
alpha = 0.1
output = l1_regularization_gradient_descent(X, y, alpha=alpha, learning_rate=0.01, max_iter=1000)
print(output)

Output:
(array([0.42371644, 0.42371644]), 0.15385068459377865)

Expected:
(array([0.42371644, 0.42371644]), 0.15385068459377865)


Test Case 2: Accepted
Input:
import numpy as np
X = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([1, 2, 3, 4, 5])
alpha = 0.1
output = l1_regularization_gradient_descent(X, y, alpha=alpha, learning_rate=0.01, max_iter=1000)
print(output)

Output:
(array([0.27280148, 0.68108784]), 0.40828636087180054)

Expected:
(array([0.27280148, 0.68108784]), 0.4082863608718005)
```
