# Implement Ridge Regression Loss Function

Write a Python function `ridge_loss` that implements the Ridge Regression loss function. The function takes four arguments:

- `X`: a 2D NumPy array, the feature matrix, with one row per sample.
- `w`: a 1D NumPy array of coefficients.
- `y_true`: a 1D NumPy array of true labels.
- `alpha`: a float, the regularization parameter.

It should return the Ridge loss. This is the Mean Squared Error (MSE) of the predictions `X @ w` plus an L2 regularization term.

Example:
```python
import numpy as np

X = np.array([[1, 1], [2, 1], [3, 1], [4, 1]])
w = np.array([0.2, 2])
y_true = np.array([2, 3, 4, 5])
alpha = 0.1

loss = ridge_loss(X, w, y_true, alpha)
print(loss)
# Expected Output: 2.204
```
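The expected value can be worked out by hand to see how each term contributes. The predictions are $Xw$. The MSE averages the four squared residuals, and the penalty uses both entries of `w`:

$$
\begin{aligned}
\hat{y} &= Xw = (2.2,\ 2.4,\ 2.6,\ 2.8) \\
y - \hat{y} &= (-0.2,\ 0.6,\ 1.4,\ 2.2) \\
\text{MSE} &= \tfrac{1}{4}\,(0.04 + 0.36 + 1.96 + 4.84) = \tfrac{7.2}{4} = 1.8 \\
\alpha \textstyle\sum_j w_j^2 &= 0.1\,(0.2^2 + 2^2) = 0.1 \times 4.04 = 0.404 \\
\text{Ridge Loss} &= 1.8 + 0.404 = 2.204
\end{aligned}
$$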

## Ridge Regression Loss

Ridge Regression is linear regression with an added L2 regularization term. The term helps prevent overfitting by keeping the coefficients small.

## Key Concepts:

- **Regularization**: Adds a penalty to the loss function that discourages large coefficients, which helps the model generalize to unseen data.
- **Mean Squared Error (MSE)**: Measures the average squared difference between the actual and predicted values.
- **Penalty Term**: The sum of the squared coefficients, scaled by the regularization parameter $\alpha$. A larger $\alpha$ means stronger regularization. In this formulation every entry of `w` is penalized. That includes the intercept weight when `X` has a column of ones, as in the example.
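The snippet below illustrates how the penalty term chooses between weight vectors that fit the data equally well. It is a sketch on made-up data. The two feature columns are identical, so only `w[0] + w[1]` affects the predictions. `ridge_loss` here is a minimal re-implementation of the formula, included so the snippet runs on its own.

```python
import numpy as np

def ridge_loss(X, w, y_true, alpha):
    # MSE of the linear predictions plus alpha times the sum of squared weights
    mse = np.mean((y_true - X @ w) ** 2)
    return float(mse + alpha * np.sum(w ** 2))

# Hypothetical data: duplicated feature column, so any w with w[0] + w[1] == 2 fits exactly
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
alpha = 0.1

candidates = {
    "w = [5, -3]": np.array([5.0, -3.0]),
    "w = [2, 0]": np.array([2.0, 0.0]),
    "w = [1, 1]": np.array([1.0, 1.0]),
}

losses = {}
for name, w in candidates.items():
    mse = np.mean((y - X @ w) ** 2)            # zero for every candidate
    losses[name] = ridge_loss(X, w, y, alpha)  # differs only through the penalty
    print(f"{name}: MSE = {mse:.3f}, ridge loss = {losses[name]:.3f}")
```

All three candidates have zero MSE, so the penalty alone decides between them. The ridge loss is lowest for the smallest-norm vector, `[1, 1]`.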

## Ridge Loss Function:

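The Ridge loss combines the MSE and the penalty term. Suppose there are $n$ samples with feature vectors $x_i$ and $p$ coefficients. The predictions are $\hat{y}_i = x_i^\top w$, and the loss is:

$$
\text{Ridge Loss} = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} w_j^2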
$$ 
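Writing the predictions as $\hat{y} = Xw$ gives the same loss in a compact vector form. Differentiating each term then gives its gradient with respect to $w$, which is what a minimizer works with:

$$
\begin{aligned}
\text{Ridge Loss}(w) &= \frac{1}{n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\nabla_w\, \text{Ridge Loss}(w) &= -\frac{2}{n} X^\top (y - Xw) + 2\alpha w
\end{aligned}
$$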

## Implementation Steps:

- **Calculate MSE**: Compute the predictions `X @ w`, then the average squared difference between the true and predicted values.
- **Add Regularization Term**: Compute the sum of the squared coefficients and multiply it by the regularization parameter $\alpha$.
- **Combine and Minimize**: Add the MSE and the regularization term to get the Ridge loss. Training then minimizes this loss over `w` to find the optimal coefficients. `ridge_loss` itself only evaluates the loss.
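The "minimize" step is separate from the function in this exercise. Because the loss is quadratic in `w`, setting its gradient to zero gives a linear system:

$$
(X^\top X + n\alpha I)\,w = X^\top y
$$

The factor $n$ appears because the MSE averages over the samples. The sketch below solves this system with `np.linalg.solve`, assuming $\alpha > 0$, which makes the system positive definite. `ridge_fit_closed_form` is a hypothetical helper name and is not part of the exercise.

```python
import numpy as np

def ridge_loss(X, w, y_true, alpha):
    """MSE of X @ w plus alpha times the squared L2 norm of w."""
    return float(np.mean((y_true - X @ w) ** 2) + alpha * np.sum(w ** 2))

def ridge_fit_closed_form(X, y, alpha):
    """Hypothetical helper: exact minimizer of ridge_loss for alpha > 0.

    Solves (X^T X + n * alpha * I) w = X^T y. The n multiplies alpha
    because the data term is a mean rather than a sum.
    """
    n, p = X.shape
    A = X.T @ X + n * alpha * np.eye(p)
    b = X.T @ y
    return np.linalg.solve(A, b)

X = np.array([[1, 1], [2, 1], [3, 1], [4, 1]], dtype=float)
y = np.array([2, 3, 4, 5], dtype=float)
alpha = 0.1

w_opt = ridge_fit_closed_form(X, y, alpha)
print("optimal w:", w_opt)
print("loss at optimum:", ridge_loss(X, w_opt, y, alpha))
print("loss at w = [0.2, 2]:", ridge_loss(X, np.array([0.2, 2.0]), y, alpha))
```

On the example data, the fitted coefficients give a much lower loss than the example's `w = [0.2, 2]`. For large or ill-conditioned problems an iterative solver is usually the better choice. This sketch only illustrates where the optimum of this particular loss lies.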

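```python
import numpy as np

def ridge_loss(X: np.ndarray, w: np.ndarray, y_true: np.ndarray, alpha: float) -> float:
    y_pred = X @ w                         # linear predictions, one per row of X
    mse = np.mean((y_true - y_pred) ** 2)  # average squared error
    penalty = alpha * np.sum(w ** 2)       # L2 penalty on all coefficients
    return float(mse + penalty)
```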

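Gradient-based minimization of this loss needs its gradient:

$$
\nabla_w = \frac{2}{n}X^\top(Xw - y) + 2\alpha w
$$

The sketch below implements that gradient and checks it against central finite differences on the Test Case 2 inputs. `ridge_grad` and `numerical_grad` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def ridge_loss(X, w, y_true, alpha):
    return float(np.mean((X @ w - y_true) ** 2) + alpha * np.sum(w ** 2))

def ridge_grad(X, w, y_true, alpha):
    """Hypothetical helper: analytic gradient of ridge_loss with respect to w."""
    n = X.shape[0]
    return 2.0 / n * X.T @ (X @ w - y_true) + 2.0 * alpha * w

def numerical_grad(f, w, eps=1e-6):
    """Hypothetical helper: central finite differences, one coordinate at a time."""
    g = np.zeros_like(w, dtype=float)
    for j in range(w.size):
        step = np.zeros_like(w, dtype=float)
        step[j] = eps
        g[j] = (f(w + step) - f(w - step)) / (2 * eps)
    return g

X = np.array([[1, 1, 4], [2, 1, 2], [3, 1, .1], [4, 1, 1.2], [1, 2, 3]])
w = np.array([.2, 2, 5])
y = np.array([2, 3, 4, 5, 2])
alpha = 0.1

analytic = ridge_grad(X, w, y, alpha)
numeric = numerical_grad(lambda v: ridge_loss(X, v, y, alpha), w)
print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```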
```python
X = np.array([[1,1],[2,1],[3,1],[4,1]])
W = np.array([.2,2])
y = np.array([2,3,4,5])
alpha = 0.1
output = ridge_loss(X, W, y, alpha)
# Compare with a tolerance: exact == on floats is fragile
if np.isclose(output, 2.204):
    print('Test Case 1: Accepted')
else:
    print('Test Case 1: Failed')
print('Input:')
print('X = np.array([[1,1],[2,1],[3,1],[4,1]])\nW = np.array([.2,2])\ny = np.array([2,3,4,5])\nalpha = 0.1\noutput = ridge_loss(X, W, y, alpha)\nprint(output)')
print()
print('Output:')
print(output)
print()
print('Expected:')
print('2.204')
print()
print()

X = np.array([[1,1,4],[2,1,2],[3,1,.1],[4,1,1.2],[1,2,3]])
W = np.array([.2,2,5])
y = np.array([2,3,4,5,2])
alpha = 0.1
output = ridge_loss(X, W, y, alpha)
if np.isclose(output, 164.402):
    print('Test Case 2: Accepted')
else:
    print('Test Case 2: Failed')
print('Input:')
print('X = np.array([[1,1,4],[2,1,2],[3,1,.1],[4,1,1.2],[1,2,3]])\nW = np.array([.2,2,5])\ny = np.array([2,3,4,5,2])\nalpha = 0.1\noutput = ridge_loss(X, W, y, alpha)\nprint(output)')
print()
print('Output:')
print(output)
print()
print('Expected:')
print('164.402')
```

```
Test Case 1: Accepted
Input:
X = np.array([[1,1],[2,1],[3,1],[4,1]])
W = np.array([.2,2])
y = np.array([2,3,4,5])
alpha = 0.1
output = ridge_loss(X, W, y, alpha)
print(output)

Output:
2.204

Expected:
2.204


Test Case 2: Accepted
Input:
X = np.array([[1,1,4],[2,1,2],[3,1,.1],[4,1,1.2],[1,2,3]])
W = np.array([.2,2,5])
y = np.array([2,3,4,5,2])
alpha = 0.1
output = ridge_loss(X, W, y, alpha)
print(output)

Output:
164.402

Expected:
164.402
```
