# MIRAGE++ Notebook 2: Core Model Walkthrough

## Building the Entropy-Regularized Mirror Descent Linear Regression

In this notebook, we'll build the MIRAGE++ model step by step. We'll explain every part in plain English, show the code, and visualize how the model learns.

## 1. What is Entropy Regularization?

Entropy regularization encourages the model's weights to spread out across features instead of concentrating on just a few. It is like diversifying a portfolio instead of betting everything on one asset.

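Mathematically, for weights on the probability simplex ($\theta_i \ge 0$, $\sum_i \theta_i = 1$), the Shannon entropy is $H(\theta) = -\sum_i \theta_i \log \theta_i$.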

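Entropy is largest ($\log d$ for $d$ features) when the weights are uniform, and zero when all the weight sits on a single feature. Because training *minimizes* the loss, we subtract $\lambda H(\theta)$ from it (equivalently, add the negative entropy $\lambda \sum_i \theta_i \log \theta_i$). This penalizes 'spiky', low-entropy weights.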
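A quick numerical check of how $H(\theta)$ separates diversified from concentrated weights, comparing uniform, moderately uneven, and one-hot vectors over five features. This is a sketch with a hypothetical `shannon_entropy` helper. Unlike the notebook's `entropy` function, which clips weights to at least `1e-8`, it uses the convention $0 \log 0 = 0$ by skipping zero entries:

```python
import numpy as np

def shannon_entropy(theta, eps=1e-12):
    """H(theta) = sum_i theta_i * log(1 / theta_i), skipping zero entries (0 log 0 := 0)."""
    theta = np.asarray(theta, dtype=float)
    nz = theta[theta > eps]  # drop (near-)zero weights, which contribute nothing
    return float(np.sum(nz * np.log(1.0 / nz)))

# Three weight vectors on the 5-feature simplex, from most to least spread out
weights = {
    "uniform": np.full(5, 0.2),
    "moderate": np.array([0.2, 0.1, 0.4, 0.1, 0.2]),
    "one-hot": np.array([1.0, 0.0, 0.0, 0.0, 0.0]),
}
for name, w in weights.items():
    print(f"{name:>8}: H = {shannon_entropy(w):.4f}")
print("log(5) =", round(float(np.log(5)), 4))
```

The uniform vector should attain the maximum $\log 5 \approx 1.609$, and the one-hot vector should score exactly 0.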

## 2. What is Mirror Descent?

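Mirror descent generalizes gradient descent to problems with a constrained geometry. Instead of stepping directly in weight space, it maps the weights through a *mirror map* (here the negative entropy $\psi(\theta) = \sum_i \theta_i \log \theta_i$), takes the gradient step in that transformed space, and maps back. With this choice the weights stay positive automatically, and a final normalization keeps them summing to 1.

The Bregman divergence generated by this mirror map is the Kullback-Leibler (KL) divergence. KL therefore plays the role of 'distance' between successive weight vectors, replacing the squared Euclidean distance implicit in ordinary gradient descent.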
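A small numerical check of the KL view (our own illustration, not part of MIRAGE++). One mirror descent step from $\theta$ with gradient $g$ should minimize $\eta\langle g, p\rangle + \mathrm{KL}(p\,\|\,\theta)$ over the simplex. Its closed-form minimizer is $p_i = \theta_i e^{-\eta g_i}/Z$ with $Z = \sum_j \theta_j e^{-\eta g_j}$. The helpers `kl` and `md_objective` are hypothetical names. The code compares the closed form against 1,000 random simplex points drawn with NumPy's `Generator.dirichlet`:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))

def md_objective(p, theta, grad, eta):
    """Linearized loss plus KL proximity term (illustrative helper)."""
    return eta * float(grad @ p) + kl(p, theta)

theta0 = np.full(4, 0.25)
g = np.array([1.0, -0.5, 0.2, 0.0])
eta = 0.5

# Closed-form step: multiply by exp(-eta * g), then renormalize onto the simplex
Z = float(np.sum(theta0 * np.exp(-eta * g)))
p_star = theta0 * np.exp(-eta * g) / Z
best = md_objective(p_star, theta0, g, eta)

# Random points on the simplex (Dirichlet(1,1,1,1) is uniform over it)
rng = np.random.default_rng(0)
candidates = rng.dirichlet(np.ones(4), size=1000)
beaten = sum(md_objective(p, theta0, g, eta) < best for p in candidates)

print("objective at closed form:", best, "  -log Z:", -np.log(Z))
print("random simplex points that do better:", beaten)
```

No random point should beat the closed form. Substituting $p^\star$ into the objective shows its minimum value is exactly $-\log Z$.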

## 3. The MIRAGE++ Loss Function

Our loss combines mean squared error (MSE) and entropy regularization:

$L(\theta) = \frac{1}{N}\|X\theta - y\|^2 - \lambda H(\theta)$

where $N$ is the number of samples and $\lambda \ge 0$ controls how strongly we reward high entropy (spread-out weights).

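```python
import numpy as np

def entropy(theta, epsilon=1e-8):
    """Shannon entropy H(theta) = -sum_i theta_i log theta_i, clipped to avoid log(0)."""
    theta = np.clip(theta, epsilon, 1.0)
    return -np.sum(theta * np.log(theta))

def loss(X, y, theta, lam):
    """MSE minus lam * entropy: minimizing this rewards spread-out (high-entropy) weights."""
    preds = X @ theta
    mse = np.mean((preds - y) ** 2)
    ent = entropy(theta)
    return mse - lam * ent

# Example usage on synthetic data (seeded for reproducibility)
np.random.seed(0)
X = np.random.randn(100, 5)
true_theta = np.array([0.2, 0.1, 0.4, 0.1, 0.2])
y = X @ true_theta + np.random.randn(100) * 0.1
theta = np.ones(5) / 5  # start from uniform weights
print('Initial loss:', loss(X, y, theta, lam=0.1))
```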

## 4. Mirror Descent Update Rule

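Gradient descent subtracts the scaled gradient from the weights. Mirror descent with the entropy mirror map instead multiplies each weight by the exponential of its negative scaled gradient. This keeps every weight positive, and normalizing afterwards keeps the weights on the simplex, so they behave like probabilities:

$\tilde\theta_{t+1} = \theta_t \odot \exp(-\eta \nabla L(\theta_t)), \qquad \theta_{t+1} = \tilde\theta_{t+1} \Big/ \sum_i \tilde\theta_{t+1,i}$

where $\eta > 0$ is the step size and $\odot$ and $\exp$ act elementwise.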
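The multiplicative rule follows from the mirror-map recipe described in Section 2. A standard derivation, assuming the negative-entropy mirror map $\psi(\theta) = \sum_i \theta_i \log\theta_i$ and writing $\Delta$ for the probability simplex:

$$
\begin{aligned}
\nabla\psi(\theta)_i &= 1 + \log\theta_i \\
1 + \log\tilde\theta_{t+1,i} &= 1 + \log\theta_{t,i} - \eta\,\nabla_i L(\theta_t) && \text{(gradient step in the mirror space)} \\
\tilde\theta_{t+1,i} &= \theta_{t,i}\,\exp\!\big(-\eta\,\nabla_i L(\theta_t)\big) && \text{(map back)} \\
\theta_{t+1} &= \arg\min_{p \in \Delta} D_{\mathrm{KL}}(p \,\|\, \tilde\theta_{t+1}) = \frac{\tilde\theta_{t+1}}{\sum_j \tilde\theta_{t+1,j}} && \text{(KL projection onto } \Delta \text{)}
\end{aligned}
$$

Here $D_{\mathrm{KL}}(p\,\|\,q) = \sum_i \big(p_i \log(p_i/q_i) - p_i + q_i\big)$ is the generalized KL divergence, which is needed because $\tilde\theta_{t+1}$ is not normalized. The final step rescales every coordinate by the same factor. As a result, adding a constant to every entry of $\nabla L$ leaves $\theta_{t+1}$ unchanged, which is why the constant term in the entropy gradient has no effect on the update.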

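```python
def mirror_descent_step(grad, theta_t, eta):
    """One exponentiated-gradient (mirror descent) step on the simplex."""
    theta_new = theta_t * np.exp(-eta * grad)    # multiplicative update keeps weights positive
    theta_new = np.clip(theta_new, 1e-12, None)  # guard against underflow to exactly 0
    return theta_new / np.sum(theta_new)         # renormalize so the weights sum to 1

# Example with a random gradient
grad = np.random.randn(5)
theta_new = mirror_descent_step(grad, theta, eta=0.1)
print('Updated theta:', theta_new)
print('Sum of theta:', np.sum(theta_new))
```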
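The multiplicative form can overflow when $\eta$ times a gradient entry is large and negative, because NumPy's `exp` returns `inf` for float64 arguments above roughly 709. A common remedy is to do the update in log space and subtract the maximum before exponentiating, which makes the step a softmax. It is sketched here as a hypothetical `mirror_descent_step_stable` and is not part of MIRAGE++:

```python
import numpy as np

def mirror_descent_step_stable(grad, theta_t, eta, floor=1e-12):
    """Exponentiated-gradient step computed in log space (i.e. a softmax).

    Same result as theta_t * exp(-eta * grad) / sum(...), but shifting the
    log-weights so the largest is 0 keeps np.exp from overflowing.
    """
    log_w = np.log(np.clip(theta_t, floor, None)) - eta * grad
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

theta_demo = np.ones(5) / 5

# Moderate gradient: naive and log-space versions agree
small_grad = np.array([0.5, -0.3, 0.1, 0.0, 0.2])
naive_small = theta_demo * np.exp(-0.2 * small_grad)
naive_small /= naive_small.sum()
stable_small = mirror_descent_step_stable(small_grad, theta_demo, eta=0.2)
print("agree on moderate gradient:", np.allclose(naive_small, stable_small))

# Huge gradient entry: exp(1000) overflows in the naive version, giving inf/inf = nan
big_grad = np.array([-5000.0, 0.0, 1.0, 2.0, 3.0])
with np.errstate(over="ignore", invalid="ignore"):
    naive_big = theta_demo * np.exp(-0.2 * big_grad)
    naive_big = naive_big / naive_big.sum()
stable_big = mirror_descent_step_stable(big_grad, theta_demo, eta=0.2)
print("naive result finite:", bool(np.isfinite(naive_big).all()))
print("stable result:", stable_big, "sum:", stable_big.sum())
```

For the well-scaled synthetic data in this notebook the plain `mirror_descent_step` should be fine. The log-space version matters mainly with large gradients or step sizes.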

## 5. Full Model Training Loop

Let's put it all together: initialize the weights uniformly, compute the gradient, take a mirror descent step, and record the loss at every iteration.

```python
def gradient(X, y, theta, lam):
    """Gradient of loss(): MSE gradient plus lam * gradient of the negative entropy."""
    preds = X @ theta
    grad_mse = 2 * X.T @ (preds - y) / len(y)
    # d/dtheta of -H(theta) = 1 + log(theta); clip to avoid log(0)
    grad_neg_entropy = 1 + np.log(np.clip(theta, 1e-8, 1.0))
    return grad_mse + lam * grad_neg_entropy

def fit_mirage(X, y, lam=0.1, eta=0.1, n_iters=300):
    """Run mirror descent from uniform weights; return final weights and loss history."""
    n = X.shape[1]
    theta = np.ones(n) / n
    loss_hist = []
    for _ in range(n_iters):
        grad = gradient(X, y, theta, lam)
        theta = mirror_descent_step(grad, theta, eta)
        loss_hist.append(loss(X, y, theta, lam))
    return theta, loss_hist

theta_mirage, loss_hist = fit_mirage(X, y, lam=0.1, eta=0.2, n_iters=200)
print('Final MIRAGE++ weights:', theta_mirage)
```
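Before trusting the training loop, it is worth checking the analytic gradient against central finite differences. The sketch below uses hypothetical helper names `reg_loss`, `reg_grad`, and `numeric_grad`. It assumes the regularized objective is MSE $- \lambda H(\theta)$, i.e. MSE $+ \lambda \sum_i \theta_i \log\theta_i$, whose entropy gradient is $\lambda(1 + \log\theta_i)$. It evaluates at an interior point, so no clipping is needed:

```python
import numpy as np

def reg_loss(X, y, theta, lam):
    """MSE plus lam * negative entropy (equivalently MSE - lam * H)."""
    mse = np.mean((X @ theta - y) ** 2)
    return mse + lam * np.sum(theta * np.log(theta))

def reg_grad(X, y, theta, lam):
    """Analytic gradient of reg_loss with respect to theta."""
    grad_mse = 2 * X.T @ (X @ theta - y) / len(y)
    return grad_mse + lam * (1 + np.log(theta))

def numeric_grad(f, theta, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return g

rng = np.random.default_rng(1)
X_chk = rng.normal(size=(50, 4))
y_chk = rng.normal(size=50)
theta_chk = np.array([0.1, 0.2, 0.3, 0.4])  # strictly positive, so log is safe

analytic = reg_grad(X_chk, y_chk, theta_chk, lam=0.1)
numeric = numeric_grad(lambda t: reg_loss(X_chk, y_chk, t, 0.1), theta_chk)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

A maximum difference many orders of magnitude below the gradient entries indicates the analytic formula matches the loss. A sign error in the entropy term would show up as a difference of roughly $2\lambda(1 + \log\theta_i)$.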

## 6. Visualize Learning

Let's plot the loss over iterations to see if the model is learning.

```python
import matplotlib.pyplot as plt

plt.plot(loss_hist)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('MIRAGE++ Training Loss')
plt.show()
```

## 7. Compare to OLS, Ridge, and Lasso

Let's see how MIRAGE++ weights compare to standard models.

```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

idx = np.arange(X.shape[1])  # one group of bars per feature
plt.figure(figsize=(8, 4))
plt.bar(idx - 0.3, ols.coef_, width=0.2, label='OLS')
plt.bar(idx - 0.1, ridge.coef_, width=0.2, label='Ridge')
plt.bar(idx + 0.1, lasso.coef_, width=0.2, label='Lasso')
plt.bar(idx + 0.3, theta_mirage, width=0.2, label='MIRAGE++')
plt.legend()
plt.xlabel('Feature Index')
plt.ylabel('Weight')
plt.title('Model Weights Comparison')
plt.show()
```
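To see how $\lambda$ controls diversification, the sketch below refits on synthetic data for a few values of $\lambda$ and reports the entropy of the learned weights. It assumes the objective MSE $- \lambda H(\theta)$, so a larger $\lambda$ rewards spread-out weights. It uses its own hypothetical `fit_entropy_md` helper, a log-space version of the multiplicative update, so it runs independently of the cells above:

```python
import numpy as np

def entropy_of(theta):
    """Shannon entropy of a strictly positive weight vector."""
    return float(-np.sum(theta * np.log(theta)))

def fit_entropy_md(X, y, lam, eta=0.2, n_iters=1000):
    """Exponentiated-gradient descent on MSE - lam * H(theta) over the simplex."""
    theta = np.full(X.shape[1], 1.0 / X.shape[1])
    for _ in range(n_iters):
        grad = 2 * X.T @ (X @ theta - y) / len(y) + lam * (1 + np.log(theta))
        log_w = np.log(theta) - eta * grad
        w = np.exp(log_w - log_w.max())  # shift before exp for numerical safety
        theta = w / w.sum()
    return theta

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([0.2, 0.1, 0.4, 0.1, 0.2]) + 0.1 * rng.normal(size=200)

lams = [0.0, 0.1, 1.0]
entropies = [entropy_of(fit_entropy_md(X_demo, y_demo, lam)) for lam in lams]
for lam, h in zip(lams, entropies):
    print(f"lambda={lam:<4}  entropy={h:.4f}  (max possible {np.log(5):.4f})")
```

Because the objective is strictly convex on the simplex, the entropy of its minimizer cannot decrease as $\lambda$ grows. Larger $\lambda$ should therefore push the entropy toward its maximum $\log 5 \approx 1.609$ (equal weights), while at $\lambda = 0$ the weights are driven by the data alone.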

## 8. What Did We Learn?

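- MIRAGE++ weights are always positive and sum to 1, and the entropy term pushes them toward a more even spread across features.
- OLS, Ridge, and Lasso are unconstrained: their weights can be negative and need not sum to 1, and Lasso can set some weights exactly to zero.
- Combining entropy regularization with mirror descent yields weights that can be read directly as proportions, like the allocations of a portfolio.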

In the next notebook, we'll apply MIRAGE++ to more realistic finance problems!