# Worked example (pen-and-paper): constrained ↔ penalized (Lagrangian) for ridge

**Goal.** Work a concrete, tiny example by hand so you can see step by step how the constrained problem and the penalized (ridge) problem have the same minimizer for an appropriate choice of multiplier. We keep everything scalar so the algebra is simple and you can follow with pencil and paper.

---

## Problem statement (very small data)

We fit a model $y \approx x\beta$ with one scalar parameter $\beta$. Data:

$$
x = \begin{bmatrix}1 \\ 2\end{bmatrix},\qquad
y = \begin{bmatrix}1 \\ 2.2\end{bmatrix}.
$$

Define the least-squares objective (sum of squared errors):

$$
f(\beta) = \sum_{i=1}^2 (x_i\beta - y_i)^2.
$$

We will compare:

- **Unconstrained OLS**: $\min_\beta f(\beta)$.
- **Constrained**: $\min_\beta f(\beta)$ subject to $\beta^2 \le c$ (choose $c$ later).
- **Penalized (Ridge)**: $\min_\beta f(\beta) + \lambda \beta^2$.

You will see the same solution appear for a particular relation between $c$ and $\lambda$.

---

## Step 1 — compute some sums (do these first)

Work out the two sums you will use repeatedly (do with pencil):

$$
S_{xx} = \sum_i x_i^2 = 1^2 + 2^2 = 1 + 4 = 5.
$$
$$
S_{xy} = \sum_i x_i y_i = 1\cdot 1 + 2\cdot 2.2 = 1 + 4.4 = 5.4.
$$

Every later formula uses only these two numbers, so keep them visible.
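If you would like to double-check the hand arithmetic, a minimal sketch in plain Python follows. The variable names `s_xx` and `s_xy` are illustrative choices that mirror $S_{xx}$ and $S_{xy}$; they are not part of the original example.

```python
# Data from the problem statement.
x = [1.0, 2.0]
y = [1.0, 2.2]

# S_xx = sum of x_i^2, S_xy = sum of x_i * y_i.
s_xx = sum(xi * xi for xi in x)
s_xy = sum(xi * yi for xi, yi in zip(x, y))

print(f"S_xx = {s_xx:.4f}, S_xy = {s_xy:.4f}")
```

The values are printed to four decimals because floating-point sums can differ from the pencil-and-paper result in the last digits.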

---

## Step 2 — solve the unconstrained least squares (OLS)

For scalar $\beta$, the normal equation reduces to:

$$
S_{xx}\,\beta = S_{xy} \quad\Longrightarrow\quad \beta_{\text{OLS}} = \frac{S_{xy}}{S_{xx}}.
$$

Plug numbers:

$$
\beta_{\text{OLS}} = \frac{5.4}{5} = 1.08.
$$

**Interpretation:** with no constraints, the best fit coefficient is $1.08$.

Compute its squared norm (we will compare to $c$):

$$
\beta_{\text{OLS}}^2 = 1.08^2 = 1.1664.
$$
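As a sanity check on Step 2, the sketch below computes $\beta_{\text{OLS}}$ from the two sums and confirms that $f$ is larger a small step to either side. The helper `f` and the step size `1e-3` are assumptions made for illustration.

```python
def f(beta, x=(1.0, 2.0), y=(1.0, 2.2)):
    """Sum of squared errors for the scalar model y ~ x * beta."""
    return sum((xi * beta - yi) ** 2 for xi, yi in zip(x, y))

s_xx, s_xy = 5.0, 5.4          # sums from Step 1
beta_ols = s_xy / s_xx         # scalar normal equation

# f should increase when we move slightly away from beta_ols.
step = 1e-3                    # arbitrary small step, for illustration only
is_minimum = (f(beta_ols) < f(beta_ols - step)
              and f(beta_ols) < f(beta_ols + step))

print(f"beta_OLS = {beta_ols:.4f}, beta_OLS^2 = {beta_ols ** 2:.4f}")
print("lower than both neighbours:", is_minimum)
```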

---

## Step 3 — set a constraint and check if it is active

Pick a constraint level $c$ (the bound on $\beta^2$, so the allowed radius is $\sqrt{c}$). For the example, take

$$
c = 1 \quad(\text{so } |\beta| \le 1).
$$

Compare $\beta_{\text{OLS}}^2 = 1.1664$ with $c=1$. Since $1.1664 > 1$, the unconstrained solution violates the constraint. Therefore the constraint is **active**: the constrained minimizer lies on the boundary $\beta^2 = c$.

So the constrained solution is **not** $\beta_{\text{OLS}}$; it is one of the two boundary points $\beta = \pm \sqrt{c}$. (We expect the positive one: $f$ is a parabola with its minimum at $\beta_{\text{OLS}} = 1.08 > 0$, so $+\sqrt{c}$ is the feasible point closest to that minimum and gives the smaller $f$.)
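One way to convince yourself that the minimizer sits on the boundary is a brute-force search over the feasible interval $[-\sqrt{c}, \sqrt{c}]$. This is only a sketch: the grid size `n` is an arbitrary choice, and a grid search is a check, not the method used in the derivation.

```python
def f(beta, x=(1.0, 2.0), y=(1.0, 2.2)):
    """Sum of squared errors for the scalar model y ~ x * beta."""
    return sum((xi * beta - yi) ** 2 for xi, yi in zip(x, y))

c = 1.0
bound = c ** 0.5               # feasible set is -bound <= beta <= bound
n = 2001                       # number of grid points (illustrative)

# Evenly spaced feasible candidates, including both endpoints.
grid = [-bound + 2.0 * bound * k / (n - 1) for k in range(n)]

# Pick the feasible candidate with the smallest squared error.
beta_best = min(grid, key=f)

print(f"best feasible beta on the grid: {beta_best:.3f}")
```

The search should land on the right endpoint $\beta = +1$, which is the boundary point closest to $\beta_{\text{OLS}}$.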

---

## Step 4 — form the Lagrangian for the constrained problem

Write the constrained problem:

$$
\min_\beta f(\beta) \quad\text{s.t.}\quad \beta^2 \le c.
$$

Form the Lagrangian (use multiplier $\mu\ge0$):

$$
\mathcal{L}(\beta,\mu) = f(\beta) + \mu(\beta^2 - c).
$$

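(We write the constraint as $\beta^2 - c \le 0$ so that a multiplier $\mu \ge 0$ penalizes violations. For fixed $\mu$, the term $-\mu c$ is constant in $\beta$ and does not affect the minimization over $\beta$; the value of $\mu$ itself is pinned down later by the boundary condition.)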

---

## Step 5 — stationarity (first-order condition)

Differentiate the Lagrangian with respect to $\beta$ and set the derivative to zero.

For our data,
$$
f(\beta) = (1\cdot\beta -1)^2 + (2\cdot\beta - 2.2)^2.
$$
Differentiate on paper (or use the compact sums):

$$
\frac{d}{d\beta} f(\beta) = 2S_{xx}\,\beta - 2S_{xy}.
$$

(You can check this: expand the two squared terms and differentiate term by term.)
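For reference, the expansion written out in full looks like this (all decimals are exact, nothing is rounded):

$$
\begin{aligned}
f(\beta) &= (\beta - 1)^2 + (2\beta - 2.2)^2 \\
&= (\beta^2 - 2\beta + 1) + (4\beta^2 - 8.8\beta + 4.84) \\
&= 5\beta^2 - 10.8\beta + 5.84, \\
\frac{d}{d\beta} f(\beta) &= 10\beta - 10.8 = 2\cdot 5\,\beta - 2\cdot 5.4 = 2S_{xx}\,\beta - 2S_{xy}.
\end{aligned}
$$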

Including the $\mu$ term, stationarity is:

$$
0 = \frac{\partial\mathcal{L}}{\partial\beta}
  = (2S_{xx}\beta - 2S_{xy}) + 2\mu\beta.
$$

Rearrange:

$$
(2S_{xx} + 2\mu)\beta = 2S_{xy}.
$$

Divide by 2:

$$
(S_{xx} + \mu)\,\beta = S_{xy}.
$$

So the **stationarity formula** for the constrained problem is

$$
\boxed{\beta(\mu) = \frac{S_{xy}}{S_{xx} + \mu}.}
$$

This is the same algebraic form as ridge — note the appearance of $\mu$ added to $S_{xx}$.

---

## Step 6 — use the boundary condition (complementary slackness)

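Because we already determined that the constraint is active for $c=1$, the solution satisfies $\beta^2 = c$. Complementary slackness, $\mu(\beta^2 - c) = 0$, then permits $\mu > 0$, and $\mu$ must in fact be positive: $\mu = 0$ would give back the infeasible $\beta_{\text{OLS}}$.

So set $\beta = \pm\sqrt{c}$. We choose the positive root because the stationarity formula $\beta(\mu) = S_{xy}/(S_{xx}+\mu)$ has the sign of $S_{xy} = 5.4 > 0$ for every $\mu \ge 0$: $\beta = +1$.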

Plug into the stationarity formula to solve for $\mu$:

$$
1 = \frac{S_{xy}}{S_{xx} + \mu}
\quad\Longrightarrow\quad S_{xx} + \mu = S_{xy}.
$$

Thus

$$
\mu = S_{xy} - S_{xx}.
$$

Use numbers: $S_{xy}=5.4$, $S_{xx}=5$:

$$
\mu = 5.4 - 5 = 0.4.
$$

So the Lagrange multiplier that enforces the constraint $\beta^2\le1$ is $\mu = 0.4$, and the constrained solution is $\beta = 1$.
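The closed form above is specific to the scalar case. As a cross-check that does not use it, the sketch below finds $\mu$ numerically by bisection on $\beta(\mu)^2 - c$, which decreases as $\mu$ grows. The function names, the bracket `[0, 100]`, and the tolerance are assumptions chosen for this example.

```python
def beta_of_mu(mu, s_xx=5.0, s_xy=5.4):
    """Stationarity formula from Step 5."""
    return s_xy / (s_xx + mu)

def solve_mu(c, lo=0.0, hi=100.0, tol=1e-12):
    """Bisection for the mu >= 0 at which beta(mu)^2 equals c.

    Assumes the constraint is active (beta(lo)^2 > c) and that
    hi is large enough that beta(hi)^2 < c.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_of_mu(mid) ** 2 > c:
            lo = mid           # still infeasible: need more shrinkage
        else:
            hi = mid           # feasible: try less shrinkage
    return 0.5 * (lo + hi)

mu = solve_mu(1.0)
print(f"mu = {mu:.6f}, beta(mu) = {beta_of_mu(mu):.6f}")
```

The result should agree with the hand calculation $\mu = 0.4$ up to the chosen tolerance.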

---

## Step 7 — check the penalized (ridge) problem with $\lambda=\mu$

Now consider the penalized objective with $\lambda = 0.4$:

$$
\min_\beta f(\beta) + \lambda \beta^2.
$$

Setting the derivative to zero, $2S_{xx}\beta - 2S_{xy} + 2\lambda\beta = 0$, gives the same linear equation as in Step 5 with $\lambda$ in place of $\mu$:

$$
(S_{xx} + \lambda)\beta = S_{xy}.
$$

Plug $\lambda=0.4$:

$$
(5 + 0.4)\beta = 5.4 \quad\Longrightarrow\quad 5.4\,\beta = 5.4 \quad\Longrightarrow\quad \beta = 1.
$$

So with $\lambda=0.4$ the penalized (ridge) minimizer equals the constrained minimizer we found earlier. **This demonstrates the equivalence** in this concrete case.
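To see the penalized problem reach $\beta = 1$ without any constraint being imposed, here is a minimal sketch that runs plain gradient descent on $f(\beta) + \lambda\beta^2$. The step size `eta`, the starting point, and the iteration count are arbitrary choices that happen to converge for this data; they are not part of the original example.

```python
s_xx, s_xy = 5.0, 5.4          # sums from Step 1
lam = 0.4                      # penalty set equal to the multiplier mu

def grad(beta):
    """Derivative of f(beta) + lam * beta^2."""
    return 2.0 * (s_xx + lam) * beta - 2.0 * s_xy

beta = 0.0                     # arbitrary starting point
eta = 0.05                     # step size (illustrative)
for _ in range(200):
    beta -= eta * grad(beta)   # unconstrained descent step

print(f"penalized minimizer: {beta:.6f}")
```

No bound on $\beta$ appears anywhere in the loop; the penalty alone pulls the minimizer from $1.08$ to the constrained solution.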

---

## Step 8 — verify KKT conditions quickly (conceptual check)

For completeness, verify the four KKT conditions:

1. **Primal feasibility:** $\beta^2 \le c$. We have $\beta=1$, $1^2 = 1 \le 1$ ✓.
2. **Dual feasibility:** $\mu \ge 0$. We have $\mu=0.4 \ge 0$ ✓.
3. **Stationarity:** satisfied by construction: $(S_{xx}+\mu)\beta=S_{xy}$ ✓.
4. **Complementary slackness:** $\mu(\beta^2 - c)=0$. Since $\beta^2 - c = 0$ and $\mu>0$, product is 0 ✓.

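All four **KKT conditions** hold. Because $f$ and the constraint $\beta^2 \le c$ are both convex, the KKT conditions are sufficient here, so $\beta = 1$ is the constrained minimizer.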
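The four checks can also be written as a small function, which is handy when you try other values of $c$. This is a sketch: `kkt_report` and the tolerance `tol` are hypothetical names introduced here to absorb floating-point rounding.

```python
def kkt_report(beta, mu, c, s_xx=5.0, s_xy=5.4, tol=1e-9):
    """Return each KKT condition as a boolean for the scalar problem."""
    return {
        "primal_feasibility": beta ** 2 <= c + tol,
        "dual_feasibility": mu >= -tol,
        "stationarity": abs((s_xx + mu) * beta - s_xy) <= tol,
        "complementary_slackness": abs(mu * (beta ** 2 - c)) <= tol,
    }

# Candidate from Steps 6 and 7.
report = kkt_report(beta=1.0, mu=0.4, c=1.0)
for name, ok in report.items():
    print(f"{name}: {ok}")

# The OLS solution with mu = 0 fails primal feasibility when c = 1.
ols_report = kkt_report(beta=1.08, mu=0.0, c=1.0)
print("OLS feasible for c = 1:", ols_report["primal_feasibility"])
```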

---

## Step 9 — try a different $c$ by hand (exercise)

Do this with pencil and paper to see the monotonic mapping $\mu \leftrightarrow c$.

1. If you pick $c = 2$ (a loose constraint), then $\beta_{\text{OLS}}^2 = 1.1664 < 2$, so the unconstrained solution is already feasible. The constraint is **inactive**, so $\mu=0$ and the constrained minimizer is $\beta_{\text{OLS}}=1.08$. The penalized problem with $\lambda=0$ is plain OLS and gives the same answer.

2. If you pick $c$ smaller, e.g. $c=0.25$ (so allowed $|\beta|\le 0.5$), you can solve for $\mu$ from the equation
   $$
   \beta = \frac{S_{xy}}{S_{xx}+\mu},\qquad \beta^2 = c .
   $$
   Combine: $\sqrt{c} = S_{xy}/(S_{xx}+\mu)$ → solve for $\mu$:
   $$
   \mu = \frac{S_{xy}}{\sqrt{c}} - S_{xx}.
   $$
   Plug numbers and verify $\mu>0$. Then the penalized solution with $\lambda=\mu$ gives the same $\beta$.

This exercise shows that a smaller $c$ requires a larger $\mu$, and that the mapping $\mu \mapsto \beta(\mu)$ is decreasing (here $S_{xy} > 0$): larger $\mu$ produces smaller $|\beta|$.
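After working the exercise by hand, you can compare against the sketch below, which handles both the inactive and the active case with the closed-form expressions above. The function name `constrained_solution` is an illustrative choice, and the code assumes $c > 0$ and $S_{xy} \ne 0$.

```python
import math

def constrained_solution(c, s_xx=5.0, s_xy=5.4):
    """Return (beta, mu) for min f(beta) subject to beta^2 <= c."""
    beta_ols = s_xy / s_xx
    if beta_ols ** 2 <= c:
        return beta_ols, 0.0                   # inactive: OLS is feasible
    beta = math.copysign(math.sqrt(c), s_xy)   # boundary point, sign of S_xy
    mu = s_xy / beta - s_xx                    # from (S_xx + mu) * beta = S_xy
    return beta, mu

results = {c: constrained_solution(c) for c in (2.0, 1.0, 0.25)}
for c, (beta, mu) in results.items():
    print(f"c = {c:<5} beta = {beta:.4f}  mu = {mu:.4f}")
```

Reading the rows from top to bottom, $c$ shrinks while $\mu$ grows, which is the monotone relation the exercise is meant to show.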

---

## Step 10 — takeaways (simple and important)

- The stationarity equation $(S_{xx}+\mu)\beta = S_{xy}$ is **exactly** the ridge normal equation when you set $\lambda=\mu$.
- For scalar problems you can explicitly solve for $\mu$ from the boundary condition $\beta^2=c$. In matrix/vector problems the idea is the same but solving for the dual multiplier analytically is harder.
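- **In practice** we pick $\lambda$ (the penalty) by validation rather than solving for a particular $c$. The mathematics shows that, under convexity, each $\lambda \ge 0$ corresponds to a constraint level $c$ (in this example $c = \beta(\lambda)^2$). Conversely, each $c$ with $0 < c < \beta_{\text{OLS}}^2$ corresponds to exactly one $\lambda > 0$, while every $c \ge \beta_{\text{OLS}}^2$ maps to $\lambda = 0$.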
- The KKT conditions are the formal set of equations that guarantee optimality and formalize the equivalence.
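The takeaways above describe the correspondence in the direction $\lambda \to c$. A minimal sketch of that direction for this example: given a penalty, compute the ridge solution and read off the constraint level it implies. The name `implied_c` is hypothetical.

```python
def implied_c(lam, s_xx=5.0, s_xy=5.4):
    """Constraint level c = beta(lam)^2 matched by the ridge penalty lam."""
    beta = s_xy / (s_xx + lam)   # ridge normal equation, scalar case
    return beta ** 2

levels = {lam: implied_c(lam) for lam in (0.0, 0.4, 5.8)}
for lam, c in levels.items():
    print(f"lambda = {lam:<4} -> c = {c:.4f}")
```

With $\lambda = 0$ the implied level is $\beta_{\text{OLS}}^2$, and larger penalties map to tighter constraints.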

---

## Checklist to reproduce this example by hand

1. Compute $S_{xx}$ and $S_{xy}$.
2. Compute $\beta_{\text{OLS}} = S_{xy}/S_{xx}$. Check $\beta_{\text{OLS}}^2$ vs $c$.
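3. If the unconstrained solution satisfies the constraint, stop: $\mu = 0$ and the answer is $\beta_{\text{OLS}}$. If it violates the constraint (so the constraint is active), write the stationarity equation:
   $(S_{xx} + \mu)\beta = S_{xy}$.
4. Use the boundary condition $\beta^2 = c$ to solve for $\mu$. (For a scalar parameter this is direct.)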
5. Plug $\lambda=\mu$ into the penalized equation $(S_{xx}+\lambda)\beta=S_{xy}$ and verify same $\beta$.
6. Check KKT conditions.