# $\S$ 5.4. Smoothing Splines

Here we discuss a spline basis method that avoids the knot selection problem completely by using a maximal set of knots.

> The complexity of the fit is controlled by regularization.

Consider the following problem: Among all functions $f(x)$ with two continuous derivatives, find one that minimizes the penalized residual sum of squares

\begin{equation}
\text{RSS}(f, \lambda) = \sum_{i=1}^N \left( y_i - f(x_i) \right)^2 + \lambda \int \left( f''(t)\right)^2 dt,
\end{equation}

where $\lambda$ is a fixed _smoothing parameter_. The first term measures closeness to the data, while the second term penalizes curvature in the function, and $\lambda$ establishes a tradeoff between the two.

Consider the two special cases:
1. $\lambda = 0$: $f$ can be any function that interpolates the data.
2. $\lambda = \infty$: the simple least squares line fit, since no second derivative can be tolerated.

These vary from very rough to very smooth, and the hope is that $\lambda \in (0,\infty)$ indexes an interesting class of functions in between.

### The natural cubic spline as the minimizer

The above $\text{RSS}$ is defined on an infinite-dimensional function space -- in fact, a Sobolev space of functions for which the second term is defined.

Remarkably, it can be shown that this criterion has an explicit, finite-dimensional, unique minimizer: a natural cubic spline with knots at the unique values of the $x_i$, $i=1,\ldots,N$ (Exercise 5.7).

At face value it seems that the family is still over-parametrized, since there are as many as $N$ knots, which implies $N$ degrees of freedom. However, the penalty term translates to a penalty on the spline coefficients, which are shrunk some of the way toward the linear fit.
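The two limiting cases listed earlier, and the shrinkage toward the linear fit, can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the computation developed in this section: it assumes distinct, sorted $x_i$ and uses the Reinsch form $\hat{\mathbf{f}} = (\mathbf{I} + \lambda\mathbf{K})^{-1}\mathbf{y}$ for the fitted values at the $x_i$, with $\mathbf{K} = \mathbf{Q}\mathbf{R}^{-1}\mathbf{Q}^T$ built from the knot spacings. The helper names `reinsch_penalty` and `smooth` and the simulated data are hypothetical.

```python
import numpy as np

def reinsch_penalty(x):
    """K = Q R^{-1} Q^T for sorted, distinct x (hypothetical helper)."""
    n = len(x)
    h = np.diff(x)                          # h[i] = x[i+1] - x[i]
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):                  # column j is centred on x[j+1]
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)

def smooth(x, y, lam):
    """Fitted values at the x_i of the smoothing spline with parameter lam."""
    K = reinsch_penalty(x)
    return np.linalg.solve(np.eye(len(x)) + lam * K, y)

# Simulated data (an assumption; any sorted, distinct x would do).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

line = np.polyval(np.polyfit(x, y, 1), x)   # ordinary least squares line

for lam in (0.0, 1e-4, 1e5):
    f = smooth(x, y, lam)
    print(f"lam={lam:g}  max|f - y|={np.abs(f - y).max():.2e}  "
          f"max|f - line|={np.abs(f - line).max():.2e}")
```

With `lam = 0` the fitted values reproduce $\mathbf{y}$, and for very large `lam` they agree with the least squares line up to numerical tolerance; intermediate values fall in between.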

### Computation

Since the solution is a natural spline, we can write it as

\begin{equation}
f(x) = \sum_{j=1}^N N_j(x) \theta_j,
\end{equation}

where the $N_j(x)$ are an $N$-dimensional set of basis functions for representing this family of natural splines ($\S$ 5.2.1 and Exercise 5.4). The criterion thus reduces to

\begin{equation}
\text{RSS}(\theta, \lambda) = (\mathbf{y} - \mathbf{N}\theta)^T(\mathbf{y} - \mathbf{N}\theta) + \lambda\theta^T\mathbf{\Omega}_N\theta,
\end{equation}

where
* $\lbrace\mathbf{N}\rbrace_{ij} = N_j(x_i)$ and 
* $\lbrace\mathbf{\Omega}_N\rbrace_{jk} = \int N_j''(t)N_k''(t)dt$.
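As a concrete illustration, both matrices can be assembled with plain NumPy. The sketch assumes the truncated-power form of the natural cubic spline basis, $N_1(x) = 1$, $N_2(x) = x$, $N_{k+2}(x) = d_k(x) - d_{K-1}(x)$ with $d_k(x) = \lbrack (x-\xi_k)_+^3 - (x-\xi_K)_+^3 \rbrack / (\xi_K - \xi_k)$; any other basis for the same space would serve. Each $N_j''$ is piecewise linear between knots and vanishes outside the boundary knots, so Simpson's rule applied interval by interval integrates the products exactly. The function names are hypothetical, and this basis becomes ill-conditioned as the number of knots grows, so it is meant for small examples only.

```python
import numpy as np

def natural_basis(x, knots, second_deriv=False):
    """Columns N_1..N_K evaluated at x, or their second derivatives."""
    x = np.asarray(x, dtype=float)
    xi = np.asarray(knots, dtype=float)
    K = xi.size

    def trunc(u):
        # (u)_+^3, or its second derivative 6 (u)_+
        u = np.maximum(u, 0.0)
        return 6.0 * u if second_deriv else u ** 3

    def d(k):
        # d_k(x) with 0-based knot index k
        return (trunc(x - xi[k]) - trunc(x - xi[-1])) / (xi[-1] - xi[k])

    if second_deriv:
        head = [np.zeros_like(x), np.zeros_like(x)]   # (1)'' = (x)'' = 0
    else:
        head = [np.ones_like(x), x]
    return np.column_stack(head + [d(k) - d(K - 2) for k in range(K - 2)])

def penalty_matrix(knots):
    """Omega_jk = integral of N_j'' N_k'', by Simpson's rule on each knot interval."""
    xi = np.asarray(knots, dtype=float)
    a, b = xi[:-1], xi[1:]
    w = (b - a) / 6.0
    Da, Dm, Db = (natural_basis(t, xi, second_deriv=True)
                  for t in (a, 0.5 * (a + b), b))
    return (Da.T * w) @ Da + 4.0 * (Dm.T * w) @ Dm + (Db.T * w) @ Db

# Hypothetical, unequally spaced inputs; the knots are the x_i themselves.
x = np.array([0.0, 0.1, 0.25, 0.4, 0.6, 0.7, 0.85, 1.0])
N = natural_basis(x, x)
Omega = penalty_matrix(x)
print(N.shape, Omega.shape)   # (8, 8) (8, 8)
```

The first two rows and columns of `Omega` are zero because the constant and linear basis functions have no curvature, which is why the penalty leaves the linear part of the fit unshrunk.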

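Setting the gradient with respect to $\theta$ to zero, $-2\mathbf{N}^T(\mathbf{y} - \mathbf{N}\theta) + 2\lambda\mathbf{\Omega}_N\theta = 0$, the solution is easily seen to be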

\begin{equation}
\hat\theta = \left( \mathbf{N}^T\mathbf{N} + \lambda\mathbf{\Omega}_N \right)^{-1} \mathbf{N}^T \mathbf{y},
\end{equation}

a generalized ridge regression. The fitted smoothing spline is given by

\begin{equation}
\hat{f}(x) = \sum_{j=1}^N N_j(x) \hat\theta_j.
\end{equation}

See the Appendix of this chapter for efficient computational techniques for smoothing splines.
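For small $N$ the formulas of this section can be applied directly with a dense solve. The sketch below is only an illustration under assumptions: it uses the truncated-power natural spline basis (assumed form $N_1 = 1$, $N_2 = x$, $N_{k+2} = d_k - d_{K-1}$), simulated data, an arbitrary $\lambda = 10^{-3}$, and hypothetical helper names `basis` and `penalty`. It does not use the efficient techniques referred to above.

```python
import numpy as np

def basis(x, xi, dd=False):
    """Natural cubic spline basis at x with knots xi; dd=True gives N_j''."""
    x = np.asarray(x, dtype=float)
    K = len(xi)

    def trunc(u):
        u = np.maximum(u, 0.0)
        return 6.0 * u if dd else u ** 3

    def d(k):
        return (trunc(x - xi[k]) - trunc(x - xi[-1])) / (xi[-1] - xi[k])

    head = [np.zeros_like(x), np.zeros_like(x)] if dd else [np.ones_like(x), x]
    return np.column_stack(head + [d(k) - d(K - 2) for k in range(K - 2)])

def penalty(xi):
    """Omega via Simpson's rule per knot interval (exact: N_j'' is piecewise linear)."""
    a, b = xi[:-1], xi[1:]
    w = (b - a) / 6.0
    Da, Dm, Db = (basis(s, xi, dd=True) for s in (a, 0.5 * (a + b), b))
    return (Da.T * w) @ Da + 4.0 * (Dm.T * w) @ Dm + (Db.T * w) @ Db

# Simulated data (an assumption): distinct, sorted x_i serve as the knots.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)
lam = 1e-3

N = basis(x, x)
Omega = penalty(x)
A = N.T @ N + lam * Omega
theta_hat = np.linalg.solve(A, N.T @ y)      # generalized ridge solution

# Evaluate the fitted spline on a grid extending past the boundary knots.
grid = np.linspace(-0.2, 1.2, 141)
f_grid = basis(grid, x) @ theta_hat
f_at_x = N @ theta_hat

prss = np.sum((y - f_at_x) ** 2) + lam * theta_hat @ Omega @ theta_hat
print("penalized RSS at theta_hat:", prss)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly. Beyond the boundary knots the evaluated curve is linear, as expected of a natural spline.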

In [2]:
"""FIGURE 5.6. A smoothing spline to BMD data with fixed lambda ~= 0.00022
This choice, corresponding to about 12 degrees of freedom, will be discussed
in the next section."""
print('Under construction ...')

Under construction ...
