# $\S$ 5.7. Multidimensional Splines

So far we have focused on one-dimensional spline models. Each of the approaches has a multidimensional analog.

### Tensor product basis

Suppose $X\in\mathbb{R}^2$, and we have
* a basis of functions $h_{1k}(X_1)$, for $k=1,\cdots,M_1$
  for representing functions of coordinate $X_1$,
* a set of $M_2$ functions $h_{2k}(X_2)$ for coordinate $X_2$, likewise.

Then the $M_1 \times M_2$ dimensional _tensor product basis_ defined by

\begin{equation}
g_{jk}(X) = h_{1j}(X_1) h_{2k}(X_2), \text{ for } j=1,\cdots,M_1 \text{ and } k=1,\cdots,M_2
\end{equation}

can be used for representing a two-dimensional function:

\begin{equation}
g(X) = \sum_{j=1}^{M_1}\sum_{k=1}^{M_2} \theta_{jk} g_{jk}(X).
\end{equation}

FIGURE 5.10 illustrates a tensor product basis using B-splines.

The coefficients can be fit by least squares, as before.
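
As a rough illustration of the construction above, the following sketch builds a tensor product basis from two small univariate bases and fits the coefficients $\theta_{jk}$ by least squares. The truncated-power cubic basis, the knot locations, the simulated data, and the helper names `cubic_basis` and `tensor_basis` are all assumptions made for this example; FIGURE 5.10 uses B-splines instead.

```python
import numpy as np

def cubic_basis(x, knots):
    """Truncated-power cubic spline basis: 1, x, x^2, x^3, (x - knot)^3_+."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None)**3 for k in knots]
    return np.column_stack(cols)

def tensor_basis(H1, H2):
    """Row-wise products: column j * M2 + k holds h_1j(x_1) * h_2k(x_2)."""
    N = H1.shape[0]
    return (H1[:, :, None] * H2[:, None, :]).reshape(N, -1)

# Simulated data (an assumption of this sketch, not from the text).
rng = np.random.default_rng(0)
N = 400
X = rng.uniform(0.0, 1.0, size=(N, 2))
f_true = np.sin(2 * np.pi * X[:, 0]) * np.cos(np.pi * X[:, 1])
y = f_true + 0.1 * rng.standard_normal(N)

knots = [1 / 3, 2 / 3]
H1 = cubic_basis(X[:, 0], knots)   # N x M1
H2 = cubic_basis(X[:, 1], knots)   # N x M2
G = tensor_basis(H1, H2)           # N x (M1 * M2)

# Least squares fit of the M1 * M2 coefficients theta_jk.
theta, *_ = np.linalg.lstsq(G, y, rcond=None)
fitted = G @ theta
rmse = np.sqrt(np.mean((fitted - f_true)**2))
print(G.shape, rmse)
```

Any pair of univariate bases could be substituted for `cubic_basis`; only the row-wise product in `tensor_basis` is specific to the tensor product construction.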

### Beyond 2D

This can be generalized to $d$ dimensions, but note that the dimension of the basis grows exponentially fast -- yet another manifestation of the curse of dimensionality.
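
To make the growth concrete, here is a small count of tensor product basis functions when every coordinate gets the same number $M$ of univariate functions. The value $M = 6$ is an arbitrary choice for illustration.

```python
# Number of tensor-product basis functions when each of d coordinates
# gets M univariate basis functions (M = 6 is an arbitrary choice).
M = 6
sizes = {d: M**d for d in (1, 2, 3, 5, 10)}
for d, size in sizes.items():
    print(f"d = {d:2d}: {size:,} basis functions")
```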

The MARS procedure discussed in Chapter 9 is a greedy forward algorithm for including only those tensor products that are deemed necessary by least squares.

### Additive vs. tensor product splines

FIGURE 5.11 illustrates the difference between additive and tensor product (natural) splines on the simulated classification example from Chapter 2.

A logistic regression model

\begin{equation}
\text{logit}\left[ \text{Pr}(T|x) \right] = h(x)^T\theta
\end{equation}

is fit to the binary response, and the estimated decision boundary is the contour

\begin{equation}
h(x)^T \hat\theta = 0.
\end{equation}

The tensor product basis can achieve more flexibility at the decision boundary, but introduces some spurious structure along the way.
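
The comparison can be sketched on synthetic data. The example below is not a reproduction of FIGURE 5.11: it uses a hypothetical polynomial basis as a stand-in for natural splines, simulated labels driven by an interaction between the two coordinates, and a plain Newton-Raphson fit with a small ridge term added for numerical stability. The names `fit_logistic` and `powers` are introduced only for this sketch.

```python
import numpy as np

def fit_logistic(H, t, ridge=1e-4, n_iter=25):
    """Newton-Raphson (IRLS) for logistic regression with a small ridge term."""
    theta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        eta = np.clip(H @ theta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        W = p * (1.0 - p)
        grad = H.T @ (t - p) - ridge * theta
        hess = H.T @ (H * W[:, None]) + ridge * np.eye(H.shape[1])
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def powers(x, degree=2):
    """Hypothetical univariate basis 1, x, ..., x^degree (stand-in for splines)."""
    return np.column_stack([x**p for p in range(degree + 1)])

rng = np.random.default_rng(1)
N = 400
X = rng.uniform(-1.0, 1.0, size=(N, 2))
# Class probability depends on the sign of x1 * x2, an interaction effect.
prob = np.where(X[:, 0] * X[:, 1] > 0, 0.9, 0.1)
t = (rng.uniform(size=N) < prob).astype(float)

H1, H2 = powers(X[:, 0]), powers(X[:, 1])
H_add = np.column_stack([H1, H2[:, 1:]])                   # additive: 1 + 2 + 2 columns
H_ten = (H1[:, :, None] * H2[:, None, :]).reshape(N, -1)   # tensor product: 3 * 3 columns

acc = {}
for name, H in [("additive", H_add), ("tensor", H_ten)]:
    theta = fit_logistic(H, t)
    # The predicted class is the side of the boundary h(x)^T theta = 0.
    acc[name] = np.mean((H @ theta > 0) == (t == 1))
print(acc)
```

Because the simulated signal is a pure interaction, the additive basis cannot represent it, while the tensor product basis contains the cross term $x_1 x_2$. On data without interactions the extra columns would only add variance.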

In [1]:
"""FIGURE 5.11. The simulation example of FIGURE 2.1."""
print('Under construction ...')

Under construction ...


### Smoothing splines for higher dimension

One-dimensional smoothing splines (via regularization) generalize to higher dimensions as well.

Suppose we have pairs $(y_i, x_i)$ with $x_i\in\mathbb{R}^d$, and we seek a $d$-dimensional regression function $f(x)$. The idea is to set up the problem

\begin{equation}
\min_f \sum_{i=1}^N \left( y_i - f(x_i) \right)^2 + \lambda J[f],
\end{equation}

where $J$ is an appropriate penalty functional for stabilizing a function $f$ in $\mathbb{R}^d$. For example, a natural generalization of the one-dimensional roughness penalty for functions on $\mathbb{R}^2$ is

\begin{equation}
J[f] = \int\int_{\mathbb{R}^2} \left[ \left( \frac{\partial^2f(x)}{\partial x_1^2} \right)^2 + 2\left( \frac{\partial^2f(x)}{\partial x_1 \partial x_2} \right)^2 + \left( \frac{\partial^2f(x)}{\partial x_2^2} \right)^2\right] dx_1dx_2.
\end{equation}

Solving the above minimization problem with this penalty leads to a smooth two-dimensional surface, known as a thin-plate spline. It shares many properties with the one-dimensional cubic smoothing spline:
* as $\lambda\rightarrow 0$, the solution approaches an interpolating function (the one with smallest penalty $J$);
* as $\lambda\rightarrow\infty$, the solution approaches the least squares plane;
* for intermediate values of $\lambda$, the solution can be represented as a linear expansion of basis functions, whose coefficients are obtained by a form of generalized ridge regression.

The solution has the form

\begin{align}
f(x) &= \beta_0 + \beta^Tx + \sum_{j=1}^N \alpha_j h_j(x), \\
\text{where } h_j(x) &= \|x-x_j\|^2\log\|x-x_j\|.
\end{align}

These $h_j$ are examples of _radial basis functions_, which are discussed in more detail in the next section.

The coefficients are found by plugging this form into the above minimization problem, which reduces to a finite-dimensional penalized least squares problem. For the penalty to be finite, the coefficients $\alpha_j$ have to satisfy a set of linear constraints (see Exercise 5.14).
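
A minimal numerical sketch of this reduction for $x\in\mathbb{R}^2$ follows. It assumes the usual side conditions for thin-plate splines, $\sum_j \alpha_j = 0$ and $\sum_j \alpha_j x_j = 0$ (the text defers the constraints to Exercise 5.14, so treat this as an assumption). It solves the resulting linear system $(K + \lambda' I)\alpha + T\beta = y$, $T^T\alpha = 0$, where $K_{ij} = h_j(x_i)$ and $T$ has rows $(1, x_i^T)$. The argument `lam` corresponds to $\lambda$ only up to a constant factor that depends on how the kernel is normalized. The data and the function names are hypothetical.

```python
import numpy as np

def tps_kernel(A, B):
    """Matrix of h(r) = r^2 log r for r = ||a - b||, with h(0) = 0."""
    r = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    out = np.zeros_like(r)
    pos = r > 0
    out[pos] = r[pos]**2 * np.log(r[pos])
    return out

def fit_tps(X, y, lam):
    """Solve (K + lam I) alpha + T beta = y subject to T^T alpha = 0."""
    N, d = X.shape
    K = tps_kernel(X, X)
    T = np.column_stack([np.ones(N), X])
    A = np.block([[K + lam * np.eye(N), T],
                  [T.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([y, np.zeros(d + 1)])
    sol = np.linalg.solve(A, rhs)
    return sol[:N], sol[N:]          # alpha, (beta_0, beta)

def predict_tps(Xnew, X, alpha, beta):
    """Evaluate beta_0 + beta^T x + sum_j alpha_j h_j(x) at the rows of Xnew."""
    return beta[0] + Xnew @ beta[1:] + tps_kernel(Xnew, X) @ alpha

# Simulated data (an assumption of this sketch).
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(40, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]**2 + 0.05 * rng.standard_normal(40)

alpha0, beta0 = fit_tps(X, y, lam=0.0)   # no penalty: interpolation
alpha1, beta1 = fit_tps(X, y, lam=1e6)   # heavy penalty: nearly a plane
T = np.column_stack([np.ones(len(X)), X])
plane, *_ = np.linalg.lstsq(T, y, rcond=None)

print(np.abs(predict_tps(X, X, alpha0, beta0) - y).max())
print(beta1, plane)
```

The two fits reproduce the limiting behaviour listed above: with `lam=0.0` the surface passes through the data, and with a very large `lam` the coefficients $(\beta_0, \beta)$ agree closely with the least squares plane. The dense solve costs $O(N^3)$ operations.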


### Hybrid approaches for computational and conceptual simplicity

Unlike one-dimensional smoothing splines, the computational complexity for thin-plate splines is $O(N^3)$, since there is not in general any sparse structure that can be exploited. However, as with univariate smoothing splines, we can get away with substantially fewer than the $N$ knots prescribed by the previous solution.

In practice, it is usually sufficient to work with a lattice of knots covering the domain. The penalty is computed for the reduced expansion just as before. Using $K$ knots reduces the computations to $O(NK^2+K^3)$. FIGURE 5.12 shows the result of fitting a thin-plate spline to some heart disease risk factors, representing the surface as a contour plot. Note that $\lambda$ was specified via $\text{df}_\lambda = \text{trace}(S_\lambda) =15$.
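
The reduced expansion and the calibration of $\lambda$ through $\text{df}_\lambda$ can be sketched as follows. This is one possible implementation under stated assumptions, not the procedure behind FIGURE 5.12: the side constraints on the knot coefficients are absorbed by a null-space basis `Q`, the knots form a $6\times 6$ lattice on simulated inputs, and $\lambda$ is located by bisection on $\log_{10}\lambda$. The helper names are hypothetical, and `lam` matches $\lambda$ only up to a constant factor.

```python
import numpy as np

def tps_kernel(A, B):
    """Matrix of h(r) = r^2 log r for r = ||a - b||, with h(0) = 0."""
    r = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    out = np.zeros_like(r)
    pos = r > 0
    out[pos] = r[pos]**2 * np.log(r[pos])
    return out

def reduced_tps_design(X, knots):
    """Design matrix Z and penalty matrix P for an expansion on K knots in R^2."""
    N, K = X.shape[0], knots.shape[0]
    T = np.column_stack([np.ones(N), X])              # unpenalized linear part
    Tk = np.column_stack([np.ones(K), knots])
    Q = np.linalg.qr(Tk, mode="complete")[0][:, 3:]   # basis of {a : Tk^T a = 0}
    Z = np.column_stack([T, tps_kernel(X, knots) @ Q])
    P = np.zeros((K, K))
    P[3:, 3:] = Q.T @ tps_kernel(knots, knots) @ Q
    return Z, P

def effective_df(Z, P, lam):
    """trace(S_lam) for the smoother S_lam = Z (Z^T Z + lam P)^{-1} Z^T."""
    ZtZ = Z.T @ Z                                     # O(N K^2)
    return np.trace(np.linalg.solve(ZtZ + lam * P, ZtZ))   # O(K^3)

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(300, 2))
g = np.linspace(0.0, 1.0, 6)
knots = np.array([(a, b) for a in g for b in g])      # 6 x 6 lattice, K = 36
Z, P = reduced_tps_design(X, knots)

# df decreases as lam grows, so bisect on log10(lam) to hit a target df.
target, lo, hi = 15.0, -8.0, 8.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if effective_df(Z, P, 10.0**mid) > target:
        lo = mid
    else:
        hi = mid
lam = 10.0**(0.5 * (lo + hi))
df_at_lam = effective_df(Z, P, lam)
print(lam, df_at_lam)
```

The sketch keeps every lattice point as a knot. Restricting the lattice to the convex hull of the data would amount to filtering the rows of `knots` before calling `reduced_tps_design`.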

In [2]:
"""FIGURE 5.12. A thin-plate spline fit to the heart disease data, displayed as a contour plot.
Indicated are the locations of the input features, as well as the knots used in the fit.

The response is `systolic blood pressure`, modeled as a function of `age` and `obesity`.
Care should be taken to use knots from the lattice inside the convex hull of the data (red),
and ignore those outside (green).
"""
print('Under construction ...')

Under construction ...


### Generalized expansion and additive spline models

More generally one can represent $f\in\mathbb{R}^d$ as an expansion in any arbitrarily large collection of basis functions, and control the complexity by applying a regularizer. For example, we can construct a basis by forming the tensor products of all pairs of univariate smoothing-spline basis functions. This leads to an exponential growth in basis functions as the dimension increases, and typically we have to reduce the number of functions per coordinate accordingly.

The additive spline models (discussed in Chapter 9) are a restricted class of multidimensional splines. They can be represented in this general formulation as well; e.g., there exists a penalty $J[f]$ that guarantees that the solution has the form

\begin{equation}
f(X) = \alpha + f_1(X_1) + \cdots + f_d(X_d),
\end{equation}
  
and that each of the functions $f_j$ is a univariate spline. In this case the penalty is somewhat degenerate, and it is more natural to _assume_ that $f$ is additive, and then simply impose an additional penalty on each of the component functions:

\begin{align}
J[f] &= J(f_1 + f_2 + \cdots + f_d) \\
&= \sum_{j=1}^d \int f_j''(t_j)^2 dt_j.
\end{align}
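
A penalized regression-spline version of this additive fit can be sketched as below. It is a simplification under stated assumptions rather than a full smoothing spline: each $f_j$ uses a small truncated-power cubic basis with assumed knots, the integral in the penalty is restricted to $[0, 1]$ and approximated by the trapezoid rule, and a single $\lambda$ is shared by all components. The names `basis`, `basis_dd`, and `fit_additive` are hypothetical.

```python
import numpy as np

knots = np.linspace(0.2, 0.8, 4)   # assumed interior knots on [0, 1]

def basis(x):
    """Cubic truncated-power basis without intercept: x, x^2, x^3, (x - k)^3_+."""
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None)**3 for k in knots]
    return np.column_stack(cols)

def basis_dd(x):
    """Second derivatives of the columns of basis(x)."""
    cols = [np.zeros_like(x), np.full_like(x, 2.0), 6.0 * x]
    cols += [6.0 * np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

# Omega[a, b] approximates the integral over [0, 1] of b_a''(t) b_b''(t) dt.
grid = np.linspace(0.0, 1.0, 2001)
w = np.full(grid.size, grid[1] - grid[0])
w[0] *= 0.5
w[-1] *= 0.5
D2 = basis_dd(grid)
Omega = D2.T @ (w[:, None] * D2)

def fit_additive(X, y, lam):
    """Fitted values of f(X) = alpha + f_1(X_1) + ... + f_d(X_d)."""
    N, d = X.shape
    m = Omega.shape[0]
    B = np.column_stack([np.ones(N)] + [basis(X[:, j]) for j in range(d)])
    P = np.zeros((1 + d * m, 1 + d * m))
    for j in range(d):               # block diagonal: one roughness penalty per f_j
        s = 1 + j * m
        P[s:s + m, s:s + m] = Omega
    coef = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
    return B @ coef

# Simulated additive data (an assumption of this sketch).
rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, size=(300, 2))
f_true = np.sin(2 * np.pi * X[:, 0]) + (X[:, 1] - 0.5)**2
y = f_true + 0.1 * rng.standard_normal(300)

fit_light = fit_additive(X, y, lam=1e-4)
fit_heavy = fit_additive(X, y, lam=1e6)
L = np.column_stack([np.ones(300), X])
fit_linear = L @ np.linalg.lstsq(L, y, rcond=None)[0]
rmse_light = np.sqrt(np.mean((fit_light - f_true)**2))
print(rmse_light, np.abs(fit_heavy - fit_linear).max())
```

The penalty matrix is block diagonal, so no term couples $f_j$ with $f_k$. With a very large `lam` every $f_j''$ is driven to zero and the fit collapses to the linear least squares fit.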

### ANOVA spline decomposition for additive spline models

These are naturally extended to ANOVA spline decompositions,

\begin{equation}
f(X) = \alpha + \sum_j f_j(X_j) + \sum_{j<k} f_{jk}(X_j,X_k) + \cdots,
\end{equation}

where each of the components is a spline of the required dimension.

There are many choices to be made:
* The maximum order of interaction -- we have shown up to order 2 above.
* Which terms to include -- not all main effects and interactions are necessarily needed.
* What representation to use -- some choices are:
  * regression splines with a relatively small number of basis functions per coordinate, and their tensor products for interactions;
  * a complete basis as in smoothing splines, with appropriate regularizers for each term in the expansion.
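
The first two choices (interaction order and which terms to include) can be illustrated with a design matrix built from main effects plus a selected list of pairwise tensor products. This is a sketch under assumptions: the small polynomial basis `cubic` stands in for a regression spline basis, the data are simulated, and `anova_design` is a hypothetical helper. Coordinates are indexed from 0 in the code, so the pair `(1, 2)` refers to $X_2$ and $X_3$.

```python
import numpy as np
from itertools import combinations

def anova_design(X, basis, interactions):
    """Columns for alpha, every main effect f_j, and the listed pairs f_jk."""
    N, d = X.shape
    H = [basis(X[:, j]) for j in range(d)]          # per-coordinate bases, no intercept
    blocks = [np.ones((N, 1))] + H
    for j, k in interactions:
        blocks.append((H[j][:, :, None] * H[k][:, None, :]).reshape(N, -1))
    return np.column_stack(blocks)

def cubic(x):
    """Hypothetical small basis x, x^2, x^3 (a regression spline basis would also work)."""
    return np.column_stack([x, x**2, x**3])

rng = np.random.default_rng(5)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = X[:, 0]**2 + X[:, 1] * X[:, 2] + 0.1 * rng.standard_normal(500)

designs = {
    "main effects": anova_design(X, cubic, interactions=[]),
    "all pairs": anova_design(X, cubic, interactions=list(combinations(range(3), 2))),
    "pair (1, 2) only": anova_design(X, cubic, interactions=[(1, 2)]),
}
rss = {}
for name, D in designs.items():
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    rss[name] = np.sum((y - D @ coef)**2)
    print(f"{name}: {D.shape[1]} columns, RSS = {rss[name]:.2f}")
```

Including only the one interaction that matters gives nearly the same residual sum of squares as including all pairs, with about half the columns.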
  
In many cases when the number of potential dimensions (features) is large, automatic methods are more desirable. The MARS and MART procedures (Chapters 9 and 10, respectively) both fall into this category.