# $\S$ 5.6. Nonparametric Logistic Regression

The smoothing spline problem in $\S$ 5.4,

\begin{equation}
\text{RSS}(f, \lambda) = \sum_{i=1}^N \left( y_i - f(x_i) \right)^2 + \lambda\int \left( f''(t) \right)^2 dt,
\end{equation}

is posed in a regression setting. It is typically straightforward to transfer this technology to other domains. Here we consider logistic regression with a single quantitative input $X$. Then the model is

\begin{equation}
\log \frac{\text{Pr}(Y=1|X=x)}{\text{Pr}(Y=0|X=x)} = f(x),
\end{equation}

which implies

\begin{equation}
\text{Pr}(Y=1|X=x) = \frac{e^{f(x)}}{1+e^{f(x)}}.
\end{equation}

Fitting $f(x)$ in a smooth fashion leads to a smooth estimate of the conditional probability $\text{Pr}(Y=1|x)$, which can be used for classification or risk scoring.
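The inversion from log-odds to probability can be checked numerically. The following is a minimal sketch; the helper names `log_odds_to_prob` and `prob_to_log_odds` are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_odds_to_prob(f):
    """Map log-odds f(x) to Pr(Y=1|X=x) = e^f / (1 + e^f)."""
    f = np.asarray(f, dtype=float)
    # Equivalent to e^f / (1 + e^f), written as exp(f - log(1 + e^f))
    # so that large |f| does not overflow.
    return np.exp(f - np.logaddexp(0.0, f))

def prob_to_log_odds(p):
    """Map a probability p in (0, 1) back to log(p / (1 - p))."""
    p = np.asarray(p, dtype=float)
    return np.log(p) - np.log1p(-p)

f = np.linspace(-5.0, 5.0, 11)   # a grid of log-odds values
p = log_odds_to_prob(f)          # probabilities, increasing in f
f_back = prob_to_log_odds(p)     # round trip recovers f up to rounding
```

Because the map is monotone, any smooth estimate of $f$ yields an equally smooth estimate of the probability.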

### Penalized maximum likelihood

We construct the penalized log-likelihood criterion

\begin{align}
l(f;\lambda) &= \sum_{i=1}^N \left[ y_i\log{p(x_i)} + (1-y_i)\log{(1-p(x_i))} \right] - \frac{\lambda}{2} \int \left( f''(t) \right)^2 dt \\
&= \sum_{i=1}^N \left[ y_i f(x_i) - \log{(1+e^{f(x_i)})} \right] - \frac{\lambda}{2} \int \left( f''(t) \right)^2 dt,
\end{align}

where $p(x) = \text{Pr}(Y=1|x)$. The first term is the log-likelihood based on the binomial distribution (cf. Chapter 4, page 120).
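The second line of the criterion follows from $\log p = f - \log(1+e^{f})$ and $\log(1-p) = -\log(1+e^{f})$. As a quick numerical check of the log-likelihood term only (the penalty is left out), here is a small sketch on simulated values; the function names and the random data are assumptions made for illustration.

```python
import numpy as np

def loglik_from_probs(y, f):
    """First form: sum of y*log(p) + (1-y)*log(1-p), with p = e^f / (1 + e^f)."""
    p = 1.0 / (1.0 + np.exp(-f))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def loglik_from_logits(y, f):
    """Second form: sum of y*f - log(1 + e^f), evaluated stably."""
    return np.sum(y * f - np.logaddexp(0.0, f))

rng = np.random.default_rng(0)
f = 2.0 * rng.normal(size=50)                      # arbitrary log-odds values
y = rng.integers(0, 2, size=50).astype(float)      # arbitrary 0/1 labels

ll_probs = loglik_from_probs(y, f)
ll_logits = loglik_from_logits(y, f)               # agrees with ll_probs up to rounding
```

The second form is usually preferable in code because it never takes the logarithm of a probability that has rounded to 0 or 1.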

### Iterative procedure using Newton-Raphson, again

Arguments similar to those used in $\S$ 5.4 show that the optimal $f$ is a finite-dimensional natural spline with knots at the unique values of $x$. This means that we can represent

\begin{equation}
f(x) = \sum_{j=1}^N N_j(x) \theta_j.
\end{equation}

We compute the first and second derivatives

\begin{align}
\frac{\partial l(\theta)}{\partial\theta} &= \mathbf{N}^T(\mathbf{y}-\mathbf{p}) - \lambda\mathbf{\Omega}\theta, \\
\frac{\partial^2 l(\theta)}{\partial\theta\partial\theta^T} &= -\mathbf{N}^T\mathbf{WN} - \lambda\mathbf{\Omega},
\end{align}

where
* $\mathbf{p}$ is the $N$-vector with elements $p(x_i)$,
* $\mathbf{W}$ is a diagonal matrix of weights $p(x_i)(1-p(x_i))$.
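These derivative formulas can be sanity-checked against finite differences. The sketch below uses a random matrix as a stand-in for the basis matrix $\mathbf{N}$ and a random symmetric positive semidefinite matrix as a stand-in for $\mathbf{\Omega}$; neither is an actual natural-spline quantity, and the formulas hold for any such pair.

```python
import numpy as np

def penalized_loglik(theta, N, y, lam, Omega):
    f = N @ theta
    return np.sum(y * f - np.logaddexp(0.0, f)) - 0.5 * lam * theta @ Omega @ theta

def gradient(theta, N, y, lam, Omega):
    p = 1.0 / (1.0 + np.exp(-(N @ theta)))
    return N.T @ (y - p) - lam * Omega @ theta

def hessian(theta, N, lam, Omega):
    p = 1.0 / (1.0 + np.exp(-(N @ theta)))
    w = p * (1.0 - p)                      # diagonal of W
    return -(N.T * w) @ N - lam * Omega    # (N.T * w) is N^T W

# Stand-in inputs (assumptions, not real spline quantities).
rng = np.random.default_rng(0)
n_obs, n_basis = 30, 5
N = rng.normal(size=(n_obs, n_basis))
A = rng.normal(size=(n_basis, n_basis))
Omega = A.T @ A                            # symmetric positive semidefinite
y = rng.integers(0, 2, size=n_obs).astype(float)
theta = 0.5 * rng.normal(size=n_basis)
lam = 0.7

# Central finite differences along each coordinate direction.
eps = 1e-6
eye = np.eye(n_basis)
num_grad = np.array([
    (penalized_loglik(theta + eps * e, N, y, lam, Omega)
     - penalized_loglik(theta - eps * e, N, y, lam, Omega)) / (2 * eps)
    for e in eye
])
num_hess = np.array([
    (gradient(theta + eps * e, N, y, lam, Omega)
     - gradient(theta - eps * e, N, y, lam, Omega)) / (2 * eps)
    for e in eye
])
```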

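The first derivative is nonlinear in $\theta$, so we need an iterative algorithm as in $\S$ 4.4.1. Using Newton-Raphson as for linear logistic regression, and writing $\mathbf{z} = \mathbf{N}\theta^{\text{old}} + \mathbf{W}^{-1}(\mathbf{y}-\mathbf{p})$ for the working response, the update equation can be written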

\begin{align}
\theta^{\text{new}} &= \left( \mathbf{N}^T\mathbf{WN} + \lambda\mathbf{\Omega} \right)^{-1} \mathbf{N}^T\mathbf{W} \left( \mathbf{N}\theta^{\text{old}} + \mathbf{W}^{-1}(\mathbf{y}-\mathbf{p}) \right) \\
&= \left( \mathbf{N}^T\mathbf{WN} + \lambda\mathbf{\Omega} \right)^{-1} \mathbf{N}^T\mathbf{Wz}.
\end{align}
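
For completeness, this form follows from the generic Newton-Raphson step by substituting the two derivatives above, with $\mathbf{p}$ and $\mathbf{W}$ evaluated at $\theta^{\text{old}}$:

\begin{align}
\theta^{\text{new}} &= \theta^{\text{old}} - \left( \frac{\partial^2 l(\theta)}{\partial\theta\partial\theta^T} \right)^{-1} \frac{\partial l(\theta)}{\partial\theta} \\
&= \theta^{\text{old}} + \left( \mathbf{N}^T\mathbf{WN} + \lambda\mathbf{\Omega} \right)^{-1} \left( \mathbf{N}^T(\mathbf{y}-\mathbf{p}) - \lambda\mathbf{\Omega}\theta^{\text{old}} \right) \\
&= \left( \mathbf{N}^T\mathbf{WN} + \lambda\mathbf{\Omega} \right)^{-1} \left( \mathbf{N}^T\mathbf{WN}\theta^{\text{old}} + \mathbf{N}^T(\mathbf{y}-\mathbf{p}) \right).
\end{align}

The $\lambda\mathbf{\Omega}\theta^{\text{old}}$ terms cancel in the last step, and factoring $\mathbf{N}^T\mathbf{W}$ out of the remaining bracket gives the expression above.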

We can also express this update in terms of the fitted values

\begin{align}
\mathbf{f}^{\text{new}} &= \mathbf{N} \left( \mathbf{N}^T\mathbf{WN} + \lambda\mathbf{\Omega} \right)^{-1} \mathbf{N}^T\mathbf{W} \left( \mathbf{f}^{\text{old}} + \mathbf{W}^{-1}(\mathbf{y}-\mathbf{p}) \right) \\
&= \mathbf{S}_{\lambda,\omega}\mathbf{z}.
\end{align}
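
The iteration can be sketched in a few lines. To keep the example short, it does not build a natural-spline basis: it assumes equally spaced inputs, takes $\mathbf{N} = \mathbf{I}$ (one parameter per fitted value), and uses a scaled second-difference matrix as a discrete stand-in for $\mathbf{\Omega}$. The function names, the simulated data, and the value of `lam` are all assumptions made for illustration.

```python
import numpy as np

def second_difference_penalty(n, h):
    """Discrete stand-in for Omega on an equally spaced grid with spacing h.

    Approximates the integral of f''(t)^2 by sum(((f[i-1] - 2 f[i] + f[i+1]) / h^2)^2) * h.
    This is an assumption for illustration, not the exact natural-spline penalty.
    """
    D = np.diff(np.eye(n), n=2, axis=0)    # (n-2, n) second-difference operator
    return D.T @ D / h**3

def fit_penalized_logistic(N, y, Omega, lam, n_iter=100, tol=1e-8):
    """Newton-Raphson, written as repeated weighted penalized least squares."""
    theta = np.zeros(N.shape[1])
    for _ in range(n_iter):
        f = N @ theta
        p = 1.0 / (1.0 + np.exp(-f))
        w = np.clip(p * (1.0 - p), 1e-10, None)   # diagonal of W, guarded against 0
        z = f + (y - p) / w                        # working response
        NtW = N.T * w                              # N^T W
        theta_new = np.linalg.solve(NtW @ N + lam * Omega, NtW @ z)
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta

# Simulated data (assumed for the example).
rng = np.random.default_rng(0)
n = 100
x = np.linspace(-3.0, 3.0, n)
true_f = 1.5 * np.sin(2.0 * x)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_f))).astype(float)

N = np.eye(n)
Omega = second_difference_penalty(n, h=x[1] - x[0])
lam = 0.1
theta_hat = fit_penalized_logistic(N, y, Omega, lam)
p_hat = 1.0 / (1.0 + np.exp(-(N @ theta_hat)))

# The penalized score should vanish at the solution.
score = N.T @ (y - p_hat) - lam * Omega @ theta_hat
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly. No step-size control is included, so this sketch relies on the plain Newton step behaving well for the chosen `lam`.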

### Comparison with regression

Referring back to the regression solution of the smoothing spline problem in $\S$ 5.4,

\begin{align}
\hat\theta &= \left( \mathbf{N}^T\mathbf{N} + \lambda\mathbf{\Omega}_N \right)^{-1} \mathbf{N}^T \mathbf{y} \\
\hat{\mathbf{f}} &= \mathbf{N} \left( \mathbf{N}^T\mathbf{N} + \lambda\mathbf{\Omega}_N \right)^{-1} \mathbf{N}^T \mathbf{y} \\
&= \mathbf{S}_\lambda \mathbf{y},
\end{align}

we see that the update fits a weighted smoothing spline to the working response $\mathbf{z}$ (Exercise 5.12).
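
A sketch of why this holds, treating $\mathbf{W}$ and $\mathbf{z}$ as fixed at their current values: the update is the minimizer over $\theta$ of the weighted penalized least squares criterion

\begin{equation}
(\mathbf{z}-\mathbf{N}\theta)^T \mathbf{W} (\mathbf{z}-\mathbf{N}\theta) + \lambda\theta^T\mathbf{\Omega}\theta,
\end{equation}

since setting its derivative with respect to $\theta$ to zero gives

\begin{equation}
\left( \mathbf{N}^T\mathbf{WN} + \lambda\mathbf{\Omega} \right)\theta = \mathbf{N}^T\mathbf{Wz}.
\end{equation}

With $\mathbf{W} = \mathbf{I}$ and $\mathbf{z} = \mathbf{y}$ this reduces to the regression solution above.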

The form of $\mathbf{f}^{\text{new}}$ is suggestive. It is tempting to replace $\mathbf{S}_{\lambda,\omega}$ by any nonparametric (weighted) regression operator, and obtain general families of nonparametric logistic regression models.
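
As an illustration of that idea, the sketch below swaps the weighted smoothing spline for a weighted Gaussian-kernel average (a Nadaraya-Watson smoother) inside the same iteration. The smoother, the bandwidth, the fixed iteration count, and the simulated data are all assumptions chosen for the example, and no convergence guarantee is claimed.

```python
import numpy as np

def weighted_kernel_smoother(x, z, w, bandwidth):
    """Weighted Gaussian-kernel average of z, evaluated at each x[i]."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    KW = K * w[None, :]                    # kernel weight times observation weight
    return (KW @ z) / KW.sum(axis=1)

def local_scoring(x, y, smoother, n_iter=10):
    """Repeatedly smooth the working response with a weighted smoother."""
    f = np.zeros_like(x, dtype=float)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-f))
        w = np.clip(p * (1.0 - p), 1e-6, None)    # guard against zero weights
        z = f + (y - p) / w                        # working response
        f = smoother(x, z, w)                      # plays the role of S z
    return f

# Simulated data (assumed for the example): true log-odds f(x) = 2x.
rng = np.random.default_rng(1)
n = 200
x = np.sort(rng.uniform(-2.0, 2.0, size=n))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * x))).astype(float)

f_hat = local_scoring(
    x, y, lambda x_, z_, w_: weighted_kernel_smoother(x_, z_, w_, bandwidth=0.5)
)
p_hat = 1.0 / (1.0 + np.exp(-f_hat))
```

Any smoother with the signature `smoother(x, z, w)` can be passed in, which is the sense in which the operator is interchangeable.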

Although here $x$ is one-dimensional, this procedure generalizes naturally to higher-dimensional $x$. These extensions are at the heart of _generalized additive models_, which we pursue in Chapter 9.