## 2.3.2 Model Calibration via Backfitting

MGWR model calibration accommodates the selection and incorporation of relationship-specific bandwidths (i.e., multiple scales) by reformulating the GWR as a generalized additive model (GAM)

$$
y = f_{bw_0} + f_{bw_1}(\mathbf{X}_1) + f_{bw_2}(\mathbf{X}_2) + \dots + f_{bw_k}(\mathbf{X}_k) + \varepsilon \tag{2.31}
$$

where $f_{bw_k}(X_k) = \beta_k X_k$ is a smoothing function (i.e., the data-borrowing scheme) applied to the $k^{th}$ explanatory variable $X_k$. Here the local parameters $\beta_k$ are estimated with the covariate-specific bandwidth $bw_k$, and the product is taken location by location. To estimate each smoothing function $f_{bw_k}$ and calculate the parameters $\beta_k$, equation (2.31) can be rearranged. Because $\hat{\varepsilon}$ is independent of $X_k$, this gives the following component-wise conditional expectation (Li & Fotheringham, 2020)

$$
f_{bw_k} = E\left[Y - \sum_{p \neq k} f_{bw_p} - \hat{\varepsilon} \mid X_k\right] = E\left[Y - \sum_{p \neq k} f_{bw_p} \mid X_k\right] = A_k \left[Y - \sum_{p \neq k} f_{bw_p}\right] \tag{2.32}
$$

where $A_k$ is the $n \times n$ hat matrix from a univariate GWR model of $X_k$ that maps $Y - \sum_{p \neq k} f_{bw_p}$ to $f_{bw_k}$. This GWR hat matrix $A_k$ is expressed as

$$
A_k = \left[
\begin{array}{c}
x_{1k}(X_k^{\top} W_{k,1} X_k)^{-1} X_k^{\top} W_{k,1} \\
x_{2k}(X_k^{\top} W_{k,2} X_k)^{-1} X_k^{\top} W_{k,2} \\
\vdots \\
x_{nk}(X_k^{\top} W_{k,n} X_k)^{-1} X_k^{\top} W_{k,n}
\end{array}
\right]_{n \times n} \tag{2.33}
$$

where $x_{ik}$ is the value of $X_k$ at location $i$ and $W_{k,i}$ is the $n \times n$ diagonal spatial weight matrix for location $i$, calculated from the covariate-specific bandwidth $bw_k$ and a kernel function (e.g., bi-square or Gaussian). Putting all the additive components in equation (2.32) into matrix form gives the normal system of equations

$$
\left[
\begin{array}{cccc}
I & A_1 & \cdots & A_1 \\
A_2 & I & \cdots & A_2 \\
\vdots & \vdots & \ddots & \vdots \\
A_k & A_k & \cdots & I
\end{array}
\right]
\left[
\begin{array}{c}
f_{bw_1} \\
f_{bw_2} \\
\vdots \\
f_{bw_k}
\end{array}
\right]
=
\left[
\begin{array}{c}
A_1 \\
A_2 \\
\vdots \\
A_k
\end{array}
\right] Y \tag{2.34}
$$
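To make equations (2.33) and (2.34) concrete, the following NumPy sketch builds each $A_k$ row by row and assembles the block matrices $P$ and $Q$. It is illustrative only: it assumes a fixed Gaussian kernel with a known bandwidth per covariate and uses synthetic data, and the function names `gwr_hat` and `assemble_normal_system` are hypothetical rather than part of any MGWR library.

```python
import numpy as np

def gwr_hat(x, coords, bw):
    """Hat matrix A_k of a univariate GWR on covariate x (eq. 2.33).

    Assumes a fixed Gaussian kernel w_ij = exp(-0.5 * (d_ij / bw)**2).
    Row i equals x_i * (x^T W_i x)^{-1} * x^T W_i.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.exp(-0.5 * (d / bw) ** 2)   # w[i, j] = j-th diagonal entry of W_{k,i}
    denom = w @ (x ** 2)               # x^T W_{k,i} x for every location i
    return (x / denom)[:, None] * (w * x[None, :])

def assemble_normal_system(X, coords, bws):
    """Block matrices P (nk x nk) and Q (nk x n) of eqs. (2.34)-(2.35)."""
    n, k = X.shape
    A = [gwr_hat(X[:, j], coords, bws[j]) for j in range(k)]
    # Row block r: identity on the diagonal, A_r in every other column block.
    P = np.block([[np.eye(n) if r == c else A[r] for c in range(k)]
                  for r in range(k)])
    Q = np.vstack(A)                   # [A_1; A_2; ...; A_k]
    return P, Q, A

# Synthetic example: intercept plus one covariate at random locations.
rng = np.random.default_rng(0)
n = 30
coords = rng.uniform(size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
P, Q, A = assemble_normal_system(X, coords, bws=[0.5, 1.0])
print(P.shape, Q.shape)  # (60, 60) (60, 30)
```

Two quick sanity checks follow from equation (2.33): a univariate GWR smoother reproduces its own covariate exactly ($A_k X_k = X_k$), and for the intercept column each row of $A_k$ is a set of kernel weights that sums to one.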

Equation (2.34) can be written in abbreviated form as

$$
P f = Q Y \tag{2.35}
$$

where $P$ is an $nk \times nk$ matrix and $Q$ is an $nk \times n$ matrix. Rearranging gives the additive components $f_{bw_1}, \cdots, f_{bw_k}$ as

$$
f = \left[
\begin{array}{c}
f_{bw_1} \\
f_{bw_2} \\
\vdots \\
f_{bw_k}
\end{array}
\right] = P^{-1} Q Y \tag{2.36}
$$

provided that $P$ is invertible. The covariate-specific hat matrices $R_k$, each mapping $Y$ to the corresponding $f_{bw_k}$, are then obtained as

$$
R = \begin{bmatrix}
R_1 \\
R_2 \\
\vdots \\
R_k
\end{bmatrix} = P^{-1} Q \tag{2.37}
$$
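For small problems, equations (2.36) and (2.37) can be evaluated directly. The sketch below again assumes a fixed Gaussian kernel and synthetic data (`gwr_hat` and `mgwr_closed_form` are hypothetical helper names). It solves $P R = Q$ with a linear solver rather than forming $P^{-1}$ explicitly, splits $R$ into the blocks $R_k$, and recovers the local coefficients as $\hat{\beta}_k = \hat{f}_{bw_k} / X_k$ wherever $X_k$ is non-zero.

```python
import numpy as np

def gwr_hat(x, coords, bw):
    """Univariate GWR hat matrix A_k (eq. 2.33), fixed Gaussian kernel assumed."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.exp(-0.5 * (d / bw) ** 2)
    return (x / (w @ (x ** 2)))[:, None] * (w * x[None, :])

def mgwr_closed_form(X, y, coords, bws):
    """Solve P f = Q y (eq. 2.35) directly; feasible only for small n * k."""
    n, k = X.shape
    A = [gwr_hat(X[:, j], coords, bws[j]) for j in range(k)]
    P = np.block([[np.eye(n) if r == c else A[r] for c in range(k)]
                  for r in range(k)])
    Q = np.vstack(A)
    R = np.linalg.solve(P, Q)                     # R = P^{-1} Q, eq. (2.37)
    R_blocks = [R[j * n:(j + 1) * n] for j in range(k)]
    f = np.column_stack([Rj @ y for Rj in R_blocks])  # column j is f_{bw_j}
    return f, R_blocks, A

rng = np.random.default_rng(0)
n = 40
coords = rng.uniform(size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=n)

f, R_blocks, A = mgwr_closed_form(X, y, coords, bws=[0.5, 0.8])
beta_hat = f / X   # local coefficients; valid only where X is non-zero
```

Each column of `f` satisfies the component-wise condition of equation (2.32), $f_{bw_k} = A_k \left[Y - \sum_{p \neq k} f_{bw_p}\right]$, which is a convenient check on any implementation.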

This normal system of equations (2.35) provides a closed-form solution for the MGWR parameter estimates and the associated inference. From a computational perspective, however, directly solving equation (2.35) is often prohibitive even for moderately sized datasets, because $P$ is an $nk \times nk$ matrix. The solution is to use the iterative backfitting procedure originally formulated by Hastie and Tibshirani (1986) and Buja et al. (1989), which converges to the solution obtained by directly solving equation (2.35). The backfitting routine also derives a set of data-driven bandwidths for the $k$ processes being modeled. These bandwidths are optimal in the sense that each is chosen to optimize model fit while balancing the bias-variance trade-off for its individual model component. Each model component is therefore a univariate GWR model, and each iteration of the backfitting procedure entails a GWR calibration for every component, based on a subset of the steps outlined in Algorithm 2.1.
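A rough count illustrates the cost of the direct solution. For a hypothetical dataset with $n = 5{,}000$ observations and $k = 5$ covariates, and assuming $P$ is stored densely in double precision (8 bytes per entry):

$$
\begin{aligned}
nk &= 5{,}000 \times 5 = 25{,}000 \\
\text{entries of } P &= (nk)^2 = 6.25 \times 10^{8} \\
\text{memory for } P &\approx 6.25 \times 10^{8} \times 8 \ \text{bytes} = 5 \times 10^{9} \ \text{bytes} \approx 5 \ \text{GB} \\
\text{dense solve} &\sim (nk)^3 \approx 1.6 \times 10^{13} \ \text{floating-point operations}
\end{aligned}
$$

By contrast, each backfitting step works with univariate GWR fits on the $n$ observations rather than with a single $nk \times nk$ system.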

The MGWR backfitting calibration, which is outlined in Algorithm 2.2, begins by setting initial values for the local parameter estimates $\hat{\beta}_{init}$. These could be zero ($\hat{\beta}_{init} = 0 \; \forall i$), the global parameter estimates from a traditional regression ($\hat{\beta}_{init} = \hat{\beta}_{GLM}$), or the local parameter estimates from a GWR model including all of the variables ($\hat{\beta}_{init} = \hat{\beta}_{GWR}$). The closer the initial local parameter estimates are to the eventual converged values, the fewer iterations are needed and the faster the algorithm runs. Fotheringham et al. (2017) demonstrate that initial values from GWR are typically superior in this regard.
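The three initialization options can be sketched as follows. `initial_estimates` is a hypothetical helper; its GWR option assumes a fixed Gaussian kernel with a single, already-chosen bandwidth `bw`, whereas in practice that bandwidth would be selected from the data as in Algorithm 2.1. The function also returns the initial smooths $\hat{f}_k = \hat{\beta}_k X_k$, predictions $\hat{y}$, and residuals $\hat{\varepsilon}$ that seed the backfitting loop.

```python
import numpy as np

def initial_estimates(X, y, coords=None, method="gwr", bw=None):
    """Initial local estimates beta_init (n x k) plus f_hat, y_hat and residuals.

    method: "zero", "ols" (global regression) or "gwr" (all covariates,
    one bandwidth bw, fixed Gaussian kernel; an assumption for illustration).
    """
    n, k = X.shape
    if method == "zero":
        beta = np.zeros((n, k))
    elif method == "ols":
        beta_glob = np.linalg.lstsq(X, y, rcond=None)[0]
        beta = np.tile(beta_glob, (n, 1))             # same estimate at every i
    elif method == "gwr":
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
        w = np.exp(-0.5 * (d / bw) ** 2)              # w[i, j]: weight of obs j at i
        XtWX = np.einsum("ij,jk,jl->ikl", w, X, X)    # X^T W_i X for each i
        XtWy = (w * y[None, :]) @ X                   # X^T W_i y for each i
        beta = np.linalg.solve(XtWX, XtWy[..., None])[..., 0]
    else:
        raise ValueError(f"unknown method: {method}")
    f_hat = beta * X                                  # column k holds f_k = beta_k * X_k
    y_hat = f_hat.sum(axis=1)
    resid = y - y_hat
    return beta, f_hat, y_hat, resid

rng = np.random.default_rng(1)
n = 50
coords = rng.uniform(size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.0]) + rng.normal(scale=0.2, size=n)

beta_ols, _, _, _ = initial_estimates(X, y, method="ols")
beta_gwr, f_hat, y_hat, resid = initial_estimates(X, y, coords, method="gwr", bw=0.3)
```

With a very large bandwidth every kernel weight approaches one, so the GWR start collapses to the global OLS start; a small bandwidth lets the initial estimates already vary over space.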

These initial values for the local parameter estimates are then used to obtain initial values for the predicted values $\hat{y}$ and residuals $\hat{\varepsilon}$. During each iteration of the backfitting procedure, for each model term $k$, the partial residual $\hat{f}_k + \hat{\varepsilon}$ (that is, $y - \sum_{p \neq k} \hat{f}_p$, consistent with equation (2.32)) is regressed on $X_k$ using GWR. This provides a temporary value for the optimal bandwidth $bw_k$ associated with the relationship between $y$ and $X_k$, together with updated values for the local parameter estimates $\hat{\beta}_k$, the smooth $\hat{f}_k$, and the model residuals $\hat{\varepsilon}$ (based on the other updated values). At the end of each iteration, the current values of $\hat{y}$ and $\hat{\varepsilon}$ are compared with those from the previous iteration to assess whether the differences are small enough to indicate that the backfitting algorithm has converged. Convergence can be assessed with a score-of-change (SOC) criterion, which can be based on the MGWR model RSS

$$
SOC_{RSS} = \frac{\left| RSS_{new} - RSS_{old} \right|}{RSS_{new}} \tag{2.38}
$$

or based directly on the individual GWR-style smoothing functions

$$
SOC_f = \frac{\sum_k \sum_i (\hat{f}_{ik}^{new} - \hat{f}_{ik}^{old})^2}{\sum_k \sum_i (\hat{f}_{ik}^{new})^2} \tag{2.39}
$$

Though both $SOC_{RSS}$ and $SOC_f$ are scale-free, the $SOC_f$ is advantageous because it focuses on the changes of each model component rather than the overall model fit and is therefore the suggested criterion to use. Convergence is achieved once the $SOC$ becomes smaller than some threshold value $\eta$, which is typically set to $10^{-5}$.
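The loop below is a minimal sketch of the backfitting update with the $SOC_f$ stopping rule of equation (2.39). It makes a simplifying assumption: the bandwidths are held fixed instead of being re-selected for each term inside the loop as Algorithm 2.2 does, and a fixed Gaussian kernel is used. Under that assumption the loop is a block Gauss-Seidel iteration on equation (2.35), so, when it converges, it reaches the same $f$ as the direct solution, which the example checks. The names `backfit`, `closed_form`, and `soc_f` are hypothetical.

```python
import numpy as np

def gwr_hat(x, coords, bw):
    """Univariate GWR hat matrix A_k (eq. 2.33), fixed Gaussian kernel assumed."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.exp(-0.5 * (d / bw) ** 2)
    return (x / (w @ (x ** 2)))[:, None] * (w * x[None, :])

def soc_f(f_new, f_old):
    """Score of change based on the smoothing functions, eq. (2.39)."""
    return np.sum((f_new - f_old) ** 2) / np.sum(f_new ** 2)

def backfit(X, y, coords, bws, tol=1e-5, max_iter=1000):
    """Backfitting with FIXED bandwidths bws (no bandwidth re-selection)."""
    n, k = X.shape
    A = [gwr_hat(X[:, j], coords, bws[j]) for j in range(k)]
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # global OLS start
    f = X * beta0                                  # column j holds f_j = beta_j * X_j
    resid = y - f.sum(axis=1)
    for it in range(1, max_iter + 1):
        f_old = f.copy()
        for j in range(k):
            partial = f[:, j] + resid              # y - sum_{p != j} f_p
            f[:, j] = A[j] @ partial               # smooth the partial residual
            resid = partial - f[:, j]              # residuals given the new f_j
        soc = soc_f(f, f_old)
        if soc < tol:
            return f, it, soc
    raise RuntimeError("backfitting did not converge within max_iter")

def closed_form(X, y, coords, bws):
    """Direct solution f = P^{-1} Q y of eq. (2.36), for comparison only."""
    n, k = X.shape
    A = [gwr_hat(X[:, j], coords, bws[j]) for j in range(k)]
    P = np.block([[np.eye(n) if r == c else A[r] for c in range(k)]
                  for r in range(k)])
    return np.linalg.solve(P, np.vstack(A) @ y).reshape(k, n).T

rng = np.random.default_rng(42)
n = 60
coords = rng.uniform(size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.column_stack([1 + coords[:, 0], 2 - coords[:, 1]])
y = (X * beta_true).sum(axis=1) + rng.normal(scale=0.1, size=n)

f_bf, n_iter, soc = backfit(X, y, coords, bws=[0.5, 0.8], tol=1e-18)
f_cf = closed_form(X, y, coords, bws=[0.5, 0.8])
print(n_iter, np.max(np.abs(f_bf - f_cf)))
```

The very small `tol` is used here only to make the agreement with the direct solution visible; the threshold $\eta = 10^{-5}$ mentioned above is the usual practical choice.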

Following Yu et al. (2020b), it is also possible to compute a covariate-specific hat matrix $R_k$ that maps the dependent variable $y$ to each of the estimated model components $\hat{f}_k$ in the backfitting routine such that

$$
\hat{f}_k = R_k y \tag{2.40}
$$

where $R_k$ is computed as

$$
R_k = A_k \left(I - \sum_{p \neq k} R_p \right) \tag{2.41}
$$

and $A_k$ is the hat matrix of the univariate GWR model used to estimate $\hat{f}_k$ in each iteration of the backfitting procedure. Once convergence is reached, the final values of $R_k$ can be summed to obtain the overall hat matrix $S$ of the MGWR model

$$
S = \sum_k R_k \tag{2.42}
$$

which maps $y$ to $\hat{y}$. In addition, the covariate-specific hat matrix $R_k$ can be used to compute the covariate-specific effective number of parameters as follows

$$
ENP_k = \text{tr}(R_k) \tag{2.43}
$$

and the model effective number of parameters can be calculated as

$$
ENP_{model} = \sum_k ENP_k \tag{2.44}
$$
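The hat-matrix bookkeeping of equations (2.40) to (2.44) can be added to the same backfitting loop. The sketch below is a minimal illustration under the same assumptions as before (fixed bandwidths, fixed Gaussian kernel, synthetic data; `backfit_with_hat` is a hypothetical name). It starts from the global OLS fit, for which the initial $R_k$ is the outer product of $X_k$ with row $k$ of the pseudo-inverse of $X$, and it updates each $R_k$ with the latest values of the other $R_p$.

```python
import numpy as np

def gwr_hat(x, coords, bw):
    """Univariate GWR hat matrix A_k (eq. 2.33), fixed Gaussian kernel assumed."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.exp(-0.5 * (d / bw) ** 2)
    return (x / (w @ (x ** 2)))[:, None] * (w * x[None, :])

def backfit_with_hat(X, y, coords, bws, tol=1e-5, max_iter=1000):
    """Backfitting that also tracks R_k (eqs. 2.40-2.44); bandwidths are fixed."""
    n, k = X.shape
    I = np.eye(n)
    A = [gwr_hat(X[:, j], coords, bws[j]) for j in range(k)]
    # Global OLS start: f_j = X_j * beta_j with beta = pinv(X) y,
    # so the initial R_j is the outer product of X_j and row j of pinv(X).
    X_pinv = np.linalg.pinv(X)
    R = [np.outer(X[:, j], X_pinv[j]) for j in range(k)]
    f = np.column_stack([Rj @ y for Rj in R])
    for _ in range(max_iter):
        f_old = f.copy()
        for j in range(k):
            others = sum(R[p] for p in range(k) if p != j)  # latest R_p, p != j
            R[j] = A[j] @ (I - others)                     # eq. (2.41)
            f[:, j] = R[j] @ y                             # eq. (2.40)
        if np.sum((f - f_old) ** 2) / np.sum(f ** 2) < tol:  # SOC_f, eq. (2.39)
            S = sum(R)                                     # eq. (2.42)
            enp = np.array([np.trace(Rj) for Rj in R])     # eq. (2.43)
            return f, R, S, enp, A
    raise RuntimeError("backfitting did not converge within max_iter")

rng = np.random.default_rng(7)
n = 50
coords = rng.uniform(size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=n)

f, R, S, enp, A = backfit_with_hat(X, y, coords, bws=[0.5, 0.8], tol=1e-18)
enp_model = enp.sum()                                      # eq. (2.44)
print(enp, enp_model, np.trace(S))
```

Because the trace is linear, $ENP_{model} = \sum_k \text{tr}(R_k) = \text{tr}(S)$, so the model-level figure can equally be read off the overall hat matrix. Note that storing every $R_k$ requires $k$ dense $n \times n$ matrices, which is the main memory cost of MGWR inference.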