# Weighted Least Squares - theoretical aspects

This notebook presents the theoretical derivation of the Weighted Least Squares problem (Golub and Van Loan, 2013, p. 304; Menke, 2018, p. 58; Aster et al., 2019, p. 28).

### References

* Golub, G. H. and Van Loan, C. F. Matrix computations, 4th edition, Johns Hopkins University Press, 2013, ISBN 978-1-4214-0859-0

* Menke, W. Geophysical Data Analysis: Discrete inverse theory, 4th edition, Academic Press, 2018, ISBN 978-0-12-813555-6

* Aster, R. C., Borchers, B., and Thurber, C. H. Parameter Estimation and Inverse Problems, 3rd edition, Elsevier Academic Press, 2019, ISBN 978-0-12-804651-7

Suppose now that, for some reason, we want to give "*less importance*" to some observations $d_{i}$ in the observed data vector $\mathbf{d}$ (equation 6, notebook `least_squares`). This "*less importance*" can be represented by pre-multiplying the linear system (equation 3, notebook `least_squares`) by a weighting matrix $\mathbf{W}^{\frac{1}{2}}$, as follows:

<a id='eq1'></a>
$$
\mathbf{W}^{\frac{1}{2}} \, \mathbf{d} \approx \mathbf{W}^{\frac{1}{2}} \, \mathbf{y} = \mathbf{W}^{\frac{1}{2}} \, \mathbf{A} \, \mathbf{p} \: , \tag{1}
$$

where $\mathbf{W}^{\frac{1}{2}}$ is a diagonal matrix whose diagonal elements $w_{ii}$ are given by:

<a id='eq2'></a>
$$
\left[ \mathbf{W}^{\frac{1}{2}} \right]_{ii} = w_{ii} \: , \quad \text{with} \quad \begin{cases}
w_{ii} = 1 \: &, \quad \text{"important" observations } d_{i} \\
0 < w_{ii} < 1 \: &, \quad \text{"less important" observations } d_{i}
\end{cases} \quad . \tag{2}
$$
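As a minimal sketch of equations 1 and 2, assuming NumPy and a small made-up straight-line problem (the matrix `A`, the data `d` and the weight values below are illustrative and not taken from the notebooks), the weighted system could be built like this:

```python
import numpy as np

# Hypothetical example: straight line d = p0 + p1 * x sampled at 5 points
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(x), x])   # N x M matrix of the linear system
d = np.array([1.1, 2.9, 5.2, 7.0, 15.0])    # the last observation looks suspicious

# Diagonal of W^(1/2): 1 for "important" data, 0 < w_ii < 1 otherwise
w_sqrt = np.array([1.0, 1.0, 1.0, 1.0, 0.1])
W_sqrt = np.diag(w_sqrt)

# Weighted data vector and weighted matrix (both sides of equation 1)
d_w = W_sqrt @ d
A_w = W_sqrt @ A
```

Because $\mathbf{W}^{\frac{1}{2}}$ is diagonal, the products above only rescale the rows of `A` and the elements of `d`; for large problems, `w_sqrt[:, None] * A` and `w_sqrt * d` give the same result without forming the full matrix.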

Notice that the matrix $\mathbf{W}^{\frac{1}{2}}$ gives rise to a new residual vector:

<a id='eq3'></a>
$$
\mathbf{r}_{w} = \mathbf{W}^{\frac{1}{2}} \mathbf{r} \tag{3}
$$

and to a weighted misfit function $\Phi_{w}(\mathbf{p}) = \mathbf{r}_{w}^{\top} \mathbf{r}_{w}$, which can be written as

<a id='eq4a'></a>
$$
\Phi_{w}(\mathbf{p}) = \left[ \mathbf{d} - \mathbf{A}\mathbf{p} \right]^{\top} \mathbf{W} \left[ \mathbf{d} - \mathbf{A}\mathbf{p} \right] \: , \tag{4a}
$$

where

<a id='eq4b'></a>
$$
\mathbf{W} = \left( \mathbf{W}^{\frac{1}{2}} \right)^{\top} \left( \mathbf{W}^{\frac{1}{2}} \right) \: . \tag{4b}
$$
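The equivalence between the squared norm of the weighted residuals (equation 3) and the quadratic form in equation 4a can be checked numerically. The sketch below assumes NumPy; the helper name `weighted_misfit` and the numerical values are hypothetical, chosen only for illustration:

```python
import numpy as np

def weighted_misfit(p, A, d, w_sqrt):
    """Weighted misfit for a diagonal W^(1/2) whose diagonal is w_sqrt."""
    r = d - A @ p          # unweighted residuals
    r_w = w_sqrt * r       # weighted residuals (equation 3)
    return r_w @ r_w       # squared Euclidean norm of r_w

# Illustrative values
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
d = np.array([1.0, 3.0, 4.0])
w_sqrt = np.array([1.0, 1.0, 0.5])
p = np.array([1.0, 2.0])

# Same quantity computed through W (equations 4a and 4b)
W_sqrt = np.diag(w_sqrt)
W = W_sqrt.T @ W_sqrt
r = d - A @ p
phi_direct = r @ W @ r

phi = weighted_misfit(p, A, d, w_sqrt)
```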

We are now interested in estimating the parameter vector $\mathbf{p} = \breve{\mathbf{p}}$ that minimizes the weighted misfit function $\Phi_{w}(\mathbf{p})$.

The $j$-th element of the gradient of $\Phi_{w}(\mathbf{p})$ ([equation 4a](#eq4a)) is given by:

<a id='eq5'></a>
$$
\begin{split}
\dfrac{\partial \, \Phi_{w}(\mathbf{p})}{\partial \, p_{j}} 
&= 2 \, \Big \{ \dfrac{\partial}{\partial \, p_{j}} \mathbf{W}^{\frac{1}{2}} \left[ \mathbf{d} - \mathbf{A}\mathbf{p} \right] \Big \} 
^{\top} \mathbf{W}^{\frac{1}{2}} \left[ \mathbf{d} - \mathbf{A}\mathbf{p} \right] \\
&= 2 \Big \{ -\mathbf{u}_{j}^{\top}\mathbf{A}^{\top} \Big \} \mathbf{W} \left[ \mathbf{d} - \mathbf{A}\mathbf{p} \right]
\end{split} \: . \tag{5}
$$

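Here, $\mathbf{u}_{j}$ denotes the vector whose $j$-th element is 1 and all others are 0, so that $\partial \mathbf{p} / \partial p_{j} = \mathbf{u}_{j}$. By collecting these derivatives ([equation 5](#eq5)) for all $j$, the gradient of $\Phi_{w}(\mathbf{p})$ ([equation 4a](#eq4a)) can be written as: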

<a id='eq6'></a>
$$
\nabla \Phi_{w}(\mathbf{p}) = -2 \mathbf{A}^{\top} \mathbf{W} \left[ \mathbf{d} - \mathbf{A}\mathbf{p} \right] \: . \tag{6}
$$
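One way to gain confidence in this gradient expression is to compare it with central finite differences of the misfit. The sketch below assumes NumPy, random illustrative data, and that $\mathbf{u}_{j}$ is the $j$-th unit vector (a row of the identity matrix); the function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))                  # illustrative 6 x 3 matrix
d = rng.normal(size=6)                       # illustrative data vector
W = np.diag(rng.uniform(0.1, 1.0, size=6))   # diagonal weights in (0, 1]
p = rng.normal(size=3)                       # point where the gradient is evaluated

def misfit(p):
    """Weighted misfit r^T W r with r = d - A p."""
    r = d - A @ p
    return r @ W @ r

def gradient(p):
    """Analytical gradient -2 A^T W (d - A p)."""
    return -2.0 * A.T @ W @ (d - A @ p)

# Central differences along each unit vector u_j
h = 1e-6
grad_fd = np.array([
    (misfit(p + h * u) - misfit(p - h * u)) / (2.0 * h)
    for u in np.eye(3)
])

max_diff = np.max(np.abs(gradient(p) - grad_fd))
```

Since the misfit is quadratic in $\mathbf{p}$, central differences are exact apart from rounding, so `max_diff` should be very small.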

Finally, by evaluating the gradient $\nabla \Phi_{w}(\mathbf{p})$ ([equation 6](#eq6)) at $\breve{\mathbf{p}}$ and equating the result to the null vector, we obtain

<a id='eq7'></a>
$$
\begin{split}
\nabla \Phi_{w}(\breve{\mathbf{p}}) &= -2 \mathbf{A}^{\top} \mathbf{W} \left[ \mathbf{d} - \mathbf{A}\breve{\mathbf{p}} \right] \\
\mathbf{0} &= -\mathbf{A}^{\top} \mathbf{W} \mathbf{d} + \mathbf{A}^{\top} \mathbf{W} \mathbf{A}\breve{\mathbf{p}} 
\end{split} \quad , \tag{7}
$$

which results in the linear system

<a id='eq8'></a>
$$
\left( \mathbf{A}^{\top} \mathbf{W} \mathbf{A} \right) \breve{\mathbf{p}} = \mathbf{A}^{\top} \mathbf{W} \mathbf{d} \: . \tag{8}
$$

This equation defines what is commonly called the **Weighted Least Squares Estimator**.
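A minimal sketch of solving equation 8, assuming NumPy and the same kind of made-up straight-line data used for illustration (none of the values come from the notebooks). It solves the weighted normal equations directly and, as a cross-check, applies ordinary least squares to the row-scaled system of equation 1:

```python
import numpy as np

# Hypothetical straight-line problem with one doubtful observation
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(x), x])
d = np.array([1.1, 2.9, 5.2, 7.0, 15.0])

# Diagonal of W^(1/2) and the corresponding W (equation 4b)
w_sqrt = np.array([1.0, 1.0, 1.0, 1.0, 0.1])
W = np.diag(w_sqrt**2)

# Solve (A^T W A) p = A^T W d without forming an explicit inverse
p_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ d)

# Equivalent route: ordinary least squares on the weighted system
p_lstsq, *_ = np.linalg.lstsq(w_sqrt[:, None] * A, w_sqrt * d, rcond=None)

# Unweighted estimate, for comparison only
p_ols = np.linalg.solve(A.T @ A, A.T @ d)
```

Using `np.linalg.solve` (or `np.linalg.lstsq` on the weighted system) is a design choice of this sketch; it avoids computing $\left( \mathbf{A}^{\top} \mathbf{W} \mathbf{A} \right)^{-1}$ explicitly, which is usually less accurate and more expensive.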

## Estimating the uncertainty of the parameters

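The weighted least-squares estimator ([equation 8](#eq8)) is linear in the data: if $\mathbf{A}^{\top} \mathbf{W} \mathbf{A}$ is invertible, $\breve{\mathbf{p}} = \left( \mathbf{A}^{\top} \mathbf{W} \mathbf{A} \right)^{-1}\mathbf{A}^{\top} \mathbf{W} \, \mathbf{d}$. By using this result and equations 20a and 20b of the notebook `least_squares`, we define the covariance matrix $\mathbf{\Sigma}_{\breve{\mathbf{p}}}$ of the estimated parameter vector $\breve{\mathbf{p}}$ as follows:

<a id='eq9'></a>
$$
\mathbf{\Sigma}_{\breve{\mathbf{p}}} 
= \left[ \left( \mathbf{A}^{\top} \mathbf{W} \mathbf{A} \right)^{-1}\mathbf{A}^{\top} \mathbf{W} \right] \mathbf{\Sigma_{d}} 
\left[ \left( \mathbf{A}^{\top}\mathbf{W} \mathbf{A} \right)^{-1}\mathbf{A}^{\top} \mathbf{W} \right]^{\top} \: , \tag{9}
$$

where $\mathbf{\Sigma_{d}}$ is the covariance matrix of the observed data $\mathbf{d}$. By considering that the observed data are uncorrelated and have the same variance $\sigma_{\mathbf{d}}^{2}$, we have $\mathbf{\Sigma_{d}} = \sigma_{\mathbf{d}}^{2} \, \mathbf{I}$ and the covariance matrix $\mathbf{\Sigma}_{\breve{\mathbf{p}}}$ of the estimated parameter vector $\breve{\mathbf{p}}$ assumes the particular form:

<a id='eq10'></a>
$$
\mathbf{\Sigma}_{\breve{\mathbf{p}}} = \sigma_{\mathbf{d}}^{2} \left( \mathbf{A}^{\top}\mathbf{W} \mathbf{A} \right)^{-1} \left( \mathbf{A}^{\top}\mathbf{W}^{2} \mathbf{A} \right) \left( \mathbf{A}^{\top}\mathbf{W} \mathbf{A} \right)^{-1} \: , \tag{10}
$$

where we have used the symmetry of $\mathbf{W}$ ([equation 4b](#eq4b)). For $\mathbf{W} = \mathbf{I}$, this is the ordinary least-squares result $\sigma_{\mathbf{d}}^{2} \left( \mathbf{A}^{\top} \mathbf{A} \right)^{-1}$. If, instead, the weighted data $\mathbf{W}^{\frac{1}{2}} \mathbf{d}$ are uncorrelated and have the same variance $\sigma^{2}$, [equation 9](#eq9) simplifies to $\sigma^{2} \left( \mathbf{A}^{\top}\mathbf{W} \mathbf{A} \right)^{-1}$.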

Finally, the uncertainty of the $j$-th element of the estimated parameter vector $\breve{\mathbf{p}}$ can be defined as the square root of the $j$-th element of the main diagonal of $\mathbf{\Sigma}_{\breve{\mathbf{p}}}$.
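The uncertainties can be sketched numerically by propagating the data covariance through the linear estimator implied by equation 8, that is, $\breve{\mathbf{p}} = \mathbf{G} \, \mathbf{d}$ with $\mathbf{G} = \left( \mathbf{A}^{\top} \mathbf{W} \mathbf{A} \right)^{-1}\mathbf{A}^{\top} \mathbf{W}$, so that $\mathbf{\Sigma}_{\breve{\mathbf{p}}} = \mathbf{G} \, \mathbf{\Sigma_{d}} \, \mathbf{G}^{\top}$. The example assumes NumPy; the function name `wls_covariance`, the matrix `A`, the weights and the data standard deviation `sigma_d` are hypothetical values chosen for illustration:

```python
import numpy as np

def wls_covariance(A, W, cov_d):
    """Covariance of p = G d, with G = (A^T W A)^(-1) A^T W."""
    # Solve (A^T W A) G = A^T W instead of inverting A^T W A
    G = np.linalg.solve(A.T @ W @ A, A.T @ W)
    return G @ cov_d @ G.T

# Illustrative straight-line problem
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(x), x])
W = np.diag(np.array([1.0, 1.0, 1.0, 1.0, 0.1])**2)

# Assumed: uncorrelated data with the same standard deviation
sigma_d = 0.2
cov_d = sigma_d**2 * np.eye(x.size)

cov_p = wls_covariance(A, W, cov_d)

# Uncertainty of each parameter: square root of the main diagonal
uncertainty = np.sqrt(np.diag(cov_p))
```

The same function accepts a non-diagonal `cov_d`, in which case correlations between observations are also propagated to the parameters.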