# L1 regression

This notebook presents the theoretical development of the Iteratively Reweighted Least Squares (IRLS) method for $\ell_{1}$ regression (Scales and Gersztenkorn, 1988; Aster et al., 2019, p. 46).

### References

* Scales, J. A., and Gersztenkorn, A. 1988. *Robust methods in inverse theory*, **Inverse Problems**, 4(4), 1071-1091, doi: <a href="https://doi.org/10.1088%2F0266-5611%2F4%2F4%2F010" target="_blank">10.1088/0266-5611/4/4/010</a>

* Aster, R. C., Borchers, B., and Thurber, C. H. 2019. *Parameter Estimation and Inverse Problems*, 3rd edition, Elsevier Academic Press, ISBN: 978-0-12-804651-7

Ordinary least-squares solutions are adversely affected by outliers because the squared large deviations dominate the misfit and are averaged into the solution. In contrast, least-absolute-deviation optimization is known to be resistant to certain (potentially large) errors in the data. So, as an alternative to least squares in the presence of outliers, consider the solution that minimizes the $\ell_{1}$-norm of the residuals vector (Scales and Gersztenkorn, 1988; Aster et al., 2019, p. 46),

<a id='eq1'></a>
$$
\Phi_{abs}(\mathbf{p}) = \sum\limits_{i = 0}^{N-1} \mid r_{i} \mid \quad , \tag{1}
$$

where $r_{i}$ is the $i$th element of the residuals vector

<a id='eq2'></a>
$$
\mathbf{r} = \mathbf{d} - \mathbf{A} \mathbf{p} \quad . \tag{2}
$$

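Here, $\mathbf{d}$ is the $N \times 1$ vector of observed data, $\mathbf{A}$ is an $N \times M$ matrix, and $\mathbf{p}$ is the $M \times 1$ parameter vector. The parameter vector $\acute{\mathbf{p}}$ that minimizes $\Phi_{abs}(\mathbf{p})$ ([equation 1](#eq1)) is less affected by outliers than the ordinary least-squares solution and, for this reason, is considered a *robust* solution.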
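Equations 1 and 2 can be evaluated in a few lines of NumPy. The sketch below assumes `A` is a 2D array and `d`, `p` are 1D arrays; the helper names `residuals` and `l1_misfit` are hypothetical. The toy straight line illustrates the robustness argument: a single residual of 10 adds 10 to $\Phi_{abs}$ but 100 to the sum of squared residuals.

```python
import numpy as np

def residuals(d, A, p):
    """Residuals vector r = d - A p (equation 2)."""
    return d - A @ p

def l1_misfit(d, A, p):
    """Sum of the absolute residuals, Phi_abs(p) (equation 1)."""
    return np.sum(np.abs(residuals(d, A, p)))

# Straight line d = p_0 + p_1 x; the last datum is an outlier (noise-free value 7)
x = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(x), x])
d = np.array([1.0, 3.0, 5.0, 17.0])
p = np.array([1.0, 2.0])

r = residuals(d, A, p)
print(r)                   # [ 0.  0.  0. 10.]
print(l1_misfit(d, A, p))  # 10.0
print(np.sum(r ** 2))      # 100.0, the squared (least-squares) misfit
```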

By following the same reasoning presented for ordinary least squares (notebook `least_squares`) and weighted least squares (notebook `weighted_least_squares`), we start our investigation of the solution $\acute{\mathbf{p}}$ by computing the $j$th element of the gradient of $\Phi_{abs}(\mathbf{p})$ ([equation 1](#eq1)), assuming that no residual $r_{i}$ is equal to zero, since $\mid r_{i} \mid$ is not differentiable at zero:

<a id='eq3'></a>
$$
\begin{split}
\dfrac{\partial \, \Phi_{abs}(\mathbf{p})}{\partial \, p_{j}} 
&= \sum\limits_{i = 0}^{N-1} \frac{r_{i}}{\mid r_{i} \mid} \left( - a_{ij} \right) \\
&= - \mathbf{u}_{j}^{\top} \mathbf{A}^{\top} \mathbf{R} \left( \mathbf{d} - \mathbf{A} \mathbf{p} \right)
\end{split} \quad , \tag{3}
$$

where $\mathbf{u}_{j}$ is an $M \times 1$ vector whose $j$th element is equal to $1$ and whose remaining elements are equal to $0$, and $\mathbf{R}$ is the $N \times N$ diagonal matrix

<a id='eq4'></a>
$$
\mathbf{R} = \begin{bmatrix}
\frac{1}{\mid r_{0} \mid} & & \\
& \ddots & \\
& & \frac{1}{\mid r_{N-1} \mid}
\end{bmatrix} \quad . \tag{4}
$$
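As a numerical sanity check of equations 3 and 4, the sketch below builds $\mathbf{R}$ for a random test problem and compares the explicit sum in the first line of equation 3 with the matrix form in the second line. It assumes NumPy and that no residual is exactly zero, since $\mathbf{R}$ is undefined otherwise; the problem sizes, seed and index `j` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 3
A = rng.normal(size=(N, M))
d = rng.normal(size=N)
p = rng.normal(size=M)

r = d - A @ p                 # residuals (equation 2)
R = np.diag(1.0 / np.abs(r))  # diagonal weights (equation 4); assumes every r_i != 0

j = 1
# First line of equation 3: explicit sum over the data
dphi_sum = np.sum(r / np.abs(r) * (-A[:, j]))
# Second line of equation 3: -u_j^T A^T R (d - A p)
u_j = np.zeros(M)
u_j[j] = 1.0
dphi_mat = -u_j @ A.T @ R @ r

print(np.isclose(dphi_sum, dphi_mat))  # True
```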

By collecting the derivatives given by [equation 3](#eq3) for $j = 0, \dots, M-1$ into a vector, we obtain the gradient of $\Phi_{abs}(\mathbf{p})$ ([equation 1](#eq1)):

<a id='eq5'></a>
$$
\nabla \Phi_{abs}(\mathbf{p}) = - \mathbf{A}^{\top} \mathbf{R} \left( \mathbf{d} - \mathbf{A}\mathbf{p} \right) \: . \tag{5}
$$
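Equation 5 can be checked against central finite differences of $\Phi_{abs}$. This sketch assumes NumPy and a random test problem in which no residual is close enough to zero for the step `h` to cross a kink of $\mid r_{i} \mid$; `l1_gradient` is a hypothetical helper. It uses the fact that $\mathbf{R}\mathbf{r}$ is simply the elementwise ratio $r_{i} / \mid r_{i} \mid$.

```python
import numpy as np

def l1_misfit(d, A, p):
    """Phi_abs(p), equation 1."""
    return np.sum(np.abs(d - A @ p))

def l1_gradient(d, A, p):
    """Gradient of Phi_abs(p), equation 5; assumes no residual is exactly zero."""
    r = d - A @ p
    return -A.T @ (r / np.abs(r))  # R r = r / |r| elementwise

rng = np.random.default_rng(1)
N, M = 8, 3
A = rng.normal(size=(N, M))
d = rng.normal(size=N)
p = rng.normal(size=M)

# Central finite differences along each coordinate direction
h = 1e-7
fd = np.array([
    (l1_misfit(d, A, p + h * e) - l1_misfit(d, A, p - h * e)) / (2.0 * h)
    for e in np.eye(M)
])
print(np.allclose(l1_gradient(d, A, p), fd, atol=1e-5))  # True
```

Computing `r / np.abs(r)` instead of forming the dense $N \times N$ matrix $\mathbf{R}$ gives the same vector with $O(N)$ memory.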

Finally, by evaluating the gradient $\nabla \Phi_{abs}(\mathbf{p})$ ([equation 5](#eq5)) at $\mathbf{p} = \acute{\mathbf{p}}$ and imposing that it is equal to the null vector $\mathbf{0}$ at the minimum, we obtain

<a id='eq6'></a>
$$
\begin{aligned}
\nabla \Phi_{abs}(\acute{\mathbf{p}}) &= - \mathbf{A}^{\top} \mathbf{R} \left( \mathbf{d} - \mathbf{A}\acute{\mathbf{p}} \right) \\
\mathbf{0} &= -\mathbf{A}^{\top} \mathbf{R} \mathbf{d} + \mathbf{A}^{\top} \mathbf{R} \mathbf{A}\acute{\mathbf{p}}
\end{aligned} \quad , \tag{6}
$$

which results in

<a id='eq7'></a>
$$
\left( \mathbf{A}^{\top} \mathbf{R} \mathbf{A} \right) \acute{\mathbf{p}} = \mathbf{A}^{\top} \mathbf{R} \, \mathbf{d} \quad . \tag{7}
$$

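Because $\mathbf{R}$ depends on the residuals and, consequently, on $\acute{\mathbf{p}}$ itself, [equation 7](#eq7) is nonlinear in $\acute{\mathbf{p}}$ and must be solved iteratively. Each solution of this linear system, with $\mathbf{R}$ computed from the previous estimate, represents one iteration of the **IRLS** method (Scales and Gersztenkorn, 1988; Aster et al., 2019, p. 46).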
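A single IRLS iteration amounts to building $\mathbf{R}$ from the current residuals and solving the linear system of equation 7. The sketch below assumes NumPy; `irls_step` and its `eps` floor on $\mid r_{i} \mid$ are hypothetical choices (a floor of this kind is one common way to keep $\mathbf{R}$ finite when a residual vanishes), not part of the original derivation.

```python
import numpy as np

def irls_step(d, A, p, eps=1e-8):
    """One IRLS iteration: solve (A^T R A) p_new = A^T R d (equation 7).

    R (equation 4) is built from the residuals of the current estimate p.
    Residuals smaller than eps in magnitude are replaced by eps, an assumed
    safeguard against division by zero.
    """
    r = d - A @ p
    w = 1.0 / np.maximum(np.abs(r), eps)  # diagonal of R
    ATR = A.T * w                         # A^T R without forming the N x N matrix R
    return np.linalg.solve(ATR @ A, ATR @ d)

# Straight line d = 1 + 2 x with one outlier
x = np.arange(6, dtype=float)
A = np.column_stack([np.ones_like(x), x])
d = 1.0 + 2.0 * x
d[-1] += 20.0

p0 = np.linalg.lstsq(A, d, rcond=None)[0]  # R = I gives ordinary least squares
p1 = irls_step(d, A, p0)
print(np.sum(np.abs(d - A @ p0)), np.sum(np.abs(d - A @ p1)))
```

Multiplying `A.T` by the weight vector applies $\mathbf{R}$ column by column through broadcasting, which avoids storing an $N \times N$ matrix that is mostly zeros.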

The method starts with $\mathbf{R} = \mathbf{I}$, which results in the ordinary least-squares solution $\mathbf{p}_{0}$. This solution is used to compute the first residuals vector $\mathbf{r}_{0}$ and the matrix $\mathbf{R}_{0}$. Then, [equation 7](#eq7) is solved with $\mathbf{R} = \mathbf{R}_{0}$ to obtain a new vector $\mathbf{p}_{1}$, which is used to compute a new residuals vector $\mathbf{r}_{1}$ and a new matrix $\mathbf{R}_{1}$. This process is commonly repeated until the following criterion is satisfied (Aster et al., 2019, p. 47):

<a id='eq8'></a>
$$
\frac{\| \mathbf{p}_{k} - \mathbf{p}_{k-1} \|_{2}}{1 + \| \mathbf{p}_{k} \|_{2}} < \tau \quad , \tag{8}
$$

where $\tau$ is a specified tolerance.
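The complete procedure, from the least-squares start to the stopping criterion of equation 8, can be sketched as the loop below. This is a minimal illustration assuming NumPy; the function name `irls_l1`, the defaults for `tau` and `max_iter`, and the `eps` floor on $\mid r_{i} \mid$ are hypothetical choices.

```python
import numpy as np

def irls_l1(d, A, tau=1e-6, max_iter=100, eps=1e-8):
    """IRLS sketch for minimizing the l1-norm of d - A p.

    Starts from the ordinary least-squares solution (R = I) and stops when the
    criterion of equation 8 drops below tau or after max_iter iterations.
    eps is an assumed lower bound on |r_i| that keeps R finite.
    """
    p = np.linalg.lstsq(A, d, rcond=None)[0]  # p_0
    for _ in range(max_iter):
        r = d - A @ p
        w = 1.0 / np.maximum(np.abs(r), eps)  # diagonal of R_k (equation 4)
        ATR = A.T * w
        p_new = np.linalg.solve(ATR @ A, ATR @ d)  # equation 7
        change = np.linalg.norm(p_new - p) / (1.0 + np.linalg.norm(p_new))
        p = p_new
        if change < tau:  # equation 8
            break
    return p

# Noise-free straight line d = 1 + 2 x with a single large outlier at x = 7
x = np.arange(10, dtype=float)
A = np.column_stack([np.ones_like(x), x])
d = 1.0 + 2.0 * x
d[7] += 30.0

p_ls = np.linalg.lstsq(A, d, rcond=None)[0]
p_l1 = irls_l1(d, A)
print(p_ls)  # pulled toward the outlier
print(p_l1)  # close to [1, 2]
```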

## Estimating the uncertainty of the parameters

One way of propagating the uncertainties from the data to the parameters estimated by the IRLS method is Monte Carlo error propagation (Aster et al., 2019, p. 48). In this method, the first step consists of obtaining an IRLS solution $\acute{\mathbf{p}}$. Then, a set of IRLS solutions $\acute{\mathbf{p}}_{q}$, $q = 0, \dots, Q-1$, is obtained by using $Q$ different noise realizations to contaminate the observed data. These solutions define the $Q \times M$ matrix $\mathbf{C}$, whose $q$th row is $\left( \acute{\mathbf{p}}_{q} - \acute{\mathbf{p}} \right)^{\top}$. Finally, an empirical estimate of the covariance matrix $\boldsymbol{\Sigma}_{\acute{\mathbf{p}}}$ is given by

<a id='eq9'></a>
$$
\boldsymbol{\Sigma}_{\acute{\mathbf{p}}} = \frac{\mathbf{C}^{\top}\mathbf{C}}{Q} \quad . \tag{9}
$$
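Equation 9 translates into a loop over noise realizations. The sketch below is illustrative only: it assumes zero-mean Gaussian noise with a known standard deviation `sigma`, and the helper names, `Q` and the random seeds are hypothetical choices. Following the text, each noise realization is added to the observed data $\mathbf{d}$.

```python
import numpy as np

def irls_l1(d, A, tau=1e-6, max_iter=100, eps=1e-8):
    """Compact IRLS sketch (equations 7 and 8); eps is an assumed floor on |r_i|."""
    p = np.linalg.lstsq(A, d, rcond=None)[0]
    for _ in range(max_iter):
        w = 1.0 / np.maximum(np.abs(d - A @ p), eps)
        ATR = A.T * w
        p_new = np.linalg.solve(ATR @ A, ATR @ d)
        done = np.linalg.norm(p_new - p) / (1.0 + np.linalg.norm(p_new)) < tau
        p = p_new
        if done:
            break
    return p

def monte_carlo_covariance(d, A, sigma, Q=500, seed=0):
    """Empirical covariance of the IRLS solution (equation 9).

    Assumes zero-mean Gaussian noise with standard deviation sigma added to the
    observed data d; the noise model, Q and seed are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    p_hat = irls_l1(d, A)
    C = np.empty((Q, A.shape[1]))
    for q in range(Q):
        d_q = d + rng.normal(scale=sigma, size=d.size)  # q-th noise realization
        C[q] = irls_l1(d_q, A) - p_hat                  # q-th row of C
    return p_hat, C.T @ C / Q

# Synthetic straight line with Gaussian noise and one outlier
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 20)
A = np.column_stack([np.ones_like(x), x])
d = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
d[5] += 15.0

p_hat, cov = monte_carlo_covariance(d, A, sigma=0.5, Q=200)
print(p_hat)
print(np.sqrt(np.diag(cov)))  # empirical standard deviations of the parameters
```

The square roots of the diagonal of the estimated covariance matrix give empirical standard deviations of the estimated parameters.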