This post takes two different looks at how to minimize the length of a solution vector $\mathbf x$ in an underdetermined (but full row rank) system of equations $\mathbf{Ax} = \mathbf b$.

The underlying scalars are in $\mathbb R$.


Consider the equation $\mathbf {A x} = \mathbf b$, where we know $\mathbf A$ and $\mathbf b$, and need to solve for $\mathbf x$. For the avoidance of doubt, $\mathbf b \neq \mathbf 0$.  $\mathbf A$ is an m x n matrix.

In the case where $\mathbf A$ is tall and skinny (and with noise in the data, this should equate to an overdetermined system of equations with full column rank), we can use ordinary least squares to solve for the $\mathbf x$ that minimizes the squared length (2-norm) of $\mathbf v = \mathbf {Ax} - \mathbf b$; that is, we minimize $\mathbf v^T \mathbf v$.  We may use the Normal Equations, QR factorization, or many other tools at our disposal.
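As a small numerical sketch of the overdetermined case (assuming NumPy; the matrix sizes, the random data, and the names `x_normal`, `x_qr`, and `x_lstsq` are illustrative choices, not part of the discussion above), the routes mentioned should all land on the same least-squares $\mathbf x$:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Tall, skinny A (6 x 3): more equations than unknowns, full column rank
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Route 1: Normal Equations, A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: reduced QR factorization, R x = Q^T b
Q, R = np.linalg.qr(A)  # Q is 6 x 3, R is 3 x 3
x_qr = np.linalg.solve(R, Q.T @ b)

# Route 3: NumPy's least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# At the minimizer the residual v = Ax - b is orthogonal to the columns of A
v = A @ x_normal - b
```

In practice the QR or `lstsq` route is usually preferred over forming $\mathbf A^T \mathbf A$, which squares the condition number.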

If $\mathbf A$ is square and of full rank, then we can directly invert $\mathbf A$, or use Gaussian elimination or whatever tool we want. 


Now consider the case where $\mathbf A$ has more columns than rows, i.e. n > m (and again there is noise in the data, so we have full row rank). This means that there are *many* solutions to $\mathbf {A x} = \mathbf b$, because $\mathbf A$ has a non-trivial nullspace.  In this case, we should first question why we have this situation, and perhaps gather more data. If we still want to 'solve' this equation, which solution should we pick?  We have many solutions at our disposal, so perhaps the one that minimizes the length (2-norm) of $\mathbf x$ is the one we want.
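To make "many solutions" concrete, here is a hedged NumPy sketch (the 2 x 4 size, the seed, and the coefficients `[1.0, -2.0]` are arbitrary illustrative choices). It takes one solution, adds a vector from the nullspace of $\mathbf A$, and confirms that the result still solves the system but is longer:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Wide A (2 x 4): fewer equations than unknowns, full row rank
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)

# One solution; for an underdetermined system lstsq returns the
# smallest 2-norm solution
x0, *_ = np.linalg.lstsq(A, b, rcond=None)

# The last n - m rows of V^T (from the full SVD) span the nullspace of A
_, _, Vt = np.linalg.svd(A)
null_vec = Vt[2:].T @ np.array([1.0, -2.0])

# Adding a nullspace vector gives a different solution of the same system
x1 = x0 + null_vec

print(np.linalg.norm(x0), np.linalg.norm(x1))
```

Both `x0` and `x1` satisfy $\mathbf{Ax} = \mathbf b$; only their lengths differ.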

There are basically two approaches to solving this.  

First the algebraic one.

$\mathbf {A x} =\big( \mathbf {U \Sigma V}^T\big)\mathbf x = \mathbf b$, using the Singular Value Decomposition, where $\mathbf U$ and $\mathbf V$ are both full rank, square, orthogonal matrices, but because $\mathbf A$ is not square, $\mathbf \Sigma$ is a rectangular diagonal matrix that has more columns than rows.

That is, $\mathbf A$ is an m x n matrix with rank m (meaning that each of its m singular values is > 0):

$\mathbf A =
\bigg[\begin{array}{c|c|c|c}
\mathbf u_1 & \mathbf u_2 &\cdots & \mathbf u_{m}
\end{array}\bigg] \begin{bmatrix}
\sigma_1 & 0 &0  &0 & ... &0 \\ 
0 & \sigma_2& 0 & 0& ... &0\\ 
0 & 0 &  \ddots & 0& ... &0 \\ 
0 & 0 & 0 & \sigma_m& ... &0  
\end{bmatrix} \bigg[\begin{array}{c|c|c|c}
\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{n}
\end{array}\bigg]^T
$   
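The shapes in this factorization can be checked with a short NumPy sketch. The sizes `m = 3`, `n = 5` and the random matrix are assumptions for illustration. `np.linalg.svd` returns the singular values as a vector, so the rectangular $\mathbf \Sigma$ is assembled by hand here:

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
m, n = 3, 5
A = rng.standard_normal((m, n))

# Full SVD: U is m x m, Vt (= V^T) is n x n, s holds the m singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the m x n "diagonal" Sigma: sigma_i on the diagonal, zero columns after
Sigma = np.zeros((m, n))
Sigma[:m, :m] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)  # (3, 3) (3, 5) (5, 5)
```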


If we left multiply both sides of $\mathbf {A x} = \mathbf b$ by $\mathbf U^T$, we get

$\mathbf {\Sigma V}^T \mathbf x = \begin{bmatrix}
\sigma_1 & 0 &0  &0 & ... &0 \\ 
0 & \sigma_2& 0 & 0& ... &0\\ 
0 & 0 &  \ddots & 0& ... &0 \\ 
0 & 0 & 0 & \sigma_m& ... &0  
\end{bmatrix} \bigg[\begin{array}{c|c|c|c}
\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{n}
\end{array}\bigg]^T \mathbf x = \mathbf U^T \mathbf b$

Now, with an **abuse of notation**, consider left multiplying by $\mathbf \Sigma^{-1}$, where

$\mathbf \Sigma^{-1} = \begin{bmatrix}
\frac{1}{\sigma_1} & 0 &0  &0 &  ... &0 \\ 
0 & \frac{1}{\sigma_2}& 0 & 0& ... &0\\ 
0 & 0 &  \ddots & 0& ... &0 \\ 
0 & 0 & 0 & \frac{1}{\sigma_m}& ... &0  
\end{bmatrix}^T$

Thus it is not technically an inverse or a left inverse... $\mathbf \Sigma^{-1}$ **is actually a right inverse** but we ultimately are multiplying on the left because that is all we can do here -- hence this is an abuse of notation.

$\mathbf {(P)V}^T \mathbf x = 
\Big(\begin{bmatrix}
\frac{1}{\sigma_1} & 0 &0  &0 &  ... &0 \\ 
0 & \frac{1}{\sigma_2}& 0 & 0& ... &0\\ 
0 & 0 &  \ddots & 0& ... &0 \\ 
0 & 0 & 0 & \frac{1}{\sigma_m}& ... &0  
\end{bmatrix}^T \begin{bmatrix}
\sigma_1 & 0 &0  &0 & ... &0 \\ 
0 & \sigma_2& 0 & 0& ... &0\\ 
0 & 0 &  \ddots & 0& ... &0 \\ 
0 & 0 & 0 & \sigma_m& ... &0  
\end{bmatrix}\Big) \bigg[\begin{array}{c|c|c|c}
\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{n}
\end{array}\bigg]^T \mathbf x =  \mathbf \Sigma^{-1}\mathbf U^T \mathbf b$


$\mathbf {(P)V}^T \mathbf x = \begin{bmatrix}
1 & 0 &0  &0 & \mathbf 0^T \\ 
0 & 1 & 0 & 0& \mathbf 0^T\\ 
0 & 0 &  \ddots & 0& \mathbf 0^T \\ 
0 & 0 & 0 & 1 & \mathbf 0^T  \\
\mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf 0\mathbf 0^T  \\ 
\end{bmatrix} \bigg[\begin{array}{c|c|c|c}
\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{n}
\end{array}\bigg]^T \mathbf x = \begin{bmatrix}
\mathbf I & \mathbf{00}^T \\ 
\mathbf {00}^T & \mathbf {00}^T  \\ 
\end{bmatrix} \bigg[\begin{array}{c|c|c|c}
\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{n}
\end{array}\bigg]^T \mathbf x = \mathbf \Sigma^{-1}\mathbf U^T \mathbf b$

Which is to say that $\mathbf P$ is an (already diagonalized) projection matrix, i.e. an n x n identity matrix except that its last n - m diagonal elements are equal to 0. (Note that to deal with notational overload, $\mathbf {0}$ is the appropriately sized zero vector, and $\mathbf {00}^T$ is the appropriately sized zero matrix.)
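A quick numerical check of this structure, as a sketch with made-up singular values (`3.0, 2.0, 0.5`) and sizes `m = 3`, `n = 5` chosen only for illustration. Multiplying in one order gives the projection $\mathbf P$, and in the other order gives the m x m identity, which is what "right inverse" means here:

```python
import numpy as np

m, n = 3, 5
s = np.array([3.0, 2.0, 0.5])  # hypothetical singular values, all > 0

# m x n Sigma and its n x m "inverse" (transposed shape, reciprocal entries)
Sigma = np.zeros((m, n))
Sigma[:m, :m] = np.diag(s)
Sigma_inv = np.zeros((n, m))
Sigma_inv[:m, :m] = np.diag(1.0 / s)

# Multiplying on the left gives the n x n projection P, not the identity
P = Sigma_inv @ Sigma

# Multiplying on the right gives the m x m identity
right = Sigma @ Sigma_inv

print(np.diag(P))
```

`P` has m ones followed by n - m zeros on its diagonal and satisfies `P @ P == P`, as a projection should.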

From here multiply both sides by $\mathbf V$, and we get 

$\mathbf {VPV}^T \mathbf x =  \bigg[\begin{array}{c|c|c|c}
\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{n}
\end{array}\bigg] \begin{bmatrix}
\mathbf I & \mathbf{00}^T \\ 
\mathbf {00}^T & \mathbf {00}^T  \\ 
\end{bmatrix}\bigg[\begin{array}{c|c|c|c}
\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{n}
\end{array}\bigg]^T \mathbf x= \mathbf {V\Sigma}^{-1}\mathbf U^T \mathbf b$

Note that if $\mathbf A$ were square and full rank, then $\mathbf P = \mathbf I$ and $\mathbf \Sigma^{-1}$ would be an actual inverse rather than only a right inverse (so no abuse of notation), and hence we would have solved our equation.

That is, if $\mathbf A$ were square and full rank, we would have had:


$\mathbf {VIV}^T \mathbf x = \mathbf {VV}^T \mathbf x = \big(\mathbf{v_1 v_1}^T + \mathbf{v_2 v_2}^T +... + \mathbf{v_n v_n}^T\big) \mathbf x = \mathbf {I} \mathbf x = \mathbf x = \mathbf {V\Sigma}^{-1}\mathbf U^T \mathbf b$

but instead what we have is

$\Big(\big(\mathbf{v_1 v_1}^T + \mathbf{v_2 v_2}^T +... + \mathbf{v_m v_m}^T\big) + \big(0*\mathbf{v_{m+1} v_{m+1}}^T + 0* \mathbf{v_{m+2} v_{m+2}}^T + ... + 0*\mathbf{v_n v_n}^T\big)\Big) \mathbf x = \mathbf {V\Sigma}^{-1}\mathbf U^T \mathbf b$

or more simply 

$\big(\mathbf{v_1 v_1}^T + \mathbf{v_2 v_2}^T +... + \mathbf{v_m v_m}^T\big) \mathbf x =  \mathbf {V\Sigma}^{-1}\mathbf U^T \mathbf b $

Now recall that $\mathbf V = \bigg[\begin{array}{c|c|c|c}
\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{n}
\end{array}\bigg]$, which is an n x n orthogonal matrix -- that is, $\mathbf V$ can be thought of as a coordinate system.  Thus our solution vector $\mathbf x$ can be written as a linear combination of the columns of $\mathbf V$.  We can write this as 

$\mathbf x = \mathbf{Vy} = y_1*\mathbf v_1 + y_2*\mathbf v_2 + ... + y_m*\mathbf v_m +  y_{m+1}*\mathbf v_{m+1} + ... + y_{n}*\mathbf v_{n}$

and recalling that since $\mathbf V$ is orthogonal, it is length preserving, so: $\big \vert\big \vert \mathbf x \big \vert \big \vert_2^{2} =  \big \vert\big \vert \mathbf{Vy} \big \vert\big \vert_2^{2} = \big \vert \big \vert \mathbf y \big \vert\big \vert_2^{2} = \mathbf y^T \mathbf y = y_1^2 + y_2^2 + ... + y_m^2 + y_{m+1}^2 + ... + y_n^2$
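The change of coordinates and the length-preservation claim can be sanity-checked numerically. In this sketch the matrix `A`, the vector `x`, and the seed are arbitrary assumptions. `y` holds the coordinates of `x` in the basis given by the columns of $\mathbf V$:

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
A = rng.standard_normal((3, 5))

# V is the n x n orthogonal matrix from the full SVD of A
_, _, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

# Any x can be written as x = V y, with coordinates y = V^T x
x = rng.standard_normal(5)
y = V.T @ x

# Squared lengths agree because V is orthogonal
print(x @ x, y @ y)
```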

we substitute in and get 

$\big(\mathbf{v_1 v_1}^T + \mathbf{v_2 v_2}^T +... + \mathbf{v_m v_m}^T\big) \big(y_1*\mathbf v_1 + y_2*\mathbf v_2 + ... + y_m*\mathbf v_m +  y_{m+1}*\mathbf v_{m+1} + ... + y_{n}*\mathbf v_{n}\big) = \mathbf {V\Sigma}^{-1}\mathbf U^T \mathbf b$

which, by the orthonormality of the columns of $\mathbf V$, equals:

$y_1  \mathbf{v_1} + y_2\mathbf v_2 +... + y_m  \mathbf{v_m} = \mathbf {V\Sigma}^{-1}\mathbf U^T \mathbf b $

From here we notice that any $y_k$, for $k \gt m$, contributes to the length of $\mathbf x$ but does not contribute to the solution of the problem (i.e. the corresponding $\mathbf v_k$ are in the null space of $\mathbf A$), so the shortest solution sets each of them to zero.

Thus the minimal length solution to the underdetermined $\mathbf {Ax} = \mathbf b$ comes in the form of a solution to the equation that has $\mathbf x $ written purely as a linear combination of $\{\mathbf v_1, \mathbf v_2, ..., \mathbf v_m \}$, namely $\mathbf x = \mathbf {V\Sigma}^{-1}\mathbf U^T \mathbf b$.

Finally, we notice that we can recover the original equation by left multiplying both sides by 3 things, $\mathbf V^T$, then $\mathbf \Sigma$, then $\mathbf U$:


$\Big(\big(\mathbf U\big) \big(\mathbf \Sigma \big) \big(\mathbf V^T\big)\Big) \mathbf x = \mathbf{Ax} = \Big(\big(\mathbf U\big) \big(\mathbf \Sigma \big) \big(\mathbf V^T\big)\Big) \Big(\mathbf {V\Sigma}^{-1}\mathbf U^T \mathbf b\Big) = \mathbf U \big( \mathbf \Sigma \mathbf \Sigma^{-1} \big) \mathbf U^T \mathbf b = \mathbf U \mathbf I \mathbf U^T \mathbf b  = \mathbf b$

noting that $\mathbf{\Sigma \Sigma}^{-1}= \mathbf I$ because $\mathbf \Sigma^{-1}$ is a right inverse.
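The algebraic approach can be sketched in NumPy as follows. The function name `min_norm_solve_svd` and the test data are hypothetical, and the code assumes $\mathbf A$ has full row rank with m < n, as in the text:

```python
import numpy as np

def min_norm_solve_svd(A, b):
    """Return V Sigma^{-1} U^T b for a full-row-rank m x n matrix A (m < n)."""
    m, n = A.shape
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    # n x m right inverse of Sigma: reciprocal singular values, zero rows below
    Sigma_inv = np.zeros((n, m))
    Sigma_inv[:m, :m] = np.diag(1.0 / s)
    return Vt.T @ Sigma_inv @ U.T @ b

rng = np.random.default_rng(4)  # arbitrary seed
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

x = min_norm_solve_svd(A, b)

# Coordinates of x in the V basis: the last n - m should be (numerically) zero
_, _, Vt = np.linalg.svd(A, full_matrices=True)
y = Vt @ x
print(y)
```

The result should match `np.linalg.pinv(A) @ b`, since the Moore-Penrose pseudoinverse of a full-row-rank matrix gives the same minimum-norm solution.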

An alternative approach to solving the above uses Lagrange multipliers.

In this case, let 

$\mathbf d  =  \begin{bmatrix}
\lambda_1\\ 
\lambda_2\\ 
\vdots \\ 
\lambda_m
\end{bmatrix}
$

i.e. $\mathbf d$ is a vector containing the Lagrange multipliers

We set up the Lagrangian we want to minimize as

$L(\mathbf x, \mathbf d) = \mathbf x^T \mathbf x + \mathbf d^T  \big(\mathbf{Ax} - \mathbf b \big)$

and set its gradient with respect to each argument to zero:

$\nabla_{\mathbf x} L = 2 \mathbf x + \mathbf A^T \mathbf d := \mathbf 0$

Note: check dimensions to see why it is $\mathbf A^T \mathbf d$ (an n-vector, like $\mathbf x$) and not $\mathbf d^T \mathbf A$ (a row vector).

$\nabla_{\mathbf d } L = \mathbf{Ax} - \mathbf b := \mathbf 0$

Solving the $\nabla_{\mathbf x}$ equation for $\mathbf x$ first, we see:

$\mathbf x = \frac{-1}{2} \mathbf{A}^T \mathbf d$

Now substituting into the second equation, the $\nabla_{\mathbf d }$ one, we see

$\mathbf{A}\big(\frac{-1}{2} \mathbf{A}^T \mathbf d \big) - \mathbf b := \mathbf 0$

hence $\mathbf d = -2\big(\mathbf{AA}^T\big)^{-1} \mathbf b$

and here we plug this back into the $\nabla_{\mathbf x}$ equation:

$\mathbf 0 = 2 \mathbf x + \mathbf A^T \mathbf d  = 2 \mathbf x + \mathbf A^T \Big(-2\big(\mathbf{AA}^T\big)^{-1} \mathbf b\Big)$


$-2 \mathbf x = -2\mathbf A^T \big(\mathbf{AA}^T\big)^{-1} \mathbf b$ 

or 

$\mathbf x = \mathbf A^T \big(\mathbf{AA}^T\big)^{-1} \mathbf b $
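As a hedged sketch of this closed form in NumPy (the function name `min_norm_solve_lagrange` and the random test data are illustrative assumptions), we can solve the m x m system $\mathbf{AA}^T \mathbf z = \mathbf b$ rather than forming the inverse explicitly, and also recover the multipliers $\mathbf d$ to check the stationarity condition:

```python
import numpy as np

def min_norm_solve_lagrange(A, b):
    """Return A^T (A A^T)^{-1} b for a full-row-rank A."""
    # Solve (A A^T) z = b, then x = A^T z; no explicit inverse needed
    z = np.linalg.solve(A @ A.T, b)
    return A.T @ z

rng = np.random.default_rng(5)  # arbitrary seed
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

x = min_norm_solve_lagrange(A, b)

# Lagrange multipliers: d = -2 (A A^T)^{-1} b
d = -2.0 * np.linalg.solve(A @ A.T, b)

# Both gradient conditions should hold (up to rounding)
grad_x = 2.0 * x + A.T @ d
grad_d = A @ x - b
```

Because $\mathbf A$ has full row rank, $\mathbf{AA}^T$ is m x m and invertible, which is what makes the solve well posed.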

and of course, if we used the SVD on $\mathbf A$, we'd see

$\mathbf x = \mathbf{V \Sigma}^T \mathbf U^T \big(\mathbf{U \Sigma \Sigma }^T \mathbf U^T\big)^{-1} \mathbf b = \mathbf{V \Sigma}^T \mathbf U^T \mathbf{U} \big(\mathbf{\Sigma \Sigma }^T\big)^{-1} \mathbf U^T \mathbf b = \mathbf{V \Sigma}^T \big(\mathbf{\Sigma \Sigma }^T\big)^{-1} \mathbf U^T \mathbf b = \mathbf{V \Sigma}^{-1} \mathbf U^T \mathbf b $

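where we recover the equation from the earlier approach. The last step uses the fact that $\mathbf{\Sigma \Sigma}^T$ is the m x m diagonal matrix with entries $\sigma_1^2, \sigma_2^2, ..., \sigma_m^2$, so that, as before,

$\mathbf \Sigma^T \big(\mathbf{\Sigma \Sigma }^T\big)^{-1} = \mathbf \Sigma^{-1} = \begin{bmatrix}
\frac{1}{\sigma_1} & 0 &0  &0 &  ... &0 \\ 
0 & \frac{1}{\sigma_2}& 0 & 0& ... &0\\ 
0 & 0 &  \ddots & 0& ... &0 \\ 
0 & 0 & 0 & \frac{1}{\sigma_m}& ... &0  
\end{bmatrix}^T$, 

which is to say that $\mathbf \Sigma^{-1}$ is the right inverse of the $\mathbf \Sigma$ matrix.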
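To close the loop, here is a small NumPy sketch (sizes, seed, and variable names are illustrative assumptions) that checks the identity $\mathbf \Sigma^T (\mathbf{\Sigma \Sigma}^T)^{-1} = \mathbf \Sigma^{-1}$ and confirms that the SVD form and the Lagrange-multiplier form give the same $\mathbf x$ on the same data:

```python
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed
m, n = 3, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A, full_matrices=True)

# m x n Sigma and its n x m right inverse
Sigma = np.zeros((m, n))
Sigma[:m, :m] = np.diag(s)
Sigma_inv = np.zeros((n, m))
Sigma_inv[:m, :m] = np.diag(1.0 / s)

# The simplification used in the last step of the derivation
lhs = Sigma.T @ np.linalg.inv(Sigma @ Sigma.T)

# The two expressions for the minimum-norm solution
x_svd = Vt.T @ Sigma_inv @ U.T @ b
x_lagrange = A.T @ np.linalg.solve(A @ A.T, b)

print(np.linalg.norm(x_svd - x_lagrange))
```

The printed difference should be at the level of floating-point rounding.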

