# Singular Value Decomposition

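SVD is used in many places, such as PCA and closed-form matrix regression. The goal of SVD is to take an input matrix $X$ of shape $n \times p$ and _decompose_ it into a product of three matrices. This decomposition always exists. The singular values are unique, but the singular vectors are only unique up to sign: flipping the signs of a matched pair $u_i$, $v_i$ gives the same product. When singular values repeat, there is further freedom to rotate the corresponding singular vectors.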
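A minimal sketch of the sign ambiguity, assuming NumPy and an arbitrary $4 \times 3$ matrix chosen only for illustration. Flipping the sign of the first left singular vector together with the matching right singular vector gives a different pair of factors with exactly the same product.

```python
import numpy as np

# Arbitrary n x p = 4 x 3 example matrix (values chosen for illustration only).
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0],
              [1.0, 1.0, 1.0]])

# Thin SVD: U is n x k, d holds the k singular values, Vt is k x p, k = min(n, p).
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Flip the sign of u_1 AND of the matching v_1 (the first row of V^T).
U_flipped = U.copy()
Vt_flipped = Vt.copy()
U_flipped[:, 0] *= -1
Vt_flipped[0, :] *= -1

# Both factorizations reproduce X: the singular values are unique,
# the singular vectors are not.
print(np.allclose(U @ np.diag(d) @ Vt, X))                  # True
print(np.allclose(U_flipped @ np.diag(d) @ Vt_flipped, X))  # True
```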


<img src="../../../assets/0-foundations/svd-graphic-1.png" alt="svd-graphic-1" style="width: 260px;"/>

<img src="../../../assets/0-foundations/svd-graphic-2.png" alt="svd-graphic-2" style="width: 260px;"/>

$$X = \sum_{i=1}^{r} d_i u_i v_i^T = U_{n \times r} D_{r \times r} V^T_{r \times p}$$

Where
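* $X$ is the $n \times p$ input matrix
* $U$ ($n \times r$) holds the left singular vectors $u_i$ as its columns
* $D$ ($r \times r$) is a diagonal matrix of the singular values $d_1 \ge d_2 \ge \dots \ge d_r > 0$
* $V$ ($p \times r$) holds the right singular vectors $v_i$ as its columns
* $r$ is the rank of $X$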
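A minimal sketch of these shapes, assuming NumPy. It builds a hypothetical $6 \times 4$ matrix of rank 2; `np.linalg.svd` with `full_matrices=False` returns $\min(n, p)$ components, so the sketch keeps only the $r$ numerically nonzero ones (the `1e-10` relative cutoff is an assumption made for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an n x p matrix of rank r as the product of an n x r and an r x p factor.
n, p, r = 6, 4, 2
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))

# The thin SVD returns min(n, p) = 4 components; for a rank-2 matrix the last two
# singular values are zero up to floating-point round-off.
U_full, d_full, Vt_full = np.linalg.svd(X, full_matrices=False)
rank = int(np.sum(d_full > 1e-10 * d_full[0]))  # count the numerically nonzero ones

# Keep only the nonzero components to get the compact shapes from the formula.
U = U_full[:, :rank]        # n x r
D = np.diag(d_full[:rank])  # r x r
Vt = Vt_full[:rank, :]      # r x p

print(U.shape, D.shape, Vt.shape)  # (6, 2) (2, 2) (2, 4)
print(np.allclose(U @ D @ Vt, X))  # True
```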

Some other properties include:

* $U^T U = I$ and $V^T V = I$. That is, each column of $U$ and of $V$ has unit length (its squared entries sum to one), and the dot product of any two distinct columns of $U$ (or of $V$) is zero. In other words, $U$ and $V$ are column orthonormal.
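These properties can be checked numerically. A minimal sketch, assuming NumPy and an arbitrary random $5 \times 3$ matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))  # arbitrary 5 x 3 example

U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# U^T U and V^T V are identity matrices.
print(np.allclose(U.T @ U, np.eye(3)))  # True
print(np.allclose(V.T @ V, np.eye(3)))  # True

# Each column has unit length: its squared entries sum to one.
print(np.allclose(np.sum(U**2, axis=0), 1.0))  # True
# Distinct columns are orthogonal: their dot product is zero.
print(np.isclose(U[:, 0] @ U[:, 1], 0.0))  # True

# Column orthonormal does not mean U U^T = I: when U has more rows than
# columns, U U^T is only a projection onto the column space of U.
print(np.allclose(U @ U.T, np.eye(5)))  # False
```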

The two illustrations show two equivalent views of the decomposition: the first as a matrix product, and the second as a sum of rank-one outer products $d_i u_i v_i^T$.
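A minimal sketch, assuming NumPy, confirming that the matrix-product view and the sum-of-outer-products view give the same matrix, and that each term in the sum has rank one (the `tol=1e-10` passed to `matrix_rank` is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))  # arbitrary 4 x 3 example

U, d, Vt = np.linalg.svd(X, full_matrices=False)

# View 1: one matrix product U D V^T.
X_product = U @ np.diag(d) @ Vt

# View 2: a sum of rank-one outer products d_i * u_i v_i^T.
terms = [d[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(d))]
X_sum = sum(terms)

print(np.allclose(X_product, X), np.allclose(X_sum, X))             # True True
print([int(np.linalg.matrix_rank(t, tol=1e-10)) for t in terms])     # [1, 1, 1]
```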

## Example


Suppose we have a user rating matrix where each row holds one user's ratings and each column corresponds to a movie.
$$
A = 
\begin{bmatrix}
    1 & 1 & 1 & 0 & 0 \\
    3 & 3 & 3 & 0 & 0 \\
    4 & 4 & 4 & 0 & 0 \\
    5 & 5 & 5 & 0 & 0 \\
    0 & 2 & 0 & 4 & 4 \\
    0 & 0 & 0 & 5 & 5 \\
    0 & 1 & 0 & 2 & 2
\end{bmatrix}
$$

The idea behind SVD is to break the matrix down into groups, or concepts. For example, it can separate a group of users who like sci-fi films from another group that likes old romance films.

$$
A = 
\begin{bmatrix}
    1 & 1 & 1 & 0 & 0 \\
    3 & 3 & 3 & 0 & 0 \\
    4 & 4 & 4 & 0 & 0 \\
    5 & 5 & 5 & 0 & 0 \\
    0 & 2 & 0 & 4 & 4 \\
    0 & 0 & 0 & 5 & 5 \\
    0 & 1 & 0 & 2 & 2
\end{bmatrix}
=
\begin{bmatrix}
    .13 & .02  &-.01 \\
    .41 & .07  &-.03 \\
    .55 & .09  &-.04 \\
    .68 & .11  &-.05 \\
    .15 & -.59 & .65 \\
    .07 & -.73 & -.67 \\
    .07 & -.29 & .32
\end{bmatrix}
\times
\begin{bmatrix}
    12.4 & 0   & 0 \\ 
    0    & 9.5 & 0 \\
    0    & 0   & 1.3
\end{bmatrix}
\times
\begin{bmatrix}
    .56 & .59  & .56 & .09  & .09 \\
    .12 & -.02 & .12 & -.69 & -.69 \\
    .40 & -.80 & .40 & .09  & .09
\end{bmatrix}
$$
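To reproduce this example, here is a minimal sketch assuming NumPy. Because signs are arbitrary, NumPy may return some singular vectors with the opposite sign from those shown above, and its unrounded first singular value is about 12.48, which the matrix above lists as 12.4.

```python
import numpy as np

# User-by-movie rating matrix A from the example (7 users x 5 movies).
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, d, Vt = np.linalg.svd(A, full_matrices=False)

# A has rank 3, so only the first three of the five singular values are nonzero.
r = 3
U_r, D_r, Vt_r = U[:, :r], np.diag(d[:r]), Vt[:r, :]

print(np.round(d, 2))     # last two values are ~0
print(np.round(U_r, 2))   # users x concepts
print(np.round(Vt_r, 2))  # concepts x movies
print(np.allclose(U_r @ D_r @ Vt_r, A))  # True
```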

The way to interpret this decomposition is:

* In $U$, each column corresponds to a concept and each row to a user, so an entry gives the strength of that concept for that user. For example, the first column might correspond to the sci-fi concept and the second column to the romance concept; the first four users load mostly on the first column and the last three on the second.
* In $D$, each value along the diagonal represents the overall strength of a concept. For example, the first concept (12.4, sci-fi) is stronger than the second (9.5, romance). The final concept, 1.3, is weak compared with the other two; a concept this weak may just be capturing noise and can potentially be ignored.
* In $V^T$, each row corresponds to a concept, and the values in a row indicate how strongly each movie (column) belongs to that concept. In the example above there are three concepts: the first three movies load mostly on the first (sci-fi) concept, the last two on the second (romance) concept, and the weak third concept contributes little.
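Ignoring the weak third concept amounts to a truncated (rank-$k$) SVD. A minimal sketch, assuming NumPy and $k = 2$. Assigning each user and movie to the concept with the largest absolute loading is a simple heuristic chosen here for illustration, not a method from the text.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, d, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k strongest concepts and drop the rest (here, the weak third one).
k = 2
A_k = U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]

# Fraction of the total "energy" (sum of squared singular values) that is kept.
energy_kept = np.sum(d[:k] ** 2) / np.sum(d ** 2)

# The Frobenius-norm error equals the root-sum-square of the dropped singular values.
error = np.linalg.norm(A - A_k)
dropped = np.sqrt(np.sum(d[k:] ** 2))

print(round(float(energy_kept), 2))                      # 0.99
print(round(float(error), 2), round(float(dropped), 2))  # 1.35 1.35

# Assign each user (row of U) and movie (column of V^T) to its strongest concept.
user_concept = np.argmax(np.abs(U[:, :k]), axis=1)
movie_concept = np.argmax(np.abs(Vt[:k, :]), axis=0)
print(user_concept)   # [0 0 0 0 1 1 1]
print(movie_concept)  # [0 0 0 1 1]
```

Keeping two of the three concepts retains about 99% of the energy, which is consistent with treating the third concept as noise.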
