# Matrix Factorizations and Data Analysis<a class="tocSkip">

## Sang-Yun Oh <a class="tocSkip">

## Dimensionality and Rank

- Rank of a matrix $\approx$ effective dimensionality of dataset

- Having many features does not mean the data is rich:
    some of the information may be redundant

- Matrices can be low-rank: e.g., a rank-one matrix $R = vu^T$
    $$
    \begin{array}{|l|l|l|l|}\hline 
            & u_1 & u_2 & u_3 \\ \hline 
        m_1 & 1 & 0 & 2 \\ \hline 
        m_2 & 2 & 0 & 4 \\ \hline 
        m_3 & 3 & 0 & 6 \\ \hline 
    \end{array}
    = R = vu^T =
    \begin{array}{|l|l|}\hline 
        m_1 & 1  \\ \hline 
        m_2 & 2  \\ \hline 
        m_3 & 3  \\ \hline 
    \end{array}
    \begin{array}{|l|l|l|}\hline 
            u_1 & u_2 & u_3 \\ \hline 
            1 & 0 & 2 \\ \hline 
    \end{array}
    $$

- A large matrix does not imply high rank: there may be linear dependence

- Linear dependence among features:  
    Some columns may be linear combinations of others
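
The rank-one example above can be checked numerically. This is a minimal sketch using NumPy; the names `col` and `row` are chosen here for illustration and stand for the column $(1,2,3)$ and the row $(1,0,2)$ in the display.

```python
import numpy as np

# Column factor (one entry per row m_1..m_3) and row factor (one entry per column u_1..u_3)
col = np.array([1, 2, 3])
row = np.array([1, 0, 2])

# Outer product: every row of R is a multiple of `row`
R = np.outer(col, row)
rank_R = np.linalg.matrix_rank(R)

print(R)
print("rank:", rank_R)
```

Although $R$ has nine entries, it is fully described by the six numbers in `col` and `row`.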

## Linear dependence and Redundant information

- Linear combination of vectors:  
$$
\frac{1}{10} \cdot \left[ \begin{array}{l}{2} \\ {3} \\ {4}\end{array}\right]+2 \cdot \left[ \begin{array}{l}{5} \\ {7} \\ {9}\end{array}\right]=\left[ \begin{array}{l}{10.2} \\ {14.3} \\ {18.4}\end{array}\right]
$$

- A matrix ($m\times n$) times a column ($n\times 1$) gives  
    one linear combination of the columns of the matrix.

- A matrix ($m\times n$) times a matrix ($n\times k$) has $k$ columns, each of which is  
    the $m\times n$ matrix times one column ($n\times 1$) of the second matrix.

- The two height columns below are linear combinations of each other (feet = inches / 12)

$$
\begin{array}{|c|c|}\hline \text { Age (days) } & {\text { Height (in) }} \\ \hline 182 & {28} \\ \hline 399 & {30} \\ \hline 725 & {33} \\ \hline\end{array}
\times
\begin{array}{|l|l|l|}\hline 1 & {0} & {0} \\ \hline 0 & {1} & {1 / 12} \\ \hline\end{array}
=
\begin{array}{|c|c|c|}\hline \text { Age (days) } & {\text { Height (in) }} & {\text { Height }(\mathrm{ft})} \\ \hline 182 & {28} & {2.33} \\ \hline 399 & {30} & {2.5} \\ \hline 725 & {33} & {2.75} \\ \hline\end{array}
$$

$$
\small
\begin{array}{|l|l|l|}\hline \text { width } & {\text { length }} & {\text { area }} \\ \hline 20 & {20} & {400} \\ \hline 16 & {12} & {192} \\ \hline 24 & {12} & {288} \\ \hline 25 & {24} & {600} \\ \hline\end{array}
\times
\begin{array}{|c|c|c|c|}\hline 1 & {0} & {0} & {2} \\ \hline 0 & {1} & {0} & {2} \\ \hline 0 & {0} & {1} & {0} \\ \hline\end{array}
=
\begin{array}{|l|l|l|l|}\hline \text { width } & {\text { length }} & {\text { area }} & {\text { perimeter }} \\ \hline 20 & {20} & {400} & {80} \\ \hline 16 & {12} & {192} & {56} \\ \hline 24 & {12} & {288} & {72} \\ \hline 25 & {24} & {600} & {98} \\ \hline\end{array}
$$
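
As a rough check of the width/length/area example, the sketch below (NumPy, with variable names chosen here for illustration) multiplies the data by the weight matrix and confirms that appending a column that is a linear combination of existing columns does not raise the rank.

```python
import numpy as np

# Columns: width, length, area
X = np.array([[20, 20, 400],
              [16, 12, 192],
              [24, 12, 288],
              [25, 24, 600]], dtype=float)

# Each column of B holds the weights of one linear combination of X's columns;
# the last column encodes perimeter = 2*width + 2*length
B = np.array([[1, 0, 0, 2],
              [0, 1, 0, 2],
              [0, 0, 1, 0]], dtype=float)

X_aug = X @ B
perimeter = X_aug[:, 3]

# Matrix times a single column is one linear combination of the columns of X
same_combination = np.allclose(X @ B[:, 3], 2 * X[:, 0] + 2 * X[:, 1])

rank_before = np.linalg.matrix_rank(X)
rank_after = np.linalg.matrix_rank(X_aug)
print(perimeter, rank_before, rank_after)
```

The augmented matrix has four columns, but its rank is still three: the perimeter column carries no new information.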

- What if columns are not *perfect* linear combinations? 

- Columns may be *approximately* a linear combination of others (numerical rank)
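
One way to see approximate dependence is through singular values. The sketch below reuses the age/height data, with height in feet rounded to two decimals as in the table, so that column is only approximately height in inches divided by 12. The threshold `1e-2` is an arbitrary choice made for this illustration, not a recommended default.

```python
import numpy as np

age_days = np.array([182.0, 399.0, 725.0])
height_in = np.array([28.0, 30.0, 33.0])
# Rounding breaks the exact relation height_ft = height_in / 12
height_ft = np.round(height_in / 12, 2)

X = np.column_stack([age_days, height_in, height_ft])

# Singular values, largest first; a tiny last value signals near-dependence
singular_values = np.linalg.svd(X, compute_uv=False)

# Default tolerance is close to machine precision, so rounding error counts as signal
exact_rank = np.linalg.matrix_rank(X)
# With a looser tolerance, singular values below 1e-2 are treated as zero
numerical_rank = np.linalg.matrix_rank(X, tol=1e-2)

print(singular_values, exact_rank, numerical_rank)
```

The numerical rank depends on the tolerance, which should reflect how much noise or rounding is expected in the data.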

## Recommender Systems

### Latent Factor approach for Recommender System

![](images/recommender-latent-factor.png)

[image source](https://ieeexplore.ieee.org/document/5197422)


#### Latent Factor Representation

![](images/movie-latent-factors.png)

[image source](https://ucsb-primo.hosted.exlibrisgroup.com/permalink/f/1egv95m/01UCSB_ALMA51268902820003776)

- A rating is a combination of user and movie characteristics:  
    $$\begin{aligned} r_{i j} & \approx \sum_{s=1}^{k} u_{i s} \cdot v_{j s} \\ &=\sum_{s=1}^{k}(\text {Affinity of user } i \text { to characteristic } s) \times(\text {Affinity of movie } j \text { to characteristic } s) \end{aligned}$$
   

 
- For example, with $k=2$ characteristics (history and romance), user $i$'s rating of movie $j$ is  
    $$
    \begin{aligned} r_{i j} \approx &(\text {Affinity of user } i \text { to history}) \times(\text {Affinity of movie } j \text { to history}) \\ &+(\text {Affinity of user } i \text { to romance}) \times(\text {Affinity of movie } j\text { to romance}) \end{aligned}
    $$
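
The sum over characteristics is a dot product, so all predicted ratings can be computed with one matrix product. The affinity values below are hypothetical numbers made up for this sketch.

```python
import numpy as np

# Hypothetical affinities to two characteristics: (history, romance)
user_i = np.array([0.9, 0.2])    # user i leans toward history
movie_j = np.array([4.0, 1.0])   # movie j is mostly a history film

# r_ij ≈ sum over s of u_is * v_js
r_ij = user_i @ movie_j          # 0.9*4.0 + 0.2*1.0

# Stack users as rows of U and movies as rows of V to get every rating at once
U = np.array([[0.9, 0.2],
              [0.1, 1.0]])
V = np.array([[4.0, 1.0],
              [0.5, 4.5],
              [2.0, 2.0]])
R_hat = U @ V.T                  # shape: (2 users, 3 movies)

print(r_ij)
print(R_hat)
```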

#### Latent Factor Representation?

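- $R$ is _approximately_ low rank: e.g., with $k=2$ latent factors, $U$ and $V$ each have two columns and
    $$ R\approx UV^T $$

- Find $U$ and $V$ that minimize a matrix norm of the residual $R - UV^T$: e.g., the squared Frobenius norm
    $$ \min_{U,V} \|R - UV^T\|_F^2 = \min_{U,V} \sum_{i,j} (r_{ij} - u_i^T v_j)^2, $$
    where $u_i^T$ and $v_j^T$ are the $i$-th row of $U$ and the $j$-th row of $V$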
 

  
Ratings | Residual
- | - 
![](images/latent-factor-ratings.png) | ![](images/latent-factor-residual.png)
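
When every rating is observed, one standard way to minimize the Frobenius norm of the residual over rank-$k$ factorizations is the truncated SVD (Eckart-Young theorem). The sketch below assumes a small, fully observed, made-up ratings matrix; real rating data usually has missing entries and needs methods that handle them, which this example does not cover.

```python
import numpy as np

# Hypothetical, fully observed ratings (rows: users, columns: movies)
R = np.array([[5.0, 4.0, 1.0, 1.0],
              [4.0, 5.0, 1.0, 2.0],
              [1.0, 1.0, 5.0, 4.0],
              [2.0, 1.0, 4.0, 5.0],
              [5.0, 5.0, 2.0, 1.0]])
k = 2  # number of latent factors

P, s, Qt = np.linalg.svd(R, full_matrices=False)

# Split the top-k singular values evenly between the two factors
U = P[:, :k] * np.sqrt(s[:k])        # user factors, one row per user
V = Qt[:k, :].T * np.sqrt(s[:k])     # movie factors, one row per movie

residual = R - U @ V.T
frob_error = np.linalg.norm(residual, "fro")

print(np.round(U @ V.T, 2))
print("Frobenius norm of residual:", frob_error)
```

The residual norm equals the square root of the sum of the squared discarded singular values, so a fast decay of `s` means a small $k$ already fits $R$ well.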

|                               |                                |
|------------------------------ | ------------------------------ |
|![matmul1](images/matmul1.png) | ![matmul2](images/matmul2.png) |
|$$ C_{ij}=A_{i-}B_{-j}=\sum_{k=1}^K A_{ik}B_{kj} $$ |$$ C=\sum_{k=1}^K A_{-k}B_{k-} $$ |
|![matmul3](images/matmul3.png) | ![matmul4](images/matmul4.png) |
|$$ C_{i-}=A_{i-}B $$           |$$C_{-j} = AB_{-j}$$            |

## Matrix Decompositions: Principal Components Analysis

$X$: Data matrix in $\mathbb{R}^{n\times p}$ ($n$ observations, $p$ features), with centered columns

- Principal Components Analysis (PCA): $ Y = X Q $, or equivalently $ X = Y Q^T $
    
    + $Q$: Orthonormal $p\times p$ rotation matrix (loadings)
    + $Y$: Rotated $n\times p$ data matrix (scores)

- Rotation matrix $Q$ is computed from $X$ and transforms the data $X$ into $Y$

- The first columns of $Y$ capture the largest proportion of _information_ (variance)

- PCA can be described in terms of SVD factors
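
A minimal sketch of PCA through the SVD, assuming rows of $X$ are observations and the scores are computed as $Y = X_c Q$, where $X_c$ is the column-centered data. The synthetic data is made up for illustration: its third feature is nearly a linear combination of the first two.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3

# Hypothetical data: third feature ≈ first + second, plus a little noise
Z = rng.normal(size=(n, 2))
X = np.column_stack([Z[:, 0], Z[:, 1],
                     Z[:, 0] + Z[:, 1] + 0.05 * rng.normal(size=n)])

Xc = X - X.mean(axis=0)                       # center each column
U, d, Qt = np.linalg.svd(Xc, full_matrices=False)

Q = Qt.T                                      # loadings: orthonormal columns
Y = Xc @ Q                                    # scores; equal to U * d

# Share of total variance captured by each principal component
explained_variance_ratio = d**2 / np.sum(d**2)
print(np.round(explained_variance_ratio, 4))
```

Almost all of the variance falls on the first two components, which matches the fact that the data is approximately rank two.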

## Non-negative Matrix Factorization

- Assume data $X$ is a $p\times n$ matrix of non-negative values

- e.g., images, probabilities, counts, etc.

- NMF computes the following factorization:  
$$ \min_{W,H} \| X - WH \|_F \quad \text{subject to } W\geq 0,\ H\geq 0, $$  
    where $W$ is a $p\times r$ matrix, $H$ is an $r\times n$ matrix, and the inequalities hold entrywise.
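
One classical way to attack this problem is the multiplicative update rule of Lee and Seung. The sketch below is an illustrative NumPy implementation under that assumption, not necessarily the solver used to produce the figures in these notes. The function name `nmf_multiplicative`, the iteration count, and the small constant `eps` are choices made here.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=500, eps=1e-10, seed=0):
    """Approximate non-negative X (p x n) by W @ H with W (p x r), H (r x n) >= 0."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    W = rng.random((p, r))
    H = rng.random((r, n))
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a ratio of non-negative terms,
        # so W and H stay non-negative; eps guards against division by zero
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H, "fro"))
    return W, H, errors

# Hypothetical non-negative data that has an exact rank-2 non-negative factorization
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 8))

W, H, errors = nmf_multiplicative(X, r=2)
print("first / last residual norm:", errors[0], errors[-1])
```

The problem is non-convex, so the result depends on the random initialization, and the factors are not unique.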

### NMF for Image Analysis

![nmf-faces](images/nmf-faces.png)

### NMF for Hyperspectral image analysis

![nmf-hyper](images/nmf-hyper.png)


### NMF for Topic Discovery

![nmf-topics](images/nmf-topics.png)


- [More NMF examples](https://www.cs.rochester.edu/u/jliu/CSC-576/NMF-tutorial.pdf)

## Other Matrix Factorizations

- [Singular Value Decomposition](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html)

- [Principal Component Analysis](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)

- [Independent Component Analysis](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition)  
    [Blind Source Separation](https://scikit-learn.org/stable/auto_examples/decomposition/plot_ica_blind_source_separation.html)

- [Non-negative Matrix Factorization](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html#sklearn.decomposition.NMF)  
    [Topic Discovery](https://scikit-learn.org/stable/auto_examples/applications/plot_topics_extraction_with_nmf_lda.html)
    [Image Analysis](https://scikit-learn.org/stable/auto_examples/decomposition/plot_faces_decomposition.html)

- [Matrix Decompositions](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition)

## References

- [A Tutorial on Principal Component Analysis, Jonathon Shlens](https://arxiv.org/abs/1404.1100)

- [A Tutorial on Independent Component Analysis, Jonathon Shlens](https://arxiv.org/abs/1404.2986)

- UC Berkeley's Data Science 100 lecture notes, John DeNero

- [The Why and How of Nonnegative Matrix Factorization, Nicolas Gillis](https://arxiv.org/abs/1401.5226)

- [Matrix-based Introduction to Multivariate Data Analysis, Adachi](https://doi.org/10.1007/978-981-15-4103-2)