## Singular Value Decomposition (SVD)

SVD is the core algorithm behind latent semantic analysis (LSA).
<br>
As an example, start with a corpus containing 11 documents and a vocabulary of 6 words:

In [7]:
from nlpia.book.examples.ch04_catdog_lsa_sorted import lsa_models, prettify_tdm

In [8]:
bow_svd, tfidf_svd = lsa_models()
prettify_tdm(**bow_svd)


| | cat | dog | apple | lion | nyc | love | text |
|---|---|---|---|---|---|---|---|
| 0 | | | 1.0 | | 1.0 | | NYC is the Big Apple. |
| 1 | | | 1.0 | | 1.0 | | NYC is known as the Big Apple. |
| 2 | | | | | 1.0 | 1.0 | I love NYC! |
| 3 | | | 1.0 | | 1.0 | | I wore a hat to the Big Apple party in NYC. |
| 4 | | | 1.0 | | 1.0 | | Come to NYC. See the Big Apple! |
| 5 | | | 1.0 | | | | Manhattan is called the Big Apple. |
| 6 | 1.0 | | | | | | New York is a big city for a small cat. |
| 7 | 1.0 | | | 1.0 | | | The lion, a big cat, is the king of the jungle. |
| 8 | 1.0 | | | | | 1.0 | I love my pet cat. |
| 9 | | | | | 1.0 | 1.0 | I love New York City (NYC). |


The matrix above is a document-term matrix (dtm), where each row is the BOW vector for one document.
* Interpretation - The sorted corpus and the limited vocabulary produce several identical BOW vectors (for example, the documents that contain only NYC and apple)
* SVD operations - SVD should be able to notice this and allocate a topic to such a pair of words
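
The duplicated BOW vectors can be checked directly. This is a minimal sketch, assuming NumPy is available and that the blank cells in the table above stand for zero counts. The array `dtm` is typed in by hand from the ten rows shown; it is not an object returned by `nlpia`.

```python
import numpy as np

terms = ["cat", "dog", "apple", "lion", "nyc", "love"]

# Hand-copied from the rows shown above; blank cells are assumed to be 0.
dtm = np.array([
    [0, 0, 1, 0, 1, 0],  # 0
    [0, 0, 1, 0, 1, 0],  # 1
    [0, 0, 0, 0, 1, 1],  # 2
    [0, 0, 1, 0, 1, 0],  # 3
    [0, 0, 1, 0, 1, 0],  # 4
    [0, 0, 1, 0, 0, 0],  # 5
    [1, 0, 0, 0, 0, 0],  # 6
    [1, 0, 0, 1, 0, 0],  # 7
    [1, 0, 0, 0, 0, 1],  # 8
    [0, 0, 0, 0, 1, 1],  # 9
])

# Group identical rows (BOW vectors) and count how often each one occurs.
unique_rows, counts = np.unique(dtm, axis=0, return_counts=True)
most_common = unique_rows[counts.argmax()]
shared_terms = [t for t, x in zip(terms, most_common) if x]
print(shared_terms, counts.max())

# Term-term co-occurrence: entry (i, j) counts the documents containing both terms.
cooccurrence = dtm.T @ dtm
apple_nyc = cooccurrence[terms.index("apple"), terms.index("nyc")]
print(apple_nyc)
```

The most frequent BOW vector is the one shared by the apple/NYC documents, which is the kind of repeated pattern SVD can pick up as a topic.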


We can also run SVD on the term-document matrix (tdm), which is the transpose of the dtm. SVD works just as well on TF-IDF matrices or any other vector space model:

In [9]:
tdm = bow_svd['tdm']
tdm

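| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| cat | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| dog | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| apple | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| lion | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| nyc | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| love | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |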
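
To make the transposition concrete, here is a small sketch that types the matrix above into NumPy by hand and flips it back into document-term form. The names `tdm`, `dtm` and `doc_freq` are plain illustrative variables, not the `nlpia` objects.

```python
import numpy as np

terms = ["cat", "dog", "apple", "lion", "nyc", "love"]

# Term-document matrix copied from the table above: 6 terms x 11 documents.
tdm = np.array([
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1],  # cat
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # dog
    [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0],  # apple
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],  # lion
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0],  # nyc
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0],  # love
])

# Transposing gives one row per document and one column per term.
dtm = tdm.T
print(tdm.shape, dtm.shape)

# Number of documents each term appears in (row sums of the tdm).
doc_freq = dict(zip(terms, tdm.sum(axis=1).tolist()))
print(doc_freq)
```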


SVD is an algorithm for decomposing any matrix into three factors - three matrices that can be multiplied together to recreate the original matrix.
* Purpose - The three matrix factors computed with SVD have some convenient mathematical properties we can exploit for dimension reduction and LSA. LSA uses them to figure out topics (groups of related words)
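
The claim that the three factors multiply back to the original matrix is easy to verify. The sketch below uses `numpy.linalg.svd` as a stand-in for whatever the `nlpia` helper does internally (that substitution is an assumption), with the tdm typed in by hand.

```python
import numpy as np

# Term-document matrix from this example (rows: cat, dog, apple, lion, nyc, love).
tdm = np.array([
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0],
], dtype=float)

# full_matrices=False returns the compact factors; s holds the singular values.
U, s, Vt = np.linalg.svd(tdm, full_matrices=False)

# Multiplying the three factors back together recovers the original matrix.
reconstructed = U @ np.diag(s) @ Vt
print(np.allclose(reconstructed, tdm))

# Two of the convenient properties: the columns of U are orthonormal and
# the singular values come sorted from largest to smallest.
print(np.allclose(U.T @ U, np.eye(U.shape[1])))
print(bool(np.all(np.diff(s) <= 0)))
```

The sorted singular values are what make dimension reduction straightforward: the least important topics sit at the end and can be dropped.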

Whether we run SVD on BOW or TF-IDF term-document matrices, it will find combinations of words that belong together.
* Process - SVD finds those co-occurring words by, in effect, calculating the correlation between the rows (terms) of our term-document matrix
* Computation - SVD simultaneously finds the correlation of term use between documents and the correlation of documents with each other. It also computes the linear combinations of terms that have the greatest variation across the corpus
* Filtering and dimension reduction - We'll only keep those topics that retain the most information, i.e. the most variance in our corpus
* Transformation - SVD gives us the linear transformation (rotation) that converts our term-document vectors into shorter topic vectors, one for each document
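
The filtering and transformation steps can be sketched in a few lines. This is an illustration under stated assumptions: `numpy.linalg.svd` stands in for the book's helper, and `num_topics = 2` is a hypothetical choice made only for the demo.

```python
import numpy as np

# Term-document matrix from this example (rows: cat, dog, apple, lion, nyc, love).
tdm = np.array([
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(tdm, full_matrices=False)

num_topics = 2  # hypothetical value, chosen only for illustration

# Filtering: keep the topics with the largest singular values.
U_k = U[:, :num_topics]

# Transformation: rotate each 6-dimensional document vector (a column of tdm)
# into a shorter topic vector.
topic_vectors = U_k.T @ tdm
print(topic_vectors.shape)

# Share of the total squared singular values retained by the kept topics.
retained = np.sum(s[:num_topics] ** 2) / np.sum(s ** 2)
print(round(float(retained), 3))

# The rank-k reconstruction error (Frobenius norm) equals the root of the
# sum of the squared singular values that were dropped.
approx = U_k @ np.diag(s[:num_topics]) @ Vt[:num_topics]
error = np.linalg.norm(tdm - approx)
print(np.isclose(error, np.sqrt(np.sum(s[num_topics:] ** 2))))
```

Because the data is not mean-centered here, the retained share is a proxy for "variance kept" rather than a variance in the strict statistical sense.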

SVD will group together terms that are highly correlated with each other (because they frequently appear in the same documents) and that also vary together a lot over the set of documents.
<br>
These linear combinations of words are seen as 'topics'.
<br>
These topics turn our BOW/TF-IDF vectors into topic vectors that tell us which topics a document is about.
<br>
A topic vector provides a summarization or generalization of what the document is about.
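
To see a topic as a linear combination of words, we can inspect the term weights of the strongest topic. This is a sketch assuming `numpy.linalg.svd` and a hand-typed tdm; the variable names are illustrative only.

```python
import numpy as np

terms = ["cat", "dog", "apple", "lion", "nyc", "love"]

# Term-document matrix from this example, one row per term in `terms`.
tdm = np.array([
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(tdm, full_matrices=False)

# Each column of U is one topic: a weight for every term in the vocabulary.
first_topic = dict(zip(terms, U[:, 0].round(2).tolist()))
print(first_topic)

# The sign of a singular vector is arbitrary, so rank terms by absolute weight.
order = np.argsort(-np.abs(U[:, 0]))
top_terms = [terms[i] for i in order[:2]]
print(top_terms)

# Topic vectors for all documents: one column per document.
topic_vectors = U.T @ tdm

# Documents 0 and 1 have identical BOW vectors, so their topic vectors match.
print(np.allclose(topic_vectors[:, 0], topic_vectors[:, 1]))
```

The two heaviest terms in the first topic should be the pair that co-occurs most often in this corpus, which matches the earlier observation about NYC and apple.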

In mathematical terms, SVD looks like this:
<br>
<br>
$W_{m \times n} \rightarrow U_{m \times p} S_{p \times p} V_{p \times n}^{T}$

* $m$ - The number of terms in the vocabulary
* $n$ - The number of documents in the corpus
* $p$ - The number of topics in the corpus, which at this stage equals the number of words
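
For the example corpus, the dimensions can be read off a compact SVD. The sketch assumes `numpy.linalg.svd` with `full_matrices=False`, which yields $p = \min(m, n)$ topics; the variable names mirror the symbols above and are otherwise arbitrary.

```python
import numpy as np

# Term-document matrix from this example (rows: cat, dog, apple, lion, nyc, love).
tdm = np.array([
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0],
], dtype=float)

m, n = tdm.shape  # m terms, n documents
U, s, Vt = np.linalg.svd(tdm, full_matrices=False)
p = len(s)        # number of topics before any truncation
S = np.diag(s)    # NumPy returns the singular values as a 1-D array

print(m, n, p)
print(U.shape, S.shape, Vt.shape)
```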

We eventually want to end up with fewer topics than words, so we can use those topic vectors (rows of the topic-document matrix) as a reduced-dimension representation of the original TF-IDF vectors.
<br>
For now, as a first stage in this example, we retain all the dimensions in our matrices.
<br>
It's important to understand what these three matrices (U, S and V) look like.