## Latent Semantic Analysis (LSA)

LSA utilises a common technique for dimension reduction, namely Singular Value Decomposition (SVD).
<br>
SVD decomposes a matrix into three matrices, one of which is diagonal.
* Applications - Matrix factorisation is the core transformation of SVD, which gives it many real-world uses within data science, including behaviour-based recommendation engines that run alongside content-based NLP recommendation engines
* SVD purpose - Allows us to truncate those matrices (ignore some rows/columns) before multiplying them back together, which reduces the number of dimensions we have to deal with in our vector space model
* Modified transformation - Truncated matrices can give a slightly better TF-IDF matrix representation than the one we started with. The new representation of the documents contains the essence (latent semantics) of those documents. It captures the essence of a dataset and ignores the noise, making it useful for applications that require compression
* Summary - SVD applied to NLP is known as LSA, which uncovers the hidden (latent) meanings of words
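
The decomposition and truncation described above can be sketched with NumPy. The matrix values below are made up for illustration and `k = 2` is an arbitrary choice; this is a minimal sketch rather than a tuned implementation.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# The values are invented purely for illustration.
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
])

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s sorted largest first.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("shapes:", U.shape, s.shape, Vt.shape)

# Truncate: keep only the k largest singular values and their vectors.
k = 2  # arbitrary choice for this sketch
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Multiplying all three factors back together recovers A (up to rounding),
# while the truncated product is only an approximation of A.
full_error = np.linalg.norm(A - U @ np.diag(s) @ Vt)
trunc_error = np.linalg.norm(A - A_k)
print("full reconstruction error:", full_error)
print("rank-k reconstruction error:", trunc_error)
```

The rank-`k` error (Frobenius norm) equals the square root of the sum of the squared singular values that were dropped, so small trailing singular values can be discarded at little cost.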

***Technical explanation behind LSA:***
<br>
<br>
LSA is a mathematical technique for finding the 'best' way to linearly transform (rotate and stretch) any set of NLP vectors e.g. BOW or TF-IDF vectors.
* Optimisation - For many applications, the best approach is to line up the axes (dimensions) of the new vectors with the directions of greatest variance in the word frequencies
* Filtering - We can then eliminate those dimensions in the new vector space that do not contribute much to the variance in the vectors from document to document
* Related concept - **Principal Component Analysis** (PCA) on TF-IDF vectors is essentially the same operation as LSA on natural language documents (PCA additionally mean-centres the data before the decomposition), which is useful for problems and areas involving *feature engineering*
* Computation - LSA uses SVD to find the combinations of words that are together responsible for the greatest variation in the data. As mentioned earlier, we rotate the TF-IDF vectors so that the new dimensions (basis vectors) of our rotated vectors all align with these maximum-variance directions. The basis vectors form the axes of our new vector space. Each dimension becomes a combination of word frequencies rather than a single word frequency.
* Interpretation - We can think of the output vectors as the weighted combinations of words that make up various 'topics' used throughout a given corpus
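
The steps in the list above can be sketched end to end on a tiny corpus. Everything here is an assumption made for illustration: the five documents are invented, the TF-IDF weighting is one simple variant among several, and `k = 2` topics is an arbitrary choice.

```python
import numpy as np

# Hypothetical toy corpus, invented for illustration.
docs = [
    "dog cat love",
    "cat dog love pet",
    "dog love pet",
    "stock market price",
    "market price stock trade",
]

# Bag-of-words counts: one row per document, one column per word.
vocab = sorted({word for doc in docs for word in doc.split()})
counts = np.array(
    [[doc.split().count(word) for word in vocab] for doc in docs],
    dtype=float,
)

# A simple TF-IDF weighting (an assumption; other variants exist).
doc_freq = (counts > 0).sum(axis=0)
idf = np.log(len(docs) / doc_freq) + 1.0
tfidf = counts * idf

# SVD of the TF-IDF matrix. Each row of Vt is a weighted combination of words.
U, s, Vt = np.linalg.svd(tfidf, full_matrices=False)

k = 2                             # number of topics kept (arbitrary)
topic_word = Vt[:k]               # shape (k, n_words): word weights per topic
doc_topic = tfidf @ topic_word.T  # shape (n_docs, k): documents in topic space


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Words with the largest absolute weight in each topic.
for i, topic in enumerate(topic_word):
    top = np.argsort(-np.abs(topic))[:3]
    print(f"topic {i}:", [vocab[j] for j in top])

# Documents about the same subject should land close together in topic space.
print("doc 0 vs doc 1:", round(cosine(doc_topic[0], doc_topic[1]), 3))
print("doc 0 vs doc 3:", round(cosine(doc_topic[0], doc_topic[3]), 3))
```

In practice, scikit-learn's `TfidfVectorizer` and `TruncatedSVD` cover the same steps on real corpora; the NumPy version is used here only to keep each step visible.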

The machine/programme doesn't know what the combinations of words mean; it just identifies that they go together.
* Words together - If words like 'dog', 'cat' and 'love' frequently appear together, the programme will cluster these terms under a topic
* Topic identification - The programme doesn't automatically 