
## Non-negative Matrix Factorization (NMF) for Clustering

* Non-negative Matrix Factorization is a matrix factorization technique, widely used for topic modeling. It approximates one matrix as the product of two smaller (lower-rank) matrices, with the constraint that none of the three matrices contains negative values. It can be thought of as decomposing a whole into two parts.
* Recall the rule for matrix multiplication: the inner dimensions must match, and the result takes the outer dimensions.
    * $(a \times b)(b \times d) = (a \times d)$

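$$V \approx WH$$

where $V$ is the (documents $\times$ terms) matrix, $W$ is (documents $\times$ $k$) and $H$ is ($k$ $\times$ terms), for $k$ components.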
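A quick NumPy check of the shape rule, using the sizes from the speech example below (5 speeches, 2 factors, 4 terms). The random matrices are placeholders, not real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder non-negative factors: 5 documents x 2 factors, 2 factors x 4 terms.
W = rng.random((5, 2))
H = rng.random((2, 4))

# The inner dimensions match (2 == 2), so the product is 5 x 4.
V = W @ H
print(V.shape)  # (5, 4)

# A product of non-negative matrices is itself non-negative.
print((V >= 0).all())  # True

# Mismatched inner dimensions (2 vs 3) cannot be multiplied.
try:
    W @ rng.random((3, 4))
except ValueError as err:
    print("shape mismatch:", err)
```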


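### Facts
* There is generally no exact solution: $W$ and $H$ are found numerically, by iteratively minimizing the reconstruction error between $V$ and $WH$.
* $W$ and $H$ are not unique. For example, multiplying a column of $W$ by a constant and dividing the matching row of $H$ by the same constant leaves $WH$ unchanged.
* Typically produces sparse matrices (many entries are zero or close to zero).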
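The non-uniqueness is easy to check: rescaling or reordering the factors changes $W$ and $H$ but not their product. A minimal NumPy sketch with random placeholder matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((5, 2))
H = rng.random((2, 4))

# Rescale: multiply column k of W by d_k and divide row k of H by d_k.
D = np.diag([2.0, 0.5])
W_scaled = W @ D
H_scaled = np.linalg.inv(D) @ H

# Permute: swap the two factors in both matrices.
P = np.array([[0, 1], [1, 0]])
W_perm = W @ P
H_perm = P.T @ H

# All three pairs are non-negative and give the same product.
print(np.allclose(W @ H, W_scaled @ H_scaled))  # True
print(np.allclose(W @ H, W_perm @ H_perm))      # True
```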


### In Practice
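* Some papers show that NMF is equivalent to k-means under certain additional constraints. In general, it is more accurate to say that NMF behaves like k-means: each document can be assigned to the component on which it has the largest weight.
* Works particularly well with documents, since term counts are naturally non-negative.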
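To illustrate the "behaves like k-means" point, the sketch below clusters a hypothetical toy matrix with two obvious groups in two ways: NMF, assigning each row to its largest weight in $W$, and scikit-learn's `KMeans`. On data this clean the two partitions should agree (adjusted Rand index of 1.0). This is an illustration, not a proof of equivalence; on messier data the two methods can disagree.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.metrics import adjusted_rand_score

# Hypothetical toy data: rows 0-2 use only the first two features,
# rows 3-4 use only the last two, so there are two obvious groups.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [3, 3, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
], dtype=float)

# NMF clustering: each row goes to the component with the largest weight in W.
W = NMF(n_components=2, init="nndsvd", max_iter=500, random_state=0).fit_transform(X)
nmf_labels = W.argmax(axis=1)

# k-means on the same rows.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Adjusted Rand index: 1.0 means identical partitions, up to relabeling.
print(nmf_labels, km_labels)
print(adjusted_rand_score(nmf_labels, km_labels))
```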


### Other Uses
* Dimensionality reduction (similar to PCA, except that NMF's components and reduced coordinates are non-negative)
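A small comparison on placeholder count data: both NMF and PCA reduce 50 features to 5, but only the NMF coordinates are guaranteed to be non-negative, which is what makes them readable as topic weights.

```python
import numpy as np
from sklearn.decomposition import NMF, PCA

rng = np.random.default_rng(0)
# Placeholder data: 100 samples x 50 non-negative features (e.g. term counts).
X = rng.poisson(lam=1.0, size=(100, 50)).astype(float)

# Both methods map each sample to 5 numbers.
nmf = NMF(n_components=5, init="nndsvda", max_iter=1000, random_state=0)
X_nmf = nmf.fit_transform(X)
X_pca = PCA(n_components=5).fit_transform(X)

print(X_nmf.shape, X_pca.shape)  # (100, 5) (100, 5)

# NMF coordinates are non-negative; PCA coordinates (from centered data) are not.
print((X_nmf >= 0).all(), (X_pca < 0).any())  # True True
```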


**What happens when we decompose a document-term matrix into two?**
* Suppose the documents are newsgroup posts (as above). We could imagine commonly co-occurring words being grouped together in the same vector, with each article having a certain weight for that group's topic. For example, suppose there is a food topic vector that gives high weights to words like 'tasty' and 'sushi'; each document then has a particular weight for the food topic.
* This is where the term *topic modeling* comes from: the decomposition forms topic vectors out of a document-term matrix.
* For example, suppose we had the following document-term frequency matrix.

|          | labour | energy | market     | employment | 
|----------|--------|--------|------------|----| 
| Speech 1 | 36     | 3      | 45         | 54 | 
| Speech 2 | 4      | 34     | 23         | 31 | 
| Speech 3 | 9      | 65     | 11         | 0  | 
| Speech 4 | 17     | 3      | 3          | 0  | 
| Speech 5 | 0      | 14     | 7          | 4  | 


* Decomposing this matrix with `n_components=2` gives the following $W$ and $H$ matrices. (Values such as `2.2204e-16` are floating-point machine epsilon, i.e., effectively zero.)


* $W$ or weights matrix

|          | Factor 1    | Factor 2    | 
|----------|-------------|-------------| 
| Speech 1 | 0.021135218 | 0.63411542  | 
| Speech 2 | 0.26893587  | 0.24248544  | 
| Speech 3 | 0.56521061  | 2.2204e-16  | 
| Speech 4 | 0.028056074 | 0.088332775 | 
| Speech 5 | 0.11666223  | 0.035066365 | 


* $H$ or factors matrix

|          | labour    | energy     | market     | employment | 
|----------|-----------|------------|------------|------------| 
| Factor 1 | 10.975128 | 118.16503  | 21.246259  | 2.2204e-16 | 
| Factor 2 | 55.024872 | 0.83496782 | 67.753741  | 89         | 
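The same decomposition can be sketched with scikit-learn on the speech table. Expect different numbers from the tables above: results depend on the initialization and solver, the factors may come out in the other order, and the tables appear to scale $W$ so that each column sums to about 1, which scikit-learn does not do. Any such rescaling leaves the product $WH$ unchanged.

```python
import numpy as np
from sklearn.decomposition import NMF

# Document-term counts from the speech table
# (rows: Speech 1-5; columns: labour, energy, market, employment).
V = np.array([
    [36, 3, 45, 54],
    [4, 34, 23, 31],
    [9, 65, 11, 0],
    [17, 3, 3, 0],
    [0, 14, 7, 4],
], dtype=float)

nmf = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
W = nmf.fit_transform(V)  # 5 x 2 weights matrix
H = nmf.components_       # 2 x 4 factors matrix

print(np.round(W, 3))
print(np.round(H, 2))

# Frobenius norm of the residual V - WH; with the default loss,
# scikit-learn stores the same quantity in reconstruction_err_.
print(np.linalg.norm(V - W @ H), nmf.reconstruction_err_)
```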


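* Interpretations
    * $(5 \times 2)(2 \times 4) = (5 \times 4)$: 5 speeches by 2 factors in $W$, times 2 factors by 4 terms in $H$, approximates the 5 by 4 speech-term matrix.
    * Factor Matrix ($H$)
        * The factors are the two components.
        * The values are the weights of each term within a factor. For example, energy (118.2) and market (21.2) are the two most heavily weighted terms in factor 1, while employment, market and labour dominate factor 2. In general, terms that tend to co-occur in the same documents receive large weights in the same factor.
    * Weight Matrix ($W$)
        * We can use this matrix to determine which component (factor) each speech belongs to by taking the maximum value in its row. For example, speech 1 is far more strongly related to factor 2 (0.634) than to factor 1 (0.021). We can also observe how large this difference is: speech 2, by contrast, is split almost evenly between the two factors (0.269 vs. 0.242).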
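The hard assignment described above is an `argmax` over each row of $W$. A short NumPy sketch using the values from the weights table; the per-row shares are an extra, illustrative way of expressing to what degree each speech belongs to its factor.

```python
import numpy as np

# W copied from the weights-matrix table (rows: Speech 1-5; columns: Factor 1, Factor 2).
W = np.array([
    [0.021135218, 0.63411542],
    [0.26893587, 0.24248544],
    [0.56521061, 2.2204e-16],
    [0.028056074, 0.088332775],
    [0.11666223, 0.035066365],
])

# Hard assignment: the factor with the largest weight (1-based to match the table).
assignment = W.argmax(axis=1) + 1

# Soft view: each speech's weights rescaled to sum to 1 across the factors.
shares = W / W.sum(axis=1, keepdims=True)

for i, (k, s) in enumerate(zip(assignment, shares), start=1):
    print(f"Speech {i}: factor {k} ({s[k - 1]:.0%} of its weight)")
```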
        


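**Some Parameters for sklearn's NMF**
* `n_components`: the number of topics (or clusters)
* `alpha_W` and `alpha_H`: constants that multiply the regularization terms on $W$ and $H$ (parameter tuning); `l1_ratio` sets the mix between L1 and L2 regularization. Older scikit-learn versions used a single `alpha` parameter, which was removed in version 1.2.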


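```python
from sklearn.decomposition import NMF

# `transformed` (the document-term matrix) and `cv` (the fitted CountVectorizer)
# come from earlier cells of this notebook.
nmf = NMF(n_components=20, random_state=43).fit(transformed)
terms = cv.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2

# Print the 8 highest-weighted terms of each topic (each row of H = nmf.components_).
for topic_idx, topic in enumerate(nmf.components_):
    print(f'{topic_idx+1}: ', ', '.join([terms[i] for i in topic.argsort()[:-9:-1]]))
```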

```text
1:  max, ma, air, end, usa, distribution, university, organization
2:  wa, did, said, people, know, say, armenian, went
3:  db, bit, data, left, right, time, stuff, place
4:  widget, window, application, use, value, set, display, work
5:  file, gun, control, state, house, law, crime, article
6:  space, center, year, data, nasa, research, ha, program
7:  entry, file, program, section, rule, use, number, source
8:  team, hockey, game, league, new, season, wa, player
9:  drive, disk, hard, support, card, scsi, head, speed
10:  image, format, file, color, data, display, software, program
11:  god, jesus, atheist, christian, doe, believe, people, religion
12:  president, ha, think, going, know, package, wa, said
13:  line, organization, subject, writes, article, university, just, like
14:  use, ground, doe, subject, ha, need, used, power
15:  output, file, program, line, return, entry, write, open
16:  key, encryption, chip, law, technology, government, clipper, device
17:  turkish, jew, ar
```

* Above we observe the most heavily weighted terms within each topic. For example, the terms in topic 11 are almost all religious (god, jesus, atheist, christian, religion), and topic 8 is clearly about hockey.