# NMF

![title](figs/nmf.png)

The idea is to decompose our matrix $V$ into two non-negative matrices, $W$ and $H$:

$V \approx W H$

Note that non-negative matrix decomposition is not exact and that the solutions are not unique. One of the reasons why NMF is popular is that non-negative factors are (sometimes) easier to interpret.
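To make the non-uniqueness concrete, here is a minimal NumPy sketch. The toy sizes and the diagonal scaling matrix `D` are assumptions chosen for illustration, not part of the notebook. It shows that rescaling the components of $W$ and undoing the scaling in $H$ yields a different non-negative pair with the same product.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small non-negative factorization V = W @ H (toy sizes, chosen for illustration).
W = rng.random((4, 2))
H = rng.random((2, 5))
V = W @ H

# Rescale the components with a positive diagonal matrix D and undo it in H.
D = np.diag([2.0, 0.5])
W2 = W @ D
H2 = np.linalg.inv(D) @ H

# Both factor pairs are non-negative and reproduce the same V.
print(np.allclose(V, W2 @ H2))
print(bool((W2 >= 0).all() and (H2 >= 0).all()))
```

Any positive diagonal scaling (or a permutation of the components) works the same way, which is one reason the factors found by an optimizer depend on the initialization.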

We can find the two matrices with gradient-based optimization (the code below uses Adam). We minimize the difference between $V$ and $W H$ and add a penalty whenever elements of $W$ or $H$ are negative.
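Written out, the objective implemented by `loss_fct` and `penalty` in the cells below appears to be the following, where $\lambda$ corresponds to `lambd`, $\lVert \cdot \rVert_F$ is the (unsquared) Frobenius norm that `torch.norm` computes by default, and the mean and max are taken over matrix elements. The data matrix is called `X` in the code and $V$ here.

$$\mathcal{L}(W, H) = \lVert V - W H \rVert_F + \lambda \left( \operatorname{mean}\big(\max(-W, 0)\big) + \operatorname{mean}\big(\max(-H, 0)\big) \right)$$

The penalty term is zero whenever all elements of $W$ and $H$ are non-negative, so it only steers the optimizer away from negative values. It does not strictly forbid them.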

In [1]:
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

import torch
import torch.optim as optim

In [2]:
categories = ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
remove = ('headers', 'footers', 'quotes')
newsgroups = fetch_20newsgroups(subset='train', categories=categories, remove=remove)

In [3]:
tfidf = TfidfVectorizer(stop_words='english')
X = tfidf.fit_transform(newsgroups.data)  # sparse matrix of shape (documents, vocab)
X = X.toarray()  # convert to a dense NumPy array

In [4]:
num_components = 100
lambd = 10
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present

In [5]:
X = torch.from_numpy(X).float().to(device)

In [6]:
torch.manual_seed(42)
W = torch.abs(torch.normal(0, 0.01, size=(X.shape[0], num_components))).float().to(device)
W.requires_grad = True
H = torch.abs(torch.normal(0, 0.01, size=(num_components, X.shape[1]))).float().to(device)
H.requires_grad = True

In [7]:
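def penalty(W, H):
    # Mean magnitude of the negative entries of W and H (zero when both are non-negative).
    return torch.clamp(-W, min=0).mean() + torch.clamp(-H, min=0).mean()

def loss_fct(X, W, H):
    # Frobenius norm of the reconstruction error plus the weighted non-negativity penalty.
    return torch.norm(X - W @ H) + lambd * penalty(W, H)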
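
To see how the penalty behaves, here is a small NumPy mirror of it evaluated on toy matrices. The name `penalty_np` and the example values are hypothetical and used only for illustration; the notebook itself uses the torch version above.

```python
import numpy as np

def penalty_np(W, H):
    # NumPy mirror of the penalty: mean magnitude of the negative entries
    # of each factor (zero if a factor has no negative entries).
    return np.clip(-W, 0, None).mean() + np.clip(-H, 0, None).mean()

W_ok = np.array([[0.5, 0.0], [1.0, 2.0]])
H_ok = np.array([[0.1, 0.3], [0.0, 0.7]])
W_bad = np.array([[0.5, -0.4], [1.0, 2.0]])  # one negative entry

print(penalty_np(W_ok, H_ok))   # no negative entries, so no penalty
print(penalty_np(W_bad, H_ok))  # magnitude 0.4 averaged over the 4 entries of W_bad
```

Because the penalty is averaged over all entries and only weighted by `lambd`, a few small negative values can be cheap for the optimizer to keep. This is a soft constraint rather than a hard one.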

In [8]:
optimizer = optim.Adam([W, H], lr=1e-3, betas=(0.9, 0.9))

In [9]:
for epoch in range(1000):
    optimizer.zero_grad()
    loss = loss_fct(X, W, H)
    loss.backward()
    optimizer.step()
    
    if epoch % 100 == 0:
        print(f"Epoch: {epoch}, Loss: {loss.item()}")

Epoch: 0, Loss: 63.71663284301758
Epoch: 100, Loss: 41.273345947265625
Epoch: 200, Loss: 40.1900520324707
Epoch: 300, Loss: 40.085777282714844
Epoch: 400, Loss: 40.077972412109375
Epoch: 500, Loss: 40.075965881347656
Epoch: 600, Loss: 40.07423782348633
Epoch: 700, Loss: 40.07283401489258
Epoch: 800, Loss: 40.07253646850586
Epoch: 900, Loss: 40.07244873046875


In [10]:
W

tensor([[ 0.0977,  0.0420,  0.0748,  ...,  0.0481,  0.0441,  0.0490],
        [ 0.0293,  0.0296,  0.0064,  ...,  0.0093,  0.0289,  0.0092],
        [ 0.0368,  0.0383,  0.0354,  ...,  0.0362,  0.0331,  0.0527],
        ...,
        [ 0.0449,  0.0126,  0.0441,  ...,  0.0215,  0.0585,  0.0350],
        [ 0.0230,  0.0121,  0.0377,  ..., -0.0030,  0.0379,  0.0163],
        [ 0.0014,  0.0013,  0.0010,  ...,  0.0011,  0.0013,  0.0013]],
       device='cuda:0', requires_grad=True)
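
The printed `W` still contains a small negative entry (`-0.0030`), which is consistent with the penalty being a soft constraint. A minimal sketch of how one might measure this and, if strictly non-negative factors are required, clip the remaining negatives to zero is shown below. The helper names `negative_fraction` and `project_nonnegative` are hypothetical, and the toy array only borrows a few values from the output above.

```python
import numpy as np

def negative_fraction(M):
    # Share of entries that are strictly negative.
    return float((M < 0).mean())

def project_nonnegative(M):
    # Clip negative entries to zero (projection onto the non-negative orthant).
    return np.clip(M, 0, None)

# Toy stand-in for a learned factor; with the tensors above one could pass
# W.detach().cpu().numpy() instead.
W_toy = np.array([[0.0230, 0.0121, -0.0030],
                  [0.0014, 0.0013,  0.0010]])

print(negative_fraction(W_toy))
W_proj = project_nonnegative(W_toy)
print(negative_fraction(W_proj))
```

Clipping after training changes the reconstruction $W H$ slightly, so it is worth re-checking the reconstruction error afterwards. Applying the clip after every optimizer step (projected gradient descent) is a common alternative to the penalty, though it is not what this notebook does.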