# NMF

Non-negative matrix factorization (NMF) decomposes a data matrix $X$ into the product of two non-negative matrices, $W$ and $H$:

$X \approx W H$
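For the TF-IDF data used below, the shapes can be written out as follows. Here $n$ (documents), $m$ (vocabulary size) and $k$ (number of components, `num_components` in the code) are labels introduced only for this illustration:

$$
X \in \mathbb{R}_{\ge 0}^{n \times m}, \qquad
W \in \mathbb{R}_{\ge 0}^{n \times k}, \qquad
H \in \mathbb{R}_{\ge 0}^{k \times m}, \qquad
X_{ij} \approx \sum_{c=1}^{k} W_{ic} H_{cj}
$$

Each row of $W$ holds one document's weights over the $k$ components, and each row of $H$ holds one component's weights over the vocabulary.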

Note that NMF is generally not exact (the product $W H$ only approximates $X$) and that its solutions are not unique. One reason NMF is popular is that non-negative factors are (sometimes) easier to interpret.
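To see why the solution is not unique, here is a small NumPy sketch with random toy matrices (not the newsgroup data): scaling the columns of $W$ by a positive diagonal matrix $D$ and the rows of $H$ by $D^{-1}$ leaves the product unchanged while keeping both factors non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small toy non-negative factorization: W is 4x2, H is 2x3.
W = rng.random((4, 2))
H = rng.random((2, 3))

# Any positive diagonal scaling D can be moved between the factors:
# (W D)(D^{-1} H) = W H, and both new factors stay non-negative.
D = np.diag([2.0, 0.5])
W2 = W @ D
H2 = np.linalg.inv(D) @ H

same_product = np.allclose(W @ H, W2 @ H2)
still_nonneg = bool((W2 >= 0).all() and (H2 >= 0).all())
print(same_product, still_nonneg)  # True True
```

Permuting the components (columns of $W$ together with the matching rows of $H$) gives yet another equivalent factorization.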

We can find the two matrices by gradient-based optimization (the code below uses Adam rather than plain SGD). We minimize the difference between $X$ and $W H$ and add a penalty when elements of $W$ or $H$ become negative.
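Written out, the objective computed by `loss_fct` in the code below is the (unsquared) Frobenius norm of the residual plus $\lambda$ (`lambd`) times the mean magnitude of the negative entries of each factor, assuming $W$ is $n \times k$ and $H$ is $k \times m$:

$$
\mathcal{L}(W, H) = \lVert X - W H \rVert_F
+ \lambda \left( \frac{1}{nk} \sum_{i,c} \max(0, -W_{ic})
+ \frac{1}{km} \sum_{c,j} \max(0, -H_{cj}) \right)
$$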

In [1]:
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

import torch
import torch.optim as optim

In [2]:
categories = ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
remove = ('headers', 'footers', 'quotes')
newsgroups = fetch_20newsgroups(subset='train', categories=categories, remove=remove)

In [3]:
tfidf = TfidfVectorizer(stop_words='english')
X = tfidf.fit_transform(newsgroups.data) # (documents, vocab)
X = X.todense()
X = np.array(X)

In [4]:
num_components = 100
lambd = 10
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is available

In [5]:
X = torch.from_numpy(X).float().to(device)

In [6]:
# Initialise both factors with small non-negative random values and mark them as trainable.
W = torch.abs(torch.normal(0, 0.01, size=(X.shape[0], num_components))).float().to(device)
W.requires_grad = True
H = torch.abs(torch.normal(0, 0.01, size=(num_components, X.shape[1]))).float().to(device)
H.requires_grad = True

In [7]:
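def penalty(W, H):
    # Mean magnitude of the negative entries of W and H (zero when both are non-negative).
    return torch.clamp(-W, min=0).mean() + torch.clamp(-H, min=0).mean()

def loss_fct(X, W, H):
    # Frobenius norm of the reconstruction error plus the weighted non-negativity penalty.
    return torch.norm(X - W @ H) + lambd * penalty(W, H)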
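To make the penalty concrete, here is a NumPy mirror of the `penalty` function above, evaluated on two tiny hand-picked matrices (toy values for illustration only). Entries that are already non-negative contribute nothing; only the negative part is penalized.

```python
import numpy as np

def penalty(W, H):
    # NumPy mirror of the torch penalty: mean of max(0, -x) over each factor, summed.
    return np.clip(-W, 0, None).mean() + np.clip(-H, 0, None).mean()

W = np.array([[1.0, -2.0],
              [3.0,  4.0]])  # one negative entry of magnitude 2 -> mean 2/4 = 0.5
H = np.array([[-1.0, 1.0]])  # one negative entry of magnitude 1 -> mean 1/2 = 0.5

print(penalty(W, H))                  # 1.0
print(penalty(np.abs(W), np.abs(H)))  # 0.0
```

With `lambd = 10`, this toy pair would add 10 to the loss.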

In [8]:
optimizer = optim.Adam([W, H], lr=1e-3, betas=(0.9, 0.9))

In [9]:
for epoch in range(1000):
    optimizer.zero_grad()
    loss = loss_fct(X, W, H)
    loss.backward()
    optimizer.step()
    
    if epoch % 100 == 0:
        print(f"Epoch: {epoch}, Loss: {loss.item()}")

Epoch: 0, Loss: 63.59439468383789
Epoch: 100, Loss: 41.27014923095703
Epoch: 200, Loss: 40.1865348815918
Epoch: 300, Loss: 40.08699417114258
Epoch: 400, Loss: 40.078739166259766
Epoch: 500, Loss: 40.075218200683594
Epoch: 600, Loss: 40.075225830078125
Epoch: 700, Loss: 40.07392120361328
Epoch: 800, Loss: 40.072601318359375
Epoch: 900, Loss: 40.07277297973633
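Since each row of $H$ weights the vocabulary, a component can be read as a "topic" by listing its highest-weighted terms. In this notebook one would first move the factor to NumPy with `H.detach().cpu().numpy()` and take the vocabulary from `tfidf.get_feature_names_out()` (available in recent scikit-learn versions). The sketch below uses a hypothetical toy `H` and vocabulary instead, and `top_terms` is a helper introduced here for illustration.

```python
import numpy as np

def top_terms(H, vocab, n_terms=3):
    # Hypothetical helper: for each component (row of H), return the
    # vocabulary entries with the largest weights, highest first.
    order = np.argsort(-H, axis=1)[:, :n_terms]
    return [[vocab[j] for j in row] for row in order]

# Toy H (2 components x 5 terms) and vocabulary, for illustration only.
vocab = ["orbit", "nasa", "image", "pixel", "launch"]
H = np.array([[0.9, 0.7, 0.0, 0.1, 0.8],
              [0.0, 0.1, 0.9, 0.8, 0.0]])

for c, terms in enumerate(top_terms(H, vocab)):
    print(c, terms)
# 0 ['orbit', 'launch', 'nasa']
# 1 ['image', 'pixel', 'nasa']
```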


In [10]:
W

tensor([[ 0.0506,  0.0483,  0.0713,  ...,  0.0560,  0.0603,  0.0159],
        [ 0.0327,  0.0294,  0.0477,  ...,  0.0410,  0.0361,  0.0117],
        [ 0.0503,  0.0719,  0.0191,  ...,  0.0104,  0.0182,  0.0418],
        ...,
        [ 0.0441,  0.0494,  0.0585,  ...,  0.0342,  0.0145,  0.0278],
        [-0.0007,  0.0592,  0.0430,  ...,  0.0129, -0.0108,  0.0674],
        [ 0.0007,  0.0010,  0.0010,  ...,  0.0008,  0.0010,  0.0008]],
       device='cuda:0', requires_grad=True)
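The printed `W` still contains a few small negative entries (for example `-0.0007` and `-0.0108`): the penalty discourages negative values but does not forbid them. Within the torch loop, one option is to clip `W` and `H` at zero after each `optimizer.step()` inside `torch.no_grad()` (projected gradient descent). A different, classic option, swapped in here for comparison and not the approach used above, is the Lee-Seung multiplicative update rule for the squared Frobenius loss. Every update multiplies by a ratio of non-negative quantities, so the factors never become negative. A minimal NumPy sketch on toy random data follows; scikit-learn's `sklearn.decomposition.NMF` offers a ready-made implementation.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    # Lee-Seung multiplicative updates for min ||X - WH||_F^2 with W, H >= 0.
    # eps guards against division by zero.
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    errors = [np.linalg.norm(X - W @ H)]
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

# Toy non-negative data standing in for the TF-IDF matrix.
rng = np.random.default_rng(1)
X = rng.random((20, 30))
W, H, errors = nmf_multiplicative(X, k=5)
print((W >= 0).all(), (H >= 0).all())  # True True
```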