## Demo 1 -- PyTorch FFNNs with layer objects

We'll just do a little light text classification with the Reuters "crude"-topic and "grain"-topic articles.

In [1]:
import os
import sys
from sklearn.feature_extraction.text import TfidfVectorizer
from torch import nn
import torch.nn.functional as F

### Loading and preparing the data

In [2]:
from glob import glob

def load_files(directory, classname):
    # Lazily read every .txt file in `directory`; return a generator of file
    # contents plus a list with one `classname` label per file.
    filenames = glob(directory + "/*.txt")
    return (open(f, "r").read() for f in filenames), [classname] * len(filenames)

Note the use of the generator expression: each file is opened and read only when something asks for the next document, so we avoid having too many files open at once. Lazy evaluation!
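As a small illustration of that laziness, the sketch below uses a made-up `fake_read` function in place of real file access (the names and file list are hypothetical) and records when it actually gets called. It also shows that a generator can be consumed only once, which is worth remembering before passing one to a vectorizer.

```python
# Hypothetical stand-in for reading a file: records when it is actually called.
calls = []

def fake_read(name):
    calls.append(name)
    return f"contents of {name}"

names = ["a.txt", "b.txt", "c.txt"]

# Building the generator does no reading yet.
lazy = (fake_read(n) for n in names)
calls_after_build = list(calls)

first = next(lazy)                 # reads only the first "file"
calls_after_first = list(calls)

rest = list(lazy)                  # consuming the generator reads the rest
leftover = list(lazy)              # already exhausted, so this is empty
```

If the texts are needed more than once, materialising them with `list(...)` first is the simple option, at the cost of holding everything in memory.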

In [3]:
crudefiles, y_crude = load_files("/scratch/reuters-topics/crude", "crude")

In [4]:
grainfiles, y_grain = load_files("/scratch/reuters-topics/grain", "grain")

In [5]:
import itertools

allfiles = itertools.chain(crudefiles, grainfiles)

We're going to get the TF-IDF vectors in essentially one line.

In [6]:
vectorizer = TfidfVectorizer(lowercase=True)

In [7]:
allvectors = vectorizer.fit_transform(allfiles)

In [8]:
allvectors

<1160x11186 sparse matrix of type '<class 'numpy.float64'>'
	with 117926 stored elements in Compressed Sparse Row format>
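Each of the 1160 rows above is one article's TF-IDF vector over an 11186-term vocabulary. As a rough idea of what goes into such a row, here is a pure-Python sketch of TF-IDF with smoothed idf and L2 row normalisation, which is my understanding of `TfidfVectorizer`'s default weighting. The whitespace tokenisation and the toy documents are simplifying assumptions, so the numbers will not match sklearn's on real text.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """TF-IDF with smoothed idf and L2-normalised rows (illustrative sketch)."""
    tokenized = [doc.lower().split() for doc in docs]  # simplified tokeniser
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # guard empty docs
        rows.append([v / norm for v in raw])
    return vocab, rows

# Toy documents, made up for illustration.
vocab, rows = tfidf_rows(["oil prices rise", "grain prices fall", "oil oil oil"])
```

Terms that occur in fewer documents get a larger weight, and every row ends up with unit length, which is why most entries in the real matrix are zero and the rest are small fractions.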

In [9]:
import numpy as np

X = allvectors.todense()

In [10]:
X

matrix([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.01907834, 0.        , ..., 0.        , 0.        ,
         0.        ],
        ...,
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ]])

In [11]:
X = np.asarray(X)

In [12]:
X

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.01907834, 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

In [13]:
X.shape

(1160, 11186)

In [14]:
len(y_crude)

578

sklearn has a train/test split facility. PyTorch has similar utilities, which we may look at later, but they're lower priority since sklearn gives us most of what we need.

In [15]:
from sklearn.model_selection import train_test_split

splits = train_test_split(X, y_crude+y_grain, test_size=0.2)

In [17]:
len(splits)

4

In [18]:
len(splits[0]), len(splits[1]), len(splits[2]), len(splits[3])

(928, 232, 928, 232)
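The four pieces come back in the order training features, test features, training labels, test labels. The sketch below is a simplified pure-Python version of the idea, not sklearn's implementation: `simple_split` and the toy data are hypothetical. The point is that features and labels are shuffled with the same permutation, so each row keeps its label.

```python
import random

def simple_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle rows and labels with one permutation, then cut off a test set."""
    assert len(X) == len(y)
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    # Same return order as train_test_split when given two arrays.
    return X_train, X_test, y_train, y_test

# Toy data: the first feature doubles as the original row index.
X_toy = [[i, i * 10] for i in range(10)]
y_toy = ["crude" if i < 5 else "grain" for i in range(10)]
parts = simple_split(X_toy, y_toy)
```

With 1160 articles and a 20% test share, the same arithmetic gives the 928/232 sizes shown above.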

In [19]:
X_train, X_test, Y_train, Y_test = splits

In [20]:
Y_test = [{'crude':0,'grain':1}[x] for x in Y_test]

In [21]:
Y_train = [{'crude':0,'grain':1}[x] for x in Y_train]

### Define the model

In [24]:
import torch

In [25]:
dev = torch.device("cuda:0")

We're no longer doing things by hand as in LT2212; instead we apply the layer objects that PyTorch provides. We shouldn't forget that underneath these are the same matrix/tensor operations we practiced in LT2212, just wrapped in rather metaphorical shortcuts.
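As a reminder of what those operations look like, here is a hand-written forward pass through a linear layer, a ReLU, a second linear layer, and a sigmoid, using plain Python lists. The weights are made up and the helper names are hypothetical; the layout (one weight row per output unit) is meant to mirror how a linear layer computes `W x + b`.

```python
import math

def linear(x, weights, bias):
    """y = W x + b, with W given as a list of rows (one row per output unit)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

# Made-up weights: 3 inputs -> 2 hidden units -> 1 output.
W1 = [[0.5, -1.0, 0.0],
      [1.0, 1.0, 1.0]]
b1 = [0.0, -1.0]
W2 = [[2.0, -1.0]]
b2 = [0.5]

x = [1.0, 2.0, 3.0]
hidden = relu(linear(x, W1, b1))           # negative pre-activations become 0
output = sigmoid(linear(hidden, W2, b2))   # a single value between 0 and 1
```

The layer objects do the same arithmetic on tensors, while also keeping track of the parameters and gradients for us.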

We treat our two-class problem as binary classification and just apply a sigmoid to a single output unit.
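To make the binary setup concrete: the sigmoid turns the raw output into a number between 0 and 1 that we can read as the probability of class 1, and binary cross-entropy penalises that probability according to the true label. The sketch below writes out both formulas for a single prediction; the function names and the example value 2.0 are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

p_class1 = sigmoid(2.0)              # a positive raw score leans towards class 1
loss_if_label_1 = bce(p_class1, 1)   # small: prediction agrees with the label
loss_if_label_0 = bce(p_class1, 0)   # large: prediction contradicts the label
```

A raw score of 0 maps to a probability of exactly 0.5, which is the natural decision boundary between the two classes.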

In [26]:
class TextClassifier(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(TextClassifier, self).__init__()
        self.fc1 = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),
            nn.Sigmoid()
        )
        
    def forward(self, x):
        return self.fc1(x)

### Get the data onto the GPU and set up the training environment

In [60]:
from torch import optim

classifier = TextClassifier(len(X_train[0]), 200)
classifier = classifier.to(dev)

X_train_torch = torch.Tensor(X_train)
X_train_torch = X_train_torch.to(dev)
Y_train_torch = torch.Tensor(Y_train)
Y_train_torch = Y_train_torch.to(dev)

optimizer = optim.Adam(classifier.parameters())
criterion = nn.BCELoss()

In [61]:
Y_train_torch.size()

torch.Size([928])

In [62]:
Y_train_torch[0].size()

torch.Size([])

The fact that `Y_train_torch`'s size is (928) is actually a (minor) problem: each label is stored as a scalar, whereas the network outputs a 1-dimensional vector of length 1 for each example. So we need something of size (928, 1).

We get this by "unsqueezing" at dimension 1, i.e., inserting a new dimension of size 1 so that every label is wrapped in its own vector. Then we get (928, 1), and get rid of the warning we saw in class.

In [63]:
Y_train_torch = Y_train_torch.unsqueeze(1)

In [64]:
Y_train_torch.size()

torch.Size([928, 1])
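A toy version of the same shape fix, using three made-up labels, shows what each individual target looks like before and after. It also shows why mismatched shapes are worth fixing rather than ignoring: in ordinary elementwise arithmetic, a (3, 1) tensor combined with a (3,) tensor silently broadcasts to (3, 3).

```python
import torch

y = torch.tensor([0.0, 1.0, 1.0])   # shape (3,): one scalar label per example
y_col = y.unsqueeze(1)              # shape (3, 1): one length-1 vector per example

scalar_shape = y[0].shape           # a 0-dimensional tensor
vector_shape = y_col[0].shape       # matches a single-unit network output

# Broadcasting hazard with whole batches: (3, 1) against (3,) gives (3, 3).
pred = torch.zeros(3, 1)
broadcast_shape = (pred - y).shape
aligned_shape = (pred - y_col).shape
```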

In [65]:
X_train_torch.device

device(type='cuda', index=0)

In [66]:
def train():
    for i in range(len(X_train)):
        x = X_train_torch[i]
        y = Y_train_torch[i]
        optimizer.zero_grad()
        output = classifier(x)  # calling the module runs forward() plus any hooks
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()

### Train and evaluate the model

In [67]:
train()

In [69]:
list(classifier.parameters())

[Parameter containing:
 tensor([[ 0.0143,  0.0908,  0.0064,  ..., -0.0077,  0.0211, -0.0046],
         [ 0.0123,  0.0589, -0.0083,  ...,  0.0029,  0.0140, -0.0050],
         [ 0.0046, -0.0086, -0.0088,  ...,  0.0009,  0.0093, -0.0061],
         ...,
         [ 0.0110,  0.0823, -0.0094,  ..., -0.0070,  0.0259, -0.0011],
         [ 0.0078, -0.0196,  0.0094,  ...,  0.0013,  0.0085,  0.0049],
         [ 0.0254, -0.0417,  0.0253,  ..., -0.0016, -0.0199, -0.0003]],
        device='cuda:0', requires_grad=True), Parameter containing:
 tensor([ 0.0364,  0.0348, -0.0130,  0.0321,  0.0682,  0.0372,  0.0837,  0.0814,
          0.0811,  0.0333,  0.0325,  0.0694,  0.0325,  0.0709,  0.0309,  0.0625,
          0.0398,  0.0778,  0.0766,  0.0350,  0.0775,  0.0722,  0.0757,  0.0376,
          0.0269,  0.0306,  0.0380,  0.0393,  0.0371,  0.0331, -0.0163, -0.0187,
          0.0562,  0.0806,  0.0435,  0.0312,  0.0481,  0.0607,  0.0793,  0.0780,
          0.0309,  0.0762,  0.0366,  0.0383,  0.0334,  0.0750, 

In [70]:
X_test_torch = torch.Tensor(X_test)
X_test_torch = X_test_torch.to(dev)
Y_test_torch = torch.Tensor(Y_test)
Y_test_torch = Y_test_torch.to(dev)

In [71]:
def test():
    result = []
    # No gradients are needed for evaluation.
    with torch.no_grad():
        for i in range(len(X_test)):
            x = X_test_torch[i]
            result.append(classifier(x).to('cpu'))
    return result

In [72]:
result = test()

In [73]:
result_int = [round(float(x[0])) for x in result]

In [74]:
correct = [1 if x[0] == x[1] else 0 for x in zip(result_int, Y_test)]

In [75]:
sum(correct)/len(correct)

0.9698275862068966

It seems like a single epoch with a very simple FFNN model was good enough to reach about 97% accuracy (225 of the 232 test articles).
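The evaluation steps above amount to thresholding each output probability at 0.5 and counting matches. A compact sketch of that, with a hypothetical `accuracy` helper and made-up probabilities rather than the notebook's actual outputs:

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    predicted = [1 if p > threshold else 0 for p in probabilities]
    hits = sum(1 for p, y in zip(predicted, labels) if p == y)
    return hits / len(labels)

# Made-up probabilities and gold labels (0 = crude, 1 = grain).
probs = [0.91, 0.08, 0.62, 0.45, 0.5]
gold = [1, 0, 1, 1, 0]
score = accuracy(probs, gold)
```

For values between 0 and 1, Python's `round` gives the same decisions as the strict `> 0.5` test here, because it rounds an exact 0.5 down to 0.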