# Deep Learning for Natural Language Processing with PyTorch
This tutorial walks you through the key ideas of deep learning programming using PyTorch. Many of the concepts (such as the computation graph abstraction and autograd) are not unique to PyTorch and are relevant to any deep learning toolkit.

In [1]:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

## 1. Torch tensor

In [2]:
# Create a torch.Tensor from the given data. This one is a 1D vector.
V_data = [1., 2., 3.]
V = torch.Tensor(V_data)
print(V)

# Create a matrix (a 2D tensor)
M_data = [[1., 2., 3.], [4., 5., 6.]]
M = torch.Tensor(M_data)
print(M)

# Create a 3D tensor of size 2x2x2.
T_data = [[[1.,2.], [3.,4.]],
          [[5.,6.], [7.,8.]]]
T = torch.Tensor(T_data)
print(T)


 1
 2
 3
[torch.FloatTensor of size 3]


 1  2  3
 4  5  6
[torch.FloatTensor of size 2x3]


(0 ,.,.) = 
  1  2
  3  4

(1 ,.,.) = 
  5  6
  7  8
[torch.FloatTensor of size 2x2x2]



## 2. Computation graph and automatic differentiation

In [3]:
# Variables wrap tensor objects
x = autograd.Variable(torch.Tensor([1., 2., 3.]), requires_grad=True)
# You can access the data with the .data attribute
print(x.data)

# You can also do all the same operations on Variables that you did with tensors.
y = autograd.Variable(torch.Tensor([4., 5., 6.]), requires_grad=True)
z = x + y
print(z.data)

# But z knows something extra: the operation that created it.
print(z.grad_fn)


 1
 2
 3
[torch.FloatTensor of size 3]


 5
 7
 9
[torch.FloatTensor of size 3]

<torch.autograd.function.AddBackward object at 0x000001B29FFC8710>
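The `grad_fn` recorded on `z` is what allows autograd to run backpropagation through the graph. The following is a minimal sketch of that step, not part of the original notebook. It assumes a newer PyTorch release (0.4 or later), where a tensor can track gradients directly through `torch.tensor(..., requires_grad=True)` instead of the `Variable` wrapper used above.

```python
import torch

# Assumption: PyTorch 0.4 or later, where tensors track gradients
# directly and no Variable wrapper is needed.
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6.], requires_grad=True)

z = x + y      # z.grad_fn records the addition that produced z
s = z.sum()    # reduce to a scalar so backward() needs no arguments

s.backward()   # backpropagate through the recorded graph

# Each component of x and y contributes to the sum with weight 1,
# so both gradients are vectors of ones.
print(x.grad)
print(y.grad)
```

Calling `backward()` fills in the `.grad` attribute of every leaf tensor that was created with `requires_grad=True`. The training loop later in this tutorial relies on exactly this mechanism.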


## 3. Logistic regression bag-of-words classifier

In [4]:
data = [ ("me gusta comer en la cafeteria".split(), "SPANISH"),
         ("Give it to me".split(), "ENGLISH"),
         ("No creo que sea una buena idea".split(), "SPANISH"),
         ("No it is not a good idea to get lost at sea".split(), "ENGLISH") ]

test_data = [ ("Yo creo que si".split(), "SPANISH"),
              ("it is lost on me".split(), "ENGLISH")]

word_to_ix = {}
for sent, _ in data + test_data:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
print(word_to_ix)

VOCAB_SIZE = len(word_to_ix)
NUM_LABELS = 2

{'Give': 6, 'get': 20, 'gusta': 1, 'buena': 14, 'is': 16, 'on': 25, 'una': 13, 'sea': 12, 'a': 18, 'si': 24, 'Yo': 23, 'at': 22, 'good': 19, 'to': 8, 'idea': 15, 'not': 17, 'No': 9, 'la': 4, 'me': 0, 'en': 3, 'creo': 10, 'comer': 2, 'lost': 21, 'que': 11, 'it': 7, 'cafeteria': 5}


In [5]:
class BoWClassifier(nn.Module):
    def __init__(self, num_labels, vocab_size):
        super(BoWClassifier, self).__init__()
        
        self.linear = nn.Linear(vocab_size, num_labels)
        
    def forward(self, bow_vec):
        # Apply log softmax across the label dimension (dim=1) of the 1 x num_labels scores
        return F.log_softmax(self.linear(bow_vec), dim=1)

In [6]:
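def make_bow_vector(sentence, word_to_ix):
    # Count how many times each vocabulary word occurs in the sentence
    vec = torch.zeros(len(word_to_ix))
    for word in sentence:
        vec[word_to_ix[word]] += 1
    # Reshape to 1 x vocab_size so the model sees a batch of one
    return vec.view(1, -1)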

def make_target(label, label_to_ix):
    return torch.LongTensor([label_to_ix[label]])

In [7]:
model = BoWClassifier(NUM_LABELS, VOCAB_SIZE)
for param in model.parameters():
    print(param)

Parameter containing:

Columns 0 to 9 
-0.0983 -0.0684  0.1552  0.0328 -0.1058 -0.1330  0.0014 -0.1159  0.0513 -0.0692
 0.1196 -0.1012  0.0768 -0.1571 -0.0217  0.0281  0.1389  0.1814 -0.0829 -0.1764

Columns 10 to 19 
 0.1244  0.0067 -0.0754 -0.0584 -0.0968  0.1395 -0.1799  0.0572  0.1019 -0.0092
-0.1643 -0.1483 -0.0166 -0.1481 -0.0896  0.1356 -0.0335 -0.1006  0.0624  0.0846

Columns 20 to 25 
 0.1879  0.1082  0.1308 -0.0258  0.0894  0.1710
 0.0282 -0.0694 -0.1929 -0.0884  0.1440 -0.0259
[torch.FloatTensor of size 2x26]

Parameter containing:
 0.0029
 0.1906
[torch.FloatTensor of size 2]



In [8]:
print(model.parameters)

<bound method Module.parameters of BoWClassifier (
  (linear): Linear (26 -> 2)
)>


In [9]:
print(data)

[(['me', 'gusta', 'comer', 'en', 'la', 'cafeteria'], 'SPANISH'), (['Give', 'it', 'to', 'me'], 'ENGLISH'), (['No', 'creo', 'que', 'sea', 'una', 'buena', 'idea'], 'SPANISH'), (['No', 'it', 'is', 'not', 'a', 'good', 'idea', 'to', 'get', 'lost', 'at', 'sea'], 'ENGLISH')]


In [12]:
sample = data[0]
bow_vector = make_bow_vector(sample[0], word_to_ix)
print(bow_vector)
log_probs = model(autograd.Variable(bow_vector))
print(log_probs)



Columns 0 to 12 
    1     1     1     1     1     1     0     0     0     0     0     0     0

Columns 13 to 25 
    0     0     0     0     0     0     0     0     0     0     0     0     0
[torch.FloatTensor of size 1x26]

Variable containing:
-0.8832 -0.5335
[torch.FloatTensor of size 1x2]



Which of the above values corresponds to the log probability of ENGLISH, and which to SPANISH? We have not defined the mapping yet, but we need one before we can train the model.

In [11]:
label_to_ix = { "SPANISH": 0, "ENGLISH": 1 }
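With the mapping fixed, the two output columns can be read as labels. The sketch below is an illustration only: it copies the untrained log probabilities printed earlier for the first training sentence into a plain Python list, and `ix_to_label` is a hypothetical helper that does not appear in the original notebook.

```python
import math

label_to_ix = {"SPANISH": 0, "ENGLISH": 1}
# Hypothetical helper: invert the mapping to go from index back to label.
ix_to_label = {ix: label for label, ix in label_to_ix.items()}

# Log probabilities printed earlier for "me gusta comer en la cafeteria"
# (untrained model).
log_probs = [-0.8832, -0.5335]

# Exponentiating log probabilities recovers probabilities that sum to about 1.
probs = [math.exp(lp) for lp in log_probs]

# The predicted label is the index with the highest log probability.
predicted_ix = max(range(len(log_probs)), key=lambda i: log_probs[i])

print(probs)
print(ix_to_label[predicted_ix])
```

With these numbers the untrained model picks ENGLISH for a Spanish sentence, which is unsurprising given that its weights are still random.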

So let's train! To do this, we pass instances through the model to get log probabilities, compute a loss function, compute the gradient of the loss function, and then update the parameters with a gradient step. PyTorch provides loss functions in the nn package; nn.NLLLoss() is the negative log likelihood loss we want. It also provides optimizers in torch.optim. Here, we will just use SGD.

Note that the input to NLLLoss is a vector of log probabilities and a target label. It doesn't compute the log probabilities for us, which is why the last layer of our network is log softmax. The loss function nn.CrossEntropyLoss() is the same as NLLLoss(), except that it applies the log softmax for you.
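The relationship between the two loss functions can be checked numerically. This is a small sketch under stated assumptions: the scores and targets are made up for illustration, and it uses the newer tensor API (PyTorch 0.4 or later) instead of `Variable`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up raw scores (before any softmax) for 3 instances and 2 labels.
scores = torch.randn(3, 2)
targets = torch.tensor([0, 1, 0])

# Route 1: log softmax followed by NLLLoss, as in this tutorial.
nll = nn.NLLLoss()(F.log_softmax(scores, dim=1), targets)

# Route 2: CrossEntropyLoss applied directly to the raw scores.
ce = nn.CrossEntropyLoss()(scores, targets)

# The two values should agree up to floating-point error.
print(nll.item(), ce.item())
```

If you switch a model to `nn.CrossEntropyLoss()`, its `forward` should return the raw linear scores. Keeping the log softmax layer as well would apply it twice.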

In [15]:
# Run on test data before we train, just to see a before-and-after
for instance, label in test_data:
    bow_vec = autograd.Variable(make_bow_vector(instance, word_to_ix))
    log_probs = model(bow_vec)
    print(log_probs)
print(next(model.parameters())[:,word_to_ix["creo"]]) # Print the matrix column corresponding to "creo"

Variable containing:
-0.5698 -0.8339
[torch.FloatTensor of size 1x2]

Variable containing:
-0.9585 -0.4836
[torch.FloatTensor of size 1x2]

Variable containing:
 0.1244
-0.1643
[torch.FloatTensor of size 2]



In [16]:
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    for instance, label in data:
        model.zero_grad()
        
        bow_vec = autograd.Variable(make_bow_vector(instance, word_to_ix))
        target = autograd.Variable(make_target(label, label_to_ix))
        
        log_probs = model(bow_vec)
        
        loss = loss_function(log_probs, target)
        loss.backward()
        optimizer.step()

In [18]:
for instance, label in test_data:
    bow_vec = autograd.Variable(make_bow_vector(instance, word_to_ix))
    log_probs = model(bow_vec)
    print(log_probs)
print(next(model.parameters())[:,word_to_ix["creo"]]) # Index corresponding to Spanish goes up, English goes down!

Variable containing:
-0.1091 -2.2695
[torch.FloatTensor of size 1x2]

Variable containing:
-2.8481 -0.0597
[torch.FloatTensor of size 1x2]

Variable containing:
 0.5319
-0.5719
[torch.FloatTensor of size 2]

