# HW 1 Classification

In this homework you will be building several varieties of text classifiers.

## Goal

We ask that you construct the following models in PyTorch:

1. A naive Bayes unigram classifier (follow Wang and Manning http://www.aclweb.org/anthology/P/P12/P12-2.pdf#page=118: you should only implement Naive Bayes, not the combined classifier with SVM).
2. A logistic regression model over word types (you can implement this as $y = \sigma(\sum_i W x_i + b)$) 
3. A continuous bag-of-words neural network with embeddings (similar to CBOW in Mikolov et al https://arxiv.org/pdf/1301.3781.pdf).
4. A simple convolutional neural network (any variant of CNN as described in Kim http://aclweb.org/anthology/D/D14/D14-1181.pdf).
5. Your own extensions to these models...

Consult the papers provided for hyperparameters. 
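
To make the logistic regression formula in item 2 concrete, here is a minimal sketch in plain Python (not PyTorch) that scores a sentence by summing one scalar weight per word type and applying a sigmoid. The tiny vocabulary, the weights, and the function names are made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(word_ids, weights, bias):
    """P(positive | sentence) = sigma(sum_i W[x_i] + b)."""
    score = sum(weights[i] for i in word_ids) + bias
    return sigmoid(score)

# Hypothetical 4-word vocabulary: 0=<unk>, 1=good, 2=bad, 3=movie
weights = [0.0, 2.0, -2.0, 0.0]
bias = 0.0

p_good = predict_proba([1, 3], weights, bias)   # "good movie"
p_bad = predict_proba([2, 3], weights, bias)    # "bad movie"
print(p_good, p_bad)
```

In a PyTorch version the weight list would typically become a learned parameter, for example an embedding table with one output dimension.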


## Setup

This notebook provides a working definition of the setup of the problem itself. You may construct your models inline or use an external setup (preferred) to build your system.

In [1]:
# Text processing library and methods for pretrained word embeddings
import torchtext
import numpy as np
import torch as t
from torchtext.vocab import Vectors, GloVe

In [2]:
def variable(array, requires_grad=False):
    if isinstance(array, np.ndarray):
        return t.autograd.Variable(t.from_numpy(array), requires_grad=requires_grad)
    elif isinstance(array, list) or isinstance(array,tuple):
        return t.autograd.Variable(t.from_numpy(np.array(array)), requires_grad=requires_grad)
    elif isinstance(array, float) or isinstance(array, int):
        return t.autograd.Variable(t.from_numpy(np.array([array])), requires_grad=requires_grad)
    elif isinstance(array, t.Tensor):
        return t.autograd.Variable(array, requires_grad=requires_grad)
    else: raise ValueError

The dataset we will use for this problem is the Stanford Sentiment Treebank (https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf). It is a variant of a standard sentiment classification task. For simplicity, we will use its most basic form: classifying a sentence as positive or negative in sentiment.

To start, `torchtext` requires that we define a mapping from the raw text data to featurized indices. These fields make it easy to map back and forth between readable data and math, which helps for debugging.

In [3]:
# Our input $x$
TEXT = torchtext.data.Field()

# Our labels $y$
LABEL = torchtext.data.Field(sequential=False)

Next we load our data. Here we use the standard SST train, validation and test splits, pass in the fields, and filter out the examples labeled neutral.

In [4]:
train_dataset, val_dataset, test_dataset = torchtext.datasets.SST.splits(
    TEXT, LABEL,
    filter_pred=lambda ex: ex.label != 'neutral')

Let's look at this data. It is still in its original form: each example consists of a label and the original words.

In [5]:
print('len(train)', len(train_dataset))
print('vars(train[0])', vars(train_dataset[0]))

len(train) 6920
vars(train[0]) {'text': ['The', 'Rock', 'is', 'destined', 'to', 'be', 'the', '21st', 'Century', "'s", 'new', '``', 'Conan', "''", 'and', 'that', 'he', "'s", 'going', 'to', 'make', 'a', 'splash', 'even', 'greater', 'than', 'Arnold', 'Schwarzenegger', ',', 'Jean-Claud', 'Van', 'Damme', 'or', 'Steven', 'Segal', '.'], 'label': 'positive'}


In order to map this data to features, we need to assign an index to each word and label. The function `build_vocab` allows us to do this and provides useful options that we will need in future assignments.

In [6]:
TEXT.build_vocab(train_dataset)
LABEL.build_vocab(train_dataset)
print('len(TEXT.vocab)', len(TEXT.vocab))
print('len(LABEL.vocab)', len(LABEL.vocab))

len(TEXT.vocab) 16286
len(LABEL.vocab) 3
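
The label vocabulary has 3 entries although only two classes remain after filtering, presumably because index 0 is reserved for an unknown token (a later cell shows that `LABEL.vocab.itos[1]` is `positive`). The sketch below mimics this kind of index assignment in plain Python. `build_vocab_sketch` and its ordering rule (frequency first, then alphabetical) are assumptions for illustration, not torchtext's implementation.

```python
from collections import Counter

def build_vocab_sketch(token_lists, specials=("<unk>",)):
    """Assign an index to every token: specials first, then by frequency."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Sort by descending frequency, breaking ties alphabetically.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    itos = list(specials) + ordered                  # index -> string
    stoi = {tok: i for i, tok in enumerate(itos)}    # string -> index
    return itos, stoi

labels = [["positive"], ["negative"], ["positive"]]
itos, stoi = build_vocab_sketch(labels)
print(len(itos), itos)
```

Keeping both `itos` and `stoi` is what makes it possible to go back and forth between readable tokens and indices when debugging.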


Finally we are ready to create batches of our training data that can be used for training and validating the model. This function produces 3 iterators that will let us go through the train, val and test data. 

In [7]:
train_iter, val_iter, test_iter = torchtext.data.BucketIterator.splits(
    (train_dataset, val_dataset, test_dataset), batch_size=10, device=-1)

Let's look at a single batch from one of these iterators. The library automatically converts the underlying words into indices. It then produces tensors for batches of x and y. In this case the text tensor has shape [number of words in the longest sentence (with padding), batch size]. We can use the vocabulary to convert back from these indices to words.

In [8]:
batch = next(iter(train_iter))
print("Size of text batch [max sent length, batch size]", batch.text.size())
print("First in batch", batch.text[:, 0])
print("Converted back to string: ", " ".join([TEXT.vocab.itos[i] for i in batch.text[:, 0].data]))

Size of text batch [max sent length, batch size] torch.Size([24, 10])
First in batch Variable containing:
  2128
    12
    22
  3496
   239
 15875
  3451
  1642
     6
  3370
   129
     3
  4798
    12
    22
   488
   871
   809
     5
  2813
    12
    22
   536
     2
[torch.LongTensor of size 24]

Converted back to string:  Warm in its loving yet unforgivingly inconsistent depiction of everyday people , relaxed in its perfect quiet pace and proud in its message .
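
The `[max sent length, batch size]` layout means that each column of the tensor is one sentence, which is why `batch.text[:, 0]` selects the first sentence. A minimal plain-Python sketch of this padding and transposition follows. `pad_batch` is a hypothetical helper, and using index 1 for padding is an assumption based on the trailing 1s visible in a batch printed later in this notebook.

```python
def pad_batch(sentences, pad_index=1):
    """Pad index sequences to equal length and return them time-major.

    The result has max_len rows and len(sentences) columns, so column j
    holds sentence j, matching the [max sent length, batch size] layout.
    """
    max_len = max(len(s) for s in sentences)
    padded = [s + [pad_index] * (max_len - len(s)) for s in sentences]
    # Transpose from [batch, time] to [time, batch].
    return [list(step) for step in zip(*padded)]

batch = pad_batch([[5, 7, 2], [9, 2]])
first_sentence = [row[0] for row in batch]   # analogue of batch.text[:, 0]
print(len(batch), len(batch[0]), first_sentence)
```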


Similarly it produces a vector for each of the labels in the batch. 

In [9]:
print("Size of label batch [batch size]", batch.label.size())
print("First in batch", batch.label[0])
print("Converted back to string: ", LABEL.vocab.itos[batch.label.data[0]])

Size of label batch [batch size] torch.Size([10])
First in batch Variable containing:
 1
[torch.LongTensor of size 1]

Converted back to string:  positive


Finally, the Vocab object can be used to map pretrained word vectors to the indices in the vocabulary. This will be very useful for parts 3 and 4 of the problem.

In [10]:
# # Build the vocabulary with word embeddings
# url = 'https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.simple.vec'
# TEXT.vocab.load_vectors(vectors=Vectors('wiki.simple.vec', url=url))

# print("Word embeddings size ", TEXT.vocab.vectors.size())
# print("Word embedding of 'follows', first 10 dim ", TEXT.vocab.vectors[TEXT.vocab.stoi['follows']][:10])


In [510]:
from copy import deepcopy
TEXT2 = deepcopy(TEXT)

# Build the vocabulary with word embeddings
url = 'https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.simple.vec'
TEXT2.vocab.load_vectors(vectors=Vectors('wiki.simple.vec', url=url))

.vector_cache\wiki.simple.vec: 293MB [00:42, 6.90MB/s]                                                                 
  0%|                                                                                       | 0/111052 [00:00<?, ?it/s]Skipping token 111051 with 1-dimensional vector ['300']; likely a header
100%|████████████████████████████████████████████████████████████████████████| 111052/111052 [00:21<00:00, 5091.91it/s]


In [11]:
# Build the vocabulary with word embeddings
TEXT.vocab.load_vectors(vectors=GloVe())

print("Word embeddings size ", TEXT.vocab.vectors.size())
print("Word embedding of 'follows', first 10 dim ", TEXT.vocab.vectors[TEXT.vocab.stoi['follows']][:10])


Word embeddings size  torch.Size([16286, 300])
Word embedding of 'follows', first 10 dim  
 0.2057
 0.1047
-0.3900
-0.1086
-0.0722
-0.1184
-0.1109
 0.1917
 0.4781
 2.0576
[torch.FloatTensor of size 10]
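
The embedding matrix above has one row per vocabulary index (16286 rows of 300 dimensions), so an index from `TEXT.vocab.stoi` can be used directly to look up a vector. The plain-Python sketch below illustrates this alignment. `align_vectors` is a hypothetical helper, and filling missing words with zeros is an assumption of the sketch.

```python
def align_vectors(itos, pretrained, dim):
    """Build one embedding row per vocabulary index.

    Words missing from `pretrained` get an all-zero row (an assumption of
    this sketch; other initialisations are possible).
    """
    return [list(pretrained.get(word, [0.0] * dim)) for word in itos]

itos = ["<unk>", "<pad>", "good", "bad"]
pretrained = {"good": [0.1, 0.2], "bad": [-0.1, -0.2], "unused": [9.0, 9.0]}
vectors = align_vectors(itos, pretrained, dim=2)
print(len(vectors), len(vectors[0]))  # rows = vocabulary size, columns = dim
```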



## Assignment

Now it is your turn to build the models described at the top of the assignment. 

Using the data given by this iterator, you should construct 4 different torch models that take in batch.text and produce a distribution over labels. 

When a model is trained, use the following test function to produce predictions, and then upload them to the Kaggle competition:  https://www.kaggle.com/c/harvard-cs281-hw1

In [12]:
def test(model):
    "All models should be able to be run with the following command."
    upload = []
    # Update: for kaggle the bucket iterator needs to have batch_size 10
    # test_iter = torchtext.data.BucketIterator(test_dataset, train=False, batch_size=10)
    for batch in test_iter:
        # Your prediction data here (don't cheat!)
        # Use the model passed as argument (e.g. NB) rather than a hard-coded one
        probs = model(batch.text).long()
        upload += list(probs.data)

    with open("predictions.txt", "w") as f:
        for u in upload:
            f.write(str(u) + "\n")

In addition, you should put up a (short) write-up following the template provided in the repository:  https://github.com/harvard-ml-courses/cs287-s18/blob/master/template/

# Model 1: NB

In [13]:
def embed_sentence(batch, vec_dim=300, sentence_length=16):
    """Convert integer-encoded sentence to word vector representation"""
    return t.cat([TEXT.vocab.vectors[batch.text.data.long()[:,i]].view(1,sentence_length,vec_dim) for i in range(batch.batch_size)])

In [14]:
# The NB weights: classify with t.sign(t.sum(W(text)) + bias).
# This works because the features are indicator variables of the word occurrences.
W = t.nn.Embedding(len(TEXT.vocab), 1)  

In [15]:
from collections import Counter
positive_counts = Counter()
negative_counts = Counter()

In [16]:
train_iter.batch_size = 10
len(train_iter)

692

In [17]:
from tqdm import tqdm
i = 0
pos = 0
neg = 0

# count the occurrences of each word (classwise); label 1 is positive, label 2 is negative
for b in tqdm(train_iter):
    i += 1
    pos_tmp = t.nonzero((b.label==1).data.long()).numpy().flatten().shape[0]
    neg_tmp = t.nonzero((b.label==2).data.long()).numpy().flatten().shape[0]
    pos += pos_tmp
    neg += neg_tmp
    if pos_tmp > 0:  # skip batches without positive examples
        positive_counts += Counter(b.text.transpose(0,1).index_select(0, t.nonzero((b.label==1).data.long()).squeeze()).data.numpy().flatten().tolist())
    if neg_tmp > 0:  # skip batches without negative examples
        negative_counts += Counter(b.text.transpose(0,1).index_select(0, t.nonzero((b.label==2).data.long()).squeeze()).data.numpy().flatten().tolist())
    if i >= len(train_iter):  # stop after one pass over the training set
        break

 95%|██████████████████████████████████████████████████████████████████████████▊    | 655/692 [00:01<00:00, 543.67it/s]


In [18]:
for k in range(len(TEXT.vocab)):  # pseudo counts
    positive_counts[k] += 1
    negative_counts[k] += 1

In [19]:
scale_pos = sum(list(positive_counts.values()))
scale_neg = sum(list(negative_counts.values()))
positive_prop = {k: v/scale_pos for k,v in positive_counts.items()}
negative_prop = {k: v/scale_neg for k,v in negative_counts.items()}

In [20]:
r = {k: np.log(positive_prop[k] / negative_prop[k]) for k in range(len(TEXT.vocab))}
W.weight.data = t.from_numpy(np.array([r[k] for k in range(len(TEXT.vocab))]))

In [21]:
bias = np.log(pos/neg)

In [22]:
def NB(text):
    """sign(Wx + b)"""
    return t.sign(t.cat([t.sum(W(text.transpose(0,1)[i])) for i in range(text.data.numpy().shape[1])]) + bias).long()
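
The cells above compute a smoothed log-count ratio per word, a bias from the class counts, and classify with the sign of their sum. The same steps can be sketched on toy data in plain Python. The function names and the three-word vocabulary are hypothetical, and a score of exactly zero is mapped to the negative class here, whereas `t.sign` would return 0.

```python
import math
from collections import Counter

def train_nb(docs, labels, vocab_size):
    """Return (r, b): per-word log-count ratios and the class-prior bias."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos if label == 1 else neg).update(doc)
    # Add-one pseudo-counts so every word has non-zero mass in both classes.
    p = [pos[k] + 1 for k in range(vocab_size)]
    q = [neg[k] + 1 for k in range(vocab_size)]
    sp, sq = sum(p), sum(q)
    r = [math.log((p[k] / sp) / (q[k] / sq)) for k in range(vocab_size)]
    n_pos = sum(1 for y in labels if y == 1)
    b = math.log(n_pos / (len(labels) - n_pos))
    return r, b

def predict(doc, r, b):
    """sign(sum of ratios + bias): +1 for positive, -1 for negative."""
    score = sum(r[k] for k in doc) + b
    return 1 if score > 0 else -1

# Hypothetical vocabulary: 0=movie, 1=good, 2=bad
docs = [[0, 1], [1, 1], [0, 2], [2, 2]]
labels = [1, 1, -1, -1]
r, b = train_nb(docs, labels, vocab_size=3)
print(predict([1, 0], r, b), predict([2, 0], r, b))
```

Like the notebook, this sketch counts every token occurrence rather than binarizing the features per sentence.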

In [23]:
upload = []
true = []
for batch in test_iter:
    # Your prediction data here (don't cheat!)
    probs = NB(batch.text).long()
    upload += list(probs.data)
    true += batch.label.data.numpy().tolist()
true = [x if x == 1 else -1 for x in true]



In [24]:
sum([(x*y == 1) for x,y in zip(upload,true)])/ len(upload)

0.8237232289950577

# Model 2: logistic regression

In [107]:
W = t.nn.Embedding(len(TEXT.vocab), 1)
b = variable(0., True)

In [108]:
# loss and optimizer
nll = t.nn.NLLLoss(size_average=True)

learning_rate = 1e-2
optimizer = t.optim.RMSprop([b, W.weight], lr=learning_rate)
sig = t.nn.Sigmoid()

In [109]:
len(train_iter), train_iter.batch_size

(692, 10)

In [110]:
import torch as t

def eval_perf(iterator):
    count = 0
    bs = iterator.batch_size * 1
    iterator.batch_size = 1
    for i, batch in enumerate(iterator):
        # get data
        y_pred = (sig(t.cat([W(batch.text.transpose(0,1)[i]).sum() for i in range(batch.text.data.numpy().shape[1])]) + b.float()) > 0.5).long()
        y = batch.label.long()*(-1) + 2

        count += t.sum((y == y_pred).long())
        if i >= len(iterator) - 1:
            break
    iterator.batch_size = bs
    return (count.float() / len(iterator)).data.numpy()[0]

In [112]:
import torch as t
n_epochs = 20

for _ in range(n_epochs):
    for i, batch in enumerate(train_iter):
        # get data
        y_pred = sig(t.cat([W(batch.text.transpose(0,1)[i]).sum() for i in range(batch.text.data.numpy().shape[1])]) + b.float()).unsqueeze(1)
        y = batch.label.long()*(-1) + 2
        
        # initialize gradients
        optimizer.zero_grad()
        
        # loss
        y_pred = t.cat([1-y_pred, y_pred], 1).float()  # nll needs two columns: the predictions for the negative/positive classes (NLLLoss expects log-probabilities; probabilities are passed here)

        loss = nll.forward(y_pred, y)

        # compute gradients
        loss.backward()

        # update weights
        optimizer.step()
                
        if i >= len(train_iter) - 1:
            break
    train_iter.init_epoch()
    print("Validation accuracy after %d epochs: %.2f" % (_, eval_perf(val_iter)))

Validation accuracy after 0 epochs: 0.65
Validation accuracy after 1 epochs: 0.70
Validation accuracy after 2 epochs: 0.73
Validation accuracy after 3 epochs: 0.73
Validation accuracy after 4 epochs: 0.74
Validation accuracy after 5 epochs: 0.74
Validation accuracy after 6 epochs: 0.75
Validation accuracy after 7 epochs: 0.76
Validation accuracy after 8 epochs: 0.77
Validation accuracy after 9 epochs: 0.77
Validation accuracy after 10 epochs: 0.78
Validation accuracy after 11 epochs: 0.78
Validation accuracy after 12 epochs: 0.78
Validation accuracy after 13 epochs: 0.78
Validation accuracy after 14 epochs: 0.78
Validation accuracy after 15 epochs: 0.78
Validation accuracy after 16 epochs: 0.78
Validation accuracy after 17 epochs: 0.77
Validation accuracy after 18 epochs: 0.77
Validation accuracy after 19 epochs: 0.78
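
`NLLLoss` is defined on log-probabilities, so whether a log is applied before the loss changes the objective being minimized. The plain-Python sketch below compares the two cases on made-up numbers: the negative log-likelihood, and the negative mean probability obtained when raw probabilities are passed instead. The helper name and the example values are illustrative only.

```python
import math

def nll_from_log_probs(log_probs, targets):
    """Mean of -log_probs[n][target_n], the quantity an NLL loss averages."""
    return -sum(row[y] for row, y in zip(log_probs, targets)) / len(targets)

probs = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical [P(class 0), P(class 1)] per example
targets = [0, 1]

log_probs = [[math.log(p) for p in row] for row in probs]
true_nll = nll_from_log_probs(log_probs, targets)   # -(log 0.9 + log 0.8) / 2
on_probs = nll_from_log_probs(probs, targets)       # -(0.9 + 0.8) / 2
print(true_nll, on_probs)
```

Both quantities decrease as the probability of the correct class grows, so training still makes progress either way, but the gradients differ and only the first is the usual cross-entropy objective.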


# Model 3: CBOW

In [42]:
def vectorize(text, vdim=300):
    length, batch_size = text.data.numpy().shape
    return t.mean(t.cat([TEXT.vocab.vectors[text.long().data.transpose(0,1)[i]].view(1,length,vdim) for i in range(batch_size)]), 1)
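
`vectorize` represents a sentence as the mean of its word vectors, which is the "continuous bag of words" feature: word order is discarded and every sentence maps to a vector of the same size. A plain-Python sketch with made-up 2-dimensional embeddings is given below; `cbow_features` is a hypothetical name.

```python
def cbow_features(word_ids, vectors):
    """Average the embedding rows of a sentence into one fixed-size vector."""
    dim = len(vectors[0])
    total = [0.0] * dim
    for i in word_ids:
        for d in range(dim):
            total[d] += vectors[i][d]
    return [v / len(word_ids) for v in total]

vectors = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]  # hypothetical 2-d embeddings
features = cbow_features([1, 2], vectors)
print(features)  # [2.0, 3.0]
```

Note that the notebook's `vectorize` averages over every position of the padded batch, so padding tokens also enter the mean for the shorter sentences.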

In [249]:
W = variable(np.random.normal(0, .1, (300,)), True)
b = variable(0., True)

In [250]:
# loss and optimizer
nll = t.nn.NLLLoss(size_average=True)
learning_rate = 1e-2
optimizer = t.optim.RMSprop([b, W], lr=learning_rate)
sig = t.nn.Sigmoid()

In [253]:
import torch as t

def eval_perf(iterator):
    count = 0
    bs = iterator.batch_size * 1
    iterator.batch_size = 1
    for i, batch in enumerate(iterator):
        # get data
        text_ = batch.text
        length = text_.data.numpy().shape[0]
        y_pred = (sig(t.mm(variable(vectorize(text_)),W.float().resize(300,1)).squeeze() + b.float().squeeze()) > 0.5).long()
        y = batch.label.long()*(-1) + 2

        count += t.sum((y == y_pred).long())
        if i >= len(iterator) - 1:
            break
    iterator.batch_size = bs
    return (count.float() / len(iterator)).data.numpy()[0]

In [254]:
import torch as t
n_epochs = 20

for _ in range(n_epochs):
    for i, batch in enumerate(train_iter):
        # get data
        text_ = batch.text
        length = text_.data.numpy().shape[0]
        y_pred = sig(t.mm(variable(vectorize(text_)),W.float().resize(300,1)).squeeze() + b.float().squeeze()).unsqueeze(1)
        y = batch.label.long()*(-1) + 2
        
        # initialize gradients
        optimizer.zero_grad()
        
        # loss
        y_pred = t.cat([1-y_pred, y_pred], 1).float()  # nll needs two columns: the predictions for the negative/positive classes (NLLLoss expects log-probabilities; probabilities are passed here)

        loss = nll.forward(y_pred, y)

        # compute gradients
        loss.backward()

        # update weights
        optimizer.step()
                
        if i >= len(train_iter) - 1:
            break
    train_iter.init_epoch()
    print("Validation accuracy after %d epochs: %.2f" % (_, eval_perf(val_iter)))

Validation accuracy after 0 epochs: 0.71
Validation accuracy after 1 epochs: 0.70
Validation accuracy after 2 epochs: 0.70
Validation accuracy after 3 epochs: 0.71
Validation accuracy after 4 epochs: 0.72
Validation accuracy after 5 epochs: 0.71
Validation accuracy after 6 epochs: 0.69
Validation accuracy after 7 epochs: 0.71
Validation accuracy after 8 epochs: 0.71
Validation accuracy after 9 epochs: 0.71
Validation accuracy after 10 epochs: 0.71
Validation accuracy after 11 epochs: 0.71
Validation accuracy after 12 epochs: 0.71


KeyboardInterrupt: 

# Model 4: convnet on word vectors

In [294]:
from torch.nn import Conv1d as conv, MaxPool1d as maxpool, Linear as fc, Softmax, ReLU, Dropout, Tanh, BatchNorm1d as BN, LeakyReLU

In [295]:
softmax = Softmax()
dropout = Dropout()
relu = ReLU()
tanh = Tanh()
lrelu = LeakyReLU()

Best performance with:
* Adam, lr 0.001
* conv 50 filters, padding 1, kernel 3
* one FC layer
* batch size 100
* dropout 25% just before the FC layer
* relu activation
* GloVe 840B embedding
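
The core of this configuration is a width-3 convolution with padding 1, an activation, and max-over-time pooling. As a rough illustration of what these operations compute, here is a single-filter version in plain Python. The helper names, the 1-dimensional embeddings, and the all-ones kernel are assumptions chosen to keep the arithmetic easy to follow.

```python
def conv1d(seq, kernel, bias=0.0, padding=1):
    """One-filter 1-D convolution (cross-correlation) over a sequence of vectors.

    seq: list of time steps, each a list of `dim` floats.
    kernel: list of `width` weight vectors, each of length `dim`.
    """
    dim = len(seq[0])
    pad = [[0.0] * dim for _ in range(padding)]
    padded = pad + seq + pad
    width = len(kernel)
    out = []
    for start in range(len(padded) - width + 1):
        window = padded[start:start + width]
        out.append(bias + sum(w * x
                              for wv, xv in zip(kernel, window)
                              for w, x in zip(wv, xv)))
    return out

def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

# Hypothetical sentence of 4 words with 1-d embeddings, one filter of width 3.
seq = [[1.0], [2.0], [-1.0], [0.5]]
kernel = [[1.0], [1.0], [1.0]]
feature_map = relu(conv1d(seq, kernel))
pooled = max(feature_map)   # max-over-time pooling: one number per filter
print(feature_map, pooled)
```

With kernel width 3 and padding 1 the feature map keeps the length of the sentence, and pooling reduces it to one value per filter regardless of sentence length, which is what lets a fixed-size linear layer follow.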


In [658]:
class Convnet(t.nn.Module):
    
    def __init__(self):
        super(Convnet, self).__init__()
        self.conv1 = conv(300, 100, 3, padding=1)
        self.dropout1 = Dropout(.25)
#         self.dropout2 = Dropout(.1)
        self.fc2 = fc(100, 20)
        self.fc3 = fc(20, 2)
        
    def forward(self, x):
        xx = lrelu(self.conv1(x))
        xx = t.max(xx, -1)[0]
        xx = self.dropout1(xx)
        xx = lrelu(self.fc2(xx))
#         xx = self.dropout2(xx)
        xx = self.fc3(xx)
        return softmax(xx)

In [659]:
class HRFConvnet(t.nn.Module):
    
    def __init__(self):
        super(HRFConvnet, self).__init__()
        self.conv1a = conv(300, 25, 3, padding=1)
        self.conv1b = conv(300, 25, 5, padding=2)
        self.conv2 = conv(100, 50, 2, padding=1)
        self.maxpool = t.nn.MaxPool1d(3, padding=1)
        self.avgpool = t.nn.AvgPool1d(3, padding=1)
        self.dropout = Dropout(.25)
        self.fc2 = fc(200, 2)
        
    def forward(self, x):
        # convolutions and pooling
        xx = lrelu(t.cat([self.conv1a(x), self.conv1b(x)], 1))
        xx = t.cat([self.maxpool(xx), self.avgpool(xx)], 1)
        xx = lrelu(self.conv2(xx))
        
        # several kinds of pooling over time
        xx_max = t.max(xx, -1)[0]
        xx_mean = t.mean(xx, -1)
        xx_min = t.min(xx, -1)[0]
        xx_med = t.median(xx, -1)[0]
        xx = t.cat([xx_max, xx_mean, xx_min, xx_med], -1)
        
        # dropout and linear layer
        xx = self.dropout(xx)
        xx = self.fc2(xx)
        return softmax(xx)

In [660]:
class SHRFConvnet(t.nn.Module):
    
    def __init__(self):
        super(SHRFConvnet, self).__init__()
        self.conv1a = conv(300, 25, 3, padding=1)
        self.conv1b = conv(300, 25, 5, padding=2)
        self.conv2 = conv(50, 50, 2, padding=1)
        self.maxpool = t.nn.MaxPool1d(3, padding=1)
        self.dropout = Dropout(.25)
        self.fc2 = fc(50, 2)
        
    def forward(self, x):
        # convolutions and pooling
        xx = lrelu(t.cat([self.conv1a(x), self.conv1b(x)], 1))
        xx = self.maxpool(xx)
        xx = lrelu(self.conv2(xx))
        
        # max pooling over time
        xx = t.max(xx, -1)[0]
        
        # dropout and linear layer
        xx = self.dropout(xx)
        xx = self.fc2(xx)
        return softmax(xx)

In [662]:
# loss and optimizer
nll = t.nn.NLLLoss(size_average=True)
learning_rate = 1e-3
convnet = Convnet()
optimizer = t.optim.Adam(convnet.parameters(), lr=learning_rate)

In [663]:
def vectorize(text, vdim=300):
    length, batch_size = text.data.numpy().shape
    return t.cat([TEXT.vocab.vectors[text.long().data.transpose(0,1)[i]].view(1,length,vdim) for i in range(batch_size)]).permute(0,2,1)

def vectorize2(text, vdim=300):
    length, batch_size = text.data.numpy().shape
    x = t.cat([TEXT.vocab.vectors[text.long().data.transpose(0,1)[i]].view(1,length,vdim) for i in range(batch_size)]).permute(0,2,1)
    return t.cat([x, t.cat([TEXT2.vocab.vectors[text.long().data.transpose(0,1)[i]].view(1,length,vdim) for i in range(batch_size)]).permute(0,2,1)], 1)


In [664]:
import torch as t

def eval_perf(iterator):
    count = 0
    bs = iterator.batch_size * 1
    iterator.batch_size = 1
    for i, batch in enumerate(iterator):
        # get data
        text = batch.text
        y_pred = (convnet(variable(vectorize(text)))[:, 1] > 0.5).long()
        y = batch.label.long()*(-1) + 2

        count += t.sum((y == y_pred).long())
        if i >= len(iterator) - 1:
            break
    iterator.batch_size = bs
    return (count.float() / (bs*len(iterator))).data.numpy()[0]

In [671]:
import torch as t
n_epochs = 25

train_iter.batch_size = 100

for _ in range(n_epochs):
    for i, batch in enumerate(train_iter):
        # get data
        text = batch.text
        y_pred = convnet(variable(vectorize(text)))
        y = batch.label.long()*(-1) + 2
        
        # initialize gradients
        optimizer.zero_grad()
        
        # loss
        loss = nll.forward(y_pred, y)

        # compute gradients
        loss.backward()

        # update weights
        optimizer.step()
                
    train_iter.init_epoch()
    convnet.eval()
    val_perf = eval_perf(val_iter)
    print("Validation accuracy after %d epochs: %.3f" % (_, val_perf))
    if val_perf >= .835: break
    convnet.train()
    



Validation accuracy after 0 epochs: 0.829
Validation accuracy after 1 epochs: 0.818
Validation accuracy after 2 epochs: 0.828
Validation accuracy after 3 epochs: 0.818
Validation accuracy after 4 epochs: 0.812
Validation accuracy after 5 epochs: 0.819
Validation accuracy after 6 epochs: 0.826
Validation accuracy after 7 epochs: 0.822
Validation accuracy after 8 epochs: 0.821
Validation accuracy after 9 epochs: 0.820
Validation accuracy after 10 epochs: 0.835
Validation accuracy after 11 epochs: 0.815
Validation accuracy after 12 epochs: 0.808
Validation accuracy after 13 epochs: 0.825
Validation accuracy after 14 epochs: 0.821
Validation accuracy after 15 epochs: 0.815
Validation accuracy after 16 epochs: 0.821


KeyboardInterrupt: 

In [672]:
convnet.eval()
eval_perf(test_iter)

0.82841527

In [641]:
list(convnet(variable(vectorize(batch.text))).max(1)[1].data)

[0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1,
 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1,
 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1,
 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1]

In [642]:
def test(model):
    "All models should be able to be run with the following command."
    upload = []
    # Update: for kaggle the bucket iterator needs to have batch_size 10
    test_iter = torchtext.data.BucketIterator(test_dataset, train=False, batch_size=10)
    for batch in test_iter:
        # Your prediction data here (don't cheat!)
        probs = model(variable(vectorize(batch.text)))
        _, argmax = probs.max(1)
        # class 1 is positive (label 1); class 0 is negative (label 2)
        upload += [x if x==1 else 2 for x in list(argmax.data)]

    with open("predictions.txt", "w") as f:
        f.write("Id,Cat" + "\n")
        for row_id, u in enumerate(upload):
            f.write(str(row_id) + "," + str(u) + "\n")
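
Throughout this notebook the targets are computed as `y = batch.label.long()*(-1) + 2`, which maps label 1 (positive) to class 1 and label 2 (negative) to class 0. The submission file has to undo that mapping. The sketch below shows the conversion and the `Id,Cat` layout in isolation, writing to an in-memory buffer instead of a file. `write_predictions` is a hypothetical helper.

```python
import io

def write_predictions(class_indices, handle):
    """Write an `Id,Cat` CSV, mapping class 1 -> label 1 and class 0 -> label 2."""
    handle.write("Id,Cat\n")
    for row_id, c in enumerate(class_indices):
        handle.write("%d,%d\n" % (row_id, 1 if c == 1 else 2))

buf = io.StringIO()   # stands in for open("predictions.txt", "w")
write_predictions([1, 0, 1], buf)
print(buf.getvalue())
```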

In [374]:
print(batch.text)
convnet.eval()
convnet(variable(vectorize(batch.text)))

Variable containing:
  5585    296   1061  ...     437    457     29
     3      7     44  ...      10  14952     10
   284   7254      7  ...     294    218  10355
        ...            ⋱           ...         
     5    501      4  ...    3219    185    256
  1511  15923   4642  ...       2      2      2
     2      2      2  ...       1      1      1
[torch.LongTensor of size 11x100]



Variable containing:
 1.8744e-05  9.9998e-01
 1.0000e+00  5.7803e-10
 9.9182e-01  8.1791e-03
 9.9839e-01  1.6110e-03
 3.6838e-04  9.9963e-01
 9.9467e-01  5.3323e-03
 1.0000e+00  1.7794e-10
 6.4967e-04  9.9935e-01
 2.3464e-05  9.9998e-01
 4.6551e-06  1.0000e+00
 5.3114e-04  9.9947e-01
 1.0709e-08  1.0000e+00
 1.0000e+00  2.5872e-06
 1.0000e+00  5.9545e-07
 9.9917e-01  8.3254e-04
 9.9973e-01  2.6982e-04
 1.0000e+00  1.3791e-10
 9.9816e-01  1.8364e-03
 1.0831e-07  1.0000e+00
 3.9618e-03  9.9604e-01
 9.9911e-01  8.8839e-04
 9.5423e-06  9.9999e-01
 5.1716e-01  4.8284e-01
 1.1291e-04  9.9989e-01
 8.7021e-04  9.9913e-01
 9.9719e-01  2.8053e-03
 9.9240e-01  7.6007e-03
 9.9529e-01  4.7131e-03
 1.1589e-04  9.9988e-01
 7.8056e-03  9.9219e-01
 1.1179e-07  1.0000e+00
 9.8973e-01  1.0271e-02
 7.7427e-07  1.0000e+00
 2.7379e-06  1.0000e+00
 1.0000e+00  5.6208e-09
 9.9998e-01  2.4375e-05
 2.1724e-04  9.9978e-01
 1.0000e+00  1.1609e-07
 1.0000e+00  8.0384e-07
 9.9991e-01  8.8074e-05
 3.7483e-03  9.9625

### Have a look at specific mistakes

In [190]:
def batch2text(batch):
    return '\n'.join([" ".join([TEXT.vocab.itos[i] for i in batch.text[:, j].data]) for j in range(batch.text.data.numpy().shape[1])])

In [238]:
errors = []
true = []
pred = []
convnet.eval()

for batch in val_iter:
    proba_pred = convnet(variable(vectorize(batch.text)))[:, 1]
    label_pred = (proba_pred > 0.5).long()
    y = batch.label.long()*(-1) + 2
    for yyp, yyp_, yy, text in zip(label_pred, proba_pred, y, batch2text(batch).split('\n')):
        if yy.data.numpy()[0] != yyp.data.numpy()[0]:
            true.append(yy.data.numpy()[0])
            pred.append(yyp_.data.numpy()[0])
            errors.append(text)


  


In [239]:
len(errors), len(val_iter)*val_iter.batch_size, len(errors) / (len(val_iter)*val_iter.batch_size)

(151, 880, 0.1715909090909091)

In [240]:
for label, p, text in zip(true, pred, errors):
    print(label, p, text)

1 3.53044e-05 Funny but <unk> slight .
0 0.747016 Oh come on . <pad>
1 0.00295216 Cool ? <pad> <pad> <pad>
1 9.97105e-09 <unk> inept and ridiculous .
1 0.000109407 As unseemly as its title suggests .
1 0.000464429 Good film , but very glum .
1 0.166885 But it still <unk> in the pocket .
1 0.0098425 A painfully funny ode to bad behavior .
1 0.0211383 My thoughts were focused on the characters .
1 0.000118771 <unk> potentially forgettable formula into something strangely diverting .
0 0.999748 Rarely has <unk> looked so shimmering and <unk> .
0 0.996592 It 's not the ultimate <unk> gangster movie .
0 0.980605 An absurdist comedy about alienation , separation and loss .
1 0.000176797 <unk> neatly into the category of Good Stupid Fun .
0 0.981386 An <unk> amalgam of <unk> News and <unk> . <pad>
1 0.0269448 ... routine , harmless diversion and little else . <pad>
1 0.0420371 Despite its title , Punch-Drunk Love is never heavy-handed .
1 0.3446 A woman 's pic directed with resonance by <unk>

1 2.4222e-05 And if you 're not nearly moved to tears by a couple of scenes , you 've got ice water in your veins .
0 0.513526 The humor is n't as sharp , the effects not as innovative , nor the story as imaginative as in the original . <pad>
1 0.460046 Morton uses her face and her body language to bring us Morvern 's soul , even though the character is almost completely deadpan .
0 0.999723 The experience of going to a film festival is a rewarding one ; the <unk> of <unk> one through this movie is not .
1 0.479145 Not since Japanese filmmaker <unk> <unk> 's <unk> have the <unk> of combat and the specter of death been <unk> with such operatic grandeur .
1 0.0279909 That dogged good will of the parents and ` vain ' Jia 's <unk> of ego , make the film touching despite some <unk> .
0 0.982987 I <unk> with the plight of these families , but the movie does n't do a very good job conveying the issue at hand .
0 0.984277 ... <unk> to provide a mix of smiles and tears , `` Crossroads '' instea

0 0.999978 Very special effects , brilliantly bold colors and heightened reality ca n't hide the giant <unk> ' <unk> in `` Stuart Little 2 `` : There 's just no story , folks .
0 0.842675 Even the finest <unk> ca n't make a <unk> into anything more than a <unk> , and Robert De Niro ca n't make this movie anything more than a trashy cop buddy comedy . <pad>
0 0.999426 While it 's genuinely cool to hear characters talk about early rap records -LRB- Sugar Hill <unk> , etc. -RRB- , the constant <unk> of hip-hop <unk> can alienate even the <unk> audiences . <pad>
1 0.421931 Not the kind of film that will appeal to a mainstream American audience , but there is a certain charm about the film that makes it a suitable entry into the fest circuit . <pad>
1 0.0310935 Whether you like rap music or loathe it , you ca n't deny either the tragic loss of two young men in the prime of their talent or the power of this movie . <pad>
0 0.954168 Pumpkin means to be an outrageous dark satire on <unk> life 