## A light introduction to natural language processing

In lab 3, we're going to briefly cover some methods for analyzing data that comes in the form of text. This will help with the practical next week.

## Sentiment Classification in Movie Reviews

We'll use the dataset from Stanford here: http://ai.stanford.edu/~amaas/data/sentiment/

1) First, click on the link and download the dataset (it's too big to put on GitHub)

2) Make sure you move the directory "aclImdb" into the same folder as this notebook

As a preview I wrote a summary of the performance of a few classifiers:
**Summary of performance of different classifiers (Accuracy)**:
- Naive Bayes: 0.83
- Random Forest: 0.84
- Convolutional Neural Network: 0.85

Unfortunately, the data comes as one file per review, which is kind of annoying. I used glob to deal with this: glob("directory/*") returns a list of the paths in that directory.
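To make the glob pattern concrete, here is a small sketch that runs on a throwaway directory instead of the real dataset. The file names are made up for the demo. Note that glob makes no promise about the order of the results, so the sketch sorts them.

```python
import os
import tempfile
from glob import glob

# Build a throwaway directory with three small files (names are made up for the demo).
tmp_dir = tempfile.mkdtemp()
for name in ["0_9.txt", "1_7.txt", "notes.md"]:
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write("placeholder")

# "*" matches every entry in the directory; "*.txt" keeps only the .txt files.
all_files = sorted(glob(os.path.join(tmp_dir, "*")))
txt_files = sorted(glob(os.path.join(tmp_dir, "*.txt")))

print([os.path.basename(p) for p in all_files])
print([os.path.basename(p) for p in txt_files])
```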

In [14]:
import pandas as pd
import numpy as np
#glob lets us quickly list all the filenames; it ships with Python's standard library, so there's nothing to install
from glob import glob

In [15]:
pos_filenames = glob('aclImdb/train/pos/*')
neg_filenames = glob('aclImdb/train/neg/*')

You can check now that pos_filenames has all the filenames for positive reviews and neg_filenames has all the filenames for negative reviews. The following code is pretty hacky, but it does the job of combining all the text into one dataframe. We'll open the files one by one and append the contents of each to a list. We'll also keep track of the sentiment (1 for positive, 0 for negative).

In [16]:
#loop through the list of files and append the contents to a list
contents = []
sentiments = []

#loop through the positive sentiment files and save all the contents
for fname in pos_filenames:
    with open(fname,'rb') as f:
        contents.append(str(f.readlines()[0]))
        sentiments.append(1)
        
for fname in neg_filenames:
    with open(fname,'rb') as f:
        contents.append(str(f.readlines()[0]))
        sentiments.append(0)
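One quirk of the loop above: the files are opened in binary mode ('rb'), so each line is a bytes object, and calling str() on bytes keeps the b'...' wrapper as part of the text. The sketch below shows the difference on a single hard-coded line. It assumes the review files are UTF-8 encoded.

```python
raw = b"Bromwell High is a cartoon comedy."

# str() on a bytes object keeps the b'...' wrapper, which then leaks into the text.
as_str = str(raw)

# Decoding gives the plain text instead (assuming the bytes are UTF-8).
decoded = raw.decode("utf-8")

print(as_str)
print(decoded)
```

In the loop, the equivalent change would be to open each file with open(fname, encoding='utf-8') and call f.read(). The outputs shown in the rest of this notebook were produced with the original version, so they still carry the b' prefix.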

Print the length of the list we just made (total number of movie reviews)

In [17]:
len(contents)

25000

Print the first movie review 

In [18]:
print(contents[0])

b'Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High\'s satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers\' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I\'m here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn\'t!'


To get back to familiar territory, we'll turn this into a dataframe

In [19]:
#we can turn this into a pd Dataframe
df = pd.DataFrame()
df['txt'] = contents
df['sentiment'] = sentiments

In [20]:
df.head()

| | txt | sentiment |
|---|---|---|
| 0 | b'Bromwell High is a cartoon comedy. It ran at... | 1 |
| 1 | b'Homelessness (or Houselessness as George Car... | 1 |
| 2 | b'Brilliant over-acting by Lesley Ann Warren. ... | 1 |
| 3 | b'This is easily the most underrated film inn ... | 1 |
| 4 | b'This is not the typical Mel Brooks film. It ... | 1 |


Okay, cool. But we still don't really know how to deal with this. Computers aren't inherently able to understand text, so we'll need to get the "txt" column into a numeric form we know how to work with in order to make predictions.

### Using sklearn

sklearn isn't really the best library for working with text data, so we'll keep this section relatively short. For most purposes you'll want to use NLTK or spaCy. But since you're familiar with sklearn, we'll start here.

The main thing that we'll be using from sklearn is CountVectorizer. This will take a corpus of text and turn each document into a "count vector." This count vector is essentially a histogram over the entire vocabulary (all words in the training set). As an example, consider the (fake) sentence "dog cat cat cat bear". Our vocab size is 3, so with the vocabulary ordered alphabetically (bear, cat, dog) the sentence is represented by the three-dimensional vector:

$$[1,3,1] $$

This is also called the bag of words representation. 
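As a minimal sketch of where that vector comes from, the counts can be built by hand with the standard library. The alphabetical ordering of the vocabulary is a choice made here for illustration.

```python
from collections import Counter

sentence = "dog cat cat cat bear"

# Count each word, then fix an ordering for the vocabulary (alphabetical here).
counts = Counter(sentence.split())
vocab = sorted(counts)
vector = [counts[word] for word in vocab]

print(vocab)   # ['bear', 'cat', 'dog']
print(vector)  # [1, 3, 1]
```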

**Exercise 1:** what are the pros and cons of using this? Can you think of an alternative way of representing text at the sentence level? 

sklearn has a built-in count vectorizer with three main methods:
- fit: build the vocabulary from some iterable containing strings
- transform: use the existing vocabulary to transform the input into an N x V sparse matrix (N documents, V vocabulary words)
- fit_transform: fit on this data, and also transform it (same as calling fit then transform)


In [21]:
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()

#by default countvectorizer will return a sparse array which is a special datatype for 
#arrays that are mostly 0s (to save space), but .toarray() will convert this back to a
#regular numpy array
cv.fit_transform(["cat dog dog dog cow"]).toarray()

array([[1, 1, 3]], dtype=int64)

As another example, consider the sentence "dog dog dog snail snail". Since "snail" is not in the vocab that we fit cv with, it won't be part of the vector

In [22]:
cv.transform(["dog dog dog snail snail"]).toarray()

array([[0, 0, 3]])
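To see which column belongs to which word, we can inspect the fitted vocabulary. This is a small sketch using a separate vectorizer (cv_demo is a name introduced here) so it doesn't interfere with the cells above.

```python
from sklearn.feature_extraction.text import CountVectorizer

cv_demo = CountVectorizer()
cv_demo.fit(["cat dog dog dog cow"])

# vocabulary_ maps each word to its column index; columns are in alphabetical order.
print(cv_demo.vocabulary_)

# "snail" was never seen during fit, so it has no column and is silently dropped.
print(cv_demo.transform(["dog dog dog snail snail"]).toarray())
```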

**Exercise 2**: Use CountVectorizer on the Stanford dataset. This will take a few seconds. You'll want to use max_features to limit the number of words that you consider, since rare words won't help you much.  

Limit the number of features to 10,000

Save the result as new_data

In [23]:
cv = CountVectorizer(max_features = 10000)
new_data = cv.fit_transform(df.txt)

### Unigrams vs Bigrams vs N-grams

As we explored above, using this "Bag of Words" representation throws away a lot of information in the sentence. One way of trying to preserve local information is to use bigrams. This just expands our vocabulary to include pairs of consecutive words. So in our example, "cat dog dog dog cow", we would have the following counts:
- cat x1 
- dog x3
- cow x1
- cat dog x1
- dog dog x2 
- dog cow x1

N-grams extend this to word sequences of length N. For CountVectorizer, we specify the range of lengths to include, from unigrams up to N-grams, with:

ngram_range = (1, N) 

In [24]:
cv = CountVectorizer(ngram_range = (1,2))
cv.fit_transform(["cat dog dog dog cow"]).toarray()

array([[1, 1, 1, 3, 1, 2]], dtype=int64)
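To see where these six counts come from, here is a sketch that builds the unigrams and bigrams by hand. The helper ngrams is a name introduced for this illustration, not part of sklearn.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "cat dog dog dog cow".split()

# Unigrams plus bigrams, mirroring ngram_range=(1, 2).
features = Counter(ngrams(tokens, 1) + ngrams(tokens, 2))

# Sorted alphabetically, the names line up with the columns of the array above.
for name in sorted(features):
    print(name, features[name])
```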

### Building a simple baseline model

For text classification, simple models trained on a large amount of data perform quite well. A pretty standard baseline is the Naive Bayes Model. We'll go through some of the math here. If you're not interested in it, you can skip it.

Suppose that $y_i$ is your class label (in this case $y_i$ is either 0 or 1 for negative and positive). $i$ just indexes what datapoint you're looking at. We'll say that $i$ ranges from $1$ to $N$ (in other words you have $N$ sentences in your training dataset).

Also we have $\mathbf{x}_i$ which is the sentence corresponding to the label $y_i$. We use bold to denote the fact that $\mathbf{x}_i$ is a vector where each element is a word.

Naive Bayes models the joint probability distribution, $p(y_i, \mathbf{x}_i)$. We do this by parameterizing the prior probability of having a certain class, and then parameterizing the probability of generating a certain sentence given that class. Using the product rule of probability, we can write this as:

$$p(y_i, \mathbf{x}_i) = p(\mathbf{x}_i | y_i) p(y_i) = p(x_{i1},\dots,x_{iT_i} | y_i) p(y_i) $$

where $x_{it}$ is the word at position $t$, and $T_i$ is the length of sentence $i$. Now we apply a huge assumption (which seems like it is just wrong, but works decently in practice). That is, we assume that $x_{i1},\dots,x_{iT_i}$ are conditionally independent given the class, $y_i$. This lets us factor the probability as:

$$p(y_i, \mathbf{x}_i) = p(y_i)\prod_{t=1}^{T_i} p(x_{it}| y_i)  $$ 

We parameterize $p(x_{it}|y_i)$ as a Multinoulli random variable, i.e.:

$$p(x_{it} = dog | y_i = 0) =  \pi_{0,dog} $$ 

Where $\pi_{0,dog}$ is the probability that "dog" is generated given that we're in class 0 (negative). So we need 2*Vocab_size parameters for this, since we need a probability for every word in every class. With the 10,000-word vocabulary we built above, that's 20,000 parameters. We also parameterize the prior probabilities as a Bernoulli random variable, which adds only one extra parameter:

$$p(y_i = 0) = \theta_0$$
$$p(y_i = 1) = 1 - \theta_0$$

Given this model, it's pretty straightforward to get a maximum likelihood estimate for all the parameters, i.e. the $\theta$s and $\pi$s. If you're not familiar with maximum likelihood, it just means that we choose $\theta$ and $\pi$s to be the values that make the observed data have the highest likelihood. This turns the learning procedure into a simple optimization problem. 

If we go through all the math to solve this, it turns out that the optimal $\theta_0$ is the proportion of sentences that are in class $0$ (the negative class), and the optimal $\pi_{0,dog}$ is just the proportion of words in class $0$ that are the word "dog". Likewise, $\pi_{1,dog}$ is just the proportion of words in class $1$ that are the word "dog". 
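As a quick sanity check of these estimates, here is a sketch that computes them by counting on a three-sentence toy corpus. The corpus is made up for illustration, and the variable names simply mirror the symbols above.

```python
from collections import Counter

# Toy training set (made up): (sentence, label) with 0 = negative, 1 = positive.
train = [
    ("bad dog bad", 0),
    ("good dog", 1),
    ("good good cat", 1),
]

# theta_0: fraction of sentences labelled 0.
n_neg = sum(1 for _, label in train if label == 0)
theta_0 = n_neg / len(train)

# pi[c][w]: fraction of all words in class c that are the word w.
word_counts = {0: Counter(), 1: Counter()}
for sentence, label in train:
    word_counts[label].update(sentence.split())

pi = {
    c: {w: n / sum(counts.values()) for w, n in counts.items()}
    for c, counts in word_counts.items()
}

print(theta_0)                      # 1 of 3 sentences is negative
print(pi[0]["dog"], pi[1]["dog"])   # 1 of 3 negative words, 1 of 5 positive words
```

One thing this toy version makes visible: a word that never appears in a class (like "cat" in class 0) gets probability zero there. That is presumably why sklearn's MultinomialNB applies additive smoothing through its alpha parameter.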

So "training" the model just amounts to counting: no iterative optimization procedure is needed, because the maximum likelihood estimates have the closed form described above. But given a sentence, how do we predict whether it's positive or negative?

We can express that as:

$$p(y_i = 0 | \mathbf{x}_i ) \propto p(\mathbf{x}_i|y_i = 0) p(y_i = 0) $$
$$p(y_i = 1 | \mathbf{x}_i ) \propto p(\mathbf{x}_i|y_i = 1) p(y_i = 1) $$

To predict we just take the higher of these values. 
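A minimal sketch of this prediction rule follows. The parameter values are chosen by hand for illustration, not fitted to the movie reviews. The sketch compares log scores rather than raw products, since multiplying many small probabilities underflows on long reviews, and it assumes every word of the input is in the vocabulary.

```python
import math

# Hypothetical parameters, chosen by hand for illustration.
prior = {0: 0.5, 1: 0.5}
pi = {
    0: {"good": 0.1, "bad": 0.6, "movie": 0.3},
    1: {"good": 0.6, "bad": 0.1, "movie": 0.3},
}

def log_score(words, c):
    """log p(y = c) + sum over t of log p(x_t | y = c): the log of the unnormalized posterior."""
    return math.log(prior[c]) + sum(math.log(pi[c][w]) for w in words)

def predict(sentence):
    """Return the class with the higher score."""
    words = sentence.split()
    return max(prior, key=lambda c: log_score(words, c))

print(predict("good good movie"))
print(predict("bad movie"))
```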

**Optional (hard) exercise:** implement Naive Bayes 

In sklearn, this is easy

In [25]:
from sklearn.naive_bayes import MultinomialNB

nb = MultinomialNB()

We can fit the model with the normal sklearn syntax

In [26]:
nb.fit(new_data, df['sentiment']) 

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

This code splits into train and test for evaluation of the model. We'll make a random permutation of indices and then use that to randomly shuffle our data. We'll take a 70:30 split

In [27]:
#np.random.permutation makes a random permutation of the indices {0...N-1}
perm = np.random.permutation(range(len(df.sentiment)))

#split this permutation by a 70:30 split
trn = perm[:int(0.7*len(perm))]
tst = perm[int(0.7*len(perm)):]

#slice the processed data into train and test sets; alternatively we could have done this with 
#sklearn functions
x_train = new_data[trn]
x_tst = new_data[tst]
y_train = df.sentiment[trn]
y_tst = df.sentiment[tst]
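The comment above mentions that sklearn can do this split for us. A minimal sketch with train_test_split, run on stand-in data rather than the review matrix (X_demo and y_demo are made up for the demo):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 10 rows of features and 10 labels (made up for the demo).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

# test_size=0.3 gives the same 70:30 split; random_state makes it reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

print(X_tr.shape, X_te.shape)
```

The same call accepts sparse matrices, so it should work on new_data and df.sentiment directly.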

Checking the shape of our splits. The test set has 7500 data points and the training set has 17500. We have 10,000 features since there are 10,000 words in our vocab.

In [28]:
y_tst.shape

(7500,)

In [29]:
y_train.shape

(17500,)

In [30]:
x_train.shape

(17500, 10000)

In [31]:
x_tst.shape

(7500, 10000)

Fitting the Naive Bayes model on our training x and y values

In [32]:
nb = MultinomialNB()
nb.fit(x_train, y_train) 

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

Score will give us our classification accuracy on test

In [33]:
nb.score(x_tst,y_tst)

0.83320000000000005

That's pretty good for such a simple model. Obviously, we'll do a bit better on the data that we trained on.

In [34]:
nb.score(x_train,y_train)

0.85845714285714281

**Exercise 3:** Improve the score somehow

## (Optional) building a more powerful model using pytorch

The following is for people who have a good background in machine learning and are interested in learning about how to build neural networks to solve this problem.

### Using torchtext (preprocessing)

We're going to save our tabular dataset from above as a csv and then load it with torchtext because I couldn't think of a better way to do this off the top of my head. Torchtext is a library for loading/dealing with datasets for pytorch. And pytorch is a library for implementing neural networks (similar to tensorflow). 

This is a lot of extra effort but hopefully buys us a couple percentage points of accuracy.

We're going to use 3 new libraries. In brief, Torch is a library for building neural network architectures, torchtext is a library for preprocessing text for Torch, and tqdm lets us display progress bars to keep track of progress

In [None]:
!pip install torch
!pip install torchtext
!pip install tqdm

In [153]:
#import the stuff we'll need from these libraries
import torchtext
from torchtext.vocab import Vectors, GloVe
import torchtext.datasets as datasets

#we'll save our data from before in a tabular dataset in two parts; train and test
df.iloc[trn,:].to_csv('saved_dataset_train.csv',index = False,header = False)
df.iloc[tst,:].to_csv('saved_dataset_test.csv',index = False,header = False)

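We'll start by initializing two torchtext Field objects, which will hold label and text vocabularies. We'll load in the two datasets using these fields. Note that Field, TabularDataset and BucketIterator belong to torchtext's older API, which was moved to torchtext.legacy in version 0.9 and removed in 0.12, so this section needs an older torchtext release.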

In [154]:
#we'll initialize two fields. These will hold our vocabulary for the text and our vocabulary for the labels
# (just positive and negative)

TEXT = torchtext.data.Field()
LABEL = torchtext.data.Field(sequential = False,unk_token = None)

#these two lines will load in the datasets that we saved

pos_train = torchtext.data.TabularDataset(path='saved_dataset_train.csv', format='csv',fields=[('txt', TEXT),
 ('sentiment', LABEL)])

pos_test = torchtext.data.TabularDataset(path='saved_dataset_test.csv', format='csv',fields=[('txt', TEXT),
 ('sentiment', LABEL)])

Building the vocabulary using the training dataset

In [155]:
TEXT.build_vocab(pos_train)
LABEL.build_vocab(pos_train)
print('len(TEXT.vocab)', len(TEXT.vocab))
print('len(LABEL.vocab)', len(LABEL.vocab))

len(TEXT.vocab) 235683
len(LABEL.vocab) 2


LABEL.vocab.itos is a list that contains the vocabulary for the labels (0 and 1). TEXT.vocab.itos would give us a list with the vocabulary for all the text in our corpus

In [156]:
LABEL.vocab.itos

['0', '1']
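The names itos and stoi stand for "index to string" and "string to index". As a plain-Python sketch of the idea (not torchtext's actual implementation), the two structures are just a list and its inverse lookup:

```python
# A vocabulary is an ordered list of tokens (index -> string)...
itos = ["0", "1"]

# ...together with its inverse lookup (string -> index).
stoi = {token: index for index, token in enumerate(itos)}

print(stoi["1"])        # index of the label "1"
print(itos[stoi["0"]])  # round trip back to the string
```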

We're also going to use an iterator to loop through the data. We'll use torchtext's BucketIterator with a batch size of 10 (we'll process 10 sentences at a time). 

In [157]:
train_iter, test_iter = torchtext.data.BucketIterator.splits(
    (pos_train,pos_test), batch_size=10, device=-1,sort_key=lambda x: len(x.txt),repeat = False)

In [171]:
batch = next(iter(train_iter))
batch.txt

Variable containing:
 7.2800e+02  1.6567e+05  2.8043e+04  ...   3.8730e+03  3.1509e+04  1.7394e+04
 1.2640e+03  3.4782e+04  1.0000e+01  ...   1.4100e+02  1.4800e+02  1.3250e+03
 1.4000e+01  1.4000e+01  5.7000e+01  ...   7.4520e+03  3.3377e+04  3.9030e+03
                ...                   ⋱                   ...                
 1.0000e+00  1.0000e+00  1.0000e+00  ...   1.0604e+04  3.7360e+03  2.2800e+02
 1.0000e+00  1.0000e+00  1.0000e+00  ...   1.4616e+04  1.3000e+01  2.7000e+01
 1.0000e+00  1.0000e+00  1.0000e+00  ...   1.7119e+04  1.9367e+04  3.6938e+04
[torch.LongTensor of size 301x10]
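The batch is a 301x10 grid of word indices: one column per review, one row per word position. Shorter reviews are filled up with a padding index, which is what the runs of 1s at the bottom of some columns look like. Here is a plain-Python sketch of that padding step. It assumes index 1 is the padding token, and pad_batch is a hypothetical helper, not a torchtext function.

```python
PAD = 1  # assumption: index 1 is the padding token, matching the trailing 1s in the output above

def pad_batch(sequences, pad_index=PAD):
    """Pad every sequence to the longest length, then transpose to (seq_len, batch)."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_index] * (max_len - len(seq)) for seq in sequences]
    # Transpose so each column is one review, like the 301x10 tensor.
    return [list(row) for row in zip(*padded)]

batch_demo = pad_batch([[728, 1264, 14], [165670, 34782], [28043]])
for row in batch_demo:
    print(row)
```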

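We'll also make use of pretrained word embeddings. The file loaded below, wiki.simple.vec, contains fastText vectors (released by Facebook AI Research) trained on Simple English Wikipedia.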

In [159]:
url = 'https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.simple.vec'
TEXT.vocab.load_vectors(vectors=Vectors('wiki.simple.vec', url=url))

In [160]:
print("Word embeddings size ", TEXT.vocab.vectors.size())
print("Word embedding of 'follows', first 10 dim ", TEXT.vocab.vectors[TEXT.vocab.stoi['follows']][:10])

Word embeddings size  torch.Size([235683, 300])
Word embedding of 'follows', first 10 dim  
 0.3925
-0.4770
 0.1754
-0.0845
 0.1396
 0.3722
-0.0878
-0.2398
 0.0367
 0.2800
[torch.FloatTensor of size 10]



Okay cool. Now that all the preprocessing stuff is done, we can focus on actually building a model. We're going to build a convolutional neural network in pytorch. This involves building a CNN class that inherits from nn.Module. We'll implement this paper by Yoon Kim: http://aclweb.org/anthology/D/D14/D14-1181.pdf 

The paper does a pretty good job of explaining what a convolutional neural network is, but I can answer questions about the paper. 

In [161]:
VOCAB_SIZE = len(TEXT.vocab)

In [162]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable



class customConvNet(nn.Module):
    
    def __init__(self,input_embeddings, embedding_dim = 300, hidden_size = 100, vocab_size = 235807):
        super(customConvNet,self).__init__()  
        embedding = nn.Embedding(vocab_size,embedding_dim)
        embedding.weight = nn.Parameter(input_embeddings,requires_grad = True)
        self.embedding = embedding
        self.conv3 = nn.Conv1d(embedding_dim,hidden_size,kernel_size = 3,stride = 1)
        self.conv4 = nn.Conv1d(embedding_dim,hidden_size,kernel_size = 4,stride = 1)
        self.conv5 = nn.Conv1d(embedding_dim,hidden_size,kernel_size = 5,stride = 1)
        self.maxpool = nn.AdaptiveMaxPool1d(1)
        self.dropout = nn.Dropout(0.5)
        
        self.linear = nn.Linear(3*hidden_size,2)
        
        
    def forward(self, input_):
        #apply embedding layer
        embeds = self.embedding(input_).permute(1,2,0).contiguous()
        #apply convolution layers
        out1 = F.relu(self.conv3(embeds))
        out2 = F.relu(self.conv4(embeds))
        out3 = F.relu(self.conv5(embeds))
        
        #apply max pooling layers
        out1 = self.maxpool(out1).squeeze(2)
        out2 = self.maxpool(out2).squeeze(2)
        out3 = self.maxpool(out3).squeeze(2)
        #concatenate the outputs; ending up with a batch_size x 3*hidden_size vector
        out = torch.cat((out1,out2,out3),dim = 1)
        out = self.dropout(out)
        return self.linear(out)
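To make the convolution and pooling steps less abstract, here is a plain-Python sketch of what a single filter does. It uses one scalar per word instead of a 300-dimensional embedding, and the numbers are made up. The real layers do the same thing for hidden_size filters at once.

```python
def conv1d_valid(sequence, kernel):
    """Slide `kernel` over `sequence` with stride 1 and no padding."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, sequence[i:i + k]))
        for i in range(len(sequence) - k + 1)
    ]

def relu(values):
    return [max(0.0, v) for v in values]

# One scalar per word, to keep the arithmetic readable.
sequence = [1.0, -2.0, 3.0, 0.5, -1.0]
kernel = [1.0, 0.0, -1.0]  # width 3, like conv3

feature_map = relu(conv1d_valid(sequence, kernel))
pooled = max(feature_map)  # max-over-time pooling: one number per filter

print(feature_map)
print(pooled)
```

Because the pooling keeps only the maximum, every review ends up with a fixed-size representation (3*hidden_size numbers) no matter how long it is.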
        

Let's initialize an instance of the neural network, and make sure it produces output when we feed in a batch of training data.

In [163]:
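#pass the real vocabulary size so the embedding layer matches the pretrained vectors
cn = customConvNet(TEXT.vocab.vectors, vocab_size = VOCAB_SIZE)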

In [164]:
batch = next(iter(train_iter))

In [165]:
cn(batch.txt)

Variable containing:
 0.1135 -0.1271
 0.5178 -0.0601
-0.1192  0.1503
 0.2280 -0.2018
-0.2367 -0.1836
-0.1084  0.0332
 0.0394 -0.6457
-0.2830 -0.3115
-0.2230 -0.4110
 0.2639 -0.1976
[torch.FloatTensor of size 10x2]

Okay, that looks reasonable. The next thing we have to do is write a training loop that will optimize the convolutional neural network's parameters using stochastic gradient descent.
    

In [168]:
from tqdm import tqdm_notebook

def model_train(model,train_iter,num_epochs):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=.01)
    
    for epoch in tqdm_notebook(range(num_epochs),desc = 'Epoch'):
        total_loss = 0 
        count = 0
        model.train()
        
        for batch in tqdm_notebook(train_iter, desc = 'batch'):
            optimizer.zero_grad()
            txt = batch.txt
            lbl = batch.sentiment
            
            loss = criterion(model(txt),lbl)
            total_loss += loss.data
            count += 1
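            loss.backward()
            #clip the gradients after backward() and before step(), so the clipped values are the ones used in the update
            #(newer PyTorch versions name this function clip_grad_norm_)
            torch.nn.utils.clip_grad_norm(model.parameters(), 3)
            optimizer.step()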
            
            
        print("Average NLL: ", (total_loss/count)) 
        a,b = model_val(model,test_iter)
        print("Accuracy:", a)
        print("Val NLL: ", b)
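The training loop uses gradient norm clipping with a maximum norm of 3. As a rough sketch of what that operation does (a plain-Python stand-in, not PyTorch's implementation), the gradients are treated as one long vector and scaled down if that vector is too long. For the clipping to affect the update, it has to run after loss.backward() and before optimizer.step().

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]

clipped = clip_by_norm([3.0, 4.0, 12.0], max_norm=3)   # norm 13, rescaled to norm 3
untouched = clip_by_norm([1.0, 2.0, 2.0], max_norm=3)  # norm 3, left alone

print(clipped)
print(untouched)
```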
            
    

We'll also need a validation function

In [169]:
def model_val(model,val_iter):
    criterion = nn.CrossEntropyLoss()
    total_loss = 0 
    count = 0
    correct = 0 
    num_examples = 0
    model.eval()
    
    for batch in val_iter:
        txt = batch.txt
        lbl = batch.sentiment
        y_pred = model(txt)
        loss = criterion(y_pred,lbl)
        total_loss += loss.data
        count += 1
        
        y_pred_max, y_pred_argmax = torch.max(y_pred, 1)
        correct += (y_pred_argmax.data == lbl.data).sum()
        num_examples += y_pred_argmax.size(0)
    model.train()
    return(correct/num_examples, total_loss/count)

**Warning:** this takes a long time to run. To speed things up, we can run it on a GPU.

In [170]:
model_train(cn,train_iter,10)

Average NLL:  
 0.7098
[torch.FloatTensor of size 1]



Accuracy: 0.4988
Val NLL:  
 0.6937
[torch.FloatTensor of size 1]

Average NLL:  
 0.7000
[torch.FloatTensor of size 1]

Accuracy: 0.5538666666666666
Val NLL:  
 0.6869
[torch.FloatTensor of size 1]

Average NLL:  
 0.6952
[torch.FloatTensor of size 1]

Accuracy: 0.6304
Val NLL:  
 0.6767
[torch.FloatTensor of size 1]

Average NLL:  
 0.6905
[torch.FloatTensor of size 1]

Accuracy: 0.61
Val NLL:  
 0.6594
[torch.FloatTensor of size 1]

Average NLL:  
 0.6818
[torch.FloatTensor of size 1]

Accuracy: 0.6448
Val NLL:  
 0.6360
[torch.FloatTensor of size 1]

Average NLL:  
 0.6717
[torch.FloatTensor of size 1]

Accuracy: 0.7517333333333334
Val NLL:  
 0.5998
[torch.FloatTensor of size 1]



Average NLL:  
 0.5988
[torch.FloatTensor of size 1]

Accuracy: 0.7278666666666667
Val NLL:  
 0.5769
[torch.FloatTensor of size 1]

Average NLL:  
 0.5674
[torch.FloatTensor of size 1]

Accuracy: 0.7637333333333334
Val NLL:  
 0.5517
[torch.FloatTensor of size 1]

Average NLL:  
 0.5637
[torch.FloatTensor of size 1]

Accuracy: 0.7462666666666666
Val NLL:  
 0.5313
[torch.FloatTensor of size 1]

Average NLL:  
 0.5549
[torch.FloatTensor of size 1]

Accuracy: 0.7736
Val NLL:  
 0.5150
[torch.FloatTensor of size 1]

Average NLL:  
 0.5485
[torch.FloatTensor of size 1]

Accuracy: 0.7445333333333334
Val NLL:  
 0.5105
[torch.FloatTensor of size 1]

Average NLL:  
 0.5418
[torch.FloatTensor of size 1]

Accuracy: 0.7804
Val NLL:  
 0.4857
[torch.FloatTensor of size 1]



Average NLL:  
 0.3125
[torch.FloatTensor of size 1]

Accuracy: 0.784
Val NLL:  
 0.4775
[torch.FloatTensor of size 1]

Average NLL:  
 0.4673
[torch.FloatTensor of size 1]

Accuracy: 0.7832
Val NLL:  
 0.4680
[torch.FloatTensor of size 1]

Average NLL:  
 0.4762
[torch.FloatTensor of size 1]

Accuracy: 0.7965333333333333
Val NLL:  
 0.4599
[torch.FloatTensor of size 1]

Average NLL:  
 0.4732
[torch.FloatTensor of size 1]

Accuracy: 0.7989333333333334
Val NLL:  
 0.4518
[torch.FloatTensor of size 1]

Average NLL:  
 0.4695
[torch.FloatTensor of size 1]

Accuracy: 0.8062666666666667
Val NLL:  
 0.4389
[torch.FloatTensor of size 1]

Average NLL:  
 0.4667
[torch.FloatTensor of size 1]

Accuracy: 0.8006666666666666
Val NLL:  
 0.4391
[torch.FloatTensor of size 1]



Average NLL:  
 0.6914
[torch.FloatTensor of size 1]

Accuracy: 0.81
Val NLL:  
 0.4268
[torch.FloatTensor of size 1]

Average NLL:  
 0.4239
[torch.FloatTensor of size 1]

Accuracy: 0.8126666666666666
Val NLL:  
 0.4232
[torch.FloatTensor of size 1]

Average NLL:  
 0.4306
[torch.FloatTensor of size 1]

Accuracy: 0.8154666666666667
Val NLL:  
 0.4166
[torch.FloatTensor of size 1]

Average NLL:  
 0.4254
[torch.FloatTensor of size 1]

Accuracy: 0.8161333333333334
Val NLL:  
 0.4129
[torch.FloatTensor of size 1]

Average NLL:  
 0.4218
[torch.FloatTensor of size 1]

Accuracy: 0.8193333333333334
Val NLL:  
 0.4059
[torch.FloatTensor of size 1]

Average NLL:  
 0.4185
[torch.FloatTensor of size 1]

Accuracy: 0.8216
Val NLL:  
 0.4008
[torch.FloatTensor of size 1]



Average NLL:  
 0.2617
[torch.FloatTensor of size 1]

Accuracy: 0.8144
Val NLL:  
 0.4096
[torch.FloatTensor of size 1]

Average NLL:  
 0.3942
[torch.FloatTensor of size 1]

Accuracy: 0.8257333333333333
Val NLL:  
 0.3947
[torch.FloatTensor of size 1]

Average NLL:  
 0.3842
[torch.FloatTensor of size 1]

Accuracy: 0.8269333333333333
Val NLL:  
 0.3889
[torch.FloatTensor of size 1]

Average NLL:  
 0.3849
[torch.FloatTensor of size 1]

Accuracy: 0.8261333333333334
Val NLL:  
 0.3931
[torch.FloatTensor of size 1]

Average NLL:  
 0.3827
[torch.FloatTensor of size 1]

Accuracy: 0.8309333333333333
Val NLL:  
 0.3852
[torch.FloatTensor of size 1]

Average NLL:  
 0.3811
[torch.FloatTensor of size 1]

Accuracy: 0.8252
Val NLL:  
 0.3877
[torch.FloatTensor of size 1]



Average NLL:  
 0.2355
[torch.FloatTensor of size 1]

Accuracy: 0.8262666666666667
Val NLL:  
 0.3874
[torch.FloatTensor of size 1]

Average NLL:  
 0.3568
[torch.FloatTensor of size 1]

Accuracy: 0.8325333333333333
Val NLL:  
 0.3785
[torch.FloatTensor of size 1]

Average NLL:  
 0.3578
[torch.FloatTensor of size 1]

Accuracy: 0.8237333333333333
Val NLL:  
 0.3855
[torch.FloatTensor of size 1]

Average NLL:  
 0.3518
[torch.FloatTensor of size 1]

Accuracy: 0.8406666666666667
Val NLL:  
 0.3685
[torch.FloatTensor of size 1]

Average NLL:  
 0.3504
[torch.FloatTensor of size 1]

Accuracy: 0.8046666666666666
Val NLL:  
 0.4126
[torch.FloatTensor of size 1]

Average NLL:  
 0.3495
[torch.FloatTensor of size 1]

Accuracy: 0.8417333333333333
Val NLL:  
 0.3629
[torch.FloatTensor of size 1]



Average NLL:  
 0.2015
[torch.FloatTensor of size 1]

Accuracy: 0.8370666666666666
Val NLL:  
 0.3669
[torch.FloatTensor of size 1]

Average NLL:  
 0.3178
[torch.FloatTensor of size 1]

Accuracy: 0.8321333333333333
Val NLL:  
 0.3758
[torch.FloatTensor of size 1]

Average NLL:  
 0.3279
[torch.FloatTensor of size 1]

Accuracy: 0.8457333333333333
Val NLL:  
 0.3560
[torch.FloatTensor of size 1]

Average NLL:  
 0.3251
[torch.FloatTensor of size 1]

Accuracy: 0.8413333333333334
Val NLL:  
 0.3581
[torch.FloatTensor of size 1]

Average NLL:  
 0.3223
[torch.FloatTensor of size 1]

Accuracy: 0.8442666666666667
Val NLL:  
 0.3565
[torch.FloatTensor of size 1]

Average NLL:  
 0.3205
[torch.FloatTensor of size 1]

Accuracy: 0.8485333333333334
Val NLL:  
 0.3509
[torch.FloatTensor of size 1]



Average NLL:  
 0.1316
[torch.FloatTensor of size 1]

Accuracy: 0.844
Val NLL:  
 0.3529
[torch.FloatTensor of size 1]

Average NLL:  
 0.2906
[torch.FloatTensor of size 1]

Accuracy: 0.8414666666666667
Val NLL:  
 0.3558
[torch.FloatTensor of size 1]

Average NLL:  
 0.2924
[torch.FloatTensor of size 1]

Accuracy: 0.8268
Val NLL:  
 0.3751
[torch.FloatTensor of size 1]

Average NLL:  
 0.2907
[torch.FloatTensor of size 1]

Accuracy: 0.8426666666666667
Val NLL:  
 0.3549
[torch.FloatTensor of size 1]

Average NLL:  
 0.2907
[torch.FloatTensor of size 1]

Accuracy: 0.8450666666666666
Val NLL:  
 0.3496
[torch.FloatTensor of size 1]

Average NLL:  
 0.2909
[torch.FloatTensor of size 1]

Accuracy: 0.8450666666666666
Val NLL:  
 0.3462
[torch.FloatTensor of size 1]



Average NLL:  
 0.3721
[torch.FloatTensor of size 1]

Accuracy: 0.8516
Val NLL:  
 0.3412
[torch.FloatTensor of size 1]

Average NLL:  
 0.2487
[torch.FloatTensor of size 1]

Accuracy: 0.8432
Val NLL:  
 0.3453
[torch.FloatTensor of size 1]

Average NLL:  
 0.2569
[torch.FloatTensor of size 1]

Accuracy: 0.8502666666666666
Val NLL:  
 0.3410
[torch.FloatTensor of size 1]

Average NLL:  
 0.2598
[torch.FloatTensor of size 1]

Accuracy: 0.8302666666666667
Val NLL:  
 0.3735
[torch.FloatTensor of size 1]

Average NLL:  
 0.2617
[torch.FloatTensor of size 1]

Accuracy: 0.8410666666666666
Val NLL:  
 0.3514
[torch.FloatTensor of size 1]

Average NLL:  
 0.2649
[torch.FloatTensor of size 1]

Accuracy: 0.8536
Val NLL:  
 0.3369
[torch.FloatTensor of size 1]



Average NLL:  
 0.2978
[torch.FloatTensor of size 1]

Accuracy: 0.8516
Val NLL:  
 0.3388
[torch.FloatTensor of size 1]

Average NLL:  
 0.2437
[torch.FloatTensor of size 1]

Accuracy: 0.8532
Val NLL:  
 0.3335
[torch.FloatTensor of size 1]

Average NLL:  
 0.2378
[torch.FloatTensor of size 1]

Accuracy: 0.8545333333333334
Val NLL:  
 0.3306
[torch.FloatTensor of size 1]

Average NLL:  
 0.2349
[torch.FloatTensor of size 1]

Accuracy: 0.8486666666666667
Val NLL:  
 0.3474
[torch.FloatTensor of size 1]

Average NLL:  
 0.2386
[torch.FloatTensor of size 1]

Accuracy: 0.8372
Val NLL:  
 0.3611
[torch.FloatTensor of size 1]

Average NLL:  
 0.2369
[torch.FloatTensor of size 1]

Accuracy: 0.8545333333333334
Val NLL:  
 0.3335
[torch.FloatTensor of size 1]




So with a CNN we can get a marginal improvement over Naive Bayes and Random Forest: roughly 0.85 test accuracy, compared with 0.83 and 0.84.