In [81]:
# Licensing Information:  You are free to use or extend this project for
# educational purposes provided that (1) you do not distribute or publish
# solutions, (2) you retain this notice, and (3) you provide clear
# attribution to The Georgia Institute of Technology, including a link to https://aritter.github.io/CS-7650/

# Attribution Information: This assignment was developed at The Georgia Institute of Technology
# by Alan Ritter (alan.ritter@cc.gatech.edu)

#Project \#1: Text Classification

In this assignment, you will implement the perceptron algorithm and a simple but competitive neural bag-of-words model, as described in [this paper](https://www.aclweb.org/anthology/P15-1162.pdf), for text classification.  You will train your models on a (provided) dataset of positive and negative movie reviews and report accuracy on a test set.

In this notebook, we provide you with starter code to read in the data and evaluate the performance of your models.  After completing the instructions below, please follow the instructions at the end to submit your notebook and other files to Gradescope.

Make sure to make a copy of this notebook, so your changes are saved.

#Download the dataset

First you will need to download the IMDB dataset - to do this, simply run the cell below.  We have prepared a small version of the ACL IMDB dataset for you to use to help make your experiments faster.  The full dataset is available [here](https://ai.stanford.edu/~amaas/data/sentiment/), in case you are interested, but there is no need to use this for the assignment.

Note that files downloaded in Colab are only saved temporarily - if your session reconnects you will need to re-download the files.  In case you need persistent storage, you can mount your Google Drive folder like so:

```
from google.colab import drive
drive.mount('/content/drive')
```

You can also open a command line prompt by clicking on the shell icon on the left hand side of the page, and upload/download files from your local machine by clicking on the file icon.

In [82]:
#Download the data


!wget https://github.com/aritter/5525_sentiment/raw/master/aclImdb_small.tgz
!tar xvzf aclImdb_small.tgz > /dev/null

--2021-02-08 20:30:29--  https://github.com/aritter/5525_sentiment/raw/master/aclImdb_small.tgz
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/aritter/5525_sentiment/master/aclImdb_small.tgz [following]
--2021-02-08 20:30:30--  https://raw.githubusercontent.com/aritter/5525_sentiment/master/aclImdb_small.tgz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9635749 (9.2M) [application/octet-stream]
Saving to: ‘aclImdb_small.tgz.2’


2021-02-08 20:30:42 (51.9 MB/s) - ‘aclImdb_small.tgz.2’ saved [9635749/9635749]



# Converting text to numbers

Below is some code we are providing you to read in the IMDB dataset, perform tokenization (using `nltk`), and convert words into indices.  Please don't modify this code in your submitted version.  We will provide example usage below.

In [83]:
import os
import sys

import nltk
from nltk import word_tokenize
nltk.download('punkt')
import torch

#Sparse matrix implementation
from scipy.sparse import csr_matrix
import numpy as np
from collections import Counter

np.random.seed(1)

class Vocab:
    def __init__(self, vocabFile=None):
        self.locked = False
        self.nextId = 0
        self.word2id = {}
        self.id2word = {}
        if vocabFile:
            for line in open(vocabFile):
                line = line.rstrip('\n')
                (word, wid) = line.split('\t')
                self.word2id[word] = int(wid)
                self.id2word[int(wid)] = word
                self.nextId = max(self.nextId, int(wid) + 1)

    def GetID(self, word):
        if not word in self.word2id:
            if self.locked:
                return -1        #UNK token is -1.
            else:
                self.word2id[word] = self.nextId
                self.id2word[self.word2id[word]] = word
                self.nextId += 1
        return self.word2id[word]

    def HasWord(self, word):
        return word in self.word2id

    def HasId(self, wid):
        return wid in self.id2word

    def GetWord(self, wid):
        return self.id2word[wid]

    def SaveVocab(self, vocabFile):
        with open(vocabFile, 'w') as fOut:
            for word in self.word2id.keys():
                fOut.write("%s\t%s\n" % (word, self.word2id[word]))

    def GetVocabSize(self):
        #return self.nextId-1
        return self.nextId

    def GetWords(self):
        return self.word2id.keys()

    def Lock(self):
        self.locked = True

class IMDBdata:
    def __init__(self, directory, vocab=None):
        """ Reads in data into sparse matrix format """
        pFiles = os.listdir("%s/pos" % directory)
        nFiles = os.listdir("%s/neg" % directory)

        if not vocab:
            self.vocab = Vocab()
        else:
            self.vocab = vocab

        #For csr_matrix (see http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.sparse.csr_matrix.html#scipy.sparse.csr_matrix)
        X_values = []
        X_row_indices = []
        X_col_indices = []
        Y = []

        XwordList = []
        XfileList = []

        #Read positive files
        for i in range(len(pFiles)):
            f = pFiles[i]
            for line in open("%s/pos/%s" % (directory, f)):
                wordList   = [self.vocab.GetID(w.lower()) for w in word_tokenize(line) if self.vocab.GetID(w.lower()) >= 0]
                XwordList.append(wordList)
                XfileList.append(f)
                wordCounts = Counter(wordList)
                for (wordId, count) in wordCounts.items():
                    if wordId >= 0:
                        X_row_indices.append(i)
                        X_col_indices.append(wordId)
                        X_values.append(count)
            Y.append(+1.0)

        #Read negative files
        for i in range(len(nFiles)):
            f = nFiles[i]
            for line in open("%s/neg/%s" % (directory, f)):
                wordList   = [self.vocab.GetID(w.lower()) for w in word_tokenize(line) if self.vocab.GetID(w.lower()) >= 0]
                XwordList.append(wordList)
                XfileList.append(f)
                wordCounts = Counter(wordList)
                for (wordId, count) in wordCounts.items():
                    if wordId >= 0:
                        X_row_indices.append(len(pFiles)+i)
                        X_col_indices.append(wordId)
                        X_values.append(count)
            Y.append(-1.0)
            
        self.vocab.Lock()

        #Create a sparse matrix in csr format
        self.X = csr_matrix((X_values, (X_row_indices, X_col_indices)), shape=(max(X_row_indices)+1, self.vocab.GetVocabSize()))
        self.Y = np.asarray(Y)

        #Randomly shuffle
        index = np.arange(self.X.shape[0])
        np.random.shuffle(index)
        self.X = self.X[index,:]
        self.XwordList = [torch.LongTensor(XwordList[i]) for i in index]  #Two different sparse formats, csr and lists of IDs (XwordList).
        self.XfileList = [XfileList[i] for i in index]
        self.Y = self.Y[index]

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


The code below reads the train, dev, and test data into memory.  This will take a minute or so.

In [84]:
train = IMDBdata("aclImdb_small/train")
train.vocab.Lock()

dev  = IMDBdata("aclImdb_small/dev", vocab=train.vocab)
test  = IMDBdata("aclImdb_small/test", vocab=train.vocab)

# Exploring the data

Below are a few examples to help you get familiar with how the data is represented.  $X \in \mathbb{R}^{M \times N}$ contains the design matrix, with one row for each of the $M$ documents and one column for each of the $N$ unique words in the dataset, and $Y \in \{+1,-1\}^M$ contains the labels.  Because this representation has a large number of features, we are using a sparse matrix representation (SciPy's [csr_matrix](http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.sparse.csr_matrix.html#scipy.sparse.csr_matrix)).  PyTorch doesn't have great support for sparse matrices...
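
To make the sparse representation concrete, here is a minimal sketch on two toy documents. The documents, word IDs, and the names `docs` and `X_toy` are made up for illustration and are not part of the dataset; the construction from (row, column, count) triples mirrors what the loading code above does.

```python
from collections import Counter

import numpy as np
from scipy.sparse import csr_matrix

# Toy corpus: each document is a list of word IDs (hypothetical values).
docs = [[0, 1, 1, 2], [2, 3]]
vocab_size = 4

values, rows, cols = [], [], []
for row, doc in enumerate(docs):
    # One stored entry per (document, distinct word) pair, holding the count.
    for word_id, count in Counter(doc).items():
        rows.append(row)
        cols.append(word_id)
        values.append(count)

X_toy = csr_matrix((values, (rows, cols)), shape=(len(docs), vocab_size))

print(X_toy.toarray())   # dense view, only sensible for tiny matrices
print(X_toy.nnz)         # number of stored (non-zero) entries
```

Only the non-zero counts are stored, which is what keeps a matrix with tens of thousands of columns manageable.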


In [85]:
# X contains the design matrix representing the training data.
print(f"Train.X has {train.X.shape[0]} rows and {train.X.shape[1]} columns.")

Train.X has 7222 rows and 58162 columns.


In [86]:
# Labels
train.Y

array([-1., -1., -1., ..., -1.,  1., -1.])

In [87]:
# Let's count the frequency of every word appearing in the documents with negative sentiment:
word_counts = np.array(train.X[train.Y == -1,:].sum(axis=0)).flatten()
word_counts

array([13102,   408,    49, ...,     1,     1,     1], dtype=int64)

In [88]:
# Now, let's sort the words by frequency:
sorted_words = list(reversed(np.argsort(word_counts)))
sorted_words[0:10]

[12, 22, 15, 19, 38, 73, 31, 457, 454, 456]

In [89]:
# What is the index of the most frequent word?
sorted_words[0]

12

In [90]:
# Let's see what word that is:
train.vocab.GetWord(sorted_words[0])

'the'

In [91]:
# What are the 10 most frequent words?
[train.vocab.GetWord(sorted_words[x]) for x in range(10)]

['the', ',', '.', 'a', 'and', 'to', 'of', '>', '<', '/']

# Now it's time to write some code!

# Basic Perceptron (10 points)

Implement the perceptron classification algorithm (fill in the missing code marked with `TODO:` below).
The only hyperparameter to tune is the number of iterations.
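
As a reminder of the update rule, here is a minimal sketch of a mistake-driven perceptron on a tiny dense array. The data, the constant bias column, and the function name `train_perceptron` are assumptions made for illustration; this is not the starter code and it ignores the sparse format used in the assignment.

```python
import numpy as np

def train_perceptron(X, Y, n_iterations):
    """Mistake-driven perceptron on a dense array X with labels in {+1, -1}."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        for x, y in zip(X, Y):
            y_hat = 1.0 if np.dot(w, x) > 0 else -1.0
            if y_hat != y:
                w += y * x      # update only when the prediction is wrong
    return w

# Toy, linearly separable data (made up); the last column is a constant bias feature.
X_toy = np.array([[2.0, 0.0, 1.0],
                  [1.0, 1.0, 1.0],
                  [0.0, 2.0, 1.0],
                  [0.0, 3.0, 1.0]])
Y_toy = np.array([1.0, 1.0, -1.0, -1.0])

w_toy = train_perceptron(X_toy, Y_toy, n_iterations=10)
preds = np.where(X_toy @ w_toy > 0, 1.0, -1.0)
print(w_toy)
print(preds)
```

On separable data like this the weights stop changing after a few passes, so extra iterations are harmless; on real data the number of iterations is worth tuning on the dev set.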

In [92]:
class Eval:
    def __init__(self, pred, gold):
        self.pred = pred
        self.gold = gold
        
    def Accuracy(self):
        return np.sum(np.equal(self.pred, self.gold)) / float(len(self.gold))

class Perceptron:
    def __init__(self, X, Y, N_ITERATIONS):
        #TODO: Initialize parameters
        self.N_ITERATIONS = N_ITERATIONS
        self.X = X
        self.Y = Y
        self.weights = np.zeros(shape=(1, X.shape[1]))
        self.weights_avg = np.zeros(shape=(1, X.shape[1]))
        self.c = 1
        #print(X.shape)
        #print(Y.shape)
        self.Train(X,Y)


    def ComputeAverageParameters(self):
        #TODO: Compute average parameters (do this part last)
        #return
        self.weights = self.weights - self.weights_avg / self.c

    def Train(self, X, Y):
        #TODO: Estimate perceptron parameters

        x_arr = X.toarray()
       # print(x_arr.shape[0])
        for _ in range(self.N_ITERATIONS):
          #print(_)
          for i in range(x_arr.shape[0]):
           # x = np.insert(x_arr[i],0,1)
            y_pred = np.dot(self.weights, x_arr[i])#self.Predict(x_arr[i])
            #print(y_pred)
            target = 1.0 if (y_pred > 0) else -1.0
            delta = Y[i] - target
            if delta:
              self.weights += (delta * x_arr[i])
              self.weights_avg += self.c *(delta * x_arr[i])
            self.c +=1
         
        #print("Train loop terminates")
        return



    def Predict(self, X):
        #TODO: Implement perceptron classification

        # Score all documents with one sparse-dense product:
        # (n_docs x vocab) . (vocab x 1) -> (n_docs x 1)
        scores = np.asarray(X.dot(self.weights.T)).ravel()
        final_pred = [1 if s > 0 else -1 for s in scores]
        return final_pred

    def SavePredictions(self, data, outFile):
        Y_pred = self.Predict(data.X)
        with open(outFile, 'w') as fOut:
            for i in range(len(data.XfileList)):
                fOut.write(f"{data.XfileList[i]}\t{Y_pred[i]}\n")

    def Eval(self, X_test, Y_test):
        # print("Y_test type and ytest")
        # print(type(Y_test))
        # print(Y_test)
        # print("call predic method on test data")
        Y_pred = self.Predict(X_test)
        # print("Y_pred type and Y_pred")
        # print(type(Y_pred))
        # print(Y_pred)
        ev = Eval(Y_pred, Y_test)
        # print(ev)
        return ev.Accuracy()
    
ptron = Perceptron(train.X, train.Y, 10)
ptron.ComputeAverageParameters()
print(ptron.Eval(test.X, test.Y))
ptron.SavePredictions(test, 'test_pred_perceptron.txt')

print(train.X.shape)
print(test.X.shape)
#TODO: Print the 20 most positive and 20 most negative words



0.8512877319302132
(7222, 58162)
(7222, 58162)


In [93]:
type(train.XwordList[0])
train.XwordList[0]

tensor([40820, 40821,     6,     8,    12,  2884,  1189,  3552,    19,  1664,
         3021,    54,   912,   190,  1072,  1900,   289,    73,    12,  1664,
          645,   252,     6,  4409,    19,  1664,  2313,    31, 38527,    38,
        14842,  8304,    78,    19,   108,  2313,    31, 38527,    38, 14842,
         8304,    15,   136,    22,   128,   283,  3021,    22,    75,    61,
           19, 10329,   454,   455,   456,   457,   454,   455,   456,   457,
         5829,    22,    75,    61,    97, 40820, 40821,    26,    27,  3377,
           29,   310,    22,   136,   141,    18,  1739,    12,   108,    73,
           59,    70,   235,   855,    15,  1130,    75,   447,    51,    75,
           61,  1645,    22,   101, 40822,   103,  7199,   402,   256,  3940,
           23,  4914,    38,  1580,    19,  7993,    73,  4082, 18492,   454,
          455,   456,   457,   454,   455,   456,   457,   190,   277,  1837,
           22, 40822,    17,    19,  2036,    11,    12,  7568, 

In [94]:
from scipy.sparse import csr_matrix
from operator import itemgetter


#for all vectors in vocab dot product with ptron.weights, create list of scores
#scores = [np.dot(train.X[i], ptron.weights) for i in range(7222) ]
#scores = np.dot(train.X, ptron.weights.T ) #7222X1

# type(train.X) sparse matrix
# type(ptron.weights) np array
sparse_weights = csr_matrix(ptron.weights)
scores_sparce = csr_matrix.dot(train.X, sparse_weights.T)
# train.XwordList[0].tolist()
#print(train.X.toarray())
flat_scores = scores_sparce.toarray().flatten()


#sort lit of scores from highest to lowest, sort copy of train.X vector the same way
#largest_idx = 


# # use sorted copy of train.X to retrieve word_id of most positive words.
#highest_score_wid = [train.X[i] for i in largest_idx]

# lowest_score_wid = [train.X.toarray()[i] for i in smallest_idx]
# #repeat all steps for most negative

# pos_words = [train.vocab.GetWord(x) for x in highest_score_wid]
# neg_words = [train.vocab.GetWord(x) for x in lowest_score_wid]
# print(pos_words)
# print(neg_words)
flat_scores

array([-1803.03313441, -1253.9478545 , -1224.05724097, ...,
       -3445.1236344 ,   677.56561111,  -573.25326429])

In [95]:
large_idx = flat_scores.argsort()[-20:][::-1]
small_idx = flat_scores.argsort()[:20][::1]
len(small_idx)

20

In [96]:
# highest_score_wid = [train.X[i] for i in large_idx]
# lowest_score_wid = [train.X[i] for i in small_idx]


In [97]:
# highest_score_wid[0].toarray()

In [98]:

pos_words = [train.vocab.GetWord(x) for x in large_idx]
neg_words = [train.vocab.GetWord(x) for x in small_idx]
neg_words

['relatively',
 'glory',
 'en',
 'left',
 '1977',
 'ready',
 'partnership',
 'determine',
 'housewives',
 'gladly',
 'understand',
 'liners',
 'batman',
 'ceremonies',
 'donald',
 'sampson',
 'studio',
 'han',
 'filmmaker',
 'keys']

In [99]:
pos_words

['line',
 'distraught',
 'special',
 'elder',
 '13',
 'similarly',
 'top',
 'exquisite',
 'supported',
 'outlandish',
 'orson',
 'guessing',
 'halfway',
 'yikes',
 'their',
 'erupt',
 'hand',
 'rope',
 'perry',
 'dilapidated']

# Parameter Averaging (5 points)

Once your basic perceptron implementation is working, modify the code to implement parameter averaging.  Instead of using the parameters from the final iteration, $w_N$, to classify test examples, 
use the average of the parameters from every iteration, $\frac{1}{N}\sum_{i=1}^N w_i$.  A nice trick for doing this efficiently is described in section 2.1.1 of [Hal Daumé's thesis](http://www.umiacs.umd.edu/~hal/docs/daume06thesis.pdf).  When you are done, save a copy of your predictions on the test set to turn in at the end (click on the file icon on the left side of the notebook).
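
The sketch below illustrates one common formulation of this kind of trick on toy data and checks it against a naive running sum. The names `w`, `u`, `c`, and `averaged_perceptron` are our own, the data is made up, and whether this matches the thesis line for line is an assumption. Note that the average computed here also counts the initial all-zero weight vector.

```python
import numpy as np

def averaged_perceptron(X, Y, n_iterations):
    """Return (trick_average, naive_average) so the two can be compared."""
    w = np.zeros(X.shape[1])       # current weights
    u = np.zeros(X.shape[1])       # updates, each scaled by the step at which it happened
    c = 1                          # step counter
    w_sum = w.copy()               # naive running sum, starting from the initial weights
    for _ in range(n_iterations):
        for x, y in zip(X, Y):
            y_hat = 1.0 if np.dot(w, x) > 0 else -1.0
            if y_hat != y:
                w += y * x
                u += c * y * x
            c += 1
            w_sum += w             # the naive version touches every weight at every step
    return w - u / c, w_sum / c

# Toy data (made up); the last column is a constant bias feature.
X_toy = np.array([[2.0, 0.0, 1.0],
                  [1.0, 1.0, 1.0],
                  [0.0, 2.0, 1.0],
                  [0.0, 3.0, 1.0]])
Y_toy = np.array([1.0, 1.0, -1.0, -1.0])

trick_avg, naive_avg = averaged_perceptron(X_toy, Y_toy, n_iterations=10)
print(trick_avg)
print(np.allclose(trick_avg, naive_avg))
```

The benefit of the trick is that `u` only changes on mistakes, so with sparse features there is no need to add the full weight vector to a running sum after every example.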

# Inspect Feature Weights (5 points)

Modify the code block above to print out the 20 most positive and 20 most negative words in the vocabulary, sorted by the weights your model learned.  This takes a bit of thought, because the words in each document have been converted to IDs (see the examples above).
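
One way to approach this, sketched on a made-up vocabulary: the weight vector has one entry per vocabulary column, so sorting the weights themselves yields word IDs that can be mapped back to words. The indices being sorted must be column (word) indices, not row (document) indices. The names `id2word` and `weights` below are hypothetical stand-ins for the vocabulary and the learned parameters.

```python
import numpy as np

# Hypothetical vocabulary and weights, standing in for the real vocabulary
# and the learned weight vector (one weight per word ID).
id2word = {0: "great", 1: "boring", 2: "the", 3: "awful", 4: "loved"}
weights = np.array([2.5, -1.5, 0.1, -3.0, 1.75])

k = 2
order = np.argsort(weights)                         # ascending: most negative first
most_negative = [id2word[int(i)] for i in order[:k]]
most_positive = [id2word[int(i)] for i in order[::-1][:k]]

print(most_positive)
print(most_negative)
```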

# Evaluate on the test set

Once you are happy with your performance on the dev set, run the cell below to evaluate your performance on the test set.  Make sure to download the predictions of your model in `test_pred_perceptron.txt`.

In [100]:
print(ptron.Eval(dev.X, dev.Y))

ptron.SavePredictions(test, 'test_pred_perceptron.txt')

0.8460846084608461


# PyTorch Introduction

Take a look at the code below, which provides an example of how a simple neural network is capable of learning the XOR function (note that the perceptron cannot do this).  Refer to the PyTorch documentation [here](https://pytorch.org/docs/stable/index.html) for more information.

In [101]:
import torch
import torch.nn as nn
from torch import optim
import random
import numpy as np

#Define the computation graph; one layer hidden network
class FFNN(nn.Module):
    def __init__(self, dim_i, dim_h, dim_o):
        super(FFNN, self).__init__()
        self.V = nn.Linear(dim_i, dim_h)
        self.g = nn.Tanh()
        self.W = nn.Linear(dim_h, dim_o)
        self.logSoftmax = nn.LogSoftmax(dim=0)

    def forward(self, x):
        return self.logSoftmax(self.W(self.g(self.V(x))))

train_X = np.array([[0,0], [0,1], [1,0], [1,1]])
train_Y = np.array([0,1,1,0])

num_classes  = 2
num_hidden   = 10
num_features = 2

ffnn = FFNN(num_features, num_hidden, num_classes)
optimizer = optim.Adam(ffnn.parameters(), lr=0.1)

for epoch in range(100):
    total_loss = 0.0
    #Randomly shuffle examples in each epoch
    shuffled_i = list(range(0,len(train_Y)))
    random.shuffle(shuffled_i)
    for i in shuffled_i:
        x        = torch.from_numpy(train_X[i]).float()
        y_onehot = torch.zeros(num_classes)
        y_onehot[train_Y[i]] = 1

        ffnn.zero_grad()
        logProbs = ffnn.forward(x)
        loss = torch.neg(logProbs).dot(y_onehot)
        total_loss += loss
        
        loss.backward()
        optimizer.step()
    if epoch % 10 == 0:    
      print("loss on epoch %i: %f" % (epoch, total_loss))

#Evaluate on the training set:
num_errors = 0
for i in range(len(train_Y)):
    x = torch.from_numpy(train_X[i]).float()
    y = train_Y[i]
    logProbs = ffnn.forward(x)
    prediction = torch.argmax(logProbs)
    if y != prediction:
        num_errors += 1
print("number of errors: %d" % num_errors)

loss on epoch 0: 3.940274
loss on epoch 10: 2.376710
loss on epoch 20: 1.213042
loss on epoch 30: 0.251226
loss on epoch 40: 0.114335
loss on epoch 50: 0.067142
loss on epoch 60: 0.044789
loss on epoch 70: 0.031554
loss on epoch 80: 0.023493
loss on epoch 90: 0.018145
number of errors: 0


# Neural Bag-of-Words (10 points)

Your next task is to implement a neural bag-of-words baseline, NBOW-RAND, as described in [this paper](https://www.aclweb.org/anthology/P15-1162.pdf).  See section 2.1.  Note that the dataset we are using is a sample of the full IMDB dataset to make developing your solution easier, so your performance will be a bit lower than the numbers reported in the paper for the full dataset.

# GPUs

You probably want to use GPUs here to enable faster training with larger word embeddings and hidden layers (the paper listed above uses 300 dimensions).

Colab provides free access to GPUs, which will be useful for the rest of the assignment.  To access GPUs, select `Runtime` from the menu at the top of the page, and then choose `Change Runtime Type`; under the `Hardware Accelerator` dropdown select `GPU`.  Note that the free version of Colab does have quotas on how much GPU can be used - you won't need to use GPUs for the first part of the assignment (implementing the Perceptron algorithm).
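
If you want code that also runs when no GPU is attached (for example after hitting a quota), a common pattern is to pick the device once and move tensors and models to it. This is a minimal sketch; the variable name `device` is our own choice.

```python
import torch

# Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.zeros(3).to(device)   # tensors (and models) are moved with .to(device)
print(device, x.device.type)
```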

You can check to make sure a GPU is available using the code below:

In [102]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Mon Feb  8 20:40:03 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    36W / 300W |   2445MiB / 16130MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Data format

In addition to reading in the data in sparse matrix (SciPy CSR) format, which you used in your implementation of the perceptron algorithm, the data loading code above also reads it in another format in `train.XwordList`.  This contains a list of PyTorch tensors, each holding the sequence of word IDs for one document.  The documents in `train.XwordList` are presented in the same order as the labels (`train.Y`).  For the next part of the assignment, you should work with the data in this new format, which is fairly standard for text data in neural networks.  See below for a few examples on how to access the data in this new format.

In [103]:
# Example of using XwordList

print("Here is the first document:")
print(train.XwordList[0].tolist())

print("After converting IDs to words:")
print([train.vocab.GetWord(x) for x in train.XwordList[0].tolist()])

print("Here is the label:")
print(train.Y[0])

Here is the first document:
[40820, 40821, 6, 8, 12, 2884, 1189, 3552, 19, 1664, 3021, 54, 912, 190, 1072, 1900, 289, 73, 12, 1664, 645, 252, 6, 4409, 19, 1664, 2313, 31, 38527, 38, 14842, 8304, 78, 19, 108, 2313, 31, 38527, 38, 14842, 8304, 15, 136, 22, 128, 283, 3021, 22, 75, 61, 19, 10329, 454, 455, 456, 457, 454, 455, 456, 457, 5829, 22, 75, 61, 97, 40820, 40821, 26, 27, 3377, 29, 310, 22, 136, 141, 18, 1739, 12, 108, 73, 59, 70, 235, 855, 15, 1130, 75, 447, 51, 75, 61, 1645, 22, 101, 40822, 103, 7199, 402, 256, 3940, 23, 4914, 38, 1580, 19, 7993, 73, 4082, 18492, 454, 455, 456, 457, 454, 455, 456, 457, 190, 277, 1837, 22, 40822, 17, 19, 2036, 11, 12, 7568, 7569, 22, 128, 8, 85, 19, 24435, 1407, 374, 975, 15, 40821, 953, 73, 288, 54, 40823, 17, 40824, 22163, 73, 345, 10071, 1189, 12, 293, 14842, 22, 12, 10327, 22, 38, 12, 293, 4408, 12, 10327, 15, 168, 3260, 9, 19, 108, 54, 2054, 398, 12252, 31, 40825, 1130, 822, 909, 2911, 38, 2153, 15, 307, 22, 157, 3841, 141, 811, 143, 75, 953, 

# Suggestions for getting started

We recommend following a similar structure as the XOR example above, starting with an [Embedding Layer](https://pytorch.org/docs/stable/nn.html#embedding), [ReLU nonlinearity](https://pytorch.org/docs/stable/nn.html#relu), a single hidden layer and [Adam optimizer.](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam)
Please refer to the PyTorch documentation for more information as necessary.  You can use a batch size of one for this assignment, to simplify your implementation.
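
The central operation in a bag-of-words network is turning a variable-length sequence of word IDs into one fixed-size vector by averaging the word embeddings. The sketch below shows just that step with toy sizes; the vocabulary size, embedding dimension, and word IDs are made up and are not the values used in the assignment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)   # toy sizes, not the assignment's
doc = torch.LongTensor([3, 7, 7, 1])                     # one document as word IDs

vectors = emb(doc)            # shape: (sequence length, embedding dim)
doc_vector = vectors.mean(0)  # average over the words -> shape: (embedding dim,)

print(vectors.shape, doc_vector.shape)
```

Because the average runs over the sequence dimension, documents of any length map to a vector of the same size, which is what lets a single linear layer follow.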


In [104]:
import tqdm.notebook

import torch
import torch.nn as nn
from torch import optim

class NBOW(nn.Module):
    def __init__(self, VOCAB_SIZE, DIM_EMB=300, NUM_CLASSES=2):
        super(NBOW, self).__init__()
        self.NUM_CLASSES=NUM_CLASSES
        #TODO: Initialize parameters.
        #embedding layer, relu, hiddenlayer, adam
        self.V = nn.Embedding(VOCAB_SIZE, DIM_EMB)
        self.g = nn.ReLU()
        self.W = nn.Linear(DIM_EMB, NUM_CLASSES)
        self.logSoftmax = nn.LogSoftmax(dim=0)


    def forward(self, X):
        #TODO: Implement forward computation.
        #return torch.randn(self.NUM_CLASSES)
        #X.cuda()
        vfx = self.V(X.cuda()).mean(0)
        g_v = self.g(vfx)
        w_g = self.W(g_v)
        #return w_g
        sft= self.logSoftmax(w_g)
        return sft


def EvalNet(data, net):
    num_correct = 0
    Y = (data.Y + 1.0) / 2.0
    X = data.XwordList
    for i in range(len(X)):
        logProbs = net.forward(X[i])
        pred = torch.argmax(logProbs)
        if pred == Y[i]:
            num_correct += 1
    print("Accuracy: %s" % (float(num_correct) / float(len(X))))

def SavePredictions(data, outFile, net):
    with open(outFile, 'w') as fOut:
        for i in range(len(data.XwordList)):
            logProbs = net.forward(data.XwordList[i])
            pred = torch.argmax(logProbs)
            fOut.write(f"{data.XfileList[i]}\t{pred.item()}\n")

def Train(net, X, Y, n_iter, dev):
    print("Start Training!")
    #TODO: initialize optimizer.
    adam = optim.Adam(net.parameters(),lr=.0005) #.05 too high, .0001 too low
    num_classes = len(set(Y))
    #print(X.shape)
    for epoch in range(n_iter):
        num_correct = 0
        total_loss = 0.0
        net.train()   #Put the network into training mode
        for i in tqdm.notebook.tqdm(range(len(X))):
           # pass
            #TODO: compute gradients, do parameter update, compute loss.
            #x  = torch.from_numpy(train_X[i]).float()
            y_onehot = torch.zeros(num_classes)
            y_onehot[Y[i].astype('int')] = 1

            adam.zero_grad()
            logProbs = net.forward(X[i])
           # print(y_onehot.shape)
           # print(logProbs)

            loss = torch.neg(logProbs).dot(y_onehot.cuda())
            total_loss += loss
            
            loss.backward()
            adam.step()

        net.eval()    #Switch to eval mode
        print(f"loss on epoch {epoch} = {total_loss}")
        EvalNet(dev, net)

nbow = NBOW(train.vocab.GetVocabSize()).cuda()
Train(nbow, train.XwordList, (train.Y + 1.0) / 2.0, 5, dev)

Start Training!




loss on epoch 0 = 4110.056640625
Accuracy: 0.8033303330333034




loss on epoch 1 = 2209.860595703125
Accuracy: 0.8582358235823583




loss on epoch 2 = 1312.31005859375
Accuracy: 0.8703870387038704




loss on epoch 3 = 791.2699584960938
Accuracy: 0.8735373537353736




loss on epoch 4 = 457.943603515625
Accuracy: 0.8717371737173717


# Evaluate on the test set

Once you are happy with your performance on the dev set, run the cell below to evaluate your performance on the test set.  Make sure to download the predictions of your model in `test_pred_nbow.txt`.

In [105]:
EvalNet(test, nbow)
SavePredictions(test, 'test_pred_nbow.txt', nbow)

Accuracy: 0.8667959014123512


# 1-D Convolutional Neural Networks (2 points - optional extra credit)

The final task for this assignment is to implement a convolutional neural network for text classification similar to the CNN-rand baseline described by [Kim (2014)](https://www.aclweb.org/anthology/D14-1181.pdf).  We haven't covered CNNs in lecture yet, so this part is optional.  You should use the same data format as the NBOW model above, starting out with an [Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html) layer, followed by a [1-D convolution](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html), a [max-pooling layer](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool1d.html#torch.nn.MaxPool1d) and a [Dropout layer](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html).  You should be able to use the same `Train()` function you wrote above.
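
One detail that is easy to get wrong is the input layout: `nn.Conv1d` expects `(batch, channels, length)`, while an embedding lookup returns `(length, embedding dim)`. The sketch below, with toy sizes chosen for illustration, moves the embedding dimension into the channel position with a transpose and shows that a `reshape` to the same shape does not give the same tensor. The variable names are our own.

```python
import torch
import torch.nn as nn

seq_len, dim_emb, n_filters = 6, 4, 5          # toy sizes chosen for illustration
e = torch.arange(seq_len * dim_emb, dtype=torch.float).reshape(seq_len, dim_emb)

# Conv1d expects (batch, channels, length); the embedding dimension plays the role of channels.
conv_in = e.t().unsqueeze(0)                   # (1, dim_emb, seq_len)
scrambled = e.reshape(1, dim_emb, seq_len)     # same shape, but columns no longer line up with words

conv = nn.Conv1d(dim_emb, n_filters, kernel_size=3)
features = conv(conv_in)                       # (1, n_filters, seq_len - 2)
pooled = features.max(dim=2).values            # max over time -> (1, n_filters)

print(conv_in.shape, features.shape, pooled.shape)
print(torch.equal(conv_in, scrambled))
```

After the transpose, column `t` of `conv_in` is exactly the embedding of word `t`, so each filter sees a window of consecutive words.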

In [106]:
import torch.nn.functional as F
import torch.nn as nn
from torch import optim
import tqdm.notebook


class CNN(nn.Module):
    def __init__(self, VOCAB_SIZE, DIM_EMB=300, NUM_CLASSES=2):
        super(CNN, self).__init__()
        self.NUM_CLASSES=NUM_CLASSES
        #TODO: Initialize parameters.
        self.emb = nn.Embedding(VOCAB_SIZE, DIM_EMB)
        self.conv = nn.Conv1d(DIM_EMB,1000 , kernel_size=3) #dim = B * OUT_Channels * Seq Length
        self.drop = nn.Dropout()
        self.hidden = nn.Linear( 1000, NUM_CLASSES)
        self.softmax = nn.LogSoftmax(dim=0)

    def forward(self, X):
        #TODO: Implement forward computation.
        e = self.emb(X.cuda())
        c = self.conv(e.reshape(1, e.shape[1], e.shape[0]))
        p = F.max_pool1d(c, c.shape[2])
        d = self.drop(p)
        h = self.hidden(d.squeeze())
        return self.softmax(h)

        #return torch.randn(self.NUM_CLASSES)



def Train(net, X, Y, n_iter, dev):
  print("Start Training!")
  #TODO: initialize optimizer.
  adam = optim.Adam(net.parameters(),lr=.05) #.05 too high, .0001 too low
  num_classes = len(set(Y))
  #print(X.shape)
  for epoch in range(n_iter):
      num_correct = 0
      total_loss = 0.0
      net.train()   #Put the network into training mode
      for i in tqdm.notebook.tqdm(range(len(X))):
        # pass
          #TODO: compute gradients, do parameter update, compute loss.
          #x  = torch.from_numpy(train_X[i]).float()
          y_onehot = torch.zeros(num_classes)
          y_onehot[Y[i].astype('int')] = 1

          adam.zero_grad()
          logProbs = net.forward(X[i])
        # print(y_onehot.shape)
        # print(logProbs)

          loss = torch.neg(logProbs).dot(y_onehot.cuda())
          total_loss += loss
          
          loss.backward()
          adam.step()
      net.eval()    #Switch to eval mode
      print(f"loss on epoch {epoch} = {total_loss}")
      EvalNet(dev, net)

cnn = CNN(train.vocab.GetVocabSize()).cuda()
Train(cnn, train.XwordList, (train.Y + 1.0) / 2.0, 5, dev)

Start Training!




loss on epoch 0 = 241733472.0
Accuracy: 0.7979297929792979




loss on epoch 1 = 313568384.0
Accuracy: 0.6903690369036903




loss on epoch 2 = 310668512.0
Accuracy: 0.7655265526552655




loss on epoch 3 = 268897728.0
Accuracy: 0.8307830783078308




loss on epoch 4 = 229844576.0
Accuracy: 0.828982898289829


In [107]:
def EvalNet(data, net):
    num_correct = 0
    Y = (data.Y + 1.0) / 2.0
    X = data.XwordList
    for i in range(len(X)):
        logProbs = net.forward(X[i])
        pred = torch.argmax(logProbs)
        if pred == Y[i]:
            num_correct += 1
    print("Accuracy: %s" % (float(num_correct) / float(len(X))))

def SavePredictions(data, outFile, net):
    with open(outFile, 'w') as fOut:
        for i in range(len(data.XwordList)):
            logProbs = net.forward(data.XwordList[i])
            pred = torch.argmax(logProbs)
            fOut.write(f"{data.XfileList[i]}\t{pred.item()}\n")

# Evaluate on the test set

Once you are happy with your performance on the dev set, run the cell below to evaluate your performance on the test set.  Make sure to download the predictions of your model in `test_pred_cnn.txt`.

In [108]:
EvalNet(test, cnn)
SavePredictions(test, 'test_pred_cnn.txt', cnn)

Accuracy: 0.8296870672943782


# Gradescope

Gradescope allows you to add multiple files to your submission. Please submit this notebook along with the test set predictions:
* test_pred_perceptron.txt
* test_pred_nbow.txt
* test_pred_cnn.txt (optional)
* TextClassification_solution.ipynb

To download this notebook, go to `File > Download.ipynb`. You can download the predictions from Colab by clicking the folder icon on the left and finding them under Files. 

Please make sure that you name the files as specified above. You will be able to see the test set accuracy for your predictions. However, the final score will be assigned later based on accuracy and implementation. 

When submitting the .ipynb notebook, please make sure that all the cells run when executed in order starting from a fresh session. If the code doesn't take too long to run, you can re-run everything with `Runtime -> Restart and run all`.

You can submit multiple times before the deadline and choose the submission you want to be graded by going to `Submission History` on Gradescope.

