# PS3: Neural Networks for Classification and Natural Language Inference

In [1]:
import json
import csv
import os
import glob

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn import functional as F

from sklearn.metrics import f1_score, precision_score, recall_score

import numpy as np

The purpose of this assignment is to gain an understanding of how neural networks are trained. You will also learn the basics of the PyTorch framework.

## Submission Instructions

After completing the exercises below, generate a PDF of the code **with** outputs. Then create a zip file containing both the completed exercise and the generated PDF. You are **required** to check the PDF to make sure all the code **and** outputs are clearly visible and easy to read. If your code runs off the page, shorten the lines. I generally recommend not going over 80 characters.

Finally, name the zip file using a combination of the assignment and your name, e.g., ps3_rios.zip

## PART I: Data Cleaning (10 points)

Load the "surnames.csv" file to train an LSTM to predict nationality based on surname. You will need to transform the data from a list of strings to a list of indexes. For example, the following data

```
Anthony
John
David
```

should be transformed into a list of lists.

```
[[0, 1, 2, 3, 4, 1, 5],
 [6, 4, 3, 1],
 [7, 8, 9, 10, 11]]
```

Next, you will need to zero-pad all examples to be the same size.

```
[[0, 1, 2, 3, 4, 1, 5],
 [6, 4, 3, 1, 0, 0, 0],
 [7, 8, 9, 10, 11, 0, 0]]
```

Finally, everything will be converted into numpy arrays.
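
The three steps above (index mapping, zero-padding, conversion to numpy) can be sketched as follows. This is a minimal illustration, not the required solution: the helper names `build_vocab` and `encode_and_pad` are hypothetical, and it assumes index 0 is reserved for a `<PAD>` token, so the resulting indexes are shifted by one relative to the illustration above. Characters missing from the vocabulary are also mapped to 0.

```python
import numpy as np

PAD = '<PAD>'  # assumption: index 0 is reserved for padding


def build_vocab(names):
    """Assign each character an index in order of first appearance."""
    char2index = {PAD: 0}
    for name in names:
        for ch in name:
            if ch not in char2index:
                char2index[ch] = len(char2index)
    return char2index


def encode_and_pad(names, char2index):
    """Map names to index lists, then right-pad with 0 to a common length."""
    encoded = [[char2index.get(ch, 0) for ch in name] for name in names]
    lengths = [len(seq) for seq in encoded]
    max_len = max(lengths)
    padded = [seq + [0] * (max_len - len(seq)) for seq in encoded]
    return np.array(padded), np.array(lengths)


names = ['Anthony', 'John', 'David']
char2index = build_vocab(names)
X, X_len = encode_and_pad(names, char2index)
print(X)      # one row per name, padded to the longest name
print(X_len)  # original (unpadded) length of each name
```

Keeping the unpadded lengths alongside the padded array matters later, because the LSTM needs them to skip the padded positions.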

In [2]:
char2index = {'<PAD>': 0}
index2char = {0: '<PAD>'}
class2index = {} # stores the class index pairs.
index2class = {}
doc_lengths = [] # Stores the lengths of all docs (train, test and dev)
X_train = [] # stores the character indexes of each training name
y_train = [] # stores an index to the correct class
X_dev = []
y_dev = []
X_test = []
y_test = []
X_train_len = [] # Stores the length of each training name
X_test_len = [] # ... length of each test name
X_dev_len = [] # ... length of each dev name

# Write code to load data here.
dataset_filename = 'surnames.csv'
datadir = 'data'
dataset_path = os.path.join(datadir, dataset_filename)
data_names = ['X_train', 'X_dev', 'X_test']

def access_data(path):
    ret = {'X_train':[], 'y_train':[], 'X_dev':[], 'y_dev':[], 'X_test':[], 'y_test':[], 'len':[]}
    with open(path, 'r', encoding='utf-8') as f:
        reader = csv.reader(f,dialect='excel')
        for i, row in enumerate(reader):
#             if i > 5:
#                 break
#             print(row)
            if row[0] == 'train':
                ret['X_train'].append(row[1])
                ret['y_train'].append(row[2])
            if row[0] == 'test':
                ret['X_test'].append(row[1])
                ret['y_test'].append(row[2])
            if row[0] == 'dev':
                ret['X_dev'].append(row[1])
                ret['y_dev'].append(row[2])
            ret['len'].append(len(row[1]))
    return ret
    
def load_data(path):
    
    results = access_data(path)
    class2index = {}
    for class_name in set(results['y_train']+results['y_dev']+results['y_test']):
        class2index[class_name] = len(class2index)
    index2class = {ind:class_name for class_name,ind in class2index.items()}
    return results, class2index, index2class
    
    
def update_mappings(X, x2index, index2x, map_elements=True):
    if map_elements:
        xs = set([element for x in X for element in x])
    else:
        xs = set([x for x in X])
    for x in xs:
#         print("\nx:",x)
#         print(len(x2index))
        x2index[x] = len(x2index)
        index2x[len(index2x)] = x
    
        
def convert_to_index_map(X, x2index, map_element=True):
    index_mappings = []
    for x in X:
        if map_element:
            index_map = [x2index[element] if element in x2index else 0 for element in x]
        else:
            if x in x2index:
                index_map = x2index[x]
            else:
                print('S')
                index_map = 0
        index_mappings.append(index_map)
    return index_mappings
            
    
    
    
data, class2index, index2class = load_data(dataset_path)
X_train, y_train, X_dev, y_dev, X_test, y_test = data['X_train'],data['y_train'],data['X_dev'],data['y_dev'] \
                                                 ,data['X_test'],data['y_test']
doc_lengths = data['len']
d = [X_train, X_dev, X_test]
dl = [X_train_len, X_dev_len, X_test_len]
for lengths, names in zip(dl, d):
    lengths.extend(len(x) for x in names)
update_mappings(X_train, char2index, index2char)
# print(class2index)
# update_mappings(y_train, class2index, index2class, map_elements=False)
print(char2index)
print(class2index)
print(index2class)

X_train_nums, X_dev_nums = convert_to_index_map(X_train, char2index),convert_to_index_map(X_dev, char2index)
X_test_nums = convert_to_index_map(X_test, char2index)
y_train = convert_to_index_map(y_train, class2index, map_element=False)
y_dev = convert_to_index_map(y_dev, class2index, map_element=False)
y_test = convert_to_index_map(y_test, class2index, map_element=False)

{'<PAD>': 0, 'ü': 1, 'C': 2, 'ń': 3, '1': 4, ',': 5, 'g': 6, 'm': 7, 'Q': 8, 'S': 9, 'H': 10, 'í': 11, 'j': 12, "'": 13, 'ú': 14, 'W': 15, 'ó': 16, 'á': 17, 'i': 18, 'B': 19, 'D': 20, 'l': 21, 'ż': 22, 'ß': 23, 'F': 24, 'ù': 25, 'õ': 26, 'd': 27, 'R': 28, 'U': 29, '-': 30, 'T': 31, 'N': 32, 'c': 33, 'ì': 34, 'p': 35, 'Ś': 36, 'ã': 37, 'è': 38, 'f': 39, 'e': 40, 'Y': 41, 'ö': 42, 'ñ': 43, 'ą': 44, 'M': 45, 'V': 46, 'a': 47, 'u': 48, 'A': 49, 'n': 50, 'é': 51, 'J': 52, 'ä': 53, 'z': 54, 'r': 55, 'O': 56, 'q': 57, '/': 58, 'w': 59, 'à': 60, ' ': 61, 'v': 62, 'X': 63, 'o': 64, 't': 65, 'ł': 66, 'I': 67, 'x': 68, 'Z': 69, 'Á': 70, 'E': 71, 'K': 72, 's': 73, 'k': 74, 'P': 75, 'G': 76, 'h': 77, 'ê': 78, 'L': 79, 'y': 80, 'b': 81, 'ò': 82}
{'french': 0, 'arabic': 1, 'russian': 2, 'english': 3, 'german': 4, 'greek': 5, 'dutch': 6, 'polish': 7, 'vietnamese': 8, 'japanese': 9, 'chinese': 10, 'spanish': 11, 'irish': 12, 'scottish': 13, 'czech': 14, 'korean': 15, 'portuguese': 16, 'italian': 17}
{0

In [3]:
from collections import Counter


print(len(X_dev))
print(len(X_dev_len))
print(len(doc_lengths))
print(X_train_nums[1])
print(X_train[1])
print(y_train[0])
cnt = Counter()
cnt.update(y_train)
print(cnt)
cnt = [val for key, val in sorted(cnt.items(), key=lambda x: x[0])]
class_weights = torch.FloatTensor(cnt)/sum(cnt)
print(class_weights)

3060
3060
20074
[75, 55, 18, 74, 47, 54, 33, 77, 18, 74, 64, 62]
Prikazchikov
1
Counter({2: 7050, 3: 2713, 1: 1507, 9: 770, 4: 532, 17: 526, 14: 380, 6: 223, 11: 213, 0: 203, 10: 199, 12: 174, 5: 154, 7: 103, 13: 73, 15: 66, 16: 57, 8: 57})
tensor([0.0135, 0.1005, 0.4700, 0.1809, 0.0355, 0.0103, 0.0149, 0.0069, 0.0038,
        0.0513, 0.0133, 0.0142, 0.0116, 0.0049, 0.0253, 0.0044, 0.0038, 0.0351])


In [4]:
# PADDING

max_seq_len = max(doc_lengths)
len_to_pad = len(max(X_train, key=lambda x: len(x)))
print('longest in training set:', len_to_pad)
X_train_eq_size = []
X_dev_eq_size = []
X_test_eq_size = []

def pad_example(ex, len_to_pad, pad):
    padded = ex[:len_to_pad] +[pad]*(len_to_pad -len(ex))
    return padded
# Write code to append data to code here
for x in X_train_nums:
    X_train_eq_size.append(pad_example(x,len_to_pad, 0))
    
for x in X_dev_nums:
    X_dev_eq_size.append(pad_example(x,len_to_pad, 0))
    
for x in X_test_nums:
    X_test_eq_size.append(pad_example(x,len_to_pad, 0))
    
print(len(X_dev))
X_train = np.array(X_train_eq_size)
X_dev = np.array(X_dev_eq_size)
X_test = np.array(X_test_eq_size)
print(len(X_dev))
print(X_dev[0])
y_train = np.array(y_train)
y_dev = np.array(y_dev)
y_test = np.array(y_test)

X_train_len = np.array(X_train_len)
X_dev_len = np.array(X_dev_len)
X_test_len = np.array(X_test_len)

idx = np.argsort(X_dev_len)[::-1]
X_dev = X_dev[idx]
y_dev = y_dev[idx]
X_dev_len = X_dev_len[idx]

idx = np.argsort(X_test_len)[::-1]
X_test = X_test[idx]
y_test = y_test[idx]
X_test_len = X_test_len[idx]

doc_lengths = np.array(doc_lengths)
print(X_train.shape)

longest in training set: 20
3060
3060
[24 48 55 21 64 50  6  0  0  0  0  0  0  0  0  0  0  0  0  0]
(15000, 20)


In [5]:
print(np.unique(y_train, axis=0, return_counts=True))
print(X_dev.shape)
print(np.unique(X_dev,  return_counts=True))
print(X_dev)
print(X_dev[1])

(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17]), array([ 203, 1507, 7050, 2713,  532,  154,  223,  103,   57,  770,  199,
        213,  174,   73,  380,   66,   57,  526], dtype=int64))
(3060, 20)
(array([ 0,  2,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
       21, 23, 24, 27, 28, 29, 30, 31, 32, 33, 35, 36, 39, 40, 41, 42, 43,
       45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 59, 60, 61, 62,
       63, 64, 65, 67, 68, 69, 71, 72, 73, 74, 75, 76, 77, 79, 80, 81, 82]), array([39281,    89,   333,   404,     6,   216,   178,     2,    80,
          15,     2,    46,     4,     1,  1488,   251,   133,   880,
           1,    91,   465,   130,    20,     4,   173,   108,   389,
         132,     1,   199,  1573,    59,     1,     1,   230,   117,
        2312,   673,   276,  1395,     2,   118,     1,   205,  1206,
          54,     2,   131,     2,    15,   842,     6,  1625,   716,
          35,     9,    78,    66,   169

## PART II: Classification (25 points)

In [6]:
class LSTM(nn.Module):
    def __init__(self, nb_layers, word2index, class2index, nb_lstm_units=100,
                 embedding_dim=3, batch_size=3, bidirectional=False):
        super(LSTM, self).__init__()
        self.vocab = word2index
        self.tags = class2index

        self.nb_layers = nb_layers
        self.nb_lstm_units = nb_lstm_units
        self.embedding_dim = embedding_dim
        self.batch_size = batch_size

        self.nb_tags = len(self.tags)
        self.bidirectional = bidirectional
        self.num_directions = 2 if self.bidirectional else 1
        print('num_directions:', self.num_directions)
        # build actual NN
        self.__build_model()

    def __build_model(self):
        nb_vocab_words = len(self.vocab)

        padding_idx = self.vocab['<PAD>']
        self.word_embedding = nn.Embedding(
            num_embeddings=nb_vocab_words,
            embedding_dim=self.embedding_dim,
            padding_idx=padding_idx
        )

        self.lstm = nn.LSTM(
            input_size=self.embedding_dim,
            hidden_size=self.nb_lstm_units,
            num_layers=self.nb_layers,
            batch_first=True
        )
        
#         if self.bidirectional:
            
#             self.lstm_back = nn.LSTM(
#                 input_size=self.embedding_dim,
#                 hidden_size=self.nb_lstm_units,
#                 num_layers=self.nb_layers,
#                 batch_first=True
#             )

        self.hidden_to_tag = nn.Linear(self.nb_lstm_units*self.nb_layers*self.num_directions, self.nb_tags)
        
        self.logsoftmax = nn.LogSoftmax(dim=1)
        self.softmax = nn.Softmax(dim=1)
        self.inference = False

    def init_hidden(self, X, bidirectional=False):
#         if bidirectional:
#             h0 = torch.zeros(self.nb_layers, X.size(0), self.nb_lstm_units*self.num_directions).float()
#             c0 = torch.zeros(self.nb_layers, X.size(0), self.nb_lstm_units*self.num_directions).float()
#         else:
            # Initial ht (hidden state) and ct (context)
        h0 = torch.zeros(self.nb_layers, X.size(0), self.nb_lstm_units).float()
        c0 = torch.zeros(self.nb_layers, X.size(0), self.nb_lstm_units).float()
        return (h0,c0)

    def forward(self, X, X_lengths):
        # reset the LSTM hidden state. Must be done before you run a new batch.
        # Otherwise the LSTM will treat
        # a new batch as a continuation of a sequence
        self.hidden = self.init_hidden(X)
        self.hidden_back = self.init_hidden(X)
        
        batch_size, seq_len = X.shape
        
        # ---------------------
        # 1. embed the input
        # Dim transformation: (batch_size, seq_len) -> (batch_size, seq_len, embedding_dim)
#         print(X[:128,:])
#         print(X[:128,:].size())
        
        X = self.word_embedding(X)
        if self.bidirectional:
            X = torch.cat((X,torch.flip(X,[1])), 2)
            

        # ---------------------
        # 2. Run through RNN
        # TRICK 2 ********************************
        # Dim transformation: (batch_size, seq_len, embedding_dim) -> (batch_size, seq_len, nb_lstm_units)

    
        # pack_padded_sequence so that padded items in the sequence won't be shown to the LSTM
        if self.bidirectional:
            X_for = torch.nn.utils.rnn.pack_padded_sequence(X[:,:,:self.embedding_dim], X_lengths, batch_first=True)
            X_back = torch.nn.utils.rnn.pack_padded_sequence(X[:,:,self.embedding_dim:], X_lengths, batch_first=True)
        else:
            X = torch.nn.utils.rnn.pack_padded_sequence(X, X_lengths, batch_first=True)
        
        
        # now run through LSTM
        # X contains the padded sequence output and ht contains the final hidden states
        if self.bidirectional:
            X_for, (ht_for, ct_for) = self.lstm(X_for, self.hidden)
            X_back, (ht_back, ct_back) = self.lstm(X_back, self.hidden_back)
#             X_back, (ht_back, ct_back) = self.lstm_back(X_back, self.hidden) 
            ht = torch.cat((ht_for, ht_back), 2)
        else:
            X, (ht, ct) = self.lstm(X, self.hidden)
        
#         print('hidden state shape:', ht.size())
        
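        # Reshape to use the final state from each lstm layer.
        # ht has shape (nb_layers, batch, hidden); move the batch dimension first
        # before flattening so each row holds one example's states only.
        out = ht.transpose(0, 1).reshape(ht.size(1), self.nb_lstm_units*self.nb_layers*self.num_directions)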

        # pass final states to output layer
        out = self.hidden_to_tag(out)
        
        # Use logsoftmax for training and softmax for testing
        if not self.inference:
            Y_hat = self.logsoftmax(out)
        else:
            Y_hat = self.softmax(out)

        return Y_hat

In [None]:
num_layers = 2
epochs = 15
batch_size = 128
lstm_unit_size = 512
embedding_size = 256
prints_per_epoch = 12
print_iter = len(y_train)//batch_size//prints_per_epoch
bidirectional = False
print('Nuberber of prints per epoch:',print_iter)

m = LSTM(num_layers, char2index, class2index, nb_lstm_units = lstm_unit_size,
         embedding_dim = embedding_size, batch_size = batch_size, bidirectional=bidirectional)

criterion = nn.NLLLoss()#size_average=False,weight=1/class_weights
optim = torch.optim.Adam(m.parameters(), lr=0.1)

indeces = np.arange(X_train.shape[0])     
print('y_train shape:', y_train.shape)
print('Iterations per epoch:', y_train.shape[0] // batch_size)
print('Nuberber of prints per epoch:',print_iter)

np_X_dev = torch.tensor(X_dev).long()
np_X_dev_len = torch.tensor(np.array(X_dev_len)[np.argsort(np.array(X_dev_len))[::-1]] ).long()

for epoch in range(epochs):
    np.random.shuffle(indeces)
    x_train = X_train[indeces]
    y_train2 = y_train[indeces]
    x_lens = X_train_len[indeces]

    np_x_sorted_lens = np.array(x_lens)[np.argsort(np.array(x_lens))[::-1]]
    current_batch = 0
    for iteration in range(y_train2.shape[0] // batch_size):
        
        batch_lengths = x_lens[current_batch: current_batch + batch_size]
        lengths = np.array(batch_lengths)
        idx = np.argsort(lengths)[::-1]
        batch_lengths = batch_lengths[idx]
        batch_lengths = torch.tensor(batch_lengths).long()
        
        
        # Slice the shuffled copy so inputs stay aligned with y_train2 and x_lens.
        batch_x = x_train[current_batch: current_batch + batch_size]
        batch_x = batch_x[idx]
        batch_x = torch.tensor(batch_x).long()
        
        batch_y = y_train2[current_batch: current_batch + batch_size]
        batch_y = batch_y[idx]
        batch_y = torch.tensor(batch_y).long()
        
        current_batch += batch_size
                        
        optim.zero_grad()
        if len(batch_x) > 0:
            batch_pred = m(batch_x, batch_lengths)
            
            loss = criterion(batch_pred, batch_y)
            loss.backward()
            optim.step()

#         if iteration % print_iter == 0:
#             with torch.no_grad():
#                 m.train(False)
#                 m.inference = True
#                 train_batch_pred = np.array(m(batch_x, batch_lengths)).argmax(axis=1)
#                 train_mic_f1 = f1_score(batch_y, train_batch_pred, average='micro')
#                 train_mac_f1 = f1_score(batch_y, train_batch_pred, average='macro')
          
#                 batch_pred = np.array(m(X_dev, X_dev_len)).argmax(axis=1)
                
#                 dev_batch_y = y_dev
#                 dev_mic_f1 = f1_score(dev_batch_y, batch_pred, average='micro')
#                 dev_mac_f1 = f1_score(dev_batch_y, batch_pred, average='macro')
# #                 precision = precision_score(batch_y, batch_pred, average='micro')
# #                 recall = recall_score(batch_y, batch_pred, average='micro')
#                 print(f'training loss {loss.item():.3f}\titeraton: { iteration}\tepoch {epoch} ')
#                 print('Train:')      
#                 print(f'\tmicro f1 { train_mic_f1:.3f} macro f1 {train_mac_f1:.3f}')
#                 print('Dev:')      
#                 print(f'\tmicro f1 { dev_mic_f1:.3f} macro f1 {dev_mac_f1:.3f}\n')
# #                 print('\tPrediction counts:')
# #                 uniques = np.unique(train_batch_pred, axis=0, return_counts=True)
# #                 print('\tpred indeces')
# #                 print(uniques[0])
# #                 print(uniques[0][0])
# #                 print(len(uniques[0]))
# #                 print('\t\t',end='')
# #                 [print(str(uniques[0][i])+'  ', end='') for i in range(len(uniques[0]))]
# #                 print()
# #                 print('\tpred counts')
# #                 print('\t\t',end='')
# #                 [print(str(uniques[1][i])+'  ', end='') for i in range(len(uniques[0]))]
# #                 print()
# #                 print('\ttrue indeces')
# #                 print('\t\t',end='')
# #                 true_unqs = np.unique(batch_y, axis=0, return_counts=True)
# #                 [print(str(true_unqs[0][i])+'  ', end='') for i in range(len(true_unqs[0]))]
# #                 print()
# #                 print('\ttrue counts')
# #                 print('\t\t',end='')
# #                 [print(str(true_unqs[1][i])+'  ', end='') for i in range(len(true_unqs[0]))]
# #                 print()
# #                 
#                 m.train(True)
#                 m.inference = False
        
        
    
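    with torch.no_grad():
        m.train(False)
        m.inference = False
        # Sort the shuffled training set by length (longest first) so that the
        # inputs, lengths, and labels passed to the model stay aligned.
        order = np.argsort(x_lens)[::-1].copy()
        train_true = y_train2[order]
        raw_pred = m(torch.tensor(x_train[order]).long(),
                     torch.tensor(x_lens[order]).long())
        # Take the argmax of the log-probabilities directly; casting them to
        # long first would truncate every value toward zero.
        train_pred = raw_pred.numpy().argmax(axis=1)
        train_mic_f1 = f1_score(train_true, train_pred, average='micro')
        train_mac_f1 = f1_score(train_true, train_pred, average='macro')
        loss = criterion(raw_pred, torch.tensor(train_true).long())

        dev_pred = m(np_X_dev, np_X_dev_len).numpy().argmax(axis=1)

        dev_mic_f1 = f1_score(y_dev, dev_pred, average='micro')
        dev_mac_f1 = f1_score(y_dev, dev_pred, average='macro')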
#                 precision = precision_score(batch_y, batch_pred, average='micro')
#                 recall = recall_score(batch_y, batch_pred, average='micro')
        print('EPOCH SUMMARY:')
        print(f'training loss {loss.item():.3f}\titeraton: { iteration}\tepoch {epoch} ')
        print('Train:')      
        print(f'\tmicro f1 { train_mic_f1:.3f} macro f1 {train_mac_f1:.3f}')
        print('Dev:')      
        print(f'\tmicro f1 { dev_mic_f1:.3f} macro f1 {dev_mac_f1:.3f}\n')
#                 print('\tPrediction counts:')
#                 uniques = np.unique(train_batch_pred, axis=0, return_counts=True)
#                 print('\tpred indeces')
#                 print(uniques[0])
#                 print(uniques[0][0])
#                 print(len(uniques[0]))
#                 print('\t\t',end='')
#                 [print(str(uniques[0][i])+'  ', end='') for i in range(len(uniques[0]))]
#                 print()
#                 print('\tpred counts')
#                 print('\t\t',end='')
#                 [print(str(uniques[1][i])+'  ', end='') for i in range(len(uniques[0]))]
#                 print()
#                 print('\ttrue indeces')
#                 print('\t\t',end='')
#                 true_unqs = np.unique(batch_y, axis=0, return_counts=True)
#                 [print(str(true_unqs[0][i])+'  ', end='') for i in range(len(true_unqs[0]))]
#                 print()
#                 print('\ttrue counts')
#                 print('\t\t',end='')
#                 [print(str(true_unqs[1][i])+'  ', end='') for i in range(len(true_unqs[0]))]
#                 print()
#                 
        m.train(True)
        m.inference = False

Nuberber of prints per epoch: 9
num_directions: 1
y_train shape: (15000,)
Iterations per epoch: 117
Nuberber of prints per epoch: 9


Answer the following questions below:

1. What were the micro and macro F1 scores on the test and dev sets?
2. Implement a bidirectional LSTM model. You will need to modify the hidden states and self.lstm variables. Does it work better?
3. Experiment with the various hyperparameters (hidden state size, learning rate, etc.). What hyperparameters result in the best performance?
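
For question 2, one possible approach is to rely on the `bidirectional=True` flag of `nn.LSTM` instead of flipping the padded input by hand (flipping a right-padded sequence places the padding first). The sketch below is an illustration under assumptions, not the model used above: the class name `BiLSTMClassifier` and the layer sizes are hypothetical, and the vocabulary and class counts (83 and 18) are taken from the mappings printed in Part I.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class BiLSTMClassifier(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, vocab_size, nb_classes, embedding_dim=32,
                 hidden_size=64, nb_layers=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, num_layers=nb_layers,
                            batch_first=True, bidirectional=True)
        # Each layer contributes a forward and a backward final state.
        self.out = nn.Linear(2 * nb_layers * hidden_size, nb_classes)

    def forward(self, X, X_lengths):
        emb = self.embedding(X)
        # enforce_sorted=False lets PyTorch sort and restore the batch order.
        packed = pack_padded_sequence(emb, X_lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        _, (ht, _) = self.lstm(packed)
        # ht: (nb_layers * 2, batch, hidden) -> (batch, nb_layers * 2 * hidden)
        flat = ht.transpose(0, 1).reshape(X.size(0), -1)
        return torch.log_softmax(self.out(flat), dim=1)


torch.manual_seed(0)
model = BiLSTMClassifier(vocab_size=83, nb_classes=18)
X = torch.tensor([[5, 6, 7, 0, 0], [8, 9, 10, 11, 12]])
X_lengths = torch.tensor([3, 5])
log_probs = model(X, X_lengths)
print(log_probs.shape)
```

Because the output is a log-softmax, it can be trained with `nn.NLLLoss` exactly like the unidirectional model.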

## PART III: Natural Language Inference (25 points)

Natural language inference is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise" [1, 2]. Models trained on this task have been shown to perform well at zero-shot classification [3].

Example:

| Premise | Label | Hypothesis |
| ------- | ----- | ---------- |
| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. |
| An older and younger man smiling | neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A soccer game with multiple males playing. | entailment | Some men are playing a sport. |

Your task is to load and train a model on the "multinli_1.0_train.jsonl" dataset and evaluate on "multinli_1.0_dev_matched.jsonl" using accuracy.
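
As a starting point for loading the data, the sketch below parses JSONL lines into (premise, hypothesis, label) triples. It assumes the standard MultiNLI field names `sentence1`, `sentence2`, and `gold_label`, and that examples without annotator agreement carry the label `-`; check these against your copy of the files. The function name `parse_nli_lines`, the label order in `LABELS`, and the sample lines are all made up for illustration.

```python
import json

# Assumed label set; the index order is arbitrary.
LABELS = {'entailment': 0, 'neutral': 1, 'contradiction': 2}


def parse_nli_lines(lines, limit=None):
    """Turn JSONL lines into (premise, hypothesis, label_index) triples."""
    examples = []
    for line in lines:
        row = json.loads(line)
        label = row['gold_label']
        if label not in LABELS:  # skip examples with no gold label
            continue
        examples.append((row['sentence1'], row['sentence2'], LABELS[label]))
        if limit is not None and len(examples) >= limit:
            break
    return examples


# Made-up sample lines standing in for the real file contents.
sample = [
    '{"sentence1": "premise a", "sentence2": "hypothesis a", "gold_label": "entailment"}',
    '{"sentence1": "premise b", "sentence2": "hypothesis b", "gold_label": "-"}',
    '{"sentence1": "premise c", "sentence2": "hypothesis c", "gold_label": "contradiction"}',
]
examples = parse_nli_lines(sample)
print(examples)

# With the real data, pass the open file object instead:
# with open('multinli_1.0_train.jsonl', encoding='utf-8') as f:
#     train = parse_nli_lines(f, limit=50000)
```

The `limit` argument is a convenience for quick experiments, since the full training file is large.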

I am leaving this task relatively open. One solution is to modify the LSTM code above to pass two documents through an LSTM model and return the last hidden state for each. Next, concatenate the two vectors and pass the result through a softmax layer. Finally, train using the same formulation as in Part II.
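
The suggested architecture can be sketched roughly as below: one shared LSTM encodes the premise and the hypothesis, the two final hidden states are concatenated, and a linear layer with log-softmax produces the three class scores. The class name `PairLSTM`, the layer sizes, the learning rate, and the toy index tensors are assumptions for illustration; sharing one encoder between both sentences is a design choice rather than a requirement.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class PairLSTM(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, vocab_size, nb_classes=3, embedding_dim=32,
                 hidden_size=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True)
        self.out = nn.Linear(2 * hidden_size, nb_classes)

    def encode(self, X, X_lengths):
        packed = pack_padded_sequence(self.embedding(X), X_lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        _, (ht, _) = self.lstm(packed)
        return ht[-1]  # (batch, hidden): final hidden state of the last layer

    def forward(self, premise, premise_len, hypothesis, hypothesis_len):
        p = self.encode(premise, premise_len)
        h = self.encode(hypothesis, hypothesis_len)
        return torch.log_softmax(self.out(torch.cat((p, h), dim=1)), dim=1)


torch.manual_seed(0)
model = PairLSTM(vocab_size=50)
# Toy batch of two premise/hypothesis pairs (word indexes, 0 = padding).
premise = torch.tensor([[4, 5, 6, 7], [8, 9, 0, 0]])
premise_len = torch.tensor([4, 2])
hypothesis = torch.tensor([[10, 11, 0], [12, 13, 14]])
hypothesis_len = torch.tensor([2, 3])
labels = torch.tensor([0, 2])

# One training step with the same NLLLoss setup used for the surname model.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.NLLLoss()
log_probs = model(premise, premise_len, hypothesis, hypothesis_len)
loss = criterion(log_probs, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Accuracy is the fraction of examples whose argmax matches the label.
accuracy = (log_probs.argmax(dim=1) == labels).float().mean().item()
print(log_probs.shape, loss.item(), accuracy)
```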

**NOTE:** You do not need to train until convergence. You can train for only an epoch or two at most; train less if it takes too long. I simply want to see that it runs and is learning.


[1] Williams, Adina, Nikita Nangia, and Samuel Bowman. "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference." Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

[2] Bowman, Samuel R., et al. "A large annotated corpus for learning natural language inference." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015.

[3] Yin, Wenpeng, Jamaal Hay, and Dan Roth. "Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach." Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

In [None]:
# COPY AND EDIT CODE HERE

1. Describe your solution.

**ANSWER HERE**

## EXTRA CREDIT 1 (10 points)

Modify the LSTM model to train a language model, then write code to generate new text from the model. Do not forget to mask the loss function when training the language model to handle the different lengths of the sequences. Use the "en-ud-train.upos.tsv" dataset.

Generate 10 examples from your model.
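
Masking the loss means that padded target positions contribute nothing to the average. A minimal sketch, assuming the padding index is 0 and using random logits as a stand-in for the language model's outputs, shows two equivalent ways to do this in PyTorch: the `ignore_index` argument of `nn.CrossEntropyLoss`, and an explicit mask over per-token losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

PAD_IDX = 0  # assumption: index 0 marks padded positions
vocab_size, batch, seq_len = 6, 2, 4

torch.manual_seed(0)
# Stand-in for language model outputs: one score per vocabulary item
# at every position of every sequence.
logits = torch.randn(batch, seq_len, vocab_size)
# Next-token targets; the second sequence has only two real tokens.
targets = torch.tensor([[3, 4, 5, 2],
                        [1, 2, PAD_IDX, PAD_IDX]])

flat_logits = logits.reshape(-1, vocab_size)
flat_targets = targets.reshape(-1)

# Option 1: let the loss skip padded targets.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
masked_loss = criterion(flat_logits, flat_targets)

# Option 2: compute per-token losses and average over real tokens only.
per_token = F.cross_entropy(flat_logits, flat_targets, reduction='none')
mask = (flat_targets != PAD_IDX).float()
manual_loss = (per_token * mask).sum() / mask.sum()

print(masked_loss.item(), manual_loss.item())
```

If the model ends in a log-softmax, as the LSTM above does, `nn.NLLLoss(ignore_index=PAD_IDX)` plays the same role.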

In [None]:
# COPY AND EDIT CODE HERE