# Assignment 4. Sentiment analysis using recurrent neural networks

Full instructions on completion of this assignment can be found in Canvas. 

## 1. Theoretical part


### 1.1 Vanishing gradient problem

What is the vanishing / exploding gradient problem in Elman recurrent neural networks? Write down the update equations for an Elman RNN and explain what causes the vanishing / exploding gradient issue.

**Vanishing gradients.** In some cases the gradient becomes vanishingly small, which effectively prevents the weights from changing their values. In the worst case, this may completely stop the neural network from training further.

**Exploding gradients.** When the derivatives can take on large values, one risks the related exploding gradient problem, where the accumulated gradient becomes very large and the weight updates become unstable.

For an Elman RNN, the reasons for this problem are the following:

- Gradients vanish (explode) exponentially across time steps when the magnitude of the recurrent connection is < 1 (> 1); for a recurrent weight matrix, this roughly corresponds to its largest singular value being below (above) 1.

- The problem stems from the fact that the same recurrent weight is applied at every time step.

- Just as a product of n real numbers can shrink to zero or explode to infinity, so can this product of matrices.

Update equations for the Elman RNN:
$$
\begin{aligned} h_{t} &=\sigma_{h}\left(W_{h} x_{t}+U_{h} h_{t-1}+b_{h}\right) \\ y_{t} &=\sigma_{y}\left(W_{y} h_{t}+b_{y}\right) \end{aligned}
$$
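
To make the effect concrete, here is a minimal numerical sketch for a scalar Elman RNN with $\sigma_h = \tanh$ and zero bias. The function name `gradient_through_time` and the chosen weights are illustrative assumptions, not part of the assignment. It multiplies the per-step derivatives $\partial h_t / \partial h_{t-1} = u \, (1 - h_t^2)$ over the whole sequence.

```python
import math

def gradient_through_time(u, w, inputs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * x_t + u * h_{t-1})."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * x + u * h)
        # chain rule: d h_t / d h_{t-1} = u * tanh'(z_t) = u * (1 - h_t^2)
        grad *= u * (1.0 - h * h)
    return grad

# With zero inputs and h0 = 0 the state stays at 0, so tanh' = 1 at every step
# and the gradient reduces to u ** len(inputs).
zeros = [0.0] * 50

small = gradient_through_time(u=0.5, w=1.0, inputs=zeros)  # recurrent weight < 1
large = gradient_through_time(u=1.5, w=1.0, inputs=zeros)  # recurrent weight > 1
print(small, large)
```

With non-zero inputs the derivative of tanh is below 1, which pushes the product towards zero even faster.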

How does LSTM help prevent the vanishing (and exploding) gradient problem in a recurrent neural network? Write down the equations of LSTM and explain how technically this schema is better than the Elman recurrent neural networks.

In theory, classic RNNs can keep track of arbitrarily long-term dependencies in the input sequences. The problem with vanilla RNNs is computational in nature: when a vanilla RNN is trained with back-propagation, the back-propagated gradients can vanish or explode. LSTM units partially solve the vanishing gradient problem, because they allow gradients to also flow unchanged through the cell state. However, LSTM networks can still suffer from the exploding gradient problem.

LSTM equations

$$
\begin{array}{l}{f_{t}=\sigma_{g}\left(W_{f} x_{t}+U_{f} h_{t-1}+b_{f}\right)} \\ {i_{t}=\sigma_{g}\left(W_{i} x_{t}+U_{i} h_{t-1}+b_{i}\right)} \\ {o_{t}=\sigma_{g}\left(W_{o} x_{t}+U_{o} h_{t-1}+b_{o}\right)} \\ {c_{t}=f_{t} \circ c_{t-1}+i_{t} \circ \sigma_{c}\left(W_{c} x_{t}+U_{c} h_{t-1}+b_{c}\right)} \\ {h_{t}=o_{t} \circ \sigma_{h}\left(c_{t}\right)}\end{array}
$$

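Here $x_t$ is the input vector, $f_t$ is the forget gate's activation vector, $i_t$ is the input/update gate's activation vector, $o_t$ is the output gate's activation vector, $h_t$ is the hidden state vector (also known as the output vector of the LSTM unit), and $c_t$ is the cell state vector. The parameters are $W \in \mathbb{R}^{h \times d}$, $U \in \mathbb{R}^{h \times h}$ and $b \in \mathbb{R}^{h}$, where $d$ and $h$ are the numbers of input features and hidden units. $\sigma_g$ is the sigmoid function, and $\sigma_c$ and $\sigma_h$ are the hyperbolic tangent.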



As we can see, the LSTM has several gates that control the dependencies between the elements of the input sequence. The input gate controls the extent to which a new value flows into the cell. The forget gate controls which values remain in the cell, and the output gate controls which values in the cell are used to compute the output activation of the LSTM unit.
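
The LSTM equations can be traced with a scalar toy cell. This is only a sketch: the function `lstm_step`, the layout of the parameter dictionary, and the numeric values are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p maps a gate name to its (W, U, b) triple."""
    def pre(name):
        W, U, b = p[name]
        return W * x + U * h_prev + b
    f = sigmoid(pre("f"))      # forget gate
    i = sigmoid(pre("i"))      # input gate
    o = sigmoid(pre("o"))      # output gate
    a = math.tanh(pre("c"))    # candidate cell value
    c = f * c_prev + i * a     # new cell state
    h = o * math.tanh(c)       # new hidden state
    return h, c, (f, i, o)

# Hypothetical parameters: forget gate saturated open, input gate nearly closed.
params = {"f": (0.0, 0.0, 10.0), "i": (0.0, 0.0, -10.0),
          "o": (1.0, 1.0, 0.0), "c": (1.0, 1.0, 0.0)}

h, c = 0.0, 0.8
for x in [0.3, -1.2, 0.7, 2.0]:
    h, c, gates = lstm_step(x, h, c, params)
print(c)  # stays close to the initial cell state 0.8
```

With the forget gate open and the input gate closed, the cell state is carried across steps almost unchanged, regardless of the inputs.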


Now we can write the derivative of the cell state with respect to the previous cell state, where $a_t = \sigma_c\left(W_c x_t + U_c h_{t-1} + b_c\right)$ denotes the candidate cell value:
$$
\begin{aligned} \frac{\partial c_{t}}{\partial c_{t-1}} =\operatorname{Diag}\left(f_{t}\right)+\operatorname{Diag}\left(c_{t-1}\right) \frac{\partial f_{t}}{\partial c_{t-1}}+\operatorname{Diag}\left(i_{t}\right) \frac{\partial a_{t}}{\partial c_{t-1}}+\operatorname{Diag}\left(a_{t}\right) \frac{\partial i_{t}}{\partial c_{t-1}} \end{aligned}
$$

The last three terms of the sum depend on partial derivatives with respect to $c_{t-1}$. Expanding those terms yields a long chain of weight matrices and derivatives of activation functions, as in the Elman RNN case. Thanks to the gates, however, the LSTM can delay the vanishing gradient problem: the first term, $\operatorname{Diag}(f_t)$, stays close to 1 whenever the forget gate is open, even when the other three terms vanish. Keeping the forget gate open early in training can be encouraged with careful weight initialization.
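
As a worked consequence of the derivative above (a sketch that keeps only its first term and ignores the three gate-dependent terms), the gradient along the cell-state path over several steps is approximately an elementwise product of forget-gate activations:

$$
\frac{\partial c_{T}}{\partial c_{k}} \approx \prod_{t=k+1}^{T} \operatorname{Diag}\left(f_{t}\right)
$$

Each entry of $f_t$ lies in $(0, 1)$, so this product decays slowly as long as the forget gates stay close to 1. For example, $0.99^{100} \approx 0.37$, whereas $0.5^{100} \approx 7.9 \times 10^{-31}$.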

## 2. Practical part

### 2.1 Use LSTM and word embeddings for text classification 

Implement a text classifier based on Bi-LSTM network. Use hidden state(s) to represent an input text document.

If you use ``torch`` use the ``torch.nn.Embedding`` to load pre-trained word embeddings. Use the [GloVe](http://nlp.stanford.edu/data/wordvecs/glove.6B.zip) embeddings in the input layer of your network.
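
For orientation, the GloVe text files store one word per line followed by its vector components, separated by spaces. Below is a minimal sketch of reading that format. The helper name `parse_glove_lines` and the two-line, 3-dimensional sample are made up for illustration; the real `glove.6B` vectors have 50 to 300 dimensions.

```python
import io

def parse_glove_lines(lines, vocab=None):
    """Parse GloVe-style text lines into a {word: [float, ...]} dict.

    If vocab is given, only words contained in vocab are kept.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        word, values = parts[0], parts[1:]
        if vocab is None or word in vocab:
            vectors[word] = [float(v) for v in values]
    return vectors

# Made-up sample in the same layout as the real file.
sample = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
vectors = parse_glove_lines(sample, vocab={"good"})
print(vectors)
```

Filtering by vocabulary while reading keeps memory usage low, since only the vectors of words that occur in the data are stored.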

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')
%cd /content/gdrive/My Drive/Colab Notebooks/NNLP
%ls

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
/content/gdrive/My Drive/Colab Notebooks/NNLP
 assignment22.ipynb
 assignment2.ipynb
 assignment3.ipynb
 assignment4.ipynb
 classifier_doc_embeddings.py
 classifier_ffnn.py
 classifier_lr.py
 classifier_word_embeddings.py
 comments.tsv
'Copy of 01_seminar_starter.ipynb'
'Copy of sem_28_11.ipynb'
 d2v.model
 elmo_2x2048_256_2048cnn_1xhighway_options.json
 elmo_2x2048_256_2048cnn_1xhighway_weights.hdf5
 file2_ff.tsv
 file2.tsv
 file3.tsv
 file_ll.tsv
 file_l.tsv
 file_sk_lr.tsv
 FILIMDB/
 glove.6B.300d.txt
 lstm_text_classification.ipynb
 Nikolay_Shvetsov_assignment2.ipynb
 Project/
 __pycache__/
 stomack.zip
 tut1-model
 tut1-model_last.pt
 tut1-model.pt
'Копия assignment4.ipynb'


In [0]:
import pandas as pd
from tqdm import tqdm
from sklearn.metrics import accuracy_score
import numpy as np
from collections import Counter, defaultdict
import codecs
import matplotlib.pyplot as plt
import re
import seaborn as sns
import string
from time import time
%matplotlib inline

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
import torch.utils.data as data_utils

In [0]:
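def text_readers(path):
    # read a UTF-8 file (dropping a BOM if present) and return its lines
    with codecs.open(path,'r','utf_8_sig') as file:
        text=file.read()
    # drop the empty string that follows the trailing newline
    return text.split('\n')[:-1]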

train_labels=pd.read_csv('FILIMDB/train.labels',header=None)
dev_labels=pd.read_csv('FILIMDB/dev.labels',header=None)
dev_labels_b=pd.read_csv('FILIMDB/dev-b.labels',header=None)

train_text=text_readers('FILIMDB/train.texts')
dev=text_readers('FILIMDB/dev.texts')
test=text_readers('FILIMDB/test.texts')
dev_b=text_readers('FILIMDB/dev-b.texts')
test_b=text_readers('FILIMDB/test-b.texts')

# train_labels=train_labels[0].replace(['neg','pos'],[0,1])
# dev_labels=dev_labels[0].replace(['neg','pos'],[0,1])
# dev_labels_b=dev_labels_b[0].replace(['neg','pos'],[0,1])

In [0]:
translator = str.maketrans('', '', string.punctuation)

PUNCTUATION = set(string.punctuation)

def surround_non_symbols(word):
    # put spaces around every punctuation character so it becomes its own token
    return ''.join(' '+symbol+' ' if symbol in PUNCTUATION else symbol
                   for symbol in word)


def preprocess_text(texts,punct=False,figures=False):
    result=[]
    for sentence in texts:
        # lowercase and separate punctuation from words
        lowered=sentence.lower()
        spaced=" ".join(surround_non_symbols(word) for word in lowered.split())
        # collapse repeated whitespace
        clear_sentence=" ".join(spaced.split())
        if punct:
            # remove all punctuation characters
            clear_sentence=clear_sentence.translate(translator)
        if figures:
            # remove digits
            clear_sentence=re.sub(r'\d+', '', clear_sentence)
        result.append(clear_sentence)
    return result

def tokenization(data):
    data_tok =[line.split() for line in tqdm(data)]
    return data_tok

def vocab_creator(data):
  vocab=set()
  for sentence in data:
    vocab.update(sentence)
  return vocab

def load_embeddings(emb_path, vocab):
    clf_embeddings = {}
    emb_vocab = set()
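    # GloVe files are UTF-8 text: one word per line followed by its vector
    with open(emb_path, encoding='utf-8') as emb_file:
        for line in emb_file:
            line = line.strip('\n').split()
            word, emb = line[0], line[1:]
            if word in vocab:
                # parse the vector only for words that occur in the data
                clf_embeddings[word] = [float(e) for e in emb]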
    for w in vocab:
        if w in clf_embeddings:
            emb_vocab.add(w)
    word2idx = {w: idx for (idx, w) in enumerate(emb_vocab)}
    max_val = max(word2idx.values())
    
    word2idx['UNK'] = max_val + 1
    word2idx['EOS'] = max_val + 2
    emb_dim = len(list(clf_embeddings.values())[0])
    clf_embeddings['UNK'] = [0.0 for i in range(emb_dim)]
    clf_embeddings['EOS'] = [0.0 for i in range(emb_dim)]
    
    embeddings = [[] for i in range(len(word2idx))]
    for w in word2idx:
        embeddings[word2idx[w]] = clf_embeddings[w]
    embeddings = torch.Tensor(embeddings)
    return embeddings, word2idx

def to_matrix(lines, vocab, max_len=None, dtype='int32'):
    """Casts a list of lines into a matrix"""
    pad = vocab['EOS']
    max_len = max_len or max(map(len, lines))
    lines_ix = np.zeros([len(lines), max_len], dtype) + pad
    for i in range(len(lines)):
        line_ix = [vocab.get(l, vocab['UNK']) for l in lines[i]]
        lines_ix[i, :len(line_ix)] = line_ix
    lines_ix = torch.LongTensor(lines_ix)
    return lines_ix

def reorganize_labels(labels:pd.Series):
  lab=[]
  for ly in labels:
    if ly==0:
      lab.append([1,0])
    else:
      lab.append([0,1])
  return lab

def data_train(tokens:list,labels:list,vocab:dict):
  data=[]
  for idx, (t, l) in enumerate(zip(tokens, labels)):
    t = to_matrix([t], vocab)
    l = torch.Tensor([l])
    data.append((t,l))
  return data
def data_test(tokens:list,vocab:dict):
  data=[]
  for idx, t in enumerate(tokens):
    t = to_matrix([t], vocab)
    
    data.append(t)
  return data  

def binary_accuracy(preds, y):
    # y is either [0, 1] or [1, 0]
    # get the class (0 or 1)
    y = torch.argmax(y, dim=1)
    
    # get the predicted class
    preds = torch.argmax(torch.sigmoid(preds), dim=1)
    
    correct = (preds == y).float() 
    acc = correct.sum() / len(correct)
    return acc
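
To illustrate what `to_matrix` produces, here is a plain-Python sketch of the same idea without NumPy or torch. The function name `pad_to_ids` and the toy vocabulary are assumptions for illustration only.

```python
def pad_to_ids(lines, vocab, max_len=None):
    """Map tokens to ids, use 'UNK' for unknown tokens and pad with 'EOS'.

    Unlike to_matrix, lines longer than max_len are truncated here.
    """
    pad, unk = vocab["EOS"], vocab["UNK"]
    max_len = max_len or max(len(line) for line in lines)
    rows = []
    for line in lines:
        ids = [vocab.get(tok, unk) for tok in line][:max_len]
        rows.append(ids + [pad] * (max_len - len(ids)))
    return rows

toy_vocab = {"great": 0, "movie": 1, "UNK": 2, "EOS": 3}
rows = pad_to_ids([["great", "movie"], ["boring"], ["great", "great", "movie"]],
                  toy_vocab)
print(rows)
```

Every row has the length of the longest line, unknown tokens share the `UNK` id, and the remaining positions are filled with the `EOS` id.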

In [0]:
class BiLSTM(nn.Module):
    def __init__(self, embeddings, hidden_dim=128, lstm_layer=1, output=2):
        
        super(BiLSTM, self).__init__()
        self.hidden_dim = hidden_dim
        
        # load pre-trained embeddings
        self.embedding = nn.Embedding.from_pretrained(embeddings)
        # embeddings are not fine-tuned
        self.embedding.weight.requires_grad = False
        
        # RNN layer with LSTM cells
        self.lstm = nn.LSTM(input_size=self.embedding.embedding_dim,
                            hidden_size=hidden_dim,
                            num_layers=lstm_layer, 
                            bidirectional=True)
        # dense layer


        self.output = nn.Linear(hidden_dim*2, output)
    
    def forward(self, sents):
        x = self.embedding(sents)
        
        # the original dimensions of torch LSTM's output are: (seq_len, batch, num_directions * hidden_size)
        lstm_out, _ = self.lstm(x)
        
        # reshape to get the tensor of dimensions (seq_len, batch, num_directions, hidden_size)
        lstm_out = lstm_out.view(x.shape[0], -1, 2, self.hidden_dim)#.squeeze(1)
        
        # lstm_out[:, :, 0, :] -- output of the forward LSTM
        # lstm_out[:, :, 1, :] -- output of the backward LSTM
        # we take the last hidden state of the forward LSTM and the first hidden state of the backward LSTM
        dense_input = torch.cat((lstm_out[-1, :, 0, :], lstm_out[0, :, 1, :]), dim=1)
        
        #hidden = self.linear(dense_input)
        y=self.output(dense_input).view([1, 2])
 
        return y

In [0]:
def train_epoch(model, train_data, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    # set the model to the training mode
    model.train(mode=True)
    
    for t, l in train_data:
        # reshape the data to n_words x batch_size (here batch_size=1)
        t = t.view((-1, 1))
        # transfer the data to GPU to make it accessible for the model and the loss
        t = t.to(device)
        l = l.to(device)
        
        # set all gradients to zero
        optimizer.zero_grad()
        
        # forward pass of training
        # compute predictions with current parameters
        predictions = model(t)
        # compute the loss
        loss = criterion(predictions, l)
        # compute the accuracy (this is only for report)
        acc = binary_accuracy(predictions, l)
        
        # backward pass (fully handled by pytorch)
        loss.backward()
        # update all parameters according to their gradients
        optimizer.step()
        
        # data for report
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(train_data), epoch_acc / len(train_data)

In [0]:
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

In [0]:
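# use the GPU when it is available and fall back to the CPU otherwise
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')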
def train(texts,labels):
  train_labels=labels[0].replace(['neg','pos'],[0,1])
  # train_labels= np.array([1 if l == 'pos' else 0 for l in labels ])
  preprocessed_train = preprocess_text(texts,True,False)   
  train_tokens=tokenization(preprocessed_train)
  voc=vocab_creator(train_tokens)
  embeddings, vocab = load_embeddings('glove.6B.300d.txt', voc)


  train_labels=reorganize_labels(train_labels)
  train_data=data_train(train_tokens,train_labels,vocab)
  hidden_dim = 128
  layers = 1

  model = BiLSTM(embeddings, hidden_dim, lstm_layer=layers)
  optimizer = optim.Adam(model.parameters(), lr=1e-3)
  criterion = nn.BCEWithLogitsLoss()

  
  model = model.to(device)
  #model.load_state_dict(torch.load('tut1-model.pt', map_location=device))
  criterion = criterion.to(device)
  # train_accs=[]
  # valid_accs=[]
  for epoch in tqdm(range(3)):
    start_time = time()
    
    train_loss, train_acc = train_epoch(model, train_data, optimizer, criterion)
    print('train accuracy on  ',epoch,' epoch ',train_acc,' loss ',train_loss)

  return [model,vocab]


In [0]:
params=train(train_text,train_labels)


train accuracy on   0  epoch  0.799  loss  0.41612703055806244
train accuracy on   1  epoch  0.8968  loss  0.2526468457708976
train accuracy on   2  epoch  0.9407333333333333  loss  0.16090909712055662


In [0]:
def classify(text,params):
  
  preprocessed_train = preprocess_text(text,True,False)   
  train_tokens=tokenization(preprocessed_train)
  vocab=params[1]
  train_data=data_test(train_tokens,vocab)
  model=params[0]
  model.eval()
  predicts=[]

  with torch.no_grad():
    for t in train_data:
      t = t.view((-1, 1))
      t = t.to(device)
      predictions = model(t)
      pred=torch.argmax(torch.sigmoid(predictions), dim=1)
      predicts.append(int(pred.detach().cpu().numpy()))
  y_predict=np.array(['pos' if l == 1 else 'neg' for l in predicts ])
  return y_predict



In [0]:
y_pred=classify(dev,params)



In [0]:
from sklearn.metrics import accuracy_score
# classify() returns 'pos'/'neg' strings, so compare against the raw string labels
accuracy_score(dev_labels[0],y_pred)

0.8785

In [0]:
for i,j in zip([train_text,dev,dev_b],[train_labels,dev_labels,dev_labels_b]):
  y_pred=classify(i,params)
  # classify() returns 'pos'/'neg' strings, so compare against the raw string labels
  print('acc',accuracy_score(j[0],y_pred))


acc 0.9651333333333333
acc 0.8785
acc 0.737
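
Accuracy is only meaningful when predictions and gold labels share one encoding (`classify` returns `'pos'`/`'neg'` strings). The following dependency-free sketch shows the computation on toy data; the function `accuracy`, the mapping `label_to_id`, and the example labels are illustrative assumptions.

```python
def accuracy(gold, predicted):
    """Fraction of positions where gold and predicted labels agree."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)

label_to_id = {"neg": 0, "pos": 1}

gold = ["pos", "neg", "neg", "pos"]       # toy gold labels
predicted = ["pos", "neg", "pos", "pos"]  # toy predictions

score_str = accuracy(gold, predicted)
# Mapping both sides to integer ids gives the same score.
score_int = accuracy([label_to_id[g] for g in gold],
                     [label_to_id[p] for p in predicted])
print(score_str, score_int)
```

Comparing string predictions with integer labels would count every position as a mismatch, so both sides should be converted in the same way before scoring.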


### 2.2 Use LSTM and ELMo for text classification

Use ``allennlp`` and the model ``elmo_2x2048_256_2048cnn_1xhighway_weights`` which is the model used in week5 seminar to build a text classification system. The only difference from the previous point is the use of ELMo contextualized word embeddings. Do not use any additional dependencies or versions of the ELMo model. Make sure that the model is located in the same directory with the classification Python script.

In [5]:
!pip install allennlp



In [0]:
from allennlp.commands.elmo import ElmoEmbedder
ELMO_OPTIONS = "elmo_2x2048_256_2048cnn_1xhighway_options.json"
ELMO_WEIGHT = "elmo_2x2048_256_2048cnn_1xhighway_weights.hdf5"


In [0]:
class elmo_LSTM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim=128, lstm_layer=1, output=1):
        
        super(elmo_LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        # self.embedding_dim=embedding_dim
        # self.batch_size=batch_size
        self.lstm_layer=lstm_layer
        # self.embedding = elmo

        
        # RNN layer with LSTM cells
        self.lstm = nn.LSTM(input_size=embedding_dim,
                            hidden_size=hidden_dim,
                            num_layers=lstm_layer, 
                            bidirectional=True)
        # dense layer
        self.output = nn.Linear(hidden_dim*2, hidden_dim)
        self.output1=nn.Linear(hidden_dim, output)

    
    def forward(self, x):
        
        # x = self.embedding(sents)["elmo_representations"][0]
        lstm_out, _ = self.lstm(x)
        

        lstm_out = lstm_out.view(x.shape[1], -1, 2, self.hidden_dim)
        

        # dense_input = torch.cat((lstm_out[-1, :, 0, :], lstm_out[0, :, 1, :]), dim=1)
        dense_input = torch.cat((lstm_out[:,-1,0,:], lstm_out[:,0,1,:]), dim=1)

        y=self.output(dense_input)
        y=self.output1(y)
        return y

In [0]:
from allennlp.modules.elmo import Elmo, batch_to_ids
elmo = Elmo(ELMO_OPTIONS, ELMO_WEIGHT, num_output_representations = 1).cuda()
hidden_dim = 128
layers = 1
model = elmo_LSTM(embedding_dim=512, hidden_dim=hidden_dim,lstm_layer=layers)
optimizer = optim.Adam(model.parameters(), lr=1e-3,weight_decay=4e-4)
criterion = nn.BCEWithLogitsLoss()
device = torch.device('cuda')
model = model.to(device)
#model.load_state_dict(torch.load('tut1-model.pt', map_location=device))
criterion = criterion.to(device)

In [74]:
preprocessed_train = preprocess_text(train_text,True,False)   
train_tokens=tokenization(preprocessed_train)
tr_lab=train_labels[0].replace(['neg','pos'],[0,1])
# tr_lab=reorganize_labels(tr_lab)



In [0]:
from torch.utils.data import Dataset, DataLoader

In [80]:
preprocessed_train = preprocess_text(dev_b,True,False)   
train_tokens=tokenization(preprocessed_train)
tr_lab=dev_labels_b[0].replace(['neg','pos'],[0,1])
# tr_lab=reorganize_labels(tr_lab)



In [0]:
from allennlp.modules.elmo import Elmo, batch_to_ids
from tqdm import tqdm_notebook
def train(text,labels):
  elmo = Elmo(ELMO_OPTIONS, ELMO_WEIGHT, num_output_representations = 1).cuda()
  hidden_dim = 128
  layers = 1
  model = elmo_LSTM(embedding_dim=512, hidden_dim=hidden_dim,lstm_layer=layers)
  optimizer = optim.Adam(model.parameters(), lr=1e-3,weight_decay=4e-4)
  criterion = nn.BCEWithLogitsLoss()
  device = torch.device('cuda')
  model = model.to(device)
  #model.load_state_dict(torch.load('tut1-model.pt', map_location=device))
  criterion = criterion.to(device)
  preprocessed_train = preprocess_text(text,True,False) 

  train_tokens=tokenization(preprocessed_train)
  tr_lab=labels[0].replace(['neg','pos'],[0,1])
  data_set=[[train_tokens[i], torch.Tensor([tr_lab[i]])] for i in range(len(train_tokens))]
  train_loader=DataLoader(data_set,batch_size=128)
  sigmoid=nn.Sigmoid()
  accuracy=[]
  losses=[]
  for epoch in range(50):
    epoch_accuracy=[]
    epoch_loss=[]
    model.train(True)
    for x,y in train_loader:
      optimizer.zero_grad()
      x = batch_to_ids(x).cuda()
      X=elmo(x)['elmo_representations'][0]
      y=y.cuda()
      predictions=model(X)
      y_pred=sigmoid(predictions).detach().cpu().numpy().round()
      loss = criterion(predictions, y)
      acc = accuracy_score(y_pred,y.detach().cpu().numpy())
      loss.backward()
      optimizer.step()

      epoch_accuracy.append(acc)
      epoch_loss.append(loss.detach().cpu().numpy())
    model.train(False)
    accuracy.append(np.mean(epoch_accuracy))
    losses.append(np.mean(epoch_loss))
    print(accuracy[-1])
  return model
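
One detail worth checking in the loop above is how the batches are assembled. `DataLoader`'s default collation is built for samples of equal size; with token lists of different lengths it may, depending on the PyTorch version, raise an error or regroup the tokens by position rather than by document. A minimal sketch of a custom collate function that keeps each document's tokens together is shown below. The name `collate_tokens` and the toy batch are assumptions for illustration.

```python
def collate_tokens(batch):
    """Keep each document's token list intact instead of stacking by position.

    batch is a list of (tokens, label) pairs, as obtained by indexing the dataset.
    """
    tokens = [list(doc_tokens) for doc_tokens, _ in batch]
    labels = [label for _, label in batch]
    return tokens, labels

# Toy batch with documents of different lengths (labels as plain floats here).
batch = [(["great", "movie"], 1.0), (["boring"], 0.0)]
tokens, labels = collate_tokens(batch)
print(tokens, labels)
```

Such a function could be passed as `DataLoader(data_set, batch_size=128, collate_fn=collate_tokens)`. The labels come back as a plain list, so tensor labels would need `torch.stack(labels)` before the loss is computed. With one token list per document, `batch_to_ids` yields character ids of shape `(batch, seq_len, 50)` and ELMo returns representations of shape `(batch, seq_len, dim)`, while `nn.LSTM` expects `(seq_len, batch, dim)` unless it is constructed with `batch_first=True`.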


In [36]:
params=train(train_text,train_labels)

0.4991172316384181
0.5114097810734463
0.5126897951977402
0.5267258121468926
0.529153425141243
0.534516242937853
0.5432777189265536
0.5487950211864406
0.5553054378531074
0.5609992937853108
0.5727180437853108
0.5717249293785311
0.5709745762711864
0.580905720338983
0.5864009533898306
0.584569209039548
0.5994217867231638
0.6054466807909604
0.6143405720338984
0.6201889124293786
0.6274717514124294
0.6272069209039548
0.6314221398305084
0.6373146186440678
0.6473781779661016
0.647510593220339
0.6469588629943502
0.6611052259887006
0.6592514124293786
0.667770127118644
0.6684763418079096
0.6771716101694916
0.6776129943502824
0.688184145480226
0.681784074858757
0.6892876059322034
0.68652895480226
0.6934807556497176
0.7006973870056498
0.6963056144067796
0.7034560381355932
0.7010504943502824
0.7025512005649718
0.709127824858757
0.7120409604519774
0.7113347457627118
0.7155058262711864
0.7161237641242938
0.7179996468926553
0.7202065677966102


In [0]:
def classify(text,model):
  elmo = Elmo(ELMO_OPTIONS, ELMO_WEIGHT, num_output_representations = 1).cuda()
  preprocessed_train = preprocess_text(text,True,False) 
  train_tokens=tokenization(preprocessed_train)
  data_set=[train_tokens[i] for i in range(len(train_tokens))]
  train_loader=DataLoader(data_set,batch_size=128)
  sigmoid=nn.Sigmoid()
  y_preds=[]
  model.eval()
  for x in train_loader:
    x = batch_to_ids(x).cuda()
    X=elmo(x)['elmo_representations'][0]
    preds=model(X)
    y_pred=sigmoid(preds).detach().cpu().numpy().round()
    y_preds.append(y_pred)
  y_preds=np.concatenate(y_preds)
  return y_preds





In [42]:
y_t=classify(train_text,params)
y_d=classify(dev,params)
y_d_b=classify(dev_b,params)



In [43]:
dlb


0       1
1       0
2       1
3       0
4       0
       ..
1995    1
1996    0
1997    0
1998    0
1999    0
Name: 0, Length: 2000, dtype: int64

In [44]:
tl=train_labels[0].replace(['neg','pos'],[0,1])
dl=dev_labels[0].replace(['neg','pos'],[0,1])
dlb=dev_labels_b[0].replace(['neg','pos'],[0,1])
print('train_acc',accuracy_score(y_t,tl))
print('dev_acc',accuracy_score(y_d,dl))
print('dev_b_acc',accuracy_score(y_d_b,dlb))

train_acc 0.7206
dev_acc 0.5013
dev_b_acc 0.5165


## 3. Research part

### 3.1 Different types of embeddings
Compare the performance of [GloVe](http://nlp.stanford.edu/data/wordvecs/glove.6B.zip) and [word2vec](https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing) models to a model with a randomly initialized embedding layer (no pre-trained embeddings are used). Plot the results depending on the type of embeddings used.

### 3.2 Impact of hyper-parameter choice

Try different numbers of hidden layers, numbers of LSTM cells in each layer, learning rates, and other meta-parameters. Present plots that demonstrate the performance of the model depending on the values of these meta-parameters. Does a bi-directional LSTM work better than a uni-directional LSTM for this task?
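
One way to organise such experiments is to enumerate all combinations of the meta-parameters up front and train one model per configuration. The sketch below only builds the list of configurations; the search space, its key names, and the values are placeholders, not recommended settings.

```python
from itertools import product

# Hypothetical search space; the concrete values are placeholders.
search_space = {
    "hidden_dim": [64, 128, 256],
    "lstm_layers": [1, 2],
    "lr": [1e-2, 1e-3],
    "bidirectional": [True, False],
}

def iter_configs(space):
    """Yield one dict per combination of hyper-parameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(search_space))
print(len(configs))
print(configs[0])
```

Each configuration dict can then be passed to a training function, and the resulting dev accuracies can be plotted against the parameter that was varied.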