## Emojify!
Welcome to the third and last programming assignment! You are going to use word vector representations to build an Emojifier.

You will implement a model which inputs a sentence (such as "Let's go see the baseball game tonight!") and finds the most appropriate emoji to be used with this sentence (⚾️).

By using word vectors, you'll see that even if your training set explicitly relates only a few words to a particular emoji, your algorithm will be able to generalize and associate words in the test set with the same emoji, even if those words don't appear in the training set. This allows you to build an accurate classifier that maps sentences to emojis, even with a small training set.

## Packages

Let's first import all the packages that you will need during this part of the assignment.

Feel free to use other libraries if you want to.

If you don't have emoji or another library installed, run the "pip install emoji" command (or the equivalent for that library) in one of the code cells of the notebook.

In [27]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.autograd import Variable  # deprecated since PyTorch 0.4 (it simply returns a tensor); kept because later cells use it
import matplotlib.pyplot as plt
import emoji
import os

## Import and visualize the data

In this part you need to:
1. Import the train and test data.
2. Separate the sentences (in the first column) and the index of the emoji (in the second column).
3. Convert the Y value of every sentence from an emoji index (0-4) to a one-hot encoding: 2 --> [0,0,1,0,0]
4. Print 10 sentences from the training data and visualize their matching emojis using the label_to_emoji() helper function. Also print the one-hot encoding of their labels.

In [28]:
### START CODE HERE ###

# import data
train = pd.read_csv('train_emoji.csv', names=["sentence", "emoji"])
test = pd.read_csv('test_emoji.csv', names=["sentence", "emoji"])

In [29]:
train.head(4)

| | sentence | emoji |
|---|---|---|
| 0 | French macaroon is so tasty | 4 |
| 1 | work is horrible | 3 |
| 2 | I am upset | 3 |
| 3 | throw the ball | 1 |


In [30]:
test.head(4)

| | sentence | emoji |
|---|---|---|
| 0 | I want to eat\t | 4 |
| 1 | he did not answer\t | 3 |
| 2 | he got a very nice raise\t | 2 |
| 3 | she got me a nice present\t | 2 |


In [31]:
# separate the sentences (x) and the emoji labels (y) for train and test

train_x = train.loc[:,'sentence'].to_numpy()
train_y = train.loc[:,'emoji'].to_numpy()

test_x = test.loc[:,'sentence'].to_numpy()
test_y = test.loc[:,'emoji'].to_numpy()

In [32]:
# one hot encoding
def one_hot_encoding(data,size):
    data = np.eye(size)[data.reshape(-1)]
    return data

train_hot = one_hot_encoding(train_y, 5)
test_hot = one_hot_encoding(test_y, 5)
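`np.eye(size)` is the `size x size` identity matrix, and indexing it with an integer array picks one row per label, so row `k` is the one-hot vector of label `k` (the `reshape(-1)` in `one_hot_encoding` only flattens the labels first). A small self-contained check, using made-up labels:

```python
import numpy as np

labels = np.array([2, 0, 4])        # made-up emoji labels
one_hot = np.eye(5)[labels]         # row k of the identity matrix is the one-hot vector of k
print(one_hot[0])                   # prints [0. 0. 1. 0. 0.]
recovered = one_hot.argmax(axis=1)  # argmax turns the one-hot rows back into labels
```

The same `argmax` trick is a convenient way to go back from one-hot rows to class indices when computing accuracy.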

In [33]:
emoji_dictionary = {"0": "\u2764\uFE0F",   
                    "1": ":baseball:",
                    "2": ":smile:",
                    "3": ":disappointed:",
                    "4": ":fork_and_knife:"}

def label_to_emoji(label):
    # Converts a label (int or string) into the corresponding emoji code (string) ready to be printed
    # debug prints: print(emoji.emojize(emoji_dictionary[str(label)], language='alias'))
    return emoji.emojize(emoji_dictionary[str(label)], language='alias')

for i,row in train[:10].iterrows():
    # print each sentence next to its emoji
    print('%s - %s' %(row['sentence'], label_to_emoji(row['emoji'])))

French macaroon is so tasty - 🍴
work is horrible - 😞
I am upset - 😞
throw the ball - ⚾
Good joke - 😄
what is your favorite baseball game - ⚾
I cooked meat - 🍴
stop messing around - 😞
I want chinese food - 🍴
Let us go play baseball - ⚾


## Helper functions for word embedding

The following functions will help you convert words and sentences to vectors and matrices.

In [34]:
# A function that loads the GloVe word vectors. Each word is represented by a vector of size 50.
# words_to_index is a dictionary that maps words to indexes - every word has a number. 'banana' --> 67752
# index_to_words is a dictionary that maps indexes to words - every index has a matching word. 344429 --> 'strawberry'
# Indexes start at 1, so index 0 is free to be used for padding.

def read_glove_vecs(glove_file):
    with open(glove_file, 'r',encoding='UTF-8') as f:
        words = set()
        word_to_vec_map = {}
        for line in f:
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)
        
        i = 1
        words_to_index = {}
        index_to_words = {}
        for w in sorted(words):
            words_to_index[w] = i
            index_to_words[i] = w
            i = i + 1
    return words_to_index, index_to_words, word_to_vec_map
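Each line of a GloVe text file is a word followed by its vector components, separated by spaces. The sketch below applies the same parsing logic to a tiny in-memory file (the three words and their 3-dimensional vectors are made up, standing in for `glove.6B.50d.txt`) to show that the sorted words get indexes starting at 1, which leaves 0 free for padding:

```python
import io
import numpy as np

# A made-up, 3-dimensional stand-in for glove.6B.50d.txt
fake_glove = io.StringIO("the 0.1 0.2 0.3\napple 0.5 -0.1 0.0\nbanana 0.4 -0.2 0.1\n")

word_to_vec_map = {}
for line in fake_glove:
    parts = line.strip().split()
    word_to_vec_map[parts[0]] = np.array(parts[1:], dtype=np.float64)

# Sorted words get consecutive indexes starting at 1; index 0 is reserved for padding
word_to_index = {w: i for i, w in enumerate(sorted(word_to_vec_map), start=1)}
index_to_word = {i: w for w, i in word_to_index.items()}
```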


In [35]:
# load word embeddings and create word_to_index and index_to_word dictionaries

word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('glove.6B.50d.txt')

In [36]:
# visualization: 

print(f"The vector embedding of banana is: {word_to_vec_map['banana']}")
print(f"The index of the word 'tree' is: {word_to_index['tree']}")
print(f"The word matching the index 173081 is: {index_to_word[173081]}")

The vector embedding of banana is: [-0.25522  -0.75249  -0.86655   1.1197    0.12887   1.0121   -0.57249
 -0.36224   0.44341  -0.12211   0.073524  0.21387   0.96744  -0.068611
  0.51452  -0.053425 -0.21966   0.23012   1.043    -0.77016  -0.16753
 -1.0952    0.24837   0.20019  -0.40866  -0.48037   0.10674   0.5316
  1.111    -0.19322   1.4768   -0.51783  -0.79569   1.7971   -0.33392
 -0.14545  -1.5454    0.0135    0.10684  -0.30722  -0.54572   0.38938
  0.24659  -0.85166   0.54966   0.82679  -0.68081  -0.77864  -0.028242
 -0.82872 ]
The index of the word 'tree' is: 364528
The word matching the index 173081 is: happy


In [37]:
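import string

# A function that translates sentences into arrays of word indexes: I love you --> [185457,226278,394475]
# Every sentence is padded with zeros up to max_len, so with max_len=10: I love you --> [185457,226278,394475,0,0,0,0,0,0,0]

def sentences_to_indices(X, word_to_index, max_len):
    m = X.shape[0]
    X_indices = np.zeros((m, max_len))
    for i in range(m):
        sentence_words = X[i].lower().split()
        j = 0
        for w in sentence_words:
            if w not in word_to_index:
                w = w.strip(string.punctuation)  # e.g. 'grumpy?' --> 'grumpy'
            if w not in word_to_index:
                continue  # skip out-of-vocabulary words instead of raising a KeyError
            if j == max_len:
                break     # truncate sentences longer than max_len
            X_indices[i, j] = word_to_index[w]
            j = j + 1
    return X_indices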

sentences_to_indices(np.array(['hello world']),word_to_index,4)

array([[176468., 389938.,      0.,      0.]])

In [38]:
# A function that builds the embedding matrix: row i holds the vector of the word with index i.
# The matrix has shape (400001, 50) - each word is a vector of size 50, and row 0 (padding) stays all zeros.

def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    vocab_len = len(word_to_index) + 1  # word indexes begin at 1; +1 for the padding index 0
    emb_dim = word_to_vec_map["cucumber"].shape[0] # the size of embedding of each word
    emb_matrix = np.zeros((vocab_len, emb_dim))
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]
    return emb_matrix
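The models in this notebook create an `nn.Embedding` and then copy this matrix into its weights. PyTorch also provides `nn.Embedding.from_pretrained`, which does both in one call: `freeze=False` keeps the vectors trainable, and `padding_idx=0` marks row 0 as padding so it receives no gradient updates. A minimal sketch, with a made-up 4x3 matrix standing in for the output of `pretrained_embedding_layer`:

```python
import numpy as np
import torch
import torch.nn as nn

# Made-up stand-in for pretrained_embedding_layer(): row 0 is the all-zero padding vector
emb_matrix = np.array([[0.0, 0.0, 0.0],
                       [0.1, 0.2, 0.3],
                       [0.4, 0.5, 0.6],
                       [0.7, 0.8, 0.9]])

embedding = nn.Embedding.from_pretrained(
    torch.tensor(emb_matrix, dtype=torch.float32),
    freeze=False,   # fine-tune the vectors during training
    padding_idx=0,  # index 0 is padding: its row gets no gradient
)

indices = torch.tensor([[2, 1, 0, 0]])  # one sentence of two words, padded to length 4
vectors = embedding(indices)            # shape [1, 4, 3]: every index is replaced by its row
```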


## Train and test data preprocessing

The models that you will build receive the sentences as sequences of word indexes, i.e., the output of the sentences_to_indices() function.

Therefore, you need to:
* Transform the data to the right form using the above functions
* Transform the data and labels to tensors
* If needed, create train and test data loaders

In [39]:
### START CODE HERE ###


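# find max len: apply(len) counts characters, an upper bound on the number of words; 10 extra positions are added as a margin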
train_max_len = train['sentence'].apply(len).max()
test_max_len = test['sentence'].apply(len).max()
max_len = max(train_max_len,test_max_len) + 10
print(max_len)

62
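Because `apply(len)` on a string column counts characters, the `max_len` above is an upper bound on the sentence length in words, and the remaining positions are filled with padding. If a tighter bound is wanted, the words can be counted instead. A sketch on a made-up two-row DataFrame standing in for `train`:

```python
import pandas as pd

# Made-up stand-in for the train DataFrame
df = pd.DataFrame({"sentence": ["I am upset", "Let us go play baseball"],
                   "emoji": [3, 1]})

char_len = df["sentence"].apply(len).max()                       # longest sentence in characters
word_len = df["sentence"].apply(lambda s: len(s.split())).max()  # longest sentence in words
```

With a word-based bound, sentences_to_indices() produces far fewer trailing padding positions, which matters for models that read the sequence all the way to its end.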


In [40]:
# sentences_to_indices
train_indices = sentences_to_indices(train_x, word_to_index, max_len)
test_indices = sentences_to_indices(test_x, word_to_index, max_len)

# train tensors
train_x_tensor = torch.from_numpy(train_indices)
train_y_tensor = torch.from_numpy(train_y)
train_tensor = torch.utils.data.TensorDataset(train_x_tensor, train_y_tensor)

# test tensors
test_x_tensor = torch.from_numpy(test_indices)
test_y_tensor = torch.from_numpy(test_y)
test_tensor = torch.utils.data.TensorDataset(test_x_tensor, test_y_tensor)

In [41]:
# dataloaders
train_loader = torch.utils.data.DataLoader(dataset=train_tensor, batch_size=32, shuffle=True, num_workers=2)
test_loader = torch.utils.data.DataLoader(dataset=test_tensor, batch_size=32, shuffle=True, num_workers=2)
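`torch.from_numpy` keeps the NumPy dtype, and `sentences_to_indices` returns a float64 array, so every batch of word indexes comes out of the loader as float64. This is why the training loops below call `data.long()` before the model runs: `nn.Embedding` needs integer indexes. A small sketch with made-up arrays shows the batch shape and dtypes:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Made-up stand-ins: 5 sentences padded to length 4, and their integer labels
indices = np.zeros((5, 4))            # float64, like the output of sentences_to_indices
labels = np.array([0, 1, 2, 3, 4])

dataset = TensorDataset(torch.from_numpy(indices), torch.from_numpy(labels))
loader = DataLoader(dataset, batch_size=2, shuffle=False)

data, label = next(iter(loader))      # first batch: data is [2, 4] float64
data = data.long()                    # nn.Embedding expects integer (long) indexes
```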

# First model - regular neural network

Build a neural network model that gets:

1. The vocabulary size
2. The embedding dimension - the length of every embedding vector
3. The pretrained embedding weights - the embedding matrix

and returns a 5-dimensional vector with the scores of every emoji.


---

Then train the model and plot loss vs. epoch for the train and test sets.

Show the results on 5 new sentences.


---

You can use the provided model as your base model.

In [18]:
class NN_Model(nn.Module):
    
    def __init__(self,vocab_size,embedding_dim,pretrained_weight):
        super(NN_Model,self).__init__()
        self.word_embeds = nn.Embedding(vocab_size, embedding_dim) # stores embeddings of a fixed dictionary and size
        self.word_embeds.weight.data.copy_(torch.from_numpy(pretrained_weight)) # place the pretrained weights to the embedding function
        # no Softmax at the end: nn.CrossEntropyLoss applies log-softmax itself and expects raw scores (logits)
        self.layers = nn.Sequential(
            nn.Linear(embedding_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32 ,5)
        )
        

    def forward(self,x):
        out = self.word_embeds(x)                                   # [batch, max_len, embedding_dim]
        # average the embeddings of the real words; index 0 is padding and is left out
        # (reading only the last position would see padding, since sentences are zero-padded to max_len)
        mask = (x != 0).unsqueeze(-1).float()                       # [batch, max_len, 1]
        out = (out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        out = self.layers(out)
        return out


In [19]:
# prepare param
vocab_len = len(word_to_index) + 1
embedded_matrix = pretrained_embedding_layer(word_to_vec_map,word_to_index)

# create neural network model object
nn_model = NN_Model(vocab_len, 50, embedded_matrix)

# sanity check 
print(nn_model)

NN_Model(
  (word_embeds): Embedding(400001, 50)
  (layers): Sequential(
    (0): Linear(in_features=50, out_features=64, bias=True)
    (1): ReLU()
    (2): Linear(in_features=64, out_features=32, bias=True)
    (3): ReLU()
    (4): Linear(in_features=32, out_features=5, bias=True)
  )
)


In [20]:
for p in nn_model.parameters():
    p.requires_grad = True
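`nn.CrossEntropyLoss` applies log-softmax to its input itself, so it is meant to receive raw scores (logits). Feeding it the output of an extra `nn.Softmax` layer applies softmax twice: the loss can still be computed, but the probabilities are squashed toward uniform and the gradients shrink, which can slow down or stall training. A small numerical check with made-up scores:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

loss_function = nn.CrossEntropyLoss()
logits = torch.tensor([[4.0, 1.0, 0.0, -1.0, -2.0]])  # made-up scores for 5 emojis
target = torch.tensor([0])                            # the true class is 0

# Loss on logits: equals -log(softmax(logits)[target])
loss_logits = loss_function(logits, target)
manual = -torch.log_softmax(logits, dim=1)[0, 0]

# Loss on probabilities: softmax is applied a second time inside the loss
loss_double = loss_function(F.softmax(logits, dim=1), target)
```

Even though class 0 is predicted with high confidence, the double-softmax loss stays large, because the second softmax only sees values between 0 and 1.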

In [2]:
# define loss function
loss_function = nn.CrossEntropyLoss()

# define param + settings
opt1 = torch.optim.Adam(nn_model.parameters(), lr = 0.0005)
min_valid = np.inf  # np.Inf was removed in NumPy 2.0
loss_training = []
loss_validation = []
acc_train = []
acc_valid = []
epochs = 300

# train
for epoch in range(epochs):
    nn_model.train()
    loss_train = 0 
    accuracy_train = 0 
    for data,label in train_loader:
        data = data.long()
        label = label.long()
        nn_model.zero_grad()
        data = Variable(data)
        label = Variable(label)
        output = nn_model(data)
        loss = loss_function(output, label)
        loss.backward()
        opt1.step()
        loss_train += loss.item()
        label = label.cpu().numpy()
        output = output.cpu().detach().numpy()
        for i in range(len(data)):
            model_label = np.argmax(output[i])
            if model_label == label[i]:
                accuracy_train += 1
    else:
        with torch.no_grad():
            nn_model.eval()
            accuracy_test = 0 
            loss_valid = 0
            for data1,label1 in test_loader:
                data1 = Variable(data1.long())
                label1 = Variable(label1.long())
                pred = nn_model(data1)
                loss = loss_function(pred,label1)
                loss_valid += loss.item()
                label1 = label1.cpu().numpy()
                pred = pred.cpu().numpy()
                for i in range(len(data1)):
                    model_label = np.argmax(pred[i])
                    if model_label == label1[i]:
                        accuracy_test += 1
            accuracy_train = accuracy_train/len(train_x_tensor)
            accuracy_test = accuracy_test/len(test_x_tensor)
    loss_training.append(loss_train)
    acc_train.append(accuracy_train)
    loss_validation.append(loss_valid)
    acc_valid.append(accuracy_test)

    # performance prints
    if (epoch+1)%50==0:
        print('Epoch # %s' %(epoch+1))
        print('training accuracy - %s' %(accuracy_train*100.0))
        print('training loss - %s' %(loss_train))
        print('validation accuracy - %s' %(accuracy_test*100.0))
        print('validation loss - %s' %(loss_valid))
        print('')


In [1]:
# plot loss function

def plot_loss(train_loss, test_loss, train_acc, test_acc, epochs):
    plt.figure(figsize=(12, 3))
    plt.plot(range(1,epochs), train_loss, color='orange', label='train')
    plt.plot(range(1,epochs), test_loss, color='blue', label='test')
    plt.title('Loss per num epochs')
    plt.xlabel('Num Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()

    
plot_loss(loss_training, loss_validation, acc_train, acc_valid, epochs+1)


In [None]:
# plot accuracy function

def plot_accuracy(train_loss, test_loss, train_acc, test_acc, epochs):
    plt.figure(figsize=(12, 3))
    plt.plot(range(1,epochs), train_acc, color='orange', label='train')
    plt.plot(range(1,epochs), test_acc, color='blue', label='test')
    plt.title('Accuracy per num epochs')
    plt.xlabel('Num Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.show()

    
plot_accuracy(loss_training, loss_validation, acc_train, acc_valid, epochs+1)

In [None]:
# new sentences

sentences = np.array(['I love apple', 'I love banana', 'cook me dinner', 'we went to a baseball game yesterday' , 'why so grumpy?'])

# use the same padded length as the training data
sentences_indices = sentences_to_indices(sentences, word_to_index, max_len)
sentences_indices = torch.from_numpy(sentences_indices).long()

nn_model.eval()
with torch.no_grad():
    prediction = nn_model(sentences_indices)

for i in range(len(sentences)):
    sentence_emoji = np.argmax(prediction.numpy()[i])
    print(sentences[i], label_to_emoji(sentence_emoji))

# Second model - neural network with RNN

Build a neural network + RNN model that gets the vocabulary size, the embedding dimension and the pretrained embedding weights, and returns a 5-dimensional vector with the scores of every emoji.

Then train the model and plot loss vs. epoch for the train and test sets.

https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html

---

You can use the provided model as your base model.

In [42]:
class RNN_Model(nn.Module):
    
    def __init__(self,vocab_size,embedding_dim,pretrained_weight):
        super(RNN_Model,self).__init__()
        self.word_embeds = nn.Embedding(vocab_size, embedding_dim) # stores embeddings of a fixed dictionary and size
        self.word_embeds.weight.data.copy_(torch.from_numpy(pretrained_weight)) # place the pretrained weights to the embedding function
        self.rnn = nn.LSTM(embedding_dim, 64, 3, batch_first=True, dropout=1/3)
        self.layers = nn.Sequential(
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 5),
            nn.Softmax(dim=1)
        )


    def forward(self,x,h):
        out = self.word_embeds(x)
        out, _ = self.rnn(out,h)
        out = out[:, -1, :]
        out = self.layers(out)
        return out


In [43]:
# prepare param
vocab_len = len(word_to_index) + 1
embedded_matrix = pretrained_embedding_layer(word_to_vec_map, word_to_index)

# create RNN model object
rnn_model = RNN_Model(vocab_len, 50, embedded_matrix)

# sanity check 
print(rnn_model)

RNN_Model(
  (word_embeds): Embedding(400001, 50)
  (rnn): LSTM(50, 64, num_layers=3, batch_first=True, dropout=0.3333333333333333)
  (layers): Sequential(
    (0): Linear(in_features=64, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=5, bias=True)
    (3): Softmax(dim=1)
  )
)
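With `batch_first=True`, `nn.LSTM` takes input of shape `[batch, seq_len, input_size]` and returns the last layer's output at every time step, `[batch, seq_len, hidden_size]`, plus the final `(h_n, c_n)`, each of shape `[num_layers, batch, hidden_size]` (the state shapes do not follow `batch_first`). That is why the training loop below builds the initial states as `torch.zeros(3, len(data), 64)`. When no state is passed, PyTorch starts from zeros, so the explicit states are optional. A sketch with random inputs, using the sizes of `RNN_Model` but without dropout so the outputs are deterministic:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(50, 64, 3, batch_first=True)  # input_size=50, hidden_size=64, num_layers=3
x = torch.randn(8, 12, 50)                   # made-up batch: 8 sentences, 12 steps, 50-d embeddings

h0 = (torch.zeros(3, 8, 64), torch.zeros(3, 8, 64))  # (h_0, c_0): [num_layers, batch, hidden]
out, (h_n, c_n) = lstm(x, h0)
out_default, _ = lstm(x)                     # no state given: PyTorch starts from zeros
```

For a one-directional LSTM, `out[:, -1, :]` is the same vector as `h_n[-1]`, the top layer's hidden state after the last step.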


In [44]:
for p in rnn_model.parameters():
    p.requires_grad = True

In [None]:
# define loss function
loss_function = nn.CrossEntropyLoss()


# define param + settings
opt2 = torch.optim.Adam(rnn_model.parameters(), lr = 0.0005)
min_valid = np.inf  # np.Inf was removed in NumPy 2.0
acc_train = []
acc_valid = []
loss_training = []
loss_validation = []
num_epochs = 300


# train
for epoch in range(num_epochs):
    rnn_model.train()
    loss_train = 0 
    accuracy_train = 0 
    for data,label in train_loader:
        data = data.long()
        label = label.long()
        rnn_model.zero_grad()
        states = (Variable(torch.zeros(3, len(data), 64)), Variable(torch.zeros(3, len(data), 64)))
        data = Variable(data)
        label = Variable(label)
        output = rnn_model(data, states)
        loss = loss_function(output, label)
        loss.backward()
        opt2.step()
        loss_train +=loss.item()
        label = label.cpu().numpy()
        output = output.cpu().detach().numpy()
        for i in range(len(data)):
            model_label = np.argmax(output[i])
            if model_label == label[i]:
                accuracy_train += 1
    else:
        with torch.no_grad():
            rnn_model.eval()
            accuracy_test = 0 
            loss_valid = 0
            for data1,label1 in test_loader:
                data1 = Variable(data1.long())
                label1 = Variable(label1.long())
                states1 = (Variable(torch.zeros(3, len(data1), 64)), Variable(torch.zeros(3, len(data1), 64)))
                pred = rnn_model(data1, states1)
                loss = loss_function(pred,label1)
                loss_valid += loss.item()
                label1 = label1.cpu().numpy()
                pred = pred.cpu().numpy()
                for i in range(len(data1)):
                    model_label = np.argmax(pred[i])
                    if model_label == label1[i]:
                        accuracy_test += 1
            accuracy_train = accuracy_train/len(train_x_tensor)
            accuracy_test = accuracy_test/len(test_x_tensor)

    loss_training.append(loss_train)
    acc_train.append(accuracy_train)
    loss_validation.append(loss_valid)
    acc_valid.append(accuracy_test)

    if (epoch+1)%50==0:
        print('Epoch # %s' %(epoch+1))
        print('training accuracy - %s' %(accuracy_train*100.0))
        print('training loss - %s' %(loss_train))
        print('validation accuracy - %s' %(accuracy_test*100.0))
        print('validation loss - %s' %(loss_valid))
        print('')

Epoch # 50
training accuracy - 30.601092896174865
training loss - 9.271894216537476
validation accuracy - 32.142857142857146
validation loss - 3.0696232318878174

Epoch # 100
training accuracy - 30.601092896174865
training loss - 9.263789176940918
validation accuracy - 32.142857142857146
validation loss - 3.0795340538024902

Epoch # 150
training accuracy - 30.601092896174865
training loss - 9.2770414352417
validation accuracy - 32.142857142857146
validation loss - 3.057257652282715
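In `RNN_Model.forward`, `out[:, -1, :]` is the LSTM output at position `max_len - 1`. Since every sentence is padded with zeros up to `max_len` (62 here), that position is padding for every sentence, so the classifier sees the state after dozens of padding steps. This may be one reason the training accuracy above stays at 30.6% at epochs 50, 100 and 150. A hedged alternative is to read the output at each sentence's last real word; a sketch, assuming index 0 is padding and every sentence has at least one known word (`last_real_step` is a hypothetical helper name):

```python
import torch

def last_real_step(out, x):
    """For each sentence, pick the RNN output at its last non-padding position.

    out: [batch, seq_len, hidden] outputs of an LSTM with batch_first=True
    x:   [batch, seq_len] word indexes, 0 = padding (assumed)
    """
    lengths = (x != 0).sum(dim=1)                       # number of real words per sentence
    last = (lengths - 1).clamp(min=0)                   # position of the last real word
    idx = last.view(-1, 1, 1).expand(-1, 1, out.size(2))
    return out.gather(1, idx).squeeze(1)                # [batch, hidden]

# Made-up example: 2 sentences padded to length 4, hidden size 3
x = torch.tensor([[5, 7, 0, 0],
                  [2, 3, 9, 0]])
out = torch.arange(24, dtype=torch.float32).view(2, 4, 3)
picked = last_real_step(out, x)
```

In `forward`, `out = out[:, -1, :]` would become `out = last_real_step(out, x)`. Because the LSTM runs in one direction, its output at a given step depends only on the words up to that step, so the trailing padding does not change the picked vector. `torch.nn.utils.rnn.pack_padded_sequence` is another way to skip the padding.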



In [None]:
# plot loss
plot_loss(loss_training, loss_validation, acc_train, acc_valid, num_epochs+1)

In [None]:
# plot accuracy
plot_accuracy(loss_training, loss_validation, acc_train, acc_valid, num_epochs+1)

In [None]:
# new sentences

sentences = np.array(['I love apple', 'I love banana', 'cook me dinner', 'we went to a baseball game yesterday' , 'why so grumpy?'])

# use the same padded length as the training data
sentences_indices = sentences_to_indices(sentences, word_to_index, max_len)
sentences_indices = torch.from_numpy(sentences_indices).long()

# RNN_Model.forward expects the initial (h_0, c_0) states: [num_layers, batch, hidden]
states = (torch.zeros(3, len(sentences), 64), torch.zeros(3, len(sentences), 64))

rnn_model.eval()
with torch.no_grad():
    prediction = rnn_model(sentences_indices, states)

for i in range(len(sentences)):
    sentence_emoji = np.argmax(prediction.numpy()[i])
    print(sentences[i], label_to_emoji(sentence_emoji))

# Third model - neural network with transformers

Build a neural network + transformer model that gets the vocabulary size, the embedding dimension and the pretrained embedding weights, and returns a 5-dimensional vector with the scores of every emoji.

Then train the model and plot loss vs. epoch for the train and test sets.

https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html


---

You can use the provided model as your base model.

In [22]:
### START CODE HERE ###

# train tensors
train_x_tensor = torch.from_numpy(train_indices)
train_y_tensor = torch.from_numpy(train_hot)
train_tensor = torch.utils.data.TensorDataset(train_x_tensor, train_y_tensor)

# test tensors
test_x_tensor = torch.from_numpy(test_indices)
test_y_tensor = torch.from_numpy(test_hot)
test_tensor = torch.utils.data.TensorDataset(test_x_tensor, test_y_tensor)

# dataloaders
train_loader = torch.utils.data.DataLoader(dataset=train_tensor, batch_size=32, shuffle=True, num_workers=2)
test_loader = torch.utils.data.DataLoader(dataset=test_tensor, batch_size=32, shuffle=True, num_workers=2)
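Here the labels are the one-hot rows (`train_hot`, `test_hot`) instead of class indexes. Since PyTorch 1.10, `nn.CrossEntropyLoss` also accepts a target of class probabilities with the same shape as the scores, and for a one-hot target the loss equals the one computed from the class index. The one-hot rows come from `np.eye` as float64, so they should be cast to the scores' floating type, as the training loop below does with `label.float()`. A small check with made-up scores:

```python
import numpy as np
import torch
import torch.nn as nn

loss_function = nn.CrossEntropyLoss()
scores = torch.tensor([[2.0, 0.5, -1.0, 0.0, 1.0],
                       [0.1, 0.2, 3.0, -0.5, 0.0]])   # made-up scores for 2 sentences
labels = np.array([0, 2])
one_hot = torch.from_numpy(np.eye(5)[labels]).float()  # float64 --> float32, like label.float()

loss_from_indices = loss_function(scores, torch.from_numpy(labels).long())
loss_from_one_hot = loss_function(scores, one_hot)
```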

In [23]:
import math

class PositionalEncoding(nn.Module):

    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)

        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Args:
            x: Tensor, shape [seq_len, batch_size, embedding_dim]
        """
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)
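`PositionalEncoding` uses the sinusoidal encoding of the original Transformer paper: for position pos and pair index i, PE[pos, 2i] = sin(pos / 10000^(2i/d_model)) and PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model)). The `div_term` line computes 1 / 10000^(2i/d_model) as exp(2i * (-ln 10000 / d_model)), which is the same quantity in a numerically convenient form. A quick check that the two forms agree, with d_model = 50 as for the GloVe vectors used here:

```python
import math
import torch

d_model, max_len = 50, 10
position = torch.arange(max_len).unsqueeze(1)  # [max_len, 1]
two_i = torch.arange(0, d_model, 2)            # 0, 2, 4, ..., d_model - 2
div_term = torch.exp(two_i * (-math.log(10000.0) / d_model))

# The same quantity written directly as 1 / 10000^(2i / d_model)
direct = 1.0 / (10000.0 ** (two_i.double() / d_model))

pe = torch.zeros(max_len, d_model)
pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
```

Note that the class stores the table as `[max_len, 1, d_model]` and its `forward` expects `x` in the `[seq_len, batch_size, embedding_dim]` layout.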

In [24]:
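class Transformer_model(nn.Module):
    
    def __init__(self,vocab_size, embedding_dim, pretrained_weight):
        super(Transformer_model,self).__init__()
        transform_dim = 100
        self.word_embeds = nn.Embedding(vocab_size, embedding_dim)
        self.word_embeds.weight.data.copy_(torch.from_numpy(pretrained_weight))
        self.emb2transformer = nn.Linear(embedding_dim, transform_dim)
        self.pos_encoding = PositionalEncoding(embedding_dim)

        # nn.Transformer defaults to batch_first=False, i.e. inputs of shape [seq_len, batch, dim]
        self.transformer = nn.Transformer(d_model=transform_dim, nhead=5, num_encoder_layers=2, num_decoder_layers=1, dropout=1/3)
        # no Softmax at the end: nn.CrossEntropyLoss expects raw scores (logits)
        self.layers = nn.Sequential(
            nn.Linear(transform_dim, 5)
        )


    def forward(self,x):
        pad_mask = (x == 0)                        # [batch, seq], True at padding positions
        out = self.word_embeds(x).transpose(0, 1)  # [batch, seq, emb] --> [seq, batch, emb]
        out = self.pos_encoding(out)               # adds the positional encoding (expects [seq, batch, emb])
        out = self.emb2transformer(out)            # [seq, batch, transform_dim]
        out = self.transformer(out, out,
                               src_key_padding_mask=pad_mask,
                               tgt_key_padding_mask=pad_mask,
                               memory_key_padding_mask=pad_mask)
        # average over the real (non-padding) words of every sentence
        keep = (~pad_mask).transpose(0, 1).unsqueeze(-1).float()  # [seq, batch, 1]
        out = (out * keep).sum(dim=0) / keep.sum(dim=0).clamp(min=1)
        out = self.layers(out)
        return out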

In [25]:
# prepare param
vocab_len = len(word_to_index) + 1
embedded_matrix = pretrained_embedding_layer(word_to_vec_map,word_to_index)

# create neural network model object
transformers_nn_model = Transformer_model(vocab_len, 50, embedded_matrix)

# sanity check 
print(transformers_nn_model)

Transformer_model(
  (word_embeds): Embedding(400001, 50)
  (emb2transformer): Linear(in_features=50, out_features=100, bias=True)
  (pos_encoding): PositionalEncoding(
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer): Transformer(
    (encoder): TransformerEncoder(
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=100, out_features=100, bias=True)
          )
          (linear1): Linear(in_features=100, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3333333333333333, inplace=False)
          (linear2): Linear(in_features=2048, out_features=100, bias=True)
          (norm1): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.3333333333333333, inplace=False)
          (dropout2): Dropout(p=0.3333333333333333, inplace=False
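For classification, only the encoder half of `nn.Transformer` is strictly needed. A hedged alternative, not the model used in this notebook, is an encoder-only stack built with `batch_first=True` (so inputs keep the `[batch, seq_len, dim]` layout that `nn.Embedding` produces) and a `src_key_padding_mask` so attention ignores the padding positions. A sketch with random inputs standing in for the embedded and projected words; the sizes mirror `Transformer_model`, and mean pooling over the real words is one possible choice:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 100
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=5, dropout=1/3, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
classifier = nn.Linear(d_model, 5)                  # raw scores for the 5 emojis

x = torch.tensor([[4, 8, 15, 0, 0],                 # made-up word indexes, 0 = padding
                  [16, 23, 0, 0, 0]])
emb = torch.randn(2, 5, d_model)                    # stand-in for embedded + projected words

pad_mask = (x == 0)                                 # True where attention should ignore the token
encoder.eval()
with torch.no_grad():
    hidden = encoder(emb, src_key_padding_mask=pad_mask)     # [batch, seq_len, d_model]
    keep = (~pad_mask).unsqueeze(-1).float()                 # [batch, seq_len, 1]
    pooled = (hidden * keep).sum(dim=1) / keep.sum(dim=1)    # mean over the real words
    scores = classifier(pooled)                              # [batch, 5]
```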

In [None]:
# define loss function
loss_function = nn.CrossEntropyLoss()


# define param + settings
opt3 = torch.optim.Adam(transformers_nn_model.parameters(), lr = 0.0005)
min_valid = np.inf  # np.Inf was removed in NumPy 2.0
acc_train = []
acc_valid = []
loss_training = []
loss_validation = []
epochs = 300


# train
for epoch in range(epochs):
    transformers_nn_model.train()
    loss_train = 0 
    accuracy_train = 0 
    for data,label in train_loader:
        data = data.long()
        # labels are one-hot rows here; they are cast to float for the loss below
        data = Variable(data)
        label = Variable(label)
        output = transformers_nn_model(data)
        opt3.zero_grad()
        loss = loss_function(output, label.float())
        loss.backward()
        opt3.step()
        loss_train += loss.item()
        label = label.cpu().numpy()
        output = output.cpu().detach().numpy()
        for i in range(len(data)):
            model_label = np.argmax(output[i])
            true_label = np.where(label[i] == 1)[0][0]
            if model_label == true_label:
                accuracy_train += 1
    else:
        with torch.no_grad():
            transformers_nn_model.eval()
            accuracy_test = 0 
            loss_valid = 0
            for data1,label1 in test_loader:
                data1 = Variable(data1.long())
                pred = transformers_nn_model(data1)
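                loss = loss_function(pred, label1.float())  # one-hot targets must be float, as in training
                loss_valid += loss.item()
                label1 = label1.cpu().numpy()
                pred = pred.cpu().numpy()
                for i in range(len(data1)):             # count over the test batch, not the last train batch
                    model_label = np.argmax(pred[i])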
                    true_label = np.where(label1[i] == 1)[0][0]
                    if model_label == true_label:
                        accuracy_test += 1

            accuracy_train = accuracy_train/len(train_x_tensor)
            accuracy_test = accuracy_test/len(test_x_tensor)
        
    loss_training.append(loss_train)
    acc_train.append(accuracy_train)
    loss_validation.append(loss_valid)
    acc_valid.append(accuracy_test)
    
    if (epoch+1)%50==0:
        print('Epoch # %s' %(epoch+1))
        print('training accuracy - %s' %(accuracy_train*100.0))
        print('training loss - %s' %(loss_train))
        print('validation accuracy - %s' %(accuracy_test*100.0))
        print('validation loss - %s' %(loss_valid))
        print('')
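The accuracy loops in this notebook compare labels one sample at a time in Python. The same count can be computed for a whole batch with `argmax`, and one helper can handle both label formats used here: class indexes (first two models) and one-hot rows (transformer). A sketch with made-up scores; `count_correct` is a hypothetical helper name:

```python
import torch

def count_correct(scores, labels):
    """Number of rows whose highest score matches the label.

    scores: [batch, 5] model outputs (logits or probabilities give the same argmax)
    labels: [batch] class indexes, or [batch, 5] one-hot rows
    """
    if labels.dim() == 2:
        labels = labels.argmax(dim=1)  # one-hot --> class index
    return (scores.argmax(dim=1) == labels).sum().item()

scores = torch.tensor([[0.1, 2.0, 0.0, 0.0, 0.0],
                       [3.0, 0.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.0, 1.0]])  # predicted classes: 1, 0, 4
index_labels = torch.tensor([1, 2, 4])
one_hot_labels = torch.eye(5)[index_labels]
```

In a training loop, `accuracy_train += count_correct(output, label)` could replace the inner `for i in range(len(data))` loop, as long as it is called before the tensors are converted to NumPy.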

In [None]:
plot_loss(loss_training, loss_validation, acc_train, acc_valid, epochs+1)

In [None]:
plot_accuracy(loss_training, loss_validation, acc_train, acc_valid, epochs+1)

# Compare the models - which one had the best results? Try to explain why.

RNN showed the best results.

RNNs are well suited to mapping a sentence to an emoji because they read the words in order and keep a hidden state that retains information from earlier words in the sequence. However, this ability might also lead to overfitting.

A regular feed-forward network cannot retain information from earlier words in the sequence, so it showed worse results. However, it is less prone to overfitting.

The transformer network also showed poor results. This may be because identifying an emoji from a sentence depends on the emotion it expresses, which the self-attention mechanism of the transformer network may not capture.