In [2]:
## imports and configuration
import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt
import torch as pt
from torch.utils.data import Dataset, DataLoader
import torch.optim as optim
import torch.nn as nn

%matplotlib inline
%load_ext autoreload
%autoreload 2
pt.set_printoptions(linewidth=200)
device = pt.device("cuda:0" if pt.cuda.is_available() else "cpu")

## 1 - Baseline model: Emojifier-V1

### 1.1 - Dataset EMOJISET

Let's start by building a simple baseline classifier. 

You have a tiny dataset (X, Y) where:
- X contains 127 sentences (strings)
- Y contains an integer label between 0 and 4 corresponding to an emoji for each sentence

<img src="images/data_set.png" style="width:700px;height:300px;">
<caption><center> **Figure 1**: EMOJISET - a classification problem with 5 classes. A few examples of sentences are given here. </center></caption>

Let's load the dataset using the code below. We split the dataset between training (127 examples) and testing (56 examples).

In [3]:
X_train, Y_train = read_csv('data/train_emoji.csv')
X_test, Y_test = read_csv('data/tesss.csv')

In [4]:
maxLen = len(max(X_train, key=len).split())
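
Note that `max(X_train, key=len)` picks the sentence with the most characters, which is not guaranteed to be the sentence with the most words. The sketch below uses two made-up sentences (not taken from the dataset) to show how the two counts can differ; the variable names are illustrative only.

```python
# Made-up sentences for illustration only (not taken from the dataset).
sentences = ["congratulations everybody", "i am so glad to see you"]

# Word count of the sentence with the most characters (the approach used above).
by_chars = len(max(sentences, key=len).split())

# Largest word count taken directly over all sentences.
by_words = max(len(s.split()) for s in sentences)

print(by_chars, by_words)
```

If the exact maximum number of words matters later on, the second form is the safer choice.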

Run the following cell to print sentences from X_train and corresponding labels from Y_train. Change `index` to see different examples. Because of the font the iPython notebook uses, the heart emoji may be colored black rather than red.

In [5]:
index = 1
print(X_train[index], label_to_emoji(Y_train[index]))

I am proud of your achievements 😄


### 1.2 - Overview of the Emojifier-V1

In this part, you are going to implement a baseline model called "Emojifier-V1".

<center>
<img src="images/image_1.png" style="width:900px;height:300px;">
<caption><center> **Figure 2**: Baseline model (Emojifier-V1).</center></caption>
</center>

The input of the model is a string corresponding to a sentence (e.g. "I love you"). In the code, the output will be a probability vector of shape (1,5), which you then pass through an argmax layer to extract the index of the most likely emoji output.

To get our labels into a format suitable for training a softmax classifier, let's convert $Y$ from its current shape $(m, 1)$ into a "one-hot representation" $(m, 5)$, where each row is a one-hot vector giving the label of one example. You can do so using this next code snippet. Here, `Y_oh` stands for "Y-one-hot" in the variable names `Y_oh_train` and `Y_oh_test`:


In [6]:
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)

In [7]:
index = 50
print(Y_train[index], "is converted into one hot", Y_oh_train[index])

0 is converted into one hot [1. 0. 0. 0. 0.]
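
For intuition, a one-hot conversion can be sketched in a couple of lines of NumPy. The helper name `one_hot_sketch` is hypothetical, and `convert_to_one_hot` from `emo_utils` may well be implemented differently; this is only meant to show the idea.

```python
import numpy as np

def one_hot_sketch(Y, C):
    """Return an array of shape (len(Y), C) with a 1 at each label's index."""
    Y = np.asarray(Y, dtype=int).reshape(-1)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(C)[Y]

labels = np.array([0, 3, 4])          # made-up labels
encoded = one_hot_sketch(labels, C=5)
print(encoded)
```

Taking `encoded.argmax(axis=1)` recovers the original integer labels.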


All the data is now ready to be fed into the Emojifier-V1 model. Let's implement the model!

### 1.3 - Implementing Emojifier-V1

As shown in Figure (2), the first step is to convert an input sentence into its word vector representations, which are then averaged together. As in the previous exercise, we will use pretrained 50-dimensional GloVe embeddings. Run the following cell to load the `word_to_vec_map`, which contains all the vector representations.

In [8]:
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

You've loaded:
- `word_to_index`: dictionary mapping from words to their indices in the vocabulary (400,001 words, with the valid indices ranging from 0 to 400,000)
- `index_to_word`: dictionary mapping from indices to their corresponding words in the vocabulary
- `word_to_vec_map`: dictionary mapping words to their GloVe vector representation.

Run the following cell to check if it works.

In [9]:
word = "cucumber"
index = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])

the index of cucumber in the vocabulary is 113317
the 289846th word in the vocabulary is potatos
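
The two index dictionaries are inverses of each other. A minimal sketch of how such a pair could be built from a toy vocabulary is shown below. The `toy_` names are hypothetical, and sorting the words and starting the indices at 0 are assumptions; `read_glove_vecs` may do this differently.

```python
# Toy vocabulary; the real one is read from the GloVe file.
vocab = ["ball", "cucumber", "love", "apple"]

toy_word_to_index = {}
toy_index_to_word = {}
# Assumption: words are indexed in sorted order, starting from 0.
for i, w in enumerate(sorted(vocab)):
    toy_word_to_index[w] = i
    toy_index_to_word[i] = w

print(toy_word_to_index["cucumber"])
print(toy_index_to_word[toy_word_to_index["love"]])
```

Looking a word up in one dictionary and then feeding the result into the other returns the original word.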


**Exercise**: Implement `sentence_to_avg()`. You will need to carry out two steps:
1. Convert every sentence to lower-case, then split the sentence into a list of words. `X.lower()` and `X.split()` might be useful. 
2. For each word in the sentence, access its GloVe representation. Then, average all these values.

In [10]:
def sentence_to_avg(sentence, word_to_vec_map):
    words = sentence.lower().split()
    avg = pt.zeros(50, dtype=pt.float32)
    for w in words:
        avg += pt.tensor(word_to_vec_map[w], dtype=pt.float32)
    avg = avg/len(words)

    return avg
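
One thing to keep in mind: with a plain dictionary lookup, a word that is missing from `word_to_vec_map` raises a `KeyError`, and an empty sentence leads to a division by zero. The sketch below is a NumPy variant that skips unknown words. The name `sentence_to_avg_safe`, the `dim` parameter and the 2-dimensional toy vectors are all made up for illustration; this is not part of the exercise.

```python
import numpy as np

def sentence_to_avg_safe(sentence, word_to_vec_map, dim=50):
    """Average the vectors of known words; return zeros if none are known."""
    words = sentence.lower().split()
    vectors = [word_to_vec_map[w] for w in words if w in word_to_vec_map]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

# Made-up 2-dimensional "embeddings", for illustration only.
toy_map = {
    "i": np.array([1.0, 0.0]),
    "love": np.array([0.0, 1.0]),
    "you": np.array([1.0, 1.0]),
}
avg = sentence_to_avg_safe("I love you zzz", toy_map, dim=2)  # "zzz" is skipped
empty = sentence_to_avg_safe("", toy_map, dim=2)              # falls back to zeros
print(avg, empty)
```

Whether skipping unknown words is the right behavior depends on the application; raising an error can be preferable when you want to notice vocabulary gaps.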

In [11]:
class Emo_Dataset(Dataset):
    def __init__(self, X, Y, word_to_vec_map):
        self.word_to_vec_map = word_to_vec_map
        self.X = X
        self.Y = Y
        super().__init__()
    
    def __getitem__(self, index):
        x = sentence_to_avg(self.X[index], self.word_to_vec_map)  # use the map stored on the dataset, not the global
        y = self.Y[index]
        return x, y
    
    def __len__(self):
        return self.X.shape[0]

In [12]:
trn_ds = Emo_Dataset(X_train, Y_train, word_to_vec_map)
trn_dl = DataLoader(trn_ds, batch_size=1, shuffle=True)
test_ds = Emo_Dataset(X_test, Y_test, word_to_vec_map)
test_dl = DataLoader(test_ds, batch_size=1, shuffle=False)  # evaluation does not need shuffling

#### Model

You now have all the pieces to finish implementing the `model()` function.

**Exercise**: Implement the `model()` function described in Figure (2). Assuming here that $Yoh$ ("Y one hot") is the one-hot encoding of the output labels, the equations you need to implement in the forward pass and to compute the cross-entropy cost are:
$$ z^{(i)} = W \cdot avg^{(i)} + b$$
$$ a^{(i)} = \mathrm{softmax}(z^{(i)})$$
$$ \mathcal{L}^{(i)} = - \sum_{k = 0}^{n_y - 1} Yoh^{(i)}_k \log(a^{(i)}_k)$$

A more efficient vectorized implementation is possible, but since we are already using a for-loop to convert the sentences one at a time into the $avg^{(i)}$ representation, let's not bother this time.
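
To make the three equations concrete, here is a minimal NumPy sketch that evaluates them once with a per-example loop and once in vectorized form. The inputs are random stand-ins rather than real GloVe averages, and the sizes, labels and the `softmax` helper are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_h, n_y = 4, 50, 5                 # toy sizes: 4 examples, 50-d averages, 5 classes
avg = rng.normal(size=(m, n_h))        # stand-in for the averaged GloVe vectors
Y = np.array([0, 2, 4, 1])             # made-up labels
Y_oh = np.eye(n_y)[Y]                  # one-hot labels, shape (m, n_y)
W = rng.normal(size=(n_y, n_h)) * 0.1
b = np.zeros(n_y)

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Per-example loop, following the three equations literally.
loop_losses = []
for i in range(m):
    z = W @ avg[i] + b
    a = softmax(z)
    loop_losses.append(-np.sum(Y_oh[i] * np.log(a)))

# Vectorized version: all examples in one matrix product.
A = softmax(avg @ W.T + b)
vec_losses = -np.sum(Y_oh * np.log(A), axis=1)

print(np.allclose(loop_losses, vec_losses))
```

Because each row of `Y_oh` has a single 1, the sum over $k$ reduces to $-\log$ of the probability assigned to the true class.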

In [13]:
model = nn.Linear(in_features=50, out_features=5).to(device)
nn.init.xavier_uniform_(model.weight)  # Initialize parameters using Xavier initialization
loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class labels; applies log-softmax internally
optimizer = optim.SGD(model.parameters(), lr=0.01)

In [14]:
def compute_accuracy(model, dl):
    model.eval()
    num_examples = len(dl) #length of the data loader is equal to the number of examples since it has a batch size of 1
    correct_preds = 0
    with pt.no_grad():
        for x, y in dl:
            x, y = x.to(device), y.to(device)
            z = model(x)
            pred_cls = pt.softmax(z, dim=-1).argmax()
            correct_preds += (y == pred_cls).item()
    model.train()
    return correct_preds / num_examples
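
A side note on the prediction step: softmax preserves the ordering of the scores, so taking the argmax of the probabilities gives the same class as taking the argmax of the raw logits. The sketch below checks this on made-up scores; the `softmax` helper and the numbers are for illustration only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

logits = np.array([1.2, -0.3, 3.1, 0.0, 2.9])  # made-up scores for 5 classes
probs = softmax(logits)

print(logits.argmax(), probs.argmax())
```

The softmax call is therefore only needed when you want the probabilities themselves, not just the predicted class.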

In [15]:
def train(model, loss_fn, optimizer, trn_dl, num_epochs=400):
    model.train()
    for e in range(num_epochs):
        for x, y in trn_dl:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()    # clear gradients left over from the previous step
            z = model(x)             # logits, shape (1, 5) with batch_size=1
            loss = loss_fn(z, y)
            loss.backward()
            optimizer.step()
        if e % 100 == 0:
            # `loss` is the loss of the last example seen in this epoch, not an epoch average
            print(f'Epoch: {e} --- cost = {loss.item()}')
            accuracy = compute_accuracy(model, trn_dl)
            print(f"Accuracy: {accuracy}")

In [16]:
train(model, loss_fn, optimizer, trn_dl)

Epoch: 0 --- cost = 1.6938440799713135
Accuracy: 0.3106060606060606
Epoch: 100 --- cost = 0.20274090766906738
Accuracy: 0.9318181818181818
Epoch: 200 --- cost = 0.12577247619628906
Accuracy: 0.9545454545454546
Epoch: 300 --- cost = 0.025539398193359375
Accuracy: 0.9696969696969697
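
To see what a single optimizer step does for this kind of model, here is a hand-written NumPy sketch of one SGD update for softmax regression on one example. It uses the standard gradient $dz = a - Yoh$ and is not a description of PyTorch's internals; the input vector, the label and the initial weights are made up.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(1)
n_h, n_y, lr = 50, 5, 0.01             # lr matches the SGD setting used above
x = rng.normal(size=n_h)               # stand-in for one averaged sentence vector
y = 3                                  # made-up label
W = rng.normal(size=(n_y, n_h)) * 0.1
b = np.zeros(n_y)

def loss(W, b):
    # Cross-entropy for a single example with true class y.
    return -np.log(softmax(W @ x + b)[y])

before = loss(W, b)
dz = softmax(W @ x + b) - np.eye(n_y)[y]   # gradient of the loss w.r.t. z
W = W - lr * np.outer(dz, x)               # dL/dW = dz * x^T
b = b - lr * dz                            # dL/db = dz
after = loss(W, b)

print(before, after)
```

For this example the loss after the update is lower than before, which is the behavior the decreasing cost values in the training log reflect over many such steps.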


Great! Your model has pretty high accuracy on the training set. Let's now see how it does on the test set.

In [17]:
print("Training set accuracy:", compute_accuracy(model, trn_dl))
print("Test set accuracy:", compute_accuracy(model, test_dl))

Training set accuracy: 0.9696969696969697
Test set accuracy: 0.8571428571428571


Random guessing would have had 20% accuracy given that there are 5 classes. This is pretty good performance after training on only 127 examples. 

In the training set, the algorithm saw the sentence "*I love you*" with the label ❤️. You can check, however, that the word "adore" does not appear in the training set. Nonetheless, let's see what happens if you write "*I adore you*."



In [90]:
def test_custom_sentences(model, sentences, labels, word_to_vec_map):
    model.eval()
    num_examples = len(sentences)
    correct_preds = 0
    y = pt.tensor(labels).to(device)
    with pt.no_grad():
        for (sentence, label) in zip(sentences, y):
            x = sentence_to_avg(sentence, word_to_vec_map).to(device)
            z = model(x)
            pred_cls = pt.softmax(z, dim=-1).argmax()
            correct_preds += (label == pred_cls).item()
            print(sentence, label_to_emoji(pred_cls.item()))
    
    
    print('\nAccuracy:', correct_preds/num_examples)

In [106]:
X_my_sentences = ["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"]
Y_my_labels = [0, 0, 2, 1, 4, 3]

In [109]:
test_custom_sentences(model, X_my_sentences, Y_my_labels, word_to_vec_map)

i adore you ❤️
i love you ❤️
funny lol 😄
lets play with a ball ⚾
food is ready 🍴
not feeling happy 😞

Accuracy: 1.0
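
A plausible reason the model handles "adore" despite never seeing it in training is that pretrained embeddings tend to place words with similar meanings close together, so the averaged vector for "i adore you" lands near the one for "i love you". The sketch below measures closeness with cosine similarity. The 3-dimensional vectors are made up for illustration and are not real GloVe values, and `cosine_similarity` is a helper defined here.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1 means same direction)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 3-d vectors, for illustration only.
toy = {
    "love": [0.9, 0.8, 0.1],
    "adore": [0.8, 0.9, 0.2],
    "ball": [0.1, -0.2, 0.9],
}

sim_close = cosine_similarity(toy["love"], toy["adore"])
sim_far = cosine_similarity(toy["love"], toy["ball"])
print(sim_close, sim_far)
```

To check this on the real embeddings, you could call the same helper on `word_to_vec_map["love"]` and `word_to_vec_map["adore"]`.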
