# Lab 6 - Sarcasm Detection in Tweets

## Overview
In this lab, we will look at the practical side of figurative language processing. Building on what we learnt in the lecture and in previous labs, we will implement a computational model for sarcasm detection using the SemEval 2022 Task 6 (iSarcasmEval) dataset (https://aclanthology.org/2022.semeval-1.111/). We will focus on:

* Data loading and preprocessing
* Data analysis
* Model development using machine learning and neural networks
* Model Evaluation


## Dataset

The English training set consists of around 4,335 tweets, manually annotated as sarcastic or non-sarcastic by human annotators (native English speakers).
The data can be downloaded from this link: https://sites.google.com/view/semeval2022-isarcasmeval#h.t53li2ejhrh8

More information about how the dataset was prepared can be found in the paper: https://aclanthology.org/2022.semeval-1.111.pdf

**Disclaimer:** Since the dataset contains sarcastic content, you may find some tweets offensive or rude. The lecturers and TAs assume no responsibility for such content.

## Exercise 1: Data Loading and Preprocessing

In [None]:
# Download dataset (train and test sets)

!wget https://raw.githubusercontent.com/iabufarha/iSarcasmEval/main/train/train.En.csv
!wget https://raw.githubusercontent.com/iabufarha/iSarcasmEval/main/test/task_A_En_test.csv

--2023-02-23 20:09:16--  https://raw.githubusercontent.com/iabufarha/iSarcasmEval/main/train/train.En.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 491395 (480K) [text/plain]
Saving to: ‘train.En.csv’


2023-02-23 20:09:16 (11.7 MB/s) - ‘train.En.csv’ saved [491395/491395]

--2023-02-23 20:09:17--  https://raw.githubusercontent.com/iabufarha/iSarcasmEval/main/test/task_A_En_test.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 130890 (128K) [text/plain]
Saving to: ‘task_A_En_test.csv’


2023-02-23 20:09:17 (5.14 MB/s) - ‘

In [None]:
!ls

sample_data  task_A_En_test.csv  train.En.csv



**Ex 1.1:** Load the dataset from the CSV file and select the tweet id, the tweet and the label. In this dataset these are the first three columns: an unnamed id column (which pandas reads as `Unnamed: 0`), `tweet` and `sarcastic`.

In [None]:
# Read train dataset
import pandas as pd
df_train = pd.read_csv('train.En.csv')
df_train.head()

Unnamed: 0.1,Unnamed: 0,tweet,sarcastic,rephrase,sarcasm,irony,satire,understatement,overstatement,rhetorical_question
0,0,The only thing I got from college is a caffein...,1,"College is really difficult, expensive, tiring...",0.0,1.0,0.0,0.0,0.0,0.0
1,1,I love it when professors draw a big question ...,1,I do not like when professors don’t write out ...,1.0,0.0,0.0,0.0,0.0,0.0
2,2,Remember the hundred emails from companies whe...,1,"I, at the bare minimum, wish companies actuall...",0.0,1.0,0.0,0.0,0.0,0.0
3,3,Today my pop-pop told me I was not “forced” to...,1,"Today my pop-pop told me I was not ""forced"" to...",1.0,0.0,0.0,0.0,0.0,0.0
4,4,@VolphanCarol @littlewhitty @mysticalmanatee I...,1,I would say Ted Cruz is an asshole and doesn’t...,1.0,0.0,0.0,0.0,0.0,0.0


In [None]:
# Keep the first three columns (id, tweet and sarcastic) and drop others
df_train = df_train.drop(df_train.columns[3:], axis=1)

# Remove empty rows ('nan' value)
df_train = df_train.dropna()
df_train = df_train.reset_index(drop=True)

df_train.head()

Unnamed: 0.1,Unnamed: 0,tweet,sarcastic
0,0,The only thing I got from college is a caffein...,1
1,1,I love it when professors draw a big question ...,1
2,2,Remember the hundred emails from companies whe...,1
3,3,Today my pop-pop told me I was not “forced” to...,1
4,4,@VolphanCarol @littlewhitty @mysticalmanatee I...,1


In [None]:
df_test = pd.read_csv('task_A_En_test.csv')
df_test = df_test.dropna()
df_test = df_test.reset_index(drop=True)
df_test.head()

Unnamed: 0,text,sarcastic
0,"Size on the the Toulouse team, That pack is mo...",0
1,Pinball!,0
2,So the Scottish Government want people to get ...,1
3,villainous pro tip : change the device name on...,0
4,I would date any of these men 🥺,0


**Ex 1.2:** Pre-process the text as follows: convert hashtags to text, remove usernames, remove urls, convert emojis into textual descriptions. What other pre-processing steps can be implemented? Speculate how they will affect the model performance. 

For this exercise, we use Python's built-in *re* module for regular expressions. A regular expression specifies a set of strings that match it.
You can use this [link](https://regexr.com/) to build and test your regular expressions.
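To see what each pattern used below matches, here is a small sketch that applies the same regular expressions to a made-up tweet (the sample text is invented for illustration). Note that `\S+` is greedy, so a username followed directly by punctuation, such as `@user,`, is removed together with the comma.

```python
import re

# Invented sample tweet, for illustration only
sample = "Loving this #MondayMotivation @boss_man https://t.co/abc123 so much"

# The same patterns as in the preprocessing pipeline, applied with re.findall
hashtags = re.findall(r'#', sample)
usernames = re.findall(r'@\S+', sample)
urls = re.findall(r'https?:\/\/\S+', sample)
print(hashtags, usernames, urls)  # ['#'] ['@boss_man'] ['https://t.co/abc123']

# re.sub replaces every match; removing the username leaves a double space behind
without_users = re.sub(r'@\S+', '', sample)
print(without_users)
```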

In [None]:
!pip install -q emoji
import emoji
import re

# Regex playground: https://regexr.com/

# Lowercase the text
def lowercase(text):
  return text.lower()

# Convert hashtags to text by dropping the '#' symbol ("#blessed" -> "blessed")
def convert_hashtag(text):
  return re.sub(r'#', '', text)

# Remove usernames (an '@' followed by non-whitespace characters)
def remove_username(text):
  return re.sub(r'@\S+', '', text)

# Remove URLs starting with http:// or https://
def remove_url(text):
  return re.sub(r'https?:\/\/\S+', '', text)

# Convert emojis into textual descriptions, e.g. 😂 -> ":face_with_tears_of_joy:"
def convert_emoji(text):
  return emoji.demojize(text)

def text_preprocessing_pipeline(text):
  text = lowercase(text)
  text = convert_hashtag(text)
  text = remove_username(text)
  text = remove_url(text)
  text = convert_emoji(text)
  return text
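As one possible answer to the question in Ex 1.2, here is a sketch of three additional steps. The helper names (`squeeze_elongations`, `replace_numbers`, `normalize_whitespace`) are hypothetical and not part of the lab's pipeline. Keep in mind that elongated words and repeated punctuation can themselves be sarcasm cues, so squeezing them might hurt rather than help; this is worth testing empirically.

```python
import re

# Hypothetical extra steps (not part of the lab's pipeline)
def squeeze_elongations(text: str) -> str:
  # Cap runs of the same character at two: "soooo" -> "soo", "!!!" -> "!!"
  return re.sub(r'(.)\1{2,}', r'\1\1', text)

def replace_numbers(text: str) -> str:
  # Map every digit sequence to a placeholder token
  return re.sub(r'\d+', '<num>', text)

def normalize_whitespace(text: str) -> str:
  # Collapse the gaps left behind by removed usernames/URLs
  return re.sub(r'\s+', ' ', text).strip()

example = "sooooo  happy to wait 3 hours   in line!!!"
for step in (squeeze_elongations, replace_numbers, normalize_whitespace):
  example = step(example)
print(example)  # soo happy to wait <num> hours in line!!
```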

In [None]:
df_train['clean_tweet'] = df_train['tweet'].apply(text_preprocessing_pipeline)
df_test['clean_tweet'] = df_test["text"].apply(text_preprocessing_pipeline)
df_train.head()

Unnamed: 0.1,Unnamed: 0,tweet,sarcastic,clean_tweet
0,0,The only thing I got from college is a caffein...,1,the only thing i got from college is a caffein...
1,1,I love it when professors draw a big question ...,1,i love it when professors draw a big question ...
2,2,Remember the hundred emails from companies whe...,1,remember the hundred emails from companies whe...
3,3,Today my pop-pop told me I was not “forced” to...,1,today my pop-pop told me i was not “forced” to...
4,4,@VolphanCarol @littlewhitty @mysticalmanatee I...,1,"i did too, and i also reported cancun cruz ..."


## Exercise 2: Data Analysis


**Ex 2.1:** Compare the vocabulary of sarcastic and non-sarcastic tweets by:

* splitting the dataset into sarcastic and non-sarcastic sentences using the gold labels;
* generating the 25 most frequent noun phrases for each set.

In [None]:
train_sarc = []
train_non_sarc = []
for index, tweet in enumerate(df_train["clean_tweet"]):
  if df_train["sarcastic"][index] == 1:
    train_sarc.append(tweet)
  else:
    train_non_sarc.append(tweet)
print(train_non_sarc[:10])
print(train_sarc[:10])

["i always think going braless is a good idea until i'm in public and am insecure because i'm not wearing a bra", 'life is so much better with a heating blanket', 'sometimes i just go through my phone and look at pictures of my dog', 'was not back in the states for even 5 minutes before someone ran into me at the airport with their suitcase and said “ope, sorry!” \n\ni’m home :-)', 'in desperate need of (and i can not stress this enough) spring break', "i've said it before and i'll say it again but your mental health is so much more important than a good grade", 'i couldn’t have imagined how much fun i would have with people i’ve met through streaming. i’m so thankful to live in this timeline even if it is relatively garbage.', 'woke up to my dog sneezing on my face. how’s your day going so far?', 'does the salad offset the beer(s)?', 'why do i have a doctorate and miss the restaurant industry so much']
['the only thing i got from college is a caffeine addiction', 'i love it when profe

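In this exercise, we use the *spaCy* library to extract noun phrases from the sarcastic and non-sarcastic sets. Note that the code counts only the root (head) noun of each noun chunk; for example, the chunk "my dog" is counted as `dog`.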

In [None]:
import spacy
nlp = spacy.load("en_core_web_sm")
dict_noun_sarc = {}

for review in nlp.pipe(train_sarc):
  chunks = [(chunk.root.text) for chunk in review.noun_chunks if chunk.root.pos_ == 'NOUN']
  for term in chunks:
    if dict_noun_sarc.get(term) is None:
      dict_noun_sarc[term] = 1
    else:
      dict_noun_sarc[term] +=1
dict_noun_sarc = dict(sorted(dict_noun_sarc.items(), key=lambda x:x[1], reverse=True))

dict_noun_non_sarc = {}
for review in nlp.pipe(train_non_sarc):
  chunks = [(chunk.root.text) for chunk in review.noun_chunks if chunk.root.pos_ == 'NOUN']
  for term in chunks:
    if dict_noun_non_sarc.get(term) is None:
      dict_noun_non_sarc[term] = 1
    else:
      dict_noun_non_sarc[term] +=1
dict_noun_non_sarc = dict(sorted(dict_noun_non_sarc.items(), key=lambda x:x[1], reverse=True))
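The dictionary bookkeeping above can be written more compactly with `collections.Counter`, whose `most_common(n)` replaces the manual sort. A minimal sketch on toy data (the nested list is a hypothetical stand-in for the noun-chunk roots that spaCy returns per tweet):

```python
from collections import Counter

# Hypothetical stand-in for the noun-chunk roots of three tweets
tweet_nouns = [["day", "work"], ["day", "coffee"], ["work", "day"]]

noun_counts = Counter()
for nouns in tweet_nouns:
  noun_counts.update(nouns)  # add one count per occurrence

top_nouns = [noun for noun, _ in noun_counts.most_common(2)]
print(top_nouns)  # ['day', 'work']
```

In the lab, the inner update would become `noun_counts.update(chunk.root.text for chunk in doc.noun_chunks if chunk.root.pos_ == 'NOUN')` for each `doc` yielded by `nlp.pipe`, followed by `most_common(25)`.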



In [None]:
print(list(dict_noun_sarc.keys())[:25])
print(list(dict_noun_non_sarc.keys())[:25])

['people', 'day', 'time', 'life', 'thing', 'man', 'work', 'thanks', 'friends', 'things', 'game', 'season', 'fun', 'money', 'family', 'way', 'love', 'am', 'dog', 'shit', 'year', 'days', 'house', 'school', 'face']
['people', 'time', 'life', 'day', 'thing', 'things', 'friends', 'school', 'work', 'year', 'way', 'love', 'years', 'money', 'job', 'person', 'girl', 'man', 'days', 'world', 'one', 'shit', 'phone', 'dog', 'game']


**Ex 2.2:** Do you see any patterns that differentiate sarcastic from non-sarcastic language?

**Ex 2.3:** How many sarcastic and non-sarcastic sentences are there in the labelled training data? How will the label distribution affect the model's ability to predict sarcastic language?

In [None]:
print("Number of sarcastic sentences in the labelled training data:", len(train_sarc))
print("Number of non-sarcastic sentences in the labelled training data:", len(train_non_sarc))

Number of sarcastic sentences in the labelled training data: 867
Number of non-sarcastic sentences in the labelled training data: 2600
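The imbalance can be quantified with a few lines of arithmetic. The training counts come from the output above; the test counts (1,200 non-sarcastic and 200 sarcastic tweets) are the `support` values in the classification reports of Exercise 3. The sketch also computes the accuracy of a trivial baseline that always predicts the majority class.

```python
# Training counts printed above and test supports from the Exercise 3 reports
n_sarc_train, n_non_sarc_train = 867, 2600
n_sarc_test, n_non_sarc_test = 200, 1200

sarc_share_train = n_sarc_train / (n_sarc_train + n_non_sarc_train)
imbalance_ratio = n_non_sarc_train / n_sarc_train

# Accuracy of a "classifier" that always predicts non-sarcastic on the test set
majority_baseline_acc = n_non_sarc_test / (n_sarc_test + n_non_sarc_test)

print(f"sarcastic share (train): {sarc_share_train:.3f}")          # 0.250
print(f"non-sarcastic : sarcastic ratio: {imbalance_ratio:.2f}")   # 3.00
print(f"majority-class test accuracy: {majority_baseline_acc:.3f}")  # 0.857
```

A test accuracy of about 0.86 is therefore achievable without detecting any sarcasm at all, which is worth keeping in mind when reading the accuracy rows of the reports below.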


## Exercise 3: Classification Model Design

We will focus here on sub-task A of SemEval 2022 Task 6 (iSarcasmEval) on sarcasm detection. We can formulate this as a sentence-level binary classification task.

**Ex 3.1:** Implement two different ML-based classifiers (Random Forest, SVM) to classify a given sentence as sarcastic or non-sarcastic using the training data. Generate and save predictions on the test set for each model.

In [None]:
df_train

Unnamed: 0.1,Unnamed: 0,tweet,sarcastic,clean_tweet
0,0,The only thing I got from college is a caffein...,1,the only thing i got from college is a caffein...
1,1,I love it when professors draw a big question ...,1,i love it when professors draw a big question ...
2,2,Remember the hundred emails from companies whe...,1,remember the hundred emails from companies whe...
3,3,Today my pop-pop told me I was not “forced” to...,1,today my pop-pop told me i was not “forced” to...
4,4,@VolphanCarol @littlewhitty @mysticalmanatee I...,1,"i did too, and i also reported cancun cruz ..."
...,...,...,...,...
3462,3463,The population spike in Chicago in 9 months is...,0,the population spike in chicago in 9 months is...
3463,3464,You'd think in the second to last English clas...,0,you'd think in the second to last english clas...
3464,3465,I’m finally surfacing after a holiday to Scotl...,0,i’m finally surfacing after a holiday to scotl...
3465,3466,Couldn't be prouder today. Well done to every ...,0,couldn't be prouder today. well done to every ...


In [None]:
df_test

Unnamed: 0,text,sarcastic
0,"Size on the the Toulouse team, That pack is mo...",0
1,Pinball!,0
2,So the Scottish Government want people to get ...,1
3,villainous pro tip : change the device name on...,0
4,I would date any of these men 🥺,0
...,...,...
1395,I’ve just seen this and felt it deserved a Ret...,0
1396,Omg how an earth is that a pen !!! 🤡,0
1397,Bringing Kanye and drake to a tl near you,0
1398,"I love it when women are referred to as ""girl ...",1


In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(stop_words="english")

# Learn the vocabulary and IDF weights on the training tweets only
tfidf_vectorizer.fit(df_train["clean_tweet"])

# Transform both sets with the same preprocessing (clean_tweet) and the fitted vectorizer
Xtrain_tfidf = tfidf_vectorizer.transform(df_train["clean_tweet"])
Xtest_tfidf = tfidf_vectorizer.transform(df_test["clean_tweet"])

# Gold labels
ytrain_label = df_train["sarcastic"].tolist()
ytest_label = df_test["sarcastic"].tolist()

Train a **Random Forest classifier** on the training set and predict labels for the test set. Finally, report the precision, recall and F1-score of the classification model.

In [None]:
from sklearn.metrics import classification_report
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()

rfc.fit(Xtrain_tfidf, ytrain_label)
ytest_label_pred = rfc.predict(Xtest_tfidf)
print(classification_report(ytest_label, ytest_label_pred, zero_division=0, target_names=["non-sarcastic", "sarcastic"]))

               precision    recall  f1-score   support

non-sarcastic       0.86      0.97      0.91      1200
    sarcastic       0.22      0.06      0.09       200

     accuracy                           0.84      1400
    macro avg       0.54      0.51      0.50      1400
 weighted avg       0.77      0.84      0.79      1400
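The two average rows in this report can be reproduced by hand from the per-class rows, which shows why they differ so much. A minimal sketch using the rounded F1 scores and supports printed above:

```python
# Per-class F1 scores and supports copied from the Random Forest report above
f1 = {"non-sarcastic": 0.91, "sarcastic": 0.09}
support = {"non-sarcastic": 1200, "sarcastic": 200}

# Macro average: unweighted mean over classes
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class weighted by its number of test examples
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] / total for c in f1)

print(round(macro_f1, 2), round(weighted_f1, 2))  # 0.5 0.79
```

Because the weighted average is dominated by the 1,200 non-sarcastic tweets, it stays high even though the sarcastic class is barely detected; the macro average and the sarcastic-class F1 expose that weakness.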



Train an **SVM classifier** on the training set and predict labels for the test set. Finally, report the precision, recall and F1-score of the classification model.

In [None]:
from sklearn.svm import SVC

# Import the class directly so the name `svm` does not shadow the sklearn.svm module
svm = SVC()
svm.fit(Xtrain_tfidf, ytrain_label)
ytest_label_pred = svm.predict(Xtest_tfidf)
print(classification_report(ytest_label, ytest_label_pred, zero_division=0, target_names=["non-sarcastic", "sarcastic"]))

               precision    recall  f1-score   support

non-sarcastic       0.86      1.00      0.92      1200
    sarcastic       0.00      0.00      0.00       200

     accuracy                           0.86      1400
    macro avg       0.43      0.50      0.46      1400
 weighted avg       0.73      0.86      0.79      1400



**Oversampling for Imbalanced Datasets**

An imbalanced dataset is one with skewed class proportions: one class makes up a large share of the examples (the majority class) and another a much smaller share (the minority class). One solution to this challenge is [SMOTE (Synthetic Minority Over-sampling Technique)](https://arxiv.org/abs/1106.1813), which generates synthetic minority-class examples by interpolating between a minority example and one of its nearest minority-class neighbours.

For this exercise, we use the simpler
[RandomOverSampler](https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.RandomOverSampler.html) from the imbalanced-learn library, which oversamples the minority (sarcastic) class by randomly duplicating existing examples until both classes have the same size.
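To make the difference between the two techniques concrete, here is a minimal NumPy sketch of SMOTE-style interpolation on a dense toy matrix. The function `smote_like_samples` is hypothetical and simplified; it is not imbalanced-learn's implementation (that is available as `imblearn.over_sampling.SMOTE`). `RandomOverSampler`, by contrast, would simply copy existing minority rows unchanged.

```python
import numpy as np

def smote_like_samples(X_min: np.ndarray, n_new: int, k: int = 3, seed: int = 0) -> np.ndarray:
  """Hypothetical SMOTE-style sketch for a dense minority-class matrix X_min."""
  rng = np.random.default_rng(seed)
  # Pairwise Euclidean distances between minority samples
  dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
  np.fill_diagonal(dists, np.inf)                # a point is not its own neighbour
  neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest minority neighbours

  new_rows = []
  for _ in range(n_new):
    i = rng.integers(len(X_min))         # random minority sample
    j = neighbours[i, rng.integers(k)]   # one of its k nearest neighbours
    gap = rng.random()                   # position along the segment, in [0, 1)
    new_rows.append(X_min[i] + gap * (X_min[j] - X_min[i]))
  return np.array(new_rows)

X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_samples(X_minority, n_new=5, k=2)
print(synthetic.shape)  # (5, 2)
```

Every synthetic point lies on a segment between two real minority points, so here all of them fall inside the unit square spanned by the toy data.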

In [None]:
from collections import Counter
Counter(ytrain_label)

Counter({1: 867, 0: 2600})

In [None]:
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=42)
Xtrain_ros, ytrain_ros = ros.fit_resample(Xtrain_tfidf, ytrain_label)
Counter(ytrain_ros)

Counter({1: 2600, 0: 2600})

In [None]:
rfc.fit(Xtrain_ros, ytrain_ros)
ytest_ros_pred = rfc.predict(Xtest_tfidf)
print(classification_report(ytest_label, ytest_ros_pred, zero_division=0, target_names=["non-sarcastic", "sarcastic"]))

               precision    recall  f1-score   support

non-sarcastic       0.88      0.92      0.90      1200
    sarcastic       0.32      0.22      0.26       200

     accuracy                           0.82      1400
    macro avg       0.60      0.57      0.58      1400
 weighted avg       0.80      0.82      0.81      1400



In [None]:
svm.fit(Xtrain_ros, ytrain_ros)
ytest_ros_pred = svm.predict(Xtest_tfidf)
print(classification_report(ytest_label, ytest_ros_pred, zero_division=0, target_names=["non-sarcastic", "sarcastic"]))

               precision    recall  f1-score   support

non-sarcastic       0.87      0.97      0.92      1200
    sarcastic       0.36      0.10      0.16       200

     accuracy                           0.85      1400
    macro avg       0.61      0.54      0.54      1400
 weighted avg       0.79      0.85      0.81      1400



**Ex 3.2:** Construct a simple deep neural network-based classifier using (context-independent) GloVe word embeddings pretrained on tweets (https://nlp.stanford.edu/data/glove.twitter.27B.zip). The model architecture should consist of a Bi-LSTM layer followed by a multi-layer perceptron for classification, with ReLU activations for the hidden layers and a sigmoid for the output layer.

**Ex 3.2.1: Extracting GloVe Embeddings** To get started, we'll first install the [Gensim library](https://radimrehurek.com/gensim/auto_examples/index.html#documentation), which provides an API for downloading and loading pretrained GloVe models. Once it is installed, we can load the GloVe Twitter model with `api.load` from `gensim.downloader`. We use `glove-twitter-25`, whose embeddings have 25 dimensions. A full list of supported models can be found here: https://github.com/RaRe-Technologies/gensim-data

In [None]:
!pip install -q --upgrade gensim

In [None]:
import gensim.downloader as api
glove = api.load("glove-twitter-25")

We can retrieve the pretrained (global) embedding of a given word as follows:

In [None]:
glove["sad"]

array([ 0.044072, -0.19031 ,  0.44185 , -0.15418 , -0.6026  ,  0.04668 ,
        1.4741  ,  0.14376 , -0.72328 ,  0.43288 , -1.7557  ,  0.41221 ,
       -4.0419  ,  0.40469 , -0.17825 ,  0.83272 ,  0.64866 ,  0.12397 ,
       -0.17873 , -0.59851 ,  0.67779 ,  1.0177  , -0.31664 ,  0.18662 ,
       -0.17645 ], dtype=float32)

However, not all words are present in the GloVe vocabulary. If you try to retrieve a vector for an out-of-vocabulary word, you'll get a `KeyError`. When dealing with out-of-vocabulary terms, we can either drop them during feature extraction or map them all to an `UNK` token. What are some pros and cons of these two strategies?
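The two strategies can be compared on a toy vocabulary. In this sketch, `toy_vocab`, `UNK_ID` and both helper functions are hypothetical stand-ins; with the real model, the vocabulary would be the GloVe word-to-id mapping, and the `UNK` id would need its own row in the embedding matrix.

```python
from typing import Dict, List

# Hypothetical toy vocabulary standing in for the GloVe word-to-id mapping
toy_vocab: Dict[str, int] = {"i": 0, "love": 1, "mondays": 2}
UNK_ID = len(toy_vocab)  # hypothetical id reserved for unknown words

def encode_drop(text: str, vocab: Dict[str, int]) -> List[int]:
  # Strategy 1: silently skip out-of-vocabulary tokens
  return [vocab[tok] for tok in text.split() if tok in vocab]

def encode_unk(text: str, vocab: Dict[str, int], unk_id: int) -> List[int]:
  # Strategy 2: map every out-of-vocabulary token to a shared UNK id
  return [vocab.get(tok, unk_id) for tok in text.split()]

tweet = "i soooo love mondays"
print(encode_drop(tweet, toy_vocab))         # [0, 1, 2]
print(encode_unk(tweet, toy_vocab, UNK_ID))  # [0, 3, 1, 2]
```

Dropping shortens the sequence and hides the fact that something was there; mapping to `UNK` keeps positions intact (e.g. for the elongated `soooo`, which may be a sarcasm cue), but all unknown words then share a single vector.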



We'll create a `vocab2id` dictionary that maps each word in the GloVe vocabulary to an integer called the token id. Each token id is the row of that word's vector in the embedding matrix (`glove.vectors`). In Gensim versions before 4.0, this mapping was exposed through the `.vocab` attribute, a dictionary from each word to its vocabulary entry. Representing a text as a list of token ids lets us pass it to the model.

Note: As of Gensim 4.0, this mapping is available directly through `glove.key_to_index`.

In [None]:
vocab2id = glove.key_to_index

Next we'll encode our text inputs using the token ids which correspond to the GloVe vocabulary. We create a function which takes in a string and returns the encoded token ids using the `vocab2id` mapping. If we run across words not in the vocabulary, we'll drop them.

In [None]:
from typing import List 

def encode_text(text: str, vocab2id: dict) -> List[int]:
  """
  Function takes in a text and a vocab to id mapping. Return
  back a list of token ids corresponding to each word in the
  input text.
  """
  # 1. Tokenize text using white space splitting
  toks = text.split()

  # 2. Encode text. 
  encoded_ids = [vocab2id[tok] for tok in toks if tok in vocab2id]
  return encoded_ids

sample_tweet = "working on deep learning is greaat"
print(encode_text(sample_tweet, vocab2id))

[1135, 46, 2034, 4327, 32, 168369]


Let's go ahead and apply this function to the `clean_tweet` column of both the train and test dataframes. We'll create a new column called `encoded_text` that will contain the token ids.

In [None]:
df_train["encoded_text"] = df_train["clean_tweet"].apply(lambda x: encode_text(x, vocab2id))
df_test["encoded_text"] = df_test["clean_tweet"].apply(lambda x: encode_text(x, vocab2id))

df_train.head(5)

Unnamed: 0.1,Unnamed: 0,tweet,sarcastic,clean_tweet,encoded_text
0,0,The only thing I got from college is a caffein...,1,the only thing i got from college is a caffein...,"[13, 214, 410, 10, 143, 133, 1502, 32, 11, 224..."
1,1,I love it when professors draw a big question ...,1,i love it when professors draw a big question ...,"[10, 68, 33, 92, 42029, 4776, 11, 398, 1411, 2..."
2,2,Remember the hundred emails from companies whe...,1,remember the hundred emails from companies whe...,"[626, 13, 9759, 14129, 133, 9923, 92, 1302, 39..."
3,3,Today my pop-pop told me I was not “forced” to...,1,today my pop-pop told me i was not “forced” to...,"[148, 29, 677166, 813, 21, 10, 93, 78, 16, 111..."
4,4,@VolphanCarol @littlewhitty @mysticalmanatee I...,1,"i did too, and i also reported cancun cruz ...","[10, 195, 26, 10, 894, 12302, 23433, 5965, 78,..."


It's a good idea to keep a separate validation set that we can use during training to check how well the model is converging and to detect overfitting. We'll first split off 15% of `df_train` into a separate, stratified validation set.

In [None]:
from sklearn.model_selection import train_test_split

train, val = train_test_split(
    df_train, 
    test_size=.15, 
    stratify = df_train["sarcastic"],
    random_state=2023
  )
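As a quick sanity check on the split sizes: with a float `test_size`, scikit-learn rounds the number of held-out examples up, so 15% of the 3,467 training tweets (867 + 2,600, see Ex 2.3) gives:

```python
import math

n_samples = 3467   # 867 sarcastic + 2600 non-sarcastic tweets after dropna
test_size = 0.15

# The held-out count is rounded up; the rest goes to training
n_val = math.ceil(test_size * n_samples)
n_train = n_samples - n_val
print(n_train, n_val)  # 2946 521
```

These numbers match the first dimension of the padded train and validation tensors printed further below. With `stratify`, each class contributes roughly 15% of its examples to the validation set.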

Finally, we can create the `Dataset` and `DataLoader` objects that will be used to generate batches for training. As the sequences have different lengths, we first need to bring them to a common length by right-padding shorter sequences with a padding token id. We'll use the PyTorch function [`pad_sequence`](https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pad_sequence.html), which pads every sequence in a list to the length of the longest sequence it finds. Note that `pad_sequence` does not truncate, so each split is padded to its own maximum length (see the shapes printed below); if you want a fixed length, truncate longer sequences yourself before padding.
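If you prefer a fixed maximum length instead of padding to the longest sequence, a minimal sketch of the pad-or-truncate step in plain Python could look like this. The helper `pad_or_truncate`, the cut-off `max_len` and `PAD_ID = 99` are hypothetical; in this lab the pad id would be `pad_token_id`.

```python
from typing import List

def pad_or_truncate(ids: List[int], max_len: int, pad_id: int) -> List[int]:
  """Hypothetical helper: cut off tokens beyond max_len, then right-pad with pad_id."""
  return ids[:max_len] + [pad_id] * max(0, max_len - len(ids))

PAD_ID = 99  # hypothetical pad id for this example
print(pad_or_truncate([5, 8, 13], max_len=5, pad_id=PAD_ID))             # [5, 8, 13, 99, 99]
print(pad_or_truncate([1, 2, 3, 4, 5, 6, 7], max_len=5, pad_id=PAD_ID))  # [1, 2, 3, 4, 5]
```

Each padded list can then be turned into a tensor with `torch.LongTensor(...)` and the results combined with `torch.stack` instead of calling `pad_sequence`.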

We also need to specify the padding value, which is traditionally zero. However, id 0 is already assigned to a word in our GloVe vocabulary, so we create a new pad token with an all-zero embedding and append it to the end of the vocabulary and of the embedding matrix. The id of this pad token is `len(vocab2id)`, i.e. the vocabulary size before the token is added.
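The bookkeeping for the new pad token can be sketched with NumPy on a toy embedding matrix (`vectors` is a hypothetical stand-in for `glove.vectors`). The pad id equals the number of rows before the zero row is appended, so no existing word id changes. The `SarcasmClassifier` defined later performs the same concatenation on the real GloVe weights.

```python
import numpy as np

# Hypothetical stand-in for glove.vectors: 3 words with 4-dimensional embeddings
vectors = np.arange(12, dtype=np.float32).reshape(3, 4)

pad_id = vectors.shape[0]  # vocabulary size before adding the pad token
padding_row = np.zeros((1, vectors.shape[1]), dtype=vectors.dtype)
vectors_with_pad = np.concatenate([vectors, padding_row], axis=0)

print(pad_id, vectors_with_pad.shape)  # 3 (4, 4)
print(vectors_with_pad[pad_id])        # [0. 0. 0. 0.]
```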

In [None]:
import torch
from torch.nn.utils.rnn import pad_sequence

pad_token_id = len(vocab2id)

# Train Inputs 
train_inputs = [torch.LongTensor(encoded_text) for encoded_text in train["encoded_text"].tolist()]
train_inputs = pad_sequence(
    train_inputs, 
    batch_first=True, 
    padding_value = pad_token_id)
print(train_inputs.shape)

# Val inputs
val_inputs = [torch.LongTensor(encoded_text) for encoded_text in val["encoded_text"].tolist()]
val_inputs = pad_sequence(
    val_inputs, 
    batch_first=True, 
    padding_value = pad_token_id)
print(val_inputs.shape)

# Test inputs
test_inputs = [torch.LongTensor(encoded_text) for encoded_text in df_test["encoded_text"].tolist()]
test_inputs = pad_sequence(
    test_inputs, 
    batch_first=True, 
    padding_value = pad_token_id)
print(test_inputs.shape)



torch.Size([2946, 55])
torch.Size([521, 49])
torch.Size([1400, 117])


Let's go ahead and create the final `TensorDataset` and `DataLoader` objects for training, validation and testing.

In [None]:
from torch.utils.data import DataLoader, TensorDataset
# First extract the labels
train_labels = torch.FloatTensor(train["sarcastic"].tolist())
val_labels = torch.FloatTensor(val["sarcastic"].tolist())
test_labels = torch.FloatTensor(df_test["sarcastic"].tolist())

# Data loaders
train_dl = DataLoader(
    TensorDataset(
      train_inputs,
      train_labels,   
    ),
    shuffle=True,
    batch_size = 64
)

val_dl = DataLoader(
    TensorDataset(
        val_inputs,
        val_labels
    ),
    shuffle=False,
    batch_size=64
)

test_dl = DataLoader(
    TensorDataset(
        test_inputs,
        test_labels
    ),
    shuffle=False,
    batch_size=64
)


**Ex 3.2.2: Developing the BiLSTM Classifier**
In this section we focus on the architecture of our classification model. At a high level, our model will consist of the following:
- Embedding layer: initialized with the pretrained GloVe weights
- BiLSTM layer: a neural architecture that models the sequential dependencies in text
- Classification head: a multi-layer perceptron that maps the hidden representation to our classification space (a binary outcome)


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

class SarcasmClassifier(nn.Module):

  def __init__(
      self,
      glove,                        # gensim KeyedVectors holding the pretrained GloVe model
      lstm_input_dim: int = 25,     # Dimension of GloVe embeddings
      lstm_hidden_dim: int = 64,    # Hidden dim for LSTM
      lstm_num_layers: int = 2,     # Number of LSTM layers
      dropout: float = 0.5,         # Dropout rate in the classification head
    ):
    super().__init__()
    # Extract GloVe weights and append an all-zero row for the pad token (id = len(vocab2id))
    glove_weights = glove.vectors
    padding_tensor = np.zeros((1, lstm_input_dim), dtype=glove_weights.dtype)
    glove_weights = np.concatenate((glove_weights, padding_tensor), axis=0)
    glove_weights = torch.FloatTensor(glove_weights)

    # Architecture
    # from_pretrained freezes the embedding weights by default (freeze=True)
    self.embedding_layer = nn.Embedding.from_pretrained(glove_weights)
    self.bilstm_layer = nn.LSTM(
        input_size = lstm_input_dim,
        hidden_size = lstm_hidden_dim,
        num_layers = lstm_num_layers,
        bidirectional=True,
        batch_first=True
    )
    # MLP head: (2 * hidden) -> (2 * hidden) // 4 -> 1
    self.linear1 = nn.Linear(lstm_hidden_dim * 2, (lstm_hidden_dim * 2) // 4)
    self.dropout = nn.Dropout(p=dropout)  # active only in model.train() mode
    self.linear2 = nn.Linear((lstm_hidden_dim * 2) // 4, 1)

  def forward(self, X: torch.Tensor) -> torch.Tensor:

    # 1. Retrieve embeddings: (batch, seq_len) -> (batch, seq_len, emb_dim)
    embedding = self.embedding_layer(X)

    # 2. Run the BiLSTM; hidden_state has shape (num_layers * 2, batch, hidden_dim)
    lstm_out, (hidden_state, cell_state) = self.bilstm_layer(embedding)
    # Concatenate the last layer's backward (-1) and forward (-2) final states
    lstm_feats = torch.cat((hidden_state[-1, :, :], hidden_state[-2, :, :]), dim=1)

    # 3. Map the LSTM representation to the output space
    l1 = F.relu(self.dropout(self.linear1(lstm_feats)))
    # Return raw logits; the sigmoid is applied by BCEWithLogitsLoss and at prediction time
    logits = self.linear2(l1)

    return logits
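One subtlety of this design: `pad_sequence` right-pads each batch, so the LSTM's forward direction keeps reading pad embeddings after a short tweet ends, and its final hidden state reflects the padding (the zero pad vector still passes through the LSTM's weights and biases). A common remedy, not used in this lab, is to pack the batch with `torch.nn.utils.rnn.pack_padded_sequence`. The toy sketch below (the sizes, token ids and `PAD_ID` are hypothetical) shows that with packing, the features of a padded tweet match those of the same tweet run without padding. Tweets with no in-vocabulary tokens have length 0, which packing rejects, so such rows would need to be filtered out or have their length clamped to 1.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
PAD_ID = 9  # hypothetical pad id, playing the role of pad_token_id

embedding = nn.Embedding(10, 4)
lstm = nn.LSTM(input_size=4, hidden_size=3, num_layers=2, bidirectional=True, batch_first=True)

# Two toy tweets, right-padded to length 5 (true lengths 5 and 2)
X = torch.tensor([[1, 2, 3, 4, 5],
                  [6, 7, PAD_ID, PAD_ID, PAD_ID]])
lengths = (X != PAD_ID).sum(dim=1)  # tensor([5, 2]); every length must be >= 1

# enforce_sorted=False lets PyTorch sort by length and restore the original order in h_n
packed = pack_padded_sequence(embedding(X), lengths.cpu(), batch_first=True, enforce_sorted=False)
_, (h_n, _) = lstm(packed)

# h_n: (num_layers * 2, batch, hidden); last layer forward = h_n[-2], backward = h_n[-1]
feats = torch.cat((h_n[-2], h_n[-1]), dim=1)
print(feats.shape)  # torch.Size([2, 6])
```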


**Ex 3.3:** Train this model for 50 epochs. Generate and save the model predictions on the test set. 

**Ex 3.3.1: Training Preparation**
Prior to training, we'll need to do the following:
- initialize an optimizer 
- initialize a loss function
- freeze the embedding layer weights for the model

In [None]:
from torch.optim import AdamW

model = SarcasmClassifier(
    glove,
    lstm_input_dim=25,
    lstm_num_layers=2,
    lstm_hidden_dim=256,
  )

# Establish loss criterion (applies the sigmoid internally, so the model outputs logits)
criterion = nn.BCEWithLogitsLoss()

# Freeze embedding layer (already frozen by nn.Embedding.from_pretrained; kept to be explicit)
for name, param in model.named_parameters():
  if 'embedding_layer' in name:
    param.requires_grad = False

# Initialize optimizer with the trainable model parameters only
optimizer = AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-3)
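Ex 3.2 asks for a sigmoid on the output layer, yet `SarcasmClassifier` returns raw logits. This works because `nn.BCEWithLogitsLoss` combines the sigmoid and the binary cross-entropy in one numerically stable expression. The plain-Python sketch below (both function names are hypothetical) checks that the two formulations give the same loss.

```python
import math

def bce_with_logits(z: float, y: float) -> float:
  # Stable form of BCE(sigmoid(z), y): max(z, 0) - z*y + log(1 + exp(-|z|))
  return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def bce_after_sigmoid(z: float, y: float) -> float:
  # Naive form: apply the sigmoid first, then binary cross-entropy
  p = 1.0 / (1.0 + math.exp(-z))
  return -(y * math.log(p) + (1 - y) * math.log(1 - p))

for z, y in [(2.0, 1.0), (-1.5, 1.0), (0.3, 0.0)]:
  print(round(bce_with_logits(z, y), 6), round(bce_after_sigmoid(z, y), 6))
```

The naive version would fail for large logits such as `z = 50`, `y = 0`, because the sigmoid rounds to exactly 1.0 and `log(0)` is undefined. At prediction time, thresholding `torch.sigmoid(logits)` at 0.5, as the loops below do, is equivalent to checking `logits > 0`.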

Finally, we implement our training loop. At a high level, the loop must do the following for each batch:
- zero out the optimizer gradients
- perform a forward pass on the batch inputs
- calculate the loss
- backpropagate the loss through the network
- update the weights with an optimizer step
In [None]:
import torch.nn.functional as F
from sklearn.metrics import f1_score

num_epochs = 25  # Ex 3.3 asks for 50 epochs; the output below was produced with 25
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model.to(device)
for epoch in range(num_epochs):

  # Set model to train mode (enables dropout)
  model.train()

  running_loss_train = 0.0
  num_batches_train = 0

  # Iterate over train batches
  for batch in train_dl:

    inputs, labels = batch[0].to(device), batch[1].to(device)

    optimizer.zero_grad()           # 1. Zero out the optimizer gradients
    logits = model(inputs)          # 2. Forward pass on the model

    loss = criterion(logits, labels.view(-1,1)) # 3. Calculate loss
    loss.backward()                             # 4. Backprop loss
    optimizer.step()                            # 5. Optimizer step

    running_loss_train += loss.item()
    num_batches_train += 1
  train_loss = round(running_loss_train / num_batches_train, 3)

  if epoch % 5 == 0:
    # Evaluate the model on the validation set every 5 epochs
    model.eval() # Set to eval mode (disables dropout)

    running_loss_val = 0
    num_batches_val = 0
    all_preds = []
    all_gold  = []
    with torch.no_grad():  # no gradients needed for evaluation
      for batch in val_dl:
        inputs, labels = batch[0].to(device), batch[1].to(device)

        logits = model(inputs)
        loss = criterion(logits, labels.view(-1,1))

        running_loss_val += loss.item()
        num_batches_val += 1

        # Threshold the sigmoid probabilities at 0.5
        preds = [round(pred.item()) for pred in torch.sigmoid(logits).view(-1)]
        all_preds.extend(preds)
        all_gold.extend(labels.tolist())

    val_loss = round(running_loss_val / num_batches_val, 3)
    val_f1 = f1_score(all_gold, all_preds)

    print(f"Epoch {epoch}: Train loss {train_loss} | Val Loss: {val_loss} | Val F1: {val_f1}")

Epoch 0: Train loss 0.129 | Val Loss: 1.258 | Val F1: 0.27118644067796616
Epoch 5: Train loss 0.058 | Val Loss: 1.736 | Val F1: 0.2678571428571429
Epoch 10: Train loss 0.029 | Val Loss: 1.946 | Val F1: 0.31034482758620696
Epoch 15: Train loss 0.008 | Val Loss: 2.306 | Val F1: 0.29694323144104806
Epoch 20: Train loss 0.003 | Val Loss: 2.886 | Val F1: 0.31034482758620696


In [None]:
from sklearn.metrics import classification_report

model.eval()
all_preds = []
all_gold  = []
for batch in test_dl:
  inputs, labels = batch[0].to(device), batch[1].to(device)

  logits = model.forward(inputs)  
  loss = criterion(logits, labels.view(-1,1)) # 3. Calculate loss

  running_loss_val += loss.item()
  num_batches_val += 1

  preds = [round(pred.item()) for pred in torch.sigmoid(logits).view(-1).detach()]
  all_preds.extend(preds)
  all_gold.extend(labels.tolist())

print(classification_report(all_gold, all_preds))


              precision    recall  f1-score   support

         0.0       0.86      0.80      0.83      1200
         1.0       0.17      0.25      0.20       200

    accuracy                           0.72      1400
   macro avg       0.52      0.52      0.52      1400
weighted avg       0.77      0.72      0.74      1400



## Exercise 4: Evaluation

**Ex 4.1:** What quantitative metrics do you think would be appropriate for evaluating this classification task, and why?

**Ex 4.2:** Use the metrics selected in Ex 4.1 to evaluate the predictions of the ML and DL models implemented in Ex 3. Compare the results of the different models, determine which one performs best, and explain why.