## Programming Assignment (20 points)

In this assignment, you will solve an irony detection task: given a tweet, your job is to classify whether it is ironic or not.

You will implement a new classifier that does not rely on feature engineering as in previous homeworks. Instead, you will use pretrained word embeddings downloaded from using the `irony.py` script as your input feature vectors. Then, you will encode your sequence of word embeddings with an (already implemented) LSTM and classify based on its final hidden state.


In [17]:
# This is so that you don't have to restart the kernel everytime you edit hmm.py

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Data

We will use the dataset from SemEval-2018: https://github.com/Cyvhee/SemEval2018-Task3

In [1]:
from irony import load_datasets
from sklearn.model_selection import train_test_split

train_sentences, train_labels, test_sentences, test_labels, label2i = load_datasets()

# TODO: Split train into train/dev
train_sentences, dev_sentences, train_labels, dev_labels = train_test_split(train_sentences, train_labels, test_size=0.2, random_state=42)

print("Train set size:", len(train_sentences))
print("Dev set size:", len(dev_sentences))
print("Test set size:", len(test_sentences))

Train set size: 3067
Dev set size: 767
Test set size: 784


## Baseline: Naive Bayes

We have provided the solution for the Naive Bayes part from HW2 in [bayes.py](bayes.py)

There are two implementations: NaiveBayesHW2 is what was expected from HW2. However, we will use a more effecient implementation of it that uses vector operations to calculate the probabilities. Please go through it if you would like to

In [2]:
from irony import run_nb_baseline

run_nb_baseline()

Vectorizing Text: 100%|██████████████████| 3834/3834 [00:00<00:00, 13717.27it/s]
Vectorizing Text: 100%|██████████████████| 3834/3834 [00:00<00:00, 20228.34it/s]
Vectorizing Text: 100%|████████████████████| 784/784 [00:00<00:00, 25013.38it/s]

Baseline: Naive Bayes Classifier
F1-score Ironic: 0.6402966625463535
Avg F1-score: 0.6284487265300938





### Task 1: Implement avg_f1_score() in [util.py](util.py). Then re-run the above cell  (2 Points)

So the micro F1-score for the test set of the Ironic Class using a Naive Bayes Classifier is **0.64**

## Logistic Regression with Word2Vec  (Total: 18 Points)

Unlike sentiment, Irony is very subjective, and there is no word list for ironic and non-ironic tweets. This makes hand-engineering features tedious, therefore, we will use word embeddings as input to the classifier, and make the model automatically extract features aka learn weights for the embeddings

## Tokenizer for Tweets


Tweets are very different from normal document text. They have emojis, hashtags and bunch of other special character. Therefore, we need to create a suitable tokenizer for this kind of text.

Additionally, as described in class, we also need to have a consistent input length of the text document in order for the neural networks built over it to work correctly.

### Task 2: Create a Tokenizer with Padding (5 Points)

Our Tokenizer class is meant for tokenizing and padding batches of inputs. This is done
before we encode text sequences as torch Tensors.

Update the following class by completing the todo statements.

In [2]:
from typing import Dict, List, Optional, Tuple
from collections import Counter

import torch
import numpy as np
import spacy


class Tokenizer:
    """Tokenizes and pads a batch of input sentences."""

    def __init__(self, pad_symbol: Optional[str] = "<PAD>"):
        """Initializes the tokenizer

        Args:
            pad_symbol (Optional[str], optional): The symbol for a pad. Defaults to "<PAD>".
        """
        self.pad_symbol = pad_symbol
        self.nlp = spacy.load("en_core_web_sm")
    
    def __call__(self, batch: List[str]) -> List[List[str]]:
        """Tokenizes each sentence in the batch, and pads them if necessary so
        that we have equal length sentences in the batch.

        Args:
            batch (List[str]): A List of sentence strings

        Returns:
            List[List[str]]: A List of equal-length token Lists.
        """
        batch = self.tokenize(batch)
        batch = self.pad(batch)

        return batch

    def tokenize(self, sentences: List[str]) -> List[List[str]]:
        """Tokenizes the List of string sentences into a Lists of tokens using spacy tokenizer.

        Args:
            sentences (List[str]): The input sentence.

        Returns:
            List[str]: The tokenized version of the sentence.
        """
        # TODO: Tokenize the input with spacy.
        # TODO: Make sure the start token is the special <SOS> token and the end token
        #       is the special <EOS> token
        tokenized_sentences = []
        for sentence in sentences:
            tokens = self.nlp(sentence)
            tokens = [token.text for token in tokens]
            tokens.insert(0, "<SOS>")
            tokens.append("<EOS>")
            tokenized_sentences.append(tokens)

        return tokenized_sentences

    def pad(self, batch: List[List[str]]) -> List[List[str]]:
        """Appends pad symbols to each tokenized sentence in the batch such that
        every List of tokens is the same length. This means that the max length sentence
        will not be padded.

        Args:
            batch (List[List[str]]): Batch of tokenized sentences.

        Returns:
            List[List[str]]: Batch of padded tokenized sentences. 
        """
        # TODO: For each sentence in the batch, append the special <P>
        #       symbol to it n times to make all sentences equal length
        max_length = max(len(s) for s in batch)

        padded_batch = []
        for sentence in batch:
            padded_sentence = sentence.copy()
            padded_sentence.extend([self.pad_symbol] * (max_length - len(padded_sentence)))
            padded_batch.append(padded_sentence)

        return padded_batch

In [3]:
# create the vocabulary of the dataset: use both training and test sets here

SPECIAL_TOKENS = ['<UNK>', '<PAD>', '<SOS>', '<EOS>']

all_data = train_sentences + test_sentences + dev_sentences
my_tokenizer = Tokenizer()

tokenized_data = my_tokenizer.tokenize(all_data)
vocab = sorted(set([w for ws in tokenized_data + [SPECIAL_TOKENS] for w in ws]))

with open('vocab.txt', 'w') as vf:
    vf.write('\n'.join(vocab))

## Embeddings

We use GloVe embeddings https://nlp.stanford.edu/projects/glove/. But these do not necessarily have all of the tokens that will occur in tweets! Hoad the GloVe embeddings, pruning them to only those words in vocab.txt. This is to reduce the memory and runtime of your model.

Then, find the out-of-vocabulary words (oov) and add them to the encoding dictionary and the embeddings matrix.

In [5]:
# Dowload the gloVe vectors for Twitter tweets. This will download a file called glove.twitter.27B.zip

! wget https://nlp.stanford.edu/data/glove.twitter.27B.zip

--2023-11-14 13:20:13--  https://nlp.stanford.edu/data/glove.twitter.27B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://downloads.cs.stanford.edu/nlp/data/glove.twitter.27B.zip [following]
--2023-11-14 13:20:13--  https://downloads.cs.stanford.edu/nlp/data/glove.twitter.27B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1520408563 (1.4G) [application/zip]
Saving to: ‘glove.twitter.27B.zip.1’

glove.twitter.27B.z  10%[=>                  ] 146.11M  1.66MB/s    eta 10m 47s^C


In [6]:
# unzip glove.twitter.27B.zip
# if there is an error, please download the zip file again

! unzip glove.twitter.27B.zip

Archive:  glove.twitter.27B.zip
replace glove.twitter.27B.25d.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: ^C


In [7]:
# Let's see what files are there:

! ls . | grep "glove.*.txt"

glove.twitter.27B.100d.txt
glove.twitter.27B.200d.txt
glove.twitter.27B.25d.txt
glove.twitter.27B.50d.txt


In [4]:
# For this assignment, we will use glove.twitter.27B.50d.txt which has 50 dimensional word vectors
# Feel free to experiment with vectors of other sizes

embedding_path = 'glove.twitter.27B.50d.txt'
vocab_path = "./vocab.txt"

## Creating a custom Embedding Layer

Now the GloVe file has vectors for about 1.2 million words. However, we only need the vectors for a very tiny fraction of words -> the unique words that are there in the classification corpus. Some of the next tasks will be to create a custom embedding layer that has the vectors for this small set of words

### Task 2: Extracting word vectors from GloVe (3 Points)

In [5]:
from typing import Dict, Tuple

import torch


def read_pretrained_embeddings(
    embeddings_path: str,
    vocab_path: str
) -> Tuple[Dict[str, int], torch.FloatTensor]:
    """Read the embeddings matrix and make a dict hashing each word.

    Note that we have provided the entire vocab for train and test, so that for practical purposes
    we can simply load those words in the vocab, rather than all 27B embeddings

    Args:
        embeddings_path (str): _description_
        vocab_path (str): _description_

    Returns:
        Tuple[Dict[str, int], torch.FloatTensor]: _description_
    """
    word2i = {}
    vectors = []
    
    with open(vocab_path, encoding='utf8') as vf:
        vocab = set([w.strip() for w in vf.readlines()]) 
    
    print(f"Reading embeddings from {embeddings_path}...")
    with open(embeddings_path, "r") as f:
        i = 0
        for line in f:
            word, *weights = line.rstrip().split(" ")
            # TODO: Build word2i and vectors such that
            #       each word points to the index of its vector,
            #       and only words that exist in `vocab` are in our embeddings
            if word in vocab:
                word2i[word] = i
                vector_tensor = torch.tensor([float(weight) for weight in weights])
                vectors.append(vector_tensor)
                i += 1


    return word2i, torch.stack(vectors)

### Task 3: Get GloVe Out of Vocabulary (oov) words (0 Points)

The task is to find the words in the Irony corpus that are not in the GloVe Word list

In [6]:
def get_oovs(vocab_path: str, word2i: Dict[str, int]) -> List[str]:
    """Find the vocab items that do not exist in the glove embeddings (in word2i).
    Return the List of such (unique) words.

    Args:
        vocab_path: List of batches of sentences.
        word2i (Dict[str, int]): _description_

    Returns:
        List[str]: _description_
    """
    with open(vocab_path, encoding='utf8') as vf:
        vocab = set([w.strip() for w in vf.readlines()])
    
    glove_and_vocab = set(word2i.keys())
    vocab_and_not_glove = vocab - glove_and_vocab
    return list(vocab_and_not_glove)

### Task 4: Update the embeddings with oov words (3 Points)

In [7]:
from torch.nn.init import xavier_uniform_

def intialize_new_embedding_weights(num_embeddings: int, dim: int) -> torch.FloatTensor:
    """xavier initialization for the embeddings of words in train, but not in gLove.

    Args:
        num_embeddings (int): _description_
        dim (int): _description_

    Returns:
        torch.FloatTensor: _description_
    """
    # TODO: Initialize a num_embeddings x dim matrix with xiavier initiialization
    #      That is, a normal distribution with mean 0 and standard deviation of dim^-0.5
    weights = torch.FloatTensor(num_embeddings, dim).normal_()
    xavier_uniform_(weights)
    return weights


def update_embeddings(
    glove_word2i: Dict[str, int],
    glove_embeddings: torch.FloatTensor,
    oovs: List[str]
) -> Tuple[Dict[str, int], torch.FloatTensor]:
    # TODO: Add the oov words to the dict, assigning a new index to each

    # TODO: Concatenate a new row to embeddings for each oov
    #       initialize those new rows with `intialize_new_embedding_weights`

    # TODO: Return the tuple of the dictionary and the new embeddings matrix
    new_word2i = {word: i + len(glove_word2i) for i, word in enumerate(oovs)}
    glove_word2i.update(new_word2i)

    oov_embeddings = intialize_new_embedding_weights(len(oovs), dim=glove_embeddings.size(1))
    new_embeddings = torch.cat((glove_embeddings, oov_embeddings), dim=0)

    return glove_word2i, new_embeddings

In [8]:
def make_batches(sequences: List[str], batch_size: int) -> List[List[str]]:
    """Yield batch_size chunks from sequences."""
    # TODO
    batches = []
    current_batch = []

    for sequence in sequences:
        current_batch.append(sequence)

        if len(current_batch) == batch_size:
            batches.append(current_batch)
            current_batch = []

    if current_batch:
        batches.append(current_batch)

    return batches

def make_label_batches(labels: List[int], batch_size: int) -> List[List[int]]:
    """Yield batch_size chunks from labels."""
    batches = []
    current_batch = []

    for label in labels:
        current_batch.append(label)
        if len(current_batch) == batch_size:
            batches.append(current_batch)
            current_batch = []

    if current_batch:
        batches.append(current_batch)

    return batches


# TODO: Set your preferred batch size
batch_size = 8
tokenizer = Tokenizer()

# We make batches now and use those.
batch_tokenized = []
batch_labels=make_label_batches(train_labels, batch_size)
# Note: Labels need to be batched in the same way to ensure
# We have train sentence and label batches lining up.

for batch in make_batches(train_sentences, batch_size):
    batch_tokenized.append(tokenizer(batch))


# We make batches now and use those.
dev_batch_tokenized = []
dev_batch_labels=make_label_batches(dev_labels, batch_size)
# Note: Labels need to be batched in the same way to ensure
# We have train sentence and label batches lining up.

for batch in make_batches(dev_sentences, batch_size):
    dev_batch_tokenized.append(tokenizer(batch))

# We make batches now and use those.
test_batch_tokenized = []
test_batch_labels=make_label_batches(test_labels, batch_size)
# Note: Labels need to be batched in the same way to ensure
# We have train sentence and label batches lining up.

for batch in make_batches(test_sentences, batch_size):
    test_batch_tokenized.append(tokenizer(batch))

glove_word2i, glove_embeddings = read_pretrained_embeddings(
    embedding_path,
    vocab_path
)

# Find the out-of-vocabularies
oovs = get_oovs(vocab_path, glove_word2i)

# Add the oovs from training data to the word2i encoding, and as new rows
# to the embeddings matrix
word2i, embeddings = update_embeddings(glove_word2i, glove_embeddings, oovs)

Reading embeddings from glove.twitter.27B.50d.txt...


### Encoding words to integers: DO NOT EDIT

In [9]:
# Use these functions to encode your batches before you call the train loop.

def encode_sentences(batch: List[List[str]], word2i: Dict[str, int]) -> torch.LongTensor:
    """Encode the tokens in each sentence in the batch with a dictionary

    Args:
        batch (List[List[str]]): The padded and tokenized batch of sentences.
        word2i (Dict[str, int]): The encoding dictionary.

    Returns:
        torch.LongTensor: The tensor of encoded sentences.
    """
    UNK_IDX = word2i["<UNK>"]
    tensors = []
    for sent in batch:
        tensors.append(torch.LongTensor([word2i.get(w, UNK_IDX) for w in sent]))
        
    return torch.stack(tensors)


def encode_labels(labels: List[int]) -> torch.FloatTensor:
    """Turns the batch of labels into a tensor

    Args:
        labels (List[int]): List of all labels in the batch

    Returns:
        torch.FloatTensor: Tensor of all labels in the batch
    """
    return torch.LongTensor([int(l) for l in labels])

## Modeling   ( 7 Points)

In [10]:
import torch


# Notice there is a single TODO in the model
class IronyDetector(torch.nn.Module):
    def __init__(
        self,
        input_dim: int,
        hidden_dim: int,
        embeddings_tensor: torch.FloatTensor,
        pad_idx: int,
        output_size: int,
        dropout_val: float = 0.3,
    ):
        super().__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.pad_idx = pad_idx
        self.dropout_val = dropout_val
        self.output_size = output_size
        # TODO: Initialize the embeddings from the weights matrix.
        #       Check the documentation for how to initialize an embedding layer
        #       from a pretrained embedding matrix. 
        #       Be careful to set the `freeze` parameter!
        #       Docs are here: https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html#torch.nn.Embedding.from_pretrained
        self.embeddings = torch.nn.Embedding.from_pretrained(
            embeddings_tensor, padding_idx=pad_idx, freeze=True
        )
        # Dropout regularization
        # https://jmlr.org/papers/v15/srivastava14a.html
        self.dropout_layer = torch.nn.Dropout(p=self.dropout_val, inplace=False)
        # Bidirectional 2-layer LSTM. Feel free to try different parameters.
        # https://colah.github.io/posts/2015-08-Understanding-LSTMs/
        self.lstm = torch.nn.LSTM(
            self.input_dim,
            self.hidden_dim,
            num_layers=2,
            dropout=dropout_val,
            batch_first=True,
            bidirectional=True,
        )
        # For classification over the final LSTM state.
        self.classifier = torch.nn.Linear(hidden_dim*2, self.output_size)
        self.log_softmax = torch.nn.LogSoftmax(dim=2)
    
    def encode_text(
        self,
        symbols: torch.Tensor
    ) -> torch.Tensor:
        """Encode the (batch of) sequence(s) of token symbols with an LSTM.
            Then, get the last (non-padded) hidden state for each symbol and return that.

        Args:
            symbols (torch.Tensor): The batch size x sequence length tensor of input tokens

        Returns:
            torch.Tensor: The final hiddens tate of the LSTM, which represents an encoding of
                the entire sentence
        """
        # First we get the embedding for each input symbol
        embedded = self.embeddings(symbols)
        embedded = self.dropout_layer(embedded)
        # Packs embedded source symbols into a PackedSequence.
        # This is an optimization when using padded sequences with an LSTM
        lens = (symbols != self.pad_idx).sum(dim=1).to("cpu")
        packed = torch.nn.utils.rnn.pack_padded_sequence(
            embedded, lens, batch_first=True, enforce_sorted=False
        )
        # -> batch_size x seq_len x encoder_dim, (h0, c0).
        # print(embedded.shape)
        # print(packed)
        packed_outs, (H, C) = self.lstm(packed)
        
        encoded, _ = torch.nn.utils.rnn.pad_packed_sequence(
            packed_outs,
            batch_first=True,
            padding_value=self.pad_idx,
            total_length=None,
        )
        # Now we have the representation of eahc token encoded by the LSTM.
        encoded, (H, C) = self.lstm(embedded)
        
        # This part looks tricky. All we are doing is getting a tensor
        # That indexes the last non-PAD position in each tensor in the batch.
        last_enc_out_idxs = lens - 1
        # -> B x 1 x 1.
        last_enc_out_idxs = last_enc_out_idxs.view([encoded.size(0)] + [1, 1])
        # -> 1 x 1 x encoder_dim. This indexes the last non-padded dimension.
        last_enc_out_idxs = last_enc_out_idxs.expand(
            [-1, -1, encoded.size(-1)]
        )
        # Get the final hidden state in the LSTM
        last_hidden = torch.gather(encoded, 1, last_enc_out_idxs)
        return last_hidden
    
    def forward(
        self,
        symbols: torch.Tensor,
    ) -> torch.Tensor:
        # print("starting")
        encoded_sents = self.encode_text(symbols)
        # print("encoded")
        output = self.classifier(encoded_sents)
        return self.log_softmax(output)

## Evaluation

In [11]:
def predict(model: torch.nn.Module, dev_sequences: List[torch.Tensor]):
    preds = []
    # TODO: Get the predictions for the dev_sequences using the model
    # Set the model to evaluation mode
    model.eval()
    with torch.no_grad():
        for batch in dev_sequences:
            predictions = model(batch)
            # print(predictions)
            _, predicted_labels = torch.max(predictions, 2)
            predicted_labels = predicted_labels.squeeze().tolist()
            preds.extend(predicted_labels)
    # print(preds)
    return preds


## Training

In [12]:
from tqdm import tqdm_notebook as tqdm

import random
from util import avg_f1_score, f1_score


def training_loop(
    num_epochs,
    train_features,
    train_labels,
    dev_features,
    dev_labels,
    optimizer,
    model,
):
    print("Training...")
    loss_func = torch.nn.NLLLoss()
    batches = list(zip(train_features, train_labels))
    random.shuffle(batches)
    for i in range(num_epochs):
        losses = []
        for features, labels in tqdm(batches):
            # Empty the dynamic computation graph
            optimizer.zero_grad()
            # print(features.shape)
            preds = model(features).squeeze(1)
            # print(preds)
            # print(labels)
            loss = loss_func(preds, labels)
            # Backpropogate the loss through our model
            loss.backward()
            optimizer.step()
            losses.append(loss.item())
        
        print(f"epoch {i}, loss: {sum(losses)/len(losses)}")
        # Estimate the f1 score for the development set
        print("Evaluating dev...")
        preds = predict(model, dev_features)
        # print(preds)
        # print(dev_labels)
        dev_f1 = f1_score(preds, dev_labels, label2i['1'])
        dev_avg_f1 = avg_f1_score(preds, dev_labels, list(label2i.keys()))
        print(f"Dev F1 {dev_f1}")
        print(f"Avf Dev F1 {dev_f1}")
        
    # Return the trained model
    return model

In [13]:
encoded_train_data   = []
encoded_train_labels = []
encoded_dev_data     = []
encoded_dev_labels   = []
encoded_test_data    = []
encoded_test_labels  = []


for x in batch_tokenized:
    encoded_train_data.append(encode_sentences(x, word2i))

for x in batch_labels:
    # print(len(x))
    encoded_train_labels.append(encode_labels(x))

for x in dev_batch_tokenized:
    encoded_dev_data.append(encode_sentences(x, word2i))

for x in test_batch_tokenized:
    encoded_test_data.append(encode_sentences(x, word2i))

encoded_dev_labels = [int(x) for x in dev_labels]
encoded_test_labels = [int(x) for x in test_labels]
    
# print(len(encoded_train_labels))
# print(encoded_dev_labels)
# print(train_data[0].shape)


In [16]:
# TODO: Load the model and run the training loop 
#       on your train/dev splits. Set and tweak hyperparameters.
LR=0.001
model = IronyDetector(
    input_dim=50,
    hidden_dim=10,
    embeddings_tensor=embeddings,
    pad_idx=-1,
    output_size=2,
)

optimizer = torch.optim.Adam(model.parameters(), LR)
trained_model = training_loop(
    num_epochs=20,
    train_features=encoded_train_data,
    train_labels=encoded_train_labels,
    dev_features=encoded_dev_data,
    dev_labels=encoded_dev_labels,
    optimizer=optimizer,
    model=model,
)

Training...


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for features, labels in tqdm(batches):


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 0, loss: 0.6946465750224888
Evaluating dev...
Dev F1 0.30490018148820325
Avf Dev F1 0.30490018148820325


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 1, loss: 0.6928768303866187
Evaluating dev...
Dev F1 0.2925925925925926
Avf Dev F1 0.2925925925925926


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 2, loss: 0.6913390938813487
Evaluating dev...
Dev F1 0.3057090239410682
Avf Dev F1 0.3057090239410682


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 3, loss: 0.6865349303310117
Evaluating dev...
Dev F1 0.39024390243902446
Avf Dev F1 0.39024390243902446


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 4, loss: 0.6637004061291615
Evaluating dev...
Dev F1 0.6593406593406593
Avf Dev F1 0.6593406593406593


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 5, loss: 0.6356691583835831
Evaluating dev...
Dev F1 0.6212534059945504
Avf Dev F1 0.6212534059945504


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 6, loss: 0.6193794881304105
Evaluating dev...
Dev F1 0.6266666666666666
Avf Dev F1 0.6266666666666666


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 7, loss: 0.6075275833718479
Evaluating dev...
Dev F1 0.6361256544502618
Avf Dev F1 0.6361256544502618


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 8, loss: 0.5934363728156313
Evaluating dev...
Dev F1 0.6555697823303458
Avf Dev F1 0.6555697823303458


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 9, loss: 0.5866230320340643
Evaluating dev...
Dev F1 0.6406460296096905
Avf Dev F1 0.6406460296096905


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 10, loss: 0.5749956567694122
Evaluating dev...
Dev F1 0.6321525885558583
Avf Dev F1 0.6321525885558583


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 11, loss: 0.5610271838571256
Evaluating dev...
Dev F1 0.6551724137931034
Avf Dev F1 0.6551724137931034


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 12, loss: 0.5515796991918857
Evaluating dev...
Dev F1 0.6181818181818182
Avf Dev F1 0.6181818181818182


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 13, loss: 0.5453580498384932
Evaluating dev...
Dev F1 0.6393659180977542
Avf Dev F1 0.6393659180977542


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 14, loss: 0.5294688426656649
Evaluating dev...
Dev F1 0.6460296096904441
Avf Dev F1 0.6460296096904441


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 15, loss: 0.5204981039278209
Evaluating dev...
Dev F1 0.6699999999999999
Avf Dev F1 0.6699999999999999


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 16, loss: 0.5201380741006384
Evaluating dev...
Dev F1 0.6348773841961853
Avf Dev F1 0.6348773841961853


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 17, loss: 0.5159014753298834
Evaluating dev...
Dev F1 0.6322751322751322
Avf Dev F1 0.6322751322751322


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 18, loss: 0.4954522574165215
Evaluating dev...
Dev F1 0.6520618556701032
Avf Dev F1 0.6520618556701032


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 19, loss: 0.5043543406451741
Evaluating dev...
Dev F1 0.6356382978723404
Avf Dev F1 0.6356382978723404


In [17]:
# TODO: Load the model and run the training loop 
#       on your train/dev splits. Set and tweak hyperparameters.
LR=0.001
model = IronyDetector(
    input_dim=50,
    hidden_dim=30,
    embeddings_tensor=embeddings,
    pad_idx=-1,
    output_size=2,
)

optimizer = torch.optim.Adam(model.parameters(), LR)
trained_model = training_loop(
    num_epochs=20,
    train_features=encoded_train_data,
    train_labels=encoded_train_labels,
    dev_features=encoded_dev_data,
    dev_labels=encoded_dev_labels,
    optimizer=optimizer,
    model=model,
)

Training...


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for features, labels in tqdm(batches):


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 0, loss: 0.6941524649349352
Evaluating dev...
Dev F1 0.31711711711711715
Avf Dev F1 0.31711711711711715


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 1, loss: 0.6916748466901481
Evaluating dev...
Dev F1 0.5480519480519481
Avf Dev F1 0.5480519480519481


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 2, loss: 0.671515439171344
Evaluating dev...
Dev F1 0.6945525291828795
Avf Dev F1 0.6945525291828795


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 3, loss: 0.6465400225327661
Evaluating dev...
Dev F1 0.6731601731601732
Avf Dev F1 0.6731601731601732


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 4, loss: 0.6284978260907034
Evaluating dev...
Dev F1 0.6748603351955307
Avf Dev F1 0.6748603351955307


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 5, loss: 0.6169205630819002
Evaluating dev...
Dev F1 0.6914893617021276
Avf Dev F1 0.6914893617021276


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 6, loss: 0.6061445522742966
Evaluating dev...
Dev F1 0.6952089704383283
Avf Dev F1 0.6952089704383283


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 7, loss: 0.5883127086951087
Evaluating dev...
Dev F1 0.7021276595744682
Avf Dev F1 0.7021276595744682


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 8, loss: 0.5744428006776919
Evaluating dev...
Dev F1 0.7066381156316917
Avf Dev F1 0.7066381156316917


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 9, loss: 0.562877403300566
Evaluating dev...
Dev F1 0.6920492721164615
Avf Dev F1 0.6920492721164615


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 10, loss: 0.5472855176388597
Evaluating dev...
Dev F1 0.7028901734104046
Avf Dev F1 0.7028901734104046


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 11, loss: 0.5315774855747198
Evaluating dev...
Dev F1 0.7060063224446786
Avf Dev F1 0.7060063224446786


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 12, loss: 0.5051257775824828
Evaluating dev...
Dev F1 0.6927502876869965
Avf Dev F1 0.6927502876869965


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 13, loss: 0.4984907098502542
Evaluating dev...
Dev F1 0.7076923076923077
Avf Dev F1 0.7076923076923077


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 14, loss: 0.4749301486299373
Evaluating dev...
Dev F1 0.6970033296337402
Avf Dev F1 0.6970033296337402


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 15, loss: 0.45127477248509723
Evaluating dev...
Dev F1 0.6923950056753688
Avf Dev F1 0.6923950056753688


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 16, loss: 0.44643347852009657
Evaluating dev...
Dev F1 0.6952879581151832
Avf Dev F1 0.6952879581151832


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 17, loss: 0.43031478563595255
Evaluating dev...
Dev F1 0.6919642857142857
Avf Dev F1 0.6919642857142857


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 18, loss: 0.410838781526157
Evaluating dev...
Dev F1 0.687089715536105
Avf Dev F1 0.687089715536105


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 19, loss: 0.40136962034739554
Evaluating dev...
Dev F1 0.6837988826815642
Avf Dev F1 0.6837988826815642


In [30]:
# TODO: Load the model and run the training loop 
#       on your train/dev splits. Set and tweak hyperparameters.
LR=0.001
model = IronyDetector(
    input_dim=50,
    hidden_dim=30,
    embeddings_tensor=embeddings,
    pad_idx=-1,
    output_size=2,
)

optimizer = torch.optim.Adam(model.parameters(), LR)
trained_model = training_loop(
    num_epochs=10,
    train_features=encoded_train_data,
    train_labels=encoded_train_labels,
    dev_features=encoded_dev_data,
    dev_labels=encoded_dev_labels,
    optimizer=optimizer,
    model=model,
)

Training...


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for features, labels in tqdm(batches):


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 0, loss: 0.693461628165096
Evaluating dev...
Dev F1 0.675885911840968
Avf Dev F1 0.675885911840968


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 1, loss: 0.6924345496421059
Evaluating dev...
Dev F1 0.6825817860300619
Avf Dev F1 0.6825817860300619


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 2, loss: 0.6910616617339352
Evaluating dev...
Dev F1 0.6826666666666666
Avf Dev F1 0.6826666666666666


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 3, loss: 0.6731703453697264
Evaluating dev...
Dev F1 0.6861598440545809
Avf Dev F1 0.6861598440545809


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 4, loss: 0.6468639780456821
Evaluating dev...
Dev F1 0.6794871794871794
Avf Dev F1 0.6794871794871794


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 5, loss: 0.6276490481104702
Evaluating dev...
Dev F1 0.6911764705882353
Avf Dev F1 0.6911764705882353


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 6, loss: 0.6109020313403258
Evaluating dev...
Dev F1 0.6916099773242631
Avf Dev F1 0.6916099773242631


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 7, loss: 0.5975180253541718
Evaluating dev...
Dev F1 0.6906141367323291
Avf Dev F1 0.6906141367323291


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 8, loss: 0.583750571977968
Evaluating dev...
Dev F1 0.6817155756207676
Avf Dev F1 0.6817155756207676


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 9, loss: 0.568440290594784
Evaluating dev...
Dev F1 0.7044025157232705
Avf Dev F1 0.7044025157232705


In [16]:
# TODO: Load the model and run the training loop 
#       on your train/dev splits. Set and tweak hyperparameters.
LR=0.001
model = IronyDetector(
    input_dim=50,
    hidden_dim=30,
    embeddings_tensor=embeddings,
    pad_idx=-1,
    output_size=2,
)

optimizer = torch.optim.Adam(model.parameters(), LR)
trained_model = training_loop(
    num_epochs=20,
    train_features=encoded_train_data,
    train_labels=encoded_train_labels,
    dev_features=encoded_dev_data,
    dev_labels=encoded_dev_labels,
    optimizer=optimizer,
    model=model,
)

Training...


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for features, labels in tqdm(batches):


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 0, loss: 0.6940063933531443
Evaluating dev...
Dev F1 0.03414634146341464
Avf Dev F1 0.03414634146341464


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 1, loss: 0.6921781186635295
Evaluating dev...
Dev F1 0.05783132530120481
Avf Dev F1 0.05783132530120481


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 2, loss: 0.679385018767789
Evaluating dev...
Dev F1 0.5545722713864307
Avf Dev F1 0.5545722713864307


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 3, loss: 0.6622752004768699
Evaluating dev...
Dev F1 0.5547445255474452
Avf Dev F1 0.5547445255474452


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 4, loss: 0.6383328626397997
Evaluating dev...
Dev F1 0.5504587155963303
Avf Dev F1 0.5504587155963303


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 5, loss: 0.6250843536108732
Evaluating dev...
Dev F1 0.5636363636363637
Avf Dev F1 0.5636363636363637


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 6, loss: 0.6084212809024999
Evaluating dev...
Dev F1 0.5788667687595713
Avf Dev F1 0.5788667687595713


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 7, loss: 0.5953646470637372
Evaluating dev...
Dev F1 0.5937961595273265
Avf Dev F1 0.5937961595273265


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 8, loss: 0.5750521821901202
Evaluating dev...
Dev F1 0.5692068429237946
Avf Dev F1 0.5692068429237946


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 9, loss: 0.560974882915616
Evaluating dev...
Dev F1 0.565284178187404
Avf Dev F1 0.565284178187404


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 10, loss: 0.5369126863855248
Evaluating dev...
Dev F1 0.6284153005464481
Avf Dev F1 0.6284153005464481


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 11, loss: 0.5198513172023619
Evaluating dev...
Dev F1 0.6180257510729613
Avf Dev F1 0.6180257510729613


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 12, loss: 0.4948648175923154
Evaluating dev...
Dev F1 0.5946745562130178
Avf Dev F1 0.5946745562130178


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 13, loss: 0.4780462125975949
Evaluating dev...
Dev F1 0.6162624821683311
Avf Dev F1 0.6162624821683311


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 14, loss: 0.45054244754525524
Evaluating dev...
Dev F1 0.605890603085554
Avf Dev F1 0.605890603085554


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 15, loss: 0.4415863197646104
Evaluating dev...
Dev F1 0.5964391691394659
Avf Dev F1 0.5964391691394659


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 16, loss: 0.43235890299547464
Evaluating dev...
Dev F1 0.6246498599439775
Avf Dev F1 0.6246498599439775


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 17, loss: 0.41492125939112157
Evaluating dev...
Dev F1 0.6211699164345404
Avf Dev F1 0.6211699164345404


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 18, loss: 0.3919319190317765
Evaluating dev...
Dev F1 0.6011396011396011
Avf Dev F1 0.6011396011396011


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 19, loss: 0.36362645332701504
Evaluating dev...
Dev F1 0.631578947368421
Avf Dev F1 0.631578947368421


In [18]:
# TODO: Load the model and run the training loop 
#       on your train/dev splits. Set and tweak hyperparameters.
LR=0.001
model = IronyDetector(
    input_dim=50,
    hidden_dim=40,
    embeddings_tensor=embeddings,
    pad_idx=-1,
    output_size=2,
)

optimizer = torch.optim.Adam(model.parameters(), LR)
trained_model = training_loop(
    num_epochs=20,
    train_features=encoded_train_data,
    train_labels=encoded_train_labels,
    dev_features=encoded_dev_data,
    dev_labels=encoded_dev_labels,
    optimizer=optimizer,
    model=model,
)

Training...


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for features, labels in tqdm(batches):


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 0, loss: 0.6939954479845861
Evaluating dev...
Dev F1 0.0478468899521531
Avf Dev F1 0.0478468899521531


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 1, loss: 0.6920350467165312
Evaluating dev...
Dev F1 0.24696356275303646
Avf Dev F1 0.24696356275303646


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 2, loss: 0.6779892488848418
Evaluating dev...
Dev F1 0.669379450661241
Avf Dev F1 0.669379450661241


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 3, loss: 0.659302327161034
Evaluating dev...
Dev F1 0.6701902748414376
Avf Dev F1 0.6701902748414376


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 4, loss: 0.6443067965252945
Evaluating dev...
Dev F1 0.6697038724373576
Avf Dev F1 0.6697038724373576


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 5, loss: 0.6265025213360786
Evaluating dev...
Dev F1 0.6412776412776413
Avf Dev F1 0.6412776412776413


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 6, loss: 0.611960705408516
Evaluating dev...
Dev F1 0.6737841043890865
Avf Dev F1 0.6737841043890865


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 7, loss: 0.5974333773677548
Evaluating dev...
Dev F1 0.6268656716417911
Avf Dev F1 0.6268656716417911


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 8, loss: 0.5838992803279931
Evaluating dev...
Dev F1 0.6564705882352941
Avf Dev F1 0.6564705882352941


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 9, loss: 0.5627358270964274
Evaluating dev...
Dev F1 0.6352357320099256
Avf Dev F1 0.6352357320099256


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 10, loss: 0.5433924095317101
Evaluating dev...
Dev F1 0.649164677804296
Avf Dev F1 0.649164677804296


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 11, loss: 0.519370554946363
Evaluating dev...
Dev F1 0.6521739130434783
Avf Dev F1 0.6521739130434783


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 12, loss: 0.5017241603927687
Evaluating dev...
Dev F1 0.634517766497462
Avf Dev F1 0.634517766497462


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 13, loss: 0.48755788737131905
Evaluating dev...
Dev F1 0.5041186161449753
Avf Dev F1 0.5041186161449753


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 14, loss: 0.5051799272575105
Evaluating dev...
Dev F1 0.6075268817204302
Avf Dev F1 0.6075268817204302


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 15, loss: 0.4716725174027185
Evaluating dev...
Dev F1 0.5671641791044777
Avf Dev F1 0.5671641791044777


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 16, loss: 0.4458213898372681
Evaluating dev...
Dev F1 0.6047745358090185
Avf Dev F1 0.6047745358090185


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 17, loss: 0.45098620001226664
Evaluating dev...
Dev F1 0.5246422893481717
Avf Dev F1 0.5246422893481717


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 18, loss: 0.4354535351449158
Evaluating dev...
Dev F1 0.5558912386706949
Avf Dev F1 0.5558912386706949


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 19, loss: 0.412065264225627
Evaluating dev...
Dev F1 0.5761194029850746
Avf Dev F1 0.5761194029850746


In [23]:
# TODO: Load the model and run the training loop 
#       on your train/dev splits. Set and tweak hyperparameters.
LR=0.0005
model = IronyDetector(
    input_dim=50,
    hidden_dim=30,
    embeddings_tensor=embeddings,
    pad_idx=-1,
    output_size=2,
)

optimizer = torch.optim.Adam(model.parameters(), LR)
trained_model = training_loop(
    num_epochs=40,
    train_features=encoded_train_data,
    train_labels=encoded_train_labels,
    dev_features=encoded_dev_data,
    dev_labels=encoded_dev_labels,
    optimizer=optimizer,
    model=model,
)

Training...


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for features, labels in tqdm(batches):


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 0, loss: 0.6941311267825464
Evaluating dev...
Dev F1 0.014962593516209474
Avf Dev F1 0.014962593516209474


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 1, loss: 0.6927447154497107
Evaluating dev...
Dev F1 0.0963302752293578
Avf Dev F1 0.0963302752293578


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 2, loss: 0.6888899163653454
Evaluating dev...
Dev F1 0.19574468085106383
Avf Dev F1 0.19574468085106383


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 3, loss: 0.6633445664774626
Evaluating dev...
Dev F1 0.6109660574412532
Avf Dev F1 0.6109660574412532


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 4, loss: 0.6355153255475064
Evaluating dev...
Dev F1 0.6608478802992519
Avf Dev F1 0.6608478802992519


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 5, loss: 0.6194065758027136
Evaluating dev...
Dev F1 0.6745843230403801
Avf Dev F1 0.6745843230403801


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 6, loss: 0.6075671117287129
Evaluating dev...
Dev F1 0.6802395209580838
Avf Dev F1 0.6802395209580838


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 7, loss: 0.596371712240701
Evaluating dev...
Dev F1 0.6724782067247821
Avf Dev F1 0.6724782067247821


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 8, loss: 0.5853795868655046
Evaluating dev...
Dev F1 0.6683417085427136
Avf Dev F1 0.6683417085427136


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 9, loss: 0.5736202022526413
Evaluating dev...
Dev F1 0.659846547314578
Avf Dev F1 0.659846547314578


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 10, loss: 0.5625872926320881
Evaluating dev...
Dev F1 0.6683738796414853
Avf Dev F1 0.6683738796414853


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 11, loss: 0.5465474284331625
Evaluating dev...
Dev F1 0.6700379266750949
Avf Dev F1 0.6700379266750949


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 12, loss: 0.5318146028245488
Evaluating dev...
Dev F1 0.6501305483028721
Avf Dev F1 0.6501305483028721


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 13, loss: 0.5196049405494705
Evaluating dev...
Dev F1 0.6324324324324324
Avf Dev F1 0.6324324324324324


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 14, loss: 0.5062299983886381
Evaluating dev...
Dev F1 0.6500655307994757
Avf Dev F1 0.6500655307994757


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 15, loss: 0.49127059827636305
Evaluating dev...
Dev F1 0.6542553191489361
Avf Dev F1 0.6542553191489361


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 16, loss: 0.4800466441665776
Evaluating dev...
Dev F1 0.6059743954480796
Avf Dev F1 0.6059743954480796


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 17, loss: 0.4639670317992568
Evaluating dev...
Dev F1 0.6236263736263736
Avf Dev F1 0.6236263736263736


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 18, loss: 0.45180743011102703
Evaluating dev...
Dev F1 0.6348773841961853
Avf Dev F1 0.6348773841961853


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 19, loss: 0.44786235575641814
Evaluating dev...
Dev F1 0.6394736842105263
Avf Dev F1 0.6394736842105263


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 20, loss: 0.4277896812612501
Evaluating dev...
Dev F1 0.6572528883183569
Avf Dev F1 0.6572528883183569


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 21, loss: 0.41600600849293795
Evaluating dev...
Dev F1 0.6502463054187192
Avf Dev F1 0.6502463054187192


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 22, loss: 0.4086228770708355
Evaluating dev...
Dev F1 0.6642685851318946
Avf Dev F1 0.6642685851318946


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 23, loss: 0.40800978656625375
Evaluating dev...
Dev F1 0.6417525773195877
Avf Dev F1 0.6417525773195877


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 24, loss: 0.38992540683830157
Evaluating dev...
Dev F1 0.6191780821917808
Avf Dev F1 0.6191780821917808


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 25, loss: 0.38680574082536623
Evaluating dev...
Dev F1 0.6364846870838882
Avf Dev F1 0.6364846870838882


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 26, loss: 0.3763413761820023
Evaluating dev...
Dev F1 0.6452476572958501
Avf Dev F1 0.6452476572958501


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 27, loss: 0.37865773774683475
Evaluating dev...
Dev F1 0.6241699867197874
Avf Dev F1 0.6241699867197874


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 28, loss: 0.35351732664275914
Evaluating dev...
Dev F1 0.6374829001367989
Avf Dev F1 0.6374829001367989


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 29, loss: 0.33998656167144264
Evaluating dev...
Dev F1 0.6140845070422536
Avf Dev F1 0.6140845070422536


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 30, loss: 0.3549087582990372
Evaluating dev...
Dev F1 0.6342105263157896
Avf Dev F1 0.6342105263157896


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 31, loss: 0.3202688194772539
Evaluating dev...
Dev F1 0.6186666666666667
Avf Dev F1 0.6186666666666667


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 32, loss: 0.3149495306618822
Evaluating dev...
Dev F1 0.6234817813765182
Avf Dev F1 0.6234817813765182


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 33, loss: 0.3091018320944083
Evaluating dev...
Dev F1 0.6162018592297477
Avf Dev F1 0.6162018592297477


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 34, loss: 0.30676131998673856
Evaluating dev...
Dev F1 0.6109589041095891
Avf Dev F1 0.6109589041095891


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 35, loss: 0.2984805307957383
Evaluating dev...
Dev F1 0.6290956749672346
Avf Dev F1 0.6290956749672346


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 36, loss: 0.2890733474268927
Evaluating dev...
Dev F1 0.6332046332046332
Avf Dev F1 0.6332046332046332


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 37, loss: 0.27613975724185974
Evaluating dev...
Dev F1 0.6210670314637483
Avf Dev F1 0.6210670314637483


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 38, loss: 0.25945564947323874
Evaluating dev...
Dev F1 0.6325459317585302
Avf Dev F1 0.6325459317585302


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 39, loss: 0.2717176884543733
Evaluating dev...
Dev F1 0.6501240694789082
Avf Dev F1 0.6501240694789082


In [22]:
# TODO: Load the model and run the training loop 
#       on your train/dev splits. Set and tweak hyperparameters.
LR=0.0002
model = IronyDetector(
    input_dim=50,
    hidden_dim=30,
    embeddings_tensor=embeddings,
    pad_idx=-1,
    output_size=2,
)

optimizer = torch.optim.Adam(model.parameters(), LR)
trained_model = training_loop(
    num_epochs=40,
    train_features=encoded_train_data,
    train_labels=encoded_train_labels,
    dev_features=encoded_dev_data,
    dev_labels=encoded_dev_labels,
    optimizer=optimizer,
    model=model,
)

Training...


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for features, labels in tqdm(batches):


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 0, loss: 0.6943367924541235
Evaluating dev...
Dev F1 0.01990049751243781
Avf Dev F1 0.01990049751243781


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 1, loss: 0.6929909265600145
Evaluating dev...
Dev F1 0.1400437636761488
Avf Dev F1 0.1400437636761488


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 2, loss: 0.6924412871400515
Evaluating dev...
Dev F1 0.2475442043222004
Avf Dev F1 0.2475442043222004


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 3, loss: 0.6912655453197658
Evaluating dev...
Dev F1 0.39144736842105265
Avf Dev F1 0.39144736842105265


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 4, loss: 0.6847760467790067
Evaluating dev...
Dev F1 0.4709576138147567
Avf Dev F1 0.4709576138147567


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 5, loss: 0.6602443177253008
Evaluating dev...
Dev F1 0.5880794701986756
Avf Dev F1 0.5880794701986756


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 6, loss: 0.6432863816153258
Evaluating dev...
Dev F1 0.6175710594315246
Avf Dev F1 0.6175710594315246


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 7, loss: 0.6331685143522918
Evaluating dev...
Dev F1 0.6280566280566281
Avf Dev F1 0.6280566280566281


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 8, loss: 0.6253798168618232
Evaluating dev...
Dev F1 0.6269430051813472
Avf Dev F1 0.6269430051813472


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 9, loss: 0.6187461130321026
Evaluating dev...
Dev F1 0.6230366492146597
Avf Dev F1 0.6230366492146597


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 10, loss: 0.6126752592778454
Evaluating dev...
Dev F1 0.6236842105263157
Avf Dev F1 0.6236842105263157


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 11, loss: 0.6068442865119626
Evaluating dev...
Dev F1 0.6342105263157896
Avf Dev F1 0.6342105263157896


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 12, loss: 0.6010621580450485
Evaluating dev...
Dev F1 0.6304635761589404
Avf Dev F1 0.6304635761589404


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 13, loss: 0.595178628883635
Evaluating dev...
Dev F1 0.6263440860215054
Avf Dev F1 0.6263440860215054


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 14, loss: 0.5890378885281583
Evaluating dev...
Dev F1 0.6241519674355495
Avf Dev F1 0.6241519674355495


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 15, loss: 0.5825091736235967
Evaluating dev...
Dev F1 0.6231292517006803
Avf Dev F1 0.6231292517006803


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 16, loss: 0.5755747855485728
Evaluating dev...
Dev F1 0.6174863387978142
Avf Dev F1 0.6174863387978142


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 17, loss: 0.5682788956134269
Evaluating dev...
Dev F1 0.6187845303867403
Avf Dev F1 0.6187845303867403


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 18, loss: 0.5607068998506293
Evaluating dev...
Dev F1 0.6196403872752421
Avf Dev F1 0.6196403872752421


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 19, loss: 0.5529480905194456
Evaluating dev...
Dev F1 0.6115702479338844
Avf Dev F1 0.6115702479338844


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 20, loss: 0.5445925537496805
Evaluating dev...
Dev F1 0.6166666666666667
Avf Dev F1 0.6166666666666667


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 21, loss: 0.5350371667494377
Evaluating dev...
Dev F1 0.6125874125874124
Avf Dev F1 0.6125874125874124


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 22, loss: 0.5247002492736405
Evaluating dev...
Dev F1 0.6225895316804408
Avf Dev F1 0.6225895316804408


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 23, loss: 0.5165197921839232
Evaluating dev...
Dev F1 0.6344827586206897
Avf Dev F1 0.6344827586206897


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 24, loss: 0.5107474102211805
Evaluating dev...
Dev F1 0.6383561643835616
Avf Dev F1 0.6383561643835616


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 25, loss: 0.49918947578407824
Evaluating dev...
Dev F1 0.6353887399463808
Avf Dev F1 0.6353887399463808


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 26, loss: 0.49156687073021504
Evaluating dev...
Dev F1 0.6361185983827493
Avf Dev F1 0.6361185983827493


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 27, loss: 0.48617172049125656
Evaluating dev...
Dev F1 0.6304044630404463
Avf Dev F1 0.6304044630404463


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 28, loss: 0.47559739449449506
Evaluating dev...
Dev F1 0.6427622841965472
Avf Dev F1 0.6427622841965472


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 29, loss: 0.4644137909441876
Evaluating dev...
Dev F1 0.6418109187749668
Avf Dev F1 0.6418109187749668


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 30, loss: 0.45590507801777375
Evaluating dev...
Dev F1 0.6318681318681318
Avf Dev F1 0.6318681318681318


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 31, loss: 0.44280388493401307
Evaluating dev...
Dev F1 0.6330150068212824
Avf Dev F1 0.6330150068212824


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 32, loss: 0.44040161041387665
Evaluating dev...
Dev F1 0.6395663956639567
Avf Dev F1 0.6395663956639567


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 33, loss: 0.4324902231261755
Evaluating dev...
Dev F1 0.6300984528832629
Avf Dev F1 0.6300984528832629


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 34, loss: 0.4216851614958917
Evaluating dev...
Dev F1 0.6356164383561644
Avf Dev F1 0.6356164383561644


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 35, loss: 0.41448898330175626
Evaluating dev...
Dev F1 0.6559571619812583
Avf Dev F1 0.6559571619812583


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 36, loss: 0.4025958666752558
Evaluating dev...
Dev F1 0.6448087431693988
Avf Dev F1 0.6448087431693988


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 37, loss: 0.3931128863769118
Evaluating dev...
Dev F1 0.6450742240215924
Avf Dev F1 0.6450742240215924


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 38, loss: 0.41072504193289205
Evaluating dev...
Dev F1 0.6422764227642277
Avf Dev F1 0.6422764227642277


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 39, loss: 0.42114921784377657
Evaluating dev...
Dev F1 0.6419753086419753
Avf Dev F1 0.6419753086419753


In [121]:
# TODO: Load the model and run the training loop 
#       on your train/dev splits. Set and tweak hyperparameters.
LR=0.001
model = IronyDetector(
    input_dim=50,
    hidden_dim=20,
    embeddings_tensor=embeddings,
    pad_idx=-1,
    output_size=2,
)

optimizer = torch.optim.Adam(model.parameters(), LR)
trained_model = training_loop(
    num_epochs=20,
    train_features=encoded_train_data,
    train_labels=encoded_train_labels,
    dev_features=encoded_dev_data,
    dev_labels=encoded_dev_labels,
    optimizer=optimizer,
    model=model,
)

Training...


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for features, labels in tqdm(batches):


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 0, loss: 0.69380340911448
Evaluating dev...
Dev F1 0.2967032967032967
Avf Dev F1 0.2967032967032967


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 1, loss: 0.6926383003592491
Evaluating dev...
Dev F1 0.39869281045751637
Avf Dev F1 0.39869281045751637


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 2, loss: 0.6768349096334229
Evaluating dev...
Dev F1 0.5531914893617021
Avf Dev F1 0.5531914893617021


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 3, loss: 0.6538802763291945
Evaluating dev...
Dev F1 0.5851648351648352
Avf Dev F1 0.5851648351648352


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 4, loss: 0.6343199148929367
Evaluating dev...
Dev F1 0.6913849509269357
Avf Dev F1 0.6913849509269357


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 5, loss: 0.6204305886446188
Evaluating dev...
Dev F1 0.6534883720930232
Avf Dev F1 0.6534883720930232


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 6, loss: 0.6091704540885985
Evaluating dev...
Dev F1 0.6805074971164936
Avf Dev F1 0.6805074971164936


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 7, loss: 0.5938456194320073
Evaluating dev...
Dev F1 0.6723095525997582
Avf Dev F1 0.6723095525997582


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 8, loss: 0.5818841725898286
Evaluating dev...
Dev F1 0.6782178217821784
Avf Dev F1 0.6782178217821784


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 9, loss: 0.5729195424355567
Evaluating dev...
Dev F1 0.679372197309417
Avf Dev F1 0.679372197309417


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 10, loss: 0.5546143474833419
Evaluating dev...
Dev F1 0.6752037252619325
Avf Dev F1 0.6752037252619325


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 11, loss: 0.5395979527772093
Evaluating dev...
Dev F1 0.6463104325699746
Avf Dev F1 0.6463104325699746


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 12, loss: 0.5242651490649829
Evaluating dev...
Dev F1 0.6609124537607891
Avf Dev F1 0.6609124537607891


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 13, loss: 0.5070604562448958
Evaluating dev...
Dev F1 0.6682634730538922
Avf Dev F1 0.6682634730538922


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 14, loss: 0.49927060895909864
Evaluating dev...
Dev F1 0.6609124537607891
Avf Dev F1 0.6609124537607891


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 15, loss: 0.48399344998567057
Evaluating dev...
Dev F1 0.6674786845310597
Avf Dev F1 0.6674786845310597


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 16, loss: 0.464188937136593
Evaluating dev...
Dev F1 0.6730083234244947
Avf Dev F1 0.6730083234244947


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 17, loss: 0.44191731795823824
Evaluating dev...
Dev F1 0.6697674418604651
Avf Dev F1 0.6697674418604651


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 18, loss: 0.43400512024527416
Evaluating dev...
Dev F1 0.6722689075630252
Avf Dev F1 0.6722689075630252


  0%|          | 0/384 [00:00<?, ?it/s]

epoch 19, loss: 0.4339368274086155
Evaluating dev...
Dev F1 0.6535626535626535
Avf Dev F1 0.6535626535626535


In [17]:
preds = predict(model, encoded_test_data)
# print(preds)
# print(encoded_test_labels)
train_avg_f1 = f1_score(preds, encoded_test_labels, 1)
print(f"Avf Dev F1 {train_avg_f1}")


Avf Dev F1 0.6259097525473072


## Written Assignment (30 Points)

### 1. Describe what the task is, and how it could be useful.

### 2. Describe, at the high level, that is, without mathematical rigor, how pretrained word embeddings like the ones we relied on here are computed. Your description can discuss the Word2Vec class of algorithms, GloVe, or a similar method.

### 3. What are some of the benefits of using word embeddings instead of e.g. a bag of words?

### 4. What is the difference between Binary Cross Entropy loss and the negative log likelihood loss we used here (`torch.nn.NLLLoss`)?

### 5. Show your experimental results. Indicate any changes to hyperparameters, data splits, or architectural changes you made, and how those effected results.

#### 1. 
Here the task is irony detection in text, this is a dificult task for models due to the model needing to be able to understand the hidden subtext in the text. This is very necesary for a model due to the prevalence of irony and sarcasm in human conversations. Recognition of irony would help models come closer to understanding human speech in a more holistic way.

#### 2. 
Pretrained word embeddings are vector representations of words in a continuous vector space. They capturing semantic relationships between words based on their contextual usage in a large corpus of text. Two of the most widely used methods for generating pretrained word embeddings are Word2Vec and GloVe (Global Vectors for Word Representation). I will explain how Word2Vec works further:

**Word2Vec:** Word2Vec is a designed to learn word embeddings by predicting the context of words in a given corpus. Given a target word, the model predicts the context words that are likely to appear around it. The objective is to maximize the probability of context words given the target word. This way we arbe able to understand how the word relates to other words by looking at its neighbours.

#### 3.
Using word embeddings offers many advantages over bag-of-words representations. While BoW is a sparse method that does not take context into account, word embeddings provides us dense vector representations that capture semantic relationships and contextual information. They enable models to understand similarities between words, consider nuanced meanings based on context, and exhibit algebraic properties for word analogies. OVerall they offer more understanding into the word as compared to BoW and also allows for better models.

#### 4.
The major difference between Binary Cross Entropy loss and Negative Log Likelihood is when they are used. While we use BCE loss in the case of binary classification, NLL loss is used for multi-class scenarios, especially in the context of sequence modeling.

#### 5. 
The table for all the models trained is bellow:
| Sno      | Hidden Layer Size | Learning Rate     | F1: Epoch 5 | F1: Epoch 10  | F1: Epoch 15 | F1: Epoch 20  | F1: Epoch 25 | F1: Epoch 30  | F1: Epoch 35 | F1: Epoch 40  |
| :---     |    :----:   |:---: |:---: |:---: |:---: |:---: |:---: |:---: |:---: |:---: |
| 1      | 10        | 0.001  | 0.62 | 0.62 | 0.67 | 0.63|
| 2      | 20        | 0.001  | 0.65 | 0.67 | 0.67 | 0.65|
| 3      | 30        | 0.001  | 0.69 | 0.70 | 0.69 | 0.68|
| 4      | 30        | 0.0005  | 0.66 | 0.67 | 0.66 | 0.64| 0.64| 0.64 | 0.63|0.65|
| 5      | 30        | 0.0002  | 0.47 | 0.62 | 0.62 | 0.62| 0.61| 0.64 | 0.64|0.64|
| 6      | 40        | 0.001  | 0.64 | 0.65 | 0.56 | 0.57|

As we can see the model works best for a Hidden layer length of 30 and Learning rate of 0.001. This gave us an f1 score of 0.62 on the test set. Overall a hidden layer lenght of 30 provides the highest f1 score while the optimal learning rate is 0.001 for this case.