# Assignment 1

**Credits**: Federico Ruggeri, Eleonora Mancini, Paolo Torroni

**Keywords**: POS tagging, Sequence labelling, RNNs


# Contact

For any doubts, questions, or issues, you can contact us at the following email addresses:

Teaching Assistants:

* Federico Ruggeri -> federico.ruggeri6@unibo.it
* Eleonora Mancini -> e.mancini@unibo.it

Professor:

* Paolo Torroni -> p.torroni@unibo.it

# Introduction

You are asked to address the task of part-of-speech (POS) tagging.

<center>
        <img src="https://github.com/LeonardoM999/NLP/blob/main/Assignment%201/images/pos_tagging.png?raw=1" alt="POS tagging" />
</center>

In [29]:
!pip install keras



In [30]:
# Necessary Libraries
import pandas as pd
import numpy as np
import io
from pathlib import Path
import shutil
import urllib.request
import sys
import zipfile

import tqdm
import random
import tensorflow as tf
import tensorflow.keras as keras
from keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing import text
from tensorflow.keras.preprocessing import sequence
import os
from typing import List, Callable, Dict

# [Task 1 - 0.5 points] Corpus

You are going to work with the [Penn TreeBank corpus](https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/dependency_treebank.zip).

**Ignore** the numeric value in the third column, use **only** the words/symbols and their POS label.

### Example

```
Pierre	NNP	2
Vinken	NNP	8
,	,	2
61	CD	5
years	NNS	6
old	JJ	2
,	,	2
will	MD	0
join	VB	8
the	DT	11
board	NN	9
as	IN	9
a	DT	15
nonexecutive	JJ	15
director	NN	12
Nov.	NNP	9
29	CD	16
.	.	8
```

### Splits

The corpus contains 199 documents.

   * **Train**: Documents 1-100
   * **Validation**: Documents 101-150
   * **Test**: Documents 151-199

### Instructions

* **Download** the corpus.
* **Encode** the corpus into a pandas.DataFrame object.
* **Split** it in training, validation, and test sets.

### Download the corpus

In [31]:
def download_url(download_path: Path, url: str):
        urllib.request.urlretrieve(url, filename=download_path)

In [32]:
dataset_url = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/dependency_treebank.zip"
dataset_name = "dependency_treebank"

#print(f"Current work directory: {Path.cwd()}")
dataset_folder = Path.cwd().joinpath("Datasets")
if not dataset_folder.exists():
    dataset_folder.mkdir(parents=True)

dataset_zip_path = dataset_folder.joinpath("dependency_treebank.zip")
if not dataset_zip_path.exists():
  print("Downloading dataset... ", end="")
  download_url(url=dataset_url, download_path=dataset_zip_path)
  print("Download complete!")
else:
  print("Dataset already downloaded!")
dataset_path = dataset_folder.joinpath(dataset_name)

if not dataset_path.exists():
  print("Extracting dataset... (it may take a while...) ", end="")
  shutil.unpack_archive(dataset_zip_path, dataset_folder)
  print("Extraction completed!")
else:
  print("Dataset already extracted!")

Downloading dataset... Download complete!
Extracting dataset... (it may take a while...) Extraction completed!


### Encode the corpus into a pandas DataFrame object

In [33]:
folder = dataset_folder.joinpath(dataset_name)


dataframe_rows = []
for file_path in sorted(folder.glob('*.dp')):
  with file_path.open(mode='r', encoding='utf-8') as text_file:
    # Read the whole document (named raw_text to avoid shadowing the imported `text` module)
    raw_text = text_file.read()
    # Sentences are separated by a blank line
    sentences = raw_text.split("\n\n")

    # Process each sentence
    for s in sentences:
      sentence = []
      tags = []
      # e.g. sentence = ['Pierre', 'Vinken', ',', ...], tags = ['NNP', 'NNP', ',', ...]
      # Each line holds one token
      for line in s.split("\n"):
        columns = line.split("\t")
        # Keep only complete lines (word, tag, numeric value); the third column is ignored
        if len(columns) > 2:
          # Store the word and its tag
          sentence.append(columns[0])
          tags.append(columns[1])

      # Get the File_ID
      file_id = int(file_path.stem.split("_")[1])
      dataframe_row = {
               "file_id": file_id,
               "sentence": sentence,
               "tag": tags
           }
      dataframe_rows.append(dataframe_row)
# Create the dataframe
df = pd.DataFrame(dataframe_rows)

FILE_ID, WORD, TAG = df.columns.values

In [34]:
df.head()

|   | file_id | sentence | tag |
|---|---------|----------|-----|
| 0 | 1 | [Pierre, Vinken, ,, 61, years, old, ,, will, j... | [NNP, NNP, ,, CD, NNS, JJ, ,, MD, VB, DT, NN, ... |
| 1 | 1 | [Mr., Vinken, is, chairman, of, Elsevier, N.V.... | [NNP, NNP, VBZ, NN, IN, NNP, NNP, ,, DT, NNP, ... |
| 2 | 2 | [Rudolph, Agnew, ,, 55, years, old, and, forme... | [NNP, NNP, ,, CD, NNS, JJ, CC, JJ, NN, IN, NNP... |
| 3 | 3 | [A, form, of, asbestos, once, used, to, make, ... | [DT, NN, IN, NN, RB, VBN, TO, VB, NNP, NN, NNS... |
| 4 | 3 | [The, asbestos, fiber, ,, crocidolite, ,, is, ... | [DT, NN, NN, ,, NN, ,, VBZ, RB, JJ, IN, PRP, V... |


### Splitting Data Train-Test-Validation
Before splitting, the text is converted to lowercase as a small preprocessing step. The main preprocessing steps are carried out later.

#### Lower Case

In [35]:
### Make a list lowercase
def lowercase_list(input_list):
    return [item.lower() for item in input_list]

In [36]:
df['sentence'] = df['sentence'].apply(lowercase_list)

#### Splitting

In [37]:
### file indices for train/validation/test no randomization


train_ids = np.arange(1, 101)
val_ids = np.arange(101,151)
test_ids = np.arange(151,200)

df_train = df[df[FILE_ID].isin(train_ids)]
df_val = df[df[FILE_ID].isin(val_ids)]
df_test = df[df[FILE_ID].isin(test_ids)]

# [Task 2 - 0.5 points] Text encoding

To train a neural POS tagger, you first need to encode text into numerical format.

### Instructions

* Embed words using **GloVe embeddings**.
* You are **free** to pick any embedding dimension.
* [Optional] You are free to experiment with text pre-processing: **make sure you do not delete any token!**

### Pre-processing

#### Reproducibility

In [38]:
def set_reproducibility(seed):
    random.seed(seed)               # Seed for the Python built-in random module
    np.random.seed(seed)            # Seed for NumPy
    tf.random.set_seed(seed)        # Seed for TensorFlow
    os.environ['TF_DETERMINISTIC_OPS'] = '1'  # Set an environment variable for deterministic TensorFlow operations


#### Hyperparameters

In [39]:
max_sequence_length=int(np.quantile([len(seq) for seq in df_train['sentence']], 0.99))
hparams = {
    "batch_size": 128,
    "embedding_dim": 100,
    "embedding_trainable": False,
    "learning_rate": 0.005,
    "max_sequence_length": max_sequence_length,
    "vocab_size" : 7405,
    "tag_size" : 46
}

#### Vocabulary Creation & Tokenization
In order to embed the words, we need a Vocabulary and Tokenized Tags.

In [40]:
### Use Keras Tokenizer to create Vocabulary

tokenizer = Tokenizer(oov_token = 'OOV')
tokenizer.fit_on_texts(df_train['sentence'])

tag_tokenizer = Tokenizer()
tag_tokenizer.fit_on_texts(df_train['tag'])

# Turn texts into padded sequences of token ids.
def prep_text(texts, tokenizer, max_sequence_length):
    text_sequences = tokenizer.texts_to_sequences(texts)
    return sequence.pad_sequences(text_sequences, maxlen=max_sequence_length,padding='post')

text_train = prep_text(df_train["sentence"], tokenizer, hparams["max_sequence_length"])
text_test = prep_text(df_test["sentence"], tokenizer, hparams["max_sequence_length"])
text_val = prep_text(df_val["sentence"], tokenizer, hparams["max_sequence_length"])

tag_train = prep_text(df_train['tag'], tag_tokenizer, hparams["max_sequence_length"])
tag_test = prep_text(df_test['tag'], tag_tokenizer, hparams["max_sequence_length"])
tag_val = prep_text(df_val['tag'], tag_tokenizer, hparams["max_sequence_length"])


In [41]:
text_train.shape

(1963, 56)

In [42]:
from keras.utils import to_categorical
num_classes = len(tag_tokenizer.word_index) + 1
y_train = to_categorical(tag_train, num_classes)
y_test = to_categorical(tag_test, num_classes)
y_val = to_categorical(tag_val, num_classes)

In [43]:
all_classes = list(tag_tokenizer.word_index.keys())
all_tokens = list(tag_tokenizer.word_index.values())
punct_classes = [",", ".", ":", "``", "''", "$", "#", "sym", "-rrb-", "-lrb-"]
punct_tokens = [tag_tokenizer.word_index[p] for p in punct_classes]
allowed_classes = [word for word in tag_tokenizer.index_word.values() if word not in punct_classes]
allowed_tokens = [token for token in all_tokens if token not in punct_tokens]

print(f"All classes are: {all_classes}\n" +
      f"Their translation in token is: {all_tokens}\n\n" +
      f"Not Allow token (Punctuation) are: {punct_classes}\n" +
      f"Their translation in token is: {punct_tokens}\n\n" +
      f"Classes without punctuation: {allowed_classes}\n" +
      f"Their translation in token is: {allowed_tokens}")

All classes are: ['nn', 'nnp', 'in', 'dt', 'nns', 'jj', ',', '.', 'vbd', 'rb', 'cd', 'vb', 'cc', 'vbz', 'vbn', 'to', 'prp', 'vbg', 'vbp', 'md', 'prp$', '``', 'pos', "''", '$', ':', 'wdt', 'jjr', 'wp', 'rp', 'nnps', 'jjs', 'wrb', 'rbr', '-rrb-', '-lrb-', 'ex', 'rbs', 'ls', 'pdt', 'wp$', 'fw', 'uh', 'sym', '#']
Their translation in token is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]

Not Allow token (Punctuation) are: [',', '.', ':', '``', "''", '$', '#', 'sym', '-rrb-', '-lrb-']
Their translation in token is: [7, 8, 26, 22, 24, 25, 45, 44, 35, 36]

Classes without punctuation: ['nn', 'nnp', 'in', 'dt', 'nns', 'jj', 'vbd', 'rb', 'cd', 'vb', 'cc', 'vbz', 'vbn', 'to', 'prp', 'vbg', 'vbp', 'md', 'prp$', 'pos', 'wdt', 'jjr', 'wp', 'rp', 'nnps', 'jjs', 'wrb', 'rbr', 'ex', 'rbs', 'ls', 'pdt', 'wp$', 'fw', 'uh']
Their translation in token is: [1, 2, 3, 4, 5, 6, 9, 10

In [44]:
y_train.shape

(1963, 56, 46)

### GloVe Embeddings

#### Downloading Pre-Trained GloVe Embeddings
This may take a few minutes to complete.

In [45]:
zip_file_url = "http://nlp.stanford.edu/data/glove.6B.zip"
zip_file = urllib.request.urlopen(zip_file_url)
archive = zipfile.ZipFile(io.BytesIO(zip_file.read()))

#### Creating Embedding Matrix
We use the downloaded GloVe embeddings to create an embedding matrix, where the rows contain the word embeddings for the tokens in the Tokenizer's vocabulary.

In [46]:
embeddings_index = {}
glove_file = "glove.6B.100d.txt"

with archive.open(glove_file) as f:
    for line in f:
        values = line.split()
        word = values[0].decode("utf-8")
        coefs = np.asarray(values[1:], dtype="float32")
        embeddings_index[word] = coefs

embedding_matrix = np.zeros((len(tokenizer.word_index) + 1, hparams["embedding_dim"]))
num_words_in_embedding = 0
for word, i in tokenizer.word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        num_words_in_embedding += 1
        embedding_matrix[i] = embedding_vector

In [47]:
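### Inspect tokens' embedding vectors
idx_token = 2
# Look the token up by id: list(word_index.keys())[i] is the token with id i + 1, which does not match row i of the matrix
print(f'Token: {tokenizer.index_word[idx_token]} \nVector: {embedding_matrix[idx_token]}')

Token: , 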
Vector: [-0.10767     0.11053     0.59811997 -0.54360998  0.67395997  0.10663
  0.038867    0.35481     0.06351    -0.094189    0.15786    -0.81664997
  0.14172     0.21939     0.58504999 -0.52157998  0.22782999 -0.16642
 -0.68228     0.35870001  0.42568001  0.19021     0.91962999  0.57555002
  0.46184999  0.42363    -0.095399   -0.42749    -0.16566999 -0.056842
 -0.29595     0.26036999 -0.26605999 -0.070404   -0.27662     0.15820999
  0.69825     0.43081     0.27952    -0.45436999 -0.33801001 -0.58183998
  0.22363999 -0.57779998 -0.26862001 -0.20424999  0.56393999 -0.58524001
 -0.14365    -0.64218003  0.0054697  -0.35247999  0.16162001  1.1796
 -0.47674    -2.75530005 -0.1321     -0.047729    1.06550002  1.10339999
 -0.2208      0.18669     0.13177     0.15117     0.71310002 -0.35214999
  0.91347998  0.61782998  0.70991999  0.23954999 -0.14571001 -0.37858999
 -0.045959   -0.47367999  0.2385      0.20536    -0.18996     0.32506999
 -1.11119998 -0.36341     0.98679    -0.084

# [Task 3 - 1.0 points] Model definition

You are now tasked to define your neural POS tagger.

### Instructions

* **Baseline**: implement a Bidirectional LSTM with a Dense layer on top.
* You are **free** to experiment with hyper-parameters to define the baseline model.

* **Model 1**: add an additional LSTM layer to the Baseline model.
* **Model 2**: add an additional Dense layer to the Baseline model.

* **Do not mix Model 1 and Model 2**. Each model has its own instructions.

**Note**: if a document contains many tokens, you are **free** to split them into chunks or sentences to define your mini-batches.

### Model Creation

In [48]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import LSTM, Embedding, Dense, TimeDistributed, Dropout, Bidirectional, Input

In [53]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense
import tensorflow as tf

class CreateModel(tf.keras.Model):
    def __init__(self, config):
        super().__init__()

        vocab_size = config['vocab_size']
        embedding_dim = config['embedding_dim']
        max_sequence_length = config['max_sequence_length']
        embedding_matrix = config['embedding_matrix']
        tag_size = config['tag_size']
        lstm_units = config['lstm_units']
        Additional_LSTM = config['Additional_LSTM']
        Additional_Dense = config['Additional_Dense']
        add_lstm_units = config['add_lstm_units']
        add_dense_units = config['add_dense_units']

        # Embedding layer
        self.embedding_layer = Embedding(
            vocab_size + 1,
            embedding_dim,
            input_length=max_sequence_length,
            weights=[embedding_matrix],
            trainable=False
        )

        # Bidirectional LSTM layer
        self.bi_lstm = Bidirectional(LSTM(lstm_units, return_sequences=True))
        # Additional LSTM
        self.additional_lstm = Bidirectional(LSTM(add_lstm_units, return_sequences=True)) if Additional_LSTM else None
        # Additional Dense (hidden layer, so it uses ReLU; softmax is reserved for the output layer)
        self.additional_dense = Dense(add_dense_units, activation='relu') if Additional_Dense else None

        # Dense output layer
        self.dense_output = Dense(tag_size, activation='softmax')


    def call(self, inputs):
        # Define the forward pass
        x = self.embedding_layer(inputs)
        x = self.bi_lstm(x)

        # Add the additional LSTM layer if specified
        if self.additional_lstm:
            x = self.additional_lstm(x)

        # Add the additional Dense layer if specified
        if self.additional_dense:
            x = self.additional_dense(x)

        outputs = self.dense_output(x)
        return outputs

    def build(self, shape):
        x = tf.keras.layers.Input(shape=(shape,))
        return tf.keras.Model(inputs=x, outputs=self.call(x))



In [55]:
config_dict = {
    'vocab_size': 7405,
    'embedding_dim': 100,
    'max_sequence_length': max_sequence_length,
    'embedding_matrix': embedding_matrix,
    'tag_size': 46,
    'lstm_units': 64,
    'Additional_LSTM': False,
    'Additional_Dense': False,
    'add_lstm_units': None,
    'add_dense_units': None
}
# Create an instance of the custom model
custom_model = CreateModel(config_dict).build(config_dict["max_sequence_length"])

# Compile the model
custom_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Print model summary

custom_model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 56)]              0         
                                                                 
 embedding_3 (Embedding)     (None, 56, 100)           740600    
                                                                 
 bidirectional_3 (Bidirecti  (None, 56, 128)           84480     
 onal)                                                           
                                                                 
 dense_2 (Dense)             (None, 56, 46)            5934      
                                                                 
Total params: 831014 (3.17 MB)
Trainable params: 90414 (353.18 KB)
Non-trainable params: 740600 (2.83 MB)
_________________________________________________________________


### TestCode

In [None]:
# Baseline: implement a Bidirectional LSTM with a Dense layer on top.
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense

# Define input layer
inputs = Input(shape=(hparams["max_sequence_length"],))

# Add embedding layer
embedding_layer = Embedding(hparams["vocab_size"]+1, hparams["embedding_dim"], input_length=hparams["max_sequence_length"], weights=[embedding_matrix], trainable=False)(inputs)

# Add bidirectional LSTM layer
bi_lstm = Bidirectional(LSTM(256, return_sequences=True))(embedding_layer)

# Add dense output layer
outputs = Dense(hparams["tag_size"], activation='softmax')(bi_lstm)

# Create the model (the fit and predict cells below rely on this `model` variable)
model = Model(inputs=inputs, outputs=outputs)

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 56)]              0         
                                                                 
 embedding (Embedding)       (None, 56, 100)           740600    
                                                                 
 bidirectional (Bidirection  (None, 56, 512)           731136    
 al)                                                             
                                                                 
 dense (Dense)               (None, 56, 46)            23598     
                                                                 
Total params: 1495334 (5.70 MB)
Trainable params: 754734 (2.88 MB)
Non-trainable params: 740600 (2.83 MB)
_________________________________________________________________


In [None]:
history = model.fit(text_train, y_train, batch_size=64, epochs=50, validation_data=(text_val,y_val) , verbose=1)

In [62]:
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report

In [None]:
y_pred = model.predict([text_test]).argmax(-1).flatten()
y_test_flatten = y_test.argmax(-1).flatten()
print("F1 score: {}".format(f1_score(y_test_flatten, y_pred, labels=allowed_tokens, average='macro', zero_division=0)))


F1 score: 0.7255384045552715


# [Task 4 - 1.0 points] Metrics

Before training the models, you are tasked to define the evaluation metrics for comparison.

### Instructions

* Evaluate your models using macro F1-score, computed over **all** tokens.
* **Concatenate** all tokens in a data split to compute the F1-score. (**Hint**: accumulate FP, TP, FN, TN iteratively)
* **Do not consider punctuation and symbol classes** $\rightarrow$ [What is punctuation?](https://en.wikipedia.org/wiki/English_punctuation)
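
The hint about accumulating counts can be sketched as follows. This is a minimal illustration under stated assumptions, not the required implementation: the class `F1Accumulator` and its parameters `ignored_ids` and `pad_id` are hypothetical names, and labels are assumed to be integer ids with 0 reserved for padding, as produced by the Keras `Tokenizer` encoding used earlier.

```python
from collections import Counter
from typing import Iterable, Sequence, Set


class F1Accumulator:
    """Accumulates per-class TP/FP/FN counts across batches (hypothetical helper)."""

    def __init__(self, class_ids: Iterable[int], ignored_ids: Set[int], pad_id: int = 0):
        # Classes that contribute to the macro average (e.g. non-punctuation tags)
        self.class_ids = [c for c in class_ids if c not in ignored_ids and c != pad_id]
        self.pad_id = pad_id
        self.tp, self.fp, self.fn = Counter(), Counter(), Counter()

    def update(self, y_true: Sequence[int], y_pred: Sequence[int]) -> None:
        for gold, pred in zip(y_true, y_pred):
            if gold == self.pad_id:
                continue  # padding positions are not real tokens
            if gold == pred:
                self.tp[gold] += 1
            else:
                self.fn[gold] += 1
                self.fp[pred] += 1

    def macro_f1(self) -> float:
        scores = []
        for c in self.class_ids:
            denom = 2 * self.tp[c] + self.fp[c] + self.fn[c]
            # A class that never appears and is never predicted scores 0
            scores.append(2 * self.tp[c] / denom if denom else 0.0)
        return sum(scores) / len(scores) if scores else 0.0
```

Calling `update` once per mini-batch (or per document) and `macro_f1` at the end gives the same result as concatenating all tokens first, because only the counts matter. Skipping padding positions also keeps predictions made on padding from being counted as false positives.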

**Note**: What about OOV tokens?
   * All the tokens in the **training** set that are not in GloVe **must** be added to the vocabulary.
   * For the remaining tokens (i.e., OOV in the validation and test sets), you have to assign them a **special token** (e.g., [UNK]) and a **static** embedding.
   * You are **free** to define the static embedding using any strategy (e.g., random, neighbourhood, etc...)

### More about OOV

For a given token:

* **If in train set**: add to vocabulary and assign an embedding (use GloVe if token in GloVe, custom embedding otherwise).
* **If in val/test set**: assign special token if not in vocabulary and assign custom embedding.

Your vocabulary **should**:

* Contain all tokens in train set; or
* Union of tokens in train set and in GloVe $\rightarrow$ we make use of existing knowledge!
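
One possible reading of these rules is sketched below, assuming the first vocabulary option (all training tokens) and a random static embedding. The function name `build_embedding_matrix`, the `[UNK]` entry, the uniform range, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def build_embedding_matrix(word_index, glove, embedding_dim, seed=42):
    """Row 0 is padding (zeros); every vocabulary token id gets one row.

    Tokens found in `glove` copy their pre-trained vector; all the others
    (including the special unknown token) get a fixed random vector.
    """
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(word_index) + 1, embedding_dim), dtype="float32")
    for word, idx in word_index.items():
        vector = glove.get(word)
        if vector is None:
            # Static random embedding: drawn once here, never re-sampled
            vector = rng.uniform(-0.05, 0.05, size=embedding_dim)
        matrix[idx] = vector
    return matrix


# Toy example: "[UNK]" stands for every val/test token outside the vocabulary
toy_glove = {"the": np.ones(4, dtype="float32")}
toy_word_index = {"[UNK]": 1, "the": 2, "nonexecutive": 3}
toy_matrix = build_embedding_matrix(toy_word_index, toy_glove, embedding_dim=4)
```

Fixing the seed makes the custom vectors reproducible across runs, which matters when the Embedding layer is not trainable.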


### Token to embedding mapping

You can follow two approaches for encoding tokens in your POS tagger.

### Work directly with embeddings

- Compute the embedding of each input token
- Feed the mini-batches of shape (batch_size, # tokens, embedding_dim) to your model
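
As a small illustration of this first approach, the lookup can be done with NumPy integer indexing. The toy matrix and token ids below are made up for the example and are not taken from the corpus.

```python
import numpy as np

# Hypothetical toy sizes: 5 vocabulary rows (row 0 = padding), 3-dimensional vectors
toy_embedding_matrix = np.arange(15, dtype="float32").reshape(5, 3)

# A mini-batch of 2 padded sentences, 4 token ids each
toy_token_ids = np.array([[1, 2, 3, 0],
                          [4, 1, 0, 0]])

# Integer-array indexing gathers one embedding row per token id
toy_batch = toy_embedding_matrix[toy_token_ids]
print(toy_batch.shape)  # (2, 4, 3)
```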

### Work with Embedding layer

- Encode input tokens to token ids
- Define an Embedding layer as the first layer of your model
- Compute the embedding matrix of all known tokens (i.e., tokens in your vocabulary)
- Initialize the Embedding layer with the computed embedding matrix
- You are **free** to set the Embedding layer trainable or not

### Padding

Pay attention to padding tokens!

Your model **should not** be penalized on those tokens.

#### How to?

There are two main ways.

However, their implementation depends on the neural library you are using.

- Embedding layer
- Custom loss to compute average cross-entropy on non-padding tokens only

**Note**: This is a **recommendation**, but we **do not penalize** for missing workarounds.
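
To make the second option concrete, here is a framework-independent sketch in NumPy of a cross-entropy averaged over non-padding tokens only. The function name `masked_cross_entropy` and the assumption that label id 0 marks padding are illustrative; in a real model the same computation would be written with the tensors of the chosen library. For the first option, the Keras `Embedding` layer accepts `mask_zero=True`, although how the mask propagates depends on the layers that follow.

```python
import numpy as np


def masked_cross_entropy(y_true_ids, y_prob, pad_id=0, eps=1e-9):
    """Average negative log-likelihood over non-padding positions only.

    y_true_ids: int array of shape (batch, tokens)
    y_prob:     float array of shape (batch, tokens, classes), rows sum to 1
    """
    y_true_ids = np.asarray(y_true_ids)
    y_prob = np.asarray(y_prob, dtype="float64")
    mask = y_true_ids != pad_id
    # Probability assigned to the gold class at every position
    gold_prob = np.take_along_axis(y_prob, y_true_ids[..., None], axis=-1)[..., 0]
    token_loss = -np.log(gold_prob + eps)
    # Padding positions are zeroed out and excluded from the denominator
    return float((token_loss * mask).sum() / max(mask.sum(), 1))
```

With this definition, whatever the model predicts on padding positions has no effect on the loss value.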

### Metrics

#### Hyperparameter Tuning & Evaluation

In [21]:
from itertools import combinations_with_replacement

def layer_unit_calculator(max_units):
    # Keep only the unit sizes that do not exceed max_units
    # (a comprehension avoids removing items from a list while iterating over it)
    base_units = [units for units in [32, 64, 128, 256, 512] if units <= max_units]

    # Generate all possible two-element combinations
    # Convert the resulting iterator to a list
    layer_units = list(combinations_with_replacement(base_units, 2))

    return base_units, layer_units



In [63]:
def grid_search(parameters, max_units):
  baseline_units, layer_units = layer_unit_calculator(max_units)

  # Collect the unit sizes to try for the selected architecture
  if parameters["Additional_LSTM"]:
    candidates = [{"lstm_units": first, "add_lstm_units": second} for first, second in layer_units]
  elif parameters["Additional_Dense"]:
    candidates = [{"lstm_units": first, "add_dense_units": second} for first, second in layer_units]
  else:
    candidates = [{"lstm_units": units} for units in baseline_units]

  for candidate in candidates:
    # Use the `parameters` argument (not the global config_dict) and leave it unmodified
    config = {**parameters, **candidate}
    # Create an instance of the custom model
    custom_model = CreateModel(config).build(config["max_sequence_length"])
    # Compile the model
    custom_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    # Print model summary
    custom_model.summary()
    # Train and evaluate every candidate, whichever architecture is selected
    history = custom_model.fit(text_train, y_train, batch_size=64, epochs=50, validation_data=(text_val,y_val) , verbose=1)
    y_pred = custom_model.predict([text_test]).argmax(-1).flatten()
    y_test_flatten = y_test.argmax(-1).flatten()
    print("F1 score: {}".format(f1_score(y_test_flatten, y_pred, labels=allowed_tokens, average='macro', zero_division=0)))



config_dict = {
    'vocab_size': 7405,
    'embedding_dim': 100,
    'max_sequence_length': max_sequence_length,
    'embedding_matrix': embedding_matrix,
    'tag_size': 46,
    'lstm_units': 64,
    'Additional_LSTM': False,
    'Additional_Dense': False,
    'add_lstm_units': None,
    'add_dense_units': None
}
grid_search(config_dict,64)



Model: "model_14"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_16 (InputLayer)       [(None, 56)]              0         
                                                                 
 embedding_16 (Embedding)    (None, 56, 100)           740600    
                                                                 
 bidirectional_22 (Bidirect  (None, 56, 64)            34048     
 ional)                                                          
                                                                 
 dense_15 (Dense)            (None, 56, 46)            2990      
                                                                 
Total params: 777638 (2.97 MB)
Trainable params: 37038 (144.68 KB)
Non-trainable params: 740600 (2.83 MB)
_________________________________________________________________
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8

# [Task 5 - 1.0 points] Training and Evaluation

You are now tasked to train and evaluate the Baseline, Model 1, and Model 2.

### Instructions

* Train **all** models on the train set.
* Evaluate **all** models on the validation set.
* Compute metrics on the validation set.
* Pick **at least** three seeds for robust estimation.
* Pick the **best** performing model according to the observed validation set performance.
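
A possible way to organize the multi-seed evaluation is sketched below. `evaluate_over_seeds` and `fake_train_and_score` are hypothetical names, and the scores returned by the stand-in function are placeholder numbers, not experimental results; a real run would seed all libraries, train on the train set, and return the validation macro F1.

```python
import statistics
from typing import Callable, Dict, Sequence


def evaluate_over_seeds(train_and_score: Callable[[int], float],
                        seeds: Sequence[int]) -> Dict[str, float]:
    """Runs one training per seed and summarizes the validation scores."""
    scores = [train_and_score(seed) for seed in seeds]
    return {
        "mean": statistics.mean(scores),
        # Sample standard deviation; it needs at least two runs
        "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
    }


# Stand-in for a real training run (placeholder values only)
def fake_train_and_score(seed: int) -> float:
    return {1337: 0.70, 42: 0.72, 2023: 0.74}[seed]


summary = evaluate_over_seeds(fake_train_and_score, seeds=[1337, 42, 2023])
```

Comparing models on the mean and the standard deviation, rather than on a single run, makes the model selection less sensitive to a lucky or unlucky initialization.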

# [Task 6 - 1.0 points] Error Analysis

You are tasked to evaluate your best performing model.

### Instructions

* Compare the errors made on the validation and test sets.
* Aggregate model errors into categories (if possible)
* Comment on the errors and propose possible solutions to address them.
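
One simple way to aggregate errors into categories is to count (gold tag, predicted tag) pairs among the misclassified tokens. The helper `top_confusions` and the toy tags below are assumptions for illustration; they are not model output.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def top_confusions(gold_tags: Sequence[str], pred_tags: Sequence[str],
                   k: int = 5) -> List[Tuple[Tuple[str, str], int]]:
    """Returns the k most frequent (gold, predicted) pairs among the errors."""
    errors = Counter(
        (gold, pred) for gold, pred in zip(gold_tags, pred_tags) if gold != pred
    )
    return errors.most_common(k)


# Toy tags for illustration only
toy_gold = ["NN", "NNP", "JJ", "NN", "VBN", "NN"]
toy_pred = ["NN", "NN", "NN", "JJ", "VBD", "JJ"]
print(top_confusions(toy_gold, toy_pred, k=2))
```

Running the same function on the validation and test predictions gives two ranked lists that can be compared directly.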

# [Task 7 - 1.0 points] Report

Wrap up your experiment in a short report (up to 2 pages).

### Instructions

* Use the NLP course report template.
* Summarize each task in the report following the provided template.

### Recommendations

The report is not a copy-paste of graphs, tables, and command outputs.

* Summarize classification performance in Table format.
* **Do not** report command outputs or screenshots.
* Report learning curves in Figure format.
* The error analysis section should summarize your findings.

# Submission

* **Submit** your report in PDF format.
* **Submit** your python notebook.
* Make sure your notebook is **well organized**, with no temporary code, commented sections, tests, etc...
* You can upload **model weights** in a cloud repository and report the link in the report.

# FAQ

Please check these frequently asked questions before contacting us.

### Execution Order

You are **free** to address tasks in any order (if multiple orderings are available).

### Trainable Embeddings

You are **free** to define a trainable or non-trainable Embedding layer to load the GloVe embeddings.

### Model architecture

You **should not** change the architecture of a model (i.e., its layers).

However, you are **free** to play with their hyper-parameters.

### Neural Libraries

You are **free** to use any library of your choice to implement the networks (e.g., Keras, Tensorflow, PyTorch, JAX, etc...)

### Keras TimeDistributed Dense layer

If you are using Keras, we recommend wrapping the final Dense layer with `TimeDistributed`.

### Robust Evaluation

Each model is trained with at least 3 random seeds.

Task 4 requires you to compute the average performance over the 3 seeds and its corresponding standard deviation.

### Model Selection for Analysis

To carry out the error analysis you are **free** to either

* Pick examples or perform comparisons with an individual seed run model (e.g., Baseline seed 1337)
* Perform ensembling via, for instance, majority voting to obtain a single model.
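
Majority voting over seed runs could look like the sketch below. The function name `majority_vote`, the tie-breaking rule (the first listed run wins), and the toy predictions are assumptions, since the assignment does not prescribe a specific ensembling scheme.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions_per_seed: Sequence[Sequence[int]]) -> List[int]:
    """Combines per-token predictions from several runs into one prediction.

    Ties are broken in favour of the run listed first.
    """
    voted = []
    for token_votes in zip(*predictions_per_seed):
        counts = Counter(token_votes)
        best = max(counts.values())
        # First vote (in run order) that reaches the highest count
        voted.append(next(v for v in token_votes if counts[v] == best))
    return voted


# Toy tag ids predicted by three seed runs for the same four tokens
toy_runs = [
    [1, 2, 3, 4],
    [1, 2, 5, 6],
    [1, 7, 5, 8],
]
ensemble = majority_vote(toy_runs)
```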

### Error Analysis

Some topics for discussion include:
   * Model performance on most/least frequent classes.
   * Precision/Recall curves.
   * Confusion matrices.
   * Specific misclassified samples.

### Punctuation

**Do not** remove punctuation from documents since it may be helpful to the model.

You should **ignore** it during metrics computation.

If you are curious, you can run additional experiments to verify the impact of removing punctuation.

# The End