# Assignment 1

**Credits**: Federico Ruggeri, Eleonora Mancini, Paolo Torroni

**Keywords**: POS tagging, Sequence labelling, RNNs


# Contact

If you have any doubts, questions, or issues, or need help, you can always contact us at the following email addresses:

Teaching Assistants:

* Federico Ruggeri -> federico.ruggeri6@unibo.it
* Eleonora Mancini -> e.mancini@unibo.it

Professor:

* Paolo Torroni -> p.torroni@unibo.it

# Introduction

Your task is to perform part-of-speech (POS) tagging.

<center>
    <img src="./images/pos_tagging.png" alt="POS tagging" />
</center>

In [9]:
#from google.colab import drive
#drive.mount('/content/drive')

In [10]:
#!cp -rf /content/drive/MyDrive/UNIBO/NLP/Assignments/Assignment-1/data ./
#!cp -rf /content/drive/MyDrive/UNIBO/NLP/Assignments/Assignment-1/images ./
#!cp /content/drive/MyDrive/UNIBO/NLP/Assignments/Assignment-1/data.csv ./

# [Task 1 - 0.5 points] Corpus

You are going to work with the [Penn TreeBank corpus](https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/dependency_treebank.zip).

**Ignore** the numeric value in the third column; use **only** the words/symbols and their POS labels.

### Example

```
Pierre	NNP	2
Vinken	NNP	8
,	,	2
61	CD	5
years	NNS	6
old	JJ	2
,	,	2
will	MD	0
join	VB	8
the	DT	11
board	NN	9
as	IN	9
a	DT	15
nonexecutive	JJ	15
director	NN	12
Nov.	NNP	9
29	CD	16
.	.	8
```

### Splits

The corpus contains 199 documents.

   * **Train**: Documents 1-100
   * **Validation**: Documents 101-150
   * **Test**: Documents 151-199

### Instructions

* **Download** the corpus.
* **Encode** the corpus into a pandas.DataFrame object.
* **Split** it into training, validation, and test sets.
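
A minimal sketch of the encoding step in plain Python, assuming each document is a text file with one tab-separated token per line and a blank line between sentences, as in the example above. The helper names `parse_document` and `split_of` are illustrative and not part of the assignment.

```python
def parse_document(text):
    """Return a list of (words, tags) pairs, one pair per sentence."""
    sentences, words, tags = [], [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if words:
                sentences.append((words, tags))
                words, tags = [], []
            continue
        # Keep only the token and its POS label; the third column is ignored.
        word, tag = line.split("\t")[:2]
        words.append(word)
        tags.append(tag)
    if words:
        sentences.append((words, tags))
    return sentences


def split_of(doc_id):
    """Map a document number to its split, following the ranges listed above."""
    if doc_id <= 100:
        return "train"
    if doc_id <= 150:
        return "val"
    return "test"


# Toy input with two sentences, in the same layout as the corpus files.
sample = "Pierre\tNNP\t2\nVinken\tNNP\t8\n\nMr.\tNNP\t2\nVinken\tNNP\t3\nis\tVBZ\t0\n"
parsed = parse_document(sample)
```

Each `(words, tags)` pair can then become one row of the DataFrame, together with the document number and the split returned by `split_of`.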

In [11]:
import os
import pandas as pd
import numpy as np
import random

data_folder = "./data"
def encode_dataset(dataset_name: str, to_lower: bool) -> pd.DataFrame:
  """
    Takes the dataset and encodes it in a pandas dataframe with six columns ['split', 'doc_id', 'sentence_num', 'words', 'tags', 'num_tokens']. Also computes the sets of unique tags and unique words and returns them together with the dataframe.
  
  """
  print("Encoding dataset as pandas dataframe...")

  dataset_folder = os.path.join(data_folder, "dataset")
  
  dataframe_rows = []             #dataframe that will contain all the sentences in all the documents, each sentence as a list of word and a list of corresponding tags
  unique_tags = set()             
  unique_words = set()

  for doc in os.listdir(dataset_folder):
    if doc.endswith(".csv") or doc.endswith(".pkl"): continue
    doc_num = int(doc[5:8])
    doc_path = os.path.join(dataset_folder,doc)

    with open(doc_path, mode='r', encoding='utf-8') as file:
      df = pd.read_csv(file,sep='\t',header=None,skip_blank_lines=False)
      df.rename(columns={0:'word',1:"TAG",2:"remove"},inplace=True)
      df.drop("remove",axis=1,inplace=True)

      if to_lower: df['word'] = df["word"].str.lower() #set all words to lower case
      
      #create another column that indicate the group id by sentence 
      df["group_num"] = df.isnull().all(axis=1).cumsum()
      df.dropna(inplace=True)
      df.reset_index(drop=True, inplace=True)
      
      unique_tags.update(df['TAG'].unique())     #save all the unique tags in a set 
      unique_words.update(df['word'].unique())   #save all the unique words in a set 

      #generate sentence list in a document 
      df_list = [df.iloc[rows] for _, rows in df.groupby('group_num').groups.items()]
      for n,d in enumerate(df_list) :           #for each sentence create a row in the final dataframe
          dataframe_row = {
              "split" : 'train' if doc_num<=100 else ('val' if doc_num<=150  else 'test'),
              "doc_id" : doc_num,
              "sentence_num" : n,
              "words": d['word'].tolist(),
              "tags":  d['TAG'].tolist(),
              "num_tokens": len(d['word'])
          }
          dataframe_rows.append(dataframe_row)

  dataframe_path = os.path.join(data_folder, dataset_name)
  df_final = pd.DataFrame(dataframe_rows)
  df_final.to_csv(dataframe_path + ".csv")                      #save as csv to inspect

  print("Encoding completed!")
    
  return  df_final, unique_tags, unique_words

df, unique_tags, unique_words = encode_dataset("encoded_dataset", to_lower = True)

print('Some words from the dataset:', random.choices(list(unique_words),k=15))
print('Some tags from the dataset:', random.choices(list(unique_tags),k=15))

print('\nencoded dataframe:')

Encoding dataset as pandas dataframe...


Encoding completed!
Some words from the dataset: ['referral', 'siegal', 'forward', 'headline', '30,841', 'e.w.', 'alzheimer', 'consideration', 'rhythm', 'procurement', 'asserts', 'combat', 'chrysler', 'advancing', 'norwick']
Some tags from the dataset: ['RBS', 'RP', 'NN', 'JJS', 'RB', 'VB', 'WP$', 'FW', 'WP$', 'VBZ', 'VB', '#', '.', '.', 'MD']

encoded dataframe:


In [12]:
df.sort_values("doc_id").groupby('split').head()

Unnamed: 0,split,doc_id,sentence_num,words,tags,num_tokens
3249,train,1,0,"[pierre, vinken, ,, 61, years, old, ,, will, j...","[NNP, NNP, ,, CD, NNS, JJ, ,, MD, VB, DT, NN, ...",18
3250,train,1,1,"[mr., vinken, is, chairman, of, elsevier, n.v....","[NNP, NNP, VBZ, NN, IN, NNP, NNP, ,, DT, NNP, ...",13
3376,train,2,0,"[rudolph, agnew, ,, 55, years, old, and, forme...","[NNP, NNP, ,, CD, NNS, JJ, CC, JJ, NN, IN, NNP...",26
2243,train,3,3,"[although, preliminary, findings, were, report...","[IN, JJ, NNS, VBD, VBN, RBR, IN, DT, NN, IN, ,...",35
2269,train,3,29,"[it, has, no, bearing, on, our, work, force, t...","[PRP, VBZ, DT, NN, IN, PRP$, NN, NN, NN, .]",10
3494,val,101,9,"[in, a, second, area, of, common, concern, ,, ...","[IN, DT, JJ, NN, IN, JJ, NN, ,, DT, NN, NN, ,,...",43
3485,val,101,0,"[a, house-senate, conference, approved, major,...","[DT, NNP, NN, VBD, JJ, NNS, IN, DT, NN, IN, JJ...",44
3489,val,101,4,"[these, fiscal, pressures, are, also, a, facto...","[DT, JJ, NNS, VBP, RB, DT, NN, IN, VBG, DT, NN...",39
3490,val,101,5,"[to, accommodate, the, additional, cash, assis...","[TO, VB, DT, JJ, NN, NN, ,, DT, NNP, NNPS, NNP...",26
3486,val,101,1,"[for, the, agency, for, international, develop...","[IN, DT, NNP, IN, NNP, NNP, ,, NNS, VBD, $, CD...",51


In [13]:
from collections import OrderedDict
import pickle

dict_path = os.path.join(data_folder,'dictionaries.pkl') #path where dictionaries will be saved 

def build_dict(words : list[str], tags : list[str]): 
    """
        Builds 4 dictionaries word2int, int2word, tag2int, int2tag and returns them
    """
    
    word2int = OrderedDict()
    int2word = OrderedDict()

    for i, word in enumerate(sorted(words)):   #sorted: set iteration order is not stable across runs
        word2int[word] = i+1           #plus 1 since index 0 is reserved for the padding token
        int2word[i+1] = word

    tag2int = OrderedDict()
    int2tag = OrderedDict()

    for i, tag in enumerate(sorted(tags)):
        tag2int[tag] = i+1
        int2tag[i+1] = tag
    
    print('saving dictionaries as pickle files')
    pickle_files = [word2int,int2word,tag2int,int2tag]
    
    with open(dict_path, 'wb') as f:
        pickle.dump(pickle_files, f)

    return word2int,int2word,tag2int,int2tag

word2int,int2word,tag2int,int2tag = build_dict(unique_words,unique_tags)

saving dictionaries as pickle files


In [14]:
indexed_df_path = os.path.join(data_folder, "indexed_dataset.pkl") #numberized dataframe path

def build_indexed_dataframe(word2int, tag2int, df):
    """
        Given the dictionaries word2int, tag2int and the dataframe, creates a dataframe where every word and tag is represented by its index and returns it
    """
    print('Initiating numberization of words and tags in dataframe')
    indexed_rows = []
    for words,tags in zip(df['words'],df['tags']):
        indexed_row = {'indexed_words':[word2int[word] for word in words ],'indexed_tags':[tag2int[tag] for tag in tags ]}
        indexed_rows.append(indexed_row)
    
    indexed_df = pd.DataFrame(indexed_rows)

    indexed_df.insert(0,'split',df['split'])
    indexed_df.insert(1,'num_tokens',df['num_tokens'])

    print('Numberization completed')

    return indexed_df


def check_dataframe_numberization(indexed_df, normal_df, int2word, int2tag) :
    """
       Checks that the numberized dataframe maps back to the original dataframe using the dictionaries int2word and int2tag
    """
    for n, (w_t, t_t) in enumerate(zip(indexed_df['indexed_words'],indexed_df['indexed_tags'])):
        if not normal_df.loc[n,'words'] == [int2word[indexed_word] for indexed_word in w_t]:
            print('words numberization gone wrong') 
            return False
        if not normal_df.loc[n,'tags'] == [int2tag[indexed_tag] for indexed_tag in t_t]:
            print('tags numberization gone wrong')
            return False 
    
    print('\nAll right with dataset numberization')
    print('Saving indexed dataframe')
    
    indexed_df.to_pickle(indexed_df_path)


indexed_df = build_indexed_dataframe(word2int,tag2int,df)
check_dataframe_numberization(indexed_df,df, int2word, int2tag)


Initiating numberization of words and tags in dataframe
Numberization completed



All right with dataset numberization
Saving indexed dataframe


# [Task 2 - 0.5 points] Text encoding

To train a neural POS tagger, you first need to encode text into numerical format.

### Instructions

* Embed words using **GloVe embeddings**.
* You are **free** to pick any embedding dimension.
* [Optional] You are free to experiment with text pre-processing: **make sure you do not delete any token!**
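
As an illustration of pre-processing that never deletes a token, the sketch below applies per-token steps and checks that no token was emptied. The `preprocess` helper is hypothetical, and lowercasing is only one possible step (the notebook already lowercases words when encoding the corpus).

```python
def preprocess(tokens, steps):
    """Apply per-token steps in order; each step maps one token to one token."""
    processed = list(tokens)
    for step in steps:
        processed = [step(token) for token in processed]
    # Guard: an emptied token would leave a POS label without a word.
    if any(token == "" for token in processed):
        raise ValueError("pre-processing must not delete tokens")
    return processed


tokens = ["Pierre", "Vinken", ",", "61", "years", "old"]
lowered = preprocess(tokens, [str.lower])
```

Because every step maps one token to exactly one token, the word list and the tag list stay aligned.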

In [15]:
import torch
from torchtext.vocab import GloVe

embedding_dimension = 50

glove_embeddings = GloVe(name='6B', dim=embedding_dimension)



In [16]:
def build_embedding_matrix(emb_model, word2int):
    """
        Builds the embedding matrix from the embedding model and the word2int dictionary.
        Words found in the embedding model get their pretrained vector; all the others get a random vector.
        Row 0 is left as zeros for the padding token. Returns the embedding matrix.
    """
    #check_value_distribution_glove(emb_model)
   
    embedding_dimension = emb_model.dim     #how many numbers each emb vector is composed of
    embedding_matrix = np.zeros((len(word2int)+1, embedding_dimension), dtype=np.float32)   #create a matrix initialized with all zeros 

    for word, idx in word2int.items():
        if word in emb_model.stoi:      #by default torchtext returns a zero vector for unknown tokens instead of raising, so check membership explicitly
            embedding_vector = emb_model[word].numpy()
        else:
            embedding_vector = np.random.uniform(low=-0.05, high=0.05, size=embedding_dimension)

        embedding_matrix[idx] = embedding_vector     #assign the retrieved or the generated vector to the corresponding index 
    
    print('Saving embedding matrix')
    path = os.path.join(data_folder, "emb_matrix")
    np.save(path,embedding_matrix,allow_pickle=True)

    print("Embedding matrix shape: {}".format(embedding_matrix.shape))

    return embedding_matrix

embedding_matrix = build_embedding_matrix(glove_embeddings, word2int)

Saving embedding matrix
Embedding matrix shape: (10948, 50)


In [17]:
def load_data():
    """
        Loads the data "emb_matrix, indexed_dataset, word2int, int2word, tag2int, int2tag " and returns them
    """
    emb_matrix_path = os.path.join(data_folder,'emb_matrix.npy')
    indexed_dataset_path = os.path.join(data_folder,'indexed_dataset.pkl')
    dictionaries_path = os.path.join(data_folder,'dictionaries.pkl')

    if all(os.path.exists(p) for p in (emb_matrix_path, indexed_dataset_path, dictionaries_path)):
        print('Loading embedding matrix')
        emb_matrix = np.load(emb_matrix_path,allow_pickle=True)
        print('Loading numberized dataset')
        indexed_dataset = pd.read_pickle(indexed_dataset_path)
        print('Loading dictionaries')
        with open(dictionaries_path, 'rb') as f:
            word2int,int2word,tag2int,int2tag = pickle.load(f)
        
        print('All data loaded')
    else:
        print('What you are looking for is not present in the folder')
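        emb_matrix, indexed_dataset = None, None
        word2int, int2word, tag2int, int2tag = None, None, None, None   #otherwise the return below raises UnboundLocalError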

    return emb_matrix, indexed_dataset, word2int, int2word, tag2int, int2tag

emb_matrix, indexed_dataset, word2int, int2word, tag2int, int2tag = load_data()

Loading embedding matrix
Loading numberized dataset
Loading dictionaries
All data loaded


# [Task 3 - 1.0 points] Model definition

You are now tasked to define your neural POS tagger.

### Instructions

* **Baseline**: implement a Bidirectional LSTM with a Dense layer on top.
* You are **free** to experiment with hyper-parameters to define the baseline model.

* **Model 1**: add an additional LSTM layer to the Baseline model.
* **Model 2**: add an additional Dense layer to the Baseline model.

* **Do not mix Model 1 and Model 2**. Each model has its own instructions.

**Note**: if a document contains many tokens, you are **free** to split them into chunks or sentences to define your mini-batches.
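
One way to split a long token sequence into chunks is sketched below. It assumes fixed-size chunks with no overlap; `chunk_tokens` and `max_len` are illustrative names, and splitting on sentence boundaries (as done in this notebook) is an equally valid choice.

```python
def chunk_tokens(words, tags, max_len):
    """Split parallel word/tag lists into consecutive chunks of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    chunks = []
    for start in range(0, len(words), max_len):
        # Slice both lists with the same bounds so words and tags stay aligned.
        chunks.append((words[start:start + max_len], tags[start:start + max_len]))
    return chunks


words = ["the", "board", "will", "meet", "today"]
tags = ["DT", "NN", "MD", "VB", "NN"]
chunks = chunk_tokens(words, tags, max_len=2)
```

Fixed-size chunks can cut a sentence in the middle, which removes context for the tokens near the cut; sentence-level batches avoid this at the cost of more variable lengths.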

### Baseline

In [18]:
import torch.nn as nn
import torch.nn.functional as F

def create_emb_layer(weights_matrix, pad_idx = 0):
    """
        Creates and returns the embedding layer
    """
    matrix = torch.Tensor(weights_matrix)   #the embedding matrix 
    _ , embedding_dim = matrix.shape 
    emb_layer = nn.Embedding.from_pretrained(matrix, freeze=True, padding_idx = pad_idx)   #load pretrained weights in the layer and make it non-trainable 
    return emb_layer, embedding_dim

In [19]:
class Baseline(nn.Module):
    def __init__(self, lstm_dimension, dense_dimension):
        super().__init__()
        self.bidirectional_layer = nn.LSTM(bidirectional=True, input_size=embedding_dimension, hidden_size=lstm_dimension, batch_first=True)
        self.dense_layer = nn.Linear(in_features=lstm_dimension*2, out_features=dense_dimension)
        self.embedding_layer, self.embedding_dim = create_emb_layer(embedding_matrix)

    def forward(self, sentences, sentences_length):
        embedded_sentences = self.embedding_layer(sentences)
        packed_sentences = nn.utils.rnn.pack_padded_sequence(embedded_sentences, sentences_length, batch_first=True, enforce_sorted=False)
        packed_output, _ = self.bidirectional_layer(packed_sentences)
        output, _ = nn.utils.rnn.pad_packed_sequence(packed_output, batch_first=True)
        output = self.dense_layer(output)
        output = F.log_softmax(output, dim=2)
        return output

        

### Model 1

In [20]:
class Model1(nn.Module):
    def __init__(self, lstm_dimension, dense_dimension):
        super().__init__()
        self.bidirectional_layer_1 = nn.LSTM(bidirectional=True, input_size=embedding_dimension, hidden_size=lstm_dimension, batch_first=True)
        self.bidirectional_layer_2 = nn.LSTM(bidirectional=True, input_size=lstm_dimension*2, hidden_size=lstm_dimension, batch_first=True)
        self.dense_layer = nn.Linear(in_features=lstm_dimension*2, out_features=dense_dimension)
        self.embedding_layer, self.embedding_dim = create_emb_layer(embedding_matrix)

    def forward(self, sentences, sentences_length):
        embedded_sentences = self.embedding_layer(sentences)
        packed_sentences = nn.utils.rnn.pack_padded_sequence(embedded_sentences, sentences_length, batch_first=True, enforce_sorted=False)
        packed_output, _ = self.bidirectional_layer_1(packed_sentences)
        packed_output, _ = self.bidirectional_layer_2(packed_output)
        output, _ = nn.utils.rnn.pad_packed_sequence(packed_output, batch_first=True)
        output = self.dense_layer(output)
        output = F.log_softmax(output, dim=2)
        return output

### Model 2

In [21]:
class Model2(nn.Module):
    def __init__(self, lstm_dimension, dense_dimension):
        super().__init__()
        self.bidirectional_layer = nn.LSTM(bidirectional=True, input_size=embedding_dimension, hidden_size=lstm_dimension, batch_first=True)
        self.dense_layer_1 = nn.Linear(in_features=lstm_dimension*2, out_features=dense_dimension)
        self.dense_layer_2 = nn.Linear(in_features=dense_dimension, out_features=dense_dimension)
        self.embedding_layer, self.embedding_dim = create_emb_layer(embedding_matrix)

    def forward(self, sentences, sentences_length):
        embedded_sentences = self.embedding_layer(sentences)
        packed_sentences = nn.utils.rnn.pack_padded_sequence(embedded_sentences, sentences_length, batch_first=True, enforce_sorted=False)
        packed_output, _ = self.bidirectional_layer(packed_sentences)
        output, _ = nn.utils.rnn.pad_packed_sequence(packed_output, batch_first=True)
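        output = F.relu(self.dense_layer_1(output))   #non-linearity: two stacked linear layers would otherwise collapse into a single linear map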
        output = self.dense_layer_2(output)
        output = F.log_softmax(output, dim=2)
        return output

# [Task 4 - 1.0 points] Metrics

Before training the models, you are tasked to define the evaluation metrics for comparison.

### Instructions

* Evaluate your models using macro F1-score, computed over **all** tokens.
* **Concatenate** all tokens in a data split to compute the F1-score. (**Hint**: accumulate FP, TP, FN, TN iteratively)
* **Do not consider punctuation and symbol classes** $\rightarrow$ [What is punctuation?](https://en.wikipedia.org/wiki/English_punctuation)
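
The hint about accumulating counts can be sketched as follows. This is one possible reading of the instructions, not a reference implementation: tokens whose gold tag is in `ignored_tags` are skipped, a prediction of an ignored tag on a regular token counts as a false negative for the gold class only, and TN is not tracked because F1 does not need it. The `MacroF1` class is a hypothetical helper.

```python
from collections import Counter


class MacroF1:
    """Accumulates per-class TP/FP/FN over batches and reports macro F1."""

    def __init__(self, ignored_tags=()):
        self.ignored = set(ignored_tags)
        self.tp, self.fp, self.fn = Counter(), Counter(), Counter()

    def update(self, gold_tags, predicted_tags):
        for gold, pred in zip(gold_tags, predicted_tags):
            if gold in self.ignored:
                continue  # punctuation/symbol tokens do not contribute
            if gold == pred:
                self.tp[gold] += 1
            else:
                self.fn[gold] += 1
                if pred not in self.ignored:
                    self.fp[pred] += 1

    def compute(self):
        classes = set(self.tp) | set(self.fp) | set(self.fn)
        scores = []
        for c in classes:
            denom = 2 * self.tp[c] + self.fp[c] + self.fn[c]
            scores.append(2 * self.tp[c] / denom if denom else 0.0)
        # Macro average: every class weighs the same, regardless of frequency.
        return sum(scores) / len(scores) if scores else 0.0


metric = MacroF1(ignored_tags={",", "."})
metric.update(["DT", "NN", ",", "VB"], ["DT", "NN", ",", "NN"])  # first batch
metric.update(["NN", "VB", "."], ["NN", "VB", "."])              # second batch
score = metric.compute()
```

Calling `update` once per batch and `compute` once per split is equivalent to concatenating all tokens of the split before computing the score.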

**Note**: What about OOV tokens?
   * All the tokens in the **training** set that are not in GloVe are **not** considered as OOV
   * For the remaining tokens (i.e., OOV in the validation and test sets), you have to assign them a **static** embedding.
   * You are **free** to define the static embedding using any strategy (e.g., random, neighbourhood, etc...)
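
A small sketch of the "static" requirement, assuming the random strategy: each OOV token receives one random vector, drawn from a seeded generator so that the same token always maps to the same vector. The function name, the toy `pretrained` dictionary, and the sampling range are assumptions made for illustration.

```python
import random


def build_static_embeddings(vocabulary, pretrained, dim, seed=42):
    """Return {token: vector}, drawing one fixed random vector per OOV token."""
    rng = random.Random(seed)  # seeded so that OOV vectors are reproducible
    table = {}
    # Sorted iteration: the vector assigned to a token must not depend on set ordering.
    for token in sorted(vocabulary):
        if token in pretrained:
            table[token] = list(pretrained[token])
        else:
            table[token] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    return table


pretrained = {"the": [0.1, 0.2], "board": [0.3, 0.4]}  # toy stand-in for GloVe
vocab = {"the", "board", "nonexecutive"}
first = build_static_embeddings(vocab, pretrained, dim=2)
second = build_static_embeddings(vocab, pretrained, dim=2)
```

Building the table twice yields identical vectors, which is the property that makes the embedding static rather than re-sampled at every lookup.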

# [Task 5 - 1.0 points] Training and Evaluation

You are now tasked to train and evaluate the Baseline, Model 1, and Model 2.

### Instructions

* Train **all** models on the train set.
* Evaluate **all** models on the validation set.
* Compute metrics on the validation set.
* Pick **at least** three seeds for robust estimation.
* Pick the **best** performing model according to the observed validation set performance.
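
The seed loop and model selection can be organized as in the sketch below. `train_and_validate` is a hypothetical callback that would train one model with one seed and return its validation macro F1; the stub used here returns made-up numbers purely to make the example runnable and does not reflect any real result.

```python
import random
from statistics import mean, stdev


def run_over_seeds(train_and_validate, model_names, seeds):
    """Return {model_name: (mean_f1, std_f1)} over the given seeds."""
    summary = {}
    for name in model_names:
        scores = []
        for seed in seeds:
            random.seed(seed)  # with PyTorch one would also call torch.manual_seed(seed)
            scores.append(train_and_validate(name, seed))
        summary[name] = (mean(scores), stdev(scores) if len(scores) > 1 else 0.0)
    return summary


def pick_best(summary):
    """Select the model with the highest mean validation score."""
    return max(summary, key=lambda name: summary[name][0])


# Placeholder for real training: the values below are invented for illustration only.
def fake_train_and_validate(name, seed):
    base = {"baseline": 0.70, "model1": 0.74, "model2": 0.72}[name]
    return base + 0.01 * (seed % 3)


summary = run_over_seeds(fake_train_and_validate, ["baseline", "model1", "model2"], seeds=[0, 1, 2])
best = pick_best(summary)
```

Reporting the standard deviation next to the mean shows whether the gap between two models is larger than the variation caused by the seed alone.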

In [40]:
def get_to_be_masked_tags():
    punctuation_tags = ['$', '``', '.', ',', '#', 'SYM', ':', "''",'-RRB-','-LRB-']   #tags to be masked 
    token_punctuation = [tag2int[tag] for tag in punctuation_tags]
    return torch.LongTensor(token_punctuation+[0])

to_mask = get_to_be_masked_tags()

def reshape_and_mask(predictions,targets): 
    non_masked_elements = torch.isin(targets, to_mask, invert=True)
    
    return predictions[non_masked_elements],targets[non_masked_elements]


In [22]:
from torch.utils.data import Dataset, DataLoader

class PosDataset(Dataset):
    def __init__(self, text, labels):
        self.labels = labels
        self.text = text
        self.sentence_lengths = [len(sentence) for sentence in self.text]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        label = self.labels[idx]
        text = self.text[idx]
        sample = (text, label, self.sentence_lengths[idx])
        return sample

In [23]:
def collate_fn(data):
    return ([x[0] for x in data], [x[1] for x in data], [x[2] for x in data])

In [24]:
def create_dataloaders(b_s : int):     #b_s = batch_size
    
    train_df = indexed_dataset[indexed_dataset['split'] == 'train'].reset_index(drop=True)      
    val_df = indexed_dataset[indexed_dataset['split'] == 'val'].reset_index(drop=True)
    test_df = indexed_dataset[indexed_dataset['split'] == 'test'].reset_index(drop=True)

    #create PosDataset objects for each split 
    train_dataset = PosDataset(train_df['indexed_words'],train_df['indexed_tags'])
    val_dataset = PosDataset(val_df['indexed_words'],val_df['indexed_tags'])
    test_dataset = PosDataset(test_df['indexed_words'],test_df['indexed_tags'])

    train_dataloader = DataLoader(train_dataset, batch_size=b_s, shuffle=True, collate_fn= collate_fn)
    val_dataloader = DataLoader(val_dataset, batch_size=b_s, shuffle=False, collate_fn= collate_fn)     #evaluation data does not need shuffling
    test_dataloader = DataLoader(test_dataset, batch_size=b_s, shuffle=False, collate_fn= collate_fn)

    return train_dataloader,val_dataloader,test_dataloader 

In [25]:
batch_size = 32

tr_dl, val_dl, test_dl = create_dataloaders(batch_size)

In [45]:
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
import torch.nn.utils.rnn as rnn

def train(model, epochs, loss_function, dataloader):
    model.train()
    optimizer = Adam(model.parameters(), lr=5e-4)
    for epoch in range(epochs):
        for sentences, pos, s_len in dataloader:
            optimizer.zero_grad()
            
            tensor_sentences = [torch.LongTensor(s) for s in sentences]
            tensor_pos = [torch.LongTensor(p) for p in pos]

            padded_sentences = rnn.pad_sequence(tensor_sentences, batch_first = True, padding_value = 0)
            padded_pos = rnn.pad_sequence(tensor_pos, batch_first = True, padding_value = 0)

            predicted = model(padded_sentences, s_len)

            predicted = predicted.view(-1,predicted.shape[-1])    
            targets = padded_pos.view(-1)

            predicted, targets = reshape_and_mask(predicted, targets)

            loss = loss_function(predicted, targets)
            loss.backward()
            optimizer.step()
        print(f'Train epoch [{epoch+1}/{epochs}] loss: {loss.item()}')

In [46]:
loss_function = nn.NLLLoss()   #the models already output log-probabilities (log_softmax), so NLLLoss is the matching loss

lstm_dimension = 16
dense_dimension = len(unique_tags)+1

baseline_model = Baseline(lstm_dimension, dense_dimension)
#double_lstm_model = Model1(lstm_dimension, dense_dimension)
#double_dense_model = Model2(lstm_dimension, dense_dimension)

In [48]:
epochs = 10
train(baseline_model, epochs, loss_function, tr_dl)

Train epoch [1/10] loss: 0.8145679831504822
Train epoch [2/10] loss: 0.9911481738090515
Train epoch [3/10] loss: 0.7941203713417053
Train epoch [4/10] loss: 0.8584253191947937
Train epoch [5/10] loss: 0.7546745538711548
Train epoch [6/10] loss: 0.9665728807449341
Train epoch [7/10] loss: 0.8961250185966492
Train epoch [8/10] loss: 0.6722370982170105
Train epoch [9/10] loss: 0.8010509610176086
Train epoch [10/10] loss: 0.7266693711280823


In [51]:
test_phrase = ['i', 'like', 'to', 'eat', 'apples']
test_phrase = [word2int[word] for word in test_phrase]
test_phrase = torch.LongTensor(test_phrase).unsqueeze(0)
test_phrase_len = [len(test_phrase[0])]

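baseline_model.eval()      #switch to evaluation mode
with torch.no_grad():      #gradients are not needed at inference time
    predicted = baseline_model(test_phrase, test_phrase_len)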
predicted = predicted.view(-1,predicted.shape[-1])
predicted = torch.argmax(predicted, dim=1)
predicted = [int2tag[int(p)] for p in predicted]
print(predicted)


['PRP', 'IN', 'TO', 'VB', 'NNP']


# [Task 6 - 1.0 points] Error Analysis

You are tasked to evaluate your best performing model.

### Instructions

* Compare the errors made on the validation and test sets.
* Aggregate model errors into categories (if possible).
* Comment on the errors and propose possible solutions to address them.
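
One simple way to aggregate errors into categories is to count (gold, predicted) pairs, as sketched below. The tag sequences are toy data invented for the example, and `top_confusions` is a hypothetical helper.

```python
from collections import Counter


def top_confusions(gold_tags, predicted_tags, k=3):
    """Return the k most frequent (gold, predicted) error pairs with their counts."""
    errors = Counter(
        (gold, pred) for gold, pred in zip(gold_tags, predicted_tags) if gold != pred
    )
    return errors.most_common(k)


gold = ["NN", "NNP", "VB", "NN", "JJ", "NNP"]
pred = ["NN", "NN", "VB", "NNP", "NN", "NN"]
confusions = top_confusions(gold, pred)
```

Running the same function on the validation and test predictions gives two ranked lists that can be compared directly.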

# [Task 7 - 1.0 points] Report

Wrap up your experiment in a short report (up to 2 pages).

### Instructions

* Use the NLP course report template.
* Summarize each task in the report following the provided template.

### Recommendations

The report is not a copy-paste of graphs, tables, and command outputs.

* Summarize classification performance in Table format.
* **Do not** report command outputs or screenshots.
* Report learning curves in Figure format.
* The error analysis section should summarize your findings.

# Submission

* **Submit** your report in PDF format.
* **Submit** your python notebook.
* Make sure your notebook is **well organized**, with no temporary code, commented sections, tests, etc...
* You can upload **model weights** in a cloud repository and report the link in the report.

# FAQ

Please check these frequently asked questions before contacting us.

### Trainable Embeddings

You are **free** to define a trainable or non-trainable Embedding layer to load the GloVe embeddings.

### Model architecture

You **should not** change the architecture of a model (i.e., its layers).

However, you are **free** to play with their hyper-parameters.

### Neural Libraries

You are **free** to use any library of your choice to implement the networks (e.g., Keras, Tensorflow, PyTorch, JAX, etc...)

### Keras TimeDistributed Dense layer

If you are using Keras, we recommend wrapping the final Dense layer with `TimeDistributed`.

### Error Analysis

Some topics for discussion include:
   * Model performance on most/least frequent classes.
   * Precision/Recall curves.
   * Confusion matrices.
   * Specific misclassified samples.
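
For the first topic, a possible starting point is per-class recall listed from the most to the least frequent gold tag. The sketch below uses toy tag sequences, and `recall_by_frequency` is an illustrative name.

```python
from collections import Counter


def recall_by_frequency(gold_tags, predicted_tags):
    """Return [(tag, support, recall)] sorted from most to least frequent gold tag."""
    support = Counter(gold_tags)
    correct = Counter(g for g, p in zip(gold_tags, predicted_tags) if g == p)
    return [(tag, n, correct[tag] / n) for tag, n in support.most_common()]


gold = ["NN", "NN", "NN", "VB", "VB", "FW"]
pred = ["NN", "NN", "VB", "VB", "VB", "NN"]
report = recall_by_frequency(gold, pred)
```

A table built this way makes it easy to see whether rare tags are systematically mapped to frequent ones.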

### Punctuation

**Do not** remove punctuation from documents since it may be helpful to the model.

You should **ignore** it during metrics computation.

If you are curious, you can run additional experiments to verify the impact of removing punctuation.

# The End