# Assignment 5: Neural Networks

---

## Task 2) RNN for Classification

The theses dataset also contains types (diploma, bachelor, master) and categories (internal/external) for each thesis.
In this part, we want to classify whether a thesis is a bachelor or a master thesis, and whether it is internal or external.
Since PyTorch provides most things more or less out of the box, we want you to compare the following Recurrent Neural Network variants:
[RNN](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html), [GRU](https://pytorch.org/docs/stable/generated/torch.nn.GRU.html), [LSTM](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html), and Bidirectional-[LSTM](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) by using the `bidirectional` flag.
The basic setup as well as some code and steps can be reused from your solution for the language modeling task.
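
As a quick orientation, the following minimal sketch runs the four variants on a random input and prints the output shapes. The sizes are illustrative assumptions, not values prescribed by the assignment. Note how the `bidirectional` flag doubles the feature dimension of the output.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not taken from the assignment).
EMB_DIM, HIDDEN, SEQ_LEN, BATCH = 8, 16, 5, 3
x = torch.randn(SEQ_LEN, BATCH, EMB_DIM)  # default layout: (seq_len, batch, features)

variants = {
    "RNN": nn.RNN(EMB_DIM, HIDDEN),
    "GRU": nn.GRU(EMB_DIM, HIDDEN),
    "LSTM": nn.LSTM(EMB_DIM, HIDDEN),
    "BiLSTM": nn.LSTM(EMB_DIM, HIDDEN, bidirectional=True),
}

output_shapes = {}
for name, rnn in variants.items():
    # The second return value is h_n for RNN/GRU and (h_n, c_n) for LSTM.
    out, _ = rnn(x)
    output_shapes[name] = tuple(out.shape)
    print(name, output_shapes[name])
```

A classification head placed on top of a bidirectional model therefore needs `2 * HIDDEN` input features.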

### Data

Download the `theses.csv` data set from the `Supplemental Materials` in the `Files` section of our Microsoft Teams group.
This dataset consists of approx. 3,000 thesis topics chosen by students in the past.
Here are some examples of the file content:

```
27.10.94;14.07.95;1995;intern;Diplom;DE;Monte Carlo-Simulation für ein gekoppeltes Round-Robin-System;
04.11.94;14.03.95;1995;intern;Diplom;DE;Implementierung eines Testüberdeckungsgrad-Analysators für RAS;
01.11.20;01.04.21;2021;intern;Bachelor;DE;Landessprachenerkennung mittels X-Vektoren und Meta-Klassifikation;
```

### Basic Setup

For the assignment on Recurrent Neural Networks, we'll (again) heavily use [PyTorch](https://pytorch.org) as our go-to Deep Learning library.
Here, we'll rely on the RNN and Embedding modules already implemented by PyTorch.
You can imagine the Embedding layer as a simple lookup table that stores embeddings of a fixed dictionary and size (quite similar to the Word2Vec parameters we've trained in assignment 2).
Head over to the [RNN](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) and [Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html) modules to gain some understanding of their functionality.
Code for processing data samples, batching, converting to tensors, etc. can get messy and hard to maintain. 
Therefore, you can use PyTorch's [Datasets & DataLoaders](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html). 
Get familiar with the basics of data handling, as it will help you in upcoming assignments.
As always, you can use [NumPy](https://numpy.org) and [Pandas](https://pandas.pydata.org) for data handling etc.
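
To make the "lookup table" picture of the Embedding layer concrete, here is a small sketch. The toy vocabulary and the embedding size are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy vocabulary; the real one is built from the thesis titles.
toy_word2idx = {"<s>": 0, "</s>": 1, "simulation": 2, "system": 3}
embedding = nn.Embedding(num_embeddings=len(toy_word2idx), embedding_dim=4)

token_ids = torch.tensor([toy_word2idx[w] for w in ["<s>", "system", "</s>"]])
vectors = embedding(token_ids)
print(vectors.shape)  # one 4-dimensional vector per token

# The forward pass is plain row indexing into the (trainable) weight matrix.
same_as_indexing = torch.equal(vectors, embedding.weight[token_ids])
print(same_as_indexing)
```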

*In this Jupyter Notebook, we will provide the steps to solve this task and give hints via functions & comments. However, code modifications (e.g., function naming, arguments) and implementation of additional helper functions & classes are allowed. The code aims to help you get started.*

---

In [930]:
# Dependencies
import os
import tqdm
import numpy as np
import pandas as pd
from typing import TypedDict, Iterator, Optional, TypeVar, Generic, Callable
from dataclasses import dataclass
import re
from functools import reduce
from abc import abstractmethod, ABC
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
import csv

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pack_sequence, PackedSequence
from torch.optim import Optimizer, Adam

### Prepare the Data

1.1 Spend some time on preparing the dataset. It may be helpful to lower-case the data and to filter for German titles. The format of the CSV-file should be:

```
Anmeldedatum;Abgabedatum;JahrAkademisch;Art;Grad;Sprache;Titel;Abstract
```

1.2 Create the vocabulary from the prepared dataset. You'll need it for the modeling part such as nn.Embedding.

1.3 Filter out all diploma theses; they might be too easy to spot because they only cover "old" topics.

1.4 Create a PyTorch Dataset class which handles your tokenized data with respect to input and (class) labels.
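
Steps 1.1 to 1.3 could be sketched with Pandas roughly as follows. This is only one possible approach: it reads the three example rows shown above from an in-memory string (the header line is taken from the format description), and it tokenizes by whitespace purely for brevity.

```python
import io
import pandas as pd

# The three example rows from above, preceded by the expected header line.
SAMPLE = """Anmeldedatum;Abgabedatum;JahrAkademisch;Art;Grad;Sprache;Titel;Abstract
27.10.94;14.07.95;1995;intern;Diplom;DE;Monte Carlo-Simulation für ein gekoppeltes Round-Robin-System;
04.11.94;14.03.95;1995;intern;Diplom;DE;Implementierung eines Testüberdeckungsgrad-Analysators für RAS;
01.11.20;01.04.21;2021;intern;Bachelor;DE;Landessprachenerkennung mittels X-Vektoren und Meta-Klassifikation;
"""

df = pd.read_csv(io.StringIO(SAMPLE), sep=";")

german = df[df["Sprache"] == "DE"]               # 1.1: keep German titles
no_diploma = german[german["Grad"] != "Diplom"]  # 1.3: drop diploma theses
titles = no_diploma["Titel"].str.lower()         # 1.1: lower-case

# 1.2: vocabulary from whitespace tokens (a real tokenizer may differ)
vocab = sorted({tok for title in titles for tok in title.split()})
print(len(no_diploma), vocab)
```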

In [931]:
@dataclass
class Thesis:
    registration_date: str
    due_date: str
    year_academic: int
    type: str
    degree: str
    language: str
    title: str
    abstract: str

class _Thesis(TypedDict):
    Anmeldedatum: str
    Abgabedatum: str
    JahrAkademisch: str
    Art: str
    Grad: str
    Sprache: str
    Titel: str
    Abstract: str

def to_thesis(thesis: _Thesis) -> Thesis:
    return Thesis(
        registration_date=thesis["Anmeldedatum"],
        due_date=thesis["Abgabedatum"],
        year_academic=int(thesis["JahrAkademisch"]),
        type=thesis["Art"],
        degree=thesis["Grad"],
        language=thesis["Sprache"],
        title=thesis["Titel"],
        abstract=thesis["Abstract"]
    )

def load_theses_dataset(filepath) -> pd.DataFrame:
    """Loads all theses instances and returns them as a dataframe."""
    ### YOUR CODE HERE
    
    lists = {key: [] for key in Thesis.__dataclass_fields__.keys()}
    with open(filepath, encoding="utf-8-sig") as fp:
        theses = map(to_thesis, csv.DictReader(fp.readlines(), delimiter=";")) # type: ignore
        for thesis in theses:
            for key in lists:
                lists[key].append(thesis.__dict__[key])
    return pd.DataFrame(lists)
    
    ### END YOUR CODE

In [932]:
### Notice: Think about start and end of sentence tokens

def tokenize(text: str) -> Iterator[str]:
    yield "<s>"
    for s in text.split():
        m = re.match(r"^(\w+)?([,\.?!])?$", s)
        if m is not None:
            if m.group(1) is not None:
                yield m.group(1).lower()
            if m.group(2) is not None:
                yield m.group(2)
    yield "</s>"

def preprocess(dataframe) -> tuple[list[list[str]], list[str]]:
    """Preprocesses and tokenizes the given theses titles for further use."""
    ### YOUR CODE HERE
    
    seqs = []
    degrees = []
    for i in range(len(dataframe)):
        if dataframe["language"][i] == "DE":
            seqs.append(list(tokenize(dataframe["title"][i])))
            degrees.append(dataframe["degree"][i])
    return seqs, degrees

    ### END YOUR CODE
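
Note that the regular expression in `tokenize` only accepts whitespace-separated chunks made of word characters plus an optional trailing punctuation mark, so hyphenated chunks such as `Carlo-Simulation` are dropped. If such compounds should be kept, one possible alternative is sketched below. The function name `tokenize_keep_hyphens` and the pattern are assumptions for illustration, not part of the assignment.

```python
import re
from typing import Iterator

# Words (optionally joined by hyphens) or one of the punctuation marks , . ? !
TOKEN_PATTERN = re.compile(r"\w+(?:-\w+)*|[,.?!]")

def tokenize_keep_hyphens(text: str) -> Iterator[str]:
    """Hypothetical variant of tokenize() that keeps hyphenated compounds."""
    yield "<s>"
    for token in TOKEN_PATTERN.findall(text):
        yield token.lower()
    yield "</s>"

tokens = list(tokenize_keep_hyphens(
    "Monte Carlo-Simulation für ein gekoppeltes Round-Robin-System"
))
print(tokens)
```

Whether keeping compounds helps is an empirical question: it enlarges the vocabulary, but many thesis titles rely on such compounds.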

In [933]:
THESES_DATASET_PATH = "../4-nnet/data/theses2022.csv"

dataframe = load_theses_dataset(THESES_DATASET_PATH)
tokenized_data, degrees = preprocess(dataframe)
vocabulary = {w for l in tokenized_data for w in l}
idx2word = sorted(list(vocabulary))
word2idx = {w: i for i, w in enumerate(idx2word)}
unique_degrees = sorted(list(set(degrees)))

In [934]:
### TODO: 1.4 Implement the PyTorch theses dataset
### Notice: It is possible to solve the task without this class.
### Notice: However, with respect to DataLoaders it makes your life easier.

### YOUR CODE HERE

class ThesesDataset(Dataset):
    @property
    def dtype(self) -> torch.dtype:
        return self.__dtype
    
    @property
    def voc_size(self) -> int:
        return len(self.__word2idx)
    
    @property
    def class_count(self) -> int:
        return len(self.__unique_degrees)

    def __init__(self, sequences: list[list[str]], degrees:list[str], word2idx: dict[str, int], unique_degrees: list[str], dtype: torch.dtype = torch.float32):
        self.__sequences = sequences
        self.__degrees = degrees
        self.__word2idx = word2idx
        self.__unique_degrees = unique_degrees
        self.__dtype = dtype


    def __len__(self):
        return len(self.__sequences)

    def __getitem__(self, idx: slice | int) -> tuple[torch.Tensor | PackedSequence, torch.Tensor]:
        if isinstance(idx, int):
            return self.__get_single(idx)
        else:
            return self.__get_multiple(idx)
    
    def __get_single(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
        seq = self.__sequences[idx]
        x = torch.tensor([self.__word2idx[i] for i in seq], dtype=torch.int32)
        y = torch.zeros(self.class_count, dtype=self.dtype)
        y[self.__unique_degrees.index(self.__degrees[idx])] = 1
        return x, y
    
    def __get_multiple(self, idcs: slice) -> tuple[PackedSequence, torch.Tensor]:
        return ThesesDataset.collate([self.__get_single(i) for i in range(*idcs.indices(len(self)))])
    
    def loader(self, batch_size: int) -> DataLoader:
        return DataLoader(self, batch_size, True, collate_fn=ThesesDataset.collate)    
    
    @staticmethod
    def collate(tups: list[tuple[torch.Tensor, torch.Tensor]]) -> tuple[PackedSequence, torch.Tensor]:
        tups.sort(key=lambda tup: tup[0].shape[0], reverse=True)
        xs = []
        ys = []
        for tup in tups:
            xs.append(tup[0])
            ys.append(tup[1])
        return pack_sequence(xs), torch.vstack(ys)

### END YOUR CODE
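
The `collate` function above returns a `PackedSequence`. The following toy example, with made-up token ids, shows how such an object is laid out: the data is stored time step by time step, and `batch_sizes` records how many sequences are still active at each step.

```python
import torch
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

# Three toy sequences of token ids, already sorted by decreasing length
# (pack_sequence expects this order by default).
seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]
packed = pack_sequence(seqs)

print(packed.data)         # step 0: 1, 4, 6 | step 1: 2, 5 | step 2: 3
print(packed.batch_sizes)  # active sequences per time step

# Padding back yields a (max_len, batch) tensor plus the original lengths.
padded, lengths = pad_packed_sequence(packed)
print(padded.shape, lengths)
```

This layout is why the last valid output of each sequence sits at a different offset in `data`, which matters when selecting the features for the classification head.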

### Train and Evaluate

2.1 Implement the RNN for classification. To do so, subclass `nn.Module` and override the `forward` method.

2.2 Train and evaluate your models with 5-fold cross-validation. As in RNN-LM, you can either learn the embeddings from scratch or reuse the ones from word2vec.

2.3 Assemble a table: Recall/Precision/F1 measure for each of the mentioned RNN variants (RNN, GRU, LSTM). Which one works best?

2.4 Bonus: Apply your best classifier to the remaining diploma theses; are those on average more bachelor or master? :-)
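
For the cross-validation in 2.2, keeping the bachelor/master ratio the same in every fold is usually a good idea. One way to get this, sketched here on made-up labels, is scikit-learn's `StratifiedKFold`. The label counts below are assumptions chosen only to make the split easy to inspect.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up, imbalanced labels: 15 bachelor and 5 master theses.
toy_labels = np.array(["Bachelor"] * 15 + ["Master"] * 5)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# The feature matrix is irrelevant for the split itself, so a dummy is passed.
folds = list(skf.split(np.zeros(len(toy_labels)), toy_labels))

masters_per_test_fold = []
for fold, (train_idx, test_idx) in enumerate(folds):
    n_master = int((toy_labels[test_idx] == "Master").sum())
    masters_per_test_fold.append(n_master)
    print(f"fold {fold + 1}: {len(train_idx)} train, {len(test_idx)} test, {n_master} master in test")
```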

In [935]:
### TODO: 2.1 Implement the RNN classifier (nn.Module)
### Notice: Think about padding for batch sizes > 1
### Notice: 'torch.nn.utils.rnn' provides functionality

### YOUR CODE HERE

class RNNClassifierBase(nn.Module, ABC):
    @property
    def device(self) -> torch.device:
        return self.__device
    
    @device.setter
    def device(self, value: str | torch.device):
        if isinstance(value, str):
            value = torch.device(value)
        self.__device = value
        self.to(self.device)

    @property
    def dtype(self) -> torch.dtype:
        return self.__dtype
    
    def __init__(self, voc_size: int, embedding_dim: int, device: torch.device, dtype: torch.dtype, **kwargs):
        super(RNNClassifierBase, self).__init__(**kwargs)        
        self.__device = device
        self.__dtype = dtype
        self.embeddings = nn.Embedding(voc_size, embedding_dim, device=device, dtype=dtype)


class RNN_Classifier(RNNClassifierBase):    
    def __init__(self, voc_size: int, class_count, embedding_dim: int, hidden_layer_size: int, hidden_layer_count: int, device: torch.device, dtype: torch.dtype = torch.float32,  **kwargs):
        super(RNN_Classifier, self).__init__(voc_size, embedding_dim, device, dtype, **kwargs)
        self.__hidden_layer_size = hidden_layer_size
        self.hidden = nn.ModuleList()
        self.hidden.append(nn.Linear(embedding_dim + hidden_layer_size, hidden_layer_size, True, device, dtype))
        for _ in range(hidden_layer_count - 1):
            self.hidden.append(nn.Linear(2 * hidden_layer_size, hidden_layer_size, True, device, dtype))
        self.classification_head = nn.Linear(hidden_layer_size, class_count, True, device, dtype)
    
    def forward(self, X: torch.Tensor | PackedSequence) -> torch.Tensor:
        if isinstance(X, PackedSequence):
            hidden_states = [torch.zeros((int(X.batch_sizes[0].item()), self.__hidden_layer_size), device=self.device, dtype=self.dtype) for _ in range(len(self.hidden))] 
            word_embeddings = self.embeddings(X.data)
            start = 0
            for batch_size in X.batch_sizes:
                stop = start + batch_size
                x = word_embeddings[start:stop, :]
                new_hidden_states = [self.__update_hidden(0, x, hidden_states[0])]
                for i in range(1, len(self.hidden)):
                    new_hidden_states.append(self.__update_hidden(i, new_hidden_states[-1][:batch_size, :], hidden_states[i]))
                hidden_states = new_hidden_states
                start = stop
            return self.classification_head(hidden_states[-1])
        else:
            hidden_states = [torch.zeros(self.__hidden_layer_size, device=self.device, dtype=self.dtype) for _ in range(len(self.hidden))]
            word_embeddings = self.embeddings(X)
            for t in range(X.shape[0]):
                new_hidden_states = [self.__update_hidden(0, word_embeddings[t, :], hidden_states[0])]
                for i in range(1, len(self.hidden)):
                    new_hidden_states.append(self.__update_hidden(i, new_hidden_states[-1], hidden_states[i]))
                hidden_states = new_hidden_states
            return self.classification_head(hidden_states[-1])
    
    def __update_hidden(self, i: int, x: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        if len(prior.shape) == 2:
            return torch.vstack([
                F.relu(self.hidden[i](torch.hstack([x, prior[0:x.shape[0], :]]))),
                prior[x.shape[0]:, :]
            ])
        else:
            return F.relu(self.hidden[i](torch.hstack([x, prior])))

def _get_features_for_classification(X: torch.Tensor | PackedSequence, bidirectional: bool) -> torch.Tensor:
    if isinstance(X, PackedSequence):
        feature_list = []
        total = 0
        i = X.batch_sizes.shape[0] - 1
        stop = X.data.shape[0]
        if bidirectional:
            features_per_dir = X.data.shape[1]//2
            while total < X.batch_sizes[0]:
                start = stop - X.batch_sizes[i] + total
                count = stop - start
                if count != 0:
                    total += count
                    feature_list.append(X.data[start:stop, :features_per_dir])
                stop -= X.batch_sizes[i]
                i -= 1
            return torch.hstack([
                torch.vstack(feature_list),
                X.data[:X.batch_sizes[0], features_per_dir:]
            ])
        else:
            while total < X.batch_sizes[0]:
                start = stop - X.batch_sizes[i] + total
                count = stop - start
                if count != 0:
                    total += count
                    feature_list.append(X.data[start:stop, :])
                stop -= X.batch_sizes[i]
                i -= 1
            return torch.vstack(feature_list)
    else:
        if bidirectional:
            features_per_dir = X.data.shape[1]//2
            return torch.hstack([X.data[-1, :features_per_dir], X.data[0, features_per_dir:]])
        else:
            return X.data[-1, :]

class LSTMClassifier(RNNClassifierBase):    
    @property
    def bidirectional(self) -> bool:
        return self.__bidirectional
    
    def __init__(self, voc_size: int, class_count, embedding_dim: int, hidden_layer_size: int, hidden_layer_count: int, bidirectional: bool, device: torch.device, dtype: torch.dtype = torch.float32,  **kwargs):
        super(LSTMClassifier, self).__init__(voc_size, embedding_dim, device, dtype, **kwargs)
        self.__bidirectional = bidirectional
        self.lstm = nn.LSTM(embedding_dim, hidden_layer_size, hidden_layer_count, bidirectional=bidirectional, device=device, dtype=dtype)
        self.classification_head = nn.Linear((2 if self.bidirectional else 1) * hidden_layer_size, class_count, True, device, dtype)

    def forward(self, X: torch.Tensor | PackedSequence) -> torch.Tensor:
        if isinstance(X, PackedSequence):
            word_embeddings = self.embeddings(X.data)
            lstm_out, _ = self.lstm(PackedSequence(word_embeddings, X.batch_sizes, None, None))
        else:
            word_embeddings = self.embeddings(X)
            lstm_out, _ = self.lstm(word_embeddings)
        return self.classification_head(_get_features_for_classification(lstm_out, self.bidirectional))
        
class GRUClassifier(RNNClassifierBase):    
    @property
    def bidirectional(self) -> bool:
        return self.__bidirectional

    def __init__(self, voc_size: int, class_count, embedding_dim: int, hidden_layer_size: int, hidden_layer_count: int, bidirectional: bool, device: torch.device, dtype: torch.dtype = torch.float32,  **kwargs):
        super(GRUClassifier, self).__init__(voc_size, embedding_dim, device, dtype, **kwargs)
        self.__bidirectional = bidirectional
        self.gru = nn.GRU(embedding_dim, hidden_layer_size, hidden_layer_count, bidirectional=bidirectional, device=device, dtype=dtype)
        self.classification_head = nn.Linear((2 if self.bidirectional else 1) * hidden_layer_size, class_count, device=device, dtype=dtype)

    def forward(self, X: torch.Tensor | PackedSequence) -> torch.Tensor:
        if isinstance(X, PackedSequence):
            word_embeddings = self.embeddings(X.data)
            gru_out, _ = self.gru(PackedSequence(word_embeddings, X.batch_sizes, None, None))
        else:
            word_embeddings = self.embeddings(X)
            gru_out, _ = self.gru(word_embeddings)
        return self.classification_head(_get_features_for_classification(gru_out, self.bidirectional))
        
### END YOUR CODE
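
The helper `_get_features_for_classification` collects the last valid output of every sequence by hand. As a possible alternative, the recurrent modules also return the final hidden state `h_n`, which for packed input already refers to the last valid step of each sequence. The sketch below, with made-up sizes and random inputs, checks that both views agree for a single-layer bidirectional GRU.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

torch.manual_seed(0)
HIDDEN = 6  # illustrative size
gru = nn.GRU(input_size=4, hidden_size=HIDDEN, bidirectional=True)

# Three random "embedded" sequences, sorted by decreasing length.
seqs = [torch.randn(5, 4), torch.randn(3, 4), torch.randn(2, 4)]
out, h_n = gru(pack_sequence(seqs))

# h_n has shape (num_layers * num_directions, batch, hidden);
# the last two entries belong to the last layer (forward, backward).
features = torch.cat([h_n[-2], h_n[-1]], dim=1)  # (batch, 2 * hidden)

# Manual route: forward half at the last valid step, backward half at step 0.
padded, lengths = pad_packed_sequence(out)  # (max_len, batch, 2 * hidden)
last_fwd = padded[lengths - 1, torch.arange(len(seqs)), :HIDDEN]
first_bwd = padded[0, :, HIDDEN:]
manual = torch.cat([last_fwd, first_bwd], dim=1)

print(torch.allclose(features, manual))
```

For an LSTM the second return value is the tuple `(h_n, c_n)`, so `h_n` would have to be unpacked from it first.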

In [936]:
### TODO: 2.2 Implement the train functionality
### Notice: If you want, you can also combine train and eval functionality

def train(model: RNNClassifierBase, loader: DataLoader, loss_fn: Callable[[torch.Tensor, torch.Tensor], torch.Tensor], opt: Optimizer):
    """Trains the RNN-Classifier for one epoch."""
    ### YOUR CODE HERE

    batch_count = len(loader)
    running_loss = 0.0
    for i, (X, Y) in enumerate(loader):
        X = X.to(model.device)
        Y = Y.to(model.device)
        print(f"\r  training batch {i+1}/{batch_count}", end="")
        opt.zero_grad()
        logits = model(X)
        loss = loss_fn(logits, Y)
        running_loss += loss.item()
        loss.backward()
        opt.step()
    print()
    print(f"  average loss: {running_loss/batch_count}")

    ### END YOUR CODE

In [937]:
### TODO: 2.2 Implement the evaluation functionality
### Notice: If you want, you can also combine train and eval

def eval(model: RNNClassifierBase, loader: DataLoader, loss_fn: Callable[[torch.Tensor, torch.Tensor], torch.Tensor]) -> float:
    """Evaluates the optimized RNN-Classifier."""
    ### YOUR CODE HERE

    batch_count = len(loader)
    running_loss = 0.0
    Y_pred = []
    Y_true = []
    with torch.no_grad():
        for i, (X, Y) in enumerate(loader):
            X = X.to(model.device)
            Y = Y.to(model.device)
            print(f"\r  evaluating batch {i+1}/{batch_count}", end="")
            logits = model(X)
            running_loss += loss_fn(logits, Y).item()
            for y in torch.argmax(logits, 1).tolist():
                Y_pred.append(y)
            for y in torch.argmax(Y, 1).tolist():
                Y_true.append(y)
    C = confusion_matrix(Y_true, Y_pred)
    acc = (C.diagonal().sum() / C.sum())
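    # Guard the denominators so that a class which is never predicted (or never occurs)
    # yields 0 instead of NaN; the tiny offset keeps 1 / prec and 1 / rec finite below.
    prec = C.diagonal() / np.maximum(C.sum(0), 1) + 1e-128
    rec = C.diagonal() / np.maximum(C.sum(1), 1) + 1e-128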
    f1 = 2 / ((1 / prec) + (1 / rec))
    print(f"  average loss: {running_loss/batch_count}")
    print(f"  accuracy:         {acc.item()}")
    print(f"  precision scores: {prec.tolist()}")
    print(f"  recall scores:    {rec.tolist()}")
    print(f"  F1 scores:        {f1.tolist()}")
    return np.mean(f1).item()

    ### END YOUR CODE
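
For the table in 2.3, the per-class scores can also be obtained directly from scikit-learn instead of deriving them from the confusion matrix. A minimal sketch on made-up predictions (0 = Bachelor, 1 = Master):

```python
from sklearn.metrics import precision_recall_fscore_support

# Made-up labels and predictions, purely for illustration.
y_true = [0, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]

# zero_division=0 reports 0 instead of warning when a class is never predicted.
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1], zero_division=0
)

for name, p, r, f, s in zip(["Bachelor", "Master"], prec, rec, f1, support):
    print(f"{name:<10} precision={p:.2f} recall={r:.2f} F1={f:.2f} support={s}")
```

Collecting these rows per model and per fold, then averaging over the folds, gives one table line per RNN variant.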

In [938]:
### TODO: 2.2 Initialize and train the RNN-Classifier for X epochs

best_model_f1 = -1.0
best_model_class = ""
best_model_fold = -1
best_model = None
best_model_bidirectional = False

# For split reproducibility
SEED = 42

# Use 5-fold cross validation
NUM_FOLDS = 5

bachelor_indices = []
master_indices = []
diploma_indices = []
for i, d in enumerate(degrees):
    if d == "Bachelor":
        bachelor_indices.append(i)
    elif d == "Master":
        master_indices.append(i)
    elif d == "Diplom":
        diploma_indices.append(i)

fold_idcs_train = [[] for _ in range(NUM_FOLDS)]
fold_idcs_test = [[] for _ in range(NUM_FOLDS)]

k_fold = KFold(NUM_FOLDS, shuffle=True, random_state=SEED)

for l in [bachelor_indices, master_indices]:
    for i, (train_idcs, test_idcs) in enumerate(k_fold.split(l)):
        for j in train_idcs:
            fold_idcs_train[i].append(l[j])
        for j in test_idcs:
            fold_idcs_test[i].append(l[j])

EPOCHS = 20

DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu" # 'cpu', 'mps' or 'cuda'

BATCH_SIZE = 256

EMBEDDING_DIM = 64

HIDDEN_LAYER_SIZE = 32
HIDDEN_LAYER_COUNT = 1

### YOUR CODE HERE

for i in range(NUM_FOLDS):

    # Use batch_size=1 if you want to avoid padding handling
    train_dataset = ThesesDataset(
        [tokenized_data[j] for j in fold_idcs_train[i]],
        [degrees[j] for j in fold_idcs_train[i]],
        word2idx,
        ["Bachelor", "Master"]
    )
    train_dataloader = train_dataset.loader(BATCH_SIZE)

    # Use batch_size=1 if you want to avoid padding handling
    test_dataset = ThesesDataset(
        [tokenized_data[j] for j in fold_idcs_test[i]],
        [degrees[j] for j in fold_idcs_test[i]],
        word2idx,
        ["Bachelor", "Master"]
    )
    test_dataloader = test_dataset.loader(BATCH_SIZE)

    # Your classifier
    model = RNN_Classifier(len(word2idx), 2, EMBEDDING_DIM, HIDDEN_LAYER_SIZE, HIDDEN_LAYER_COUNT, torch.device(DEVICE))

    # Your loss function
    criterion = nn.CrossEntropyLoss(torch.tensor([0.25,0.75], device=model.device))

    # Your optimizer (optim.SGD should be okay)
    optimizer = Adam(model.parameters())

    print("##########")
    print(f"# Fold {i+1} #")
    print("##########\n")


    # TODO: Training for epoch i

    for e in range(EPOCHS):
        print(f"training epoch {e+1}/{EPOCHS}...")
        train(model, train_dataloader, criterion, optimizer)


    # TODO: Evaluation for epoch i

    print("evaluating model...")
    f1 = eval(model, test_dataloader, criterion)
    if f1 > best_model_f1:
        print("new best model found!")
        best_model_f1 = f1
        best_model_fold = i
        best_model = model
        best_model_class = best_model.__class__.__name__
        
    print()

### END YOUR CODE

##########
# Fold 1 #
##########

training epoch 1/20...
  training batch 7/7
  average loss: 0.2602261837039675
training epoch 2/20...
  training batch 7/7
  average loss: 0.2474031490939004
training epoch 3/20...
  training batch 7/7
  average loss: 0.24686498727117265
training epoch 4/20...
  training batch 7/7
  average loss: 0.24558061787060328
training epoch 5/20...
  training batch 7/7
  average loss: 0.2441821907247816
training epoch 6/20...
  training batch 7/7
  average loss: 0.24627713433333806
training epoch 7/20...
  training batch 7/7
  average loss: 0.24379871147019522
training epoch 8/20...
  training batch 7/7
  average loss: 0.24206108919211797
training epoch 9/20...
  training batch 7/7
  average loss: 0.24054197541304997
training epoch 10/20...
  training batch 7/7
  average loss: 0.23919914875711715
training epoch 11/20...
  training batch 7/7
  average loss: 0.23579312222344534
training epoch 12/20...
  training batch 7/7
  average loss: 0.23369149012225016
traini

In [939]:
### TODO: 2.3 Compare the results of various RNN variants (classification metrics)

### YOUR CODE HERE

MODEL_FACTORIES = [
    lambda: LSTMClassifier(len(word2idx), 2, EMBEDDING_DIM, HIDDEN_LAYER_SIZE, HIDDEN_LAYER_COUNT, False, device=torch.device(DEVICE)),
    lambda: LSTMClassifier(len(word2idx), 2, EMBEDDING_DIM, HIDDEN_LAYER_SIZE, HIDDEN_LAYER_COUNT, True, device=torch.device(DEVICE)),
    lambda: GRUClassifier(len(word2idx), 2, EMBEDDING_DIM, HIDDEN_LAYER_SIZE, HIDDEN_LAYER_COUNT, False, device=torch.device(DEVICE)),
    lambda: GRUClassifier(len(word2idx), 2, EMBEDDING_DIM, HIDDEN_LAYER_SIZE, HIDDEN_LAYER_COUNT, True, device=torch.device(DEVICE))
]

for model_factory in MODEL_FACTORIES:
    for i in range(NUM_FOLDS):
        train_dataset = ThesesDataset(
            [tokenized_data[j] for j in fold_idcs_train[i]],
            [degrees[j] for j in fold_idcs_train[i]],
            word2idx,
            ["Bachelor", "Master"]
        )
        train_dataloader = train_dataset.loader(BATCH_SIZE)

        test_dataset = ThesesDataset(
            [tokenized_data[j] for j in fold_idcs_test[i]],
            [degrees[j] for j in fold_idcs_test[i]],
            word2idx,
            ["Bachelor", "Master"]
        )
        test_dataloader = test_dataset.loader(BATCH_SIZE)

        model = model_factory()

        criterion = nn.CrossEntropyLoss(torch.tensor([0.2,0.8], device=model.device))

        optimizer = Adam(model.parameters())

        print("##########")
        print(f"# Fold {i+1} #")
        print("##########\n")

        for e in range(EPOCHS):
            print(f"training epoch {e+1}/{EPOCHS}...")
            train(model, train_dataloader, criterion, optimizer)

        print("evaluating model...")
        f1 = eval(model, test_dataloader, criterion)
        if f1 > best_model_f1:
            print("new best model found!")
            best_model_f1 = f1
            best_model_fold = i
            best_model = model
            best_model_class = best_model.__class__.__name__
            best_model_bidirectional = model.bidirectional
        
        print()

print("\nBEST MODEL:")
print(f"  type:          {best_model_class}")
print(f"  bidirectional: {best_model_bidirectional}")
print(f"  fold:          {best_model_fold + 1}")  # stored 0-based, reported 1-based like the log headers
print(f"  average F1:    {best_model_f1}")

### END YOUR CODE

##########
# Fold 1 #
##########

training epoch 1/20...
  training batch 7/7
  average loss: 0.228196063211986
training epoch 2/20...
  training batch 7/7
  average loss: 0.22709385199206217
training epoch 3/20...
  training batch 7/7
  average loss: 0.22598325780459813
training epoch 4/20...
  training batch 7/7
  average loss: 0.22532422627721513
training epoch 5/20...
  training batch 7/7
  average loss: 0.2237403861113957
training epoch 6/20...
  training batch 7/7
  average loss: 0.22256567861352647
training epoch 7/20...
  training batch 7/7
  average loss: 0.21928690373897552
training epoch 8/20...
  training batch 7/7
  average loss: 0.2169070690870285
training epoch 9/20...
  training batch 7/7
  average loss: 0.2114428792681013
training epoch 10/20...
  training batch 7/7
  average loss: 0.20591068054948533
training epoch 11/20...
  training batch 7/7
  average loss: 0.19689242328916276
training epoch 12/20...
  training batch 7/7
  average loss: 0.1897202879190445
training 

In [940]:
### TODO: 2.4 (Optional) Apply your best classifier to the diploma theses

### YOUR CODE HERE

diploma_dataset = ThesesDataset([tokenized_data[i] for i in diploma_indices], [degrees[i] for i in diploma_indices], word2idx, ["Diplom"])
diploma_dataloader = diploma_dataset.loader(2048)
with torch.no_grad():
    results = torch.argmax(torch.vstack([best_model(x.to(best_model.device)) for (x, _) in diploma_dataloader]), 1).tolist()
bachelor_count = results.count(0)
master_count = results.count(1)
print(f"{bachelor_count} diploma theses classified as bachelors thesis, {master_count} diploma theses classified as masters thesis")

### END YOUR CODE

632 diploma theses classified as bachelors thesis, 224 diploma theses classified as masters thesis
