# Homework 4. Text Classification with CNN

Welcome to Homework 4! 

The homework contains several tasks. The number of points awarded for a correct solution is given in each task header. The maximum for each homework is _seven_ points + _one_ bonus point.

The **grading** for each task is the following:
- correct answer - **full points**
- insufficient solution or solution resulting in the incorrect output - **half points**
- no answer or completely wrong solution - **no points**

Even if you don't know how to solve the task, we encourage you to write down your thoughts and progress and try to address the issues that stop you from completing the task.

When working on the written tasks, try to make your answers short and accurate. Most of the time, it is possible to answer a question in 1-3 sentences.

When writing code, make it readable. Choose appropriate names for your variables (`a = 'cat'` - not good, `word = 'cat'` - good). Avoid lines of code longer than 100 characters (79 characters is ideal). If needed, add comments to your code; however, good code should be easily readable without them :)

Finally, all your answers should be written only by yourself. Copying them from other sources will be considered academic fraud. You can discuss the tasks with your classmates, but each solution must be individual.

<font color='red'>**Important!:**</font> **before sending your solution, do the `Kernel -> Restart & Run All` to ensure that all your code works.**

In [2]:
import torch
from torch.utils.data import Dataset, DataLoader
from torch.utils.data.sampler import SubsetRandomSampler
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import random
import numpy as np

from pathlib import Path
import time
import json

# from: https://spacy.io/api/tokenizer
from spacy.lang.en import English
nlp = English()
# Use the default English tokenizer, including punctuation rules and exceptions.
# `nlp.tokenizer` works in both spaCy v2 and v3; `nlp.Defaults.create_tokenizer()`
# was removed in spaCy v3.
tokenizer = nlp.tokenizer

# Check if we are running on a CPU or GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## About the homework

This homework is based on the [Lab 5]. You can freely use all the code and materials from this Lab to solve this homework. 

In the Lab, we looked at a binary classification problem. In other words, we had to predict only two classes. In this homework, you are going to learn about multi-label classification. This means that each prediction can contain a varying number of labels. For example, one movie can be tagged drama, action, and murder at the same time, while another movie is just comedy. We achieve this by transforming the task into multiple binary classification problems: for each label, we predict whether it is present or not.

This will require you to change the data loader and the model. We are going to introduce new metrics to evaluate the model as well.

This time, you are going to work with [MPST: A Corpus of Movie Plot Synopses with Tags](http://ritual.uh.edu/mpst-2018/), the same dataset that we used in the previous Homework.

__We strongly recommend you to do this homework in Google Colab if you don't have a GPU on your machine!__

## Task 0. Download the data (0 points)

Let's download the vector file and unpack it into the `vector_cache/` folder. You can skip this step if you have already done it yourself.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
!wget https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip

--2021-04-06 20:34:07--  https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.74.142, 104.22.75.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.74.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 681808098 (650M) [application/zip]
Saving to: ‘wiki-news-300d-1M.vec.zip’


2021-04-06 20:34:19 (57.0 MB/s) - ‘wiki-news-300d-1M.vec.zip’ saved [681808098/681808098]



In [4]:
!unzip wiki-news-300d-1M.vec.zip -d vector_cache/

Archive:  wiki-news-300d-1M.vec.zip
  inflating: vector_cache/wiki-news-300d-1M.vec  


Let's download the dataset and unpack it into the `MPST/` folder. You can skip this step if you have already done it yourself.

In [5]:
!wget http://ritual.uh.edu/wp-content/uploads/projects/mpst_2018/MPST.7z

--2021-04-06 20:34:55--  http://ritual.uh.edu/wp-content/uploads/projects/mpst_2018/MPST.7z
Resolving ritual.uh.edu (ritual.uh.edu)... 129.7.248.228
Connecting to ritual.uh.edu (ritual.uh.edu)|129.7.248.228|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ritual.uh.edu/wp-content/uploads/projects/mpst_2018/MPST.7z [following]
--2021-04-06 20:34:55--  https://ritual.uh.edu/wp-content/uploads/projects/mpst_2018/MPST.7z
Connecting to ritual.uh.edu (ritual.uh.edu)|129.7.248.228|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41210074 (39M)
Saving to: ‘MPST.7z’


2021-04-06 20:34:59 (10.6 MB/s) - ‘MPST.7z’ saved [41210074/41210074]



In [6]:
!7z x MPST.7z


7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Xeon(R) CPU @ 2.20GHz (406F0),ASM,AES-NI)

Scanning the drive for archives:
  0M Scan         1 file, 41210074 bytes (40 MiB)

Extracting archive: MPST.7z
--
Path = MPST.7z
Type = 7z
Physical Size = 41210074
Headers Size = 423877
Method = LZMA:23
Solid = +
Blocks = 1


We are going to define some variables that we are going to need later. 

We will need the `<PAD>` and `<UNK>` symbols. `<PAD>` is needed to make the sentences in one batch have the same length. We are going to append this symbol to the end of each shorter sentence to equalize the lengths. `<UNK>` is needed to replace the words for which we don't have a pretrained vector.

We are also going to define the paths for our vector file and data folder, as well as the maximum number of vectors that we want to store.

In [7]:
PAD = '<PAD>'
PAD_ID = 0
UNK = '<UNK>'
UNK_ID = 1
VOCAB_PREFIX = [PAD, UNK]

VEC_PATH = Path('vector_cache') / 'wiki-news-300d-1M.vec'
DATA_PATH = Path('MPST')
MAX_VOCAB = 25000

batch_size = 64
validation_split = .3
shuffle_dataset = True
random_seed = 42

First, let's prepare a vocabulary for our pretrained vectors and labels. Since the input to our model should be an index of a word, we need to build it to map from words to indices.

Below, we define a `BaseVocab` class that is going to be a base class for our pretrained and label vocabularies. We also define some methods that we are going to use:

- `normalize_unit()` to lowercase the word if the `lower` argument is set to `True`.
- `unit2id()` to return the index of a word in the vocab, or the `<UNK>` index if the word is missing.
- `id2unit()` to return a word given its index in the vocab.
- `map()` to return a list of indices given a list of words.
- `build_vocab()` to initialize the vocab (is going to be implemented in respective classes).
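The lookup behaviour listed above can be sketched without the class machinery. The snippet below is only an illustration: the toy word list and the names `toy_id2unit`, `toy_unit2id`, and `to_ids` are made up for this example, and only the `<PAD>`/`<UNK>` convention is taken from this notebook.

```python
# Toy vocabulary: <PAD> and <UNK> occupy ids 0 and 1, as in VOCAB_PREFIX.
toy_id2unit = ['<PAD>', '<UNK>', 'the', 'cat', 'sat']
toy_unit2id = {unit: idx for idx, unit in enumerate(toy_id2unit)}


def to_ids(units, lower=True):
    """Map words to ids, falling back to the <UNK> id for unseen words."""
    ids = []
    for unit in units:
        if lower:
            unit = unit.lower()
        ids.append(toy_unit2id.get(unit, toy_unit2id['<UNK>']))
    return ids


ids = to_ids(['The', 'cat', 'purred'])
print(ids)  # [2, 3, 1]
```

Here `'purred'` is not in the toy vocabulary, so it is mapped to the `<UNK>` id `1`.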

In [8]:
class BaseVocab:
    def __init__(self, data, lower=False):
        self.data = data
        self.lower = lower
        self.build_vocab()
        
    def normalize_unit(self, unit):
        if self.lower:
            return unit.lower()
        else:
            return unit
        
    def unit2id(self, unit):
        unit = self.normalize_unit(unit)
        if unit in self._unit2id:
            return self._unit2id[unit]
        else:
            return self._unit2id[UNK]
    
    def id2unit(self, id):
        return self._id2unit[id]
    
    def map(self, units):
        return [self.unit2id(unit) for unit in units]
        
    def build_vocab(self):
        raise NotImplementedError()
        
    def __len__(self):
        return len(self._unit2id)

In [9]:
class PretrainedWordVocab(BaseVocab):
    def build_vocab(self):
        self._id2unit = VOCAB_PREFIX + self.data
        self._unit2id = {w:i for i, w in enumerate(self._id2unit)}

In [10]:
class LabelVocab(BaseVocab):
    def build_vocab(self):
        self._id2unit = self.data
        self._unit2id = {w:i for i, w in enumerate(self._id2unit)}

Next, we need to create the `Pretrain` class to store the pretrained vectors and the vocab that we defined above. The vectors are going to be stored as a NumPy array.

In [11]:
class Pretrain:
    def __init__(self, vec_filename, max_vocab=-1):
        self._vec_filename = vec_filename
        self._max_vocab = max_vocab
        
    @property
    def vocab(self):
        if not hasattr(self, '_vocab'):
            self._vocab, self._emb = self.read()
        return self._vocab
    
    @property
    def emb(self):
        if not hasattr(self, '_emb'):
            self._vocab, self._emb = self.read()
        return self._emb
        
    def read(self):
        if self._vec_filename is None:
            raise Exception("Vector file is not provided.")
        print(f"Reading pretrained vectors from {self._vec_filename}...")
        
        words, emb, failed = self.read_from_file(self._vec_filename, open_func=open)
        
        if failed > 0: # recover failure
            emb = emb[:-failed]
        if len(emb) - len(VOCAB_PREFIX) != len(words):
            raise Exception("Loaded number of vectors does not match number of words.")
            
        # Use a fixed vocab size
        if self._max_vocab > len(VOCAB_PREFIX) and self._max_vocab < len(words):
            words = words[:self._max_vocab - len(VOCAB_PREFIX)]
            emb = emb[:self._max_vocab]
                
        vocab = PretrainedWordVocab(words, lower=True)
        
        return vocab, emb
        
    def read_from_file(self, filename, open_func=open):
        """
        Open a vector file using the provided function and read from it.
        """
        first = True
        words = []
        failed = 0
        with open_func(filename, 'rb') as f:
            for i, line in enumerate(f):
                try:
                    line = line.decode()
                except UnicodeDecodeError:
                    failed += 1
                    continue
                if first:
                    # the first line contains the number of word vectors and the dimensionality
                    first = False
                    line = line.strip().split(' ')
                    rows, cols = [int(x) for x in line]
                    emb = np.zeros((rows + len(VOCAB_PREFIX), cols), dtype=np.float32)
                    continue

                line = line.rstrip().split(' ')
                emb[i+len(VOCAB_PREFIX)-1-failed, :] = [float(x) for x in line[-cols:]]
                words.append(' '.join(line[:-cols]))
        return words, emb, failed
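To make the file layout concrete, here is a small sketch that parses a made-up two-word snippet in the same textual layout: a header line with the number of vectors and the dimensionality, followed by one word and its values per line. The sample content and the name `parse_vec_text` are assumptions for illustration; the real file is read from disk in binary mode by `read_from_file()`.

```python
def parse_vec_text(text):
    """Parse word vectors from text laid out like a .vec file."""
    lines = text.strip().split('\n')
    # Header line: "<number of vectors> <dimensionality>".
    rows, cols = [int(x) for x in lines[0].split(' ')]
    words, vectors = [], []
    for line in lines[1:]:
        parts = line.rstrip().split(' ')
        # The last `cols` fields are the vector; everything before is the word.
        vectors.append([float(x) for x in parts[-cols:]])
        words.append(' '.join(parts[:-cols]))
    assert len(words) == rows
    return words, vectors


sample = "2 3\nhello 0.1 0.2 0.3\nworld -0.5 0.0 0.25\n"
words, vectors = parse_vec_text(sample)
print(words)    # ['hello', 'world']
print(vectors)  # [[0.1, 0.2, 0.3], [-0.5, 0.0, 0.25]]
```

Slicing from the end (`parts[-cols:]`) rather than from the start keeps the parsing correct even if a token itself contains a space.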

## Task 1. Define the dataset (3 points)

Finally, we need to define the dataset class `MPSTDataSet` that is going to load and preprocess our data files. Inside the data folder, we have different files that are going to help us.

### Task 1.1. Read the file names (0.25 points)
First, there are `test_ids.txt` and `train_ids.txt` files that contain the names of the files for test and train splits correspondingly. We are going to read them in the `get_filenames()` method and return as a list of strings.

### Task 1.2. Read the labels (0.25 points)
Then, we have all the names of the labels inside the `tag_assignment_data/tag_list.txt` file. Read this file line by line and return all the labels as a list of strings in the `get_labels()` method.

### Task 1.3. Read the label mappings (0.25 points)
Next, we have all the corresponding labels for the filename inside the `tag_assignment_data/movie_to_label_name.json` file. Read this file with `json.load()` and return the resulting dictionary in the `get_movie_label_mapping()` method.

### Task 1.4. Read the dataset (1.75 points)
Finally, read the dataset in the `load()` method. The texts are stored in the `final_plots_wiki_imdb_combined/cleaned/` folder. Each file is named after an entry returned by the `get_filenames()` method, with the `.txt` extension. For each file name returned by `get_filenames()`, get the corresponding labels from the `self.label_mappings` dictionary. Then, open the corresponding file by adding `.txt` to the end of the file name. Read all the contents and tokenize them with the `tokenizer()` function defined at the beginning of this notebook. After that, convert all the tokens into indices with `self.pretrain_vocab.map()` and store them in the `text` variable. Convert the indices to a `torch.LongTensor`. Do the same with the labels and `self.label_vocab.map()` and store the result in the `label` variable. Convert the label indices to a `torch.FloatTensor`.

### Task 1.5. Transform the labels (0.5 points)
Since we have a multi-label classification task, we need to convert the labels into a one-hot style (binary indicator) tensor. This tensor is going to have the length of our label vocabulary. It will have `0s` in all positions except for `1s` in the label positions. For example, if our labels are `[3, 5]` and we have `7` labels in total, the resulting vector should be `[0, 0, 0, 1, 0, 1, 0]`.

_Hint_: You can initialize a zero vector with `torch.zeros` and then fill it with the in-place method `Tensor.scatter_`, using the `label` tensor defined before.
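As a small sketch of this hint, using the example values from the text (labels `[3, 5]` out of `7`), the scatter step could look like the following. The variable names are made up for the illustration.

```python
import torch

n_labels = 7
# scatter_ expects the index tensor to be of type int64 (long).
label_ids = torch.tensor([3, 5], dtype=torch.long)

# Start from zeros and write 1.0 at every label position along dimension 0.
multi_hot = torch.zeros(n_labels).scatter_(0, label_ids, 1.0)

print(multi_hot.tolist())  # [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```

Because `torch.zeros` creates a float tensor by default, the result can be used directly as a target for a loss that expects float labels.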

We also need our custom class to inherit from the `torch.utils.data.Dataset` class. Finally, we need to define the `__len__()` method to know how big our dataset is and the `__getitem__()` method to get one sample at a given index.

In [45]:
class MPSTDataSet(Dataset):
    def __init__(self, pretrain, label_vocab=None, data_path='.data', test=False):
        self.pretrain_vocab = pretrain.vocab
        self.data_path = Path(data_path)  # accept both str and Path values
        self.test = test
        self.data = []

        if label_vocab is None:
            labels = self.get_labels()
            self.label_vocab = LabelVocab(labels)
        else:
            self.label_vocab = label_vocab
        
        self.label_mappings = self.get_movie_label_mapping()
        
        if self.data_path.exists():
            self.load()
        else:
            raise ValueError("Data path doesn't exist!")

        self.data = sorted(self.data, key=lambda x: len(x[0]), reverse=True)
        
    def get_filenames(self):
        # Pick the ID list that matches the requested split.
        ids_file = 'test_ids.txt' if self.test else 'train_ids.txt'
        with open(self.data_path / ids_file, 'r') as f:
            return [line.rstrip('\n') for line in f]

    def get_labels(self):
        labels_path = self.data_path / 'tag_assignment_data'
        with open(labels_path / 'tag_list.txt', 'r') as f:
            return [line.rstrip('\n') for line in f]

    def get_movie_label_mapping(self):
        mapping_path = self.data_path / 'tag_assignment_data'
        # Use a context manager so that the file handle is closed after reading.
        with open(mapping_path / 'movie_to_label_name.json', 'r') as f:
            return json.load(f)

    def load(self):
        text_path = self.data_path / 'final_plots_wiki_imdb_combined' / 'cleaned'
        for fname in self.get_filenames():
            with open(text_path / (fname + '.txt'), 'r') as f:
                tokens = [token.text for token in tokenizer(f.read())]
            text = torch.LongTensor(self.pretrain_vocab.map(tokens))

            label_names = self.label_mappings[fname]
            label = torch.FloatTensor(self.label_vocab.map(label_names))
            # scatter_ expects int64 indices, so cast the float label ids to long.
            label_ids = label.long()
            # Float vector with 1s at the label positions and 0s elsewhere.
            label = torch.zeros(len(self.label_vocab)).scatter_(0, label_ids, 1)

            self.data.append((text, label))

    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        return self.data[idx]

Additionally, we need to define a function to pad all the sentences in the batch to the same length. To do this, we are going to first find the longest sequence in the batch and use its length to create a torch tensor of size `(batch_size, max_len)` filled with `0`, which is our padding id. Then, we are going to copy each sequence into the beginning of the corresponding row of our new batch tensor. Don't forget that the `nn.Embedding` layer that we are going to use later requires indices to be of type `long`. We are also going to put the labels into the `labels` tensor of size `(batch_size, n_labels)`. To be able to use them to calculate the loss, each label must be of type `float`.

Finally, don't forget to convert all the tensors to the current device with `.to(device)`.

In [46]:
def pad_sequences(batch):
    max_len = max([len(x[0]) for x in batch])
    padded_sequences = torch.zeros((len(batch), max_len), dtype=torch.long)
    labels = torch.zeros((len(batch), len(batch[0][1])), dtype=torch.float)

    for i, sample in enumerate(batch):
        padded_sequences[i, :len(sample[0])] = sample[0]
        labels[i, :] = sample[1]

    padded_sequences = padded_sequences.to(device)
    labels = labels.to(device)
    
    return padded_sequences, labels

Now, we can finally load our data and pretrained vectors. It will take some time...

In [47]:
pretrain = Pretrain(VEC_PATH, MAX_VOCAB)

In [48]:
train_data = MPSTDataSet(pretrain, data_path=DATA_PATH)

Reading pretrained vectors from vector_cache/wiki-news-300d-1M.vec...


In [49]:
test_data = MPSTDataSet(pretrain, train_data.label_vocab, DATA_PATH, test=True)

The last step in our data preparation is to define the train and validation splits. We are going to use the validation set to see how the model performs during the training. It is important to be able to see if the model is overfitting or not.

To do that, we will just create a range of indices from `0` to the size of the training data. Then, we are going to define an index at which we are going to split the data. Optionally, we can shuffle our indices before splitting.

With these indices for train and validation datasets, we are going to create two corresponding `torch.utils.data.SubsetRandomSampler` objects that we are going to pass to the `torch.utils.data.DataLoader` objects in the next step.

In [50]:
# Creating data indices for training and validation splits:
dataset_size = len(train_data)
indices = list(range(dataset_size))
split = int(np.floor(validation_split * dataset_size))
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, val_indices = indices[split:], indices[:split]

# Creating PT data samplers and loaders:
train_sampler = SubsetRandomSampler(train_indices)
valid_sampler = SubsetRandomSampler(val_indices)
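The index arithmetic behind the split can be checked on a toy size. This sketch uses Python's `random` module instead of NumPy purely to stay dependency-free, so the shuffled order will differ from the cell above; `toy_size` is a made-up value.

```python
import math
import random

toy_size = 10
validation_split = 0.3

indices = list(range(toy_size))
split = int(math.floor(validation_split * toy_size))

rng = random.Random(42)  # fixed seed so the shuffle is reproducible
rng.shuffle(indices)

train_indices, val_indices = indices[split:], indices[:split]

print(len(train_indices), len(val_indices))  # 7 3
# The two index sets must not overlap and must cover the whole dataset.
print(set(train_indices).isdisjoint(val_indices))                    # True
print(sorted(train_indices + val_indices) == list(range(toy_size)))  # True
```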

Here, for each set, we are going to create a `DataLoader` object that is going to create a batch iterator for us. We will pass it our `MPSTDataSet` object as the source of data and the batch size as the `batch_size` argument. To specify the train and validation splits, we are going to pass the corresponding `SubsetRandomSampler` objects as the `sampler` argument of the training and validation loaders. Finally, we need to pass our `pad_sequences()` function as the `collate_fn` argument to tell the data loader how to prepare the batches so that all sequences in a batch have the same length.

In [51]:
train_loader = DataLoader(train_data, batch_size=batch_size, sampler=train_sampler, collate_fn=pad_sequences)
validation_loader = DataLoader(train_data, batch_size=batch_size, sampler=valid_sampler, collate_fn=pad_sequences)
test_loader = DataLoader(test_data, batch_size=batch_size, collate_fn=pad_sequences)

## Task 2. Define the model (0.5 points)

In this task, you just need to change the `OUTPUT_DIM` variable to match our task.

_Hint_: Use `train_data.label_vocab`

In [52]:
class CNN(nn.Module):
    def __init__(self, pretrain, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, 
                 dropout, pad_idx):
        
        super().__init__()
                
        self.embedding = nn.Embedding.from_pretrained(
            torch.from_numpy(pretrain.emb), 
            padding_idx=pad_idx, 
            freeze=True
        )
        
        self.convs = nn.ModuleList([
                                    nn.Conv2d(in_channels = 1, 
                                              out_channels = n_filters, 
                                              kernel_size = (fs, embedding_dim)) 
                                    for fs in filter_sizes
                                    ])
        
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):           
        #text = [batch size, sent len]

        embedded = self.embedding(text)     
        #embedded = [batch size, sent len, emb dim]
        
        embedded = embedded.unsqueeze(1)  
        #embedded = [batch size, 1, sent len, emb dim]
        
        conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]    
        #conved_n = [batch size, n_filters, sent len - filter_sizes[n] + 1]
                
        pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]     
        #pooled_n = [batch size, n_filters]
        
        cat = self.dropout(torch.cat(pooled, dim = 1))
        #cat = [batch size, n_filters * len(filter_sizes)]
            
        return self.fc(cat)
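The shape comments in `forward()` can be checked with a little arithmetic. The sketch below assumes stride 1 and no padding, which are the `nn.Conv2d` defaults used in the class; `sent_len = 50` and the helper name `conv_output_lengths` are made up for the example.

```python
def conv_output_lengths(sent_len, filter_sizes):
    """Length of each feature map along the sentence axis (stride 1, no padding)."""
    return [sent_len - fs + 1 for fs in filter_sizes]


sent_len = 50  # made-up sentence length
filter_sizes = [3, 4, 5]
n_filters = 100

print(conv_output_lengths(sent_len, filter_sizes))  # [48, 47, 46]
# After max-pooling over time, each filter contributes a single number,
# so the concatenated vector has n_filters * len(filter_sizes) features.
print(n_filters * len(filter_sizes))                # 300
```

The same formula shows why a batch whose padded length is shorter than the largest filter size cannot be processed: the feature-map length would not be positive.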

In [53]:
INPUT_DIM = len(pretrain.vocab)
EMBEDDING_DIM = pretrain.emb.shape[1]
N_FILTERS = 100
FILTER_SIZES = [3,4,5]
OUTPUT_DIM = len(train_data.label_vocab)
DROPOUT = 0.5

model = CNN(pretrain, INPUT_DIM, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM, DROPOUT, PAD_ID)

In [54]:
optimizer = optim.Adam(model.parameters())

criterion = nn.BCEWithLogitsLoss()

model = model.to(device)
criterion = criterion.to(device)

## Task 3. Define your metrics (1.5 points)

### Task 3.1. Coding (1 point)
In the Lab, we used just accuracy as a metric to measure the performance of our model. This can work fine for a binary classification task, since we only have two classes, but it can backfire on a multi-label classification task. Imagine that we have 100 labels in total. For some example, the correct labels are only those with indices 10 and 20. The model output is all `0s`. Now, if we calculate the accuracy as 

$\text{accuracy} = \frac{\text{true positives } + \text{ true negatives}}{\text{true positives } + \text{ true negatives } + \text{ false positives } + \text{ false negatives}} = \frac{0 + 98}{0 + 98 + 0 + 2} = 0.98$. 

That's not what we really want, because in this case the model didn't output anything at all.
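The numbers in this example can be reproduced with plain Python sets. This is only a worked check of the arithmetic above; the variable names are made up.

```python
n_labels = 100
true_labels = {10, 20}    # indices of the correct labels
predicted_labels = set()  # the model outputs all zeros

tp = len(true_labels & predicted_labels)
fp = len(predicted_labels - true_labels)
fn = len(true_labels - predicted_labels)
tn = n_labels - tp - fp - fn

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print(accuracy)  # 0.98
print(recall)    # 0.0
```

The accuracy looks excellent, while the recall of `0.0` reveals that none of the correct labels were found.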

To get a more accurate idea about the model's performance, we are going to calculate additional metrics. Define the following metrics in the function below:

$\text{accuracy} = \frac{\text{true positives } + \text{ true negatives}}{\text{true positives } + \text{ true negatives } + \text{ false positives } + \text{ false negatives}}$

$\text{precision} = \frac{\text{true positives}}{\text{true positives } + \text{ false positives}}$

$\text{recall} = \frac{\text{true positives}}{\text{true positives } + \text{ false negatives}}$

$F_1 = 2 \cdot \frac{\text{precision } \cdot \text{ recall}}{\text{precision } + \text{ recall}}$

In this function, we are going to round our predictions to the closest integer, i.e. everything greater than $0.5$ will become $1$ and everything less than $0.5$ will become $0$. This is probably not the best threshold for our model. Selecting a good threshold can drastically improve the performance of your model, especially if the labels are unbalanced (as in our case). You can find more about it in [this presentation](https://users.ics.aalto.fi/jesse/talks/Multilabel-Part01.pdf). However, in this Homework, we are not going to estimate the best threshold, since it takes additional time and effort.

To find true positives, false positives, false negatives, and true negatives, we can build a $\text{confusion vector} = \frac{\text{rounded preds}}{y}$. You can use the following properties of the `confusion_vector` to find the stats:

    1     where prediction and truth are 1
    inf   where prediction is 1 and truth is 0
    nan   where prediction and truth are 0
    0     where prediction is 0 and truth is 1

_Hint_: You might want to use `torch.isinf` and `torch.isnan`.
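A tiny sketch with one element for each of the four cases shows how the division behaves on float tensors. The input values are made up for the illustration.

```python
import torch

# One element per case: TP, FP, TN, FN (in that order).
rounded_preds = torch.tensor([1., 1., 0., 0.])
y = torch.tensor([1., 0., 0., 1.])

# Float division by zero yields inf (1 / 0) or nan (0 / 0) instead of an error.
confusion_vector = rounded_preds / y

true_positives = (confusion_vector == 1).sum().item()
false_positives = torch.isinf(confusion_vector).sum().item()
true_negatives = torch.isnan(confusion_vector).sum().item()
false_negatives = (confusion_vector == 0).sum().item()

print(true_positives, false_positives, true_negatives, false_negatives)  # 1 1 1 1
```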

In these metrics, we are going to use _micro-averaging_. It means that we need to sum the positives and negatives over all labels before calculating the metrics. You can read more about it [here](https://datascience.stackexchange.com/a/24051).

_Hint_: When summing the positives and negatives, don't forget to specify `dtype=torch.float` to make sure that the division will be correct when you are going to calculate the metrics.
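To illustrate what micro-averaging does, the sketch below pools made-up counts for two hypothetical labels and computes the recall once. Macro-averaging, which is not used in this homework, is shown only for contrast.

```python
# Made-up per-label counts: (true positives, false positives, false negatives).
per_label_counts = {
    'frequent_tag': (90, 10, 10),
    'rare_tag': (1, 0, 9),
}

# Micro-averaging: pool the counts over all labels, then compute the metric once.
total_tp = sum(tp for tp, _, _ in per_label_counts.values())
total_fn = sum(fn for _, _, fn in per_label_counts.values())
micro_recall = total_tp / (total_tp + total_fn)

# Macro-averaging, for contrast: compute the metric per label, then average.
per_label_recall = [tp / (tp + fn) for tp, _, fn in per_label_counts.values()]
macro_recall = sum(per_label_recall) / len(per_label_recall)

print(round(micro_recall, 3))  # 0.827
print(round(macro_recall, 3))  # 0.5
```

With micro-averaging, frequent labels dominate the result, which is worth keeping in mind when the label distribution is unbalanced.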

### Task 3.2. Sigmoid vs Softmax (0.5 points)
For this task, we are using the Sigmoid activation function again, as in the Lab on binary classification. This is because we treat our current task as $n$ binary classification tasks, where $n$ is the number of labels. However, there is another popular function called _Softmax_. 

Read more about the Softmax function [here](https://pytorch.org/docs/stable/nn.html?highlight=softmax#torch.nn.Softmax) and [here](https://en.wikipedia.org/wiki/Softmax_function) and answer the following questions. 

__Why do we not use the Softmax function for a multi-label classification task?__

<font color='red'>Because a film can have several topics, i.e. the labels are not mutually exclusive. The probabilities returned by the sigmoid function are independent of each other and do not have to sum to one, which matches our multi-label classification task. Softmax, in contrast, produces a single distribution over all labels that sums to one, so the labels compete with each other and the model is effectively pushed towards one genre per film, which is unrealistic if not incorrect.</font>

__What kind of classification task should we have to justify the use of the Softmax function?__

<font color='red'>Problems where each observation is associated with exactly one label. Good examples are identifying the artist by training a computer vision model on paintings, classifying animals (via the same method), and identifying the author by training a model on books. After all, books are generally written by only one author, an animal picture cannot be both a cat and a dog (unless the picture contains several animals), and an oil painting cannot be created by both Rembrandt and Caravaggio.</font>
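The difference between the two functions can be seen on a few made-up scores. The sketch below uses only the standard `math` module; the `logits` values and the helper names are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    shifted = [x - max(xs) for x in xs]  # subtract the max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 2.0, -3.0]  # made-up scores for three labels

sigmoid_probs = [sigmoid(x) for x in logits]
softmax_probs = softmax(logits)

print([round(p, 3) for p in sigmoid_probs])  # [0.881, 0.881, 0.047]
print([round(p, 3) for p in softmax_probs])  # [0.498, 0.498, 0.003]
```

With the sigmoid, the first two labels both clear a threshold of `0.5`. With softmax, the same two labels have to share the probability mass, so neither reaches `0.5` even though both scores are high.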

In [55]:
def multi_label_accuracy(preds, y):

    #round predictions to the closest integer
    rounded_preds = torch.round(torch.sigmoid(preds))
    confusion_vector = rounded_preds / y
    
    true_positives = (confusion_vector == 1).sum().to(dtype=torch.float)
    false_positives = torch.isinf(confusion_vector).sum().to(dtype=torch.float)
    false_negatives = (confusion_vector == 0).sum().to(dtype=torch.float)
    true_negatives = torch.isnan(confusion_vector).sum().to(dtype=torch.float)
    

    accuracy = ((true_positives + true_negatives) / (true_positives + true_negatives + false_negatives + false_positives))
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f_score = 2 * precision * recall / (precision + recall)

    return accuracy, precision, recall, f_score
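One thing to be aware of: when a batch contains no predicted positives, `true_positives + false_positives` is zero, so the precision evaluates to `0 / 0 = nan`, which then propagates into the F1 score. A possible guard, which is not part of the assignment, is sketched below on plain numbers; the helper names `safe_divide` and `precision_recall_f1` and the choice of `0.0` as the fallback value are assumptions.

```python
def safe_divide(numerator, denominator, default=0.0):
    """Return numerator / denominator, or `default` when the denominator is 0."""
    if denominator == 0:
        return default
    return numerator / denominator


def precision_recall_f1(tp, fp, fn):
    precision = safe_divide(tp, tp + fp)
    recall = safe_divide(tp, tp + fn)
    f_score = safe_divide(2 * precision * recall, precision + recall)
    return precision, recall, f_score


# No positive predictions at all: every metric falls back to 0.0 instead of nan.
print(precision_recall_f1(tp=0, fp=0, fn=5))  # (0.0, 0.0, 0.0)

precision, recall, f_score = precision_recall_f1(tp=2, fp=2, fn=6)
print(precision, recall, round(f_score, 3))   # 0.5 0.25 0.333
```

Whether `0.0` is the right fallback depends on how the metric is reported, so treat it as one convention among several.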

In [56]:
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    epoch_precision = 0
    epoch_recall = 0
    epoch_f_score = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        predictions = model(batch[0]).squeeze(1)
        
        loss = criterion(predictions, batch[1])
        
        acc, precision, recall, f_score = multi_label_accuracy(predictions, batch[1])
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        epoch_precision += precision.item()
        epoch_recall += recall.item()
        epoch_f_score += f_score.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator), \
        epoch_precision / len(iterator), epoch_recall / len(iterator), \
        epoch_f_score / len(iterator)

In [57]:
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    epoch_precision = 0
    epoch_recall = 0
    epoch_f_score = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            predictions = model(batch[0]).squeeze(1)
            
            loss = criterion(predictions, batch[1])
            
            acc, precision, recall, f_score = multi_label_accuracy(predictions, batch[1])

            epoch_loss += loss.item()
            epoch_acc += acc.item()
            epoch_precision += precision.item()
            epoch_recall += recall.item()
            epoch_f_score += f_score.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator), \
        epoch_precision / len(iterator), epoch_recall / len(iterator), \
        epoch_f_score / len(iterator)

In [58]:
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

## Task 4. Early Stopping (1 point)
In the Lab, we trained our model for a fixed number of epochs, but we cannot be sure whether the model's performance could have been improved by training longer. 

We can use the performance metrics to decide when the training should be stopped. The most common method is a heuristic called early stopping. Early stopping works by keeping track of the performance on the validation dataset from epoch to epoch and noticing when the performance no longer improves. Then, if the performance continues to not improve, the training is terminated. The number of epochs to wait before terminating the training is referred to as the patience. In general, the point at which a model stops improving on some dataset is said to be when the model has converged. 
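The patience mechanism can be simulated on a list of made-up validation losses without training anything. The function name `early_stopping_epoch` and the loss values are hypothetical and serve only to illustrate the counting logic.

```python
def early_stopping_epoch(valid_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            # New best loss: remember it and reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch
    return None


toy_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.35, 0.36, 0.38]
print(early_stopping_epoch(toy_losses, patience=3))  # 9
print(early_stopping_epoch(toy_losses, patience=5))  # None
```

Note that the improvement at epoch 6 resets the counter, so with a patience of 3 the training stops at epoch 9 rather than at epoch 6.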

Fill in the blanks in the following code to make the early stopping work. Patience is set to 10 and we are tracking the validation loss. 

In [59]:
N_EPOCHS = 40
patience = 10
epochs_of_no_improve = 0
best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc, train_precision, train_recall, train_f_score \
        = train(model, train_loader, optimizer, criterion)
    valid_loss, valid_acc, valid_precision, valid_recall, valid_f_score \
        = evaluate(model, validation_loader, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'mpst_cnn_classifier.pt')
        epochs_of_no_improve = 0
    else: 
        epochs_of_no_improve = epochs_of_no_improve + 1
    
    
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}% | ' +
          f'Train Precision: {train_precision*100:.2f}% | Train Recall: {train_recall*100:.2f}% | ' +
          f'Train F1-score: {train_f_score*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}% | ' +
          f'Val. Precision: {valid_precision*100:.2f}% | Val. Recall: {valid_recall*100:.2f}% | ' +
          f'Val. F1-score: {valid_f_score*100:.2f}%')
    
    # check if the training should be stopped and then stop the training
    if epochs_of_no_improve >= patience:
        print(f'Early stopping, on epoch: {epoch+1}.')
        break
        


Epoch: 01 | Epoch Time: 0m 37s
	Train Loss: 0.181 | Train Acc: 95.10% | Train Precision: nan% | Train Recall: 3.02% | Train F1-score: nan%
	 Val. Loss: 0.141 |  Val. Acc: 95.77% | Val. Precision: nan% | Val. Recall: 0.00% | Val. F1-score: nan%
Epoch: 02 | Epoch Time: 0m 39s
	Train Loss: 0.140 | Train Acc: 95.89% | Train Precision: nan% | Train Recall: 3.53% | Train F1-score: nan%
	 Val. Loss: 0.138 |  Val. Acc: 95.93% | Val. Precision: 75.93% | Val. Recall: 5.49% | Val. F1-score: 10.20%
Epoch: 03 | Epoch Time: 0m 38s
	Train Loss: 0.137 | Train Acc: 95.98% | Train Precision: 62.87% | Train Recall: 8.32% | Train F1-score: 14.54%
	 Val. Loss: 0.134 |  Val. Acc: 95.97% | Val. Precision: 75.90% | Val. Recall: 6.90% | Val. F1-score: 12.61%
Epoch: 04 | Epoch Time: 0m 38s
	Train Loss: 0.134 | Train Acc: 96.04% | Train Precision: 64.01% | Train Recall: 10.89% | Train F1-score: 18.50%
	 Val. Loss: 0.132 |  Val. Acc: 96.05% | Val. Precision: 66.10% | Val. Recall: 13.78% | Val. F1-score: 22.72%
Ep

Load the best model that we saved: 

In [60]:
model = CNN(pretrain, INPUT_DIM, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM, DROPOUT, PAD_ID)
model.load_state_dict(torch.load('mpst_cnn_classifier.pt'))
model = model.to(device)

In [61]:
start_time = time.time()
test_loss, test_acc, test_precision, test_recall, test_f_score \
    = evaluate(model, test_loader, criterion)
end_time = time.time()
epoch_mins, epoch_secs = epoch_time(start_time, end_time)

print(f'Epoch: test | Epoch Time: {epoch_mins}m {epoch_secs}s')
print(f'\tTest Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}% | ' +
      f'Test Precision: {test_precision*100:.2f}% | Test Recall: {test_recall*100:.2f}% | ' +
      f'Test F1-score: {test_f_score*100:.2f}%')

Epoch: test | Epoch Time: 0m 1s
	Test Loss: 0.126 | Test Acc: 96.04% | Test Precision: 64.00% | Test Recall: 15.92% | Test F1-score: 25.40%


## Task 5. Analyze the model's performance (0.5 points)

Look at the metrics while training and testing. Briefly describe what they tell you about the model:

<font color='red'>Your answer here</font> Although the train, validation, and test accuracies are high and differ very little, the F1 scores and the other metrics are not. More specifically, the F1 score is mediocre at best on the training set and simply bad on the validation and test sets. The model severely overfits the training data and cannot generalize well to unseen data.

This means that our model will be a bad choice for detecting minority classes in datasets that are unbalanced.
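
The gap between accuracy and F1 can be reproduced with a small calculation. The sketch below uses made-up counts (they are not taken from this homework's data) and a hypothetical helper `precision_recall_f1`; the notebook's own `evaluate` function may compute its metrics differently. It also shows one plausible reason for a `nan` precision: if the model predicts no positive labels at all, precision is 0/0.

```python
import math


def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall and F1 from counts; undefined ratios become nan."""
    nan = float('nan')
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else nan
    recall = true_pos / actual_pos if actual_pos else nan
    if math.isnan(precision) or math.isnan(recall) or precision + recall == 0:
        f1 = nan
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical imbalanced case: 1000 label decisions, only 40 of them positive.
# The model predicts 8 positives, 6 of which are correct.
true_pos, false_pos, false_neg, true_neg = 6, 2, 34, 958
accuracy = (true_pos + true_neg) / 1000
precision, recall, f1 = precision_recall_f1(true_pos, false_pos, false_neg)
print(f'accuracy={accuracy:.3f} precision={precision:.2f} '
      f'recall={recall:.2f} f1={f1:.2f}')

# If the model predicts no positives at all, precision (and F1) are undefined
print(precision_recall_f1(true_pos=0, false_pos=0, false_neg=40))
```

In the first case the accuracy is above 96% because the many true negatives dominate, while recall is only 15% and F1 about 25%.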

## Task 6. Test the model yourself (0.5 points)

Once we have trained our model, we can try to predict labels for our own input. We are going to define the `predict_sentiment()` function that takes our trained model and a text as arguments.

First, we need to switch the model to evaluation mode by calling `model.eval()` on it. Then, we are going to tokenize the sentence the same way as we tokenized the training input. If the sentence is shorter than the `min_len` parameter, we are going to add padding symbols to it, so our model doesn't throw an error. After that, we turn the words into indices with the same vocabulary that we built for training. Finally, we transform the indices into a tensor and add an empty dimension at the beginning, imitating a batch of size 1.

Here, we are not going to use the $0.5$ threshold, since it can result in no label probability exceeding it, so that all label predictions are `0`. Instead, we are going to look at the top-3 labels for each prediction.
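
The difference between the two strategies can be sketched in plain Python. The label names and logits below are made up, and the helpers `labels_above_threshold` and `top_k_labels` are hypothetical; the actual function in this notebook uses `torch.sigmoid` and `torch.topk` instead.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def labels_above_threshold(logits, labels, threshold=0.5):
    """Keep every label whose probability exceeds the threshold."""
    return [label for logit, label in zip(logits, labels)
            if sigmoid(logit) > threshold]


def top_k_labels(logits, labels, k=3):
    """Keep the k labels with the highest scores, whatever their values."""
    # Sigmoid is monotonic, so ranking by logit equals ranking by probability
    ranked = sorted(zip(logits, labels), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked[:k]]


# Hypothetical model outputs: every logit is negative, so every probability < 0.5
labels = ['murder', 'comedy', 'romantic', 'violence', 'cult']
logits = [-0.4, -2.1, -1.3, -0.9, -3.0]

print(labels_above_threshold(logits, labels))  # []
print(top_k_labels(logits, labels))            # ['murder', 'violence', 'romantic']
```

With a cautious model such as ours (low recall), thresholding often returns nothing, whereas top-k always returns a ranked guess.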

Below, I've provided plot synopses for [Joker](https://www.imdb.com/title/tt7286456/?ref_=nv_sr_srsg_0) and [Madagascar](https://www.imdb.com/title/tt0351283/?ref_=fn_ft_tt_1). With my model, I've got `['murder', 'violence', 'cult']` for Joker and `['psychedelic', 'comedy', 'humor']` for Madagascar. Those predictions may not be perfect, but they seem plausible to me.

You can try to put synopses for your own favourite (or not) movies and share the predictions that you've got.

<font color='red'>Your answer here</font> I tried the model with 4 different films: Goodbye Lenin, Analyze This, State of Siege and Borat.

Surprisingly, the model returned somewhat accurate labels for all films, except perhaps for Analyze This (probably because its synopsis is short).

If we interpret the results optimistically, we can say that the model performed well enough for 3 out of 4 films.

If we take a more conservative stance, we may note that the model failed to return any of the labels 'political', 'historical', or 'drama' for Goodbye Lenin, and that it added an exaggerated 'murder' label for State of Siege (the film includes scenes where people are killed by the Uruguayan secret service, but it is more of a political/philosophical drama).

In [62]:
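def predict_sentiment(model, sentence, min_len=5):
    model.eval()
    # Tokenize the text the same way as the training data
    tokenized = [token.text for token in tokenizer(sentence)]
    # Pad short inputs so the model doesn't throw an error
    if len(tokenized) < min_len:
        tokenized += [PAD] * (min_len - len(tokenized))
    # Map tokens to vocabulary indices and add a batch dimension of size 1
    indexed = pretrain.vocab.map(tokenized)
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(0)

    # Take the three labels with the highest predicted probabilities
    _, indices = torch.topk(torch.sigmoid(model(tensor)), 3)
    labels = [train_data.label_vocab.id2unit(idx) for idx in indices[0]]
    return labels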

In [63]:
goodbye_lenin_synopsis = """ The film opens with old footage from summer 1978 at our old weekend cabin. A boy (Nico Ledermüller) pushes a girl (Jelena Kratz) in a cart while the father films. The boy and girl play in the back yard and around the house.

After the titles, a television reporter speaks about a space program while the boy and girl watch. The boy points an astronaut out to his sister. A voice-over says that an East German citizen, Sigmund Jähn (Himself), was the first German in space. After that day, their life started going downhill. Mrs. Kerner is questioned by officials about her marriage. They are concerned that her husband has visited a capitalist country three times. The officials leave after Christiane Kerner (Katrin Saß) gets angry. The voiceover explains that while Jähn was in space representing East Germany, his father was having an affair with his new enemy of the state girlfriend and that he never came back.

The voiceover explains that his mother was so depressed that she stopped talking. She is shown in a mental hospital, staring off into the distance while her son colors with crayons and her daughter plays a wind instrument. The son tries to get her to come back to them, saying that its boring at Mrs. Schäfer (Christine Schorn)'s. He tells her he loves her and starts to cry but his mother is unmoved.

The boy looks upset as he watches television and the astronauts are on again talking about a cosmic marriage between two television characters.

The voiceover explains that eight weeks later Mrs. Kerner came back home and was back to normal. They surprise her when she comes back home. Alex is wearing a cardboard space ship and his sister has a painted sign that says, Hello Mama. Mrs. Kerner packs up her husband's clothes and sends them to Mozambique.

Old footage from spring 1979 shows the family at a train station and then on the train. The voiceover informs that they never mentioned their father again and that from then on their mother was married to their socialist fatherland. She is shown directing a choir of children and being involved with various youth activities. The voiceover says that she has become a social crusader and activist for the concerns of common people and tiny injustices.

The family watches television in their home and a news report about special government awards is on. Alex points out that his mother is on screen. Mrs. Kerner is shown receiving her award and shaking hands with the presenter.

More aged footage is shown, with a group of kids standing in front of a space ship in astronaut gear. The voiceover explains that after many hard days of work he would be the second German to venture into space. He is shown with other children riding on a bus labeled, Young Rocket Builders. He is then shown standing with a group of kids, all holding small air powered rockets. The voiceover explains that he had imagined exploring the secrets of space for the benefit of mankind. When he launches his rocket, it flies off into space.

Alex (Daniel Brühl) is shown ten years later, October 7th, 1989, sitting on a bench drinking a beer. The voiceover explains that it was the 40th anniversary of East Germany and he had the day off from his job at a TV repair firm. He says he feels at the height of his masculine allure as he burps and slouches on the bench. Flags and posters celebrating the anniversary are everywhere. At a parade soldiers march and tanks ceremoniously dive down the street. Inside his apartment, Alex sleeps on his bed fully dressed. A woman informs him that there's a girl there to see him. The girl turns out to him baby Paula at three months of age (Philipp Kupfer), whom she's holding and the woman is revealed to be Daniel's sister Ariane Kerner (Maria Simon). The baby begins to wail due to the vibrations the parade is causing in the apartment. Alex joins two other older women in an adjacent room, who are writing a complaint about the sizes of women's clothing. Alex turns on the TV and sees news coverage of the ceremonies and voices his disapproval. One woman remarks that nothing will change as everyone migrates.

In the evening, citizens march in the streets holding protest signs. The voiceover explains that they are marching for the right to take a walk without the Wall getting in their way. Alex walks with them, eating a piece from an apple which somebody as offered him, and repeating "Freedom of the press", at one point nearly choking on the apple. The East German military arrives and forms, with arms locked, a human wall around the protesters. A girl helps Alex spit out the piece of apple he was choking on. A man and a woman are shown driving and getting stopped by the cops. The man (Ernst-Georg Schwill) tells the woman that she might still make it if she takes the subway. Military vehicles with large metal plates on the front arrive and push the protesters back. Alex tries to get the girl's name but she gets taken away before she can tell him. As some protesters break through the police line the police turn violent. The woman observes protesters being treated violently and beaten and then notices that Alex is being taken away by the police (Martin Brambach). She faints and Alex rushes to help her, revealing that the woman is his mother. The police regain control of him and take him away in a truck.

In a prison, the protesters stand in lines with their hands on their heads. One guard (Michael Gerber) approaches Alex and removes him from the line. He hands him a piece of paper about his mother. Alex leaves and catches a train to the hospital. His sister is already there and explains that their mother had a heart attack. The doctor steps in and tells him she's in a coma and that they don't know if she will ever wake up again. Alex goes into her hospital room where she is hooked up to various machines that keep her alive. He tries to get her to wake up but the voiceover explains that she kept on sleeping.

At his TV repair job Alex watches the news as they discuss the resignation of Erich Honecker (Himself) for health reasons. As Alex takes down a poster of Honecker and leaves it outside in the rain, the voiceover says that Mrs. Kerner's sleep kept her in the dark during the resignation of Honecker, protests in West Germany (euphemistically called a classical concert), and the tearing down of the Wall (a recycling campaign).

With his mother still in a coma, Alex takes his first outing to the West. She misses the first free elections, her daughter Ariane quitting college and starting work at Burger King, Ariane's manager and boyfriend moving in with her, the subsequent westernization of the apartment, the arrival of Lara (Chulpan Khamatova), the girl who had kept Alex from choking at the protest, now a nurse at the hospital, the triumph of capitalism (a tiny group of guards is shown doing military maneuvers in front of a museum while a car branded with Coca-Cola drives by in the foreground and then a giant Coca Cola truck blocks them entirely from view), and Alex's regular visits to the hospital at strategic times made to coincide with Lara's works shifts. As she sleeps, Alex talks to her about Lara. In her sleep she also misses working-class job loss, including the TV repair business Alex works at. Alex gets a job as what's described a part of a reunification crew selling satellite TV. With his new job occupying his time, he leaves a tape with his voice on it to talk to his mother. Believing that the doctors and Lara will not be there when it plays, he mentions his like of Lara, who is in the room tending to his mother. Soon he is dating Lara and they go to a club together. After, Lara remarks that it's too bad Alex's mother is missing the transformation of Berlin. Alex doesn't think so because what she believed in had toppled in a moment. She asks about his father and Alex says that he was a doctor who escaped to the West and that they never heard from him again.

Alex and his partner Denis Domaschke (Florian Lukas) go door to door selling satellite TV. At one apartment complex, Alex and his partner leave having installed ten new satellites. Afterwards, they go back to his apartment, where Denis shows Alex his family and wedding film business videos. Explaining his ambition to one day make feature films, he shows Alex that he has edited a wedding video to match the famous bone and spaceship match-cut from 2001: A Space Odyssey.

In a voiceover, Alex explains the by June 1990 the border separating East Germany was meaningless. One day while visiting his mother Alex and Lara begin to kiss. As they do, Alex's mother wakes up from her coma. Doctor Wagner (Eberhard Kirchberg) explains that even though she's woken from her coma, her life is still in danger as she could have memory loss, amnesia, or other conditions. He tells Alex and Ariane that she may not survive the next weeks. They are not allowed to take her home because any sort of excitement could lead to another heart attack. Alex tells the doctor that he thinks the newspaper discussing German reunification would be too exciting to her.

Now awake, Mrs. Kerner is visited by her children. Ariane shows her mother her new granddaughter and Alex tells her that she had a heart attack because of a long line at the store on a hot day. As they leave the hospital Alex resolves to bring his mother home because of how easy it would be for her to find out about the fall of the Wall in the hospital.

Back at home he surveys what's too new at the apartment and has to be replaced with the old. Denis and Alex remodel the room, resetting it to a condition from before the fall of the Wall. Alex purposefully breaks the radio antenna off.

Alex returns to the hospital where a new doctor replaces the old doctor, who has moved to the West. The new doctor in charge, Dr Mewes (Hans-Uwe Bauer) tries to stop Alex to no avail. Alex points out that many people are leaving their jobs to enjoy the new richer life in the West, and how long would Mewes himself stay at his job? On the ride home, Alex asks the ambulance driver (Arndt Schwering-Sohnrey) and nurse (Dirk Prinz) to turn down the radio because its talking about East Germans exchanging their currency for Western currency. They bring her in on a stretcher and Alex tries to avoid his mother's old friend, now Westernized in dress. Upon arrival, his mother remarks that it seems like nothing has changed. Alex offers tapes for her to listen to, saying that the radio is broken. His mother confides that she contemplated suicide after her husband left but didn't go through with it because Alex visited her every day in the hospital and talked to her about Sigmund Jähn and other things. As Alex leaves to go shopping, his mother requests some Spreewald pickles and Alex agrees to get some for her.

In voiceover, Alex explains that by July 1990, East German stores were emptied and real money was coming in from the West. The Deutschmark was double the rate of Eastern currency and the corner store no longer carried the traditional Eastern foods such as the pickles his mother had requested. Alex buys a different kind of pickle and pulls jars out of the garbage, disinfects them, and relabels them. He puts the new goods he bought into the old, relabeled jars.

In an effort to gain power of attorney, Alex and Ariane find out that their mother didn't keep all her money in the bank. She hid it, but she can't remember where. She then lapses, remarking that her husband is running late in getting home. An upstairs neighbor, Ganske, turns on his television and the sound carries down into the room. Their mother is surprised that he watches Western TV. Alex makes up a story about Ganske (Jürgen Holtz) falling in love with a Western woman, and in consequence having let his communist commitment slip a little bit.

With everyone leaving for the West, Alex and Lara find it easy to secure an apartment, as they just have to breat in it, and there will be nobody to claim it from them. Lara is enthusiastic about the working phone the apartment has while Alex is happy to find the old East German foods he's been looking for. They spend the night together and Alex leaves in the morning for work. He stops by his house first and his mother asks about TV again.

At work he asks Denis what to do about the TV situation. Denis suggests showing old news programs on video. Meanwhile, Germany enters the finals in soccer. In a voiceover, Alex says that soccer helped reunify the country. At a marketplace he buys a whole series of old East German newspapers from a weirdly-looking man (Mennan Yapo).

Alex talks to neighbors and friends of his mother about her birthday party, inviting them but also explain that she's unaware of the fall of the Wall. Many of her teaching colleagues have retired. Alex talks to the principal, who explains why his mother was demoted because he was sternly strict in her enthusiasm for Communism, complaining and fighting small battles all the time. Alex reasons that with the demotion the principal owes her something. He hires someone to act as an East German dispatcher and buyer for a restaurant. Denis gets Alex many different video tapes of East German TV.

Alex hooks up a VCR to the TV and pretends to attach the TV to an antenna. Denis apparently has recorded over a tape and the first part of the tape talks about getting East German's satellite TV and the effect of German soccer on reunification but soon returns to the intended East German programming, nearly ruining the illusion. His mother remarks that she would like to work from bed and write petitions but Alex says he doesn't want her to work too hard.

On his mother's birthday, Alex goes to retrieve the principal and finds him drunk. Alex gives him a quick shower and dresses him, then takes him to his apartment. At the apartment, everyone he has hired or invited is there and two children sing for her exactly as they used to. The principal presents her with a basket of old food items found in Alex and Lara's new apartment. As Alex speaks to her about how much things haven't changed and how much he loves her, she notices a giant Coca-Cola ad being rolled out on the building adjacent. As everyone tries to cover for the irregularity, saying that there was an agreement between the American and Germany's statalized companies, the party starts to unravel. Lara is dissatisfied with how they are tricking Alex's mother and with Alex telling his mother that Lara's father is a teacher when he's really a cook. The children stop singing and Lara leaves.

Alex and Denis go to the Coca-Cola Company to film a fake news report about the giant advertisement. They are hassled by a company employee (Fritz Roth). They are about to start but Denis suggests waiting for better light. Staring at the clouds, Alex realizes that faking his mothers reality is as easy as studying the old news tapes and feeding Denis's desire to be a director. At home, Alex and his mother watch the fake news segment about Coca-Cola. They construct the segment as a visit to a West Berlin factory and say that Coca-Cola is angry about losing a lawsuit because the formula for Coca-Cola was invented by East German workers. As they watch TV, his mother remembers that she stored her money in the chest of drawers Ariane threw out when her boyfriend Rainer (Alexander Beyer) moved in.

Alex finds the money in the drawers still outside and takes it to the bank to exchange it for Western currency. He and his sister go, but it's him who will speak. The banker (Armin Dillenberger) informs him he is two days beyond the deadline and that they don't take cash anyway. Alex gets angry and tells him that the money was their money for forty years until the Western money came in. He is forced to leave the bank.

On the roof of his mother's apartment he throws all of her Eastern money into a Western-facing wind. Lara tells him to yell to let off some steam. When he does, fireworks go off as Germany wins the world soccer championship.

In her apartment, Alex's mother dictates another letter of complaint about women's clothes to Mrs. Schäfer. Two more former students show up unannounced and sing songs for her. Alex discreetly kicks them out as they explain that their friends told them they would get 20 marks for showing up and singing. Rainer and Ariane say they are tired of pretending they are still in the East. Ariane then tells Alex that she saw their father while at work. She sees him driving through Burger King with two kids in the back seat. In a voiceover, Alex says that he imagines his father as a fat man stuffing his face with cheeseburgers and that they live entirely separate lives.

At their apartment, Lara practices cast-wrapping on Alex for her impending examinations and then expresses her desire for Alex to tell his mother the truth. She gets so annoyed that he lets him with the plaster all over his body and will not remove it. Alex tries to move, falling into the bathtub in a funnily amusing scene. In the house, Alex notices a jar labeled Spreewald pickles. He races home on his motorbike and in a voiceover notes that while life around his was accelerating, he could always go to his mothers to live in a slower time and sleep.

At her house, Alex sleeps while his mother eats her Spreewald pickles. Ariane's daughter Paula (Laureen Hatscher & Felicitas Hatscher), now one year-old tot who is giving her first steps, notices a blimp that says West on it and begins to walk toward it. Alex's mother gets up out of bed to see what Paula is looking at but by the time she gets to the window the blimp is behind a building. She leaves the room, takes an elevator down (and notices a Nazi symbol and lewd graffiti on the wall), and leaves the building. A group of young Western men are moving in, perplexing her. She wanders further away from the building and notices IKEA branding and ads for bras and cars. A helicopter carries a statue of Lenin away. Alex wakes up and finds that she's gone and rushes out to find her. Alex and Ariane find her at the same time and rush her home. She asks them what's going on.

Denis has not build a sort of makeshift television studio and he and Alex work to create more explanatory fake news segments. They frame the situation as Hoenecker allowing West German refugees to enter the East as a token of generosity, promising 200 marks for every refugee entering the country. In a voiceover, Alex realizes that the DDR he is creating in his TV segments is the DDR he might have wished for. His mother suggests helping the Western refugees, offering the summer cabin.

Rainer reveals that he and Ariane are getting their own place and will soon be leaving because she is pregnant. Ariane again says that she is tired of pretending for her mother.

The whole family drives out to the cabin with the mother blindfolded to keep the new car and the cabin a surprise. As Alex is about to reveal that he has been lying to his mother, his mother reveals that she has been lying to him about his father. He didn't go to West Germany with a woman and he did write letters. He decided to leave because he wasn't in the Party and so they made his life miserable. There was a conference in West Berlin and he decided to stay. She was supposed to follow and bring her children but she feared getting her kids taken away from her so she stayed in East Germany. Now she regards it as the biggest mistake of her life. Alex gets upset and goes to sit by the lake.

That night his mother's condition gets worse and she is rushed back to the hospital. Ariane meanwhile finds all of the old letters from her father and cries as she reads them. The doctor says that she's had another heart attack and that they should expect the worst. Ariane gives Alex a letter from his father and says that she won't go out to his new address. In the morning, their mother wakes up and suggests that now they take in a Western refugee with her out of the house.

Alex leaves the hospital in a taxi driven by who he thinks is Sigmund Jähn (Stefan Walz), although the driver denies it is him. He asks the taxi driver to take him to Wansee where his father lives. At his father's house there is a party and Alex is treated as an invited guest. He wanders into a room where two kids are watching the Sandman and he asks to join them. The little boy in the room is his stepbrother Thomas (Rafael Hübner), and he says that the character on TV is an astronaut and Alex says that where he's from they say Cosmonaut. The little girl is his sister Carla (Hanna Schwamborn) and she asks where he's from and he tells her he's from another country. The father of the children walks in and it is revealed that he is Robert Kerner (Burghart Klaußner), Alex's father. He is told by his wife (Svea Timander) that it's time he gives a speech and then returns to talk with Alex. Probably, because of the shock of seeing his child, his is unable to deliver the long speech he's devised, but just says two sentences thanking everybody for coming. Coming back to the sitting room, he asks why Alex is there and Alex tells him that his mother wanted to see him one more time and that she is dying.

On the ride home, Alex asks the taxi driver what it was like in space. This time he doesn't deny being Jähn and says that it was beautiful but far from home.

Alex brings his father to the hospital and as he's walking to his mother's room Lara is explaining that Germany is reunified, although she stops before Alex can see this. While Robert is waiting outside, Ariane shows up but leaves when she sees her father there.

Alex decides to tell his mother the truth but first wants to celebrate East Germany one last time, giving it the send-off it deserves. At the library, they enlist the help of Jähn and record him addressing the DDR. Since his mother can hardly wait he moves the celebration from October 7th to October 2nd, the day before reunification. In the film they show to his mother at the hospital, Honecker resigns from all positions and Jähn becomes the new leader of East Germany. By now, only Alex is unaware of his mother's up-to-date understanding of Germany and his sister can barely conceal her laughter. Jähn states in the video that he has decided to reach out to West Germany to make it better and has opened up the borders. After the video, fireworks start.

Alex's mother dies three days later. Alex still believes that she never learned the truth and that this is a good thing because she died happy. She asked that her ashes be scattered in the wind, something prohibited in both East and West Germany. Alex puts her ashes into a rocket and shoots them into the sky, resulting in two fireworks.

In a voiceover laid over archival footage, Alex says that the country his mother left behind was one she believed in. He says he will always in his memory associate that country with his mother. """

In [64]:
predict_sentiment(model, goodbye_lenin_synopsis)

['flashback', 'romantic', 'comedy']

In [65]:
analyze_this = """When mobster Paul Vitti (Robert De Niro) suffers a crisis of confidence, threatening his ability to lead his criminal empire, he turns to professional help in the way of Dr. Ben Sobel (Billy Crystal) a psychiatrist who accidentally became acquainted with two of Vitti's henchmen. Dr. Sobel and his finance, (Lisa Kudrow) soon have their world turned upside down by Vitti's increasing demands for Sobels services which displace any semblance of order for the good doctor's heretofore mundane existence. Meanwhile, the growing importance of "the shrink" to the operation of the Vitti organization has not gone unnoticed in the law enforcement community, and the FBI approaches Sobel with an offer he cant refuse: betray Vitti by wearing a wire or spend a long time in a federal prison.

Complications escalate, and Ben finds himself dodging bullets, federal agents and rival gang members, but still wanting to fulfill his professional obligation to help the conflicted mob boss. It all comes to a climax when Sobel must act as Vitti's consigliore at a sit-down of the major crime syndicates to represent the famiglia and confront rival gang chieftain, Primo Sidone (Chazz Palminteri)."""

In [66]:
predict_sentiment(model, analyze_this)

['murder', 'revenge', 'flashback']

In [67]:
borat = """ Borat Sagdiyev is a TV reporter of a popular show in Kazakhstan. The movie opens when Borat is in his hometown of Kusak in Kazakhstan. He is introducing the entire town to the documentary movie he is going to be filming. There's people like the town mechanic / abortionist, the town rapist, his annoying neighbor, his wife Oxana, and so on and so forth. Borat also talks about his home and his hobbies. Hobbies include ping pong, which he plays rather poorly, and traveling to capital city where he watches women as they do their business. Borat takes you through his home where you meet his wife Oxana and his sons Huey and Lewis, and introduces us to his annoying neighbor Yusuf. Borat and Yusuf have been engaged in a game of coveted goods, where everything Borat gets, Yusuf attempts to get as well, but since Yusuf doesn't have the money that Borat has, everything Yusuf buys is of significantly less quality.

Borat is hailed as a hero in his town as he is contracted by the government of Kazakhstan to travel to America to make a documentary which will bring together, hopefully, the cultures of America and Kazakhstan. Borat is to travel to America with his most venerable producer Azamat Bagatov, who will handle all the behind the scenes work of this documentary. After a rather lengthy flight with multiple stops and connections, Borat finally arrives in New York City. On the subway to his hotel, his pet chicken escapes his bag and wrecks havoc on the subway. Borat's attempts to introduce himself to the native New Yorkers is also a complete failure.

At the hotel, Borat attempts to bargain with the hotel owners about his negotiated room rate, only to be met with another failure as the hotel has a set rate in which it can rent rooms out. He has apparently never been in an elevator which he thinks is his actual hotel room. The next series of footage shows Borat attempting to integrate his life into New York City. Unfortunately he finds the natives aren't as friendly as they are in his native Kazakhstan.

The next day Borat meets with a humor coach. The coach attempts to tell Borat about everything that he needs to know to be funny. Borat's version of humor and the humor coaches' version of humor don't exactly mix. The next interview finds Borat as the only male in a group of feminists. There, the humor he learned from the humor coach, and the feminists' version of humor again don't exactly mix. While at the hotel, Borat is learning about American television. There while flipping through the channels he stumbles upon an American television show called "Baywatch". He is immediately smitten with the show's star, one Pamela Anderson, who is unlike any Kazakh woman he has ever seen. While at the hotel Borat receives news that his wife has died in a horrible accident. He's grateful because he remembered shortly before he left that she threatened him with castration if he cheated on her. He then is seen leaving packing his things and wants to leave New York. He then convinces his producer Azamat that in order to see the real America they need to leave New York for greener pastures. Azamat isn't entirely convinced and has received threats from the Kazakh government that they need to do all their filming in New York or else. Borat convinces Azamat otherwise.

Since Borat doesn't know how to drive he enrolls in a driving course and becomes very friendly with driving instructor, but his behavior behind the wheel takes things a bit too far. Borat is then seen at a GM dealer. He is asking the dealer what to do to buy the vehicle he needs to complete the cross country road trip. The dealer explains that the car he needs, as well as one he needs to attract women, is a Corvette or a Hummer H2. But the high price tag alienates Borat and Azamat. Borat decides they need something much less expensive. The dealer obliges and sells Borat a former ice cream truck that's worth $750. They then set about heading across the country.

Their first stop is a news cast in North Carolina. There Borat manages to be the center of attention, while humiliating the weatherman, who can't stop laughing as Borat walks on stage in the middle of the broadcast. Their next stop is Washington, DC where Borat engages in a group of individuals who are much friendlier than they were in New York City. Upon meeting with Republican leader Alan Keys, Borat is shocked to learn that he unwittingly participated in a gay pride parade. Borat and Azamat quickly high tail it to their next destination - South Carolina. While in South Carolina, Borat and Azamat attend a rodeo. There, Borat is asked to sing the US national anthem which ends rather poorly when Borat attempts to integrate the Kazakh national anthem with the US national anthem. The next stop takes Borat and Azamat to a yard sale. There Borat thinks that the host of the yard sale is a gypsy but in fact all the stuff that they're selling came from inside their home. Borat finds another sign that they are destined to go to California to meet Pamela Anderson when he discovers a book about Baywatch. He then takes the book and he and Azamat high tail it to their next destination.

Their next stop takes them to Atlanta. There they get lost off the freeway looking for a place to stay for the night. Borat meets a group of "gangstas" who teach him their ways and means. Borat and Azamat head to a rather upscale Marriott Hotel. Borat tries the gangsta's methods and immediately gets kicked out of the hotel. So the next place they go to stay is a bed and breakfast outside of the city. Unfortunately for Borat, the place is run by an elderly Jewish couple. He and Azamat are scared and immediately begin thinking of excuses to leave the establishment. Borat is still freaked from their visit to the Jewish run bed and breakfast that they stayed in. So much that he feels that they need protection. Borat and Azamat then decide to deal with a shady vendor who specializes in zoo animals and sells Borat and Azamat their own bear which protects but also annoys them and winds up scaring a group of school kids who think they're driving a real ice cream truck.

Borat decides while in Atlanta that he needs to learn about American culture. To do so Azamat has him attend a formal dinner party. Things go incredibly south when Borat manages to offend everyone in attendance. First by showing some unflattering pictures of his sons Huey and Lewis, and attempting to dispose of his business the Kazakhstan way. They finally have had it with him when Borat contacts his "date" who turns out to be a prostitute. He's then asked to leave the party. Borat and his new date Luenell, decide that they should go hit the town and go to a bar with a mechanical bull.

The next stop takes them to Houston Texas. There, Borat is shopping at an antique store to buy a gift to present to his new bride Pamela. Borat asks the owner about their obsession with the Confederate flag and secession which they claim is a part of southern heritage. He winds up clumsily crashing into most of the objects in the store and gets kicked out. While at the hotel, Borat catches Azamat with his Baywatch magazine doing unspeakable acts with it, and he and Borat get into a massive fight that ends with them crashing an executive convention completely naked.

Azamat leaves - leaving Borat with only his return ticket and the chicken. Borat is depressed that he will be unable to complete the journey and may even wind up getting stranded in Houston. He drives as far as he can on what's left of the ice cream truck's gas. Thankfully he manages to hitchhike across the country with a group of college fraternity brothers. They then proceed to get really drunk. Borat tells them about his quest to marry Pamela Anderson. The guys show Borat her sex tape and he's now completely freaked out. He then leaves the Winnebago and winds up sleeping outside in a parking lot. There, while starting a bonfire with the Baywatch magazine, he accidentally incinerates his return ticket and lets his pet chicken go.

He's now in Phoenix, Arizona. What he steps into next turns out to be a Pentecostal gathering at a huge mega church. There's even political figures involved in the large mass. After listening to the key note speakers which include a US Congressman, anti-evolution activists, and multiple judges and state senators, Borat is then baptized and joins the church. He then catches a ride out to Hollywood. The long journey is almost over. While in Hollywood, Borat manages to reunite with Azamat who's making some additional money doing character work in the tourist areas. Azamat is still angry about what happened in Houston but tells Borat he wants to complete the documentary and return to Kazakhstan and forget this entire trip ever happened. We also learn the fate of the bear that they acquired for security purposes.

Azamat manages to track down Pamela Anderson. She's doing a DVD signing at the Virgin Records store in Orange, California. They go to the mall and Borat is first in line. He attempts to propose to Pamela Anderson - the traditional Kazakhstani way which involves a large, decorated sack he tries to throw over her in an attempt to kidnap her, but she is completely freaked out by the proposal and runs off. Borat chases her out the back of the store and is finally tackled by security guards and arrested. Borat comes to the realization that true love is found where you least expect it. He and Azamat head back to Atlanta, and get a bus ride thanks to Borat's new connections with the Arizona mega church, where Borat reunites with Luenell, and wants her to come to Kazakhstan with him. She happily accepts the offer.

Several months later the documentary is complete and Borat is back in his village of Kusak. He's a national hero now because the documentary was successful and has made Kazakhstan more appealing on an international stage. Borat introduces his new wife Luenell, who he married some time after the documentary was completed. And things have improved since then. He has spread the religion of Christianity to most of Kazakhstan, and he managed to up the game against his neighbor Yusuf, who it seems can only afford the iPod Mini, whereas Borat has purchased a full iPod. He then gets the village together and says thank you for watching the movie and goodbye. The movie ends with the Kazakhstan national anthem."""

In [68]:
predict_sentiment(model, borat)

['comedy', 'satire', 'entertaining']

In [69]:
state_of_siege = """This closely follows a true account of Cold War US black-ops in South America preceding Operation Condor launched by Augusto Pinochet in Chile following the 1973 CIA-orchestrated coup against President Allende and his death by probable assassination. Costa-Gavras also directed Missing, a film about this event starring Jack Lemmon and Sissy Spacek.

The historical figure played by Yves Montand in State of Siege was Dan Mitrione, a former Indiana police chief working in Montevideo within USAID's Office of Public Safety, which in fact was training civilian police in kidnapping, torture and assassinations of suspected Communists parallel to School of the Americas training of South American military personnel. The film opens with the American found dead in the back of a car. We subsequently learn how and why this came about. The Tupamaros, a popular Uruguayan insurgency that robbed the rich and gave to the poor, had captured and interrogated the American "trainer" at length about his actions and motivations. The captive-captor interactions that take the form of a trial are the heart of the film, revealing the motives and objectives of the insurgents and the defensive rationalizations underlying violent American interference in Latin American governments. The American never breaks and confesses, so they are left with no alternative but to execute him for his many capital crimes.

Both Costa-Gavras films provide important history lessons that Americans receive from neither their schools nor their media. Missing was a major box office success, but State of Siege provides the clearest understanding of US Cold War operations and their impacts on targeted societies."""

In [70]:
predict_sentiment(model, state_of_siege)

['murder', 'violence', 'flashback']

In [71]:
joker_synopsis = """The story takes place in Gotham City, 1981.

Arthur Fleck (Joaquin Phoenix) works as a clown-for-hire for a company called Ha-Ha's. He struggles with severe depression personally but finds some form of optimism in performing for others and trying to make people laugh. He is tasked with advertising a store by dancing and waving a sign around. On one such occasion, the sign gets snatched by a group of punk teens, forcing Arthur to chase them into an alley. They smash the sign against his face and proceed to mercilessly kick him while he's down.

In this era, Gotham is struggling with crime, unemployment, and poverty. Arthur visits a social worker for his medication, as well as his ongoing mental health issues. On the bus ride home, a small child looks at Arthur. He makes silly faces that amuse the boy, but his mother tells Arthur to leave him alone. Arthur begins to laugh hysterically and uncontrollably. When the mother questions him, he hands her a card that explains that he has a mental condition that causes him to laugh the way that he does.

Arthur returns home, where he lives with his ailing mother, Penny (Frances Conroy). They sit and watch a talk show with host Murray Franklin (Robert DeNiro). Arthur imagines himself being on the show and getting Murray's attention. In his fantasy, Arthur charms the audience and Murray by telling them that he takes care of his mother. Murray relates to Arthur and invites him up on stage in front of everyone, where they share a familial embrace. It is revealed that Penny used to work for Thomas Wayne (Brett Cullen) and is obsessed with the millionaire and has been currently writing to him to try and better their living situation.

At Ha-Ha's, Arthur is given a gun for protection by his co-worker Randall (Glenn Fleschler) after he hears about the mugging incident. Arthur is both reluctant and relieved to receive such a gift as firearms are outlawed at work but soon finds his confidence growing after receiving the weapon. However, soon after this, he is confronted by his cold and unfeeling boss, who reprimands him for losing the sign and takes the cost of it out of his pay. Arthur responds only by smiling bitterly.

Arthur is infatuated with his neighbor, single mother, Sophie Dumond (Zazie Beetz). She speaks to him politely about relating issues that he can relate to. However, while trying to make an impression with her, he appears awkward and weird around her. At one point, he spends his day following her. Later, she comes by his apartment and asks if he was following her, and he admits that he was, but she doesn't seem put off by it. He invites her to a stand-up comedy show that he is performing at. She is hesitant but is won over by his charm and sense of humor. Arthur watches comedians perform to help him gain some insight into the craft, but feels more awkward and out of place as his over-the-top laughter is not genuine.

Arthur goes to the comedy club for his performance. His nervousness consumes him and, as a coping mechanism, he unintentionally finds himself laughing so hard that he can barely speak. He then begins going off into his routine, which isn't very funny. Sophie appears to be in the audience... the only person who is laughing at Arthur's jokes. This gives him the comfort he needs to continue to joke despite his inner torment and turmoil.

Arthur later goes to a children's hospital to entertain them as a clown. He brought his gun with him, and it falls out on the floor. Arthur's boss later chews him out for this. Arthur pleads for a second chance, but his boss refuses and fires him on the spot. To top things off, Randall throws Arthur under the bus by claiming that Arthur got the gun himself. On the subway train ride home from Ha-Ha's in full clown getup, Arthur spots three drunk young Wall Street types working for Wayne Enterprises harassing a woman. Arthur starts laughing unintentionally and draws the attention of the men, while the woman wisely flees from that car. The men approach Arthur and mock him and his laughter before they start to beat him. Arthur fights back in self-defense, but they team up, and relentlessly beat him to the floor. Having had enough, Arthur then pulls out his gun and shoots two of them dead in self-defense before following the last guy out of the train and murdering him on the stairs.

In shock over what he just did, Arthur retreats into a bathroom. After a moment of frantic contemplation, he finds a force rising within him, and he begins to dance by himself. At this moment, he sees himself in the dirty mirror as a battered and smeared and yet powerful clown and begins to embrace it. He hides the gun and then returns to the apartment where he meets and kisses Sophie for the first time.

The news of the three murders spreads, with some seeing it as an attack on the wealthy, while others support the act. Thomas Wayne speaks out and condemns it, labeling the lower class as "clowns," which becomes a symbol they readily embrace. The next day, Arthur cleans out his locker at Ha-Ha's but not before confronting Randall about betraying him and breaking the time punching machine. He then leaves, feeling high-spirited and free. News reports show clown rioters protesting through the city and wreaking trouble, condemning the higher privileged. Arthur sees that he has inadvertently caused this and begins to see his true potential, which makes him genuinely delighted.

Arthur later finds one of Penny's letters to Thomas, which indicates that Arthur is Thomas's son. Arthur goes to Wayne Manor, where he meets young Bruce (Dante Pereira-Olson). After performing a magic trick for Bruce, he sticks his hands through the gate and forces Bruce to smile, realizing deep within that they may or may not be brothers. However, Alfred (Douglas Hodge) comes to intervene and tell Arthur to leave. Arthur mentions his mother and her involvement with Thomas, but Alfred says he remembers Penny and that she was lying to him. Arthur attacks and nearly strangles Alfred but then notices that Bruce is watching. Arthur then gets hold of himself and flees the Wayne premises.

Arthur finds Thomas at a public art theater event and tries to confront him with the potential of him being his father. Arthur mentions Penny, whom Thomas also remembers. He says she was delusional and that there's no way Arthur could be his son. Thomas also explains that Penny never told Arthur that he was adopted, which Arthur strongly rejects before uncontrollably laughing in Thomas's face. Thomas, unaware of Arthur's condition, becomes defensive and punches Arthur in the face before having him thrown out of the building. Arthur returns home, where he tortures himself with the fridge in a fit of depression and longing.

Two police detectives, Burke (Shea Wigham) and Garrity (Bill Camp) go to Arthur's apartment to question him on the subway murders due to the word that the suspect was wearing clown make-up, and they know Arthur lost his job earlier that day. Arthur denies any involvement and gets the detectives to leave. Not long after, Penny falls ill and is hospitalized. Sophie sits by Arthur as he tends to his mother. In the hospital, Arthur sees that Murray's show is playing a clip from his stand-up routine, but he is hurt to see that Murray only played it to mock Arthur.

Arthur later receives a phone call from a rep for Murray's show. He is invited to appear as a guest, which Arthur reluctantly accepts. After studying other interviews on the comedy show, Arthur decides to commit suicide in front of the live audience, thinking it will make them laugh.

Seeking hard proof, Arthur goes to Arkham Asylum and speaks to a clerk, Carl (Brian Tyree Henry), who has a file on Penny. When Carl says he can't give Arthur the info he wants, Arthur snatches the file and runs away to read it. Once away, Arthur opens the documents and reads them, finding that Thomas was telling the truth- according to the documents. The reality is that Penny adopted Arthur after he was found abandoned, and she abused him, tying him to a radiator and beating him alongside her abusive boyfriend. One part of the file mentions Arthur having a head injury, which is most likely what caused his laughing condition. Arthur returns to the hospital and tells Penny that he thought his life was a tragedy, but he sees it's a "fucking comedy." With that, he smothers Penny to death.

Arthur goes back home and breaks into Sophie's apartment. She sees him and is terrified, asking him to leave for the sake of her daughter. Arthur asks her if she has ever had "a really bad day," to which she replies that she doesn't even know him. Through this, it is revealed that every other moment featuring Sophie was just in Arthur's head. A broken and frustrated Arthur leaves Sophie alone, storming out of the apartment.

Arthur starts to get ready for his appearance on Murray's show and paints his face white. He is visited in his apartment by Randall and another former co-worker, named Gary (Leigh Gill). They offer condolences after they hear about Penny's death, but then Randall begins mentioning Burke and Garrity going to their apartments to question them about the subway murders. Arthur realizes that Randall is only seeking a way to use Arthur in order to cover his own butt and then snaps, brutally stabbing Randall twice in the face before smashing his head against the wall. A terrified Gary questions Arthur's deeds and begs to be let go. Arthur agrees to before playfully scaring him as a prank. Gary tries to undo the lock on Arthur's door but is unable to due to his height. He asks Arthur to open the door for him to which Arthur immediately agrees, pausing once to thank Gary for being the only person in his life who was nice to him. Arthur kisses Gary on the forehead and lets him go.

Arthur then dyes his hair green, puts on full clown make-up, and dons a burgundy suit. He then dances down the stairways, fully embracing his insanity and carefree life. Burke and Garrity find Arthur dancing in the street and move in to arrest him. Arthur runs, and they chase him into the subway train where dozens of other Gotham citizens are dressed like clowns after being inspired by the murders. Arthur hides his face with a clown mask, which he steals from a protester and inadvertently starts a brawl in the train cars. As the detectives pursue Arthur, one clown gets in the way, and Burke accidentally shoots him dead when they struggle with his gun. The clowns pull the detectives out of the subway and start beating them relentlessly, allowing Arthur to get away, moving smoothly through the police forces which swarm the area.

At the TV station, Arthur meets Murray and his agent Gene (Marc Maron). Before he goes on, Arthur asks Murray to introduce him as "Joker," since Murray referred to him as such when playing his clip. Murray asks Arthur if his clown make-up has political agendas behind it to which Arthur replies, "I don't believe in that. I don't believe in anything." While waiting to be introduced, Arthur sees Murray broadcasting a clip of a struggling Arthur trying to tell a joke. This causes Arthur's mind and plans to change, and then he dances out into the spotlight.

Arthur goes out as the show begins. He awkwardly tells Murray a joke, which he finds funny for its dark humor though nobody else does. After being confronted with this, Arthur continues by admitting to the subway murders. Murray and the audience slowly realize that Arthur is serious. Arthur argues that the audience only cares for the victims because Thomas Wayne spoke for them, but anyone else like Arthur would be ignored and walked over. Murray and the audience grow angrier with Arthur, but so does he. Murray scolds Arthur, which escalates into Arthur snapping and telling another joke, grinning giddily. "What do you get when you cross a mentally ill loner with a society that abandons him and treats him like trash?!" he asks, only for Murray to try shutting him off before calling for the police. An enraged Arthur then screams, "You get what you fucking' deserve!" before blowing Murray's brains out in front of everyone. The audience runs away in terror, and the news of the murder immediately hits the airwaves. Arthur then laughs genuinely for the first time in his life.

Gotham is now overrun by rioting citizens dressed as clowns after hearing about what Arthur did. The Waynes leave a movie theater to find the chaos in the streets. Thomas takes Martha (Carrie Louise Putrello) and Bruce into an alley, but one clown follows them and tells Thomas he is getting what he deserves using the punchline that Arthur used on the Murray Franklin show. With that, he shoots Thomas and Martha dead in front of Bruce. Meanwhile, Arthur has been arrested and is being taken by the police. Arthur looks out the window and laughs gleefully as he sees the destruction he has caused. Just then, the clowns in an ambulance run into the car, killing the cops and freeing Arthur, who is injured and unconscious. When he awakes, Arthur finds himself surrounded by a mob of cheering mobsters in clown masks. The rioters then cheer Arthur on as he stands on a car and embraces their admiration, now that he has gotten the recognition he has long desired. He dances to their cheering and then pauses, finding that his nose is bleeding profusely. He then spreads the blood across his upper lip and grins before standing before them, elevated like a god.

Sometime later, Arthur is locked up in Arkham. He laughs after telling this story and visualizes a young Bruce standing over his parents in the alley. Realizing that he has, in a way, turned Bruce into himself, Arthur laughs some more, finding this genuinely hilarious. He meets a new social worker (April Grace) and says he wants to tell her a joke, but she wouldn't get it. A few minutes later, Arthur then steps out of the room, leaving a trail of bloody footprints behind before he is chased around by orderlies."""

In [72]:
predict_sentiment(model, joker_synopsis)

['murder', 'violence', 'cult']

In [73]:
madagascar_synopsis = """At New York City's Central Park Zoo, Marty the zebra (Chris Rock) is walking on a treadmill, daydreaming about running free in the wild. He swings on vines, jumps, does flips and runs through a bunch of singing penguins. Marty is jolted back to reality by his best friend, Alex the lion (Ben Stiller), who gets in his face and roars.

Alex tells Marty that there's something in his teeth, so Marty tells him to open his mouth and let Dr. Marty, D.D.S., have a look. Marty reaches in and extracts a glass ball with a red ribbon on top. It's a snow globe, with a miniature Alex figure in the middle. It's a tenth birthday present for Marty from Alex. Marty puts the globe among a bunch of other "Alex" themed items that he'd been given over time. The gift did not excite him. In fact, he's bored and upset that his life at the zoo limits his ability to enjoy life more fully by keeping his movements restricted.

Today was not only Marty's birthday, but "Field Trip Day" at the zoo, when all the school kids come. It was one of Alex's favorite days, and he excitedly woke up his other friends, Gloria the hippopotamus and Melman the giraffe, so they could get themselves ready. That's difficult for Melman, as he's a hypochondriac and worries about all sorts of ailments he thinks he may have.

Alex quickly counsels Marty to work on changing his attitude, suggesting that he approach every day in a "fresh" way. So, Marty decides he's going to be "fresh" today.

Mason the Chimpanzee (Conrad Vernon) starts his day off by rummaging through the waste can and coming up with a cup of coffee, bagel and newspaper, which he shares with his companion, Phil, another chimpanzee, who can't speak and just uses sign language.

Once the gates to the zoo open, Gloria, Marty and Alex go into action, putting on an exhibition of dancing, posing, and doing acrobatics for the visiting people. They were very popular.

There are four penguins at the zoo who are in the process of digging an escape tunnel. They are using plastic spoons and popsicle sticks to dig with, but their tools keep breaking and slowing them down.

Marty is surprised when there's a small area of the grass in his compound that suddenly bulges upward, and the head of Skipper the penguin (Tom McGrath) appears, followed by those of his fellow penguins: Private (Christopher Knights), Kowalski (Chris Miller), and Rico. One of them asks what continent this is, and Marty says, "Manhattan." They realize that they've somewhat missed the mark, as their destination was Antarctica, so they go back down to continue digging, but first swearing Marty to silence.

Marty, Gloria, Melman and Alex all get V.I.P. treatments after the zoo closes, with the freshest of whatever foods they prefer, massages, and acupuncture. It's about as good as a life of confinement can be.

Gloria's birthday gift to Marty is a cake, while Melman gives him a thermometer, not telling him it's his old rectal thermometer until Marty had put it in his mouth to try it out.

Marty's birthday wish is to go to the wild. When Marty tells his skeptical friends that the penguins are trying to escape to the wild, Alex replies, "the penguins are psychotic." They engage in a discussion about where the nearest wild might be, and Gloria says she heard there are wild places in Connecticut.

Alex tells Marty that there wouldn't be things like the fresh steaks he likes so much out in the wild. When Marty asks his friends if they aren't bothered by not knowing about the world outside the zoo, they simultaneously say, "no."

Marty continues to be depressed, so Gloria makes Alex go and attempt to give him a pep talk. Alex knows that if he sings "New York, New York," that Marty won't be able to resist joining in, so he does that. Marty does join in, but the noise they make causes the other animals to start waking up and they shout out for Marty and Alex to "shut up."

Marty tries to convince Alex to join him in breaking out and traveling north to Connecticut. Alex isn't interested, besides tomorrow is "Senior's Day," at the zoo and he doesn't want to miss that.

Later that night, after the animals have all gotten back to sleep, Alex is wakened by Melman, who normally wakes up every two hours to pee. Melman tells Alex that Marty wasn't in his compound. Gloria comes over too, and they all wonder where Marty might be. Alex grabs a nearby phone and calls 911, before he realizes "we can't call the people."

Marty is sauntering down a main street in downtown New York, headed for Grand Central Station. He stares at a woman walking by who is wearing a zebra striped outfit. He spends some time ice skating at an ice rink, then stops and talks to a police horse (David Cowgill), who gives him directions to Grand Central. The police officer (Stephen Apostolina), riding the horse calls into the precinct and asks if he can shoot the zebra. He is told no, that it's animal control's responsibility.

Melman lifts Alex over the zoo wall and lowers him to the street. Gloria just busts through the brick wall, and Melman follows her out. Mason and Phil also go out through the hole in the wall. They all go to the nearest subway station and get in one of the cars, scaring and upsetting all the passengers. Before boarding the train, Melman goes into a restroom and comes out with one of those blue deodorizers in his mouth (he liked how it tasted).

The animals ride the subway down to Grand Central Station, which is the same place Marty went. Wherever Alex goes, the humans all freak out and run, which he doesn't understand because he's not after them at all. There's an old lady (Elisa Gabrielli) who isn't afraid of him, and actually calls him a "bad kitty" and proceeds to start beating on him with her purse.

When they find Marty, Alex rushes forward and tackles him, then hugs him, and finally chokes him, alternating in his emotions of concern, relief and anger. Marty tells Alex that he was going to come back to the zoo in the morning, after his little excursion to Connecticut.

Hundreds of police show up and surround the animals. By then, the penguins and chimps have arrived and joined Marty, Alex, Gloria and Melman at the station. All the humans are on edge, and very afraid, especially of Alex. Except the old woman, who kicks Alex in the groin.

Alex attempts to speak to the police and reason with them, but they don't understand him. So, he roars, in imitation of his popular daily zoo performance. The animal control officer is finally able to steady his nerves enough to shoot Alex in the butt with a tranquilizer dart. That puts Alex out. When Alex starts coming to, he's back at the zoo and there's an animal rights activist making a speech to a crowd about how the zoo animals should be returned to the wild. When they see Alex coming awake, they all get scared and run. Animal Control again responds by shooting at Alex with multiple darts. One of them hits him in the paw and he goes back to sleep.

The next time Alex wakes up, he's in a wooden crate, as are Marty, Gloria, Melman, the penguins and the chimps. They are on a large ship that is sailing to Africa. The boxes that the animals are in are all labeled: "Kenyan Wildlife Preserve." Phil can read, so he signs that bit of information to Mason, who informs the rest of them.

Rico coughs up a paper clip and uses it to pick the lock to the box holding the penguins. The four of them then waddle their way up to the bridge, disabling a crewman along the way, and administering a karate chop to the back of the neck of the captain of the ship, taking him out.

Alex, Marty and Gloria all start arguing about their predicament and what they should do. Meanwhile, the penguins are on the bridge of the ship and struggling to figure out how to steer and navigate it. They more or less accidentally figure it out and Skipper orders hard right rudder. When the ship lurches, the crates holding Alex, Marty, Gloria and Melman, all fall overboard and are set adrift as the ship moves away.

After traveling some distance, the crate holding Alex starts rolling and bouncing. It comes to a sudden stop and breaks open, sending Alex head over heels onto a sandy beach. He comes up coughing, with a mouthful of sand. He's all alone and spends a long time roaming the beach, calling out to his friends, and at one point, he even calls for Regis, Kelly, Matt, Katie, and Al.

Suddenly, there's the sound of another voice and Alex looks up to see a crate with four legs sticking out of the bottom, running around the beach. It's Melman. Alex hurries over and attempts to free Melman from the crate. He pulls Melman's neck way out, but that doesn't work, so he grabs a coconut tree log and prepares to ram it into Melman's stomach to force him out of the crate. He steps way back, points the log, and starts running. Just before he slams into Melman, something distracts him and he stops short. Coming onto the beach through the surf is a large crate containing Gloria. Once the crate hits the beach, Gloria kicks one side of her crate out, freeing herself and at the same time sending Alex flying through the air and crashing down on top of Melman, smashing his crate and freeing him. There are two starfish and a crab covering Gloria's private parts, so she announces that "the party's over," and they all scatter.

Marty is next to arrive, only he does so in style, riding onto the beach on the backs of some dolphins. Alex is surprised to see Marty, but in short order he realizes that all this grief he and the others are experiencing is because of Marty, so he starts chasing Marty around the beach, intending to beat him up. Melman and Gloria intervene and the four of them begin wondering just where they are. Melman looks around and offers his opinion that they are somewhere near San Diego, given the terrain and vegetation. Alex decides to chase Marty some more, because he doesn't want to be in San Diego where he likely won't be the star of the zoo anymore. Gloria stops Alex.

Alex hears some sounds coming from deep in the jungle. It sounds like humans, so they all charge off towards the noise. Alex has trouble when he keeps running into things, including a large spider web, that causes him to fall behind. Meanwhile, Gloria, Melman and Marty come across a clearing that is filled with about a hundred lemurs of various sizes and ages. They don't recognize what sort of animals the lemurs are, just that they definitely aren't human. They watch the lemurs as they dance, sing and generally carry on. Melman tells the others that he's counted 27 health code violations taking place.

A lemur named Maurice (Cedric the Entertainer) calls for quiet and introduces the lemur King Julien XIII (Sacha Baron Cohen). King Julien then launches into a rap song, "I Like to Move It." The lemurs are all enjoying themselves when suddenly an alarm is sounded and a lemur shouts that the fossa are coming. There are four fossa, which are animals that look like a cross between a cat and a dog, and prey on the lemurs. The lemurs all run, but the fossa catch a baby lemur named Mort (Andy Richter), and start making a salad, with Mort as the main ingredient.

Alex finally catches up with his friends and as he gazes down on the scene before him, Gloria sees a large spider on his back. She picks up a stick, preparing to swat the spider. Before she can do that, the spider speaks, saying hello. That causes Alex to look and when he sees the spider, he freaks out, letting out a huge roar, which frightens the fossa and they run away, allowing the Mort lemur to escape. Meanwhile, Gloria starts hammering away at the spider on Alex's back, nearly beating him unconscious in the process.

King Julien and his fellow lemurs don't know what to make of the strange animals who have suddenly appeared and scared the fossa away. He decides they must be aliens. To confirm whether the aliens are friendly or not, the king grabs Mort and tosses him towards them. Alex approaches Mort and attempts to make friends, but he's frightened of him and cries. Gloria picks up baby Mort and calms him down. The king now decides that these aliens are a bunch of pansies.

King Julien, aka, "The Lord of the Lemurs," steps forward and says, "welcome giant pansies!" Alex decides that all the lemurs must be some sort of squirrels, because they act so weird. The king asks "where are you giants from?" and when Alex says, "New York," the king says, "All hail the New York Giants!"

Alex asks the king about any people on the island, knowing that they need people to come find and rescue them. The king says there are people on the island, but they aren't very active. He then points up at a tree, where there's a skeleton hanging from a parachute. Not far away are the remnants of an airplane, also perched high in a tree. The king says there are no live people on the island.

Alex loses it and takes off for the beach, intending to jump in the ocean and swim back to New York. Gloria again has to stop him and calm him down. She assures him that the people must be missing them and should be coming at any time to rescue them.

Melman decides their situation is hopeless and he digs himself a grave there on the beach. Extending out from the grave is a long will and testament that he'd written in the sand. He starts to read it to the others, informing them of what he will leave to each of them, when a wave comes rolling in and erases the bottom third of the will. Melman says, "sorry Alex," as that portion of the will pertained to Alex's inheritance.

Marty decides that he doesn't care if the humans come to find and rescue them, because he loves his newfound freedom there in the wild. That upsets Alex and he takes that large coconut log and draws a line in the sand, telling Marty that he must stay on the other side of that line, while Alex and the others, who want to be rescued, will stay on the opposite side. That doesn't bother Marty, as he immediately sets about building himself a little beachside patio, complete with a large umbrella, fire pit, bar and lounge seats.

Alex starts building something too. It ends up being a large figure, similar to the Statue of Liberty, and he calls it the "Beacon of Liberty." His intent is to set it ablaze once they see a ship out on the ocean, so the people will see it and come to their rescue. While Marty has already started a little fire in his fire pit, Alex has ordered Melman to work on starting a fire for them. Melman is getting very tired rubbing two sticks together, trying to create a spark. However, he eventually does get some sparks, then flames, but the flames catch the sticks on fire and Melman becomes frightened and starts running around. He accidentally sets the Beacon of Liberty on fire and it quickly burns down. Alex can't believe it.

Back at the lemurs place, the king is telling his fellow lemurs that he wants to make the New York Giants their friends, especially Alex, who would then keep the fossa away from them. Maurice then poses a question to them all, asking them to consider why it is the fossa are afraid of Alex, and perhaps the lemurs should be afraid of him too. No one seems particularly concerned.

Back on the beach, Alex has built another structure, made of coconut logs. This one spells out the word, "HELP."

Melman and Gloria decide to join Marty on the "fun side" of the island, and are over sitting under the umbrella, near the fire pit. Marty invites Alex to join them, telling him that it's not really the fun side without him there. Marty continues to refuse. As he sits there pouting, the "P" in his help sign collapses.

After some time, Alex decides to go over to the fun side, apologizing to Marty and asking to join the others. Marty welcomes him to "Casa del Wild," and prepares Alex a drink, which he serves to him in a coconut shell. Alex takes a big gulp and immediately spits it out. Marty has to explain that the drinks are just sea water, but that's only until the plumbing is fixed. Gloria, Melman, Marty and Alex then continue sipping their drinks and spitting it out.

Marty then prepares something he calls "seaweed on a stick," and offers some up to his friends. Gloria and Melman think it's very tasty, but Alex starts choking when he eats it. He is very hungry for meat, and misses the steaks he used to eat at the zoo in New York. After he falls asleep, he dreams about steak, and he starts to lick one. He is jolted awake by the others and finds himself licking Marty's backside. Marty wants to know what Alex thinks he's doing. Alex pretends to have been counting Marty's stripes.

Meanwhile, 2,500 miles to the south, the penguins have arrived at their dream destination, Antarctica, and are standing quietly on the ice, near the bow of the ship. The wind is blowing and there's nothing around them except ice and snow as far as the eye can see. Finally, Private turns to the others and says, "well, this sucks."

King Julien and all the lemurs make noise and wake up Alex and the others, surprising them by leading them to an overlook where they gaze out upon a huge expanse of open area, with beautiful green grass, trees and waterfalls visible way in the distance. The king says, "Welcome to Madagascar."

It all looks just like the poster that Marty had on the wall back in New York. It was just like the land in his dreams. He and Alex rush forward and romp through the grass, wrestling and teasing each other as they went. Alex tires temporarily, as he hadn't eaten for a long time, but rediscovers that he has more energy than ever and continues his romp.

Marty suggests to Alex that he perform his zoo routine for the lemurs. When Julien hears Alex refer to himself as a "king," he becomes concerned, thinking that there can't be two kings on Madagascar.

The fossa arrive and observe Alex's performance from behind some rocks. As Alex continues his performance, he looks out over the crowd, comprised of the lemurs and his friends, and they all start to look like steaks to him. When Alex lets out a huge roar, the fossa run off and everyone else decides to run as well, because Alex is attacking. When Alex snaps out of his hunger induced temporary insanity, he has his jaws attached to Marty's butt, but he hadn't bitten down yet.

Maurice takes advantage of the moment to educate everyone about the fact that Alex is an apex predator, and if he's hungry, no one is safe. About that time, Alex sees steaks again and starts another attack. He always seems to hone in on Marty as his target of choice. Alex is about to pounce on Marty when Maurice fires a coconut from atop a tree and beans Alex with it, saving Marty.

Alex recovers and is very upset at himself for attacking Marty. He's worried what might happen, so he runs off. He falls into a river and ends up floating to another part of the island, the place where the fossa live. There, he sharpens a bunch of sticks and inserts them into the ground, sharp sides up. He's created a sort of jail for himself, with the sticks at the base of some rocks and himself sitting on the rocks.

Marty, Gloria and Melman also find themselves roaming around in the land of the predators, where the fossa live. As they walk, they observe several small animals get eaten up by much larger predators, and they are not at all comfortable.

A boat horn sounds out on the ocean, so Marty, Gloria and Melman hurry to the beach. They see the same ship that they'd been on earlier. Melman struggles to hoist Gloria up high on his head, his neck bending severely as he does it, but her waving seems to work, as the ship begins to turn around and head for the island. Marty decides to hurry and go find Alex, but Gloria stops him, knowing he wouldn't last long in the land of the predators.

The ship arrives at the island and the bow comes right up onto the beach, flush against Melman's face, as he's standing there in shocked amazement. The anchor drops onto the beach and the four penguins appear. Gloria asks them where the people from the ship are and Skipper tells her that they "are on a slow lifeboat to China."

During the distraction of the ship coming aground on the island, Marty has run off to go find Alex. He finds him sitting in his little jail enclosure. Alex talks to Marty for a little bit, then lunges and takes a swipe at him. He couldn't help it, he was so hungry. He regains his senses and goes deeper into the rocks to hide. Marty follows him partway and tells Alex he isn't going anywhere without him. He starts to sing, "New York, New York." Alex doesn't respond.

The fossa have arrived and are gathering around Marty in force. Soon, they attack and Marty has to run, calling for help. Just as it appears Marty is doomed, Melman and Gloria arrive. Melman sweeps down from on high and scoops up Marty, carrying him to safety. The penguins are there too, and Skipper steps forward, producing a flare gun which he fires into the air, distracting the fossa. He and the other penguins then quickly rig up a device using the steering wheel from the ship, and spin it around rapidly, striking the attacking fossa and knocking them unconscious.

Alex shows up, roaring and showing his teeth. He makes as though he's attacking Marty, claiming Marty as his territory, as his meal. Marty is about to faint, thinking Alex is really going to eat him, until Alex whispers to him that it's all for show.

Alex then grabs Marty, Gloria and Melman and hoists them all above his head, proclaiming to the fossa that they all belong to him. Then he sets the three back on the ground and starts taking it directly to the fossa, knocking them this way and that, until they all decide to run off. He shouts at them to never come back.

The king observes all this and becomes very happy that his plan had worked out after all.

Back on the beach, Rico is working feverishly as a chef, preparing some sushi, which is then fed to Alex. Alex is tentative, but he tries it and decides he likes it. He orders 300 pieces to go.

Everyone enjoys themselves at the "Thank You Freaks," banquet put on by the lemurs. There's a massive toast, with everyone taking a drink from their coconut cups and then simultaneously spitting it right back out.

Marty tells his friends that he's ok with just staying on Madagascar, or going back to New York, just as long as he can be with his friends. They decide to go back to New York.

The king gives Alex his crown, then produces a new one for himself. His new crown is larger and has a live gecko on it.

Marty, Alex, Gloria, and Melman board the ship. The chimps are still there too. The ship is loaded with lots of fresh fruit and sushi, ready to sail. Alex envisions them making some side trips, since it will be winter at the New York zoo. The penguins, however, are all sitting in beach chairs, sunning themselves. One of them wonders if they should tell those on the ship that it's out of gas."""

In [74]:
predict_sentiment(model, madagascar_synopsis)

['psychedelic', 'entertaining', 'cult']

## BONUS. Weights & Biases (1 point)

Loss and accuracy curves are often logged by saving the values in arrays and plotting them after training. To make logging easier, we can use a library called Weights & Biases (WandB).
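For comparison, the manual approach mentioned above can be sketched as follows. This is only an illustration: the helper `log_metrics`, the `history` dictionary and the per-epoch numbers are hypothetical and do not come from the homework's training loop.

```python
from collections import defaultdict


def log_metrics(history, **metrics):
    """Append one value per metric name to the history dictionary."""
    for name, value in metrics.items():
        history[name].append(value)


# Maps each metric name to the list of its values, one entry per epoch.
history = defaultdict(list)

# Hypothetical values; in practice they would be computed in the training loop.
dummy_epochs = [
    {"loss": 0.69, "accuracy": 0.52},
    {"loss": 0.55, "accuracy": 0.61},
    {"loss": 0.48, "accuracy": 0.66},
]

for epoch_metrics in dummy_epochs:
    log_metrics(history, **epoch_metrics)

# Index of the epoch with the lowest recorded loss.
best_epoch = min(range(len(history["loss"])), key=history["loss"].__getitem__)
print(f"lowest loss {history['loss'][best_epoch]} at epoch {best_epoch}")
```

The lists in `history` could then be plotted after training. A tracking tool replaces this bookkeeping and also stores the values outside the notebook, so that runs can be compared later.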

[Weights & Biases](https://wandb.ai/site) is a tool that helps you keep track of your machine learning projects. It is used to log hyperparameters, output metrics from runs and visualize results. This tool can also be useful in your project, where you might want to compare the performance of different models. 

Your task is to set up a WandB project for this CNN model. Name the project "hw4_bonus". Learn how to do it from [here](https://docs.wandb.ai/quickstart). Run the homework again so that the results and output metrics are saved. Log all the metrics (loss, F1-score, recall, accuracy and precision). Even though WandB is framework-agnostic, use the PyTorch-specific hooks for this task.

When you have rerun the homework, log in to Weights & Biases. You should see a project named "hw4_bonus". Go through the project and see what kind of information is saved.


**Answer the following questions:**
1. Was it easy to set up the WandB project? If not, what were the issues?

<font color='red'>Your answer here </font>

2. Save the metric charts from W&B and show them here. Describe every chart: what can you see, and does it make sense?

<font color='red'>Your answer here </font>

3. What other information can you find and how is it useful?  

<font color='red'>Your answer here </font>

