# ISO-Based Deep Learning Using LSTMs

This notebook establishes the model architecture that is used to learn the mapping between various desired mood states into playlists. The notebook is split into four distinct section: *data loading*, *dataset*, *model architecture*, and finally *training and evaluation*. We proceed by summarizing these sections briefly; a detailed description of their purposes can be found under the section headers. 

The **data loading** section loads all of the variables from preprocessing, including the tokenization of the training set, as well as the tokenizer used to perform the tokenizations. 

The **dataset** section creates an `ISODataset` class which converts the Dataframe loaded in from the data loading section to be in a format which is easily accessible by torch. 

The **model architecture** section creates the actual model that is used for training. The specification of the model is also under its section heading. It should be important to note that `torch lightning` is used throughout the notebook, but in particular for designing the model architecture. Thus, the code in the training and evaluation section is minimal. This section also encapsulates the loss function and learning rate scheduler used for training.

The **training and evaluation** section contains code which kickstarts the training of the model. 

In [21]:
import json
import torch
import random
import pickle
import numpy as np
import pandas as pd
import pytorch_lightning as pl
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

## Data Loading

Data from preprocessing is loaded into this notebook – including the tokenizations, cleaned audio features, and the tokenizer itself. It is important to note that for ease of operation, we are pre-emptively removing all the rows in the data frame corresponding to empty playlists, as these are still WIP. 

In [3]:
df = pd.read_csv('train.csv', index_col=0)
df = df[df['features'] != '[null]']

Reload the tokenizer.

In [4]:
class Tokenizer:
    def __init__(self):
        self.stoi = {}
        self.itos = {}
    
    def __len__(self):
        return len(self.stoi)
    
    def fit_on_moods(self, moods):
        flat = []
        
        Tokenizer.flatten(moods, flat)
        vocab = sorted(set(flat))
        vocab.append('<sos>')
        vocab.append('<eos>')
        vocab.append('<pad>')
        for index, word in enumerate(vocab):
            self.stoi[word] = index
        self.itos = {v : k for k, v in self.stoi.items()}

    def flatten(l, flat):
        """
        Recursively, flatten a list.
        """
        if type(l) != list:
            flat.append(l)
        else:
            for el in l:
                Tokenizer.flatten(el, flat)

    def moods_to_token(self, states, reverse=False):
        """
        Recursively tokenize moods, while preserving the
        structure of the list. When `reverse` is true, the
        method translates the tokens back into the mood strings
        """
        if type(states) != list:
            if reverse:
                return self.itos[states]
            else:
                return self.stoi[states]
        else:
            for index, state in enumerate(states):
                states[index] = self.moods_to_token(state, reverse)
            return states
tokenizer = torch.load('tokenizer.pth')

## Dataset

In this section, we package the training data into an `ISODataset` object. This is so that `torch`'s batching system can work with it more easily. Moreover, to make sure that all of the sequences are uniform, we assume that each states has at most 5 mood descriptors. Therefore, all the inputs to our network should be of shape `(batch_size, n, 5, 3)`

In [66]:
class ISODataset(Dataset):
    def __init__(self, df):
        self.df = df
    
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, idx):
        mood_states = json.loads(self.df.iloc[idx]['moods_states'])
        for index, state in enumerate(mood_states):
            mood_states[index] = np.pad(state, (0,5-len(state)), 
                                        constant_values=tokenizer.stoi['<pad>'])
        mood_states = torch.LongTensor(mood_states)
        
        audio_features = self.df.iloc[idx]['features']
        audio_features = torch.Tensor(json.loads(audio_features))
        return mood_states, audio_features

Since the mood states are not only of variable length but also of variable dimension, we need to pad each batch so that a network will be able to process them. This preprocessing before it reaches the neural network is done through the `iso_collate` function below.

In [72]:
def iso_collate(batch):
    moods, features = batch
    lengths = []
    for data_point in batch:
        lengths.append(len(data_point))
    moods = pad_sequence(moods, batch_first=True, padding_value=tokenizer.stoi['<pad>'])
    return moods, torch.LongTensor(lengths), features

## Model Architecture

In [15]:
class Model(pl.LightningModule):
    def __init__(self, tokenizer, decoder_dim=30, dropout=0.5):
        super().__init__()
        self.tokenizer = tokenizer
        self.embedding = nn.Embedding(len(self.tokenizer.itos),
                                      3, padding_idx=self.tokenizer.stoi['<pad>'])
        self.lstm = nn.LSTM(batch_first=True)
    
    def init_hidden_state(self, )
        
    def forward(self, x):
        mood_states, audio_features = x
        return self.embedding(mood_states)
    
    def step(self, batch, batch_idx):
        return
    
    def training_step(self, batch, batch_idx):
        return
    
    def validation_step(self, batch, batch_idx):
        return
    
    def configure_optimizers(self):
        return

## Training and Evaluation

In [85]:
iso = ISODataset(df)
a1 = nn.Embedding(len(tokenizer.itos), 3,
                 padding_idx=tokenizer.stoi['<pad>'])
tokens, labels = iso[0]
b = a1(tokens)

In [89]:
b.view(1, -1, 3)

tensor([[[ 0.8458, -0.9801, -0.2898],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [-1.1824,  0.1617,  0.0662],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [-0.1663,  0.1596, -0.9348],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]]], grad_fn=<ViewBackward>)

In [98]:
b.size()

torch.Size([3, 5, 3])

In [100]:
b = b.reshape((1, 3, 5, 3))

In [104]:
a2 = nn.Linear(3, 2)
a3 = nn.ReLU()

In [61]:
m = Model(tokenizer)
iso = ISODataset(df)
m(iso[0])

[21 31 31 31 31]
[ 3 31 31 31 31]
[13 31 31 31 31]


tensor([[[-1.5340, -0.2316, -0.6829],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]],

        [[ 1.0501,  0.4073, -0.4818],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]],

        [[ 1.3359, -0.3053, -1.3831],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]]], grad_fn=<EmbeddingBackward>)

In [74]:
iso_collate(iso[0])

(tensor([[21, 31, 31, 31, 31],
         [ 3, 31, 31, 31, 31],
         [13, 31, 31, 31, 31]]),
 tensor([ 3, 10]),
 tensor([[ 9.6900e-01,  3.8800e-01,  8.5900e-02,  7.3500e-05,  7.0000e+00,
           1.0800e-01, -1.6061e+01,  0.0000e+00,  4.7200e-02,  8.8253e+01,
           1.9000e-01],
         [ 9.7800e-01,  3.6700e-01,  1.1100e-01,  3.9700e-05,  4.0000e+00,
           8.9700e-02, -1.4084e+01,  1.0000e+00,  9.7200e-02,  8.2642e+01,
           1.9800e-01],
         [ 9.6600e-01,  5.1300e-01,  1.8300e-01,  0.0000e+00,  2.0000e+00,
           3.2700e-01, -1.0249e+01,  1.0000e+00,  4.0700e-02,  1.0718e+02,
           2.6600e-01],
         [ 3.9800e-01,  6.6800e-01,  5.4700e-01,  7.6600e-02,  1.0000e+00,
           9.3100e-02, -8.0240e+00,  1.0000e+00,  3.5300e-02,  8.3500e+01,
           1.9200e-01],
         [ 8.5300e-01,  7.9400e-01,  3.2000e-01,  1.3400e-01,  1.0000e+00,
           1.1200e-01, -1.2920e+01,  0.0000e+00,  1.7300e-01,  1.7409e+02,
           2.4100e-01],
         [ 8.630

In [93]:
iso_collate(iso[0])

(tensor([[21, 31, 31, 31, 31],
         [ 3, 31, 31, 31, 31],
         [13, 31, 31, 31, 31]]),
 tensor([ 3, 10]),
 tensor([[ 9.6900e-01,  3.8800e-01,  8.5900e-02,  7.3500e-05,  7.0000e+00,
           1.0800e-01, -1.6061e+01,  0.0000e+00,  4.7200e-02,  8.8253e+01,
           1.9000e-01],
         [ 9.7800e-01,  3.6700e-01,  1.1100e-01,  3.9700e-05,  4.0000e+00,
           8.9700e-02, -1.4084e+01,  1.0000e+00,  9.7200e-02,  8.2642e+01,
           1.9800e-01],
         [ 9.6600e-01,  5.1300e-01,  1.8300e-01,  0.0000e+00,  2.0000e+00,
           3.2700e-01, -1.0249e+01,  1.0000e+00,  4.0700e-02,  1.0718e+02,
           2.6600e-01],
         [ 3.9800e-01,  6.6800e-01,  5.4700e-01,  7.6600e-02,  1.0000e+00,
           9.3100e-02, -8.0240e+00,  1.0000e+00,  3.5300e-02,  8.3500e+01,
           1.9200e-01],
         [ 8.5300e-01,  7.9400e-01,  3.2000e-01,  1.3400e-01,  1.0000e+00,
           1.1200e-01, -1.2920e+01,  0.0000e+00,  1.7300e-01,  1.7409e+02,
           2.4100e-01],
         [ 8.630