# ISO-Based Deep Learning Using LSTMs

This notebook establishes the model architecture used to learn the mapping from desired mood states to playlists. The notebook is split into four sections: *data loading*, *dataset*, *model architecture*, and *training and evaluation*. We summarize each section briefly here; a detailed description of each can be found under its section header.

The **data loading** section loads all of the variables from preprocessing, including the tokenization of the training set, as well as the tokenizer used to perform the tokenizations. 

The **dataset** section creates an `ISODataset` class which converts the DataFrame loaded in the data loading section into a format that is easily accessible by torch. 

The **model architecture** section creates the actual model that is used for training. The specification of the model is given under its section heading. Note that PyTorch Lightning (`pytorch_lightning`) is used throughout the notebook, and in particular for designing the model architecture; as a result, the code in the training and evaluation section is minimal. This section also encapsulates the loss function and the optimizer configuration used for training.

The **training and evaluation** section contains code which kickstarts the training of the model. 

In [1]:
import json
import math
import torch
import random
import pickle
import requests
import datetime
import collections
import threading
import numpy as np
import pandas as pd
import pytorch_lightning as pl
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
from sklearn import decomposition
from scipy.special import softmax
from torchvision import transforms
from torch.utils.data import DataLoader, Dataset
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

In [2]:
np.set_printoptions(suppress=True)

# Data Loading

Data from preprocessing is loaded into this notebook, including the tokenizations, cleaned audio features, and the tokenizer itself. Note that, for ease of operation, we pre-emptively remove all rows of the data frame that correspond to empty playlists, as these are still a work in progress.

In [3]:
df = pd.read_csv('data/train.csv', index_col=0)
df = df[df['features'] != '[null]']

In [4]:
df

Unnamed: 0,moods_states,features
iso26,"[[22], [20]]","[[0.755, 0.479, 0.154, -15.051, 0.0369, 0.232]..."
iso23,"[[28], [20]]","[[0.455, 0.674, 0.615, -8.188, 0.147, 0.756], ..."
iso30,"[[19, 1, 9], [6, 21, 25]]","[[0.907, 0.23, 0.159, -16.315, 0.0323, 0.039],..."
iso19,"[[17], [26], [0]]","[[0.985, 0.653, 0.178, -13.47, 0.0312, 0.225],..."
iso22,"[[28], [26]]","[[0.155, 0.221, 0.0879, -16.996, 0.0381, 0.040..."
iso24,"[[22], [11]]","[[0.975, 0.462, 0.203, -16.313, 0.0355, 0.437]..."
iso18,"[[13], [16], [25]]","[[0.124, 0.585, 0.52, -6.136, 0.0712, 0.129], ..."
iso25,"[[22], [26]]","[[0.953, 0.582, 0.199, -10.045, 0.0321, 0.0783..."
iso17,"[[23], [4], [9]]","[[0.755, 0.479, 0.154, -15.051, 0.0369, 0.232]..."
iso20,"[[5], [16], [21]]","[[0.948, 0.571, 0.0274, -20.274, 0.0649, 0.087..."


Reload the tokenizer.

In [5]:
class Tokenizer:
    def __init__(self):
        self.stoi = {}
        self.itos = {}
    
    def __len__(self):
        return len(self.stoi)
    
    def fit_on_moods(self, moods):
        flat = []
        
        Tokenizer.flatten(moods, flat)
        vocab = sorted(set(flat))
        vocab.append('<sos>')
        vocab.append('<eos>')
        vocab.append('<pad>')
        for index, word in enumerate(vocab):
            self.stoi[word] = index
        self.itos = {v : k for k, v in self.stoi.items()}

    def flatten(l, flat):
        """
        Recursively, flatten a list.
        """
        if type(l) != list:
            flat.append(l)
        else:
            for el in l:
                Tokenizer.flatten(el, flat)

    def moods_to_token(self, states, reverse=False):
        """
        Recursively tokenize moods, while preserving the
        structure of the list. When `reverse` is true, the
        method translates the tokens back into the mood strings
        """
        if type(states) != list:
            if reverse:
                return self.itos[states]
            else:
                return self.stoi[states]
        else:
            for index, state in enumerate(states):
                states[index] = self.moods_to_token(state, reverse)
            return states
tokenizer = torch.load('data/tokenizer.pth')
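
As an illustration of what `moods_to_token` does, the following minimal sketch maps a nested list through a lookup table while preserving its structure. Unlike the method above, it builds a new list instead of overwriting its argument. The helper name `map_nested` and the three-word vocabulary are assumptions made for this example; the indices match the vocabulary printed later in this notebook.

```python
def map_nested(value, table):
    """Map every leaf of a nested list through `table`, keeping the nesting."""
    if isinstance(value, list):
        return [map_nested(item, table) for item in value]
    return table[value]


# Tiny stand-in vocabulary (assumption: a subset of the real tokenizer's).
stoi = {'happy': 11, 'lonely': 15, 'sad': 22}
itos = {index: word for word, index in stoi.items()}

moods = [['sad', 'lonely'], ['happy']]
tokens = map_nested(moods, stoi)     # mood strings -> token ids
restored = map_nested(tokens, itos)  # token ids -> mood strings
```

Because the sketch does not mutate `moods`, the original strings remain available after tokenization, which can be convenient when the same query is reused.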

# Dataset

In this section, we package the training data into an `ISODataset` object so that `torch`'s batching system can work with it more easily. Moreover, to make sure that all of the sequences are uniform, we assume that each state has at most 5 mood descriptors. Therefore, after embedding, all the inputs to our network should be of shape `(batch_size, n, 5, 3)`, where $n$ is the pre-determined maximum number of mood states and 3 is the default embedding dimension.
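
To make the fixed-size layout concrete, here is a minimal plain-Python sketch of the padding that `ISODataset.__getitem__` performs with `np.pad` and `np.full`. The function name `pad_mood_states` is hypothetical, and the pad index 32 is taken from the tokenizer vocabulary printed later in this notebook.

```python
PAD = 32  # index of '<pad>' in the tokenizer vocabulary


def pad_mood_states(mood_states, max_states=5, max_descriptors=5, pad=PAD):
    """Pad a ragged list of mood states to a max_states x max_descriptors grid."""
    if len(mood_states) > max_states:
        raise ValueError("too many mood states")
    grid = []
    for state in mood_states:
        if len(state) > max_descriptors:
            raise ValueError("too many descriptors in one mood state")
        # Right-pad each state with the pad token.
        grid.append(list(state) + [pad] * (max_descriptors - len(state)))
    # Append all-pad rows until the grid has max_states rows.
    while len(grid) < max_states:
        grid.append([pad] * max_descriptors)
    return grid


padded = pad_mood_states([[19, 1, 9], [6, 21, 25]])
```

Every query then occupies the same 5 x 5 grid of token indices, so a batch can be stacked into a single tensor before the embedding layer is applied.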

## Augmentations

We implement two data augmentations:

- Random perturbation, implemented as `FeatureProtuberance` (randomly shifting the audio features by up to a specified percentage)
- Reversing mood states and audio features, implemented as `Reverse`
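
The intended behaviour of the two augmentations, as described in their docstrings, can be sketched in plain Python. This is an illustration under assumptions rather than the classes used for training: the function names `perturb` and `maybe_reverse`, the explicit `rng` argument, and the sample feature values are all hypothetical.

```python
import random


def perturb(features, max_pct, phi, rng):
    """Shift each component c to within c +- max_pct * c, with probability phi."""
    perturbed = []
    for row in features:
        new_row = []
        for c in row:
            if rng.random() < phi:
                shift = rng.uniform(-max_pct, max_pct)
                new_row.append(c + shift * c)
            else:
                new_row.append(c)
        perturbed.append(new_row)
    return perturbed


def maybe_reverse(moods, features, phi, rng):
    """Reverse both sequences together with probability phi."""
    if rng.random() < phi:
        return moods[::-1], features[::-1]
    return moods, features


rng = random.Random(0)
features = [[0.755, 0.479, -15.051], [0.455, 0.674, -8.188]]
shifted = perturb(features, max_pct=0.10, phi=0.5, rng=rng)
```

Reversing the moods and the features together keeps each playlist aligned with its mood trajectory, so a sad-to-happy example also serves as a happy-to-sad example.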

In [6]:
class Compose:
    def __init__(self, transformations):
        self.transform = transformations
    
    def __call__(self, moods, features):
        for trans in self.transform:
            moods, features = trans(moods, features)
        return moods, features

In [7]:
class FeatureProtuberance:
    def __init__(self, max_protuberance, phi):
        """
        :param max_protuberance: the maximum percentage of protuberance.
        If 0.5 is given then each component, c, in the feature matrix 
        will have a potential new min/max of c +- 0.5 * c.
        :param phi: the probability that a given component is going to
        be augmented. 
        """
        self.protuberance = max_protuberance
        self.phi = phi
    
    def __call__(self, moods, features):
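        # Relative shift drawn uniformly from [-max_protuberance, +max_protuberance].
        pct = (torch.rand(features.size()) - 0.5) * 2 * self.protuberance
        # Each component is augmented independently with probability phi.
        aug = torch.rand(features.size()) < self.phi
        return moods, features + aug * pct * features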

In [8]:
class Reverse:
    def __init__(self, phi):
        """
        :param phi: (0, 1), the probability that the mood states and 
        features will be reversed.
        """
        self.phi = phi
        
    def __call__(self, moods, features):
        if random.random() > self.phi:
            return moods, features
        return (torch.flip(moods, dims=(0,)),
                torch.flip(features, dims=(0,)))

In [9]:
class ISODataset(Dataset):
    """
    The `ISODataset` class packages training data into a single index-able object.
    This makes it easy for torch to use as a generator.
    """
    def __init__(self, df, maxlen=5, transform=None, batch_size=0):
        """
        Initializer.
        :param maxlen: The reader should note that this is the maximum number
        of mood transitions there can be. The constants (5) following this
        block represent the number of descriptors allowed for each mood state.
        """
        self.pca = None
        self.n_comp = 11
        self.components = np.array([])
        self.mean = np.array([])
        self.df = df
        self.maxlen = maxlen
        self.batch_size = batch_size
        self.transform = transform
    
    def pca_reduction(self, percent_var=0.95):
        """
        Reduce the dimensionality of the audio features to the smallest
        number of principal components, n, whose cumulative share of the
        variance (each eigenvalue divided by tr(D), where D is the diagonal
        matrix of eigenvalues) reaches `percent_var`. Note that we assume
        a dataframe whose feature values are still in JSON form.
        :param percent_var: must be greater than 0 and less than 1.
        The method stores the leading eigenvectors for later use, and
        returns the matrix of eigenvectors together with the number of
        components being used.
        """
        # We proceed by stacking all the songs into a large matrix
        all_playlists = [json.loads(self.df.iloc[entry]['features'])
                         for entry in range(len(self.df))]
        all_playlists = np.vstack(all_playlists)
        self.mean = torch.mean(torch.from_numpy(all_playlists), 0)
        self.pca = decomposition.PCA()
        self.pca.fit(all_playlists)
        
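        # Share of the variance explained by each component (eigenvalue / trace).
        p_var = self.pca.explained_variance_ratio_
        counter, i = 0, 0
        # Stop at the last component even if rounding keeps the sum below the threshold.
        while i < percent_var and counter < len(p_var):
            i += p_var[counter]
            counter += 1
        self.n_comp = counter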

        self.components = torch.from_numpy(self.pca.components_[:self.n_comp,:]).float()
        return self.pca.components_[:counter,:], counter
        
    def pca_reconstruction(self, y):
        """
        This method converts a given matrix of predicted `y` datapoints
        which are in reduced PCA form back into their `full` features 
        using PCA reconstruction. This is done by simply right-multiplying
        `y` by our eigenvectors, and then adding the mean.
        """
        Xhat = torch.matmul(y, self.components)
        return Xhat + self.mean.float()
        
    def __len__(self):
        return max(len(self.df), self.batch_size)
    
    def __getitem__(self, idx):
        idx = idx % len(self.df)
        mood_states = json.loads(self.df.iloc[idx]['moods_states'])
        audio_features = self.df.iloc[idx]['features']
        audio_features = torch.Tensor(json.loads(audio_features))
        for index, state in enumerate(mood_states):
            mood_states[index] = np.pad(state, (0,5-len(state)), 
                                        constant_values=tokenizer.stoi['<pad>'])
        while len(mood_states) < 5:
            mood_states.append(np.full(5, tokenizer.stoi['<pad>']))
        mood_states = torch.LongTensor(mood_states)
        
        # augmentations
        if self.transform:
            return self.transform(mood_states, audio_features)
        return mood_states, audio_features

In [10]:
iso = ISODataset(df)

In [11]:
class TestDataset(Dataset):
    """
    The `TestDataset` class packages inference queries (mood states and the
    desired playlist length) into a single index-able object.
    This makes it easy for torch to use as a generator.
    """
    def __init__(self, df, maxlen=5, transform=None, batch_size=0):
        """
        Initializer.
        :param maxlen: The reader should note that this is the maximum number
        of mood transitions there can be. The constants (5) following this
        block represent the number of descriptors allowed for each mood state.
        """
        self.df = df
        self.maxlen = maxlen
        self.batch_size = batch_size
        self.transform = transform
        
    def __len__(self):
        return max(len(self.df), self.batch_size)
    
    def __getitem__(self, idx):
        idx = idx % len(self.df)
        mood_states = json.loads(self.df.iloc[idx]['moods_states'])
        for index, state in enumerate(mood_states):
            mood_states[index] = np.pad(state, (0,5-len(state)), 
                                        constant_values=tokenizer.stoi['<pad>'])
        while len(mood_states) < 5:
            mood_states.append(np.full(5, tokenizer.stoi['<pad>']))
        mood_states = torch.LongTensor(mood_states)
        
        return mood_states, self.df.iloc[idx]['length']

In [12]:
iso = ISODataset(df.copy())
iso.pca_reduction(percent_var=0.999999)
eigs = iso.components
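
The component-selection rule described in the `pca_reduction` docstring (keep the fewest leading components whose cumulative share of the variance reaches `percent_var`) can be sketched without any dependencies. The function name `n_components_for` and the example ratios are assumptions chosen for illustration; they are not values from the training data.

```python
def n_components_for(variance_ratios, percent_var):
    """Return the smallest n whose first n ratios sum to at least percent_var."""
    total = 0.0
    for n, ratio in enumerate(variance_ratios, start=1):
        total += ratio
        if total >= percent_var:
            return n
    # Rounding can keep the sum just below the threshold: keep every component.
    return len(variance_ratios)


ratios = [0.6, 0.3, 0.08, 0.02]  # hypothetical per-component variance shares
n_95 = n_components_for(ratios, 0.95)
n_50 = n_components_for(ratios, 0.50)
```

With a threshold as close to 1 as the `0.999999` used above, the rule keeps nearly every component, so the reduction is close to a change of basis rather than a compression.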

Since the mood states are not only of variable length but also of variable dimension, we need to pad each batch so that a network will be able to process them. This padding is applied before the batch reaches the neural network, by the `iso_collate` function below.

In [13]:
def iso_collate(batch):
    moods, features, lengths = [], [], []
    for data_point in batch:
        moods.append(data_point[0])
        features.append(data_point[1])
        lengths.append(len(data_point[1]))
    features = pad_sequence(features, batch_first=True)
    moods = pad_sequence(moods, batch_first=True, padding_value=tokenizer.stoi['<pad>'])
    return moods, torch.LongTensor(lengths), features
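
To make the padding performed by `iso_collate` concrete, here is a plain-Python sketch of what `pad_sequence(..., batch_first=True)` does to the feature sequences: shorter playlists are extended with all-zero songs, and the true lengths are recorded separately. The function name `collate_features` and the two-feature toy batch are assumptions for illustration.

```python
def collate_features(batch, feature_dim):
    """Zero-pad every playlist in `batch` to the length of the longest one."""
    lengths = [len(seq) for seq in batch]
    longest = max(lengths)
    padded = []
    for seq in batch:
        filler = [[0.0] * feature_dim for _ in range(longest - len(seq))]
        padded.append([list(song) for song in seq] + filler)
    return padded, lengths


batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # playlist with three songs
    [[0.7, 0.8]],                          # playlist with one song
]
padded, lengths = collate_features(batch, feature_dim=2)
```

The recorded lengths are what later allow the model to stop predicting for a playlist once its real songs run out.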

In [14]:
def test_collate(batch):
    moods, lengths = [], []
    for data_point in batch:
        moods.append(data_point[0])
        lengths.append(data_point[1])
    moods = pad_sequence(moods, batch_first=True, padding_value=tokenizer.stoi['<pad>'])
    return moods, torch.LongTensor(lengths)

# Model Architecture
The model is split into two distinct components: the attention mechanism and the LSTM component. The interaction between these two components is shown in the figure below:

![ISONet](imgs/isonet-lstm-attention-architecture.png)
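
The core of the attention step, turning one score per mood descriptor into weights and then into a single input vector for the LSTM cell, can be sketched in plain Python. This sketch takes the scores as given and omits the linear layers that produce them in the real network; the names `softmax` and `attend` and the toy values are assumptions for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(moods, scores):
    """Weight each mood embedding by its softmax score and sum them."""
    alpha = softmax(scores)
    dim = len(moods[0])
    context = [sum(a * mood[d] for a, mood in zip(alpha, moods))
               for d in range(dim)]
    return context, alpha


moods = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d embeddings
context, alpha = attend(moods, scores=[0.0, 0.0, 0.0])
```

With equal scores the result is simply the mean embedding; as one score grows relative to the others, the output approaches that descriptor's embedding.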

In [15]:
class Attention(pl.LightningModule):
    """
    The attention mechanism of the network. On each time step of the LSTM,
    the LSTM cell looks at the previous hidden state as well as the input
    to the LSTM, then it weights the various dimensions of the input based
    on the hidden state / input. This is done by applying two linear layers
    to the hidden and input states respectively, then combining the outputs,
    running them through another linear layer, and normalizing the final
    weights using the softmax function. The result of the attention layer is
    a weighted sum of the inputs, using these weights.
    """
    def __init__(self, embed_dim, attention_dim=40, maxlen=5, output_dim=6):
        """
        Initializer for the attention mechanism of the network.
        :param embed_dim: the dimension of the embeddings (a hyperparameter).
        :param attention_dim: specifies dimension of the hidden attention
        layer. This is simply a hyperparameter and will only affect the
        efficacy of the network, not its functionality. 
        :param maxlen: specifies the maximum number of mood transitions allowed.
        :param output_dim: the dimension of the hidden state passed to
        `forward`. `Model` passes its `hidden_dim` here; the default value
        is 6.
        """
        super().__init__()
        self.maxlen = maxlen
        self.embed_dim = embed_dim
        self.attention_dim = attention_dim
        self.mood_attention1 = nn.Linear(self.embed_dim, self.attention_dim)
        self.mood_attention2 = nn.Linear(self.attention_dim, self.attention_dim)
        self.hidden_attention1 = nn.Linear(output_dim, self.attention_dim)
        self.hidden_attention2 = nn.Linear(self.attention_dim, self.attention_dim)
        # the input size of the hidden attention is `output_dim`, which is
        # the size of the hidden state passed to `forward`.
        self.attention = nn.Linear(self.attention_dim, 1)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)
    
    def forward(self, moods, hidden):
        """
        :param moods: The embedded input mood states. These start out with size
        (bs x maxlen x 5 x embed_dim); the reader should note that they *need*
        to be reshaped to (bs x (maxlen * 5) x embed_dim) before being passed in.
        :param hidden: The previous hidden state of the LSTM cell; this should be of
        size (bs x output_dim).
        The purpose of this function is to find a weighting, or alternatively, where to 
        pay "attention", based on the `moods` and `hidden` state. The attention
        weights, `alpha` of size (bs x (maxlen * 5) x 1), are then used in a weighted sum
        over the moods (bs x (maxlen * 5) x embed_dim), yielding a size of (bs x embed_dim).
        This single vector then acts as the input to the LSTM cell. 
        """
        att1 = self.relu(self.mood_attention1(moods))
        att1 = self.mood_attention2(att1)
        att2 = self.relu(self.hidden_attention1(hidden))
        att2 = self.hidden_attention2(att2)
        att = self.attention(self.relu(att1 + att2.unsqueeze(1)))
        alpha = self.softmax(att)
        weighted_moods = (moods * alpha).sum(dim=1)
        return weighted_moods, alpha

In [16]:
class Model(pl.LightningModule):
    """
    An attention-based, one-directional baseline model.
    """
    def __init__(self, tokenizer=tokenizer, dropout=0.0, maxlen=5, embed_dim=3, lr=1e-2,
                 weight_decay=1e-9, hidden_dim=6, output_dim=6,
                 dataset=None):
        """
        Initializer.
        :param tokenizer: the tokenizer used to create the tokenizations for
        the mood states and descriptors. 
        :param dropout: the probability of dropout of the layer between the 
        hidden state and the final output of each LSTM cell.
        :param maxlen: the maximum number of mood transition states that are allowed. 
        It should be noted that this must be greater than any of the number of the 
        states associated with each datapoint; otherwise, it will cause errors. 
        :param embed_dim: the dimensionality of each embedding.
        :param lr: learning rate of the network, TODO: implement separate learning rates
        for the attention network and the LSTM cell.
        :param output_dim: the dimension of the output; this varies as we 
        include/exclude predicted features. To predict all six of the features
        handled by the loss, we simply use the default value of 6.
        """
        super().__init__()
        self.lr = lr
        self.bce = nn.BCELoss(reduction='none')
        self.mse = nn.MSELoss(reduction='none')
        self.weight_decay = weight_decay
        self.maxlen = maxlen
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.tokenizer = tokenizer
        self.embed_dim = embed_dim
        self.dataset = dataset
        self.embedding = nn.Embedding(len(self.tokenizer.itos), self.embed_dim, 
                                      padding_idx=self.tokenizer.stoi['<pad>'])
        self.attention = Attention(self.embed_dim, output_dim=hidden_dim)
        self.h0 = nn.Linear(maxlen * self.embed_dim * 5, 
                            self.hidden_dim)
        self.c0 = nn.Linear(maxlen * self.embed_dim * 5,
                            self.hidden_dim)
        self.dropout = nn.Dropout(p=dropout)
        self.lstm = nn.LSTMCell(self.embed_dim, self.hidden_dim)
        self.sigmoid = nn.Sigmoid()
        self.forget = nn.Linear(self.hidden_dim, self.embed_dim)
        self.fc = nn.Linear(self.hidden_dim, self.output_dim)
        
    def init_hidden_states(self, x):
        """
        Given the mood states: flattened into a (bs x (maxlen * 5 * 3)) vector,
        we use two separate linear layers to find the initial cell state and 
        initial hidden state. This is necessary, rather than simply using random
        initialization, as the attention associated with the first cell is dependent on h0.
        """
        return self.h0(x), self.c0(x)
        
    def forward(self, x):
        """
        Forward feeding of the model. The network proceeds by first converting the mood
        states into their respective embedding representations. Then, the flattened inputs
        are used to determine the initial hidden/cell states of the given LSTM. The given
        inputs are then sorted based on the length of their outputs. This makes it easier
        for prediction. 

        :param x: a tuple that contains three items, the first being the various
        mood states that are being queried, the second being the lengths of the
        desired labels of each datapoint, and finally the features – the target 
        outputs. 
        """
        mood_states, lengths, audio_features = x
        bs = mood_states.size(0)
        mood_states = self.embedding(mood_states)
        moods = mood_states.view(bs, (self.maxlen * 5), self.embed_dim)
        
        sorted_lengths, indicies = lengths.sort(dim=0, descending=True)
        moods, audio_features = moods[indicies], audio_features[indicies]
        h, c = self.init_hidden_states(moods.view(bs, -1))  # (bs x output_dim)

        predictions = torch.zeros(bs, max(lengths), self.output_dim)
        for timestep in range(max(lengths)):
            num_predict = sum([l > timestep for l in lengths])
            attention_weighted_moods, alphas = self.attention(moods[:num_predict], 
                                                      h[:num_predict])
            gate = self.sigmoid(self.forget(h[:num_predict]))
            weighted_moods = gate * attention_weighted_moods
            h, c = self.lstm(weighted_moods, 
                             (h[:num_predict], c[:num_predict]))
            preds = self.fc(self.dropout(h))
            
            predictions[:num_predict, timestep, :] = preds
        return self.sigmoid_relevant(predictions), audio_features
    
    def sigmoid_relevant(self, predictions):
        # squash the bounded attributes into (0, 1); attribute indices are
        # ['danceability' (0), 'energy' (1), 'loudness' (2, left unbounded),
        #  'speechiness' (3), 'acousticness' (4), 'valence' (5)]
        for attr in [0, 1, 3, 4, 5]:
            predictions[:,:,attr] = torch.sigmoid(predictions[:,:,attr])
        return predictions
    
    def step(self, batch, batch_idx):
        """
        One "step" of the model. 
        """
        predictions, targets = self(batch)
        if self.dataset is not None:
            predictions = self.dataset.pca_reconstruction(predictions)
        loss = self.entropy_loss(predictions, targets)
        return loss, {'loss': loss}

    def entropy_loss(self, predictions, targets):
        # entropy loss for attributes ['danceability' (0), 'energy' (1),
        #           'speechiness' (3), 'acousticness' (4), 'valence' (5)];
        #           'loudness' (2) is handled by the MSE term below
        loss = 0
        for attr in [0, 1, 3, 4, 5]:
            attr_loss = self.bce(predictions[:,:,attr], targets[:,:,attr])
            loss += abs(attr_loss).sum(axis=1).mean()
        # mse loss for attributes (loudness 2)
        for attr in [2]:
            attr_loss = self.mse(predictions[:,:,attr], targets[:,:,attr])
            loss += attr_loss.sum(axis=1).mean()
        return loss
    
    def training_step(self, batch, batch_idx):
        loss, logs = self.step(batch, batch_idx)
        self.log_dict({f'train_{k}': v for k, v in logs.items()},
                      on_step=True, on_epoch=True, sync_dist=True)
        return loss
    
    def validation_step(self, batch, batch_idx):
        loss, logs = self.step(batch, batch_idx)
        self.log_dict({f'val_{k}': v for k, v in logs.items()}, sync_dist=True)
        return loss
    
    def test_step(self, batch, batch_idx):
        mood_states, lengths = batch
        bs = mood_states.size(0)
        mood_states = self.embedding(mood_states)
        moods = mood_states.view(bs, (self.maxlen * 5), self.embed_dim)
        
        sorted_lengths, indicies = lengths.sort(dim=0, descending=True)
        moods = moods[indicies]
        h, c = self.init_hidden_states(moods.view(bs, -1))  # (bs x output_dim)

        predictions = torch.zeros(bs, max(lengths), self.output_dim)
        for timestep in range(max(lengths)):
            num_predict = sum([l > timestep for l in lengths])
            attention_weighted_moods, alphas = self.attention(moods[:num_predict], 
                                                      h[:num_predict])
            gate = self.sigmoid(self.forget(h[:num_predict]))
            weighted_moods = gate * attention_weighted_moods
            h, c = self.lstm(weighted_moods, 
                             (h[:num_predict], c[:num_predict]))
            preds = self.fc(self.dropout(h))
            
            predictions[:num_predict, timestep, :] = preds
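        # apply the same squashing as in `forward` so that the bounded
        # attributes lie in (0, 1) at inference time as well
        return self.sigmoid_relevant(predictions)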
    
    def configure_optimizers(self):
        """
        Configuration of the optimizer used to train the model.
        This method is implicitly called by torch lightning during training. 
        Note that the learning rate and weight decay are given by the initialization
        parameters of the model. 
        """
        return (optim.Adam(self.parameters(), lr=self.lr,
                         weight_decay=self.weight_decay))
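
The mixed loss computed by `entropy_loss` can be illustrated for a single playlist in plain Python: binary cross-entropy on the attributes that live in [0, 1], and squared error on loudness. The attribute indices follow the comments in `entropy_loss`; the function name `mixed_loss`, the clamping constant `EPS`, and the toy values are assumptions for this sketch, which also omits the batch averaging done by the model.

```python
import math

BOUNDED = [0, 1, 3, 4, 5]  # attributes scored with binary cross-entropy
UNBOUNDED = [2]            # loudness, scored with squared error
EPS = 1e-7                 # keeps log() away from 0 (assumption of this sketch)


def mixed_loss(predictions, targets):
    """Sum BCE and squared-error terms over every timestep of one playlist."""
    loss = 0.0
    for pred, target in zip(predictions, targets):
        for attr in BOUNDED:
            p = min(max(pred[attr], EPS), 1.0 - EPS)
            y = target[attr]
            loss += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
        for attr in UNBOUNDED:
            loss += (pred[attr] - target[attr]) ** 2
    return loss


pred = [[0.5, 0.5, -10.0, 0.5, 0.5, 0.5]]    # one predicted song
target = [[0.5, 0.5, -12.0, 0.5, 0.5, 0.5]]  # its target features
loss = mixed_loss(pred, target)
```

Note that binary cross-entropy against a soft target does not reach zero even for a perfect prediction; in the example, each bounded attribute contributes ln 2 although the prediction equals the target.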

# Training and Evaluation
This section is simplified by the PyTorch Lightning interface. Training parameters, including the number of epochs and checkpointing, are parameterized in the initialization of the `Trainer` object. We note here that there is no validation set, as there are too few data points. Therefore, all the data is used for training.

In [17]:
tokenizer.itos

{0: 'affectionate',
 1: 'agitated',
 2: 'amused',
 3: 'angry',
 4: 'animated',
 5: 'anxious',
 6: 'calm',
 7: 'connected',
 8: 'dreamy',
 9: 'energetic',
 10: 'energized',
 11: 'happy',
 12: 'hopeful',
 13: 'irritated',
 14: 'joyful',
 15: 'lonely',
 16: 'meditative',
 17: 'melancholic',
 18: 'motivated',
 19: 'nervous',
 20: 'powerful',
 21: 'relaxed',
 22: 'sad',
 23: 'serene',
 24: 'sluggish',
 25: 'soothed',
 26: 'tender',
 27: 'tenderness',
 28: 'tense',
 29: 'triumphant',
 30: '<sos>',
 31: '<eos>',
 32: '<pad>'}

In [18]:
transform = Compose([
#     FeatureProtuberance(0.10, 0.5),
    Reverse(0.3)
])

In [19]:
BATCHSIZE = 64
STEPS_P_EPOCH = 2000
EPOCHS = 150 

iso = ISODataset(df.copy(), transform=transform,
                 batch_size=STEPS_P_EPOCH)
# eigs, out_dim = iso.pca_reduction(percent_var=0.98)
train_loader = DataLoader(iso,
                          batch_size=BATCHSIZE,
                          collate_fn=iso_collate)
model = Model(tokenizer, embed_dim=64, hidden_dim=256, 
              dropout=0.1, lr=1e-3, weight_decay=1e-9) # , output_dim=out_dim,
              # dataset=iso)
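
Because `ISODataset.__len__` returns `max(len(df), batch_size)` and `__getitem__` wraps the index with a modulo, passing `batch_size=STEPS_P_EPOCH` makes a small data frame behave like a much longer dataset. The arithmetic can be sketched as follows; the helper names are hypothetical, and the row count of 10 is an assumption based on the rows displayed earlier.

```python
import math


def batches_per_epoch(n_rows, steps_per_epoch, batch_size):
    """Number of batches a DataLoader yields per epoch (drop_last=False)."""
    virtual_len = max(n_rows, steps_per_epoch)
    return math.ceil(virtual_len / batch_size)


def row_for(idx, n_rows):
    """Map a virtual dataset index back to a real data frame row."""
    return idx % n_rows


n_batches = batches_per_epoch(n_rows=10, steps_per_epoch=2000, batch_size=64)
last_row = row_for(1999, n_rows=10)
```

Under these assumptions each epoch revisits every row many times, with the random augmentations providing the only variation between repeats.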

In [20]:
trainer = pl.Trainer(max_epochs=EPOCHS)
trainer.fit(model, train_loader)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores

   | Name      | Type      | Params
-----------------------------------------
0  | bce       | BCELoss   | 0     
1  | mse       | MSELoss   | 0     
2  | embedding | Embedding | 2.1 K 
3  | attention | Attention | 16.2 K
4  | h0        | Linear    | 409 K 
5  | c0        | Linear    | 409 K 
6  | dropout   | Dropout   | 0     
7  | lstm      | LSTMCell  | 329 K 
8  | sigmoid   | Sigmoid   | 0     
9  | forget    | Linear    | 16.4 K
10 | fc        | Linear    | 1.5 K 
-----------------------------------------
1.2 M     Trainable params
0         Non-trainable params
1.2 M     Total params
4.743     Total estimated model params size (MB)




In [140]:
n = datetime.datetime.today()
torch.save(model.state_dict(), f'models/modified-loss-{n.month}-{n.day}.pth')

# Inference
We proceed to test our model through inference, just to see how well it performs. I note here that with Spotify, the model was able to generate a somewhat logical 10-song playlist for sad->happy. However, there were many songs that were repeated.

I suspect this to be a result of using the same seed song/artist/genre for all of the predictions. Therefore, in the next iteration, we may vary the seeds between songs, and also add some random perturbation.

Below are the playlists that were generated on the last iteration:
```
1  : ghostin by Ariana Grande
2  : The Rose Song - From "High School Musical: The Musical: The Series (Season 2)" by Olivia Rodrigo
3  : Lose You To Love Me by Selena Gomez
4  : Cherry Wine - Live by Hozier
5  : Lose You To Love Me by Selena Gomez
6  : Lose You To Love Me by Selena Gomez
7  : Flashlight - From "Pitch Perfect 2" Soundtrack by Jessie J
8  : Starving by Hailee Steinfeld
9  : Young & Alive by Bazzi
10 : Fake by Lauv
```

An example of a longer playlist is also shown below:
```
1  : Please Notice by Christian Leave
2  : The Rose Song - From "High School Musical: The Musical: The Series (Season 2)" by Olivia Rodrigo
3  : Even When/The Best Part - From "High School Musical: The Musical: The Series (Season 2)" by Olivia Rodrigo
4  : Even When/The Best Part - From "High School Musical: The Musical: The Series (Season 2)" by Olivia Rodrigo
5  : Lose You To Love Me by Selena Gomez
6  : Lose You To Love Me by Selena Gomez
7  : Too Good At Goodbyes by Sam Smith
8  : Prom Queen by Beach Bunny
9  : Vibez by ZAYN
10 : Lose You To Love Me by Selena Gomez
11 : Bed Peace by Jhené Aiko
12 : Starving by Hailee Steinfeld
13 : Hot Stuff by Kygo
14 : sweetener by Ariana Grande
15 : Hot Stuff by Kygo
16 : Even When/The Best Part - From "High School Musical: The Musical: The Series (Season 2)" by Olivia Rodrigo
17 : Bed Peace by Jhené Aiko
18 : Spaceman by Nick Jonas
19 : Castaways by The Backyardigans
20 : One by Ed Sheeran
21 : Too Good At Goodbyes by Sam Smith
22 : Castaways by The Backyardigans
23 : happier by Olivia Rodrigo
24 : Too Good At Goodbyes by Sam Smith
25 : Funny by Zedd
26 : Wondering - From "High School Musical: The Musical: The Series" by Olivia Rodrigo
27 : The Rose Song - From "High School Musical: The Musical: The Series (Season 2)" by Olivia Rodrigo
28 : Castaways by The Backyardigans
29 : Bed Peace by Jhené Aiko
30 : Funny by Zedd
```
Note the number of repeated songs. This could be a result of the closeness of many of the features.

We begin by "sanity-checking" the model by giving it some easy mood transitions (low to high, high to high). We also ask the model to generate a playlist with a random number of songs between 10 and 20.

**We load the state from the previously trained model.**

In [27]:
model = Model(tokenizer=tokenizer, embed_dim=64, hidden_dim=256, 
              dropout=0.2, lr=1e-3, weight_decay=1e-6, )
model.load_state_dict(torch.load('models/baseline-8-3.pth'))

<All keys matched successfully>

In [28]:
moods = [[['sad', 'lonely'], ['serene', 'relaxed'], ['powerful', 'energetic']],
         [['agitated', 'angry'], ['soothed', 'serene', 'relaxed']],
         [['irritated', 'nervous'], ['meditative', 'tender', 'melancholic']],
         [['joyful', 'dreamy', 'motivated'], ['motivated', 'powerful', 'happy']]]

In [29]:
mood_states = [json.dumps(tokenizer.moods_to_token(v)) for v in moods]
rows = [[v, random.randint(10, 20)] for v in mood_states]

In [30]:
dtf = pd.DataFrame(rows, columns=['moods_states', 'length'])
dtf

Unnamed: 0,moods_states,length
0,"[[22, 15], [23, 21], [20, 9]]",10
1,"[[1, 3], [25, 23, 21]]",15
2,"[[13, 19], [16, 26, 17]]",12
3,"[[14, 8, 18], [18, 20, 11]]",14


In [31]:
test = TestDataset(dtf.copy(), transform=transform,
                 batch_size=1)
test_loader = DataLoader(test,
                          batch_size=1,
                          collate_fn=test_collate)

We now generate the playlist associated with each mood; here, we should probably consider generating the playlists in batches rather than one at a time.

```
1  : Imagine by Jack Johnson
2  : Almost Is Never Enough by Ariana Grande
3  : Solar Power by Lorde
4  : I Will Follow You into the Dark by Death Cab for Cutie
5  : Welcome Home, Son by Radical Face
6  : She Is Love by Parachute
7  : Welcome Home, Son by Radical Face
8  : Flashlight - From "Pitch Perfect 2" Soundtrack by Jessie J
9  : sweetener by Ariana Grande
10 : Flashlight - From "Pitch Perfect 2" Soundtrack by Jessie J
11 : Little Lion Man by Mumford & Sons
12 : 1999 WILDFIRE by BROCKHAMPTON
13 : Old Friends by Pinegrove
14 : Lose You To Love Me by Selena Gomez
15 : Little Lion Man by Mumford & Sons
16 : Too Good At Goodbyes by Sam Smith
17 : Lose You To Love Me by Selena Gomez
18 : I Will Follow You into the Dark by Death Cab for Cutie
19 : Old Pine by Ben Howard
```

In [32]:
model.eval()
results = []
for batch_idx, batch in enumerate(test_loader):
    results.append(model.test_step(batch, batch_idx))

Convert each song and each respective playlist into a list.

In [33]:
results = [v.detach().numpy().tolist() for v in results]

## Get Names of Songs
Now we query the Spotify API using the `get_recommendations` endpoint.

In [35]:
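# Read the token from a local file; strip the trailing newline so the header stays valid
with open('api') as token_file:
    API_TOKEN = token_file.readline().strip()
SOAR_ID = "etoj1vywg8pvjmuxgovg6l9kb"
headers = {"Authorization": f"Bearer {API_TOKEN}"}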

In [49]:
def get_recommendation(feature, artists, tracks, genres,
                       playlist, order, lock, beam_width):
    url = "https://api.spotify.com/v1/recommendations"
    feature_names = ["acousticness", "danceability", "energy", "instrumentalness",
                     "key", "liveness", "loudness", "mode", "speechiness", "tempo", "valence"]
    # map each predicted feature value onto its target_* query parameter
    params = {f'target_{name}': value
              for name, value in zip(feature_names, feature)}
    # round the fields that are sent as integers
    for name in ('target_key', 'target_mode', 'target_tempo'):
        params[name] = round(params[name])
    # seed artists, tracks, and genres (comma-separated strings)
    params['seed_artists'] = artists
    params['seed_tracks'] = tracks
    params['seed_genres'] = genres
    # request at least beam_width candidates
    params['limit'] = max(10, beam_width)
    # keep the ids of the first beam_width recommended tracks
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    recommended = response.json()['tracks'][:beam_width]
    track_ids = [track['id'] for track in recommended]
    # resolve names before taking the lock so the requests are not serialized
    names = get_track_name(','.join(track_ids))
    # store the candidates at this song's position in the shared dict
    with lock:
        playlist[order] = names
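The request-building step above can be separated from the network call so that it is easy to inspect and test. The following is a minimal sketch under that assumption; `build_recommendation_params`, `FEATURE_NAMES`, and `INTEGER_FIELDS` are hypothetical names introduced here for illustration, and the example feature values are made up.

```python
# Feature order assumed to match the model's output vector, as in get_recommendation.
FEATURE_NAMES = ["acousticness", "danceability", "energy", "instrumentalness",
                 "key", "liveness", "loudness", "mode", "speechiness",
                 "tempo", "valence"]
# Fields that the notebook rounds before sending.
INTEGER_FIELDS = ("key", "mode", "tempo")


def build_recommendation_params(feature, artists, tracks, genres, limit=10):
    """Map one predicted feature vector onto `target_*` query parameters."""
    if len(feature) != len(FEATURE_NAMES):
        raise ValueError(
            f"expected {len(FEATURE_NAMES)} features, got {len(feature)}")
    params = {f"target_{name}": float(value)
              for name, value in zip(FEATURE_NAMES, feature)}
    for name in INTEGER_FIELDS:
        params[f"target_{name}"] = round(params[f"target_{name}"])
    # Seeds are passed as comma-separated strings, as in the notebook.
    params.update(seed_artists=artists, seed_tracks=tracks,
                  seed_genres=genres, limit=limit)
    return params


# Illustrative (made-up) feature vector for a single song.
example = [0.5, 0.6, 0.7, 0.0, 4.6, 0.1, -7.2, 0.8, 0.05, 119.6, 0.4]
params = build_recommendation_params(
    example, "66CXWjxzNUsdJxJ2JdwvnR", "463CkQjx2Zk1yXoBuierM9",
    "pop,acoustic,indie")
print(params["target_key"], params["target_mode"], params["target_tempo"])
```

Keeping this step free of I/O means a malformed feature vector fails before any request is sent.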

In [55]:
def get_track_name(ids):
    # `ids` is a comma-separated string of Spotify track ids
    params = {'ids': ids}
    req = requests.get('https://api.spotify.com/v1/tracks',
                       headers=headers, params=params).json()
    # return (track name, first listed artist) for every track
    return [(track['name'], track['artists'][0]['name'])
            for track in req['tracks']]

In [58]:
for result in results:
    result = result[0]
    playlist = {}

    # Create threads
    artists = '66CXWjxzNUsdJxJ2JdwvnR'  # ariana grande
    seed_songs = '463CkQjx2Zk1yXoBuierM9'  # levitating
    threads = []
    lock = threading.Lock()
    for index, song in enumerate(result):
        threads.append(threading.Thread(target=get_recommendation,
                                        args=(song, artists, seed_songs, 
                                              'pop,acoustic,indie', playlist,
                                              index, lock, 5)))
    # Start threads
    for thread in threads:
        thread.start()

    # Join threads
    for thread in threads:
        thread.join()
    od = collections.OrderedDict(sorted(playlist.items()))
    for k, v in od.items():
        print(f'{str(k+1).ljust(3)}: {v}')
    break

1  : [('You Belong With Me (Taylor’s Version)', 'Taylor Swift'), ('Welcome Home, Son', 'Radical Face'), ('Arms', 'The Paper Kites'), ('Honey Jars', 'Bryan John Appleby'), ('Imagine', 'Jack Johnson')]
2  : [('Almost Is Never Enough', 'Ariana Grande'), ('Goodmorning', 'Bleachers'), ('Canyon Moon', 'Harry Styles'), ('Furr', 'Blitzen Trapper'), ('Hey There Delilah', "Plain White T's")]
3  : [('Starving', 'Hailee Steinfeld'), ('33 “GOD”', 'Bon Iver'), ('Girl Crush - Recorded at Metropolis Studios, London', 'Harry Styles'), ('Solar Power', 'Lorde'), ('Things That Stop You Dreaming', 'Passenger')]
4  : [('Too Good At Goodbyes', 'Sam Smith'), ('Master Of None', 'Beach House'), ('Funny', 'Zedd'), ('I Will Follow You into the Dark', 'Death Cab for Cutie'), ('Wrecking Ball', 'Miley Cyrus')]
5  : [('Coming Home', 'Leon Bridges'), ('You Belong With Me (Taylor’s Version)', 'Taylor Swift'), ('Love To Dream', 'Doja Cat'), ('Without Me', 'Halsey'), ('drivers license', 'Olivia Rodrigo')]
6  : [('Shimmer
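
The thread-per-song pattern above keeps results in playlist order by writing into a shared dict under a lock and sorting afterwards. One alternative, sketched here rather than taken from the notebook, is `concurrent.futures.ThreadPoolExecutor`, whose `map` returns results in input order and caps the number of simultaneous requests. `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real network call.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(songs, fetch, max_workers=8):
    """Run `fetch(song)` concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, songs))


def fake_fetch(song):
    # Stand-in for a call such as get_recommendation; no network access here.
    return [f"track-for-{song[0]:.1f}"]


playlist = fetch_all([[0.1], [0.2], [0.3]], fake_fetch)
for position, candidates in enumerate(playlist, start=1):
    print(f"{str(position).ljust(3)}: {candidates}")
```

With this approach no explicit lock or sorting step is needed, because each result is tied to the position of its input.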

# Post-Processing
We now implement post-processing that selects the optimal playlist from the many suggestions returned by Spotify. Each candidate song is scored on two factors: uniqueness and the sentiment similarity of its lyrics. The objective function being minimized is:
$$
L(x) = \text{count}\cdot\text{BCE}(\text{valence}, \text{sentiment}),
$$
where BCE denotes the binary cross-entropy function and count is the number of songs already in the playlist that share the candidate's title.

In [63]:
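bce = torch.nn.BCELoss()

def loss(existing_list, song, valence, sentiment):
    # number of already-selected songs with the same title as the candidate
    count = sum(1 for exists in existing_list if exists[0] == song[0])
    # BCELoss expects both values to lie in [0, 1]
    return count * bce(torch.tensor([valence], dtype=torch.float32),
                       torch.tensor([sentiment], dtype=torch.float32))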
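
To show how this loss could drive the selection step, here is a minimal pure-Python sketch that picks the lowest-loss candidate for one playlist slot. It is an illustration under stated assumptions, not the notebook's implementation: `binary_cross_entropy`, `duplicate_weighted_loss`, and `pick_song` are hypothetical names, the `eps` clamp is a simplification of what `BCELoss` does internally, and the valence and sentiment numbers are made up.

```python
import math


def binary_cross_entropy(prediction, target, eps=1e-7):
    """Scalar BCE; the prediction is clamped away from 0 and 1 (assumed eps)."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def duplicate_weighted_loss(existing, song, valence, sentiment):
    """count * BCE(valence, sentiment), with count = title matches so far."""
    count = sum(1 for chosen in existing if chosen[0] == song[0])
    return count * binary_cross_entropy(valence, sentiment)


def pick_song(existing, candidates):
    """Return the (title, artist) pair with the smallest loss.

    `candidates` is a list of (song, valence, sentiment) triples.
    """
    best = min(candidates,
               key=lambda c: duplicate_weighted_loss(existing, c[0], c[1], c[2]))
    return best[0]


existing = [("Happier", "Ed Sheeran")]
candidates = [
    (("Happier", "Ed Sheeran"), 0.5, 0.5),  # already in the playlist
    (("Praying", "Kesha"), 0.3, 0.9),       # not yet in the playlist
]
print(pick_song(existing, candidates))
```

Because the loss is multiplied by the duplicate count, any song not yet in the playlist scores zero, so under this formula the sentiment term only separates candidates that are already duplicates.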

In [57]:
od

OrderedDict([(0,
              [('Trouble', 'Cage The Elephant'),
               ('Hello My Old Heart', 'The Oh Hellos'),
               ('Happier', 'Ed Sheeran'),
               ('off the table (with The Weeknd)', 'Ariana Grande'),
               ('Stay With Me', 'Sam Smith')]),
             (1,
              [('Happier', 'Ed Sheeran'),
               ('Too Good At Goodbyes', 'Sam Smith'),
               ('Praying', 'Kesha'),
               ('Solar Power', 'Lorde'),
               ('Be Kind (with Halsey)', 'Marshmello')]),
             (2,
              [('One', 'Ed Sheeran'),
               ('Lose You To Love Me', 'Selena Gomez'),
               ('Too Good At Goodbyes', 'Sam Smith'),
               ('Crash Into Me', 'Boyce Avenue'),
               ("I Won't Give Up", 'Jason Mraz')]),
             (3,
              [('First Day Of My Life', 'Bright Eyes'),
               ('Crash Into Me', 'Boyce Avenue'),
               ('33 “GOD”', 'Bon Iver'),
               ('Things That Stop You D