# Polyphnic Music Dataset

## Overview

We evaluate temporal convolutional network (TCN) on two popular polyphonic music datasets. The goal here is to predict the next note given some history of the notes played.

**NOTE**: 
- Each sequence can have a different length. In the current implementation, we simply train each sequence separately (i.e. batch size is 1), but one can zero-pad all sequences to the same length and train by batch.

- While each data is binary, the fact that there are 88 dimensions (for 88 keys) means there are essentially `2^88` "classes". Therefore, instead of directly predicting each key directly, we follow the standard practice so that a sigmoid is added at the end of the network. This ensures that every entry is converted to a value between 0 and 1 to compute the NLL loss.

## Settings

In [1]:
import torch as th
import torch.nn as nn
import os
import torch.nn.functional as F
from tqdm.notebook import tqdm

DEVICE = "cuda:0"
DROPOUT = 0.25
CLIP = 0.2
EPOCHS = 10
KSIZE = 5
LEVELS = 4
LR = 1e-3
OPTIM = "Adam"
NHID = 150

DATASET = "JSB" # JSB, Muse, Nott, Piano
DATA_ROOT = "/home/densechen/dataset/mdata"

SEED = 1111
INPUT_SIZE = 88
CHANNEL_SIZES = [NHID] * LEVELS

th.manual_seed(SEED)

<torch._C.Generator at 0x7fede81a8950>

## Data Generation

**JSB Chorales** dataset (Allan & Williams, 2005) is a polyphonic music dataset con-
sisting of the entire corpus of 382 four-part harmonized chorales by J. S. Bach. In a polyphonic
music dataset, each input is a sequence of elements having 88 dimensions, representing the 88 keys
on a piano. Therefore, each element `x_t` is a chord written in as binary vector, in which a “1” indicates
a key pressed.

**Nottingham** dataset is a collection of 1200 British and American folk tunes. Not-
tingham is a much larger dataset than JSB Chorales. Along with JSB Chorales, Nottingham has
been used in a number of works that investigated recurrent models’ applicability in polyphonic mu-
sic, and the performance for both tasks are measured in terms
of negative log-likelihood (NLL) loss.


In [2]:
from scipy.io import loadmat
import numpy as np

def data_generator():
    data = loadmat(
        {"JSB": os.path.join(DATA_ROOT, "JSB_Chorales.mat"),
         "Muse": os.path.join(DATA_ROOT, "MuseData.mat"),
         "Nott": os.path.join(DATA_ROOT, "Nottingham.mat"),
         "Piano": os.path.join(DATA_ROOT, "Piano_midi.mat")}[DATASET])
    x_train = data["traindata"][0]
    x_valid = data["validdata"][0]
    x_test = data["testdata"][0]
    
    for data in [x_train, x_valid, x_test]:
        for i in range(len(data)):
            data[i] = th.Tensor(data[i].astype(np.float64))
    
    return x_train, x_valid, x_test

print("Producing data...")
x_train, x_valid, x_test = data_generator()
print("Finished.")

Producing data...
Finished.


## Define Model

In [3]:
from core.tcn import TemporalConvNet

class TCN(nn.Module):
    def __init__(self, input_size, output_size, num_channels, kernel_size, dropout):
        super().__init__()
        
        self.tcn = TemporalConvNet(input_size, num_channels, kernel_size, dropout=dropout)
        
        self.linear = nn.Linear(num_channels[-1], output_size)
        self.sig = nn.Sigmoid()
    
    def forward(self, x):
        # x needs to have dimension [N, C, L]
        output = self.tcn(x.transpose(1, 2)).transpose(1, 2)
        output = self.linear(output).double()
        return self.sig(output)

print("Building model...")

model = TCN(INPUT_SIZE, INPUT_SIZE, CHANNEL_SIZES, KSIZE, dropout=DROPOUT)

model = model.to(DEVICE)

optimizer = getattr(th.optim, OPTIM)(model.parameters(), lr=LR)
print("Finished.")

Building model...
Finished.


## Run

In [6]:
def evaluate(x_data, name='Eval'):
    model.eval()
    eval_idx_list = np.arange(len(x_data), dtype="int32")
    total_loss = 0.0
    count = 0
    with th.no_grad():
        for idx in eval_idx_list:
            data_line = x_data[idx]
            x, y = data_line[:-1], data_line[1:]
            x, y = x.to(DEVICE), y.to(DEVICE)
            output = model(x.unsqueeze(0)).squeeze(0)
            loss = -th.trace(th.matmul(y, th.log(output).float().t()) +
                             th.matmul((1-y), th.log(1-output).float().t()))
            total_loss += loss.item()
            count += output.size(0)
    eval_loss = total_loss / count
    print(name + " loss: {:.5f}".format(eval_loss))

def train(ep):
    model.train()
    train_idx_list = np.arange(len(x_train), dtype="int32")
    np.random.shuffle(train_idx_list)
    process = tqdm(train_idx_list)
    
    for idx in process:
        data_line = x_train[idx]
        x, y = data_line[:-1], data_line[1:]
        x, y = x.to(DEVICE), y.to(DEVICE)

        optimizer.zero_grad()
        output = model(x.unsqueeze(0)).squeeze(0)
        loss = -th.trace(th.matmul(y, th.log(output).float().t()) +
                         th.matmul((1 - y), th.log(1 - output).float().t()))
        if CLIP > 0:
            th.nn.utils.clip_grad_norm_(model.parameters(), CLIP)
        loss.backward()
        optimizer.step()
        process.set_description(f"Train Epoch: {ep:2d}, loss: {loss.item():.6f}")

for ep in range(1, EPOCHS+1):
    train(ep)
    vloss = evaluate(x_valid, name='Validation')
    tloss = evaluate(x_test, name='Test')

  0%|          | 0/229 [00:00<?, ?it/s]

Validation loss: 10.53107
Test loss: 10.63705


  0%|          | 0/229 [00:00<?, ?it/s]

Validation loss: 9.53949
Test loss: 9.66112


  0%|          | 0/229 [00:00<?, ?it/s]

Validation loss: 9.26661
Test loss: 9.36112


  0%|          | 0/229 [00:00<?, ?it/s]

Validation loss: 9.04405
Test loss: 9.13224


  0%|          | 0/229 [00:00<?, ?it/s]

Validation loss: 8.91696
Test loss: 9.00111


  0%|          | 0/229 [00:00<?, ?it/s]

Validation loss: 8.80907
Test loss: 8.88915


  0%|          | 0/229 [00:00<?, ?it/s]

Validation loss: 8.73970
Test loss: 8.82649


  0%|          | 0/229 [00:00<?, ?it/s]

Validation loss: 8.66689
Test loss: 8.73835


  0%|          | 0/229 [00:00<?, ?it/s]

Validation loss: 8.61581
Test loss: 8.69541


  0%|          | 0/229 [00:00<?, ?it/s]

Validation loss: 8.61180
Test loss: 8.66965
