# Validation of Amadeus

This notebook compares the perplexity of quantized and unquantized instances of a trained Performance RNN model in order to validate the compressed model.

## Perplexity

As described in the [Wikipedia article](https://en.wikipedia.org/wiki/Perplexity), perplexity is a "measurement of how well a probability distribution or probability model predicts a sample." If \\(T\\) is a test set of \\(N\\) samples \\(x_1, \ldots, x_N\\) and \\(q\\) is the distribution predicted by the model, then the empirical perplexity is given by

\\[
2^{H(T, q)}
\\]

where

\\[
H(T, q) = -\frac{1}{N} \sum_{i=1}^N \log_2 q(x_i)
\\]

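is the empirical cross-entropy on the test set \\(T\\). A lower perplexity is better: it means that the model assigns higher probability to the test samples. The base of the logarithm does not affect the result as long as the exponentiation uses the same base, so the code in this notebook works with natural logarithms and \\(e\\) instead.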
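
As a minimal sketch of the definition above, the following pure-Python snippet computes the empirical perplexity from the probabilities a model assigns to each test sample. The function name `perplexity` and the toy probabilities are illustrative assumptions and are not part of Amadeus.

```python
import math

def perplexity(probs, base=2.0):
    """Empirical perplexity of a model that assigned probability probs[i]
    to the i-th test sample (hypothetical helper, for illustration only)."""
    n = len(probs)
    # Empirical cross-entropy H(T, q) = -(1/N) * sum_i log_base q(x_i).
    cross_entropy = -sum(math.log(p, base) for p in probs) / n
    return base ** cross_entropy

# Toy example: a model that assigns probability 1/8 to every sample.
uniform = [0.125] * 4
print(perplexity(uniform))          # logarithm and exponent in base 2
print(perplexity(uniform, math.e))  # logarithm and exponent in base e
```

Both calls return a perplexity of 8 up to floating-point rounding: a model that spreads its probability uniformly over 8 outcomes is "as confused" as a fair 8-sided die, regardless of the base used.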

## Load Model

The following code loads a trained instance of Performance RNN and computes a quantized instance of the same model. It requires that the trained weights in `ecomp_w500.sess` are located in the `save/` folder and that the quantization calibration statistics are stored in a file `performance_rnn_pretrained_stats.yaml` in the `stats/` folder. It also assumes that the [MAESTRO dataset](https://magenta.tensorflow.org/datasets/maestro) has been downloaded and preprocessed (see the [Usage section of the README](https://github.com/axiom-of-joy/amadeus#usage) for preprocessing instructions).
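
Before running the cells below, it may help to confirm that these files and folders are in place. The following is a small sketch using only the standard library; the helper name `missing_paths` is hypothetical, and the paths are the ones used in this notebook, assumed to be relative to the directory the notebook is run from.

```python
from pathlib import Path

# Paths used in this notebook, assumed relative to the working directory.
REQUIRED_PATHS = [
    "save/ecomp_w500.sess",
    "stats/performance_rnn_pretrained_stats.yaml",
    "dataset/processed/maestro",
]

def missing_paths(paths, root="."):
    """Return the entries of `paths` that do not exist under `root`."""
    return [p for p in paths if not (Path(root) / p).exists()]

missing = missing_paths(REQUIRED_PATHS)
if missing:
    print("Missing required files or folders:", ", ".join(missing))
```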

In [1]:
import numpy as np
import numpy.testing as nptest
import torch
from torch import nn
import torch.nn.functional as F
from tqdm import tqdm
import distiller
from distiller.modules.gru import convert_model_to_distiller_gru
import config
import utils  # provides utils.transposition, used in compute_perplexity below
from data import Dataset
from model import PerformanceRNN
from quantize import Quantizer
from sequence import EventSeq

In [2]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
sess_path = "save/ecomp_w500.sess"
#stats_file = "stats/quant_stats.yaml"
stats_file = "stats/performance_rnn_pretrained_stats.yaml"

# Create an instance of Performance RNN and load pre-trained weights.
state = torch.load(sess_path, map_location=device)
model = PerformanceRNN(**state['model_config']).to(device)
model.load_state_dict(state['model_state'])

# Obtain the quantized model.
Q = Quantizer(model)
quant_model = Q.quantize(stats_file).model

We next call `eval` on both `model` and `quant_model` to disable dropout. Note that the GRU in `model` is the usual `torch.nn.modules.GRU`, while the GRU in `quant_model` is an instance of `DistillerGRU`, the quantized GRU class I created in my forked Distiller repository. Also note that the linear and embedding layers of the original model have been wrapped by Distiller. With `RangeLinearQuantParamLayerWrapper`, Distiller takes care of quantizing the inputs to a layer and de-quantizing its outputs; in the printout of `quant_model` below, the wrapped layers appear as `FP16Wrapper` instances.

In [3]:
model.eval()

PerformanceRNN(
  (inithid_fc): Linear(in_features=32, out_features=1536, bias=True)
  (inithid_fc_activation): Tanh()
  (event_embedding): Embedding(240, 240)
  (concat_input_fc): Linear(in_features=265, out_features=512, bias=True)
  (concat_input_fc_activation): LeakyReLU(negative_slope=0.1, inplace)
  (gru): GRU(512, 512, num_layers=3, dropout=0.3)
  (output_fc): Linear(in_features=1536, out_features=240, bias=True)
  (output_fc_activation): Softmax()
)

In [4]:
quant_model.eval()

PerformanceRNN(
  (inithid_fc): FP16Wrapper(
    (wrapped_module): Linear(in_features=32, out_features=1536, bias=True)
  )
  (inithid_fc_activation): Tanh()
  (event_embedding): FP16Wrapper(
    (wrapped_module): Embedding(240, 240)
  )
  (concat_input_fc): FP16Wrapper(
    (wrapped_module): Linear(in_features=265, out_features=512, bias=True)
  )
  (concat_input_fc_activation): LeakyReLU(negative_slope=0.1, inplace)
  (gru): DistillerGRU(512, 512, num_layers=3, dropout=0.30, bidirectional=False)
  (output_fc): FP16Wrapper(
    (wrapped_module): Linear(in_features=1536, out_features=240, bias=True)
  )
  (output_fc_activation): Softmax()
)

## Compute Model Size

The following cells list the submodules of the original Performance RNN model, count the parameters in its output layer, and then count the parameters in the whole model. The quantized model contains the same number of parameters, but each parameter is an 8-bit int rather than a 32-bit float.

In [5]:
model._modules

OrderedDict([('inithid_fc',
              Linear(in_features=32, out_features=1536, bias=True)),
             ('inithid_fc_activation', Tanh()),
             ('event_embedding', Embedding(240, 240)),
             ('concat_input_fc',
              Linear(in_features=265, out_features=512, bias=True)),
             ('concat_input_fc_activation',
              LeakyReLU(negative_slope=0.1, inplace)),
             ('gru', GRU(512, 512, num_layers=3, dropout=0.3)),
             ('output_fc',
              Linear(in_features=1536, out_features=240, bias=True)),
             ('output_fc_activation', Softmax())])

In [6]:
sigma = 0
for p in model._modules['output_fc'].parameters():
    sigma += p.numel()
print(sigma)

368880


In [7]:
sum(p.numel() for p in model.parameters())

5341168
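
The two totals above can be cross-checked by hand from the layer shapes in the printed model. The sketch below does that arithmetic in plain Python and converts the total into a rough storage estimate. The helper names are hypothetical, and the estimate takes the 8-bit figure stated above at face value, assuming 4 bytes per 32-bit float and 1 byte per 8-bit int and ignoring any quantization metadata such as scale factors.

```python
def linear_params(in_features, out_features):
    """Weights plus bias of a fully connected layer."""
    return in_features * out_features + out_features

def gru_params(input_size, hidden_size, num_layers):
    """Parameter count of a unidirectional multi-layer GRU with biases."""
    total = 0
    for layer in range(num_layers):
        layer_input = input_size if layer == 0 else hidden_size
        # Three gates, each with input-hidden weights, hidden-hidden
        # weights, and two bias vectors (PyTorch's GRU parameterization).
        total += 3 * hidden_size * (layer_input + hidden_size + 2)
    return total

# Layer shapes copied from the printed model above.
counts = {
    "inithid_fc": linear_params(32, 1536),
    "event_embedding": 240 * 240,
    "concat_input_fc": linear_params(265, 512),
    "gru": gru_params(512, 512, 3),
    "output_fc": linear_params(1536, 240),
}
total = sum(counts.values())
print(counts["output_fc"], total)

# Rough storage estimate under the assumptions stated above.
print(f"float32: {total * 4 / 1e6:.1f} MB, int8: {total / 1e6:.1f} MB")
```

The computed counts match the 368880 and 5341168 reported by the cells above, and most of the parameters (about 4.7 million) sit in the three GRU layers.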

## Evaluate Perplexity

In [8]:
# Load test set (MAESTRO dataset was not used for training or validation).
data_path = "dataset/processed/maestro"
dataset = Dataset(data_path, verbose=True)
dataset_size = len(dataset.samples)
assert dataset_size > 0

# Define parameters for generation.
# (Same as for collecting quantization calibration statistics.)
#controls = None
event_dim = EventSeq.dim()
num_iters = 100
batch_size = config.collect_quant_stats['batch_size']
window_size = config.collect_quant_stats['window_size']
stride_size = config.collect_quant_stats['stride_size']
use_transposition = config.collect_quant_stats['use_transposition']
control_ratio = config.collect_quant_stats['control_ratio']
teacher_forcing_ratio = config.collect_quant_stats['teacher_forcing_ratio']

# Create batch generator.
batch_gen = dataset.batches(batch_size, window_size, stride_size)

Define a function for computing the empirical perplexity.

In [9]:
def compute_perplexity(model, batch_gen, num_iters, event_dim,
                       batch_size, window_size, use_transposition,
                       control_ratio, teacher_forcing_ratio):
    """
    Computes the empirical perplexity of model on the data in batch_gen.
    
    Args:
        model (Performance RNN): A quantized or unquantized instance of
            Performance RNN.
        batch_gen: A batch generator created from an instance of Dataset.
        num_iters (int): Number of iterations through batch_gen.
        event_dim (int): Dimension of EventSeq object.
        batch_size (int): Batch size.
        window_size (int): Number of note events in window when predicting sample.
        use_transposition (bool): Indicates whether to transpose inputs.
        control_ratio (float): Control ratio.
        teacher_forcing_ratio (float): Teacher forcing ratio.
        
    Returns:
        perplexity (float): Empirical perplexity of model on data in batch_gen.
    """

    # Define loss functions.
    loss_function = nn.CrossEntropyLoss()
    log_softmax = nn.LogSoftmax(dim=1)
    nnl = nn.NLLLoss(reduction='sum')
    
    # Accumulated loss and number of samples.
    acc_loss = 0
    N = 0

    for iteration in tqdm(range(num_iters)):
        events, controls = next(batch_gen)

        if use_transposition:
            offset = np.random.choice(np.arange(-6, 6))
            events, controls = utils.transposition(events, controls, offset)

        events = torch.LongTensor(events).to(device)
        assert events.shape[0] == window_size

        if np.random.random() < control_ratio:
            controls = torch.FloatTensor(controls).to(device)
            assert controls.shape[0] == window_size
        else:
            controls = None

        init = torch.randn(batch_size, model.init_dim).to(device)
        outputs = model.generate(init, window_size, events=events[:-1], controls=controls,
                                 teacher_forcing_ratio=teacher_forcing_ratio, output_type='logit')

        assert outputs.shape[:2] == events.shape[:2]

        loss1 = loss_function(outputs.view(-1, event_dim), events.view(-1))
        pred = log_softmax(outputs.view(-1, event_dim))
        n = pred.shape[0]
        loss2 = nnl(pred, events.view(-1))
        acc_loss += loss2
        N += n

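        # Sanity check: the mean cross-entropy from CrossEntropyLoss should
        # equal the summed negative log-likelihood divided by the number of
        # predictions in the batch.
        nptest.assert_array_almost_equal(loss1.cpu().detach().numpy(),
                                         loss2.cpu().detach().numpy() / n)

    # Mean cross-entropy (in nats) over all predictions. The PyTorch losses
    # use the natural logarithm, so the perplexity is obtained with exp.
    acc_cross_entropy_loss = acc_loss / N
    perplexity = acc_cross_entropy_loss.exp()
    return perplexity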
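
The function above accumulates the summed negative log-likelihood and the number of predictions across batches, and only exponentiates at the end. The following toy sketch, with hypothetical numbers and a hypothetical helper `pooled_perplexity`, illustrates why this pooled form is a reasonable choice: it weights every prediction equally, whereas averaging per-batch mean losses would not if batch sizes differed. In this notebook the batches appear to have a fixed size, in which case the two agree.

```python
import math

def pooled_perplexity(batches):
    """batches: list of (summed_nll_in_nats, num_predictions) pairs."""
    total_nll = sum(nll for nll, _ in batches)
    total_n = sum(n for _, n in batches)
    return math.exp(total_nll / total_n)

# Toy per-batch statistics (hypothetical numbers, unequal batch sizes).
batches = [(200.0, 100), (30.0, 10)]
pooled = pooled_perplexity(batches)

# Averaging the per-batch mean losses gives the small batch the same
# weight as the large one, which inflates the result here.
mean_of_means = math.exp(sum(nll / n for nll, n in batches) / len(batches))
print(pooled, mean_of_means)
```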

Compute the empirical perplexity for both the quantized and unquantized models (this may take a while).

In [10]:
unquant_perplexity = compute_perplexity(model, batch_gen, num_iters, event_dim,
                                        batch_size, window_size, use_transposition,
                                        control_ratio, teacher_forcing_ratio)
quant_perplexity = compute_perplexity(quant_model, batch_gen, num_iters, event_dim,
                                      batch_size, window_size, use_transposition,
                                      control_ratio, teacher_forcing_ratio)

100%|██████████| 100/100 [00:57<00:00,  1.77it/s]
100%|██████████| 100/100 [05:46<00:00,  3.43s/it]
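
Note that both calls draw from the same `batch_gen`, and that `compute_perplexity` also samples random transposition offsets, control decisions, and initial vectors. The two models are therefore likely evaluated on different batches and different random draws, which adds some noise to the comparison. The toy sketch below illustrates the effect with a hypothetical stand-in generator `toy_batches`; it does not use the real `Dataset` class.

```python
import random

def toy_batches(seed):
    """Hypothetical stand-in for a batch generator: yields random batches."""
    rng = random.Random(seed)
    while True:
        yield [rng.randrange(240) for _ in range(4)]

def consume(gen, num_iters):
    """Draw num_iters batches from gen, as an evaluation loop would."""
    return [next(gen) for _ in range(num_iters)]

# Sharing one generator: the second consumer continues where the first
# one stopped, so it sees later batches.
shared = toy_batches(seed=0)
first, second = consume(shared, 3), consume(shared, 3)

# Recreating the generator with the same seed replays identical batches.
replay_a = consume(toy_batches(seed=0), 3)
replay_b = consume(toy_batches(seed=0), 3)
print(first == second, replay_a == replay_b)
```

In the notebook, the analogous change would be to re-seed NumPy and PyTorch and to create a fresh batch generator before each call. Whether `dataset.batches` replays the same batches under a fixed seed is an assumption that would need to be checked.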


Compare the quantized and unquantized perplexities.

In [11]:
unquant_perplexity

tensor(8.1709, device='cuda:0', grad_fn=<ExpBackward>)

In [12]:
quant_perplexity

tensor(8.7300, device='cuda:0', grad_fn=<ExpBackward>)
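
To put the two numbers in perspective, the quantized model's perplexity is about 6.8% higher than that of the original model. The short calculation below, a sketch using only the values reported above, also expresses the gap as a difference in mean cross-entropy, using the fact that the perplexity here is the exponential of the cross-entropy in nats.

```python
import math

# Perplexities reported in the two cells above.
unquant = 8.1709
quant = 8.7300

relative_increase = (quant - unquant) / unquant
# Perplexity = exp(cross-entropy in nats), so the difference of the logs
# is the increase in mean cross-entropy per predicted event.
delta_nats = math.log(quant) - math.log(unquant)

print(f"relative increase in perplexity: {relative_increase:.1%}")
print(f"increase in cross-entropy: {delta_nats:.3f} nats per predicted event")
```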