# Model

The ultimate goal of the model is to learn a __latent variable space of musical units__. Given a musical unit, we want to encode it into a latent vector within that space and predict the latent vector of the best accompaniment for that input. Finally, that accompaniment vector can be decoded to produce an accompanying musical unit.

This involves many tricky steps, so development will be approached incrementally:

#### 1. Convolutional Autoencoder

Given an input unit of shape `[num_ticks, num_pitches]`, train a convolutional autoencoder that produces an encoding of that unit.

```
INPUT -> Convolution layers -> EMBEDDING -> Deconvolution layers -> INPUT
```

Autoencoding: To test this convolutional autoencoder, generate a response to a given input unit using:
- Decoder reconstruction of the same input
- Nearest-neighbor unit selection (similar to Bretan et al.)

De-noising: Test the de-noising abilities of the autoencoder. Given a partial unit (the "input" part without its accompaniment), generate a response using:
- Decoder reconstruction of the "full"/"comp" unit
- Nearest-neighbor unit selection
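Nearest-neighbor unit selection can be sketched in a few lines of NumPy. This is a minimal sketch. It assumes that every corpus unit has already been encoded and flattened into one embedding vector, and that squared Euclidean distance is an acceptable similarity measure; Bretan et al. may use a different one. `select_nearest_unit` and its arguments are hypothetical names, not part of this project.

```python
import numpy as np


def select_nearest_unit(query_embedding, corpus_embeddings, corpus_units, exclude_index=None):
    """Return (index, unit) for the corpus unit whose embedding is closest to the query.

    query_embedding:   [D] embedding of the unit we want a response to
    corpus_embeddings: [N, D] one flattened embedding per corpus unit
    corpus_units:      [N, ...] the pianorolls those embeddings were computed from
    exclude_index:     optional index to skip (e.g. the query's own unit)
    """
    # Squared Euclidean distance from the query to every corpus embedding
    dists = np.sum((corpus_embeddings - query_embedding[np.newaxis, :]) ** 2, axis=1)
    if exclude_index is not None:
        dists[exclude_index] = np.inf
    best = int(np.argmin(dists))
    return best, corpus_units[best]


# Toy corpus: 4 "units" of shape [2, 3] with 2-D embeddings
corpus_units = np.arange(24).reshape(4, 2, 3)
corpus_embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 5.0], [3.0, 3.0]])
idx, unit = select_nearest_unit(np.array([0.9, 0.2]), corpus_embeddings, corpus_units)
```

Here `idx` is 1. With the Keras models below, embeddings could come from an encoder-only model such as `Model(input_mat, encoded)`, flattened with `.reshape(len(x), -1)`. Excluding the query's own index matters when the query is itself a corpus unit; otherwise the search trivially returns the query.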

#### 2. LSTM of latent variables -> Generation using unit selection

Given a sequence of embeddings (from the convolutional autoencoder), predict the next embedding, then perform nearest-neighbor unit selection as before to generate the next unit in the sequence.
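The generation loop for this step could look like the sketch below. It assumes the trained sequence model is exposed as a function that maps an embedding history `[T, D]` to a predicted next embedding `[D]`. A toy linear extrapolation stands in for the LSTM, and all names are hypothetical. Feeding the embedding of the *selected* unit back into the history, rather than the raw prediction, is one design choice among several.

```python
import numpy as np


def generate_by_unit_selection(seed_embeddings, predict_next_embedding,
                               corpus_embeddings, corpus_units, num_steps):
    """Extend a sequence unit by unit: predict the next embedding, then snap it to
    the nearest real corpus unit (nearest-neighbor unit selection)."""
    history = list(seed_embeddings)
    chosen = []
    for _ in range(num_steps):
        predicted = predict_next_embedding(np.array(history))  # stand-in for the LSTM
        dists = np.sum((corpus_embeddings - predicted) ** 2, axis=1)
        best = int(np.argmin(dists))
        chosen.append(best)
        history.append(corpus_embeddings[best])  # feed back the selected unit's embedding
    return chosen, corpus_units[chosen]


def extrapolate(history):
    """Toy 'model': continue the linear trend of the last two embeddings."""
    return 2 * history[-1] - history[-2]


# Toy setup: 1-D embeddings 0..4, one [2, 3] pianoroll per unit
corpus_embeddings = np.arange(5, dtype=float).reshape(5, 1)
corpus_units = np.arange(30).reshape(5, 2, 3)
indices, units_out = generate_by_unit_selection(corpus_embeddings[:2], extrapolate,
                                                corpus_embeddings, corpus_units, num_steps=4)
```

Here `indices` is `[2, 3, 4, 4]`. Once the prediction (5.0) falls outside the corpus, unit selection falls back to the closest available unit.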

#### 3. Convolutional Variational Autoencoder

Learn a new latent space using a VAE architecture. Test how well reconstruction works using:
- Decoder reconstruction
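The VAE differs from the plain autoencoder mainly in how the embedding is produced. The encoder outputs a mean and a log-variance per latent dimension, a latent vector is sampled from them, and a KL-divergence penalty towards a standard normal prior is added to the reconstruction loss. Below is a NumPy sketch of those two extra pieces, following the standard VAE formulation; the function names are hypothetical.

```python
import numpy as np


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
    The randomness sits in eps, so in a framework with autodiff the gradients
    can flow through mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, I) ), summed over latent dims, one value per example."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)


rng = np.random.RandomState(0)
mu, log_var = np.zeros((2, 8)), np.zeros((2, 8))
z = sample_latent(mu, log_var, rng)        # shape (2, 8), drawn from N(0, I)
kl = kl_to_standard_normal(mu, log_var)    # zero: this posterior equals the prior
```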

#### 4. LSTM of variational latent variables -> Generation using latent space sampling 

Given a sequence of embeddings (from the VAE), predict the next embedding and generate an output musical unit by decoding the predicted embedding!!!


```python
import os, shutil
import random
import sys
import numpy as np
import pypianoroll
from matplotlib import pyplot as plt
import cPickle as pickle
import pianoroll_utils

PICKLE_FILE = './pickle_jar/units_50_songs.pkl'
```

```python
units = {}
with open(PICKLE_FILE, 'rb') as infile:
    units = pickle.load(infile)

# "full" = input + comp. Where both parts play the same pitch at the same tick,
# the summed velocity can exceed 127.
units["full"] = units["input"] + units["comp"]

# Print info
print "Loaded", units["input"].shape[0], "units from", PICKLE_FILE
print "full_units.shape: ", units["full"].shape
print "input_units.shape: ", units["input"].shape
print "input_units_next.shape: ", units["input_next"].shape
print "comp_units.shape: ", units["comp"].shape
print "comp_units_next.shape: ", units["comp_next"].shape
```


```
Loaded 3276 units from ./pickle_jar/units_50_songs.pkl
full_units.shape:  (3276, 96, 128)
input_units.shape:  (3276, 96, 128)
input_units_next.shape:  (3276, 96, 128)
comp_units.shape:  (3276, 96, 128)
comp_units_next.shape:  (3276, 96, 128)
```


## 1. Convolutional Autoencoder

Given an input unit of shape `[num_ticks, num_pitches]`, train a convolutional autoencoder that produces an encoding of that unit.

```
INPUT -> Convolution layers -> EMBEDDING -> Deconvolution layers -> INPUT
```

### Testing

We will evaluate the autoencoder using two measures:

1. __Autoencoding__: To test this convolutional autoencoder, generate a response to a given input unit using:

    - Decoder reconstruction of the same input
    - Nearest-neighbor unit selection (similar to Bretan et al.)

2. __De-noising__: Test the de-noising abilities of the autoencoder. Given a partial unit (the "input" part without its accompaniment), generate a response using:

    - Decoder reconstruction of the "full"/"comp" unit
    - Nearest-neighbor unit selection

    (inspired by Huang et al., "Counterpoint by Convolution", and Bretan et al., "Learning and Evaluating Musical Features with Deep Autoencoders")

These two tests simply require training the model on two different datasets: "full"->"full" for autoencoding, and "input"->"comp" for de-noising.
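Because both tests only change which arrays serve as source and target, the train/test split has to be shared between `x` and `y`. If they were split with different random masks, the rows would no longer line up. The sketch below is minimal and assumes the `units` dict loaded above. `TASKS` and `make_task_split` are hypothetical names, and the fixed seed is an added assumption.

```python
import numpy as np

# Hypothetical mapping from test name to (source key, target key) in the `units` dict
TASKS = {"autoencode": ("full", "full"), "denoise": ("input", "comp")}


def make_task_split(units, task, test_fraction=0.1, seed=0):
    """Return ((x_train, y_train), (x_test, y_test)) for one task.

    A single boolean mask is applied to both x and y so paired rows stay aligned,
    and a fixed seed makes the split reproducible between runs.
    """
    x_key, y_key = TASKS[task]
    x, y = units[x_key], units[y_key]
    assert len(x) == len(y), "source and target must contain the same units"
    rng = np.random.RandomState(seed)
    is_train = rng.rand(len(x)) >= test_fraction
    return (x[is_train], y[is_train]), (x[~is_train], y[~is_train])


# Toy data: 10 units where comp = input + 100, so alignment is easy to check
toy = {"input": np.arange(20).reshape(10, 2)}
toy["comp"] = toy["input"] + 100
toy["full"] = toy["input"] + toy["comp"]
(x_tr, y_tr), (x_te, y_te) = make_task_split(toy, "denoise", test_fraction=0.5)
```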

### Architecture


_Initial code adapted from the [Keras tutorial on autoencoders](https://blog.keras.io/building-autoencoders-in-keras.html)._

_Inspiration for the convolutional autoencoder network comes from "Learning and Evaluating Musical Features with Deep Autoencoders"._


```python
from keras.layers import Input, Dense, Conv2D, Conv2DTranspose, BatchNormalization, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras.callbacks import TensorBoard
from keras.models import load_model
```

```python
# Prepare data
print "Original:", units["input"].shape
NUM_TICKS = units["input"].shape[1] # 96
NUM_PITCHES = units["input"].shape[2] # 128
assert NUM_TICKS == 96 and NUM_PITCHES == 128

# Change from [M, ticks, pitches] to [M, pitches, ticks, channels=1]
input_units = units["input"].swapaxes(1,2).reshape(len(units["input"]), NUM_PITCHES, NUM_TICKS, 1)
# Normalize values between 0 and 1
input_units = input_units.astype('float32') / 127. # 0-127 is the unnormalized velocity range
print "Reshaped:", input_units.shape

# Create an array of True (train) and False (test) to split the dataset roughly 90/10.
# The split is not seeded, so its exact sizes change from run to run.
train_test_indices = np.random.choice([True, False], size=len(input_units), p=[.9, .1])
input_train = input_units[train_test_indices, ...]
input_test = input_units[np.invert(train_test_indices), ...]
print "Train:", input_train.shape
print "Test:", input_test.shape
```


```
Original: (3276, 96, 128)
Reshaped: (3276, 128, 96, 1)
Train: (2958, 128, 96, 1)
Test: (318, 128, 96, 1)
```
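The reshaping above is undone later, when samples are inspected with `swapaxes(0,1)` and `* 127`. Writing both directions as a pair of helpers makes the round trip easy to check. The sketch below uses hypothetical names and assumes velocities in 0-127, as above.

```python
import numpy as np


def to_model_layout(rolls):
    """[M, ticks, pitches] velocities (0-127) -> [M, pitches, ticks, 1] float32 in [0, 1]."""
    return rolls.swapaxes(1, 2)[..., np.newaxis].astype('float32') / 127.


def from_model_layout(x):
    """Inverse of to_model_layout: [M, pitches, ticks, 1] -> [M, ticks, pitches] velocities."""
    return x[..., 0].swapaxes(1, 2) * 127.


rolls = np.random.RandomState(0).randint(0, 128, size=(3, 96, 128))
x = to_model_layout(rolls)
assert x.shape == (3, 128, 96, 1)
assert np.allclose(from_model_layout(x), rolls, atol=1e-3)  # round trip recovers velocities
```

On the real data, an `x.max()` above 1.0 would indicate velocities outside 0-127.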


# Autoencoder V0

`code given below`

### Details

Based on "Learning and Evaluating Musical Features with Deep Autoencoders", but adapted for a different input size.

```
Data: -
Embedding shape: (None, 1, 1, 100) -> 100 elements
Epochs: -
Batch size: -
Final loss: -
```

### Notes

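Pretty sophisticated model, but unfortunately it could not be trained. Running `autoencoder.fit` raises a `ResourceExhaustedError` (out of memory) while allocating a `[128, 100, 24, 16]` float tensor on the GPU, i.e. one batch of 128 units at the output of the first convolution. This is most likely due to insufficient GPU memory: the model is large, and its activation memory grows linearly with the batch size.

Several attempts were made to shrink the model and reduce the batch size (which reportedly helps), but the error persisted.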

### Next steps
1. Look at how to shrink this model / use an alternative model. This [SO thread](https://stackoverflow.com/questions/41526071/why-is-keras-throwing-a-resourceexhaustederror) may be helpful.
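A rough estimate of forward-pass activation memory helps locate where the memory goes. The sketch below uses hypothetical names. It lists the per-example output shapes of the V0 convolution and transposed-convolution layers defined in the next cell, worked out by hand from their kernels and strides (dense and batch-norm outputs are left out). It then multiplies by the batch size and 4 bytes per float32 value.

```python
# Per-example (H, W, C) outputs of the V0 conv / transposed-conv layers, computed by hand
V0_ACTIVATIONS = [
    (24, 16, 100), (12, 4, 200), (6, 2, 400), (3, 1, 800), (1, 1, 800),      # encoder
    (3, 1, 800), (6, 2, 800), (12, 4, 400), (24, 16, 200), (128, 96, 100),  # decoder
]


def activation_mib(shapes, batch_size, bytes_per_value=4):
    """Forward-pass activation memory in MiB for the listed layer outputs only.
    Batch-norm copies, backpropagation buffers, weights and optimizer state come on top."""
    values = sum(h * w * c for (h, w, c) in shapes)
    return batch_size * values * bytes_per_value / float(2 ** 20)


full = activation_mib(V0_ACTIVATIONS, batch_size=128)   # about 680 MiB
small = activation_mib(V0_ACTIVATIONS, batch_size=32)   # about 170 MiB
# Fraction contributed by the last, full-resolution decoder layer: about 0.88
share_last = 128 * 96 * 100 / float(sum(h * w * c for (h, w, c) in V0_ACTIVATIONS))
```

About 88% of this comes from the last decoder layer (128 × 96 × 100 values per example). Cutting that layer's filter count would therefore likely save more memory than shrinking the encoder. The real footprint is several times larger than this estimate.

Rebuilding the model repeatedly in one notebook session may also leave old layers in the TensorFlow graph; note the layer names climbing to `batch_normalization_119` in the log. Calling `keras.backend.clear_session()` before rebuilding resets the graph.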

```python
input_mat = Input(shape=(NUM_PITCHES, NUM_TICKS, 1))  # 'channels_last' data format (only 1 channel in our case)

## ENCODER

# First four layers are Conv2D
x = Conv2D(100, (13, 21), strides=(5,5), activation='relu', padding='valid')(input_mat)
x = BatchNormalization(axis=3)(x)
x = Conv2D(200, (2, 7), strides=(2,3), activation='relu', padding='valid')(x)
x = BatchNormalization(axis=3)(x)
x = Conv2D(400, (2, 2), strides=(2,2), activation='relu', padding='valid')(x)
x = BatchNormalization(axis=3)(x)
x = Conv2D(800, (2, 2), strides=(2,1), activation='relu', padding='valid')(x)
x = BatchNormalization(axis=3)(x)
# Following three are fully connected (the (3, 1) kernel covers the whole 3x1 feature map)
x = Conv2D(800, (3, 1), strides=(1,1), activation='relu', padding='valid')(x)
x = BatchNormalization(axis=3)(x)
x = Dense(400, activation='relu')(x)
x = BatchNormalization()(x)
x = Dense(100, activation='relu')(x)
encoded = BatchNormalization()(x)

# At this point the representation is a 100-dimensional vector, shape (None, 1, 1, 100)

## DECODER

# Two fully connected (the (3, 1) transposed convolution acts on the 1x1 map)
x = Dense(400, activation='relu')(encoded)  # decode from the embedding
x = BatchNormalization()(x)
x = Conv2DTranspose(800, (3, 1), strides=(1,1), activation='relu', padding='valid')(x)
x = BatchNormalization(axis=3)(x)
# Deconvolution / Convolution Transpose layers
x = Conv2DTranspose(800, (2, 2), strides=(2,1), activation='relu', padding='valid')(x)
x = BatchNormalization(axis=3)(x)
x = Conv2DTranspose(400, (2, 2), strides=(2,2), activation='relu', padding='valid')(x)
x = BatchNormalization(axis=3)(x)
x = Conv2DTranspose(200, (2, 7), strides=(2,3), activation='relu', padding='valid')(x)
x = BatchNormalization(axis=3)(x)
x = Conv2DTranspose(100, (13, 21), strides=(5,5), activation='relu', padding='valid')(x)
x = BatchNormalization(axis=3)(x)  # (None, 128, 96, 100)
# Output layer: a 1x1 convolution maps the 100 channels to one, keeping the 128x96 size;
# the sigmoid keeps values in [0, 1] as binary_crossentropy expects
decoded = Conv2D(1, (1, 1), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_mat, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```

```python
autoencoder.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_8 (InputLayer)         (None, 128, 96, 1)        0         
_________________________________________________________________
conv2d_29 (Conv2D)           (None, 24, 16, 100)       27400     
_________________________________________________________________
batch_normalization_85 (Batc (None, 24, 16, 100)       400       
_________________________________________________________________
conv2d_30 (Conv2D)           (None, 12, 4, 200)        280200    
_________________________________________________________________
batch_normalization_86 (Batc (None, 12, 4, 200)        800       
_________________________________________________________________
conv2d_31 (Conv2D)           (None, 6, 2, 400)         320400    
_________________________________________________________________
batch_normalization_87 (Batc (None, 6, 2, 400)         1600      
__________
```
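Layer output sizes can be checked by hand with the standard 'valid'-padding formulas: `(n - k) // s + 1` for `Conv2D` and `(n - 1) * s + k` for `Conv2DTranspose`. The second formula matches Keras when the kernel is at least as large as the stride, which holds for every V0 layer. The sketch below uses hypothetical helper names. It reproduces the encoder shapes in the summary and traces the decoder back up to 128 × 96. It also shows that a final 'valid' `Conv2DTranspose` with kernel `(NUM_TICKS, NUM_PITCHES)` = (96, 128) would output 223 × 223, not the input's 128 × 96.

```python
def conv_out(n, k, s):
    """Length along one axis after a 'valid' Conv2D."""
    return (n - k) // s + 1


def deconv_out(n, k, s):
    """Length along one axis after a 'valid' Conv2DTranspose (assumes k >= s)."""
    return (n - 1) * s + k


def trace(hw, layers, out_fn):
    """Apply out_fn to (height, width) for each ((kh, kw), (sh, sw)) layer in turn."""
    shapes = []
    h, w = hw
    for (kh, kw), (sh, sw) in layers:
        h, w = out_fn(h, kh, sh), out_fn(w, kw, sw)
        shapes.append((h, w))
    return shapes


# (kernel, strides) of the V0 conv layers, copied from the cell above
ENCODER = [((13, 21), (5, 5)), ((2, 7), (2, 3)), ((2, 2), (2, 2)), ((2, 2), (2, 1)), ((3, 1), (1, 1))]
DECODER = [((3, 1), (1, 1)), ((2, 2), (2, 1)), ((2, 2), (2, 2)), ((2, 7), (2, 3)), ((13, 21), (5, 5))]

enc = trace((128, 96), ENCODER, conv_out)   # [(24, 16), (12, 4), (6, 2), (3, 1), (1, 1)]
dec = trace(enc[-1], DECODER, deconv_out)   # [(3, 1), (6, 2), (12, 4), (24, 16), (128, 96)]
# A final 'valid' Conv2DTranspose with kernel (96, 128), stride 1, on the 128 x 96 map
bad_final = (deconv_out(dec[-1][0], 96, 1), deconv_out(dec[-1][1], 128, 1))  # (223, 223)
```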

```python
# Train model
autoencoder.fit(input_train, input_train,
                epochs=100,
                batch_size=128,
                shuffle=True,
                validation_data=(input_test, input_test),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])
```

```
Train on 2923 samples, validate on 353 samples
Epoch 1/100

ResourceExhaustedError: OOM when allocating tensor with shape[128,100,24,16] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[Node: batch_normalization_119/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=0.001, is_training=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv2d_42/Relu, batch_normalization_119/gamma/read, batch_normalization_119/beta/read, batch_normalization_119/Const_4, batch_normalization_119/Const_4)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op u'batch_normalization_119/FusedBatchNorm', defined at:
  [... IPython / tornado / zmq frames omitted ...]
  File "<ipython-input-45-54b84faa8e27>", line 11, in <module>
    x = BatchNormalization(axis=3)(x)
  [... Keras / TensorFlow frames omitted ...]
```



# Autoencoder V1

`code given below`

### Details

Pretty arbitrary variant of the convolutional autoencoder architecture suggested in the [Keras tutorial](https://blog.keras.io/building-autoencoders-in-keras.html).

```
Data: input->input
Embedding shape: (None, 32, 24, 32) -> 24576 elements
Epochs: 100
Batch size: 32
Final loss: [loss: -0.0047 - val_loss: -0.0052]
```

### Notes
Final binary crossentropy loss (after 100 epochs) gives `loss: -0.0047 - val_loss: -0.0052`, which is strange. Binary crossentropy is never negative when the targets lie in [0, 1], so __some normalized target values are probably outside that range__. Will have to investigate further.

Besides that, the decoded output looks really good. The input and output graphs and playback are almost indistinguishable. There appear to be some quantization effects in the graphs, and some differences (dropped or added notes, incorrect pitches) are occasionally audible, but the two are mostly similar.

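On the whole, this model was a successful "trial" model: it demonstrates that the autoencoder produces a valid pianoroll. However, our __input size is 12288 (128 × 96) and the embedding size is 24576 (32 × 24 × 32)__, so the encoder actually enlarges the dimensionality instead of shrinking it. With an embedding larger than the input, the network is not forced to compress anything, so good reconstructions alone may say little about the quality of the learned features.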
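Shrinking the embedding mostly comes down to how many pooling stages there are and how many filters the last encoder layer has. Below is a quick size calculation using a hypothetical helper. It assumes 2 × 2 `MaxPooling2D(padding='same')` stages as in V1, which round odd sizes up.

```python
def pooled_embedding_size(height, width, filters, num_pools, pool=2):
    """Number of values in the embedding after `num_pools` 'same'-padded max-pools."""
    for _ in range(num_pools):
        height = -(-height // pool)   # ceiling division, as 'same' padding rounds up
        width = -(-width // pool)
    return height * width * filters


INPUT_SIZE = 128 * 96                                      # 12288 values per unit
v1 = pooled_embedding_size(128, 96, 32, num_pools=2)       # 32 * 24 * 32 = 24576
deeper = pooled_embedding_size(128, 96, 16, num_pools=3)   # 16 * 12 * 16 = 3072
deepest = pooled_embedding_size(128, 96, 8, num_pools=4)   # 8 * 6 * 8 = 384
```

These are only size calculations. Whether a 384-value bottleneck still reconstructs well would have to be tested.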

### Next steps
1. Investigate negative loss values ([most likely](https://github.com/Lasagne/Recipes/issues/54) due to some normalization issue - fix this in data preparation)
2. Am I overfitting?
3. Train on input->comp, or input->comp_next
4. Shrink the embedding layer!! 
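For targets in [0, 1], binary crossentropy `-(y log p + (1 - y) log(1 - p))` is always non-negative, so a negative loss points to targets outside that range. The quick NumPy check below uses a hypothetical helper. It clips predictions to `[eps, 1 - eps]` with `eps = 1e-7`, which roughly mirrors Keras. A single target above 1, e.g. a summed velocity of 254 divided by 127, is enough to push the loss below zero. The last lines show two possible data-preparation fixes: clipping and binarizing.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean elementwise binary crossentropy with predictions clipped to [eps, 1 - eps]."""
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))


rng = np.random.RandomState(0)
y_ok = rng.rand(1000)   # targets inside [0, 1]
p = rng.rand(1000)      # arbitrary predictions
loss_ok = binary_crossentropy(y_ok, p)                                     # >= 0
loss_bad = binary_crossentropy(np.array([254 / 127.]), np.array([0.99]))  # about -4.585

# Possible fixes in data preparation
x = np.array([0.0, 0.5, 2.0])
x_clipped = np.clip(x, 0.0, 1.0)        # [0.0, 0.5, 1.0]
x_binary = (x > 0).astype('float32')    # [0.0, 1.0, 1.0]: note on/off only
```

Checking `input_units.min()` and `input_units.max()` right after normalization would show whether this is happening here.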

```python
input_mat = Input(shape=(NUM_PITCHES, NUM_TICKS, 1))  # 'channels_last' data format (only 1 channel in our case)

# ENCODER
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_mat)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# `encoded` is the embedding, shape (None, 32, 24, 32)

# DECODER
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_mat, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```

```python
autoencoder.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         (None, 128, 96, 1)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 128, 96, 32)       320       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 64, 48, 32)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 64, 48, 32)        9248      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 32, 24, 32)        0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 32, 24, 32)        9248      
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 64, 48, 32)        0         
__________
```

```python
# Train model
autoencoder.fit(input_train, input_train,
                epochs=100,
                batch_size=32,
                shuffle=True,
                validation_data=(input_test, input_test),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])

MODEL_AUTOENCODER_V1_FILE = './models/autoencoder_v1.h5'
autoencoder.save(MODEL_AUTOENCODER_V1_FILE)  # creates an HDF5 file
print "Saved Keras model to", MODEL_AUTOENCODER_V1_FILE
```

```
Train on 2934 samples, validate on 342 samples
Epoch 1/100
...
Epoch 74/100

<keras.callbacks.History at 0x7f4517f9dad0>
```

```python
autoencoder = load_model(MODEL_AUTOENCODER_V1_FILE)
```

```python
# Run test inputs through the autoencoder
decoded_test = autoencoder.predict(input_test)
```

```python
# Inspect a random input-output sample
sample_index = np.random.randint(len(input_test))
sample_input = input_test[sample_index].swapaxes(0,1).reshape(NUM_TICKS, NUM_PITCHES) * 127
sample_output = decoded_test[sample_index].swapaxes(0,1).reshape(NUM_TICKS, NUM_PITCHES) * 127

print(sample_input.shape)
print(sample_output.shape)

# Plot comparison
fig, ax = plt.subplots(1,2)
fig.set_size_inches(10, 6, forward=True)
ax[0].set_title('Input')
ax[1].set_title('Output')
pypianoroll.plot_pianoroll(ax[0], sample_input, beat_resolution=24)
pypianoroll.plot_pianoroll(ax[1], sample_output, beat_resolution=24)
fig.tight_layout()

# Play comparison
pianoroll_utils.playPianoroll(sample_input)
pianoroll_utils.playPianoroll(sample_output)
```
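The "note drops/additions" heard in playback can be quantified by comparing the input and output pianorolls cell by cell after thresholding. This is a minimal sketch that counts a cell as an active note when its velocity exceeds a threshold. The default of 20 is an arbitrary, hypothetical choice that would need tuning, and quiet input notes below it are ignored.

```python
import numpy as np


def compare_active_cells(input_roll, output_roll, threshold=20.0):
    """Count (kept, dropped, added) active cells between two [ticks, pitches] pianorolls."""
    active_in = input_roll > threshold
    active_out = output_roll > threshold
    kept = int(np.sum(active_in & active_out))
    dropped = int(np.sum(active_in & ~active_out))
    added = int(np.sum(~active_in & active_out))
    return kept, dropped, added


# Toy example: one note kept, one dropped, one added
inp = np.zeros((4, 3))
inp[0, 0] = 100
inp[1, 1] = 80
out = np.zeros((4, 3))
out[0, 0] = 90
out[2, 2] = 60
counts = compare_active_cells(inp, out)   # (1, 1, 1)
```

This can be applied to `sample_input` and `sample_output` above, or summed over all of `input_test`. The result is a number that can be compared between models, and between training and test units when checking for overfitting.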
