# Realizing Velocity Prediction with CNN

This part of the notebook attempts velocity prediction with a CNN, as opposed to the LSTM models used in the original [paper](https://arxiv.org/pdf/1708.03535.pdf).

In [3]:
import os
import mido
import keras
import numpy as np
import StyleNet.midi_util as midi_util

ImportError: No module named mido

### Part 1 Loading and processing data

In this part we first process the data into a form that we can use. We reuse code from the paper author's [GitHub repo](https://github.com/imalikshake/StyleNet) so that the data is processed the same way as in the paper.

First, we use that code to convert a single MIDI file into the array format the repo provides.

In [1]:
fpath = "TPD/classical/bach_846_format0.mid"
midi = mido.MidiFile(fpath)
midi_array, velocity_array = midi_util.midi_to_array_one_hot(midi, 4)

NameError: name 'mido' is not defined

Now we inspect the shapes of the arrays that we loaded.

In [None]:
# Parenthesized print works in both Python 2 and Python 3.
print("midi_array shape = %s" % str(midi_array.shape))
print("velocity_array shape = %s" % str(velocity_array.shape))

Now we split midi_array into 2 layers by taking its even and odd columns and stacking them, so each song becomes a 3D volume of shape (time, pitch, 2) instead of a flat 2D array. This may allow us to learn better features through convolution.

In [None]:
midi_notes = midi_array[:, ::2]
midi_continuation = midi_array[:, 1::2]
X = np.dstack((midi_notes, midi_continuation))
print("X shape = %s" % str(X.shape))
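
To make the slicing concrete, here is a small sketch on a synthetic array. It assumes, as the slicing in the cell above does, that the columns alternate between two values per pitch. The toy sizes (4 time steps, 3 pitches) and the variable names are made up for illustration and do not come from the StyleNet code.

```python
import numpy as np

# Toy stand-in for midi_array: 4 time steps, 3 pitches, 2 interleaved values per pitch.
toy = np.arange(4 * 6).reshape(4, 6)

even_cols = toy[:, ::2]   # columns 0, 2, 4
odd_cols = toy[:, 1::2]   # columns 1, 3, 5

# Stack along a new last axis: (time, pitch) + (time, pitch) -> (time, pitch, 2).
toy_volume = np.dstack((even_cols, odd_cols))
print(toy_volume.shape)  # (4, 3, 2)

# For this interleaved layout the result equals a plain reshape.
assert np.array_equal(toy_volume, toy.reshape(4, 3, 2))
```

The final assertion shows that, under the interleaving assumption, the slice-and-stack step is equivalent to `reshape(time, pitch, 2)`.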

This generalizes to loading and converting an entire directory of music files. The code to do that is below.

In [38]:
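def load_midis(base_fpath):
    # Collect every file under base_fpath, including subdirectories.
    fpaths = []
    for (root, dirnames, filenames) in os.walk(base_fpath):
        fpaths += [os.path.join(root, filename) for filename in filenames]
    return [mido.MidiFile(fpath) for fpath in fpaths]

def to_object_array(arrays):
    # Songs differ in length, so keep them in a 1D object array instead of
    # using np.stack, which requires every array to have the same shape.
    out = np.empty(len(arrays), dtype=object)
    for i, array in enumerate(arrays):
        out[i] = array
    return out

def convert_midis(midis):
    X = []
    Y = []
    for midi in midis:
        print("size of X is %d" % len(X))
        try:
            midi_array, velocity_array = midi_util.midi_to_array_one_hot(midi, 4)
        except Exception:
            # Skip files that the converter cannot handle.
            continue
        # Even columns go to midi_notes, odd columns to midi_continuation.
        midi_notes = midi_array[:, ::2]
        midi_continuation = midi_array[:, 1::2]
        X_i = np.dstack((midi_notes, midi_continuation))
        X.append(X_i)
        Y.append(velocity_array)
    return to_object_array(X), to_object_array(Y)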
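
The loader above hands every file found by `os.walk` to `mido.MidiFile`, whatever its extension. As a hedged sketch, the paths could be filtered first. The helper name `find_midi_paths` and the extension list are assumptions for illustration, not part of the notebook or the StyleNet repo.

```python
import os

def find_midi_paths(base_fpath, extensions=(".mid", ".midi")):
    """Return sorted paths under base_fpath whose extension looks like MIDI."""
    paths = []
    for root, _dirnames, filenames in os.walk(base_fpath):
        for filename in filenames:
            # str.endswith accepts a tuple of suffixes; lower() makes the check case-insensitive.
            if filename.lower().endswith(extensions):
                paths.append(os.path.join(root, filename))
    return sorted(paths)
```

Sorting the paths makes the order of songs in X and Y reproducible across runs, which helps when the arrays are saved and reloaded later.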

In [None]:
X, Y = convert_midis(load_midis("TPD/classical"))

In [84]:
print("X's shape is %s" % str(X.shape))
print("X[0]'s shape is %s" % str(X[0].shape))
print("Y's shape is %s" % str(Y.shape))
print("Y[0]'s shape is %s" % str(Y[0].shape))

X's shape is (183,)
X[0]'s shape is (2048, 88, 2)
Y's shape is (183,)
Y[0]'s shape is (2048, 88)


Now we can store these matrices and load them later so we don't lose them. Note that their shapes are not what we intended and need further processing. X holds 183 3D matrices, but it is not itself a 4D matrix because the matrices do not all have the same dimensions. To solve this we need to either pad the songs to the same length, or cut songs short and take only a sample from each, so that every matrix has the same dimensions. This will be explored later.

In [86]:
np.save("matricies/X.npy", X)
# Save Y (the velocities) here, not X a second time.
np.save("matricies/Y.npy", Y)
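
The padding or cropping idea mentioned above can be sketched with NumPy alone. This is only one possible approach under stated assumptions: the helper name `pad_or_crop`, padding with zeros at the end of a song, and the target length are illustrative choices, not something the notebook has settled on.

```python
import numpy as np

def pad_or_crop(song, n_steps):
    """Crop song to n_steps along axis 0, or zero-pad its end to reach n_steps."""
    song = song[:n_steps]
    missing = n_steps - song.shape[0]
    # Pad only the time axis; leave all other axes untouched.
    pad_width = [(0, missing)] + [(0, 0)] * (song.ndim - 1)
    return np.pad(song, pad_width, mode="constant")

# Two toy "songs" of different length, shaped (time, pitch, 2) like X[i].
songs = [np.ones((5, 88, 2)), np.ones((3, 88, 2))]
batch = np.stack([pad_or_crop(s, 4) for s in songs])
print(batch.shape)  # (2, 4, 88, 2)
```

Once every song has the same length, `np.stack` yields a regular 4D array. The same helper works on the 2D velocity arrays in Y because it pads only the first axis.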

### Part 2 Modeling
Now that we have represented our data, we would like to see whether we can build a model that predicts the velocities from the 3D matrix we generated.

#### 2.1 Attempt to use a CNN of filter size (100, 1)

In [79]:
def model(input_shape):
    X_input = keras.layers.Input(input_shape)
    # Conv0: 50 filters, each spanning 100 time steps of a single pitch.
    X = keras.layers.Conv2D(filters=50, kernel_size=(100, 1), strides=(1, 1), padding='same', name='Conv0')(X_input)
#     X = keras.layers.BatchNormalization(axis = 2, name = 'bn0')(X)
    X = keras.layers.Activation('relu')(X)
    
#     X = keras.layers.Conv2D(filters=3, kernel_size=(1, 88), strides=(1, 1), padding='same', name='Conv1')(X_input)
#     X = BatchNormalization(axis = 3, name = 'bn1')(X)
#     X = Activation('relu')(X)
    
    # Conv1 takes the Conv0 activations (not X_input) and outputs one value per (time, pitch) cell.
    X = keras.layers.Conv2D(filters=1, kernel_size=(100, 1), strides=(1, 1), padding='same', name='Conv1')(X)
    model = keras.models.Model(inputs=X_input, outputs=X, name='basic')
    return model
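
As a rough size check, the number of trainable weights in a 2D convolution follows the standard formula kernel_height × kernel_width × input_channels × filters, plus one bias per filter. The sketch below applies it to a (100, 1) kernel. The helper name `conv2d_param_count` is hypothetical, and the second call assumes the 1-filter layer is stacked on the 50 feature maps of Conv0.

```python
def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters of a standard 2D convolution layer."""
    kernel_height, kernel_width = kernel_size
    weights = kernel_height * kernel_width * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases

# Conv0: (100, 1) kernel over the 2 input channels, 50 filters.
conv0_params = conv2d_param_count((100, 1), in_channels=2, filters=50)
print(conv0_params)  # 10050

# A 1-filter (100, 1) layer applied to 50 feature maps.
stacked_params = conv2d_param_count((100, 1), in_channels=50, filters=1)
print(stacked_params)  # 5001
```

Because the kernel has width 1 along the pitch axis, each filter looks at a single pitch over 100 time steps and the weights are shared across all 88 pitches.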

In [80]:
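m = model((None, 88, 2))
# Velocity prediction is a regression task, so track an error metric rather than accuracy.
m.compile(optimizer='adam', loss='mean_absolute_error', metrics=['mean_squared_error'])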

In [None]:
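# Train on the first song only. Add a batch axis to X[0] and batch and channel axes to Y[0]
# so the shapes are (1, T, 88, 2) and (1, T, 88, 1), matching the model's input and output.
m.fit(X[0][np.newaxis, ...], Y[0][np.newaxis, ..., np.newaxis], epochs=10)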

#### 2.2 Attempt to use CNN and filter size (100, 88)

Playground

In [51]:
X2 = X.tolist()

In [55]:
X2 = np.matrix(X2)

In [72]:
np.max()

(2048, 88, 2)