# Feature Extraction using CNN

Classifying music directly from its MIDI matrix has little grounding. Instead, we classify music by first extracting features with a CNN. Concretely, we train a CNN that accurately predicts jazz vs. classical, then use the activations of each layer as features for each piece. With such a network we are able to "extract" features.

In [26]:
import os
import mido
import keras
import numpy as np
import sklearn.model_selection as ms
import StyleNet.midi_util as midi_util
import matplotlib.pyplot as plt



## Getting Data

We reuse some of the code that was used for the original [paper](https://arxiv.org/pdf/1708.03535.pdf). Instead of keeping the velocity matrix separate, we multiply it into the note-onset matrix, and we set the label to 0 for classical and 1 for jazz.
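To make the resulting layout concrete, here is a toy sketch with 4 time steps and 3 pitches instead of 512 × 88. It assumes, as the `::2` / `1::2` slicing in `convert_midis` implies, that the one-hot array interleaves two columns per pitch (`[play_0, hold_0, play_1, hold_1, ...]`); the velocities are made-up toy values.

```python
import numpy as np

# Toy piano roll (assumption): 2 interleaved columns per pitch,
# [play_0, hold_0, play_1, hold_1, play_2, hold_2]
T, N_PITCHES = 4, 3
midi_array = np.zeros((T, 2 * N_PITCHES))
midi_array[0, 0] = 1   # pitch 0 struck at t=0
midi_array[1, 1] = 1   # pitch 0 held at t=1
midi_array[2, 4] = 1   # pitch 2 struck at t=2

# Toy velocities, one column per pitch
velocity_array = np.zeros((T, N_PITCHES))
velocity_array[0, 0] = 0.8
velocity_array[2, 2] = 0.5

notes = midi_array[:, ::2] * velocity_array   # onsets scaled by velocity
continuation = midi_array[:, 1::2]            # sustain flags
X_i = np.dstack((notes, continuation))        # channel 0: onsets, channel 1: holds

print(X_i.shape)  # (4, 3, 2)
```

Stacking along the last axis is what gives each song the `(512, 88, 2)` shape that the CNN below expects.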

In [41]:
def load_midis(base_fpath):
    fpaths = []
    for (root, dirnames, filenames) in os.walk(base_fpath):
        fpaths += [os.path.join(root, filename) for filename in filenames]
    return [mido.MidiFile(fpath) for fpath in fpaths]

def convert_midis(midis, label):
    """
        midis = a list of MidiFiles we are trying to parse
        label = 0 or 1 integer used to generate the label matrix
    """
    X = []
    Y = []
    for midi in midis:
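        try:
            midi_array, velocity_array = midi_util.midi_to_array_one_hot(midi, 4)
        except Exception:
            # Skip files that midi_util cannot parse
            continue
        # Even columns mark note onsets, odd columns mark held notes;
        # scale the onsets by velocity so one channel carries both
        midi_notes = midi_array[:, ::2] * velocity_array
        midi_continuation = midi_array[:, 1::2]
        print(midi_notes.shape, midi_continuation.shape)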
        X_i = np.dstack((midi_notes, midi_continuation))
        X += [X_i]
        Y += [[label]]
    return np.array(X), np.array(Y)

In [None]:
classical, classical_label = convert_midis(load_midis("TPD/classical"), 0)

In [None]:
jazz, jazz_label = convert_midis(load_midis("TPD/jazz"), 1)

In [56]:
X = np.vstack((classical, jazz))
Y = np.vstack((classical_label, jazz_label))
assert X.shape[0] == Y.shape[0]
inds = np.arange(X.shape[0])
np.random.shuffle(inds)
X, Y = X[inds], Y[inds]

In [59]:
np.save("matricies/X.npy", X)
np.save("matricies/Y.npy", Y)
print(X.shape, Y.shape)

((532, 512, 88, 2), (532, 1))


## Loading data

The data has already been preprocessed into matrices, so we load them directly.

In [60]:
X = np.load("matricies/X.npy")
Y = np.load("matricies/Y.npy")
print(X.shape, Y.shape)

((532, 512, 88, 2), (532, 1))


In [270]:
X_train, X_test, Y_train, Y_test = ms.train_test_split(X, Y, test_size=0.2, random_state=43)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

(425, 512, 88, 2) (107, 512, 88, 2) (425, 1) (107, 1)


## CNN Modeling

Next, we look for the CNN architecture that classifies best, so that its activations can serve as feature vectors.

In [275]:
def model1(input_shape):
    X_input = keras.layers.Input(input_shape)
    print(X_input.shape)
    X = X_input
    X = keras.layers.ZeroPadding2D(padding=(8, 0))(X)
    X = keras.layers.Conv2D(filters=88, kernel_size=(17, 88), strides=(8, 1),# padding='same', 
                            name='Conv0',
                            kernel_initializer=keras.initializers.glorot_normal(seed=None),
                            bias_initializer=keras.initializers.glorot_normal(seed=None),
                            data_format="channels_last")(X)
#     X = keras.layers.Dropout(0.5)(X)    
#     X = keras.layers.MaxPooling2D(pool_size=(2, 1))(X)
    X = keras.layers.BatchNormalization(axis = 3, name = 'bn0')(X)
    X = keras.layers.Activation('relu')(X)
    print(X.shape)

#     X = keras.layers.Conv2D(filters=50, kernel_size=(10, 5), strides=(1, 1), padding='same', name='Conv1',
#                             kernel_initializer=keras.initializers.glorot_normal(seed=None),
#                             bias_initializer=keras.initializers.glorot_normal(seed=None),
#                             data_format="channels_last")(X)
#     X = keras.layers.Dropout(0.5)(X) 
#     X = keras.layers.MaxPooling2D(pool_size=(2, 2))(X)
#     X = keras.layers.BatchNormalization(axis = 3, name = 'bn1')(X)
#     X = keras.layers.Activation('relu')(X)
#     print(X.shape)
    
#     X = keras.layers.Conv2D(filters=50, kernel_size=(5, 3), strides=(1, 1), padding='same', name='Conv2',
#                             kernel_initializer=keras.initializers.glorot_normal(seed=None),
#                             bias_initializer=keras.initializers.glorot_normal(seed=None),
#                             data_format="channels_last")(X)
#     X = keras.layers.Dropout(0.5)(X) 
#     X = keras.layers.MaxPooling2D(pool_size=(2, 2))(X)
#     X = keras.layers.BatchNormalization(axis = 3, name = 'bn2')(X)
#     X = keras.layers.Activation('relu')(X)
#     print(X.shape)
#     X = keras.layers.Conv2D(filters=100, kernel_size=(5, 3), strides=(1, 1), padding='same', name='Conv3',
#                             kernel_initializer=keras.initializers.glorot_normal(seed=None),
#                             bias_initializer=keras.initializers.glorot_normal(seed=None),
#                             data_format="channels_last")(X)
#     X = keras.layers.Dropout(0.5)(X) 
#     X = keras.layers.MaxPooling2D(pool_size=(2, 2))(X)
#     X = keras.layers.BatchNormalization(axis = 3, name = 'bn3')(X)
#     X = keras.layers.Activation('relu')(X)    
#     print(X.shape)

    X = keras.layers.Flatten()(X)
#     print(X.shape)
    X = keras.layers.Dropout(0.5)(X) 
#     X = keras.layers.Dense(500, activation='sigmoid')(X)
#     X = keras.layers.Dropout(0.5)(X) 
    X = keras.layers.Dense(200, activation='sigmoid')(X)
    X = keras.layers.Dropout(0.5)(X) 
#     X = keras.layers.Dense(100, activation='sigmoid')(X)
#     X = keras.layers.Dropout(0.5)(X) 
    X = keras.layers.Dense(50, activation='sigmoid')(X)
    X = keras.layers.Dense(1, activation='sigmoid')(X)
#     X = keras.layers.Activation('sigmoid')(X)
    print(X.shape)
    model = keras.models.Model(inputs=X_input, outputs=X, name='basic')
    return model

In [276]:
m1 = model1(input_shape=(512, 88, 2))
m1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['binary_accuracy'])

(?, 512, 88, 2)
(?, 64, 1, 88)
(?, 1)
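The `(?, 64, 1, 88)` shape after `Conv0` follows from the usual output-length formula for an unpadded ("valid") convolution, `(length + padding - kernel) // stride + 1`, applied per axis. A small check, where `conv_output_length` is a hypothetical helper written for this note:

```python
def conv_output_length(length, kernel, stride, pad_total=0):
    """Output length of a valid convolution after pad_total zeros of padding."""
    return (length + pad_total - kernel) // stride + 1

# Time axis: 512 steps, ZeroPadding2D adds 8 rows on each side (16 total),
# then a kernel of 17 with stride 8
time_out = conv_output_length(512, kernel=17, stride=8, pad_total=16)
# Pitch axis: 88 keys, no padding, the kernel spans all 88 keys, stride 1
pitch_out = conv_output_length(88, kernel=88, stride=1)

print(time_out, pitch_out)  # 64 1
```

The last dimension, 88, is simply the number of filters. Because the kernel covers the whole keyboard, each filter sees every pitch at once and slides only along time.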


In [277]:
m1.fit(X_train, Y_train, epochs = 5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5



In [278]:
preds = m1.evaluate(X_test, Y_test)
print ("Loss = " + str(preds[0]))
print ("Test Accuracy = " + str(preds[1]))

Loss = 0.3913348676445328
Test Accuracy = 0.85046729250489
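`evaluate` returns the loss followed by each compiled metric, here binary cross-entropy and binary accuracy. As a rough NumPy sketch of what those two numbers measure (toy labels and predictions, not our test set; Keras' own implementation also clips predictions and averages over batches, so treat this as an approximation):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample loss
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # A prediction counts as class 1 (jazz) when it exceeds the threshold
    return float(np.mean((y_pred > threshold) == y_true))

# Toy values for illustration only
y_true = np.array([[0], [1], [1], [0]])
y_pred = np.array([[0.1], [0.9], [0.4], [0.2]])

bce = binary_crossentropy(y_true, y_pred)
acc = binary_accuracy(y_true, y_pred)
print(round(bce, 4), acc)  # 0.3375 0.75
```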


### Residual Network

We are also experimenting with residual networks, which would let us train deeper models. A deeper model gives us more layers whose activations we can compare when looking for the ones with the best potential to classify genres.
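The core idea of a residual block is that it learns a correction `F(x)` and adds it back to its input, `y = relu(x + F(x))`, so stacking many blocks does not make it harder for the signal to pass through. The sketch below illustrates this in plain NumPy, using two dense layers as a stand-in for the convolutions we would actually use (in Keras the skip connection would typically be a `keras.layers.Add()`); `residual_block` and its weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = relu(x + F(x)), where F is two dense layers (toy stand-in for convs)."""
    h = relu(x @ W1)
    return relu(x + h @ W2)  # skip connection adds the input back

d = 8
x = rng.standard_normal((2, d))     # batch of 2 feature vectors
W1 = rng.standard_normal((d, d)) * 0.1
W2 = np.zeros((d, d))               # residual branch that outputs 0

y = residual_block(x, W1, W2)
# With F(x) = 0 the block reduces to relu(x): the identity path passes through
print(np.allclose(y, relu(x)))  # True
```

Because the output has the same shape as the input, blocks like this can be chained to make the network deeper without changing the feature dimensions.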

## Feature Extraction

Now that the model is trained and performs fairly well (about 85% accuracy on the held-out test set), we can reuse the trained weights of its layers to build the same model with different outputs: one for each intermediate layer. The following code is adapted from [StackOverflow](https://stackoverflow.com/questions/41711190/keras-how-to-get-the-output-of-each-layer).
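Conceptually, exposing every layer's output just means running the forward pass once and keeping each intermediate result instead of only the last one. The sketch below shows that idea in plain NumPy with a hypothetical two-layer network (`forward_with_activations`, the weights and the layer list are made up for illustration). It is not the Keras mechanism itself, which builds a backend function from the `layer.output` tensors.

```python
import numpy as np

def forward_with_activations(x, layers):
    """Run x through a list of layer functions, keeping every intermediate output."""
    activations = []
    for layer in layers:
        x = layer(x)
        activations.append(x)
    return activations

rng = np.random.default_rng(0)
W1 = rng.standard_normal((6, 4))
W2 = rng.standard_normal((4, 1))
layers = [
    lambda x: np.maximum(x @ W1, 0.0),          # hidden layer (ReLU)
    lambda x: 1.0 / (1.0 + np.exp(-(x @ W2))),  # output layer (sigmoid)
]

batch = rng.standard_normal((3, 6))  # 3 toy "songs", 6 inputs each
acts = forward_with_activations(batch, layers)
print([a.shape for a in acts])  # [(3, 4), (3, 1)]
```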

#### Experiments

First we define a function f that will output the activations of each layer of our model.

In [None]:
outputs = [layer.output for layer in m1.layers]
# Inputs: the model input plus the learning-phase flag (0 = test, 1 = train)
f = keras.backend.function([m1.input, keras.backend.learning_phase()], outputs)

In [None]:
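# Learning phase 0 (test mode) turns dropout off and makes batch norm use its
# moving statistics; X_train[:1] keeps the batch dimension
example = X_train[:1]
layer_outs = f([example, 0.])
print([out.shape for out in layer_outs])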
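To use one layer's activations as features, each song's activation tensor can be flattened into a row, giving an `(n_samples, n_features)` matrix for a downstream classifier. A minimal sketch, assuming each entry of `layer_outs` is a NumPy array whose first axis is the batch; `layer_features` is a hypothetical helper and the random array only stands in for the `Conv0` output shape reported earlier:

```python
import numpy as np

def layer_features(layer_out):
    """Flatten one layer's batch of activations into (n_samples, n_features)."""
    layer_out = np.asarray(layer_out)
    return layer_out.reshape(layer_out.shape[0], -1)

# Random stand-in with the Conv0 output shape (batch, 64, 1, 88)
fake_conv_out = np.random.rand(5, 64, 1, 88)
feats = layer_features(fake_conv_out)
print(feats.shape)  # (5, 5632)
```

Repeating this for each entry of `layer_outs` gives one candidate feature set per layer, which is what lets us compare layers by how well their activations separate the genres.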