# Feature Extraction using CNN

Classifying music directly from its MIDI matrix is not well grounded. Instead, we classify music by first extracting features with a CNN. Concretely, we train a CNN that accurately distinguishes jazz from classical music and use the activations at each layer as features for each piece. With such a network we are able to "extract" features.

In [None]:
import os
import mido
import keras
import numpy as np
import sklearn.model_selection as ms
import StyleNet.midi_util as midi_util
import matplotlib.pyplot as plt

## Getting Data

We reuse some of the code that was used for the original [paper](https://arxiv.org/pdf/1708.03535.pdf). Instead of keeping the velocity matrix separate, we multiply it into our notes matrix, and we label each piece 0 for classical and 1 for jazz.
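To make the velocity step concrete, here is a small sketch of the same slicing and stacking that `convert_midis` performs, run on a toy array instead of a real MIDI file. It assumes, as the `::2` / `1::2` slicing implies, that the one-hot array has 176 columns with note columns at even indices and continuation columns at odd indices, and that velocities form a `(T, 88)` array; `to_note_tensor` is a hypothetical helper, not part of StyleNet.

```python
import numpy as np

def to_note_tensor(midi_array, velocity_array, n_steps=1024):
    """Hypothetical helper mirroring convert_midis (not the StyleNet code itself).

    Assumes midi_array has shape (T, 176) with note columns at even indices and
    continuation columns at odd indices, and velocity_array has shape (T, 88).
    """
    if midi_array.shape[0] < n_steps:
        return None  # too short: convert_midis skips these pieces
    notes = midi_array[:n_steps, ::2] * velocity_array[:n_steps]  # velocity folded into notes
    continuation = midi_array[:n_steps, 1::2]
    return np.dstack((notes, continuation))  # shape (n_steps, 88, 2)

# Toy input: key 0 struck at step 0 with velocity 0.8, then held for 3 steps.
T = 1024
midi = np.zeros((T, 176))
vel = np.zeros((T, 88))
midi[0, 0] = 1      # note column for key 0
midi[1:4, 1] = 1    # continuation column for key 0
vel[0, 0] = 0.8
x = to_note_tensor(midi, vel)
print(x.shape)                  # (1024, 88, 2)
print(x[0, 0, 0], x[2, 0, 1])   # 0.8 1.0
```

The result is a two-channel "image" per piece, which is why the models below take `input_shape=(1024, 88, 2)`.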

In [None]:
def load_midis(base_fpath):
    fpaths = []
    for (root, dirnames, filenames) in os.walk(base_fpath):
        fpaths += [os.path.join(root, filename) for filename in filenames]
    return [mido.MidiFile(fpath) for fpath in fpaths]

def convert_midis(midis, label):
    """
        midis = a list of MidiFiles we are trying to parse
        label = 0 or 1 integer used to generate the label matrix
    """
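    X = []
    Y = []
    for midi in midis:
        try:
            midi_array, velocity_array = midi_util.midi_to_array_one_hot(midi, 4)
        except Exception:
            # skip files that the StyleNet parser cannot handle
            continue
        if midi_array.shape[0] < 1024:
            # keep only pieces with at least 1024 time steps so every sample has the same shape
            continue
        # even columns: note events, scaled by velocity; odd columns: note continuation
        midi_notes = midi_array[:1024, ::2] * velocity_array[:1024]
        midi_continuation = midi_array[:1024, 1::2]
        # stack into a (1024, 88, 2) array with one channel per matrix
        X_i = np.dstack((midi_notes, midi_continuation))
        X += [X_i]
        Y += [[label]]
    return np.array(X), np.array(Y)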

In [None]:
classical_midi = load_midis("TPD/classical")
jazz_midi = load_midis("TPD/jazz")

In [None]:
classical, classical_label = convert_midis(classical_midi, 0)

In [None]:
jazz, jazz_label = convert_midis(jazz_midi, 1)

In [None]:
X = np.vstack((classical, jazz))
Y = np.vstack((classical_label, jazz_label))
assert X.shape[0] == Y.shape[0]
inds = np.arange(X.shape[0])
np.random.shuffle(inds)
X, Y = X[inds], Y[inds]

In [None]:
np.save("matricies/X.npy", X)
np.save("matricies/Y.npy", Y)
print(X.shape, Y.shape)

## Loading data

The data has already been preprocessed into matrices, so we load them directly.

In [None]:
X = np.load("matricies/X.npy")
Y = np.load("matricies/Y.npy")
print(X.shape, Y.shape)

In [None]:
X_train, X_test, Y_train, Y_test = ms.train_test_split(X, Y, test_size=0.2, random_state=43)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

## Neural Network Modeling

Next we want to find which neural network architecture classifies best, so that we can use its activations as feature vectors. We experiment with several architectures before settling on a single network to use as the feature extractor.

#### Trial 1

First we use a single convolution layer followed by max pooling, intended to reduce the number of weights needed in the dense layers. Then we add 4 dense layers of sizes (500, 200, 100, 50). The activation of each dense layer could be used as features, and we can explore which level of activation works best as a feature for classification.
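A quick parameter count helps check how much the convolution actually saves. The sketch below uses the standard Keras counting rules (Conv2D: `kh*kw*in_ch*filters + filters`; Dense: `n_in*n_out + n_out`) and the shapes from `model1`: `padding='same'` with stride 1 keeps the 1024 × 88 grid, and `MaxPooling2D((4, 2))` then divides it by 8. The helper names are illustrative only.

```python
def conv_params(kh, kw, in_ch, filters):
    # Conv2D: one kh x kw x in_ch kernel per filter, plus one bias per filter
    return kh * kw * in_ch * filters + filters

def dense_params(n_in, n_out):
    # Dense: full n_in x n_out weight matrix plus n_out biases
    return n_in * n_out + n_out

raw_flat = 1024 * 88 * 2                     # flattening the input directly
pooled_flat = (1024 // 4) * (88 // 2) * 30   # after Conv0 (30 filters) and (4, 2) pooling

print(conv_params(10, 5, 2, 30))        # 3030
print(raw_flat, pooled_flat)            # 180224 337920
print(dense_params(pooled_flat, 500))   # 168960500
print(dense_params(raw_flat, 500))      # 90112500
```

Pooling shrinks each feature map 8×, but going from 2 input channels to 30 filters multiplies the depth by 15, so with these settings the vector entering `Dense(500)` is about 1.9× larger than the raw input. Fewer filters (under 16) or coarser pooling would be needed for the convolution to reduce the dense-layer weights.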

In [None]:
def model1(input_shape):
    X_input = keras.layers.Input(input_shape)
    print(X_input.shape)
    X = X_input
    X = keras.layers.Conv2D(filters=30, kernel_size=(10, 5), strides=(1, 1), padding='same', 
                            name='Conv0',
                            kernel_initializer=keras.initializers.glorot_normal(seed=None),
                            bias_initializer=keras.initializers.glorot_normal(seed=None),
                            data_format="channels_last")(X)
    X = keras.layers.Dropout(0.5)(X)    
    X = keras.layers.MaxPooling2D(pool_size=(4, 2))(X)
    X = keras.layers.BatchNormalization(axis = 3, name = 'bn0')(X)
    X = keras.layers.Activation('relu')(X)
    print(X.shape)

    X = keras.layers.Flatten()(X)
    X = keras.layers.Dropout(0.5)(X) 
    X = keras.layers.Dense(500, activation='sigmoid')(X)
    X = keras.layers.Dropout(0.5)(X) 
    X = keras.layers.Dense(200, activation='sigmoid')(X)
    X = keras.layers.Dropout(0.5)(X) 
    X = keras.layers.Dense(100, activation='sigmoid')(X)
    X = keras.layers.Dropout(0.5)(X) 
    X = keras.layers.Dense(50, activation='sigmoid')(X)
    X = keras.layers.Dense(1, activation='sigmoid')(X)
    print(X.shape)
    model = keras.models.Model(inputs=X_input, outputs=X, name='basic')
    return model

In [None]:
m1 = model1(input_shape=(1024, 88, 2))
m1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['binary_accuracy'])

In [None]:
m1.fit(X_train, Y_train, epochs = 25)

In [108]:
preds = m1.evaluate(X_test, Y_test)
print ("Loss = " + str(preds[0]))
print ("Test Accuracy = " + str(preds[1]))

Loss = 0.2622504429282429
Test Accuracy = 0.9252336470880241


In [None]:
m1.save("/Users/haojun/Downloads/m1.h5")

In [107]:
m1 = keras.models.load_model('/Users/haojun/Downloads/m1.h5')

#### Trial 2

Trial 2 has 2 convolution layers and one fully connected layer before the output layer. We can explore whether the output of each of these 3 layers could be used as a feature extractor.
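To see what each of those three layers produces, the sketch below traces the feature-map shape through `model2`'s two conv + pool stages. It assumes, as in the code, `padding='same'` with stride 1 (convolution keeps the spatial size) and Keras' default `'valid'` pooling with stride equal to the pool size; `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(h, w, stages):
    """Trace (height, width, channels) through conv + max-pool stages.

    Each stage is (filters, pool_h, pool_w). Assumes 'same' stride-1 convolution
    and 'valid' pooling whose stride equals the pool size.
    """
    shapes = []
    for filters, ph, pw in stages:
        h, w = h // ph, w // pw   # pooling divides the grid; conv only changes depth
        shapes.append((h, w, filters))
    return shapes

# Trial 2: Conv0 (10 filters) -> pool (4, 2), then Conv1 (50 filters) -> pool (4, 2)
shapes = trace_shapes(1024, 88, [(10, 4, 2), (50, 4, 2)])
h, w, c = shapes[-1]
flat = h * w * c
print(shapes)             # [(256, 44, 10), (64, 22, 50)]
print(flat)               # 70400
print(flat * 500 + 500)   # 35200500 weights and biases in Dense(500)
```

For comparison, Trial 1 flattens 256 × 44 × 30 = 337,920 values, so the second conv/pool stage cuts the input to the dense layer by roughly a factor of 4.8.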

In [None]:
def model2(input_shape):
    X_input = keras.layers.Input(input_shape)
    print(X_input.shape)
    X = X_input
    X = keras.layers.Conv2D(filters=10, kernel_size=(10, 5), strides=(1, 1), padding='same', 
                            name='Conv0',
                            kernel_initializer=keras.initializers.glorot_normal(seed=None),
                            bias_initializer=keras.initializers.glorot_normal(seed=None),
                            data_format="channels_last")(X)
    X = keras.layers.Dropout(0.5)(X)    
    X = keras.layers.MaxPooling2D(pool_size=(4, 2))(X)
    X = keras.layers.BatchNormalization(axis = 3, name = 'bn0')(X)
    X = keras.layers.Activation('tanh')(X)
    print(X.shape)
    
    X = keras.layers.Conv2D(filters=50, kernel_size=(5, 3), strides=(1, 1), padding='same', 
                            name='Conv1',
                            kernel_initializer=keras.initializers.glorot_normal(seed=None),
                            bias_initializer=keras.initializers.glorot_normal(seed=None),
                            data_format="channels_last")(X)
    X = keras.layers.Dropout(0.5)(X)    
    X = keras.layers.MaxPooling2D(pool_size=(4, 2))(X)
    X = keras.layers.BatchNormalization(axis = 3, name = 'bn1')(X)
    X = keras.layers.Activation('tanh')(X)
    print(X.shape)
    
    X = keras.layers.Flatten()(X)
    X = keras.layers.Dense(500, activation='sigmoid')(X)
    X = keras.layers.Dense(1, activation='sigmoid')(X)
    print(X.shape)
    model = keras.models.Model(inputs=X_input, outputs=X, name='basic')
    return model

In [None]:
m2 = model2(input_shape=(1024, 88, 2))
m2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['binary_accuracy'])

In [105]:
m2.fit(X_train, Y_train, epochs = 10)

Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1a9b01e90>

In [106]:
preds2 = m2.evaluate(X_test, Y_test)
print ("Loss = " + str(preds2[0]))
print ("Test Accuracy = " + str(preds2[1]))

Loss = 0.5798406311284716
Test Accuracy = 0.6728971979328405


This network is overfitting badly, so I'll try reducing its complexity to prevent that.

### Residual Network

Another thing we are experimenting with is whether residual networks let us train deeper models, since we want to see which layer's activations have the most potential for classifying genres.
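The defining feature of a residual block is the identity shortcut: the block computes `relu(x + F(x))` instead of `relu(F(x))`, so a block whose branch learns nothing useful still passes its input through, which makes deep stacks easier to train. Below is a minimal NumPy sketch of one block's forward pass, using dense layers in place of convolutions for brevity; the sizes and weights are arbitrary. In Keras the shortcut would typically be merged with `keras.layers.Add`.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Forward pass of a basic residual block: relu(x + F(x)).

    F is a two-layer transform (w1, then w2); the identity shortcut requires
    F(x) to have the same shape as x.
    """
    f = relu(x @ w1) @ w2   # residual branch F(x)
    return relu(x + f)      # add the identity shortcut, then activate

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                 # batch of 4 feature vectors
w1 = rng.normal(scale=0.1, size=(16, 32))
w2 = rng.normal(scale=0.1, size=(32, 16))

y = residual_block(x, w1, w2)
print(y.shape)  # (4, 16)

# With zero weights the branch contributes nothing and the block reduces to relu(x).
print(np.allclose(residual_block(x, np.zeros((16, 32)), np.zeros((32, 16))), relu(x)))  # True
```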

## Feature Extraction

Now that we have trained a model that performs fairly well, we can reuse its trained weights to build the same model with different layers as outputs. The following code is taken from [StackOverflow](https://stackoverflow.com/questions/41711190/keras-how-to-get-the-output-of-each-layer) with modifications.
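The idea is that the weights stay fixed and only the set of returned tensors changes. As an illustration, here is a NumPy sketch of a small sigmoid network (toy sizes, random weights, not the trained `m1`) whose forward pass returns every hidden layer's output alongside the prediction. With the real Keras model, an equivalent alternative to `keras.backend.function` is `keras.models.Model(inputs=model.input, outputs=[...])` listing the intermediate layer outputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_with_activations(x, weights):
    """Run a small dense network and keep every hidden layer's output.

    weights is a list of (W, b) pairs; the last pair is the output layer.
    Returns (prediction, [activation of each hidden layer]).
    """
    activations = []
    h = x
    for W, b in weights[:-1]:
        h = sigmoid(h @ W + b)
        activations.append(h)   # each of these can serve as a feature vector
    W_out, b_out = weights[-1]
    return sigmoid(h @ W_out + b_out), activations

rng = np.random.default_rng(0)
sizes = [20, 8, 4, 1]  # toy stand-in for (..., 500, 200, 100, 50, 1)
weights = [(rng.normal(size=(m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=(5, 20))
pred, acts = forward_with_activations(x, weights)
print(pred.shape, [a.shape for a in acts])  # (5, 1) [(5, 8), (5, 4)]
```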

#### Experiments

First we define a function `f` that outputs the activations of selected layers of our model. The following cell selects which model we use for the rest of the notebook.

In [109]:
model = m1

Inspect the layers visually and select the ones we want to use as feature extractors.

In [112]:
model.layers

[<keras.engine.input_layer.InputLayer at 0x1a9b0ee90>,
 <keras.layers.convolutional.Conv2D at 0x1a9b0e4d0>,
 <keras.layers.core.Dropout at 0x1a9b0e290>,
 <keras.layers.pooling.MaxPooling2D at 0x1a9b0e2d0>,
 <keras.layers.normalization.BatchNormalization at 0x1a9b0e050>,
 <keras.layers.core.Activation at 0x1a5ff0390>,
 <keras.layers.core.Flatten at 0x1a9b34050>,
 <keras.layers.core.Dropout at 0x1a9b34350>,
 <keras.layers.core.Dense at 0x1a9b34510>,
 <keras.layers.core.Dropout at 0x1a9b34610>,
 <keras.layers.core.Dense at 0x1a9b345d0>,
 <keras.layers.core.Dropout at 0x1a9b34710>,
 <keras.layers.core.Dense at 0x1a9b34890>,
 <keras.layers.core.Dropout at 0x1a9b34990>,
 <keras.layers.core.Dense at 0x1a9b349d0>,
 <keras.layers.core.Dense at 0x1a9b34ad0>]
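The indices `[5, 8, 10, 12, 14]` used next pick out the post-convolution `Activation` and every `Dense` layer except the 1-unit output. Hard-coded indices break silently if the architecture changes, so here is a sketch that derives them from layer types instead. The list below is copied from the listing above; in a live session it could be built with `[type(layer).__name__ for layer in model.layers]`. `feature_layer_indices` is a hypothetical helper.

```python
# Class names of m1's layers, copied from the model.layers listing above.
layer_types = [
    "InputLayer", "Conv2D", "Dropout", "MaxPooling2D", "BatchNormalization",
    "Activation", "Flatten", "Dropout", "Dense", "Dropout", "Dense",
    "Dropout", "Dense", "Dropout", "Dense", "Dense",
]

def feature_layer_indices(types):
    """Indices of Activation layers and of every Dense layer except the final output."""
    dense = [i for i, t in enumerate(types) if t == "Dense"][:-1]  # drop the 1-unit output layer
    act = [i for i, t in enumerate(types) if t == "Activation"]
    return sorted(act + dense)

print(feature_layer_indices(layer_types))  # [5, 8, 10, 12, 14]
```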

In [118]:
outputs = [model.layers[layer].output for layer in [5, 8, 10, 12, 14]]
f = keras.backend.function([model.input, keras.backend.learning_phase()], outputs)

In [143]:
def extract_all_features(X, model, layer_nums):
    """Extract all the features by taking out the activation output of the layers specified in layer_nums
    
        X : the data that we are trying to extract features from
        model: the model that we are using to extract the features
        layer_nums: the layer number that we want to use as feature extractors
    """
    outputs = [model.layers[layer].output for layer in layer_nums]
    f = keras.backend.function([model.input, keras.backend.learning_phase()], outputs)
    layer_outs = f([X, 0.])  # learning phase 0 = inference mode, so Dropout is disabled
    return layer_outs

In [144]:
all_layer_outs = extract_all_features(X, model, [5, 8, 10, 12, 14])

In [152]:
conv, dense500, dense200, dense100, dense50 = all_layer_outs
conv = conv.reshape(conv.shape[0], -1)

In [158]:
# np.savetxt("/Users/haojun/Downloads/conv.txt", conv)
np.savetxt("/Users/haojun/Downloads/dense500.txt", dense500)
np.savetxt("/Users/haojun/Downloads/dense200.txt", dense200)
np.savetxt("/Users/haojun/Downloads/dense100.txt", dense100)
np.savetxt("/Users/haojun/Downloads/dense50.txt", dense50)
np.savetxt("/Users/haojun/Downloads/label.txt", Y)

## Supervised Learning Classifications (Exploration Phase)

We now move on to classic supervised learning algorithms, using each layer's activation output as our feature vector.
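As a starting point, a plain logistic regression trained by batch gradient descent is one such classic algorithm. The sketch below runs it on synthetic data standing in for one activation matrix (for example the 50-column `dense50` features); with the real files one would load the matrix with `np.loadtxt` and flatten the labels with `Y.ravel()`, since `Y` here has shape `(n, 1)`. The data, learning rate, and epoch count are illustrative assumptions, not results from this notebook.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by plain batch gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1 (jazz)
        grad = p - y                             # derivative of the log loss w.r.t. the logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

# Synthetic stand-in for one activation matrix and its 0/1 labels.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
feats = rng.normal(size=(200, 50)) + y[:, None] * 1.0  # class 1 shifted by +1 in every dimension

# Standardise with training statistics, fit on 80%, score on the held-out 20%.
idx = rng.permutation(200)
tr, te = idx[:160], idx[160:]
mu, sd = feats[tr].mean(0), feats[tr].std(0)
Xs = (feats - mu) / sd
w, b = train_logreg(Xs[tr], y[tr])
acc = (predict(Xs[te], w, b) == y[te]).mean()
print(f"test accuracy: {acc:.2f}")
```

Because the features were extracted by a network trained on part of the same data, a fair comparison would fit and score these classifiers on the same train/test split used for the CNN.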