## CNN Model

The model I’ve created is as follows:

- The input is an 84 (frequency bins) x 9 (time frames) CQT image, representing 200 ms of isolated acoustic guitar audio.
- I’ve added three convolutional layers, each with a filter size of 3 x 3. The first convolutional layer has 32 filters, and the latter two each have 64. Each convolution is immediately followed by a Rectified Linear Unit (ReLU) activation.
- The feature maps are then subsampled by a max pooling layer. Both the filter size and the stride for this operation are 2 x 2.
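- The result is then flattened and fed into six parallel fully connected heads, one per string. Each head has a dense layer of dimension 128 with a ReLU activation (followed by dropout with rate 0.5) and a second dense layer of dimension 126 with no activation (followed by dropout with rate 0.2).
- Finally, each head ends in a 21-unit dense layer with a softmax activation. The six outputs together form a 6 x 21 prediction: one probability distribution over 21 classes for each of the 6 guitar strings, where the classes are "string not played" plus the open string and frets 1 to 19.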
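To make the 6 x 21 output concrete, here is a small NumPy sketch that is independent of the trained model. Six separate 21-way softmaxes compute the same thing as reshaping a 126-value vector to 6 x 21 and applying a softmax along each row; taking the argmax of each row then gives one class per string. The helper names and the example logits (an open A minor shape) are hypothetical, and the class-to-fret mapping assumes the ordering of the `notesInString` table used later in the notebook (strings from low E to high E, class 0 = not played, class 1 = open string).

```python
import numpy as np


def rowwise_softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over the last axis (the 21 classes of each string)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def decode_frets(logits: np.ndarray) -> list:
    """Hypothetical helper: 126 logits -> one fret per string (None = not played)."""
    probs = rowwise_softmax(logits.reshape(6, 21))  # 6 strings x 21 classes
    classes = probs.argmax(axis=-1)
    # Class 0 = string not played; class k >= 1 = fret k - 1 (fret 0 is the open string)
    return [None if c == 0 else int(c) - 1 for c in classes]


# Hypothetical logits for an open A minor chord, strings ordered low E to high E
target_classes = [0, 1, 3, 3, 2, 1]
logits = np.full(126, -5.0)
for string, cls in enumerate(target_classes):
    logits[string * 21 + cls] = 5.0

probs = rowwise_softmax(logits.reshape(6, 21))
print(probs.sum(axis=-1))    # each string's distribution sums to 1
print(decode_frets(logits))  # [None, 0, 2, 2, 1, 0]
```

Decoding by per-string argmax means each string can sound at most one note at a time, which matches how a guitar string physically behaves.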

```python
import librosa as _librosa
import librosa.display as _display
from presets import Preset

import numpy as np
import matplotlib.pyplot as plt

from os import listdir, mkdir, makedirs
from os.path import isfile, join, isdir

import pickle

import jams

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Wrap librosa with Preset so its default parameters can be overridden globally
librosa = Preset(_librosa)
librosa.display = _display

duration = 0.2  # length of each CQT excerpt in seconds (200 ms)
```


```python
# Load the CQT images into memory
print('Loading training data...')
with open('data/x_train.data', 'rb') as f:
    x_train = pickle.load(f)
print('Done')
```

```
Loading training data...
Done
```


```python
# Load ground-truth labels.
# notesInString[s][k] names class k on string s (strings ordered low E to high E):
# class 0 = string not played, classes 1-20 = open string and frets 1-19
notesInString = [['e string OFF', 'E2', 'F2', 'F#2', 'G2', 'G#2', 'A2', 'A#2', 'B2',
                  'C3', 'C#3', 'D3', 'D#3', 'E3', 'F3', 'F#3', 'G3', 'G#3', 'A3', 'A#3', 'B3'],
                 ['A string OFF', 'A2', 'A#2', 'B2', 'C3', 'C#3', 'D3', 'D#3', 'E3',
                  'F3', 'F#3', 'G3', 'G#3', 'A3', 'A#3', 'B3', 'C4', 'C#4', 'D4', 'D#4', 'E4'],
                 ['D string OFF', 'D3', 'D#3', 'E3', 'F3', 'F#3', 'G3', 'G#3', 'A3',
                  'A#3', 'B3', 'C4', 'C#4', 'D4', 'D#4', 'E4', 'F4', 'F#4', 'G4', 'G#4', 'A4'],
                 ['G string OFF', 'G3', 'G#3', 'A3', 'A#3', 'B3', 'C4', 'C#4', 'D4',
                  'D#4', 'E4', 'F4', 'F#4', 'G4', 'G#4', 'A4', 'A#4', 'B4', 'C5', 'C#5', 'D5'],
                 ['B string OFF', 'B3', 'C4', 'C#4', 'D4', 'D#4', 'E4', 'F4', 'F#4',
                  'G4', 'G#4', 'A4', 'A#4', 'B4', 'C5', 'C#5', 'D5', 'D#5', 'E5', 'F5', 'F#5'],
                 ['E string OFF', 'E4', 'F4', 'F#4', 'G4', 'G#4', 'A4', 'A#4', 'B4',
                  'C5', 'C#5', 'D5', 'D#5', 'E5', 'F5', 'F#5', 'G5', 'G#5', 'A5', 'A#5', 'B5']]

print('Loading training data...')
with open('data/y_train.data', 'rb') as f:
    y_train = pickle.load(f)

print('Done')
```

```
Loading training data...
Done
```
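Each row of `notesInString` follows a regular pattern: class 0 means the string is not played, and class k (1 to 20) is the note k - 1 semitones above the open string. As a cross-check, the sketch below regenerates the same table from the MIDI pitches of standard tuning (E2 = 40, A2 = 45, D3 = 50, G3 = 55, B3 = 59, E4 = 64). The constants and function names are hypothetical and not part of the original notebook.

```python
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

# Standard tuning, low to high, as (label used in notesInString, MIDI pitch of open string)
OPEN_STRINGS = [('e', 40), ('A', 45), ('D', 50), ('G', 55), ('B', 59), ('E', 64)]


def midi_to_name(pitch: int) -> str:
    """MIDI pitch -> scientific pitch name, e.g. 60 -> 'C4'."""
    return NOTE_NAMES[pitch % 12] + str(pitch // 12 - 1)


def build_note_table(n_positions: int = 20) -> list:
    """Class 0 = string OFF; class k >= 1 = open pitch + (k - 1) semitones.

    n_positions = 20 covers the open string plus frets 1 to 19.
    """
    return [[f'{label} string OFF'] + [midi_to_name(open_pitch + k) for k in range(n_positions)]
            for label, open_pitch in OPEN_STRINGS]


table = build_note_table()
print(len(table), len(table[0]))                 # 6 21
print(table[0][1], table[0][-1], table[5][-1])   # E2 B3 B5
```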


```python
print('Creating one-hot-encoding array')
# One (n_samples, 21) one-hot matrix per string
y_train_one_hot = [tf.keras.utils.to_categorical(labels, num_classes=21) for labels in y_train]

print('Done')
print(len(x_train))
```

```
Creating one-hot-encoding array
Done
52598
```
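`to_categorical` turns each integer class label into a row of 21 zeros with a single 1 at the label's index. A NumPy-only analogue, handy for checking shapes without TensorFlow, is to index an identity matrix with the label array. The labels below are made up for illustration.

```python
import numpy as np

num_classes = 21
labels = np.array([0, 1, 3, 20])  # hypothetical class indices for one string

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

print(one_hot.shape)                    # (4, 21)
print(one_hot.argmax(axis=1).tolist())  # [0, 1, 3, 20]
```

Taking the argmax of each row inverts the encoding, which is also how the softmax outputs are turned back into class indices.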


```python
for labels in y_train_one_hot:
    print(labels.shape)
```

```
(52598, 21)
(52598, 21)
(52598, 21)
(52598, 21)
(52598, 21)
(52598, 21)
```


```python
# Contiguous (unshuffled) split: the first 46,000 samples for training and the
# next 4,000 for validation; samples from index 50,000 onward are not used
trainSplit = 46000
testSplit = 46000
testEnd = 50000

X_train = x_train[0:trainSplit]

e_train = y_train_one_hot[0][0:trainSplit]
A_train = y_train_one_hot[1][0:trainSplit]
D_train = y_train_one_hot[2][0:trainSplit]
G_train = y_train_one_hot[3][0:trainSplit]
B_train = y_train_one_hot[4][0:trainSplit]
E_train = y_train_one_hot[5][0:trainSplit]

X_test = x_train[testSplit:testEnd]

e_test = y_train_one_hot[0][testSplit:testEnd]
A_test = y_train_one_hot[1][testSplit:testEnd]
D_test = y_train_one_hot[2][testSplit:testEnd]
G_test = y_train_one_hot[3][testSplit:testEnd]
B_test = y_train_one_hot[4][testSplit:testEnd]
E_test = y_train_one_hot[5][testSplit:testEnd]

batch_size = 32

# Add a trailing channel axis: (samples, 84 CQT bins, 9 frames, 1)
X_train = np.reshape(X_train, (len(X_train), 84, 9, 1))
X_test = np.reshape(X_test, (len(X_test), 84, 9, 1))

input_shape = X_train.shape[1:]
print('Input shape: ', input_shape)

# Optimizer: SGD with inverse-time learning-rate decay applied per update step,
# lr = learning_rate / (1 + decay * step), as the legacy `decay` argument did
epochs = 30
learning_rate = 0.01
momentum = 0.8
decay = learning_rate / epochs
lr_schedule = keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=learning_rate, decay_steps=1, decay_rate=decay)
sgd = keras.optimizers.SGD(learning_rate=lr_schedule, momentum=momentum, nesterov=False)

# Model definition (Functional API): shared convolutional trunk
model_in = keras.Input(shape=input_shape)
conv1 = Conv2D(32, kernel_size=(3, 3), activation='relu')(model_in)
conv2 = Conv2D(64, kernel_size=(3, 3), activation='relu')(conv1)
conv3 = Conv2D(64, kernel_size=(3, 3), activation='relu')(conv2)
pool1 = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(conv3)
flat = Flatten()(pool1)

# One fully connected head per string, each ending in a 21-way softmax output
string_names = ['estring', 'Astring', 'Dstring', 'Gstring', 'Bstring', 'Estring']
outputs = []
for name in string_names:
    y = Dense(128, activation='relu')(flat)
    y = Dropout(0.5)(y)
    y = Dense(126)(y)
    y = Dropout(0.2)(y)
    outputs.append(Dense(21, activation='softmax', name=name)(y))

# Create the model: one categorical cross-entropy loss per string output
model = keras.Model(inputs=model_in, outputs=outputs)
model.compile(optimizer=sgd,
              loss=['categorical_crossentropy'] * len(outputs),
              metrics=['accuracy'])
```

```
Input shape:  (84, 9, 1)
```
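The printed input shape of (84, 9, 1) makes it possible to trace the feature-map sizes by hand. With the Keras default `padding='valid'`, a 3 x 3 convolution with stride 1 shrinks each spatial dimension by 2, and the 2 x 2 max pooling with stride 2 halves it, rounding down. The sketch below does that arithmetic; `out_size` is a hypothetical helper, not a Keras function.

```python
def out_size(size: int, window: int, stride: int = 1) -> int:
    """Output length along one axis for 'valid' padding (no zero-padding)."""
    return (size - window) // stride + 1


h, w = 84, 9                 # CQT bins x time frames
for _ in range(3):           # conv1-conv3: 3 x 3 kernels, stride 1
    h, w = out_size(h, 3), out_size(w, 3)
print('after conv3:', (h, w))                        # after conv3: (78, 3)

h, w = out_size(h, 2, 2), out_size(w, 2, 2)          # 2 x 2 max pooling, stride 2
flat = h * w * 64            # conv3 has 64 filters
print('after pool1:', (h, w), 'flattened:', flat)    # after pool1: (39, 1) flattened: 2496
```

So only a single time step survives the pooling, and the first dense layer of each head receives a 2,496-dimensional vector, which gives 2496 x 128 + 128 = 319,616 parameters per head for that layer alone.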


```python
history = model.fit(X_train, [e_train, A_train, D_train, G_train, B_train, E_train],
                    batch_size=batch_size, epochs=epochs, verbose=1,
                    validation_data=(X_test, [e_test, A_test, D_test, G_test, B_test, E_test]))

model.save('model.k')
```

```
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: model.k/assets
```
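The `decay` setting deserves a closer look. The legacy `decay` argument of Keras SGD (like the `InverseTimeDecay` schedule with `decay_steps=1`) lowers the learning rate after every update step, not every epoch: lr_t = lr_0 / (1 + decay * t). The choice `decay = learning_rate / epochs` reads like a per-epoch heuristic, so it is worth checking what it does per step. The sketch assumes the values from the training cell (46,000 training samples, batch size 32), i.e. ceil(46000 / 32) = 1,438 updates per epoch; `lr_at` is a hypothetical helper.

```python
import math

lr0 = 0.01
epochs = 30
decay = lr0 / epochs                        # as in the notebook, about 3.33e-4
steps_per_epoch = math.ceil(46000 / 32)     # 1438 updates per epoch


def lr_at(step: int) -> float:
    """Inverse-time decay applied per update step: lr0 / (1 + decay * step)."""
    return lr0 / (1 + decay * step)


for epoch in (1, 10, 30):
    step = epoch * steps_per_epoch
    print(f'end of epoch {epoch:2d}: lr = {lr_at(step):.5f}')
```

By this arithmetic the rate halves after 3,000 steps (about 2.1 epochs), is about 0.0068 after the first epoch, and ends near 0.00065, roughly 1/15 of its starting value, by epoch 30.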
