# Audio key estimation of digital music with CNNs

## Data Preprocessing
---

### Million Song Dataset
- utilized to select appropriate song samples
- holds information about key and mode per song (targets)

Jupyter Notebook <a href='./00.hlp/msd/msd.ipynb'>msd</a>

Output: the CSV file *songs_conf=75_tracks_filt.csv*, which holds all songs whose key confidence and mode confidence both exceed 0.75.
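The selection rule can be sketched with pandas as follows. This is only an illustration of the threshold filter, not the code from the msd notebook: the track IDs and confidence values are made up, and `THRESHOLD` is a hypothetical name.

```python
import pandas as pd

# Made-up sample records; only the column names follow the CSV shown in this notebook.
songs = pd.DataFrame({
    'track_id': ['TR_A', 'TR_B', 'TR_C', 'TR_D'],
    'key_confidence': [0.90, 0.80, 0.60, 0.75],
    'mode_confidence': [0.85, 0.70, 0.95, 0.90],
})

THRESHOLD = 0.75  # hypothetical constant name

# Keep a song only if BOTH confidences are strictly above the threshold.
selected = songs[(songs['key_confidence'] > THRESHOLD) &
                 (songs['mode_confidence'] > THRESHOLD)]
print(selected)
```

Because the comparison is strict, a song with a confidence of exactly 0.75 is dropped.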

In [6]:
# LIST SELECTED SONGS
import os
import pandas as pd
from IPython.display import display

selsongsfile = os.path.join ('00.hlp', 'msd', 'songs_conf=75_tracks_filt.csv')
selsongs = pd.read_csv (selsongsfile, header=0, index_col=0)
display (selsongs.head (1))
print ('[i] number of records:', len (selsongs))

| Unnamed: 0 | key | key_confidence | mode | mode_confidence | track_id | song_id | artist_name | song_title |
|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 0.896 | 1 | 0.852 | TRMMMGL128F92FD6AB | SOHSSPG12A8C144BE0 | Clifford T. Ward | Mad About You |


[i] number of records: 47913


In [7]:
# LOAD AUDIO DATASET
import os
import numpy as np
from sklearn import datasets

PARAM_RND_STATE = 42

container_path = os.path.join ('src_audio')
load_content = False
description = ['key C, mode minor', 'key C, mode major',
               'key C#, mode minor', 'key C#, mode major',
               'key D, mode minor', 'key D, mode major',
               'key D#, mode minor', 'key D#, mode major',
               'key E, mode minor', 'key E, mode major',
               'key F, mode minor', 'key F, mode major',
               'key F#, mode minor', 'key F#, mode major',
               'key G, mode minor', 'key G, mode major',
               'key G#, mode minor', 'key G#, mode major',
               'key A, mode minor', 'key A, mode major',
               'key A#, mode minor', 'key A#, mode major',
               'key B, mode minor', 'key B, mode major']

src_audio_data = datasets.load_files (container_path=container_path,
                                      description=description,
                                      load_content=load_content,
                                      random_state=PARAM_RND_STATE)

In [8]:
# FYI: LIST SOME OF THE USED SONGS
filenames = list (os.path.basename (filepath) for filepath in src_audio_data['filenames'])
usedsongs_track_id = list (os.path.splitext (fn)[0] for fn in filenames)
usedsongs = selsongs.query ('track_id in @usedsongs_track_id')

display (usedsongs.sample(5))
print ('[i] min of: key_confidence =', usedsongs['key_confidence'].min (), ',', \
       'mode_confidence =', usedsongs['mode_confidence'].min ())
print ('[i] number of records:', len (usedsongs))

| Unnamed: 0 | key | key_confidence | mode | mode_confidence | track_id | song_id | artist_name | song_title |
|---|---|---|---|---|---|---|---|---|
| 8060 | 5 | 0.982 | 0 | 0.917 | TRCAQKL12903CB7415 | SOSEFLP12AB0189867 | Osiris the Rebirth | Technology |
| 32847 | 7 | 0.934 | 0 | 0.91 | TREDRTV12903D03829 | SOOBTFX12AC3DF8470 | Paul Keeley | Kaleidoscope |
| 44694 | 2 | 1.0 | 1 | 1.0 | TRKBVJU128F92FCAE0 | SOTWGOF12AB017BDAE | Staggered Crossing | Save Me Tonight |
| 36170 | 10 | 1.0 | 0 | 0.963 | TRSUJVV128F147CDC7 | SODJLQI12A6D4F7CCE | Heather Small | Everything's Alright |
| 21037 | 7 | 0.974 | 0 | 1.0 | TRANLXD128F146AA97 | SOFILTW12A6D4F9C31 | Jocelyn Pook | Saffron |


[i] min of: key_confidence = 0.809 , mode_confidence = 0.777
[i] number of records: 240


### Feature Extraction
- create spectrograms of audio files with discrete Fourier transform (DFT)
- save spectrograms as images for further use in CNN

Jupyter Notebook <a href='./00.hlp/fft/fft.ipynb'>fft</a>

Outputs: spectrograms (PNG images) of the audio files, stored in a new container path named *src_spectro* with the same folder structure as *src_audio*.

**Example of a spectrogram image**

<img src ='./src_spectro/7-0/TREDRTV12903D03829.png' align=left>
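The fft notebook itself is not reproduced here, so the following is only a minimal sketch of how a magnitude spectrogram can be built from a windowed DFT. The function name `spectrogram`, the frame length, the hop size, the sample rate and the synthetic 440 Hz test tone are all assumptions made for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one DFT per overlapping, Hann-windowed frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real-valued signal
    return np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (freq_bins, frames)

sr = 8000                               # assumed sample rate in Hz
t = np.arange(sr) / sr                  # one second of samples
tone = np.sin(2 * np.pi * 440 * t)      # synthetic test signal

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sr / 256           # bin index -> frequency in Hz
print(spec.shape, peak_hz)
```

Each column of `spec` is the spectrum of one time frame; rendering the array as an image gives a picture of the kind shown above.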

## Model Preparation
---

### Load and prepare data

In [18]:
# LOAD SPECTROGRAM FILENAMES
import os
import numpy as np
from sklearn import datasets

PARAM_RND_STATE = 42

container_path = os.path.join ('src_spectro')
load_content = False
description = ['key C, mode minor', 'key C, mode major',
               'key C#, mode minor', 'key C#, mode major',
               'key D, mode minor', 'key D, mode major',
               'key D#, mode minor', 'key D#, mode major',
               'key E, mode minor', 'key E, mode major',
               'key F, mode minor', 'key F, mode major',
               'key F#, mode minor', 'key F#, mode major',
               'key G, mode minor', 'key G, mode major',
               'key G#, mode minor', 'key G#, mode major',
               'key A, mode minor', 'key A, mode major',
               'key A#, mode minor', 'key A#, mode major',
               'key B, mode minor', 'key B, mode major']

src_spectro_data = datasets.load_files (container_path=container_path,
                                        description=description,
                                        load_content=load_content,
                                        random_state=PARAM_RND_STATE)
src_spectro_data.keys ()

dict_keys(['DESCR', 'filenames', 'target_names', 'target'])

In [19]:
src_spectro_data['filenames'][0]

'src_spectro/1-0/TRLZZOJ128F1494C12.png'
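Two details of `load_files` are worth keeping in mind here: its `description` argument is free text that ends up in `DESCR`, and the class indices in `target` follow the alphabetically sorted folder names. The sketch below decodes folder names into readable labels. It assumes the folders are named `<key>-<mode>` (as in `1-0` and `7-0` above), that all 24 combinations exist, and that key 0 = C and mode 0 = minor; `decode_folder` is a hypothetical helper.

```python
KEY_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
MODE_NAMES = ['minor', 'major']

def decode_folder(folder_name):
    """Turn a folder name such as '7-0' into 'key G, mode minor'."""
    key, mode = (int(part) for part in folder_name.split('-'))
    return 'key {}, mode {}'.format(KEY_NAMES[key], MODE_NAMES[mode])

# Folder names in alphabetical (string) order, the order load_files uses for class indices.
folders = sorted('{}-{}'.format(k, m) for k in range(12) for m in range(2))
target_names = [decode_folder(f) for f in folders]

print(folders[:6])
print(target_names[4])
```

Because the sort is lexicographic, `10-0` comes before `2-0`, so the class index is not simply `2 * key + mode`.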

### Read images, make tensors
Keras Conv2D layers expect a **4D tensor with shape (batch, rows, cols, channels)** (if param data_format='channels_last') (src: <a href='https://keras.io/layers/convolutional/#conv2d'>Keras Conv2D</a>)
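As a shape-only illustration of that layout, the following sketch stacks a few placeholder arrays into one batch with NumPy. The zero-filled arrays and the batch size of three are made up for the example.

```python
import numpy as np

# Three placeholder grayscale "images": 88 rows, 88 columns, 1 channel each.
images = [np.zeros((88, 88, 1), dtype='float32') for _ in range(3)]

# Add a leading batch axis to each image: (88, 88, 1) -> (1, 88, 88, 1).
batched = [np.expand_dims(img, axis=0) for img in images]

# Stack along the batch axis: three (1, 88, 88, 1) arrays -> (3, 88, 88, 1).
batch = np.vstack(batched)
print(batch.shape)
```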

In [59]:
# open a random image and take a look at the attributes
import numpy as np
from PIL import Image

im = Image.open (src_spectro_data['filenames'][0])
print ('[i] image size:', im.size)
print ('[i] pixel format:', im.mode)

[i] image size: (87, 87)
[i] pixel format: RGB


The images are 87 × 87 pixels and have 3 channels (RGB).

For the CNN, they are loaded with a target size of (88, 88) and a single grayscale channel.

The functions below are taken from the Udacity MLND dog-project.

In [76]:
from keras.preprocessing import image                  
from tqdm import tqdm

def path_to_tensor (img_path):
    # loads the image as a grayscale PIL.Image.Image, resized to (88, 88)
    img = image.load_img (img_path, color_mode='grayscale', target_size=(88, 88))
    # convert PIL.Image.Image type to 3D tensor with shape (88, 88, 1)
    x = image.img_to_array (img)
    # convert 3D tensor to 4D tensor with shape (1, 88, 88, 1) and return 4D tensor
    return np.expand_dims (x, axis=0)

def paths_to_tensor (img_paths):
    list_of_tensors = [path_to_tensor (img_path) for img_path in tqdm (img_paths)]
    return np.vstack (list_of_tensors)

**TODO** split the data into train and test sets (better: make two directories with train and test data)
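One possible way to approach this TODO is a stratified split, so that every class is represented in both sets. The sketch below uses NumPy only and is an assumption about how the split could be done, not the author's method; `stratified_split`, the 20 % test fraction and the balanced synthetic labels are hypothetical.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps roughly the same share."""
    rng = np.random.RandomState(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # positions of this class
        rng.shuffle(idx)
        n_test = max(1, int(round(len(idx) * test_fraction)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Synthetic labels: 24 classes with 10 samples each (balance is assumed here).
labels = np.repeat(np.arange(24), 10)
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

scikit-learn's `train_test_split` with its `stratify` argument offers the same kind of split; the index arrays can then be applied to both the tensors and the targets.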

In [99]:
from PIL import ImageFile                            
ImageFile.LOAD_TRUNCATED_IMAGES = True                 

spectro_tensors = paths_to_tensor (src_spectro_data['filenames'])#.astype ('float32') / 255

100%|██████████| 240/240 [00:00<00:00, 3202.37it/s]


In [100]:
print (spectro_tensors.shape)

(240, 88, 88, 1)
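The commented-out `.astype ('float32') / 255` in the loading cell would rescale the 8-bit pixel values to the range [0, 1]. A minimal sketch of that step on random placeholder data (the array `raw` is made up):

```python
import numpy as np

rng = np.random.RandomState(42)

# Placeholder batch of four 8-bit grayscale images with values in 0..255.
raw = rng.randint(0, 256, size=(4, 88, 88, 1)).astype('uint8')

# Convert to float first, then divide, so the result lies in [0, 1].
scaled = raw.astype('float32') / 255
print(scaled.dtype, scaled.min(), scaled.max())
```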


In [114]:
from keras.utils import np_utils
targets = np_utils.to_categorical (np.array (src_spectro_data['target']), 24)
print (targets.shape)

(240, 24)


The inputs are now `spectro_tensors` and the outputs are `targets`.
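For reference, one-hot encoding of integer class labels can be sketched in plain NumPy. The helper `one_hot` and the three example labels are hypothetical; the result has the same layout as the `targets` array above.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Row i is all zeros except for a 1 at column labels[i]."""
    return np.eye(n_classes, dtype='float32')[labels]

labels = np.array([0, 18, 23])      # made-up class indices
encoded = one_hot(labels, 24)

print(encoded.shape)
print(encoded.argmax(axis=1))       # recovers the original labels
```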

### Model architecture

In [136]:
#from keras.models import Sequential
from keras import layers, models, optimizers
from keras import backend as K
#from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D

# clear everything known of past instances ("useful to avoid clutter from old models / layers")
K.clear_session ()

# input layer
inputs = layers.Input (shape=spectro_tensors.shape[1:], name='input')

# hidden layers
net = layers.Conv2D (filters=16, kernel_size=(2,2), strides=(1,1),
              padding='same', # zero-pad so the conv window does not lose information at the image border
              activation='relu',
              name='conv2d_1') (inputs)
#net = layers.MaxPooling2D (pool_size=(2,2), strides=None, name='maxp_1') (net)

net = layers.Conv2D (filters=32, kernel_size=(2,2), strides=(1,1),
              padding='same', # zero-pad so the conv window does not lose information at the image border
              activation='relu',
              name='conv2d_2') (net)
net = layers.MaxPooling2D (pool_size=(2,2), strides=None, name='maxp_2') (net)

net = layers.Conv2D (filters=64, kernel_size=(2,2), strides=(1,1),
              padding='same', # zero-pad so the conv window does not lose information at the image border
              activation='relu',
              name='conv2d_3') (net)
#net = layers.MaxPooling2D (pool_size=(2,2), strides=None, name='maxp_3') (net)

net = layers.Conv2D (filters=32, kernel_size=(2,2), strides=(1,1),
              padding='same', # zero-pad so the conv window does not lose information at the image border
              activation='relu',
              name='conv2d_4') (net)
net = layers.MaxPooling2D (pool_size=(2,2), strides=None, name='maxp_4') (net)

# 'flatten for dense layer'
net = layers.GlobalAveragePooling2D (name='avg_flatten') (net)

# output layer
outputs = layers.Dense (units=targets.shape[1], activation='softmax', name='output') (net)


model = models.Model (inputs=inputs, outputs=outputs)
model.summary ()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           (None, 88, 88, 1)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 88, 88, 16)        80        
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 88, 88, 32)        2080      
_________________________________________________________________
maxp_2 (MaxPooling2D)        (None, 44, 44, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 44, 44, 64)        8256      
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 44, 44, 32)        8224      
_________________________________________________________________
maxp_4 (MaxPooling2D)        (None, 22, 22, 32)        0         
__________
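The parameter counts of the convolution layers listed above can be checked by hand: a Conv2D layer has one weight per kernel position and input channel plus one bias, for each filter. The helper `conv2d_params` below is a hypothetical name used only for this check.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """(weights per filter + 1 bias) * number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

# (layer name, input channels, filters); all kernels are 2 x 2 as defined above.
layers_spec = [('conv2d_1', 1, 16),
               ('conv2d_2', 16, 32),
               ('conv2d_3', 32, 64),
               ('conv2d_4', 64, 32)]

counts = {name: conv2d_params(2, 2, cin, f) for name, cin, f in layers_spec}
for name, n in counts.items():
    print(name, n)
```

The results match the `Param #` column of the summary; the pooling layers have no trainable parameters.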

In [140]:
# https://stackoverflow.com/questions/43547402/how-to-calculate-f1-macro-in-keras
from keras import optimizers

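def fbeta_score (y_true, y_pred):
    # NOTE: placeholder. Despite its name, this currently returns the mean
    # squared error per sample; an F-beta based loss is not implemented yet.
    return K.mean (K.square (y_true - y_pred), axis=-1)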

lr = 0.001
opt_sgd = optimizers.SGD (lr=lr)


model.compile (optimizer=opt_sgd, loss=fbeta_score)
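For comparison, a macro-averaged F-beta score on hard class predictions could look like the NumPy sketch below. It is not the loss compiled above and not the author's implementation; `macro_fbeta`, the `eps` guard against division by zero and the toy labels are assumptions. Because it counts hard predictions it is not differentiable, so it is suited to evaluation rather than to use as a training loss.

```python
import numpy as np

def macro_fbeta(y_true, y_pred, n_classes, beta=1.0, eps=1e-12):
    """Unweighted mean of the per-class F-beta scores."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives
        fp = np.sum((y_pred == c) & (y_true != c))   # false positives
        fn = np.sum((y_pred != c) & (y_true == c))   # false negatives
        precision = tp / (tp + fp + eps)
        recall = tp / (tp + fn + eps)
        scores.append((1 + beta ** 2) * precision * recall
                      / (beta ** 2 * precision + recall + eps))
    return float(np.mean(scores))

# Toy integer labels (e.g. the argmax of one-hot targets and softmax outputs).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
score = macro_fbeta(y_true, y_pred, n_classes=3)
print(round(score, 3))
```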