##### Mel Spectrograms
The purpose of this notebook is to explore the audio data files using Mel Spectrograms. In a previous effort I used discrete FFT analysis to produce the training data.

This work was inspired by Sercan Ö. Arık, Markus Kliegl, et al. (2017), "Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting", *Baidu Silicon Valley Artificial Intelligence Lab, 1195 Bordeaux Dr., Sunnyvale, CA 94089*.

The training and test data are prepared using the method described in the above paper. 40-channel Mel Spectrograms of the 16 kHz mono audio files are produced using a sliding window that is three and a half (3.5) seconds long and advances in 100 ms steps. In addition, each mel spectrogram is constrained to the frequency range of interest, from 200 Hz to 8 kHz.
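
The window arithmetic implied by these settings can be checked with a short sketch. The helper `count_windows` is a hypothetical name introduced here for illustration, and the frame count assumes librosa's default centered framing, where a signal of `n` samples yields `1 + n // hop_length` frames.

```python
sample_rate = 16000      # Hz, as stated above
window_seconds = 3.5     # length of each sliding window
stride_seconds = 0.1     # 100 ms step between window starts

window_samples = int(window_seconds * sample_rate)
stride_samples = int(stride_seconds * sample_rate)

# Assumption: centered framing (librosa's default), with the hop length
# set equal to the stride as the notebook does further down.
frames_per_window = 1 + window_samples // stride_samples

def count_windows(total_samples):
    """Hypothetical helper: number of window start positions
    0, stride, 2*stride, ... strictly below total_samples - window_samples."""
    return len(range(0, total_samples - window_samples, stride_samples))

print(window_samples, stride_samples, frames_per_window)
print(count_windows(60 * sample_rate))  # windows in a 60-second recording
```

Under these assumptions each window produces a 40 x 36 matrix, which agrees with the array shapes printed later in the notebook.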

The Mel Spectrograms for each audio file are saved together as a single NumPy array in a `.npy` output file. The file name prefix is the source audio file name. The resulting files can be very large, on the order of tens of gigabytes.
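
The code further down writes each output file with `np.save`. Assuming that `.npy` output, one way to work with a file too large for memory is NumPy's memory mapping. The sketch below uses a small stand-in file created on the spot; the path and the array shape are illustrative, not taken from the real data.

```python
import os
import tempfile
import numpy as np

# Stand-in for a real output file: a small array laid out as
# (observations, mel channels, frames). The shape is illustrative only.
demo = np.zeros((10, 40, 36), dtype=np.float32)
demo_path = os.path.join(tempfile.mkdtemp(), 'demo-mel.npy')
np.save(demo_path, demo)

# mmap_mode='r' maps the file read-only instead of loading it all at once.
mel = np.load(demo_path, mmap_mode='r')

# Only the requested slice is copied into memory.
batch = np.asarray(mel[0:4])
print(mel.shape, batch.shape)
```

Slicing the memory-mapped array in batches keeps memory use bounded by the batch size rather than the file size.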

In [23]:
%matplotlib inline
import seaborn
import numpy as np
import scipy
import matplotlib.pyplot as plt
import pandas as pd
import librosa
import librosa.display
import os
import csv

In [24]:
source_directory = r'/Volumes/ThorsHammer/DataScience/data/audio-recognition/16k'
destination_directory = r'/Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/'

stride = 0.1
sample_length = 3.5 #seconds
sample_rate = 16000

# The following are the first positive-class samples of each target sequence in the data
positive_classes = {
    '161225-001' : [9541250], # Large file takes too long for me to wait.
    '161225-002' : [713600],
    '161225-003' : [813800],
    '161225-004' : [808700],
    '161225-005' : [686400],
    '161225-006' : [201850,1156480,2110750]
}

In [25]:
# Find the audio wav files in the source folder
raw_files = []
for file in os.listdir(source_directory):
    if file.endswith(".wav"):
        raw_files.append(file)
        print(raw_files[-1])
        

161225-000.wav
161225-001.wav
161225-002.wav
161225-003.wav
161225-004.wav
161225-005.wav
161225-006.wav


In [26]:
# prepare the destination directory if not already available
if not os.path.exists(destination_directory):
    os.makedirs(destination_directory)

### Calculate the mel spectrograms for the data

In [27]:
def calc_mel(x, sr, hop_length):
    # Pass the signal as y=...; recent librosa releases accept it by keyword only.
    S = librosa.feature.melspectrogram(y=x, sr=sr, n_mels=40, fmin=200, fmax=8000, hop_length=hop_length)
    # Convert to log scale (dB) using peak power as reference.
    log_S = librosa.power_to_db(S, ref=np.max)
    return log_S

def load_file(filename, sample_rate = 16000):
    data, sample_rate = librosa.load(filename, sr=sample_rate)
    return (data, sample_rate)

In [28]:
def plot_mel(data, hop_length):
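    # fmin/fmax match calc_mel so the mel axis is labelled with the 200 Hz to 8 kHz range.
    librosa.display.specshow(data, x_axis='time', sr=sample_rate, y_axis='mel', hop_length=hop_length, fmin=200, fmax=8000)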

    # Put a descriptive title on the plot
    plt.title('mel power spectrogram')

    # draw a color bar
    plt.colorbar(format='%+02.0f dB')

    # Make the figure layout compact
    plt.tight_layout()
    plt.show()
    
def save_mel_data(prefix, data):
    '''
    Save mel data to disk. Input data is a list of 2D mel matrices.
    With the settings used in this notebook each matrix has dimensions
    40 x 36 (40 mel channels by 36 frames for a 3.5 sec sample).
    '''
    data = np.asarray(data)
    print('Saving mel data: %s'%prefix)
    print(data.shape)
    
    output_file = '{0}{1}-mel.npy'.format(destination_directory,prefix)
    np.save(output_file, data)
    print('Done: %s'%output_file)

In [29]:
%%time

for filename in raw_files:
    print(filename)
    mel = []
    data, sample_rate = load_file(os.path.join(source_directory, filename))
    hop_length = int(sample_rate*stride)
    # perform mel calculation for each sliding window
    for i in np.arange(0,len(data)-(int(sample_length*sample_rate)),(int(stride*sample_rate))):
        offset = i
        x = data[offset:offset+(int(sample_length*sample_rate))]
        log_S = calc_mel(x, sample_rate, hop_length=hop_length)
        mel.append(log_S)

    save_mel_data(filename, mel)

161225-000.wav
Saving mel data: 161225-000.wav
(3909, 40, 36)
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-000.wav-mel.npy
161225-001.wav
Saving mel data: 161225-001.wav
(6059, 40, 36)
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-001.wav-mel.npy
161225-002.wav
Saving mel data: 161225-002.wav
(480, 40, 36)
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-002.wav-mel.npy
161225-003.wav
Saving mel data: 161225-003.wav
(551, 40, 36)
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-003.wav-mel.npy
161225-004.wav
Saving mel data: 161225-004.wav
(547, 40, 36)
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-004.wav-mel.npy
161225-005.wav
Saving mel data: 161225-005.wav
(474, 40, 36)
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-005.wav-mel.npy
161225-006.wav
Saving mel data: 161225

# Create Labels for the Audio Sequences
The purpose of this section is to generate the target labels for supervised training. The process is somewhat manual; the following procedure was used:
 - Open an audio file in [Audacity](http://www.audacityteam.org).
 - By visual inspection of the waveform, locate the start and end samples of the target sequence (verify the selection by playing the audio in that range).
 - Using a window of 3.45 seconds (the width of the target sequence in this case), label the time-correlated mel spectrogram observations as class 1 and all others as class 0.

The audio file contents are as follows:

**Washing Machine**
``` 
File :161225-000
Beep Sequence Count: 0

File :161225-001
Beep Sequence Count: 1
Start Sample: 9541250
End Sample:  9596500

File :161225-002
Beep Sequence Count: 1
Start Sample: 713600
End Sample:  721360

File :161225-003
Beep Sequence Count: 1
Start Sample: 813800
End Sample:  869300

File :161225-004
Beep Sequence Count: 1
Start Sample: 808700
End Sample:  864000

File :161225-005
Beep Sequence Count: 1
Start Sample: 686400
End Sample:  742000
```
**Dryer**
```
File :161225-006
Beep Sequence Count: 3
Start Sample: 201850
End Sample: 262330
Start Sample: 1156480
End Sample:  1212300
Start Sample: 2110750
End Sample:  2118300
```
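
As a quick sanity check on the annotations above, the length of each beep sequence can be converted from samples to seconds. This is a minimal sketch; `beep_ranges` and `durations_seconds` are names introduced here for illustration, and the sample numbers are copied from the listing above.

```python
sample_rate = 16000  # Hz

# (start, end) sample pairs copied from the annotations above
beep_ranges = {
    '161225-001': [(9541250, 9596500)],
    '161225-002': [(713600, 721360)],
    '161225-003': [(813800, 869300)],
    '161225-004': [(808700, 864000)],
    '161225-005': [(686400, 742000)],
    '161225-006': [(201850, 262330), (1156480, 1212300), (2110750, 2118300)],
}

def durations_seconds(ranges, sr=sample_rate):
    """Length of each (start, end) range in seconds."""
    return [(end - start) / sr for start, end in ranges]

for name, ranges in sorted(beep_ranges.items()):
    print(name, [round(d, 3) for d in durations_seconds(ranges)])
```

Any range whose duration differs noticeably from the roughly 3.45 seconds mentioned in the procedure may be worth re-checking in Audacity.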

Based on this data we will create the label vector for each mel spectrogram row. 

**Note:** we need to take into account the sliding 3.5-second windows with a stride of 100 ms.
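
To make the relationship between raw samples and sliding windows concrete, the sketch below finds the windows that fully contain a given sample range. It is one possible reading of the note and is not the labeling rule used by the code that follows, which marks observations within half a second either side of the start sample. The function `windows_containing` is a hypothetical helper, and window `i` is assumed to cover samples `[i * hop, i * hop + window)`.

```python
import math

sample_rate = 16000
stride = 0.1          # seconds between window starts
sample_length = 3.5   # seconds per window

hop = int(stride * sample_rate)             # samples between window starts
window = int(sample_length * sample_rate)   # samples per window

def windows_containing(start_sample, end_sample):
    """Hypothetical helper: indices of windows that fully contain
    the samples from start_sample up to end_sample."""
    # Window i must start at or before start_sample ...
    last = start_sample // hop
    # ... and end at or after end_sample.
    first = max(0, math.ceil((end_sample - window) / hop))
    return range(first, last + 1)

# Beep sequence in file 161225-001, from the annotations above
print(list(windows_containing(9541250, 9596500)))
```

A target sequence longer than the window is contained by no window at all, in which case the returned range is empty.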

In [30]:
# Determine the observation index for a given sample number
def calc_mel_observation_index(raw_sample_number):
    observation_index = int(raw_sample_number / (sample_rate * stride))
    return observation_index

def file_len(filename):
    f = np.load(filename)
    return f.shape[0]

def list_all_files(source_directory, truncate_extension=False, extension='.wav'):
    # Find the audio wav files in the source folder
    raw_files = []
    for file in os.listdir(source_directory):
        if file.endswith(extension):
            if(truncate_extension):
                file = file[:file.rfind('.')]
            raw_files.append(file)
            #print(raw_files[-1])
            
    return raw_files

def save_labels(destination_directory, prefix, labels):
    output_file = '{0}{1}-mel-labels'.format(destination_directory,prefix)
    np.save(output_file,labels)
    print('Done: %s'%output_file)

In [33]:
%%time
file_sufix = '.wav-mel.npy'
all_files = list_all_files(source_directory, truncate_extension=True)
file_len_dict = {}
file_label_dict = {}

for file in sorted(all_files):
    
    filename = os.path.join(destination_directory,file+file_sufix)
    num_observations = file_len(filename)
    file_len_dict[file] = num_observations

    labels = np.zeros(num_observations)
    
    if(file in positive_classes.keys()):
        print('%s contains target'%file)
        start_sample_array = positive_classes[file]
        for start_sample in start_sample_array:        
            target_class_range_start = calc_mel_observation_index(start_sample - (sample_rate * 0.5))
            target_class_range_end = calc_mel_observation_index(start_sample + (sample_rate * 0.5))
            labels[target_class_range_start:target_class_range_end] = 1
        file_label_dict[file] = labels
    else:
        print('%s target absent'%file)
    save_labels(destination_directory, file, labels)

print (file_len_dict)

161225-000 target absent
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-000-mel-labels
161225-001 contains target
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-001-mel-labels
161225-002 contains target
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-002-mel-labels
161225-003 contains target
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-003-mel-labels
161225-004 contains target
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-004-mel-labels
161225-005 contains target
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-005-mel-labels
161225-006 contains target
Done: /Volumes/ThorsHammer/DataScience/data/audio-recognition/mel_3.5_100ms/161225-006-mel-labels
{'161225-000': 3909, '161225-001': 6059, '161225-002': 480, '161225-003': 551, '161225-004': 547, '161225-005': 474, '161225-00

In [32]:
np.sum(file_label_dict['161225-006'])

36.0