# Audio Event Recognition Using Deep Learning (CNN)

The audio event of interest is the beep sequence a laundry appliance emits at the end of its cycle. The appliance was recorded in operation, including the beep sequence. Most of each recording captures the machine performing its normal function, together with "room tone": other household and environmental sounds picked up by the recorder.

The model in this notebook detects an individual beep, not the timing pattern of the beep sequence; the sequence is examined in subsequent notebooks.

The audio data used here was prepared in other notebooks.

The audio recordings were resampled to 16 kHz, peak-normalised to within ±1.0, and divided into one-minute segments. A discrete Fast Fourier Transform (FFT) was then computed over a 150 ms sliding window advanced in 5 ms steps, which at 16 kHz is a 2400-sample window with an 80-sample stride.
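The windowing described above can be sketched with NumPy. This is a minimal illustration, not the preparation notebooks' code: `frame_signal` is a hypothetical helper, random noise stands in for a real recording, and it assumes windows are taken only where they fit entirely inside the segment. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SAMPLE_RATE = 16000   # Hz, after resampling
WINDOW_MS = 150       # FFT observation window
STRIDE_MS = 5         # step between successive windows


def frame_signal(x, sample_rate=SAMPLE_RATE, window_ms=WINDOW_MS, stride_ms=STRIDE_MS):
    """Split a 1-D signal into overlapping windows (hypothetical helper).

    Returns (frames, start_times): frames has shape (n_windows, window_len) and
    start_times holds each window's start offset in seconds.
    """
    window_len = sample_rate * window_ms // 1000   # 2400 samples
    hop = sample_rate * stride_ms // 1000          # 80 samples
    # Read-only view of every full-length window, keeping one every `hop` samples
    frames = sliding_window_view(x, window_len)[::hop]
    start_times = np.arange(frames.shape[0]) * hop / sample_rate
    return frames, start_times


# One minute of placeholder audio (noise), peak-normalised to +/- 1.0
rng = np.random.default_rng(0)
audio = rng.standard_normal(60 * SAMPLE_RATE)
audio = audio / np.max(np.abs(audio))

frames, start_times = frame_signal(audio)
print(frames.shape)   # (11971, 2400)
```

Under this assumption a full one-minute segment yields 11971 windows, with window *i* starting at *i* × 5 ms. The prediction output later in this notebook reports 11970 observations per full segment, so the preparation step may bound the final window slightly differently.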

Each observation (one window position) comprises 2400 features representing the power level (dB) of frequencies between 0 and 8 kHz, the Nyquist limit for 16 kHz sampling.
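A real FFT of a 2400-sample window natively produces 1201 bins from 0 Hz to 8 kHz, so arriving at exactly 2400 features implies an extra step in the preparation notebooks. The sketch below shows one possibility under stated assumptions: a Hann taper, zero-padding each window to 4800 points so the real FFT returns 2401 bins spaced 3.33 Hz apart, and dropping the Nyquist bin. `frame_power_db` is a hypothetical helper and the original processing may differ.

```python
import numpy as np

SAMPLE_RATE = 16000   # Hz
WINDOW_LEN = 2400     # samples per 150 ms window
N_FEATURES = 2400     # power levels per observation


def frame_power_db(frame, n_features=N_FEATURES, eps=1e-10):
    """Power spectrum of one window in dB, n_features bins from 0 Hz up to Nyquist.

    Assumptions (not confirmed by the original notebooks): a Hann taper, and
    zero-padding to 2 * n_features points so that rfft returns n_features + 1
    bins spanning 0 Hz to Nyquist, of which the Nyquist bin is dropped.
    """
    tapered = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(tapered, n=2 * n_features)
    power = np.abs(spectrum[:n_features]) ** 2
    return 10.0 * np.log10(power + eps)   # eps avoids log10(0) for silent windows


# Centre frequency of each feature: 0, 3.33, 6.67, ... Hz, ending just under 8 kHz
freqs = np.fft.rfftfreq(2 * N_FEATURES, d=1.0 / SAMPLE_RATE)[:N_FEATURES]

# A 150 ms, 2 kHz test tone should peak at the feature nearest 2 kHz
t = np.arange(WINDOW_LEN) / SAMPLE_RATE
features = frame_power_db(np.sin(2 * np.pi * 2000 * t))
print(features.shape, freqs[np.argmax(features)])
```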

# Data File Setup
Configure the data folders and a helper function for listing the data files.

In [17]:
from __future__ import print_function
import os
import numpy as np
import pandas as pd

from os.path import isfile, join

# Data folders (machine-specific paths; adjust for your environment)
wav_directory = r'/Volumes/ThorsHammer/Data Science/data/audio-recognition/parts/'   # one-minute WAV segments
fft_directory = r'/Volumes/ThorsHammer/Data Science/data/audio-recognition/fft/'     # compressed FFT feature CSVs
audio_detect_pred_directory = r'/Volumes/ThorsHammer/Data Science/data/audio-recognition/audio-detect-pred/'  # model predictions

def list_files(base_dir, ext):
    """Return the sorted names of regular files in base_dir whose final extension is ext."""
    onlyfiles = [f for f in os.listdir(base_dir)
                 if isfile(join(base_dir, f)) and f.split('.')[-1] == ext]
    # np.sort orders names lexicographically, so 'part-10' sorts before 'part-2'
    return np.sort(onlyfiles)

print('Compressed FFT Files:')
fft_compressed_files = list_files(fft_directory, 'gz')
print(fft_compressed_files)


Compressed FFT Files:
['161225-000_16bit-part-0.wav-fft.csv.gz'
 '161225-000_16bit-part-1.wav-fft.csv.gz'
 '161225-000_16bit-part-2.wav-fft.csv.gz'
 '161225-000_16bit-part-3.wav-fft.csv.gz'
 '161225-000_16bit-part-4.wav-fft.csv.gz'
 '161225-000_16bit-part-5.wav-fft.csv.gz'
 '161225-000_16bit-part-6.wav-fft.csv.gz'
 '161225-001_16bit-part-0.wav-fft.csv.gz'
 '161225-001_16bit-part-1.wav-fft.csv.gz'
 '161225-001_16bit-part-10.wav-fft.csv.gz'
 '161225-001_16bit-part-2.wav-fft.csv.gz'
 '161225-001_16bit-part-3.wav-fft.csv.gz'
 '161225-001_16bit-part-4.wav-fft.csv.gz'
 '161225-001_16bit-part-5.wav-fft.csv.gz'
 '161225-001_16bit-part-6.wav-fft.csv.gz'
 '161225-001_16bit-part-7.wav-fft.csv.gz'
 '161225-001_16bit-part-8.wav-fft.csv.gz'
 '161225-001_16bit-part-9.wav-fft.csv.gz'
 '161225-002_16bit-part-0.wav-fft.csv.gz'
 '161225-003_16bit-part-0.wav-fft.csv.gz'
 '161225-004_16bit-part-0.wav-fft.csv.gz'
 '161225-005_16bit-part-0.wav-fft.csv.gz'
 '161225-006_16bit-part-0.wav-fft.csv.gz'
 '161225-00
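`np.sort` orders the file names lexicographically, which is why `part-10` appears before `part-2` in the listing above. If later steps need the segments of a recording in time order, a natural-sort key is one option. The sketch below uses a hypothetical `natural_key` helper that is not part of the original notebook.

```python
import re


def natural_key(name):
    """Split a file name into text and integer chunks so that 'part-2' < 'part-10'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r'(\d+)', name)]


files = ['161225-001_16bit-part-10.wav-fft.csv.gz',
         '161225-001_16bit-part-2.wav-fft.csv.gz',
         '161225-001_16bit-part-0.wav-fft.csv.gz']
for name in sorted(files, key=natural_key):
    print(name)
```

In `list_files`, `np.sort(onlyfiles)` could become `sorted(onlyfiles, key=natural_key)`. Because `re.split` with a capturing group always alternates text and digit chunks, keys for different file names compare position by position without mixing strings and integers.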

# Utility for Loading a Saved Model

In [3]:
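from keras.models import load_model
import tensorflow as tf
# Compatibility monkey-patch used with some early Keras 1.x / TensorFlow combinations,
# whose Keras backend referenced tf.python.control_flow_ops; not needed with later versions
tf.python.control_flow_ops = tf

# load_model restores the architecture, weights and training configuration
# saved by model.save(), returning a compiled model ready for prediction
model = load_model('audio_detection.h5')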

Using TensorFlow backend.


In [4]:
model

<keras.models.Sequential at 0x116a74a90>


# Generate CNN Response from FFT Data Sets
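The cell below reshapes each file's (observations × 2400) feature table into the 3-D tensor a 1-D convolutional model expects and then calls `predict_classes`. The NumPy sketch here illustrates both steps without Keras. The helper names are hypothetical; `probabilities_to_classes` restates the rule Keras' `Sequential.predict_classes` applied (argmax when the model has several output units, a 0.5 threshold for a single sigmoid unit). This notebook does not show which kind of output layer the saved model uses. `predict_classes` has been removed from recent `tf.keras` releases, so a helper like this may be needed when porting.

```python
import numpy as np

SAMPLE_LENGTH = 2400   # features per observation


def to_model_input(features, channels_first=False):
    """Reshape an (n_observations, SAMPLE_LENGTH) array into a 3-D CNN input."""
    features = np.asarray(features, dtype=np.float32)
    n = features.shape[0]
    if channels_first:   # Theano-style ('th') ordering: (n, 1, length)
        return features.reshape(n, 1, SAMPLE_LENGTH)
    return features.reshape(n, SAMPLE_LENGTH, 1)   # TensorFlow-style ('tf'): (n, length, 1)


def probabilities_to_classes(proba):
    """Same rule as Keras Sequential.predict_classes (hypothetical stand-alone helper)."""
    proba = np.asarray(proba)
    if proba.shape[-1] > 1:                 # several output units, e.g. softmax
        return proba.argmax(axis=-1)
    return (proba > 0.5).astype('int32')    # single sigmoid output unit


X = to_model_input(np.zeros((3, SAMPLE_LENGTH)))
print(X.shape)                                              # (3, 2400, 1)
print(probabilities_to_classes([[0.9, 0.1], [0.2, 0.8]]))   # [0 1]
print(probabilities_to_classes([[0.7], [0.3]]).ravel())     # [1 0]
```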

In [24]:
%%time
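from keras import backend as K

# Length of each FFT feature vector (one 150 ms observation)
sample_length = 2400

for item in fft_compressed_files:
    # e.g. '161225-000_16bit-part-0.wav-fft.csv.gz' -> '161225-000_16bit-part-0'
    pred_file = item[0:item.index('.wav')]
    fft_compressed_file = os.path.join(fft_directory, item)
    response_output_file = os.path.join(audio_detect_pred_directory, pred_file + '.txt')

    # pandas infers gzip compression from the .gz extension.
    # NOTE: read_csv treats the first line as a header by default; pass
    # header=None if the FFT CSVs were written without a header row.
    fft_data = pd.read_csv(fft_compressed_file).astype(np.float32)

    # Keras 1.x API ('th' = channels-first, 'tf' = channels-last);
    # later Keras versions replaced it with K.image_data_format()
    if K.image_dim_ordering() == 'th':
        X = fft_data.values.reshape(fft_data.shape[0], 1, sample_length)
    else:
        X = fft_data.values.reshape(fft_data.shape[0], sample_length, 1)

    print(X.shape[0], 'test samples')

    # Predicted class for every observation, written one integer per line
    y_pred = model.predict_classes(X)
    print('\n' + fft_compressed_file)
    np.savetxt(response_output_file, y_pred, fmt='%u')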

11970 test samples

/Volumes/ThorsHammer/Data Science/data/audio-recognition/fft//161225-000_16bit-part-0.wav-fft.csv.gz
11970 test samples

/Volumes/ThorsHammer/Data Science/data/audio-recognition/fft//161225-000_16bit-part-1.wav-fft.csv.gz
11970 test samples

/Volumes/ThorsHammer/Data Science/data/audio-recognition/fft//161225-000_16bit-part-2.wav-fft.csv.gz
11970 test samples

/Volumes/ThorsHammer/Data Science/data/audio-recognition/fft//161225-000_16bit-part-3.wav-fft.csv.gz
11970 test samples

/Volumes/ThorsHammer/Data Science/data/audio-recognition/fft//161225-000_16bit-part-4.wav-fft.csv.gz
11970 test samples

/Volumes/ThorsHammer/Data Science/data/audio-recognition/fft//161225-000_16bit-part-5.wav-fft.csv.gz
6846 test samples

/Volumes/ThorsHammer/Data Science/data/audio-recognition/fft//161225-000_16bit-part-6.wav-fft.csv.gz
11970 test samples

/Volumes/ThorsHammer/Data Science/data/audio-recognition/fft//161225-001_16bit-part-0.wav-fft.csv.gz
11970 test samples

/Volumes/Thor