# **Welcome to my Data Science 500 Seminar Project**

## Abstract:

In this project, our goal is to build a general audio classifier capable of recognising different sounds. The sounds will be ambient noises from an urban environment, such as sirens, traffic, and construction.

The purpose of this project is to understand the intricacies of sound and what makes similar sounds different. Many people can readily recognise common sounds, such as a dog bark, but only because they have learned the characteristics of each sound by hearing it throughout their lives.

I want to work in DSP (Digital Signal Processing) after college, and this project serves as my introduction to the field. Understanding how a signal is processed is fundamental to understanding how speech recognition, music analysis, and audio prediction are done.



## Workbook 1 - Extracting and Exploring UrbanSound8K Audio Data 

In this first workbook, we set out the overarching scope of this project. Since we will be working with large amounts of data, we need a way to process and store all the information we gather.

Processing can take quite some time, so storing the results is vital to the efficiency and overall speed of our analysis. This workbook's purpose is therefore to load the data files we will use and store the processed information from each file in a collection (a folder of saved files) for us to call upon in the next steps of this project.

The following workbook will take the information we stored and build a predictive model, to determine how well we can create a machine that identifies audio.






First, here are the imports.
The audio processing is handled by a library called librosa.

Later examples use the Keras framework and TensorFlow.

We also import the standard numpy and matplotlib, as well as os, which we combine with mounting our Drive in this Colab session to reach the files we need.

In [None]:
#BASIC IMPORTS
import glob
import os
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import specgram
%matplotlib inline
plt.style.use('ggplot')

#MOUNTING OUR DRIVE
from google.colab import drive
drive.mount("/content/drive")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### This step is important so don't gloss over it

Here, after we mount our Drive to this Colab session, we check that we can find the directory where our project files live.

In [None]:
import os
os.chdir("/content/drive/My Drive/ColabNotebooks/DSC 500 Seminar Project/My project/")
!ls

 data	    UrbanSound8K
 pictures  'Workbook #1 UrbanSound8k-Exploration and Data Preparation.ipynb'
 samples   'Workbook #2 UrbanSound8k-FeedForwardNetwork Analysis.ipynb'


# NOTICE THE PATH NAME ABOVE

Please make sure that the "DSC 500 Seminar Project" directory is inside a folder labeled "ColabNotebooks" in your own "My Drive" directory.

## The code cell above should output this:
'Copy of ProjectReport.gdoc'

'Copy of ProjectReport.pdf'

 pictures

 samples

 UrbanSound8K

'UrbanSound8k-Exploration and Data Preparation NEEDS COMMENTS.ipynb'

'UrbanSound8k-FeedForwardNetwork Analysis NEEDS COMMENTS.ipynb'




## Troubleshooting:
*   Inside the home directory of your Google Drive, create a folder labeled "ColabNotebooks"
*   Inside ColabNotebooks, download or place the shared copy of "DSC 500 Seminar Project"
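
If the listing still looks wrong after following the steps above, a small check like the one below can report which folders are missing. This is only a sketch: `missing_entries` and `EXPECTED_ENTRIES` are hypothetical names, and the folder names are taken from the `!ls` output shown earlier, so adjust them if your copy differs.

```python
import os

# Folder names taken from the directory listing shown earlier (an assumption
# about what your copy of the project should contain).
EXPECTED_ENTRIES = ["UrbanSound8K", "samples", "pictures"]

def missing_entries(project_dir, expected=EXPECTED_ENTRIES):
    """Return the expected entries that are not present in project_dir."""
    if not os.path.isdir(project_dir):
        # The project folder itself is missing, so every entry is missing.
        return list(expected)
    present = set(os.listdir(project_dir))
    return [name for name in expected if name not in present]
```

Calling `missing_entries(projectDirectory)` once the path has been defined returns an empty list when everything is in place.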

In [None]:
# let's save this path so we can build on it later in the project
projectDirectory = "/content/drive/My Drive/ColabNotebooks/DSC 500 Seminar Project/My project"

We'll begin by doing some basic visualisation of audio data from the UrbanSound8K dataset, a collection of 8732 short clips covering 10 different sounds from urban environments. 


To do this, we will need to create a fairly extensive set of functions so that we can work more productively.

In [None]:
# function to load sounds from their file locations
def load_sound_files(parent_dir, file_paths):
    # loaded clips are collected in a list
    raw_sounds = []
    for fp in file_paths:
        # join each file name to the parent directory and load the audio
        # (librosa resamples to 22050 Hz by default; the rate is not needed here)
        X, sr = librosa.load(os.path.join(parent_dir, fp))
        raw_sounds.append(X)
    return raw_sounds

# function to draw a waveform plot for each file
def plot_waves(sound_names,raw_sounds):
    i = 1
    # creating the figure
    fig = plt.figure(figsize=(25,10), dpi = 900)
    # every file has a title and a sound
    for n,f in zip(sound_names,raw_sounds):
        plt.subplot(2,5,i)
        # we use librosa to draw each file's waveform, then label it with its title
        # (newer librosa releases replace waveplot with waveshow)
        librosa.display.waveshow(np.array(f),sr=22050)
        plt.title(n.title())
        # move on to the next subplot
        i += 1
    # show the finished figure
    plt.suptitle('Figure 1: Waveplot',x=0.5, y=0.95,fontsize=18)
    plt.show()
    

# function to form a spectrogram
# this function is almost identical in style to the waveform plotting function above
def plot_specgram(sound_names,raw_sounds):
    i = 1
    fig = plt.figure(figsize=(25,10), dpi = 900)
    for n,f in zip(sound_names,raw_sounds):
        plt.subplot(2,5,i)
        #notice here we create a spectrogram
        specgram(np.array(f), Fs=22050)
        plt.title(n.title())
        i += 1
    plt.suptitle('Figure 2: Spectrogram',x=0.5, y=0.95,fontsize=18)
    plt.show()

#function to form a log_power_spectrogram (IDENTICAL TO FUNCTION ABOVE)
def plot_log_power_specgram(sound_names,raw_sounds):
    i = 1
    fig = plt.figure(figsize=(25,10), dpi = 900)
    for n,f in zip(sound_names,raw_sounds):
        plt.subplot(2,5,i)
        #logamplitude() changed to power_to_db in 2018
        D = librosa.power_to_db(np.abs(librosa.stft(f))**2, ref=np.max)
        librosa.display.specshow(D,x_axis='time' ,y_axis='log')
        plt.title(n.title())
        i += 1
    plt.suptitle('Figure 3: Log power spectrogram',x=0.5, y=0.95,fontsize=18)
    plt.show()

## Let's test our functions on some sample data that we have pre-prepared

In [None]:
# names of the files in our us8k folder, located in the "samples" directory
sound_file_paths = ["aircon.wav", "carhorn.wav", "play.wav", "dogbark.wav", "drill.wav",
                   "engine.wav","gunshots.wav","jackhammer.wav","siren.wav","music.wav"]
sound_names = ["air conditioner","car horn","children playing","dog bark","drilling","engine idling",
               "gun shot","jackhammer","siren","street music"]
#adding new path
parent_dir = projectDirectory + '/samples/us8k/'

#loading sound data in to an array variable
raw_sounds = load_sound_files(parent_dir, sound_file_paths)

Each sound can be visualised by how it changes over time. The classic view is the waveform, which shows the amplitude (relative loudness) of the sound at each successive time step.

Matplotlib provides a visualisation method called specgram, which calculates and plots the intensity of each frequency band over time.

Another visualisation, provided by librosa, is the log power spectrogram, which shows the same kind of information on a decibel scale with a logarithmic frequency axis.

By looking at the plots in Figures 1, 2, and 3, we can see clear differences between sound clips of different classes. These differences are what we want our deep learning system to learn and interpret.
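
To make the idea of a log power spectrogram more concrete, here is a minimal NumPy-only sketch applied to a synthetic 440 Hz tone. It is not how librosa or Matplotlib implement their plots: the helper name `log_power_spectrogram`, the frame length of 1024 samples, and the hop of 256 samples are all assumptions chosen for illustration.

```python
import numpy as np

def log_power_spectrogram(signal, n_fft=1024, hop=256):
    """Return a (n_fft // 2 + 1, n_frames) array of power in dB relative to the peak."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames (one frame per column).
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)], axis=1)
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2
    # Convert to decibels relative to the loudest bin; the floor avoids log(0).
    return 10.0 * np.log10(np.maximum(power, 1e-10) / power.max())

sr = 22050                                  # sample rate used throughout this workbook
t = np.arange(sr) / sr                      # one second of time stamps
tone = np.sin(2 * np.pi * 440.0 * t)        # synthetic 440 Hz tone (illustrative input)

spec_db = log_power_spectrogram(tone)
freqs = np.fft.rfftfreq(1024, d=1.0 / sr)   # centre frequency of each row
peak_hz = freqs[spec_db.mean(axis=1).argmax()]
print(spec_db.shape, round(float(peak_hz), 1))
```

Each column is one short slice of time and each row is one frequency band, so a steady tone shows up as a single bright horizontal line near its frequency.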

In [None]:
#using our pre-prepared functions to make visuals
plot_waves(sound_names, raw_sounds)
plot_specgram(sound_names, raw_sounds)
plot_log_power_specgram(sound_names, raw_sounds)


In [None]:
visual_title = ["siren"]
visual_file_paths = ["siren.wav"]
visual_sounds = load_sound_files(parent_dir, visual_file_paths)

plot_waves(visual_title, visual_sounds)
plot_specgram(visual_title, visual_sounds)
plot_log_power_specgram(visual_title, visual_sounds)


The images above show visualisations of the raw data, but the clips vary in length, so some feature extraction is necessary before modelling. Averaging each feature over time means we always have the same number of features for each clip, regardless of how long or short it is.

The librosa library comes with several feature extraction methods, including:

* Mel-frequency cepstral coefficients (MFCC) - https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
* Chromagram of a short-time Fourier transform - projects bins representing the 12 distinct semitones (or chroma) of the musical octave http://labrosa.ee.columbia.edu/matlab/chroma-ansyn/
* Mel-scaled power spectrogram - uses https://en.wikipedia.org/wiki/Mel_scale to provide greater resolution for the more informative (lower) frequencies 
* Octave-based spectral contrast (http://ieeexplore.ieee.org/document/1035731/)
* Tonnetz - estimates tonal centroids as coordinates in a six-dimensional interval space (https://sites.google.com/site/tonalintervalspace/)

The results of the 5 extractions are then concatenated to give a consistent feature vector of 193 values for every audio clip we process.
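
The figure of 193 can be checked with a little arithmetic: 40 + 12 + 128 + 7 + 6 = 193. The sketch below illustrates this with random stand-in matrices rather than real audio. The row counts assume `n_mfcc=40` and what I believe are librosa's defaults for the other extractors, and `summarise_clip` is a hypothetical helper, not part of this workbook.

```python
import numpy as np

# Rows produced per frame by each extractor (assumed: n_mfcc=40, 12 chroma bins,
# 128 mel bands, 6 contrast bands plus one, 6 tonnetz dimensions).
FEATURE_ROWS = {"mfcc": 40, "chroma": 12, "mel": 128, "contrast": 7, "tonnetz": 6}

def summarise_clip(n_frames, rng):
    """Average stand-in (rows, n_frames) matrices over time and concatenate them."""
    parts = []
    for rows in FEATURE_ROWS.values():
        frames = rng.normal(size=(rows, n_frames))  # random stand-in for real features
        parts.append(frames.mean(axis=1))           # collapse the time axis
    return np.hstack(parts)

rng = np.random.default_rng(0)
short_clip = summarise_clip(n_frames=20, rng=rng)   # a short clip with few frames
long_clip = summarise_clip(n_frames=173, rng=rng)   # a longer clip with many frames
print(short_clip.shape, long_clip.shape)
```

Both vectors have the same length even though the clips differ in duration, which is what lets a fixed-size model input work for every clip.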


In [None]:
# For each audio file we want to reduce the dimensionality;
# to do this we apply all of the feature extraction methods listed above
def extract_feature(file_name):
    # load a single audio file, keeping its samples and the rate at which it was sampled
    X, sample_rate = librosa.load(file_name)
    print("Data points :", len(X), "sampled at ", sample_rate, "hz")
    # computing a short-time Fourier transform
    stft = np.abs(librosa.stft(X))
    # computing the Mel-frequency cepstral coefficients
    mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T,axis=0)
    # computing a chromagram of the short-time Fourier transform
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T,axis=0)
    # computing a Mel-scaled power spectrogram (newer librosa needs the y= keyword)
    mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T,axis=0)
    # computing an octave-based spectral contrast
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate).T,axis=0)
    # computing the Tonnetz of the audio
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sample_rate).T,axis=0)
    # return all five time-averaged feature arrays for the file
    return mfccs,chroma,mel,contrast,tonnetz


# for each audio file in the given folders, extract its reduced feature vector and its label
def parse_audio_files(parent_dir,sub_dirs,file_ext='*.wav'):
    # empty arrays sized for our final 193 feature values
    features, labels = np.empty((0,193)), np.empty(0)
    # loop over every audio file in every sub-directory
    for label, sub_dir in enumerate(sub_dirs):
        for fn in glob.glob(os.path.join(parent_dir, sub_dir, file_ext)):
            # some files may fail to load, so we guard the extraction
            try:
                # extracting feature values
                mfccs, chroma, mel, contrast, tonnetz = extract_feature(fn)
                # stack arrays in sequence horizontally (column wise)
                ext_features = np.hstack([mfccs,chroma,mel,contrast,tonnetz])
                # stack arrays in sequence vertically (row wise)
                features = np.vstack([features,ext_features])
                # the class ID is the second hyphen-separated field of the file name
                labels = np.append(labels, int(fn.split('fold')[1].split('-')[1]))
            except Exception:
                print("Error processing " + fn + " - skipping")
    # return the array of features and the array of labels (all integers);
    # np.int was removed from recent NumPy releases, so we use the built-in int
    return np.array(features), np.array(labels, dtype=int)

# one-hot encode all the labels
def one_hot_encode(labels):
    n_labels = len(labels)
    # number of distinct classes present in the labels
    n_unique_labels = len(np.unique(labels))
    # start from an array of zeros with one row per label and one column per class
    encoded = np.zeros((n_labels,n_unique_labels))
    # set the column matching each row's class ID to 1
    encoded[np.arange(n_labels), labels] = 1
    return encoded



# This is a path verification function I found on Stack Overflow
def assure_path_exists(path):
    mydir = os.path.join(os.getcwd(), path)
    # os.path.exists returns True if the path already exists; otherwise we create it
    if not os.path.exists(mydir):
        os.makedirs(mydir)
        print('Initialized new directory')
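
One caveat about `one_hot_encode` above: it infers the number of columns from the labels it is given, so a batch that happened to lack a class would produce a narrower matrix or an indexing error. The sketch below is one possible alternative that fixes the width instead; `one_hot_fixed` is a hypothetical name, and `n_classes=10` is assumed from the ten UrbanSound8K classes.

```python
import numpy as np

def one_hot_fixed(labels, n_classes=10):
    """One-hot encode integer class IDs into a (len(labels), n_classes) array."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, n_classes))
    encoded[np.arange(labels.size), labels] = 1   # set one column per row
    return encoded

example = one_hot_fixed([3, 8, 8])   # class IDs for dog bark, siren, siren
print(example.shape)
```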

### Here we see the dimensionality reduction in action, how a clip with 26168 data points is reduced into 193 features. 

We don't use a file from the dataset here, just one of our pre-prepared samples.

In [None]:
#new path
sample_filename = projectDirectory+"/samples/us8k/siren.wav"
#extract features
mfccs, chroma, mel, contrast, tonnetz = extract_feature(sample_filename)
#Stack arrays in sequence horizontally (column wise).
all_features = np.hstack([mfccs,chroma,mel,contrast,tonnetz])
#printing out how many features are extracted with each method
print("MFCCs  = ", len(mfccs))
print("Chroma = ", len(chroma))
print("Mel = ", len(mel))
print("Contrast = ", len(contrast))
print("Tonnetz = ", len(tonnetz))

# We want to see the dimensional reduction
data_points, _ = librosa.load(sample_filename)
#printing out the initial number of data points the extraction was given
print("IN: Initial Data Points =", len(data_points), np.shape(data_points))
#printing out the end result of our extraction (the total # of feature values we will use)
print("OUT: Total features =", len(all_features))

### Another exploration we can perform is checking the balance of the dataset.

This is useful to know, as we could inadvertently achieve good performance on just one class with many instances, and poor performance on all others.

In [None]:
# we want to see all the different labels provided in our data, so let's build a function
def get_labels(parent_dir,sub_dirs,file_ext='*.wav'):
    # class IDs are collected in a list
    labels = []
    # for each fold directory
    for label, sub_dir in enumerate(sub_dirs):
        # for each audio file
        for fn in glob.glob(os.path.join(parent_dir, sub_dir, file_ext)):
            # the class ID is the second hyphen-separated field of the file name
            try:
                class_value = int(fn.split('fold')[1].split('-')[1])
                # append the class ID to the list
                labels.append(class_value)
            except (IndexError, ValueError):
                print("Error processing " + fn + " - skipping")
    # return the class IDs as an integer array
    return np.array(labels, dtype=int)

# put the path to the downloaded UrbanSound8K files here
raw_data_dir = projectDirectory+'/UrbanSound8K/audio/'
# our data is stored in folds so that we can give it to our model in another workbook;
# here we still parse through every fold to find out how much of each class is represented
fold_names = ['fold' + str(k) for k in range(1,11)]
# get_labels loops over every sub-directory it is given, so one call covers all ten folds
all_labels = get_labels(raw_data_dir, fold_names)
# now that we have all of our labels we count how often each unique class appears
unique, counts = np.unique(all_labels, return_counts=True)

#plot the results
plt.figure(figsize=(18,4))
plt.bar(np.arange(len(unique)), counts, align='center')
plt.xticks(np.arange(len(unique)), sound_names)
plt.show()
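
The label parsing used above relies on the text `fold` appearing only once in the path before the file name. A slightly more defensive sketch, shown below, works on the file name alone and counts classes with the standard library. The paths are made-up examples that merely follow the naming pattern used in this workbook, and `class_id_from_path` is a hypothetical helper.

```python
import os
from collections import Counter

def class_id_from_path(path):
    """Return the class ID, the second hyphen-separated field of the file name."""
    return int(os.path.basename(path).split("-")[1])

# Made-up paths for illustration only; they are not real dataset entries.
paths = [
    "UrbanSound8K/audio/fold1/000001-3-0-0.wav",
    "UrbanSound8K/audio/fold1/000001-3-0-1.wav",
    "UrbanSound8K/audio/fold2/000002-8-0-0.wav",
]
counts_by_class = Counter(class_id_from_path(p) for p in paths)
print(counts_by_class)
```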

## Still wondering why there are two workbooks?

The code in the cell below can be run (once) to convert the raw audio files into much smaller numpy arrays. As this process is quite time consuming, we'd prefer to just do it once, and then load the numpy data when we want to do some training. 

In [None]:
# use this to process the audio files into numpy arrays for easier use by our model in workbook #2

def save_folds(data_dir, save_dir):
    # for each fold of raw audio in data_dir we save 2 arrays in save_dir (features x and labels y)
    for k in range(1,11):
        fold_name = 'fold' + str(k)
        print("\nSaving " + fold_name)
        # pull features and one-hot encode our labels
        features, labels = parse_audio_files(data_dir, [fold_name])
        labels = one_hot_encode(labels)
        # printing the shapes so we know the function is working
        print("Features of", fold_name, " = ", features.shape)
        print("Labels of", fold_name, " = ", labels.shape)

        # here we save our files inside the save directory for easy storage
        feature_file = os.path.join(save_dir, fold_name + '_x.npy')
        labels_file = os.path.join(save_dir, fold_name + '_y.npy')
        np.save(feature_file, features)
        print("Saved " + feature_file)
        np.save(labels_file, labels)
        print("Saved " + labels_file)

# running this cell OVERWRITES any feature vectors saved earlier
raw_data_dir = projectDirectory+"/UrbanSound8K/audio/"
save_dir = projectDirectory+"/data/us8k-np-ffn"
# making sure the save directory exists
assure_path_exists(save_dir)

# HERE WE OVERWRITE OUR EXISTING FILES
# saving our folds inside the data directory (save_dir) rather than beside the raw audio
save_folds(raw_data_dir, save_dir)
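
For reference, loading the saved arrays back could look something like the sketch below. It assumes the `foldN_x.npy` / `foldN_y.npy` naming used in the cell above; `load_fold` is a hypothetical helper, and the round trip runs on stand-in data in a temporary folder so that nothing real is touched.

```python
import os
import tempfile
import numpy as np

def load_fold(data_dir, fold_number):
    """Load the feature and label arrays saved for one fold (hypothetical helper)."""
    prefix = os.path.join(data_dir, "fold" + str(fold_number))
    return np.load(prefix + "_x.npy"), np.load(prefix + "_y.npy")

# Round trip on stand-in data: 5 fake clips with 193 features and 10 classes.
with tempfile.TemporaryDirectory() as tmp_dir:
    np.save(os.path.join(tmp_dir, "fold1_x.npy"), np.zeros((5, 193)))
    np.save(os.path.join(tmp_dir, "fold1_y.npy"), np.eye(10)[:5])
    features, labels = load_fold(tmp_dir, 1)

print(features.shape, labels.shape)
```

Loading a pair of small `.npy` files like this is much faster than re-running the feature extraction, which is the reason for splitting the work across two workbooks.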



# End of Workbook 1

--------------------------------------------------------------------------------------------------------------------------