# Random Forest for killer whale sound classification

by Erika Peláez

This is a simple Random Forest classifier trained on data from the Ford Osborne tape analysis and on a dataset built by the `get_db.py` script, which scrapes the [Watkins Marine Mammal Sound database](https://cis.whoi.edu/science/B/whalesounds/index.cfm).

The non-killer examples are sounds from whales other than orcas: specifically, I took the Humpback Whale (*Megaptera novaeangliae*) and the False Killer Whale (*Pseudorca crassidens*) from the Watkins database. For the killer examples I used the Killer Whale (*Orcinus orca*) recordings from Watkins and the data from the Ford Osborne tape analysis.


In [1]:
import os
import librosa
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
killer_files = os.listdir('killer')
nonkiller_files = os.listdir('nonkiller')
print(f"Number of killer whale files: {len(killer_files)}")
print(f"Number of non killer whale files: {len(nonkiller_files)}")

Number of killer whale files: 121
Number of non killer whale files: 746


## Read files

The files were cut into chunks of at most five seconds with the `cut_sounds.sh` script; however, the original, longer files remain in the directory. We will read only the files that last less than 6 seconds (to avoid having the same signal twice).

In [11]:
sampling_rates = []
sound_signals = []
labels = []
base_dirs = {'killer': killer_files, 'nonkiller': nonkiller_files}

for folder in base_dirs:
    for file in base_dirs[folder]:
        if librosa.get_duration(filename=f'{folder}/{file}') < 6:
            sound, sampling = librosa.load(f'{folder}/{file}', duration=5.0)
            sampling_rates.append(sampling)
            sound_signals.append(sound)
            labels.append(folder)

In [14]:
print(f"Number of signals: {len(sound_signals)}")
print(f"Number of killer whale signals: {labels.count('killer')}")
print(f"Number of non killer whale signals: {labels.count('nonkiller')}")

Number of signals: 827
Number of killer whale signals: 119
Number of non killer whale signals: 708


We can see that we got slightly fewer files, which is what we wanted. Now we have to pad the signals that last less than 5 seconds so that all of them have the same length, since the dataset needs uniformly sized inputs. We will use center padding, meaning each signal will be surrounded by zeros.

In [19]:
max_len = max(len(sound) for sound in sound_signals)
sound_signals = [librosa.util.pad_center(sound, size=max_len) for sound in sound_signals]
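
To make the padding step concrete, here is a minimal NumPy sketch of center padding on a toy array. The helper `center_pad` is a hypothetical stand-in written for illustration; it is not librosa's code, although it is meant to behave like `librosa.util.pad_center` on 1-D signals.

```python
import numpy as np

def center_pad(signal, size):
    """Zero-pad a 1-D signal on both sides so that it has `size` samples."""
    n = len(signal)
    if size < n:
        raise ValueError("target size must be at least the signal length")
    left = (size - n) // 2      # any odd leftover sample goes on the right
    right = size - n - left
    return np.pad(signal, (left, right), mode="constant")

toy = np.array([1.0, 2.0, 3.0])
padded = center_pad(toy, 8)
print(padded)
```

A signal that already has the target length is returned unchanged, which is what happens to our full five second chunks.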

## Feature extraction with mel spectrogram

The mel scale is a perceptual scale of pitches judged by listeners to be equally spaced from one another. It has proven very useful in the speech recognition field, so we will use this representation as our features.

In [23]:
spectrograms = np.array([librosa.feature.melspectrogram(y=sound) for sound in sound_signals])

To give a little context for what we did here, we need to explain some concepts. The first one is the **spectrum**: the Fourier transform of a signal, which shows how strongly each frequency is present. A **spectrogram** shows how the spectrum changes over time. You divide the signal into short time windows and compute the spectrum of each window; the amplitudes are then quantized, for instance to a number between 0 and 255 when the result is drawn as an image, so that each value is directly related to the amplitude of the peak in that window. The resulting vectors are laid side by side, creating the hallmark appearance of the spectrogram.
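
The windowing idea described above can be sketched with NumPy alone. This is a simplified illustration, not what librosa does internally: the function name `toy_spectrogram`, the frame length, the hop size and the 8000 Hz sampling rate are all assumptions chosen for the example.

```python
import numpy as np

def toy_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one FFT per overlapping, windowed frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps the non-negative frequencies; transpose to (frequency, time)
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 8000                               # assumed sampling rate of the toy signal
t = np.arange(sr) / sr                  # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)     # pure 1 kHz tone
spec = toy_spectrogram(tone)

peak_bin = spec.mean(axis=1).argmax()   # frequency bin with the most energy
peak_hz = peak_bin * sr / 256           # convert the bin index back to Hz
print(spec.shape, peak_hz)
```

Each column of `spec` is the spectrum of one time window, and for this pure tone the energy concentrates in the frequency bin that corresponds to 1000 Hz.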

Mel-frequency analysis of speech is based on experiments with human listeners. The human ear has been observed to act as a bank of filters that resolves the low-frequency region more finely than the high-frequency region, so these filters are non-uniformly spaced. Based on that, we can set up a bank of filters, each of which only lets through the parts of the signal that fall within its bounds. Since we want to mimic the human ear, we place more filters in the low-frequency region than in the high-frequency one. The filters are spaced according to the mel scale.

Finally, if we filter our spectrum with mel filters we get a mel spectrum. So the results of the above operation are actually matrices, one mel spectrogram per sound. However, the algorithm that we use here takes a vector as input instead of a matrix, which is why we need to *flatten* each matrix by concatenating its rows into a single row, making it possible to train our Random Forest.
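
As a rough illustration of why mel-spaced filters crowd into the low frequencies, the sketch below uses the HTK-style formula `mel = 2595 * log10(1 + f / 700)`. This is only one common convention and an assumption here: librosa defaults to a different one (Slaney), so the exact numbers will not match those inside `librosa.feature.melspectrogram`. The number of filters and the 8000 Hz upper limit are also arbitrary choices for the example.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel formula (one common convention among several)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Ten filter center frequencies, equally spaced in mel between 0 and 8000 Hz
n_filters = 10
top = hz_to_mel(8000.0)
centers_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(1, n_filters + 1)]

# Distance in Hz between neighbouring centers
gaps_hz = [b - a for a, b in zip(centers_hz, centers_hz[1:])]
print([round(c) for c in centers_hz])
print([round(g) for g in gaps_hz])
```

The centers are evenly spaced in mel, but the gaps between them in Hz keep growing, so more filters end up covering the low frequencies.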


In [26]:
n_sounds, n_mels, n_frames = spectrograms.shape  # (sounds, mel bands, time frames)
x = spectrograms.reshape(n_sounds, -1)
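
To see what the flattening does, here is a tiny sketch on made-up data. The array `toy` is a hypothetical stand-in for `spectrograms`, with two "sounds" of two rows and three columns each.

```python
import numpy as np

# Two toy "spectrograms", each with 2 frequency rows and 3 time frames
toy = np.arange(12).reshape(2, 2, 3)

# One row of 2 * 3 = 6 features per sound; rows are concatenated in order
flat = toy.reshape(len(toy), -1)
print(flat)
```

Every sound becomes a single feature vector, so the result has the two-dimensional `(n_samples, n_features)` layout that scikit-learn estimators expect.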

We convert our labels to numeric form:

In [28]:
y = [1 if label == 'killer' else 0 for label in labels]

## Train a Random Forest

Random Forests are an ensemble technique in Machine Learning. A Random Forest is a group of decision trees that collectively decide the class of each sample: the decision of every tree is computed, and the mode (the most frequent class) is the output.
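
The voting idea can be sketched in a few lines of plain Python. The votes below are invented for illustration, and `majority_vote` is a hypothetical helper. Note also that scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than by counting hard votes, so this shows the concept, not the library's exact procedure.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the most frequent class among the trees' votes for each sample."""
    # tree_predictions[t][s] is the class predicted by tree t for sample s
    per_sample = zip(*tree_predictions)
    return [Counter(votes).most_common(1)[0][0] for votes in per_sample]

# Hypothetical votes from three trees on four samples (1 = killer, 0 = nonkiller)
votes = [
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
forest_prediction = majority_vote(votes)
print(forest_prediction)
```

Using an odd number of trees in a two-class problem, as in this toy case, avoids ties.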

In [29]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

In [30]:
clf = RandomForestClassifier(n_estimators=50)
# cross_val_score fits a fresh clone of clf on each of the 10 folds; clf itself stays unfitted
scores = cross_val_score(clf, x, y, cv=10)

In [31]:
print(f'Accuracy is: {scores.mean()} +/- {scores.std() * 2}')

Accuracy is: 0.9927710843373493 +/- 0.024573588017314622
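
Because the two classes are far from balanced, it can help to compare this number against a trivial baseline. The sketch below uses only the class counts printed earlier in this notebook and computes the accuracy of a model that always predicts the majority class.

```python
# Class counts reported earlier in this notebook
n_killer = 119
n_nonkiller = 708

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy = max(n_killer, n_nonkiller) / (n_killer + n_nonkiller)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

The cross-validated accuracy is well above this baseline, which suggests the forest is learning more than the class proportions.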


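Our accuracy is high, so we will serialize this model to help us label future incoming data. Since `cross_val_score` only fits clones of the classifier, we first fit `clf` on the full dataset. We use pickle for the serialization.

In [34]:
import pickle

filename = 'rf_orca.pkl'

# Fit on all the data first: cross_val_score leaves clf itself unfitted
clf.fit(x, y)

with open(filename, 'wb') as file:
    pickle.dump(clf, file)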

If we want to reload our model and use it later, we can use the following code:

```python
import pickle

with open('rf_orca.pkl', 'rb') as model:
    clf = pickle.load(model)

clf.predict(x)        # outputs the class label, 0 or 1
clf.predict_proba(x)  # outputs the probability of each class
```

In [6]:
import pickle
with open('rf_orca.pkl', 'rb') as model:
    clf = pickle.load(model)

# clf.predict(x)        # outputs the class label, 0 or 1
# clf.predict_proba(x)  # outputs the probability of each class

