# Jagawana - Forest Logging Detection

<div class="alert alert-block alert-success"> 📌 This notebook was created for a capstone project. We are building a forest logging detection system that identifies chainsaws and gunshots in forest ambience sounds.</div>

### Workflow Problem Definition
Forests are vast and the terrain is hard to cross, while ranger teams usually consist of only a few people. Rangers often patrol the forest for only 1–2 weeks a month, which leaves many opportunities for illegal loggers to get in and out unnoticed. Technology can help rangers close this gap.

Jagawana is a Wide Sensor Network System deployed in forests to prevent illegal logging. By using sensors to pick up sounds in the forest, we can monitor what is happening in real time. We deploy a machine learning model that processes the sounds captured by the sensors and classifies them into categories such as chainsaws, trucks, gunshots, and burning sounds.
   
### Workflow Goals
The main goal of our machine learning model is to **classify forest ambience sounds** captured by the sensors. Our priority is to identify chainsaw sounds and alert users through the Android app. Identifying other sounds is also important: it may let us map fauna habitats and provide data for further research.

### Workflow Stages
This notebook workflow goes through six stages.
1. Acquire training and testing data.
2. Wrangle, prepare, cleanse the data.
3. Analyze, identify patterns, and explore the data.
4. Model, predict and solve the problem.
5. Visualize, report, and present the problem solving steps and final solution.
6. Export the model.

### Resources and References
* We use the ESC-50, UrbanSound8K, and Google AudioSet datasets.
* We use the VGG-16 model as our baseline and adjust it slightly for audio classification.

<div class="alert alert-block alert-warning"> 📌 This project is still in development. Feel free to comment or contact me through the link in my profile.</div>

# <center> Acquire Training and Testing Data </center>

In [None]:
import tensorflow as tf
print(tf.__version__)

In [None]:
# Library
import os
import pandas as pd
import matplotlib.pyplot as plt
import librosa
import librosa.display
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
import tensorflow.keras.models as models
import tensorflow.keras.layers as layers
import IPython
import sklearn
import seaborn as sns
from sklearn.utils import shuffle

%load_ext tensorboard

In [None]:
#datasets chainsaw and crackling fire
PATH_ESC = "../input/environmental-sound-classification-50/audio/audio/16000/"
CSV_ESC = "../input/environmental-sound-classification-50/esc50.csv"

#datasets gun shot
CSV_URBAN = "../input/urbansound8k/UrbanSound8K.csv" 
PATH_URBAN = "../input/urbansound8k/fold" 

In [None]:
#read csv
df_chainsaw = pd.read_csv(CSV_ESC)
df_gunshot = pd.read_csv(CSV_URBAN)

chainsaw = df_chainsaw.loc[df_chainsaw['category'].isin(['chainsaw', 'crackling_fire'])]
gunshot = df_gunshot[df_gunshot['class'] == 'gun_shot']
chainsaw = chainsaw.drop(['esc10', 'src_file', 'take'], axis=1)
gunshot = gunshot.drop(['fsID', 'start', 'end', 'classID', 'salience'], axis=1)

gunshot = gunshot.rename(columns={'class': 'category', 'slice_file_name': 'filename'})

In [None]:
#combined chainsaw and gunshot datasets
combined_datasets = pd.concat([chainsaw, gunshot])

classes = combined_datasets['category'].unique()
class_dict = {name: index for index, name in enumerate(classes)}
combined_datasets['target'] = combined_datasets['category'].map(class_dict)

sample_df = combined_datasets.drop_duplicates(subset=['target'])
sample_df
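
The label mapping built in the cell above can be illustrated without pandas. This is a minimal sketch, assuming a made-up list of category strings; `build_class_dict` is a hypothetical helper and not part of the notebook. Like `unique()`, it numbers categories in order of first appearance, so the actual indices depend on the row order of the CSV files.

```python
def build_class_dict(categories):
    """Map each distinct category to an integer index in order of first appearance."""
    class_dict = {}
    for name in categories:
        if name not in class_dict:
            class_dict[name] = len(class_dict)
    return class_dict

# Hypothetical category column; the real order depends on the CSV files
categories = ["chainsaw", "crackling_fire", "chainsaw", "gun_shot"]
class_dict = build_class_dict(categories)
targets = [class_dict[name] for name in categories]

print(class_dict)
print(targets)
```

Because the indices depend on row order, any data added later should look its target up in `class_dict` rather than hard-code a number.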

In [None]:
# Class Conf will save the settings we are going to use in this notebook
class conf:
    sr = 16000
    duration = 3
    hop_length = 340*duration
    fmin = 20
    fmax = sr // 2
    n_mels = 128
    n_fft = n_mels * 20
    samples = sr * duration
    epochs = 30

def read_audio(conf, pathname, trim_long_data):
    y, sr = librosa.load(pathname, sr=conf.sr)
    # trim silence
    if 0 < len(y): # workaround: 0 length causes error
        y, _ = librosa.effects.trim(y) # trim, top_db=default(60)
    # make it unified length to conf.samples
    if len(y) > conf.samples: # long enough
        if trim_long_data:
            y = y[0:0+conf.samples]
    else: # pad blank
        padding = conf.samples - len(y)    # add padding at both ends
        offset = padding // 2
        y = np.pad(y, (offset, conf.samples - len(y) - offset), 'constant')
    return y

def audio_to_melspectrogram(conf, audio):
    # Pass the audio as a keyword argument; recent librosa releases reject it positionally
    spectrogram = librosa.feature.melspectrogram(y=audio,
                                                 sr=conf.sr,
                                                 n_mels=conf.n_mels,
                                                 hop_length=conf.hop_length,
                                                 n_fft=conf.n_fft,
                                                 fmin=conf.fmin,
                                                 fmax=conf.fmax)
    spectrogram = librosa.power_to_db(spectrogram)
    return spectrogram

def show_melspectrogram(conf, mels, title='Log-frequency power spectrogram'):
    librosa.display.specshow(mels, x_axis='time', y_axis='mel', 
                             sr=conf.sr, hop_length=conf.hop_length,
                            fmin=conf.fmin, fmax=conf.fmax)
    plt.colorbar(format='%+2.0f dB')
    plt.title(title)
    plt.show()
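
The settings in `conf` determine the shape of every Mel spectrogram, which in turn must match the `INPUTSHAPE = (128, 32, 1)` used by the model later in the notebook. The sketch below checks that arithmetic. It assumes librosa's default centered framing (`center=True`), where the frame count is `1 + n_samples // hop_length`, and the 2-second clip length used when the training data is prepared; `mel_frame_count` is a hypothetical helper.

```python
def mel_frame_count(n_samples, hop_length):
    """Number of frames for a centered STFT (assumes librosa's default center=True)."""
    return 1 + n_samples // hop_length

sr = 16000
hop_length = 340 * 3   # conf.hop_length with conf.duration = 3
n_mels = 128           # conf.n_mels
clip_seconds = 2       # clip length used when preparing the training data

frames = mel_frame_count(sr * clip_seconds, hop_length)
print((n_mels, frames))  # spectrogram shape before the channel axis is added
```

If the clip length or `hop_length` changes, the second dimension changes too, and the model's input shape has to be updated to match.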

# <center> Visualization </center>

In [None]:
# Visualization of Soundwave
fig, ax = plt.subplots(3, figsize = (8, 6))
fig.suptitle('Sound Waves', fontsize=16)
color = ['#A300F9', '#4300FF', '#009DFF']
i=0
for index,row in sample_df.iterrows(): 
    if row['category'] == "gun_shot":
        PATH = PATH_URBAN + str(row[1]) + '/' + row[0]
    else:
        PATH = PATH_ESC + row[0]
    signal , rate = librosa.load(PATH, sr=conf.sr)
    librosa.display.waveplot(y = signal, sr = rate, color = color[i], ax=ax[i])
    ax[i].set_ylabel(classes[row[2]], fontsize=13)
    i +=1

In [None]:
# Visualization of Mel Spectrogram
fig, ax = plt.subplots(3, figsize = (8, 6))
fig.suptitle('Mel Spectrogram', fontsize=16)
color = ['#A300F9', '#4300FF', '#009DFF']
i=0
for index,row in sample_df.iterrows(): 
    if row['category'] == "gun_shot":
        PATH = PATH_URBAN + str(row[1]) + '/' + row[0]
    else:
        PATH = PATH_ESC + row[0]
    signal , rate = librosa.load(PATH, sr=conf.sr)
    mel_spec = audio_to_melspectrogram(conf, signal)
    librosa.display.specshow(mel_spec, sr = conf.sr, hop_length = conf.hop_length, x_axis = 'time', 
                         fmin=conf.fmin, fmax=conf.fmax, y_axis = 'mel', ax=ax[i])
    ax[i].set_ylabel(classes[row[2]], fontsize=13)
    i +=1

In [None]:
# Visualization of MFCC Plot
fig, ax = plt.subplots(3, figsize = (8, 6))
fig.suptitle('MFCC', fontsize=16)
color = ['#A300F9', '#4300FF', '#009DFF']
i=0
for index,row in sample_df.iterrows(): 
    if row['category'] == "gun_shot":
        PATH = PATH_URBAN + str(row[1]) + '/' + row[0]
    else:
        PATH = PATH_ESC + row[0]
    signal , rate = librosa.load(PATH, sr=conf.sr)
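    # Compute and plot with the same hop length so the time axis is correct
    mfcc = librosa.feature.mfcc(y=signal, sr=rate, n_mfcc=13, dct_type=3, hop_length=conf.hop_length)
    # The vertical axis is the MFCC coefficient index, not a Mel frequency
    librosa.display.specshow(mfcc, sr = conf.sr, hop_length = conf.hop_length, x_axis = 'time', ax=ax[i])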
    ax[i].set_ylabel(classes[row[2]], fontsize=13)
    i +=1

In [None]:
for index,row in sample_df.iterrows(): 
    if row['category'] == "gun_shot":
        PATH = PATH_URBAN + str(row[1]) + '/' + row[0]
    else:
        PATH = PATH_ESC + row[0]
    signal , rate = librosa.load(PATH, sr=conf.sr)
    print(len(signal)/rate)
    
combined_datasets.category.value_counts()

# <center> Data Distribution </center>
### Distribution Problem
Gun shot data is plentiful, with 374 clips in total, but each clip is only 2 to 3 seconds long. The chainsaw and fire classes have only 40 clips each, with each clip up to 5 seconds long.

To feed the data to our ML model, every clip must have the same length; here I use 2-second clips. Gun shot clips longer than 2 seconds (for example, 2.6 seconds) are trimmed to 2 seconds, and shorter ones are zero-padded. For chainsaw and fire, we slide a 2-second window in 1-second steps, so a 5-second clip yields four 2-second clips (0-2 s, 1-3 s, 2-4 s, and 3-5 s).

With this method, the gun shot, chainsaw, and fire classes have 374, 160, and 160 clips respectively.
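
The windowing described above can be sketched in plain Python. This is an illustrative version, assuming the signal is a sequence of samples at a known integer sample rate; `sliding_windows` is a hypothetical helper, not the notebook's own code.

```python
def sliding_windows(signal, rate, window_s=2, shift_s=1):
    """Cut a signal into fixed-length windows, moving the start by shift_s seconds each time."""
    window = window_s * rate
    shift = shift_s * rate
    clips = []
    start = 0
    # Only keep windows that fit entirely inside the signal
    while start + window <= len(signal):
        clips.append(signal[start:start + window])
        start += shift
    return clips

rate = 16000
five_second_clip = [0.0] * (5 * rate)   # stand-in for a 5-second ESC-50 recording
clips = sliding_windows(five_second_clip, rate)

print(len(clips), [len(c) / rate for c in clips])
```

Note that the window start advances by whole seconds (`shift_s * rate` samples), not by single samples, which is what produces the 0-2 s, 1-3 s, 2-4 s, and 3-5 s clips.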

To balance the dataset further, I download more data from [Google Audioset](https://research.google.com/audioset/dataset/chainsaw.html) using the scripts [here](https://github.com/nicorenaldo/audioset-processing), which I modified from [here](https://github.com/aoifemcdonagh/audioset-processing).

The script downloads 10-second audio clips from YouTube links. With a 2-second window shifted by 1 second, each file yields 9 clips. My goal is 340 clips per category, so I need to download 20 files from Google Audioset for each of the chainsaw and fire categories (160 + 20 × 9 = 340).
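
The download count follows from simple arithmetic, sketched below. The figures (40 recordings, 4 windows each, 10-second downloads, a target of 340 clips) come from the text above; `extra_downloads_needed` is a hypothetical helper.

```python
import math

def extra_downloads_needed(target_clips, existing_clips, clips_per_download):
    """Smallest number of downloads that lifts a class to at least target_clips."""
    missing = max(0, target_clips - existing_clips)
    return math.ceil(missing / clips_per_download)

esc_clips = 40 * 4           # 40 ESC-50 recordings, four 2-second windows each
per_download = 10 - 2 + 1    # 2-second windows shifted by 1 second over 10 seconds
downloads = extra_downloads_needed(340, esc_clips, per_download)

print(esc_clips, per_download, downloads)
```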

In [None]:
!git clone https://github.com/nicorenaldo/audioset-processing.git
%cd audioset-processing/
!pip install -r requirements.txt

In [None]:
!python3 process.py download -c "chainsaw" -s STRICT --limit 20
!python3 process.py download -c "fire" -s STRICT --limit 20

In [None]:
%cd ../
chainsaw_dir = "./audioset-processing/output/chainsaw/"
fire_dir = "./audioset-processing/output/fire/"
chainsaw_file = os.listdir(chainsaw_dir)
fire_file = os.listdir(fire_dir)

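# Look up the targets in class_dict so they match the labels assigned to the ESC-50 data
audioset_chainsaw = pd.DataFrame({"filename":chainsaw_file, "target":class_dict["chainsaw"], "category":"chainsaw"})
audioset_fire = pd.DataFrame({"filename":fire_file, "target":class_dict["crackling_fire"], "category":"crackling_fire"})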
audioset = pd.concat([audioset_chainsaw, audioset_fire])
audioset.tail()

# <center> Preparing Training Data </center>

In [None]:
X = []
y = []

# Example Output of data
# Pandas(Index=24, filename='1-116765-A-41.wav', fold=1, target=0, category='chainsaw')
for data in combined_datasets.itertuples():
    if data[4]=="gun_shot":
        PATH = PATH_URBAN + str(data[2]) + '/' + data[1]
        signal , rate = librosa.load(PATH, sr=conf.sr)
        if(len(signal)/16000 <= 2.0):
            blank = np.zeros((rate*2)-len(signal))
            sig_ = np.append(signal,blank)
        else:
            sig_ = signal[0 : int(rate*2)]
        mel_spec = audio_to_melspectrogram(conf, sig_)
        X.append(mel_spec)
        y.append(data[3])
    else:
        PATH1 = PATH_ESC + data[1]
        signal , rate = librosa.load(PATH1, sr=conf.sr)
        # Create four 2-second clips from each audio file (1-second shift) to get more samples
        for i in range(4):
            sig_ = signal[i*rate : (i+2)*rate]
            mel_spec = audio_to_melspectrogram(conf, sig_)
            X.append(mel_spec)
            y.append(data[3])

# Example Output
# Pandas(Index=0, filename='-DVM0BK_h5A_30.wav', target=0, category='chainsaw')
for data in audioset.itertuples():
    if data[3] == "chainsaw":
        PATH = chainsaw_dir + str(data[1])
    else:
        PATH = fire_dir + str(data[1])
    signal , rate = librosa.load(PATH, sr=conf.sr)
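    # Nine 2-second windows per 10-second clip, shifted by 1 second
    for i in range(9):
        sig_ = signal[i*rate : (i+2)*rate]
        if len(sig_) < rate*2:
            break  # stop when the file is too short for another full window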
        mel_spec = audio_to_melspectrogram(conf, sig_)
        X.append(mel_spec)
        y.append(data[2])    

# convert list to numpy array
X = np.array(X)
y = np.array(y)

#one-hot encoding the target
y_hot = tf.keras.utils.to_categorical(y , num_classes=len(classes))

# our tensorflow model takes input as (no_of_sample , height , width , channel).
# here X has dimension (no_of_sample , height , width).
# So, the below code will reshape it to (no_of_sample , height , width , 1).
X_reshaped = X.reshape(X.shape[0], X.shape[1], X.shape[2], 1)
                                  
x_train , x_val , y_train , y_val = train_test_split(X_reshaped , y_hot ,test_size=0.2, random_state=42)
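
For readers unfamiliar with one-hot encoding, the sketch below shows in plain Python what the encoded targets look like and how `argmax` recovers the integer label, which is what the evaluation cells rely on later. It is an illustration only, with made-up labels; `one_hot` and `argmax` here are hypothetical helpers, not the Keras or NumPy functions.

```python
def one_hot(labels, num_classes):
    """Turn integer labels into rows with a single 1.0 at the label's index."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)]
            for label in labels]

def argmax(row):
    """Index of the largest value in a row."""
    return max(range(len(row)), key=lambda i: row[i])

labels = [0, 2, 1]                      # hypothetical targets for three clips
encoded = one_hot(labels, num_classes=3)
decoded = [argmax(row) for row in encoded]

print(encoded)
print(decoded)
```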

# <center> Modifying VGG16 Model </center>

In [None]:
INPUTSHAPE = (128, 32, 1)
def create_model():
    created_model =  models.Sequential([
        layers.Conv2D(64 , (3,3),activation = 'relu',padding='same', input_shape = INPUTSHAPE),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3,3), activation='relu',padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2,2), strides=(2,2)),
        layers.Dropout(0.2),

        layers.Conv2D(128, (3,3), activation='relu',padding='same'),                      
        layers.BatchNormalization(),
        layers.Conv2D(128, (3,3), activation='relu',padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2,2), strides=(2,2)),
        layers.Dropout(0.2),

        layers.Conv2D(256, (3,3), activation='relu',padding='same'),
        layers.BatchNormalization(),
        layers.Conv2D(256, (3,3), activation='relu',padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2,2), strides=(2,2)),    
        layers.Dropout(0.2),

        layers.GlobalAveragePooling2D(),

        layers.Dense(256 , activation = 'relu'),
        layers.Dense(256 , activation = 'relu'),
        layers.Dense(len(classes) , activation = 'softmax')
    ])

    created_model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics=['acc'])
    return created_model
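
To see how the feature map shrinks through the network, the sketch below traces only the spatial dimensions. It assumes what the layer definitions above state: the convolutions use `padding='same'` and so keep height and width, while each 2x2 max pooling with stride 2 halves both. `pooled_shape` is a hypothetical helper.

```python
def pooled_shape(height, width, n_pools, pool=2):
    """Spatial size after n_pools max-pooling layers with equal pool size and stride."""
    for _ in range(n_pools):
        height //= pool
        width //= pool
    return height, width

# Input is (128, 32, 1); the model has three pooling layers and ends with 256 filters
h, w = pooled_shape(128, 32, n_pools=3)
print((h, w, 256))   # feature map entering GlobalAveragePooling2D
```

Global average pooling then reduces this map to a vector of 256 values, one per filter, before the dense layers.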

In [None]:
# Our model summary
model = create_model()
model.summary()  # summary() prints the table itself and returns None

In [None]:
%mkdir "cpkt"
%mkdir "logs"
LOGDIR = "logs"
CPKT = "cpkt/"

#this callback is used to prevent overfitting.
callback_1 = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto',
    baseline=None, restore_best_weights=False
)

#this checkpoint saves the model weights at the end of an epoch whenever val_loss improves
callback_2 = tf.keras.callbacks.ModelCheckpoint(
    CPKT, monitor='val_loss', verbose=0, save_best_only=True,
    save_weights_only=True, mode='auto', save_freq='epoch', options=None
)

#this is for tensorboard
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=LOGDIR)
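
The early-stopping rule configured above (`patience=5`, `min_delta=0`, monitoring `val_loss`) can be illustrated with a simplified simulation. This is a sketch of the idea, not Keras's implementation, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop under a patience rule."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)   # never triggered: training runs to the end

# Hypothetical validation losses: the best value is reached at epoch 3
losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.63, 0.64, 0.59, 0.58]
stop_epoch = early_stopping_epoch(losses)
print(stop_epoch)
```

With `restore_best_weights=False`, as configured above, the model keeps the weights from the last epoch rather than from the best one.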

# <center> Training</center>

In [None]:
model = create_model()
history = model.fit(x_train,y_train ,
        validation_data=(x_val,y_val),
        epochs=conf.epochs,
        callbacks = [callback_1], verbose=1)

eval_score = model.evaluate(x_val, y_val)
print("Val Score: ",eval_score )

# <center> Evaluation </center>

In [None]:
color = ['black', 'red', 'green', 'blue', 'purple']
plt.figure(figsize=(15,5))
plt.title('Accuracies vs Epochs')

label_name_train = 'Train Accuracy'
label_name_val = 'Val Accuracy'
plt.plot(history.history['acc'], label=label_name_train)
plt.plot(history.history['val_acc'], label=label_name_val)

plt.legend()
plt.show()

In [None]:
color = ['black', 'red', 'green', 'blue', 'purple']
plt.figure(figsize=(15,5))
plt.title('Loss vs Epochs')

label_name_train = 'Train Loss'
label_name_val = 'Val Loss'
plt.plot(history.history['loss'], label=label_name_train)
plt.plot(history.history['val_loss'], label=label_name_val)

plt.legend()
plt.show()

In [None]:
# Create a confusion matrix to see which errors occurred
import sklearn.metrics  # import the submodule explicitly
y_pred = model.predict(x_val)
confusion_matrix = sklearn.metrics.confusion_matrix(np.argmax(y_val, axis=1), np.argmax(y_pred, axis=1))

ax = plt.subplot()
sns.heatmap(confusion_matrix, annot=True, cmap="YlGnBu", ax=ax);

ax.set_xlabel('Predicted labels');ax.set_ylabel('True labels'); 
ax.set_title('Confusion Matrix'); 
ax.xaxis.set_ticklabels(classes, rotation='vertical'); ax.yaxis.set_ticklabels(classes, rotation='horizontal');
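
Reading the heatmap is easier with the layout in mind: rows are true labels and columns are predicted labels. The sketch below builds such a matrix in plain Python and derives per-class recall from it. The labels are made up for illustration, and both functions are hypothetical helpers rather than the scikit-learn API.

```python
def build_confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def recall_per_class(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

# Hypothetical labels for nine validation clips and three classes
true = [0, 0, 0, 1, 1, 2, 2, 2, 2]
pred = [0, 0, 1, 1, 1, 2, 2, 0, 2]

matrix = build_confusion_matrix(true, pred, num_classes=3)
recall = recall_per_class(matrix)
print(matrix)
print(recall)
```

Since the priority is catching chainsaw sounds, recall for the chainsaw row is the number to watch: it is the share of real chainsaw clips the model actually flagged.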

#### I added some extra audio files to my datasets. Let's see how the model identifies the sounds in one of them.

In [None]:
# The audio we are using is this one
import IPython.display as ipd
sig , sr = librosa.load('../input/chainsaw-testing/chainsaw-01.wav', sr=conf.sr)
ipd.display(ipd.Audio(sig, rate=sr))
librosa.display.waveplot(y = sig, sr = sr)

In [None]:
def split_audio(audio_data, w, h, threshold_level, tolerence=10):
    split_map = []
    start = 0
    data = np.abs(audio_data)
    threshold = threshold_level*np.mean(data[:25000])
    inside_sound = False
    near = 0
    for i in range(0,len(data)-w, h):
        win_mean = np.mean(data[i:i+w])
        if(win_mean>threshold and not(inside_sound)):
            inside_sound = True
            start = i
        if(win_mean<=threshold and inside_sound and near>tolerence):
            inside_sound = False
            near = 0
            split_map.append([start, i])
        if(inside_sound and win_mean<=threshold):
            near += 1
    return split_map

In [None]:
# To identify the sounds in the audio, we cut the waveform into several parts
# Each part is trimmed to its highlight (the noisiest section) within a given interval

sound_clips = split_audio(sig, 10000, 2500, 15, 10)
duration = len(sig)
i = 1

for intvl in sound_clips:
    clip, index = librosa.effects.trim(sig[intvl[0]:intvl[1]],       
                                       top_db=20, frame_length=512, hop_length=64)
    mel_spec = audio_to_melspectrogram(conf, clip)
    testing = np.array(mel_spec)
    testing = testing.reshape(1, testing.shape[0], testing.shape[1], 1)
    pred = model.predict(testing)
    
    blank = np.zeros(intvl[0]-0)
    blank2 = np.zeros(duration-intvl[1])
    temp = np.append(blank,clip)
    temp = np.append(temp,blank2)
    librosa.display.waveplot(y = temp, sr = sr, )
    
    print("Clip Number :", i)
    print("Interval from : ", intvl[0]/16000, " to ",intvl[1]/16000, "seconds")
    i += 1
    if(pred.max() > 0.8):
        print("Results : ", classes[np.argmax(pred)], "\n")
    else:
        print("Results : Unknown")
        print("Confidence Level : ", pred)
        print("Highest Confidence Level : ", classes[np.argmax(pred)], " of ", np.max(pred)*100, "%\n")
        ipd.display(ipd.Audio(clip, rate=sr))
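
The decision rule used in the loop above, where a prediction is accepted only when its confidence exceeds 0.8, can be isolated as a small function. This is a minimal sketch with made-up probabilities and an assumed class order; `label_or_unknown` is a hypothetical helper.

```python
def label_or_unknown(probabilities, classes, threshold=0.8):
    """Return the most likely class, or "Unknown" if its probability is not above threshold."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] > threshold:
        return classes[best]
    return "Unknown"

# Assumed class order, for illustration only
classes = ["chainsaw", "crackling_fire", "gun_shot"]

confident = label_or_unknown([0.93, 0.05, 0.02], classes)
uncertain = label_or_unknown([0.55, 0.40, 0.05], classes)
print(confident, uncertain)
```

Raising the threshold produces fewer false alerts but more "Unknown" results, so the value is a trade-off to tune against real recordings.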

In [None]:
# Showing the Mel spectrogram that is passed to the model

fig, ax = plt.subplots(5, figsize = (15, 10))
fig.suptitle('Mel Spectrogram', fontsize=16)
i=0
for intvl in sound_clips:
    clip, index = librosa.effects.trim(sig[intvl[0]:intvl[1]],       
                                       top_db=20, frame_length=512, hop_length=64)
    mel_spec = audio_to_melspectrogram(conf, clip)
    librosa.display.specshow(mel_spec, sr = conf.sr, hop_length = conf.hop_length, x_axis = 'time', 
                         fmin=conf.fmin, fmax=conf.fmax, y_axis = 'mel', ax=ax[i])
    i +=1

# <center> Saving Model </center>

In [None]:
# Save the model to the folder "jagawana_v2" in the current working directory
# The exported model is in the TensorFlow SavedModel format, which is the default for TensorFlow 2.x models

model.save("jagawana_v2")