# Evaluating Different Architectures for Our Gunshot Detection Model

Recall from our comparison of preprocessing methods that mel spectrograms were the most effective preprocessing technique for gunshot detection on our dataset. In this notebook, we compare several model architectures, along with design tweaks to each, to determine which configuration performs best.

## Environment Setup 

### Package Imports

In [1]:
# Machine Learning imports
import torch
import torch.nn as nn
import torch.optim as optim 
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
import torchaudio
import torchaudio.functional as F
import torchaudio.transforms as T
from torchvision.models import resnet18

# Processing imports
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as mcolors

# Shared helpers reused across notebooks to avoid duplication and keep this notebook readable
from utils.common import list_files, create_dataframe, train_model, evaluate_model
from utils.plotsPreprocessing import plot_spectrogram

### Global Variables

In [2]:
# Feel free to change the following in order to accommodate your environment
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
mode = "training" if str(device) == "cuda" else "development" 
print(f"Notebook in {mode} mode")

# Seed NumPy and PyTorch for reproducibility (WeightedRandomSampler and model initialization use torch's RNG)
np.random.seed(4)
torch.manual_seed(4)

MODEL_DIR = "models/preprocessing"
TRAIN_PREFIX = "Data/Training data"
VAL_PREFIX   = "Data/Validation data"

SAMPLE_RATE = 8000

Notebook in training mode


### Training Parameters

In [3]:
batch_size = 128
num_workers = 8 if str(device) == "cuda" else 2
num_epochs = 20
lr = 0.001

### Data Loading

In [4]:
train_keys = list_files(TRAIN_PREFIX)
val_keys   = list_files(VAL_PREFIX)

# A DataFrame attaches a label to each audio file for use by the dataset classes
train_df   = create_dataframe(train_keys)
val_df     = create_dataframe(val_keys)

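# Weighted sampler to counter class imbalance: each clip is weighted by the inverse frequency of its class,
# so both classes are drawn equally often in expectation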
train_counts = train_df["label"].value_counts().to_dict()
val_counts = val_df["label"].value_counts().to_dict()
weights = train_df["label"].map(lambda x: 1.0 / train_counts[x])
sampler = WeightedRandomSampler(weights.tolist(), num_samples=len(weights), replacement=True)

print(f"Found {len(train_keys)} training audios ({train_counts[1]} gunshots, {train_counts[0]} backgrounds) and {len(val_keys)} validation audios ({val_counts[1]} gunshots, {val_counts[0]} backgrounds).")

Found 28790 training audios (597 gunshots, 28193 backgrounds) and 7190 validation audios (150 gunshots, 7040 backgrounds).
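To see what the weighted sampler does to a training epoch, the sketch below rebuilds the inverse-frequency weights from the training counts printed above (597 gunshots, 28193 backgrounds) using NumPy only, computes each class's expected share of the draws, and checks it with a quick simulation. The NumPy `default_rng` draw is an assumption for illustration that stands in for PyTorch's `WeightedRandomSampler`; it is not the sampler itself.

```python
import numpy as np

# Training-split class counts printed above (1 = gunshot, 0 = background)
train_counts = {1: 597, 0: 28193}

# One label per clip, with the same per-sample weights as the data-loading cell: 1 / class count
labels = np.array([1] * train_counts[1] + [0] * train_counts[0])
weights = np.where(labels == 1, 1.0 / train_counts[1], 1.0 / train_counts[0])
n_linear = None  # unused placeholder removed below
del n_linear

# Expected share of each class among the draws = class weight mass / total weight mass
probs = {c: weights[labels == c].sum() / weights.sum() for c in train_counts}

num_samples = len(labels)  # matches num_samples=len(weights) in the sampler
expected_gunshot_draws = probs[1] * num_samples
draws_per_gunshot_clip = expected_gunshot_draws / train_counts[1]
print(probs, round(expected_gunshot_draws), round(draws_per_gunshot_clip, 1))

# Quick Monte Carlo check, with NumPy standing in for torch's WeightedRandomSampler
rng = np.random.default_rng(4)
sampled = rng.choice(labels, size=num_samples, replace=True, p=weights / weights.sum())
print("empirical gunshot share:", sampled.mean())
```

Under these weights each class makes up about half of every epoch, so each of the 597 gunshot clips is drawn roughly 24 times per epoch on average, which makes augmentation of the training set especially relevant.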


## Mel-Spectrogram Preprocessing Pipeline

In [66]:
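# NOTE: BaseSpectrogramDataset is not imported or defined in the cells above; make it available before running this cell.
class MelDataset(BaseSpectrogramDataset):
    # Build the transform once and reuse it, instead of re-creating it for every clip
    mel_transform = T.MelSpectrogram(sample_rate=SAMPLE_RATE, n_fft=512, hop_length=128, n_mels=64)

    def process(self, waveform):
        # waveform: (channels, samples) -> mel spectrogram: (channels, n_mels, frames)
        return self.mel_transform(waveform)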

train_mel_spec = MelDataset(train_df, augmentation=1)
val_mel_spec = MelDataset(val_df)
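The mel-spectrogram parameters fix the time and frequency resolution of every model input. The sketch below works these out in plain Python, assuming torchaudio's default `center=True` framing (which pads `n_fft // 2` samples on each side). The 1-second clip length is hypothetical, since the clip duration is not stated here, and `mel_spectrogram_shape` is an illustrative helper.

```python
SAMPLE_RATE = 8000
N_FFT, HOP_LENGTH, N_MELS = 512, 128, 64  # values used by MelDataset above


def mel_spectrogram_shape(num_samples: int, channels: int = 1) -> tuple:
    """Expected output shape of T.MelSpectrogram for a (channels, num_samples) waveform.

    Assumes center=True, which gives 1 + num_samples // hop_length frames.
    """
    n_frames = 1 + num_samples // HOP_LENGTH
    return (channels, N_MELS, n_frames)


window_ms = 1000 * N_FFT / SAMPLE_RATE    # length of each analysis window in milliseconds
hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE  # time step between consecutive frames in milliseconds
bin_hz = SAMPLE_RATE / N_FFT              # spacing of the linear FFT bins in Hz
n_linear_bins = N_FFT // 2 + 1            # linear bins that the filterbank pools into N_MELS bands

print(window_ms, hop_ms, bin_hz, n_linear_bins)  # 64.0 16.0 15.625 257
print(mel_spectrogram_shape(SAMPLE_RATE))        # hypothetical 1-second clip: (1, 64, 63)
```

The result can be cross-checked against torchaudio with `T.MelSpectrogram(sample_rate=8000, n_fft=512, hop_length=128, n_mels=64)(torch.zeros(1, 8000)).shape`, which should give `torch.Size([1, 64, 63])`.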

In [70]:
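train_loader_mel_spec = DataLoader(
    train_mel_spec,
    batch_size=batch_size,
    sampler=sampler,  # the weighted sampler replaces shuffle=True (the two options are mutually exclusive)
    num_workers=num_workers,
    pin_memory=(device.type == "cuda")  # pinned memory only speeds up host-to-GPU transfers
)

val_loader_mel_spec = DataLoader(
    val_mel_spec,
    batch_size=batch_size,
    num_workers=num_workers,
    pin_memory=(device.type == "cuda")
)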

## ResNet18 

### Default Architecture

### Adjusting the Kernel Size
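Assuming the kernel being adjusted is ResNet18's 7×7 stem convolution (the heading does not say which layer), it helps to know how a 64-band spectrogram shrinks through the network. The sketch below traces the feature-map size with the standard output-size formula `(n + 2p - k) // s + 1`; the 63-frame width corresponds to a hypothetical 1-second clip, and `resnet18_feature_map` is an illustrative helper.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_feature_map(h: int, w: int, stem_kernel: int = 7) -> dict:
    """Trace (H, W) through ResNet18's downsampling stages for a given stem kernel size."""
    pad = stem_kernel // 2  # torchvision's default stem uses kernel 7 with padding 3
    sizes = {"input": (h, w)}
    h, w = conv_out(h, stem_kernel, 2, pad), conv_out(w, stem_kernel, 2, pad)
    sizes["conv1"] = (h, w)
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)  # 3x3 max-pool, stride 2
    sizes["maxpool"] = (h, w)
    sizes["layer1"] = (h, w)  # layer1 keeps stride 1
    for name in ("layer2", "layer3", "layer4"):
        h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)  # first 3x3 conv of each stage has stride 2
        sizes[name] = (h, w)
    return sizes


for k in (3, 5, 7):
    print(k, resnet18_feature_map(64, 63, stem_kernel=k))
```

With padding `k // 2` and stride 2, every odd stem kernel yields the same output size, so changing it alters the receptive field rather than the resolution. Either way, only a 2×2 map reaches the final pooling layer for inputs of this size.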

### Adjusting the ...

### Adjusting the ...

### Conclusion
X is best


## Model 2

### Default Architecture

### Adjusting the ...

### Adjusting the ...

### Adjusting the ...

### Conclusion
X is best

## Model 3

### Default Architecture

### Adjusting the ...

### Adjusting the ...

### Adjusting the ...

### Conclusion
X is best

## Model 4

### Default Architecture

### Adjusting the ...

### Adjusting the ...

### Adjusting the ...

### Conclusion
X is best

## Model 5

### Default Architecture

### Adjusting the ...

### Adjusting the ...

### Adjusting the ...

### Conclusion
X is best

## Results

*Table with all results*
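With 150 gunshots among 7190 validation clips, accuracy alone says little: a model that always predicts background already reaches 7040 / 7190 ≈ 97.9%. Below is a minimal sketch of class-aware metrics that could fill the table. `binary_metrics` is a hypothetical helper that assumes hard 0/1 predictions with gunshot = 1; `evaluate_model` may already report some of these values.

```python
import numpy as np


def binary_metrics(y_true, y_pred) -> dict:
    """Accuracy, precision, recall and F1 for the positive (gunshot = 1) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Trivial baseline on the validation split sizes printed above: 150 gunshots, 7040 backgrounds
y_val = np.array([1] * 150 + [0] * 7040)
always_background = np.zeros_like(y_val)
print(binary_metrics(y_val, always_background))
```

The baseline scores high accuracy but zero recall and F1, so recall and F1 on the gunshot class are the more informative columns for comparing architectures. Each model's row could be collected into a pandas DataFrame to build the table.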