In [1]:
#default_exp metrics

First of all, let's get all the data we need. Through the magic of `nbdev`, we can reuse the functionality we defined in `01_gettin_started`.

In [2]:
from birdcall.data import *

items, classes = get_items(1000)
trn_idxs, val_idxs = trn_val_split_items(items, 10)[0]
mean, std = calculate_mean_and_std(items, trn_idxs)
trn_ds = AudioDataset(items[trn_idxs], classes, mean, std)
val_ds = AudioDataset(items[val_idxs], classes, mean, std)

In [3]:
len(trn_ds), len(val_ds)

(237600, 26400)

In [4]:
from fastai2.vision.all import *

We need some sort of architecture to get started. The one adapted from this [paper](https://www.groundai.com/project/end-to-end-environmental-sound-classification-using-a-1d-convolutional-neural-network/1), a 1D convolutional network that works directly on the raw waveform, seems like a good place to start.

In [5]:
NUM_WORKERS

8

In [6]:
NUM_WORKERS -= 1

In [7]:
BS = 128

dls = DataLoaders(
    DataLoader(dataset=trn_ds, bs=BS, num_workers=NUM_WORKERS, shuffle=True),
    DataLoader(dataset=val_ds, bs=BS, num_workers=NUM_WORKERS)
).cuda()

In [8]:
b = dls.train.one_batch()
b[0].shape

torch.Size([128, 160000])

Let's define our architecture:

In [9]:
get_arch = lambda: nn.Sequential(*[
    Lambda(lambda x: x.unsqueeze(1)),             # (bs, 160000) -> (bs, 1, 160000) for Conv1d
    # 1D conv feature extractor: (bs, 1, 160000) -> (bs, 256, 20)
    ConvLayer(1, 16, ks=64, stride=2, ndim=1),
    ConvLayer(16, 16, ks=8, stride=8, ndim=1),
    ConvLayer(16, 32, ks=32, stride=2, ndim=1),
    ConvLayer(32, 32, ks=8, stride=8, ndim=1),
    ConvLayer(32, 64, ks=16, stride=2, ndim=1),
    ConvLayer(64, 128, ks=8, stride=2, ndim=1),
    ConvLayer(128, 256, ks=4, stride=2, ndim=1),
    ConvLayer(256, 256, ks=4, stride=4, ndim=1),
    Flatten(),                                    # 256 channels * 20 steps = 5120 features
    # classifier head
    LinBnDrop(5120, 512, p=0.25, act=nn.ReLU()),
    LinBnDrop(512, 512, p=0.25, act=nn.ReLU()),
    LinBnDrop(512, 256, p=0.25, act=nn.ReLU()),
    LinBnDrop(256, len(classes)),
    nn.Sigmoid()                                  # independent per-class probabilities (multi-label), pairs with BCELossFlat
])
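As a sanity check on the `5120` input size of the first `LinBnDrop`, here is a small sketch that walks a 160,000-sample input through the conv stack. It assumes fastai's `ConvLayer` default padding of `(ks - 1) // 2` and the standard `Conv1d` output-length formula (dilation 1); `CONV_SPECS`, `conv_out_len` and `flattened_size` are hypothetical helpers for illustration only.

```python
# (kernel size, stride, out channels) for each ConvLayer in get_arch
CONV_SPECS = [
    (64, 2, 16), (8, 8, 16), (32, 2, 32), (8, 8, 32),
    (16, 2, 64), (8, 2, 128), (4, 2, 256), (4, 4, 256),
]

def conv_out_len(length, ks, stride, padding):
    "Output length of a 1D convolution with dilation 1."
    return (length + 2 * padding - ks) // stride + 1

def flattened_size(length, specs=CONV_SPECS):
    "Number of features reaching Flatten, assuming padding=(ks-1)//2 in every layer."
    out_channels = 1
    for ks, stride, out_channels in specs:
        length = conv_out_len(length, ks, stride, padding=(ks - 1) // 2)
    return out_channels * length

print(flattened_size(160_000))
```

If the chunk length changes, rerunning this gives the new input size for the first linear layer instead of discovering it through a shape-mismatch error.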

A few helper functions to compute diagnostic metrics: precision, recall and F1 at a 0.5 threshold, micro-averaged over every (example, class) pair:

In [10]:
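#export

def preds_to_tp_fp_fn(preds, targs):
    "Count true positives, false positives and false negatives at a 0.5 threshold."
    positives = preds > 0.5                   # predicted labels (preds are sigmoid outputs)
    true_positives = positives[targs == 1]    # predicted 1 where the target is 1
    false_positives = positives[targs != 1]   # predicted 1 where the target is 0
    negatives = ~positives
    false_negatives = negatives[targs == 1]   # predicted 0 where the target is 1
    return true_positives.sum(), false_positives.sum(), false_negatives.sum()

def precision(preds, targs):
    "Micro-averaged precision: TP / (TP + FP)."
    tp, fp, fn = preds_to_tp_fp_fn(preds, targs)
    # clamp the denominator so having no predicted positives gives 0 instead of nan
    return (tp.float() / (tp + fp).clamp(min=1)).item()

def recall(preds, targs):
    "Micro-averaged recall: TP / (TP + FN)."
    tp, fp, fn = preds_to_tp_fp_fn(preds, targs)
    # clamp the denominator so having no positive targets gives 0 instead of nan
    return (tp.float() / (tp + fn).clamp(min=1)).item()

def f1(preds, targs, eps=1e-8):
    "Harmonic mean of precision and recall."
    prec = precision(preds, targs)
    rec = recall(preds, targs)
    return 2 * (prec * rec) / (prec + rec + eps)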
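To hand-check numbers like these on a tiny example, here is a framework-free sketch of the same micro-averaged computation over nested Python lists (one row per example, one column per class). The name `micro_prf` and the toy data are made up for illustration; it mirrors the 0.5 threshold above but is not part of the exported module.

```python
def micro_prf(preds, targs, thresh=0.5, eps=1e-8):
    "Micro-averaged precision, recall and F1 over every (example, class) cell."
    tp = fp = fn = 0
    for pred_row, targ_row in zip(preds, targs):
        for p, t in zip(pred_row, targ_row):
            predicted = p > thresh
            if predicted and t == 1:
                tp += 1          # predicted positive, actually positive
            elif predicted:
                fp += 1          # predicted positive, actually negative
            elif t == 1:
                fn += 1          # predicted negative, actually positive
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec + eps)
    return prec, rec, f1

# two examples, three classes: TP = 2, FP = 1, FN = 1
preds = [[0.9, 0.2, 0.7], [0.1, 0.6, 0.4]]
targs = [[1, 0, 0], [0, 1, 1]]
print(micro_prf(preds, targs))  # precision = recall = 2/3, F1 ≈ 2/3
```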

In [11]:
learn = Learner(
    dls,
    get_arch(),
    metrics=[AccumMetric(precision), AccumMetric(recall), AccumMetric(f1)],
    loss_func=BCELossFlat()
)

In [12]:
learn.fit(1, 1e-3)

| epoch | train_loss | valid_loss | precision | recall | f1 | time |
|---|---|---|---|---|---|---|
| 0 | 0.000191 | 8.7e-05 | 1.0 | 0.999735 | 0.999867 | 03:23 |


Oops! This is not a good sign. How can our model be this good? Do we have a bug in how we sample the validation set? Was it a bad idea after all to combine all the files together? Given how I have set this up, the first 5 seconds of a recording could end up in the train set and the next 5 seconds in the validation set. That makes little sense: we want our model to identify the same species across recordings with different backgrounds and different recording equipment, so we should build the train and validation sets from different files.
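One way to avoid this leak is to split by recording rather than by 5-second chunk, so that every chunk from a given file lands on the same side. A minimal sketch, assuming each item can be mapped to the recording it came from; the `(recording_id, chunk_index)` tuples and `split_by_recording` are hypothetical stand-ins, not the actual structure returned by `get_items` or the behavior of `trn_val_split_items`.

```python
import random
from collections import defaultdict

def split_by_recording(items, get_recording, valid_pct=0.1, seed=42):
    "Return (train_idxs, valid_idxs) such that no recording appears in both sets."
    by_recording = defaultdict(list)
    for idx, item in enumerate(items):
        by_recording[get_recording(item)].append(idx)
    recordings = sorted(by_recording)
    random.Random(seed).shuffle(recordings)          # reproducible shuffle of whole recordings
    n_valid = max(1, round(len(recordings) * valid_pct))
    valid_recs = set(recordings[:n_valid])
    trn_idxs, val_idxs = [], []
    for rec, idxs in by_recording.items():
        (val_idxs if rec in valid_recs else trn_idxs).extend(idxs)
    return sorted(trn_idxs), sorted(val_idxs)

# hypothetical items: 20 recordings, each cut into 6 chunks
items = [(f"rec{r:02d}", c) for r in range(20) for c in range(6)]
trn_idxs, val_idxs = split_by_recording(items, get_recording=lambda it: it[0])
print(len(trn_idxs), len(val_idxs))
```

This holds out roughly 10% of recordings rather than exactly 10% of chunks. For a multi-species dataset one would probably also want to stratify by species, so every class shows up in the validation set.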

Or maybe our metrics have a bug, or there is some issue with our model or with how the loss gets applied?
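A quick way to judge whether a loss this small is plausible is to compare it to a model that ignores the audio and always predicts the class prior. The sketch below assumes 264 classes with exactly one positive label per example; the 264,000 items returned by `get_items(1000)` suggest 1,000 items for each of 264 species, but that is an inference, not something shown here. `prior_bce` is a hypothetical helper.

```python
import math

def prior_bce(n_classes):
    "Mean per-element BCE of a constant prediction p = 1/n_classes with one positive per row."
    p = 1 / n_classes
    # per row: 1 positive costs -log(p), n_classes - 1 negatives each cost -log(1 - p);
    # averaging over the n_classes elements of the row gives the expression below
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

baseline = prior_bce(264)
observed = 0.000191  # train_loss from the run above
print(f"prior baseline {baseline:.4f}, observed {observed}, ratio {baseline / observed:.0f}x")
```

The prior-only baseline comes out at roughly 0.025, more than 100 times the observed training loss, so the low loss is not just the label imbalance at work: the model is genuinely confident and correct on these chunks, which is consistent with (though not proof of) leakage between train and validation.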

My money is on the validation sampling. Nonetheless, this will not stop us! The first order of business is to build an end-to-end pipeline, all the way to a successful submission. Once that is in place, we will be in a good position to start making improvements.

In [14]:
from pathlib import Path
Path('data/models').mkdir(parents=True, exist_ok=True)  # safe to rerun

In [15]:
torch.save(learn.model.state_dict(), 'data/models/first_model.pth')

For inference, we will need the normalization statistics computed on the training set:

In [16]:
mean, std

(-6.132126808166504e-05, 0.04304003225515465)
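Since `mean` and `std` were computed on the training split only, the same values have to be applied to the audio at inference time. A minimal sketch of storing them next to the model and reapplying them; the JSON file name and the `save_stats`, `load_stats` and `normalize` helpers are hypothetical, and it assumes `AudioDataset` standardizes as `(x - mean) / std`, the usual convention but not shown in this notebook.

```python
import json
import tempfile
from pathlib import Path

stats = {"mean": -6.132126808166504e-05, "std": 0.04304003225515465}

def save_stats(stats, path):
    "Write the normalization statistics as human-readable JSON."
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(stats))

def load_stats(path):
    return json.loads(Path(path).read_text())

def normalize(samples, mean, std):
    "Standardize raw samples with the training-set statistics."
    return [(s - mean) / std for s in samples]

# round-trip through a temporary directory instead of data/models
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "models" / "norm_stats.json"
    save_stats(stats, path)
    loaded = load_stats(path)

print(normalize([0.0, 0.043], **loaded))
```

JSON is used here only because it is readable and round-trips Python floats exactly; pickling the tuple, as done for `classes`, would work just as well.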

In [20]:
pd.to_pickle(classes, 'data/classes.pkl')
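At inference time the pickled `classes` (read back with `pd.read_pickle('data/classes.pkl')`) turn the model's sigmoid outputs into species names. A minimal sketch, assuming `classes[i]` is the label for output column `i` and reusing the 0.5 threshold from the metrics above; `decode_predictions` and the toy class names are hypothetical.

```python
def decode_predictions(probs, classes, thresh=0.5):
    "Map each row of per-class probabilities to the class names above `thresh`."
    return [[cls for p, cls in zip(row, classes) if p > thresh] for row in probs]

classes = ["species_a", "species_b", "species_c"]
probs = [[0.91, 0.03, 0.40], [0.10, 0.20, 0.05]]
print(decode_predictions(probs, classes))  # [['species_a'], []]
```

An empty list means no species cleared the threshold for that chunk; how to represent that in the submission is a separate decision.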