# This notebook replicates the CNN training cycle taken from the OpenSoundscape tutorial

In this notebook we will load a CNN saved to disk and run it on a validation set and a test set, then run BirdNET on the same data to compare their performance.

This notebook runs in an environment with TensorFlow installed, which allows us to download BirdNET and use it for inference.



## Setup

### Import needed packages

In [27]:
# the cnn module provides classes for training/predicting with various types of CNNs
from opensoundscape import CNN

#other utilities and packages
import torch
import pandas as pd
from pathlib import Path
import numpy as np
import random 
import subprocess
from glob import glob
import sklearn
import opensoundscape as opso

%load_ext autoreload
%autoreload 2

### Set random seeds

Set manual seeds for Pytorch and Python. These essentially "fix" the results of any stochastic steps in model training, ensuring that training results are reproducible. You probably don't want to do this when you actually train your model, but it's useful for debugging.

In [2]:
torch.manual_seed(0)
random.seed(0)
np.random.seed(0)
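
To illustrate what fixing a seed does, here is a minimal sketch that uses only Python's standard `random` module. The helper name `draw_samples` is hypothetical and not part of the original notebook; the same idea applies to `torch.manual_seed` and `np.random.seed`.

```python
import random


def draw_samples(seed, n=5):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator; global state is untouched
    return [rng.random() for _ in range(n)]


first_run = draw_samples(0)
second_run = draw_samples(0)  # same seed: identical sequence
other_seed = draw_samples(1)  # different seed: different sequence

print(first_run == second_run)
print(first_run == other_seed)
```

Two runs with the same seed produce the same sequence, which is why seeded training runs can be compared directly while debugging.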

### Download files

Training a machine learning model requires some pre-labeled data. These data, in the form of audio recordings or spectrograms, are labeled with whether or not they contain the sound of the species of interest. 

These data can be obtained from online databases such as Xeno-Canto.org, or by labeling one's own ARU data using a program like Cornell's Raven sound analysis software. In this example we are using a set of annotated avian soundscape recordings that were annotated using the software Raven Pro 1.6.4 (Bioacoustics Research Program 2022):

<blockquote><i>An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information. </i> Lauren M. Chronister,  Tessa A. Rhinehart,  Aidan Place,  Justin Kitzes.
https://doi.org/10.1002/ecy.3329 
</blockquote>

These are the same data that are used by the annotation and preprocessing tutorials, so you can skip this step if you've already downloaded them there.

Download the datasets to your current working directory and unzip them:

- Download and unzip both `annotation_Files.zip` and `mp3_Files.zip` from https://datadryad.org/stash/dataset/doi:10.5061/dryad.d2547d81z
- Move the unzipped contents into a subfolder of the current folder called `./annotated_data/`
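
The unzip step can be sketched with the standard library once the two archives have been downloaded by hand. The helper `unzip_into` is a hypothetical name introduced here, and the usage lines assume both zip files sit in the current working directory.

```python
import zipfile
from pathlib import Path


def unzip_into(zip_path, dest_dir):
    """Extract every member of `zip_path` into `dest_dir`, creating it if needed."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Return the top-level names so the result is easy to inspect
    return sorted(p.name for p in dest.iterdir())


# Example usage (assumes the archives are in the current working directory):
# for name in ["annotation_Files.zip", "mp3_Files.zip"]:
#     print(unzip_into(name, "./annotated_data"))
```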

## Prepare audio data

To prepare audio data for machine learning, we need to convert our annotated data into clip-level labels.

These steps are covered in depth in other tutorials, so we'll just set our clip labels up quickly for this example.

First, get exactly matched lists of audio files and their corresponding selection files:

In [3]:
# Set the current directory to where the dataset is downloaded
dataset_path = Path("./annotated_data/")

# Make a list of all of the selection table files
selection_files = glob(f"{dataset_path}/Annotation_Files/*/*.txt")
selection_files[:5]

['annotated_data/Annotation_Files/Recording_1/Recording_1_Segment_31.Table.1.selections.txt',
 'annotated_data/Annotation_Files/Recording_1/Recording_1_Segment_07.Table.1.selections.txt',
 'annotated_data/Annotation_Files/Recording_1/Recording_1_Segment_36.Table.1.selections.txt',
 'annotated_data/Annotation_Files/Recording_1/Recording_1_Segment_04.Table.1.selections.txt',
 'annotated_data/Annotation_Files/Recording_1/Recording_1_Segment_35.Table.1.selections.txt']

In [4]:

# Create a list of audio files, one corresponding to each Raven file
# (Audio files have the same names as selection files with a different extension)
audio_files = [f.replace('Annotation_Files','Recordings').replace('.Table.1.selections.txt','.mp3') for f in selection_files]
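
Because the audio paths are derived by string replacement, it can be worth checking that each derived path actually exists before building annotations. The following is a minimal sketch; `audio_path_for` and `find_missing` are hypothetical helper names, not OpenSoundscape functions.

```python
from pathlib import Path


def audio_path_for(selection_file):
    """Map a Raven selection-table path to the audio path this notebook expects."""
    return (
        selection_file
        .replace("Annotation_Files", "Recordings")
        .replace(".Table.1.selections.txt", ".mp3")
    )


def find_missing(paths):
    """Return the subset of `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]


example = (
    "annotated_data/Annotation_Files/Recording_1/"
    "Recording_1_Segment_31.Table.1.selections.txt"
)
print(audio_path_for(example))
```

Calling `find_missing(audio_files)` after the cell above should return an empty list if every selection file has a matching recording.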

In [5]:
!ls annotated_data

README.txt            test_set.csv          valid_set.csv
Recordings            train_and_val_set.csv wav_files
annotation_Files      train_set.csv


Next, convert the selection files and audio files to a `BoxedAnnotations` object, which contains the time, frequency, and label information for all annotations for every recording in the dataset.

In [6]:
from opensoundscape.annotations import BoxedAnnotations
# Create a dataframe of annotations
annotations = BoxedAnnotations.from_raven_files(
    selection_files,
    audio_files)



In [7]:
!ls annotated_data/

README.txt            test_set.csv          valid_set.csv
Recordings            train_and_val_set.csv wav_files
annotation_Files      train_set.csv


When I extracted the downloaded dataset, there were two folders containing the source recordings: one with WAV files and one with MP3 files. I had to rename the MP3 folder to `Recordings` to match the path expected by this notebook.

In [8]:
# %%capture
# # Parameters to use for label creation
# clip_duration = 3.0
# clip_overlap = 0.0
# min_label_overlap = 0.25
# species_of_interest = ["NOCA", "EATO", "SCTA", "BAWW", "BCCH", "AMCR", "NOFL"]

# # Create dataframe of one-hot labels
# clip_labels = annotations.one_hot_clip_labels(
#     clip_duration = clip_duration, 
#     clip_overlap = clip_overlap,
#     min_label_overlap = min_label_overlap,
#     class_subset = species_of_interest # You can comment this line out if you want to include all species.
# )
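
To make the parameters above concrete, here is a minimal pure-Python sketch of how fixed-length clips could be labeled from time-stamped annotations. It is an illustration under stated assumptions, not OpenSoundscape's implementation: the function name `sketch_clip_labels`, the tuple format, and the rule that `min_label_overlap` is measured in seconds are all assumptions made for this example.

```python
def sketch_clip_labels(annotations, classes, audio_duration,
                       clip_duration=3.0, clip_overlap=0.0,
                       min_label_overlap=0.25):
    """Return a list of (start, end, {class: 0 or 1}) for fixed-length clips.

    `annotations` is a list of (start_time, end_time, label) tuples in seconds.
    A clip gets label 1 for a class when an annotation of that class overlaps
    the clip by at least `min_label_overlap` seconds (assumed rule).
    """
    step = clip_duration - clip_overlap
    rows = []
    index = 0
    while index * step + clip_duration <= audio_duration:
        clip_start = index * step
        clip_end = clip_start + clip_duration
        labels = {c: 0 for c in classes}
        for ann_start, ann_end, label in annotations:
            overlap = min(clip_end, ann_end) - max(clip_start, ann_start)
            if label in labels and overlap >= min_label_overlap:
                labels[label] = 1
        rows.append((clip_start, clip_end, labels))
        index += 1
    return rows


# Hypothetical annotations: one NOCA call early, one AMCR call straddling 3 s
annotations = [(0.0, 1.0, "NOCA"), (2.9, 3.5, "AMCR")]
clips = sketch_clip_labels(annotations, ["NOCA", "AMCR"], audio_duration=6.0)
for clip in clips:
    print(clip)
```

In this example the AMCR annotation overlaps the first clip by only 0.1 s, so it is labeled in the second clip alone.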

If you wanted, you could load the training and testing set from these saved CSV files.

In [9]:
# train_and_val_set = pd.read_csv('./annotated_data/train_and_val_set.csv',index_col=[0,1,2])
# test_set = pd.read_csv('./annotated_data/test_set.csv',index_col=[0,1,2])

### Load the CNN model we trained earlier

In [10]:
model = opso.cnn.load_model('opso_model')

### Load the test and validation sets from disk

In [11]:
test_set = pd.read_csv('./annotated_data/test_set.csv',index_col=[0,1,2])
valid_df = pd.read_csv('./annotated_data/valid_set.csv',index_col=[0,1,2])

### Check model device

If a GPU is available on your computer, the CNN object automatically selects it to accelerate performance. You can override `.device` to use a specific device such as `cpu` or `cuda:3`.

In [12]:
print(f'model.device is: {model.device}')

model.device is: mps


Training on MPS (Apple Silicon GPU) requires PyTorch >= 2.1.0. If we have an older version, we can uncomment the cell below to fall back to the CPU.

In [13]:
# if model.device ==  torch.device('mps'):
#     model.device=torch.device('cpu')

## Let's take a look at the validation set used to evaluate this model during training.

In [14]:
valid_df.head()

| file | start_time | end_time | NOCA | EATO | SCTA | BAWW | BCCH | AMCR | NOFL |
|---|---|---|---|---|---|---|---|---|---|
| annotated_data/Recordings/Recording_2/Recording_2_Segment_09.mp3 | 75.0 | 78.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| annotated_data/Recordings/Recording_1/Recording_1_Segment_36.mp3 | 12.0 | 15.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| annotated_data/Recordings/Recording_2/Recording_2_Segment_03.mp3 | 60.0 | 63.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| annotated_data/Recordings/Recording_1/Recording_1_Segment_02.mp3 | 9.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| annotated_data/Recordings/Recording_2/Recording_2_Segment_10.mp3 | 9.0 | 12.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |


The validation set contains a random sample of clips from areas 1, 2 and 3, so the model has never seen these clips before, but it has seen clips from the same locations. Let's see how the model performs on these clips compared with the ones in the withheld test set.
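
The difference between the two evaluation sets comes down to how clips are grouped before splitting. As a rough sketch, holding out whole recordings could look like the following; the helper `split_by_recording` and the held-out folder name `Recording_4` are hypothetical, since the notebook does not state which recordings make up the test set.

```python
from pathlib import PurePosixPath


def split_by_recording(files, held_out):
    """Split file paths into (seen, withheld) lists by parent folder name."""
    seen, withheld = [], []
    for f in files:
        group = PurePosixPath(f).parent.name  # e.g. "Recording_1"
        (withheld if group in held_out else seen).append(f)
    return seen, withheld


files = [
    "annotated_data/Recordings/Recording_1/Recording_1_Segment_36.mp3",
    "annotated_data/Recordings/Recording_2/Recording_2_Segment_03.mp3",
    "annotated_data/Recordings/Recording_4/Recording_4_Segment_01.mp3",  # hypothetical
]
seen, withheld = split_by_recording(files, held_out={"Recording_4"})
print(len(seen), len(withheld))
```

A random split within the `seen` group gives a validation set from familiar locations, while the `withheld` group tests generalization to a location the model has not encountered.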

In [18]:
import warnings



In [26]:
valid_preds = model.predict(valid_df, batch_size=32)
test_preds = model.predict(test_set, batch_size=32)


  0%|          | 0/16 [00:00<?, ?it/s]

## Measure performance on the validation and test sets

Multi target metrics on the validation set:

In [None]:
len(valid_df["AMCR"])

510

In [None]:
# Compute metrics for every class with the multi-target evaluation function,
# then keep only the entry for AMCR (American Crow)

valid_metrics = opso.metrics.multi_target_metrics(valid_df.values, valid_preds, valid_df.columns, threshold=0.5)['AMCR']
test_set_metrics = opso.metrics.multi_target_metrics(test_set.values, test_preds, test_set.columns, threshold=0.5)['AMCR']

pd.DataFrame([valid_metrics, test_set_metrics], index=['valid', 'test_set'])

| | au_roc | avg_precision | precision | recall | f1 | support |
|---|---|---|---|---|---|---|
| valid | 0.993753 | 0.991293 | 0.972414 | 0.946309 | 0.959184 | 149 |
| test_set | 0.749758 | 0.191354 | 0.255319 | 0.122449 | 0.165517 | 196 |


The scores are much lower on the withheld test set than on the validation set. 

$$
\text{precision} = \frac{TP}{TP + FP}
$$

$$
\text{recall} = \frac{TP}{TP + FN}
$$

$$
\text{f1} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$

$$
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$

$$
\text{specificity} = \frac{TN}{TN + FP}
$$

$$
\text{support} = TP + FN
$$
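
As a worked example of these definitions, the sketch below computes each metric from confusion-matrix counts. The counts used (TP=141, FP=4, FN=8, TN=357) are not printed anywhere in the notebook; they are inferred here as the integers consistent with the reported validation precision, recall, support of 149, and 510 total clips, so treat them as an assumption. The function name `binary_metrics` is likewise hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the metrics defined above from confusion-matrix counts.

    No zero-division guard is included; this sketch assumes every
    denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "support": tp + fn,
    }


# Counts inferred from the validation row (an assumption, see lead-in)
validation = binary_metrics(tp=141, fp=4, fn=8, tn=357)
for name, value in validation.items():
    print(f"{name}: {value:.6f}")
```

With these counts, precision, recall, and f1 round to the values shown in the validation row of the table.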


There are about 4600 clips in the training set for this example. The validation set contains 510 clips, of which 149 contain AMCR, and the test set contains 2600 clips, of which 196 contain AMCR.

In [None]:
len(valid_df.AMCR), sum(valid_df.AMCR == 1), len(test_set.AMCR), sum(test_set.AMCR == 1)

(510, 149, 2600, 196)
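
These counts imply quite different class balances in the two sets, which matters when comparing precision-based scores: a classifier that flags clips at random has an expected precision equal to the fraction of positive clips. A small sketch of the arithmetic (the helper name `prevalence` is hypothetical):

```python
def prevalence(n_positive, n_total):
    """Fraction of clips that contain the target species."""
    return n_positive / n_total


valid_prevalence = prevalence(149, 510)   # AMCR clips in the validation set
test_prevalence = prevalence(196, 2600)   # AMCR clips in the test set
print(f"validation: {valid_prevalence:.1%}, test: {test_prevalence:.1%}")
```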

## Do the other pretrained models do any better?
See how BirdNET performs on this audio.

In [None]:
torch.hub.list('kitzeslab/bioacoustics-model-zoo')
# BirdNET requires tensorflow, which is installed in this notebook's environment.
birdnet = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'BirdNET')

Using cache found in /Users/mikeg/.cache/torch/hub/kitzeslab_bioacoustics-model-zoo_main
Using cache found in /Users/mikeg/.cache/torch/hub/kitzeslab_bioacoustics-model-zoo_main


downloading model from URL...
Downloaded completed: BirdNET_GLOBAL_6K_V2.4_Model_FP16.tflite


INFO: Created TensorFlow Lite XNNPACK delegate for CPU.


Downloaded completed: BirdNET_GLOBAL_6K_V2.4_Labels_af.txt


In [None]:
# Check the index of the American Crow class in BirdNET's class list
birdnet_classes = birdnet.classes
birdnet_classes[1575]


'Corvus brachyrhynchos_American Crow'

In [None]:
# predict on validation set using birdnet
birdnet_preds = birdnet.predict(valid_df, batch_size=32)




  0%|          | 0/16 [00:00<?, ?it/s]

In [None]:


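def probs_from_logits(logits, threshold=0.5):
    """Apply a sigmoid to raw logits and return binary (0/1) predictions at `threshold`."""
    # np.asarray accepts NumPy arrays and pandas Series alike
    probs = torch.sigmoid(torch.as_tensor(np.asarray(logits), dtype=torch.float32))
    return (probs > threshold).float()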
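
The helper above applies a sigmoid and then thresholds the result at 0.5. Because the sigmoid is monotonic and maps 0 to 0.5, this is equivalent to thresholding the raw logits at 0, and it is not the same as thresholding the raw logits at 0.5. The sketch below illustrates this with made-up scores, using only the standard library.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of the sigmoid: probability to logit."""
    return math.log(p / (1.0 - p))


scores = [-2.0, -0.1, 0.0, 0.3, 0.6, 4.0]  # made-up logits for illustration

via_probability = [sigmoid(s) > 0.5 for s in scores]
via_logit = [s > logit(0.5) for s in scores]  # logit(0.5) is 0.0
raw_at_half = [s > 0.5 for s in scores]       # thresholding logits at 0.5 instead

print(via_probability == via_logit)
print(via_probability == raw_at_half)
```

The score 0.3 is a positive under the sigmoid-then-0.5 rule but a negative when the raw score is compared with 0.5, so metrics computed under the two conventions can differ.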


In [None]:

opso_amcr_preds = valid_preds["AMCR"].values
birdnet_amcr_preds = birdnet_preds['Corvus brachyrhynchos_American Crow']

opso_amcr_preds = probs_from_logits(opso_amcr_preds)
birdnet_amcr_preds = probs_from_logits(birdnet_amcr_preds)


In [None]:
# Opso metrics on validation set
opso_precision = opso.metrics.M.precision_score(valid_df["AMCR"].values, opso_amcr_preds)
opso_recall = opso.metrics.M.recall_score(valid_df["AMCR"].values, opso_amcr_preds)
opso_precision, opso_recall

(0.959731543624161, 0.959731543624161)

In [None]:
# BirdNET metrics on validation set
birdnet_precision = opso.metrics.M.precision_score(valid_df["AMCR"].values, birdnet_amcr_preds)
birdnet_recall = opso.metrics.M.recall_score(valid_df["AMCR"].values, birdnet_amcr_preds)
birdnet_precision, birdnet_recall

(1.0, 0.8120805369127517)

## Get metrics on the test set for both models

In [None]:
# predict on test set using birdnet
birdnet_test_logits = birdnet.predict(test_set, batch_size=32)



  0%|          | 0/82 [00:00<?, ?it/s]

In [None]:
birdnet_test_probs = probs_from_logits(birdnet_test_logits['Corvus brachyrhynchos_American Crow'])
birdnet_precision_testset = opso.metrics.M.precision_score(test_set["AMCR"].values, birdnet_test_probs)



In [None]:
birdnet_test_recall = opso.metrics.M.recall_score(test_set["AMCR"].values, birdnet_test_probs)

In [None]:
# Get opso predictions from test set
opso_test_logits = model.predict(test_set, batch_size=32)

  0%|          | 0/82 [00:00<?, ?it/s]

In [None]:
opso_probs = probs_from_logits(opso_test_logits["AMCR"].values)
opso_precision_testset = opso.metrics.M.precision_score(test_set["AMCR"].values, opso_probs)
opso_recall_testset = opso.metrics.M.recall_score(test_set["AMCR"].values, opso_probs)

In [None]:
birdnet_precision_testset, birdnet_test_recall, opso_precision_testset, opso_recall_testset

(0.9411764705882353, 0.08163265306122448, 0.272, 0.17346938775510204)

## Results are inconclusive
It looks as though BirdNET is better since it has higher precision, but its recall is lower at this threshold. More thresholds or the F1 score should be calculated to shed more light on this.
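
As a first step in that direction, F1 can be computed directly from the test-set precision and recall values printed above. This is a minimal sketch at the single threshold already used; the helper name `f1_from_precision_recall` is hypothetical.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set values copied from the output above
birdnet_f1 = f1_from_precision_recall(0.9411764705882353, 0.08163265306122448)
opso_f1 = f1_from_precision_recall(0.272, 0.17346938775510204)
print(f"BirdNET F1: {birdnet_f1:.3f}, OpenSoundscape CNN F1: {opso_f1:.3f}")
```

Both F1 scores are low at this threshold, so a sweep over thresholds would still be needed before drawing a conclusion about either model.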


**Clean up:** Run the following cell to delete the files created in this tutorial. However, these files are used in other tutorials, so you may wish not to delete them just yet.

In [None]:
# import shutil
# shutil.rmtree('./annotated_data')
# shutil.rmtree('./wandb')
# shutil.rmtree('./model_training_checkpoints')
# Path('annotation_Files.zip').unlink()
# Path('mp3_Files.zip').unlink()