This is Part V of a series of notebooks with experiments in using the Yamnet pre-trained neural network to classify and evaluate bird audio recordings from the Cacophony project.

The focus of this notebook is examining outliers identified in the larger scale testing in [part IV](Yamnet_Audio_Classification_Experiments_Part_4.ipynb).


See [part I](Yamnet_Audio_Classification_Experiments_Part_1.ipynb) for the full background.

Quentin McDonald <br>
October  2021

In [5]:
import csv
import io
import os
import os.path
import glob
import datetime

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_io as tfio

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib import patches
from IPython import display
import pydub
from scipy import signal
import pandas as pd
from tqdm import notebook,trange
import pickle

# Common code moved to utils:
import utils

import score

In [6]:
SAMPLE_RATE = 16000 # Work at 16000 sampling rate
LOW_PASS_CUTOFF = 4000
HIGH_PASS_CUTOFF = 2000

In [73]:
model,class_names = utils.load_model_and_class_names()
BIRDS_CLASSES = utils.BIRDS_CLASSES

<H2> Investigating the Yamnet scoring for outliers </H2>

During scoring of more than a year's data it became apparent that quite high threshold scores were being encountered at night, when bird song should be minimal. Can we examine the scores in detail to see what's going on, and whether the scoring parameters could be adjusted to remove these outliers?


Some useful ideas and code fragments are from [this article](https://analyticsindiamag.com/guide-to-yamnet-sound-event-classifier/).


In [43]:
def list_yamnet_classes(filename,model,class_names,
                      low_pass_cutoff = None, 
                      high_pass_cutoff = None,
                      top_n_classifications = 5,
                      start_time = 0,
                      end_time = 40,
                      offset = 0):
    """
    Read audio data from the audio file given by "filename",
    run it through the model and print the top N classes by mean score.
    
    Data will only be read from start_time to end_time, and the first
    "offset" seconds of the loaded audio are skipped before scoring.
    """

    wave_data = utils.load_audio_16k_mono(filename, out_sample_rate=SAMPLE_RATE,
                                             start_time=start_time,end_time=end_time)   
    
    if low_pass_cutoff is not None:
        wave_data = utils.butter_lowpass_filter(wave_data, low_pass_cutoff, SAMPLE_RATE, order=5)
    
    if high_pass_cutoff is not None:
        wave_data = utils.butter_highpass_filter(wave_data, high_pass_cutoff, SAMPLE_RATE, order=5)
        
   
    # Evaluate the data against the model:
    scores, embeddings, spectrogram = model(wave_data[int(offset*SAMPLE_RATE):])
    
    scores_np = scores.numpy()
    mean_scores = np.mean(scores_np, axis=0)
    
    top_class_indices = np.argsort(mean_scores)[::-1][:top_n_classifications]
    
    bn = os.path.basename(filename)
    fname = os.path.splitext(bn)[0]
    fid = fname.split("-")[1]
    print("{:15s} https://browse.cacophony.org.nz/recording/{}".format(fname,fid))
    for i in top_class_indices:
        print("   {:40s} {:5.2f}  ".format(class_names[i], mean_scores[i]))
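
The ranking step in `list_yamnet_classes` can be tried in isolation. The sketch below uses a small synthetic score matrix (one row per frame, one column per class) in place of real model output. The class names and values are made up for illustration only and are not Yamnet results.

```python
import numpy as np

# Hypothetical class names and synthetic per-frame scores (illustration only).
class_names = ["Silence", "Bird", "Owl", "Water", "Wind"]
frame_scores = np.array([
    [0.05, 0.60, 0.80, 0.10, 0.10],   # frame 1
    [0.05, 0.40, 0.70, 0.30, 0.05],   # frame 2
    [0.05, 0.50, 0.60, 0.20, 0.15],   # frame 3
])

def top_classes(scores, names, top_n=3):
    """Average the per-frame scores and return the top_n (name, mean) pairs."""
    mean_scores = scores.mean(axis=0)                  # one mean per class
    order = np.argsort(mean_scores)[::-1][:top_n]      # highest mean first
    return [(names[i], float(mean_scores[i])) for i in order]

ranking = top_classes(frame_scores, class_names)
for name, mean in ranking:
    print("   {:40s} {:5.2f}".format(name, mean))
```

Because the scores are averaged over all frames, a class that scores moderately throughout the clip can outrank one that scores very highly in a single frame.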
    

In [46]:
# Print top classes for a number of outliers from database scoring:

for f in glob.glob("outliers/*.mp3"):
    list_yamnet_classes(f,model,class_names,LOW_PASS_CUTOFF,HIGH_PASS_CUTOFF)


20201002-691982 https://browse.cacophony.org.nz/recording/691982
   Owl                                       0.74  
   Animal                                    0.53  
   Wild animals                              0.53  
   Bird                                      0.37  
   Chirp tone                                0.20  
20210830-924373 https://browse.cacophony.org.nz/recording/924373
   Wild animals                              0.61  
   Animal                                    0.59  
   Bird                                      0.57  
   Bird vocalization, bird call, bird song   0.31  
   Chirp, tweet                              0.24  
20210414-827652 https://browse.cacophony.org.nz/recording/827652
   Animal                                    0.58  
   Wild animals                              0.57  
   Owl                                       0.53  
   Bird                                      0.39  
   Frog                                      0.30  
20210828-923415 https://b

Trying the same without filtering:

In [47]:
for f in glob.glob("outliers/*.mp3"):
    list_yamnet_classes(f,model,class_names,None,None)


20201002-691982 https://browse.cacophony.org.nz/recording/691982
   Waterfall                                 0.66  
   Water                                     0.50  
   Stream                                    0.17  
   Gurgling                                  0.16  
   Rain                                      0.15  
20210830-924373 https://browse.cacophony.org.nz/recording/924373
   Rowboat, canoe, kayak                     0.33  
   Boat, Water vehicle                       0.25  
   Vehicle                                   0.12  
   Water                                     0.10  
   Liquid                                    0.09  
20210414-827652 https://browse.cacophony.org.nz/recording/827652
   Noise                                     0.32  
   Pink noise                                0.21  
   Waterfall                                 0.20  
   Water                                     0.11  
   Vehicle                                   0.11  
20210828-923415 https://b

In [90]:
# Compare scoring the outliers with and without filtering:
def score_by_filtering(path, model, birds_classes, desc=""):
    """
    Score all the files in "path" with and without bandpass filtering and compare the scores.
    """
    
    print("Scoring for {} with and without bandpass filtering".format(desc))
    print("{:20s} {:^18s} {:^22s}".format("File","Filtering", "No Filtering"))
    print("{:20s} {:10s} {:10s} {:10s} {:10s}".format("","Thresh", "Class", "Thresh","Class"))
    print("-"*60)
    for f in glob.glob("{}/*.mp3".format(path)):
        bn = os.path.basename(f)
        fname = os.path.splitext(bn)[0]
        fid = fname.split("-")[1]
        (filter_class,filter_thresh) = score.score_audio_file(f,model,birds_classes,num_offsets=3,
                                                              low_pass_cutoff=LOW_PASS_CUTOFF,
                                                             high_pass_cutoff=HIGH_PASS_CUTOFF)
        (nofilter_class,nofilter_thresh) = score.score_audio_file(f,model,birds_classes,num_offsets=3,
                                                              low_pass_cutoff=None,
                                                             high_pass_cutoff=None)
        print("{:20s} {:5.2f} {:10.2f} {:10.2f} {:10.2f}   https://browse.cacophony.org.nz/recording/{}".format(
                    fname,filter_thresh, filter_class,nofilter_thresh,nofilter_class, fid))
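
Both functions above derive the recording link from the file name in the same way. A minimal sketch of that step as a standalone helper follows. It assumes files are named `<date>-<recording id>.<extension>`, as in the output above; `recording_url` is a hypothetical name and is not part of `utils` or `score`.

```python
import os.path

def recording_url(path):
    """Return (file stem, browse URL) for a file named <date>-<id>.<ext>."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split("-")
    if len(parts) < 2 or not parts[1]:
        raise ValueError("expected a name like 20201002-691982.mp3, got {!r}".format(path))
    # The second field is the recording id used by the browse site.
    return stem, "https://browse.cacophony.org.nz/recording/{}".format(parts[1])

stem, url = recording_url("outliers/20201002-691982.mp3")
print("{:15s} {}".format(stem, url))
```

Raising an error on an unexpected name makes a stray file in the directory fail loudly with a clear message, instead of with an `IndexError` from `split("-")[1]`.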
        
        

In [91]:
score_by_filtering("outliers", model, BIRDS_CLASSES, desc="outliers")

Scoring for outliers with and without bandpass filtering
File                     Filtering           No Filtering     
                     Thresh     Class      Thresh     Class     
------------------------------------------------------------
20201002-691982       0.98       0.99       0.00       0.00   https://browse.cacophony.org.nz/recording/691982
20210830-924373       0.71       0.89       0.00       0.01   https://browse.cacophony.org.nz/recording/924373
20210414-827652       0.78       0.81       0.00       0.00   https://browse.cacophony.org.nz/recording/827652
20210828-923415       0.78       0.92       0.01       0.02   https://browse.cacophony.org.nz/recording/923415
20210728-897118       0.61       0.80       0.00       0.00   https://browse.cacophony.org.nz/recording/897118
20210323-751989       0.90       0.90       0.00       0.00   https://browse.cacophony.org.nz/recording/751989
20200709-624800       0.94       0.91       0.00       0.00   https://browse.cacophony.o

In [92]:
score_by_filtering("top scorers/", model, BIRDS_CLASSES, desc="top scorers")

Scoring for top scorers with and without bandpass filtering
File                     Filtering           No Filtering     
                     Thresh     Class      Thresh     Class     
------------------------------------------------------------
20201213-744036       0.99       0.99       0.98       0.99   https://browse.cacophony.org.nz/recording/744036
20210927-947877       0.99       0.99       0.57       0.76   https://browse.cacophony.org.nz/recording/947877
20210124-778354       0.99       0.99       0.35       0.73   https://browse.cacophony.org.nz/recording/778354
20201017-703223       0.99       0.99       0.81       0.90   https://browse.cacophony.org.nz/recording/703223
20201205-771693       0.99       0.99       0.65       0.94   https://browse.cacophony.org.nz/recording/771693
20200914-673189       0.99       0.99       0.98       0.99   https://browse.cacophony.org.nz/recording/673189
20201215-745503       0.99       0.99       0.94       0.98   https://browse.cacophon

<H2> Conclusion </H2>

Larger scale testing shows that bandpass filtering is in fact <i>not</i> the correct strategy. This makes sense in retrospect, as the original Yamnet training was done without filtering. With filtering, various noises are incorrectly classified as bird song, giving false positives: the outliers above score highly with filtering and close to zero without it. Without filtering, the scores of the previous top scorers are also spread over a wider range, so they can be told apart. Future database-wide scoring should be done without filtering.
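
One possible way to see why filtered noise might resemble bird song to the model is to look at where the energy of broadband noise ends up after filtering. The sketch below is only an illustration on synthetic white noise, not a result from the recordings. The `bandpass` helper is an assumed stand-in built from `scipy.signal.butter`, since the actual implementation of `utils.butter_lowpass_filter` and `utils.butter_highpass_filter` is not shown in this notebook.

```python
import numpy as np
from scipy import signal

SAMPLE_RATE = 16000
LOW_PASS_CUTOFF = 4000
HIGH_PASS_CUTOFF = 2000

def bandpass(data, low_hz, high_hz, fs, order=5):
    """Assumed stand-in: Butterworth low-pass at high_hz, then high-pass at low_hz."""
    sos_low = signal.butter(order, high_hz, btype="low", fs=fs, output="sos")
    sos_high = signal.butter(order, low_hz, btype="high", fs=fs, output="sos")
    return signal.sosfilt(sos_high, signal.sosfilt(sos_low, data))

def band_energy_fraction(data, low_hz, high_hz, fs):
    """Fraction of the total spectral energy lying between low_hz and high_hz."""
    power = np.abs(np.fft.rfft(data)) ** 2
    freqs = np.fft.rfftfreq(len(data), d=1.0 / fs)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    return power[in_band].sum() / power.sum()

# Five seconds of synthetic white noise (seeded so the run is repeatable).
rng = np.random.default_rng(0)
noise = rng.standard_normal(SAMPLE_RATE * 5)
filtered = bandpass(noise, HIGH_PASS_CUTOFF, LOW_PASS_CUTOFF, SAMPLE_RATE)

before = band_energy_fraction(noise, HIGH_PASS_CUTOFF, LOW_PASS_CUTOFF, SAMPLE_RATE)
after = band_energy_fraction(filtered, HIGH_PASS_CUTOFF, LOW_PASS_CUTOFF, SAMPLE_RATE)
print("2-4 kHz energy fraction: {:.2f} before, {:.2f} after".format(before, after))
```

After filtering, most of the remaining energy sits in the 2 to 4 kHz band, so the cues that would identify the sound as water or wind are largely gone. That is consistent with the outlier results above, though it does not by itself show how the model arrives at its scores.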