# **Meerkat Call Detection Framework**

This notebook allows you to go through the full process of training a CNN on meerkat calls, running the CNN on new files to extract potential calls, and then evaluating the effectiveness using ROC curves. You can go through all steps or perform a subset of them, but first choose which files you will be using for training and testing.

## Important note
You may have to authenticate to allow access to Google Drive. Run the code below and it will automatically ask you to authenticate.

## Set parameters
Here, you set up the parameters needed for the training and testing. Run the cell below before you run the rest of the notebook!

**Directories**

All directories need to be accessible from your Google Drive.

*audio_dir*: This should contain 3 subfolders. 'calls' and 'noise' are the folders where the call and noise snippets used for training are located. 'long_recordings' is where the full audio files are stored.

*ground_truth_dir*: This is the directory containing CSV files of labeled data

*model_dir*: This directory contains fitted models

*output_dir*: This is where the output (.pckl and .csv files) will be stored

*code_dir*: Directory where code is stored
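
As a rough sketch of the layout described above, the following checks that the expected folders exist before anything else is run. The helper name `find_missing_dirs` is hypothetical and not part of the notebook's library; the three subfolder names are taken from the *audio_dir* description.

```python
import os

# Subfolders of audio_dir named in the description above.
AUDIO_SUBFOLDERS = ('calls', 'noise', 'long_recordings')

def find_missing_dirs(audio_dir, other_dirs):
    """Return every expected directory that does not exist (hypothetical helper)."""
    expected = [os.path.join(audio_dir, sub) for sub in AUDIO_SUBFOLDERS]
    expected.extend(other_dirs)
    return [d for d in expected if not os.path.isdir(d)]
```

Calling it with the audio directory plus the ground truth, model, output and code directories returns an empty list when everything is in place.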

**Files**

*model_name*: file name for the output model, or for the model to load (in the case of using a pre-trained model)
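
When no model name is specified, the setup code in this notebook builds one from the training settings. The sketch below restates that naming scheme as a function; the function name `build_model_name` and the `date_str` argument are illustrative additions, not part of the notebook.

```python
import time

def build_model_name(epochs, call_probs, augment, conv_dimension, date_str=None):
    """Restate the notebook's naming scheme; date_str defaults to today (YYYYMMDD)."""
    if call_probs is not None:
        call_probs_str = ''.join(str(p) + '_' for p in call_probs)
    else:
        call_probs_str = 'proportional_'
    aug_str = 'aug' if augment else 'noaug'
    dim_str = '_2dconv_' if conv_dimension == 2 else '_1dconv_'
    if date_str is None:
        date_str = time.strftime('%Y%m%d')
    return ('cnn_' + str(epochs) + 'epoch_' + call_probs_str
            + aug_str + dim_str + date_str + '.h5')

print(build_model_name(20, None, True, 2, date_str='20181218'))
# cnn_20epoch_proportional_aug_2dconv_20181218.h5
```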

**Training parameters**

*epochs*: number of epochs to train for

*batch_size*: batch size of data to use

*steps_per_epoch*: how many training steps per epoch
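
To get a feel for how these three values interact, the sketch below computes how many training examples the model sees in total. It assumes that every step draws exactly *batch_size* examples from the data generator; the function name `samples_seen` is made up for this illustration.

```python
def samples_seen(epochs, steps_per_epoch, batch_size):
    """Total number of training examples drawn over the whole run."""
    return epochs * steps_per_epoch * batch_size

# Values used in the parameter cell of this notebook.
total = samples_seen(epochs=20, steps_per_epoch=1000, batch_size=100)
print(total)  # 2000000
```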

**Prediction parameters**

*audio_file_to_predict*: name of the audio file to run prediction on (used only when *run_model_on_file* is True)

*t_start*: start time for predictions within audio file (sec)

*t_end*: end time for predictions within audio file (sec)
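
When predictions are not restricted to labeled portions, the code in this notebook starts 1 second into the file and rounds the end time down to a whole minute. A minimal sketch of that calculation follows; the function name `default_time_bounds` is hypothetical, and it takes the frame count and frame rate directly rather than opening a wav file.

```python
import math

def default_time_bounds(n_frames, frame_rate, min_start=1):
    """Return (t_start, t_end) in seconds, with t_end rounded down to a full minute."""
    duration_sec = n_frames / frame_rate
    t_end = math.floor(duration_sec / 60) * 60
    return min_start, t_end

# A made-up 8000 Hz recording lasting 12630 seconds.
print(default_time_bounds(8000 * 12630, 8000))  # (1, 12600)
```

Note that a recording shorter than one minute yields `t_end = 0`, so such files would need separate handling.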

**Other parameters (probably don't need to change)**

*samprate*: sample rate of audio file, should be 8000 for current code (code will likely not work with other sample rates!)

*chunk_size*: number of seconds to read in each time wav is accessed directly

*chunk_pad*: pad chunks of wavs on each end to avoid any issues - 1 sec is fine
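
One way to picture *chunk_size* and *chunk_pad* is shown below. This is a sketch of the general idea only: it assumes the audio is processed in consecutive chunks and that each read is widened by the pad on both sides. The actual logic lives in the call detector library and may differ; `chunk_windows` is a hypothetical name.

```python
def chunk_windows(t_start, t_end, chunk_size=60, chunk_pad=1):
    """List (core_start, core_end, read_start, read_end) tuples in seconds.

    The core interval is the part a chunk is responsible for; the read
    interval is the core widened by chunk_pad on each side (assumption).
    """
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    windows = []
    core_start = t_start
    while core_start < t_end:
        core_end = min(core_start + chunk_size, t_end)
        read_start = max(0, core_start - chunk_pad)
        read_end = core_end + chunk_pad
        windows.append((core_start, core_end, read_start, read_end))
        core_start = core_end
    return windows

print(chunk_windows(1, 121))
# [(1, 61, 0, 62), (61, 121, 60, 122)]
```

Under this assumption, starting at least 1 second into the file leaves room for the leading pad of the first chunk. The trailing pad of the last chunk can extend past the end of the recording, which a real implementation would need to handle.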

In [3]:
#PARAMETERS - Modify parameters before running to change settings!
#-------------------------------------------------------------------------------

#---------TO CHANGE--------:

#General
use_pretrained_model = True
run_model_on_file = False
run_model_on_folder = False
run_model_on_specified_round = True
specified_round = 2
specified_model_name = 'cnn_20epoch_proportional_aug_2dconv_20181218.h5' #defaults to None. use only if use_pretrained_model is True or if you want to specify the name of the model directly
run_only_where_ground_truth_available = False #set to True to run only on parts of files where groundtruth labels are available
evaluate_detections = False

#Model fitting options (use only if use_pretrained_model is False)
epochs = 20 #Number of epochs to train for
augment = True #whether to augment by overlaying noise (at different levels) on calls
conv_dimension = 2

#name of audio file (use only if run_model_on_file is True)
audio_file_to_predict = 'HM_PET_R11_20170903-20170908_file_2_(2017_09_03-05_44_59)_ASWMUX221163_SS.wav'

#Name of audio folder for prediction (use only if run_model_on_folder is True)
audio_folder = None

#Probability of selecting each call type for training (call types given below). If None, choose calls with probability equal to their occurrence in the training data
call_probs = None

#---------TO LEAVE ALONE (PROBABLY)--------:

#Main directory
base_dir = '/home/arianasp/meerkat_detector' #base directory

#Subdirectories
audio_dir = base_dir + '/data/full_recordings'
ground_truth_dir = base_dir + '/ground_truth'
model_dir = base_dir + '/models'
output_dir = base_dir + '/predictions'
code_dir = base_dir + '/dev'
clips_dir = base_dir + '/clips'
eval_dir = base_dir + '/eval'

#Training parameters
batch_size = 100
steps_per_epoch = 1000

call_types = ['cc','sn','ld','mov','agg','alarm','soc','hyb','unk','oth']

#Evaluation parameters
boundary_thresh = 0.6
n_points = 30
pckl_paths = [] #NOTE: One can specify pckl_paths, but this is only useful when running evaluation ALONE, otherwise anything specified here gets cleared

#Other parameters
samprate = 8000 
chunk_size = 60 
chunk_pad = 1 
mel = False
ml_plan_file = base_dir + '/docs/' + 'audio_labeling_plan_filenames.csv'

#SETUP
#-------------------------------------------------------------------------------

#import libraries
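import sys
import os
import re
import wave
import time
import glob
import datetime

#re, datetime, numpy and pandas are used below; import them explicitly rather than relying on the star import
import numpy as np
import pandas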

#Set path
sys.path.append(code_dir)

#Import call detector library
from meerkat_call_detector_library import *

#NEW MODEL CONSTRUCTION DEFINITIONS (TO MOVE LATER)
#inputs: originally spectrogram or output of upper layer
#filters: number of filters to use (arbitrary)
#n_convs: number of consecutive convolutions to do
#output will be time_dim(inputs)/2 x n_filters

import keras.layers as layers

#1d code - working (for creating models)
def conv_pool(inputs, filters, n_convs=3):
    conv = inputs
    for idx in range(n_convs):
        conv = Conv1D(filters, (3), padding='same')(conv)
        conv = Activation('relu')(conv)
    conv = AveragePooling1D()(conv)

    return conv


def conv_upsample_OLD(inputs, residual, filters):
    conv = Conv1D(filters, (3), padding='same')(inputs)
    conv = Activation('relu')(conv)
    conv = UpSampling1D()(conv)
    
    print(conv)
    print(residual)
    
    residual = Conv1D(filters, (1))(residual)
    conv = Add()([conv,residual])

    return conv

def conv_upsample(inputs, residual, filters):
    conv = Conv1D(filters, (3), padding='same')(inputs)

    residual = Conv1D(filters, (1))(residual)
    conv = Add()([conv,residual])
    conv = Activation('relu')(conv)
    conv = UpSampling1D()(conv)

    return conv

def construct_unet_model():
    #input_layer = Input(batch_shape=(None,None,None))
    input_layer = Input(batch_shape=(None,512,128))
    conv = conv_pool(input_layer, 32)
    res1 = conv
    outputs = []

    #could modify to more layers (more than 5)
    for idx in range(5):
        conv = conv_pool(conv, 32*(idx+1))
        outputs.append(conv)

    for idx in range(5):
        conv = conv_upsample(conv, outputs[-(idx+1)], 32*(5-idx))

    conv = conv_upsample(conv, res1, 32)

    #fully connected layer equivalent
    conv = Conv1D(1, (3), padding='same')(conv)
    conv = Activation('sigmoid')(conv)

    model = Model(input_layer, conv)

    #change optimizer to ADAM?
    model.compile(RMSprop(lr=2.5e-4), loss='binary_crossentropy')

    return model

######DONT MESS WITH ABOVE

#2d CNN functions
def conv_pool_2d(inputs, filters, n_convs=3):
    conv = inputs
    for idx in range(n_convs):
        conv = Conv2D(filters, (3,3), padding='same')(conv)
        conv = Activation('relu')(conv)
    conv = AveragePooling2D()(conv)

    return conv

def conv_upsample_2d(inputs, residual, filters):
    conv = Conv2D(filters, (3,3), padding='same')(inputs)

    residual = Conv2D(filters, (1,1))(residual)
    conv = Add()([conv,residual])
    conv = Activation('relu')(conv)
    conv = UpSampling2D()(conv)

    return conv

def construct_unet_model_2d():
    input_layer = Input(batch_shape=(None,512,128,1))
    conv = conv_pool_2d(input_layer, 32)
    res1 = conv
    outputs = []
    for idx in range(5):
        conv = conv_pool_2d(conv, 32*(idx+1))
        outputs.append(conv)

    for idx in range(5):
        conv = conv_upsample_2d(conv, outputs[-(idx+1)], 32*(5-idx))

    conv = conv_upsample_2d(conv, res1, 32)

    # reduce the channels to 1 using a 1x1 2D convolution
    conv = layers.Conv2D(1, (1,1), padding='same')(conv)
    # conv should be shape (512,128,1) at this point
    # reshape to remove the last dimension so we can use 1D convolution
    conv = layers.Reshape((512,128))(conv)
    # reduce the last dimension to 1 using a 1D convolution with kernel size 1
    logits = layers.Conv1D(1, 1, padding='same')(conv)
    # logits should be shape (512,1)
    probs = layers.Activation('sigmoid')(logits)
    probs = layers.Reshape((512,1,1))(probs)

    model = Model(input_layer, probs)
    model.compile(RMSprop(lr=2.5e-4), loss='binary_crossentropy')

    return model

#Set up file names and paths
aug_str = 'noaug'
if(augment):
    aug_str = 'aug'
    
dim_str = '_1dconv_'
if(conv_dimension==2):
    dim_str = '_2dconv_'

if(call_probs is not None):
    call_probs_str = ''.join([str(s) + '_' for s in call_probs])
else:
    call_probs_str = 'proportional_'

#if model name was specified, use specified name here. otherwise construct name based on parameters.
if(specified_model_name is not None):
    model_name = specified_model_name
else:
    model_name = 'cnn_' + str(epochs) + 'epoch_' +  call_probs_str + aug_str + dim_str + time.strftime('%Y%m%d') + '.h5'
model_path = model_dir + '/' + model_name

#create threshold range for evaluation
thresh_range = np.linspace(boundary_thresh+.0001,.9,10)

for i in range(2,n_points-9):
    thresh_range = np.append(thresh_range,thresh_range[len(thresh_range)-1]+10**(-i)*9)
    
#TRAIN OR LOAD MODEL
#-------------------------------------------------------------------------------
        
if(use_pretrained_model):

    print("-------- Loading pretrained model --------")
    print('Model name: ' + model_name)
  
    #Load pre-trained model
    model = load_model(model_dir + '/' + model_name)

else:
  
    print("-------- Training new model --------")
    print('Start time:')
    print(datetime.datetime.now())

    #Construct model
    if(conv_dimension==1):
        model = construct_unet_model()
    else:
        model = construct_unet_model_2d()
    model.summary()

    #Fit model
    model.fit_generator(data_generator(clips_dir = clips_dir,batch_size = batch_size, cnn_dim=conv_dimension,mel=mel), epochs=epochs, use_multiprocessing=True, workers=16, steps_per_epoch=steps_per_epoch)

    print('End time:')
    print(datetime.datetime.now())

    #Save fitted model
    print('Saving model as: ' + model_name)
    model.save(filepath=model_dir + '/' + model_name) 

print("-------- Done with training or loading model step --------")


#RUN MODEL TO DETECT CALLS
#-------------------------------------------------------------------------------
#Extract probable calls from wav recording
                    
if(run_model_on_folder):
    
    print("-------- Running model on folder --------")
    print('Folder path = ' + audio_folder)
    
    #get all audio files in that folder (or subfolders of it, recursively)
    audio_files = glob.glob(audio_folder + '/**/' + '*.wav',recursive=True)
    
    #print number of files found
    print('Found ' + str(len(audio_files)) + ' audio files, running model on all of them')
    
if(run_model_on_specified_round):
    
    print("-------- Running model on specified round --------")
    print('Round = ' + str(specified_round))

    ml_plan = pandas.read_csv(ml_plan_file)
    files_to_run = ml_plan[(ml_plan['Round (0-1)'] == str(specified_round)) | (ml_plan['Round (1-2)'] == str(specified_round)) | (ml_plan['Round (2+)'] == str(specified_round))]
    files_to_run = files_to_run['Audio filename'].tolist()

    audio_files = list()

    for f_idx in range(len(files_to_run)):
        curr_file = glob.glob(base_dir + '/data/raw_data' + '/**/' + files_to_run[f_idx], recursive=True)[0]
        audio_files.append(curr_file)

    #print number of files found
    print('Found ' + str(len(audio_files)) + ' audio files, running model on all of them')

if(run_model_on_folder or run_model_on_specified_round):
    
    for i in range(len(audio_files)):
        
        audio_file = audio_files[i]
        
        print('Running predictions on file: ')
        print(audio_file)
        
        aud = wave.open(audio_file,'rb')
        
        #time bounds for extraction
        if(run_only_where_ground_truth_available):
            labels = get_ground_truth_labels(wav_name = os.path.basename(audio_file), ground_truth_dir = ground_truth_dir)
            if(labels is None):
                print('No ground truth data found - skipping this file')
                continue
            else:
                [t_start, t_end] = get_start_end_time_labels(labels)
        else:
            t_start = 1
            t_end = np.floor(aud.getnframes()/aud.getframerate()/60)*60
        
        #start at least 1 sec in to avoid problems of wrong input size in next step
        if(t_start < 1):
            t_start = 1
        
        #Store parameters in extraction_params object
        wav_path = audio_file
        audio_name = os.path.basename(audio_file)
        pckl_path = output_dir + '/' + audio_name[0:(len(audio_name)-4)] + "_label_" + model_name[0:(len(model_name)-3)] + '_' + str(t_start) + '-' + str(t_end) + ".pckl"
        
        #Append to list of created pckl paths
        pckl_paths.append(pckl_path)
        
        #if path to extraction results already exists, do not run. otherwise run.
        if(not(os.path.exists(pckl_path))):
            extraction_params = CallExtractionParams(model_path = model_path, wav_path = wav_path, pckl_path=pckl_path, samprate = samprate, t_start = t_start, t_end = t_end)
            
            #if SOUNDFOC is in filename, this indicates a different type of sound file - don't run!
            if(re.search('SOUNDFOC',audio_file) is None):
                extract_scores(model, extraction_params, mel=mel)

#Run model on a specific file
if(run_model_on_file):
    
    print('--------Running model on specific file---------')

    #create paths to prediction files (wav and pckl)
    wav_path = audio_dir + '/' + audio_file_to_predict
    
    print(wav_path)
    
    aud = wave.open(wav_path,'rb')
    
    #time bounds for extraction
    if(run_only_where_ground_truth_available): #TODO: add option to find labeled portion and run only for this
        labels = get_ground_truth_labels(wav_name = audio_file_to_predict, ground_truth_dir = ground_truth_dir)
        if(labels is None):
            print('No ground truth data found - set run_only_where_ground_truth_available to False to run on this file')
            t_start = None
        else:
            [t_start, t_end] = get_start_end_time_labels(labels)
            #start at least 1 sec in to avoid problems with wrong matrix size in next step
            #(checked only here, since comparing None < 1 raises a TypeError)
            if t_start < 1:
                t_start = 1
    else:
        t_start = 1
        t_end = np.floor(aud.getnframes()/aud.getframerate()/60)*60
        
    if(t_start is not None):
    
        #create path to output file
        pckl_path = output_dir + '/' + audio_file_to_predict[0:(len(audio_file_to_predict)-4)] + "_label_" + model_name[0:(len(model_name)-3)] + '_' + str(t_start) + '-' + str(t_end) + ".pckl"
    
        pckl_paths = [pckl_path]
        extraction_params = CallExtractionParams(model_path = model_path, wav_path = wav_path, pckl_path = pckl_path, samprate = samprate, t_start = t_start, t_end = t_end)
        extract_scores(model, extraction_params, mel=mel)
                
#EVALUATE DETECTIONS
#-------------------------------------------------------------------------------
if(evaluate_detections and (pckl_paths is not None)):
    
    #for file_idx in range(len(pckl_files)):
    for file_idx in range(len(pckl_paths)):

        pckl_path = pckl_paths[file_idx]

        run_evaluation(pckl_path = pckl_path,thresh_range=thresh_range,save_dir =eval_dir,ground_truth_dir = ground_truth_dir,call_types = call_types, verbose = False)


-------- Loading pretrained model --------
Model name: cnn_20epoch_proportional_aug_2dconv_20181218.h5
-------- Done with training or loading model step --------
-------- Running model on specified round --------
Round = 2
Found 11 audio files, running model on all of them
Running predictions on file: 
/home/arianasp/meerkat_detector/data/raw_data/AUDIO1/HM_VCVM001_HMB_AUDIO_R08_20170812-20170815/HM_VCVM001_HMB_AUDIO_R08_file_7_(2017_08_08-06_44_59)_ASWMUX221153.wav
-------Running CNN model on new data-------
model_path: /home/arianasp/meerkat_detector/models/cnn_20epoch_proportional_aug_2dconv_20181218.h5
wav_path: /home/arianasp/meerkat_detector/data/raw_data/AUDIO1/HM_VCVM001_HMB_AUDIO_R08_20170812-20170815/HM_VCVM001_HMB_AUDIO_R08_file_7_(2017_08_08-06_44_59)_ASWMUX221153.wav
samprate: 8000
t_start: 1
t_end: 12600.0

--------Generating predictions-------
Start time:
2018-12-19 09:42:38.901716
End time:
2018-12-19 10:04:40.048375

-------Generating scores and extracting calls-------

End time:
2018-12-19 14:13:55.907517

-------Generating scores and extracting calls-------
-------Saving output-------
Done
