# Introduction

Welcome to the Agile Modelling Python notebook.

## What is a Python notebook?

A Python notebook allows you to run Python code in a Python environment. If you are running this notebook in Google Colab, the notebook runs on a virtual machine in the cloud.

## Overview

In this notebook, we will use a process called "Agile Modelling" to build a labelled dataset, then use it to build and incrementally improve a classifier for acoustic analysis using Perch embeddings.
These are the steps we will take:

1. Installing and importing Perch and other requirements
2. Configuring the Perch agile modelling modules
3. Creating a database of embeddings
4. Searching your recordings for similarity to single queries
5. Building a machine learning classifier model from the search results
6. Searching your recordings based on their results in the classifier
7. Improving your classifier based on these results


# Installing and importing Perch and dependencies

You are running this notebook in a Python environment, and we need to add the Perch package to it. We do this by running the `pip install` command below. You only need to do this once per environment. However, if you are running this notebook in the cloud on Google Colab, your session is ephemeral, so you need to rerun this cell after disconnecting.

After the Python packages are installed, we import them.
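
Because the install has to be repeated whenever a Colab session is recycled, it can help to check whether a package is already present. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and `chirp` is assumed to be the top-level module name because that is what the import cell uses.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# `chirp` is the top-level module that the import cell loads.
if not is_installed("chirp"):
    print("Perch is not installed in this session; run the install cell.")
```

The check only looks the module up and does not import it, so it is cheap to run at the start of a session.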

In [None]:
!pip install git+https://github.com/QutEcoacoustics/perch.git@a9fd115c6e3551beb521ee25c53f096432d4a411

In [None]:
#@title Imports. { vertical-output: true }
from pathlib import Path
from chirp.projects.agile2.agile_modeling_state import agile2_config, agile2_state, download_embeddings, Helpers

# Linking to Google Drive

We need somewhere to read and write files. The Colab environment where this notebook runs does not persist between sessions, so we will link to Google Drive for persistent storage.
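
Once a drive is mounted, a quick way to confirm that a folder is usable is to try creating a temporary file in it. This is a sketch under assumptions: `is_writable` is a hypothetical helper, and `/content/drive/My Drive/` is the base folder used later in this notebook.

```python
import tempfile
from pathlib import Path


def is_writable(folder: str) -> bool:
    """Return True if a file can be created (and removed) inside `folder`."""
    path = Path(folder)
    if not path.is_dir():
        return False
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


print(is_writable('/content/drive/My Drive/'))
```

Outside Colab, or before the drive is mounted, the folder does not exist and the check reports `False`.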

In [None]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
except ImportError:
    # google.colab is only importable when running inside Colab.
    print("colab not available")

# Configuration

Here we set some configuration for names and local filepaths and initialize our agile modeling workflow.

Your Ecosounds "auth_token" can be found by logging in to https://www.ecosounds.org, then clicking on your profile picture in the top left. You can copy your auth token from this profile page. 
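
The configuration cell derives several locations from a single working folder. As an illustrative alternative to string concatenation, the sketch below builds the same layout with `pathlib` and creates any missing parent directories. The helper `build_paths` is hypothetical, and whether the config object accepts `Path` values rather than strings is an assumption, so convert with `str()` if in doubt.

```python
from pathlib import Path


def build_paths(working_folder: str) -> dict:
    """Derive the database, embeddings and labelled-example locations."""
    root = Path(working_folder)
    paths = {
        "db_path": root / "db" / "db.sqlite",
        "embeddings_folder": root / "embeddings",
        "labeled_examples_folder": root / "labeled_examples",
    }
    # Create each directory, including parents, without failing if it exists.
    paths["db_path"].parent.mkdir(parents=True, exist_ok=True)
    paths["embeddings_folder"].mkdir(parents=True, exist_ok=True)
    paths["labeled_examples_folder"].mkdir(parents=True, exist_ok=True)
    return paths
```

For example, `build_paths('/content/drive/My Drive/esa2024_data/')` would return the three paths used in this notebook, after which `str(paths["db_path"])` could be assigned to `config.db_path`.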

In [None]:
from pathlib import Path

config = agile2_config()

config.search_dataset_name = ""  #@param {type:'string'}
config.annotator_id = ""  #@param {type:'string'}
config.baw_config = {}


# Once Google Drive is mounted, you should be able to navigate to this directory
# in the left-hand "Files" menu in this Colab (indicated by the folder icon
# on the far left menu).

base_folder = '/content/drive/My Drive/'

# This is the location on google drive that this tutorial will use to save data.
working_folder = base_folder + 'esa2024_data/'

config.db_path = working_folder + 'db/db.sqlite'
config.embeddings_folder = working_folder + 'embeddings/'
config.labeled_examples_folder = working_folder + 'labeled_examples/'


# Create the folder (and any missing parent folders) if it does not exist yet.
Path(config.labeled_examples_folder).mkdir(parents=True, exist_ok=True)


#config.from_json("../../../local/esa/agile_config.json")

agile = agile2_state(config)

### Configuring access to your Ecosounds project

We will be loading audio from Ecosounds, so we need to provide the auth token associated with your Ecosounds account.
Because this is a secret, we should avoid saving it in plain text in the notebook. If you are working in Colab, you can add the secret in the secrets section on the left (key icon).
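
When checking that a secret was picked up, it is safer to print a masked form than the value itself. The sketch below shows one way to do that; `mask` is a hypothetical helper and the token shown is a made-up placeholder, not a real credential.

```python
def mask(token: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]


# A placeholder value stands in for a real token here.
print(mask("abcdefgh"))
```

An empty token produces an empty string, which also makes a missing secret easy to spot.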

In [None]:
import os

# Outside Colab, fall back to the BAW_AUTH_TOKEN environment variable.
auth_token = os.getenv('BAW_AUTH_TOKEN', '')

try:
    # In Colab, read the token from the secrets section (key icon).
    from google.colab import userdata
    auth_token = userdata.get('BAW_AUTH_TOKEN')
except ImportError:
    print("colab not available")

config.baw_config['auth_token'] = auth_token
config.baw_config['domain'] = 'api.ecosounds.org'

# Create embeddings database

Here we retrieve the embedding files for the recordings that we will be searching, and load them into a database that we can work with.
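
To give a feel for what a database that links labels to embeddings can look like, here is a toy sketch using the standard-library `sqlite3` module. The table and column names are invented for illustration and are not the schema that Perch creates; the vectors are stored as packed 32-bit floats, which is also an assumption.

```python
import sqlite3
import struct


def pack(vector):
    """Serialise a list of floats as 32-bit floats."""
    return struct.pack(f"{len(vector)}f", *vector)


def unpack(blob):
    """Inverse of `pack`: recover the list of floats from bytes."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings ("
    "id INTEGER PRIMARY KEY, source TEXT, offset_s REAL, vector BLOB)"
)
conn.execute(
    "CREATE TABLE labels ("
    "embedding_id INTEGER REFERENCES embeddings(id), label TEXT, annotator TEXT)"
)

# One embedding for a (hypothetical) recording, then a label attached to it.
cur = conn.execute(
    "INSERT INTO embeddings (source, offset_s, vector) VALUES (?, ?, ?)",
    ("rec_a.wav", 0.0, pack([0.5, 0.25, -1.0])),
)
conn.execute("INSERT INTO labels VALUES (?, ?, ?)", (cur.lastrowid, "ybg", "demo"))

# Joining the two tables yields labelled training examples.
rows = conn.execute(
    "SELECT e.source, e.vector, l.label "
    "FROM embeddings e JOIN labels l ON l.embedding_id = e.id"
).fetchall()
```

Keeping labels in a separate table means the same embedding can be labelled more than once, for example by different annotators.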

In [None]:
# Download audio embeddings to the working folder
# this might take a while

download_embeddings('yellow_bellied_glider', config.embeddings_folder)

In [None]:
# using the downloaded embeddings, create a database of embeddings.
# This database links labels to embeddings so we can train our classifier
# this might take a while
agile.create_database(config.embeddings_folder)

In [None]:
agile.initialize()

# Search

Here, we take a single example and find the examples in our search set which most closely match that example. This is a way to get started with a labelled training set.
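
The idea behind this kind of search can be sketched in a few lines: embed the query, score every stored embedding against it, and keep the best matches. The example below uses cosine similarity on tiny hand-written vectors. The metric and the helper names are assumptions for illustration, and Perch may score matches differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, database, k):
    """Return the k (score, key) pairs most similar to `query`, best first."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in database.items()]
    return sorted(scored, reverse=True)[:k]


# Toy two-dimensional "embeddings"; real embeddings have many more dimensions.
database = {
    "clip_a": [1.0, 0.0],
    "clip_b": [0.0, 1.0],
    "clip_c": [1.0, 1.0],
}
results = top_k([1.0, 0.1], database, k=2)
print([key for _, key in results])
```

Here the query points almost the same way as `clip_a`, so that clip ranks first, followed by `clip_c`.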

In [None]:
#@title Load query audio. { vertical-output: true }

# Put your labelled examples in a folder on your mounted Google Drive, 
# then specify the path here. 
path_to_labeled_examples = config.labeled_examples_folder
audio_files = Helpers.list_audio_files(path_to_labeled_examples)

# choose one of the audio examples in the labeled examples folder
query_uri = audio_files[0]

# or specify a path or url
#@markdown The `query_uri` can be a URL, filepath, or Xeno-Canto ID
#@markdown (like `xc777802`, containing an Eastern Whipbird (`easwhi1`)).
#query_uri = "../../../local/esa/20230513T150000+0700_Site-109_1376880___755.0.wav"  #@param {type:'string'}

agile.display_query(query_uri)

In [None]:
#@markdown Our target call-type label
query_label = 'ybg'  #@param {type:'string'}
#@markdown Number of results to retrieve.
num_results = 40  #@param
#@markdown Number of (randomly selected) database entries to search over.
sample_size = None  #@param
#@markdown When margin sampling, target this logit.
target_score = None  #@param

agile.embed_query(query_uri)

agile.search_with_query(query_label, num_results, sample_size, target_score)

In [None]:
#@title Save data labels. { vertical-output: true }

agile.save_labels()

# Classify
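
The training cell below exposes parameters such as `learning_rate`, `l2_mu`, `num_steps` and a `'bce'` loss. To show what these typically control, here is a minimal sketch of a linear classifier trained on fixed embeddings with binary cross-entropy and full-batch gradient descent. It is a simplified illustration, not the implementation behind `agile.train_classifier`: weak negatives, batching and the train/test split are left out, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(weights, bias, x):
    """Raw classifier score for one embedding."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def train_linear_classifier(examples, labels, learning_rate=0.5,
                            num_steps=100, l2_mu=0.0):
    """Fit weights and bias by gradient descent on binary cross-entropy."""
    dim = len(examples[0])
    n = len(examples)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(num_steps):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(examples, labels):
            # For BCE with a sigmoid output, d(loss)/d(logit) = p - y.
            error = sigmoid(logit(weights, bias, x)) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v / n
            grad_b += error / n
        # l2_mu adds a weight-decay term that pulls weights toward zero.
        weights = [w - learning_rate * (g + l2_mu * w)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Toy embeddings: two positive (1) and two negative (0) examples.
examples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [1, 1, 0, 0]
weights, bias = train_linear_classifier(examples, labels)
```

After training, `logit(weights, bias, x)` is positive for embeddings that resemble the positive examples and negative for those that resemble the negatives.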

In [None]:
#@title Classifier training. { vertical-output: true }

#@markdown Set of labels to classify. If None, auto-populated from the DB.
target_labels = None  #@param
learning_rate = 1e-3  #@param
weak_neg_weight = 0.05  #@param
l2_mu = 0.000  #@param
num_steps = 128  #@param
train_ratio = 0.9  #@param
batch_size = 128  #@param
weak_negatives_batch_size = 128  #@param
loss_fn_name = 'bce'  #@param ['hinge', 'bce']
agile.train_classifier(target_labels, learning_rate, weak_neg_weight, l2_mu, num_steps, train_ratio, batch_size, weak_negatives_batch_size, loss_fn_name)


In [None]:
#@title Review Classifier Results. { vertical-output: true }
#@markdown Our target call-type label
query_label = 'ybg'  #@param {type:'string'}
#@markdown Number of results to retrieve.
num_results = 40  #@param
#@markdown Number of (randomly selected) database entries to search over.
sample_size = None  #@param
#@markdown When margin sampling, target this logit.
target_score = None  #@param

agile.search_with_classifier(query_label, num_results, sample_size, target_score)
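
The `target_score` parameter above is described as the logit to target when margin sampling. One common reading of that idea is to review the examples whose classifier score is closest to a chosen value, since scores near the decision boundary are where the classifier is least certain. The sketch below illustrates that reading. It is an assumption about the general technique, not a description of what `agile.search_with_classifier` does internally, and `closest_to_target` is a hypothetical helper.

```python
def closest_to_target(scores, target_score, num_results):
    """Return the keys whose score is nearest to `target_score`, nearest first."""
    ranked = sorted(scores, key=lambda key: abs(scores[key] - target_score))
    return ranked[:num_results]


# Toy classifier logits for four clips.
scores = {"clip_a": 3.2, "clip_b": 0.1, "clip_c": -0.4, "clip_d": -2.5}
picked = closest_to_target(scores, target_score=0.0, num_results=2)
print(picked)
```

Labelling these borderline examples tends to be more informative for the next training round than labelling examples the classifier already scores confidently.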


In [None]:
#@title Save data labels. { vertical-output: true }

agile.save_labels()