PyHa

PyHa is a tool for converting audio-based "weak" labels into "strong" moment-to-moment labels. It also provides a pipeline for comparing automated moment-to-moment labels against human labels. Current proof-of-concept work is being carried out on bird audio clips using Microfaune predictions.

This package is being developed and maintained by the Engineers for Exploration Acoustic Species Identification Team in collaboration with the San Diego Zoo Wildlife Alliance.

PyHa = Python + Piha (referring to a bird species of our interest known as the screaming-piha)

Installation and Setup

Navigate to a desired folder and clone the repository onto your local machine. git clone https://github.com/UCSD-E4E/PyHa.git

If you wish to reduce the size of the repository on your local machine, you can instead use git clone https://github.com/UCSD-E4E/PyHa.git --depth 1, which downloads only the most up-to-date version of the repo without its history.

Install Python 3.8, Python 3.9, or Python 3.10
Create a venv by running python3.x -m venv .venv where python3.x is the appropriate python.
Activate the venv with the following commands:

Windows: .venv\Scripts\activate
macOS/Linux: source .venv/bin/activate

Install the build tools: python -m pip install --upgrade pip poetry
Install the environment: poetry install
Here you can download the Xeno-canto Screaming Piha test set used in our demos: https://drive.google.com/drive/u/0/folders/1lIweB8rF9JZhu6imkuTg_No0i04ClDh1
Run jupyter notebook from the proper folder and open the PyHa_Tutorial.ipynb notebook to make sure PyHa is running properly. Make sure the paths in the notebook, as well as in the ScreamingPiha_Manual_Labels.csv file, point to the TEST folder.

Functions

This image shows the design of the automated audio labeling system.

`isolation_parameters`

Many of the functions take the isolation_parameters argument, so it is defined once here.

The isolation_parameters dictionary definition depends on the model used. The currently supported models are BirdNET-Lite, Microfaune, and TweetyNET.

The BirdNET-Lite isolation_parameters dictionary is as follows:

isolation_parameters = {
    "model" : "birdnet",
    "output_path" : "",
    "lat" : 0.0,
    "lon" : 0.0,
    "week" : 0,
    "overlap" : 0.0,
    "sensitivity" : 0.0,
    "min_conf" : 0.0,
    "custom_list" : "",
    "filetype" : "",
    "num_predictions" : 0,
    "write_to_csv" : False,
    "verbose" : True
}

The Microfaune isolation_parameters dictionary is as follows:

isolation_parameters = {
    "model" : "microfaune",
    "technique" : "",
    "threshold_type" : "",
    "threshold_const" : 0.0,
    "threshold_min" : 0.0,
    "window_size" : 0.0,
    "chunk_size" : 0.0,
    "verbose" : True
}

The technique parameter can be: simple, stack, steinberg, or chunk. This input must be a string in all lowercase.
The threshold_type parameter can be: median, mean, average, standard deviation, or pure. This input must be a string in all lowercase.

The remaining parameters are floats representing their respective values.
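
As an illustration, a filled-in Microfaune dictionary might look like the sketch below. The numeric values are arbitrary placeholders rather than recommended settings, and check_isolation_parameters is a hypothetical helper written for this example; it is not part of PyHa.

```python
# Allowed values, as listed in this section.
TECHNIQUES = {"simple", "stack", "steinberg", "chunk"}
THRESHOLD_TYPES = {"median", "mean", "average", "standard deviation", "pure"}


def check_isolation_parameters(params):
    """Hypothetical helper (not part of PyHa): return a list of problems found."""
    problems = []
    if params.get("technique") not in TECHNIQUES:
        problems.append(f"unknown technique: {params.get('technique')!r}")
    if params.get("threshold_type") not in THRESHOLD_TYPES:
        problems.append(f"unknown threshold_type: {params.get('threshold_type')!r}")
    return problems


# Placeholder values chosen only to show the expected types.
isolation_parameters = {
    "model": "microfaune",
    "technique": "steinberg",
    "threshold_type": "median",
    "threshold_const": 2.0,
    "threshold_min": 0.0,
    "window_size": 2.0,
    "chunk_size": 5.0,
    "verbose": True,
}

print(check_isolation_parameters(isolation_parameters))  # []
```

Because the strings must be lowercase, a value such as "Steinberg" would be reported as unknown by this check.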

The TweetyNET isolation_parameters dictionary is as follows:

isolation_parameters = {
    "model" : "tweetynet",
    "tweety_output": False,
    "technique" : "",
    "threshold_type" : "",
    "threshold_const" : 0.0,
    "threshold_min" : 0.0,
    "window_size" : 0.0,
    "chunk_size" : 0.0,
    "verbose" : True
}

The tweety_output parameter sets whether to use TweetyNET's original output or isolation techniques. If set to False, TweetyNET will use the specified technique parameter.

The isolation_parameters dictionary for the Foreground-Background Separation technique is as follows:

isolation_parameters = {
   "model" : "fg_bg_dsp_sep",
   "technique" : "",
   "threshold_type" : "",
   "threshold_const" : 0.0,
   "kernel_size" : 4,
   "power_threshold" : 0.0,
   "threshold_min" : 0.0,
   "verbose" : True
}

The kernel_size parameter is an integer n that specifies the size of the kernel used in the morphological opening process. For the opening of the binary mask, this will be an n by n kernel. For the processing of the indicator vector, this will be a 1 by n kernel.
The power_threshold parameter is a float that determines by how many times the power of a pixel must be larger than its row and column medians. For example, if this value is set to 3.0, each pixel will have to have a power of at least 3 times its row and column medians to be included in the binary mask.
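
The power_threshold rule can be sketched in plain Python as follows. This is a minimal illustration of the row-and-column median test described above, not PyHa's implementation: the function name binary_mask and the toy 3 by 3 spectrogram are assumptions, and the morphological opening step controlled by kernel_size is not shown.

```python
from statistics import median


def binary_mask(power, power_threshold):
    """Mark pixels whose power is at least power_threshold times both
    their row median and their column median (illustrative sketch)."""
    row_medians = [median(row) for row in power]
    col_medians = [median(col) for col in zip(*power)]
    return [
        [
            value >= power_threshold * row_medians[r]
            and value >= power_threshold * col_medians[c]
            for c, value in enumerate(row)
        ]
        for r, row in enumerate(power)
    ]


# Toy spectrogram: one loud pixel on a quiet background.
spectrogram = [
    [1.0, 1.0, 1.0],
    [1.0, 9.0, 1.0],
    [1.0, 1.0, 1.0],
]

mask = binary_mask(spectrogram, 3.0)
print(mask[1][1])  # True: 9.0 is at least 3 times both medians (1.0)
print(mask[0][0])  # False
```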

The isolation_parameters dictionary for Template Matching is as follows:

isolation_parameters = {
   "model" : "template_matching",
   "template_path" : "",
   "technique" : "",
   "window_size" : 0.0,
   "threshold_type" : "",
   "threshold_const" : 0.0,
   "cutoff_freq_low" : 0,
   "cutoff_freq_high" : 0,
   "verbose" : True,
   "write_confidence" : True
}

The template_path parameter should be set to the path to the template to use, stored as a .wav file.
The window_size parameter should be a float corresponding to the length (in seconds) of the template. This is so the Steinberg isolation can correctly convert the local score array into labels.
cutoff_freq_low and cutoff_freq_high should be integer values. If both are defined, both the signal and the template will be put through a Butterworth bandpass filter set to those cutoff frequencies. This is recommended to ensure that the signal and template are the same shape on the frequency axis.
write_confidence determines whether the confidence of each label is written to the array. The confidence of a label is the maximum score in the local score array for that label.

annotation_post_processing.py file

`annotation_chunker`

Found in annotation_post_processing.py

This function converts a Kaleidoscope-formatted Dataframe containing annotations to uniform chunks of chunk_length. It drops any annotation that is shorter than chunk_length.

Parameter	Type	Description
`kaleidoscope_df`	Dataframe	Dataframe of automated or human labels in Kaleidoscope format
`chunk_length`	int	Duration in seconds of each annotation chunk

This function returns a dataframe with annotations converted to uniform chunks of chunk_length seconds.

Usage: annotation_chunker(kaleidoscope_df, chunk_length)
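
The chunking idea can be sketched without pandas as shown below. This is one plausible reading of the description above, not PyHa's actual implementation: the function chunk_annotations, the list-of-dicts input, and the column names "IN FILE", "OFFSET", "DURATION", and "MANUAL ID" are assumptions made for illustration.

```python
def chunk_annotations(annotations, chunk_length):
    """Split each annotation into back-to-back chunks of chunk_length seconds.
    Any leftover shorter than chunk_length is dropped, so an annotation
    shorter than chunk_length produces no chunks at all."""
    chunks = []
    for ann in annotations:
        n_full = int(ann["DURATION"] // chunk_length)
        for i in range(n_full):
            chunk = dict(ann)  # copy the other columns unchanged
            chunk["OFFSET"] = ann["OFFSET"] + i * chunk_length
            chunk["DURATION"] = chunk_length
            chunks.append(chunk)
    return chunks


# Assumed column names; one long annotation and one short annotation.
labels = [
    {"IN FILE": "clip.wav", "OFFSET": 1.0, "DURATION": 7.0, "MANUAL ID": "bird"},
    {"IN FILE": "clip.wav", "OFFSET": 20.0, "DURATION": 2.0, "MANUAL ID": "bird"},
]

chunks = chunk_annotations(labels, 3)
for chunk in chunks:
    print(chunk["OFFSET"], chunk["DURATION"])
```

With a chunk_length of 3, the 7-second annotation yields two chunks and the 2-second annotation is dropped.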

IsoAutio.py file

`write_confidence`

Found in IsoAutio.py

This function adds a confidence column to a clip dataframe that has had automated labels generated. For each annotation, the confidence is based on the maximum local score within that annotation.

Parameter	Type	Description
`local_score_arr`	list of floats	Array of small predictions of bird presence.
`automated_labels_df`	Pandas Dataframe	Dataframe of labels derived from the local score array using the `isolate()` function.

This function returns a Pandas Dataframe with an additional column of confidence scores from the local score array.

Usage: write_confidence(local_score_arr, automated_labels_df)
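
A minimal sketch of the idea, using plain lists and dicts instead of a Pandas Dataframe, is given below. The function add_confidence, the scores_per_second argument, and the "OFFSET", "DURATION", and "CONFIDENCE" keys are assumptions for illustration and may not match PyHa's column names or internals.

```python
def add_confidence(local_score_arr, labels, scores_per_second):
    """Attach to each label the maximum local score inside its time span."""
    out = []
    for label in labels:
        start = int(label["OFFSET"] * scores_per_second)
        end = int((label["OFFSET"] + label["DURATION"]) * scores_per_second)
        end = max(end, start + 1)  # always look at one score or more
        window = local_score_arr[start:end]
        confidence = max(window) if window else 0.0
        out.append({**label, "CONFIDENCE": confidence})
    return out


# Six local scores covering a 3-second clip (2 scores per second).
scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.3]
labels = [{"OFFSET": 1.0, "DURATION": 1.0}]

result = add_confidence(scores, labels, scores_per_second=2)
print(result[0]["CONFIDENCE"])  # 0.9
```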

`isolate`

Found in IsoAutio.py

This function is the wrapper for the audio isolation techniques. It calls the respective function based on the "technique" key of isolation_parameters.

Parameter	Type	Description
`local_scores`	list of floats	Local scores of the audio clip as determined by Microfaune Recurrent Neural Network.
`SIGNAL`	list of ints	Samples that make up the audio signal.
`SAMPLE_RATE`	int	Sampling rate of the audio clip, usually 44100.
`audio_dir`	string	Directory of the audio clip.
`filename`	string	Name of the audio clip file.
`isolation_parameters`	dict	Python Dictionary that controls the various label creation techniques.

This function returns a dataframe of automated labels for the audio clip based on the passed in isolation technique.

Usage: isolate(local_scores, SIGNAL, SAMPLE_RATE, audio_dir, filename, isolation_parameters)
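
The wrapper pattern can be sketched as a lookup from technique name to function. The names isolate_sketch and techniques below are hypothetical, the argument list is shortened, and the stand-in functions return strings where the real techniques return dataframes of labels.

```python
def isolate_sketch(local_scores, isolation_parameters, techniques):
    """Hypothetical dispatcher: look up the technique by name and call it."""
    name = isolation_parameters["technique"]
    if name not in techniques:
        raise ValueError(f"unsupported technique: {name!r}")
    return techniques[name](local_scores, isolation_parameters)


# Stand-in technique functions used only for this example.
techniques = {
    "simple": lambda scores, params: "simple labels",
    "chunk": lambda scores, params: "chunk labels",
}

result = isolate_sketch([0.1, 0.9], {"technique": "chunk"}, techniques)
print(result)  # chunk labels
```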

`threshold`

Found in IsoAutio.py

This function takes the local score array output by a neural network and computes the threshold above which a local score counts as a positive ID of a class of interest. Most proof-of-concept work is dedicated to bird presence. The threshold is determined by "threshold_type" and "threshold_const" from the isolation_parameters dictionary.

Parameter	Type	Description
`local_scores`	list of floats	Local scores of the audio clip as determined by Microfaune Recurrent Neural Network.
`isolation_parameters`	dict	Python Dictionary that controls the various label creation techniques.

This function returns a float representing the threshold at which the local scores in the local score array of an audio clip will be viewed as a positive ID.

Usage: threshold(local_scores, isolation_parameters)
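
The sketch below shows one way the threshold types could map to a number. The formulas are assumptions made for illustration (a statistic scaled by threshold_const, with threshold_min acting as a floor); the source only names the threshold types, so check the PyHa code for the exact behavior. The function name compute_threshold is hypothetical.

```python
from statistics import mean, median, pstdev


def compute_threshold(local_scores, isolation_parameters):
    """Illustrative threshold calculation; the formulas are assumptions."""
    kind = isolation_parameters["threshold_type"]
    const = isolation_parameters["threshold_const"]
    if kind == "median":
        thresh = median(local_scores) * const
    elif kind in ("mean", "average"):
        thresh = mean(local_scores) * const
    elif kind == "standard deviation":
        thresh = mean(local_scores) + pstdev(local_scores) * const
    elif kind == "pure":
        thresh = const  # use the constant directly as the threshold
    else:
        raise ValueError(f"unknown threshold_type: {kind!r}")
    # Assumed role of threshold_min: never go below this floor.
    return max(thresh, isolation_parameters.get("threshold_min", 0.0))


scores = [0.1, 0.2, 0.3, 0.4, 1.0]
params = {"threshold_type": "median", "threshold_const": 2.0, "threshold_min": 0.0}
print(compute_threshold(scores, params))  # median 0.3 scaled by 2.0
```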

`steinberg_isolate`

Found in IsoAutio.py

This function uses the technique developed by Gabriel Steinberg, which takes the local score array output of a neural network and lumps local scores together to produce automated labels for a class across an audio clip. It is called by the isolate function when isolation_parameters['technique'] == 'steinberg'.

Parameter	Type	Description
`local_scores`	list of floats	Local scores of the audio clip as determined by Microfaune Recurrent Neural Network.
`SIGNAL`	list of ints	Samples that make up the audio signal.
`SAMPLE_RATE`	int	Sampling rate of the audio clip, usually 44100.
`audio_dir`	string	Directory of the audio clip.
`filename`	string	Name of the audio clip file.
`isolation_parameters`	dict	Python Dictionary that controls the various label creation techniques.
`manual_id`	string	Name of the class written to the Pandas Dataframe.

This function returns a dataframe of automated labels for the audio clip.

Usage: steinberg_isolate(local_scores, SIGNAL, SAMPLE_RATE, audio_dir, filename, isolation_parameters, manual_id)
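
One plausible reading of "lumping local scores together" is sketched below: every local score at or above the threshold claims a window of window_size seconds centered on it, and overlapping windows are merged into one label. This is an assumption for illustration, not a transcription of PyHa's code; the function windowed_labels and its arguments are hypothetical, and labels are returned as (start, end) tuples in seconds.

```python
def windowed_labels(local_scores, threshold, window_size, clip_duration):
    """Place a window around each score at or above threshold, merging overlaps."""
    step = clip_duration / len(local_scores)  # seconds per local score
    half = window_size / 2
    labels = []
    for i, score in enumerate(local_scores):
        if score < threshold:
            continue
        center = i * step
        start = max(0.0, center - half)
        end = min(clip_duration, center + half)
        if labels and start <= labels[-1][1]:
            # Overlaps the previous label: extend it.
            labels[-1] = (labels[-1][0], max(labels[-1][1], end))
        else:
            labels.append((start, end))
    return labels


# Ten local scores over a 10-second clip.
scores = [0.1, 0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1]
labels = windowed_labels(scores, threshold=0.5, window_size=2.0, clip_duration=10.0)
print(labels)  # [(0.0, 3.0), (7.0, 9.0)]
```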

`simple_isolate`

Found in IsoAutio.py

This function uses the technique suggested by Irina Tolkova and implemented by Jacob Ayers. It attempts to produce automated annotations of an audio clip based on local score array outputs from a neural network. It is called by the isolate function when isolation_parameters['technique'] == 'simple'.

Parameter	Type	Description
`local_scores`	list of floats	Local scores of the audio clip as determined by Microfaune Recurrent Neural Network.
`SIGNAL`	list of ints	Samples that make up the audio signal.
`SAMPLE_RATE`	int	Sampling rate of the audio clip, usually 44100.
`audio_dir`	string	Directory of the audio clip.
`filename`	string	Name of the audio clip file.
`isolation_parameters`	dict	Python Dictionary that controls the various label creation techniques.
`manual_id`	string	Name of the class written to the Pandas Dataframe.

This function returns a dataframe of automated labels for the audio clip.

Usage: simple_isolate(local_scores, SIGNAL, SAMPLE_RATE, audio_dir, filename, isolation_parameters, manual_id)
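
A simple way to turn a local score array into annotations is to label each contiguous run of scores at or above the threshold. The sketch below assumes that reading; it is not confirmed to match PyHa's implementation, and the function contiguous_labels and its arguments are hypothetical. Labels are returned as (start, end) tuples in seconds.

```python
def contiguous_labels(local_scores, threshold, clip_duration):
    """Return one (start, end) label per run of scores at or above threshold."""
    step = clip_duration / len(local_scores)  # seconds per local score
    labels = []
    run_start = None
    for i, score in enumerate(local_scores):
        if score >= threshold and run_start is None:
            run_start = i  # a new run begins
        elif score < threshold and run_start is not None:
            labels.append((run_start * step, i * step))
            run_start = None
    if run_start is not None:
        # The clip ended while a run was still open.
        labels.append((run_start * step, clip_duration))
    return labels


scores = [0.1, 0.9, 0.8, 0.1, 0.7]
labels = contiguous_labels(scores, threshold=0.5, clip_duration=5.0)
print(labels)  # [(1.0, 3.0), (4.0, 5.0)]
```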

`stack_isolate`

Found in IsoAutio.py

This function uses a technique created by Jacob Ayers. It attempts to produce automated annotations of an audio clip based on local score array outputs from a neural network. It is called by the isolate function when isolation_parameters['technique'] == 'stack'.

Parameter	Type	Description
`local_scores`	list of floats	Local scores of the audio clip as determined by Microfaune Recurrent Neural Network.
`SIGNAL`	list of ints	Samples that make up the audio signal.
`SAMPLE_RATE`	int	Sampling rate of the audio clip, usually 44100.
`audio_dir`	string	Directory of the audio clip.
`filename`	string	Name of the audio clip file.
`isolation_parameters`	dict	Python Dictionary that controls the various label creation techniques.
`manual_id`	string	Name of the class written to the Pandas Dataframe.

This function returns a dataframe of automated labels for the audio clip.

Usage: stack_isolate(local_scores, SIGNAL, SAMPLE_RATE, audio_dir, filename, isolation_parameters, manual_id)

`chunk_isolate`

Found in IsoAutio.py

This function uses a technique created by Jacob Ayers. It attempts to produce automated annotations of an audio clip based on local score array outputs from a neural network. It is called by the isolate function when isolation_parameters['technique'] == 'chunk'.

Parameter	Type	Description
`local_scores`	list of floats	Local scores of the audio clip as determined by Microfaune Recurrent Neural Network.
`SIGNAL`	list of ints	Samples that make up the audio signal.
`SAMPLE_RATE`	int	Sampling rate of the audio clip, usually 44100.
`audio_dir`	string	Directory of the audio clip.
`filename`	string	Name of the audio clip file.
`isolation_parameters`	dict	Python Dictionary that controls the various label creation techniques.
`manual_id`	string	Name of the class written to the Pandas Dataframe.

This function returns a dataframe of automated labels for the audio clip.

Usage: chunk_isolate(local_scores, SIGNAL, SAMPLE_RATE, audio_dir, filename, isolation_parameters, manual_id)
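
A chunk-based approach can be sketched as follows: split the clip into pieces of chunk_size seconds and label a piece whenever any local score inside it reaches the threshold. This is an assumed reading of the technique for illustration, not PyHa's implementation; the function chunk_labels and its arguments are hypothetical, and labels are returned as (start, end) tuples in seconds.

```python
import math


def chunk_labels(local_scores, threshold, chunk_size, clip_duration):
    """Label each chunk_size-second chunk that contains a score at or above threshold."""
    scores_per_second = len(local_scores) / clip_duration
    n_chunks = math.ceil(clip_duration / chunk_size)
    labels = []
    for k in range(n_chunks):
        start_t = k * chunk_size
        end_t = min(clip_duration, start_t + chunk_size)  # last chunk may be shorter
        lo = int(start_t * scores_per_second)
        hi = int(end_t * scores_per_second)
        window = local_scores[lo:hi]
        if window and max(window) >= threshold:
            labels.append((start_t, end_t))
    return labels


# Eight local scores over an 8-second clip, split into 3-second chunks.
scores = [0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.6]
labels = chunk_labels(scores, threshold=0.5, chunk_size=3.0, clip_duration=8.0)
print(labels)  # [(0.0, 3.0), (6.0, 8.0)]
```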

`generate_automated_labels`

Found in IsoAutio.py

This function generates labels across a folder of audio clips determined by the model and other parameters specified in the isolation_parameters dictionary.

Parameter	Type	Description
`audio_dir`	string	Directory with wav audio files
`isolation_parameters`	dict	Python Dictionary that controls the various label creation techniques.
`manual_id`	string	Name of the class written to the Pandas Dataframe.
`weight_path`	string	File path of weights to be used by the RNNDetector for determining presence of bird sounds.
`normalized_sample_rate`	int	Sampling rate that the audio files should all be normalized to.
`normalize_local_scores`	boolean	Set whether or not to normalize the local scores.