<a href="https://colab.research.google.com/github/BenUCL/Reef-acoustics-and-AI/blob/main/Tutorial/2-Feature_Extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Machine learning with coral reef soundscape data**

This notebook is a supporting tutorial for the study **Unlocking the soundscape of coral reefs with artificial intelligence: pretrained networks and unsupervised learning win out** by [Williams et al (2024a)](https://www.biorxiv.org/content/10.1101/2024.02.02.578582v1). If you use any of these methods, please cite the article.

In this publication we recommend combining pretrained neural networks with unsupervised learning for analysing soundscape ecology.

## What this notebook does:

1. Set up: import the required packages, then download some sample data and the pretrained model weights.
2. Extract features from the audio data using the SurfPerch pretrained neural network.

## **SurfPerch: A pretrained neural network fine-tuned to coral reefs**

The study associated with this tutorial used VGGish. Here, however, we will use SurfPerch, a newly developed pretrained neural network fine-tuned to coral reefs, which we created in collaboration with Google DeepMind. It was created and rigorously tested on audio data from 16 unique datasets across 12 countries. You can read more about the network in its supporting research article **Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics**, [Williams et al (2024b)](https://arxiv.org/abs/2404.16436).

SurfPerch can also be used to rapidly identify individual sounds in your data, as opposed to the whole soundscape approach presented in **Unlocking the soundscape of coral reefs with artificial intelligence** by [Williams et al (2024a)](https://www.biorxiv.org/content/10.1101/2024.02.02.578582v1). See a full tutorial on identifying individual sounds [here](https://github.com/BenUCL/surfperch/blob/surfperch/SurfPerch_Demo_with_Calling_in_Our_Corals.ipynb).

## **Our sample data**

We'll use a small sample dataset for this tutorial. This data consists of 262 audio files from healthy, degraded and restored coral reefs in Indonesia. These reefs are part of the world's largest coral reef restoration program, [buildingcoral.com](https://www.buildingcoral.com/). See [Williams et al (2022)](https://doi.org/10.1016/j.ecolind.2022.108986) for more detail on this audio.

# **Step 1: Set up**

In [None]:
#@title Import packages
import os # for handling files and directories
import librosa # for audio processing
import tensorflow as tf # for machine learning
import tensorflow_hub as hub # for machine learning
import numpy as np # for numerical processing
import pandas as pd # for handling dataframes
from tqdm import tqdm # for progress bar

First, we'll define the directories for storing the audio files, model weights, and extracted features. 
You can modify these paths as needed.

In [None]:
#@title Set all filepaths
import os

# Directory containing audio files. If you have your own data, change this path.
AUDIO_DIR = 'audio'

# Directory to store the pre-trained model weights.
MODEL_DIR = 'model'

# Directory where the extracted features (as a CSV file) will be saved.
OUTPUT_DIR = 'outputs'

# Create the output directory (and any missing parent folders) if it doesn't exist.
os.makedirs(OUTPUT_DIR, exist_ok=True)

The following cell downloads a sample audio dataset.
If you want to use your own audio data, **do not run this cell**.
Instead, ensure your audio files are in the directory specified by `AUDIO_DIR`.
The audio files should be in WAV format.
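
As a quick check before extraction, the sketch below lists the WAV files in a folder. It assumes the files sit directly in the folder (no subfolders) and matches the `.wav` extension case-insensitively, as the extraction loop later in this notebook does. `list_wav_files` is a hypothetical helper name, not part of the tutorial's code.

```python
from pathlib import Path


def list_wav_files(audio_dir):
    """Return the sorted names of the WAV files directly inside audio_dir."""
    folder = Path(audio_dir)
    if not folder.is_dir():
        # Nothing to list if the folder has not been created yet
        return []
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )
```

Calling `list_wav_files(AUDIO_DIR)` once your audio is in place shows which files the extraction loop will pick up; an empty list suggests the path or the file format needs checking.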

In [None]:
%%bash -s "$AUDIO_DIR"

# Download sample audio data if it doesn't exist.
if [ ! -d "$1" ]; then
  # Create a temporary directory.
  tmp_dir=$(mktemp -d)

  # Download the sample audio data archive from Zenodo.
  wget --progress=bar:force:noscroll -P "${tmp_dir}" "https://zenodo.org/records/14841479/files/tutorial_sample_data.zip"

  # Extract the downloaded archive.
  unzip -q "${tmp_dir}/tutorial_sample_data.zip" -d "${tmp_dir}/data"

  # Create the target directory for the audio data.
  mkdir -p "$1"

  # Move the extracted sample data to the target directory.
  mv "${tmp_dir}"/data/sample_data/* "$1"

  # Clean up the temporary directory.
  rm -r "${tmp_dir}"
fi

The next cell downloads the pre-trained SurfPerch model weights.
Feel free to use your own pre-trained model if you have one.
You may need to make some adjustments to the model loading and usage sections if you use a different model.

In [None]:
%%bash -s "$MODEL_DIR"
# Download model weights if they don't exist.
if [ ! -d "$1" ]; then
  # Create a temporary directory.
  tmp_dir=$(mktemp -d)
  
  # Download the model weights archive from Zenodo.
  wget --progress=bar:force:noscroll -P "${tmp_dir}" "https://zenodo.org/records/11071202/files/SurfPerch_v1.0.zip"

  # Extract the downloaded archive.
  unzip -q "${tmp_dir}/SurfPerch_v1.0.zip" -d "${tmp_dir}/model"

  # Create the target directory for the model weights.
  mkdir -p "$1"

  # Move the extracted model weights to the target directory.
  # Specifically, the savedmodel directory.
  mv "${tmp_dir}"/model/SurfPerch_v1.0/savedmodel/* "$1"

  # Clean up the temporary directory.
  rm -r "${tmp_dir}"
fi

### Load the SurfPerch neural network model

Check that the saved model folder (`model` by default, set by `MODEL_DIR`) is present in your working directory. In it you should see:

```bash
 assets	ckpt.txt  fingerprint.pb  saved_model.pb  variables
```
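
The same check can be sketched in Python. The entry names below are copied from the listing above; `missing_model_entries` is a hypothetical helper, and other model exports may lay out their files differently.

```python
import os

# Entries listed above for the SurfPerch saved model folder
EXPECTED_ENTRIES = ["assets", "ckpt.txt", "fingerprint.pb", "saved_model.pb", "variables"]


def missing_model_entries(model_dir, expected=EXPECTED_ENTRIES):
    """Return the expected entries that are not present in model_dir."""
    present = set(os.listdir(model_dir)) if os.path.isdir(model_dir) else set()
    return [name for name in expected if name not in present]
```

An empty list from `missing_model_entries('model')` means every listed entry was found. Any names returned point to an incomplete download or extraction.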

In [None]:
# Check model is present
!ls "$MODEL_DIR"

In [None]:
# We will load the pretrained neural net as 'model'
model = tf.saved_model.load(MODEL_DIR)

# **Step 2: Extract features with the neural net**

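The next cell defines the helper functions that loop over each audio file, resample it to 32 kHz, split it into 5 s segments and extract features from each segment using the pretrained neural network.

The results are collected in a pandas dataframe, similar to a dataframe in R. Later in the notebook this dataframe is saved as `surfperch_feature_embeddings.csv` in the output directory, which should appear in the file tab on the left.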

In [None]:
#@title Define helper functions for inference
def get_sample_rate(file_path):
    # Read the sample rate from the file header without decoding the whole recording
    return librosa.get_samplerate(file_path)


def resample_and_split_audio(file_path, original_sr, target_sr=32000, segment_duration=5):
    audio, _ = librosa.load(file_path, sr=original_sr)  # Load with original sample rate
    audio = librosa.resample(audio, orig_sr=original_sr, target_sr=target_sr)  # Resample to 32kHz
    segments = []

    segment_length = target_sr * segment_duration
    total_segments = len(audio) // segment_length

    for i in range(total_segments):
        start = i * segment_length
        end = start + segment_length
        segments.append(audio[start:end])

    return segments


def process_audio_files(audio_dir, model):
    rows_list = []
    original_sr = None

    # Loop through every file in audio_dir
    for filename in tqdm(sorted(os.listdir(audio_dir)), desc="Processing audio files"):  # sorted for a reproducible row order
        if filename.lower().endswith('.wav'):
            file_path = os.path.join(audio_dir, filename)

            # Check if the sample rate has not been set yet
            if original_sr is None:
                original_sr = get_sample_rate(file_path)  # Get the sample rate from the first file

            try:
                segments = resample_and_split_audio(file_path, original_sr=original_sr)

                for i, segment in enumerate(segments):
                    # Model expects batch dimension, so use np.newaxis to add it
                    logits, embeddings = model.infer_tf(segment[np.newaxis, :])

                    embedding = embeddings.numpy()[0]

                    embedding_index = i + 1
                    row_data = {'filename': filename, 'embedding_index': embedding_index}
                    for j, feature in enumerate(embedding):
                        row_data[f'feature_{j}'] = feature
                    rows_list.append(row_data)
            except Exception as e:
                print(f"An error occurred while processing file: {filename}. Error: {e}")

    features_df = pd.DataFrame(rows_list)
    return features_df

## Run feature extraction and save results to a csv

This will run much faster on a GPU instance of Google Colab. Check your runtime type if unsure.

In [None]:
# Extract the features
features_df = process_audio_files(AUDIO_DIR, model)

# Save results to output dir
features_df_path = os.path.join(OUTPUT_DIR, 'surfperch_feature_embeddings.csv')
features_df.to_csv(features_df_path, index=False)

Check your working directory for the `outputs/` folder.
Inside, you'll find the `surfperch_feature_embeddings.csv` file.
If you're using Google Colab, you'll need to download this file to your computer or copy it to your Google Drive.
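
Before moving the file elsewhere, it can be worth confirming that it was written as expected. The sketch below uses only Python's standard `csv` module (`pd.read_csv` would work equally well). `summarise_features_csv` is a hypothetical helper, and it assumes the column naming produced by the extraction code above (`filename`, `embedding_index`, then `feature_*`).

```python
import csv


def summarise_features_csv(csv_path):
    """Count rows, distinct files and feature columns in a features CSV."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file, so fall back to an empty list
        fieldnames = reader.fieldnames or []
        feature_cols = [c for c in fieldnames if c.startswith("feature_")]
        n_rows = 0
        filenames = set()
        for row in reader:
            n_rows += 1
            filenames.add(row["filename"])
    return {"rows": n_rows, "files": len(filenames), "features": len(feature_cols)}
```

For SurfPerch embeddings the `features` count should be 1280, matching the columns feature_0 to feature_1279.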

In [19]:
#@title Take a peek at the features dataframe

features_df

index,filename,embedding_index,feature_0,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,feature_7,...,feature_1270,feature_1271,feature_1272,feature_1273,feature_1274,feature_1275,feature_1276,feature_1277,feature_1278,feature_1279
0,D.SaF3.1355.805322778.180829.1.5.wav,1,0.095266,-0.043760,-0.186835,-0.065721,-0.135039,0.027893,0.030612,0.336429,...,0.009653,-0.008715,-0.053117,-0.002967,-0.015146,-0.008182,0.339541,0.042919,0.064481,0.010667
1,D.SaF3.1355.805322778.180829.1.5.wav,2,-0.054386,-0.070658,-0.062030,-0.026162,-0.055553,0.048730,0.084917,0.261091,...,0.033196,-0.021291,-0.043879,0.017621,-0.015038,0.005198,-0.079819,0.041414,0.061947,0.017771
2,D.SaF3.1355.805322778.180829.1.5.wav,3,-0.013306,-0.028767,-0.042732,0.045478,-0.086306,0.041371,0.079206,0.216338,...,0.050451,0.016551,-0.128548,0.028641,-0.015715,0.001647,-0.069922,0.030726,0.002214,0.012486
3,D.SaF3.1355.805322778.180829.1.5.wav,4,0.128021,-0.055525,0.031539,-0.026539,-0.077078,0.055511,0.033902,0.314529,...,0.015063,0.009737,-0.015940,0.034475,-0.019154,0.023457,0.255913,0.038950,0.007818,0.011662
4,D.SaF3.1355.805322778.180829.1.5.wav,5,0.035347,-0.069571,-0.005956,-0.024740,-0.112405,0.024988,0.090412,0.186201,...,0.039776,0.023733,0.054703,0.050067,-0.005310,0.010681,-0.042922,0.042997,0.055794,0.012468
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3139,H.BaF3.0915.1678278701.180829.2.30.wav,8,0.052924,-0.028343,-0.041132,-0.069430,-0.073702,0.088496,-0.091703,-0.036165,...,-0.006155,-0.065905,-0.150666,0.093934,-0.009525,0.030543,-0.086864,0.045006,0.010965,0.037594
3140,H.BaF3.0915.1678278701.180829.2.30.wav,9,-0.059150,-0.055043,-0.055807,0.073265,-0.052938,0.115610,0.009453,0.115414,...,0.041878,-0.031102,-0.159292,-0.062542,-0.017308,0.012166,-0.128742,0.028227,-0.013661,0.013006
3141,H.BaF3.0915.1678278701.180829.2.30.wav,10,-0.029610,-0.038658,-0.108122,-0.061626,0.116923,0.025992,0.068813,0.105473,...,-0.005419,-0.067728,-0.186287,0.004383,-0.026162,0.011837,-0.088420,0.026192,0.010454,-0.012032
3142,H.BaF3.0915.1678278701.180829.2.30.wav,11,-0.042929,-0.036163,-0.095022,-0.037912,-0.041864,0.154841,-0.023873,0.189126,...,0.013159,-0.125300,-0.210276,0.023335,-0.007543,0.031137,0.002487,0.033958,-0.011404,0.001926


## **Finished!**

You should see a results table that contains:
1. A 'filename' column listing the audio files in our sample data.
2. An 'embedding_index' column. The audio files are cut into 5 s chunks for SurfPerch, and this column gives the chunk that each row corresponds to.
3. Feature columns running from feature_0 to feature_1279. Each 5 s chunk is now represented by these highly informative (to a machine) feature embeddings.
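
The relationship between a recording's length and its 'embedding_index' values can be sketched with plain arithmetic. This mirrors the splitting logic in the helper functions above (32 kHz audio, 5 s segments, any incomplete final segment dropped, indices starting at 1). `segment_bounds` is a hypothetical helper for illustration only.

```python
def segment_bounds(n_samples, sample_rate=32000, segment_duration=5):
    """Return (embedding_index, start, end) for each full segment."""
    segment_length = sample_rate * segment_duration  # 160,000 samples at 32 kHz
    total_segments = n_samples // segment_length     # incomplete tail is dropped
    return [
        (i + 1, i * segment_length, (i + 1) * segment_length)
        for i in range(total_segments)
    ]


# A 12.5 s clip at 32 kHz has 400,000 samples: two full 5 s segments,
# with the final 2.5 s (80,000 samples) left unused.
print(segment_bounds(400_000))
# [(1, 0, 160000), (2, 160000, 320000)]
```

One consequence is that a recording shorter than 5 s produces no rows at all, so very short files will be absent from the table.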