# **Classification of Birds based on audio of their call**

# § I. Introduction

Bird species classification based on audio recordings has significant implications for ecological monitoring, biodiversity studies, and conservation efforts, and it is also of interest to bird watchers and nature lovers. This project focuses on developing a robust machine learning pipeline capable of identifying bird species from an audio recording and the basic metadata that a cell phone can capture with it. By leveraging audio signal processing techniques and modern classification algorithms, the project aims to create a reliable and scalable solution for bird call recognition.

## A. Overview of project goals

The primary objectives of this project are:

### 1. Data Collection and Integration:
Analyze, preprocess, and merge audio datasets containing bird calls to create a comprehensive, high-quality dataset suitable for training machine learning models.
### 2. Feature Engineering:
Extract meaningful features from raw audio signals, including frequency-domain and time-frequency representations, and reprocess the signal so that the target sound is its loudest part.
### 3. Model Training and Evaluation:
Train and compare multiple classification models, including neural networks and traditional classification algorithms, to identify the most effective approach for bird call classification.
### 4. Practical Application:
Build a functional tool capable of processing new audio files to classify bird species throughout an audio file.

# § II. Data Preparation and Exploration

## A. Initial Setup

### 1. Building Virtual Environment
The first step was to establish a virtual environment with the required libraries. The environment was created with `micromamba` so that all dependencies, libraries, and tools used in the project are isolated from the host system. This approach prevents version conflicts and simplifies collaboration and deployment.

```bash
micromamba create -p ./.conda -c conda-forge conda pydub ffmpeg cupy pytorch
```

### 2. Importing Libraries
From these libraries and the standard library, we can now import the specific components needed for this project.

In [4]:
import time, os, asyncio, sys
import pandas as pd
import numpy as np
import scipy as sp
import cupy as cp
import cupyx.scipy.fft as cufft
import cupyx.scipy.signal as cusig
from IPython.display import display, HTML
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
import matplotlib.pyplot as plt
from pydub import AudioSegment
from pydub.utils import mediainfo


#### a) Overview of Libraries and their purposes

##### (i) `pydub`
`pydub` is a library for simple processing of audio files. It provides a straightforward way to decode mp3 files in Python and to manipulate the audio data at a basic level. A `pydub` audio segment can be sliced like a Python list whose index is the number of milliseconds from the beginning of the file. With the addition of another library, `simpleaudio`, audio can be played directly from Python, but in this project all audio files will be exported and displayed with HTML in markdown cells.
```python
# loads audio file
sound = AudioSegment.from_file(filename, format='wav')

# cuts the sound to only the first second of the file
sound = sound[:1000]

# returns the signal array for more nuanced signal manipulation
signal = sound.get_array_of_samples()

# returns a list of mono sounds for each channel in the audio file
sounds = sound.split_to_mono()
```
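The relationship between millisecond slicing and the raw sample array can be sketched without any audio file. The snippet below is only an illustration: it uses a hand-made `array.array` as a stand-in for the result of `get_array_of_samples()` on a stereo file, and the frame rate, sample values, and the helper `ms_to_sample_index` are hypothetical. It relies on the fact that `pydub` stores multi-channel samples interleaved (left, right, left, right, ...).

```python
from array import array

# Stand-in for sound.get_array_of_samples() on a stereo file (made-up values).
# Samples are interleaved: L0, R0, L1, R1, ...
channels = 2
frame_rate = 4  # frames per second; tiny hypothetical value for readability
samples = array('h', [10, -10, 20, -20, 30, -30, 40, -40])

# De-interleave: every `channels`-th sample belongs to the same channel
left = samples[0::channels]
right = samples[1::channels]


def ms_to_sample_index(ms, frame_rate, channels):
    """Index into the interleaved array of the first sample at `ms` milliseconds."""
    return int(ms * frame_rate / 1000) * channels


# The first 500 ms is 2 frames, which is 4 interleaved samples
first_half_second = samples[:ms_to_sample_index(500, frame_rate, channels)]

print(list(left), list(right), list(first_half_second))
```

On a real `AudioSegment`, `sound[:500]` and `sound.split_to_mono()` do this bookkeeping for you; the sketch only shows what the raw array looks like underneath.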
Metadata of audio files is also accessed with pydub either through the `pydub.AudioSegment` object or through the `pydub.utils.mediainfo` function.
```python
# the length of the audio file in milliseconds
len(sound)

# the sample rate of the audio file
sound.frame_rate

# the number of channels in the audio file
sound.channels

# returns the name of the format used to store the audio file
mediainfo(filename)['format_name']

# returns the sample format, needed to get the bit depth of the file
mediainfo(filename)['sample_fmt']
```
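The `sample_fmt` string still has to be translated into a bit depth. A minimal sketch of that step follows, assuming the value is one of the standard `ffmpeg` sample format names (`u8`, `s16`, `s32`, `s64`, `flt`, `dbl`, optionally with a trailing `p` for planar layout). The helper name `bit_depth_from_sample_fmt` is hypothetical and not part of `pydub`.

```python
# Bits per sample for the sample format names that ffmpeg reports.
# A trailing 'p' marks a planar layout and does not change the bit depth.
_BITS_BY_SAMPLE_FMT = {
    'u8': 8,
    's16': 16,
    's32': 32,
    's64': 64,
    'flt': 32,
    'dbl': 64,
}


def bit_depth_from_sample_fmt(sample_fmt):
    """Return the bits per sample for an ffmpeg sample format name."""
    base = sample_fmt[:-1] if sample_fmt.endswith('p') else sample_fmt
    if base not in _BITS_BY_SAMPLE_FMT:
        raise ValueError(f'Unrecognized sample format: {sample_fmt!r}')
    return _BITS_BY_SAMPLE_FMT[base]


# Hypothetical usage: bit_depth_from_sample_fmt(mediainfo(filename)['sample_fmt'])
print(bit_depth_from_sample_fmt('s16'))   # 16
print(bit_depth_from_sample_fmt('fltp'))  # 32
```

For compressed formats such as mp3, the reported sample format describes the decoded samples rather than anything stored in the file, so the result is best read as the decoder's output precision. The loaded `AudioSegment` also exposes `sound.sample_width`, the number of bytes per sample after decoding.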


##### (ii) `ffmpeg`
`ffmpeg` is a powerful tool for handling a wide variety of media formats, including audio and video. Within this environment, `ffmpeg` serves as the backend for `pydub` to decode `.mp3` files. By default, when an `.mp3` file is loaded using `pydub`, `ffmpeg` is called automatically to handle the decoding. This integration ensures that the project can work seamlessly with compressed audio formats like `.mp3`.
```python
# decodes the mp3 with ffmpeg and returns the AudioSegment object as normal
sound = AudioSegment.from_file(filename, format='mp3')
```

##### (iii) `cupy`
`cupy` is a GPU-accelerated library that provides a drop-in replacement for the `numpy` and `scipy` libraries, leveraging NVIDIA's CUDA toolkit for GPU-accelerated parallel processing. This allows for significant performance improvements when performing computationally intensive tasks, such as Fourier transformations, matrix operations, and large-scale data manipulations.

In this project, `cupy` was particularly useful for processing large audio datasets and accelerating tasks like feature extraction and signal transformations, where CPU-based computation would be too slow.
```python
import cupy as cp

# Create a large random array on the GPU
gpu_array = cp.random.rand(1000000)

# Perform fast computations on the GPU
mean_value = cp.mean(gpu_array)
```
Arrays in `cupy` match the syntax of `numpy.ndarray` objects, and `scipy` methods can be called via the `cupyx.scipy` library.

**CUDA-Specific Features**

While `cupy` emulates `numpy` and `scipy` functionality, it also offers CUDA-specific tools to optimize memory and computation. These are critical for managing GPU resources efficiently when working with large datasets or computationally intensive tasks.

 - **Device Management:** `cupy` supports multi-GPU setups and allows explicit control over which GPU device is used:

In [None]:
gpu = cp.cuda.Device(0)  # Access the first GPU
gpu.use()  # Set it as the active GPU

 - **Memory Management**: Memory allocation and deallocation are handled through memory pools to reduce overhead during GPU memory operations. The default memory pool can be configured to limit VRAM usage:

In [None]:
# Set the memory pool limit to 75% of the total VRAM
vram_pool = cp.get_default_memory_pool()
with gpu:
    vram_pool.set_limit(fraction=0.75)

# Clear the memory pool
vram_pool.free_all_blocks()

    Setting an explicit limit makes memory problems surface as clear out-of-memory errors that stop the program, which speeds up debugging. `cupy` runs without a manual limit, but memory overflows are then more likely to cause failures whose error messages do not explain the cause.

 - **FFT Plans with `cupy.cuda.cufft`:** `cupy` provides access to CUDA's FFT library (`cuFFT`), which allows FFT operations to be planned and optimized in advance. Each FFT plan is built from the shape of the input array. To improve performance and avoid recalculating plans, the resulting FFT plan is automatically cached in the default memory pool. To perform an FFT without caching its plan, create a plan manually and pass it to the FFT function; the plan cache can also be cleared at any point.
```python
with cufft.get_fft_plan(signal, value_type='R2C'):  # Create FFT plan and pass into FFT function
    gpu_fft = cufft.rfft(signal, overwrite_x=True)  # This does not cache FFT plan

# or clear the cache at any point
cp.fft.config.clear_plan_cache()
```
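The `R2C` value type above refers to a real-to-complex transform. As a CPU-side illustration only (using `numpy` rather than `cupy`, with a made-up test tone and sample rate), the sketch below shows what such a transform returns: for a real input of length `n`, only the `n // 2 + 1` non-negative frequency bins are kept.

```python
import numpy as np

sample_rate = 8000  # Hz; hypothetical value chosen for this illustration
t = np.arange(sample_rate) / sample_rate  # one second of sample times
signal = np.sin(2 * np.pi * 440 * t)  # synthetic 440 Hz test tone

# Real-to-complex FFT: the negative-frequency half is redundant for real
# input, so only n // 2 + 1 complex bins are returned
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)

# Frequency of the strongest bin
peak_hz = freqs[np.argmax(np.abs(spectrum))]

print(spectrum.shape, peak_hz)
```

The same calls exist under `cupy.fft` and `cupyx.scipy.fft`, so the GPU version differs mainly in where the arrays live.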


##### (iv) `pytorch`
`pytorch` is a widely used library for building and training neural networks, chosen here because of its simple integration with `numpy` arrays. Most of the models trained in this notebook could be built with libraries from the Anaconda collection, but `pytorch` makes it possible to build customizable networks and networks of different forms, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).

## B. Dataset Exploration and Enhancement

### 1. Initial Dataset Assessment

#### a) Downloading and Viewing Primary Dataset
The first data source is from a [Kaggle competition](https://www.kaggle.com/c/birdsong-recognition). This dataset provides pre-organized bird audio data, structured as `.mp3` files grouped by bird species. Each species is identified by a unique eBird code, a standard maintained by the Cornell Lab of Ornithology.

The dataset can also be downloaded with the Kaggle CLI after accepting the competition terms on a valid Kaggle account:
```bash
kaggle competitions download -c birdsong-recognition
```
The dataset directory structure is as follows:
```log
.
├── example_test_audio
│  ├── BLKFR-10-CPL_20190611_093000.pt540.mp3
│  └── ORANGE-7-CAP_20190606_093000.pt623.mp3
├── example_test_audio_metadata.csv
├── example_test_audio_summary.csv
├── sample_submission.csv
├── test.csv
├── train.csv
└── train_audio
   ├── aldfly
   │  ├── XC2628.mp3
   │  └── ...
   ├── ameavo
   │  ├── XC99571.mp3
   │  └── ...
   └── ...
```
Key components of the dataset:
 - `train_audio/`: Audio files organized by bird species (eBird codes as folder names).
 - `train.csv`: Metadata about each audio file, including eBird codes, audio duration, and recording location.


In [None]:
df = pd.read_csv('data/birdsong-recognition/train.csv')
print(df.columns)
df.head()

The dataset lacks full taxonomy information, so birds cannot be classified at higher taxonomic levels (e.g., family or order), which may be easier targets than individual species.

### 2. Acquisition and Integration of Additional Data

#### a) Acquiring Secondary Dataset and Verifying Dataset Compatibility
To supplement the primary dataset, a taxonomy dataset was downloaded from the [Cornell Lab of Ornithology](https://www.birds.cornell.edu/clementschecklist/introduction/updateindex/october-2023/download/). This dataset includes hierarchical taxonomic information for each bird species, such as family, order, and genus, as well as a human-readable species group.

In [None]:
taxonomy = pd.read_csv('data/ebird_taxonomy_v2023.csv')
print(taxonomy.columns)
taxonomy.head()

To ensure compatibility between datasets:
 - **Ensuring uniqueness**: Verified that the `SPECIES_CODE` column in the taxonomy dataset contains unique values.

In [None]:
total_species_codes = len(taxonomy['SPECIES_CODE'])
unique_species_codes = taxonomy['SPECIES_CODE'].nunique()
print('All values in SPECIES_CODE are unique:', total_species_codes == unique_species_codes)
del total_species_codes, unique_species_codes

 - **Ensuring consistent use of the eBird code**: Verified that where the `ebird_code` matches between the datasets, the species also matches.

###### (This is done by merging the datasets, which is considerably faster than iterating over both.)

In [None]:
# Merge the taxonomy data with the training data
merged_df = df.merge(taxonomy, left_on='ebird_code', right_on='SPECIES_CODE', how='left', indicator=True)
# Get the ebird_codes that did not merge or have a scientific name mismatch between the two datasets
bad_codes = merged_df[(merged_df['_merge'] == 'left_only') | (merged_df['sci_name'] != merged_df['SCI_NAME'])]['ebird_code'].unique()

# Iterate through the bad codes and print out the scientific names that do not match
for code in bad_codes:
    if code not in taxonomy['SPECIES_CODE'].values:
        print(code, "not found in taxonomy")
    else:
        birdsong_df_species = merged_df[merged_df['ebird_code'] == code]['sci_name'].unique()
        taxonomy_species = taxonomy[taxonomy['SPECIES_CODE'] == code]['SCI_NAME'].unique()
        if len(birdsong_df_species) == 1 and len(taxonomy_species) == 1:
            print(f'{code}: {birdsong_df_species[0]} != {taxonomy_species[0]}')
        if len(birdsong_df_species) == 0:
            print(f'{code}: No scientific name in training data')
        if len(taxonomy_species) == 0:
            print(f'{code}: No scientific name in taxonomy data')
        if len(birdsong_df_species) > 1:
            print(f'{code}: Multiple scientific names in training data')
        if len(taxonomy_species) > 1:
            print(f'{code}: Multiple scientific names in taxonomy data')

del merged_df, bad_codes
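
What the merge-based check flags can be seen on a small example. The sketch below uses made-up toy rows in place of `train.csv` and the taxonomy file; only the column names come from the datasets above, and the `validate='many_to_one'` argument is an optional addition that makes `pandas` raise an error if `SPECIES_CODE` is not unique.

```python
import pandas as pd

# Toy stand-ins for the training metadata and the taxonomy (values are made up)
train_toy = pd.DataFrame({
    'ebird_code': ['aldfly', 'aldfly', 'ameavo', 'zzfake'],
    'sci_name': ['Empidonax alnorum', 'Empidonax alnorum', 'Old name', 'Unknown bird'],
})
taxonomy_toy = pd.DataFrame({
    'SPECIES_CODE': ['aldfly', 'ameavo'],
    'SCI_NAME': ['Empidonax alnorum', 'Recurvirostra americana'],
})

# Left merge keeps every training row; `_merge` records whether a match was found
merged_toy = train_toy.merge(
    taxonomy_toy,
    left_on='ebird_code',
    right_on='SPECIES_CODE',
    how='left',
    indicator=True,
    validate='many_to_one',  # fails loudly if SPECIES_CODE has duplicates
)

# Codes that have no taxonomy entry at all
unmatched = merged_toy.loc[merged_toy['_merge'] == 'left_only', 'ebird_code'].unique().tolist()

# Codes that matched but disagree on the scientific name
matched = merged_toy[merged_toy['_merge'] == 'both']
mismatched = matched.loc[matched['sci_name'] != matched['SCI_NAME'], 'ebird_code'].unique().tolist()

print('Not in taxonomy:', unmatched)
print('Name mismatch:', mismatched)
```

Separating the two cases up front avoids the per-code lookups in the loop, at the cost of slightly less detailed messages.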

#### b) Merging The Datasets

## C. Data Cleaning and Preprocessing

### 1. Data Cleaning

#### a) Removing Unnecessary Columns

#### b) Removing Outliers

#### c) Removing Problematic Sample Rates

#### d) Audio Metadata Verification

### 2. Data Preprocessing

#### a) Normalizing Date and Time values

#### b) Normalizing Latitude, Longitude and Elevation values

### 3. Final Columns

#### a) Explanation for Feature Columns

#### b) Explanation for Target Columns

#### c) Explanation for Remaining Columns

# § III. Establishing Utility Functions

## A. Simple Aliases

## B. Data Conversions

## C. Process Monitoring

# § IV. Audio Processing Techniques

## A. Audio Metering and Signal Analysis

### 1. Understanding Different Scales

#### a) Decibel (dB)

#### b) Frequency Scaling

##### (i) Mel Scale

##### (ii) Bark Scale

##### (iii) Equivalent Rectangular Bandwidth Scale (ERB)

##### (iv) Visualizing Different Scales

### 2. Plotting Waveforms

#### a) RMS Calculations

### 3. Understanding Different Signal Transformations

#### a) The Fourier Transformation

##### (i) Understanding The Complex Result

##### (ii) Advantage of GPU Parallelization when Performing Fourier Transformations

##### (iii) Grouping Frequencies Together

##### (iv) Example Frequency Spectrums

#### b) The Short-Time Fourier Transformation (STFT)

##### (i) Advantage of GPU Parallelization when Performing Short-Time Fourier Transformations

##### (ii) The Inverse Short-Time Fourier Transformation

##### (iii) The Constant OverLap Add (COLA) Constraint

##### (iv) The Nonzero OverLap Add (NOLA) Constraint

##### (v) Grouping Frequencies Together

##### (vi) Example Spectrograms

#### c) The Hilbert Transformation

##### (i) Understanding the Analytic Signal

##### (ii) Advantage of GPU Parallelization when Performing Hilbert Transformations

##### (iii) Extracting the Real Signal

##### (iv) Extracting the Hilbert Envelope

##### (v) Extracting the Instantaneous Phase Angle

##### (vi) Example Plots of Hilbert Envelopes

##### (vii) Example Plots of Signal Phase Angles

## B. Generation of Simple Frequency Representation Dataset

### 1. Performing the Bulk Calculation

### 2. Visualizations of Resulting Data

#### a) Individual Frequency Spectrums

#### b) Heatmaps of Frequency Spectrums

## C. Audio Signal Processing Techniques

### 1. Reduction of the Noise Floor

#### a) Issues Caused by Noise Floor

#### b) Method of Noise Reduction

#### c) Noise Reduction Examples

### 2. Reduction of Momentary Clicks

#### a) Issues Caused by Clicks

#### b) Means of Detecting Clicks

##### (i) Peak Detection

##### (ii) Calculating the Click-Sensitive Signal

#### c) Reducing Magnitude around Clicks

### 3. Reduction of Transient Response

#### a) Introduction to Signal Transients

#### b) Issues Caused by Transients

#### c) Calculating the Transient-Sensitive Signal

#### d) Reducing Magnitude around Transients

### 4. Segmentation of Audio into Normalized Windows

## D. Generation of Filtered Frequency Representation Dataset

## E. Generation of Filtered Time-Frequency Representation Dataset

# § V. Model Training and Evaluation

## A. Simple Frequency Representation Dataset

## B. Filtered Frequency Representation Dataset

# § VI. Advanced Model Training and Species Detection

## A. Training a Convolutional Neural Network (CNN) Against the Filtered Time-Frequency Representation Dataset

## B. Model Training with Species as a Target

## C. Training a Convolutional Neural Network (CNN) with Species as a Target

# § VII. Practical Application and Use of Given Test Data

## A. Recreating the Audio Processing Pipeline

## B. Making the Processing Function

## C. Testing Against the Given Test Data