Elaenia is a collection of transfer learning experiments for identifying bird species in audio recordings. In all experiments, the classifier is trained on a lower-dimensional embedding of the input audio, computed by a publicly released pretrained model (either VGGish or BirdNET).
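The transfer-learning setup can be illustrated with a toy example. This is not code from Elaenia. `embed` is a hypothetical stand-in for a frozen pretrained model such as VGGish: a fixed random projection whose weights are never updated. The inputs are synthetic feature vectors rather than audio, and a nearest-centroid classifier is swapped in for simplicity. Only the classifier is fitted, and it only ever sees embeddings.

```python
import math
import random

EMBEDDING_DIM = 8  # toy size; real VGGish embeddings have 128 dimensions
N_FEATURES = 32    # toy size of the raw input features

# A fixed random projection stands in for a frozen, pretrained embedding model.
# Its weights are never updated: only the classifier below is trained.
_rng = random.Random(0)
_W = [[_rng.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(EMBEDDING_DIM)]


def embed(features):
    """Hypothetical frozen embedding: project raw features to EMBEDDING_DIM values."""
    return [sum(w * x for w, x in zip(row, features)) for row in _W]


def fit_centroids(embeddings, labels):
    """Train a nearest-centroid classifier: average the embeddings of each class."""
    grouped = {}
    for e, y in zip(embeddings, labels):
        grouped.setdefault(y, []).append(e)
    return {y: [sum(col) / len(es) for col in zip(*es)] for y, es in grouped.items()}


def predict(centroids, embedding):
    """Predict the class whose centroid is nearest in embedding space."""
    return min(centroids, key=lambda y: math.dist(centroids[y], embedding))


def make_example(label, rng):
    """Toy raw features (not audio): the two classes differ only in their mean."""
    offset = 1.0 if label == "species_a" else -1.0
    return [offset + rng.gauss(0, 0.1) for _ in range(N_FEATURES)]


# Train on embeddings of toy examples, then evaluate on fresh ones.
rng = random.Random(1)
labels = ["species_a", "species_b"] * 20
centroids = fit_centroids([embed(make_example(y, rng)) for y in labels], labels)
test_labels = ["species_a", "species_b"] * 5
predictions = [predict(centroids, embed(make_example(y, rng))) for y in test_labels]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
```

In a real experiment, the frozen model does the heavy lifting of turning raw audio into informative features. This is why a small classifier trained on relatively few labeled recordings can still work.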
The experiments are implemented using Sylph, a bare-bones ML training pipeline library that I wrote for this project. Sylph lets you define a pipeline as a series of data transformation and feature extraction steps, followed by a final step that trains the classifier.
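The underlying pattern is a chain of transforms followed by a learner. It can be sketched in plain Python as follows. This is a hypothetical illustration, not Sylph's implementation. The class names mirror Sylph's `Compose` and `TrainingPipeline`, but their internals are assumptions made for illustration. The same applies to treating a dataset as a list of `(input, label)` pairs and to the toy `scale`, `mean_feature` and `majority_learner` functions.

```python
from typing import Any, Callable, Sequence

Transform = Callable[[Any], Any]


class Compose:
    """Apply a sequence of transforms to one example, in order (hypothetical sketch)."""

    def __init__(self, transforms: Sequence[Transform]):
        self.transforms = list(transforms)

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
        return data


class TrainingPipeline:
    """Extract features from every example, then train a learner on them."""

    def __init__(self, transform: Transform, learn: Callable[[list, list], Any]):
        self.transform = transform
        self.learn = learn

    def run(self, dataset):
        # `dataset` is assumed here to be a list of (input, label) pairs.
        features = [self.transform(x) for x, _ in dataset]
        labels = [y for _, y in dataset]
        return self.learn(features, labels)


def scale(xs):
    """Toy transform: divide by the peak absolute value."""
    peak = max(abs(x) for x in xs) or 1.0
    return [x / peak for x in xs]


def mean_feature(xs):
    """Toy feature extractor: reduce a sequence to its mean."""
    return sum(xs) / len(xs)


def majority_learner(features, labels):
    """Toy learner: the 'model' is simply the most common label."""
    return max(set(labels), key=labels.count)


pipeline = TrainingPipeline(transform=Compose([scale, mean_feature]), learn=majority_learner)
model = pipeline.run([([1.0, 2.0], "a"), ([3.0, -6.0], "b"), ([0.5, 0.5], "a")])
```

Keeping feature extraction separate from the learner means either side can be swapped. For example, VGGish embeddings can be replaced with BirdNET embeddings without touching the classifier step.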
The following Sylph code defines a pipeline that converts the raw audio to 16-bit samples with normalized amplitude, computes the spectrogram, computes the VGGish embeddings, applies whitened PCA to them, trains an SVM classifier, and computes metrics on the test set:
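```python
from sylph.learners.svm import SVMLearner
from sylph.pipeline import Compose
from sylph.pipeline import TrainingPipeline
from sylph.transforms.audio import Audio2Audio16Bit
from sylph.transforms.pca import PCA
from sylph.transforms.vggish import Audio2Spectrogram
from sylph.transforms.vggish import Spectrogram2VGGishEmbeddings

pipeline = TrainingPipeline(
    transform=Compose(
        [
            # Convert raw audio to 16-bit samples, normalizing amplitude.
            Audio2Audio16Bit(normalize_amplitude=True),
            # Compute the spectrogram that VGGish takes as input.
            Audio2Spectrogram(),
            # Map each spectrogram to VGGish embeddings.
            Spectrogram2VGGishEmbeddings(),
            # Decorrelate and rescale the embeddings.
            PCA(whiten=True),
        ]
    ),
    # Final step: train an SVM classifier on the transformed features.
    learn=SVMLearner(),
)

# `dataset` is assumed to be a labeled audio dataset loaded beforehand.
output = pipeline.run(dataset)
metrics = pipeline.get_metrics(dataset, output)
```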
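Sylph's `PCA(whiten=True)` step presumably applies standard whitened PCA to the embeddings before the SVM sees them. Its implementation is not shown here, so the following NumPy sketch is an assumption about what it computes. Whitening projects the centered data onto its principal axes and divides each component by its standard deviation. The SVM's input features are then uncorrelated and on a common scale. The helper names `fit_whitened_pca` and `transform` are hypothetical, and the toy data stands in for real embeddings.

```python
import numpy as np


def fit_whitened_pca(X, n_components=None, eps=1e-8):
    """Fit whitened PCA on the rows of X (n_samples x n_features).

    Returns (mean, components, scale) such that
    (X - mean) @ components.T / scale has uncorrelated, unit-variance columns.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    if n_components is not None:
        s, Vt = s[:n_components], Vt[:n_components]
    # Standard deviation along each principal axis (eps guards against division by zero).
    scale = s / np.sqrt(max(len(X) - 1, 1)) + eps
    return mean, Vt, scale


def transform(X, mean, components, scale):
    """Project onto the principal axes and rescale each component to unit variance."""
    return (np.asarray(X, dtype=float) - mean) @ components.T / scale


# Toy stand-in for embeddings: correlated 2-D Gaussian data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
mean, components, scale = fit_whitened_pca(X)
Z = transform(X, mean, components, scale)
print(np.cov(Z, rowvar=False).round(3))  # approximately the identity matrix
```

With `n_components=None`, all components are kept. In that case, whitening decorrelates and rescales the embeddings without reducing their dimensionality.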
Melodious Warbler (H. polyglotta) and Icterine Warbler (H. icterina) are similar members of the genus Hippolais that come into contact in a narrow zone in western Europe. Results of classifying audio samples from xeno-canto are illustrated below.
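Results for a two-species problem like this are commonly summarized as a confusion matrix with per-class precision and recall. The following pure-Python sketch computes these from true and predicted labels. It is not Sylph's `get_metrics`, whose output format is not described here. The labels in the usage example are made up and are not results from these experiments.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Row = true class, column = predicted class, entry = number of samples."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def precision_recall(y_true, y_pred, classes):
    """Map each class to (precision, recall), using 0.0 when undefined."""
    matrix = confusion_matrix(y_true, y_pred, classes)
    results = {}
    for i, c in enumerate(classes):
        true_pos = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: predicted as c
        actual = sum(matrix[i])  # row sum: truly c
        results[c] = (
            true_pos / predicted if predicted else 0.0,
            true_pos / actual if actual else 0.0,
        )
    return results


classes = ["H. polyglotta", "H. icterina"]
# Made-up labels for illustration only.
y_true = ["H. polyglotta"] * 4 + ["H. icterina"] * 4
y_pred = ["H. polyglotta"] * 3 + ["H. icterina"] * 4 + ["H. polyglotta"]
print(confusion_matrix(y_true, y_pred, classes))  # [[3, 1], [1, 3]]
print(precision_recall(y_true, y_pred, classes))
```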
To clone the repository, set up the environment, and run the tests:

```sh
git clone git@github.com:dandavison/elaenia.git
cd elaenia
make init
source env.sh
make test
```
Mountain Elaenia (Elaenia frantzi) by Daniel Uribe.