### Introduction

Systems that recognize sounds directly from audio recordings are widely applicable. In this project,
you’ll build an audio tagging system by converting audio clips to image representations (for example, spectrograms) and then
classifying them with computer vision models. You can build your system on a
relatively large-scale competition data set and then evaluate how well it recognizes and distinguishes more specialized sounds in locally generated recordings.

### Goals

1. Investigate and construct models for automatic audio tagging of noisy recordings.
2. Adapt these models to smaller data sets of audio recordings, either by using a setup motivated by your
findings from Goal 1 or by transfer learning.
3. Construct an audio tagging application.

### Methods and materials

To achieve Goal 1 of the project, you can, for example, use the FSDKaggle2018 data set from the Freesound
General-Purpose Audio Tagging Challenge on Kaggle. It contains audio clips in 41 categories, and the
goal is to assign each clip to its category. For Goal 2, you can look for a data set on your own or
construct one yourself.

As part of the project, you should investigate data augmentation techniques for audio.
You’ll make use of Python audio libraries such as librosa. You should also look into fastxtend, a library built on top of fastai. To build the application, you’re free to use any solution you know or want to investigate. A natural starting point is the deployment solutions used in the fastai course.
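
To make the augmentation idea concrete, here is a minimal sketch of three common audio augmentations: additive noise at a chosen signal-to-noise ratio, a random circular time shift, and SpecAugment-style masking of a spectrogram. The function names, parameter values, and the synthetic test tone are illustrative assumptions, not part of the project materials, and the sketch uses only NumPy rather than any particular audio library.

```python
import numpy as np

def add_noise(wave, snr_db, rng):
    """Mix in white Gaussian noise at the given signal-to-noise ratio (dB)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def time_shift(wave, max_fraction, rng):
    """Circularly shift the waveform by up to max_fraction of its length."""
    limit = int(len(wave) * max_fraction)
    offset = int(rng.integers(-limit, limit + 1))
    return np.roll(wave, offset)

def mask_spectrogram(spec, max_freq_bins, max_time_frames, rng):
    """Fill one random frequency band and one random time band with the minimum value."""
    spec = spec.copy()
    n_mels, n_frames = spec.shape
    fill = spec.min()
    f = int(rng.integers(0, max_freq_bins + 1))   # height of the frequency mask
    f0 = int(rng.integers(0, n_mels - f + 1))     # first masked mel bin
    spec[f0:f0 + f, :] = fill
    t = int(rng.integers(0, max_time_frames + 1)) # width of the time mask
    t0 = int(rng.integers(0, n_frames - t + 1))   # first masked frame
    spec[:, t0:t0 + t] = fill
    return spec

# Illustrative inputs: a one-second 440 Hz tone and a random stand-in "spectrogram"
rng = np.random.default_rng(0)
sr = 44100
wave = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
noisy = add_noise(wave, snr_db=20, rng=rng)
shifted = time_shift(wave, max_fraction=0.1, rng=rng)
spec = rng.random((128, 87))
masked = mask_spectrogram(spec, max_freq_bins=16, max_time_frames=10, rng=rng)
```

Waveform-level augmentations such as noise and shifting would be applied before the spectrogram is computed, while masking operates on the spectrogram itself; which combination helps is something to test empirically on your data.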

As an alternative, consider not converting audio to images and instead setting up an audio classification framework that
operates directly on audio representations of the data.
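
One way to operate directly on audio is a small 1D convolutional network that takes the raw waveform as input. The sketch below is an assumption-laden illustration in plain PyTorch: the class name `RawAudioNet`, the layer sizes, and the one-second input length are all hypothetical choices, and only the 41 output classes come from the data set description above.

```python
import torch
from torch import nn

class RawAudioNet(nn.Module):
    """Hypothetical 1D CNN that classifies mono waveforms without a spectrogram step."""

    def __init__(self, n_classes=41):
        super().__init__()
        self.features = nn.Sequential(
            # A wide first kernel acts like a learned filter bank over the waveform
            nn.Conv1d(1, 16, kernel_size=64, stride=4, padding=32),
            nn.BatchNorm1d(16), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=16, stride=2, padding=8),
            nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=8, stride=2, padding=4),
            nn.BatchNorm1d(64), nn.ReLU(),
            # Global average pooling makes the network independent of clip length
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        # x has shape (batch, 1, samples)
        return self.classifier(self.features(x).squeeze(-1))

model = RawAudioNet(n_classes=41)
batch = torch.randn(4, 1, 44100)  # four random one-second clips at 44.1 kHz
logits = model(batch)             # one score per class for each clip
```

Because of the global pooling layer, the same network accepts clips of different lengths, although clips within one batch still need a common length.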

In [8]:
pip install fastaudio

Collecting fastaudio
  Downloading fastaudio-1.0.2-py2.py3-none-any.whl (23 kB)
Collecting fastai==2.3.1 (from fastaudio)
  Downloading fastai-2.3.1-py3-none-any.whl (194 kB)
INFO: pip is looking at multiple versions of fastaudio to determine which version is compatible with other requirements. This could take a while.
Collecting fastaudio
  Downloading fastaudio-1.0.1-py2.py3-none-any.whl (23 kB)
  Downloading fastaudio-1.0.0-py2.py3-none-any.whl (23 kB)
Collecting fastai==2.2.7 (from fastaudio)
  Downloading fastai-2.2.7-py3-none-any.whl (193 kB)
Collecting librosa==0.8 (from fastaudio)
  Downloading librosa-0.8.0.tar.gz (183 kB)

In [14]:
from fastai.vision.all import *
from pathlib import Path
import pandas as pd
from fastai.data.block import DataBlock
from fastai.data.transforms import get_files, RandomSplitter, parent_label
from torchaudio.transforms import MelSpectrogram, AmplitudeToDB
import torchaudio
import librosa
import numpy as np
from torch import Tensor


sr = 44100  # Sample rate to use for your audio files
n_fft = 2048  # fft size for MelSpectrogram
n_mels = 128  # number of mel bins
mel_spec = MelSpectrogram(sample_rate=sr, n_fft=n_fft, n_mels=n_mels)
amp_to_db = AmplitudeToDB()  # Transformation to get the logarithmic scale


# Wrap the torchaudio transforms in a fastai Transform
class AudioToSpec(Transform):
    def __init__(self, mel_spec, amp_to_db):
        self.mel_spec, self.amp_to_db = mel_spec, amp_to_db
        
    def encodes(self, ai: Tensor) -> TensorImage:
        spec = self.mel_spec(ai)
        spec_db = self.amp_to_db(spec)
        # Shift the spectrogram so that its maximum value is 0 dB
        spec_norm = spec_db - spec_db.max()
        return spec_norm

def get_audio_files(path):
    return get_files(path, extensions='.wav')

# Load an audio file as a float tensor of shape (channels, samples)
def open_audio(fn):
    wave, _ = torchaudio.load(fn)  # the file's own sample rate is discarded here
    return wave


# Map each file name to its label using the competition metadata
labels_df = pd.read_csv('/kaggle/input/audio-samples/FSDKaggle2018.meta/FSDKaggle2018.meta/train_post_competition.csv')
label_dict = dict(zip(labels_df['fname'], labels_df['label']))

def label_func(fname):
    # fname may be a Path object or a string
    p = Path(fname)
    # Try the full file name first (the CSV's 'fname' column is assumed to
    # include the '.wav' extension), then fall back to the name without it
    return label_dict.get(p.name, label_dict.get(p.stem))  # None if no entry is found
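
The `open_audio` loader above returns each clip at its original length. If the clips differ in length, they cannot be stacked into a batch, so a fixed-length step is usually needed before batching. The following is a minimal sketch of one option, zero-padding short clips and cropping long ones; the helper name `pad_or_crop` and the two-second target length are hypothetical choices, not something the notebook defines.

```python
import torch
import torch.nn.functional as F

def pad_or_crop(wave, target_len):
    """Return a waveform with exactly target_len samples along the last axis."""
    n = wave.shape[-1]
    if n >= target_len:
        # Keep the first target_len samples of long clips
        return wave[..., :target_len]
    # Append zeros (silence) to the end of short clips
    return F.pad(wave, (0, target_len - n))

sr = 44100
target_len = 2 * sr  # hypothetical target: two seconds at 44.1 kHz

short_clip = torch.randn(1, sr)      # one second of random "audio"
long_clip = torch.randn(1, 5 * sr)   # five seconds of random "audio"
fixed_short = pad_or_crop(short_clip, target_len)
fixed_long = pad_or_crop(long_clip, target_len)
```

A helper like this could be applied inside the loader or as an item-level transform. Cropping from a random offset instead of the start would also serve as a simple augmentation.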

In [12]:
# Define DataBlock
audio_to_spec = AudioToSpec(mel_spec, amp_to_db)
audio_block = DataBlock(
    blocks=(TransformBlock(type_tfms=open_audio), CategoryBlock),
    get_items=get_audio_files,
    splitter=RandomSplitter(),
    get_y=label_func,  # look up each file's label in the metadata CSV
    batch_tfms=[IntToFloatTensor, audio_to_spec]  # convert waveform batches to mel spectrograms
)

In [15]:
# Build the dataloaders
path = Path('/kaggle/input/audio-samples/FSDKaggle2018.audio_train/FSDKaggle2018.audio_train')
dls = audio_block.dataloaders(path, bs=64)

# Define a vision learner
learn = vision_learner(dls, resnet34, metrics=accuracy)

Unexpected exception formatting exception. Falling back to standard exception


Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_34/2248221255.py", line 3, in <module>
    dls = audio_block.dataloaders(path, bs=64)
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/block.py", line 155, in dataloaders
    # Cell
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/block.py", line 147, in datasets
    for x in s[0]:
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 454, in __init__
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 454, in <listcomp>
  File "/opt/conda/lib/python3.10/site-packages/fastcore/foundation.py", line 98, in __call__
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 368, in __init__
    return Datasets(tls=test_tls)
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", l

In [None]:
# Train the model
learn.fine_tune(epochs=5)