### Introduction

Systems that recognize sounds directly from audio recordings are widely applicable. In this project,
you’ll build an audio tagging system by converting audio clips to image representations, such as spectrograms, and then
applying computer vision-based classification models. You can construct your system using a
relatively large-scale competition data set and then evaluate its ability to recognize and distinguish more specialized sounds in locally generated recordings.

### Goals

1. Investigate and construct models for automatic audio tagging of noisy recordings.
2. Adapt these models to smaller data sets of audio recordings, either by using a setup motivated by your
findings from Goal 1 or by transfer learning.
3. Construct an audio tagging application.
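
For the transfer learning route in Goal 2, one common pattern is to freeze a pretrained network and train only a new classification head. The sketch below shows this in plain PyTorch. The tiny `backbone` is a hypothetical stand-in for a model trained on the larger data set, and `n_new_classes = 5` is an assumed class count for the smaller one.

```python
import torch
from torch import nn

# Hypothetical stand-in for a network pretrained on the large data set
backbone = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the pretrained weights so only the new head is updated
for p in backbone.parameters():
    p.requires_grad = False

n_new_classes = 5  # assumed number of classes in the smaller data set
head = nn.Linear(16, n_new_classes)
model = nn.Sequential(backbone, head)

# Pass only the trainable parameters to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

# One dummy batch of four single-channel spectrogram "images"
x = torch.randn(4, 1, 128, 44)
logits = model(x)
print(logits.shape)
```

Libraries such as fastai wrap this freeze-then-fine-tune pattern in higher-level helpers, so the manual version here is mainly useful for seeing what happens underneath.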

### Methods and materials

To achieve Goal 1 of the project, you can, for example, use the FSDKaggle2018 data set from the Freesound
General-Purpose Audio Tagging Challenge on Kaggle. It contains 41 categories of audio clips, and the
task is to assign each clip to its category. For Goal 2, you can look for a data set on your own or
construct one yourself.

As part of the project, you should investigate data augmentation techniques for audio.
You’ll make use of a variety of Python audio libraries, e.g., librosa. You should also look into fastxtend, a library built on top of fastai. To construct the application, you’re free to use any solution you already know or want to investigate. A natural starting point is the deployment solutions used in the fastai course.
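
To make the idea of audio augmentation concrete, here is a minimal NumPy sketch of two simple waveform-level transforms: adding white noise at a chosen signal-to-noise ratio and applying a random circular time shift. The function names and parameter values are illustrative assumptions, not part of any library.

```python
import numpy as np

def add_noise(y, snr_db, rng):
    """Mix in white Gaussian noise at the requested signal-to-noise ratio."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise

def time_shift(y, max_fraction, rng):
    """Circularly shift the waveform by a random offset."""
    limit = int(len(y) * max_fraction)
    offset = int(rng.integers(-limit, limit + 1))
    return np.roll(y, offset)

rng = np.random.default_rng(0)

# A synthetic one-second tone stands in for a real clip
sr = 22050
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)

noisy = add_noise(y, snr_db=10, rng=rng)
shifted = time_shift(y, max_fraction=0.1, rng=rng)
```

librosa also provides `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`, which are worth comparing against these simpler transforms.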

As an alternative, consider skipping the conversion to images and instead setting up an audio classification framework that
operates directly on audio representations of the data.
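
One way to operate on audio directly is a 1-D convolutional network over raw waveforms. The sketch below is a deliberately small, hypothetical architecture (the class name `RawAudioNet` and all layer sizes are assumptions), shown only to illustrate the input and output shapes for 41 classes.

```python
import torch
from torch import nn

class RawAudioNet(nn.Module):
    """Tiny 1-D CNN over raw waveforms (hypothetical architecture)."""

    def __init__(self, n_classes=41):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=4, padding=32),
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=16, stride=2, padding=8),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        # x has shape (batch, 1, samples)
        return self.classifier(self.features(x).squeeze(-1))

model = RawAudioNet()
model.eval()
with torch.no_grad():
    # Two dummy one-second clips at 22050 Hz
    logits = model(torch.randn(2, 1, 22050))
print(logits.shape)
```

Because of the adaptive pooling layer, the same model accepts clips of different lengths, which is convenient when recordings are not trimmed to a fixed duration.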


## References

McFee, Brian, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. "librosa: Audio and music signal analysis in Python." In Proceedings of the 14th Python in Science Conference, pp. 18-25. 2015.

Warner, Benjamin. "fastxtend." 2023. Accessed 22.02.2024 from https://fastxtend.benjaminwarner.dev/

# Part 1: Setting up the environment

### Installing libraries

First, we need to install the necessary libraries to perform machine learning on the audio samples.

It is a good idea to create a virtual environment specifically for this project before installing the packages.

Open a terminal in your editor of choice and create a venv using:

```
python -m venv "your-venv-name"
```

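Activate the venv (`source your-venv-name/bin/activate` on Linux/macOS, or `your-venv-name\Scripts\activate` on Windows). Now you can install the libraries in this venv only: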

``` 
pip install librosa
```
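
To confirm that packages are going into the venv rather than the system interpreter, a quick check from Python can help. This is a small sketch using only the standard library; the helper names are illustrative. Note that the `sys.prefix` comparison detects venv-style environments and is not a reliable test for conda environments.

```python
import sys
import importlib.util

def in_virtualenv():
    """True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix

def is_installed(package):
    """True when the import system can locate the package."""
    return importlib.util.find_spec(package) is not None

print("inside venv:", in_virtualenv())
print("librosa installed:", is_installed("librosa"))
```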

Install PyTorch by selecting your configuration at https://pytorch.org/get-started/locally/ and running the generated command in the terminal.

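Now you can proceed with the fastxtend installation. The commands below create a separate conda environment named `fastxtend`, activate it, and install the package: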

``` 
conda create -n fastxtend python=3.11 "pytorch>=2.1" torchvision torchaudio \
pytorch-cuda=12.3 fastai nbdev pkg-config libjpeg-turbo opencv tqdm psutil \
terminaltables numpy "numba>=0.57" librosa timm kornia rich typer wandb "transformers>=4.34" "tokenizers>=0.14" "datasets>=2.14" ipykernel \
ipywidgets "matplotlib<3.8" -c pytorch -c nvidia -c fastai -c huggingface -c conda-forge
```

``` 
conda activate fastxtend
```

``` 
pip install "fastxtend[all]"
```

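Verify that PyTorch is installed correctly by creating a random tensor:

```
import torch
x = torch.rand(5, 3)
print(x)
```

```
tensor([[0.7460, 0.1664, 0.2565],
        [0.4836, 0.8227, 0.6346],
        [0.7704, 0.6892, 0.7098],
        [0.8201, 0.7667, 0.0265],
        [0.9147, 0.5007, 0.9854]])
```

Then check whether a CUDA-capable GPU is available:

```
torch.cuda.is_available()
```

```
False
```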
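
A result of `False` means PyTorch will run on the CPU only. A common way to write code that works in both cases is to pick the device once and create tensors and models on it. This is a minimal sketch of that pattern:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(5, 3, device=device)
print(x.device)
```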