## Usage

This notebook walks through installing this repo on an external server and running training and inference with it.
To begin, we'll first need to clone the repo:

In [1]:
!git clone https://github.com/ksanjeevan/crnn-audio-classification.git

Cloning into 'crnn-audio-classification'...
remote: Enumerating objects: 167, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 167 (delta 10), reused 12 (delta 6), pack-reused 141
Receiving objects: 100% (167/167), 3.57 MiB | 1.77 MiB/s, done.
Resolving deltas: 100% (82/82), done.


We will also need to install the [torchaudio-contrib package](https://github.com/keunwoochoi/torchaudio-contrib). This is as simple as cloning the repo and installing it with `pip`:

In [2]:
!git clone https://github.com/keunwoochoi/torchaudio-contrib
!pip install -e torchaudio-contrib

Cloning into 'torchaudio-contrib'...
remote: Enumerating objects: 308, done.
remote: Total 308 (delta 0), reused 0 (delta 0), pack-reused 308
Receiving objects: 100% (308/308), 1.93 MiB | 5.08 MiB/s, done.
Resolving deltas: 100% (156/156), done.
Obtaining file:///home/abs/abs/crnn-audio-classification/torchaudio-contrib
Installing collected packages: torchaudio-contrib
  Running setup.py develop for torchaudio-contrib
Successfully installed torchaudio-contrib


Next, we'll optionally copy the repo's contents into the current (root) folder, so that `run.py` and the config files can be used directly from there:

In [3]:
!cp -r crnn-audio-classification/* .
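If you would rather stay in Python than shell out, the copy step can be sketched with `shutil`. This is a rough equivalent of `cp -r crnn-audio-classification/* .`, not part of the repo; like the shell glob `*`, it skips hidden entries such as `.git`. The function name `copy_repo_contents` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def copy_repo_contents(src: str, dest: str = ".") -> List[str]:
    """Copy the visible top-level entries of `src` into `dest` (hypothetical helper).

    Roughly mirrors `cp -r src/* dest`: the shell glob `*` does not match
    dotfiles, so names starting with "." (e.g. `.git`) are skipped here too.
    """
    src_dir, dest_dir = Path(src), Path(dest)
    copied = []
    for entry in sorted(src_dir.iterdir()):
        if entry.name.startswith("."):
            continue  # `*` does not match hidden entries by default
        target = dest_dir / entry.name
        if entry.is_dir():
            # dirs_exist_ok (Python 3.8+) merges into an existing folder, like cp -r
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, target)  # copy2 also preserves timestamps
        copied.append(entry.name)
    return copied
```

In the notebook this would be called as `copy_repo_contents("crnn-audio-classification")`.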

Next, we'll install the other required packages, SoundFile and torchparse. tensorboardX is optional; if you want to use it, you'll also need TensorFlow, which the optional cell below installs along with TensorBoard.

In [4]:
!pip install SoundFile
!pip install git+https://github.com/ksanjeevan/torchparse.git

Collecting SoundFile
  Downloading SoundFile-0.10.3.post1-py2.py3-none-any.whl (21 kB)
Installing collected packages: SoundFile
Successfully installed SoundFile-0.10.3.post1
Collecting git+https://github.com/ksanjeevan/torchparse.git
  Cloning https://github.com/ksanjeevan/torchparse.git to /tmp/pip-req-build-q24clo1o
  Running command git clone -q https://github.com/ksanjeevan/torchparse.git /tmp/pip-req-build-q24clo1o
Building wheels for collected packages: torchparse
  Building wheel for torchparse (setup.py) ... done
  Created wheel for torchparse: filename=torchparse-0.1-py3-none-any.whl size=7977 sha256=7589fb4ce010afe4871bc34ee65372bbfd5e8990ed1799908926185b4c9924e7
  Stored in directory: /tmp/pip-ephem-wheel-cache-36hhb5fr/wheels/14/99/39/34754e0ce89e0a21e18986098cbc8346e7346b3cb7ce839129
Successfully built torchparse
Installing collected packages: torchparse
Successfully installed torchparse-0.1


In [5]:
#optional
!pip install tensorboardX
!pip install tensorboard
!pip install tensorflow

Collecting tensorboardX
  Downloading tensorboardX-2.4-py2.py3-none-any.whl (124 kB)
     |████████████████████████████████| 124 kB 3.2 MB/s
Collecting protobuf>=3.8.0
  Downloading protobuf-3.17.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB)
     |████████████████████████████████| 1.0 MB 5.0 MB/s
Installing collected packages: protobuf, tensorboardX
Successfully installed protobuf-3.17.3 tensorboardX-2.4
Collecting tensorflow
  Downloading tensorflow-2.6.0-cp38-cp38-manylinux2010_x86_64.whl (458.4 MB)
     |████████████████████████████████| 458.4 MB 74 kB/s
Collecting termcolor~=1.1.0
  Downloading termcolor-1.1.0.tar.gz (3.9 kB)
Collecting h5py~=3.1.0
  Downloading h5py-3.1.0-cp38-cp38-manylinux1_x86_64.whl (4.4 MB)
     |████████████████████████████████| 4.4 MB 100.9 MB/s
Collecting wheel~=0.35
  Downloading wheel-0.37.0-py2.py3-none-any.whl (35 kB)
Collecting keras-preprocessing~=1.1.2
  Downloading Keras_Preprocessing-1.1.2-py2.py3-none-any.w
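Before moving on, it can help to confirm the installs actually landed in the kernel's environment. The sketch below uses `importlib.util.find_spec`, which locates a top-level module without importing it. The import names (`soundfile`, `torchparse`, `torchaudio_contrib`, `tensorboardX`, `tensorflow`) are assumptions inferred from the pip package names, and `check_packages` is a hypothetical helper.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Assumed import names, inferred from the pip package names used above.
required = ["soundfile", "torchparse", "torchaudio_contrib"]
optional = ["tensorboardX", "tensorflow"]

for name, found in check_packages(required + optional).items():
    tag = "optional" if name in optional else "required"
    print(f"{name:20s} {tag:9s} {'ok' if found else 'MISSING'}")
```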

## Downloading the UrbanSound8K dataset
To download the dataset, navigate to the [UrbanSoundDataSet webpage](https://urbansounddataset.weebly.com/urbansound8k.html) and scroll to the bottom of the page. 
There will be a simple [form](https://urbansounddataset.weebly.com/download-urbansound8k.html) that you will need to fill out before getting access to the dataset.

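Once it's filled out, you will receive a URL for downloading the dataset. Copy that link and paste it in place of `{urbansound8k link}` in the cell below; we'll then use `wget` to download the file to the notebook server. (The output shown below is what `wget` reports when the placeholder is left unreplaced.)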

In [6]:
!wget {urbansound8k link}

--2021-08-18 20:17:49--  http://%7Burbansound8k/
Resolving {urbansound8k ({urbansound8k)... failed: Name or service not known.
wget: unable to resolve host address ‘{urbansound8k’
--2021-08-18 20:17:49--  http://link%7D/
Resolving link} (link})... failed: Name or service not known.
wget: unable to resolve host address ‘link}’
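As the error above shows, the literal placeholder was split into two arguments (`{urbansound8k` and `link}`) and handed to `wget` as host names. A pure-Python alternative that sidesteps shell quoting is sketched below using `urllib.request.urlretrieve`. `URBANSOUND8K_URL` is a hypothetical placeholder you must replace with your own link, and defaulting the local name to `UrbanSound8K.tar.gz` is an assumption chosen to match the archive name used in the next step.

```python
import urllib.request
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def filename_from_url(url: str, default: str = "UrbanSound8K.tar.gz") -> str:
    """Use the last path component of the URL (ignoring any query string) as the file name."""
    name = PurePosixPath(urlparse(url).path).name
    return name or default


def download(url: str, dest: Optional[str] = None) -> str:
    """Download `url` to `dest` (derived from the URL if omitted) and return the local path."""
    dest = dest or filename_from_url(url)
    urllib.request.urlretrieve(url, dest)
    return dest


# Hypothetical placeholder: paste the link from the UrbanSound8K download form here.
URBANSOUND8K_URL = "https://example.com/UrbanSound8K.tar.gz"
# download(URBANSOUND8K_URL)  # uncomment once the real link is in place
```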


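Once it's downloaded, we need to untar (extract) the file. Replace the archive path in the `tar` command below with the path to your downloaded file, and replace `{urbansound8k_downloaded_file}` with the file's name if you use the `rm` line.
Optionally, we can remove the downloaded archive at this point as well.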

In [None]:
!tar -zxvf /home/abs/abs/crnn-audio-classification/crnn-audio-classification/UrbanSound8K.tar.gz

# !rm -f {urbansound8k_downloaded_file}
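The extract-then-delete step can also be sketched with Python's `tarfile` module, roughly equivalent to `tar -zxf` followed by the optional `rm -f`. `extract_archive` is a hypothetical helper; on Python versions that ship tar extraction filters it passes `filter="data"`, which rejects unsafe members such as absolute paths.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: str, dest: str = ".", remove: bool = False) -> Path:
    """Extract a .tar.gz archive into `dest`, optionally deleting the archive afterwards."""
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" blocks unsafe paths.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    if remove:
        Path(archive).unlink()  # same effect as the optional `rm -f`
    return Path(dest)
```

For example, `extract_archive("UrbanSound8K.tar.gz", remove=True)` should leave an `UrbanSound8K/` folder in the current directory, which is the path the config below expects.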

## Preparing the config file

The config file defines how the model is built and trained: the dataset and data loader, audio transforms, optimizer, learning-rate scheduler, model type, and training settings. The only thing you *need* to change is the path to the dataset, which is located in `data["path"]`.
You may also want to reduce the number of epochs (`train["epochs"]`) for a quick test run. Running the training with 10 epochs took about 10 minutes on a GPU. (You are running this on a GPU, right? :))

In [1]:
json_config = {
    "name"          :   "Urban Testing",
    "data"          :   {
                            "type"      :   "CSVDataManager",
                            "path"      :   "UrbanSound8K",  # path to the extracted dataset; change if needed
                            "format"    :   "audio",
                            "loader"    :   {
                                                "shuffle"       : True,
                                                "batch_size"    : 24,
                                                "num_workers"   : 4,
                                                "drop_last"     : True
                                            },
                            "splits"    :   {
                                                "train" : [1,2,3,4,5,6,7,8,9], 
                                                "val"   : [10]                                            
                                            }
                        },
    "transforms"    :   {
                            "type"      :   "AudioTransforms",
                            "args"      :   {
                                                "channels"      : "avg",
                                                "noise"         : [0.3, 0.001],
                                                "crop"          : [0.4, 0.25]
                                            }
                        },
    "optimizer"     :   {
                            "type"      :   "Adam",
                            "args"      :   {
                                                "lr"            : 0.002,
                                                "weight_decay"  : 0.01,
                                                "amsgrad"       : True
                                            }
                        },
    "lr_scheduler"  :   {
                            "type"      :   "StepLR",
                            "args"      :   {
                                                "step_size" : 10,
                                                "gamma"     : 0.5
                                            }
                        },
    "model"         :   {
                            "type"      :   "AudioCRNN"
                        },
    "train"         :   {
                            "loss"      :   "nll_loss",
                            "epochs"    :   100,  # lower this (e.g. to 10) for a quick test run
                            "save_dir"  :   "saved_cv/",
                            "save_p"    :   1,
                            "verbosity" :   2,
                            
                            "monitor"   :   "min val_loss",
                            "early_stop":   8,
                            "tbX"       :   True
                        },
    "metrics"       :   "classification_metrics"

}
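If you want to try several settings without editing the dict above by hand, a small helper can return a modified copy. This is a hypothetical sketch that only touches the two keys the text calls out, `data["path"]` and `train["epochs"]`; it deep-copies so the original `json_config` stays unchanged.

```python
import copy
from typing import Any, Dict, Optional


def with_overrides(config: Dict[str, Any], data_path: Optional[str] = None,
                   epochs: Optional[int] = None) -> Dict[str, Any]:
    """Return a copy of `config` with the dataset path and/or epoch count replaced."""
    new_config = copy.deepcopy(config)  # nested dicts are copied too, so the input is untouched
    if data_path is not None:
        new_config["data"]["path"] = data_path
    if epochs is not None:
        new_config["train"]["epochs"] = epochs
    return new_config
```

For instance, `with_overrides(json_config, epochs=10)` gives a config for the roughly 10-minute test run mentioned above, while `json_config` itself still says 100 epochs.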

Next, we'll write this config out to a JSON file so that the training script can read it:

In [2]:
import json

# Write the config to disk so run.py can load it via the -c option
with open('my-config.json', 'w') as json_file:
    json.dump(json_config, json_file, indent=4)
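To sanity-check the file before launching a long run, you can read it back and confirm the main sections survived the round trip (note that JSON stores Python's `True` as `true`, and `json.load` turns it back into `True`). `load_and_check` is a hypothetical helper, and treating `data`, `model`, and `train` as required sections is an assumption based on the example config, not on `run.py` itself.

```python
import json
from typing import Any, Dict, Iterable


def load_and_check(path: str,
                   required_keys: Iterable[str] = ("data", "model", "train")) -> Dict[str, Any]:
    """Read a JSON config back and raise KeyError if a top-level section is missing."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in required_keys if key not in config]
    if missing:
        raise KeyError(f"config {path!r} is missing sections: {missing}")
    return config
```

Calling `load_and_check('my-config.json') == json_config` should evaluate to `True` right after the cell above.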

## Training
Finally, we can start training the model. We'll pass `run.py` three arguments. The first is the action to take: `train` to train the model, or `eval` to evaluate it. The `-c` option points to the config file we just created, and `--cfg` points to the layer configuration of the model (`crnn.cfg`).
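The same training invocation can be launched from Python with `subprocess`, which is handy if you want to script several runs. This sketch only reproduces the `train` command shown in this notebook (the arguments `eval` needs may differ and are not covered). The helper names are hypothetical; using `sys.executable` instead of `python3` is a design choice that runs training with the same interpreter as the notebook kernel.

```python
import subprocess
import sys
from typing import List


def train_command(config: str = "my-config.json", layer_cfg: str = "crnn.cfg") -> List[str]:
    """Mirror `python3 run.py train -c <config> --cfg <layer_cfg>` as an argument list."""
    return [sys.executable, "run.py", "train", "-c", config, "--cfg", layer_cfg]


def launch_training(config: str = "my-config.json", layer_cfg: str = "crnn.cfg") -> int:
    """Run training in a child process; raises CalledProcessError if it exits non-zero."""
    result = subprocess.run(train_command(config, layer_cfg), check=True)
    return result.returncode
```

Passing an argument list (rather than one shell string) avoids quoting problems if a path contains spaces.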

In [3]:
!python3 run.py train -c my-config.json --cfg crnn.cfg

Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
AudioCRNN(
  (spec): MelspectrogramStretch(num_bands=128, fft_len=2048, norm=spec_whiten, stretch_param=[0.4, 0.4])
  (net): ModuleDict(
    (convs): Sequential(
      (conv2d_0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_0): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_0): ELU(alpha=1.0)
      (maxpool2d_0): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
      (dropout_0): Dropout(p=0.1)
      (conv2d_1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_1): ELU(alpha=1.0)
      (maxpool2d_1): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
    

## Inference

After we have trained the model, we can run inference with it.
Call `run.py` with two arguments. The first is the path to an audio file; for this example, we'll use a sample from fold 10 (the validation fold) of the UrbanSound8K dataset. The second, passed with `-r`, is the path to a model checkpoint, which will look something like `saved_cv/{timestamp}/checkpoints/model_best.pth` (the cell below uses the `checkpoint-current.pth` file from one run instead).
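Rather than copying the timestamp by hand, you can look up the most recent checkpoint. The sketch assumes run folders are named like `0818_213813` (month-day_hour-minute-second, as in the path used below), so sorting the names as strings orders runs by time within a single year. `latest_checkpoint` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(save_dir: str = "saved_cv", name: str = "model_best.pth") -> Optional[Path]:
    """Return `name` from the most recent run folder under `save_dir`, or None if there is none.

    Assumes run folders are named MMDD_HHMMSS, so string order equals time order within a year.
    """
    candidates = sorted(Path(save_dir).glob(f"*/checkpoints/{name}"),
                        key=lambda p: p.parent.parent.name)  # sort by the run-folder name
    return candidates[-1] if candidates else None
```

Runs that have not yet written a file with the requested name are skipped, so `latest_checkpoint(name="checkpoint-current.pth")` and `latest_checkpoint()` may point at different runs.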

In [4]:
!python3 run.py UrbanSound8K/audio/fold10/100795-3-0-0.wav -r saved_cv/0818_213813/checkpoints/checkpoint-current.pth

dog_bark 0.9858338236808777
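To check a prediction against the ground truth, note that UrbanSound8K file names follow the pattern `[fsID]-[classID]-[occurrenceID]-[sliceID].wav`, where the class ID indexes the dataset's ten classes. The sketch below decodes that ID and splits the printed prediction line. Assuming the output is always a single `<label> <confidence>` line is an inference from the one example above, and both helpers are hypothetical.

```python
from pathlib import Path
from typing import Tuple

# UrbanSound8K class names, indexed by classID.
CLASSES = ["air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
           "engine_idling", "gun_shot", "jackhammer", "siren", "street_music"]


def true_label(wav_path: str) -> str:
    """Decode the classID field of an UrbanSound8K file name into its class name."""
    class_id = int(Path(wav_path).stem.split("-")[1])
    return CLASSES[class_id]


def parse_prediction(line: str) -> Tuple[str, float]:
    """Split a `<label> <confidence>` line (assumed output format) into its parts."""
    label, confidence = line.split()
    return label, float(confidence)
```

For the file used above, `100795-3-0-0.wav` has class ID 3, so `true_label` returns `"dog_bark"`, which matches the model's prediction.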
