## Usage

This notebook goes over how to install this repo on an external server and run training and inference.
To begin, we'll first need to clone the repo.

In [1]:
!git clone https://github.com/ksanjeevan/crnn-audio-classification.git

Cloning into 'crnn-audio-classification'...
remote: Enumerating objects: 127, done.
remote: Counting objects: 100% (127/127), done.
remote: Compressing objects: 100% (87/87), done.
remote: Total 127 (delta 63), reused 101 (delta 37), pack-reused 0
Receiving objects: 100% (127/127), 3.55 MiB | 0 bytes/s, done.
Resolving deltas: 100% (63/63), done.


We will also need to install the [torchaudio-contrib package](https://github.com/keunwoochoi/torchaudio-contrib). This is as simple as cloning the repo and using `pip` to install it.

In [2]:
!git clone https://github.com/keunwoochoi/torchaudio-contrib
!pip install -e torchaudio-contrib

Cloning into 'torchaudio-contrib'...
remote: Enumerating objects: 54, done.
remote: Counting objects: 100% (54/54), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 209 (delta 22), reused 34 (delta 11), pack-reused 155
Receiving objects: 100% (209/209), 1.89 MiB | 0 bytes/s, done.
Resolving deltas: 100% (99/99), done.
Obtaining file:///home/jupyter/crnn-audio-classification%20-%20UrbanSound8k/torchaudio-contrib
Installing collected packages: torchaudio-contrib
  Running setup.py develop for torchaudio-contrib
Successfully installed torchaudio-contrib


Next, we'll copy the contents of the repo into the root folder. This step is optional.

In [3]:
!cp -r crnn-audio-classification/* .

Next, we'll install the other required packages. Note that tensorboardX is optional; if you want to use it, you'll also need to install TensorFlow.

In [13]:
!pip install SoundFile
!pip install git+https://github.com/ksanjeevan/torchparse.git

Collecting git+https://github.com/ksanjeevan/torchparse.git
  Cloning https://github.com/ksanjeevan/torchparse.git to /tmp/pip-req-build-cxyj2nui
Building wheels for collected packages: torchparse
  Running setup.py bdist_wheel for torchparse ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-pndy0yu8/wheels/53/64/f2/c60bf851fcf5d5363538889a115dd68d728474bbc22ef7d280
Successfully built torchparse
Collecting pysoundfile
  Downloading https://files.pythonhosted.org/packages/2a/b3/0b871e5fd31b9a8e54b4ee359384e705a1ca1e2870706d2f081dc7cc1693/PySoundFile-0.9.0.post1-py2.py3-none-any.whl
Installing collected packages: pysoundfile
Successfully installed pysoundfile-0.9.0.post1


In [None]:
#optional
!pip install tensorboardX
!pip install tensorboard
!pip install tensorflow

## Downloading the UrbanSound8k dataset
To download the dataset, navigate to the [UrbanSoundDataSet webpage](https://urbansounddataset.weebly.com/urbansound8k.html) and scroll to the bottom of the page.
There is a simple [form](https://urbansounddataset.weebly.com/download-urbansound8k.html) that you will need to fill out before getting access to the dataset.

Once it's filled out, you will receive a URL to download the dataset. Copy that link and paste it in place of `{urbansound8k link}`. Using `wget`, we'll download the file into the notebook's working directory.

In [None]:
!wget {urbansound8k link}

Once it's downloaded, we need to untar the file. Just replace `{urbansound8k_downloaded_file}` with the name of the file.
Optionally, we can remove the downloaded archive here as well.

In [None]:
!tar -zxvf {urbansound8k_downloaded_file}
!rm -f {urbansound8k_downloaded_file}
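
Before moving on, it can help to confirm that the archive unpacked as expected. The sketch below assumes the standard UrbanSound8K layout (`audio/fold1` through `audio/fold10` plus `metadata/UrbanSound8K.csv`); `missing_dataset_parts` is a hypothetical helper name, not something provided by this repo.

```python
from pathlib import Path


def missing_dataset_parts(root="UrbanSound8K"):
    """Return the expected paths that are absent under `root`.

    Assumes the standard UrbanSound8K layout; adjust if your copy differs.
    """
    root = Path(root)
    expected = [root / "metadata" / "UrbanSound8K.csv"]
    expected += [root / "audio" / f"fold{i}" for i in range(1, 11)]
    return [str(p) for p in expected if not p.exists()]


missing = missing_dataset_parts()
print("Dataset looks complete" if not missing else f"Missing: {missing}")
```

An empty list means all ten folds and the metadata CSV were found under the given root.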

## Preparing the config file

The config file is used to build out the training model. The only thing you will *need* to change is the path to the dataset, which is located in `data["path"]`.
You may also want to change the number of epochs (`train["epochs"]`) when testing. Running the training with 10 epochs took about 10 minutes on a GPU. (You are running this on a GPU, right? :))

In [7]:
json_config = {
    "name"          :   "Urban Testing",
    "data"          :   {
                            "type"      :   "CSVDataManager",
                            "path"      :   "UrbanSound8K",
                            "format"    :   "audio",
                            "loader"    :   {
                                                "shuffle"       : True,
                                                "batch_size"    : 24,
                                                "num_workers"   : 4,
                                                "drop_last"     : True
                                            },
                            "splits"    :   {
                                                "train" : [1,2,3,4,5,6,7,8,9], 
                                                "val"   : [10]                                            
                                            }
                        },
    "transforms"    :   {
                            "type"      :   "AudioTransforms",
                            "args"      :   {
                                                "channels"       : "avg",
                                                "noise"    : [0.3, 0.001],
                                                "crop"     : [0.4, 0.25]
                                            }
                        },
    "optimizer"     :   {
                            "type"      :   "Adam",
                            "args"      :   {
                                                "lr"            : 0.002,
                                                "weight_decay"  : 0.01,
                                                "amsgrad"       : True
                                            }
                        },
    "lr_scheduler"   :   {
                            "type"      :   "StepLR",
                            "args"      :   {
                                                "step_size" : 10,
                                                "gamma"     : 0.5
                                            }
                        },
    "model"         :   {
                            "type"      :   "AudioCRNN"
                        },
    "train"         :   {
                            "loss"      :   "nll_loss",
                            "epochs"    :   100,
                            "save_dir"  :   "saved_cv/",
                            "save_p"    :   1,
                            "verbosity" :   2,
                            
                            "monitor"   :   "min val_loss",
                            "early_stop":   8,
                            "tbX"       :   True
                        },
    "metrics"       :   "classification_metrics"

}
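
To make the `lr_scheduler` values concrete, here is a small sketch of the decay they describe. It assumes the `StepLR` entry maps onto PyTorch's `torch.optim.lr_scheduler.StepLR`, which multiplies the learning rate by `gamma` every `step_size` epochs; `step_lr` is a hypothetical helper written in plain Python so it runs without PyTorch.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate in effect at `epoch` (0-indexed) under a StepLR-style decay."""
    return base_lr * gamma ** (epoch // step_size)


# Values taken from the config above: lr=0.002, step_size=10, gamma=0.5.
for epoch in (0, 9, 10, 20, 99):
    print(epoch, step_lr(0.002, 10, 0.5, epoch))
```

Under that assumption the rate is halved every 10 epochs, so a short 10-epoch test run stays at the initial rate throughout.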

Next, we'll write this config out as JSON so that the model can read the updated file.

In [8]:
import json
with open('my-config.json', 'w') as json_file:  
  json.dump(json_config, json_file)
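
For quick experiments it can be convenient to derive a shorter-running variant of the config instead of editing the dictionary by hand. The following is a minimal sketch: `quick_test_config` and the file name `my-config-quick.json` are hypothetical, and the stand-in dictionary holds only the keys the helper touches.

```python
import copy
import json


def quick_test_config(config, epochs=10, data_path=None):
    """Return a deep copy of `config` with a shorter run and an optional dataset path."""
    cfg = copy.deepcopy(config)
    cfg["train"]["epochs"] = epochs
    if data_path is not None:
        cfg["data"]["path"] = data_path
    return cfg


# Stand-in for the full `json_config` (only the keys touched here).
base = {"data": {"path": "UrbanSound8K"}, "train": {"epochs": 100}}
quick = quick_test_config(base, epochs=10)

with open("my-config-quick.json", "w") as json_file:
    json.dump(quick, json_file, indent=2)

# Reading the file back confirms that what was written matches the dictionary.
with open("my-config-quick.json") as json_file:
    assert json.load(json_file) == quick
```

Because the helper works on a deep copy, the original dictionary keeps its 100-epoch setting.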

## Training
Finally, we can start training the model. We'll pass three arguments. The first is the action we want to take: `train` to train the model, or `eval` to evaluate it. The `-c` argument is the config file we just created, and `--cfg` is the layer configuration of the model.

In [9]:
!python3 run.py train -c my-config.json --cfg crnn.cfg

Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
Traceback (most recent call last):
  File "run.py", line 176, in <module>
    train_main(config, args.resume)
  File "run.py", line 85, in train_main
    model = getattr(net_module, m_name)(classes, config=config)
  File "/home/selcuk/PycharmProjects/crnn-audio-classification/net/model.py", line 30, in __init__
    self.net = parse_cfg(config['cfg'], in_shape=[in_chan, self.spec.num_mels, 400])
  File "/usr/local/lib/python3.6/dist-packages/torchparse/parser.py", line 139, in parse_cfg
    return CFGParser(fname).get_modules(in_shape)
  File "/usr/local/lib/python3.6/dist-packages/torchparse/parser.py", line 120, in get_modules
    model = self._flow(in_shape)
  File "/usr/local/lib/python3.6/dist-packages/torchparse/parser.py", line 108, in _flow
    in_shape = layer.get_out_shape()
  File "/us

## Inference

After we have trained the model, we can run inference with it.
Call `run.py` with two arguments. The first is the path to a sample audio file; for this example, we'll use an audio sample from the UrbanSound8K dataset. The second, passed with `-r`, is the path to the model checkpoint. It will look something like `saved_cv/{timestamp}/checkpoints/model_best.pth`.
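
Since the timestamp folder changes with every run, a small helper can locate the newest checkpoint. This is a sketch that assumes checkpoints are saved as `saved_cv/{timestamp}/checkpoints/model_best.pth`; `latest_best_checkpoint` is a hypothetical name, not part of the repo.

```python
from pathlib import Path


def latest_best_checkpoint(save_dir="saved_cv"):
    """Return the most recently modified model_best.pth under `save_dir`, or None."""
    candidates = list(Path(save_dir).glob("*/checkpoints/model_best.pth"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


print(latest_best_checkpoint())
```

The printed path can then be passed to `run.py` with `-r`.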

In [4]:
!python run.py UrbanSound8K/audio/fold10/100795-3-0-0.wav -r saved_cv/0515_171217/checkpoints/model_best.pth

dog_bark 0.9858338236808777


Running inference on the supplied audio, the model predicted a dog bark with about 98.6% confidence.
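
The prediction can be checked against the ground truth, because UrbanSound8K file names encode the class ID as `[fsID]-[classID]-[occurrenceID]-[sliceID].wav`. The sketch below assumes the class ID order given in the dataset documentation and the `label score` output format shown above; `label_from_filename` and `parse_prediction` are hypothetical helpers.

```python
from pathlib import Path

# Class IDs 0-9 as listed in the UrbanSound8K documentation.
CLASS_NAMES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]


def label_from_filename(path):
    """Read the class name from an UrbanSound8K file name."""
    class_id = int(Path(path).stem.split("-")[1])
    return CLASS_NAMES[class_id]


def parse_prediction(line):
    """Split an output line such as 'dog_bark 0.98' into (label, confidence)."""
    label, score = line.rsplit(maxsplit=1)
    return label, float(score)


predicted, confidence = parse_prediction("dog_bark 0.9858338236808777")
expected = label_from_filename("UrbanSound8K/audio/fold10/100795-3-0-0.wav")
print(predicted == expected, f"{confidence:.1%}")  # prints: True 98.6%
```

Here the class ID in the file name is 3, which corresponds to `dog_bark`, so the prediction matches the label.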