# Torch-KWT Tutorial

This notebook guides you through training and running inference on Google Speech Commands V2 (35 classes) with the [Torch-KWT](https://github.com/ID56/Torch-KWT) repository.

## Setup

### 1. Clone the repository

In [None]:
!git clone https://github.com/ID56/Torch-KWT.git

Cloning into 'Torch-KWT'...
remote: Enumerating objects: 99, done.
remote: Counting objects: 100% (99/99), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 99 (delta 41), reused 67 (delta 19), pack-reused 0
Unpacking objects: 100% (99/99), done.


In [None]:
%cd Torch-KWT/

/content/Torch-KWT


### 2. Install requirements

In [None]:
!pip install -qr requirements.txt

     |████████████████████████████████| 636 kB 5.2 MB/s
     |████████████████████████████████| 1.7 MB 40.9 MB/s
     |████████████████████████████████| 133 kB 52.5 MB/s
     |████████████████████████████████| 170 kB 45.2 MB/s
     |████████████████████████████████| 97 kB 6.7 MB/s
     |████████████████████████████████| 63 kB 1.7 MB/s
  Building wheel for subprocess32 (setup.py) ... done
  Building wheel for pathtools (setup.py) ... done


### 3. Download the Google Speech Commands V2 dataset

We'll be saving it to the `./data/` folder.

In [None]:
!sh ./download_gspeech_v2.sh ./data/

--2021-08-28 16:09:35--  http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 142.250.152.128, 2607:f8b0:4001:c56::80
Connecting to download.tensorflow.org (download.tensorflow.org)|142.250.152.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2428923189 (2.3G) [application/gzip]
Saving to: ‘STDOUT’


2021-08-28 16:10:32 (40.8 MB/s) - written to stdout [2428923189/2428923189]



In [None]:
!ls data

_background_noise_  five     left     README.md		tree
backward	    follow   LICENSE  right		two
bed		    forward  marvin   seven		up
bird		    four     nine     sheila		validation_list.txt
cat		    go	     no       six		visual
dog		    happy    off      stop		wow
down		    house    on       testing_list.txt	yes
eight		    learn    one      three		zero


As you can see, the dataset ships with `validation_list.txt` and `testing_list.txt`, which define the validation and test splits. We'll run a simple script, `make_data_list.py`, to generate the matching `training_list.txt`, as well as a `label_map.json` that maps numeric indices to class labels.

In [None]:
!python make_data_list.py -v ./data/validation_list.txt -t ./data/testing_list.txt -d ./data/ -o ./data/

Number of training samples: 84843
Number of validation samples: 9981
Number of test samples: 11005
Saved data lists and label map.
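The three counts printed above sum to 105,829, the total number of clips in Speech Commands V2, which is consistent with every clip landing in exactly one split. The split logic can be sketched in a few lines. This is a hypothetical illustration: `build_data_lists` is not part of Torch-KWT, and `make_data_list.py` may differ in details such as label ordering. It treats every class folder except `_background_noise_` as a label and assigns any clip not listed in the validation or test list to the training set.

```python
from __future__ import annotations

from pathlib import Path


def build_data_lists(data_root: str, val_list: str, test_list: str) -> tuple[list[str], dict[str, str]]:
    """Hypothetical re-implementation of the split logic (not the repo's make_data_list.py)."""
    root = Path(data_root)
    # Entries in the official lists are relative to data_root, e.g. "bed/0a7c2a8d_nohash_0.wav".
    val = set(Path(val_list).read_text().split())
    test = set(Path(test_list).read_text().split())

    # Every sub-folder is a class, except the background-noise folder.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir() and p.name != "_background_noise_")

    train = []
    for label in classes:
        for wav in sorted((root / label).glob("*.wav")):
            rel = f"{label}/{wav.name}"
            if rel not in val and rel not in test:
                train.append(rel)

    # JSON object keys are strings, so the numeric indices are stored as "0", "1", ...
    label_map = {str(i): label for i, label in enumerate(classes)}
    return train, label_map
```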


## Training

For training, we only need to provide the config file.

### 4. Set up the config file

For this example, we'll use the settings in `sample_configs/base_config.yaml`. You should be able to use this config to reproduce the results of the provided pretrained KWT-1 checkpoint if you follow its exact settings (training for 140 epochs / ~23,000 steps with `batch_size: 512`).
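As a quick check on the ~23,000-step figure, the step count follows from the training-set size printed earlier and the batch size. This sketch assumes the last partial batch of each epoch is kept; if the loader drops it, the total is 165 × 140 = 23,100, still about 23K.

```python
import math

n_train = 84843      # "Number of training samples" printed by make_data_list.py
batch_size = 512     # hparams.batch_size
n_epochs = 140       # full schedule used for the pretrained checkpoint

steps_per_epoch = math.ceil(n_train / batch_size)  # counts the last, partial batch
total_steps = steps_per_epoch * n_epochs
print(steps_per_epoch, total_steps)  # 166 23240
```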

We'll be training for 10 epochs in this example.

You can also log your runs with [wandb](https://wandb.ai). Either set `wandb_api_key` in the config to the path of a text file containing your API key, or set the `WANDB_API_KEY` environment variable:

```python
import os

os.environ["WANDB_API_KEY"] = "yourkey"
```
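If you'd rather keep the key out of the notebook itself, a minimal sketch is to read it from a text file and export it as the environment variable. The helper name `set_wandb_key_from_file` is hypothetical and not part of Torch-KWT.

```python
import os
from pathlib import Path


def set_wandb_key_from_file(key_path: str) -> None:
    """Hypothetical helper: export the API key stored in a text file."""
    # strip() drops the trailing newline that most editors add.
    os.environ["WANDB_API_KEY"] = Path(key_path).read_text().strip()
```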

We will not be using wandb in this example, but feel free to try it.

In [None]:
conf_str = """# sample config to run a demo training of 10 epochs

data_root: ./data/
train_list_file: ./data/training_list.txt
val_list_file: ./data/validation_list.txt
test_list_file: ./data/testing_list.txt
label_map: ./data/label_map.json

exp:
    wandb: False
    wandb_api_key: <path/to/api/key>
    proj_name: torch-kwt-1
    exp_dir: ./runs
    exp_name: exp-0.0.1
    device: auto
    log_freq: 20    # log every l_f steps
    log_to_file: True
    log_to_stdout: True
    val_freq: 1    # validate every v_f epochs
    n_workers: 1
    pin_memory: True
    cache: 2 # 0 -> no cache | 1 -> cache wavs | 2 -> cache specs; stops wav augments
    model_mode: 0 # 0 -> mfcc filters | 1 -> custom adaptive filters | 2 -> custom mfcc filters
    

hparams:
    seed: 0
    batch_size: 512
    n_epochs: 10
    l_smooth: 0.1

    audio:
        sr: 16000
        n_mels: 40
        n_fft: 480
        win_length: 480
        hop_length: 160
        center: False
    
    model:
        name: # if name is provided below settings will be ignored during model creation   
        input_res: [40, 98]
        patch_res: [40, 1]
        num_classes: 35
        mlp_dim: 256
        dim: 64
        heads: 1
        depth: 12
        dropout: 0.0
        emb_dropout: 0.1
        pre_norm: False

    optimizer:
        opt_type: adamw
        opt_kwargs:
            lr: 0.001
            weight_decay: 0.1
    
    scheduler:
        n_warmup: 10
        max_epochs: 140
        scheduler_type: cosine_annealing

    augment:
        # resample:
            # r_min: 0.85
            # r_max: 1.15
        
        # time_shift:
            # s_min: -0.1
            # s_max: 0.1

        # bg_noise:
            # bg_folder: ./data/_background_noise_/

        spec_aug:
            n_time_masks: 2
            time_mask_width: 25
            n_freq_masks: 2
            freq_mask_width: 7"""

!mkdir -p configs
with open("configs/kwt1_colab.yaml", "w+") as f:
    f.write(conf_str)
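The `input_res` and `patch_res` settings determine how each MFCC input is split into transformer tokens. With `patch_res: [40, 1]`, a patch spans all 40 coefficients of a single time frame, so the model sees one token per frame. A small sanity check, using a hypothetical helper `count_patches` (not part of Torch-KWT) that ignores any class token the model may prepend:

```python
from __future__ import annotations


def count_patches(input_res: list[int], patch_res: list[int]) -> int:
    """Number of non-overlapping patches (input tokens) per spectrogram.

    Hypothetical helper; an extra class token, if the model adds one, is not counted.
    """
    (h, w), (ph, pw) = input_res, patch_res
    if h % ph or w % pw:
        raise ValueError(f"input_res {input_res} is not divisible by patch_res {patch_res}")
    return (h // ph) * (w // pw)


# Values from the `model` section of the config above.
n_patches = count_patches([40, 98], [40, 1])
print(n_patches)  # 98
```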

### 5. Start training

Make sure you are using a GPU runtime.

To train for the full 140 epochs / ~23,000 steps used in the paper on free resources, we need to cut down on disk I/O and audio processing time. So we'll convert all the `.wav` files into MFCCs of shape `(40, 98)` up front and keep them in memory (this is what `cache: 2` in the config does). This caching step may take ~6 minutes.
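The `(40, 98)` shape follows from the `audio` settings in the config. The sketch below assumes every clip is padded or trimmed to exactly 1 s (16,000 samples) and uses the frame count of a non-centered STFT (`center: False`), where frames are taken only where a full window fits. It also estimates the cache size, assuming float32 features; Torch-KWT's actual memory footprint may differ.

```python
sr, n_fft, hop_length = 16000, 480, 160   # hparams.audio (win_length == n_fft here)
n_samples = sr * 1                        # assumption: each clip is exactly 1 s

# center=False: one frame per hop, as long as a full n_fft window fits.
n_frames = 1 + (n_samples - n_fft) // hop_length
print(n_frames)  # 98

# Rough size of the in-memory cache, assuming 4-byte float32 values.
n_mels = 40
bytes_per_clip = n_mels * n_frames * 4
train_gb = 84843 * bytes_per_clip / 1e9
val_gb = 9981 * bytes_per_clip / 1e9
print(f"{train_gb:.2f} GB train, {val_gb:.2f} GB val")  # 1.33 GB train, 0.16 GB val
```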

Because the cached inputs are already MFCCs, waveform augmentations such as resample, time_shift or bg_noise can't be applied; we only use spectral augmentation, with the settings from the paper.
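The `spec_aug` block in the config controls this masking. A minimal NumPy sketch of SpecAugment-style masking is below. It is written from the general SpecAugment idea rather than taken from Torch-KWT's code, and it assumes each `*_mask_width` is the maximum width of a mask and that masked values are set to 0.

```python
from __future__ import annotations

import numpy as np


def spec_augment(spec: np.ndarray, n_time_masks: int = 2, time_mask_width: int = 25,
                 n_freq_masks: int = 2, freq_mask_width: int = 7,
                 rng: np.random.Generator | None = None) -> np.ndarray:
    """Zero out random frequency bands and time spans of a (n_mels, n_frames) array.

    Hypothetical sketch, not Torch-KWT's implementation; widths are upper bounds.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()  # leave the cached input untouched
    n_freq, n_time = out.shape
    for _ in range(n_freq_masks):
        f = int(rng.integers(0, freq_mask_width + 1))  # width in [0, freq_mask_width]
        f0 = int(rng.integers(0, n_freq - f + 1))      # start so the band fits
        out[f0:f0 + f, :] = 0.0
    for _ in range(n_time_masks):
        t = int(rng.integers(0, time_mask_width + 1))  # width in [0, time_mask_width]
        t0 = int(rng.integers(0, n_time - t + 1))
        out[:, t0:t0 + t] = 0.0
    return out


spec = np.ones((40, 98), dtype=np.float32)  # stand-in for one cached MFCC
aug = spec_augment(spec, rng=np.random.default_rng(0))
print(aug.shape)  # (40, 98)
```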



> Note: You may notice a "Warning: Leaking Caffe2 thread-pool after fork." message after each epoch. This appears to be a known issue in torch 1.9 and can be ignored. [See more here.](https://github.com/pytorch/pytorch/issues/57273)

In [None]:
# !python train.py --conf configs/kwt1_colab.yaml

Set seed 0
Using settings:
 data_root: ./data/
exp:
  cache: 2
  device: &id001 !!python/object/apply:torch.device
  - cuda
  exp_dir: ./runs
  exp_name: exp-0.0.1
  log_freq: 20
  log_to_file: true
  log_to_stdout: true
  n_workers: 1
  pin_memory: true
  proj_name: torch-kwt-1
  save_dir: ./runs/exp-0.0.1
  val_freq: 1
  wandb: false
  wandb_api_key: <path/to/api/key>
hparams:
  audio:
    center: false
    hop_length: 160
    n_fft: 480
    n_mels: 40
    sr: 16000
    win_length: 480
  augment:
    spec_aug:
      freq_mask_width: 7
      n_freq_masks: 2
      n_time_masks: 2
      time_mask_width: 25
  batch_size: 512
  device: *id001
  l_smooth: 0.1
  model:
    depth: 12
    dim: 64
    dropout: 0.0
    emb_dropout: 0.1
    heads: 1
    input_res:
    - 40
    - 98
    mlp_dim: 256
    name: null
    num_classes: 35
    patch_res:
    - 40
    - 1
    pre_norm: false
  n_epochs: 10
  optimizer:
    opt_kwargs:
      lr: 0.001
      weight_decay: 0.1
    opt_type: adamw
  schedul

After training for 10 epochs, we get a validation accuracy of **~78.99%** and a test accuracy of **~76.52%**.

On Colab, each epoch takes ~84 s, plus ~3 s for validation. A complete training run like the paper's (140 epochs / ~23K steps) on Colab would therefore take around **3.4 hours**.
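The estimate is simple arithmetic on the measured per-epoch times. The same numbers also show what per-epoch time the Kaggle figure of under 2 hours implies.

```python
train_s, val_s, n_epochs = 84, 3, 140  # measured Colab times per epoch, full schedule

total_h = (train_s + val_s) * n_epochs / 3600
print(f"{total_h:.2f} h")  # 3.38 h

# Finishing 140 epochs in under 2 hours means under ~51 s per epoch (train + val).
max_epoch_s = 2 * 3600 / n_epochs
print(f"{max_epoch_s:.1f} s")  # 51.4 s
```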

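You may also try running Torch-KWT training on Kaggle, which I've found to be notably faster: full training takes less than **2 hours** there.

Instead of calling `train.py` from the command line, you can also drive training from Python by importing the pipeline functions from `train.py`: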

In [None]:
from train import caching_pipeline, import_model, import_optimization_methods, training_pipeline
from config_parser import get_config
from utils.misc import seed_everything

config = get_config('configs/kwt1_colab.yaml')
seed_everything(config["hparams"]["seed"])

In [None]:
trainloader, valloader = caching_pipeline(config)

In [None]:
model = import_model(config)
optimizer, criterion, schedulers = import_optimization_methods(config, model, trainloader)

In [None]:
training_pipeline(config, model, optimizer, criterion, trainloader, valloader, schedulers)