# Summary
The purpose of this code is to create a TensorFlow Lite model. To do this, we take in a data set, create a TensorFlow model, train the model on that data set, then convert the TF model into a TF Lite model. This TF Lite model can then be loaded onto a mobile device for audio classification. 

There are a variety of models that can be trained on a variety of data sets. This code is for training amplitude models, which are the most straightforward. This guide aims to describe all of the parameters in the training process, so that you can change them and build your own models. Remember, the end goal is to achieve the highest validation accuracy possible before loading that model onto a phone for use.

**Numbers to beat for our l6-data:** <br>
*A good goal--* 85% val_acc <br>
*Our best model--* 91% val_acc

# Imports
We use Keras (which is built on top of TensorFlow) to build and train our models. Librosa is used for audio processing.

In [1]:
# make sure kernel matches pip version
!pip3 install -r requirements.txt



In [1]:
%load_ext autoreload
%autoreload 2

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Tensorflow
import tensorflow as tf
from tensorflow.python.tools import freeze_graph
from tensorflow.python.tools import optimize_for_inference_lib

# Keras
import keras
from keras import regularizers
from keras.models import Sequential
from keras.layers import (Activation, Dense, Dropout, Flatten, Conv2D, Conv1D, 
                          MaxPooling2D, GlobalAveragePooling2D, MaxPooling1D, Lambda)
from keras.layers.normalization import BatchNormalization
from keras.callbacks import Callback, ReduceLROnPlateau, ModelCheckpoint
from keras.utils import to_categorical, multi_gpu_model
import keras.backend as K

import librosa
import multiprocessing as mp
import numpy as np
import scipy.io.wavfile
from scipy.fftpack import dct
from sklearn.model_selection import train_test_split
from tqdm import tqdm
import time
from pprint import pprint
import uuid
import glob
import math

Using TensorFlow backend.


# Constants

## General

**RAW_DATA_DIR:** Where the raw training data is located. In this directory, each folder name is a label, and its contents are .wav files corresponding to that label. See the "l6-data" directory for an example.

**AMP_PROCESSED_DATA_DIR:** Where the processed training data is to be stored. After processing the data, it will be populated with numpy files. Each numpy file is named after a label, and contains all of the training data for that label stored as a 3D numpy array. More info on this later.

**AUDIO_LENGTH:** The desired input size for the model, in samples. An input size of 44100 at 44100 Hz would be a one-second input.

**SAMPLE_RATE:** The sample rate of the microphone.
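
The relationship between `AUDIO_LENGTH` and `SAMPLE_RATE` is plain arithmetic. As a minimal sketch (the helper name `window_seconds` is hypothetical and not part of this notebook):

```python
def window_seconds(audio_length: int, sample_rate: int) -> float:
    """Duration in seconds of a model input of `audio_length` samples."""
    return audio_length / sample_rate

# Values taken from the constants cell of this notebook.
print(window_seconds(44100, 44100))  # a one-second input
print(window_seconds(22050, 44100))  # a half-second input
```

Halving `AUDIO_LENGTH` while keeping `SAMPLE_RATE` fixed halves the amount of audio the model sees per prediction.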

## Training

**channel:** We use a one-channel (mono) audio input.

**epochs:** The number of full passes the model makes over your training data. 200 is usually a good number; there are diminishing returns after a certain point.

**batch_size:** The number of training samples the model processes before each weight update. Larger batches use more memory but give smoother gradient estimates.

**verbose:** 1 prints training progress, 0 silences it. It's always a good idea to have this on.

**num_classes:** The number of classes (also referred to as labels) in your data set.
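
To see how `epochs` and `batch_size` interact, here is a rough sketch that counts weight updates. It assumes the usual Keras meaning of `batch_size` (samples per gradient update); the helper name `updates_per_run` and the sample count are made up for illustration.

```python
import math

def updates_per_run(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total gradient updates when every epoch walks the data in batches."""
    # The last batch of an epoch may be smaller than batch_size, hence ceil.
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs

# num_samples=10_000 is a hypothetical figure, not the size of our data set.
print(updates_per_run(num_samples=10_000, batch_size=128, epochs=200))
```

A smaller `batch_size` means more updates per epoch, which is one reason the two settings are usually tuned together.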

In [2]:
RAW_DATA_DIR = "data/"
AMP_PROCESSED_DATA_DIR = "amp-processed-data/"
MODEL_NAME = 'Audio_Recorder'
AUDIO_LENGTH = 44100
SAMPLE_RATE = 44100

channel = 1
epochs = 1
batch_size = 128
verbose = 1
num_classes = 6

# Data Processing

### get_labels(path)
**Input:** `RAW_DATA_DIR` <br>
**Output:** `Tuple (Labels, Indices of the labels, one-hot encoded labels)` <br>
**Description:** Gets all labels (the names of the non-hidden entries, i.e. the label folders) inside your raw data directory. The indices are the positions of these labels in sorted order, and each one-hot encoded vector is a vector of zeros whose length is the number of labels, with a 1 at the corresponding label's index. More information about one-hot encoding can be found online.
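
As an illustration of what `get_labels()` returns, the sketch below builds a throwaway directory and encodes its folder names. It is a plain-Python stand-in: `list_labels` and `one_hot` are hypothetical helpers, and `one_hot` only imitates what `to_categorical` produces in the notebook.

```python
import os
import tempfile

def list_labels(path):
    """Sorted, non-hidden entry names in `path`, one per class."""
    return [name for name in sorted(os.listdir(path)) if not name.startswith(".")]

def one_hot(index, num_classes):
    """Vector of zeros with a single 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

with tempfile.TemporaryDirectory() as raw_dir:
    # Hypothetical label folders; the real ones live under RAW_DATA_DIR.
    for name in ("Knock", "Coughing", ".hidden"):
        os.mkdir(os.path.join(raw_dir, name))

    labels = list_labels(raw_dir)
    encoded = [one_hot(i, len(labels)) for i in range(len(labels))]
    for label, vec in zip(labels, encoded):
        print(label, vec)
```

Because the names are sorted, a label's index stays stable across runs as long as the set of folders does not change.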

### label_to_amplitude_vecs(args)
**Input:** `Tuple (label, input_path, output_path, tqdm_position)`<br>
**Output:** Shape of the generated numpy file (stored under `output_path`)<br>
**Description:** This function is called by `process_data_amplitude()` in parallel to convert a label's raw .wav files to a single numpy file. This can take a while, so David made a super fancy tqdm display (hence the `tqdm_position` parameter). **This function will split the training data into overlapping arrays the size of the input length, then store them in a 3D numpy array under the `output_path` dir.** Each .wav file is loaded as a mono amplitude array with `librosa.load()` and normalized to zero mean and unit variance. The function then moves through the clip in steps of AUDIO_LENGTH / 2, so if a clip is 10.1 seconds long, and the input length is 44100 with a 44100 Hz sampling rate, it will be processed as 20 one-second clips: 19 full windows starting every half second from 0 s to 9.0 s, plus the final 0.6 s, which is padded with zeros to match the input length. This is done for all the .wav files in a label. The resulting numpy array contains the numbers that are actually trained on.
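
The overlapping split is easier to follow on a toy signal. The sketch below mirrors the loop in `label_to_amplitude_vecs()`, with plain lists standing in for numpy arrays; `split_with_overlap` is a hypothetical name used only here.

```python
def split_with_overlap(samples, window, hop):
    """Cut `samples` into windows of length `window`, advancing by `hop`.

    Full windows are taken while more than `window` samples remain,
    then the leftover tail is zero-padded to the window length.
    """
    windows = []
    remaining = list(samples)
    while len(remaining) > window:
        windows.append(remaining[:window])
        remaining = remaining[hop:]
    # Pad whatever is left with zeros up to the window length.
    windows.append(remaining + [0] * (window - len(remaining)))
    return windows

# Toy signal: 9 samples, window of 4, hop of 2 (half the window).
for w in split_with_overlap(range(1, 10), window=4, hop=2):
    print(w)
```

Each sample (apart from those near the ends) lands in two windows, which roughly doubles the amount of training data compared with a non-overlapping split.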

### process_data_amplitude(input_path, output_path)
**Input:** RAW_DATA_DIR, AMP_PROCESSED_DATA_DIR<br>
**Output:** Shape of the numpy files stored in output_path<br>
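**Description:** Calls `label_to_amplitude_vecs()` once per label, in parallel, and populates `output_path` with numpy files. Each numpy file is named after a label, and its content is a 3D numpy array with all the training data for that label. The shape of this array is (number_of_clips, AUDIO_LENGTH, 1), where number_of_clips counts the overlapping windows cut from that label's files, not the files themselves. The trailing dimension of size 1 is the channel axis that the Keras layers expect.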
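
The array shape can be checked on a small scale. This is only a sketch: `audio_length` is a tiny stand-in for `AUDIO_LENGTH`, and the zero-filled clips are placeholders for real audio.

```python
import numpy as np

audio_length = 8  # tiny stand-in for the notebook's AUDIO_LENGTH of 44100

# Three placeholder clips, each shaped (audio_length, 1) like the
# vectors collected in label_to_amplitude_vecs().
clips = [np.zeros((audio_length, 1)) for _ in range(3)]

# Stacking a list of equally shaped 2D arrays adds a leading clip axis.
stacked = np.array(clips)
print(stacked.shape)  # (3, 8, 1)
```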

In [3]:
def get_labels(path):
    labels = [i for i in sorted(os.listdir(path)) if i[0] != "."]
    label_indices = np.arange(0, len(labels))
    return labels, label_indices, to_categorical(label_indices)

In [4]:
def label_to_amplitude_vecs(args):
    label, input_path, output_path, tqdm_position = args

    # Get all audio files for this label
    wavfiles = [os.path.join(input_path, label, wavfile) for wavfile in os.listdir(os.path.join(input_path, label))]

    # tqdm is amazing, so print all the things this way
    print(" ", end="", flush=True)
    twavs = tqdm(wavfiles, position=tqdm_position)
    
    vectors = []
    for i, wavfile in enumerate(twavs):
        # Load the audio file; this also works for .flac files
        audio_buf, _ = librosa.load(wavfile, mono=True, sr=SAMPLE_RATE)
        audio_buf = audio_buf.reshape(-1, 1)
        audio_buf = (audio_buf - np.mean(audio_buf)) / np.std(audio_buf)            
        remaining_buf = audio_buf.copy()
        while remaining_buf.shape[0] > AUDIO_LENGTH:
            # Add the first AUDIO_LENGTH of the buffer as a new vector to train on
            new_buf = remaining_buf[ : AUDIO_LENGTH ]
            vectors.append(new_buf)
            
            # Remove 1/2 * AUDIO_LENGTH from the front of the buffer
            remaining_buf = remaining_buf[ int(AUDIO_LENGTH / 2) : ]
            
        # Whatever is left, pad and stick in the training data
        remaining_buf = np.concatenate((remaining_buf, np.zeros(shape=(AUDIO_LENGTH - len(remaining_buf), 1))))
        vectors.append(remaining_buf)
        
        # Update tqdm
        twavs.set_description("Label - '{}'".format(label))
        twavs.refresh()
    np_vectors = np.array(vectors)
    np.save(os.path.join(output_path, label + '.npy'), np_vectors)
    return np_vectors.shape
    
def process_data_amplitude(input_path, output_path):
    
    labels, _, _ = get_labels(input_path)
    pool = mp.Pool()
    result = pool.map(label_to_amplitude_vecs, 
                     [(label, input_path, output_path, tqdm_position) 
                          for tqdm_position, label in enumerate(labels)])
    pool.close()
    return result

In [5]:
!ls amp-processed-data

In [6]:
process_data_amplitude(RAW_DATA_DIR, AMP_PROCESSED_DATA_DIR)

      


[Progress output condensed: six interleaved tqdm bars, one per label. Files per label: Computer_keyboard 202, Coughing 286, Finger_snapping 169, Keys_jangling 146, Knock 407, Laughing 398.]



Label - 'Knock':  57%|█████▋    | 231/407 [00:25<00:08, 20.56it/s][A[A[A[A



Label - 'Computer_keyboard':  59%|█████▉    | 120/202 [00:25<00:42,  1.94it/s]



Label - 'Computer_keyboard':  61%|██████    | 123/202 [00:25<00:29,  2.68it/s]



Label - 'Knock':  57%|█████▋    | 234/407 [00:25<00:10, 16.65it/s][A[A[A[A



Label - 'Computer_keyboard':  61%|██████    | 123/202 [00:25<00:29,  2.68it/s]



Label - 'Knock':  57%|█████▋    | 234/407 [00:25<00:10, 16.65it/s][A[A[A[A



Label - 'Knock':  57%|█████▋    | 234/407 [00:25<00:10, 16.65it/s][A[A[A[A



Label - 'Knock':  58%|█████▊    | 236/407 [00:25<00:10, 16.62it/s][A[A[A[A



Label - 'Knock':  58%|█████▊    | 236/407 [00:25<00:10, 16.62it/s][A[A[A[A



Label - 'Knock':

Label - 'Laughing':  51%|█████     | 202/398 [00:26<01:17,  2.54it/s][A[A[A[A[A



Label - 'Knock':  64%|██████▍   | 262/407 [00:26<00:06, 20.81it/s][A[A[A[A




Label - 'Laughing':  51%|█████▏    | 204/398 [00:26<01:04,  3.02it/s][A[A[A[A[A



Label - 'Knock':  65%|██████▌   | 265/407 [00:26<00:06, 22.30it/s][A[A[A[A



Label - 'Knock':  65%|██████▌   | 265/407 [00:26<00:06, 22.30it/s][A[A[A[A



Label - 'Knock':  65%|██████▌   | 265/407 [00:26<00:06, 22.30it/s][A[A[A[A



Label - 'Knock':  65%|██████▌   | 265/407 [00:26<00:06, 22.30it/s][A[A[A[A



Label - 'Knock':  65%|██████▌   | 265/407 [00:26<00:06, 22.30it/s][A[A[A[A



Label - 'Knock':  65%|██████▌   | 265/407 [00:26<00:06, 22.30it/s][A[A[A[A



Label - 'Knock':  65%|██████▌   | 265/407 [00:26<00:06, 22.30it/s][A[A[A[A



Label - 'Knock':  66%|██████▌   | 268/407 [00:26<00:05, 24.10it/s][A[A[A[A



Label - 'Knock':  66%|██████▌   | 268/407 [00:26<00:05, 24.10it/s][A[A[A[A



Lab

Label - 'Knock':  71%|███████   | 287/407 [00:27<00:05, 20.89it/s][A[A[A[A



Label - 'Knock':  71%|███████   | 287/407 [00:27<00:05, 20.89it/s][A[A[A[A



Label - 'Knock':  71%|███████   | 287/407 [00:27<00:05, 20.89it/s][A[A[A[A



Label - 'Knock':  71%|███████   | 287/407 [00:27<00:05, 20.89it/s][A[A[A[A



Label - 'Computer_keyboard':  68%|██████▊   | 138/202 [00:27<00:17,  3.64it/s]



Label - 'Knock':  71%|███████   | 287/407 [00:27<00:05, 20.89it/s][A[A[A[A



Label - 'Computer_keyboard':  68%|██████▊   | 138/202 [00:27<00:17,  3.64it/s]



Label - 'Computer_keyboard':  68%|██████▊   | 138/202 [00:27<00:17,  3.64it/s]




Label - 'Laughing':  55%|█████▌    | 220/398 [00:27<00:13, 12.88it/s][A[A[A[A[A




Label - 'Laughing':  55%|█████▌    | 220/398 [00:27<00:13, 12.88it/s][A[A[A[A[A



Label - 'Knock':  71%|███████▏  | 291/407 [00:27<00:05, 23.03it/s][A[A[A[A



Label - 'Knock':  71%|███████▏  | 291/407 [00:27<00:05, 23.03it/s][A[A[A[A



La

Label - 'Laughing':  58%|█████▊    | 232/398 [00:29<00:11, 14.29it/s][A[A[A[A[A



Label - 'Knock':  78%|███████▊  | 316/407 [00:29<00:05, 17.24it/s][A[A[A[A




Label - 'Laughing':  58%|█████▊    | 232/398 [00:29<00:11, 14.29it/s][A[A[A[A[A



Label - 'Knock':  78%|███████▊  | 316/407 [00:29<00:05, 17.24it/s][A[A[A[A




Label - 'Laughing':  58%|█████▊    | 232/398 [00:29<00:11, 14.29it/s][A[A[A[A[A



Label - 'Knock':  78%|███████▊  | 318/407 [00:29<00:05, 17.67it/s][A[A[A[A




Label - 'Laughing':  59%|█████▉    | 235/398 [00:29<00:20,  7.96it/s][A[A[A[A[A



Label - 'Knock':  78%|███████▊  | 318/407 [00:29<00:05, 17.67it/s][A[A[A[A



Label - 'Knock':  78%|███████▊  | 318/407 [00:29<00:05, 17.67it/s][A[A[A[A




Label - 'Laughing':  59%|█████▉    | 235/398 [00:29<00:20,  7.96it/s][A[A[A[A[A




Label - 'Laughing':  59%|█████▉    | 235/398 [00:29<00:20,  7.96it/s][A[A[A[A[A



Label - 'Knock':  78%|███████▊  | 318/407 [00:29<00:05, 1

Label - 'Knock':  85%|████████▌ | 347/407 [00:31<00:04, 13.52it/s][A[A[A[A



Label - 'Knock':  85%|████████▌ | 347/407 [00:31<00:04, 13.52it/s][A[A[A[A



Label - 'Knock':  85%|████████▌ | 347/407 [00:31<00:04, 13.52it/s][A[A[A[A



Label - 'Knock':  85%|████████▌ | 347/407 [00:31<00:04, 13.52it/s][A[A[A[A



Label - 'Knock':  85%|████████▌ | 347/407 [00:31<00:04, 13.52it/s][A[A[A[A



Label - 'Knock':  85%|████████▌ | 347/407 [00:31<00:04, 13.52it/s][A[A[A[A



Label - 'Knock':  85%|████████▌ | 347/407 [00:31<00:04, 13.52it/s][A[A[A[A



Label - 'Knock':  86%|████████▌ | 351/407 [00:31<00:03, 14.33it/s][A[A[A[A



Label - 'Knock':  86%|████████▌ | 351/407 [00:31<00:03, 14.33it/s][A[A[A[A



Label - 'Knock':  86%|████████▌ | 351/407 [00:31<00:03, 14.33it/s][A[A[A[A



Label - 'Knock':  86%|████████▌ | 351/407 [00:31<00:03, 14.33it/s][A[A[A[A



Label - 'Knock':  86%|████████▌ | 351/407 [00:31<00:03, 14.33it/s][A[A[A[A



Label - 'Knock':

Label - 'Knock':  96%|█████████▌| 390/407 [00:33<00:01, 16.48it/s][A[A[A[A



Label - 'Knock':  96%|█████████▌| 390/407 [00:33<00:01, 16.48it/s][A[A[A[A



Label - 'Knock':  96%|█████████▌| 390/407 [00:33<00:01, 16.48it/s][A[A[A[A



Label - 'Knock':  96%|█████████▌| 390/407 [00:33<00:01, 16.48it/s][A[A[A[A



Label - 'Knock':  96%|█████████▌| 390/407 [00:33<00:01, 16.48it/s][A[A[A[A



Label - 'Knock':  96%|█████████▌| 390/407 [00:33<00:01, 16.48it/s][A[A[A[A



Label - 'Knock':  96%|█████████▌| 390/407 [00:33<00:01, 16.48it/s][A[A[A[A



Label - 'Knock':  96%|█████████▌| 390/407 [00:33<00:01, 16.48it/s][A[A[A[A



Label - 'Knock':  96%|█████████▌| 390/407 [00:33<00:01, 16.48it/s][A[A[A[A



Label - 'Knock':  97%|█████████▋| 394/407 [00:33<00:00, 19.49it/s][A[A[A[A



Label - 'Knock':  97%|█████████▋| 394/407 [00:33<00:00, 19.49it/s][A[A[A[A



Label - 'Knock':  97%|█████████▋| 394/407 [00:33<00:00, 19.49it/s][A[A[A[A



Label - 'Knock':

Label - 'Laughing':  67%|██████▋   | 267/398 [00:45<00:33,  3.95it/s][A[A[A[A[A




Label - 'Laughing':  67%|██████▋   | 267/398 [00:45<00:33,  3.95it/s][A[A[A[A[A




Label - 'Laughing':  67%|██████▋   | 267/398 [00:45<00:33,  3.95it/s][A[A[A[A[A




Label - 'Laughing':  67%|██████▋   | 267/398 [00:45<00:33,  3.95it/s][A[A[A[A[A




Label - 'Laughing':  67%|██████▋   | 267/398 [00:45<00:33,  3.95it/s][A[A[A[A[A




Label - 'Laughing':  67%|██████▋   | 267/398 [00:45<00:33,  3.95it/s][A[A[A[A[A




Label - 'Laughing':  67%|██████▋   | 267/398 [00:45<00:33,  3.95it/s][A[A[A[A[A




Label - 'Laughing':  67%|██████▋   | 267/398 [00:45<00:33,  3.95it/s][A[A[A[A[A




Label - 'Laughing':  67%|██████▋   | 267/398 [00:45<00:33,  3.95it/s][A[A[A[A[A




Label - 'Laughing':  68%|██████▊   | 271/398 [00:45<00:23,  5.35it/s][A[A[A[A[A




Label - 'Laughing':  68%|██████▊   | 271/398 [00:45<00:23,  5.35it/s][A[A[A[A[A




Label - 'Laughing':  

Label - 'Laughing':  77%|███████▋  | 306/398 [00:46<00:03, 24.75it/s][A[A[A[A[A




Label - 'Laughing':  77%|███████▋  | 306/398 [00:46<00:03, 24.75it/s][A[A[A[A[A




Label - 'Laughing':  77%|███████▋  | 306/398 [00:46<00:03, 24.75it/s][A[A[A[A[A




Label - 'Laughing':  77%|███████▋  | 306/398 [00:47<00:03, 24.75it/s][A[A[A[A[A




Label - 'Laughing':  78%|███████▊  | 309/398 [00:47<00:06, 14.65it/s][A[A[A[A[A




Label - 'Laughing':  78%|███████▊  | 309/398 [00:47<00:06, 14.65it/s][A[A[A[A[A




Label - 'Laughing':  78%|███████▊  | 309/398 [00:47<00:06, 14.65it/s][A[A[A[A[A




Label - 'Laughing':  78%|███████▊  | 309/398 [00:47<00:06, 14.65it/s][A[A[A[A[A




Label - 'Laughing':  78%|███████▊  | 309/398 [00:47<00:06, 14.65it/s][A[A[A[A[A




Label - 'Laughing':  78%|███████▊  | 309/398 [00:47<00:06, 14.65it/s][A[A[A[A[A




Label - 'Laughing':  78%|███████▊  | 309/398 [00:47<00:06, 14.65it/s][A[A[A[A[A




Label - 'Laughing':  

Label - 'Laughing':  87%|████████▋ | 347/398 [00:48<00:01, 27.16it/s][A[A[A[A[A




Label - 'Laughing':  87%|████████▋ | 347/398 [00:48<00:01, 27.16it/s][A[A[A[A[A




Label - 'Laughing':  87%|████████▋ | 347/398 [00:48<00:01, 27.16it/s][A[A[A[A[A




Label - 'Laughing':  87%|████████▋ | 347/398 [00:48<00:01, 27.16it/s][A[A[A[A[A




Label - 'Laughing':  87%|████████▋ | 347/398 [00:48<00:01, 27.16it/s][A[A[A[A[A




Label - 'Laughing':  87%|████████▋ | 347/398 [00:48<00:01, 27.16it/s][A[A[A[A[A




Label - 'Laughing':  87%|████████▋ | 347/398 [00:48<00:01, 27.16it/s][A[A[A[A[A




Label - 'Laughing':  87%|████████▋ | 347/398 [00:48<00:01, 27.16it/s][A[A[A[A[A




Label - 'Laughing':  88%|████████▊ | 351/398 [00:48<00:01, 28.55it/s][A[A[A[A[A




Label - 'Laughing':  88%|████████▊ | 351/398 [00:48<00:01, 28.55it/s][A[A[A[A[A




Label - 'Laughing':  88%|████████▊ | 351/398 [00:48<00:01, 28.55it/s][A[A[A[A[A




Label - 'Laughing':  

Label - 'Laughing':  97%|█████████▋| 386/398 [00:50<00:00, 28.92it/s][A[A[A[A[A




Label - 'Laughing':  97%|█████████▋| 386/398 [00:50<00:00, 28.92it/s][A[A[A[A[A




Label - 'Laughing':  97%|█████████▋| 386/398 [00:50<00:00, 28.92it/s][A[A[A[A[A




Label - 'Laughing':  97%|█████████▋| 386/398 [00:50<00:00, 28.92it/s][A[A[A[A[A




Label - 'Laughing':  97%|█████████▋| 386/398 [00:50<00:00, 28.92it/s][A[A[A[A[A




Label - 'Laughing':  98%|█████████▊| 390/398 [00:50<00:00, 29.71it/s][A[A[A[A[A




Label - 'Laughing':  98%|█████████▊| 390/398 [00:50<00:00, 29.71it/s][A[A[A[A[A




Label - 'Laughing':  98%|█████████▊| 390/398 [00:50<00:00, 29.71it/s][A[A[A[A[A




Label - 'Laughing':  98%|█████████▊| 390/398 [00:50<00:00, 29.71it/s][A[A[A[A[A




Label - 'Laughing':  98%|█████████▊| 390/398 [00:50<00:00, 29.71it/s][A[A[A[A[A




Label - 'Laughing':  98%|█████████▊| 390/398 [00:50<00:00, 29.71it/s][A[A[A[A[A




Label - 'Laughing':  

[(4255, 44100, 1),
 (2930, 44100, 1),
 (911, 44100, 1),
 (2204, 44100, 1),
 (3043, 44100, 1),
 (6057, 44100, 1)]

# Training the Model

### get_train_test(split_ratio: float)
**Inputs:** `split_ratio` <br>
**Outputs:** `X_train, X_test, y_train, y_test` <br>
**Description:** Uses sklearn's `train_test_split` to split the processed data into training data and test data. We use a `split_ratio` of 0.75, meaning 75% of the clips go to training and the remaining 25% to testing. X_train is the training data, which is a 3D array with dimensions (number_of_clips, input_size, 1). Again, the third dimension is a requirement from Keras. y_train is a 1D array containing the label index that corresponds to each clip in X_train. X_test and y_test are the same, but for testing data instead of training data. The testing data is used to calculate accuracy metrics during training.
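
As a rough check on what the split produces, the sketch below works out the train and test sizes by hand. It assumes the six shapes printed above are the per-label clip arrays, and that a fractional test size is rounded up to a whole number of samples (which is how sklearn is understood to behave). `split_sizes` is a hypothetical helper, not part of the notebook.

```python
import math

# Clip counts per label, read off the shapes printed above
# (assumed to be one processed array per label).
clips_per_label = [4255, 2930, 911, 2204, 3043, 6057]


def split_sizes(n_samples, split_ratio):
    """Return (n_train, n_test) when split_ratio is the training fraction."""
    assert 0 < split_ratio < 1
    # get_train_test passes test_size = 1 - split_ratio; the fractional
    # test size is assumed to be rounded up to a whole number of samples.
    n_test = math.ceil((1 - split_ratio) * n_samples)
    return n_samples - n_test, n_test


n_total = sum(clips_per_label)
n_train, n_test = split_sizes(n_total, 0.75)
print(n_total, n_train, n_test)
```

Under those assumptions, `X_train` would hold 14550 clips and `X_test` 4850, each of shape (44100, 1).
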
### get_model()
**Inputs:** None <br>
**Outputs:** A Keras `Sequential` model <br>
**Description:** This function is where the model itself is built. The input size must match the clip length (`AUDIO_LENGTH`), and the output must be a vector of scores, one per label (so an array of length `num_classes`). The label with the highest score is the final classification. The models we use are CNNs, which are built layer by layer. In general, a larger input leaves room for more layers to learn from that data. Because the amplitude input is fairly large (44100 elements, compared to just 11136 for MFCC), this model should be fairly large as well. Any model with the proper input and output sizes can be used here. In our case, the input shape is (44100, 1), as Keras requires the extra dimension.
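
To get a feel for how quickly the time dimension shrinks, the sketch below traces the sequence length through the strided first convolution and the four pooling layers of the `get_model()` defined in this notebook. It assumes the usual output-length rules for `padding='same'` convolutions and default (`'valid'`) max pooling; the helper names are hypothetical.

```python
import math


def conv_same_len(n, stride):
    """Output length of a Conv1D layer with padding='same'."""
    return math.ceil(n / stride)


def pool_valid_len(n, pool_size):
    """Output length of MaxPooling1D with 'valid' padding and stride == pool_size."""
    return (n - pool_size) // pool_size + 1


def trace_time_steps(audio_length=44100):
    """Follow the time dimension through the layers that change it."""
    steps = [audio_length]
    # First Conv1D: kernel_size=80, strides=4, padding='same'
    steps.append(conv_same_len(steps[-1], stride=4))
    # Four MaxPooling1D(pool_size=4) layers; the stride-1 'same'
    # convolutions in between leave the length unchanged.
    for _ in range(4):
        steps.append(pool_valid_len(steps[-1], 4))
    return steps


print(trace_time_steps())
```

By this count the last convolutional block sees 43 time steps of 512 channels, which the `Lambda` layer averages into a single 512-element vector before the softmax layer.
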

### train(model, X_train, y_train_hot, X_test, y_test_hot)
**Inputs:** `model, X_train, y_train_hot, X_test, y_test_hot`<br>
**Outputs:** The given model is trained on the given data. <br>
**Description:** We use `'adam'` optimization during training, as it usually results in much higher accuracies. The model is also saved each time it reaches a new best validation accuracy, so that we can retrieve the model from the epoch with the highest accuracy. These checkpoints are saved as `out/weights-{val_acc}.hdf5` (with `val_acc` rounded to two decimals) and can be converted to TF Lite models later.
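
Since several checkpoints can pile up during a run, a small helper can pick out the best one by parsing the accuracy from the file name. This is only a sketch: `best_checkpoint` is a hypothetical helper and the file list is made up for illustration.

```python
import re

# Matches names such as "out/weights-0.91.hdf5"
CHECKPOINT_PATTERN = re.compile(r"weights-(\d+\.\d+)\.hdf5$")


def best_checkpoint(filenames):
    """Return the (filename, val_acc) pair with the highest accuracy, or None."""
    scored = []
    for name in filenames:
        match = CHECKPOINT_PATTERN.search(name)
        if match:
            scored.append((name, float(match.group(1))))
    return max(scored, key=lambda pair: pair[1]) if scored else None


# Hypothetical directory listing, for illustration only
example_files = [
    "out/weights-0.62.hdf5",
    "out/weights-0.85.hdf5",
    "out/weights-0.91.hdf5",
    "out/amplitude-l6.hdf5",  # final model, not a checkpoint
]
print(best_checkpoint(example_files))
```

In practice the list could come from `glob.glob("out/weights-*.hdf5")`.
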

In [None]:
def get_train_test(split_ratio: float):
    """Load every processed label array and split the clips into train and test sets."""
    assert 0 < split_ratio < 1

    # Get available labels
    labels, indices, _ = get_labels(RAW_DATA_DIR)

    # Start with the first label's clips; their label index is 0
    X = np.load(AMP_PROCESSED_DATA_DIR + labels[0] + '.npy')
    y = np.zeros(X.shape[0])

    # Append the remaining labels' clips to X and their label indices to y
    for i, label in tqdm(enumerate(labels[1:])):
        path = AMP_PROCESSED_DATA_DIR + label + '.npy'
        x = np.load(path)
        print(path)
        print(X.shape)
        print(x.shape)
        X = np.vstack((X, x))
        y = np.append(y, np.full(x.shape[0], fill_value=(i + 1)))

    assert X.shape[0] == len(y)

    # split_ratio is the training fraction, so the test fraction is the remainder
    return train_test_split(X, y, test_size=(1 - split_ratio), random_state=42, shuffle=True)

X_train, X_test, y_train, y_test = get_train_test(0.75)
        
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

print("Converting to categorical")
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

print(y_train.shape)
print(y_test.shape)

In [None]:
def get_model():
    m = Sequential()
    m.add(Conv1D(64,
                 input_shape=[AUDIO_LENGTH, 1],
                 kernel_size=80,
                 strides=4,
                 padding='same',
                 kernel_initializer='glorot_uniform',
                 kernel_regularizer=regularizers.l2(l=0.0001),
                 name='voice'))
    m.add(BatchNormalization())
    m.add(Activation('relu'))
    m.add(MaxPooling1D(pool_size=4, strides=None))

    for i in range(2):
        m.add(Conv1D(64,
                     kernel_size=3,
                     strides=1,
                     padding='same',
                     kernel_initializer='glorot_uniform',
                     kernel_regularizer=regularizers.l2(l=0.0001)))
        m.add(BatchNormalization())
        m.add(Activation('relu'))
    m.add(MaxPooling1D(pool_size=4, strides=None))

    for i in range(2):
        m.add(Conv1D(128,
                     kernel_size=3,
                     strides=1,
                     padding='same',
                     kernel_initializer='glorot_uniform',
                     kernel_regularizer=regularizers.l2(l=0.0001)))
        m.add(BatchNormalization())
        m.add(Activation('relu'))
    m.add(MaxPooling1D(pool_size=4, strides=None))

    for i in range(3):
        m.add(Conv1D(256,
                     kernel_size=3,
                     strides=1,
                     padding='same',
                     kernel_initializer='glorot_uniform',
                     kernel_regularizer=regularizers.l2(l=0.0001)))
        m.add(BatchNormalization())
        m.add(Activation('relu'))
    m.add(MaxPooling1D(pool_size=4, strides=None))

    for i in range(2):
        m.add(Conv1D(512,
                     kernel_size=3,
                     strides=1,
                     padding='same',
                     kernel_initializer='glorot_uniform',
                     kernel_regularizer=regularizers.l2(l=0.0001)))
        m.add(BatchNormalization())
        m.add(Activation('relu'))

    m.add(Lambda(lambda x: K.mean(x, axis=1))) # Same as GAP for 1D Conv Layer
    m.add(Dense(num_classes, activation='softmax', name='label'))
    return m

In [None]:
def train(model, X_train, y_train_hot, X_test, y_test_hot):
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
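    model.summary()  # prints the summary itself; wrapping it in print() also prints "None"

    # Note: 'val_acc' is the metric name in older Keras releases; newer ones report 'val_accuracy'.
    # Halve the learning rate after 10 epochs without improvement in validation accuracy
    reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.5, patience=10, min_lr=0.0001, verbose=1)
    # Save a checkpoint each time the validation accuracy reaches a new best
    mcp_save = ModelCheckpoint('out/weights-{val_acc:.2f}.hdf5', save_best_only=True, monitor='val_acc', mode='max')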
    model.fit(X_train, 
              y_train_hot, 
              batch_size=batch_size, 
              epochs=epochs, 
              verbose=verbose, 
              validation_data=(X_test, y_test_hot),
              callbacks=[reduce_lr, mcp_save],
              shuffle=True)
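
The `ReduceLROnPlateau` callback in `train` halves the learning rate whenever validation accuracy stalls for 10 epochs, but never lets it drop below `min_lr`. The sketch below lists the values the learning rate would step through, assuming training starts from Adam's default rate of 0.001; `plateau_schedule` is a hypothetical helper that only mimics the arithmetic, not the callback itself.

```python
def plateau_schedule(initial_lr=0.001, factor=0.5, min_lr=0.0001, reductions=6):
    """Learning rates after each successive plateau-triggered reduction."""
    rates = [initial_lr]
    for _ in range(reductions):
        # Multiply by the factor, but never go below the floor
        rates.append(max(rates[-1] * factor, min_lr))
    return rates


for step, lr in enumerate(plateau_schedule()):
    print("after {} reductions: lr = {:.6f}".format(step, lr))
```

With these settings the floor is reached on the fourth reduction, so later plateaus no longer change the learning rate.
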

In [None]:
# Create the output directory if it does not exist yet
os.makedirs('out', exist_ok=True)
        
model = get_model()
train(model, X_train, y_train, X_test, y_test)
model.save("out/amplitude-l6.hdf5")

# Converting to TF Lite
**frozen_model_name:** The saved Keras model (`.hdf5`) to convert. These are saved automatically in the `out/` directory during training.<br>
**tf_lite_model_name:** The desired name of your TF Lite model. We use the format `dataset_feature_'i'inputsize_'l'labelsize`. So, a model trained on 6 office sounds with 1 second of amplitude at a sample rate of 44100 Hz would be named `OfficeSounds_Amplitude_i44100_l6.tflite`.
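
The naming format can be generated rather than typed by hand, which helps keep names consistent across models. A minimal sketch, where `tflite_model_name` is a hypothetical helper and not part of the notebook:

```python
def tflite_model_name(dataset, feature, input_size, num_labels):
    """Build a file name of the form dataset_feature_i<inputsize>_l<labelsize>.tflite."""
    return "{}_{}_i{}_l{}.tflite".format(dataset, feature, input_size, num_labels)


print(tflite_model_name("OfficeSounds", "Amplitude", 44100, 6))
```
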

In [None]:
frozen_model_name = 'out/weights-0.91.hdf5'
tf_lite_model_name = "amplitude-l6-acc91.tflite"

converter = tf.lite.TFLiteConverter.from_keras_model_file(frozen_model_name)
tfmodel = converter.convert()
with open(tf_lite_model_name, "wb") as f:
    f.write(tfmodel)

# Predicting
A possibly unfinished, definitely never used function to actually predict labels on the computer side. 

In [None]:
def predict(filepath, model):
    """Predict the classification of a single audio sample.

    Returns the label index of the segment predicted with the highest confidence.
    """
    audio_buf, _ = librosa.load(filepath, mono=True, sr=SAMPLE_RATE)
    audio_buf = audio_buf.reshape(-1, 1)
    audio_buf = (audio_buf - np.mean(audio_buf)) / np.std(audio_buf)

    remaining_buf = audio_buf.copy()
    wave_vectors = []
    while remaining_buf.shape[0] > AUDIO_LENGTH:
        # Take the first AUDIO_LENGTH samples of the buffer as a new segment
        wave_vectors.append(remaining_buf[:AUDIO_LENGTH])

        # Shrink the buffer by AUDIO_LENGTH
        remaining_buf = remaining_buf[AUDIO_LENGTH:]

    # Zero-pad whatever is left to AUDIO_LENGTH and keep it as the last segment
    padding = np.zeros(shape=(AUDIO_LENGTH - len(remaining_buf), 1))
    wave_vectors.append(np.concatenate((remaining_buf, padding)))

    # wave_vectors now contains every segment of the provided audio input.
    # Stack the segments into one batch of shape (num_segments, AUDIO_LENGTH, 1)
    # and predict on all of them at once.
    label_vecs = model.predict(np.array(wave_vectors))
    predicted_labels = np.argmax(label_vecs, axis=1)
    predicted_confs = np.max(label_vecs, axis=1)

    print(predicted_labels)
    print(predicted_confs)

    # Return the label recognized with the highest confidence
    best_total_label_idx = int(predicted_labels[np.argmax(predicted_confs)])

    return best_total_label_idx
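
When a file is split into several segments, there is more than one way to combine the per-segment predictions. The sketch below contrasts a majority vote with picking the single most confident segment. The helper names and the example predictions are hypothetical and only illustrate how the two strategies can disagree.

```python
from collections import Counter


def majority_vote(segment_labels):
    """Label index predicted for the largest number of segments."""
    return Counter(segment_labels).most_common(1)[0][0]


def most_confident(segment_labels, segment_confs):
    """Label index of the single segment with the highest confidence."""
    best = max(range(len(segment_confs)), key=lambda i: segment_confs[i])
    return segment_labels[best]


# Hypothetical per-segment predictions for one audio file
labels_per_segment = [3, 3, 1, 3]
confs_per_segment = [0.55, 0.60, 0.97, 0.58]

print(majority_vote(labels_per_segment))
print(most_confident(labels_per_segment, confs_per_segment))
```

Here the majority vote picks label 3, while the most confident segment picks label 1. Which strategy works better depends on the data, and has not been evaluated in this notebook.
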

In [None]:
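import glob

# Label names in the same order as the label indices used for training
labels, _, _ = get_labels(RAW_DATA_DIR)
EXPECTED_LABEL_IDX = 3  # index of the correct label for the files below

num_correct = 0
num_total = 0
# files = glob.glob("esc50/keyboard_typing/*.wav")
files = glob.glob("keyboard.wav")
for f in files:
    label_idx = predict(f, model)

    string_label = labels[label_idx]
    print("Predicted label: {}".format(string_label))

    if label_idx == EXPECTED_LABEL_IDX:
        num_correct += 1
    num_total += 1
print("Number correct: {}".format(num_correct))
print("Total files: {}".format(num_total))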