# DCASE 2016 DNN Baseline
In this notebook, we implement the **Detection and Classification of Acoustic Scenes and Events challenge** 2016.

**Suppress warnings**: We need to supress warnings as we are going to use some sunctionality of an older version of *scikit-learn*.

In [41]:
import warnings
warnings.simplefilter("ignore")

## keras_aud library
Clone [keras_aud](https://github.com/channelCS/keras_aud) and place the **path** in *ka_path* variable so that we can import modules.

In [42]:
import sys
ka_path="your/path/here"
sys.path.insert(0, ka_path)
from keras_aud import aud_audio, aud_feature
from keras_aud import aud_model

**Make imports**: We now import libraries which shall be required in this task. We use
1. `csv` for reading `.csv` files.
2. `cPickle` for reading `.f` pickle files.
3. `scipy` for calculating `mode`
4. `time` for calciulating *time to load* pickle files.
5. `KFold` for kfold cross validation.
6. `to_categorical` for reshaping *labels* into `num_classes`.
7. `load_model` for loading a saved model.

In [31]:
import csv
import cPickle
import numpy as np
import scipy
import time
from sklearn.metrics import accuracy_score
from sklearn.cross_validation import KFold
from keras.utils import to_categorical
from keras.models import load_model


In [44]:
%%html
<style>
table {float:left}
</style>

**Define paths**: We now tell the paths for **audio**, **features** and **texts**.

| Variable        | Description                     |
| :-------------  |:-------------                   |
| `wav_dev_fd`    | Development audio folder        |
| `wav_eva_fd`    | Evaluation audio folder         |
| `dev_fd`        | Development features folder     |
| `eva_fd`        | Evaluation features folder      |
| `label_csv`     | Development meta file           |
| `txt_eva_path`  | Evaluation test file            |
| `new_p`         | Evaluation evaluate file        |


In [4]:
wav_dev_fd   = 'audio/dev'
wav_eva_fd   = 'audio/eva'
dev_fd       = 'features/Fe/logmel'
eva_fd       = 'features/Fe_eva/logmel'
label_csv    = 'texts/development/meta.txt'
txt_eva_path = 'texts/evaluation/test.txt'
new_p        = 'texts/evaluation/evaluate.txt'

**Define Labels**: We give the names of all the labels in the dataset

In [5]:
labels = [ 'bus', 'cafe/restaurant', 'car', 'city_center', 'forest_path', 'grocery_store', 'home', 'beach', 
            'library', 'metro_station', 'office', 'residential_area', 'train', 'tram', 'park' ]
lb_to_id = { lb:id for id, lb in enumerate(labels) }
id_to_lb = { id:lb for id, lb in enumerate(labels) }

## Extract features
This is where feature extraction takes place. We pass the
1. Feature name such as mel, logmel, mfcc.
2. Folder containing audios
3. Folder where features will be extracted
4. A yaml file containing parameters for features.

In [7]:
aud_audio.extract('logmel', wav_dev_fd, dev_fd,'example.yaml')
aud_audio.extract('logmel', wav_eva_fd, eva_fd,'example.yaml')


extraction complete!
Feature found
extraction complete!
Feature found


## Model Parameters
We define all model parameters here.

| Variable           | Description              | type       | Accepted values             |
| :-------------     | :-------------           | :--------- | :---------                  |
| `prep`             | mode to use              | `str`      | dev, eval                   |
| `save_model`       | Whether to save model    | `bool`     |                             |
| `model_type`       | Type of model            | `str`      | Dynamic, Functional, Static |
| `model`            | Name of model            | `str`      | DNN, CNN, CRNN, RNN, FCRNN  |
| `modelx`           | Name of model for saving | `str`      | Should end with `.h5`       |
| `feature`          | Name of feature          | `str`      | mel, logmel, cqt, mfcc, zcr |
|<td colspan=2 style="text-align:center">**Works only for Functional**</td>|||
| `dropout1`         | 1st Dropout              | `float`    |                             |
| `act1`             | 1st Activation           | `str`      |                             |
| `act2`             | 2nd Activation           | `str`      |                             |
| `act3`             | 3rd Activation           | `str`      |                             |
| `act4`             | 4th Activation           | `str`      | Only in case of DNN         |
|<td colspan=2 style="text-align:center">**Works for all Models**</td>|||
| `input_neurons`    | Number of Neurons        | `int`      |                             |
| `epochs`           | Number of Epochs         | `int`      |                             |
| `batchsize`        | Batch Size               | `int`      |                             |
| `num_classes`      | Number of classes        | `int`      |                             |
| `filter_length`    | Size of Filter           | `int`      |                             |
| `nb_filter`        | Number of Filters        | `int`      |                             |
|<td colspan=2 style="text-align:center">**Feature Parameters**</td>|||
| `agg_num`          | Number of frames         | `int`      |                             |
| `hop`              | Hop Length               | `int`      |                             |
| `custom_check_ftr` | check for dimensions     | `bool`     | True: know dimension        |


In [19]:
prep='eval'               # Which mode to use(String) Can be dev or eval.
save_model=False          # True when you want to save the model with weights.
#Parameters that are passed to the model.
model_type='Functional'   # Type of model Can be Dynamic or Functional or Static
model='CNN'               # Name of model(String) Can be DNN or CNN
feature="logmel"          # Name of feature(String) Can be mel logmel cqt mfcc zcr 
#Works only for Functional
dropout1=0.1             # 1st Dropout(Float) 
act1='relu'              # 1st Activation(String) 
act2='relu'              # 2nd Activation(String) 
act3='softmax'           # 3rd Activation(String) 
#Works for all Models
input_neurons=400      # Number of Neurons(Integer) 
epochs=10              # Number of Epochs(Integer)
batchsize=128          # Batch Size(Integer)
num_classes=15         # Number of classes(Integer)
filter_length=3        # Size of Filter(Integer)
nb_filter=100          # Number of Filters(Integer)
#Feature Parameters: that are passed to the features.
agg_num=10             # Agg Number(Integer) Number of frames
hop=10                 # Hop Length(Integer)
custom_check_ftr=False # True when you know the feature dimension else False.

In [20]:
paul=aud_model.Feature(feature=feature)

In [21]:
def GetAllData(fe_fd, csv_file, agg_num, hop):
    """
    Input: Features folder(String), CSV file(String), agg_num(Integer), hop(Integer).
    Output: Loaded features(Numpy Array) and labels(Numpy Array).
    Loads all the features saved as pickle files.
    """
    # read csv
    with open( csv_file, 'rb') as f:
        reader = csv.reader(f)
        lis = list(reader)
    
    # init list
    X3d_all = []
    y_all = []
    i=0
    for li in lis:
        # load data
        [na, lb] = li[0].split('\t')
        na = na.split('/')[1][0:-4]
        path = fe_fd + '/' + na + '.f'
        try:
            X = cPickle.load( open( path, 'rb' ) )
        except Exception as e:
            print 'Error while parsing',path
            continue
        # reshape data to (n_block, n_time, n_freq)
        i+=1
        X3d = aud_model.mat_2d_to_3d( X, agg_num, hop )
        X3d_all.append( X3d )
        y_all += [ lb_to_id[lb] ] * len( X3d )
    
    print "Features loaded",i                
    print 'All files loaded successfully'
    # concatenate list to array
    X3d_all = np.concatenate( X3d_all )
    y_all = np.array( y_all )
    
    return X3d_all, y_all

In [22]:
def test(md,csv_file,new_p,model):
    # load name of wavs to be classified
    with open( csv_file, 'rb') as f:
        reader = csv.reader(f)
        lis = list(reader)
    
    # do classification for each file
    names = []
    pred_lbs = []
    
    for li in lis:
        names.append( li[0] )
        na = li[0][6:-4]
        #audio evaluation name
        fe_path = eva_fd + '/' + na + '.f'
        X0 = cPickle.load( open( fe_path, 'rb' ) )
        X0 = aud_model.mat_2d_to_3d( X0, agg_num, hop )
        
        X0 = aud_model.mat_3d_to_nd(model,X0)
    
        # predict
        p_y_preds = md.predict(X0)        # probability, size: (n_block,label)
        preds = np.argmax( p_y_preds, axis=-1 )     # size: (n_block)
        b = scipy.stats.mode(preds)
        pred = int( b[0] )
        pred_lbs.append( id_to_lb[ pred ] )
    
    pred = []    
    # write out result
    for i1 in xrange( len( names ) ):
        fname = names[i1] + '\t' + pred_lbs[i1] + '\n' 
        pred.append(fname)
        
    print 'write out finished!'
    truth = open(new_p,'r').readlines()
    pred = [i.split('\t')[1].split('\n')[0]for i in pred]
    truth = [i.split('\t')[1]for i in truth]
    pred.sort()
    truth.sort()
    return truth,pred


In [23]:
tr_X, tr_y = GetAllData( dev_fd, label_csv, agg_num, hop )


Features loaded 1170
All files loaded successfully


In [24]:
print(tr_X.shape)
print(tr_y.shape)

(150930L, 10L, 40L)
(150930L,)


In [25]:
if custom_check_ftr:
    reqd_dim = 40
    paul.check_dimension(reqd_dim,tr_X.shape[-1])

In [26]:
tr_X=aud_model.mat_3d_to_nd(model,tr_X)
print(tr_X.shape)
dimx=tr_X.shape[-2]
dimy=tr_X.shape[-1]

(150930L, 1L, 10L, 40L)


In [27]:
if prep=='dev':
    cross_validation=True
else:
    cross_validation=False

In [28]:
if model_type=='Static':
    miz=aud_model.Static_Model(input_neurons=input_neurons,cross_validation=cross_validation,
        nb_filter = nb_filter, filter_length=filter_length,
        epochs=epochs,batchsize=batchsize,num_classes=num_classes,
        model=model,agg_num=agg_num,hop=hop,dimx=dimx,dimy=dimy)

elif model_type=='Functional':
    miz=aud_model.Functional_Model(input_neurons=input_neurons,cross_validation=cross_validation,dropout1=dropout1,
        act1=act1,act2=act2,act3=act3,nb_filter = nb_filter, filter_length=filter_length,
        epochs=epochs,batchsize=batchsize,num_classes=num_classes,
        model=model,agg_num=agg_num,hop=hop,dimx=dimx,dimy=dimy)

elif model_type=='Dynamic':
    layers=4
    acts=['relu','relu','relu','relu','relu']
    drops=[0.1,0.1,0.1,0.1]
    pools=[2,2,2]
    bn=True
    miz=aud_model.Dynamic_Model(input_neurons=input_neurons,cross_validation=cross_validation,
        nb_filter = nb_filter, filter_length=filter_length,
        epochs=epochs,batchsize=batchsize,num_classes=num_classes,
        model=model,agg_num=agg_num,hop=hop,dimx=dimx,dimy=dimy,
        layers=layers,acts=acts,drops=drops,pools=pools,bn=bn)


In [40]:
np.random.seed(1155)
if cross_validation:
    kf = KFold(len(tr_X),folds,shuffle=True,random_state=42)
    results=[]    
    for train_indices, test_indices in kf:
        train_x = [tr_X[ii] for ii in train_indices]
        train_y = [tr_y[ii] for ii in train_indices]
        test_x  = [tr_X[ii] for ii in test_indices]
        test_y  = [tr_y[ii] for ii in test_indices]
        train_y = to_categorical(train_y,num_classes=len(labels))
        test_y = to_categorical(test_y,num_classes=len(labels)) 
        
        train_x=np.array(train_x)
        train_y=np.array(train_y)
        test_x=np.array(test_x)
        test_y=np.array(test_y)
        print "Development Mode"

        #get compiled model
        lrmodel=miz.prepare_model()

        if lrmodel is None:
            print "If you have used Dynamic Model, make sure you pass correct parameters"
            raise SystemExit
        #fit the model
        lrmodel.fit(train_x,train_y,batch_size=miz.batchsize,epochs=miz.epochs,verbose=1)
        
        #make prediction
        pred=lrmodel.predict(test_x, batch_size=32, verbose=2)

        pred = [ii.argmax()for ii in pred]
        test_y = [ii.argmax()for ii in test_y]

        results.append(accuracy_score(pred,test_y))
        print accuracy_score(pred,test_y)
        jj=str(set(list(test_y)))
        print "Unique in test_y",jj
    print "Results: " + str( np.array(results).mean() )
else:
    train_x=np.array(tr_X)
    train_y=np.array(tr_y)
    print "Evaluation mode"
    lrmodel=miz.prepare_model()
    train_y = to_categorical(train_y,num_classes=len(labels))
        
    #fit the model
    lrmodel.fit(train_x,train_y,batch_size=miz.batchsize,epochs=epochs,verbose=2)
    if save_model:
        lrmodel.save(modelx)
        lrmodel = load_model(modelx)

    truth,pred=test(lrmodel,txt_eva_path,new_p,model)

    acc=aud_model.calculate_accuracy(truth,pred)
    print "Accuracy %.2f prcnt"%acc


Evaluation mode
Activation 1 relu 2 relu 3 softmax
Model CNN
Epoch 1/10
 - 34s - loss: 1.8508 - acc: 0.3723
Epoch 2/10
 - 34s - loss: 1.0339 - acc: 0.6477
Epoch 3/10
 - 34s - loss: 0.8195 - acc: 0.7199
Epoch 4/10
 - 34s - loss: 0.6883 - acc: 0.7657
Epoch 5/10
 - 34s - loss: 0.5954 - acc: 0.7962
Epoch 6/10
 - 34s - loss: 0.5327 - acc: 0.8193
Epoch 7/10
 - 34s - loss: 0.4827 - acc: 0.8361
Epoch 8/10
 - 30s - loss: 0.4478 - acc: 0.8477
Epoch 9/10
 - 31s - loss: 0.4163 - acc: 0.8581
Epoch 10/10
 - 34s - loss: 0.3951 - acc: 0.8657
write out finished!
Accuracy 61.03 prcnt
