# DCASE 2016 DNN Baseline
In this notebook, we implement a baseline for acoustic scene classification in the **Detection and Classification of Acoustic Scenes and Events (DCASE) challenge** 2016.

**Suppress warnings**: We suppress warnings because we use some functionality from an older version of *scikit-learn*.

In [6]:
import warnings
warnings.simplefilter("ignore")

## keras_aud library
Clone [keras_aud](https://github.com/channelCS/keras_aud) and set the `ka_path` variable to the folder that contains the clone so that we can import its modules.

In [1]:
import sys
#ka_path="C:/Users/aditya/version-control"
ka_path="E:/akshita_workspace/cc"
sys.path.insert(0, ka_path)
from keras_aud import aud_audio, aud_feature
from keras_aud import aud_model

Audio Feature extraction script
Script by channelCS

Using Theano backend.
Using gpu device 0: Tesla C2075 (CNMeM is disabled, cuDNN not available)

**Make imports**: We now import the libraries required for this task. We use
1. `csv` for reading `.csv` files.
2. `cPickle` for reading `.f` pickle files.
3. `numpy` for array handling.
4. `scipy` for calculating the `mode` of the block predictions.
5. `time` for calculating the *time to load* pickle files.
6. `accuracy_score` for measuring accuracy on each fold.
7. `KFold` for k-fold cross-validation.
8. `to_categorical` for converting integer *labels* into one-hot vectors of length `num_classes`.
9. `load_model` for loading a saved model.
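
To make the label conversion concrete, here is a minimal NumPy sketch of one-hot encoding. The helper `one_hot` is a hypothetical stand-in written for illustration; the notebook itself relies on `keras.utils.to_categorical`.

```python
import numpy as np

def one_hot(y, num_classes):
    """Illustrative one-hot encoder (hypothetical helper, not the Keras function)."""
    y = np.asarray(y, dtype=int)
    out = np.zeros((len(y), num_classes))
    out[np.arange(len(y)), y] = 1.0   # set one entry per row
    return out

encoded = one_hot([0, 2, 1], num_classes=3)
decoded = encoded.argmax(axis=-1)     # argmax recovers the integer labels
print(encoded)
print(decoded)
```

The `argmax` step is the same operation the notebook later uses to turn network outputs back into class ids.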

In [2]:
import csv
import cPickle
import numpy as np
import scipy.stats   # imported explicitly because scipy.stats.mode is used later
import time
from sklearn.metrics import accuracy_score
from sklearn.cross_validation import KFold
from keras.utils import to_categorical
from keras.models import load_model




**Define paths**: We now define the paths for the **audio**, **features** and **text** files.

| Variable        | Description                                     |
| :-------------  |:-------------                                   |
| `wav_dev_fd`    | Development audio folder                        |
| `wav_eva_fd`    | Evaluation audio folder                         |
| `dev_fd`        | Development features folder                     |
| `eva_fd`        | Evaluation features folder                      |
| `label_csv`     | Development meta file (file names and labels)   |
| `txt_eva_path`  | Evaluation file list (`test.txt`)               |
| `new_p`         | Evaluation ground-truth file (`evaluate.txt`)   |


In [3]:
wav_dev_fd   = ka_path+'/dcase_data/audio/dev'
wav_eva_fd   = ka_path+'/dcase_data/audio/eva'
dev_fd       = ka_path+'/dcase_data/features/dev/logmel'
eva_fd       = ka_path+'/dcase_data/features/eva/cqt'
label_csv    = '../texts/dcase/dev/meta.txt'
txt_eva_path = '../texts/dcase/eva/test.txt'
new_p        = '../texts/dcase/eva/evaluate.txt'

**Define Labels**: We list the names of all 15 labels in the dataset and build lookup tables between label names and integer ids.

In [4]:
labels = [ 'bus', 'cafe/restaurant', 'car', 'city_center', 'forest_path', 'grocery_store', 'home', 'beach',
            'library', 'metro_station', 'office', 'residential_area', 'train', 'tram', 'park' ]
# map label name -> integer id and back (`idx` avoids shadowing the built-in `id`)
lb_to_id = { lb:idx for idx, lb in enumerate(labels) }
id_to_lb = { idx:lb for idx, lb in enumerate(labels) }
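
As a small illustration of how these lookup tables are used, the sketch below builds them for a shortened label list and repeats one file's label id once per block, which mirrors how the loader later labels every block of a file. The shortened list and the block count are made up for the example.

```python
labels_demo = ['bus', 'cafe/restaurant', 'car']   # shortened list, for illustration only
lb_to_id_demo = {lb: idx for idx, lb in enumerate(labels_demo)}
id_to_lb_demo = {idx: lb for idx, lb in enumerate(labels_demo)}

n_blocks = 4                                      # hypothetical number of blocks in one file
y_blocks = [lb_to_id_demo['car']] * n_blocks      # every block inherits the file's label

print(y_blocks)
print(id_to_lb_demo[y_blocks[0]])
```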

## Extract features
This is where feature extraction takes place. We pass
1. The feature name, such as mel, logmel, mfcc or cqt.
2. The folder containing the audio files.
3. The folder where the extracted features will be saved.
4. A yaml file containing the parameters for the features.

In [5]:
aud_audio.extract('cqt', wav_dev_fd, dev_fd,'example.yaml',print_arr=['shape'])
aud_audio.extract('cqt', wav_eva_fd, eva_fd,'example.yaml')


(2585L, 120L)
...

extraction complete!
Feature found


## Model Parameters
We define all model parameters here.

In [20]:
prep='eval'               # Mode (string): 'dev' or 'eval'
save_model=False          # True to save the model with its weights
# Parameters that are passed to the model
model_type='Functional'   # Type of model: 'Dynamic', 'Functional' or 'Static'
model='CNN'               # Name of model (string): 'DNN' or 'CNN'
feature="logmel"          # Name of feature (string): mel, logmel, cqt, mfcc or zcr
# Used only by the Functional model
dropout1=0.1              # 1st dropout (float)
act1='relu'               # 1st activation (string)
act2='relu'               # 2nd activation (string)
act3='softmax'            # 3rd activation (string)
# Used by all models
input_neurons=400         # Number of neurons
epochs=100                # Number of epochs
batchsize=128             # Batch size
num_classes=15            # Number of classes
filter_length=3           # Size of filter
nb_filter=100             # Number of filters
# Parameters that are passed to the features
agg_num=10                # Number of frames per block
hop=10                    # Hop length (in frames) between blocks

In [21]:
paul=aud_model.Feature(feature=feature)

In [22]:
def GetAllData(fe_fd, csv_file, agg_num, hop):
    """
    Loads all the features saved as pickle files, together with their labels.

    Input: features folder (string), CSV file (string), agg_num (integer), hop (integer).
    Output: loaded features (NumPy array) and labels (NumPy array), one label per block.
    """
    # read csv
    with open( csv_file, 'rb') as f:
        reader = csv.reader(f)
        lis = list(reader)
    
    # init list
    X3d_all = []
    y_all = []
    i=0
    for li in lis:
        # load data
        [na, lb] = li[0].split('\t')
        na = na.split('/')[1][0:-4]
        path = fe_fd + '/' + na + '.f'
        try:
            with open( path, 'rb' ) as fin:
                X = cPickle.load( fin )
        except Exception as e:
            # skip files whose features are missing or unreadable
            print 'Error while parsing',path,e
            continue
        # reshape data to (n_block, n_time, n_freq)
        i+=1
        X3d = aud_model.mat_2d_to_3d( X, agg_num, hop )
        X3d_all.append( X3d )
        y_all += [ lb_to_id[lb] ] * len( X3d )
    
    print "Features loaded",i                
    print 'All files loaded successfully'
    # concatenate list to array
    X3d_all = np.concatenate( X3d_all )
    y_all = np.array( y_all )
    
    return X3d_all, y_all
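
The string slicing that turns a meta-file entry into a feature file name is easy to misread, so here is a small sketch of it. The example line is hypothetical (it is not taken from the dataset) and assumes the `audio/<name>.wav<TAB><label>` layout implied by the code.

```python
import os

line = 'audio/example_clip.wav\tbeach'   # hypothetical meta line

name, label = line.split('\t')
stem_a = name.split('/')[1][0:-4]        # slicing used in GetAllData: drop folder and '.wav'
stem_b = name[6:-4]                      # slicing used in test(): skip the 6 characters of 'audio/'
stem_c = os.path.splitext(os.path.basename(name))[0]   # alternative that does not depend on lengths

print('%s %s %s %s' % (stem_a, stem_b, stem_c, label))
```

All three expressions give the same stem here; the `os.path` variant would also keep working if the folder name or the extension length changed.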

In [23]:
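def test(md,csv_file,new_p,model):
    """
    Classifies every evaluation file by a majority vote over its blocks.

    Input: trained model, evaluation file list (string), ground-truth file (string), model name (string).
    Output: sorted true labels (list) and sorted predicted labels (list).
    """
    # load name of wavs to be classified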
    with open( csv_file, 'rb') as f:
        reader = csv.reader(f)
        lis = list(reader)
    
    # do classification for each file
    names = []
    pred_lbs = []
    
    for li in lis:
        names.append( li[0] )
        na = li[0][6:-4]
        #audio evaluation name
        fe_path = eva_fd + '/' + na + '.f'
        X0 = cPickle.load( open( fe_path, 'rb' ) )
        X0 = aud_model.mat_2d_to_3d( X0, agg_num, hop )
        
        X0 = aud_model.mat_3d_to_nd(model,X0)
    
        # predict
        p_y_preds = md.predict(X0)        # probability, size: (n_block,label)
        preds = np.argmax( p_y_preds, axis=-1 )     # size: (n_block)
        b = scipy.stats.mode(preds)
        pred = int( b[0] )
        pred_lbs.append( id_to_lb[ pred ] )
    
    pred = []    
    # write out result
    for i1 in xrange( len( names ) ):
        fname = names[i1] + '\t' + pred_lbs[i1] + '\n' 
        pred.append(fname)
        
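    print 'write out finished!'
    with open(new_p,'r') as f:
        truth = f.readlines()
    # keep only the label column and strip any trailing newline
    pred = [i.split('\t')[1].strip() for i in pred]
    truth = [i.split('\t')[1].strip() for i in truth]
    # both lists are sorted independently, so the file-to-label pairing is lost:
    # an accuracy computed from them compares label counts, not per-file predictions
    pred.sort()
    truth.sort()
    return truth,pred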
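
The file-level decision in `test` is a majority vote over block-level predictions. The sketch below shows that step in isolation on made-up probabilities. It uses `np.bincount` in place of `scipy.stats.mode`; for non-negative integer class ids both pick the most frequent value and resolve ties in favour of the smallest id.

```python
import numpy as np

# made-up block probabilities for one file: 4 blocks, 3 classes
p_y_preds = np.array([[0.1, 0.7, 0.2],
                      [0.2, 0.5, 0.3],
                      [0.6, 0.3, 0.1],
                      [0.2, 0.6, 0.2]])

block_preds = np.argmax(p_y_preds, axis=-1)                     # most likely class per block
votes = np.bincount(block_preds, minlength=p_y_preds.shape[1])  # number of votes per class
file_pred = int(votes.argmax())                                 # class with the most votes

print(block_preds.tolist())
print(file_pred)
```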


In [24]:
tr_X, tr_y = GetAllData( dev_fd, label_csv, agg_num, hop )


Features loaded 1170
All files loaded successfully


In [25]:
print(tr_X.shape)
print(tr_y.shape)

(301860L, 10L, 120L)
(301860L,)
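
The block count above is consistent with cutting each 2585-frame feature matrix into blocks of `agg_num` frames taken every `hop` frames. The sketch below reproduces that arithmetic. `split_into_blocks` is a hypothetical stand-in written for illustration, not the `keras_aud` implementation of `mat_2d_to_3d`, and it assumes that an incomplete trailing block is dropped and that the input has at least `agg_num` frames.

```python
import numpy as np

def split_into_blocks(X, agg_num, hop):
    """Cut a (n_time, n_freq) matrix into (n_block, agg_num, n_freq) blocks (illustrative)."""
    n_time = X.shape[0]
    n_blocks = (n_time - agg_num) // hop + 1      # incomplete last block is dropped
    return np.stack([X[i * hop : i * hop + agg_num] for i in range(n_blocks)])

X = np.zeros((2585, 120))                         # same shape as one extracted feature matrix
X3d = split_into_blocks(X, agg_num=10, hop=10)
total_blocks = 1170 * X3d.shape[0]                # 1170 files were loaded

print(X3d.shape)
print(total_blocks)
```

With `agg_num=10` and `hop=10` this gives 258 blocks per file, and 1170 files times 258 blocks matches the 301860 rows reported by the notebook.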


In [26]:
tr_X=aud_model.mat_3d_to_nd(model,tr_X)
print(tr_X.shape)
dimx=tr_X.shape[-2]
dimy=tr_X.shape[-1]

(301860L, 1L, 10L, 120L)
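
The extra axis of size 1 looks like a channel dimension in a channels-first layout, which a 2D convolution layer needs. Assuming that `mat_3d_to_nd` does no more than insert this axis in the CNN case (its implementation is not shown here), the reshaping can be sketched with plain NumPy on a small dummy array:

```python
import numpy as np

X3d_demo = np.zeros((4, 10, 120))          # (n_block, n_time, n_freq), small dummy array
X4d_demo = X3d_demo[:, np.newaxis, :, :]   # insert a singleton channel axis
dimx_demo = X4d_demo.shape[-2]             # frames per block
dimy_demo = X4d_demo.shape[-1]             # frequency bins per frame

print(X4d_demo.shape)
```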


In [27]:
cross_validation = (prep=='dev')   # cross-validate only in development mode

In [None]:
miz=aud_model.Functional_Model(input_neurons=input_neurons,cross_validation=cross_validation,dropout1=dropout1,
    act1=act1,act2=act2,act3=act3,nb_filter = nb_filter, filter_length=filter_length,
    epochs=epochs,batchsize=batchsize,num_classes=num_classes,
    model=model,agg_num=agg_num,hop=hop,dimx=dimx,dimy=dimy)

In [None]:
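np.random.seed(1523)
# `folds` and `modelx` are used below but are not defined earlier in this notebook;
# the values given here are assumed placeholders
folds = 5                       # number of cross-validation folds (assumption)
modelx = 'dcase2016_cnn.h5'     # file name for the saved model (assumption)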
if cross_validation:
    kf = KFold(len(tr_X),folds,shuffle=True,random_state=42)
    results=[]    
    for train_indices, test_indices in kf:
        # index the NumPy arrays directly instead of building Python lists
        train_x = tr_X[train_indices]
        test_x  = tr_X[test_indices]
        # one-hot encode the labels
        train_y = to_categorical(tr_y[train_indices],num_classes=len(labels))
        test_y  = to_categorical(tr_y[test_indices],num_classes=len(labels))
        print "Development Mode"

        #get compiled model
        lrmodel=miz.prepare_model()

        if lrmodel is None:
            print "If you have used Dynamic Model, make sure you pass correct parameters"
            raise SystemExit
        #fit the model
        lrmodel.fit(train_x,train_y,batch_size=miz.batchsize,epochs=miz.epochs,verbose=1)
        
        #make prediction
        pred=lrmodel.predict(test_x, batch_size=32, verbose=2)

        pred = [ii.argmax()for ii in pred]
        test_y = [ii.argmax()for ii in test_y]

        results.append(accuracy_score(pred,test_y))
        print accuracy_score(pred,test_y)
        jj=str(set(list(test_y)))
        print "Unique in test_y",jj
    print "Results: " + str( np.array(results).mean() )
else:
    train_x=np.array(tr_X)
    train_y=np.array(tr_y)
    print "Evaluation mode"
    lrmodel=miz.prepare_model()
    train_y = to_categorical(train_y,num_classes=len(labels))
        
    #fit the model
    lrmodel.fit(train_x,train_y,batch_size=miz.batchsize,epochs=epochs,verbose=2)
    if save_model:
        lrmodel.save(modelx)
        lrmodel = load_model(modelx)

    truth,pred=test(lrmodel,txt_eva_path,new_p,model)

    acc=aud_model.calculate_accuracy(truth,pred)
    print "Accuracy %.2f %%"%acc


Evaluation mode
Activation 1 relu 2 relu 3 softmax
Model CNN
Epoch 1/100
 - 137s - loss: 1.2674 - acc: 0.5684
Epoch 2/100
 - 137s - loss: 0.7819 - acc: 0.7383
Epoch 3/100
 - 135s - loss: 0.6638 - acc: 0.7787
Epoch 4/100
 - 141s - loss: 0.5950 - acc: 0.8016
Epoch 5/100
 - 140s - loss: 0.5522 - acc: 0.8161
Epoch 6/100
 - 140s - loss: 0.5161 - acc: 0.8282
Epoch 7/100
 - 138s - loss: 0.4884 - acc: 0.8376
Epoch 8/100
 - 140s - loss: 0.4650 - acc: 0.8441
Epoch 9/100
 - 141s - loss: 0.4446 - acc: 0.8507
Epoch 10/100
 - 142s - loss: 0.4279 - acc: 0.8565
Epoch 11/100
 - 141s - loss: 0.4106 - acc: 0.8618
Epoch 12/100
 - 141s - loss: 0.3963 - acc: 0.8669
Epoch 13/100
 - 142s - loss: 0.3839 - acc: 0.8704
Epoch 14/100
 - 141s - loss: 0.3701 - acc: 0.8752
Epoch 15/100
 - 141s - loss: 0.3603 - acc: 0.8777
Epoch 16/100
 - 135s - loss: 0.3478 - acc: 0.8819
Epoch 17/100
 - 141s - loss: 0.3381 - acc: 0.8848
Epoch 18/100
 - 137s - loss: 0.3277 - acc: 0.8877
Epoch 19/100
 - 141s - loss: 0.3170 - acc: 0.891