# Urban Sound Classification
8732 labeled sound excerpts of urban sounds from 10 classes

## <b>Context</b>
    
The automatic classification of environmental sound is a growing research field with multiple applications to large-scale, content-based multimedia indexing and retrieval. In particular, the sonic analysis of urban environments is the subject of increased interest, partly enabled by multimedia sensor networks, as well as by large quantities of online multimedia content depicting urban scenes.

However, while there is a large body of research in related areas such as speech, music and bioacoustics, work on the analysis of urban acoustic environments is relatively scarce. Furthermore, when it exists, it mostly focuses on the classification of auditory scene type (e.g. street, park) as opposed to the identification of sound sources in those scenes (e.g. car horn, engine idling, bird tweet).

There are two major challenges in urban sound research:

<ul><li>Lack of labeled audio data. Previous work has focused on audio from carefully produced movies or television tracks, on recordings from specific environments such as elevators or office spaces, and on commercial or proprietary datasets. The large effort involved in manually annotating real-world data means that datasets based on field recordings tend to be relatively small (e.g. the event detection dataset of the IEEE AASP Challenge consists of 24 recordings for each of 17 classes).</li>
<li>Lack of a common vocabulary when working with urban sounds. The classification of sounds into semantic groups may therefore vary from study to study, making it hard to compare results.</li></ul>

The objective of this notebook is to address these two challenges.

## <b>Content</b>
The dataset is called UrbanSound and contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes, namely:

<ul><li>Air Conditioner</li>
<li>Car Horn</li>
<li>Children Playing</li>
<li>Dog Bark</li>
<li>Drilling</li>
<li>Engine Idling</li>
<li>Gun Shot</li>
<li>Jackhammer</li>
<li>Siren</li>
<li>Street Music</li></ul>

The attributes of the data are as follows:

<ul><li>ID – Unique ID of sound excerpt</li>
<li>Class – Type of sound</li></ul>

## <b>Acknowledgements</b>
Source of the dataset: https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU

Source of the research document: https://serv.cusp.nyu.edu/projects/urbansounddataset/salamon_urbansound_acmmm14.pdf

Source of the technique used: [CNN ARCHITECTURES FOR LARGE-SCALE AUDIO CLASSIFICATION](https://arxiv.org/pdf/1609.09430.pdf)

## Importing Basic Libraries and Installing Tools Required

In [1]:
%matplotlib inline
from memory_profiler import memory_usage
import os
import pandas as pd
from glob import glob
import numpy as np


### Installing libav-tools to get Librosa Working Properly

In [2]:
%%capture
!apt-get install libav-tools -y

### Importing Fast.ai for Deep Learning and Librosa for Creating Spectrograms

In [3]:
from fastai.vision import *
import librosa
import librosa.display
import pylab
import matplotlib
import gc

### Making Temporary Working Directories for Storing the Audio Conversions

In [4]:
!mkdir -p /kaggle/working/train
!mkdir -p /kaggle/working/test

## Defining the Create Spectrogram Function

In [5]:
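def create_spectrogram(filename,name):
    # Render one audio file as a small axis-free image in the train folder.
    plt.interactive(False)
    # sr=None keeps the file's native sampling rate instead of resampling.
    clip, sample_rate = librosa.load(filename, sr=None)
    fig = plt.figure(figsize=[0.72,0.72])
    ax = fig.add_subplot(111)
    # Hide axes and frame so the saved image contains only the feature map.
    ax.axes.get_xaxis().set_visible(False)
    ax.axes.get_yaxis().set_visible(False)
    ax.set_frame_on(False)
    # The plotted features are MFCCs. Note that power_to_db is designed for
    # power spectrograms: on MFCCs it floors non-positive coefficients.
    S = librosa.feature.mfcc(y=clip, sr=sample_rate)
    librosa.display.specshow(librosa.power_to_db(S, ref=np.max))
    out_path = Path('/kaggle/working/train/' + name + '.jpg')
    plt.savefig(out_path, dpi=400, bbox_inches='tight',pad_inches=0)
    # Release the figure explicitly so open figures do not accumulate in RAM.
    fig.clf()
    plt.close(fig)
    plt.close('all')
    del clip,sample_rate,fig,ax,S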

In [6]:
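def create_spectrogram_test(filename,name):
    # Same rendering as create_spectrogram, but the image goes to the test folder.
    plt.interactive(False)
    # sr=None keeps the file's native sampling rate instead of resampling.
    clip, sample_rate = librosa.load(filename, sr=None)
    fig = plt.figure(figsize=[0.72,0.72])
    ax = fig.add_subplot(111)
    # Hide axes and frame so the saved image contains only the feature map.
    ax.axes.get_xaxis().set_visible(False)
    ax.axes.get_yaxis().set_visible(False)
    ax.set_frame_on(False)
    # The plotted features are MFCCs. Note that power_to_db is designed for
    # power spectrograms: on MFCCs it floors non-positive coefficients.
    S = librosa.feature.mfcc(y=clip, sr=sample_rate)
    librosa.display.specshow(librosa.power_to_db(S, ref=np.max))
    out_path = Path('/kaggle/working/test/' + name + '.jpg')
    plt.savefig(out_path, dpi=400, bbox_inches='tight',pad_inches=0)
    # Release the figure explicitly so open figures do not accumulate in RAM.
    fig.clf()
    plt.close(fig)
    plt.close('all')
    del clip,sample_rate,fig,ax,S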
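
Both functions pass their features through a decibel conversion before plotting. As a rough illustration of what that step does, here is a simplified pure-Python version. It is a sketch written for this explanation, not librosa's actual code; the `amin` and `top_db` defaults are assumptions chosen to mirror common settings.

```python
import math

def power_to_db(values, amin=1e-10, top_db=80.0):
    """Simplified decibel conversion relative to the largest input value.

    Illustrative re-implementation only, not librosa's code.
    """
    ref = max(amin, max(values))
    # Floor each value at amin so the logarithm is always defined.
    db = [10.0 * math.log10(max(amin, v)) - 10.0 * math.log10(ref)
          for v in values]
    # Limit the dynamic range to top_db below the peak.
    floor = max(db) - top_db
    return [max(d, floor) for d in db]

# Each factor of 10 in power corresponds to 10 dB.
print(power_to_db([1.0, 0.1, 0.01]))
# A negative input is floored at amin and then clipped to -top_db.
print(power_to_db([1.0, -0.5]))
```

The second call shows why the conversion is meant for non-negative power values: any negative coefficient ends up at the bottom of the dynamic range regardless of its magnitude.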

### Splitting the Conversion Across Cells and Collecting Garbage to Avoid RAM Overflow

In [7]:
Data_dir=np.array(glob("../input/train/Train/*"))

In [8]:
%load_ext memory_profiler

In [9]:
%%memit 
i=0
for file in Data_dir[i:i+1500]:
    filename,name = file,file.split('/')[-1].split('.')[0]
    create_spectrogram(filename,name)

peak memory: 346.59 MiB, increment: 32.21 MiB
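
The loop above derives each image name by splitting the file path on `/` and `.`. A minimal alternative sketch using `pathlib` is shown below; `clip_name` and `image_path` are hypothetical helper names introduced here for illustration, and the example path is only an assumed file layout.

```python
from pathlib import Path

def clip_name(file_path):
    """Return the file name without its directory or final extension."""
    return Path(file_path).stem

def image_path(file_path, out_dir):
    """Build the .jpg path for a clip inside out_dir (hypothetical helper)."""
    return Path(out_dir) / (clip_name(file_path) + ".jpg")

print(clip_name("../input/train/Train/123.wav"))
print(image_path("../input/train/Train/123.wav", "/kaggle/working/train"))
```

One difference to keep in mind: for a name with several dots, such as `a.b.wav`, `Path.stem` yields `a.b`, whereas splitting on `.` and taking the first part yields `a`.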


In [10]:
gc.collect()

4127

In [11]:
%%memit 
i=1500
for file in Data_dir[i:i+1500]:
    filename,name = file,file.split('/')[-1].split('.')[0]
    create_spectrogram(filename,name)

peak memory: 346.59 MiB, increment: 10.38 MiB


In [12]:
gc.collect()

1313

In [13]:
%%memit 
i=3000
for file in Data_dir[i:i+1500]:
    filename,name = file,file.split('/')[-1].split('.')[0]
    create_spectrogram(filename,name)

peak memory: 345.89 MiB, increment: 9.23 MiB


In [14]:
gc.collect()

1314

In [15]:
%%memit 
i=4500
for file in Data_dir[i:]:
    filename,name = file,file.split('/')[-1].split('.')[0]
    create_spectrogram(filename,name)

peak memory: 348.18 MiB, increment: 9.88 MiB


In [16]:
gc.collect()

15368
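
The four conversion cells above follow the same pattern: process a slice of 1500 files, then collect garbage. That pattern can be sketched as a single loop. `process_in_batches` is a hypothetical helper, and whether one loop keeps memory as low as separate cells in a given notebook environment is an assumption that would need to be measured.

```python
import gc

def process_in_batches(items, handler, batch_size=1500):
    """Apply handler to every item, collecting garbage after each batch."""
    processed = 0
    for start in range(0, len(items), batch_size):
        for item in items[start:start + batch_size]:
            handler(item)
            processed += 1
        # Free unreachable objects before starting the next batch.
        gc.collect()
    return processed

# Small stand-in workload: the handler just records what it saw.
seen = []
count = process_in_batches(list(range(10)), seen.append, batch_size=4)
print(count, seen)
```

In the notebook, the handler would be a function that extracts the clip name and calls `create_spectrogram`.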

## Creating Data Bunch for Training and Creating a Resnet34 Model

In [17]:
path = Path('/kaggle/working/')
np.random.seed(42)
data = ImageDataBunch.from_csv(path,csv_labels='../input/train.csv', folder="train", valid_pct=0.2, suffix='.jpg',
        ds_tfms=get_transforms(), size=224, num_workers=0).normalize(imagenet_stats)
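
The `valid_pct=0.2` argument holds out a random 20% of the labeled items for validation. The idea can be sketched with the standard library as follows. This is not fastai's implementation, so the specific indices chosen will differ; the item count of 5435 is taken from the learner summary in this notebook (4348 training plus 1087 validation items).

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Shuffle indices 0..n_items-1 and hold out a valid_pct fraction."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(valid_pct * n_items)
    # The first `cut` shuffled indices form the validation set.
    return indices[cut:], indices[:cut]

train_idx, valid_idx = random_split(5435)
print(len(train_idx), len(valid_idx))
```

Fixing the seed makes the split repeatable, which is the purpose of the `np.random.seed(42)` call in the cell above.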

In [18]:
data.classes

['air_conditioner',
 'car_horn',
 'children_playing',
 'dog_bark',
 'drilling',
 'engine_idling',
 'gun_shot',
 'jackhammer',
 'siren',
 'street_music']
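
These identifiers correspond to the class names listed in the Content section, lower-cased and joined with underscores. A small sketch of that mapping, where `to_label` is a hypothetical helper introduced for illustration:

```python
DISPLAY_NAMES = [
    "Air Conditioner", "Car Horn", "Children Playing", "Dog Bark",
    "Drilling", "Engine Idling", "Gun Shot", "Jackhammer",
    "Siren", "Street Music",
]

def to_label(display_name):
    """Convert a display name such as 'Car Horn' to 'car_horn'."""
    return display_name.lower().replace(" ", "_")

labels = [to_label(name) for name in DISPLAY_NAMES]
print(labels)
```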

In [19]:
learn = create_cnn(data, models.resnet34, metrics=accuracy)

Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /tmp/.torch/models/resnet34-333f7ec4.pth
100%|██████████| 87306240/87306240 [00:01<00:00, 84182423.38it/s]


## Training The Model

In [20]:
learn.fit_one_cycle(4)

epoch,train_loss,valid_loss,accuracy
1,2.047395,1.546359,0.467341
2,1.628782,1.321002,0.544618
3,1.414829,1.211940,0.571297
4,1.298724,1.199479,0.589696
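
The `accuracy` column reports the fraction of validation items whose highest-scoring class matches the true label. A minimal sketch of that computation on plain Python lists (the scores and labels below are made-up illustration values, not notebook results):

```python
def accuracy(scores, labels):
    """Fraction of rows whose arg-max class index equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)

# Three items, two classes: the first two are classified correctly.
example_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
example_labels = [1, 0, 0]
print(accuracy(example_scores, example_labels))
```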


In [21]:
learn.save('stage-1')

In [22]:
learn.load('stage-1')

Learner(data=ImageDataBunch;

Train: LabelList (4348 items)
x: ImageItemList
Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
y: CategoryList
siren,street_music,drilling,siren,dog_bark
Path: /kaggle/working;

Valid: LabelList (1087 items)
x: ImageItemList
Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
y: CategoryList
dog_bark,jackhammer,engine_idling,siren,drilling
Path: /kaggle/working;

Test: None, model=Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps

In [23]:
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()

LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
Min numerical gradient: 1.10E-06


In [24]:
learn.recorder.plot()

Min numerical gradient: 1.10E-06


In [25]:
learn.fit_one_cycle(7, max_lr=slice(1e-4,1e-3))

epoch,train_loss,valid_loss,accuracy
1,1.138813,1.013132,0.637534
2,0.982950,0.874455,0.680773
3,0.747891,0.655787,0.772769
4,0.552951,0.538374,0.822447
5,0.388687,0.371108,0.866605
6,0.240043,0.274302,0.911684
7,0.167931,0.274175,0.908004
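
Passing `max_lr=slice(1e-4,1e-3)` gives the earliest layers the smaller learning rate and the final layers the larger one, with the groups in between spaced geometrically. The sketch below illustrates that spacing. It is not fastai's code, and the choice of three layer groups is an assumption for illustration.

```python
def even_lrs(start, stop, n_groups):
    """Spread n_groups learning rates geometrically from start to stop."""
    if n_groups == 1:
        return [stop]
    # Constant ratio between consecutive groups.
    step = (stop / start) ** (1.0 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]

for lr in even_lrs(1e-4, 1e-3, 3):
    print(f"{lr:.2e}")
```

With these inputs the middle group lands at roughly 3.2e-4, the geometric mean of the two endpoints.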


In [26]:
learn.save('stage-2')
learn.load('stage-2')

Learner(data=ImageDataBunch;

Train: LabelList (4348 items)
x: ImageItemList
Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
y: CategoryList
siren,street_music,drilling,siren,dog_bark
Path: /kaggle/working;

Valid: LabelList (1087 items)
x: ImageItemList
Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
y: CategoryList
dog_bark,jackhammer,engine_idling,siren,drilling
Path: /kaggle/working;

Test: None, model=Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps

In [27]:
#learn.lr_find()
#learn.recorder.plot()

In [28]:
#learn.fit_one_cycle(1, max_lr=slice(1e-6,2e-6))

In [29]:
#learn.load('stage-2')

In [30]:
#learn.save('stage-3')

In [31]:
Test_dir=np.array(glob("../input/test/Test/*"))

In [32]:
%%memit 
i=0
for file in Test_dir[i:i+1500]:
    filename,name = file,file.split('/')[-1].split('.')[0]
    create_spectrogram_test(filename,name)



peak memory: 1931.49 MiB, increment: 16.50 MiB


In [33]:
gc.collect()

26614

In [34]:
%%memit 
i=1500
for file in Test_dir[i:]:
    filename,name = file,file.split('/')[-1].split('.')[0]
    create_spectrogram_test(filename,name)



peak memory: 1931.81 MiB, increment: 1.51 MiB


In [35]:
gc.collect()

5527

In [36]:
learn.load('stage-2')
test_csv = pd.read_csv('../input/test.csv')

## Making Predictions and Writing Them to CSV

In [37]:
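with open('output_resnet34_e7.csv',"w") as file:
    file.write("ID,Prediction\n")
    for test in test_csv.ID:
        # Each test clip was rendered to <ID>.jpg by create_spectrogram_test.
        img = open_image('/kaggle/working/test/'+str(test)+'.jpg')
        # learn.predict returns the category first; keep its name as a string.
        prediction = str(learn.predict(img)[0]).split()[0]
        file.write(str(test)+','+prediction)
        file.write('\n')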

In [38]:
# Same predictions as above, written with the column order Class,ID.
with open('final_resnet34_e7.csv',"w") as file:
    file.write("Class,ID\n")
    for test in test_csv.ID:
        img = open_image('/kaggle/working/test/'+str(test)+'.jpg')
        prediction = str(learn.predict(img)[0]).split()[0]
        file.write(prediction+','+str(test))
        file.write('\n')
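
The two cells above build each CSV row by string concatenation. A hedged sketch of the same output using the standard `csv` module follows. `write_predictions` and `fake_predict` are hypothetical names; the stand-in predictor exists only so the example runs without a trained model.

```python
import csv
import io

def write_predictions(ids, predict, out_file):
    """Write one 'Class,ID' row per clip ID using the csv module."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["Class", "ID"])
    for clip_id in ids:
        writer.writerow([predict(clip_id), clip_id])

def fake_predict(clip_id):
    """Stand-in predictor for illustration; the notebook uses learn.predict."""
    return "dog_bark" if clip_id % 2 else "jackhammer"

buffer = io.StringIO()
write_predictions([5, 7, 8], fake_predict, buffer)
print(buffer.getvalue())
```

Collecting the predictions once and writing them in both column orders would also avoid running inference twice over the test images.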

In [39]:
output = pd.read_csv('final_resnet34_e7.csv')
output.head()

Unnamed: 0,Class,ID
0,jackhammer,5
1,dog_bark,7
2,drilling,8
3,dog_bark,9
4,street_music,13


### Removing Unwanted Folders

In [40]:
%%capture
!apt-get install zip -y
!zip -r train.zip /kaggle/working/train/
!zip -r test.zip /kaggle/working/test/
!rm -rf train/*
!rm -rf test/*