## UrbanSound8K Dataset

We use the audio samples collected in the UrbanSound8K project.

The samples can be downloaded [here](https://serv.cusp.nyu.edu/projects/urbansounddataset/).

The sound samples are licensed under the [Creative Commons Attribution-NonCommercial 3.0 License](https://creativecommons.org/licenses/by-nc/3.0/).





In [22]:
# load the metadata of the UrbanSound8K samples

import pandas as pd

# location of the UrbanSound8K download
df = pd.read_csv('../../urban8k/UrbanSound8K/metadata/UrbanSound8K.csv')


df.head(5)

Unnamed: 0,slice_file_name,fsID,start,end,salience,fold,classID,class
0,100032-3-0-0.wav,100032,0.0,0.317551,1,5,3,dog_bark
1,100263-2-0-117.wav,100263,58.5,62.5,1,5,2,children_playing
2,100263-2-0-121.wav,100263,60.5,64.5,1,5,2,children_playing
3,100263-2-0-126.wav,100263,63.0,67.0,1,5,2,children_playing
4,100263-2-0-137.wav,100263,68.5,72.5,1,5,2,children_playing
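In the rows above, the first two dash-separated fields of each `slice_file_name` match the `fsID` and `classID` columns. The sketch below splits a file name into its four integer fields. The meaning of the last two fields (an occurrence index and a slice index) is an assumption, and the names `SliceName` and `parse_slice_file_name` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple


class SliceName(NamedTuple):
    fs_id: int          # matches the fsID column
    class_id: int       # matches the classID column
    occurrence_id: int  # assumed meaning of the third field
    slice_id: int       # assumed meaning of the fourth field


def parse_slice_file_name(name: str) -> SliceName:
    """Split a name such as '100263-2-0-117.wav' into its four integer fields."""
    stem = name.rsplit(".", 1)[0]  # drop the '.wav' extension
    fs_id, class_id, occurrence_id, slice_id = (int(part) for part in stem.split("-"))
    return SliceName(fs_id, class_id, occurrence_id, slice_id)


parsed = parse_slice_file_name("100263-2-0-117.wav")
print(parsed.fs_id, parsed.class_id)
```

This can serve as a quick consistency check between the file names and the metadata columns.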


In [23]:
# rename some columns
df.rename(columns={'slice_file_name': 'filename', 'class': 'category'}, inplace=True)
# get all categories
categories = df.category.unique()

categories

array(['dog_bark', 'children_playing', 'car_horn', 'air_conditioner',
       'street_music', 'gun_shot', 'siren', 'engine_idling', 'jackhammer',
       'drilling'], dtype=object)

In [24]:
# we are interested in the gunshots
# keep only relevant categories
df_relevant = df[df.category.isin(['gun_shot'])]

# check filter
df_relevant.category.unique()
len(df_relevant)

374

In [25]:
# draw 100 random gunshot samples (no seed is set, so the selection differs between runs)
df_sample = df_relevant.sample(n=100)
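`DataFrame.sample` draws a different subset on every run unless a seed is given. If a reproducible subset is wanted, passing `random_state` is one option. The sketch below uses a small made-up DataFrame in place of `df_relevant`; the seed value 42 is arbitrary.

```python
import pandas as pd

# toy stand-in for df_relevant; the file names are made up for illustration
toy = pd.DataFrame({
    "filename": [f"clip-{i}.wav" for i in range(20)],
    "category": ["gun_shot"] * 20,
})

# random_state fixes the seed, so repeated calls return the same rows
sample_a = toy.sample(n=5, random_state=42)
sample_b = toy.sample(n=5, random_state=42)

print(sample_a.equals(sample_b))
```

In the notebook this would correspond to `df_relevant.sample(n=100, random_state=42)`.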

In [26]:
# now we copy these files over to our dataset location for training
import os
import shutil

from tqdm import tqdm

# source dataset location
source_path = '../../urban8k/UrbanSound8K'
# destination for the copied audio files
dest_path = "./dataset/audio"
# make sure the destination directory exists before copying
os.makedirs(dest_path, exist_ok=True)

for index, row in tqdm(df_sample.iterrows(), total=len(df_sample)):
    # the audio files are stored in per-fold subdirectories (fold1, fold2, ...)
    fold = 'fold' + str(row.fold)
    # build the path of the source wav file
    src_file_path = os.path.normcase(os.path.join(source_path, 'audio', fold, row.filename))
    # build the destination path
    dst_file_path = os.path.join(dest_path, row.filename)
    # copy the file from source to destination
    shutil.copyfile(src_file_path, dst_file_path)

100%|██████████| 100/100 [00:01<00:00, 96.43it/s]
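The progress bar shows that the loop finished, but not that every file actually landed in the destination. A minimal sketch of a follow-up check is given below; `find_missing` is a hypothetical helper, and the demo runs in a temporary directory with made-up file names so that it is self-contained.

```python
import os
import tempfile


def find_missing(dest_dir, filenames):
    """Return the file names that are not present in dest_dir."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(dest_dir, name))]


# self-contained demo: create one file, then ask about two
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.wav"), "wb").close()
    missing = find_missing(tmp, ["a.wav", "b.wav"])

print(missing)
```

In the notebook, `find_missing(dest_path, df_sample.filename)` would list any sampled files that are absent from the destination directory.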


In [20]:
# keep only the relevant columns and save the metadata for later
# note: this writes all rows of df_relevant, not only the 100 sampled rows in df_sample
df_relevant[['filename','category']].to_csv('./dataset/meta-data-urban8k-relevant.csv',index=False)
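The cell above saves metadata for every row in `df_relevant` (374 gunshot rows), while only the 100 rows in `df_sample` were copied to the dataset location. If the metadata should describe only the copied files, one option is to write the sampled rows instead. The sketch below shows that round trip on made-up data; the file name `meta-data-sample.csv` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# toy stand-ins for df_relevant and df_sample; the file names are made up
toy_relevant = pd.DataFrame({
    "filename": ["a.wav", "b.wav", "c.wav", "d.wav"],
    "category": ["gun_shot"] * 4,
})
toy_sample = toy_relevant.sample(n=2, random_state=0)

with tempfile.TemporaryDirectory() as tmp:
    meta_path = os.path.join(tmp, "meta-data-sample.csv")  # hypothetical file name
    # write only the sampled rows, then read them back
    toy_sample[["filename", "category"]].to_csv(meta_path, index=False)
    restored = pd.read_csv(meta_path)

print(len(restored), list(restored.columns))
```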