## Cornell Elephant Listening Project BACKGROUND samples

We use sound samples from the [Cornell Elephant Listening Project](http://www.birds.cornell.edu/brp/elephant/).

We have 24-hour recordings covering several days. We use [Raven-lite](http://www.birds.cornell.edu/brp/raven/RavenOverview.html) to build the annotated datasets and split the recordings into 5-second samples.
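In this project the split is done in Raven-lite. To show what a 5-second split amounts to, the sketch below cuts a mono recording, assumed to be already loaded as a NumPy array, into consecutive non-overlapping clips. The helper `split_into_clips` and the 2000 Hz rate are hypothetical placeholders, not the project's actual code or sample rate.

```python
import numpy as np

def split_into_clips(signal: np.ndarray, sr: int, clip_seconds: float = 5.0) -> np.ndarray:
    """Split a 1-D signal into consecutive, non-overlapping clips of clip_seconds.

    Returns an array of shape (n_clips, samples_per_clip); a trailing partial
    clip shorter than clip_seconds is dropped.
    """
    samples_per_clip = int(round(clip_seconds * sr))
    n_clips = len(signal) // samples_per_clip
    # keep only whole clips, then view them as rows
    trimmed = signal[: n_clips * samples_per_clip]
    return trimmed.reshape(n_clips, samples_per_clip)

# hypothetical example: one minute of silence at an assumed 2000 Hz rate
sr = 2000
one_minute = np.zeros(60 * sr, dtype=np.float32)
clips = split_into_clips(one_minute, sr)
print(clips.shape)  # (12, 10000)
```

At this clip length a full 24-hour recording corresponds to at most 86400 / 5 = 17280 clips. The trailing partial clip is dropped rather than zero-padded, so every clip has the same number of samples.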

### BACKGROUND samples

The annotation table exported from Raven is filtered in a spreadsheet program to keep only the selections that DO NOT contain elephant rumbles. These samples are used to learn the background environment.

This gives us a list of selections; we assume you have this list available locally.
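The spreadsheet filter can also be expressed in pandas. The sketch below reads a tiny, made-up selection table in Raven's tab-separated format and keeps the rows without a rumble. The `Annotation` column and its `rumble`/`background` labels are hypothetical: annotation columns in Raven are user-defined, so adjust the column name and labels to match your own table.

```python
import io

import pandas as pd

# a tiny, made-up selection table in Raven's tab-separated format;
# the "Annotation" column and its labels are hypothetical
raven_table = io.StringIO(
    "Selection\tView\tChannel\tBegin Time (s)\tEnd Time (s)\tAnnotation\n"
    "1\tSpectrogram 1\t1\t0.0\t5.0\tbackground\n"
    "2\tSpectrogram 1\t1\t5.0\t10.0\trumble\n"
    "3\tSpectrogram 1\t1\t10.0\t15.0\tbackground\n"
)
selections = pd.read_csv(raven_table, sep="\t")

# keep only the selections that do NOT contain an elephant rumble
background = selections[selections["Annotation"] != "rumble"]
print(background["Selection"].tolist())  # [1, 3]
```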

```python
# the background folder contains a txt file listing the wav filenames of the selections
import pandas as pd

# selection list for the 20151005 background recordings (one wav filename per line)
df = pd.read_csv('../../data/cornell/elephant-listening/20151005-background/sel.170626.112846.txt', header=None)

df.columns = ['filename']

df.head(5)
```

|   | filename |
|---|----------|
| 0 | sel-01-20170626-112953.75.wav |
| 1 | sel-02-20170626-113011.82.wav |
| 2 | sel-05-20170626-113405.20.wav |
| 3 | sel-06-20170626-113655.88.wav |
| 4 | sel-09-20170626-113737.82.wav |


```python
# add a column with the category
df['category'] = 'background'
df.shape
```

(1825, 2)

```python
# sample 1000 of the background selections (without replacement);
# a fixed random_state makes the draw reproducible
df_sample = df.sample(n=1000, random_state=42)
```
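Fixing the seed matters because the sampled subset defines the training set. The sketch below uses a hypothetical stand-in for the 1825-row selection list (made-up filenames) to show that the same `random_state` yields the same subset, and that `DataFrame.sample` draws without replacement by default, so no file is selected twice.

```python
import pandas as pd

# hypothetical stand-in for the 1825-row selection list
df = pd.DataFrame({"filename": [f"sel-{i:04d}.wav" for i in range(1825)]})
df["category"] = "background"

first = df.sample(n=1000, random_state=42)
second = df.sample(n=1000, random_state=42)

# same seed, same subset; replace=False is the default, so filenames are unique
same_subset = first.index.equals(second.index)
print(same_subset, first["filename"].is_unique, len(df) - len(first))  # True True 825
```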

```python
# resample the selected files and write them to our dataset location for training
import os

# we resample the original recordings to a common sample rate
import librosa
import resampy
# librosa.output.write_wav was removed in librosa 0.8, so soundfile is used to write the wav files
import soundfile as sf

from tqdm import tqdm

# this is the sample rate we want
sr_target = 44100

# source dataset location
source_path = '../../data/cornell/elephant-listening/20151005-background/'
dest_path = './dataset/audio'
# create the destination folder if it does not exist yet
os.makedirs(dest_path, exist_ok=True)

for _, row in tqdm(df_sample.iterrows(), total=len(df_sample)):
    # read the wav file; skip it if it is missing from the source folder
    src_file_path = os.path.join(source_path, row.filename)
    if not os.path.exists(src_file_path):
        continue
    # make destination path
    dst_file_path = os.path.join(dest_path, row.filename)

    # load the audio file at its native sampling rate
    x, sr_orig = librosa.load(src_file_path, mono=True, sr=None)

    # resample to the target rate (44100 Hz)
    y = resampy.resample(x, sr_orig, sr_target)

    # write it back as 32-bit float wav, as librosa.output.write_wav did
    sf.write(dst_file_path, y, sr_target, subtype='FLOAT')
```

100%|██████████| 1000/1000 [04:21<00:00,  3.91it/s]
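Resampling changes the number of samples per clip by the ratio `sr_target / sr_orig`. The sketch below reduces that ratio to integer up/down factors, the form polyphase resamplers such as `scipy.signal.resample_poly` take, and computes the ideal length of one 5-second clip. The native rates used here (4000 Hz and 16000 Hz) are assumptions for illustration, not the rate of the Cornell recordings, and individual libraries may round the final sample differently.

```python
from fractions import Fraction

def resample_plan(sr_orig: int, sr_target: int, clip_seconds: float = 5.0):
    """Return (up, down, n_in, n_out) for resampling one clip of clip_seconds."""
    ratio = Fraction(sr_target, sr_orig)  # reduced to lowest terms automatically
    n_in = int(round(clip_seconds * sr_orig))
    # ideal output length; libraries may round the last sample differently
    n_out = n_in * ratio.numerator // ratio.denominator
    return ratio.numerator, ratio.denominator, n_in, n_out

print(resample_plan(4000, 44100))   # (441, 40, 20000, 220500)
print(resample_plan(16000, 44100))  # (441, 160, 80000, 220500)
```

Either way a 5-second clip ends up with 5 × 44100 = 220500 samples, so every file in the dataset has the same length after resampling.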


```python
# save the metadata for later
df_sample.to_csv('./dataset/meta-data-elephant-background.csv', index=False)
```
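Later steps read this metadata CSV, so it can be worth checking that every listed file actually exists in the audio folder. The helper `missing_audio` below is hypothetical; the demo runs in a temporary directory with made-up filenames and an empty placeholder file rather than the real dataset.

```python
import os
import tempfile

import pandas as pd

def missing_audio(meta_csv: str, audio_dir: str) -> list:
    """Return the filenames listed in meta_csv that are not present in audio_dir."""
    meta = pd.read_csv(meta_csv)
    return [f for f in meta["filename"] if not os.path.exists(os.path.join(audio_dir, f))]

# hypothetical demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    audio_dir = os.path.join(tmp, "audio")
    os.makedirs(audio_dir)
    meta = pd.DataFrame({"filename": ["a.wav", "b.wav"], "category": ["background"] * 2})
    meta_csv = os.path.join(tmp, "meta.csv")
    meta.to_csv(meta_csv, index=False)
    # create an empty placeholder for a.wav only
    open(os.path.join(audio_dir, "a.wav"), "wb").close()
    result = missing_audio(meta_csv, audio_dir)

print(result)  # ['b.wav']
```

Rows reported as missing can then be dropped from the metadata before training.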