# Is it a banger? - Make your own dataset

### TODO

Discuss folder structure, `split_files.sh` script, duration of each clip, `p_include`

#### Folder structure
```
data
├── label_1
├── label_2
├──    ·
├──    ·
├──    ·
└── label_k
```

For the given example **Need link here eventually**, we simply have

```
data
├── banger
└── not_a_banger
```

#### File splitting - EXPLAIN WHAT THIS DOES, UPDATE IF FILE CHANGED
```bash
#!/bin/bash

SEGMENT_TIME=5 # in seconds
DATA_ROOT_DIR="../data"

DIRS=$(find "${DATA_ROOT_DIR}" -maxdepth 1 -mindepth 1 -type d)

# Make sure globstar is enabled
shopt -s globstar

for FILE in "${DATA_ROOT_DIR}"/**/*.mp3
do 
    echo "Processing ${FILE}"
    ffmpeg -i "${FILE}" -f segment -segment_time ${SEGMENT_TIME} -c copy "${FILE%.*}"%03d.wav                
    rm "${FILE}"
    rm "$(ls -t "${FILE%.*}"*.wav | tail -n 1)" # remove last file so uniform length
done
```

In [1]:
import os
import glob
import librosa
import numpy as np
np.random.seed(1234)
import pandas as pd

In [4]:
parent_dir = '../data'
parent_dir_contents = [os.path.join(parent_dir, dirname) for dirname in os.listdir(parent_dir)]
sub_dirs = [filename if os.path.isdir(filename) else None for filename in parent_dir_contents]
sub_dirs = list(filter(None.__ne__, sub_dirs))
labels_list = [os.path.relpath(path, parent_dir) for path in sub_dirs]

In [5]:
def extract_features(file_name, sample_rate=22050, segment_time=1, samples_to_clip=500):
    audio, sample_rate = librosa.load(file_name, sr=sample_rate)
    end_idx = sample_rate * segment_time - samples_to_clip # remove some end samples as not strictly uniform size
    audio = audio[0:end_idx]
    log_specgram = librosa.logamplitude(np.abs(librosa.stft(audio))**2, ref_power=np.max)
    features = {"audio": audio, "log_specgram": log_specgram}
    return features

def one_hot_encode(label, labels_list):
    n_labels = len(labels_list)
    one_hot_encoded = np.zeros(n_labels)
    for idx, cmp in enumerate(labels_list):
        if label == cmp:
            one_hot_encoded[idx] = 1                     
    return one_hot_encoded

def trim_file_list(fnames_list, p_include=1.0):
    fnames_list = np.asarray(fnames_list)
    include = np.random.rand(*fnames_list.shape)
    fnames_list = fnames_list[include < p_include]
    return fnames_list
    

def parse_audio_files(parent_dir, sub_dirs_list, labels_list, file_ext='*.wav', p_include=1.0,\
                      sample_rate=22050, segment_time=1, samples_to_clip=500):
    data = []
    index = []
    for label_idx, sub_dir in enumerate(sub_dirs_list):
        fnames_list = glob.glob(os.path.join(sub_dir, "*.wav"))
        fnames_list = trim_file_list(fnames_list, p_include=p_include)
        for fname in fnames_list:
            print("Processing " + os.path.basename(fname))
            features = extract_features(fname)
            label = labels_list[label_idx]
            label_one_hot = one_hot_encode(label, labels_list)
            features['label'] = label
            features["label_one_hot"] = label_one_hot
            data.append(features)
            index.append(os.path.basename(fname))
    return pd.DataFrame(data, index=index)

In [8]:
df = parse_audio_files(parent_dir, sub_dirs, labels_list, p_include=0.3, segment_time=5)
df = df.iloc[np.random.permutation(len(df))] # shuffle rows
df.to_pickle(os.path.join(parent_dir, 'processed_dataset.pkl'))

Processing Fun. - We Are Young ft. Janelle Monáe [OFFICIAL VIDEO]004.wav
Processing The Lumineers - Big Parade060.wav
Processing Mumford & Sons - I Will Wait015.wav
Processing Passenger _ Let Her Go (Official Video)029.wav
Processing Passenger _ Let Her Go (Official Video)001.wav
Processing The Lumineers - Holdin' Out - Storks - Original Motion Picture Soundtrack009.wav
Processing Imagine Dragons - Radioactive007.wav
Processing The Lumineers - White Lie (lyrics)019.wav
Processing Lorde - Royals (US Version)028.wav
Processing The Lumineers - White Lie (lyrics)031.wav
Processing Lorde - Royals (US Version)014.wav
Processing Avicii - Wake Me Up (Official Video)012.wav
Processing Bastille - Pompeii003.wav
Processing The Lumineers - Slow It Down056.wav
Processing Imagine Dragons - It's Time019.wav
Processing The Lumineers - Gale Song [Lyrics]034.wav
Processing The Lumineers   This Must Be The Place030.wav
Processing The Lumineers - Angela024.wav
Processing The Lumineers - Nobody Knows (From

Processing Imagine Dragons - It's Time020.wav
Processing The Lumineers - Gale Song [Lyrics]025.wav
Processing The Lumineers   This Must Be The Place021.wav
Processing Imagine Dragons - It's Time008.wav
Processing The Lumineers - Gale Song [Lyrics]031.wav
Processing The Lumineers   This Must Be The Place035.wav
Processing Fun. - Some Nights [OFFICIAL VIDEO]052.wav
Processing The Lumineers - In The Light [Lyrics]007.wav
Processing The Lumineers - Nobody Knows (From 'Pete's Dragon')008.wav
Processing The Lumineers - Nobody Knows (From 'Pete's Dragon')020.wav
Processing The Lumineers - Angela035.wav
Processing The Lumineers - Nobody Knows (From 'Pete's Dragon')035.wav
Processing The Lumineers - Angela020.wav
Processing The Lumineers - In The Light [Lyrics]006.wav
Processing Vance Joy - 'Riptide' Official Video027.wav
Processing Vance Joy - 'Riptide' Official Video033.wav
Processing The Lumineers - Flowers In Your Hair016.wav
Processing The Lumineers - Gale Song [Lyrics]024.wav
Processing T

Processing The Lumineers - 'Stubborn Love' (Official Video)014.wav
Processing The Lumineers - Classy Girls022.wav
Processing The Lumineers - My Eyes [Lyrics]031.wav
Processing The Lumineers - My Eyes [Lyrics]025.wav
Processing The Lumineers - Big Parade010.wav
Processing The Lumineers - Where The Skies Are Blue [Lyrics]008.wav
Processing The Lumineers - 'Submarines' (Official Video)013.wav
Processing The Lumineers - Where The Skies Are Blue [Lyrics]022.wav
Processing P!nk - Just Give Me A Reason ft. Nate Ruess016.wav
Processing The Lumineers - Charlie Boy049.wav
Processing The Lumineers - Patience [Lyrics]014.wav
Processing Rihanna - Stay ft. Mikky Ekko015.wav
Processing The Lumineers - Ho Hey (Official Video)012.wav
Processing The Lumineers - 'Stubborn Love' (Official Video)002.wav
Processing Rihanna - Stay ft. Mikky Ekko029.wav
Processing Theme - The Lumineers - Scotland013.wav
Processing The Lumineers - Darlene [Lyrics in description]006.wav
Processing Avicii - Wake Me Up (Official 

Processing Imagine Dragons - Demons (Official)041.wav
Processing The Lumineers - Darlene [Lyrics in description]019.wav
Processing The Lumineers - Darlene [Lyrics in description]025.wav
Processing Imagine Dragons - Radioactive042.wav
Processing The Lumineers - Ho Hey (Official Video)019.wav
Processing The Lumineers - 'Stubborn Love' (Official Video)009.wav
Processing The Lumineers - Cleopatra037.wav
Processing The Lumineers - Classy Girls017.wav
Processing The Lumineers - Ho Hey (Official Video)025.wav
Processing The Lumineers - Classy Girls003.wav
Processing Passenger _ Let Her Go (Official Video)044.wav
Processing Sleep On The Floor (LYRICS) - The Lumineers014.wav
Processing The Lumineers - Big Parade031.wav
Processing P!nk - Just Give Me A Reason ft. Nate Ruess035.wav
Processing The Lumineers - Where The Skies Are Blue [Lyrics]015.wav
Processing The Lumineers - Where The Skies Are Blue [Lyrics]017.wav
Processing The Lumineers - 'Submarines' (Official Video)018.wav
Processing The Lum

Processing The Lumineers - Morning Song053.wav
Processing The Lumineers - Ain't Nobody's Problem004.wav
Processing The Lumineers - Morning Song047.wav
Processing Of Monsters And Men - Little Talks (Official Video)007.wav
Processing OneRepublic - Counting Stars011.wav
Processing Imagine Dragons - It's Time002.wav
Processing The Lumineers - Flowers In Your Hair021.wav
Processing The Lumineers   This Must Be The Place003.wav
Processing Vance Joy - 'Riptide' Official Video038.wav
Processing The Lumineers - In The Light [Lyrics]019.wav
Processing The Lumineers - Angela002.wav
Processing The Lumineers - In The Light [Lyrics]018.wav
Processing Fun. - Some Nights [OFFICIAL VIDEO]065.wav
Processing Vance Joy - 'Riptide' Official Video005.wav
Processing The Lumineers   This Must Be The Place002.wav
Processing The Lumineers - Gun Song [Lyrics]041.wav
Processing Imagine Dragons - It's Time003.wav
Processing The Lumineers - Flowers In Your Hair008.wav
Processing The Lumineers - Slow It Down058.wav


Processing Selected New Year Mix118.wav
Processing Selected New Year Mix130.wav
Processing Selected New Year Mix332.wav
Processing Selected New Year Mix326.wav
Processing Selected New Year Mix246.wav
Processing Selected New Year Mix252.wav
Processing Selected New Year Mix087.wav
Processing Selected New Year Mix051.wav
Processing Selected New Year Mix092.wav
Processing Selected New Year Mix086.wav
Processing Selected New Year Mix247.wav
Processing Selected New Year Mix125.wav
Processing Selected New Year Mix119.wav
Processing Selected New Year Mix109.wav
Processing Selected New Year Mix280.wav
Processing Selected New Year Mix243.wav
Processing Selected New Year Mix257.wav
Processing Selected New Year Mix096.wav
Processing Selected New Year Mix055.wav
Processing Selected New Year Mix242.wav
Processing Selected New Year Mix295.wav
Processing Selected New Year Mix281.wav
Processing Selected New Year Mix108.wav
Processing Selected New Year Mix120.wav
Processing Selected New Year Mix134.wav


In [9]:
display(df[:10])

Unnamed: 0,audio,label,label_one_hot,log_specgram
The Lumineers - Ain't Nobody's Problem010.wav,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",not_a_banger,"[1.0, 0.0]","[[-80.0, -34.0311, -15.0564, -15.9555, -21.711..."
The Lumineers - Angela004.wav,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",not_a_banger,"[1.0, 0.0]","[[-80.0, -48.4892, -19.4601, -19.1463, -42.978..."
Imagine Dragons - Radioactive029.wav,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",not_a_banger,"[1.0, 0.0]","[[-74.7518, -42.5359, -33.4131, -39.1166, -28...."
Fun. - Some Nights [OFFICIAL VIDEO]057.wav,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",not_a_banger,"[1.0, 0.0]","[[-80.0, -27.6447, -19.1384, -22.9659, -39.575..."
P!nk - Just Give Me A Reason ft. Nate Ruess003.wav,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",not_a_banger,"[1.0, 0.0]","[[-80.0, -71.0971, -45.9859, -73.1597, -35.874..."
Mumford & Sons - I Will Wait014.wav,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",not_a_banger,"[1.0, 0.0]","[[-80.0, -75.9527, -34.2853, -41.4846, -42.21,..."
Selected New Year Mix196.wav,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",banger,"[0.0, 1.0]","[[-66.5509, -46.4885, -46.7563, -47.9261, -42...."
The Lumineers - Dead Sea006.wav,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",not_a_banger,"[1.0, 0.0]","[[-80.0, -80.0, -72.3932, -67.8201, -64.4327, ..."
The Lumineers - Big Parade005.wav,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",not_a_banger,"[1.0, 0.0]","[[-67.1488, -59.6014, -37.0821, -27.2378, -37...."
Selected New Year Mix186.wav,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",banger,"[0.0, 1.0]","[[-69.115, -37.6017, -29.8442, -33.0671, -47.0..."
