In this notebook, I extract features from the first second of each audio clip, using only two categories as an example, and then use those features to build a simple logistic regression model.

In [93]:
%load_ext autoreload
%autoreload 2
import os
from pathlib import Path
import glob
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import math
import IPython.display as ipd
import librosa
import librosa.display
import librosa.feature as lf

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Load data

In [5]:
data_folder = Path('data/raw/freesound-audio-tagging')
training_data_folder = data_folder / 'audio_train'
test_data_folder = data_folder / 'audio_test'

In [12]:
tags = pd.read_csv(data_folder / 'train.csv')

In [19]:
tags['label'].value_counts().sample(3, random_state=123).head(3)

Harmonica    165
Shatter      300
Fireworks    300
Name: label, dtype: int64

#### Sample data

Take only Harmonica and Fireworks for now

In [29]:
df_files = tags[tags['label'].isin(['Harmonica', 'Fireworks'])]

In [32]:
train_files = [training_data_folder / x for x in df_files['fname'].tolist()]

In [36]:
test_files = [Path(x) for x in glob.glob(test_data_folder.as_posix() + '/*.wav')]

In [35]:
%%time
train_wav_inputs = {wav.name: librosa.load(wav) for wav in train_files}

Wall time: 2min 43s


Listen to a sample

In [110]:
sample_file = train_files[0].name

In [111]:
train_wav_inputs[sample_file]

(array([-1.0839484e-05, -4.6323126e-05, -2.9048624e-05, ...,
        -9.3051522e-06, -4.7470004e-07,  2.0633401e-05], dtype=float32), 22050)

The first one is Harmonica

In [41]:
ipd.Audio(str(train_files[0]))

#### Truncate to only 1 second

Use only the first second of each clip. Some files contain less than 1 second of audio, such as `027aee2a.wav` or `0765b7d1.wav`

In [68]:
sample_rate = 22050

In [113]:
train_wav_inputs[sample_file][0].shape[0] / sample_rate

14.56

In [112]:
ipd.Audio(train_wav_inputs[sample_file][0][:sample_rate], rate=sample_rate)

In [71]:
train_input_1_sec = {filename: audio_sample[0][:sample_rate] for filename, audio_sample in train_wav_inputs.items()}

Check length

In [74]:
{key: x.shape[0] for key, x in train_input_1_sec.items() if x.shape[0] < sample_rate}

{'027aee2a.wav': 21609,
 '0765b7d1.wav': 10584,
 '3e93dd4c.wav': 20286,
 '4018374b.wav': 12348,
 '433c3a13.wav': 16758,
 '436ec5c3.wav': 15876,
 '4619c0dd.wav': 10143,
 '47c0a3fc.wav': 18963,
 '643dfb33.wav': 13230,
 '79e22357.wav': 14553,
 '83be8d1d.wav': 14553,
 '89da251b.wav': 15435,
 'caafc360.wav': 18522,
 'd7241122.wav': 20727,
 'db634e41.wav': 10143,
 'e74ef511.wav': 15876}
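
As a quick sanity check, the sample counts above can be converted into durations. This is only a small sketch: it copies two of the counts from the output and assumes the 22050 Hz rate used throughout the notebook. The names `short_lengths`, `durations`, and `missing_samples` are made up for this illustration.

```python
sample_rate = 22050  # same rate as the notebook's `sample_rate`

# Sample counts copied from the output above for two of the short clips.
short_lengths = {'027aee2a.wav': 21609, '0765b7d1.wav': 10584}

# Duration in seconds = number of samples / samples per second.
durations = {name: n / sample_rate for name, n in short_lengths.items()}

# How many samples each clip falls short of a full second.
missing_samples = {name: sample_rate - n for name, n in short_lengths.items()}

for name in short_lengths:
    print(name, round(durations[name], 2), missing_samples[name])
```

Slicing with `[:sample_rate]` never raises on a short clip; it simply returns fewer samples, which is why the lengths have to be checked explicitly.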

For files shorter than 1 second, pad each clip by repeating it until it reaches 1 second. Example:

In [108]:
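def pad_audio(sound: np.ndarray, sample_rate=22050):
    """Repeat a short clip end to end, then trim it to exactly one second."""
    # Number of whole copies needed to cover one second of samples.
    padded_sound = np.tile(sound, math.ceil(sample_rate / sound.shape[0]))
    return padded_sound[:sample_rate]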

In [115]:
short_sound = train_input_1_sec['027aee2a.wav']
ipd.Audio(pad_audio(short_sound), rate=sample_rate)

In [105]:
short_files = {key: x for key, x in train_input_1_sec.items() if x.shape[0] < sample_rate}
for key, audio in short_files.items():
    train_input_1_sec[key] = pad_audio(audio)
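
The tiling logic is easier to follow on a toy array than on real audio. The sketch below mirrors `pad_audio` but takes an explicit target length; `pad_by_tiling` and `target_len` are hypothetical names for this illustration, and the empty-input check is an added assumption rather than part of the original function.

```python
import math

import numpy as np


def pad_by_tiling(sound: np.ndarray, target_len: int) -> np.ndarray:
    """Repeat `sound` end to end until it covers `target_len` samples, then trim."""
    if sound.shape[0] == 0:
        # Guard added for the sketch: an empty clip would cause a division by zero.
        raise ValueError("cannot pad an empty signal")
    repeats = math.ceil(target_len / sound.shape[0])
    return np.tile(sound, repeats)[:target_len]


toy = np.array([1.0, 2.0, 3.0])
padded = pad_by_tiling(toy, target_len=8)
print(padded)  # [1. 2. 3. 1. 2. 3. 1. 2.]
```

Here three copies are needed to cover 8 samples (9 values), and the trailing value is trimmed. Inputs longer than the target are simply cut to length.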

## Preprocessing

### Use chromagram as features

Chromagrams are used here only as placeholder features, to get the end-to-end framework in place

In [107]:
train_chromas = {filename: lf.chroma_stft(y=sample, sr=sample_rate) for filename, sample in train_input_1_sec.items()}



TODO: find out what causes the warning raised by the cell above.

Flatten each chromagram into a single row and use the values as features

Example:

In [116]:
sample_chroma = train_chromas[sample_file]

In [123]:
pd.DataFrame(sample_chroma.reshape(1,-1))

           0         1         2         3         4         5         6         7         8         9  ...       518       519       520       521       522       523       524       525       526       527
0  0.785165  0.618009  0.597678  0.579074  0.441998  0.595878  0.523667  0.561396  0.561605  0.712012  ...  0.475643  0.435534  0.034601  0.050538  0.127993  0.331543  0.996019  0.783161  0.787075  0.692177
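
Applied to every file, the same flattening gives one feature row per clip. The following is a minimal sketch rather than the notebook's actual next step: it uses random arrays as stand-ins for real chromagrams and assumes each one has shape (12, 44), since 12 chroma bins (the librosa default) times 44 frames matches the 528 columns above. `toy_chromas`, `filenames`, and `feature_matrix` are illustrative names.

```python
import numpy as np

# Stand-in chromagrams: random values with the assumed (12, 44) shape.
rng = np.random.default_rng(0)
toy_chromas = {f"clip_{i}.wav": rng.random((12, 44)) for i in range(3)}

# Fix the row order so features can be matched back to filenames later.
filenames = sorted(toy_chromas)

# Flatten each (12, 44) chromagram to 528 values and stack them row by row.
feature_matrix = np.stack([toy_chromas[name].reshape(-1) for name in filenames])

print(feature_matrix.shape)  # (3, 528)
```

Keeping `filenames` alongside the matrix means the labels in `train.csv` can later be aligned to the rows through the `fname` column.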
