## Using a Specific Part of the Sound Clip for Initial Training

fastai is designed to make transfer learning easy. Let's see whether first training the model on only the labeled part of each sound clip helps it learn what to look for.

First, let's make the trimmed sound clips. The time and frequency bounds of each labeled segment (`t_min`, `t_max`, `f_min`, `f_max`) are provided with the training data.

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import soundfile as sf

from tqdm.notebook import tqdm
from scipy.signal import butter, lfilter
from IPython.display import Audio

from fastai.vision.all import *
import torchaudio



In [3]:
dpath = Path('rfcx-species-audio-detection/')
training_df = pd.read_csv(dpath/'train_tp.csv')
training_df.head()

| | recording_id | species_id | songtype_id | t_min | f_min | t_max | f_max |
|---|---|---|---|---|---|---|---|
| 0 | 003bec244 | 14 | 1 | 44.544 | 2531.25 | 45.1307 | 5531.25 |
| 1 | 006ab765f | 23 | 1 | 39.9615 | 7235.16 | 46.0452 | 11283.4 |
| 2 | 007f87ba2 | 12 | 1 | 39.136 | 562.5 | 42.272 | 3281.25 |
| 3 | 0099c367b | 17 | 4 | 51.4206 | 1464.26 | 55.1996 | 4565.04 |
| 4 | 009b760e6 | 10 | 1 | 50.0854 | 947.461 | 52.5293 | 10852.7 |


#### From SciPy Cookbook
https://scipy-cookbook.readthedocs.io/items/ButterworthBandpass.html

In [4]:
def butter_bandpass(lowcut, highcut, fs, order=5):
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    try:
        b, a = butter(order, [low, high], btype='band')
    except ValueError:
        # butter() requires normalized cutoffs strictly between 0 and 1,
        # where 1 is the Nyquist frequency (fs / 2). high > 1 means fmax > fs / 2.
        # In that case, reflect the upper cutoff back below Nyquist by the
        # same amount it exceeded it: high -> 2 - high.
        b, a = butter(order, [low, 2-high], btype='band')
    return b, a

def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    y = lfilter(b, a, data)
    return y

In [5]:
def make_rel_clip(fpath, tmin, tmax, fmin, fmax):
    clip, sample_rate = librosa.load(fpath)
    clip = clip[int(sample_rate * tmin):int(sample_rate * tmax)]
    return butter_bandpass_filter(clip, int(fmin), int(fmax), sample_rate), sample_rate

#### Testing the filtering functions

In [6]:
train_path = dpath/'train'
clip, sample_rate = librosa.load(train_path.ls()[0])
Audio(clip, rate=sample_rate)

In [7]:
fmin = training_df.loc[0,'f_min']           
fmax = training_df.loc[0, 'f_max']
tmin = training_df.loc[0, 't_min']
tmax = training_df.loc[0, 't_max']
rel_clip, sr = make_rel_clip(dpath/'train'/f'{training_df.loc[0, "recording_id"]}.flac', tmin, tmax, fmin, fmax)
Audio(rel_clip, rate=sr)

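The next sample (row 1) initially threw an error. On investigation, its `f_max` (11283.4 Hz) lies above the highest frequency the loaded audio can represent: `librosa.load` resamples to 22,050 Hz by default, so the Nyquist limit is 11,025 Hz. Some other rows have the same problem, which is why `butter_bandpass()` contains the `try`/`except`.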
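The Nyquist limit can be checked directly before designing the filter. The sketch below shows one possible alternative to the `try`/`except`, not what this notebook does: the hypothetical helper `normalized_band` clamps the upper cutoff just below Nyquist instead of reflecting it. The `margin` value is an assumption chosen for illustration.

```python
def normalized_band(lowcut, highcut, fs, margin=0.99):
    """Return (low, high) cutoffs as fractions of the Nyquist frequency.

    Hypothetical helper: clamps `high` to `margin` (< 1) so both values
    are valid digital cutoffs for scipy.signal.butter.
    """
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = min(highcut / nyq, margin)
    if not 0 < low < high:
        raise ValueError(f"invalid band: low={low:.3f}, high={high:.3f}")
    return low, high

# Row 1 of train_tp.csv at librosa's default 22,050 Hz sample rate
fs = 22050
f_min, f_max = 7235.16, 11283.4
print(f_max > fs / 2)  # True: f_max exceeds the 11,025 Hz Nyquist limit
low, high = normalized_band(f_min, f_max, fs)
```

Clamping keeps the pass band at the top of the representable range, whereas reflecting (`2 - high`) narrows it slightly from above. Which is preferable depends on how much of the call's energy sits near the limit.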

In [8]:
clip, sample_rate = librosa.load(train_path.ls()[1])
Audio(clip, rate=sample_rate)

In [9]:
fmin = training_df.loc[1,'f_min']           
fmax = training_df.loc[1, 'f_max']
tmin = training_df.loc[1, 't_min']
tmax = training_df.loc[1, 't_max']
rel_clip, sr = make_rel_clip(dpath/'train'/f'{training_df.loc[1, "recording_id"]}.flac', tmin, tmax, fmin, fmax)
Audio(rel_clip, rate=sr)

#### Filtering all clips

In [10]:
# Create the output directory if it does not already exist
(dpath/'rel_train').mkdir(exist_ok=True)
for i in tqdm(range(len(training_df))):
    row = training_df.iloc[i]
    fpath = dpath/'train'/f'{row["recording_id"]}.flac'
    rpath = dpath/'rel_train'/f'{row["recording_id"]}.flac'
    if not os.path.exists(rpath):
        rel_clip, sr = make_rel_clip(fpath, row['t_min'], row['t_max'], row['f_min'], row['f_max'])
        sf.write(rpath, rel_clip, sr, format='flac', subtype='PCM_24')





### Begin Modeling

In [24]:
train_files = get_files(dpath/'rel_train')
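# Convert each waveform to a 128-band mel spectrogram, then to decibels
tfms = [torchaudio.transforms.MelSpectrogram(n_mels=128), torchaudio.transforms.AmplitudeToDB()]
dblock = DataBlock(blocks=(TransformBlock(type_tfms=tfms), MultiCategoryBlock),
                   get_x=lambda x: torchaudio.load(x)[0],  # keep the waveform, drop the sample rate
                   # all species labeled for this recording; fall back to class 24 if none
                   get_y=lambda x: set(training_df[training_df.recording_id == x.stem].species_id) or {24})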
dls = dblock.dataloaders(train_files, bs=16, num_workers=0)

In [29]:
learn = cnn_learner(dls, resnet18, config={"n_in":1})

In [30]:
learn.lr_find()

RuntimeError: stack expects each tensor to be equal size, but got [1, 128, 712] at entry 0 and [1, 128, 30] at entry 1
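
The error arises because the trimmed clips have different durations, so their spectrograms have different widths (712 vs. 30 frames here) and cannot be stacked into one batch. One common remedy, sketched below under stated assumptions, is to pad or crop every waveform to a fixed number of samples before computing the spectrogram. `fix_length` and `target_len` are hypothetical names, and NumPy arrays stand in for the tensors returned by `torchaudio.load`.

```python
import numpy as np

def fix_length(wave, target_len):
    """Zero-pad or center-crop a (channels, samples) array to `target_len` samples."""
    n = wave.shape[-1]
    if n >= target_len:
        # Too long: keep the middle `target_len` samples
        start = (n - target_len) // 2
        return wave[..., start:start + target_len]
    # Too short: split the zero padding between both ends
    pad_total = target_len - n
    pad_left = pad_total // 2
    pad = [(0, 0)] * (wave.ndim - 1) + [(pad_left, pad_total - pad_left)]
    return np.pad(wave, pad, mode="constant")

# Two clips of very different lengths now stack into one batch
short_clip = np.ones((1, 3000), dtype=np.float32)
long_clip = np.ones((1, 70000), dtype=np.float32)
batch = np.stack([fix_length(short_clip, 22050), fix_length(long_clip, 22050)])
print(batch.shape)  # (2, 1, 22050)
```

In this notebook, the same idea could be applied inside `get_x` with a torch equivalent such as `torch.nn.functional.pad`. The choice of `target_len` is a trade-off: too short crops long calls, too long fills short calls mostly with silence.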