# PreProcessing 16bit

This notebook will be used to build a pipeline that will preprocess samples on my local machine to prepare them for use in training the classification model.

This pipeline will perform three main functions.

1. Crawl through local folders of samples previously downloaded from Splice.com and load all files found into the pipeline.
2. Use keywords in the file names to determine the proper training labels and filter out files that would be detrimental to training.
3. Standardize sample rate, volume, and sample length, then write new WAV files whose names contain the training labels.

In my initial attempts at implementing this pipeline, I ran into issues with mapping my processing functions and with training the classification model on the training set. This was partly due to my use of librosa to load audio files in the model-training pipeline. I will attempt to use only TensorFlow in the model-training pipeline, but tf.audio.decode_wav only accepts 16-bit WAV files. To accommodate this, I need to alter this preprocessing pipeline to write the training samples as 16-bit WAV files.
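Before handing files to tf.audio.decode_wav, it can help to confirm that a file really is 16-bit PCM. The sketch below is one way to check this with Python's standard-library `wave` module. The helper name `is_16bit_pcm_wav` is hypothetical and is not part of this notebook's pipeline.

```python
import wave


def is_16bit_pcm_wav(path: str) -> bool:
    """Return True if `path` is a PCM WAV file with 2-byte (16-bit) samples."""
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getsampwidth() == 2
    except (wave.Error, EOFError):
        # Files the wave module cannot parse (for example, float-encoded
        # WAVs or files that are not WAV at all) are reported as not 16-bit.
        return False
```

A check like this could be run over the output folder after preprocessing to catch any file that was written with the wrong subtype.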

# 0. Import packages

In [2]:
import os
from matplotlib import pyplot as plt
import numpy as np
import re

import librosa
import soundfile as sf

# 1. Build and test functions on single samples

## 1.1 Define test filenames

In [3]:
SHORT_SAMPLE_FILEPATH: str = '/Users/tyler/2TB SSD/Samples/Splice - TYX/sounds/packs/8-Bitstep/PL0347_WAV_ACID_8_-_Bitstep/Prime_Loops_-_8-Bitstep/Drum_One-Shots/Kicks/'
SHORT_SAMPLE_FILENAME: str = 'Kick_06.wav'
LONG_SAMPLE_FILEPATH: str = '/Users/tyler/2TB SSD/Samples/Splice - TYX/sounds/packs/Bedroom Pop/synth/Loops/'
LONG_SAMPLE_FILENAME: str = 'JP_BP_synth_loop_wet_simple_124_Gmin.wav'

NEG_SAMPLE_FILEPATH: str = '/Users/tyler/2TB SSD/Samples/Splice - TYX/sounds/packs/Industry Vol. 1/'
NEG_SAMPLE_FILENAME: str = 'AirCompressorRelease_SFXB.2.wav'

ASD_SAMPLE_FILEPATH: str = '/Users/tyler/2TB SSD/Samples/Splice - TYX/sounds/packs/deadmau5 - Chimaera'
ASD_SAMPLE_FILENAME: str = 'mau5_kick_04_Fm.wav.asd'

## 1.2 Define target sample rate

All samples in this library are sampled at 44.1 kHz or higher, so we will resample everything to 44.1 kHz, which keeps content up to about 22 kHz. If training a model with a larger dataset, or if otherwise concerned about storage overhead, we could downsample to 22.05 kHz, but any content above roughly 11 kHz would then be lost (or would alias if the resampler did not filter it out), which could interfere with our model.
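The trade-off can be made concrete with a little arithmetic. The sketch below assumes mono 16-bit audio and the 3-second clip length used later in this notebook; the helper names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM, mono (assumption for this sketch)
CLIP_SECONDS = 3      # standard clip length used later in this notebook


def nyquist_hz(sample_rate: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate / 2


def clip_size_bytes(sample_rate: int) -> int:
    """Size of the raw audio data for one clip, excluding the WAV header."""
    return sample_rate * CLIP_SECONDS * BYTES_PER_SAMPLE


for rate in (44100, 22050):
    print(rate, nyquist_hz(rate), clip_size_bytes(rate))
# 44100 22050.0 264600
# 22050 11025.0 132300
```

Halving the sample rate halves the storage per clip, but it also halves the highest frequency the clip can represent.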

In [4]:
TARGET_SAMPLE_RATE: int = 44100

## 1.3 Load audio samples

In [5]:
# returns array representing audio sample at the target sample rate and the sample file name for labeling
def load_sample(filepath: str, filename: str) -> tuple[list[float], str]:
    full_file_path: str = os.path.join(filepath, filename)
    audio, sample_rate = librosa.load(full_file_path, mono=True, sr=TARGET_SAMPLE_RATE)
    return (audio, filename)

In [6]:
short_test = load_sample(SHORT_SAMPLE_FILEPATH, SHORT_SAMPLE_FILENAME)
short_test 

(array([ 2.3841858e-07,  2.3841858e-07,  1.7881393e-07, ...,
        -1.1920929e-07,  0.0000000e+00, -2.3841858e-07], dtype=float32),
 'Kick_06.wav')

In [7]:
long_test = load_sample(LONG_SAMPLE_FILEPATH, LONG_SAMPLE_FILENAME)
long_test

(array([-0.00062246, -0.00097344, -0.00067311, ..., -0.01063265,
        -0.01121341,  0.        ], dtype=float32),
 'JP_BP_synth_loop_wet_simple_124_Gmin.wav')

## 1.4 Label samples appropriately

In [8]:
def label_percussion(sample: tuple[list[float], str]) -> tuple[list[float], str]:
    audio, filename = sample
    
    if re.search(r'kick', filename.lower()):
        return (audio, 'kick')
    elif re.search(r'snare', filename.lower()):
        return (audio,'snare')
    elif re.search(r'clap', filename.lower()):
        return (audio, 'clap')
    elif re.search(r'hat', filename.lower()):
        return (audio, 'hat')
    elif re.search(r'crash|ride|splash|china|trash', filename.lower()):
        return (audio, 'cymbal')
    elif re.search(r'perc|tom', filename.lower()):
        return (audio, 'perc')
    else:
        return (audio, None)

In [9]:
def label_sustains(sample: tuple[list[float], str]) -> tuple[list[float], str]:
    audio, filename = sample

    if re.search(r'808', filename.lower()):
        return (audio, '808')
    elif re.search(r'bass|reese', filename.lower()):
        return (audio, 'bass')
    elif re.search(r'vocal|vox|shout|chant', filename.lower()):
        return (audio, 'vocal')
    elif re.search(r'synth|pad|drone|atmosphere', filename.lower()):
        return (audio,'synth')
    elif re.search(r'guitar', filename.lower()):
        return (audio, 'guitar')
    else:
        return (audio, None)

In [10]:
# remember to skip .asd filenames when implementing pipeline
def label_sample(sample: tuple[list[float], str]) -> tuple[list[float], str]:
    audio, filename = sample
    label = None
    filename_lower = filename.lower()

    # skip irrelevant samples (case-insensitive, so 'SFX' and 'FX' are caught too)
    if re.search(r'wavetable|fx|fill', filename_lower):
        return (audio, label)

    # attempt to label percussion, but only for files that are not loops
    if not re.search(r'loop', filename_lower):
        audio, label = label_percussion(sample)

    # attempt to label sustains
    if not label:
        audio, label = label_sustains(sample)

    return (audio, label)
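Because `re.search` matches substrings, both the order of the checks and the choice of keywords matter. The sketch below restates the percussion rules as an ordered table to make that easier to see. `PERCUSSION_RULES` and `first_matching_label` are hypothetical names, and the last two example filenames are made up for illustration.

```python
import re
from typing import Optional

# Ordered (pattern, label) pairs mirroring the percussion checks; earlier rules win.
PERCUSSION_RULES: list[tuple[str, str]] = [
    (r'kick', 'kick'),
    (r'snare', 'snare'),
    (r'clap', 'clap'),
    (r'hat', 'hat'),
    (r'crash|ride|splash|china|trash', 'cymbal'),
    (r'perc|tom', 'perc'),
]


def first_matching_label(filename: str, rules: list[tuple[str, str]]) -> Optional[str]:
    """Return the label of the first rule whose pattern occurs in the lowercased filename."""
    name = filename.lower()
    for pattern, label in rules:
        if re.search(pattern, name):
            return label
    return None


print(first_matching_label('Kick_06.wav', PERCUSSION_RULES))           # kick
print(first_matching_label('Snare_Clap_Layer.wav', PERCUSSION_RULES))  # snare (first rule wins)
print(first_matching_label('Atomic_Hit.wav', PERCUSSION_RULES))        # perc ('tom' inside 'atomic')
```

Word boundaries or separator-aware patterns would reduce accidental matches like the last one, at the cost of missing filenames that run keywords together.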

In [11]:
short_test_labeled = label_sample(short_test)
short_test_labeled

(array([ 2.3841858e-07,  2.3841858e-07,  1.7881393e-07, ...,
        -1.1920929e-07,  0.0000000e+00, -2.3841858e-07], dtype=float32),
 'kick')

In [12]:
long_test_labeled = label_sample(long_test)
long_test_labeled

(array([-0.00062246, -0.00097344, -0.00067311, ..., -0.01063265,
        -0.01121341,  0.        ], dtype=float32),
 'synth')

## 1.5 Trim or pad audio to standard length

In [13]:
STANDARD_SAMPLE_LENGTH_SEC: int = 3
STANDARD_SAMPLE_LENGTH_SAMPLES: int = TARGET_SAMPLE_RATE * STANDARD_SAMPLE_LENGTH_SEC

In [14]:
def trim_sample_length(sample: tuple[list[float], str]) -> tuple[list[float], str]:
    audio, label = sample

    audio = audio[:STANDARD_SAMPLE_LENGTH_SAMPLES]
    
    zero_padding = np.zeros(STANDARD_SAMPLE_LENGTH_SAMPLES - len(audio), dtype=float)
    return (np.append(audio, zero_padding), label)
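The same trim-or-pad step can also be written with `np.pad`, which keeps the input's dtype instead of promoting the result to float64. This is an alternative sketch, not the function used in the rest of the notebook. `fix_length` is a hypothetical name, and the tiny target length is only for illustration.

```python
import numpy as np


def fix_length(audio: np.ndarray, target_length: int) -> np.ndarray:
    """Truncate `audio` to `target_length` samples, or right-pad it with zeros."""
    trimmed = audio[:target_length]
    pad_width = target_length - len(trimmed)
    return np.pad(trimmed, (0, pad_width), mode='constant')


short = np.array([0.1, -0.2], dtype=np.float32)
long = np.arange(10, dtype=np.float32)

print(fix_length(short, 5).shape, fix_length(short, 5).dtype)  # (5,) float32
print(fix_length(long, 5).shape)                               # (5,)
```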

In [15]:
short_test_std = trim_sample_length(short_test_labeled)
short_test_std

(array([2.38418579e-07, 2.38418579e-07, 1.78813934e-07, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00]),
 'kick')

In [16]:
short_test_std[0].shape

(132300,)

In [17]:
long_test_std = trim_sample_length(long_test_labeled)
long_test_std

(array([-0.00062246, -0.00097344, -0.00067311, ..., -0.01021675,
        -0.01180182, -0.01450411]),
 'synth')

In [18]:
long_test_std[0].shape

(132300,)

## 1.6 Normalize audio volume

In [19]:
def normalize_sample(sample: tuple[list[float], str]) -> tuple[list[float], str]:
    audio, label = sample
    return (librosa.util.normalize(audio), label)
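With its default arguments, `librosa.util.normalize` scales a one-dimensional signal so that its largest absolute value is 1. The NumPy sketch below shows that idea directly, with an explicit guard for silent clips. `peak_normalize` is a hypothetical helper and is not a drop-in replacement for the librosa function.

```python
import numpy as np


def peak_normalize(audio: np.ndarray) -> np.ndarray:
    """Scale `audio` so its largest absolute sample is 1.0; leave silence untouched."""
    peak = np.max(np.abs(audio))
    if peak == 0:
        # An all-zero clip has no peak to scale to, so return it unchanged.
        return audio
    return audio / peak


clip = np.array([0.25, -0.5, 0.1])
print(np.abs(peak_normalize(clip)).max())  # 1.0
```

Peak normalization equalizes the loudest instant of each clip, not its perceived loudness, so a short kick and a sustained pad can still sound quite different in level.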

In [20]:
short_test_norm = normalize_sample(short_test_std)
short_test_norm

(array([3.00153468e-07, 3.00153468e-07, 2.25115101e-07, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00]),
 'kick')

In [22]:
abs(short_test_norm[0]).max()

1.0

In [77]:
long_test_norm = normalize_sample(long_test_std)
long_test_norm

(array([-0.0007198 , -0.00112567, -0.00077838, ..., -0.01181446,
        -0.0136474 , -0.01677229]),
 'synth')

In [23]:
abs(long_test_norm[0]).max()

1.0

## 1.7 Write samples to new .wav files

In [86]:
PROCESSED_SAMPLES_DIR: str = 'preprocessed_samples_16bit'

In [87]:
sf.available_subtypes('WAV')

{'PCM_16': 'Signed 16 bit PCM',
 'PCM_24': 'Signed 24 bit PCM',
 'PCM_32': 'Signed 32 bit PCM',
 'PCM_U8': 'Unsigned 8 bit PCM',
 'FLOAT': '32 bit float',
 'DOUBLE': '64 bit float',
 'ULAW': 'U-Law',
 'ALAW': 'A-Law',
 'IMA_ADPCM': 'IMA ADPCM',
 'MS_ADPCM': 'Microsoft ADPCM',
 'GSM610': 'GSM 6.10',
 'G721_32': '32kbs G721 ADPCM',
 'NMS_ADPCM_16': '16kbs NMS ADPCM',
 'NMS_ADPCM_24': '24kbs NMS ADPCM',
 'NMS_ADPCM_32': '32kbs NMS ADPCM',
 'MPEG_LAYER_III': 'MPEG Layer III'}

In [88]:
def write_sample_file(sample: tuple[list[float], str], sample_id: str) -> None:
    audio, label = sample

    file_name: str = f"{label}_{sample_id.zfill(6)}.wav"
    file_path: str = os.path.join(PROCESSED_SAMPLES_DIR, file_name)

    # make sure the output directory exists before writing
    os.makedirs(PROCESSED_SAMPLES_DIR, exist_ok=True)

    # changed from 24-bit to 16-bit for compatibility with tensorflow decode_wav()
    sf.write(file_path, audio, TARGET_SAMPLE_RATE, subtype='PCM_16', format='WAV')
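To make the target format concrete, the sketch below writes a float signal as 16-bit PCM using only the standard library. It is an illustration, not a description of what `soundfile` does internally: the function name, the clipping step, and the 32767 scale factor are assumptions chosen for this example.

```python
import struct
import wave


def write_pcm16_wav(path: str, samples: list[float], sample_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] to `path` as 16-bit PCM."""
    frames = bytearray()
    for value in samples:
        clipped = max(-1.0, min(1.0, value))                 # keep values in range
        frames += struct.pack('<h', round(clipped * 32767))  # little-endian int16
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
```

Each sample ends up as one of 65,536 integer levels, which is why normalizing the audio before writing makes good use of the available range.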

In [89]:
write_sample_file(short_test_norm, '13')

In [90]:
write_sample_file(long_test_norm, '57389')

## 1.8 Write wrapper function for all preprocessing functions

In [91]:
def process_sample(filepath: str, filename: str, file_id: str) -> None:
    sample_raw: tuple[list[float], str] = load_sample(filepath, filename)
    sample_labelled: tuple[list[float], str] = label_sample(sample_raw)
    if not sample_labelled[1]:
        return
    sample_trim: tuple[list[float], str] = trim_sample_length(sample_labelled)
    sample_norm: tuple[list[float], str] = normalize_sample(sample_trim)
    write_sample_file(sample_norm, file_id)

In [92]:
process_sample(SHORT_SAMPLE_FILEPATH, SHORT_SAMPLE_FILENAME, '6483')
process_sample(NEG_SAMPLE_FILEPATH, NEG_SAMPLE_FILENAME, '36413')
process_sample(LONG_SAMPLE_FILEPATH, LONG_SAMPLE_FILENAME, '885')

# 2. Crawl through folders, applying transform pipeline to each sample

In [93]:
DATA_PATH = '/Users/tyler/2TB SSD/Samples/Splice - TYX/sounds/packs'

In [94]:
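for dir_num, (root, dirs, files) in enumerate(os.walk(DATA_PATH)):
    for file_num, file in enumerate(files):
        # case-insensitive match; this also skips '.wav.asd' files
        if file.lower().endswith(".wav"):
            # separate the two indices so that e.g. (1, 11) and (11, 1) get different ids
            file_id: str = f"{dir_num + 1}-{file_num + 1}"
            process_sample(root, file, file_id)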
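When IDs are built from a directory index and a file index, the two numbers need a separator or a fixed width, since otherwise pairs such as (1, 11) and (11, 1) both become '111' and one output file overwrites the other. A running counter avoids the question entirely. The sketch below is one possible alternative. `iter_wav_files` is a hypothetical helper, and the sorting is added only to make the numbering repeatable between runs.

```python
import os
from typing import Iterator


def iter_wav_files(data_path: str) -> Iterator[tuple[str, str, str]]:
    """Yield (directory, filename, file_id) for every .wav file under `data_path`."""
    counter = 0
    for root, dirs, files in os.walk(data_path):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            # Case-insensitive check; names like 'kick.wav.asd' are skipped.
            if name.lower().endswith('.wav'):
                counter += 1
                yield root, name, str(counter)
```

Each yielded triple could be passed straight to `process_sample(root, name, file_id)`.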
            