
## Automated Audio Data Augmentation

### IMPORTANT:  <span style="color: red;">This code is only intended to be run ONCE. It generates all the new audio files!</span>

*Robin McBurnie*
<br>Updated June 2024

>Augment a small set of Audio samples using the **excellent** [Audiomentations](https://github.com/iver56/audiomentations) Python library to usefully enlarge the number of recognition samples available to our ML model and save them in a directory structure ready for easy import into the Dataset we intend to use.
<br>
*Remember it is better to apply these augmentations with a light touch*
<br>
Excessive mangling of the audio usually has a detrimental effect on model accuracy rather than improving resilience.
<br>

In [1]:
# Imports needed:
import os
from glob import glob
import numpy
from pathlib import Path
import soundfile as sf
import librosa
from audiomentations import Compose, AddBackgroundNoise, AddShortNoises, AddGaussianSNR, TimeStretch, BitCrush, PitchShift, BandPassFilter, TanhDistortion, Padding, ApplyImpulseResponse, PolarityInversion

This is the approximate code we'll use to apply the *augmod=Compose()* chain of functions to an individual sample.
We'll put it in a loop once we have set up the source and destination folders/files.

```
     sample, sr = librosa.load(<InputFileName>)
     augmodsample = augmod(sample, sr, Freeze)
     sf.write(<OutputFileName>, augmodsample, sr)
```

The above will be called between 10 and 25 times on each original input sample, depending how many aug samples we want, bearing in mind that we should end up with the same number of samples for each target word (command) we wish to identify.<br>
We also have to supply a suitable number of files containing background noises and other sounds that are similar to unwanted environmental contamination likely to be encountered by the system in normal use, to be randomly mixed (as in audio mixer) into parts or all of the augmented sample we want.

These are labelled as *BGNoise*.


First we need to provide the directory names for the audio files that will be used.
<br>
*DataDir* is the parent folder of all the sample folders.
<br>
   The name of each folder will be used to determine the label for the samples inside it when passed to the ML model.
<br>
*NoiseDir* is a folder containing useable noise.
<br>
*SoundsDir* is a folder containing a range of non-target sounds.
<br>
*ImpulseDir* is a folder containing some parameters for the Impulse filter.
<br>
<br>
<span style="color: CadetBlue;"> **Note that this is where you can set paths to your own version if using this code.** </span>
<br>

In [2]:
DataDir = Path("C:/Users/RM/Documents/ML/NIKKI/Data")

NoiseDir = Path("C:/Users/RM/Documents/ML/NIKKI/AudioNoise/Noise")
SoundsDir = Path("C:/Users/RM/Documents/ML/NIKKI/AudioNoise/Unknown")
ImpulseDir = Path("C:/Users/RM/Documents/ML/NIKKI/ImpulseEcho")


At the start of the augmentation process on each folder the names of all existing (captured) samples in that folder will be read into the list - *Sources*.
<br>
The final names of all generated (augmentation) files will be assigned based on the name of that folder from the list *Outputs*. So the file name will start with the Command Word the audio file contains.<br>
<br>
<span style="color: orange;"> ***This is important*** </span>
<br>
All augmentation files will be the same size as the captured files,  as the ML Model works on fixed size windows, which are chosen based on the sample length the model will be fed as input.<br>
<br>
We will work through each directory in turn adding the generated files.<br>
First we create a set with the same general parameters we will use on all samples this gives us some reasonably consistant augmentation samples.<br>
Next we add a further set of files with more randomised parameters for greater variance.<br>
Finally we generate a largish batch where multiple different filters may randomly be applied, all with randomised parameters.<br><br>

**Now we move on to the actual augmentation part!**

This starts with all the function definitions for the various filters/effects to be used. Note that all but one of them has a "Freeze" option that allows the current parameters (which usually take on fresh random values on each call) to be kept the same while we run each filter on all the original source files for a specific command word. This allows a degree of consistency to be provided for the first group of samples.<br>

In [3]:
# Background Noise
def BGNoise(audioArr, SR, Freeze):

    transform = AddBackgroundNoise(
        sounds_path = NoiseDir,
        min_snr_in_db=4.0,
        max_snr_in_db=30.0,
        noise_transform=PolarityInversion(),
        p=1.0
    )
    
    if Freeze:
        transform.freeze_parameters()
    else:
        transform.unfreeze_parameters()

    return transform(audioArr, SR)

In [4]:
# Gaussian SNR
def GaussianSNR(audioArr, SR, Freeze):
    
    transform = AddGaussianSNR(
        min_snr_db=5.0,
        max_snr_db=40.0,
        p=1.0
    )
    
    if Freeze:
        transform.freeze_parameters()
    else:
        transform.unfreeze_parameters()

    return transform(audioArr, SR)

In [5]:
# Short Noises
def ShortNoises(audioArr, SR, Freeze):
    
    transform = AddShortNoises(
        sounds_path=SoundsDir,
        min_snr_in_db=3.0,
        max_snr_in_db=30.0,
        noise_rms="relative_to_whole_input",
        min_time_between_sounds=0.2,
        max_time_between_sounds=0.9,
        noise_transform=PolarityInversion(),
        p=1.0
    )
        
    if Freeze:
        transform.freeze_parameters()
    else:
        transform.unfreeze_parameters()

    return transform(audioArr, SR)

In [6]:
# Time Stretch
def FlexiTime(audioArr, SR, Freeze):
    
    transform = TimeStretch(
        min_rate=0.8,
        max_rate=1.25,
        leave_length_unchanged=True,
        p=1.0
    )
    
    if Freeze:
        transform.freeze_parameters()
    else:
        transform.unfreeze_parameters()

    return transform(audioArr, SR)

In [7]:
# Bitcrush
def Crusher(audioArr, SR, Freeze):
    
    transform = BitCrush(min_bit_depth=8, max_bit_depth=14, p=1.0)
    
    if Freeze:
        transform.freeze_parameters()
    else:
        transform.unfreeze_parameters()

    return transform(audioArr, SR)

In [8]:
# Band Pass
def BandPass(audioArr, SR, Freeze):
    
    transform = BandPassFilter(min_center_freq=300, max_center_freq=800, p=1.0)
    
    if Freeze:
        transform.freeze_parameters()
    else:
        transform.unfreeze_parameters()

    return transform(audioArr, SR)

In [9]:
# Pitch Shift
def Pitchy(audioArr, SR, Freeze):
    
    transform = PitchShift(
        min_semitones=-2.0,
        max_semitones=1.4,
        p=1.0
    )
    
    if Freeze:
        transform.freeze_parameters()
    else:
        transform.unfreeze_parameters()

    return transform(audioArr, SR)

In [10]:
# Tan h Distortion
def TanHDist(audioArr, SR, Freeze):

    transform = TanhDistortion(
        min_distortion=0.01,
        max_distortion=0.7,
        p=1.0
    )
    
    if Freeze:
        transform.freeze_parameters()
    else:
        transform.unfreeze_parameters()

    return transform(audioArr, SR)

In [11]:
# Padding
def Padder(audioArr, SR, Freeze):

    transform = Padding(
        mode = "silence",
        min_fraction = 0.01,
        max_fraction = 0.12,
        pad_section = "start",
        p = 1.0
    )
    
    if Freeze:
        transform.freeze_parameters()
    else:
        transform.unfreeze_parameters()

    return transform(audioArr, SR)

In [12]:
# Impulse Response
def ImpulseResponse(audioArr, SR, Freeze):
    
    transform = ApplyImpulseResponse(ir_path = ImpulseDir, p=1.0)
    
    if Freeze:
        transform.freeze_parameters()
    else:
        transform.unfreeze_parameters()

    return transform(audioArr, SR)


<br>***This is the block that generates samples with possibly multiple filters applied with all parameters randomised***  
There is no "Freeze" option for this block so the parameters will change every time it is called.<br>It is useful for generating a decent number of fairly mildly mangled versions of the source files!

In [13]:
# Create an augmentation chain, with suitable ranges and probabilities. This can be run on each sample multiple times
def multiFX(audioArr, SR):
    augmod = Compose([
        AddGaussianSNR(min_snr_db=5.0, max_snr_db=30.0, p = 0.4),
        PitchShift(min_semitones = -1.8, max_semitones = 1, p = 0.36),
        TanhDistortion(min_distortion=0.01, max_distortion=0.7, p=0.42),
        Padding(mode="silence", min_fraction=0.01, max_fraction=0.12, pad_section="start", p=0.32),
        TimeStretch(min_rate=0.8, max_rate=1.05, leave_length_unchanged=True, p=0.25),
        ApplyImpulseResponse(ir_path=ImpulseDir, p=0.24),
        BitCrush(min_bit_depth=6, max_bit_depth=8, p=0.45)
    ])

    return augmod(audioArr, SR)

In [14]:
Mods = {BGNoise, GaussianSNR, ShortNoises, FlexiTime, Crusher, BandPass, Pitchy, TanHDist, Padder, ImpulseResponse}



####  <span style="color: red;">Warning from here on we are writing files to disk!</span>
**From here on we start looping through all the original source files, generating wave after wave of new Augmented files.**<br>
These new files are saved as the process continues so it is important for the software to keep careful track of file ID numbers.<br>

In [15]:
# Original files plus directory structure to use
Targets = []
Words = []
Sources = []
WordList =[]

# list of current file number ID for each Output Directory
IDList = []


#Get all the sub-directories (the Target word folders) in DataDir
Targets = Path(DataDir).glob('*')
for target in Targets:
    i = 0
    #Get the Words
    if target.is_dir():
        Words.append(target)
        tmpWPLen = len(os.path.dirname(target)) + 1
        tmpWItem = str(target) [(tmpWPLen):]
        WordList.append(tmpWItem)
        i = i + 1
print(i)

# Get original recorded source files for each word
for worddir in Words:
    Sources.append(os.listdir(worddir))
    IDList.append(100)  # The generated files will start  at 100


print(Words)
print()
print(WordList)
print()
print(IDList)
print()
print(Sources)
print()



1
[WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/0'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/1'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/2'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/3'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/4'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/5'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/6'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/7'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/8'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/9'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/Channel'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/Down'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/Go'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/Hey Nikki'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/High'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/Light'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/Low'), WindowsPath('C:/Users/RM/Documents/ML/NIKKI/Data/Med

#### Armed with the above lists of "Words", Directories and *Source* files we're ready to start generating new files now.  
First we generate a modified version of the 1st *Source file* for the 1st Word. We then **Freeze** the parameters for the modifier and apply it to each of the other source files for that word. Next we cycle through the folders for all the other words doing exactly the same thing - *with the parameters **still frozen***. This ensures that every source file has exactly the same modification applied to it.<br><br>
Going back to the 1st sample in the first folder, we **Unfreeze** the parameters, run the modifier and **Refreeze** the parameters. We then repeat as above. Altogether we perform the above **20 times for each modifier** in turn.<br><br>
Having done the above, we simply repeat the process the same number of times but leaving the **Freeze** flags set to *false*. This will randomize all parameters on every use of the *modifier*. The idea being to give a broader set of modifications generally.<br><br>

Each modification results in a *new* file that is stored in the appropriate directory. The first *new* file in each directory (i.e. for each Word) is saved with the Identifier 0100. Each further file has this identifier increased by 1, on a per-directory basis.<br>
When this section been run there will be: *number of Source Files* X *number of Modifiers* X *40* **new** files in **each** directory.

In [16]:
print(Mods)

{<function BandPass at 0x000001E97FD90E00>, <function ImpulseResponse at 0x000001E97FD91260>, <function GaussianSNR at 0x000001E974937CE0>, <function FlexiTime at 0x000001E97FD90900>, <function Pitchy at 0x000001E974937920>, <function TanHDist at 0x000001E97FD91120>, <function ShortNoises at 0x000001E97FD90F40>, <function BGNoise at 0x000001E974934D60>, <function Crusher at 0x000001E974937F60>, <function Padder at 0x000001E974D77F60>}


In [17]:
# Set up ID tracking
j = 0
for tWrd in Words:
    IDList [j] = 100
    j += 1

print("FROZEN")
# FROZEN PARAMETERS
# Repeat this bit 10 times to generate 10 sets of samples where the parameters are only
# randomised at the start of each set, then frozen through that set
for sets in range(0, 10):
    sdir=0   # Current (Word) source Directory
    sfile=0  # Current (Word) source File
    for word in Words:
        # V Dubugging V
        #print("Current Word:  " + WordList [sdir])
        for sfile in Sources [sdir]:
            for mod in Mods:
                tmpfilein = os.path.join(word , Path(sfile))
                # V Dubugging V
                #print("Input File:  " + str(tmpfilein))
                sample, sr = librosa.load(tmpfilein, sr=None)
                # V Dubugging V
                #print("Sample Rate:  ", str(sr))
                augmodsample = mod(sample, sr, Freeze=True)
                tmpID = str(IDList [sdir]).zfill(4)
                tmpfilename =  WordList [sdir] + "_" + tmpID + ".wav"
                tmpfileout = os.path.join(word, Path(tmpfilename))
                sf.write(tmpfileout, augmodsample, sr)
                # V Dubugging V
                print(str(tmpfileout), sr)
                #print()
                IDList [sdir] += 1
        sdir += 1
        print(IDList)


print("UNFROZEN")
# UNFROZEN PARAMETERS
# Repeat this bit 10 times to generate 10 sets of samples where the parameters
# are randomised for every modified output.
for sets in range(0, 10):
    sdir=0   # Current (Word) source Directory
    sfile=0  # Current (Word) source File
    for word in Words:
        # V Dubugging V
        #print("Current Word:  " + WordList [sdir])
        for sfile in Sources [sdir]:
            for mod in Mods:
                tmpfilein = os.path.join(word , Path(sfile))
                # V Dubugging V
                #print("Input File:  " + str(tmpfilein))
                sample, sr = librosa.load(tmpfilein, sr=None)
                # V Dubugging V
                #print("Sample Rate:  ", str(sr))
                augmodsample = mod(sample, sr, Freeze=False)
                tmpID = str(IDList [sdir]).zfill(4)
                tmpfilename =  WordList [sdir] + "_" + tmpID + ".wav"
                tmpfileout = os.path.join(word, Path(tmpfilename))
                sf.write(tmpfileout, augmodsample, sr)
                # V Dubugging V
                print(str(tmpfileout), sr)
                #print()
                IDList [sdir] += 1
        sdir += 1
        print(IDList)


FROZEN
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0100.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0101.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0102.wav 16000




C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0103.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0104.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0105.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0106.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0107.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0108.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0109.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0110.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0111.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0112.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0113.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0114.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0115.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0116.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0117.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0118.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0119.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_0120.wav 16000
C:\Users\R



C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0101.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0102.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0103.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0104.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0105.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0106.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0107.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0108.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0109.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0110.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0111.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0112.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0113.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0114.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0115.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0116.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0117.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_0118.wav 16000
C:\Users\R



C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0106.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0107.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0108.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0109.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0110.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0111.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0112.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0113.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0114.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0115.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0116.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0117.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0118.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0119.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0120.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0121.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0122.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\2\2_0123.wav 16000
C:\Users\R

#### Now the randomised modifier with various probabilities of more than one modifier applied to each sample.


In [18]:
print("RANDOMISED MULTI-MODS")
# Repeat this bit 50 times to generate LOTS of extra randomised samples.
for extras in range(0, 50):
    sdir=0   # Current (Word) source Directory
    sfile=0  # Current (Word) source File
    for word in Words:
        # V Dubugging V
        #print("Current Word:  " + WordList [sdir])
        for sfile in Sources [sdir]:
            tmpfilein = os.path.join(word , Path(sfile))
            # V Dubugging V
            #print("Input File:  " + str(tmpfilein))
            sample, sr = librosa.load(tmpfilein, sr=None)
            # V Dubugging V
            #print("Sample Rate:  ", str(sr))
            augmodsample = multiFX(sample, sr)
            tmpID = str(IDList [sdir]).zfill(4)
            tmpfilename =  WordList [sdir] + "_" + tmpID + ".wav"
            tmpfileout = os.path.join(word, Path(tmpfilename))
            sf.write(tmpfileout, augmodsample, sr)
            # V Dubugging V
            print(str(tmpfileout), sr)
            #print()
            IDList [sdir] += 1
        sdir += 1
        print(IDList)

RANDOMISED MULTI-MODS
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_1500.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_1501.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_1502.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_1503.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_1504.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_1505.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\0\0_1506.wav 16000
[1507, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500]
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_1500.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_1501.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_1502.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_1503.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_1504.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_1505.wav 16000
C:\Users\RM\Documents\ML\NIKKI\Data\1\1_1506.wav 16000
[1507, 1507, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 15