<p style="font-size:28px;font-weight:bold;">Module 1 - Data Input</p>
To simplify data handling in later modules, this module reads in all audio data, performs feature extraction and labeling, and saves the results to NumPy files.<br>
<br>
Data is read from every wav file in the dataset directory, and each label is determined by the file's path. Do not place unrelated wav files in this directory: they cannot be labeled from their path and will cause issues.<br>
For example, files in '.\Dataset\Pump 1\run\' are labeled as Pump 1 Run.

In [1]:
import librosa
import numpy as np
import os
import torchaudio
from torchaudio import transforms

In [2]:
## For labels, pumps 1-4 = labels 0, 3, 6, and 9
## For run state, add: start = 0, run = 1, stop = 2

PUMP_LABELS = {"Pump 1": 0, "Pump 2": 3, "Pump 3": 6, "Pump 4": 9}
STATE_OFFSETS = {"run": 1, "start": 0, "stop": 2}  # checked in this order

paths = []
labels = []
for root, dirs, files in os.walk(".\\Dataset", topdown=True):
    for name in files:
        # Skip non-audio files so that paths and labels stay aligned
        if not name.lower().endswith(".wav"):
            continue
        full_path = os.path.join(root, name)
        # The first pump name found in the directory path sets the base label
        pump = next((p for p in PUMP_LABELS if p in root), None)
        if pump is None:
            print("Bad File " + full_path)
            continue
        # The first matching state wins; a path with no state keeps the base label
        offset = next((v for s, v in STATE_OFFSETS.items() if s in root), 0)
        paths.append(full_path)
        labels.append(PUMP_LABELS[pump] + offset)

labels = np.array(labels, dtype=float)
print(f"Number of recordings: {labels.shape[0]}")

Number of recordings: 918


<p style="font-size:20px;font-weight:bold;">Feature Extractions</p>
The following cells extract features from the files found above. Three feature sets are computed:<br>
<ul>
    <li>Mel-Frequency Cepstral Coefficients (MFCC)</li>
    <li>Linear-Frequency Cepstral Coefficients (LFCC)</li>
    <li>Linear Prediction Coefficients (LPC)</li>
</ul>
The MFCC and LPC processing is performed by the librosa library, while the LFCC processing is performed by TorchAudio, the audio library for PyTorch.<br>
The number of samples read from each file is capped at the length of the shortest recording, so that the feature dimensions remain constant across all recordings.
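
The cap is hard-coded in the cell below. As a rough sketch of how such a value could be measured instead, the helper below scans a list of files with the standard-library `wave` module and returns the smallest frame count. The function name `shortest_recording_length` is hypothetical, and the sketch assumes uncompressed PCM wav files, which is all that `wave` can read.

```python
import wave


def shortest_recording_length(wav_paths):
    """Return the smallest frame count (samples per channel) among the given wav files.

    Hypothetical helper: assumes every path points to an uncompressed PCM wav file.
    """
    lengths = []
    for path in wav_paths:
        with wave.open(path, "rb") as wav_file:
            lengths.append(wav_file.getnframes())
    # min() raises ValueError on an empty list, which flags an empty dataset early
    return min(lengths)
```

Calling `shortest_recording_length(paths)` after the file search would give a candidate value to compare against the hard-coded one.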

In [3]:
shortestSamples = 46298*2                       # Length (in samples) of the shortest recording in the set
samples_LPC = np.zeros((len(paths), 14))        # LPC returns 1+order coefficients for each recording
samples_LFCC = np.zeros((len(paths), 13, 463))  # LFCC returns an (n_lfcc)x(463) array for this length of recording
samples_MFCC = np.zeros((len(paths), 13, 181))  # MFCC returns an (n_mfcc)x(181) array for this length of recording

for count, path in enumerate(paths):
    # librosa: load at the file's native sample rate (mixed down to mono by default)
    y, sr = librosa.load(path, sr=None)
    samples_LPC[count] = librosa.lpc(y[:shortestSamples], order=13)
    samples_MFCC[count] = librosa.feature.mfcc(y=y[:shortestSamples], sr=sr, n_mfcc=13)

    # torchaudio: compute the LFCC on the first channel only
    y, sr = torchaudio.load(path)
    transform = transforms.LFCC(sample_rate=sr, n_lfcc=13)
    samples_LFCC[count] = transform(y[0][:shortestSamples]).numpy()


Stacked across recordings, the MFCC and LFCC outputs form 3-dimensional arrays (recordings x coefficients x time), while the Linear Prediction Coefficients are already 2-dimensional. We can save the 3D arrays directly for use with the neural networks, but to simplify handling with traditional learning methods, we will also flatten them to 2-dimensional arrays and save those.

In [4]:
np.save("3D_LFCC.npy", samples_LFCC)
np.save("3D_MFCC.npy", samples_MFCC)

To reshape the 3D data to 2D, we use the NumPy reshape call. <br>We pass the size of the first dimension, which must remain constant, followed by -1, which tells NumPy to infer the size of the second dimension. This collapses the last two axes into one. <br>This transforms our data from a <b>[recordings]x[coefficients]x[time]</b> array to a <b>[recordings]x[coefficients*time]</b> array.
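
As a small illustration of what this reshape does to the element order, the sketch below uses a toy array in place of the real features. The name `demo` and its sizes are made up for the example.

```python
import numpy as np

# Toy stand-in: 2 recordings, 3 coefficients, 4 time frames
demo = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Keep the first axis, let NumPy infer the rest (3 * 4 = 12)
flat = demo.reshape(demo.shape[0], -1)
print(flat.shape)  # (2, 12)

# Row-major order: coefficient c at frame t lands in column c * 4 + t
assert flat[1, 2 * 4 + 3] == demo[1, 2, 3]

# The flattening is lossless and can be undone with the original shape
restored = flat.reshape(demo.shape)
assert np.array_equal(restored, demo)
```

Each row of the 2D array therefore holds all frames of the first coefficient, then all frames of the second, and so on.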

In [5]:
samples_LFCC2D = samples_LFCC.reshape(samples_LFCC.shape[0], -1)
samples_MFCC2D = samples_MFCC.reshape(samples_MFCC.shape[0], -1)

In [7]:
print(f"The 3D MFCC data is {samples_MFCC.shape} in dimension")
print(f"The 3D LFCC data is {samples_LFCC.shape} in dimension")
print(f"The 2D MFCC data is {samples_MFCC2D.shape} in dimension")
print(f"The 2D LFCC data is {samples_LFCC2D.shape} in dimension")

The 3D MFCC data is (918, 13, 181) in dimension
The 3D LFCC data is (918, 13, 463) in dimension
The 2D MFCC data is (918, 2353) in dimension
The 2D LFCC data is (918, 6019) in dimension
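
The frame counts 181 and 463 can be checked by hand. The sketch below assumes both transforms pad the signal so that frames are centered, and that the default hop lengths apply (512 samples for librosa's MFCC, 200 for TorchAudio's LFCC), since the extraction cell does not override them. The helper name `centered_frame_count` is hypothetical.

```python
def centered_frame_count(n_samples, hop_length):
    """Number of frames from a centered (padded) short-time transform."""
    return 1 + n_samples // hop_length


n_samples = 46298 * 2  # shortestSamples from the extraction cell

mfcc_frames = centered_frame_count(n_samples, 512)  # assumed librosa default hop
lfcc_frames = centered_frame_count(n_samples, 200)  # assumed TorchAudio default hop
print(mfcc_frames, lfcc_frames)            # 181 463
print(13 * mfcc_frames, 13 * lfcc_frames)  # 2353 6019
```

The second line of output matches the flattened widths above: 13 coefficients times the number of frames.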


In [6]:
np.save("2D_LPC.npy", samples_LPC)
np.save("2D_MFCC.npy", samples_MFCC2D)
np.save("2D_LFCC.npy", samples_LFCC2D)

Finally, we save the labels to a NumPy file as well.

In [8]:
np.save("Labels.npy", labels)
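
Because each label combines a pump and a run state (pumps 1-4 start at 0, 3, 6 and 9, and the state adds 0, 1 or 2), a saved label can be split back into its two parts. The sketch below is one way to do that; `decode_label` and `PUMP_STATES` are hypothetical names, not part of the notebook.

```python
PUMP_STATES = ("start", "run", "stop")  # offsets 0, 1, 2 in the label scheme


def decode_label(label):
    """Split a combined label into (pump number, state name)."""
    # Labels are stored as floats, so convert before integer division
    pump_index, state_index = divmod(int(label), 3)
    return pump_index + 1, PUMP_STATES[state_index]


print(decode_label(1.0))  # (1, 'run')
print(decode_label(11))   # (4, 'stop')
```

This may be useful in later modules for reporting predictions in readable form.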