---
---

# Feature Engineering

---
---

_The assertions and methodologies outlined in this notebook are substantiated by referenced scientific studies detailed in the README file._

Load libraries and data

In [1]:
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
import sys
sys.path.append("../src")
from data_manager import load_audio_files, filter_data_based_on_accents
from load_config import load_constants_from_yaml
from custom_transformers.split_silence_transformer import SplitSilenceTransformer
from custom_transformers.mfcc_transformer import MfccTransformer
from custom_transformers.expander_transformer import ExpanderTransformer

Load the required constants

In [2]:
constants = load_constants_from_yaml('../constants.yml')

SAMPLING_RATING = constants["SAMPLING_RATING"]
FRAME_LENGTH_ENERGY = constants["FRAME_LENGTH_ENERGY"]
THRESHOLD_PERCENTAGE = constants["THRESHOLD_PERCENTAGE"]
MIN_SILENCE_DURATION = constants["MIN_SILENCE_DURATION"]
HOP_LENGTH = constants["HOP_LENGTH"]
SEGMENT_DURATION = constants["SEGMENT_DURATION"]
SEGMENT_OVERLAP = constants["SEGMENT_OVERLAP"]
N_MFCC = constants["N_MFCC"]
CONSIDERED_ACCENTS = constants["CONSIDERED_ACCENTS"]

In [3]:
df = load_audio_files("../data/raw/recordings/", sr=SAMPLING_RATING)
df = filter_data_based_on_accents(df=df, considered_accents=CONSIDERED_ACCENTS)

Trim silence from audio

In [4]:
split_transformer = SplitSilenceTransformer(
    variables=['audio', 'labels'],
    sampling_rating=SAMPLING_RATING,
    threshold_percentage=THRESHOLD_PERCENTAGE,
    min_silence_duration=MIN_SILENCE_DURATION,
    frame_length_energy=FRAME_LENGTH_ENERGY,
    hop_length=HOP_LENGTH
)
split_transformer.fit(df)

In [5]:
df = split_transformer.transform(df)

In [6]:
df.shape

(242, 2)
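
As a rough illustration of the idea behind this step, the sketch below removes long low-energy stretches from a signal using short-time energy. It is not the code of `SplitSilenceTransformer`: the function name `drop_silence` is hypothetical, and the meaning given to each parameter (a threshold expressed as a fraction of the peak frame energy, a minimum silence duration in seconds) is an assumption made for the example.

```python
import numpy as np


def drop_silence(audio, sr, frame_length, hop_length,
                 threshold_percentage, min_silence_duration):
    """Remove low-energy stretches lasting at least `min_silence_duration` seconds.

    Hypothetical helper for illustration only; parameter semantics are assumed.
    """
    audio = np.asarray(audio, dtype=float)
    if len(audio) < frame_length:
        return audio

    # Short-time energy: sum of squared samples in each frame.
    starts = np.arange(0, len(audio) - frame_length + 1, hop_length)
    energy = np.array([np.sum(audio[s:s + frame_length] ** 2) for s in starts])

    # A frame counts as silent when its energy is below a fraction
    # of the loudest frame's energy (assumed threshold definition).
    silent = energy < threshold_percentage * energy.max()

    # Only runs of silent frames that are long enough get dropped.
    min_frames = max(1, int(round(min_silence_duration * sr / hop_length)))
    keep = np.ones(len(audio), dtype=bool)
    run_start = None
    # The appended False acts as a sentinel that closes a trailing silent run.
    for i, is_silent in enumerate(np.append(silent, False)):
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            if i - run_start >= min_frames:
                begin = starts[run_start]
                end = starts[i] if i < len(starts) else len(audio)
                keep[begin:end] = False
            run_start = None
    return audio[keep]
```

Because the frames overlap, a few quiet samples at the edges of a removed stretch survive; a production implementation would probably handle those boundaries more carefully.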

---

Mel-frequency cepstral coefficients (MFCCs) are among the most commonly used spectral feature representations. They are widely employed in automatic speech recognition (ASR) and accent recognition systems and are reported to work best with shallow models. Spectrograms, by contrast, tend to be more effective with deep models and are also sometimes used for accent recognition. We extract the MFCCs with the Librosa library.

Extract MFCC features from the trimmed audio. The audio is not segmented beforehand because the transformer itself splits each recording into segments.
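
To make the segmentation step concrete, here is a minimal sketch of how fixed-length, overlapping segment boundaries could be computed. It is an illustration rather than the logic of `MfccTransformer`: the helper name `segment_bounds` is hypothetical, and it assumes that the duration and overlap are given in seconds and that a trailing partial segment is discarded.

```python
def segment_bounds(n_samples, sr, duration, overlap):
    """Return (start, end) sample indices of fixed-length, overlapping segments.

    Hypothetical helper: `duration` and `overlap` are assumed to be in seconds,
    and trailing samples that do not fill a whole segment are ignored.
    """
    seg_len = int(duration * sr)
    step = seg_len - int(overlap * sr)
    if seg_len <= 0 or step <= 0:
        raise ValueError("duration must be positive and larger than overlap")
    return [(start, start + seg_len)
            for start in range(0, n_samples - seg_len + 1, step)]
```

Each `audio[start:end]` slice could then be passed to `librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=n_mfcc)` to obtain the coefficients for that segment.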

In [7]:
print(df.shape)
df.head()

(242, 2)


                                               audio   labels
0  [-0.00081889424, -0.0012332641, -0.0010821958,...  english
1  [-3.3004353e-06, 2.3220142e-05, -5.8616065e-06...  english
2  [1.5094573e-05, -4.2987816e-07, 2.1244243e-06,...  english
3  [0.0027380765, 0.0043952055, 0.004028226, 0.00...  english
4  [0.00018491458, -9.82639e-05, 6.523135e-05, -7...  english

In [8]:
mfcc_transformer = MfccTransformer(
    variables=["audio", "labels"],
    sampling_rating=SAMPLING_RATING,
    n_mfcc=N_MFCC,
    duration=SEGMENT_DURATION,
    overlap=SEGMENT_OVERLAP
)
df = mfcc_transformer.fit_transform(df)
print('df shape : ',df.shape)
print('df.columns : ',df.columns)

df shape :  (242, 15)
df.columns :  Index(['audio', 'labels', 'mfcc_1', 'mfcc_2', 'mfcc_3', 'mfcc_4', 'mfcc_5',
       'mfcc_6', 'mfcc_7', 'mfcc_8', 'mfcc_9', 'mfcc_10', 'mfcc_11', 'mfcc_12',
       'mfcc_13'],
      dtype='object')


---

As highlighted in the [Exploratory Data Analysis (EDA) notebook](EDA.ipynb), accent recognition can be improved by focusing on specific intervals rather than analyzing the entire audio signal. To achieve this, we use a transformer that expands the MFCC features: each set of features is placed in its own row together with its corresponding label (accent).
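
The expansion described above can be sketched with `DataFrame.explode`. This is an illustration under assumptions, not the code of `ExpanderTransformer`: it assumes that each `mfcc_k` cell holds one value per segment (a list), and the helper name `expand_segments` and the toy data are hypothetical. The line `X_expanded = pd.concat([X_expanded, pd_row])` in the captured output suggests that the actual transformer concatenates row by row; a single `explode` call is one possible alternative that avoids repeated concatenation.

```python
import pandas as pd


def expand_segments(df, feature_cols, keep_cols):
    """Give every segment its own row, repeating the kept columns.

    Hypothetical helper; assumes each feature cell is a list with one
    value per segment, of equal length across the feature columns.
    """
    # Multi-column explode requires pandas >= 1.3.
    expanded = df[keep_cols + feature_cols].explode(feature_cols, ignore_index=True)
    # Exploded columns come back as object dtype; cast them to float.
    expanded[feature_cols] = expanded[feature_cols].astype(float)
    return expanded


# Toy data: two recordings with 2 and 1 segments, two coefficients each.
toy = pd.DataFrame({
    "labels": ["english", "arabic"],
    "mfcc_1": [[1.0, 2.0], [3.0]],
    "mfcc_2": [[10.0, 20.0], [30.0]],
})
toy_expanded = expand_segments(toy, ["mfcc_1", "mfcc_2"], ["labels"])
```

In the toy example the two-segment recording yields two rows and the one-segment recording yields one, each carrying its accent label.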

In [9]:
expander_transformer = ExpanderTransformer(columns_to_remain=["labels"], n_mfcc=N_MFCC)
df_expanded = expander_transformer.fit_transform(df)

 We are in iteration 0

  X_expanded = pd.concat([X_expanded, pd_row])

 We are in iteration 1
 ...
 We are in iteration 42


---