## Feature Extraction Notebook
---

This notebook extracts features from the audio recordings. Each recording is divided into 1-second windows, and the following features are extracted from each window:
- MFCC (Mel-Frequency Cepstral Coefficients): 20 coefficients
- Chroma STFT: 1 coefficient
- RMS (Root Mean Square) Energy: 1 coefficient
- Spectral Centroid: 1 coefficient
- Spectral Bandwidth: 1 coefficient
- Spectral Roll-off: 1 coefficient
- Zero Crossing Rate: 1 coefficient
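The windowing and several of the scalar descriptors above can be sketched with NumPy alone. This is a minimal sketch, not the notebook's `extract_features` (which lives in `utils` and is not shown). The helper names `to_mono`, `split_into_windows` and `window_descriptors` are hypothetical, `to_mono` assumes a `(n_samples, n_channels)` layout, each descriptor is computed over the whole 1-second window as a single frame rather than frame by frame and then averaged, and MFCC and chroma are omitted because they need a mel or chroma filter bank (audio libraries such as librosa provide these).

```python
import numpy as np


def to_mono(audio):
    """Average the channels of an (n_samples, n_channels) array; 1-D input is returned as is.

    The (n_samples, n_channels) layout is an assumption of this sketch.
    """
    audio = np.asarray(audio, dtype=np.float64)
    return audio.mean(axis=1) if audio.ndim == 2 else audio


def split_into_windows(signal, sr, window_seconds=1.0):
    """Split a 1-D signal into non-overlapping windows of `window_seconds`.

    The trailing partial window is dropped, so a recording shorter than one
    window yields an array with zero rows.
    """
    win = int(sr * window_seconds)
    n_windows = len(signal) // win
    return signal[: n_windows * win].reshape(n_windows, win)


def window_descriptors(window, sr, rolloff_pct=0.85):
    """Return [rms, zcr, centroid, bandwidth, rolloff] for one window (hypothetical helper)."""
    rms = np.sqrt(np.mean(window ** 2))
    # zero-crossing rate: fraction of adjacent sample pairs whose sign differs
    signs = np.signbit(window)
    zcr = np.mean(signs[1:] != signs[:-1])

    # magnitude spectrum of the whole window, treated as a single frame
    mag = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sr)
    total = mag.sum()
    if total == 0:  # silent window: there is no spectral shape to describe
        return np.array([rms, zcr, 0.0, 0.0, 0.0])

    weights = mag / total
    # spectral centroid: magnitude-weighted mean frequency
    centroid = np.sum(freqs * weights)
    # spectral bandwidth: magnitude-weighted spread around the centroid
    bandwidth = np.sqrt(np.sum(weights * (freqs - centroid) ** 2))
    # spectral roll-off: lowest frequency below which rolloff_pct of the magnitude lies
    rolloff = freqs[np.searchsorted(np.cumsum(mag), rolloff_pct * total)]
    return np.array([rms, zcr, centroid, bandwidth, rolloff])
```

A window matrix from `split_into_windows(to_mono(audio), sr)` can be turned into one descriptor row per window with a list comprehension over its rows. Because partial windows are dropped, a recording shorter than one second produces no rows at all; that may be why the extraction log reports "No features extracted ... Skipping" for some files, although this is a guess since `utils` is not shown.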
  
#### Findings:
- The classes are highly imbalanced: for example, the normals class yields 2161 one-second windows, while extrastoles yields only 806.
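One way to quantify the imbalance is to turn the per-class window counts printed further down in this notebook into inverse-frequency class weights. The sketch below uses the "balanced" heuristic from scikit-learn's `compute_class_weight`, `n_samples / (n_classes * count)`, computed by hand; whether the downstream models use such weights is an assumption, not something this notebook does.

```python
import numpy as np

# per-class window counts, copied from the shape printout later in this notebook
counts = {
    "artifacts": 1199,
    "extrahls": 883,
    "murmurs": 1149,
    "normals": 2161,
    "extrastoles": 806,
}

n_samples = sum(counts.values())
n_classes = len(counts)

# inverse-frequency ("balanced") weights: rarer classes get larger weights
class_weights = {name: n_samples / (n_classes * c) for name, c in counts.items()}
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, w in class_weights.items():
    print(f"{name:12s} count={counts[name]:5d} weight={w:.3f}")
print(f"largest / smallest class: {imbalance_ratio:.2f}")
```

With these counts the largest class (normals) has roughly 2.7 times as many windows as the smallest (extrastoles), so its weight is correspondingly smaller.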

In [50]:
# helper functions (e.g. extract_features, remove_generated_samples) come from the local utils module
from utils import *
import os
import numpy as np
import torch
import seaborn as sns
import matplotlib.pyplot as plt

In [51]:
%%html
<!-- -------- tqdm DARK THEME: make the ipywidget progress bars follow the VS Code editor colors -------- -->
<style>
.cell-output-ipywidget-background {
    background-color: transparent !important;
}
:root {
    --jp-widgets-color: var(--vscode-editor-foreground);
    --jp-widgets-font-size: var(--vscode-editor-font-size);
}
</style>

### Feature Extraction

In [52]:
# paths to the per-class audio folders
BASE_DIR = '../dataset/'
ARTIFACTS_DIR = BASE_DIR + 'artifacts/'
EXTRAHLS_DIR = BASE_DIR + 'extrahls/'
MURMURS_DIR = BASE_DIR + 'murmurs/'
NORMALS_DIR = BASE_DIR + 'normals/'
EXTRASTOLES_DIR = BASE_DIR + 'extrastoles/'

# path where the extracted features are saved
# (the commented-out raw path is for the base features without augmented samples)
#FEATURES_RAW_DIR = '../features/raw/'
FEATURES_RAW_DIR = '../features/balanced/priori/'

PATHS = [ARTIFACTS_DIR, EXTRAHLS_DIR, MURMURS_DIR, NORMALS_DIR, EXTRASTOLES_DIR]

# Remove the augmented samples (filenames containing 'USERAUGMENTED')
# when extracting the base raw features
# for path in PATHS:
#     remove_generated_samples(path, 'USERAUGMENTED')
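`remove_generated_samples` is defined in `utils` and is not shown here. Judging from the filenames in the extraction log (e.g. `speed_13USERAUGMENTEDextrahls__201104021355.wav`), augmented copies carry the `USERAUGMENTED` tag in their names, so a helper of this kind plausibly deletes every file whose name contains the tag. The version below is a hypothetical sketch under that assumption; it only touches `.wav` files and supports a dry run so the matches can be inspected before anything is deleted.

```python
from pathlib import Path


def remove_generated_samples_sketch(folder, tag="USERAUGMENTED", dry_run=False):
    """Hypothetical stand-in for utils.remove_generated_samples.

    Deletes every .wav file in `folder` whose name contains `tag` and returns
    the sorted list of matching paths. With dry_run=True nothing is deleted.
    """
    matches = sorted(p for p in Path(folder).glob("*.wav") if tag in p.name)
    if not dry_run:
        for p in matches:
            p.unlink()
    return matches
```

Calling it first with `dry_run=True` on each entry of `PATHS` shows how many augmented files would be removed before committing to the deletion.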

In [53]:
# FEATURE EXTRACTION
names = ['artifacts', 'extrahls', 'murmurs', 'normals', 'extrastoles']  # class names, same order as PATHS
features_dict = {}
window_length = 1    # window length in seconds
sample_rate = 'mix'  # in this cell, only used to build the output filename
n_mfcc = 40
# melkwargs = {'n_fft': 2048, 'hop_length': 512, 'n_mels': 128}

# name of the output file (change it to store a different feature set)
storing_name = f'full_data_{window_length}_{sample_rate}_{n_mfcc}'

if os.path.exists(FEATURES_RAW_DIR + storing_name + '.npy'):
    print('The features have already been extracted')
else:
    for i, PATH_ in enumerate(PATHS):
        print(f'Extracting features from {PATH_}')
        # the class index i is used as the label
        features = extract_features(PATH_, label=i, frame_length=window_length, n_mfcc=n_mfcc)

        # stack the per-file tensors into one array laid out as [features..., label, filename]
        features = torch.cat(features, dim=0).numpy()
        print(f'The shape of the {PATH_} features tensor is: {features.shape}')

        X = features[:, :-2]        # feature columns
        y = features[:, -2]         # class label
        filename = features[:, -1]  # file identifier

        features_dict[names[i]] = {'X': X, 'y': y, 'filename': filename}

    # save the per-class dictionary (np.save pickles it and appends '.npy')
    os.makedirs(FEATURES_RAW_DIR, exist_ok=True)
    np.save(FEATURES_RAW_DIR + storing_name, features_dict)

    print('Features extracted and saved')

          

Extracting features from ../dataset/artifacts/


Extraction in progress:   0%|          | 0/84 [00:00<?, ?it/s]

Converting stereo audio to mono for artifact_2023_17.wav...
Converting stereo audio to mono for artifact_2023_16.wav...
Converting stereo audio to mono for artifact_2023_14.wav...
Converting stereo audio to mono for artifact_2023_15.wav...
Converting stereo audio to mono for artifact_2023_39.wav...
Converting stereo audio to mono for artifact_2023_11.wav...
Converting stereo audio to mono for artifact_2023_38.wav...
Converting stereo audio to mono for artifact_2023_12.wav...
Converting stereo audio to mono for artifact_2023_13.wav...
Converting stereo audio to mono for artifact_2023_41.wav...
Converting stereo audio to mono for artifact_2023_40.wav...
Converting stereo audio to mono for artifact_2023_42.wav...
Converting stereo audio to mono for artifact_2023_43.wav...
Converting stereo audio to mono for artifact_2023_46.wav...
Converting stereo audio to mono for artifact_2023_44.wav...
Converting stereo audio to mono for artifact_2023_45.wav...
Converting stereo audio to mono for artifact_2023_36.wav...
Converting stereo audio to mono for artifact_2023_22.wav...
Converting stereo audio to mono for artifact_2023_4.wav...
Converting stereo audio to mono for artif

Extraction in progress:   0%|          | 0/150 [00:00<?, ?it/s]

No features extracted for speed_13USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for pitch_44USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for speed_20USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for pitch_12USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for extrahls__201104021355.wav. Skipping...
No features extracted for noisy_3USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for speed_28USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for speed_21USERAUGMENTEDextrahls__201104021355.wav. Skipping...
Finished processing all files.

The shape of the ../dataset/extrahls/ features tensor is: (883, 28)
Extracting features from ../dataset/murmurs/


Extraction in progress:   0%|          | 0/149 [00:00<?, ?it/s]

No features extracted for murmur__201104021355.wav. Skipping...
Converting stereo audio to mono for abnormal_s4_2023_2.wav...
No features extracted for murmur__171_1307971016233_E.wav. Skipping...
Converting stereo audio to mono for atrial_septal_defect_2023_6.wav...
Converting stereo audio to mono for abnormal_s3_2023_1.wav...
Converting stereo audio to mono for mitral_stenosis_2023_9.wav...
Converting stereo audio to mono for abnormal_s3_2023_0.wav...
Converting stereo audio to mono for holosystolic_murmur_2023_7.wav...
Converting stereo audio to mono for aortic_stenosis_2023_4.wav...
Converting stereo audio to mono for mitral_valve_prolapse_2023_10.wav...
Converting stereo audio to mono for innocent_murmur_2023_8.wav...
Finished processing all files.

The shape of the ../dataset/murmurs/ features tensor is: (1149, 28)
Extracting features from ../dataset/normals/


Extraction in progress:   0%|          | 0/355 [00:00<?, ?it/s]

No features extracted for normal__296_1311682952647_A1.wav. Skipping...
Converting stereo audio to mono for normal_2023_3.wav...
Converting stereo audio to mono for normal_2023_2.wav...
Converting stereo audio to mono for normal_2023_0.wav...
Converting stereo audio to mono for normal_2023_1.wav...
Finished processing all files.

The shape of the ../dataset/normals/ features tensor is: (2161, 28)
Extracting features from ../dataset/extrastoles/


Extraction in progress:   0%|          | 0/150 [00:00<?, ?it/s]

Finished processing all files.

The shape of the ../dataset/extrastoles/ features tensor is: (806, 28)
Features extracted and saved
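The extraction cell relies on a fixed column layout: each row holds the window's features followed by the label and a file identifier, which is why each 28-column array becomes 26 feature columns. It then stores a plain Python dict with `np.save`, which wraps it in a 0-d object array and pickles it; that is why the next cell needs `allow_pickle=True` and `.item()`. A minimal NumPy-only sketch of both steps, with a hypothetical `build_class_entry` helper, `np.concatenate` standing in for `torch.cat`, and a numeric file identifier assumed for the last column:

```python
import os
import tempfile

import numpy as np


def build_class_entry(per_file_rows):
    """Stack per-file arrays laid out as [features..., label, file_id] and split the columns."""
    features = np.concatenate(per_file_rows, axis=0)
    return {"X": features[:, :-2], "y": features[:, -2], "filename": features[:, -1]}


# two fake files with 3 and 2 windows: 26 features + label + numeric file id = 28 columns
rng = np.random.default_rng(0)
file_a = np.hstack([rng.normal(size=(3, 26)), np.full((3, 1), 1.0), np.full((3, 1), 0.0)])
file_b = np.hstack([rng.normal(size=(2, 26)), np.full((2, 1), 1.0), np.full((2, 1), 1.0)])
demo_dict = {"extrahls": build_class_entry([file_a, file_b])}

# round-trip through np.save / np.load, as the notebook does
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "full_data_demo")
    np.save(path, demo_dict)                                   # writes full_data_demo.npy
    loaded = np.load(path + ".npy", allow_pickle=True).item()  # back to a dict

print(loaded["extrahls"]["X"].shape, loaded["extrahls"]["y"].shape)
```

Because the dict is pickled, the file should only be loaded from a trusted location; storing `X`, `y` and `filename` as separate arrays (for example with `np.savez`) would avoid pickling altogether.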


In [54]:
# load the features and check the shapes
# (np.save pickled a dict, so allow_pickle=True and .item() are needed to recover it)
features_dict = np.load(FEATURES_RAW_DIR + storing_name + '.npy', allow_pickle=True).item()
for key in features_dict.keys():
    print(f'The shape of the {key} features tensor is: {features_dict[key]["X"].shape}')
    print(f'The shape of the {key} labels tensor is: {features_dict[key]["y"].shape}')
    print(f'The shape of the {key} filenames tensor is: {features_dict[key]["filename"].shape}')
    print('-----------------------------------------------')

The shape of the artifacts features tensor is: (1199, 26)
The shape of the artifacts labels tensor is: (1199,)
The shape of the artifacts filenames tensor is: (1199,)
-----------------------------------------------
The shape of the extrahls features tensor is: (883, 26)
The shape of the extrahls labels tensor is: (883,)
The shape of the extrahls filenames tensor is: (883,)
-----------------------------------------------
The shape of the murmurs features tensor is: (1149, 26)
The shape of the murmurs labels tensor is: (1149,)
The shape of the murmurs filenames tensor is: (1149,)
-----------------------------------------------
The shape of the normals features tensor is: (2161, 26)
The shape of the normals labels tensor is: (2161,)
The shape of the normals filenames tensor is: (2161,)
-----------------------------------------------
The shape of the extrastoles features tensor is: (806, 26)
The shape of the extrastoles labels tensor is: (806,)
The shape of the extrastoles filenames tensor
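The printout above is a manual consistency check. A small hypothetical helper can assert the same things instead: every class must have as many labels and filenames as feature rows, and all classes must share one feature width. The 26 columns match the 20 MFCCs plus 6 scalar descriptors listed at the top of the notebook; note that the extraction cell sets `n_mfcc = 40`, so if `extract_features` honours that argument a freshly generated file would presumably be wider, which is a further reason to check the width rather than assume it.

```python
import numpy as np


def check_features_dict(features_dict, n_features=None):
    """Validate a {class_name: {'X', 'y', 'filename'}} dict and return {class_name: n_windows}.

    Raises ValueError on mismatched row counts or inconsistent feature widths.
    """
    widths = set()
    counts = {}
    for name, entry in features_dict.items():
        n_rows, width = entry["X"].shape
        if entry["y"].shape != (n_rows,) or entry["filename"].shape != (n_rows,):
            raise ValueError(f"{name}: X, y and filename have different numbers of rows")
        widths.add(width)
        counts[name] = n_rows
    if len(widths) > 1:
        raise ValueError(f"classes have different feature widths: {sorted(widths)}")
    if n_features is not None and widths and widths != {n_features}:
        raise ValueError(f"expected {n_features} features, found {widths.pop()}")
    return counts
```

For the file loaded above, `check_features_dict(features_dict, n_features=26)` should return the per-class window counts instead of raising.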

In [55]:
# total number of 1-second windows across all classes
total_windows = sum(features_dict[key]["X"].shape[0] for key in features_dict)
total_windows

6198
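For training, the per-class dictionaries usually need to be merged into a single feature matrix and label vector. A minimal sketch, assuming the `{class_name: {'X', 'y', 'filename'}}` layout loaded above; the helper name `stack_classes` is hypothetical. Its row count equals the total computed in the previous cell (6198 here).

```python
import numpy as np


def stack_classes(features_dict, order=None):
    """Concatenate per-class arrays into one (X, y, filename) triple, in `order` (default: dict order)."""
    order = list(features_dict) if order is None else list(order)
    X = np.concatenate([features_dict[k]["X"] for k in order], axis=0)
    # labels were sliced out of a float feature array, so cast them back to integers
    y = np.concatenate([features_dict[k]["y"] for k in order]).astype(int)
    filename = np.concatenate([features_dict[k]["filename"] for k in order])
    return X, y, filename
```

After `X, y, filename = stack_classes(features_dict)`, `np.bincount(y)` recovers the per-class counts. Keeping `filename` alongside `X` makes it possible to split by recording rather than by window, so that windows cut from the same recording do not end up in both the training and the test set.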