## Feature Extraction Notebook
---

This notebook extracts features from the audio recordings in the dataset. Each recording is split into 1-second windows, and the following features are extracted from each window:
- MFCC (Mel-Frequency Cepstral Coefficients): 20 coefficients
- Chroma STFT: 1 coefficient
- RMS (Root Mean Square) Energy: 1 coefficient
- Spectral Centroid: 1 coefficient
- Spectral Bandwidth: 1 coefficient
- Spectral Roll-off: 1 coefficient
- Zero Crossing Rate: 1 coefficient
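The windowing step and the five scalar features in the list can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not the `extract_features` implementation from `utils`. It assumes non-overlapping 1-second windows, averages stereo channels to get mono, and computes each spectral feature once per window from a single magnitude spectrum (librosa, by contrast, computes these frame by frame). MFCC and Chroma STFT are left out because they need mel and chroma filter banks, which a library such as librosa provides. The helper names `to_mono`, `split_windows` and `scalar_features` and the 85% roll-off threshold are illustrative assumptions.

```python
import numpy as np


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average the channels of a (n_samples, n_channels) array; pass 1-D audio through."""
    return audio.mean(axis=1) if audio.ndim == 2 else audio


def split_windows(signal: np.ndarray, sr: int, window_s: float = 1.0) -> np.ndarray:
    """Cut a 1-D signal into non-overlapping windows, dropping the incomplete tail."""
    win = int(sr * window_s)
    n_windows = len(signal) // win
    return signal[: n_windows * win].reshape(n_windows, win)


def scalar_features(window: np.ndarray, sr: int, rolloff_pct: float = 0.85) -> np.ndarray:
    """Return [rms, centroid, bandwidth, rolloff, zcr] for one window (assumed definitions)."""
    # RMS energy of the raw samples
    rms = np.sqrt(np.mean(window ** 2))

    # magnitude spectrum and the matching frequency axis in Hz
    mag = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sr)
    weights = mag / (mag.sum() + 1e-12)  # small constant avoids division by zero on silence

    # centroid: magnitude-weighted mean frequency
    centroid = np.sum(freqs * weights)
    # bandwidth: magnitude-weighted spread around the centroid
    bandwidth = np.sqrt(np.sum(weights * (freqs - centroid) ** 2))
    # roll-off: lowest frequency below which rolloff_pct of the total magnitude lies
    cumulative = np.cumsum(mag)
    rolloff = freqs[np.searchsorted(cumulative, rolloff_pct * cumulative[-1])]

    # zero crossing rate: fraction of consecutive sample pairs whose sign differs
    signs = np.signbit(window)
    zcr = np.mean(signs[1:] != signs[:-1])

    return np.array([rms, centroid, bandwidth, rolloff, zcr])


# usage: 2.5 s of a 100 Hz tone on two identical channels, sampled at 4 kHz
sr = 4000
t = np.arange(int(2.5 * sr)) / sr
stereo = np.stack([np.sin(2 * np.pi * 100 * t)] * 2, axis=1)
windows = split_windows(to_mono(stereo), sr)
features = np.stack([scalar_features(w, sr) for w in windows])
print(windows.shape, features.shape)  # (2, 4000) (2, 5)
```

In this sketch the trailing half second is discarded, and a clip shorter than one window yields no windows at all.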
  
#### Findings:
- The classes are highly unbalanced: in the extraction run below, `normals` yields 2161 one-second windows, while `extrastoles` yields only 777.
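A quick way to quantify the imbalance is to compare the per-class window counts. The sketch below hard-codes the tensor sizes printed by the extraction run further down; the `artifacts` count is cut off in the captured output, so that class is left out rather than guessed.

```python
# windows per class, copied from the extraction log (artifacts omitted: its count is truncated)
window_counts = {
    'extrahls': 1026,
    'murmurs': 1149,
    'normals': 2161,
    'extrastoles': 777,
}

total = sum(window_counts.values())
proportions = {name: count / total for name, count in window_counts.items()}
imbalance_ratio = max(window_counts.values()) / min(window_counts.values())

for name, p in proportions.items():
    print(f'{name:12s} {window_counts[name]:5d} windows  ({p:.1%})')
print(f'largest / smallest class: {imbalance_ratio:.2f}')
```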

In [5]:
# import the helper functions (e.g. extract_features, save_features) from utils
from utils import *
import os
import numpy as np
import torch
import seaborn as sns
import matplotlib.pyplot as plt

In [6]:
%%html
<style>
/* tqdm dark theme: transparent progress-bar widget background, VS Code editor colors */
.cell-output-ipywidget-background {
    background-color: transparent !important;
}
:root {
    --jp-widgets-color: var(--vscode-editor-foreground);
    --jp-widgets-font-size: var(--vscode-editor-font-size);
}
</style>

### Feature Extraction

In [7]:
# set the paths to data
BASE_DIR = '../dataset/'
ARTIFACTS_DIR = BASE_DIR + 'artifacts/'
EXTRAHLS_DIR = BASE_DIR + 'extrahls/'
MURMURS_DIR = BASE_DIR + 'murmurs/'
NORMALS_DIR = BASE_DIR + 'normals/'
EXTRASTOLES_DIR = BASE_DIR + 'extrastoles/'

# output directory for the extracted feature files
# FEATURES_RAW_DIR = '../features/raw/'  # alternative output directory
FEATURES_RAW_DIR = '../features/balanced/priori/'

In [8]:
# FEATURE EXTRACTION
# one folder per class; the position in `paths` is used as the class label
paths = [ARTIFACTS_DIR, EXTRAHLS_DIR, MURMURS_DIR, NORMALS_DIR, EXTRASTOLES_DIR]
names = ['artifacts', 'extrahls', 'murmurs', 'normals', 'extrastoles']
window_length = 1    # window length in seconds
sample_rate = 'mix'  # used here only as a tag in the output file name
n_mfcc = 20          # number of MFCC coefficients per window
# melkwargs = {'n_fft': 2048, 'hop_length': 512, 'n_mels': 128}

for label, (path, class_name) in enumerate(zip(paths, names)):
    print(label)

    # output file name (edit this pattern to change the name of the saved file)
    name = f'{class_name}_{window_length}_{sample_rate}.npz'

    if os.path.exists(FEATURES_RAW_DIR + name):
        print('The features have already been extracted')
        continue

    print(f'Extracting features from {path}')
    features = extract_features(path, label=label, frame_length=window_length, n_mfcc=n_mfcc)

    # stack the list of feature tensors returned by extract_features along dim 0
    features = torch.cat(features, dim=0)
    print(f'The shape of the {path} features tensor is: {features.shape}')

    # skip the confirmation message when save_features returns None
    if save_features(features, FEATURES_RAW_DIR, name) is None:
        continue

    print('Features extracted and saved')

          

0
Extracting features from ../dataset/artifacts/


Extraction in progress:   0%|          | 0/92 [00:00<?, ?it/s]



Converting stereo audio to mono for artifact_2023_45.wav...
Converting stereo audio to mono for artifact_2023_18.wav...


Converting stereo audio to mono for artifact_2023_17.wav...
Converting stereo audio to mono for artifact_2023_23.wav...
Converting stereo audio to mono for artifact_2023_22.wav...
Converting stereo audio to mono for artifact_2023_34.wav...
Converting stereo audio to mono for artifact_2023_40.wav...
Converting stereo audio to mono for artifact_2023_6.wav...
Converting stereo audio to mono for artifact_2023_20.wav...
Converting stereo audio to mono for artifact_2023_49.wav...
Converting stereo audio to mono for artifact_2023_41.wav...
Converting stereo audio to mono for artifact_2023_14.wav...
Converting stereo audio to mono for artifact_2023_19.wav...
Converting stereo audio to mono for artifact_2023_11.wav...
Converting stereo audio to mono for artifact_2023_37.wav...
Converting stereo audio to mono for artifact_2023_38.wav...
Converting stereo audio to mono for artifact_2023_50.wav...
Converting stereo audio to mono for artifact_2023_44.wav...
Converting stereo audio to mono for artif

Extraction in progress:   0%|          | 0/150 [00:00<?, ?it/s]

No features extracted for pitch_4USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for pitch_8USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for speed_4USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for noisy_35USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for speed_20USERAUGMENTEDextrahls__201104021355.wav. Skipping...
No features extracted for extrahls__201104021355.wav. Skipping...
Finished processing all files.

The shape of the ../dataset/extrahls/ features tensor is: torch.Size([1026, 48])
Saving features...
2
Extracting features from ../dataset/murmurs/


Extraction in progress:   0%|          | 0/149 [00:00<?, ?it/s]

Converting stereo audio to mono for innocent_murmur_2023_8.wav...
No features extracted for murmur__171_1307971016233_E.wav. Skipping...
Converting stereo audio to mono for atrial_septal_defect_2023_6.wav...
No features extracted for murmur__201104021355.wav. Skipping...
Converting stereo audio to mono for mitral_valve_prolapse_2023_10.wav...
Converting stereo audio to mono for abnormal_s3_2023_1.wav...
Converting stereo audio to mono for holosystolic_murmur_2023_7.wav...
Converting stereo audio to mono for abnormal_s3_2023_0.wav...
Converting stereo audio to mono for abnormal_s4_2023_2.wav...
Converting stereo audio to mono for mitral_stenosis_2023_9.wav...
Converting stereo audio to mono for aortic_stenosis_2023_4.wav...
Finished processing all files.

The shape of the ../dataset/murmurs/ features tensor is: torch.Size([1149, 48])
Saving features...
3
Extracting features from ../dataset/normals/


Extraction in progress:   0%|          | 0/355 [00:00<?, ?it/s]

Converting stereo audio to mono for normal_2023_3.wav...
No features extracted for normal__296_1311682952647_A1.wav. Skipping...
Converting stereo audio to mono for normal_2023_0.wav...
Converting stereo audio to mono for normal_2023_2.wav...
Converting stereo audio to mono for normal_2023_1.wav...
Finished processing all files.

The shape of the ../dataset/normals/ features tensor is: torch.Size([2161, 48])
Saving features...
4
Extracting features from ../dataset/extrastoles/


Extraction in progress:   0%|          | 0/150 [00:00<?, ?it/s]

Finished processing all files.

The shape of the ../dataset/extrastoles/ features tensor is: torch.Size([777, 48])
Saving features...
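The loop writes one `.npz` file per class and skips any class whose file already exists. Because `save_features` lives in `utils` and is not shown here, the following is only a hypothetical sketch of such a cached save-and-reload step with NumPy. The helper names `save_features_npz` and `load_features_npz`, the archive key `features`, and the conversion of the torch tensor to a NumPy array before saving are all assumptions.

```python
import os
import tempfile
from typing import Optional

import numpy as np


def save_features_npz(features: np.ndarray, out_dir: str, name: str) -> Optional[str]:
    """Hypothetical helper: save a 2-D feature array to out_dir/name unless it already exists.

    Returns the file path when a new file is written, or None when the file was already there.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, name)
    if os.path.exists(path):
        return None
    # compressed archive holding a single array under the (assumed) key 'features'
    np.savez_compressed(path, features=features)
    return path


def load_features_npz(path: str) -> np.ndarray:
    """Hypothetical helper: read back the array stored under the key 'features'."""
    with np.load(path) as archive:
        return archive['features']


# round trip with a dummy (windows x columns) array; a CPU torch tensor would first need .numpy()
demo = np.random.default_rng(0).normal(size=(10, 48)).astype(np.float32)
with tempfile.TemporaryDirectory() as tmp:
    first = save_features_npz(demo, tmp, 'normals_1_mix.npz')
    second = save_features_npz(demo, tmp, 'normals_1_mix.npz')  # file exists, so nothing is written
    restored = load_features_npz(first)
```

Joining paths with `os.path.join` also avoids relying on the trailing slash in the directory constants.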
