## Feature Extraction Notebook
---

This notebook extracts features from the audio data. Each recording is split into 1-second windows, and the following features are extracted from each window:
- MFCC (Mel-Frequency Cepstral Coefficients): 20 coefficients
- Chroma STFT: 1 coefficient
- RMS (Root Mean Square) Energy: 1 coefficient
- Spectral Centroid: 1 coefficient
- Spectral Bandwidth: 1 coefficient
- Spectral Roll-off: 1 coefficient
- Zero Crossing Rate: 1 coefficient
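
The windowing step and the per-window feature layout listed above can be sketched as follows. This is a minimal, library-free illustration rather than the notebook's actual `extract_features` implementation: the names `FEATURE_LAYOUT`, `split_into_windows`, `rms` and `zero_crossing_rate` are hypothetical, and the choice of non-overlapping windows with the incomplete tail dropped is an assumption. Only RMS and zero crossing rate are shown, since they need no signal-processing library.

```python
import math

# Feature layout as listed above: feature name -> number of values per window
FEATURE_LAYOUT = {
    "mfcc": 20,
    "chroma_stft": 1,
    "rms": 1,
    "spectral_centroid": 1,
    "spectral_bandwidth": 1,
    "spectral_rolloff": 1,
    "zero_crossing_rate": 1,
}


def split_into_windows(samples, sample_rate, window_seconds=1.0):
    """Split a 1-D sequence into non-overlapping windows (assumption: the tail is dropped)."""
    window_size = int(sample_rate * window_seconds)
    if window_size < 2:
        raise ValueError("window must contain at least two samples")
    n_windows = len(samples) // window_size
    return [samples[k * window_size:(k + 1) * window_size] for k in range(n_windows)]


def rms(window):
    """Root mean square energy of one window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def zero_crossing_rate(window):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(window) - 1)


# Toy signal: 2.5 seconds of a 50 Hz sine sampled at 4000 Hz (not real data)
toy_rate = 4000
toy_signal = [math.sin(2 * math.pi * 50 * n / toy_rate) for n in range(10_000)]
toy_windows = split_into_windows(toy_signal, toy_rate)
print(len(toy_windows), "windows,", sum(FEATURE_LAYOUT.values()), "values per window")
```

With the layout above, each window yields 20 + 6 = 26 values before any label is attached.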
  
#### Findings:
- The classes in the dataset are highly imbalanced.
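
One common way to quantify and compensate for class imbalance is to compute inverse-frequency class weights. The sketch below is only an illustration: `class_weights` is a hypothetical helper, the labels are toy values rather than counts from this dataset, and the notebook itself does not state how the imbalance is handled.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: total / (n_classes * count) for each class."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Toy labels only (not taken from the dataset): class 0 is three times as frequent as class 1
toy_labels = [0, 0, 0, 1]
print(class_weights(toy_labels))
```

Rare classes receive weights above 1 and frequent classes receive weights below 1, so a weighted loss pays more attention to the rare ones.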

In [3]:
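# Import the helper functions (star import from the local utils module)
from utils import *
# os and torch are used below; import them explicitly instead of relying on the star import
import os
import torch
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt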

In [4]:
%%html
<!-- tqdm dark theme -->
<style>
.cell-output-ipywidget-background {
    background-color: transparent !important;
}
:root {
    --jp-widgets-color: var(--vscode-editor-foreground);
    --jp-widgets-font-size: var(--vscode-editor-font-size);
}
</style>

### Feature Extraction

In [5]:
# set the paths to data
BASE_DIR = '../dataset/'
ARTIFACTS_DIR = BASE_DIR + 'artifacts/'
EXTRAHLS_DIR = BASE_DIR + 'extrahls/'
MURMURS_DIR = BASE_DIR + 'murmurs/'
NORMALS_DIR = BASE_DIR + 'normals/'
EXTRASTOLES_DIR = BASE_DIR + 'extrastoles/'

# paths to save the features
FEATURES_RAW_DIR = '../features/raw/'

In [6]:
# Feature extraction
paths = [ARTIFACTS_DIR, EXTRAHLS_DIR, MURMURS_DIR, NORMALS_DIR, EXTRASTOLES_DIR]
names = ['artifacts', 'extrahls', 'murmurs', 'normals', 'extrastoles']
window_length = 1      # passed to extract_features as frame_length (1-second windows)
sample_rate = 'mix'    # used here only as a tag in the output file name
n_mfcc = 20            # number of MFCC coefficients
# melkwargs = {'n_fft': 2048, 'hop_length': 512, 'n_mels': 128}

# The index of each class in `paths` doubles as its integer label
for i, (path, class_name) in enumerate(zip(paths, names)):
    print(i)

    # Output file name; change this pattern to change the name of the file
    name = f'{class_name}_base_{window_length}_{sample_rate}.npz'

    if os.path.exists(FEATURES_RAW_DIR + name):
        print('The features have already been extracted')
    else:
        print(f'Extracting features from {path}')
        features = extract_features(path, label=i, frame_length=window_length, n_mfcc=n_mfcc)

        # Stack the features into a single tensor
        features = torch.cat(features, dim=0)
        print(f'The shape of the {path} features tensor is: {features.shape}')

        # If save_features returns None, skip the message below
        if save_features(features, FEATURES_RAW_DIR, name) is None:
            continue

    # Note: this message is also printed when the file already existed
    print('Features extracted and saved')

          

0
The features have already been extracted
Features extracted and saved
1
The features have already been extracted
Features extracted and saved
2
The features have already been extracted
Features extracted and saved
3
The features have already been extracted
Features extracted and saved
4
The features have already been extracted
Features extracted and saved
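
The loop above follows a compute-once pattern: build a file name from the configuration, and skip the extraction when that file already exists. A minimal sketch of that pattern with `pathlib` is shown below. The helpers `feature_file_name` and `needs_extraction` are hypothetical and are not part of `utils`; the `tag` parameter is an assumption that stands in for the hard-coded `base` part of the file name.

```python
from pathlib import Path


def feature_file_name(class_name, window_length, sample_rate, tag="base"):
    """Build the file name with the same pattern as the extraction loop."""
    return f"{class_name}_{tag}_{window_length}_{sample_rate}.npz"


def needs_extraction(features_dir, class_name, window_length, sample_rate):
    """Return True when no feature file exists yet for this configuration."""
    target = Path(features_dir) / feature_file_name(class_name, window_length, sample_rate)
    return not target.exists()


# List the file names the loop would look for with window_length=1 and sample_rate='mix'
for class_name in ["artifacts", "extrahls", "murmurs", "normals", "extrastoles"]:
    print(class_name, "->", feature_file_name(class_name, 1, "mix"))
```

Because the window length and the sample-rate tag are part of the file name, changing either one triggers a fresh extraction instead of silently reusing features computed with different settings.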


#### Train, Validation and Test Data