# Audio Feature Extraction and Preprocessing Documentation

## Introduction

This document explains the process of extracting audio features from a set of audio files and performing preprocessing steps. The provided code utilizes the `librosa` library for feature extraction and includes a custom class structure for enhanced organization.

## Libraries Installation

# Install librosa
pip install librosa

# Install pandas and numpy (if not already installed)
pip install pandas numpy

# Install scikit-learn (if not already installed)
pip install scikit-learn

In [1]:
# !pip install librosa

In [2]:
# !pip install essentia

In [3]:
#!pip install opensmile

In [4]:
# !pip show librosa

In [91]:
import librosa
import pandas as pd
import numpy as np
import os
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# import seaborn as sns
# import matplotlib.pyplot as plt

# 1. Folder Structure
The `Audiofile` class manages folder paths corresponding to different audio commands. Each folder represents a specific action or command, and the paths are stored in a dictionary.

# 2. Feature Extraction
The `Feature_Extraction` class extracts audio features using the `librosa` library, including chroma features, mel spectrogram, MFCCs, and more.

# 3. Mean Calculation
The mean of each feature across time is calculated to obtain a single representative value for each feature.

# 4. Data Preprocessing
The `Data_Preprocessing` class combines feature extraction and preprocessing. It uses the `Feature_Extraction` class to obtain mean feature values and performs additional steps, including label encoding and feature scaling.

# 5. Dataset Creation
A Pandas DataFrame is created, where each row corresponds to an audio file, and columns represent mean values of different audio features. The last column contains labels.

## Execution Steps
1. **Import Libraries:** Ensure required libraries are installed.

2. **Class Initialization:**
   - Create an instance of `Audiofile` to manage folder paths.
   - Create an instance of `Feature_Extraction` for feature extraction.

3. **Feature Extraction:**
   - Use `compute_features` method of `Feature_Extraction` to extract features.
   - Calculate mean values for each audio file.

4. **Dataset Creation:**
   - Store resulting data in a Pandas DataFrame.
   - Columns represent audio features, and each row corresponds to an audio file.

5. **Dataset Export (Optional):**
   - Export dataset to a CSV file for further analysis.

## Conclusion
This process provides a structured approach to extracting meaningful features from audio files and preparing data for machine learning tasks. The resulting dataset can be used for training and evaluating machine learning models for audio classification or related tasks.

In [6]:
class Audiofile:
    def __init__(self):
        self._folder_paths = {
            'assistance off': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Assistance off",
            'assistance on': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Assistance on",
            'Turn off Wi-Fi.': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Turn off wifi",
            'Turn off Bluetooth.': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Turn of bluetooth",
            'Open control panel': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\open control panel",
            'Stop Movie': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Stop movie",
            'Play Movie': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Play movie",
            'Next Movie': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Next Movie",
            'Unmute Volume': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Unmute Volume",
            'Volume Down': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Volume Down",
            'Volume Up': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Volume up",
            'Open Start Menu': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Open start menu",
            'zoom in': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Zoom in",
            'zoom out': r"C:\Users\anasa\Desktop\JARVIS-Urdu-Voice-Assistant-\Zoom out",
        }
    def get_folder_names(self):
        return list(self._folder_paths.keys())
    def get_folderpaths(self):
        # Function to get the path of folder
        return self._folder_paths

In [7]:
audio = Audiofile()
folders = audio.get_folderpaths()
print(folders)

{'assistance off': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\Assistance off', 'assistance on': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\Assistance on', 'Turn off Wi-Fi.': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\Turn off wifi', 'Turn off Bluetooth.': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\Turn of bluetooth', 'Open control panel': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\open control panel', 'Stop Movie': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\Stop movie', 'Play Movie': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\Play movie', 'Next Movie': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\Next Movie', 'Unmute Volume': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\Unmute Volume', 'Volume Down': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\Volume Down', 'Volume Up': 'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-\\Volume up', 'Ope

In [80]:
class Feature_Extraction:
    
    def __init__(self):
         self.X=[]
#         self.features={}

    def extract_features(self,audio_data, sr):
        
        features = {}
        # Existing Librosa Features
        features["chroma_stft"] = librosa.feature.chroma_stft(y=audio_data, sr=sr)
        features["chroma_cqt"] = librosa.feature.chroma_cqt(y=audio_data, sr=sr)
        features["chroma_cens"] = librosa.feature.chroma_cens(y=audio_data, sr=sr)
        features["melspectrogram"] = librosa.feature.melspectrogram(y=audio_data, sr=sr)
        features["mfccs"] = librosa.feature.mfcc(y=audio_data, sr=sr)
        features["rms"] = librosa.feature.rms(y=audio_data)
        features["spectral_centroid"] = librosa.feature.spectral_centroid(y=audio_data, sr=sr)
        features["spectral_bandwidth"] = librosa.feature.spectral_bandwidth(y=audio_data, sr=sr)
        features["spectral_contrast"] = librosa.feature.spectral_contrast(y=audio_data, sr=sr)
        features["spectral_flatness"] = librosa.feature.spectral_flatness(y=audio_data)
        features["spectral_rolloff"] = librosa.feature.spectral_rolloff(y=audio_data, sr=sr)
        features["poly_features"] = librosa.feature.poly_features(y=audio_data, sr=sr)
        features["zero_crossing_rate"] = librosa.feature.zero_crossing_rate(y=audio_data)

        # Additional Librosa Features
        features["harmonic_centroid"] = librosa.feature.spectral_centroid(y=librosa.effects.harmonic(audio_data), sr=sr)
        features["harmonic_tonnetz"] = librosa.effects.harmonic(librosa.feature.tonnetz(y=audio_data, sr=sr))
        features["harmonic_rms"] = librosa.feature.rms(y=librosa.effects.harmonic(audio_data))
        features["harmonic_spectral_flatness"] = librosa.feature.spectral_flatness(y=librosa.effects.harmonic(audio_data))
        features["harmonic_spectral_contrast"] = librosa.feature.spectral_contrast(y=librosa.effects.harmonic(audio_data), sr=sr)
        features["harmonic_spectral_rolloff"] = librosa.feature.spectral_rolloff(y=librosa.effects.harmonic(audio_data), sr=sr)
        features["harmonic_zero_crossing_rate"] = librosa.feature.zero_crossing_rate(y=librosa.effects.harmonic(audio_data))
        
        return features
    
    def calculate_mean(self,features):
        mean=[]
        for feature_name, feature_values in features.items():
            # Calculate mean
            feature_mean = np.mean(feature_values, axis=1)
            Final_feature_mean=np.mean(feature_mean, axis=0)
            mean.append(Final_feature_mean)
        return mean
    
    def compute_features(self):
        audios=Audiofile()
        paths =audios.get_folderpaths()             #folder name : folder path
        for folder,path in paths.items():
#             print(folder," : ",path)
            os.chdir(path)
#             print()
#             print(os.getcwd(),"\n",os.listdir())
            for one in os.listdir():
#                 print(one)
                audio_data,sr=librosa.load(one)
                features = self.extract_features(audio_data, sr)
                mean=self.calculate_mean(features)
                mean.append(folder)
                self.X.append(mean)
#             self.X.append(folder)

In [87]:
# feature_extractor=Feature_Extraction()
# audio_data, sr = librosa.load('Recording 112.wav')
# features = feature_extractor.extract_features(audio_data, sr)  
# features
# # # print(os.getcwd(),"\n",os.listdir())
data=Feature_Extraction()
data.compute_features()

In [88]:
# data=Data_Preprocessing()
# data.preprocessing()
data.X

In [96]:
XX=pd.DataFrame(data.X)
XX.columns=['chroma_stft','chroma_cqt',
        'chroma_cens','melspectrogram','mfccs','rms','spectral_centroid','spectral_bandwidth',
        'spectral_contrast','spectral_flatness','spectral_rolloff',
        'poly_features','zero_crossing_rate',"harmonic_centroid","harmonic_tonnetz",
        "harmonic_rms","harmonic_spectral_flatness","harmonic_spectral_contrast",
        "harmonic_spectral_rolloff","harmonic_zero_crossing_rate",'class']
XX
# data.X

Unnamed: 0,chroma_stft,chroma_cqt,chroma_cens,melspectrogram,mfccs,rms,spectral_centroid,spectral_bandwidth,spectral_contrast,spectral_flatness,...,poly_features,zero_crossing_rate,harmonic_centroid,harmonic_tonnetz,harmonic_rms,harmonic_spectral_flatness,harmonic_spectral_contrast,harmonic_spectral_rolloff,harmonic_zero_crossing_rate,class
0,0.392101,0.433277,0.242936,1.451152,-13.043401,0.057793,1488.516951,1655.304808,18.637944,0.035536,...,0.576072,0.069899,1031.550593,-0.011075,0.034002,0.017408,20.861509,1852.352389,0.043066,assistance off
1,0.302495,0.372940,0.245439,2.227921,-11.281276,0.076170,1261.275967,1564.335617,19.578484,0.028953,...,0.717842,0.050817,861.621012,0.004607,0.045365,0.003294,21.872580,1463.011153,0.030715,assistance off
2,0.376695,0.419409,0.242149,1.495821,-11.762228,0.060381,1408.103133,1656.593533,19.012299,0.029437,...,0.620585,0.062710,979.006364,-0.000849,0.031979,0.001991,21.580977,1695.201416,0.037798,assistance off
3,0.385027,0.421750,0.243437,1.527858,-12.482666,0.061625,1450.312847,1648.888122,18.969604,0.040412,...,0.645264,0.065782,1004.542082,0.002325,0.032547,0.012145,21.239787,1886.018724,0.038903,assistance off
4,0.369200,0.406338,0.237796,1.543871,-12.507836,0.063178,1421.990520,1616.881403,19.152994,0.037937,...,0.662252,0.068034,1029.920584,-0.009871,0.036978,0.019219,21.317681,1824.524865,0.043186,assistance off
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61,0.372977,0.413106,0.237089,2.309412,-11.281770,0.076644,1592.420220,1773.113599,19.412211,0.038876,...,0.669578,0.063037,943.810021,-0.010726,0.057426,0.022389,21.936583,1709.878305,0.033396,zoom in
62,0.375370,0.431783,0.241089,1.103975,-12.995523,0.051806,1541.906564,1753.095219,18.938171,0.032485,...,0.479525,0.069078,947.787182,0.003055,0.037820,0.009332,21.488115,1725.542948,0.033479,zoom in
63,0.355738,0.409837,0.240686,1.665722,-11.517665,0.066623,1378.650197,1626.772139,19.395640,0.033538,...,0.593242,0.059935,830.633897,-0.003193,0.045849,0.009181,21.526614,1436.761569,0.031141,zoom in
64,0.383149,0.436924,0.247246,1.373639,-11.375330,0.058646,1585.737717,1810.886452,18.574513,0.036055,...,0.557978,0.065297,1023.374837,0.002668,0.037561,0.015024,20.828308,1939.343518,0.037618,zoom in


In [97]:
os.getcwd()
path=r'C:\\Users\\anasa\\Desktop\\JARVIS-Urdu-Voice-Assistant-'
os.chdir(path)

In [98]:
# XX.to_csv("data.csv", index=False)

In [99]:
data=pd.read_csv("data.csv")
data

Unnamed: 0,chroma_stft,chroma_cqt,chroma_cens,melspectrogram,mfccs,rms,spectral_centroid,spectral_bandwidth,spectral_contrast,spectral_flatness,...,poly_features,zero_crossing_rate,harmonic_centroid,harmonic_tonnetz,harmonic_rms,harmonic_spectral_flatness,harmonic_spectral_contrast,harmonic_spectral_rolloff,harmonic_zero_crossing_rate,class
0,0.392101,0.433277,0.242936,1.451152,-13.043401,0.057793,1488.516951,1655.304808,18.637944,0.035536,...,0.576072,0.069899,1031.550593,-0.011075,0.034002,0.017408,20.861509,1852.352389,0.043066,assistance off
1,0.302495,0.372940,0.245439,2.227921,-11.281276,0.076170,1261.275967,1564.335617,19.578484,0.028953,...,0.717842,0.050817,861.621012,0.004607,0.045365,0.003294,21.872580,1463.011153,0.030715,assistance off
2,0.376695,0.419409,0.242149,1.495821,-11.762228,0.060381,1408.103133,1656.593533,19.012299,0.029437,...,0.620585,0.062710,979.006364,-0.000849,0.031979,0.001991,21.580977,1695.201416,0.037798,assistance off
3,0.385027,0.421750,0.243437,1.527859,-12.482666,0.061625,1450.312847,1648.888122,18.969604,0.040412,...,0.645264,0.065782,1004.542082,0.002325,0.032547,0.012145,21.239787,1886.018724,0.038903,assistance off
4,0.369200,0.406338,0.237796,1.543871,-12.507836,0.063178,1421.990520,1616.881403,19.152994,0.037937,...,0.662252,0.068034,1029.920584,-0.009871,0.036978,0.019219,21.317681,1824.524865,0.043186,assistance off
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61,0.372977,0.413106,0.237089,2.309411,-11.281770,0.076644,1592.420220,1773.113599,19.412211,0.038876,...,0.669578,0.063037,943.810021,-0.010726,0.057426,0.022389,21.936583,1709.878305,0.033396,zoom in
62,0.375370,0.431783,0.241089,1.103975,-12.995523,0.051806,1541.906564,1753.095219,18.938171,0.032485,...,0.479525,0.069078,947.787182,0.003055,0.037820,0.009332,21.488115,1725.542948,0.033479,zoom in
63,0.355738,0.409837,0.240686,1.665722,-11.517665,0.066623,1378.650197,1626.772139,19.395640,0.033538,...,0.593242,0.059935,830.633897,-0.003193,0.045849,0.009181,21.526614,1436.761569,0.031141,zoom in
64,0.383149,0.436924,0.247246,1.373639,-11.375330,0.058646,1585.737717,1810.886452,18.574513,0.036055,...,0.557978,0.065297,1023.374837,0.002668,0.037561,0.015024,20.828308,1939.343518,0.037618,zoom in
