# 2. Feature Extraction

In this notebook we will extract various audio features for a sample of audio. We will focus on the time domain features of:

- Amplitude Envelope 
- Root-Mean-Squared Energy
- Zero Crossing Rate

And the frequency domain features of:

- Band Energy Ratio
- Spectral Centroid
- Bandwidth
- Mel Frequency Cepstral Coefficients

Before beginning this notebook, you should have already created a dataset of audio samples from the Saraga Carnatic Dataset. If not, please review notebook **1. Dataset Creation**.

## 2.1 Load Samples

As part of **1. Dataset Creation** you should have created some loaders to load audio samples and associated metadata. Import these loaders and load one sample to experiment with.

In [None]:
import os
import pandas as pd
import librosa
import numpy as np
import scipy


def load_sample(index):
    """
    Load sample with index, <index>
    """
    # Load the audio file using the index
    audio_path = f'audio_chunk_{index}.wav'
    audio, sr = librosa.load(audio_path, sr=None)
    return audio, sr


def get_metadata(index):
    """
    Get metadata for sample with index, <index>
    """
    # Load the metadata CSV file
    metadata = pd.read_csv('metadata.csv')
    
    # Get the metadata for the specified index
    sample_metadata = metadata.loc[metadata['index'] == index].squeeze()
    return sample_metadata

In [None]:
# import dataset loaders


In [None]:
# load one individual sample


Are you able to listen to this audio and plot the waveform using the `plot_waveform()` function from **1. Dataset Creation**?

In [None]:
# Listen to audio


In [None]:
# plot waveform


## 2.1 Time Domain Features

Here we will extract the time domain features of:

- Amplitude Envelope 
- Root-Mean-Squared Energy
- Zero Crossing Rate

We will work with the *time domain* representation of your audio (i.e. the amplitude values loaded by your loader).

### 2.1.1 Amplitude Envelope

The amplitude envelope is computed by splitting the audio into frames and taking the maximum amplitude value for each frame.

![title](images/amplitude_envelope.png)

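Try to split your audio into frames of length `w`.

**Hint**: `np.array_split(array, n)` with an integer `n` splits an array into `n` sections, not into windows of size `n`. To take consecutive windows of `w` samples, pass the split indices instead: `np.array_split(array, np.arange(w, len(array), w))`. The last window may be shorter than `w`.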

In [None]:
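# Split audio array into consecutive frames of w samples
# (assumes `audio` holds the samples returned by load_sample)
w = 44100 * 5  # frame length in samples: 5 s at a 44.1 kHz sample rate
split_chunks = np.array_split(audio, np.arange(w, len(audio), w))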

Take the maximum value for each window using `np.max()`.

**Hint**: A list comprehension may be useful here: `[<f(x)> for x in <iterable>]` where `<iterable>` is some iterable object and `<f(x)>` is some function to apply to each element of that iterable.

In [None]:
# Get amplitude envelope by taking the max of the values of each frame
def max_value(inputlist):
    # np.max over each frame gives one envelope value per frame
    return np.array([np.max(sublist) for sublist in inputlist])

max_values = max_value(split_chunks)

Can you plot this envelope using `matplotlib.pyplot`? How does it compare with the original signal?
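
The framing and per-frame maximum described in this section can be sketched as follows. This is a minimal illustration on a synthetic fading sine wave rather than a dataset sample; the sample rate, frame length and signal are all assumptions chosen to keep the example small.

```python
import numpy as np

# Synthetic test signal (assumption): 2 s of a 220 Hz sine with a linear fade-out
sr = 8000
t = np.arange(2 * sr) / sr
signal = np.linspace(1.0, 0.0, t.size) * np.sin(2 * np.pi * 220 * t)

# Frame length in samples (illustrative value)
w = 1000

# Passing split indices (rather than a section count) to np.array_split
# yields consecutive frames of exactly w samples; the last may be shorter
frames = np.array_split(signal, np.arange(w, signal.size, w))

# Amplitude envelope: the maximum value within each frame
envelope = np.array([np.max(frame) for frame in frames])

# Start time of each frame in seconds, usable as the x axis of a plot
frame_times = np.arange(len(frames)) * w / sr
```

Plotting `envelope` against `frame_times` on top of the waveform should show the envelope tracing the upper outline of the signal.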

### 2.1.2 Root Mean Square Energy

The RMS energy is computed using:

![title](images/rms_energy.png)

The `librosa` library has an implementation at `librosa.feature.rms`.
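
To make the definition concrete, here is a small `numpy` sketch of frame-wise RMS energy, assuming the usual definition (the square root of the mean squared amplitude within each frame). The function name `rms_energy` and the synthetic sine are illustrative; the frame and hop lengths mirror the `librosa` defaults of 2048 and 512.

```python
import numpy as np


def rms_energy(signal, frame_length=2048, hop_length=512):
    """Root-mean-square energy of each (overlapping) frame of the signal."""
    values = []
    # Slide a window over the signal, keeping only complete frames
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frame = signal[start:start + frame_length]
        values.append(np.sqrt(np.mean(frame ** 2)))
    return np.array(values)


# Synthetic check (assumption): a full-scale sine has an RMS close to 1/sqrt(2)
sr = 22050
t = np.arange(sr) / sr
sine = np.sin(2 * np.pi * 440 * t)
rms = rms_energy(sine)
```

The number of frames will differ from the `librosa` output, because `librosa.feature.rms` pads the signal by default (`center=True`) while this sketch does not.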

In [None]:
# import librosa rms
from librosa.feature import rms

In [None]:
# extract rms value


### 2.1.3 Zero Crossing Rate

The zero crossing rate is the rate at which the signal crosses the x axis, i.e. how often it changes sign within a frame.

![title](images/zero_crossing_graph.png)

It is computed with the following equation:

![title](images/zero_crossing_eq.png)

The `librosa` library provides an implementation at `librosa.feature.zero_crossing_rate`.
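
As a rough illustration of the idea, the sketch below counts sign changes between adjacent samples and normalises by the number of sample pairs. The helper name and the two test tones are assumptions; a small phase offset is added so that no sample falls exactly on zero.

```python
import numpy as np


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in the frame whose signs differ."""
    signs = np.signbit(frame)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings / (len(frame) - 1)


# Two synthetic tones (assumption): 1 s each at 100 Hz and 1000 Hz
sr = 8000
t = np.arange(sr) / sr
low = np.sin(2 * np.pi * 100 * t + 0.1)
high = np.sin(2 * np.pi * 1000 * t + 0.1)

zcr_low = zero_crossing_rate(low)
zcr_high = zero_crossing_rate(high)
```

A sine crosses zero twice per cycle, so the higher tone gives a rate roughly ten times larger. This is why the ZCR is often used as a cheap indicator of how noisy or high-pitched a frame is.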

In [None]:
# import librosa zero_crossing_rate

In [None]:
# compute ZCR

## 2.2 Frequency Domain Features

Here we will extract the frequency domain features of:

- Band Energy Ratio
- Spectral Centroid
- Bandwidth
- Mel Frequency Cepstral Coefficients

We will work with the **frequency domain** representation of your audio (i.e. the frequency magnitudes extracted from the Fourier transform of your time domain signal).

Almost all implementations of these feature extractors will compute the frequency domain spectrum for you.

### 2.2.1 Band Energy Ratio

Band energy ratio is the ratio of the energy in the lower frequency band to the energy in the higher frequency band, where the two bands are separated by a split frequency, `F`.

![title](images/bre_spec.png)

![title](images/bre_eq.png)

The `essentia` library provides an implementation for computing the BER at `essentia.standard.EnergyBandRatio`. Import it and compute the BER for your signal.
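
For intuition, the low/high ratio can be sketched directly with `numpy`, assuming energy is measured as squared FFT magnitude and the bands are split at a frequency `F`. The function name, the 1000 Hz split and the two-tone test frame are all illustrative choices, not part of `essentia`.

```python
import numpy as np


def band_energy_ratio(frame, sr, split_frequency):
    """Energy below split_frequency divided by energy at or above it."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    low = power[freqs < split_frequency].sum()
    high = power[freqs >= split_frequency].sum()
    return low / high


# Synthetic frame (assumption): a 200 Hz tone plus a half-amplitude 2000 Hz tone
sr = 8000
t = np.arange(2048) / sr
frame = 1.0 * np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

ber = band_energy_ratio(frame, sr, split_frequency=1000)
```

Energy scales with squared amplitude, so this frame gives a ratio close to 4. Note that `EnergyBandRatio` in `essentia` is, to our understanding, defined as the energy inside a chosen band relative to the total spectral energy, so its values may not match this low/high ratio directly; check its documentation before comparing.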

In [None]:
# import essentia

In [None]:
# compute BER

### 2.2.2 Spectral Centroid

The spectral centroid is a weighted mean of energy across all frequency bands. It is computed as:

![title](images/spec_cent.png)

The `librosa` library provides an implementation at `librosa.feature.spectral_centroid`.
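
The weighted mean can be sketched for a single frame as follows. This version weights each frequency bin by its FFT magnitude, which is an assumption about the weighting; the function name and the two-tone test frame are illustrative.

```python
import numpy as np


def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    magnitude = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * magnitude) / np.sum(magnitude)


# Synthetic frame (assumption): equal-amplitude tones at 500 Hz and 1500 Hz
sr = 8000
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1500 * t)

centroid = spectral_centroid(frame, sr)
```

With two tones of equal strength the centroid sits midway between them, at about 1000 Hz, which matches the reading of the centroid as the "centre of mass" of the spectrum.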

In [None]:
# import librosa implementation

In [None]:
# compute centroid for your signal

### 2.2.3 Bandwidth

The bandwidth captures the spread of the spectrum around the spectral centroid.

![title](images/bandwidth.png)

The `librosa` library provides an implementation at `librosa.feature.spectral_bandwidth`.
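
A single-frame sketch of this spread is given below. It assumes a second-order definition (a magnitude-weighted standard deviation around the centroid), which corresponds to the default `p=2` in `librosa`; the equation shown above may use a different order. The function name and test frames are illustrative.

```python
import numpy as np


def spectral_bandwidth(frame, sr):
    """Magnitude-weighted standard deviation of frequency around the centroid."""
    magnitude = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    weights = magnitude / np.sum(magnitude)
    centroid = np.sum(freqs * weights)
    return np.sqrt(np.sum(weights * (freqs - centroid) ** 2))


# Two synthetic frames (assumption) sharing a 1000 Hz centroid
sr = 8000
t = np.arange(2048) / sr
wide = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1500 * t)
narrow = np.sin(2 * np.pi * 750 * t) + np.sin(2 * np.pi * 1250 * t)

bw_wide = spectral_bandwidth(wide, sr)
bw_narrow = spectral_bandwidth(narrow, sr)
```

Both frames have the same centroid, but the tones in `wide` lie 500 Hz from it and those in `narrow` only 250 Hz, so the bandwidth separates the two where the centroid alone cannot.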

In [None]:
# import librosa implementation

In [None]:
# compute bandwidth for your signal

### 2.2.4 Mel Frequency Cepstral Coefficients

MFCC maps frequency magnitudes to the more perceptually relevant Mel frequencies. The mapping between frequency and Mel frequency is as follows.

![title](images/mel_freq.png)

This mapping is not a simple conversion. More information on exactly how this is achieved can be found [here](https://www.youtube.com/watch?v=9GHCiiDLHQ4).
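
As a rough guide to the Hz-to-Mel step alone, the sketch below uses one widely quoted formula, `m = 2595 * log10(1 + f / 700)`. This is an assumption: the figure above may use a different variant, and `librosa` uses a slightly different (Slaney-style) scale by default. Computing full MFCCs additionally involves a Mel filter bank, a logarithm and a discrete cosine transform, which are not shown here.

```python
import numpy as np


def hz_to_mel(frequency_hz):
    """Convert frequency in Hz to Mels (HTK-style formula, an assumption)."""
    return 2595.0 * np.log10(1.0 + np.asarray(frequency_hz) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


freqs = np.array([0.0, 1000.0, 4000.0, 8000.0])
mels = hz_to_mel(freqs)

# Differences between successive Mel values shrink relative to the Hz steps,
# reflecting the coarser pitch resolution of hearing at high frequencies
mel_steps = np.diff(mels)
```

The formula is calibrated so that 1000 Hz maps to roughly 1000 Mels, and doubling the frequency from 4000 Hz to 8000 Hz adds far fewer Mels than the 4000 Hz it adds in Hz.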

The `librosa` library provides an implementation at `librosa.feature.mfcc`.

In [None]:
# import librosa implementation

In [None]:
# compute MFCC coefficients for your signal

## 2.3 Extracting features across the dataset

Now that you are able to extract features for one individual sample, let's extend this to the entire dataset. The metadata file that you created in the **1. Dataset Creation** notebook should include the indices and filepaths of all samples.

**Hint**: The `pandas` library loads a csv file into a pandas dataframe with `pandas.read_csv`.

In [None]:
# Load metadata

Iterate through the rows in this metadata file, extracting filepath and index. For each filepath, load the audio sample and apply the feature extractors defined above. Make sure to store these features in a dataframe with columns=`index`,`feature_name`.

**Hint**: `.iterrows()` iterates over the rows of a pandas dataframe, yielding `(index, row)` pairs.

**Hint**: `pandas.DataFrame` initialises a new dataframe

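**Hint**: `df.append` was removed in pandas 2.0. Collect your rows in a list of dictionaries and pass it to `pandas.DataFrame`, or use `pandas.concat` to add new rows to an existing dataframe.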
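
One possible shape for this loop is sketched below. The `extract_features` helper, the `filepath` column name and the two-row metadata table are hypothetical stand-ins, and random noise replaces the loaded audio so that the example runs without any files.

```python
import numpy as np
import pandas as pd


def extract_features(audio):
    """Hypothetical helper: summarise one signal as a dict of scalar features."""
    signs = np.signbit(audio)
    return {
        "rms": float(np.sqrt(np.mean(audio ** 2))),
        "zcr": float(np.mean(signs[1:] != signs[:-1])),
    }


# Stand-in for the metadata file (assumption: `index` and `filepath` columns)
metadata = pd.DataFrame({
    "index": [0, 1],
    "filepath": ["audio_chunk_0.wav", "audio_chunk_1.wav"],
})

rng = np.random.default_rng(0)
rows = []
for _, row in metadata.iterrows():
    # In the notebook this would be the audio loaded from row["filepath"];
    # random noise is used here purely as a placeholder
    audio = rng.standard_normal(1000)
    rows.append({"index": row["index"], **extract_features(audio)})

# Build the features dataframe once from the collected rows
features = pd.DataFrame(rows)
```

Collecting dictionaries and constructing the dataframe once at the end is usually faster than growing a dataframe row by row, and it works on all recent pandas versions.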

In [None]:
# iterate through df
# extract features
# store in new features dataframe

Store this features dataframe alongside the original dataset.

**Hint**: `df.to_csv` writes a dataframe to a csv file.

In [None]:
# write dataframe