In [1]:
import pandas as pd
import numpy as np
import utils
import librosa
import librosa.display
import soundfile as sf
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Executive Summary

### Goal
The goal of this notebook is to elucidate the components of an audio signal that cause us to perceive that audio in one way or another. To do this, I attempt to predict the human construct of genre from various audio features. The data I use were pulled from the [Free Music Archive (FMA)](https://github.com/mdeff/fma) repository on GitHub.

### Data
The audio columns are statistics of audio features extracted using the [librosa library](http://man.hubwiz.com/docset/LibROSA.docset/Contents/Resources/Documents/index.html). The feature statistics that best accounted for the variability in the data were the mel frequency cepstral coefficient (MFCC) statistics. The mel frequency scale is discussed further below.
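As a rough illustration of what "statistics of audio features" means, the sketch below collapses a frame-by-frame feature matrix into one fixed-length row per track. The choice of statistics (mean, standard deviation, min, max, median), the helper name `summarize_feature`, and the random stand-in matrix are all assumptions for illustration, not the exact recipe used to build the dataset.

```python
import numpy as np

def summarize_feature(frames):
    """Collapse a (n_coefficients, n_frames) matrix into per-coefficient statistics."""
    frames = np.asarray(frames, dtype=float)
    return {
        "mean": frames.mean(axis=1),
        "std": frames.std(axis=1),
        "min": frames.min(axis=1),
        "max": frames.max(axis=1),
        "median": np.median(frames, axis=1),
    }

# Hypothetical stand-in for an MFCC matrix: 20 coefficients over 1,000 frames.
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(size=(20, 1000))

stats = summarize_feature(fake_mfcc)

# One row per track: concatenate every statistic for every coefficient.
feature_row = np.concatenate([stats[name] for name in stats])
print(feature_row.shape)  # (100,) = 5 statistics x 20 coefficients
```

Summarizing this way discards the ordering of frames, which keeps the feature table small but means the models only see the overall distribution of each coefficient across a clip.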

### Metrics
Predicting genre with logistic regression using these statistics was 52.2% accurate, a substantial improvement over the dummy baseline of 12.5%.
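For context on the baseline figure, here is a minimal sketch of how a most-frequent-class baseline is scored. It assumes eight equally represented genres with 1,000 tracks each, which is consistent with a 12.5% baseline but is an assumption rather than something stated above; the labels are synthetic.

```python
import numpy as np

# Assumption: eight equally represented genres, 1,000 tracks each.
n_genres, tracks_per_genre = 8, 1000
labels = np.repeat(np.arange(n_genres), tracks_per_genre)

# A most-frequent-class baseline predicts the same genre for every track.
counts = np.bincount(labels)
majority_class = counts.argmax()
baseline_accuracy = np.mean(labels == majority_class)

print(baseline_accuracy)  # 0.125
```

With balanced classes, always guessing one genre is right exactly one time in eight, so any model has to clear 12.5% before it has learned anything about the audio.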

### Limitations
Genre is subjective. It is an attempt to put words to a feeling. Two people can listen to the same song and classify it in different genres. To get around this, I used the broadest definitions of genre provided. No matter what I do, the models I build will inherit the perception of genre of whoever encoded this data.

# Mel Frequency Cepstral Coefficients

The mel frequency cepstral coefficients (MFCCs) are an abstraction of a spectrogram that attempts to capture the components of an audio signal that account for our perception of that signal. For instance, in speech we have the concepts of vowels and consonants. These features describe the construction of a word and our understanding of that word. The MFCCs cannot be compared directly to vowels and consonants, but they abstract audio into components that describe what humans hear.

We start with the waveform of the audio signal:


![waveform](../images/waveform.png)

The first step in extracting MFCCs is to perform the short-time Fourier transform (STFT) on the audio signal. The STFT splits the signal into short, overlapping time windows and extracts the frequency components that make up each window. Below is the spectrogram of a 30-second clip of the song "Colorful Lights", which is track 065488 in the FMA.
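The windowing idea can be sketched in plain NumPy. This is a simplified stand-in for illustration, not librosa's implementation: the helper name `stft_magnitude`, the window and hop sizes, and the synthetic 440 Hz tone are all assumptions, and unlike `librosa.stft` with its defaults, this version does not pad the signal at the edges.

```python
import numpy as np

def stft_magnitude(signal, n_fft=2048, hop_length=512):
    """Magnitude spectrogram: one FFT per Hann-windowed frame of the signal."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # rfft keeps the non-negative frequencies; transpose to (n_freqs, n_frames).
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Synthetic test signal (assumption): one second of a 440 Hz sine at 22,050 Hz.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)

spec = stft_magnitude(tone)
freqs = np.fft.rfftfreq(2048, d=1.0 / sr)

# The strongest frequency bin should sit within one bin width of 440 Hz.
peak_hz = freqs[spec.mean(axis=1).argmax()]
print(spec.shape, peak_hz)
```

Each column of `spec` is the frequency content of one window, so plotting the matrix with time on the x-axis and frequency on the y-axis gives a spectrogram like the one shown here.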


![spectrogram](../images/stft.png)



![mel filter bins](../images/mel_filter_bins.png)
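The mel scale spaces frequencies the way we tend to hear them: finely at low frequencies and coarsely at high ones. The sketch below uses one common conversion formula (the HTK-style variant, which may differ from the variant librosa uses by default) to place filter edges at equal mel intervals. The function names, the number of filters, and the 11,025 Hz upper limit are assumptions for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    """Convert frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Edges for 10 triangular filters between 0 Hz and 11,025 Hz:
# n_filters + 2 points, equally spaced on the mel scale.
n_filters = 10
mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(11025.0), n_filters + 2)
hz_points = mel_to_hz(mel_points)

# Equal steps in mels become ever wider steps in Hz.
widths = np.diff(hz_points)
print(np.round(hz_points))
```

Because the edges are evenly spaced in mels, the filters get wider in Hz as frequency increases, which is the pattern visible in the filter-bin plot.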


![mel spectrogram](../images/mel_spec.png)


![mfccs.png](../images/mfccs.png)