# COMPSCIX415.2-014 Final Project : Predicting Music Genres - "Classify"

### Authors

- JACQUAND Matthieu : matthieujacquand@berkeley.edu - https://www.linkedin.com/in/matthieu-jacquand-0b0ba6181/
- POURET Andrew : andrew.pouret@berkeley.edu - https://www.linkedin.com/in/andrew-pouret/

## Project Research Framework

### Foreword

For many of us, music plays a role in our daily lives. We listen to music while working, taking part in hobbies, during our commutes or just for the sake of it. We develop a taste for specific music, become attached to certain artists, albums and genres. 

This is why we decided to base our project on music: to see how data analysis and prediction techniques can be applied in this field. It is also fascinating to consider how music, which in essence is sound, can be digitized, quantified, and thus analyzed.

In this project, we will focus on a concept that applies to every piece of music ever produced : genre. We will use data and techniques tailored to predicting which genre a sample of audio belongs to, studying which machine learning models are most appropriate and pinpointing the limitations we run into.

![Music_Genre_Feature.jpg](attachment:Music_Genre_Feature.jpg)

### Data

The dataset we've chosen is from Kaggle. Kaggle, "a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges." (https://en.wikipedia.org/wiki/Kaggle)

The dataset can be found here : https://www.kaggle.com/carlthome/gtzan-genre-collection

It consists of 1000 audio samples, each 30 seconds long.
It is divided into 10 sections, each consisting of 100 samples from a specific genre :

- Blues
- Classical
- Country 
- Disco
- Hip-Hop
- Jazz
- Metal
- Pop
- Reggae
- Rock
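
As a quick sanity check on this structure, the samples can be indexed and counted per genre. The sketch below assumes a hypothetical layout of one folder per genre (`<root>/<genre>/<file>.au`); the function names `index_dataset` and `count_per_genre` are illustrative and not part of the project code.

```python
from collections import Counter
from pathlib import Path


def index_dataset(root, extension=".au"):
    """Return sorted (path, genre) pairs, taking the genre from the parent folder name."""
    root = Path(root)
    pairs = []
    # Assumed layout: <root>/<genre>/<file><extension>
    for path in sorted(root.glob(f"*/*{extension}")):
        pairs.append((path, path.parent.name))
    return pairs


def count_per_genre(pairs):
    """Count how many samples were found for each genre label."""
    return Counter(genre for _, genre in pairs)
```

On the full dataset, `count_per_genre(index_dataset(root))` would be expected to report 100 samples for each of the 10 genres.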

Originally, each sample is in the ".au" format, an audio file format introduced by Sun Microsystems that can be opened with tools such as "Audacity" (https://www.audacityteam.org/), a free audio editing program.

However, for later production purposes, we've decided to convert these files into ".mp3" files, a more widespread format. This will enable us to perform sample evaluation using any song.
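
One possible way to perform this conversion is to call the `ffmpeg` command-line tool from Python. This is only a sketch under assumptions: the text does not state which tool was actually used, `ffmpeg` must be installed separately, and the helper names and paths below are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, target_dir):
    """Build an ffmpeg command converting one audio file to .mp3 in target_dir."""
    source = Path(source)
    target = Path(target_dir) / source.with_suffix(".mp3").name
    # -y overwrites an existing output; ffmpeg picks the output format from the target extension.
    return ["ffmpeg", "-y", "-i", str(source), str(target)]


def convert_to_mp3(source, target_dir):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, target_dir), check=True)
```

Keeping the command construction separate from its execution makes it easy to inspect or log the command before converting all 1000 files.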





### Research questions

Our main question is : 
#### How can we predict music genre from an mp3 file using Python ?

To supplement this question and challenge our own approach, we can also ask :


- Which features are the most important for predicting a genre?
- Looking at the features, what differentiates each music genre (e.g. : classical music from disco and disco from pop) ?
- Are there groups of genres that look alike feature-wise? Can that turn out to be an issue for prediction?
- Is the music’s spectrogram useful for making predictions?
- How will the size of the dataset affect model runtime and the creation of the feature CSV?
- How do the scales of the features compare with each other?
- Are there any correlations between features ?
- To the human ear, rock and metal are more similar to each other than either is to reggae. Is this also true when looking at the features?
- Which ML algorithms will prove most effective at predicting music genres ?
- Can we make accurate predictions using any mp3 outside of the dataset (sample evaluation) ?
- Is data normalisation required ?
- Once the model is operational, should we extract a fixed number of seconds from a track or run it on the whole track?
- Once the model is operational, will audio quality be an issue when extracting features from the tracks we wish to identify?



### Hypotheses

Our approach will be guided by these initial hypotheses, which we will test against our results throughout this work :

- Numeric features extracted from audio samples can be used for prediction.
- The prediction we are looking for is a classification task, as our data consists of 10 distinct genres.
- Since this is a classification task, we will test Support Vector Machines, K-Nearest Neighbors and Neural Networks.
- Audio spectrograms have proven to be relevant in audio processing, so we will explore them as well.


### Workflow

The data consists solely of the audio samples, so the process we now have to follow is the one shown in the graphic below :

- Extracting features from the mp3 file
- Scaling these features, and encoding them when necessary in order to have only numeric features
- Feeding these features into our classification model
- Obtaining the predicted genre

![workflow-figure](./docs/OverallWorkflow.jpg)
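
The last three steps of this workflow can be sketched end to end on toy data. The example below is a minimal illustration, not the project's model: the feature values are made up, the helper names are hypothetical, and a small K-Nearest Neighbors classifier is written with the standard library only so that it runs anywhere (in practice a library such as scikit-learn would normally handle scaling, encoding and classification).

```python
import math
from collections import Counter
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Compute per-feature mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def scale(row, scaler):
    """Standardise one feature row to zero mean and unit variance."""
    means, stds = scaler
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def encode_labels(labels):
    """Map each genre name to an integer code (classes sorted alphabetically)."""
    classes = sorted(set(labels))
    return classes, [classes.index(label) for label in labels]


def predict_knn(train_rows, train_codes, query, k=3):
    """Return the majority label code among the k nearest training rows."""
    by_distance = sorted(zip(train_rows, train_codes),
                         key=lambda pair: math.dist(pair[0], query))
    return Counter(code for _, code in by_distance[:k]).most_common(1)[0][0]


# Made-up feature vectors (two features per track) standing in for extracted features.
features = [[0.10, 2000.0], [0.12, 2100.0], [0.11, 1900.0],
            [0.30, 5000.0], [0.28, 5200.0], [0.31, 4900.0]]
genres = ["classical", "classical", "classical", "metal", "metal", "metal"]

scaler = fit_scaler(features)
scaled = [scale(row, scaler) for row in features]
classes, codes = encode_labels(genres)

# A new track must be scaled with the statistics learned from the training data.
new_track = scale([0.29, 5100.0], scaler)
predicted = classes[predict_knn(scaled, codes, new_track)]
print(predicted)
```

Scaling matters here because the two toy features differ by several orders of magnitude; without it, the distance used by the classifier would be dominated by the larger one.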

### Feature Extraction

In order to extract numeric features from our data, we will use a Python library called Librosa. Librosa "is a python package for music and audio analysis. It provides the building blocks necessary to create music information retrieval systems." (https://librosa.org/doc/latest/index.html)


Using this library, we will extract a total of 66 features for each audio sample. These features will be written to a CSV file alongside the filename and genre label, resulting in a CSV file of 1000 rows and 68 columns. Among the features we will extract are :

- Mel-frequency cepstral coefficients (MFCCs)
- Spectral contrast
- Spectral flatness
- Spectral centroid
- Rolloff frequency
- Chroma variant “Chroma Energy Normalized” or CENS
- Tonal centroid features or tonnetz
- Zero crossing rate


![dataset-extraction-figure](./docs/FeatureExtractionCSV_V2.jpg)
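
A minimal sketch of how the features listed above could be extracted with Librosa and written to a CSV follows. It rests on several assumptions: each feature is summarised by its mean over time frames, `n_mfcc=20` is an arbitrary choice, and the helper names are hypothetical. It is not meant to reproduce the exact set of 66 features described here.

```python
import csv
from statistics import fmean


def summarize(named_matrices):
    """Flatten {name: 2-D array of shape (n_rows, n_frames)} into one mean per row."""
    summary = {}
    for name, matrix in named_matrices.items():
        # Iterating a 2-D array-like (including the arrays librosa returns) yields its rows.
        for i, row in enumerate(matrix):
            summary[f"{name}_{i}_mean"] = fmean(row)
    return summary


def extract_features(path, duration=30.0):
    """Load one audio file with librosa and summarise the features listed above."""
    import librosa  # imported lazily so the helpers around it work without it

    y, sr = librosa.load(path, duration=duration)
    return summarize({
        "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20),
        "spectral_contrast": librosa.feature.spectral_contrast(y=y, sr=sr),
        "spectral_flatness": librosa.feature.spectral_flatness(y=y),
        "spectral_centroid": librosa.feature.spectral_centroid(y=y, sr=sr),
        "rolloff": librosa.feature.spectral_rolloff(y=y, sr=sr),
        "chroma_cens": librosa.feature.chroma_cens(y=y, sr=sr),
        "tonnetz": librosa.feature.tonnetz(y=y, sr=sr),
        "zero_crossing_rate": librosa.feature.zero_crossing_rate(y),
    })


def write_feature_csv(rows, handle):
    """Write rows of {filename, genre, feature...} dicts as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```

Each row passed to `write_feature_csv` would combine the filename, the genre label and the output of `extract_features` for one track, giving one CSV line per audio sample.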