# Music Genre Recognition - Milestone 1
Darren Midkiff and Cheng-Wei Hu

## Overview
This project aims to identify the genre of music in an audio sample. Features will be extracted from the raw audio signal, and a genre will be predicted using a k-nearest neighbors (KNN) model trained on labelled audio samples from the freely available FMA dataset.
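As a rough illustration of how a KNN model assigns a genre, the sketch below classifies a query vector by majority vote among its k nearest training vectors under Euclidean distance. It is plain Python on made-up two-dimensional feature vectors. The function name `knn_predict`, the toy data, and the choice of k=3 are hypothetical, and the project may instead use a library implementation such as scikit-learn's `KNeighborsClassifier`.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbours.

    Hypothetical helper for illustration: uses Euclidean distance, and a tied
    vote goes to whichever tied label appears first in distance order.
    """
    # Sort the training points by Euclidean distance to the query
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # most_common orders equal counts by first appearance, i.e. nearest first
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2-D "feature vectors", made up for illustration (not real FMA data)
train_X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["Folk", "Folk", "Folk", "Rock", "Rock", "Rock"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # Folk
print(knn_predict(train_X, train_y, (4.8, 5.0)))  # Rock
```

There is no training step beyond storing the labelled vectors; all the work happens at prediction time, which is why KNN is simple to set up but slower to query as the training set grows.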

## Training Data

The [FMA Dataset](https://github.com/mdeff/fma) comprises over 100,000 tracks from 161 genres. To make the problem more manageable, we will use the small version of the dataset, `fma_small`, which includes 8,000 tracks from 8 top-level genres. The dataset also includes dozens of metadata fields, such as release year, artist location, and number of listens. Because this project aims to identify genre using only the audio signal, all of these fields are irrelevant and will be dropped.
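An alternative to skipping header rows is to read `tracks.csv` with a two-level column header, which is how the loader in the FMA repository's `utils.py` reads it (`index_col=0, header=[0, 1]`). The sketch below shows the idea on a tiny in-memory CSV that imitates the file's three header rows; the rows, titles, and values are made up, and the real file has many more columns.

```python
import io

import pandas as pd

# Made-up miniature of tracks.csv: a two-level column header
# (top-level group, field name) followed by a row naming the index.
csv_text = """\
,album,set,track
,title,subset,genre_top
track_id,,,
2,AWOL,small,Hip-Hop
3,AWOL,medium,Hip-Hop
10,Constant Hitmaker,small,Pop
"""

tracks = pd.read_csv(io.StringIO(csv_text), index_col=0, header=[0, 1])

# Columns are (group, field) tuples, so no renaming is needed
small = tracks[tracks[("set", "subset")] == "small"]
genres = small[("track", "genre_top")]
print(genres)
```

With this layout the track ID becomes the index instead of an `Unnamed: 0` column, and fields with the same name in different groups (such as album and track titles) stay distinguishable.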

```python
import pandas as pd

# Read the full metadata file. tracks.csv has a three-row header, so skip
# rows 0 and 2 and use row 1 (the field names) as the header.
metadata = pd.read_csv("./fma_metadata/tracks.csv", skiprows=[0, 2], low_memory=False)

# Keep only the tracks that belong to the fma_small subset
metadata = metadata[metadata["subset"].eq("small")]
# Name the track_id column (its header cell is empty in tracks.csv,
# so pandas labels it "Unnamed: 0")
metadata = metadata.rename(columns={"Unnamed: 0": "track_id"})
# Keep only the track ID and genre label; none of the other metadata
# will be available from an audio file alone
metadata = metadata[["track_id", "genre_top"]]
# Reset the index to account for the dropped rows
metadata = metadata.reset_index(drop=True)

# # Write only the relevant metadata to a file for use in training
# metadata.to_csv("fma_small_genres.csv")

metadata.head()
```

|   | track_id | genre_top |
|---|----------|-----------|
| 0 | 2        | Hip-Hop   |
| 1 | 5        | Hip-Hop   |
| 2 | 10       | Pop       |
| 3 | 140      | Folk      |
| 4 | 141      | Folk      |


```python
print(metadata["genre_top"].unique())
```

```
['Hip-Hop' 'Pop' 'Folk' 'Experimental' 'Rock' 'International' 'Electronic'
 'Instrumental']
```


## Feature Extraction

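For feature extraction, we analyze the audio files from the small version of the [FMA Dataset](https://github.com/mdeff/fma). `fma_small` consists of 8,000 audio files from 8 top-level genres, and each file contains 30 seconds of audio. For each file, we extract four features: [Zero Crossing Rate](https://en.wikipedia.org/wiki/Zero-crossing_rate) (how often the signal changes sign), [Spectral Centroid](https://en.wikipedia.org/wiki/Spectral_centroid) (the magnitude-weighted mean frequency of each frame, a rough measure of a sound's "brightness"), [Spectral Rolloff](https://en.wikipedia.org/wiki/Spectral_slope) (the frequency below which a fixed fraction of each frame's spectral energy lies; librosa uses 85% by default), and [Mel-Frequency Cepstral Coefficients](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) (a compact description of the spectral envelope on the mel scale).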
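To make these definitions concrete, the sketch below computes a zero-crossing rate and a single spectral centroid and rolloff with NumPy on a synthetic 440 Hz sine wave. It is a simplified, hypothetical illustration rather than librosa's implementation: librosa computes the spectral features per short frame (2048-sample windows by default) instead of over the whole signal, and its conventions differ in details such as windowing and padding.

```python
import numpy as np


def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose sign differs."""
    signs = np.signbit(x)
    return np.count_nonzero(signs[1:] != signs[:-1]) / (len(x) - 1)


def magnitude_spectrum(x, sr):
    """One-sided magnitude spectrum of the whole signal and its bin frequencies (Hz)."""
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return freqs, mags


def spectral_centroid(x, sr):
    """Magnitude-weighted mean frequency, in Hz."""
    freqs, mags = magnitude_spectrum(x, sr)
    return float(np.sum(freqs * mags) / np.sum(mags))


def spectral_rolloff(x, sr, roll_percent=0.85):
    """Lowest bin frequency at which the cumulative magnitude reaches roll_percent of the total."""
    freqs, mags = magnitude_spectrum(x, sr)
    cumulative = np.cumsum(mags)
    idx = np.searchsorted(cumulative, roll_percent * cumulative[-1])
    return float(freqs[idx])


sr = 22050                        # librosa's default sampling rate
t = np.arange(sr) / sr            # one second of sample times
x = np.sin(2 * np.pi * 440 * t)   # pure 440 Hz tone

print(zero_crossing_rate(x))      # roughly 0.04 (two crossings per cycle)
print(spectral_centroid(x, sr))   # very close to 440 Hz
print(spectral_rolloff(x, sr))    # about 440 Hz
```

For a pure tone all three spectral quantities collapse onto the tone's frequency; for real music the centroid and rolloff rise with the amount of high-frequency content (cymbals, distorted guitars), which is what makes them useful for telling genres apart.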


In our project, we use [librosa](https://librosa.org/doc/latest/index.html) to extract features from the raw audio. To make the training data easier to reuse, we export all the features to a CSV file.

```python
%matplotlib inline
import IPython.display as ipd
import librosa
import librosa.display
import matplotlib.pyplot as plt
# Import the submodule explicitly; `import sklearn` alone may not expose it
import sklearn.preprocessing
```

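```python
# Collect the paths of all the MP3 files in the fma_small dataset
import os

file_names = []
for root, dirs, files in os.walk("./fma_small"):
    for name in files:
        # Skip non-audio files such as checksums and READMEs
        if not name.endswith(".mp3"):
            continue
        file_names.append(os.path.join(root, name))
# Sort so that the extraction order is the same on every run
file_names.sort()
# print(len(file_names))
```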
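The file names also encode the track ID: each file is named after its zero-padded track ID (for example `./fma_small/099/099134.mp3`), and stripping the padding gives the value in the `track_id` column above. The sketch below uses `pathlib` instead of `os.walk` and builds a temporary directory of empty, made-up files to show the idea; `list_tracks` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def list_tracks(root):
    """Return (track_id, path) pairs for every .mp3 under `root`, sorted by ID.

    Hypothetical helper: assumes FMA-style names such as 000/000002.mp3,
    where the file stem is the zero-padded track ID.
    """
    return sorted((int(path.stem), path) for path in Path(root).rglob("*.mp3"))


# Build a throwaway directory that mimics the fma_small layout
with tempfile.TemporaryDirectory() as root:
    for rel in ["README.txt", "000/000002.mp3", "000/000005.mp3", "099/099134.mp3"]:
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    tracks = list_tracks(root)
    print([track_id for track_id, _ in tracks])  # [2, 5, 99134]
```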

```python
# Min-max scale an array to [0, 1]; not used by the extraction below
def normalize(x, axis=0):
    return sklearn.preprocessing.minmax_scale(x, axis=axis)


def extract_feature_from_audio(audio_path, should_plot=False, should_print=False):
    """Extract the four features from one audio file.

    Returns [zero_crossings, spectral_centroids, spectral_rolloff, mfccs]:
    a per-sample boolean array, two per-frame 1-D arrays, and an
    (n_mfcc, n_frames) array.
    """
    # Load the audio (librosa resamples to 22,050 Hz mono by default)
    x, sr = librosa.load(audio_path)
    if should_plot:
        plt.figure(figsize=(14, 5))
        # waveplot was removed in librosa 0.10; waveshow replaces it
        librosa.display.waveshow(x, sr=sr)

    # Zero crossings: True at each sample where the sign changes
    zero_crossings = librosa.zero_crossings(x, pad=False)
    zero_crossings_sum = int(zero_crossings.sum())
    if should_print:
        print(zero_crossings.shape)
        print(zero_crossings)
        print(zero_crossings_sum)
        print(zero_crossings_sum / len(x))  # zero-crossing rate

    # Spectral centroid per frame (returned as shape (1, n_frames), so take row 0)
    spectral_centroids = librosa.feature.spectral_centroid(y=x, sr=sr)[0]
    if should_print:
        print(spectral_centroids.shape)
        print(spectral_centroids)

    # Computing the time variable for visualization
    # frames = range(len(spectral_centroids))
    # t = librosa.frames_to_time(frames)

    # Spectral rolloff per frame
    spectral_rolloff = librosa.feature.spectral_rolloff(y=x, sr=sr)[0]
    if should_print:
        print(spectral_rolloff.shape)
        print(spectral_rolloff)

    # MFCCs, shape (n_mfcc, n_frames); n_mfcc defaults to 20
    mfccs = librosa.feature.mfcc(y=x, sr=sr)
    if should_print:
        print(mfccs.shape)
        print(mfccs)
    # Displaying the MFCCs:
    # librosa.display.specshow(mfccs, sr=sr, x_axis='time')

    return [zero_crossings, spectral_centroids, spectral_rolloff, mfccs]
```
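The per-frame arrays returned by the function above hold one value per analysis frame. Assuming librosa's defaults (`hop_length=512`, `center=True`, 20 MFCCs), the frame count is `1 + len(x) // 512`, so an exactly 30-second clip at 22,050 Hz (661,500 samples) yields 1,292 frames: centroid and rolloff arrays of shape `(1292,)` and an MFCC array of shape `(20, 1292)`. Clips are not always exactly 30 seconds long, so the count can vary slightly between files. The hypothetical helper below just encodes that arithmetic.

```python
def expected_frames(n_samples, hop_length=512):
    """Frame count of a centered STFT (librosa's center=True default).

    Hypothetical helper; assumes librosa's default centering and hop length.
    """
    return 1 + n_samples // hop_length


sr = 22050
n_samples = 30 * sr                # an exactly 30-second clip
print(n_samples)                   # 661500
print(expected_frames(n_samples))  # 1292
```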

```python
# Extract the features from every audio file.
# Some of the audio files are damaged, so we skip them.
# The damaged files are: './fma_small/099/099134.mp3', './fma_small/108/108925.mp3', './fma_small/133/133297.mp3'

train_audio_features_all = []
fail_file_names_all = []
fail_file_idx_all = []
fail_file_names_dict_all = {}

for idx, file in enumerate(file_names):
    print(idx, file)
    try:
        single_audio_features = extract_feature_from_audio(file)
        # Row layout: [file name, zero_crossings, centroids, rolloff, mfccs]
        row = [os.path.basename(file)] + single_audio_features
        train_audio_features_all.append(row)
    except Exception as e:  # damaged or undecodable file
        print("Failed: ", idx, file, e)
        fail_file_names_all.append(file)
        fail_file_idx_all.append(idx)
        fail_file_names_dict_all[file] = True
```
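The extracted features are arrays (one value per sample or per frame), but KNN compares fixed-length vectors. One common way to bridge this, not necessarily the one this project will use, is to summarize each per-frame feature by its mean and standard deviation. The hypothetical `summarize_features` below does this for one row laid out like `train_audio_features_all` (file name, zero crossings, centroids, rolloff, MFCCs), using random arrays with plausible shapes in place of real features.

```python
import numpy as np


def summarize_features(row):
    """Turn one extracted-feature row into a fixed-length vector.

    Hypothetical summary: zero-crossing rate, then mean and standard
    deviation of the spectral centroid, spectral rolloff, and each MFCC.
    """
    _file_name, zero_crossings, centroids, rolloff, mfccs = row
    zcr = np.count_nonzero(zero_crossings) / len(zero_crossings)
    parts = [
        [zcr],
        [centroids.mean(), centroids.std()],
        [rolloff.mean(), rolloff.std()],
        mfccs.mean(axis=1),  # one mean per MFCC coefficient
        mfccs.std(axis=1),   # one standard deviation per coefficient
    ]
    return np.concatenate([np.asarray(p, dtype=float) for p in parts])


# Random stand-ins with the shapes librosa would return for one 30 s clip
rng = np.random.default_rng(0)
n_frames = 1292
fake_row = [
    "000002.mp3",
    rng.random(661500) < 0.05,            # per-sample zero-crossing flags
    rng.random(n_frames) * 4000,          # centroid per frame (Hz)
    rng.random(n_frames) * 8000,          # rolloff per frame (Hz)
    rng.standard_normal((20, n_frames)),  # 20 MFCCs per frame
]
vector = summarize_features(fake_row)
print(vector.shape)  # (45,)
```

Because centroid and rolloff are measured in hertz while the zero-crossing rate lies between 0 and 1, the summary vectors would typically be scaled per dimension (for example with `sklearn.preprocessing.minmax_scale`, which the `normalize` helper wraps) before computing KNN distances, so that no single feature dominates.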

```python
# Export the features to a CSV file
import csv

with open("features_all.csv", mode="w", newline="") as features_file:
    writer = csv.writer(features_file, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    features_names = ["file_name", "zero_crossing (zero_crossing_sum & total_frame)", "spectral_centroids", "spectral_rolloff", "mfccs"]
    writer.writerow(features_names)
    for idx, raw_feature in enumerate(train_audio_features_all):
        file_name, zero_crossings, centroids, rolloff, mfccs = raw_feature
        print(idx, file_name)
        writer.writerow([
            file_name,
            # [number of zero crossings, number of samples]
            [int(zero_crossings.sum()), len(zero_crossings)],
            # Array-valued features are stored as their list representation
            centroids.tolist(),
            rolloff.tolist(),
            mfccs.tolist(),
        ])
```

The exported csv file can be found [here]() 
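To train on these features, each row must be paired with its genre label. Since the exported `file_name` is the zero-padded track ID, one way to do this, sketched below with made-up rows, is to parse the ID from the file name and merge with the `track_id`/`genre_top` table built earlier. The `zcr` column and all values here are hypothetical stand-ins.

```python
import pandas as pd

# Made-up stand-ins for the genre table and the exported features
genres = pd.DataFrame({"track_id": [2, 5, 10], "genre_top": ["Hip-Hop", "Hip-Hop", "Pop"]})
features = pd.DataFrame({"file_name": ["000010.mp3", "000002.mp3"], "zcr": [0.08, 0.05]})

# "000010.mp3" -> 10, matching the integer track_id column
features["track_id"] = features["file_name"].str.replace(".mp3", "", regex=False).astype(int)

# Inner join keeps only tracks that have both features and a label;
# validate guards against duplicated track IDs on either side
labelled = features.merge(genres, on="track_id", how="inner", validate="one_to_one")
print(labelled[["file_name", "track_id", "genre_top"]])
```

An inner join also silently drops the damaged files that were skipped during extraction, since they have a label but no feature row.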

## Model