# Audio Genre Classification

## Exploratory Data Analysis

The "GTZAN" dataset is used (http://marsyas.info/downloads/datasets.html). It contains 1000 audio tracks, each 30 seconds long, spread evenly over 10 genres (100 tracks per genre). All tracks are 22050 Hz, mono, 16-bit audio files in .wav format, and this uniform format makes the analysis simpler. The 10 genres, or classes, are:

   - Blues
   - Classical
   - Country
   - Disco
   - Hiphop
   - Jazz
   - Metal
   - Pop
   - Reggae
   - Rock

Every file is in .wav format. According to the author, the sampling rate, bit depth and number of channels are the same for every audio file.
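These parameters fix how much raw audio each clip holds. The sketch below assumes every clip is exactly 30 s long (real clips may differ by a few samples) and counts only the PCM sample data, not the file header.

```python
# Expected raw PCM size of one GTZAN clip, assuming exactly 30 s of
# 22050 Hz, mono, 16-bit audio (WAV header bytes are not counted).
SAMPLE_RATE = 22050   # samples per second
DURATION_S = 30       # seconds per clip
NUM_CHANNELS = 1      # mono
BIT_DEPTH = 16        # bits per sample
NUM_CLIPS = 1000      # 10 genres x 100 tracks

num_samples = SAMPLE_RATE * DURATION_S * NUM_CHANNELS
bytes_per_sample = BIT_DEPTH // 8
pcm_bytes = num_samples * bytes_per_sample
dataset_bytes = pcm_bytes * NUM_CLIPS

print(num_samples)    # 661500 samples per clip
print(pcm_bytes)      # 1323000 bytes (about 1.3 MB) per clip
print(dataset_bytes)  # 1323000000 bytes (about 1.3 GB) for the whole dataset
```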

## Example

The audio files are loaded, and a few of them are played for a first listen.

```python
import IPython.display as ipd

ipd.Audio('../Notebooks/Audio Samples/blues.00016.wav')
```

```python
ipd.Audio('../Notebooks/Audio Samples/metal.00002.wav')
```

Oh my beloved Iron Maiden. Gorgeous.

```python
ipd.Audio('../Notebooks/Audio Samples/pop.00008.wav')
```

```python
ipd.Audio('../Notebooks/Audio Samples/jazz.00024.wav')
```
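`ipd.Audio` only plays a file. To work with the samples themselves, here is a dependency-free sketch using the standard-library `wave` and `array` modules, assuming mono 16-bit PCM input. The helper names (`write_test_tone`, `read_wav_int16`) are hypothetical, and the sketch writes its own synthetic file so it runs without the GTZAN data. In practice `librosa.load` is the more common choice; it returns floats and resamples to 22050 Hz by default.

```python
import array
import os
import sys
import tempfile
import wave


def write_test_tone(path, sample_rate=22050, seconds=1):
    """Write a short mono 16-bit WAV (a sawtooth-like ramp) for testing."""
    n = sample_rate * seconds
    samples = array.array('h', ((i % 200) * 100 - 10000 for i in range(n)))
    if sys.byteorder == 'big':
        samples.byteswap()  # WAV sample data is little-endian
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits
        wf.setframerate(sample_rate)
        wf.writeframes(samples.tobytes())


def read_wav_int16(path):
    """Return (sample_rate, samples) for a mono 16-bit PCM WAV file."""
    with wave.open(path, 'rb') as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError('expected mono 16-bit PCM')
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array.array('h')
    samples.frombytes(raw)
    if sys.byteorder == 'big':
        samples.byteswap()  # convert little-endian file data to native order
    return sample_rate, samples


path = os.path.join(tempfile.mkdtemp(), 'tone.wav')
write_test_tone(path)
sr, x = read_wav_int16(path)
print(sr, len(x))  # 22050 22050
```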

## Metadata

The metadata .csv file is loaded to see what it contains.

```python
import pandas as pd

metadata = pd.read_csv('/home/diego/ML/Audio Genre Classification/metadata.csv')
metadata.head()
```

|   | filename        | label |
|---|-----------------|-------|
| 0 | blues.00000.wav | blues |
| 1 | blues.00001.wav | blues |
| 2 | blues.00002.wav | blues |
| 3 | blues.00003.wav | blues |
| 4 | blues.00004.wav | blues |


Only the file name and the label are included; the extracted features will be appended to this .csv file later. First, we count the samples per class over the whole dataset:

```python
samples_per_class = metadata.label.value_counts()

print(samples_per_class)
```

```
rock         100
disco        100
metal        100
country      100
hiphop       100
jazz         100
reggae       100
blues        100
pop          100
classical    100
Name: label, dtype: int64
```


Indeed, every class has exactly 100 samples, so the dataset is balanced.
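Since each file name starts with its genre (e.g. `blues.00016.wav`), the labels can also be cross-checked against the file names. The sketch below uses hypothetical `(filename, label)` rows that mimic `metadata.csv` and only the standard library; the helper `label_from_filename` is illustrative, not part of the original notebook.

```python
from collections import Counter

GENRES = ['blues', 'classical', 'country', 'disco', 'hiphop',
          'jazz', 'metal', 'pop', 'reggae', 'rock']

# Hypothetical rows mimicking metadata.csv: 100 files per genre
# named "<genre>.<5-digit index>.wav".
rows = [(f'{g}.{i:05d}.wav', g) for g in GENRES for i in range(100)]


def label_from_filename(filename):
    """Genre prefix of a GTZAN file name, e.g. 'blues.00016.wav' -> 'blues'."""
    return filename.split('.', 1)[0]


# Every label should agree with its file-name prefix.
mismatches = [(f, lbl) for f, lbl in rows if label_from_filename(f) != lbl]

# Samples per class: the same information as metadata.label.value_counts().
counts = Counter(lbl for _, lbl in rows)
is_balanced = len(set(counts.values())) == 1

print(len(mismatches), counts['blues'], is_balanced)  # 0 100 True
```

With the real `metadata` DataFrame, the prefix check can be written as `(metadata.filename.str.split('.').str[0] == metadata.label).all()`.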

## Verifying audio parameters: Bit Depth, Channels and Sample Rate

Here we verify the number of channels, the sample rate and the bit depth of each audio file.

```python
# Iterate over every audio file to obtain its number of channels, sample rate
# and bit depth, then store the results in a pandas DataFrame.
import glob
import os
import librosa as lb
from wav_format_processor import WavFormatProcessor

wavformatprocessor = WavFormatProcessor()

dataset_dir = '/media/diego/4A64372E64371BDF/Downloads/Datasets/GTZAN/genres/'
audiodata = []

for index, row in metadata.iterrows():
    # Files are stored as <dataset_dir>/<label>/<filename>
    file_name = os.path.join(dataset_dir, str(row["label"]), str(row["filename"]))
    data = wavformatprocessor.wav_file_properties(file_name)
    audiodata.append(data)  # one (num_channels, sample_rate, bit_depth) entry per file

# Convert the collected properties to a pandas DataFrame
audio_dataframe = pd.DataFrame(audiodata, columns=['num_channels', 'sample_rate', 'bit_depth'])
audio_dataframe
```

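|   | num_channels | sample_rate | bit_depth |
|---|--------------|-------------|-----------|
| 0 | 1 | 22050 | 16 |
| 1 | 1 | 22050 | 16 |
| 2 | 1 | 22050 | 16 |
| 3 | 1 | 22050 | 16 |
| 4 | 1 | 22050 | 16 |
| 5 | 1 | 22050 | 16 |
| 6 | 1 | 22050 | 16 |
| 7 | 1 | 22050 | 16 |
| 8 | 1 | 22050 | 16 |
| 9 | 1 | 22050 | 16 |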
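The `WavFormatProcessor` helper comes from a local module, `wav_format_processor`, that is not shown here. As a hypothetical stand-in (not the author's implementation), the same three properties can be read from the WAV header with the standard-library `wave` module, which handles uncompressed PCM files. The sketch writes one second of silence to a temporary file so it runs on its own.

```python
import os
import tempfile
import wave


def wav_file_properties(file_name):
    """Hypothetical stand-in for WavFormatProcessor.wav_file_properties.

    Returns (num_channels, sample_rate, bit_depth) read from the WAV header.
    """
    with wave.open(file_name, 'rb') as wf:
        num_channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        bit_depth = wf.getsampwidth() * 8  # sample width is reported in bytes
    return num_channels, sample_rate, bit_depth


# Build a small mono, 22050 Hz, 16-bit file to try the function on.
path = os.path.join(tempfile.mkdtemp(), 'example.wav')
with wave.open(path, 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(22050)
    wf.writeframes(b'\x00\x00' * 22050)  # one second of silence

props = wav_file_properties(path)
print(props)  # (1, 22050, 16)
```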


## Audio Channels

Every file should have exactly 1 channel (mono).

```python
n_canales = audio_dataframe.num_channels.value_counts()
print(n_canales)
```

```
1    1000
Name: num_channels, dtype: int64
```


## Sampling Frequency

The most common sampling frequencies are 44.1 kHz and 48 kHz. However, the author states that every file is sampled at 22050 Hz, so let's verify that.

```python
f_sampleo = audio_dataframe.sample_rate.value_counts()
print(f_sampleo)
```

```
22050    1000
Name: sample_rate, dtype: int64
```
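The sample rate also links the number of frames in a file to its duration (duration = frames / sample rate), so a 30 s clip at 22050 Hz should contain about 661500 frames. A short clip would therefore show up as a smaller frame count. The helper `wav_duration_seconds` below is a hypothetical sketch using the standard-library `wave` module; it builds a 3 s test file so it runs without the dataset.

```python
import os
import tempfile
import wave


def wav_duration_seconds(file_name):
    """Duration = number of frames / frames per second.

    One frame holds one sample per channel, so this works for mono and stereo.
    """
    with wave.open(file_name, 'rb') as wf:
        return wf.getnframes() / wf.getframerate()


# Build a 3-second, mono, 22050 Hz, 16-bit file of silence to test with.
path = os.path.join(tempfile.mkdtemp(), 'three_seconds.wav')
with wave.open(path, 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(22050)
    wf.writeframes(b'\x00\x00' * (22050 * 3))

duration = wav_duration_seconds(path)
print(duration)  # 3.0
```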


## Bit Depth

Common bit depths are 16 and 32 bits, and 24-bit files also appear in some databases. Let's verify that every file is 16-bit, as the author states.

```python
prof_bits = audio_dataframe.bit_depth.value_counts()
print(prof_bits)
```

```
16    1000
Name: bit_depth, dtype: int64
```
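Bit depth sets the amplitude resolution: a signed 16-bit sample takes one of 2^16 = 65536 integer values, giving a theoretical dynamic range of about 96 dB. The sketch below computes these figures and shows a common convention for scaling 16-bit integers to floats in [-1.0, 1.0) by dividing by 32768; the function name `int16_to_float` is illustrative.

```python
import math

BIT_DEPTH = 16
min_val = -(2 ** (BIT_DEPTH - 1))   # -32768
max_val = 2 ** (BIT_DEPTH - 1) - 1  # 32767
levels = 2 ** BIT_DEPTH             # 65536 distinct amplitude values

# Theoretical dynamic range of linear PCM: 20 * log10(2^bits) dB.
dynamic_range_db = round(20 * math.log10(levels), 1)


def int16_to_float(samples):
    """Scale signed 16-bit integers to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


print(min_val, max_val, levels)            # -32768 32767 65536
print(dynamic_range_db)                    # 96.3
print(int16_to_float([-32768, 0, 16384]))  # [-1.0, 0.0, 0.5]
```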


Even though the author states these properties, it is important to double-check them, so that there are no surprises in the next steps of the process.
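The three separate counts above can also be folded into a single check that lists any file whose properties differ from the expected `(num_channels, sample_rate, bit_depth)`. This is a minimal sketch on hypothetical rows (one deliberately wrong entry is included to show the output); the function name `find_nonconforming` is illustrative, and the tuple order matches the columns of `audio_dataframe`.

```python
# (num_channels, sample_rate, bit_depth) as stated by the dataset author
EXPECTED = (1, 22050, 16)


def find_nonconforming(rows, expected=EXPECTED):
    """Return (index, properties) pairs whose properties differ from `expected`."""
    return [(i, tuple(props)) for i, props in enumerate(rows)
            if tuple(props) != expected]


# Hypothetical property rows: 999 conforming files and one stereo 44.1 kHz file.
rows = [(1, 22050, 16)] * 999 + [(2, 44100, 16)]
bad = find_nonconforming(rows)
print(len(bad), bad)  # 1 [(999, (2, 44100, 16))]
```

On the real data, passing `audio_dataframe.itertuples(index=False)` as `rows` should return an empty list, in line with the counts above.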