In [19]:
import json
import os
import math
import librosa


# MFCC Extractor

In [20]:
dataset_path = "data/archive/Data/genres_original"
json_path = "data/gtzan_mfcc_json.json"
sr = 22050
duration = 30   # seconds
total_samples = sr * duration
num_mfcc = 13
n_fft = 2048
hop_length = 512
segments_per_track = 10

We create a dictionary to store the genre mapping, the labels, and the MFCCs of all the songs.
* `mapping` holds the names of the 10 genres.
* `labels` holds one genre label for every stored segment.
 0 corresponds to blues, 1 to classical, 2 to country, and so on.
 Since each genre has 100 songs and each song is split into 10 segments, there will be about 1000 zeros, 1000 ones, 1000 twos, and so on.
* `mfcc` holds the MFCC vectors (13 coefficients each) of every stored segment.
Every song consists of `22050 * 30 = 661500` samples in total,
which are divided into 10 segments, so each segment has 66150 samples.
The number of MFCC vectors in each segment is determined by `hop_length` (`=512`):
`ceil(66150 / 512) = 130` MFCC vectors per segment.
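
The arithmetic above can be reproduced in a few lines. This is a minimal sketch that uses the same values as the configuration cell; the helper name `frames_per_segment` is hypothetical and is not part of the notebook.

```python
import math


def frames_per_segment(sr=22050, duration=30, segments=10, hop_length=512):
    """Return (total samples, samples per segment, MFCC vectors per segment)."""
    total_samples = sr * duration
    samples_per_segment = total_samples // segments
    # One MFCC vector is produced per hop, so round up to cover the last partial hop.
    vectors = math.ceil(samples_per_segment / hop_length)
    return total_samples, samples_per_segment, vectors


print(frames_per_segment())  # (661500, 66150, 130)
```

Changing `segments` or `hop_length` here shows how the shape of each training example would change before running the full extraction.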

In [21]:
# dictionary to store mapping, labels, and MFCCs
data = {
    "mapping": [],
    "labels": [],
    "mfcc": []
}
print("No. of segments: ", segments_per_track)

samples_per_segment = int(total_samples / segments_per_track)
print("No. of samples per segment: ", samples_per_segment)

num_mfcc_vectors_per_segment = math.ceil(samples_per_segment / hop_length)
print("No. of MFCCs per segment: ", num_mfcc_vectors_per_segment)

No. of segments:  10
No. of samples per segment:  66150
No. of MFCCs per segment:  130


In [22]:
# loop through all genre sub-folder
for i, (dirpath, dirnames, filenames) in enumerate(os.walk(dataset_path)):

    # ensure we're processing a genre sub-folder level
    if dirpath != dataset_path:

        # save genre label (i.e., sub-folder name) in the mapping
        # os.path.basename handles the path separator of the current platform
        semantic_label = os.path.basename(dirpath)
        # print(dirpath)
        # print(semantic_label)
        data["mapping"].append(semantic_label)
        print("Processing:", semantic_label)

        # process all audio files in genre sub-dir
        for f in filenames:

            # load audio file
            file_path = os.path.join(dirpath, f)
            signal, sample_rate = librosa.load(file_path, sr=sr)

            # process all segments of audio file
            for d in range(segments_per_track):

                # calculate start and finish sample for current segment
                start = samples_per_segment * d
                finish = start + samples_per_segment

                # extract mfcc
                mfcc = librosa.feature.mfcc(y=signal[start:finish], sr=sample_rate, n_mfcc=num_mfcc,
                                            n_fft=n_fft, hop_length=hop_length)
                mfcc = mfcc.T
                # store only mfcc feature with expected number of vectors
                if len(mfcc) == num_mfcc_vectors_per_segment:
                    data["mfcc"].append(mfcc.tolist())
                    data["labels"].append(i - 1)

print("\nMFCCs extracted. Saving to JSON file...")
# save MFCCs to json file
with open(json_path, "w") as fp:
    json.dump(data, fp, indent=4)
print("Done")

Processing: blues
Processing: classical
Processing: country
Processing: disco
Processing: hiphop
Processing: jazz
Processing: metal
Processing: pop
Processing: reggae
Processing: rock

MFCCs extracted. Saving to JSON file...
Done


* There are 1000 - 1 = 999 songs in total (one song was removed because its file was corrupted),
so ideally there should be 9990 segments, which would serve
as the input to the training part.
The dimensions would be (9990, 130, 13).
* These dimensions assume that every song is __exactly__ 30 seconds long.
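
One way to compare the expected count with what was actually stored is to tally the segments per genre. The sketch below assumes a dictionary with the same `mapping` and `labels` keys as `data`; the helper `segments_per_genre` and the toy input are hypothetical and only illustrate the idea.

```python
from collections import Counter


def segments_per_genre(data):
    """Count stored segments for each genre name in data["mapping"]."""
    counts = Counter(data["labels"])
    return {genre: counts.get(idx, 0) for idx, genre in enumerate(data["mapping"])}


# Toy input with the same structure as `data` (not real GTZAN values).
toy = {"mapping": ["blues", "classical"], "labels": [0, 0, 0, 1, 1]}
print(segments_per_genre(toy))  # {'blues': 3, 'classical': 2}
```

Called on the real `data`, a genre whose count is not a multiple of 10 would contain at least one song that lost a segment.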

In [23]:
print("Labels:", len(data["labels"]))
print("MFCCs:", len(data["mfcc"]))

Labels: 9986
MFCCs: 9986


* We see that there are slightly fewer segments than expected:
9986 instead of 9990, so 4 segments are missing. A likely reason is that not
every song is exactly 30 seconds long. A track that is a few milliseconds short
leaves its last segment with fewer than 130 MFCC vectors, and the extraction loop skips such segments.
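
The effect of a slightly short track can be sketched without loading any audio. The code below assumes librosa's default centered framing, which yields `1 + floor(n / hop_length)` vectors for `n` samples; the helper names are hypothetical, and the track lengths are made-up examples rather than measurements from the dataset.

```python
def expected_frames(n_samples, hop_length=512):
    """Frame count under centered framing (assumed librosa default)."""
    return 1 + n_samples // hop_length


def kept_segments(track_samples, samples_per_segment=66150, segments=10,
                  hop_length=512, expected=130):
    """Count the segments that would pass the length check in the extraction loop."""
    kept = 0
    for d in range(segments):
        start = samples_per_segment * d
        # Slicing past the end of the signal silently truncates the segment.
        finish = min(start + samples_per_segment, track_samples)
        n = max(finish - start, 0)
        if n > 0 and expected_frames(n, hop_length) == expected:
            kept += 1
    return kept


print(kept_segments(661500))  # 10: a track of exactly 30 s keeps every segment
print(kept_segments(661000))  # 9: 500 samples short, so the last segment is dropped
```

Under this assumption, a track only needs to be about 100 samples (roughly 5 ms at 22050 Hz) short for its final segment to fall below 130 vectors, which is consistent with a handful of missing segments.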