In [1]:
%load_ext autoreload
%autoreload 2

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pickle


import torch
from torch.utils.data import DataLoader, Dataset


from Preprocessing import *
from ExtractGenre import *
from CNN_ExtractGenre import *

import DatasetLoader as DL
import Model as M

# Cleaning data

Apply the functions in Preprocessing.py to clean the MIDI dataset, which contains several corrupted or duplicated files.
For this analysis we also keep only MIDI files with a 4/4 time signature, as in the reference paper. This filtering is done in CleaningData().

In [3]:
DeleteDuplicates()
CleaningData()

Deleting Duplicates:: 100%|██████████| 2201/2201 [00:00<00:00, 2227.41it/s]
100%|██████████| 2201/2201 [06:27<00:00,  5.68it/s]
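
One simple way to detect duplicated files is to hash each file's bytes and flag every file whose digest has already been seen. The sketch below assumes that criterion; `find_duplicates` is a hypothetical helper written for illustration and is not the `DeleteDuplicates()` implementation in Preprocessing.py, which may use a different rule.

```python
import hashlib
from pathlib import Path


def find_duplicates(root):
    """Return paths whose byte content matches an earlier file under root.

    Hypothetical helper for illustration; not the notebook's DeleteDuplicates().
    """
    seen = {}        # digest -> first path found with that content
    duplicates = []
    for path in sorted(Path(root).rglob("*.mid")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return duplicates
```

Hashing only catches byte-identical copies; two different encodings of the same song would need a content-based comparison instead.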


# Preprocessing data

First, we rebuild the database by converting every polyphonic track into a monophonic one, while keeping the track (instrument) information of each MIDI file. Whenever several notes sound together, only the highest pitch is kept.

In [13]:
RecreateDatabase()

Recreating Database : 100%|██████████| 2079/2079 [14:22<00:00,  2.41it/s] 
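
The highest-pitch rule can be sketched as follows. This is a minimal illustration that assumes notes are `(start, end, pitch)` tuples and that "polyphonic" means notes sharing the same onset; `to_monophonic` is a hypothetical name, and `RecreateDatabase()` may handle overlapping notes differently.

```python
from collections import defaultdict


def to_monophonic(notes):
    """Keep only the highest-pitched note among notes that start together.

    `notes` is a list of (start, end, pitch) tuples; this layout is an
    assumption for illustration, not the format used by RecreateDatabase().
    """
    by_onset = defaultdict(list)
    for note in notes:
        by_onset[note[0]].append(note)
    # For every onset, keep the note with the largest MIDI pitch.
    return [max(group, key=lambda n: n[2])
            for _, group in sorted(by_onset.items())]
```

For example, a C major chord followed by a single D reduces to the chord's top note and the D.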


The input of the model has to be a 128x16 matrix, as in the paper. The following function classifies the MIDI tracks into 7 instrument classes:
- String
- Keyboard
- Aerophone
- Percussion
- Voice
- Synth
- Others
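
A possible way to assign a track to one of these classes is to look at its General MIDI program number. The ranges below are an illustrative guess based on the General MIDI instrument families (0-based program numbers); `instrument_class` is a hypothetical function and the mapping actually used in Preprocessing.py may differ.

```python
def instrument_class(program, is_drum=False):
    """Map a 0-based General MIDI program number to one of seven classes.

    The ranges are an illustrative assumption, not the notebook's mapping.
    """
    if is_drum:
        return "Percussion"
    if not 0 <= program <= 127:
        raise ValueError("program must be in 0..127")
    if program in (52, 53, 54):                      # choir aahs, voice oohs, synth voice
        return "Voice"
    if program < 8 or 16 <= program < 24:            # pianos, organs
        return "Keyboard"
    if 8 <= program < 16 or 112 <= program < 120:    # chromatic percussion, percussive
        return "Percussion"
    if 24 <= program < 56:                           # guitars, basses, strings, ensembles
        return "String"
    if 56 <= program < 80:                           # brass, reeds, pipes
        return "Aerophone"
    if 80 <= program < 104:                          # synth leads, pads, effects
        return "Synth"
    return "Others"                                  # ethnic instruments, sound effects
```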

In [3]:
Dataset = PreProcessing(nDir = 1000)

Preprocessing: 100%|██████████| 1000/1000 [01:20<00:00, 12.35it/s]
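
To make the 128x16 input concrete, the sketch below builds a binary piano roll for one bar. It assumes that the 128 rows are MIDI pitches and the 16 columns are sixteenth-note steps of a 4/4 bar; the `(pitch, start_step, end_step)` input layout and the name `bar_to_matrix` are hypothetical and are not taken from `PreProcessing()`.

```python
N_PITCHES = 128   # MIDI pitch range 0..127
N_STEPS = 16      # sixteenth-note steps in one 4/4 bar (assumed)


def bar_to_matrix(notes):
    """Build a 128x16 binary piano roll for one bar.

    `notes` holds (pitch, start_step, end_step) tuples, end exclusive.
    This input layout is assumed for illustration.
    """
    matrix = [[0] * N_STEPS for _ in range(N_PITCHES)]
    for pitch, start, end in notes:
        # Clip the note to the bar and mark every step it covers.
        for step in range(max(start, 0), min(end, N_STEPS)):
            matrix[pitch][step] = 1
    return matrix
```

With monophonic input, each column contains at most one active pitch.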


In [5]:
# with open('Dataset.pkl', 'wb') as f:
#    pickle.dump(Dataset, f)

with open('Dataset.pkl', 'rb') as f:
   Dataset = pickle.load(f)

for key, value in Dataset.items():
   print(key, '', len(value['Tempo']))

Others  48970
String  37930
Keyboard  15210
Voice  9940
Percussion  22340
Aerophone  8440
Sync  3050


# Genre detection using CNN

The idea is to train a Convolutional Neural Network (CNN) to understand the structure of the songs and to implement a classifier capable of identifying the genre of each song in our MIDI dataset.

However, we cannot train the CNN directly on our MIDI dataset, since this would compromise both learning and classification. Moreover, a CNN classifier is trained with supervision, and our dataset does not include genre labels. For this reason, we use another dataset containing 100 songs in .wav format for each of the following musical genres:

- metal
- disco
- classical
- hiphop
- jazz
- country
- pop
- blues
- reggae
- rock

The idea is to train the CNN using this labeled dataset. Before doing that, we need to perform some preprocessing, since some songs in the dataset are corrupted. Additionally, the audio clips are only a few seconds long, so we preprocess each song to have a fixed length and a consistent format. The preprocessing functions are implemented in the file **CNN_ExtractGenre.py**.
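
Bringing every clip to a fixed length usually means cropping long clips and padding short ones. The sketch below shows that idea on a plain list of samples; `fix_length` is a hypothetical helper and is not the function defined in CNN_ExtractGenre.py, which may crop or pad differently.

```python
def fix_length(samples, target_len, pad_value=0.0):
    """Crop or right-pad a 1-D sequence of audio samples to target_len."""
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    if len(samples) >= target_len:
        return list(samples[:target_len])
    # Pad the end with silence so that every clip has the same length.
    return list(samples) + [pad_value] * (target_len - len(samples))
```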

After preprocessing, we define the CNN model and the data loader in **Model.py** and **DatasetLoader.py**, respectively. The model is trained on Google Colab (not on the local machine), and we later load the trained model using its state_dict.

The CNN achieves a strong validation accuracy of 84%, as shown in the accompanying paper.

Once the model is trained on the labeled dataset, we use it to classify our own songs. This takes several steps because our songs are in .mid format, while the model expects spectrograms computed from .wav audio. Therefore, each MIDI file must be converted into audio, transformed into a spectrogram, and then classified by the CNN.

After classifying each song, we save a file containing the song’s name and its predicted genre. From there, we proceed as before: we separate our dataset by genre, and within each genre, we further separate the songs by instrument.
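
The genre-then-instrument split can be sketched as a nested dictionary indexed as `grouped[genre_id][instrument]`. The input layouts (`predicted_genre`, `song_bars`) and the name `group_by_genre` are assumptions made for illustration; the actual code in CNN_ExtractGenre.py may store the data differently.

```python
from collections import defaultdict

GENRES = ['metal', 'disco', 'classical', 'hiphop', 'jazz',
          'country', 'pop', 'blues', 'reggae', 'rock']
GENRE_TO_ID = {name: i for i, name in enumerate(GENRES)}


def group_by_genre(predicted_genre, song_bars):
    """Nest bars as grouped[genre_id][instrument] -> list of bars.

    predicted_genre: song name -> genre name (the saved classifier output).
    song_bars: song name -> {instrument: [bar, ...]}.
    Both layouts are assumptions for illustration.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for song, genre in predicted_genre.items():
        for instrument, bars in song_bars.get(song, {}).items():
            grouped[GENRE_TO_ID[genre]][instrument].extend(bars)
    return grouped
```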


N.B. All the functions in the file **CNN_ExtractGenre.py** have already been run, since the computation is quite long. The following cell shows the final result.

In [None]:
#Load the preprocessed and classified dataset:
with open('CNN_GenreDataset.pkl', 'rb') as f:
   CNN_GenreDataset = pickle.load(f)

#Mapping each genre into a number for classification
GenreMapping = {'metal': 0, 'disco': 1, 'classical': 2, 'hiphop': 3, 'jazz': 4,
          'country': 5, 'pop': 6, 'blues': 7, 'reggae': 8, 'rock': 9}


#Visualizing the shape of the dataset bars for rock songs, with string instrument 
np.shape(CNN_GenreDataset[9]['String']['Bars'])

(6495, 128, 16)

We now have three datasets available to train the MidiNet model:
1. Instrument-Only Dataset
   This is the simplest version, where each song is categorized solely based on the instrument(s) played.
2. Clustering-Based Genre Dataset
   In this version, songs are first grouped by genre using clustering techniques, based on simple extracted features. Within each genre cluster, the songs are further categorized by instrument.
3. CNN-Based Genre Dataset
   This version uses a Convolutional Neural Network (CNN) to classify each original song in the dataset by genre. Once a genre label is assigned, each song is then placed into its corresponding genre category and further divided by instrument.

# Model and Architecture

The class DatasetTransform lets us choose which instrument's bars to load. We can load from the simple dataset (1) or from (2) or (3), specifying the genre and the instrument.

In [6]:
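# Intended selection: dataset (3), rock songs played with string instruments.
# As written, Genre=False, CNN=None and Cluster=None pass no genre, so check these arguments against DatasetTransform.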
Data = DL.DatasetTransform(Genre = False, CNN = None, Cluster = None, Instrument='String')
trainData = DataLoader(Data, batch_size=10, shuffle=True, num_workers=0)