In [2]:
%load_ext autoreload
%autoreload 2

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import gc  # needed for the gc.collect() calls below
import pickle


import torch
from torch.utils.data import DataLoader, Dataset


from Preprocessing import *
from CNN_ExtractGenre import *
from PolyphonicPreprocessing import *

import DatasetLoader as DL
import Model as M

# Cleaning data

Apply the functions in Preprocessing.py to clean the MIDI dataset, which contains several corrupted or duplicated files.
For this analysis we also keep only MIDI files with a 4/4 time signature, as in the reference paper. This filtering is done in CleaningData().
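
The time-signature filter can be pictured with the sketch below. It is only an illustration, not the code in Preprocessing.py: the helper `is_four_four`, the file names, and the representation of time-signature events as `(numerator, denominator)` pairs are all assumptions. A file with no time-signature event is treated as 4/4, which is the default in the Standard MIDI File format.

```python
def is_four_four(time_signatures):
    """Return True if every time-signature event in a file is 4/4.

    `time_signatures` is a hypothetical list of (numerator, denominator)
    pairs read from a MIDI file. An empty list means the file never sets
    a time signature, in which case MIDI defaults to 4/4.
    """
    return all(sig == (4, 4) for sig in time_signatures)


# Hypothetical files and their time-signature events
files = {
    "song_a.mid": [(4, 4)],
    "song_b.mid": [(3, 4)],
    "song_c.mid": [(4, 4), (6, 8)],  # changes meter midway: rejected
    "song_d.mid": [],                # no event: implicit 4/4
}

kept = [name for name, sigs in files.items() if is_four_four(sigs)]
print(kept)  # ['song_a.mid', 'song_d.mid']
```

Rejecting files that change meter midway keeps every bar the same length, which the fixed 16-step bar representation used later relies on.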

In [3]:
DeleteDuplicates()
CleaningData()

Deleting Duplicates:: 100%|██████████| 2201/2201 [00:00<00:00, 2227.41it/s]
100%|██████████| 2201/2201 [06:27<00:00,  5.68it/s]


The input of the model has to be a 128x16 matrix (128 MIDI pitches by 16 time steps per bar), as in the paper. The following function classifies the MIDI tracks into instrument classes such as:
- Guitar  
- Percussion
- Organ  
- Sound Effects 
- Bass  
- Piano 
- Synth Lead 
- Chromatic Percussion 
- Synth Pad  
- Percussive 
- Synth Effects
- Ethnic  
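
One way to obtain such classes is the General MIDI convention, which groups the 128 program numbers into 16 families of 8 consecutive programs. The sketch below assumes 0-indexed program numbers and is not necessarily how Preprocessing.py does it; `program_to_family` is a hypothetical helper. General MIDI drum kits are identified by channel rather than by program, so a Percussion class would need a separate check that is not shown here.

```python
# The 16 General MIDI instrument families, in program-number order
GM_FAMILIES = [
    "Piano", "Chromatic Percussion", "Organ", "Guitar",
    "Bass", "Strings", "Ensemble", "Brass",
    "Reed", "Pipe", "Synth Lead", "Synth Pad",
    "Synth Effects", "Ethnic", "Percussive", "Sound Effects",
]


def program_to_family(program):
    """Map a 0-indexed General MIDI program number (0-127) to its family."""
    if not 0 <= program <= 127:
        raise ValueError(f"program must be in [0, 127], got {program}")
    # Each family spans 8 consecutive program numbers
    return GM_FAMILIES[program // 8]


print(program_to_family(35))  # Bass
print(program_to_family(24))  # Guitar
```

Under this convention a track with `'Program': 35` falls in the Bass family.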

In [3]:
MonophonicDataset = PreProcessing(nDir = 2000)
torch.save(MonophonicDataset, 'MonophonicDataset.pt')

Preprocessing: 100%|██████████| 2000/2000 [07:57<00:00,  4.19it/s]


In [10]:
MonophonicDataset = torch.load('MonophonicDataset.pt', map_location='cpu')

In [11]:
for key in MonophonicDataset.keys():
   print(f"{key.ljust(25)} number of Bars: {len(MonophonicDataset[key])}")

Bass                      number of Bars: 106314
Ensemble                  number of Bars: 53067
Organ                     number of Bars: 37496
Guitar                    number of Bars: 176320
Piano                     number of Bars: 48277
Synth Lead                number of Bars: 18727
Percussive                number of Bars: 4689
Synth Effects             number of Bars: 6506
Reed                      number of Bars: 28849
Brass                     number of Bars: 31106
Synth Pad                 number of Bars: 13475
Pipe                      number of Bars: 16226
Strings                   number of Bars: 27713
Sound Effects             number of Bars: 7644
Chromatic Percussion      number of Bars: 25030
Percussion                number of Bars: 3890
Ethnic                    number of Bars: 2970


In [12]:
# First element of the Bass class. Each entry stores all the information we need.
MonophonicDataset['Bass'][0]

{'SongName': ("It's a Real Good Feeling", "It's a Real Good Feeling"),
 'Bars': (tensor(indices=tensor([[29, 29, 29, 29, 29, 29, 36, 36, 36, 36, 36, 36, 36, 36,
                          38, 38, 38, 38],
                         [ 0,  1,  2,  3,  4,  5,  5,  6,  7, 11, 12, 13, 14, 15,
                           8,  9, 10, 11]]),
         values=tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
         size=(128, 16), nnz=18, dtype=torch.int32, layout=torch.sparse_coo),
  tensor(indices=tensor([[34, 34, 34, 34, 34, 34, 41, 41, 41, 41, 41, 41, 41, 43,
                          43, 43, 43],
                         [ 0,  1,  2,  3,  4,  5,  6,  7,  8, 12, 13, 14, 15,  8,
                           9, 10, 11]]),
         values=tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
         size=(128, 16), nnz=17, dtype=torch.int32, layout=torch.sparse_coo)),
 'Program': 35,
 'Channel': (1, 1),
 'numBar': (2, 3),
 'Tempo': (123, 123)}
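
Each entry of `Bars` is stored as a sparse COO tensor: only the coordinates of the active cells are kept. The NumPy sketch below rebuilds the dense 128x16 piano roll of the first bar printed above from its indices. Reading the first index row as the pitch and the second as the time step is an interpretation of the output, not something taken from Preprocessing.py. On the actual tensors, calling `.to_dense()` yields the equivalent dense tensor.

```python
import numpy as np

# Indices copied from the first sparse bar printed above
pitches = [29] * 6 + [36] * 8 + [38] * 4
steps = [0, 1, 2, 3, 4, 5,
         5, 6, 7, 11, 12, 13, 14, 15,
         8, 9, 10, 11]

# Dense piano roll: rows are MIDI pitches, columns are time steps
bar = np.zeros((128, 16), dtype=np.int32)
bar[pitches, steps] = 1

print(bar.shape, int(bar.sum()))  # (128, 16) 18

# For each sounding pitch, list the time steps where it is active
active = {
    int(p): np.flatnonzero(bar[p]).tolist()
    for p in np.flatnonzero(bar.any(axis=1))
}
print(active)
```

Only 18 of the 2048 cells are non-zero, which is why the sparse layout saves a lot of space over hundreds of thousands of bars.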

In [None]:
# The dataset is reloaded later, so free the memory now
del MonophonicDataset
gc.collect()

# Genre detection using CNN

The idea is to train a Convolutional Neural Network (CNN) to understand the structure of the songs and to implement a classifier capable of identifying the genre of each song in our MIDI dataset.

However, we cannot train the CNN directly on our MIDI dataset, since this would compromise both learning and classification. Moreover, the CNN is trained as a supervised classifier, and our dataset does not include genre labels. For this reason, we use another dataset containing 100 songs in .wav format for each of the following musical genres:

- metal
- disco
- classical
- hiphop
- jazz
- country
- pop
- blues 
- reggae
- rock

The idea is to train the CNN using this labeled dataset. Before doing that, we need to perform some preprocessing, since some songs in the dataset are corrupted. Additionally, the audio clips are only a few seconds long, so we preprocess each song to have a fixed length and a consistent format. The preprocessing functions are implemented in the file **CNN_ExtractGenre.py**.

After preprocessing, we define the CNN model and the data loader in **Model.py** and **DatasetLoader.py**, respectively. The model is trained on Google Colab (not on the local machine), and we later load the trained model from its state_dict.

The CNN achieves a strong validation accuracy of 84%, as shown in the accompanying paper.

Once the model is trained on the labeled dataset, we use it to classify our own songs. This is a more involved process because our songs are in .mid format, while the model expects spectrograms computed from .wav audio. Therefore, each MIDI file must be converted into audio, transformed into a spectrogram, and then classified by the CNN.
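
The audio-to-spectrogram step can be sketched with NumPy alone. This is an illustration under assumptions, not the code in CNN_ExtractGenre.py: the function `log_spectrogram`, the frame size `n_fft`, the hop length `hop`, and the choice of a log-magnitude spectrogram are all hypothetical, and the real pipeline may use a different representation. Rendering MIDI to audio needs an external synthesizer, so a synthetic sine wave stands in for the rendered song.

```python
import numpy as np


def log_spectrogram(wave, n_fft=1024, hop=512):
    """Log-magnitude spectrogram of a mono waveform, shape (freq, time)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames
    frames = np.stack([
        wave[i * hop: i * hop + n_fft] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range and is safe at zero
    return np.log1p(magnitude).T


# Stand-in for rendered audio: one second of a 440 Hz sine at 22050 Hz
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(wave)
print(spec.shape)  # (513, 42)

# The strongest frequency bin should sit near 440 Hz
peak_bin = int(spec.mean(axis=1).argmax())
print(peak_bin * sample_rate / 1024)
```

The resulting 2D array can be treated like a single-channel image, which is what makes a CNN a natural fit for the classification step.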

After classifying each song, we save a file containing the song’s name and its predicted genre. From there, we proceed as before: we separate our dataset by genre, and within each genre, we further separate the songs by instrument.
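
Splitting the songs by predicted genre can be sketched as follows. The helper `group_by_genre` and its `min_prob` threshold are hypothetical and not part of the project code. The sketch assumes the saved predictions map each song name to a `(genre id, probability)` pair, with genre ids following the order of the list above.

```python
from collections import defaultdict

GENRES = ["metal", "disco", "classical", "hiphop", "jazz",
          "country", "pop", "blues", "reggae", "rock"]

# Small excerpt in the assumed {song name: (genre id, probability)} layout
predictions = {
    "Gordon Lightfoot/Sundown": (9, 0.50),
    "Gordon Lightfoot/Beautiful": (4, 0.67),
    "Gounod Charles/Ave Maria.1": (2, 0.92),
    "Gounod Charles/Waltz from Faust": (2, 1.00),
}


def group_by_genre(predictions, min_prob=0.0):
    """Group song names by genre, skipping predictions below min_prob."""
    songs_by_genre = defaultdict(list)
    for name, (genre_id, prob) in predictions.items():
        if prob >= min_prob:
            songs_by_genre[GENRES[genre_id]].append(name)
    return dict(songs_by_genre)


print(group_by_genre(predictions))
print(group_by_genre(predictions, min_prob=0.6))  # drops low-confidence songs
```

A confidence threshold is one possible way to keep uncertain predictions out of the per-genre training sets, at the cost of discarding some songs.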


N.B. All the functions in the file **CNN_ExtractGenre.py** have already been run, since the computation is quite long. The following cell shows the final result.

In [18]:
#Load the preprocessed and classified dataset:
with open('GenreDataset.pkl', 'rb') as f:
   GenreDataset = pickle.load(f)

#Mapping each genre into a number for classification
GenreMapping = {'metal': 0, 'disco': 1, 'classical': 2, 'hiphop': 3, 'jazz': 4,
          'country': 5, 'pop': 6, 'blues': 7, 'reggae': 8, 'rock': 9}

i = 0
for key in GenreDataset.keys():
   i += 1
   genre, prob = GenreDataset[key]
   print(f"{'Song name:':<10} {key.ljust(60)} {'Genre and probability:':<25} ({genre}, {prob:.2f})")

   if i > 10:
      break


Song name: Gordon Lightfoot/Sundown                                     Genre and probability:    (9, 0.50)
Song name: Gordon Lightfoot/Sundown.1                                   Genre and probability:    (9, 1.00)
Song name: Gordon Lightfoot/Rainy Day People                            Genre and probability:    (9, 0.50)
Song name: Gordon Lightfoot/Carefree Highway                            Genre and probability:    (9, 0.92)
Song name: Gordon Lightfoot/Beautiful                                   Genre and probability:    (4, 0.67)
Song name: Gounod Charles/Ave Maria.1                                   Genre and probability:    (2, 0.92)
Song name: Gounod Charles/Marche funebre d'une marionnette              Genre and probability:    (2, 0.42)
Song name: Gounod Charles/Waltz from Faust                              Genre and probability:    (2, 1.00)
Song name: Grace Jones/Slave to the Rhythm                              Genre and probability:    (6, 0.58)
Song name: Grand Funk Railro

# Polyphonic Music Generator

So far we have considered only monophonic tracks, which discards the relationships between notes of the same instrument and the correlations between instruments. We therefore try to implement a polyphonic music generator.

The strategy is the same as before, but instead of a (128x16) matrix we use a (4x128x16) tensor, where 4 is the maximum number of instruments that can play at the same time. Each 128x16 matrix is no longer restricted to a single note at a time as before, but allows multiple simultaneous notes of the same instrument.
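
The (4x128x16) layout can be illustrated with a short NumPy sketch. It is not the code in PolyphonicPreprocessing.py: the helper `stack_instruments` and the zero-padding of unused instrument slots are assumptions made for the example.

```python
import numpy as np

MAX_INSTRUMENTS, N_PITCHES, N_STEPS = 4, 128, 16


def stack_instruments(rolls):
    """Stack up to 4 piano rolls (128x16 each) into a (4, 128, 16) bar.

    Unused instrument slots are left as zeros (an assumption).
    """
    if len(rolls) > MAX_INSTRUMENTS:
        raise ValueError(f"at most {MAX_INSTRUMENTS} instruments per bar")
    bar = np.zeros((MAX_INSTRUMENTS, N_PITCHES, N_STEPS), dtype=np.int32)
    for slot, roll in enumerate(rolls):
        bar[slot] = roll
    return bar


# Instrument 0: a bass note (pitch 36) held for the first 4 steps
bass = np.zeros((N_PITCHES, N_STEPS), dtype=np.int32)
bass[36, 0:4] = 1

# Instrument 1: a C major chord (pitches 60, 64, 67) held for 8 steps
piano = np.zeros((N_PITCHES, N_STEPS), dtype=np.int32)
piano[[60, 64, 67], 0:8] = 1

bar = stack_instruments([bass, piano])
print(bar.shape)  # (4, 128, 16)

# Notes sounding per time step in the piano slot: 3 during the chord
print(bar[1].sum(axis=0).tolist())
```

The chord shows the difference from the monophonic case: one instrument slot holds three notes at the same time step.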

The file **PolyphonicPreprocessing.py** contains all the functions used to preprocess the clean_midi dataset and build the dataset of mapped songs, categorized by genre (using the genre recognition dataset built before). Here we process and store the polyphonic dataset.

In [19]:
PolyphonicDataset = PolyphonicPreProcessing(nDir = 2000)
torch.save(PolyphonicDataset, 'PolyphonicDataset.pt')

Preprocessing: 100%|██████████| 2000/2000 [13:25<00:00,  2.48it/s]


3900


These are all the genres and the number of bars for each genre:

In [20]:
PolyphonicDataset = torch.load('PolyphonicDataset.pt', weights_only=False)

In [21]:
PolyphonicDataset[0]

{'SongName': ('The Righteous Brothers/Unchained Melody.3',
  'The Righteous Brothers/Unchained Melody.3'),
 'Bars': (tensor(indices=tensor([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                           0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
                           1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
                           1,  1],
                         [31, 33, 33, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36,
                          36, 36, 35, 35, 35, 35, 35, 35, 35, 35, 37, 37, 37, 37,
                          42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
                          42, 42],
                         [ 6,  6,  7,  0,  1,  2,  3,  4,  5,  8,  9, 10, 11, 12,
                          13, 14,  0,  1,  6,  7,  8,  9, 14, 15,  4,  5, 12, 13,
                           0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13,
                          14, 15]]),
         values=tensor([1, 1, 1, 1, 1, 1

In [None]:
del PolyphonicDataset
gc.collect()

# Monophonic Model and Architecture

The class DatasetTransorm allows us to choose which instrument's bars to load.

In [None]:
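# Select the guitar bars from the monophonic dataset
Data = DL.MonophonicDataset(Instrument='Guitar')
TrainData = DataLoader(Data, batch_size=10, shuffle=True, num_workers=0)
# A DataLoader is an iterable of batches, so draw one batch before unpacking
# (assumes each sample is a (bar, previous bar, 1D condition) triple)
Bars, PreviousBars, Cond1D = next(iter(TrainData))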

In [None]:
PolData = DL.PolyphonicDataset(Genre = 'jazz')
PolTrainData = DataLoader(PolData, batch_size=30, shuffle=True, num_workers=0)