## Helpful Resources

### Data
- [Original Dataset](https://www.kaggle.com/datasets/blanderbuss/midi-classic-music/code)

### Mido
- [Mido Documentation](https://mido.readthedocs.io/en/stable/)
- [Example using mido](https://medium.com/analytics-vidhya/convert-midi-file-to-numpy-array-in-python-7d00531890c)

### MIDI
- [MIDI File Statistic Calculation](https://nbviewer.org/github/craffel/midi-ground-truth/blob/master/Statistics.ipynb)
- [Useful MIDI Functions](https://gist.github.com/devxpy/063968e0a2ef9b6db0bd6af8079dad2a#file-midi_numbers-py-L153-L159)

### AI Research
- [Composer Classification Paper](https://hal.science/hal-01879276/document)
- [Composer Classification Dissertation](https://dalspace.library.dal.ca/bitstream/handle/10222/82031/RohitNitinKher2022.pdf?sequence=3&isAllowed=y)

## Exploration

In [1]:
from os import listdir
from os.path import isfile, join

import pandas as pd
from mido import MidiFile

In [5]:
# Location of data files
PATH = "/Users/jayminpatel/MSAAI_git_workspace/AAI-511-final-project/data/project/"

In [6]:
# Composers we are to use
COMPOSERS = [f for f in listdir(PATH)]
COMPOSERS

['Mozart', 'Chopin', 'Beethoven', 'Bach']

In [7]:
# Using Beethoven for now (index 2 in the COMPOSERS listing above)
composer_path = join(PATH, COMPOSERS[2])
files = [f for f in listdir(composer_path) if isfile(join(composer_path, f))]
print(files)

# Import the first file for the selected composer
mid = MidiFile(join(composer_path, files[0]))

['Hess063 keyb Kaplied.mid', 'op077 Fantaisie.mid', 'Sonata for Piano & Cello n2 op05.MID', 'Violin Concerto op61 2-3movs.mid', "Lieder op48 n6 ''Busslied''.mid", 'Piano Sonata No2 Assai vivace.mid', 'String Quartet n2 op18 n2 3mov.mid', 'Bagatella Fur Elise.mid', 'Sonatina op33 4mov.mid', "Overture ''Inauguration of the House'' op214.mid", 'WoO060 Mouvement Pour Piano.mid', '32 Variations on a theme.mid', 'Piano Concerto n2 op19 3mov.mid', 'Piano Concerto n3 op37 2mov.mid', 'Rondo Opus.51, No.2.mid', 'Piano Romance  No.50.mid', 'String Quartet n2 op18 n2 1mov.mid', 'Anh08Nb1 Gavotte 4 hands.mid', 'Piano Sonata No.27,  3rd mov.mid', 'op126 Six Bagatellas.mid', 'Preludes 2 Through Major keys 39.mid', 'Sieben Bagatellen, in D Major, Opus.33, No.6.mid', 'Piano Concerto n2 op19 1mov.mid', 'Rondo Opus.51, No.1.mid', 'Rondo in B flat.mid', 'Hess057 Bagatella.mid', 'WoO048 Rondo.mid', 'Sonata Opus.81a -Les Adieux- E flat No.1.mid', 'String Quartet n1 op18 3mov.mid', "Lieder op48 n4 ''Die Ehre

In [8]:
# Get an idea for the messages in the tracks
for i, track in enumerate(mid.tracks):
    print('Track {}: {}'.format(i, track.name))
    for msg in track:
        print(msg)

Track 0: Kaplied di Chr. F.D. Schubart
MetaMessage('track_name', name='Kaplied di Chr. F.D. Schubart', time=0)
MetaMessage('track_name', name='Hess 63 (1788-1790)', time=0)
MetaMessage('copyright', text='Copyright © 2001 di FIORELLAEARMANDO@PANET.IT', time=0)
MetaMessage('text', text='FIORELLAEARMANDO@PANET.IT\n', time=0)
MetaMessage('time_signature', numerator=4, denominator=4, clocks_per_click=24, notated_32nd_notes_per_beat=8, time=0)
MetaMessage('key_signature', key='Eb', time=0)
MetaMessage('set_tempo', tempo=535714, time=0)
MetaMessage('set_tempo', tempo=530973, time=5736)
MetaMessage('set_tempo', tempo=535714, time=12)
MetaMessage('set_tempo', tempo=540541, time=12)
MetaMessage('set_tempo', tempo=545455, time=12)
MetaMessage('set_tempo', tempo=550459, time=10)
MetaMessage('set_tempo', tempo=555556, time=12)
MetaMessage('set_tempo', tempo=560748, time=12)
MetaMessage('set_tempo', tempo=566038, time=10)
MetaMessage('set_tempo', tempo=571429, time=12)
MetaMessage('set_tempo', tempo
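
The `tempo` values in the `set_tempo` messages above are microseconds per quarter note, and the `time` values are delta times in ticks. A minimal sketch of the two conversions follows. `TICKS_PER_BEAT = 480` is an assumed resolution used only for illustration (the real value is stored in `mid.ticks_per_beat`), and the helper names are hypothetical. mido also provides `mido.tempo2bpm` and `mido.tick2second` for the same arithmetic.

```python
MICROSECONDS_PER_MINUTE = 60_000_000


def tempo_to_bpm(tempo: int) -> float:
    """Convert a set_tempo value (microseconds per quarter note) to BPM."""
    return MICROSECONDS_PER_MINUTE / tempo


def ticks_to_seconds(ticks: int, ticks_per_beat: int, tempo: int) -> float:
    """Convert a delta time in ticks to seconds at a fixed tempo."""
    return ticks * tempo / (ticks_per_beat * 1_000_000)


TICKS_PER_BEAT = 480   # assumption for illustration; read mid.ticks_per_beat in practice
FIRST_TEMPO = 535714   # first set_tempo value in the output above

print(round(tempo_to_bpm(FIRST_TEMPO), 1))  # 112.0
print(ticks_to_seconds(480, TICKS_PER_BEAT, FIRST_TEMPO))
```

Because this file changes tempo many times, a single conversion factor is only valid until the next `set_tempo` message.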

In [9]:
# Check track names
tracks = mid.tracks
for track in tracks:
    print(track.name)

Kaplied di Chr. F.D. Schubart
PianoR
PianoL



- “note_on” presses a key. A “note_on” with velocity=0 is conventionally treated as a release.
- “note_off” releases a key. Its velocity is a release velocity, which many instruments ignore, so it is not guaranteed to be 0.
- “channel” is the channel the sound is sent to. Standard MIDI supports 16 channels simultaneously.
- “note” is the key number. We can refer to the map below for the piano key that corresponds to each MIDI note id.
- “velocity” is how fast the key is struck. The faster the strike, the louder the sound.
- “time” is the waiting time (in ticks) between the previous message and the current one. The duration of a note is the sum of the “time” values of every message after the one that turns the note on (“note_on” with “velocity” > 0), up to and including the nearest later message that turns the same note off (“note_off”, or “note_on” with “velocity”=0).

In [14]:
# This is not perfect: many files will break here or give undesired
# results, usually because the instrument name is not a unique key.
# Works decently well for the first Beethoven file.

# Gather event data for each track in a more readable format
dfs = []
events = {}
for track_idx, track in enumerate(tracks):
    # Fall back to a positional name so an unnamed track neither crashes
    # nor reuses the previous track's instrument
    instrument = "track_{}".format(track_idx)
    events[instrument] = {}
    for idx, msg in enumerate(track):
        if msg.type == "text":
            instrument = msg.text
            events[instrument] = {}
        elif msg.type == "track_name":
            instrument = msg.name
            events[instrument] = {}
        elif msg.type in ("note_on", "note_off"):
            events[instrument][idx] = {
                "instrument": instrument,
                "type": msg.type,
                "time": msg.time,
                "note": msg.note,
                "velocity": msg.velocity,
            }
    # One DataFrame per track (empty when the track holds no notes)
    df = pd.DataFrame.from_dict(events[instrument], orient='index')
    dfs.append(df)
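
The cell above keys events by instrument name, which its own comment notes is not unique. One possible alternative, sketched here under stated assumptions, is to key by track index and keep the name as an ordinary column. `collect_note_events` is a hypothetical helper, the first `track_name` in a track is assumed to be the one that matters, and the demo uses `SimpleNamespace` stand-ins instead of real mido messages.

```python
from types import SimpleNamespace


def collect_note_events(tracks):
    """Return {track_index: [row, ...]} with one row per note message."""
    rows_by_track = {}
    for track_idx, track in enumerate(tracks):
        name = None
        rows = []
        for msg_idx, msg in enumerate(track):
            if msg.type == "track_name" and name is None:
                name = msg.name          # keep only the first name in the track
            elif msg.type in ("note_on", "note_off"):
                rows.append({
                    "msg_index": msg_idx,
                    "type": msg.type,
                    "time": msg.time,
                    "note": msg.note,
                    "velocity": msg.velocity,
                })
        # Unnamed tracks get a positional label instead of a borrowed name
        label = name if name is not None else "track_{}".format(track_idx)
        for row in rows:
            row["instrument"] = label
        if rows:
            rows_by_track[track_idx] = rows
    return rows_by_track


# Stand-in tracks for illustration only
demo_tracks = [
    [SimpleNamespace(type="track_name", name="Title", time=0)],
    [
        SimpleNamespace(type="track_name", name="PianoR", time=0),
        SimpleNamespace(type="note_on", time=480, note=63, velocity=100),
        SimpleNamespace(type="note_on", time=120, note=63, velocity=0),
    ],
    [SimpleNamespace(type="note_on", time=0, note=51, velocity=100)],
]

demo_events = collect_note_events(demo_tracks)
print(sorted(demo_events))                 # [1, 2]
print(demo_events[2][0]["instrument"])     # track_2
```

Each list of rows can be passed to `pd.DataFrame(rows)` to get one DataFrame per track, and two tracks that share a name no longer overwrite each other.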


In [29]:
# Make sure everything checks out
dfs[2].head()

| | instrument | type | time | note | velocity |
|---|---|---|---|---|---|
| 6 | PianoL | note_on | 480 | 51 | 100 |
| 7 | PianoL | note_on | 120 | 51 | 0 |
| 8 | PianoL | note_on | 0 | 55 | 100 |
| 9 | PianoL | note_on | 120 | 55 | 0 |
| 10 | PianoL | note_on | 0 | 58 | 100 |


In [17]:
# Convert note values to note names and octaves
# https://gist.github.com/devxpy/063968e0a2ef9b6db0bd6af8079dad2a#file-midi_numbers-py-L153-L159
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
OCTAVES = list(range(11))
NOTES_IN_OCTAVE = len(NOTES)

def number_to_note(number: int) -> tuple:
    # Octaves count up from 0 at MIDI note 0, so middle C (60) is ('C', 5)
    octave = number // NOTES_IN_OCTAVE
    note = NOTES[number % NOTES_IN_OCTAVE]

    return note, octave

# Use the PianoL track explicitly (dfs[2], the last DataFrame built above)
note_names = [number_to_note(note) for note in dfs[2]["note"]]
print(note_names)

[('D#', 4), ('D#', 4), ('G', 4), ('G', 4), ('A#', 4), ('A#', 4), ('A#', 3), ('A#', 3), ('D#', 4), ('D#', 4), ('G', 4), ('G', 4), ('A#', 4), ('A#', 4), ('G', 4), ('G', 4), ('D#', 4), ('D#', 4), ('A#', 4), ('D#', 4), ('D#', 4), ('A#', 4), ('D', 4), ('A#', 4), ('A#', 4), ('D', 4), ('A#', 3), ('A#', 3), ('F', 4), ('F', 4), ('F', 3), ('F', 3), ('A#', 3), ('A#', 3), ('A#', 4), ('A#', 4), ('F', 4), ('F', 4), ('D', 4), ('D', 4), ('A#', 3), ('A#', 3), ('D#', 4), ('G', 4), ('G', 4), ('D#', 4), ('A#', 3), ('A#', 3), ('D#', 4), ('G', 4), ('G', 4), ('D#', 4), ('A#', 3), ('A#', 3), ('D#', 4), ('G', 4), ('G', 4), ('D#', 4), ('A#', 3), ('A#', 3), ('D#', 4), ('G', 4), ('G', 4), ('D#', 4), ('A#', 3), ('A#', 3), ('F', 4), ('G#', 4), ('G#', 4), ('F', 4), ('A#', 3), ('A#', 3), ('F', 4), ('G#', 4), ('G#', 4), ('F', 4), ('A#', 3), ('A#', 3), ('G#', 4), ('F', 4), ('F', 4), ('G#', 4), ('A#', 3), ('A#', 3), ('G#', 4), ('F', 4), ('F', 4), ('G#', 4), ('A#', 3), ('A#', 3), ('D', 4), ('F', 4), ('F', 4), ('D', 4), (
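
The octave numbers printed above count from 0 at MIDI note 0. Many references instead use scientific pitch notation, in which middle C (MIDI note 60) is C4, one octave lower. A minimal sketch of that variant follows; `number_to_spn` is a hypothetical helper name, not part of the gist linked above.

```python
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def number_to_spn(number: int) -> str:
    """Convert a MIDI note number to scientific pitch notation (60 -> 'C4')."""
    if not 0 <= number <= 127:
        raise ValueError("MIDI note numbers must be in the range 0-127")
    octave = number // len(NOTES) - 1   # shift so that note 60 lands in octave 4
    return "{}{}".format(NOTES[number % len(NOTES)], octave)


print(number_to_spn(60))  # C4
print(number_to_spn(51))  # D#3
```

Either convention works for feature extraction as long as it is applied consistently across all files.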