## Helpful Resources

### Data
- [Original Dataset](https://www.kaggle.com/datasets/blanderbuss/midi-classic-music/code)

### Mido
- [Mido Documentation](https://mido.readthedocs.io/en/stable/)
- [Example using mido](https://medium.com/analytics-vidhya/convert-midi-file-to-numpy-array-in-python-7d00531890c)

### MIDI
- [MIDI File Statistic Calculation](https://nbviewer.org/github/craffel/midi-ground-truth/blob/master/Statistics.ipynb)
- [Useful MIDI Functions](https://gist.github.com/devxpy/063968e0a2ef9b6db0bd6af8079dad2a#file-midi_numbers-py-L153-L159)

### AI Research
- [Composer Classification Paper](https://hal.science/hal-01879276/document)
- [Composer Classification Dissertation](https://dalspace.library.dal.ca/bitstream/handle/10222/82031/RohitNitinKher2022.pdf?sequence=3&isAllowed=y)

## Exploration

In [80]:
from os import listdir
from os.path import isfile, join

import pandas as pd
from mido import MidiFile

In [81]:
# Location of data files
PATH = "data/project/"

In [82]:
# Composers to use (one sub-directory per composer).
# Sorted because listdir returns entries in arbitrary order.
COMPOSERS = sorted(listdir(PATH))
COMPOSERS

['Bach', 'Beethoven', 'Chopin', 'Mozart']

In [83]:
# Using Chopin for now
composer_path = join(PATH, COMPOSERS[2])
files = [f for f in listdir(composer_path) if isfile(join(composer_path, f))]
print(files)

# Import the first file for the selected composer
mid = MidiFile(join(composer_path, files[0]))

['(2542)Prelude opus.28, No.16 in B flat minor.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.10.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.11.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.12.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.13.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.14.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.15.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.16.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.17.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.18.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.19.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.2.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.3.mid', '19 Polish Songs, for Solo Voice and Piano accomplements, No.4.mid', '19 Polish Songs, for Solo Voice and Pia

In [84]:
# Get an idea of the messages in each track
for i, track in enumerate(mid.tracks):
    print(f"Track {i}: {track.name}")
    for msg in track:
        print(msg)

Track 0: 
MetaMessage('time_signature', numerator=6, denominator=4, clocks_per_click=24, notated_32nd_notes_per_beat=8, time=0)
MetaMessage('key_signature', key='Bbm', time=0)
MetaMessage('set_tempo', tempo=333333, time=0)
MetaMessage('time_signature', numerator=4, denominator=4, clocks_per_click=24, notated_32nd_notes_per_beat=8, time=12288)
MetaMessage('end_of_track', time=184321)
Track 1: Acoustic Grand Piano
MetaMessage('track_name', name='Acoustic Grand Piano', time=0)
program_change channel=0 program=0 time=0
note_on channel=0 note=63 velocity=80 time=0
note_on channel=0 note=69 velocity=80 time=0
note_on channel=0 note=78 velocity=80 time=0
note_on channel=0 note=29 velocity=80 time=0
note_on channel=0 note=41 velocity=80 time=0
note_off channel=0 note=63 velocity=0 time=1024
note_off channel=0 note=69 velocity=0 time=0
note_off channel=0 note=78 velocity=0 time=0
note_on channel=0 note=63 velocity=80 time=0
note_on channel=0 note=69 velocity=80 time=0
note_on channel=0 note=77 

In [85]:
# Check track names
tracks = mid.tracks
for track in tracks:
    print(track.name)


Acoustic Grand Piano


- “note_on” means the key is pressed (or released, if “velocity” = 0).
- “note_off” means the key is released. Its velocity is often 0, but it does not affect when the note ends.
- “channel” says which channel the sound is sent to. Standard MIDI supports 16 channels simultaneously.
- “note” identifies the key. Each MIDI note number corresponds to one key on the piano keyboard (60 is middle C).
- “velocity” is how fast the key is struck: the faster the strike, the louder the sound.
- “time” is the wait, in ticks, between the previous message and the current one. A note's duration is the sum of the “time” values of every message after the one that starts it (“note_on” with “velocity” > 0), up to and including the nearest later message that ends the same note (“note_off”, or “note_on” with “velocity” = 0).
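
The duration rule in the last bullet can be sketched in a few lines. This is only an illustration, not part of the original notebook: `note_durations` is a hypothetical helper, and `Msg` is a stand-in for mido messages so the sketch runs without a MIDI file (any object exposing `type`, `note`, `velocity` and `time` should work). Times stay in ticks, and the demo values are the first piano messages printed above.

```python
from collections import namedtuple

# Hypothetical stand-in for a mido message, used only to keep the sketch self-contained.
Msg = namedtuple("Msg", ["type", "note", "velocity", "time"])


def note_durations(messages):
    """Return (note, onset_tick, duration_ticks) for every completed note.

    Assumes the messages come from a single track, in order, with delta
    times expressed in ticks.
    """
    now = 0        # absolute time in ticks, accumulated from the deltas
    pending = {}   # note number -> onset ticks of presses not yet released
    finished = []
    for msg in messages:
        now += msg.time
        if msg.type == "note_on" and msg.velocity > 0:
            pending.setdefault(msg.note, []).append(now)
        elif msg.type == "note_off" or (msg.type == "note_on" and msg.velocity == 0):
            onsets = pending.get(msg.note)
            if onsets:  # ignore a release that has no matching press
                onset = onsets.pop(0)
                finished.append((msg.note, onset, now - onset))
    return finished


demo = [
    Msg("note_on", 63, 80, 0),
    Msg("note_on", 69, 80, 0),
    Msg("note_on", 78, 80, 0),
    Msg("note_on", 29, 80, 0),
    Msg("note_on", 41, 80, 0),
    Msg("note_off", 63, 0, 1024),
    Msg("note_off", 69, 0, 0),
    Msg("note_off", 78, 0, 0),
]
durations = note_durations(demo)
print(durations)  # [(63, 0, 1024), (69, 0, 1024), (78, 0, 1024)]
```

Notes 29 and 41 are absent from the result because their releases fall outside the excerpt. Accumulating the deltas into an absolute clock first keeps the pairing logic simple, at the cost of assuming one track at a time.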

In [86]:
# This is not perfect: many files will break it or give undesired results,
# usually because the instrument name is not a unique key. It works
# decently well for the first Chopin file.

# Gather event data for each track in a more readable format
dfs = []
for track in tracks:
    # Fallback name, so a note that precedes any naming message cannot
    # raise a NameError
    instrument = track.name
    events = {}
    for idx, msg in enumerate(track):
        if msg.type == "text":
            instrument = msg.text
        elif msg.type == "track_name":
            instrument = msg.name
        elif msg.type in ("note_on", "note_off"):
            events[idx] = {
                "instrument": instrument,
                "type": msg.type,
                "time": msg.time,
                "note": msg.note,
                "velocity": msg.velocity,
            }
    # Skip tracks without note events (e.g. the tempo/meta track)
    if events:
        df = pd.DataFrame.from_dict(events, orient="index")
        dfs.append(df)
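
One way around the non-unique instrument key mentioned in the comment is to key rows by track index and keep the instrument name as an ordinary column. The sketch below is an alternative under stated assumptions, not the notebook's method: `track_events` and `fake_msg` are hypothetical names, and the `SimpleNamespace` objects only imitate the attributes of mido messages (`type`, `name`, `time`, `note`, `velocity`).

```python
from types import SimpleNamespace


def track_events(tracks):
    """Return one list of note-event rows per track that contains notes."""
    tables = []
    for track_idx, track in enumerate(tracks):
        name = ""  # stays empty when the track has no track_name message
        rows = []
        for msg_idx, msg in enumerate(track):
            if msg.type == "track_name":
                name = msg.name
            elif msg.type in ("note_on", "note_off"):
                rows.append({
                    "track": track_idx,   # unique even if two tracks share a name
                    "msg_idx": msg_idx,
                    "instrument": name,
                    "type": msg.type,
                    "time": msg.time,
                    "note": msg.note,
                    "velocity": msg.velocity,
                })
        if rows:
            tables.append(rows)
    return tables


def fake_msg(**fields):
    """Hypothetical stand-in for a mido message."""
    return SimpleNamespace(**fields)


demo_tracks = [
    [fake_msg(type="set_tempo", tempo=333333, time=0)],
    [
        fake_msg(type="track_name", name="Acoustic Grand Piano", time=0),
        fake_msg(type="program_change", channel=0, program=0, time=0),
        fake_msg(type="note_on", channel=0, note=63, velocity=80, time=0),
        fake_msg(type="note_off", channel=0, note=63, velocity=0, time=1024),
    ],
]
tables = track_events(demo_tracks)
print(tables[0][0])
```

Each inner list is a list of dicts, so `pd.DataFrame(rows)` should turn it into a table comparable to the ones built above.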


In [87]:
# Make sure everything checks out
dfs[0].head()

|   | instrument | type | time | note | velocity |
|---|---|---|---|---|---|
| 2 | Acoustic Grand Piano | note_on | 0 | 63 | 80 |
| 3 | Acoustic Grand Piano | note_on | 0 | 69 | 80 |
| 4 | Acoustic Grand Piano | note_on | 0 | 78 | 80 |
| 5 | Acoustic Grand Piano | note_on | 0 | 29 | 80 |
| 6 | Acoustic Grand Piano | note_on | 0 | 41 | 80 |


In [88]:
# Convert note values to note names and octaves
# https://gist.github.com/devxpy/063968e0a2ef9b6db0bd6af8079dad2a#file-midi_numbers-py-L153-L159
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
OCTAVES = list(range(11))
NOTES_IN_OCTAVE = len(NOTES)


def number_to_note(number: int) -> tuple:
    # Octaves are counted from MIDI note 0, so middle C (note 60) comes out as ('C', 5)
    octave = number // NOTES_IN_OCTAVE
    note = NOTES[number % NOTES_IN_OCTAVE]

    return note, octave


# Use the first track's DataFrame explicitly rather than the leftover loop variable
note_names = [number_to_note(note) for note in dfs[0]["note"]]
print(note_names)

[('D#', 5), ('A', 5), ('F#', 6), ('F', 2), ('F', 3), ('D#', 5), ('A', 5), ('F#', 6), ('D#', 5), ('A', 5), ('F', 6), ('C', 4), ('F', 4), ('C', 5), ('D#', 5), ('A', 5), ('F', 6), ('D#', 5), ('A', 5), ('E', 6), ('C', 4), ('F', 4), ('C', 5), ('C', 4), ('F', 4), ('C', 5), ('D#', 5), ('A', 5), ('E', 6), ('D#', 5), ('A', 5), ('D#', 6), ('C', 4), ('F', 4), ('C', 5), ('C', 4), ('F', 4), ('C', 5), ('D#', 5), ('A', 5), ('D#', 6), ('D#', 5), ('A', 5), ('C#', 6), ('C', 4), ('F', 4), ('C', 5), ('C', 4), ('F', 4), ('C', 5), ('D#', 5), ('A', 5), ('C#', 6), ('D#', 5), ('A', 5), ('C', 6), ('C', 4), ('F', 4), ('C', 5), ('C', 4), ('F', 4), ('C', 5), ('D#', 5), ('A', 5), ('C', 6), ('C', 4), ('F', 4), ('C', 5), ('F', 2), ('F', 3), ('F', 6), ('A#', 3), ('F', 6), ('C', 6), ('C', 6), ('D#', 6), ('F', 4), ('C#', 5), ('D#', 6), ('C#', 6), ('C#', 6), ('F', 5), ('A#', 3), ('F', 5), ('A', 5), ('A', 5), ('A#', 5), ('F', 4), ('C#', 5), ('F', 3), ('A#', 5), ('C', 6), ('C', 6), ('C#', 6), ('F', 3), ('A#', 3), ('C#', 6)
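
The octave numbers printed above come from `number // 12`, which labels middle C (MIDI note 60) as C5. If scientific pitch notation is preferred, where middle C is C4, a variant could subtract one from the octave. The sketch below assumes that convention is wanted; `number_to_spn` is a hypothetical name and is not part of the referenced gist.

```python
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def number_to_spn(number: int) -> str:
    """Map a MIDI note number (0-127) to a name such as 'D#4' (middle C = C4)."""
    if not 0 <= number <= 127:
        raise ValueError(f"MIDI note numbers run from 0 to 127, got {number}")
    octave, pitch_class = divmod(number, len(NOTES))
    return f"{NOTES[pitch_class]}{octave - 1}"


# First five notes of the Chopin prelude shown earlier
print([number_to_spn(n) for n in (63, 69, 78, 29, 41)])
# ['D#4', 'A4', 'F#5', 'F1', 'F2']
```

Under this convention note 69 becomes A4, the usual 440 Hz tuning reference, which makes the names easier to compare with printed scores.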