# Exploring MIDI

The goal of this notebook is to find a way of processing MIDI data in Python.

Looking around for Python libraries that handle MIDI, we find:

- MIDIFile
- mido
- pretty_midi

After some trial and error, I found pretty_midi to be the Python library best suited to this project's needs.

    

## PrettyMIDI

pretty_midi is a library developed by Colin Raffel and released under the MIT license.

In [1]:
! pip install pretty_midi

Collecting pretty_midi
  Using cached pretty_midi-0.2.9.tar.gz (5.6 MB)
Collecting numpy>=1.7.0
  Downloading numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)
Collecting mido>=1.1.16
  Using cached mido-1.2.9-py2.py3-none-any.whl (52 kB)
Building wheels for collected packages: pretty-midi
  Building wheel for pretty-midi (setup.py) ... done
  Created wheel for pretty-midi: filename=pretty_midi-0.2.9-py3-none-any.whl size=5591952 sha256=3daec8fe13f910163907105a9b2920b69e6798548711b718fef68c62f76c08df
  Stored in directory: /home/dsc/.cache/pip/wheels/b9/ef/f5/9f35c0da899320e8f443f44ecbfa3a6721a4c3eacccad39844
Successfully built pretty-midi
Installing collected packages: numpy, mido, pretty-m

## Silent Night

For this first example I'll be using a very simple MIDI file of the song Silent Night.

In [3]:
import pretty_midi

In [4]:
pm = pretty_midi.PrettyMIDI('data/examples/silent_night_easy.mid')

We can play the song right here in the Jupyter notebook:

In [5]:
import IPython.display

In [6]:
# Synthesize at 16 kHz; the same sample rate must be passed to Audio
IPython.display.Audio(pm.synthesize(fs=16000), rate=16000)

We can list the instruments present in the file: 

In [7]:
pm.instruments

[Instrument(program=0, is_drum=False, name="Piano")]

This simple file has only one instrument: a piano.

## Bohemian Rhapsody

If we analyze a more complex song, we can see that MIDI offers many more instruments.

In [8]:
pm = pretty_midi.PrettyMIDI('data/examples/Queen - Bohemian Rhapsody.mid')

In [9]:
# Synthesize at 16 kHz; the same sample rate must be passed to Audio
IPython.display.Audio(pm.synthesize(fs=16000), rate=16000)

In [10]:
pm.instruments

[Instrument(program=60, is_drum=False, name=""),
 Instrument(program=65, is_drum=False, name=""),
 Instrument(program=0, is_drum=False, name=""),
 Instrument(program=33, is_drum=False, name=""),
 Instrument(program=48, is_drum=False, name=""),
 Instrument(program=52, is_drum=False, name=""),
 Instrument(program=62, is_drum=False, name=""),
 Instrument(program=69, is_drum=False, name=""),
 Instrument(program=30, is_drum=False, name=""),
 Instrument(program=30, is_drum=False, name=""),
 Instrument(program=55, is_drum=False, name=""),
 Instrument(program=0, is_drum=True, name=""),
 Instrument(program=47, is_drum=False, name="")]

We can get the names associated with each of the MIDI programs thanks to pretty_midi. 

In [11]:
for instrument in pm.instruments:
    print(f"Instrument {pretty_midi.program_to_instrument_name(instrument.program)}, {len(instrument.notes)}")

Instrument French Horn, 142
Instrument Alto Sax, 138
Instrument Acoustic Grand Piano, 1920
Instrument Electric Bass (finger), 480
Instrument String Ensemble 1, 779
Instrument Choir Aahs, 262
Instrument Synth Brass 1, 154
Instrument English Horn, 137
Instrument Distortion Guitar, 273
Instrument Distortion Guitar, 273
Instrument Orchestra Hit, 75
Instrument Acoustic Grand Piano, 1114
Instrument Timpani, 178
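
Note that the twelfth entry above, listed as Acoustic Grand Piano with 1114 notes, is the drum track (`program=0, is_drum=True`): for a drum track the program number does not identify a melodic instrument. Below is a minimal sketch of a label that takes `is_drum` into account. `Instrument`, `PROGRAM_NAMES` and `display_name` are hypothetical stand-ins written for this example; with pretty_midi installed, `pretty_midi.program_to_instrument_name` would supply the names instead of the small dictionary.

```python
from collections import namedtuple

# Stand-in for pretty_midi.Instrument, reduced to the two fields used here (assumption).
Instrument = namedtuple("Instrument", ["program", "is_drum"])

# A few program names, copied from the output above.
PROGRAM_NAMES = {0: "Acoustic Grand Piano", 47: "Timpani", 60: "French Horn"}


def display_name(instrument):
    """Name an instrument, labelling drum tracks explicitly."""
    if instrument.is_drum:
        return "Drums"  # the program number does not select a melodic sound here
    return PROGRAM_NAMES.get(instrument.program, f"Program {instrument.program}")


for instrument in (Instrument(60, False), Instrument(0, False), Instrument(0, True)):
    print(display_name(instrument))
```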


## Getting the Notes

For each instrument we have a list of notes:

In [12]:
for note in pm.instruments[2].notes:
    print(note)

Note(start=15.384600, end=15.713125, pitch=58, velocity=123)
Note(start=15.384600, end=15.801266, pitch=65, velocity=122)
Note(start=15.384600, end=15.865369, pitch=61, velocity=123)
Note(start=15.769215, end=16.193894, pitch=58, velocity=95)
Note(start=16.153830, end=16.506394, pitch=65, velocity=122)
Note(start=16.153830, end=16.602547, pitch=61, velocity=121)
Note(start=16.538445, end=16.971137, pitch=58, velocity=104)
Note(start=16.923060, end=17.307675, pitch=65, velocity=122)
Note(start=16.923060, end=17.379790, pitch=61, velocity=121)
Note(start=17.307675, end=17.772418, pitch=58, velocity=109)
Note(start=15.384600, end=18.036841, pitch=53, velocity=123)
Note(start=17.692290, end=18.108956, pitch=65, velocity=120)
Note(start=17.692290, end=18.141007, pitch=61, velocity=120)
Note(start=18.076905, end=18.237161, pitch=58, velocity=116)
Note(start=15.384600, end=18.429469, pitch=46, velocity=120)
Note(start=18.461520, end=18.749981, pitch=59, velocity=123)
Note(start=18.461520, end

Each note is defined by its start and end times (in seconds), its pitch and its velocity.

(Velocity is the force with which a note is played, a value between 0 and 127.)

If we just want a harmonic approach to music analysis, it is sensible to use only the instruments that are harmonic. This means no drums:

In [13]:
all_notes = []
for instrument in pm.instruments:
    # Skip drum tracks: their note numbers select percussion sounds, not pitches
    if instrument.is_drum:
        continue
    all_notes.extend(instrument.notes)

In [14]:
len(all_notes)

4811

Let's try to put all these notes in a pandas DataFrame, together with the name of each note:

In [17]:
!pip install pandas

Collecting pandas
  Downloading pandas-1.1.5-cp36-cp36m-manylinux1_x86_64.whl (9.5 MB)

In [19]:
import pandas as pd

In [20]:
df = pd.DataFrame()
df["pitch"]= pd.Series(note.pitch for note in all_notes)
df["note_name"] = pd.Series(pretty_midi.note_number_to_name(note.pitch) for note in all_notes)

df.head()

| | pitch | note_name |
|---|---|---|
| 0 | 72 | C5 |
| 1 | 74 | D5 |
| 2 | 70 | A#4 |
| 3 | 72 | C5 |
| 4 | 73 | C#5 |
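
The mapping from pitch number to note name is simple arithmetic: the remainder modulo 12 picks the pitch class and the integer quotient gives the octave. Here is a minimal sketch, assuming the convention in which note 60 is C4 (which agrees with the table above). `midi_number_to_name` is a hypothetical stand-in written for illustration, not pretty_midi's implementation.

```python
# Pitch classes in sharp spelling, matching the names seen in the table above.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_number_to_name(number):
    """Convert a MIDI note number (0-127) to a name such as 'C4'."""
    if not 0 <= number <= 127:
        raise ValueError(f"MIDI note numbers run from 0 to 127, got {number}")
    octave = number // 12 - 1  # convention in which note 60 is C4
    return f"{PITCH_CLASSES[number % 12]}{octave}"


for number in (72, 74, 70, 73):
    print(number, midi_number_to_name(number))
```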


At this point we are able to load a MIDI file, separate out the instruments that aren't drums and put all their notes in a pandas DataFrame.

Yay!


We can encapsulate this functionality in a Python function for easier use:

In [21]:
def get_all_notes_sorted(file_path):
    """Return the notes of every non-drum instrument in the file, sorted by start time."""
    pm = pretty_midi.PrettyMIDI(file_path)
    all_notes = []
    for instrument in pm.instruments:
        # Skip drum tracks: their note numbers select percussion sounds, not pitches
        if instrument.is_drum:
            continue
        all_notes.extend(instrument.notes)
    all_notes.sort(key=lambda note: note.start)
    return all_notes

In [23]:
all_notes = get_all_notes_sorted('data/examples/silent_night_easy.mid')
all_notes[:5]

[Note(start=0.000000, end=0.916406, pitch=67, velocity=53),
 Note(start=0.000000, end=1.800000, pitch=60, velocity=53),
 Note(start=0.900000, end=1.204687, pitch=69, velocity=56),
 Note(start=1.200000, end=1.811719, pitch=67, velocity=61),
 Note(start=1.800000, end=3.600000, pitch=60, velocity=57)]

## Getting the Chords

We would like to achieve the same thing that we did with notes, but now with chords. We want a function that gets the path of a MIDI file and returns the groups of notes that sound at the same time in that song. 

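We create a function that tells us whether a note is sounding at a given time. The 0.2-second margin excludes notes that are about to end, so a note released just after another one begins is not counted as part of the same chord: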

In [26]:
def is_note_sounding(note, time):
    return note.start <= time and note.end > time + 0.2

This function returns all the notes that are sounding at a given time:

In [27]:
def notes_sounding(notes, time):
    return list(filter(lambda note: is_note_sounding(note, time), notes))
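
To see the effect of the margin, the two functions can be run on the first three notes of Silent Night listed earlier. This is a minimal sketch: `Note` is a namedtuple standing in for `pretty_midi.Note` (an assumption so the example runs without the library), and `MIN_REMAINING` is just a name given here to the 0.2 constant.

```python
from collections import namedtuple

# Stand-in for pretty_midi.Note with the same four fields (assumption for this sketch).
Note = namedtuple("Note", ["start", "end", "pitch", "velocity"])

MIN_REMAINING = 0.2  # seconds a note must keep sounding after `time`


def is_note_sounding(note, time):
    return note.start <= time and note.end > time + MIN_REMAINING


def notes_sounding(notes, time):
    return [note for note in notes if is_note_sounding(note, time)]


# First three notes of Silent Night, as listed earlier in the notebook.
notes = [
    Note(0.0, 0.916406, 67, 53),  # G4
    Note(0.0, 1.8, 60, 53),       # C4
    Note(0.9, 1.204687, 69, 56),  # A4
]

# G4 ends at 0.916, less than 0.2 s after 0.9, so it is left out at that time.
print([note.pitch for note in notes_sounding(notes, 0.0)])
print([note.pitch for note in notes_sounding(notes, 0.9)])
```

At time 0.9 the G4 is still technically sounding, but only for another 0.016 seconds, so the margin keeps it out of the group formed by C4 and A4.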

Next we collect the relevant points in time for our song, which we call ticks. We obtain them by taking the time, in seconds, at which each note starts:

In [32]:
# Unique note start times, in ascending order
ticks = sorted({note.start for note in all_notes})
ticks[:10]

[0.0,
 0.8999999999999999,
 1.2,
 1.7999999999999998,
 3.5999999999999996,
 4.5,
 4.8,
 5.3999999999999995,
 6.0,
 6.6]
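
The start times above are plain floats, so two notes that begin together only share a tick when their start values are exactly equal. If a file ever produced near-duplicate start times (an assumption, not something observed in this song), rounding before deduplicating would merge them. A minimal sketch, where `unique_start_times` and the six-decimal precision are hypothetical choices:

```python
def unique_start_times(starts, decimals=6):
    """Return sorted start times, merging values that agree to `decimals` places."""
    return sorted({round(start, decimals) for start in starts})


# Values such as 0.8999999999999999 and 0.9 collapse into a single tick.
starts = [0.0, 0.8999999999999999, 0.9, 1.2, 1.7999999999999998, 1.8]
print(unique_start_times(starts))
```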

We use these ticks to get all the chords in our song:

In [35]:
chords = [notes_sounding(all_notes, tick) for tick in ticks]
chords[:5]

[[Note(start=0.000000, end=0.916406, pitch=67, velocity=53),
  Note(start=0.000000, end=1.800000, pitch=60, velocity=53)],
 [Note(start=0.000000, end=1.800000, pitch=60, velocity=53),
  Note(start=0.900000, end=1.204687, pitch=69, velocity=56)],
 [Note(start=0.000000, end=1.800000, pitch=60, velocity=53),
  Note(start=1.200000, end=1.811719, pitch=67, velocity=61)],
 [Note(start=1.800000, end=3.600000, pitch=60, velocity=57),
  Note(start=1.800000, end=3.635156, pitch=64, velocity=54)],
 [Note(start=3.600000, end=4.516406, pitch=67, velocity=65),
  Note(start=3.600000, end=5.400000, pitch=60, velocity=64)]]

We need a function that turns these lists of notes into lists of note names:

In [36]:
def note_array_number_to_name(note_array):
    return [pretty_midi.note_number_to_name(note.pitch) for note in note_array]

In [37]:
def note_matrix_number_to_name(note_matrix):
    return list(map(note_array_number_to_name, note_matrix))

In [39]:
note_matrix_number_to_name(chords)[:20]

[['G4', 'C4'],
 ['C4', 'A4'],
 ['C4', 'G4'],
 ['C4', 'E4'],
 ['G4', 'C4'],
 ['C4', 'A4'],
 ['C4', 'G4'],
 ['C4', 'E4'],
 ['E4', 'B3'],
 ['E4', 'A3'],
 ['D5', 'G3'],
 ['G3', 'D5'],
 ['G3', 'B4'],
 ['B4', 'G3'],
 ['C5', 'C4'],
 ['C4', 'C5'],
 ['G4', 'C4'],
 ['A4', 'F3'],
 ['F3', 'A4'],
 ['C5', 'F3']]

These are the groups of notes present in the song Silent Night that we were analysing.
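
In the output above, `['G4', 'C4']` and `['C4', 'G4']` contain the same notes in a different order, because each group keeps the order in which its notes appear in the sorted note list. If groups need to be compared with each other, one option is to order every group by pitch first. A minimal sketch, again with a namedtuple standing in for `pretty_midi.Note`; `sort_group_by_pitch` is a hypothetical helper:

```python
from collections import namedtuple

# Stand-in for pretty_midi.Note with the same four fields (assumption for this sketch).
Note = namedtuple("Note", ["start", "end", "pitch", "velocity"])


def sort_group_by_pitch(group):
    """Return the notes of one group ordered from lowest to highest pitch."""
    return sorted(group, key=lambda note: note.pitch)


group_a = [Note(0.0, 0.916406, 67, 53), Note(0.0, 1.8, 60, 53)]  # G4, C4
group_b = [Note(0.0, 1.8, 60, 53), Note(1.2, 1.811719, 67, 61)]  # C4, G4

pitches_a = [note.pitch for note in sort_group_by_pitch(group_a)]
pitches_b = [note.pitch for note in sort_group_by_pitch(group_b)]
print(pitches_a, pitches_b, pitches_a == pitches_b)
```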

We can name these chords using the library music21:

In [41]:
! pip install music21



In [45]:
from music21 import chord
cMinor = chord.Chord(["C4","G4","E-5"])
cMinor.pitchedCommonName

'C-minor triad'

We would like to have a list of all the chords, and when they started sounding: 

In [46]:
chords = []
for tick in ticks:
    chords.append((notes_sounding(all_notes, tick), tick))

In [51]:
pre_chords = [(note_array_number_to_name(x[0]), x[1]) for x in chords]

In [52]:
timed_chords = [(chord.Chord(x[0]).pitchedCommonName, x[1]) for x in pre_chords]
timed_chords[:6]

[('Perfect Fifth above C', 0.0),
 ('Major Sixth above C', 0.8999999999999999),
 ('Perfect Fifth above C', 1.2),
 ('Major Third above C', 1.7999999999999998),
 ('Perfect Fifth above C', 3.5999999999999996),
 ('Major Sixth above C', 4.5)]
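
For groups of only two notes, the names above describe the interval between the notes rather than a chord. The idea can be sketched with plain arithmetic on the pitch numbers. This is a simplified stand-in, not music21's algorithm: it only covers intervals up to an octave, ignores enharmonic spelling (six semitones are simply called a tritone), and `name_dyad` is a hypothetical helper.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Interval names indexed by size in semitones, up to one octave.
INTERVAL_NAMES = {
    0: "Perfect Unison", 1: "Minor Second", 2: "Major Second", 3: "Minor Third",
    4: "Major Third", 5: "Perfect Fourth", 6: "Tritone", 7: "Perfect Fifth",
    8: "Minor Sixth", 9: "Major Sixth", 10: "Minor Seventh", 11: "Major Seventh",
    12: "Perfect Octave",
}


def name_dyad(pitch_a, pitch_b):
    """Name the interval between two MIDI pitches, relative to the lower note."""
    low, high = sorted((pitch_a, pitch_b))
    semitones = high - low
    if semitones not in INTERVAL_NAMES:
        raise ValueError(f"intervals wider than an octave are not covered: {semitones}")
    return f"{INTERVAL_NAMES[semitones]} above {PITCH_CLASSES[low % 12]}"


# G4 + C4, C4 + A4 and C4 + E4, the first groups of Silent Night.
for pair in ((67, 60), (60, 69), (60, 64)):
    print(name_dyad(*pair))
```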

A small helper returns every chord that started before a given time:

In [54]:
def previous_chords(start, timed_chords):
    return [x for x in timed_chords if x[1] < start]

We can build the DataFrame:

In [55]:
df = pd.DataFrame()
df["pitch"]= pd.Series(note.pitch for note in pm.instruments[0].notes)
df["note_name"] = pd.Series(pretty_midi.note_number_to_name(note.pitch) for note in pm.instruments[0].notes)
df["start"] = pd.Series(note.start for note in pm.instruments[0].notes)


for i in range(10): 
    df[f"prev_chord {i+1}"] = pd.Series( previous_chords(note.start, timed_chords)[-i][0] if len(previous_chords(note.start, timed_chords)) > i else ''  for note in pm.instruments[0].notes)

df.head(10)

| | pitch | note_name | start | prev_chord 1 | prev_chord 2 | prev_chord 3 | prev_chord 4 | prev_chord 5 | prev_chord 6 | prev_chord 7 | prev_chord 8 | prev_chord 9 | prev_chord 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 72 | C5 | 52.30764 | Perfect Fifth above C | C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C |
| 1 | 74 | D5 | 52.30764 | Perfect Fifth above C | C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C |
| 2 | 70 | A#4 | 54.999945 | Perfect Fifth above C | C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C |
| 3 | 72 | C5 | 55.38456 | Perfect Fifth above C | C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C |
| 4 | 73 | C#5 | 55.961482 | Perfect Fifth above C | C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C |
| 5 | 74 | D5 | 55.769175 | Perfect Fifth above C | C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C |
| 6 | 74 | D5 | 58.076865 | Perfect Fifth above C | C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C |
| 7 | 75 | D#5 | 58.46148 | Perfect Fifth above C | C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C |
| 8 | 77 | F5 | 58.846095 | Perfect Fifth above C | C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C |
| 9 | 75 | D#5 | 59.038402 | Perfect Fifth above C | C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C |


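We wrap the chord extraction in a function:

In [60]:
def get_timed_chords(all_notes):
    # Derive the start times from the notes passed in instead of a global variable
    ticks = sorted({note.start for note in all_notes})
    chords = [(notes_sounding(all_notes, tick), tick) for tick in ticks]
    pre_chords = [(note_array_number_to_name(x[0]), x[1]) for x in chords]
    timed_chords = [(chord.Chord(x[0]).pitchedCommonName, x[1]) for x in pre_chords]
    return timed_chords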

In [62]:
def get_notes_and_chords(midi_file_path):
    all_notes = get_all_notes_sorted(midi_file_path)
    timed_chords = get_timed_chords(all_notes)
    df = pd.DataFrame()
    df["pitch"]= pd.Series(note.pitch for note in all_notes)
    df["note_name"] = pd.Series(pretty_midi.note_number_to_name(note.pitch) for note in all_notes)
    df["start"] = pd.Series(note.start for note in all_notes)


    for i in range(10): 
        df[f"prev_chord {i+1}"] = pd.Series( previous_chords(note.start, timed_chords)[-i][0] if len(previous_chords(note.start, timed_chords)) > i else ''  for note in all_notes)
    return df

In [63]:
get_notes_and_chords('data/examples/silent_night_easy.mid')

| | pitch | note_name | start | prev_chord 1 | prev_chord 2 | prev_chord 3 | prev_chord 4 | prev_chord 5 | prev_chord 6 | prev_chord 7 | prev_chord 8 | prev_chord 9 | prev_chord 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67 | G4 | 0.0 | | | | | | | | | | |
| 1 | 60 | C4 | 0.0 | | | | | | | | | | |
| 2 | 69 | A4 | 0.9 | Perfect Fifth above C | | | | | | | | | |
| 3 | 67 | G4 | 1.2 | Perfect Fifth above C | Major Sixth above C | | | | | | | | |
| 4 | 60 | C4 | 1.8 | Perfect Fifth above C | Perfect Fifth above C | Major Sixth above C | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 69 | 55 | G3 | 37.8 | Perfect Fifth above C | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C | Major Tenth above G | Perfect Twelfth above G | Minor Fourteenth above G | Perfect Twelfth above G |
| 70 | 65 | F4 | 38.7 | Perfect Fifth above C | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C | Major Tenth above G | Perfect Twelfth above G | Minor Fourteenth above G |
| 71 | 62 | D4 | 39.0 | Perfect Fifth above C | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C | Major Tenth above G | Perfect Twelfth above G |
| 72 | 60 | C4 | 39.6 | Perfect Fifth above C | Perfect Fifth above G | Minor Seventh above G | Perfect Octave above G | Major Third above C | Perfect Fifth above C | Perfect Octave above C | Major Tenth above C | Perfect Octave above C | Major Tenth above G |
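
One detail of the table deserves attention: `prev_chord 1` reads "Perfect Fifth above C" in every non-empty row. In the loop that fills these columns the chord is picked with `[-i]`, and for `i = 0` that is the same as `[0]`, the first chord of the song; the most recent chord ends up in `prev_chord 2`. If the columns are meant to run from the most recent chord backwards (an assumption about the intent), the index needs to start at -1. A minimal sketch with a hypothetical helper `nth_previous`:

```python
# Chords that started before the note at 1.2 s in Silent Night, oldest first.
previous = ["Perfect Fifth above C", "Major Sixth above C"]

# -0 == 0, so indexing with -i when i == 0 selects the oldest chord, not the newest.
print(previous[-0])


def nth_previous(previous, n):
    """Return the n-th most recent item (n=1 is the latest), or '' if there is none."""
    return previous[-n] if 1 <= n <= len(previous) else ""


# Most recent chord first, padded with empty strings once the history runs out.
print([nth_previous(previous, n) for n in range(1, 4)])
```

In `get_notes_and_chords` this would correspond to indexing with `[-(i + 1)]` instead of `[-i]`.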
