# MIDI Exploration

Inspiration taken from [this tutorial](https://nbviewer.jupyter.org/github/craffel/midi-dataset/blob/master/Tutorial.ipynb).

We use a reduced, cleaned subset of the [Lakh MIDI Dataset (LMD)](https://colinraffel.com/projects/lmd/) for this exploration.

It is assumed that these files have been extracted to a directory named `clean_midi` one level above this notebook (`../clean_midi`).
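The song loaded below lives at `clean_midi/Nine Inch Nails/Head Like a Hole.mid`, which suggests an `<Artist>/<Title>.mid` layout. Assuming that layout holds for the whole directory, a minimal sketch for indexing the dataset could look like this (`index_midi_files` is a hypothetical helper, not part of any library):

```python
import os

def index_midi_files(root):
    """Return (artist, title, path) tuples for every .mid file under root.

    Assumes (hypothetically) the layout <root>/<Artist>/<Title>.mid;
    files directly under root and non-MIDI files are skipped."""
    rows = []
    for artist in sorted(os.listdir(root)):
        artist_dir = os.path.join(root, artist)
        if not os.path.isdir(artist_dir):
            continue  # skip stray files at the top level
        for fname in sorted(os.listdir(artist_dir)):
            if fname.lower().endswith('.mid'):
                title = os.path.splitext(fname)[0]  # drop the .mid extension
                rows.append((artist, title, os.path.join(artist_dir, fname)))
    return rows
```

The result drops straight into pandas, e.g. `pd.DataFrame(index_midi_files(DATA_PATH), columns=['artist', 'title', 'path'])`, which makes it easy to search for songs by artist.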

In [None]:
#!pip install pretty-midi librosa pandas matplotlib

In [None]:
# Imports
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pretty_midi
import librosa.display as display
import os
import pandas as pd

# Local path constants
DATA_PATH = '../clean_midi'

In [None]:
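song_name = "Head Like a Hole.mid"

pm = pretty_midi.PrettyMIDI(os.path.join(DATA_PATH, "Nine Inch Nails", song_name))

fs = 100  # piano roll columns per second (pretty_midi's default)
piano_roll = pm.get_piano_roll(fs=fs)

plt.figure(figsize=(10, 6))
# Rows of the piano roll are MIDI notes 0-127, so anchor the note axis at MIDI note 0
display.specshow(piano_roll, sr=fs, hop_length=1, x_axis='time', y_axis='cqt_note',
                 fmin=pretty_midi.note_number_to_hz(0), cmap=plt.cm.hot)
plt.title(song_name)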

# Get instrument names from the MIDI file

In [None]:
# Print each instrument with its index, so we can pick one by number below
for i, instrument in enumerate(pm.instruments):
    print(i, instrument)

# Get notes for a specific instrument

In [None]:
instrument_no = 3  # index into pm.instruments (see the listing above)

instrument = pm.instruments[instrument_no]

cols = ['start', 'end', 'pitch', 'velocity']

# One row per note: start/end in seconds, MIDI pitch and velocity (both 0-127)
note_seq = [[note.start, note.end, note.pitch, note.velocity] for note in instrument.notes]

note_df = pd.DataFrame(note_seq, columns=cols)
note_df.head()

# Or as a numpy array

In [None]:
note_seq_np = np.zeros((len(instrument.notes), 4))
for index, note in enumerate(instrument.notes):
    note_seq_np[index] = [note.start, note.end, note.pitch, note.velocity]

note_seq_np

# Try to encode the notes as variables over discrete timesteps

This essentially means converting the note sequence above into the piano roll we saw earlier.
To do this, we:

1. ~~Subtract the first start time from all the start/end fields.~~
1. ~~(Optional) Find the tempo of the song.~~
1. ~~Split the timeline into 32nd-note steps at the song's tempo. If there is no tempo, split it into a fixed X steps/second.~~
1. Encode the note pitches as dummy variables denoting whether each pitch is playing at a given timestep.
1. ~~Remove the start/end fields from the data (we don't need them; the timing is encoded by position in the sequence).~~
1. (Optional) Do we need `note.velocity`? What does it mean for us?
1. Should we split this into 2-bar melody pieces?
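The notebook takes a shortcut via `get_piano_roll()` in the next cell, but the steps listed above can also be sketched directly on the `[start, end, pitch, velocity]` array. This is a minimal sketch under stated assumptions: the tempo `bpm` is fixed and supplied by hand (for a real file, `pm.estimate_tempo()` or the first value of `pm.get_tempo_changes()[1]` could provide it), `steps_per_quarter=8` gives 32nd-note resolution, and the 2-bar split assumes 4/4 time. Both helper names are hypothetical.

```python
import numpy as np

def quantized_roll(notes, bpm=120.0, steps_per_quarter=8):
    """Binary piano roll of shape (n_steps, 128) from an (n, 4) array of
    [start, end, pitch, velocity] rows, such as note_seq_np.

    Assumes a constant tempo; steps_per_quarter=8 means 32nd-note steps."""
    notes = np.asarray(notes, dtype=float)
    seconds_per_step = 60.0 / bpm / steps_per_quarter
    t0 = notes[:, 0].min()
    # Steps 1 and 3: shift so the first note starts at t=0, then snap to the grid
    on = np.round((notes[:, 0] - t0) / seconds_per_step).astype(int)
    off = np.round((notes[:, 1] - t0) / seconds_per_step).astype(int)
    off = np.maximum(off, on + 1)  # very short notes still occupy one step
    roll = np.zeros((off.max(), 128), dtype=np.int8)
    # Step 4: 1 while a pitch is sounding, 0 otherwise.
    # Step 5: start/end are now implicit in the row index.
    for start, end, pitch in zip(on, off, notes[:, 2].astype(int)):
        roll[start:end, pitch] = 1
    return roll

def split_bars(roll, bars=2, beats_per_bar=4, steps_per_quarter=8):
    """Cut the roll into non-overlapping chunks of `bars` bars (4/4 assumed),
    dropping any incomplete chunk at the end."""
    chunk = bars * beats_per_bar * steps_per_quarter  # 64 steps for 2 bars
    n_chunks = len(roll) // chunk
    return roll[:n_chunks * chunk].reshape(n_chunks, chunk, roll.shape[1])
```

For example, `split_bars(quantized_roll(note_seq_np, bpm=120))` would give an array of shape `(n_chunks, 64, 128)`, one 2-bar piece per entry. Rounding to the nearest step keeps the grid simple, at the cost of slightly shifting notes that were not played exactly on a 32nd-note boundary.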


In [None]:
columns = [pretty_midi.note_number_to_name(n) for n in range(128)]

def encode_dummies(instrument):
    """Cheat a little by transposing the instrument's piano roll, so rows are
    timesteps and columns are the 128 MIDI pitches.
    However, that leaves us with a lot of blank space (silent steps and unused pitches).
    """
    return pd.DataFrame(instrument.get_piano_roll().T, columns=columns)

encoded = encode_dummies(instrument)

plt.figure(figsize=(10, 3))
# get_piano_roll() uses 100 columns per second by default; rows are MIDI notes 0-127
display.specshow(encoded.T.values, sr=100, hop_length=1, x_axis='time', y_axis='cqt_note',
                 fmin=pretty_midi.note_number_to_hz(0), cmap=plt.cm.hot)
plt.title(song_name)
encoded.plot(legend=False)

# What are the numbers on the y axis? Is this the velocity? We could maybe ignore it and encode active notes as 1s
encoded.head()

# Let's fast-forward to where the action happens

In [None]:
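def trim_blanks(df):
    """Drop the leading rows (timesteps) in which no pitch is sounding."""
    active = (df != 0).any(axis=1).to_numpy()
    if not active.any():
        return df.iloc[0:0]  # nothing is ever played
    return df.iloc[active.argmax():]  # argmax finds the first active row

trimmed = trim_blanks(encoded)
plt.figure(figsize=(10, 3))
# Same axis setup as above: 100 steps per second, rows are MIDI notes 0-127
display.specshow(trimmed.T.values, sr=100, hop_length=1, x_axis='time', y_axis='cqt_note',
                 fmin=pretty_midi.note_number_to_hz(0), cmap=plt.cm.hot)
plt.title(song_name)

trimmed.head()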

# For inspection's sake, let's drop all columns that are all 0

I want to take a closer look at the nonzero values.

In [None]:
trimmed = trimmed.loc[:, (trimmed != 0).any(axis=0)]
trimmed.head()

# Note Velocity

The piano roll is encoded using note velocity at each step: a cell holds the velocity of the note sounding at that pitch and timestep, and 0 when the pitch is silent.
We should be able to simplify the scores by [replacing the velocity with a fixed number](http://electronicmusic.wikia.com/wiki/Velocity) that shows up clearly in the visualized output.

This might make the generated music sound more "robotic", but it simplifies the problem a lot.

Alternatively, we could scale the velocities to the range 0 to 1.
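Both options are one-liners on the piano roll. A minimal sketch, assuming the standard MIDI velocity range of 0 to 127; the clip is a precaution in case overlapping notes push a cell above 127 (as can happen when piano rolls are summed). The function names are illustrative only.

```python
import numpy as np

def binarize(roll, value=1):
    """Replace every nonzero velocity with a fixed value (default 1)."""
    return np.where(np.asarray(roll) > 0, value, 0)

def scale_velocity(roll, max_velocity=127):
    """Scale velocities into [0, 1]; MIDI velocities range from 0 to 127."""
    return np.clip(np.asarray(roll, dtype=float) / max_velocity, 0.0, 1.0)
```

On the DataFrame above, `(trimmed > 0).astype(int)` is the pandas equivalent of `binarize` and keeps the note-name columns. Binarizing throws away dynamics entirely, while scaling keeps them in a range that is friendlier to most models.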

# Reconstructing a MIDI file from an instrument in transposed piano roll form

Let's try to figure out how to do this.
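One possible starting point is to recover note events from the transposed roll by detecting where each pitch switches on and off. This is a sketch under stated assumptions: the roll still has all 128 pitch columns in MIDI order (i.e. before the all-zero columns were dropped), it was sampled at `fs=100` like `get_piano_roll()`'s default, and each run of consecutive nonzero steps counts as one note, so two repeated notes with no gap between them merge into a single note. `roll_to_notes` is a hypothetical helper.

```python
import numpy as np

def roll_to_notes(roll, fs=100):
    """Turn a transposed piano roll (timesteps x 128 pitches, velocity values)
    into a sorted list of (start, end, pitch, velocity) tuples in seconds."""
    roll = np.asarray(roll)
    # Pad one silent timestep before and after so every note has both edges
    active = np.pad(roll > 0, ((1, 1), (0, 0)))
    notes = []
    for pitch in range(roll.shape[1]):
        # Nonzero differences mark onsets (0 -> 1) and offsets (1 -> 0), alternating
        edges = np.flatnonzero(np.diff(active[:, pitch].astype(np.int8)))
        for on, off in zip(edges[::2], edges[1::2]):
            # Thanks to the padding, `on` is the note's first step and `off` is one past its last
            velocity = int(roll[on, pitch])
            notes.append((float(on) / fs, float(off) / fs, pitch, velocity))
    return sorted(notes)
```

Each tuple could then become a `pretty_midi.Note(velocity=..., pitch=..., start=..., end=...)`, appended to the `notes` list of a `pretty_midi.Instrument`, which is added to a fresh `pretty_midi.PrettyMIDI()` object and saved with its `write()` method. Applied to `encoded.values`, this should roughly reproduce the chosen instrument's notes, up to the 10 ms time resolution.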

# Attribution and References

- Colin Raffel. "Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching". [PhD Thesis, 2016](http://colinraffel.com/publications/thesis.pdf).

- Colin Raffel and Daniel P. W. Ellis. [Intuitive Analysis, Creation and Manipulation of MIDI Data with pretty_midi](http://colinraffel.com/publications/ismir2014intuitive.pdf). In Proceedings of the 15th International Conference on Music Information Retrieval Late Breaking and Demo Papers, 2014.

- [Librosa](https://doi.org/10.5281/zenodo.591533): Audio and music signal analysis in Python.
