<a href="https://colab.research.google.com/github/asampat3090/musicalai/blob/dev/_3_MIDI_representation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MIDI Representations

As musicians working in a digital audio workstation (DAW) such as [Ableton Live](https://www.ableton.com/en/live/), FL Studio, GarageBand, Sony ACID or Logic Pro, we often work with raw audio tracks, but we also work with MIDI tracks. A MIDI track stores no sound itself: it stores discrete note events (which note, how hard, and when), and an instrument or sample is attached to turn those notes into sound. We can represent percussion, melodies and everything in between with these notes. MIDI was also one of the first places where automation and music generation made their mark. Let's walk through what a MIDI file looks like.



In [None]:
!pip install mido

Collecting mido
  Downloading mido-1.3.0-py3-none-any.whl (50 kB)
Installing collected packages: mido
Successfully installed mido-1.3.0


In [None]:
# Adapted from tutorial: https://www.twilio.com/blog/working-with-midi-data-in-python-using-mido

from mido import MidiFile

mid = MidiFile('transcription.mid', clip=True)
print(mid)

In [None]:
# for msg in mid.tracks[0]:
#     print(msg)
print(len(mid.tracks[0]))

1175


A MIDI file can have multiple tracks. We start with a simple single-track MIDI file and examine its messages. From the output above we can see that this track has 1175 messages. A message can be a note-on or note-off event, but it can also be a meta message (such as a tempo setting) or a control message (such as a program change). Let's take a look at the first message.

In [None]:
print(mid.tracks[0][0])

MetaMessage('set_tempo', tempo=500000, time=0)


This is a meta message at time 0 that sets the tempo to 500000 microseconds per quarter note. This translates to 60000000 / 500000 = 120 quarter notes per minute, or equivalently 120 BPM (beats per minute), which is also the default tempo in many DAWs.
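The arithmetic above can be sketched as a pair of small helpers. The names `tempo_to_bpm` and `bpm_to_tempo` are illustrative and not part of the notebook; mido also provides `tempo2bpm` and `bpm2tempo`, which should give the same result when the beat is a quarter note.

```python
MICROSECONDS_PER_MINUTE = 60_000_000


def tempo_to_bpm(tempo_us_per_quarter):
    """Convert a MIDI tempo (microseconds per quarter note) to quarter notes per minute."""
    return MICROSECONDS_PER_MINUTE / tempo_us_per_quarter


def bpm_to_tempo(bpm):
    """Convert quarter notes per minute back to a MIDI tempo in microseconds per quarter note."""
    return round(MICROSECONDS_PER_MINUTE / bpm)


print(tempo_to_bpm(500000))  # 120.0
print(bpm_to_tempo(120))     # 500000
```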

Other messages follow and can carry many kinds of actions; the MIDI Association publishes a full list: https://www.midi.org/specifications-old/item/table-1-summary-of-midi-message. Let's look at the next one.

In [None]:
print(mid.tracks[0][1])

program_change channel=0 program=0 time=0


A program change selects the instrument (the "program", or patch) for a channel. Here channel 0 is set to program 0 at time 0, which in General MIDI is the Acoustic Grand Piano; this is expected at the start of a song. Next come the notes themselves: the "Note On" and "Note Off" messages.

In [None]:
print(mid.tracks[0][3])

note_on channel=0 note=67 velocity=61 time=0


This has a few key attributes:
* `channel` lets multiple instruments play at the same time (i.e. messages on different channels). This is a simple piano piece, so only one channel is used.
* `note` is an integer from 0 to 127, where each value corresponds to a note and frequency: [full table here](https://www.inspiredacoustics.com/en/MIDI_note_numbers_and_center_frequencies).
* `velocity` is an integer from 0 to 127 representing how hard the note is played, which usually maps to its volume (0 is silent, 127 is loudest).
* `time` is an integer number of "ticks" elapsed since the previous message in the track (a delta time), and it keeps notes synchronized across tracks. To convert ticks to seconds, use seconds = ticks × tempo / (ticks per quarter note × 1,000,000), where tempo is in microseconds per quarter note. See http://midi.teragonaudio.com/tech/midifile/ppqn.htm
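To make the `note` and `time` attributes more concrete, here is a minimal sketch in plain Python. All function names are illustrative, the resolution of 480 ticks per quarter note is an assumed example value (mido exposes the real one as `mid.ticks_per_beat`), and the tempo is assumed to stay constant for the whole track. Note names follow the common convention that note 60 is C4, and frequencies assume equal temperament with A4 = 440 Hz.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(note):
    """Name a MIDI note number, using the convention that note 60 is C4."""
    octave = note // 12 - 1
    return f"{NOTE_NAMES[note % 12]}{octave}"


def note_frequency(note, a4_hz=440.0):
    """Equal-temperament frequency in Hz, with note 69 (A4) tuned to a4_hz."""
    return a4_hz * 2 ** ((note - 69) / 12)


def ticks_to_seconds(ticks, ticks_per_beat, tempo=500000):
    """Convert ticks to seconds for a fixed tempo (microseconds per quarter note)."""
    return ticks * tempo / (ticks_per_beat * 1_000_000)


def absolute_seconds(delta_ticks, ticks_per_beat, tempo=500000):
    """Turn per-message delta times (in ticks) into running timestamps in seconds."""
    elapsed = 0
    times = []
    for delta in delta_ticks:
        elapsed += delta
        times.append(ticks_to_seconds(elapsed, ticks_per_beat, tempo))
    return times


print(note_name(67), round(note_frequency(67), 2))               # G4 392.0
print(ticks_to_seconds(480, ticks_per_beat=480))                 # 0.5
print(absolute_seconds([0, 240, 240, 480], ticks_per_beat=480))  # [0.0, 0.25, 0.5, 1.0]
```

If the file contains several `set_tempo` messages, the conversion has to be applied piecewise with the tempo in effect at each point, which this sketch does not attempt.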

### Converting Audio to MIDI

Converting audio to MIDI is an imperfect process, and many tools have been created over the years to address it. The wisdom of the crowd helps here: see this [Reddit post](https://www.reddit.com/r/learnpython/comments/12kiu31/recommended_python_library_for_converting_audio/). The ones that worked best for me were these:

* Magenta (from Google), which is neural-network based: https://piano-scribe.glitch.me/
* Basic Pitch (from Spotify): https://github.com/spotify/basic-pitch

The goal of conversion is to create robust MIDI datasets that we can use to train a simple model to predict the next notes. In some ways, we want to create a simple version of [Magenta Studio](https://magenta.tensorflow.org/studio/).

In [None]:
# Code converting audio files (.wav, .mp3, etc) to .mid files

# MIDI Representations for Modeling

If we look at some of the core papers on MIDI generation, we have:

* MidiNet - https://arxiv.org/pdf/1703.10847.pdf

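At some level, all of these split the MIDI files into discrete bars and time steps. Typically one bar has 4 beats (quarter notes), each subdivided into 4 sixteenth notes, which gives 16 time steps per bar. Each bar can then be written as a binary matrix

X ∈ {0, 1}^(h×w)

where h is the number of MIDI notes considered and w is the number of time steps in a bar (16 here). An entry is 1 when that note sounds at that time step and 0 otherwise.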
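As a rough illustration of this representation, the sketch below builds such a matrix for a single bar using plain Python lists. The function name, the `(pitch, start_tick, end_tick)` note format and the default sizes (h = 128 pitches, w = 16 steps) are assumptions made for this example, not the exact preprocessing used in the paper.

```python
import math


def bar_to_piano_roll(notes, ticks_per_beat, n_pitches=128,
                      steps_per_beat=4, beats_per_bar=4):
    """Build a binary piano roll (n_pitches rows, one column per time step) for one bar.

    notes: iterable of (pitch, start_tick, end_tick), with ticks measured
    from the start of the bar.
    """
    n_steps = steps_per_beat * beats_per_bar
    ticks_per_step = ticks_per_beat / steps_per_beat
    roll = [[0] * n_steps for _ in range(n_pitches)]
    for pitch, start_tick, end_tick in notes:
        first = int(start_tick // ticks_per_step)
        # Round the end up so a note that spills into a step still marks it,
        # and make sure every note covers at least one step.
        last = max(first + 1, math.ceil(end_tick / ticks_per_step))
        for step in range(first, min(last, n_steps)):
            roll[pitch][step] = 1
    return roll


# One quarter-note G4 followed by an eighth-note C4, at 480 ticks per quarter note.
roll = bar_to_piano_roll([(67, 0, 480), (60, 480, 720)], ticks_per_beat=480)
print(len(roll), len(roll[0]))  # 128 16
print(roll[67])                 # ones in the first four steps
print(roll[60])                 # ones in steps 4 and 5
```

A binary roll like this discards velocity and cannot distinguish one long note from repeated short ones, which is a trade-off of keeping the representation simple.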

### Training a Simple Model

Let's start by training a simple model on some well-known datasets. To keep the task simple, we focus on single-instrument music. Here are some datasets:

* GiantMIDI-Piano: https://paperswithcode.com/dataset/giantmidi-piano
* Datasets used by Magenta (Google's MIDI-based generative models): https://magenta.tensorflow.org/datasets/
* Lakh MIDI Dataset: https://paperswithcode.com/dataset/lakh-midi-dataset
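Whichever dataset is chosen, a next-note model needs (context, target) pairs to train on. The sketch below shows one simple way to build them from a sequence of MIDI note numbers with a sliding window, along with a basic train/validation split. The function names, the context length and the split fraction are all illustrative assumptions.

```python
def next_note_pairs(notes, context_len=4):
    """Slide a window over a note sequence to get (context, next_note) pairs."""
    pairs = []
    for i in range(len(notes) - context_len):
        context = notes[i:i + context_len]
        target = notes[i + context_len]
        pairs.append((context, target))
    return pairs


def train_val_split(pairs, val_fraction=0.2):
    """Hold out the last val_fraction of the pairs for validation."""
    n_val = int(len(pairs) * val_fraction)
    split = len(pairs) - n_val
    return pairs[:split], pairs[split:]


# A C major scale as MIDI note numbers.
melody = [60, 62, 64, 65, 67, 69, 71, 72]
pairs = next_note_pairs(melody, context_len=4)
train, val = train_val_split(pairs, val_fraction=0.25)
print(pairs[0])              # ([60, 62, 64, 65], 67)
print(len(train), len(val))  # 3 1
```

In practice the split is usually done per piece rather than per window, so that overlapping windows from the same piece do not end up in both sets.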

In [None]:
# setup google drive mount
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
data_path = '/content/drive/MyDrive/Colab Notebooks/train-data/bach-midi-dataset'

In [None]:
!tar -xvf '/content/drive/MyDrive/Colab Notebooks/train-data/bach-midi-dataset/bach-doodle.tfrecord.tar.gz' -C '/content/drive/MyDrive/Colab Notebooks/train-data/bach-midi-dataset/'

['bach-doodle.tfrecord-00000-of-00192',
 'bach-doodle.tfrecord-00001-of-00192',
 'bach-doodle.tfrecord-00002-of-00192',
 'bach-doodle.tfrecord-00003-of-00192',
 'bach-doodle.tfrecord-00004-of-00192',
 'bach-doodle.tfrecord-00005-of-00192',
 'bach-doodle.tfrecord-00006-of-00192',
 'bach-doodle.tfrecord-00007-of-00192',
 'bach-doodle.tfrecord-00008-of-00192',
 'bach-doodle.tfrecord-00009-of-00192',
 'bach-doodle.tfrecord-00010-of-00192',
 'bach-doodle.tfrecord-00011-of-00192',
 'bach-doodle.tfrecord-00012-of-00192',
 'bach-doodle.tfrecord-00013-of-00192',
 'bach-doodle.tfrecord-00014-of-00192',
 'bach-doodle.tfrecord-00015-of-00192',
 'bach-doodle.tfrecord-00016-of-00192',
 'bach-doodle.tfrecord-00017-of-00192',
 'bach-doodle.tfrecord-00018-of-00192',
 'bach-doodle.tfrecord-00019-of-00192',
 'bach-doodle.tfrecord-00020-of-00192',
 'bach-doodle.tfrecord-00021-of-00192',
 'bach-doodle.tfrecord-00022-of-00192',
 'bach-doodle.tfrecord-00023-of-00192',
 'bach-doodle.tfrecord-00024-of-00192',


In [None]:
# Test loading data for a simple MIDI prediction model

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# Load the data
# Bach Doodle dataset from Magenta: https://magenta.tensorflow.org/datasets/bach-doodle#download
from torchdata.datapipes.iter import FileLister, FileOpener

# List every TFRecord shard under data_path
datapipe1 = FileLister(data_path, "*.tfrecord-*")
print(len(list(datapipe1)))

# Open each shard in binary mode and parse it as TFRecord examples
datapipe2 = FileOpener(datapipe1, mode="b")
tfrecord_loader_dp = datapipe2.load_from_tfrecord()

# Grab the first example to inspect its fields
example_1 = next(iter(tfrecord_loader_dp))

# TODO: split data between train / val

192


In [None]:
# define the model
# model = nn.Sequential(
#     nn.LayerNorm(),
#     nn.MultiheadAttention(5,5),
#     nn.LayerNorm()
# )

print(example_1.keys())
print(example_1['backend'])

# train the model

# infer a forward pass - see result

dict_keys(['session_id', 'backend', 'request_id', 'input_sequence', 'country', 'output_sequence', 'composition_time', 'key_sig', 'feedback', 'loops_listened'])
[[b'l']]
