# Chapter 7: Compose

In this notebook, we will use a stacked LSTM network with an _attention mechanism_ to generate new music.

Attention mechanisms were originally developed for machine translation. In a standard encoder-decoder network, the input sequence is encoded as a single context vector, which can become a bottleneck on how much information the model retains about the input. Attention mechanisms address this problem by replacing the single context vector with a weighted sum of the hidden states that the RNN cell produces at each timestep.

Each hidden state is passed through a dense layer, known as the _alignment function_, and a softmax activation turns the resulting scores into weights. We then multiply the hidden state at each timestep by its weight and sum the results over all steps of the sequence.
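
The weighted sum described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the model built in this notebook: the function names, the single-unit linear scoring layer, and the random hidden states are all hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(hidden_states, w, b=0.0):
    """Weighted sum of hidden states.

    hidden_states: array of shape (timesteps, units)
    w, b: weights of a hypothetical one-unit dense scoring layer
    """
    scores = hidden_states @ w + b       # one score per timestep
    weights = softmax(scores)            # normalized over the timesteps
    context = weights @ hidden_states    # shape (units,)
    return context, weights

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(5, 4))  # 5 timesteps, 4 hidden units
w = rng.normal(size=4)

context, weights = attention_context(hidden_states, w)
print(weights.shape, context.shape)
```

The weights sum to 1, so the context vector is a convex combination of the hidden states rather than only the final one.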

## Preparing the Training Data

First, we download the MIDI files used to train the model.

In [0]:
!mkdir -p data

In [0]:
import subprocess

music_filenames = ['cs1-2all.mid',
                   'cs5-1pre.mid',
                   'cs4-1pre.mid',
                   'cs3-5bou.mid',
                   'cs1-4sar.mid',
                   'cs2-5men.mid',
                   'cs3-3cou.mid',
                   'cs2-3cou.mid',
                   'cs1-6gig.mid',
                   'cs6-4sar.mid',
                   'cs4-5bou.mid',
                   'cs4-3cou.mid',
                   'cs5-3cou.mid',
                   'cs6-5gav.mid',
                   'cs6-6gig.mid',
                   'cs2-1pre.mid',
                   'cs3-1pre.mid',
                   'cs3-6gig.mid',
                   'cs2-6gig.mid',
                   'cs2-4sar.mid',
                   'cs3-4sar.mid',
                   'cs1-5men.mid',
                   'cs1-3cou.mid',
                   'cs6-1pre.mid',
                   'cs2-2all.mid',
                   'cs3-2all.mid',
                   'cs1-1pre.mid',
                   'cs5-2all.mid',
                   'cs4-2all.mid',
                   'cs5-5gav.mid',
                   'cs4-6gig.mid',
                   'cs5-6gig.mid',
                   'cs5-4sar.mid',
                   'cs4-4sar.mid',
                   'cs6-3cou.mid']
url_prefix = 'http://www.jsbach.net/midi/'
data_dir = 'data/'

In [4]:
for fname in music_filenames:
  print('Downloading ', fname)
  subprocess.call(['wget', url_prefix + fname])
  subprocess.call(['mv', fname, data_dir])

Downloading  cs1-2all.mid
Downloading  cs5-1pre.mid
Downloading  cs4-1pre.mid
Downloading  cs3-5bou.mid
Downloading  cs1-4sar.mid
Downloading  cs2-5men.mid
Downloading  cs3-3cou.mid
Downloading  cs2-3cou.mid
Downloading  cs1-6gig.mid
Downloading  cs6-4sar.mid
Downloading  cs4-5bou.mid
Downloading  cs4-3cou.mid
Downloading  cs5-3cou.mid
Downloading  cs6-5gav.mid
Downloading  cs6-6gig.mid
Downloading  cs2-1pre.mid
Downloading  cs3-1pre.mid
Downloading  cs3-6gig.mid
Downloading  cs2-6gig.mid
Downloading  cs2-4sar.mid
Downloading  cs3-4sar.mid
Downloading  cs1-5men.mid
Downloading  cs1-3cou.mid
Downloading  cs6-1pre.mid
Downloading  cs2-2all.mid
Downloading  cs3-2all.mid
Downloading  cs1-1pre.mid
Downloading  cs5-2all.mid
Downloading  cs4-2all.mid
Downloading  cs5-5gav.mid
Downloading  cs4-6gig.mid
Downloading  cs5-6gig.mid
Downloading  cs5-4sar.mid
Downloading  cs4-4sar.mid
Downloading  cs6-3cou.mid
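
Outside a notebook environment where `wget` and `mv` are available, the same download step could be written with the standard library alone. The sketch below is one possible variant, not the code used to produce the output above; `target_path` and `download_all` are hypothetical helper names, and the function is defined but not called here.

```python
import os
import urllib.request

def target_path(fname, data_dir='data/'):
    """Local path where a downloaded file will be stored."""
    return os.path.join(data_dir, fname)

def download_all(filenames, url_prefix, data_dir='data/'):
    """Fetch each file into data_dir, skipping files already present."""
    os.makedirs(data_dir, exist_ok=True)
    for fname in filenames:
        dest = target_path(fname, data_dir)
        if os.path.exists(dest):
            continue  # already downloaded
        print('Downloading ', fname)
        urllib.request.urlretrieve(url_prefix + fname, dest)
```

With the variables defined earlier, it would be invoked as `download_all(music_filenames, url_prefix, data_dir)`.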


Then we parse each file into a list of notes and a list of durations. A chord is stored as its pitch names joined with a period, for example `G2.D3.B3`.

In [5]:
from music21 import converter, chord, note

notes = []
durations = []

for fname in music_filenames:
  print('Parsing ', fname)
  original_score = converter.parse(data_dir + fname).chordify()
  for el in original_score.flat:
    if isinstance(el, chord.Chord):
      notes.append('.'.join(n.nameWithOctave for n in el.pitches))
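    elif isinstance(el, note.Note):
      # Append a single token: the bare name for a rest, else name plus octave.
      notes.append(str(el.name) if el.isRest else el.nameWithOctave)

    # Runs for every element in the stream, not only chords and notes.
    durations.append(el.duration.quarterLength)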

Parsing  cs1-2all.mid
Parsing  cs5-1pre.mid
Parsing  cs4-1pre.mid
Parsing  cs3-5bou.mid
Parsing  cs1-4sar.mid
Parsing  cs2-5men.mid
Parsing  cs3-3cou.mid
Parsing  cs2-3cou.mid
Parsing  cs1-6gig.mid
Parsing  cs6-4sar.mid
Parsing  cs4-5bou.mid
Parsing  cs4-3cou.mid
Parsing  cs5-3cou.mid
Parsing  cs6-5gav.mid
Parsing  cs6-6gig.mid
Parsing  cs2-1pre.mid
Parsing  cs3-1pre.mid
Parsing  cs3-6gig.mid
Parsing  cs2-6gig.mid
Parsing  cs2-4sar.mid
Parsing  cs3-4sar.mid
Parsing  cs1-5men.mid
Parsing  cs1-3cou.mid
Parsing  cs6-1pre.mid
Parsing  cs2-2all.mid
Parsing  cs3-2all.mid
Parsing  cs1-1pre.mid
Parsing  cs5-2all.mid
Parsing  cs4-2all.mid
Parsing  cs5-5gav.mid
Parsing  cs4-6gig.mid
Parsing  cs5-6gig.mid
Parsing  cs5-4sar.mid
Parsing  cs4-4sar.mid
Parsing  cs6-3cou.mid


In [6]:
notes[:10]

['B3', 'G2.D3.B3', 'B3', 'A3', 'G3', 'F#3', 'G3', 'D3', 'E3', 'F#3']
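
The token format visible in this output can be illustrated without music21. The two helpers below are hypothetical names introduced only for this sketch; they show how a chord's pitch names map to a single string token and back.

```python
def encode_chord(pitch_names):
    """Join pitch names into one token, e.g. ['G2', 'D3'] -> 'G2.D3'."""
    return '.'.join(pitch_names)

def decode_token(token):
    """Split a token back into its pitch names."""
    return token.split('.')

token = encode_chord(['G2', 'D3', 'B3'])
print(token)                # G2.D3.B3
print(decode_token(token))  # ['G2', 'D3', 'B3']
print(decode_token('B3'))   # ['B3']
```

A single pitch and a chord are therefore both plain strings, which lets one lookup table cover both.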

In [7]:
durations[:10]

[0.0, 0.0, 0.0, 0.0, 3.75, 0.0, 0.25, 1.0, 0.25, 0.25]

In [0]:
def get_distinct(elems):
  """Get all distinct elements in a list, sorted."""
  result = sorted(set(elems))
  return result, len(result)

def create_lookup_tables(elems):
  """Generate tokenization lookup table (and inverse table)."""
  return ({e: i for i, e in enumerate(elems)},
          {i: e for i, e in enumerate(elems)})

In [0]:
note_names, n_notes = get_distinct(notes)
duration_names, n_durations = get_distinct(durations)

note_to_int, int_to_note = create_lookup_tables(note_names)
duration_to_int, int_to_duration = create_lookup_tables(duration_names)
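
To see what these tables contain, here is a small round trip on a toy list. The helper definitions are repeated so the snippet runs on its own; `toy_notes` and the other `toy_`-free variable names below are made up for illustration.

```python
def get_distinct(elems):
    """Get all distinct elements in a list, sorted."""
    result = sorted(set(elems))
    return result, len(result)

def create_lookup_tables(elems):
    """Generate tokenization lookup table (and inverse table)."""
    return ({e: i for i, e in enumerate(elems)},
            {i: e for i, e in enumerate(elems)})

toy_notes = ['B3', 'G2.D3.B3', 'B3', 'A3']
names, n_names = get_distinct(toy_notes)
to_int, to_name = create_lookup_tables(names)

tokens = [to_int[n] for n in toy_notes]
print(names)   # ['A3', 'B3', 'G2.D3.B3']
print(tokens)  # [1, 2, 1, 0]

# Mapping the integers back recovers the original list.
restored = [to_name[t] for t in tokens]
print(restored == toy_notes)  # True
```

Because the distinct values are sorted before enumeration, the integer assigned to each token is deterministic across runs.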

In [0]:
# Colab-only magic: select TensorFlow 1.x.
%tensorflow_version 1.x
from keras.utils import np_utils
import numpy as np

SEQUENCE_LENGTH = 32

X_notes, X_durations = [], []
y_notes, y_durations = [], []

for i in range(len(notes) - SEQUENCE_LENGTH):
  # Inputs: SEQUENCE_LENGTH consecutive tokens. Targets: the token that follows.
  X_notes.append([note_to_int[n]
                  for n in notes[i:i + SEQUENCE_LENGTH]])
  X_durations.append([duration_to_int[d]
                      for d in durations[i:i + SEQUENCE_LENGTH]])
  
  y_notes.append(note_to_int[notes[i + SEQUENCE_LENGTH]])
  y_durations.append(duration_to_int[durations[i + SEQUENCE_LENGTH]])

n_patterns = len(X_notes)

X_notes = np.reshape(X_notes, (n_patterns, SEQUENCE_LENGTH))
X_durations = np.reshape(X_durations, (n_patterns, SEQUENCE_LENGTH))
X_train = [X_notes, X_durations]

y_notes = np_utils.to_categorical(y_notes, num_classes=n_notes)
y_durations = np_utils.to_categorical(y_durations, num_classes=n_durations)
y_train = [y_notes, y_durations]
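
The windowing performed in this cell is easier to follow on a tiny sequence. The sketch below is a simplified stand-in, not the cell's code: `make_windows` is a hypothetical helper, the sequence length is 3 instead of 32, and `np.eye` replaces `np_utils.to_categorical` so the example needs only NumPy.

```python
import numpy as np

def make_windows(tokens, seq_len):
    """Pair each run of seq_len tokens with the token that follows it."""
    n_patterns = len(tokens) - seq_len
    X = [tokens[i:i + seq_len] for i in range(n_patterns)]
    y = [tokens[i + seq_len] for i in range(n_patterns)]
    return np.array(X), np.array(y)

tokens = [0, 1, 2, 3, 4, 5]
X, y = make_windows(tokens, seq_len=3)
print(X.tolist())  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(y.tolist())  # [3, 4, 5]

# One-hot encode the targets over 6 classes.
y_onehot = np.eye(6)[y]
print(X.shape, y_onehot.shape)  # (3, 3) (3, 6)
```

A sequence of length N yields N minus `seq_len` training patterns, which matches the `range(len(notes) - SEQUENCE_LENGTH)` loop bound above.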

In [15]:
X_train[0][0]

array([ 77, 340,  77,  27, 371, 260, 371, 168, 237, 260, 371,  27,  77,
       144, 196,  77, 371, 260, 371, 237, 168, 124,  61, 124, 168, 237,
       260, 371,  27,  77, 144,  27])

In [16]:
X_train[1][0]

array([ 0,  0,  0,  0, 20,  0,  3,  8,  3,  3,  3,  3,  3,  3,  3,  3,  3,
        3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3])

In [17]:
y_train[0][0]

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0.

In [18]:
y_train[1][0]

array([0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0.], dtype=float32)
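
A one-hot target like this one can be turned back into a class index with `np.argmax`. The snippet rebuilds the 23-element vector shown above by hand; `demo_int_to_duration` is a hypothetical one-entry stand-in for the real `int_to_duration` table, with the pairing of token 3 and 0.25 read off the earlier `durations[:10]` and `X_train[1][0]` outputs.

```python
import numpy as np

# Rebuild the one-hot duration target shown above: 23 classes, class 3 set.
one_hot = np.zeros(23, dtype=np.float32)
one_hot[3] = 1.0

index = int(np.argmax(one_hot))
print(index)  # 3

# Hypothetical stand-in for the notebook's int_to_duration table.
demo_int_to_duration = {3: 0.25}
print(demo_int_to_duration[index])  # 0.25
```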