# Chapter 7: Compose

## MuseGAN

_MuseGAN_ is a generative machine learning model that composes new samples of _polyphonic_ music, i.e. music in which several notes sound at once, spread across multiple tracks. This is in contrast to the LSTM with an attention mechanism in `AttentionMechanism.ipynb`, which can only generate _monophonic_ music.

Like other GANs, MuseGAN, which was introduced in [this paper](https://arxiv.org/abs/1709.06298), consists of a pair of convolutional neural networks: a generator and a critic. The MuseGAN built here generates two new bars of chorale music and is trained on chorales composed by Bach.

### MuseGAN Generator

### MuseGAN Critic

## Data Preparation

### Downloading the Data

Here I download the training data from [this GitHub repository](https://github.com/czhuang/JSB-Chorales-dataset). For this notebook I use a fork of that repository, taken at the time of writing.

In [1]:
!git clone https://github.com/DCtheTall/JSB-Chorales-dataset

Cloning into 'JSB-Chorales-dataset'...
remote: Enumerating objects: 36, done.
remote: Total 36 (delta 0), reused 0 (delta 0), pack-reused 36
Unpacking objects: 100% (36/36), done.


### Preprocessing the Data

The function below returns three values: the one-hot encoded data, the integer pitch data as a 4D tensor of shape `[n_songs, n_bars, n_steps_per_bar, n_tracks]`, and the raw data. The one-hot encoding turns the 4D tensor into a 5D tensor of shape `[n_songs, n_bars, n_steps_per_bar, n_notes, n_tracks]`, in which inactive pitches are marked with -1 rather than 0.
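
To make the encoding concrete, here is a minimal sketch of the same steps on a toy array. The pitch range of 0 to 3 and the variable names `toy`, `rest_class` and `one_hot` are assumptions made for illustration; the notebook itself uses a maximum pitch of 83.

```python
import numpy as np

# Toy input: 1 song, 1 bar, 2 steps, 2 tracks. NaN marks a rest.
max_note = 3  # hypothetical small pitch range (0..3)
toy = np.array([[[[0.0, 3.0], [np.nan, 1.0]]]])

# Map rests to an extra class just above the highest pitch.
rest_class = max_note + 1
ints = toy.copy()
ints[np.isnan(ints)] = rest_class
ints = ints.astype(int)

one_hot = np.eye(rest_class + 1)[ints]         # shape (1, 1, 2, 2, 5)
one_hot[one_hot == 0] = -1                     # use -1 instead of 0
one_hot = np.delete(one_hot, rest_class, -1)   # drop the rest class
one_hot = one_hot.transpose((0, 1, 2, 4, 3))   # pitches before tracks

print(one_hot.shape)           # (1, 1, 2, 4, 2)
print(one_hot[0, 0, 0, :, 0])  # pitch 0 on track 0
print(one_hot[0, 0, 1, :, 0])  # rest on track 0: every entry is -1
```

Because the rest class is deleted after encoding, a rest ends up as a column of all -1s rather than as a class of its own.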

In [0]:
import numpy as np

def load_music(filename, n_bars, n_steps_per_bar):
  """Load the training data into memory and preprocess it."""
  with np.load(filename, encoding='bytes', allow_pickle=True) as f:
    data = f['train']

  data_ints = []
  timesteps = n_bars * n_steps_per_bar
  
  for x in data:
    counter = 0
    while np.any(np.isnan(x[counter:(counter + 4)])):
      counter += 4
    # Keep only songs long enough to supply a full window after the skip.
    if counter + timesteps <= x.shape[0]:
      data_ints.append(x[counter:(counter + timesteps), :])

  data_ints = np.array(data_ints)
  n_songs, _, n_tracks = data_ints.shape
  
  data_ints = data_ints.reshape((n_songs, n_bars, n_steps_per_bar, n_tracks))

  max_note = 83
  where_nans = np.isnan(data_ints)
  data_ints[where_nans] = max_note + 1
  max_note += 1

  data_ints = data_ints.astype(int)  # np.int was removed in NumPy 1.24
  n_classes = max_note + 1

  data_binary = np.eye(n_classes)[data_ints]  # One-hot encode the pitches
  data_binary[data_binary == 0] = -1  # Replace 0s with -1s.
  # Remove the extra class that marks rests (NaN in the raw data). A rest is
  # then represented by a row of all -1's.
  data_binary = np.delete(data_binary, max_note, -1)
  data_binary = data_binary.transpose((0, 1, 2, 4, 3))

  return np.squeeze(data_binary), data_ints, data

In [0]:
N_BARS = 2
N_STEPS_PER_BAR = 16

data_binary, data_ints, data = load_music(
    'JSB-Chorales-dataset/Jsb16thSeparated.npz', N_BARS, N_STEPS_PER_BAR)
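
The windowing inside `load_music` can be checked without the dataset. The sketch below runs the same skip-and-slice logic on a synthetic song; the array `song` and its length of 40 timesteps are made up for illustration and are not taken from the JSB Chorales data.

```python
import numpy as np

n_bars, n_steps_per_bar = 2, 16
timesteps = n_bars * n_steps_per_bar

# Synthetic "song": 40 timesteps, 4 tracks, with NaNs in the first 4 rows.
song = np.arange(40 * 4, dtype=float).reshape(40, 4)
song[:4, 0] = np.nan

# Skip ahead 4 timesteps at a time while the next 4 rows contain a NaN.
counter = 0
while np.any(np.isnan(song[counter:counter + 4])):
    counter += 4

# Take a fixed-length window and split it into bars.
window = song[counter:counter + timesteps]
bars = window.reshape(n_bars, n_steps_per_bar, -1)

print(counter)       # 4
print(window.shape)  # (32, 4)
print(bars.shape)    # (2, 16, 4)
```

With `N_BARS = 2` and `N_STEPS_PER_BAR = 16`, each song therefore contributes one window of 32 timesteps across its tracks.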

## Implementing MuseGAN

In [0]:
%tensorflow_version 1.x
from tensorflow.keras.layers import (Input, Conv2DTranspose, BatchNormalization,
                                     Activation, LeakyReLU, Reshape, Lambda,
                                     Dense, Concatenate)
from tensorflow.keras.models import Model
from tensorflow.keras.initializers import RandomNormal


def conv_t(x, filters, kernel_size, strides, padding,
           kernel_initializer='glorot_uniform', batch_norm_momentum=None,
           activation='relu'):
  """Applies a transposed convolution, optional batch norm and an activation."""
  x = Conv2DTranspose(filters=filters, kernel_size=kernel_size, strides=strides,
                      padding=padding, kernel_initializer=kernel_initializer)(x)
  if batch_norm_momentum:
    x = BatchNormalization(momentum=batch_norm_momentum)(x)
  return LeakyReLU()(x) if activation == 'lrelu' else Activation(activation)(x)


def TemporalNetwork(z_dim, n_bars, kernel_initializer, name):
  """Build a Temporal Network which generates n_bars vectors of shape z_dim."""
  input_layer = Input(shape=(z_dim,))
  x = Reshape([1, 1, z_dim])(input_layer)
  x = conv_t(x, filters=1024, kernel_size=(2, 1), strides=(1, 1),
             activation='relu', batch_norm_momentum=0.9, padding='valid',
             kernel_initializer=kernel_initializer)
  x = conv_t(x, filters=z_dim, kernel_size=(n_bars - 1, 1), strides=(1, 1),
             activation='relu', batch_norm_momentum=0.9, padding='valid',
             kernel_initializer=kernel_initializer)
  output_layer = Reshape([n_bars, z_dim])(x)
  return Model(input_layer, output_layer, name=name)


def BarGenerator(z_dim, n_steps_per_bar, n_pitches, kernel_initializer):
  """Bar Generator expands the time and pitch dimensions of the input."""
  input_layer = Input(shape=(4 * z_dim,))
  x = Dense(1024)(input_layer)
  x = BatchNormalization(momentum=0.9)(x)
  x = Activation('relu')(x)
  x = Reshape([2, 1, 512])(x)
  x = conv_t(x, filters=512, kernel_size=(2, 1), strides=(2, 1), padding='same',
             activation='relu', batch_norm_momentum=0.9,
             kernel_initializer=kernel_initializer)
  x = conv_t(x, filters=256, kernel_size=(2, 1), strides=(2, 1), padding='same',
             activation='relu', batch_norm_momentum=0.9,
             kernel_initializer=kernel_initializer)
  x = conv_t(x, filters=256, kernel_size=(2, 1), strides=(2, 1), padding='same',
             activation='relu', batch_norm_momentum=0.9,
             kernel_initializer=kernel_initializer)
  x = conv_t(x, filters=256, kernel_size=(1, 7), strides=(1, 7), padding='same',
             activation='relu', batch_norm_momentum=0.9,
             kernel_initializer=kernel_initializer)
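  # The final layer uses tanh so that outputs lie in [-1, 1], matching the
  # -1/1 encoding of the training data (ReLU cannot produce negative values).
  x = conv_t(x, filters=1, kernel_size=(1, 12), strides=(1, 12),
             padding='same', activation='tanh',
             kernel_initializer=kernel_initializer)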
  output_layer = Reshape([1, n_steps_per_bar, n_pitches, 1])(x)
  return Model(input_layer, output_layer)


class MuseGAN(object):
  """Implementation of MuseGAN with Keras and TensorFlow."""

  def __init__(self, z_dim, n_tracks, n_bars, n_steps_per_bar, n_pitches):
    chords_input = Input(shape=(z_dim,), name='chords_input')
    style_input = Input(shape=(z_dim,), name='style_input')
    melody_input = Input(shape=(n_tracks, z_dim), name='melody_input')
    groove_input = Input(shape=(n_tracks, z_dim), name='groove_input')

    weight_init = RandomNormal(mean=0.0, stddev=0.02)
    self.chords_temp_network = TemporalNetwork(z_dim, n_bars, weight_init,
                                               'temporal_network')
    # Output shape is [?, n_bars, z_dim]
    chords_output = self.chords_temp_network(chords_input)

    melody_temp_networks = [None] * n_tracks
    # Output shape will be [n_tracks, ?, n_bars, z_dim]
    melody_outputs = [None] * n_tracks
    for track in range(n_tracks):
      melody_temp_networks[track] = TemporalNetwork(z_dim, n_bars, weight_init,
                                                    'melody_{}'.format(track))
      temp_network_input = Lambda(lambda x: x[:, track, :])(melody_input)
      melody_outputs[track] = melody_temp_networks[track](temp_network_input)

    self.bar_gen = [None] * n_tracks
    for track in range(n_tracks):
      self.bar_gen[track] = BarGenerator(z_dim, n_steps_per_bar, n_pitches,
                                        weight_init)
    
    # Output shape will be [n_bars, ?, 1, n_steps_per_bar, n_pitches, n_tracks]
    bars_output = [None] * n_bars
    for bar in range(n_bars):
      track_output = [None] * n_tracks
      c = Lambda(lambda x: x[:, bar, :])(chords_output)
      s = style_input
      for track in range(n_tracks):
        m = Lambda(lambda x: x[:, bar, :])(melody_outputs[track])
        g = Lambda(lambda x: x[:, track, :])(groove_input)
        z_input = Concatenate(axis=1)([c, s, m, g])
        track_output[track] = self.bar_gen[track](z_input)
      bars_output[bar] = Concatenate(axis=-1)(track_output)
    
    generator_output = Concatenate(axis=1)(bars_output)
    self.generator = Model(
        [chords_input, style_input, melody_input, groove_input],
        generator_output)
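
The kernel sizes and strides above are chosen so that the temporal network emits `n_bars` steps and each bar generator emits a `16 x 84` grid. The sketch below traces those sizes in plain Python, using the output-length rule Keras applies to `Conv2DTranspose` when no output padding is set. The helper names `deconv_length`, `temporal_network_length` and `bar_generator_shape` are hypothetical and exist only for this check.

```python
def deconv_length(size, kernel, stride, padding):
    """Output length of a transposed convolution along one axis."""
    if padding == 'same':
        return size * stride
    # 'valid' padding
    return size * stride + max(kernel - stride, 0)


def temporal_network_length(n_bars):
    """Length of the time axis after the two layers in TemporalNetwork."""
    size = 1  # after Reshape([1, 1, z_dim])
    size = deconv_length(size, kernel=2, stride=1, padding='valid')
    size = deconv_length(size, kernel=n_bars - 1, stride=1, padding='valid')
    return size


def bar_generator_shape():
    """(time, pitch) size after the five conv_t layers in BarGenerator."""
    time, pitch = 2, 1  # after Reshape([2, 1, 512])
    # (kernel, stride) per layer; all five layers use 'same' padding.
    layers = [((2, 1), (2, 1)), ((2, 1), (2, 1)), ((2, 1), (2, 1)),
              ((1, 7), (1, 7)), ((1, 12), (1, 12))]
    for kernel, stride in layers:
        time = deconv_length(time, kernel[0], stride[0], 'same')
        pitch = deconv_length(pitch, kernel[1], stride[1], 'same')
    return time, pitch


print(temporal_network_length(2))  # 2
print(bar_generator_shape())       # (16, 84)
```

The time axis doubles three times (2, 4, 8, 16) and the pitch axis grows by factors of 7 and 12 (1, 7, 84), so the layer stack as written only fits `n_steps_per_bar = 16` and `n_pitches = 84`.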

In [0]:
Z_DIM = 32
N_TRACKS = 4
N_PITCHES = 84

gan = MuseGAN(z_dim=Z_DIM, 
              n_tracks=N_TRACKS,
              n_bars=N_BARS,
              n_steps_per_bar=N_STEPS_PER_BAR,
              n_pitches=N_PITCHES)
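
The generator takes four noise inputs whose shapes follow from the `Input` layers in `MuseGAN.__init__`. The sketch below builds them with NumPy only. The batch size and the choice of standard normal noise are assumptions made for illustration, and `c` and `m` are stand-ins for temporal network outputs, which the real model computes itself.

```python
import numpy as np

Z_DIM, N_TRACKS = 32, 4
BATCH_SIZE = 8  # hypothetical batch size

# One noise array per generator input, in the order the model expects.
chords_noise = np.random.normal(0, 1, (BATCH_SIZE, Z_DIM))
style_noise = np.random.normal(0, 1, (BATCH_SIZE, Z_DIM))
melody_noise = np.random.normal(0, 1, (BATCH_SIZE, N_TRACKS, Z_DIM))
groove_noise = np.random.normal(0, 1, (BATCH_SIZE, N_TRACKS, Z_DIM))

# For one bar and one track, the bar generator sees four z_dim vectors
# joined end to end, which is why its input has size 4 * z_dim.
track = 0
c = chords_noise               # stand-in for chords_output[:, bar, :]
s = style_noise
m = melody_noise[:, track, :]  # stand-in for melody_outputs[track][:, bar, :]
g = groove_noise[:, track, :]
z_input = np.concatenate([c, s, m, g], axis=1)

print(z_input.shape)  # (8, 128)

# With the model from the cell above in scope, a batch could be generated as:
# gan.generator.predict(
#     [chords_noise, style_noise, melody_noise, groove_noise])
```

Given the final `Concatenate` layers, the generator output should have shape `[batch, n_bars, n_steps_per_bar, n_pitches, n_tracks]`, the same layout as `data_binary`.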