# Improvise Percussion Music With an LSTM Network
 
<img src="images/LSTM_cell.svg" style="width:450px;height:300px;">

We develop a model to create percussion music.  Since music is sequential, we use a specialized Recurrent Neural Network (RNN) called an [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory) (Long Short-Term Memory) model to learn the patterns of musical sequences.  We then use these learned patterns to generate new music.

The type of music depends on a collection of music files in [MIDI format](https://en.wikipedia.org/wiki/MIDI).
Each MIDI file corresponds to a musical piece, which is a series of notes over time.  In this example, we use the [Groove Dataset](https://www.tensorflow.org/datasets/catalog/groove) from TensorFlow Datasets.  However, in the appendix below, we show how to import files from a URL.

This project was based on the [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning) course on [Sequence Models](https://www.coursera.org/learn/nlp-sequence-models) by [deeplearning.ai](https://www.deeplearning.ai/).  However, we made the following significant changes:
- our music files are percussion-based.  
    - this changes the MIDI format to be based on Unpitched notes and PercussionChords
    - far fewer examples of these types are available online than of pitched notes
    - this significantly changed the Data Exploration, Data Preparation, and some of the Generating Music sections.
- we rely on standard libraries
    - code from the course required customized libraries
    - we use standard Python, Audio, and Tensorflow packages
    - all customized functions were rewritten or coded in a different way
- the data is cleaned before it is used to create the model
    - we remove duplicates, short musical pieces, outliers, and unused features
- added features
    - display of the musical scores with MuseScore
    - code to download MIDI files from a URL in the appendix

## Table of Contents

- [Load Packages and Data](#load)
- [Data Exploration](#explore_data)
- [Clean the Data](#clean_data) and remove outliers.
- [Data Preparation](#data_preparation) includes extracting the data into time series, randomly selecting segments of the time series, and combining segments into 3D training example matrices.
- [Build the Model](#building_model)
- [Generate Music](#generate_music) using the trained model.
- [Conclusion](#conclusion) and potential extensions.
- [Appendix: Downloading MIDI Files](#maestro)
- [References](#references) 


<a name='load'></a>
## Load Packages and Data
We load standard Python packages, TensorFlow packages, and Music packages.  Then, we load the MIDI files.

### Load Standard Python Packages

In [None]:
# Numpy is the fundamental package for scientific computing with Python.
import numpy as np

# Pandas provides data structures and data analysis tools for Python
import pandas as pd

# Matplotlib is a Python 2D plotting library 
from matplotlib import pyplot as plt

# Seaborn is a Python data visualization library based on matplotlib.
import seaborn as sns

# Operating system dependent functionality
import os

# In-memory binary streams
from io import BytesIO

# Set the random seed
from numpy.random import seed
seed(1)

# Set directories
location = '../files/'
img_location = '../images/'

### Load Audio Libraries

In [None]:
import fluidsynth
import glob
import pretty_midi
import subprocess
import tempfile
from IPython.display import Image, Audio
import music21

# Specify the path to your FluidSynth soundfont (adjust this path accordingly)
soundfont_path = '/usr/share/sounds/sf2/FluidR3_GM.sf2'

### Load Tensorflow Packages

In [None]:

# TensorFlow is an open source machine learning framework.
import tensorflow as tf

# TensorFlow Datasets is a collection of datasets ready to use with TensorFlow.
import tensorflow_datasets as tfds

# TensorFlow Model is a high-level API to build and train models in TensorFlow.
from tensorflow.keras.models import Model

# TensorFlow Layers builds neural network architectures.
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.layers import Dropout, Reshape, Lambda, RepeatVector, Flatten

# Adam optimizer and a utility for one-hot encoding class indices
from tensorflow.keras.optimizers.legacy import Adam
from tensorflow.keras.utils import to_categorical

# Set seed for reproducibility
tf.random.set_seed(1)

# Set verbosity to low 
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)


### Load Data

Here, we use the [Groove dataset](https://www.tensorflow.org/datasets/catalog/groove) from the [TensorFlow Datasets](https://www.tensorflow.org/datasets) library. This dataset contains 13.6 hours of drumming performances from 10 drummers of various skill levels playing to a click track. The dataset contains 1,150 MIDI files and over 22,000 measures of drumming.

In [None]:
# Load the full Groove dataset with MIDI only (no audio) as a tf.data.Dataset
music_dataset = tfds.load(
    name="groove/full-midionly",
    split=tfds.Split.TRAIN,
    try_gcs=True)
print("Type of Music Dataset", type(music_dataset))

<a name='explore_data'></a>
## Explore the Data
- [Extract First Few Music Files](#extract_samples)
- [Explore Features of Music Files](#explore_features)
    - [Numeric Features](#numeric_features)
    - [Non-Numeric Features](#non_numeric_features)
- [Explore Features of Percussion MIDI](#explore_midi)

#### Display Dataset

Display the first few records of the dataset.  Each record of the Groove dataset corresponds to a song, and contains the following fields:
  - **bpm**: beats per minute
  - **drummer**: a unique number corresponding to the drummer who played the song
  - **id**: a unique string identifying the song
  - **midi**: a sequence of musical notes
  - **style/primary**: the style of the song
  - **style/secondary**: the secondary style of the song
  - **time_signature**: the time signature of the song
  - **type**: the type of the song

If the Groove dataset has been replaced by MIDI files with a different type (not purely percussion), 
then the fields will be different.

In [None]:
# Create a pandas dataframe from the music dataset
music_df = tfds.as_dataframe(music_dataset)

In [None]:
# Display the first few records of the dataframe
music_df.head()

<a name='extract_samples'></a>
### Extract First Few Music Files

In the first step of exploring the data, we extract the first three music files from the dataset and save them as output_file0, output_file1, and output_file2 in wav format. We want to listen to the music files and get a sense of what the data looks like. We also want to see if there are any obvious differences between the music files. We will use the IPython library to display the audio files in the notebook and the MuseScore package to display the musical scores.

In [None]:
# Extract the first few MIDI files from the music_df DataFrame
#   NOT NEEDED if the music samples have already been extracted

# Number of music files to explore in detail
num_explore = 3

# Loop through the first num_explore MIDI files in music_df
for i in range(num_explore):

    # Extract the i-th MIDI file from music_df (assuming it's in bytes format)
    midi_data = music_df.iloc[i]['midi']

    # Create a BytesIO object to work with the MIDI data
    midi_file = BytesIO(midi_data)

    # Specify the local output MIDIfile path
    output_mid_file = location + "output_file" + str(i) + ".mid"
    with open(output_mid_file, 'wb') as file:
        file.write(midi_data)

    # Create a temporary file to store the MIDI data
    with tempfile.NamedTemporaryFile(suffix='.mid', delete=False) as temp_midi_file:
        temp_midi_path = temp_midi_file.name
        temp_midi_file.write(midi_data)

    # Specify the local output WAV file path
    output_wav_local_path = location + "output_file" + str(i) + ".wav"

    # Convert MIDI to WAV using FluidSynth and save to the local directory
    subprocess.run(['fluidsynth', '-a', 'alsa', '-o', 'audio.alsa.device=default', '-F', output_wav_local_path, soundfont_path, temp_midi_path])



#### Play Sample Audio Files

In [None]:
Audio(location + "output_file0.wav")

In [None]:
Audio(location + "output_file1.wav")

In [None]:
Audio(location + "output_file2.wav")

#### Music Notation for Sample Audio Files

In [None]:
print("The second page of music notation for Audio 0:")
fig = Image(filename=(img_location + 'output_file0-2.png'))
fig


In [None]:
print("Music notation for the first page of output_file1")
fig = Image(filename=(img_location + 'output_file1-1.png'))
fig

In [None]:
print("Music notation for the first page of output_file2")
fig = Image(filename=(img_location +'output_file2-1.png'))
fig

<a name='explore_features'></a>
### Explore the Features of the Dataset

In [None]:
# Types of columns in the dataset
music_df.info()

In [None]:
print("Number of musical pieces in dataset:", len(music_df))

<a name='numeric_features'></a>
#### Numeric Features
Print statistics and plot histograms for numeric features.

In [None]:
# Statistics of the numeric features of the dataset
music_df.describe()

In [None]:
df_num = music_df.select_dtypes(include=np.number)
for col in df_num.columns:
    sns.countplot(music_df, x=col).set_title(col)
    plt.show()

##### Correlation of the Numeric Features

In [None]:
print("Absolute correlation between numeric features.")
sns.heatmap(df_num.corr().abs())

In [None]:
# Highly correlated features should be removed
print("Sorted Pairwise correlations: ")
print(df_num.corr().abs().unstack().sort_values()[:-len(df_num.corr()):2])

None of the features is highly correlated with the others.

<a name='non_numeric_features'></a>
#### Non-Numeric Features
Print statistics and plot histograms for non-numeric features.

In [None]:
print("Statistics for string columns\n")
music_df.describe(include=[object])

It appears that one of the musical pieces occurs twice in the dataset.  We remove the duplicate in the 'Clean the Data' section.

The id feature is a string that uniquely identifies each piece of music.  The distribution of the id feature is uniform and not interesting. 
The midi feature contains the music information, and a histogram of this data would not be helpful.  However, the style/secondary feature is a categorical variable that can be plotted. 

In [None]:
sns.countplot(music_df, x='style/secondary')
plt.title('style/secondary')

<a name='explore_midi'></a>
### Explore Midi Files

Each MIDI file in this collection: 
- Starts with the same Header
*b'MThd\x00\x00\x00\x06\x00\x00\x00\x01\x01\xe0'*
- Contains a limited number of tracks, each starting with *b'MTrk'*
- Each track contains a sequence of instruments, including *b'Midi Drums'*
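
The shared header can be decoded field by field. The sketch below uses only the header bytes quoted above and the layout of a Standard MIDI File header chunk (a 4-byte chunk type, a 32-bit length, then three big-endian 16-bit fields); the variable names are illustrative and not part of the notebook.

```python
import struct

# Header bytes quoted in the list above
header = b'MThd\x00\x00\x00\x06\x00\x00\x00\x01\x01\xe0'

# Chunk type (4 bytes), chunk length (uint32), then format, track count,
# and time division, each a big-endian uint16
chunk_type, length, midi_format, num_tracks, division = struct.unpack('>4sIHHH', header)

print("chunk type:", chunk_type)          # b'MThd'
print("header data length:", length)      # 6
print("MIDI format:", midi_format)        # 0 (single track)
print("number of tracks:", num_tracks)    # 1
print("ticks per quarter note:", division)  # 480
```

A format of 0 with a single track is consistent with the one `MTrk` chunk per file reported by the temporary features.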

We add temporary features to explore the midi files

In [None]:
# Show some sample MIDI data
print("Sample Tracks (with header removed):")
for i in range(num_explore):
    ex = music_df.loc[i, 'midi']
    # Split only at the first track marker so the unpacking cannot fail
    header, remainder = ex.split(b'MTrk', 1)
    print("\t", remainder)

# Add temporary features to the dataset
print("Adding temporary features to the dataset")
music_df['Header'] = music_df['midi'].apply(lambda x: x.count(b'MThd\x00\x00\x00\x06\x00\x00\x00\x01\x01\xe0'))
music_df['NumTracks'] = music_df['midi'].apply(lambda x: x.count(b'MTrk'))
music_df['NumDrums'] = music_df['midi'].apply(lambda x: x.count(b'Midi Drums'))
music_df['Brooklyn'] = music_df['midi'].apply(lambda x: x.count(b'Brooklyn'))

# Display statistics of the temporary features
tmp_features = ['Header', 'NumTracks', 'NumDrums', 'Brooklyn']
music_df[tmp_features].describe()

The temporary features confirm that each MIDI sequence starts with the same header and contains one track (starting with b'MTrk').  Each track contains 0, 1, or 2 instances of b'Midi Drums' and 0 or 1 instances of b'Brooklyn'.  After exploring the data, we no longer need these temporary features, so we remove them from the dataset.

In [None]:
music_df.drop(tmp_features, inplace=True, axis=1)   
music_df.head(10)

We use the [music21 Library](https://web.mit.edu/music21/doc/moduleReference/moduleMidi.html#music21.midi.MidiFile) to process the MIDI sequences into 
[streams](https://web.mit.edu/music21/doc/moduleReference/moduleMidiTranslate.html#modulemiditranslate) and display a sample stream of music 
starting with the time of the note/chord and the type of note/chord.

In [None]:
print("Music21 Stream of first MIDI sequence:")

s = music21.midi.translate.midiStringToStream(music_df.loc[0, 'midi'])
s.show("text")

<a name='clean_data'></a>
## Clean the Data
In this section, we 
- [Remove Duplicates](#remove_duplicates)
- [Remove Outliers](#remove_outliers)
- [Remove Features](#remove_features) that are not useful for our analysis
- [Remove Short Pieces](#remove_pieces)

<a name='remove_duplicates'></a>
### Remove Duplicates

In [None]:
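# drop_duplicates returns a new DataFrame, so assign the result.
# The id column is unique per row, so compare on the midi bytes only.
music_df = music_df.drop_duplicates(subset=['midi'])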

<a name='remove_outliers'></a>
### Remove Outliers
We remove the following outliers from the dataset:
- Drummers. The drummers numbered 1, 3, 4, 5, 8, and 9 have a small number of samples. 
- Style/Primary. The style/primary numbered 0, 2, 4, 7, and 11 are not significant.
- Time Signature. The time_signature feature is mostly 1.  The other values are insignificant.  

In [None]:
music_df = music_df.drop(music_df[music_df.drummer.isin([1, 3, 4, 5, 8, 9])].index)
music_df = music_df.drop(music_df[music_df['style/primary'].isin([0, 2, 4, 7, 11])].index)
music_df = music_df.drop(music_df[music_df.time_signature.isin([0, 2, 3, 4])].index)

<a name='remove_features'></a>
### Remove Unused Features

The id feature uniquely identifies each song. It is not useful for our purposes, so we will remove it.

In [None]:
music_df.drop(['id'], inplace=True, axis=1)

<a name='remove_pieces'></a>
### Remove Short Pieces
We can't use short pieces of music in our training data.  We remove pieces of music with midi string length less than 500.

In [None]:
music_df = music_df[music_df.midi.str.len().ge(500)]
print("Remaining number of musical pieces in dataset:", len(music_df))

<a name='data_preparation'></a>
## Data Preparation

- [Parameters](#parameters)
- [Load Midi Files](#load_midi)
- [Create Time Series](#create_time_series)
- [Represent Notes/Chords](#represent_notes)
- [Create Model Input and Output](#model_input_output)

<a name='parameters'></a>
### Parameters

In [None]:
num_midi = 20  # number of MIDI files to use for training
Tx = 30   # number of time steps per input sequence
Ty = 30   # number of time steps per output sequence
mx = 400  # number of snippets of music to train on

<a name='load_midi'></a>
### Load Midi Files

The Music21 library divides streams of music into:
[scores](https://web.mit.edu/music21/doc/moduleReference/moduleStreamBase.html#music21.stream.base.Score), 
[parts](https://web.mit.edu/music21/doc/moduleReference/moduleStreamBase.html#music21.stream.base.Part), and 
[notes](https://web.mit.edu/music21/doc/moduleReference/moduleStreamBase.html#music21.stream.base.Stream.notes).

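Below, we create a list of musical scores.  Each score contains a list of parts, and each part contains a list of notes/chords.  Each note has a pitch and a duration, unless it's an [Unpitched](https://web.mit.edu/music21/doc/moduleReference/moduleNote.html#music21.note.Unpitched) note, which has a duration but no pitch.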

In [None]:
originalScores = []
quantize = True
for i in range(num_midi):
    j = music_df.index[i]
    score = music21.converter.parseData(music_df.loc[j, 'midi'], quantizePost=quantize)
    originalScores.append(score)

print("Number of original scores: ", len(originalScores))

<a name='create_time_series'></a>
### Create Time Series

For each music piece, extract the [notes/chords](https://web.mit.edu/music21/doc/moduleReference/modulePercussion.html) and create a time series from these. In these percussion pieces, the notes/chords are of types 
- [PercussionChord](https://web.mit.edu/music21/doc/moduleReference/modulePercussion.html#music21.percussion.PercussionChord)
    -- a series of notes played at the same time
- [Unpitched](https://web.mit.edu/music21/doc/moduleReference/moduleNote.html#music21.note.Unpitched)

Technical Note: Unless quantization is turned off in [converter.parse](http://web.mit.edu/music21/doc/moduleReference/moduleConverter.html#music21.converter.parse), the music21 library rounds the time offsets of the notes to a rhythmic grid.
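
To make the effect of quantization concrete, here is a minimal sketch of rounding offsets to a grid. It is an illustration only, not music21's implementation: the helper `quantize_offset` is hypothetical, and the divisors `(4, 3)` (sixteenth notes and eighth-note triplets) are an assumption about the grid in use.

```python
from fractions import Fraction

def quantize_offset(offset, divisors=(4, 3)):
    """Snap an offset (in quarter notes) to the nearest 1/d grid point over all divisors d."""
    # One candidate per divisor: the closest multiple of 1/d
    candidates = [Fraction(round(offset * d), d) for d in divisors]
    # Keep the candidate that moves the note the least
    return float(min(candidates, key=lambda c: abs(float(c) - offset)))

for raw in [0.98, 1.27, 1.35, 2.49]:
    print(raw, "->", quantize_offset(raw))
```

With these assumed divisors, 0.98 snaps to 1.0, 1.27 to 1.25, 1.35 to the triplet position 4/3, and 2.49 to 2.5, so small timing variations in a human performance collapse onto shared offsets.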

In [None]:
# Initialize time series 
time_series_dict = {}

# Initialize chord dict
chord_list = []
chord_dict = {}
chord_num = 0
time_dict = {}
instrument_set = set([])


# For each musical piece in the originalScores list
for i, piece in enumerate(originalScores[:num_midi]):

    # Divide piece into parts
    part_stream = piece.parts.stream()  
   
    # Initialize time step
    time_note_pairs = []

    # t0 is the previous time, which is 
    #    initialized to 0.0 for each piece
    t0 = 0.0
     
    # Iterate through the notes in the Score
    for n1 in piece.flatten().notes:
  
        # Check if the note is a PercussionChord
        if isinstance(n1, music21.percussion.PercussionChord):

            # Extract the names of the percussion instruments in the chord
            instrument_list = [n2.getInstrument().instrumentName for n2 in n1.notes]
            chord = '/'.join(instrument_list)
            instrument_set.update(instrument_list)

        # Else check if note is Unpitched    
        elif isinstance(n1, music21.note.Unpitched):
            # For Unpitched percussion notes
            chord = n1.getInstrument().instrumentName
            instrument_set.add(chord)
          
        # Else check if note is listed as a Voice stream   
        elif isinstance(n1, music21.stream.Voice):
            for item in n1:
                if isinstance(item, music21.percussion.PercussionChord):
                    instrument_list = [n2.getInstrument().instrumentName for n2 in item.notes]
                    chord = '/'.join(instrument_list)
                    instrument_set.update(instrument_list)
                    
                elif isinstance(item, music21.note.Unpitched):
                    chord = item.getInstrument().instrumentName
                    instrument_set.add(chord) 

        # If the chord is not yet in the dict, add it
        if chord not in chord_dict.keys():
            chord_dict[chord] = chord_num
            time_dict[chord] = n1.offset - t0
            chord_num += 1
        t0 = n1.offset

        # Add the chord number to the time series
        time_note_pairs.append((n1.offset, chord_dict[chord]))     

    # Record the time series for the piece
    time_series_dict[i] = time_note_pairs
    chord_list.append(pd.Series([s[0] for s in time_series_dict[i]]). unique().tolist())

# Calculate the number of unique chords
nx = chord_num
print(nx, "Chords:")

# Print Time series info
print("Number of time series", len(time_series_dict))
print("Sample time series")
for i in range(min(len(time_series_dict), 8)):
    print("    Series", i, "length:", len(time_series_dict[i]), time_series_dict[i])

<a name='represent_notes'></a>
### Represent notes/chords

For each musical piece $p$ and each time $t$ in the time series, we represent the note/chord by a one-hot vector. 

In [None]:
# Return the vector representation of the note/chord at time step k in time series p
def represent_note(p, k):

    # Find the note/notes at time step k in the pth time series
    chord_num = time_series_dict[p][k][1]
   
    # Initialize the note representation
    representation = np.zeros(nx)

    # Indicate which chord is being played
    representation[chord_num] = 1

    # Return the note representation
    return representation


# Test the represent_note function
print("Testing: Representing the first 3 chords in the first time series:")
for j in range(3):
    print("    ", represent_note(0, j))

<a name='model_input_output'></a>
### Create Model Input and Target

The input and output to the model will have the following shapes:

- `X` is an $(m_x, T_x, n_x)$ dimensional array. 
    - the first dimension indexes the **training example**
    - the second dimension indexes **time** in the interval $[0, T_x)$
    - the third dimension indexes **note values** in the set $\{0, 1, \ldots, n_x-1\}$.
        - each note value is represented as a one-hot vector. 
    - X[i,t,:] is a one-hot vector representing the value of the i-th example at time t. 

- `Y` is a $(T_y, m_x, n_x)$ dimensional array.
    - This is essentially the same as `X`, but shifted one step to the right (into the future), so each target sequence ends one note later than its input.
    - Notice that the data in `Y` is **reordered** to be dimension $(T_y, m_x, n_x)$. This format makes it more convenient to feed into the LSTM later.
    - At each time step $t$, the sequence model predicts $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$ given $x^{\langle 1\rangle}, \ldots, x^{\langle t \rangle}$. 

- We define the input `X` by randomly choosing $m_x$ windows of length $T_x$ from the time series of the musical pieces, where piece $p$ has a time series of length $T_p$. 
- We define the target `Y` by shifting each window one time step to the right.  
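
The shapes and the one-step shift described above can be checked on a toy sequence. This is a minimal sketch, not the notebook's code: `one_hot`, `make_xy`, and the toy values are illustrative, and it assumes $T_y = T_x$.

```python
import numpy as np

def one_hot(index, n_x):
    """One-hot vector of length n_x with a 1 at position index."""
    v = np.zeros(n_x)
    v[index] = 1
    return v

def make_xy(series, starts, T_x, n_x):
    """series: list of chord indices; starts: window start positions."""
    # Input windows: T_x consecutive chords beginning at each start
    X = np.array([[one_hot(series[j + t], n_x) for t in range(T_x)] for j in starts])
    # Target windows: the same chords shifted one step into the future
    Y = np.array([[one_hot(series[j + t + 1], n_x) for t in range(T_x)] for j in starts])
    # Reorder Y from (m_x, T_y, n_x) to (T_y, m_x, n_x)
    Y = np.swapaxes(Y, 0, 1)
    return X, Y

toy_series = [0, 2, 1, 2, 0, 1, 1, 2]
X_toy, Y_toy = make_xy(toy_series, starts=[0, 3], T_x=4, n_x=3)
print(X_toy.shape, Y_toy.shape)  # (2, 4, 3) (4, 2, 3)
```

Each window needs one chord beyond its last input step, which is why a start position is only valid if at least $T_y + 1$ chords remain in the piece.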

In [None]:
# The number of consecutive time steps needed to extract input $X$ and target $Y$
needed_time_steps = max(Tx, Ty + 1)
mx = 50  # number of snippets of music to train on (overrides the value set in Parameters)

# Make a list of all the potential times $t_0$ and pieces $p$ where the time series for $X$ can start
potential_starts = []
for i in range(num_midi):

    piece_length = len(time_series_dict[i])
    # If the piece is too short to extract X and Y, ignore it
    if piece_length >= needed_time_steps:
        for j in range(piece_length - needed_time_steps):
            potential_starts.append((i, j))

# Randomly choose $mx$ of these potential starts
potential_starts = np.array(potential_starts)
np.random.shuffle(potential_starts)
starts = potential_starts[:mx]

# Create the training set from these randomly chosen start points
X = [[] for _ in range(mx)]
for k in range(mx):

    i, j = starts[k]

    # For each of the $Tx$ time steps in the window starting at position $j$
    for t in range(Tx):

        # Append the vector representation of the note/chord at time step $t$ in time series $p$ to the training set
        X[k].append(represent_note(i, t + j))

# Create the target set from these randomly chosen start points
Y = [[] for _ in range(Ty)]
for t in range(Ty):
    for k in range(mx):
        i, j = starts[k]
        Y[t].append(represent_note(i, t + j + 1))

# Convert the training set and target set to numpy arrays
X = np.array(X)
Y = np.array(Y)

# Print the shape of the training set and target set
print("X.shape:", X.shape)
print("Y.shape:", Y.shape)

<a name='building_model'></a>
## Build the Model

We use an LSTM with hidden states that have $n_{a}$ dimensions.

<img src="LSTM_Cell.svg" style="width:600px;height:400px;">
<caption><center><b>Figure 1</b>: General LSTM model </center></caption>
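
For readers who want to see what one cell in the figure computes, the following is a rough NumPy sketch of a single LSTM step. It is not the Keras implementation: the function `lstm_step`, the stacked weight matrix `W`, and the gate ordering are assumptions chosen for readability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, a_prev, c_prev, W, b):
    """One LSTM step. W has shape (4 * n_a, n_a + n_x); b has shape (4 * n_a,)."""
    n_a = a_prev.shape[0]
    # All four gate pre-activations from the previous hidden state and the input
    z = W @ np.concatenate([a_prev, x]) + b
    f = sigmoid(z[0 * n_a:1 * n_a])        # forget gate
    i = sigmoid(z[1 * n_a:2 * n_a])        # update (input) gate
    o = sigmoid(z[2 * n_a:3 * n_a])        # output gate
    c_tilde = np.tanh(z[3 * n_a:4 * n_a])  # candidate cell state
    c = f * c_prev + i * c_tilde           # new cell state
    a = o * np.tanh(c)                     # new hidden state
    return a, c

rng = np.random.default_rng(0)
n_x_demo, n_a_demo = 5, 4
W = rng.normal(scale=0.1, size=(4 * n_a_demo, n_a_demo + n_x_demo))
b = np.zeros(4 * n_a_demo)

# Run three steps on one-hot inputs, reusing the same W and b at every step
a, c = np.zeros(n_a_demo), np.zeros(n_a_demo)
for t in range(3):
    x = np.eye(n_x_demo)[t]
    a, c = lstm_step(x, a, c, W, b)
print(a.shape, c.shape)
```

The loop reuses the same `W` and `b` at every time step, which mirrors how a single LSTM layer instance is applied repeatedly in this notebook.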


### Hyperparameters

In [None]:
# number of dimensions for the hidden state of each LSTM cell.
n_a = 400

# opt = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.999, weight_decay=0.01)
opt = Adam(learning_rate=0.0001)

epoch_num = 200

seed = 42
tf.random.set_seed(seed)
np.random.seed(seed)

# Sampling rate for audio playback
_SAMPLING_RATE = 16000

### DJ Model Function
* Music is generated one item at a time, feeding each output back in as the next input: $x^{\langle t\rangle} = y^{\langle t-1 \rangle}$. 
* The function `djmodel()` calls the LSTM layer $T_x$ times in a for-loop.
* All $T_x$ calls share the same weights; the layer is created once and is not re-initialized between steps.
* The types of layers are:
    * [Reshape()](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Reshape): Reshapes an output to a certain shape.
    * [LSTM()](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM): Long Short-Term Memory layer
    * [Dense()](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense): A regular fully-connected neural network layer.

In [None]:
# Global Variables
reshaper = Reshape((1, nx))                  
LSTM_cell = LSTM(n_a, return_state = True)        
densor = Dense(nx, activation='softmax')    

In [None]:
# FUNCTION: djmodel

def djmodel(Tx, LSTM_cell, densor, reshaper):
    """
    Implement the djmodel composed of Tx LSTM cells where each cell is responsible
    for learning the following note based on the previous note and context.
    Each cell has the following schema: 
            [X_{t}, a_{t-1}, c_{t-1}] -> RESHAPE() -> LSTM() -> DENSE()
    Arguments:
        Tx -- length of the sequences in the corpus
        LSTM_cell -- LSTM layer instance
        densor -- Dense layer instance
        reshaper -- Reshape layer instance
    
    Returns:
        model -- a keras instance model with inputs [X, a0, c0]
    """
    # Get the shape of input values
    nx = densor.units
    
    # Get the number of the hidden state vector
    n_a = LSTM_cell.units
    
    # Define the input layer and specify the shape
    X = Input(shape=(Tx, nx)) 
    
    # Define the initial hidden state a0 and initial cell state c0 using `Input`
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0
    
    # Reuse the layer instances passed in as arguments (do not create new ones here),
    # so the weights trained by this model are the ones the inference model uses later
    
    # Create empty list to append the outputs
    outputs = []
    
    # Loop over time steps in [0, Tx)
    for t in range(Tx):
        
        # Select the "t"th time step vector from X. 
        x = X[:,t,:]

        # Reshape x to be (1, nx)
        x = reshaper(x)
        x = Dropout(0.05)(x)
        
        # Perform one step of the LSTM_cell
        a, _, c = LSTM_cell(inputs=x, initial_state=[a, c])

        # Apply densor to the hidden state output of LSTM_Cell
        out = densor(a)

        # Add the output to "outputs"
        outputs.append(out)
        
    # Create model instance
    model = Model(inputs=[X, a0, c0], outputs=outputs)
    
    return model

### Create the model object


In [None]:
model = djmodel(Tx, LSTM_cell=LSTM_cell, densor=densor, reshaper=reshaper)

In [None]:
# Check the model layers
model.summary()

### Compile the model for training

With options:
- Optimizer: Adam
- Loss function: categorical cross-entropy (for multi-class classification)
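
As a reminder of what this loss measures, here is a small NumPy sketch of categorical cross-entropy for a single one-hot target. The helper below is illustrative and is not the Keras function; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([0.0, 1.0, 0.0])           # the correct chord is class 1
confident = np.array([0.05, 0.90, 0.05])     # most probability on the correct chord
uncertain = np.array([1 / 3, 1 / 3, 1 / 3])  # uniform guess over 3 chords

print(categorical_crossentropy(y_true, confident))  # -log(0.9), about 0.105
print(categorical_crossentropy(y_true, uncertain))  # log(3), about 1.099
```

Because the model has one softmax output per time step, the total training loss combines one such term for each of the $T_y$ outputs.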

In [None]:
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['acc'])

### Train the model

In [None]:
a0 = np.zeros((mx, n_a))
c0 = np.zeros((mx, n_a))
history = model.fit([X, a0, c0], list(Y), validation_split=0.2, epochs=epoch_num)

In [None]:
print(f"loss at first epoch: {history.history['loss'][0]:.2}")
print(f"loss at last epoch: {history.history['loss'][epoch_num -1]:.2}")
plt.plot(history.history['loss'])

In [None]:
# The accuracy measure is the first key that includes 'acc' 
for key in history.history.keys():
    i = str(key).find('acc')
    if i > -1:
        print(key)
        acc_measure = key
        break

In [None]:
print(f"acc at first epoch: {history.history[acc_measure][0]:.2}")
print(f"acc at last epoch: {history.history[acc_measure][epoch_num -1]:.2}")
# plt.plot(history.history['dense_17_accuracy'])
plt.plot(history.history[acc_measure])
plt.plot(history.history['val_' + str(acc_measure)])
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validate'], loc='upper left')
plt.show()

<a name='generate_music'></a>
## Generate Music

We use our trained model to synthesize new music. 


<a name='music_inference_model'></a>
### Inference Model

The function `music_inference_model()` samples a sequence of musical values by propagating the LSTM forward. 
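
The feedback loop at the heart of the inference model can be sketched without Keras. In the example below, `greedy_generate`, `toy_step`, and the transition table are hypothetical stand-ins: `toy_step` plays the role of the LSTM cell followed by the softmax layer, and the state is passed through unchanged.

```python
import numpy as np

def greedy_generate(step_fn, x0, state0, num_steps):
    """Generate indices by feeding each argmax prediction back in as the next one-hot input."""
    n_x = x0.shape[0]
    x, state = x0, state0
    indices = []
    for _ in range(num_steps):
        probs, state = step_fn(x, state)  # one forward step
        idx = int(np.argmax(probs))       # pick the most likely value
        indices.append(idx)
        x = np.eye(n_x)[idx]              # one-hot encode it as the next input
    return indices

# Toy stand-in for the trained network: a fixed transition table
transition = np.array([[0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8],
                       [0.8, 0.1, 0.1]])

def toy_step(x, state):
    return x @ transition, state

print(greedy_generate(toy_step, np.eye(3)[0], None, 5))  # [1, 2, 0, 1, 2]
```

Because the argmax choice is deterministic, the generated sequence depends only on the starting input and state, and it can settle into a repeating cycle, as the toy output shows.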

In [None]:
# FUNCTION: music_inference_model

def music_inference_model(LSTM_cell, densor, Tp=100):
    """
    Uses the trained "LSTM_cell" and "densor" from djmodel() to generate a sequence of chord indicators.
    
    Arguments:
    LSTM_cell -- the trained "LSTM_cell" from djmodel(), Keras layer object
    densor -- the trained "densor" from djmodel(), Keras layer object
    Tp -- integer, number of time steps to generate for the new piece
    
    Returns:
    inference_model -- Keras model instance
    """
    
    # Get the shape of input values
    nx = densor.units

    # Get the number of the hidden state vector
    n_a = LSTM_cell.units
    
    # Define the input to the model with a shape 
    x0 = Input(shape=(1, nx))
 
    # Define a0 and c0, the initial hidden state and cell state of the LSTM
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0
    x = x0

    # Create an empty list of "outputs" 
    outputs = []
    
    # Loop over Tp and generate a value at every time step
    for t in range(Tp):

        # Perform one step of LSTM_cell
        a, _, c = LSTM_cell(inputs=x, initial_state=[a,c])
        
        # Apply Dense layer to the hidden state output of the LSTM_cell 
        out = densor(a)

        # Append the prediction to "outputs".
        outputs.append(out)
 
        # Select the next value according to "out",
        # Set "x" to be the one-hot representation of the selected value
        idx = tf.math.argmax(out, axis=-1)
        x = tf.one_hot(idx, nx)
      
        # Use RepeatVector(1) to convert x into a tensor with shape=(None, 1, nx)
        x = RepeatVector(1)(x)     
        
    # Create model instance 
    inference_model = Model(inputs=[x0, a0, c0], outputs=outputs)

    return inference_model

# Define an inference model that generates 20 values
inference_model = music_inference_model(LSTM_cell, densor, Tp=20)

In [None]:
# Check the inference model
inference_model.summary()

<a name='predict_and_sample'></a>
### Predict and Sample

Use the inference model to predict an output `pred`, which should be a list of length `Tp` where each element is a NumPy array of shape (1, nx).
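
The conversion from such a list to chord indices and one-hot vectors can be tried on made-up numbers. In this sketch, `pred_demo` is a hand-written stand-in for the model output, and `np.eye` is used for the one-hot step in place of `to_categorical`.

```python
import numpy as np

T_p, n_x_demo = 4, 3

# Stand-in for the list returned by predict(): T_p arrays of shape (1, n_x)
pred_demo = [np.array([[0.2, 0.7, 0.1]]),
             np.array([[0.6, 0.3, 0.1]]),
             np.array([[0.1, 0.1, 0.8]]),
             np.array([[0.3, 0.4, 0.3]])]

# Index of the most probable value at each time step, shape (T_p, 1)
indices_demo = np.argmax(pred_demo, axis=-1)

# One-hot vectors for the chosen values, shape (T_p, n_x)
results_demo = np.eye(n_x_demo)[indices_demo.squeeze(axis=-1)]

print(indices_demo.ravel())  # [1 0 2 1]
print(results_demo.shape)    # (4, 3)
```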

In [None]:
# Initialize x, a, and c as zeros
x_0 = np.zeros((1, 1, nx))
a_0 = np.zeros((1, n_a))
c_0 = np.zeros((1, n_a))

In [None]:
# FUNCTION: predict_and_sample

def predict_and_sample(inference_model, x_initializer = x_0, 
                       a_initializer = a_0, c_initializer = c_0):
    """
    Predicts the next Tp values using the inference model.
    
    Arguments:
    inference_model -- Keras model instance for inference time
    x_initializer -- numpy array of shape (1, 1, nx), one-hot vector initializing the values generation
    a_initializer -- numpy array of shape (1, n_a), initializing the hidden state of the LSTM_cell
    c_initializer -- numpy array of shape (1, n_a), initializing the cell state of the LSTM_cell
    
    Returns:
    results -- np array (Tp, nx), one-hot vectors representing the values generated
    indices -- np array (Tp, 1), indices representing the values generated
    """
    
    nx = x_initializer.shape[2]
    
    # Use inference model to predict an output sequence given x_initializer, a_initializer and c_initializer.
    print("Shapes: x0, a0 ", x_initializer.shape, a_initializer.shape)
    pred = inference_model.predict([x_initializer, a_initializer, c_initializer])
    
    # Convert "pred" into an np.array() of indices with the maximum probabilities
    print("   pred list ", len(pred))
    indices = np.argmax(pred, axis=-1)

    # Convert indices to one-hot vectors, the shape of the results should be (Tp, nx)
    print("   indices ", indices.shape)
    results = to_categorical(indices, num_classes=nx)
    print("   results ", results.shape)
   
    return results, indices

<a name='create_music'></a>
### Create Music 

#### Map the Percussion Instruments to Pitches

In [None]:
# Create dictionary
instrument_type_dict = {'Bass Drum': music21.instrument.BassDrum(),
                        'Snare Drum': music21.instrument.SnareDrum(),
                        'Tom Tom': music21.instrument.TomTom(),
                        'Tom-Tom': music21.instrument.TomTom(),
                        'Crash Cymbals': music21.instrument.CrashCymbals(),
                        'High Hat Cymbal': music21.instrument.HiHatCymbal(),
                        'Hi-Hat Cymbal': music21.instrument.HiHatCymbal(),
                        'Vibraslap': music21.instrument.Vibraslap(),
                        'Percussion': music21.instrument.Percussion()}


In [None]:
def generate_music(model, reversed_chord_dict, chord_dict, times, output_file):
    """
    Generates music using a model trained to learn musical patterns.
    Creates a music21 stream, saves it as a midi file, and returns it.
    
    Arguments:
    model -- Keras model instance used for inference
    reversed_chord_dict -- dictionary mapping indices to chord names
    chord_dict -- dictionary mapping chord names to indices
    times -- dictionary mapping chord names to time steps
    output_file -- path (without extension) used for saving the midi file
    
    Returns:
    out_stream -- music21 stream containing the generated notes and chords
    """
    
    # set up audio stream
    out_stream = music21.stream.Stream()
    out_stream.timeSignature = music21.meter.TimeSignature('4/4')
    # Initialize tempo of the output stream with 130 beats per minute
    out_stream.insert(0.0, music21.tempo.MetronomeMark(number=130))
    # Add the key of C major
    out_stream.insert(0.0, music21.key.Key('C'))
    
    # Initialize chord variables
    curr_offset = 0.0     

    # Choose the starting note and instrument
    first_note = music21.note.Unpitched()
    first_note.storedInstrument = instrument_type_dict['Bass Drum']
    # first_note.displayInstrument = music21.instrument.BaseDrum()    
    print("First note", first_note.getInstrument().instrumentName)

    # Insert the first note into our output stream
    out_stream.insert(curr_offset, first_note)

    # Generate a sequence of chords using the model
    _, indices = predict_and_sample(model)
    indices = list(indices.squeeze())
    # print(indices[:10])
    chord_names = [reversed_chord_dict[p] for p in indices]
    print(chord_names[:3])
    
    # Build the list of notes to play
    for k in range(len(indices) - 1):
        # Split the instruments in the chord
        instrument_names = chord_names[k].split('/')

        # Assign the time step associated with this chord
        t = times[chord_names[k]]

        # Insert the notes into the output stream
        note_list = []
        for inst in instrument_names:
            unp = music21.note.Unpitched()
            unp.storedInstrument = instrument_type_dict[inst]
            unp.quarterLength = t + curr_offset
            note_list.append(unp)
        # If only one note, add to output stream
        if len(note_list) == 1:
            out_stream.append(unp)
        # If more than one note, add chord to output stream
        else:
            pChord = music21.percussion.PercussionChord(note_list)
            out_stream.append(pChord)
        # Update the time offset
        curr_offset += t

    # Save audio stream to file
    mf = music21.midi.translate.streamToMidiFile(out_stream)
    mf.open(output_file +".midi", 'wb')
    mf.write()
    print("Your generated music is saved in " + output_file + ".midi")
    mf.close()
    
    return out_stream

In [None]:
num_to_chord = dict([(x[1], x[0]) for x in chord_dict.items()])
output_file = location + "generated"
out_stream = generate_music(inference_model, num_to_chord, chord_dict,
                            time_dict, output_file)

#### Play Generated Audio

Currently, the audio plays with a piano sound regardless of the instruments chosen, because the piano is the default instrument in the MIDI library.  Each note represents a drum instrument, so the instrument still needs to be changed to the correct one before the music is played.  
This part is still a work in progress.
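
One possible direction for the piano playback issue, sketched under the assumption that the soundfont follows the General MIDI standard: General MIDI reserves channel 10 for percussion, and on that channel the key number selects the drum sound rather than a pitch. The mapping below uses standard General MIDI percussion key numbers. The names `GM_PERCUSSION_KEYS` and `chord_to_midi_keys`, the fallback key, and the choice of tom and hi-hat variants are assumptions for illustration and are not part of the notebook.

```python
# Standard General MIDI percussion key numbers (used on MIDI channel 10).
GM_PERCUSSION_KEYS = {
    'Bass Drum': 36,        # Bass Drum 1
    'Snare Drum': 38,       # Acoustic Snare
    'Tom Tom': 45,          # Low Tom (assumed choice among the GM toms)
    'Tom-Tom': 45,
    'Crash Cymbals': 49,    # Crash Cymbal 1
    'High Hat Cymbal': 42,  # Closed Hi-Hat (assumed choice)
    'Hi-Hat Cymbal': 42,
    'Vibraslap': 58,
}

# Zero-based index of the General MIDI percussion channel (channel 10).
GM_PERCUSSION_CHANNEL = 9


def chord_to_midi_keys(chord_name, default_key=38):
    """Translate a '/'-separated chord name into General MIDI drum key numbers.

    Names without an entry (such as the generic 'Percussion') fall back to
    default_key, which is an arbitrary choice made for this sketch.
    """
    names = chord_name.split('/')
    return [GM_PERCUSSION_KEYS.get(name, default_key) for name in names]


print(chord_to_midi_keys('Bass Drum/Hi-Hat Cymbal'))  # [36, 42]
```

How these key numbers are attached to the generated notes, or written to the MIDI file on the percussion channel, depends on the MIDI library and is left open here.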

In [None]:

# Specify the local output WAV file path
output_wav_local_path = output_file + ".wav"

# Specify location of midi file
temp_midi_path = output_file + ".midi"

# Convert MIDI to WAV using FluidSynth and save to the local directory
subprocess.run(['fluidsynth', '-a', 'alsa', '-o', 'audio.alsa.device=default', '-F', output_wav_local_path, soundfont_path, temp_midi_path])
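
The `-a alsa` options above are specific to Linux audio. As a hedged alternative, the sketch below builds a FluidSynth command that only renders to a file (`-F`), without MIDI input or the interactive shell (`-ni`), and checks that the `fluidsynth` executable is available before running it. The helper names and the file names in the example are hypothetical, and the 44100 Hz sample rate is an assumption.

```python
import shutil
import subprocess


def build_fluidsynth_command(soundfont, midi_path, wav_path, sample_rate=44100):
    """Return the argument list that renders midi_path to wav_path."""
    return ['fluidsynth', '-ni', soundfont, midi_path,
            '-F', wav_path, '-r', str(sample_rate)]


def render_midi_to_wav(soundfont, midi_path, wav_path):
    """Run FluidSynth, raising an error if it is missing or fails."""
    if shutil.which('fluidsynth') is None:
        raise FileNotFoundError("fluidsynth executable not found on PATH")
    cmd = build_fluidsynth_command(soundfont, midi_path, wav_path)
    # check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(cmd, check=True)


# Inspect the command without running it (file names are placeholders)
print(build_fluidsynth_command('drums.sf2', 'generated.midi', 'generated.wav'))
```

Raising an error when the conversion fails makes a missing soundfont or executable visible immediately, instead of surfacing later as a missing WAV file.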



In [None]:
# Play the audio of the generated piece
Audio(location + 'generated.wav')

In [None]:
# Play an audio of another generated piece
Audio(location +'generatedsample2.wav')

<a name='conclusion'></a>
## Conclusion and Extensions

This model could be extended in many ways including the following:
- Use a larger sample of pieces from the dataset.
- Use a more general form of MIDI files, not just percussion.
- Use shorter pieces to train and to generate shorter pieces.
- Use a random choice for the first note of the generated piece.
- Handle time steps more robustly.
- Use a more sophisticated model, such as a [Transformer](https://arxiv.org/abs/1706.03762) model.  
- Optimize hyperparameters.

<a name='maestro'></a>
## Appendix: Downloading a Different Dataset

In the demonstration above, we used data from the Groove dataset to create drumming pieces of music.  If you would rather create other types of music, you can download a collection of MIDI files from a website, such as [MidiWorld](http://www.midiworld.com/files/), and use them to create music.

Below, we demonstrate downloading files from the Maestro dataset, a collection of classical piano pieces.  This shows how you could load other types of music into the model.


### Download the Maestro Dataset
The code below is taken from the [TensorFlow tutorial on music generation](https://www.tensorflow.org/tutorials/audio/music_generation). It downloads the Maestro dataset, which could be used to generate classical piano pieces rather than using the Groove dataset to generate drumming pieces.

In [None]:
# ONLY RUN if you want to download the maestro dataset
#   instead, you can use the Groove dataset, which is already extracted

# Choose the version of the dataset you want to use
maestro_file = 'maestro-v3.0.0-midi'
version = 'v3.0.0'
# Choose the location to download the dataset
cur_dir = '/home/jenny/Downloads'
data_dir = pathlib.Path(cur_dir) / maestro_file
print(data_dir)

# Download the dataset if it doesn't already exist
if not data_dir.exists():  
  tf.keras.utils.get_file(
      maestro_file + '.zip',
      origin='https://storage.googleapis.com/magentadata/datasets/maestro/'+ version + '/' + maestro_file + '.zip',
      extract=True,
      cache_dir='.', cache_subdir='data',
  )
else:
  print("Data already downloaded")

In [None]:
# ONLY RUN if you want to extract the maestro dataset
#   instead, you can use the Groove dataset, which is already extracted
import patoolib
complete_maestro_file = pathlib.Path(cur_dir + '/' + maestro_file)
complete_maestro_zip = pathlib.Path(cur_dir + '/' + maestro_file + '.zip')
print(complete_maestro_file)
if not complete_maestro_file.exists():
    patoolib.extract_archive(complete_maestro_zip, outdir=data_dir)
    print("Maestro dataset extracted to " + str(data_dir))

In [None]:
# ONLY RUN THIS CELL IF YOU WANT TO DOWNLOAD THE MAESTRO DATASET
#  Otherwise, you can use the Groove dataset, which is already included in the repo
print(complete_maestro_file)
filenames = glob.glob(str(complete_maestro_file/'**/**/*.mid*'))
print(filenames)
print('Number of files:', len(filenames))


In [None]:
sample_file = filenames[0]
print(sample_file)
pm = pretty_midi.PrettyMIDI(sample_file)


In [None]:
_SAMPLING_RATE = 16000

def display_audio(pm: pretty_midi.PrettyMIDI, seconds=30):
  waveform = pm.fluidsynth(fs=_SAMPLING_RATE)
  # Take a sample of the generated waveform to mitigate kernel resets
  waveform_short = waveform[:seconds*_SAMPLING_RATE]
  return display.Audio(waveform_short, rate=_SAMPLING_RATE)

display_audio(pm)


<a name='references'></a>
## References

- DeepLearning.AI Deep Learning Specialization.  This project is an expanded version of a class assignment from Course 5 of the specialization.
- [TensorFlow Tutorial](https://www.tensorflow.org/tutorials/audio/music_generation)
- [Maestro Dataset](https://magenta.tensorflow.org/datasets/maestro)
