# Lab 1: Intro to TensorFlow and Music Generation with RNNs

# Part 2: Music Generation with RNNs

In this portion of the lab, we will explore building a Recurrent Neural Network (RNN) for music generation. We will train a model to learn the patterns in raw sheet music in [ABC notation](https://en.wikipedia.org/wiki/ABC_notation), and then use this model to generate new music. 

## 2.1 Dependencies 
First, let's download the course repository, install dependencies, and import the relevant packages we'll need for this lab.

In [0]:
!wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64 -O cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
!dpkg -i cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
!apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
!apt-get update
!apt-get install -y cuda
# The rest of this lab targets TensorFlow 1.13 (see the version check below)
!pip install tensorflow-gpu==1.13.1

In [1]:
!git clone https://github.com/aamini/introtodeeplearning_labs.git
%cd introtodeeplearning_labs
!git checkout 2019
%cd ..

Cloning into 'introtodeeplearning_labs'...
remote: Enumerating objects: 66, done.
remote: Counting objects: 100% (66/66), done.
remote: Compressing objects: 100% (58/58), done.
remote: Total 315 (delta 41), reused 13 (delta 7), pack-reused 249
Receiving objects: 100% (315/315), 49.48 MiB | 30.49 MiB/s, done.
Resolving deltas: 100% (121/121), done.
/content/introtodeeplearning_labs
Branch '2019' set up to track remote branch '2019' from 'origin'.
Switched to a new branch '2019'
/content


In [0]:
from introtodeeplearning_labs.lab1.util import util as util


In [3]:
import tensorflow as tf 
tf.enable_eager_execution()

import numpy as np
import os
import time

is_correct_tf_version = '1.13.' in tf.__version__
assert is_correct_tf_version, "Wrong tensorflow version ({}) installed".format(tf.__version__)

is_eager_enabled = tf.executing_eagerly()
assert is_eager_enabled,      "Tensorflow eager mode is not enabled"


In [0]:
!pip install --force https://github.com/chengs/tqdm/archive/colab.zip

In [0]:
from tqdm import tqdm_notebook as tqdm


## 2.2 Dataset
We've gathered a dataset of thousands of [Irish folk songs](https://www.youtube.com/watch?v=2Z_TheGgFWI), represented in ABC notation. Let's download the dataset:

In [4]:
path_to_file = tf.keras.utils.get_file('irish.abc', 'https://raw.githubusercontent.com/aamini/introtodeeplearning_labs/2019/lab1/data/irish.abc')

Downloading data from https://raw.githubusercontent.com/aamini/introtodeeplearning_labs/2019/lab1/data/irish.abc


### Inspect the dataset

We can take a look to get a better sense of the dataset:

In [5]:
text = open(path_to_file).read()
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

Length of text: 197618 characters


We can grab a song from our dataset as an example and play it back:

TODO: will have a function here that grabs a song and then plays it back as an example

In [6]:
# Take a look at the first 250 characters in text
print(text[:250])

X:1
T:Alexander's
Z: id:dc-hornpipe-1
M:C|
L:1/8
K:D Major
(3ABc|dAFA DFAd|fdcd FAdf|gfge fefd|(3efe (3dcB A2 (3ABc|!
dAFA DFAd|fdcd FAdf|gfge fefd|(3efe dc d2:|!
AG|FAdA FAdA|GBdB GBdB|Acec Acec|dfaf gecA|!
FAdA FAdA|GBdB GBdB|Aceg fefd|(3efe dc d2:


One important thing to think about is how many different characters are present in the text file. This will become important soon, when we generate a numerical representation for the text data:

In [7]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

83 unique characters


<!-- TODO: here explanation of the one-hot encoding, getting the unique characters in the file -->

## 2.3 Process the dataset for the learning task

Let's take a step back and consider our prediction task. We're trying to train an RNN model to learn patterns in ABC music, and then use this model to generate (i.e., predict) a new piece of music based on this learned information.

Breaking this down, what we're really asking the model is: given a character, or a sequence of characters, what is the most probable next character? We'll train the model to perform this task. 

To achieve this, we will input a sequence of characters to the model and train the model to predict the output, that is, the following character at each time step. RNNs maintain an internal state that depends on previously seen elements, so information about all characters seen up until a given moment will be taken into account in generating the prediction.


### Vectorize the text

Before we begin training our RNN model, we'll need to create a numerical representation of our text-based dataset. To do this, we'll generate two lookup tables: one that maps characters to numbers, and a second that maps numbers back to characters. Recall that we just identified the unique characters present in the text.

In [0]:
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
text_as_int = np.array([char2idx[c] for c in text])

'''TODO: Create a mapping from indices to characters'''
idx2char = np.array(vocab)

This gives us an integer representation for each character. Observe that the unique characters (i.e., our vocabulary) in the text are mapped to indices from 0 to `len(vocab) - 1`. Let's take a peek at this numerical representation of our dataset:
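As a quick sanity check, here is a minimal sketch of the same two lookup tables built on a short, made-up ABC snippet (not taken from the dataset). It confirms that indices run from 0 to `len(vocab) - 1` and that encoding followed by decoding returns the original text:

```python
import numpy as np

# Toy ABC snippet (illustrative only, not from the dataset)
toy_text = "X:1\nK:D Major\n|dAFA DFAd|"

# Same construction as in the lab: sorted unique characters form the vocabulary
toy_vocab = sorted(set(toy_text))
toy_char2idx = {u: i for i, u in enumerate(toy_vocab)}
toy_idx2char = np.array(toy_vocab)

# Encode characters to integers, then decode back with NumPy fancy indexing
encoded = np.array([toy_char2idx[c] for c in toy_text])
decoded = "".join(toy_idx2char[encoded])

print(encoded.min(), encoded.max(), len(toy_vocab))
print(decoded == toy_text)
```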

In [9]:
print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

{
  '\n':   0,
  '!' :   2,
  ' ' :   1,
  '#' :   4,
  '"' :   3,
  "'" :   5,
  ')' :   7,
  '(' :   6,
  '-' :   9,
  ',' :   8,
  '/' :  11,
  '.' :  10,
  '1' :  13,
  '0' :  12,
  '3' :  15,
  '2' :  14,
  '5' :  17,
  '4' :  16,
  '7' :  19,
  '6' :  18,
  ...
}


We can also look at how the first part of the text is mapped to an integer representation:

In [10]:
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))

'X:1\nT:Alexand' ---- characters mapped to int ---- > [49 22 13  0 45 22 26 67 60 79 56 69 59]


### Create training examples and targets

Our next step is to actually divide the text into example sequences that we'll use during training. Each input sequence that we feed into our RNN will contain `seq_length` characters from the text. We'll also need to define a target sequence for each input sequence, which will be used in training the RNN to predict the next character. For each input, the corresponding target will contain the same length of text, except shifted one character to the right.

To do this, we'll break the text into chunks of `seq_length+1`. Suppose `seq_length` is 4 and our text is "Hello". Then, our input sequence is "Hell", and the target sequence "ello".
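The "Hello" example can be spelled out in a few lines of plain Python. This sketch just slices a string the same way the lab will slice each chunk of character indices:

```python
chunk = "Hello"  # one chunk of seq_length + 1 = 5 characters

# Input drops the last character, target drops the first one
input_seq, target_seq = chunk[:-1], chunk[1:]

# At time step t the model sees input_seq[t] and should predict target_seq[t]
for t, (x, y) in enumerate(zip(input_seq, target_seq)):
    print("step {}: input {!r} -> target {!r}".format(t, x, y))
```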


First, use the [`tf.data.Dataset.from_tensor_slices`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensor_slices) function to convert the text vector into a stream of character indices. This is a function within [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) which is generally useful for importing data.

The [`batch`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch) method will then let us convert this stream of character indices to sequences of the desired size.
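Here is a NumPy sketch of what `batch(seq_length+1, drop_remainder=True)` does to a stream of character indices: it cuts consecutive, non-overlapping chunks and drops any leftover characters at the end. The helper `chunk_stream` is hypothetical and only for illustration:

```python
import numpy as np

def chunk_stream(ids, chunk_size):
    """Cut a 1-D array into consecutive, non-overlapping chunks of chunk_size.

    Leftover elements that cannot fill a whole chunk are dropped,
    as with drop_remainder=True.
    """
    n_chunks = len(ids) // chunk_size
    return ids[:n_chunks * chunk_size].reshape(n_chunks, chunk_size)

ids = np.arange(23)                  # stand-in for text_as_int
chunks = chunk_stream(ids, 4 + 1)    # seq_length = 4 -> chunks of 5
print(chunks.shape)                  # (4, 5); elements 20, 21, 22 are dropped

# With the real dataset: 197618 characters in chunks of seq_length + 1 = 101
print(197618 // 101)                 # 1956 sequences (62 characters dropped)
```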

In [0]:
# The maximum length sentence we want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//seq_length

# Create training examples / targets
# Note how we are using the `tf.data` module!
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

'''TODO: use the batch function to generate sequences of the desired size'''
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)


Next, we need to define the input and target texts for each sequence.

Define a function to do this, and then use the [`map`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map) method to apply it to each sequence.

In [0]:
'''TODO: define a function that takes a sequence (chunk) and outputs both the input text and target text sequences'''
'''Hint: consider the "Hello" example'''
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

'''TODO: use the map method to apply your function to the list of sequences to generate the dataset!'''
dataset = sequences.map(split_input_target)

For each of these vectors, each index is processed at a single time step. So, for the input at time step 0, the model receives the index for the first character in the sequence and tries to predict the index of the next character. At the next time step, it does the same thing, but the RNN also considers the information from the previous step, i.e., its updated state, in addition to the current input.

We can make this concrete by taking a look at how this works over the first several characters in our text:

In [17]:
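# Take the first (input, target) pair and convert the tensors to NumPy arrays
for input_example, target_example in dataset.take(1):
    input_example, target_example = input_example.numpy(), target_example.numpy()

for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))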

Step    0
  input: 49 ('X')
  expected output: 22 (':')
Step    1
  input: 22 (':')
  expected output: 13 ('1')
Step    2
  input: 13 ('1')
  expected output: 0 ('\n')
Step    3
  input: 0 ('\n')
  expected output: 45 ('T')
Step    4
  input: 45 ('T')
  expected output: 22 (':')
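To make the state update concrete, here is a minimal NumPy sketch of a plain (Elman) RNN stepping through a sequence of character indices. It is a simplification for illustration: the lab's model uses an LSTM, and the sizes and random weights below are made up. At every step the new state depends on both the current input and the previous state:

```python
import numpy as np

rng = np.random.RandomState(0)
n_chars, hidden_size = 10, 8   # toy sizes, not the lab's

# Weights of a plain (Elman) RNN; the lab itself uses an LSTM
W_xh = rng.normal(scale=0.1, size=(n_chars, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_forward(indices):
    """Process one character index per time step and return every state."""
    h = np.zeros(hidden_size)              # initial state
    states = []
    for idx in indices:
        x = np.eye(n_chars)[idx]           # one-hot input for this step
        # New state = f(current input, previous state)
        h = np.tanh(x.dot(W_xh) + h.dot(W_hh) + b_h)
        states.append(h)
    return np.stack(states)

states = rnn_forward([3, 1, 4, 1, 5])
print(states.shape)   # (5, 8): one state vector per time step
```

Changing only the first input changes the final state, which is how earlier characters influence later predictions.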


### Create training batches

Great! Now we have our text split into sequences of manageable size. But before we actually feed this data into our model, we'll [`shuffle`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle) the data (for the purpose of stochastic gradient descent) and then pack it into batches which will be used during training.

In [26]:
# Batch size 
BATCH_SIZE = 64
steps_per_epoch = examples_per_epoch//BATCH_SIZE

# Buffer size is similar to a queue size
# This defines a manageable data size to put into memory, where elements are shuffled
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

# Examine the dimensions of the dataset
dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>
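The comment about the buffer being "similar to a queue size" can be made concrete with a small pure-Python sketch of a buffered shuffle. This is a sketch of the idea behind `Dataset.shuffle(buffer_size)`, not TensorFlow's implementation: fill a buffer, repeatedly emit a random buffered element and refill its slot from the stream, then drain what is left.

```python
import random

def buffered_shuffle(stream, buffer_size, seed=None):
    """Sketch of a shuffle that holds at most buffer_size elements."""
    rng = random.Random(seed)
    buffer = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        j = rng.randrange(buffer_size)
        yield buffer[j]                  # emit a random buffered element
        buffer[j] = item                 # its slot is taken by the new item
    rng.shuffle(buffer)                  # stream exhausted: drain in random order
    for item in buffer:
        yield item

print(list(buffered_shuffle(range(20), buffer_size=5, seed=1)))
```

With `buffer_size=1` the output is the input order; once the buffer is at least as large as the dataset, the sketch performs a full uniform shuffle. Here the dataset has 197618 // 101 = 1956 sequences, well below `BUFFER_SIZE = 10000`, so the whole dataset should fit in the buffer.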

## 2.4 The Recurrent Neural Network (RNN) Model

We will now define and train an RNN model on our ABC music dataset, and then use that trained model to generate a new song. We will train our RNN using batches of song snippets from our dataset.

This model will be based on a single LSTM layer, with a state vector used to maintain temporal dependencies between consecutive characters of the music. At each time step, we feed in the current character, and the LSTM's state carries information about the characters that came before it. The output of the LSTM at every time step (we set `return_sequences=True`) is fed into a single fully connected layer, which outputs logits defining a probability distribution over the next character. In this way, we model the probability distribution over the next character given all of the previous characters.
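Written out, the character-level model relies on the chain rule of probability. For a sequence of characters $x_1, \dots, x_T$:

$$P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1})$$

At time step $t$ the LSTM reads $x_t$ and updates its state $h_t$, which summarizes $x_1, \dots, x_t$. The fully connected layer maps that state to logits, and a softmax turns them into the predicted distribution over the next character:

$$P(x_{t+1} \mid x_1, \dots, x_t) \approx \mathrm{softmax}(W h_t + b)_{x_{t+1}}$$

The symbols $h_t$, $W$ and $b$ are notation introduced for this sketch; $W$ and $b$ stand for the Dense layer's weight matrix and bias.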

Use `tf.keras.Sequential` to define the model. For this simple example three layers are used to define our model:

* `tf.keras.layers.Embedding`: The input layer. A trainable lookup table that maps the index of each character to a vector with `embedding_dim` dimensions.
* `tf.keras.layers.LSTM`: A type of RNN with size `units=rnn_units`.
* `tf.keras.layers.Dense`: The output layer, with `vocab_size` outputs.
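To see what the `Embedding` layer does, here is a small NumPy sketch with random weights (not the trained layer). Looking up rows of a `(vocab_size, embedding_dim)` table gives the same result as multiplying one-hot vectors by that table, which is why the model can take integer indices directly instead of one-hot encoded text:

```python
import numpy as np

vocab_size, embedding_dim = 83, 256    # sizes used in this lab
rng = np.random.RandomState(0)

# An embedding layer is essentially a trainable (vocab_size x embedding_dim) table
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

char_ids = np.array([49, 22, 13])      # 'X', ':', '1' in this lab's vocabulary

# Lookup: pick one row per character index
looked_up = embedding_matrix[char_ids]

# Equivalent but wasteful formulation: one-hot vectors times the table
one_hot = np.eye(vocab_size)[char_ids]
via_matmul = one_hot.dot(embedding_matrix)

print(looked_up.shape, np.allclose(looked_up, via_matmul))
```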

TODO: make sure this matches the code!

In [0]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension 
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

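The network learns how to embed each character in a meaningful way, and this embedding is then fed into the LSTM. The LSTM's output at each time step is fed into a dense layer, which outputs one logit per vocabulary entry; a softmax over these logits gives the probability distribution over the next character.

Next, define a function to build the model.

Use `CuDNNLSTM` if running on a GPU.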

In [18]:
tf.test.is_gpu_available()

True

In [0]:
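if tf.test.is_gpu_available():
  # CuDNNLSTM: fast LSTM implementation that only runs on a GPU
  rnn = tf.keras.layers.CuDNNLSTM
else:
  # Fall back to the standard LSTM on CPU. CuDNNLSTM uses a sigmoid
  # recurrent activation, so match it here (the TF 1.x default is 'hard_sigmoid')
  import functools
  rnn = functools.partial(
    tf.keras.layers.LSTM, recurrent_activation='sigmoid')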

In [0]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, 
                              batch_input_shape=[batch_size, None]),
    rnn(rnn_units,
        return_sequences=True, 
        recurrent_initializer='glorot_uniform',
        stateful=True),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

In [0]:
model = build_model(
  vocab_size = len(vocab), 
  embedding_dim=embedding_dim, 
  rnn_units=rnn_units, 
  batch_size=BATCH_SIZE)


TODO: create student TODOs within build_model. Note return_sequences, recurrent_initializer, stateful the students may not know about... the main thing for the TODO could be the sizing (Embedding, rnn, Dense)

## Try the model

Now run the model to see that it behaves as expected.

First check the shape of the output:

In [29]:
for input_example_batch, target_example_batch in dataset.take(1): 
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(TensorShape([Dimension(64), Dimension(100), Dimension(83)]), '# (batch_size, sequence_length, vocab_size)')


In the above example, the sequence length of the input is `100`, but the model can be run on inputs of any length; note the `None` sequence dimension in the model summary:

In [30]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (64, None, 256)           21248     
_________________________________________________________________
cu_dnnlstm (CuDNNLSTM)       (64, None, 1024)          5251072   
_________________________________________________________________
dense (Dense)                (64, None, 83)            85075     
Total params: 5,357,395
Trainable params: 5,357,395
Non-trainable params: 0
_________________________________________________________________


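Note about the model summary: check the layers, the output shape of each layer (the first dimension is the fixed batch size, 64, and `None` is the variable sequence length), and the number of parameters in each layer.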
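The parameter counts in the summary can be checked by hand. Below is a sketch of the arithmetic using this lab's sizes. The double bias for `CuDNNLSTM` reflects cuDNN's separate input and recurrent bias vectors for each of the four LSTM gates:

```python
vocab_size, embedding_dim, rnn_units = 83, 256, 1024

# Embedding: one embedding_dim vector per vocabulary entry
embedding_params = vocab_size * embedding_dim                    # 21248

# LSTM: 4 gates, each with input weights and recurrent weights ...
lstm_kernel = 4 * rnn_units * (embedding_dim + rnn_units)
# ... plus biases: CuDNNLSTM keeps two bias vectors per gate, LSTM keeps one
cudnn_lstm_params = lstm_kernel + 2 * 4 * rnn_units              # 5251072
plain_lstm_params = lstm_kernel + 4 * rnn_units                  # 5246976

# Dense: a weight matrix plus one bias per output logit
dense_params = rnn_units * vocab_size + vocab_size               # 85075

total = embedding_params + cudnn_lstm_params + dense_params      # 5357395
print(total)
```

If the model falls back to `tf.keras.layers.LSTM` on a CPU (see the `rnn` selection above), the LSTM row should show the plain count, 5,246,976, instead.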


To get actual predictions from the model, we need to sample from the output distribution to obtain character indices. This distribution is defined by the logits over the character vocabulary.

Note: It is important to _sample_ from this distribution as taking the _argmax_ of the distribution can easily get the model stuck in a loop.
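A small NumPy sketch of the difference, using made-up logits over a toy four-character vocabulary. `argmax` always picks the same character for the same logits. Sampling instead draws each character in proportion to its softmax probability, which is what sampling from a categorical (multinomial) distribution parameterized by the logits amounts to:

```python
import numpy as np

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

rng = np.random.RandomState(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy logits, 4-character vocabulary
probs = softmax(logits)

# argmax: deterministic, always the most likely character
greedy = int(np.argmax(logits))

# Sampling: index i is drawn with probability probs[i], so less likely
# characters still appear now and then
samples = rng.choice(len(probs), size=1000, p=probs)
counts = np.bincount(samples, minlength=len(probs))
print(greedy, probs.round(3), counts)
```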

Try it for the first example in the batch:

In [0]:
sampled_indices = tf.random.multinomial(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

This gives us, at each timestep, a prediction of the next character index:

In [32]:
sampled_indices

array([82, 76, 59, 40, 47, 39, 26, 78, 41, 31, 14, 67, 62, 59, 69, 79, 69,
       79, 29, 34, 32,  0, 30, 79,  7, 37,  9, 29, 33, 56,  0, 49, 53, 21,
       44, 19,  8, 13, 59,  4, 79, 73, 82, 31, 51,  2, 68, 23, 45, 18, 79,
       50, 47, 49, 38, 36, 28, 30, 23, 43, 28, 80, 53, 20,  3, 34, 74, 39,
       19, 22, 70, 30, 51,  5, 76, 19, 57, 33, 18, 78,  6, 72, 40, 10, 82,
       51, 55, 65, 74, 15, 78, 30, 42, 74, 36, 78, 54, 40,  1,  5])

Decode these to see the text predicted by this untrained model:

In [0]:
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

('Input: \n', '"3|]!\\n\\nX:100\\nT:Father Kelly\'s No. 1\\nZ: id:dc-reel-92\\nM:C\\nL:1/8\\nK:G Major\\nGA|B2GB AGEG|DGGF G2AB|cBAB "')
()
('Next Char Predictions: \n', '\'=\\\':XJnv3tS OX<2uz3G/\\nbnjO:(!S1:=)gCb/ldHe9S59Eu!Z\\nEt[swF^j#_4vw"/Ui.UXuFIb6ubFzZs3r<h]!j!hd:_3)64MMF\'')


## Train the model

At this point, the problem can be treated as a standard classification problem: given the previous RNN state and the input at this time step, predict the class of the next character.

In [36]:
def compute_loss(labels, logits):
  return tf.keras.backend.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = compute_loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)") 
print("scalar_loss:      ", example_batch_loss.numpy().mean())

('Prediction shape: ', TensorShape([Dimension(64), Dimension(100), Dimension(83)]), ' # (batch_size, sequence_length, vocab_size)')
('scalar_loss:      ', 4.4173813)


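We use `sparse_categorical_crossentropy` because this is a categorical classification task whose labels are integer character indices rather than one-hot vectors; `from_logits=True` tells it that the model outputs raw logits rather than probabilities.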
TODO: potentially incorporate class TODO here?
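The loss of about 4.42 printed above is roughly what an untrained model should give: if the logits are (nearly) equal across the 83 characters, the cross-entropy is ln 83 ≈ 4.42. A NumPy sketch of the same per-position quantity, computed as log-sum-exp of the logits minus the logit of the correct label (the helper name is hypothetical):

```python
import numpy as np

def sparse_ce_from_logits(labels, logits):
    """Cross-entropy per position for integer labels and unnormalized logits.

    Computes -log softmax(logits)[label] as logsumexp(logits) - logits[label].
    """
    m = logits.max(axis=-1, keepdims=True)
    log_z = np.log(np.exp(logits - m).sum(axis=-1)) + m[..., 0]
    picked = np.take_along_axis(logits, labels[..., None], axis=-1)[..., 0]
    return log_z - picked

# An "untrained" model that outputs equal logits for all 83 characters
labels = np.array([[49, 22, 13]])        # (batch, seq_len)
uniform_logits = np.zeros((1, 3, 83))    # (batch, seq_len, vocab_size)
loss = sparse_ce_from_logits(labels, uniform_logits)
print(loss.mean(), np.log(83))
```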

In [37]:

# Training step
EPOCHS = 5
optimizer = tf.train.AdamOptimizer()
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")


for epoch in range(EPOCHS):
    start = time.time()

    # Reset the LSTM state at the start of every epoch
    # (reset_states() returns None, so there is nothing to store)
    model.reset_states()

    progress_bar = tqdm(enumerate(dataset))
    for (batch_n, (inp, target)) in progress_bar:

        with tf.GradientTape() as tape:
            # Forward pass: because the LSTM is stateful, the state left over
            # from the previous batch is used as this batch's initial state
            predictions = model(inp)
            loss = compute_loss(target, predictions)

        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        progress_bar.set_description("loss {0:.2f}".format(loss.numpy().mean()))

    model.save_weights(checkpoint_prefix.format(epoch=epoch))



## Generate text

Now we want to do inference, i.e., use the trained model to generate new music.

### Restore the latest checkpoint


To keep this prediction step simple, use a batch size of 1.

Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built. 

To run the model with a different `batch_size`, we need to rebuild the model and restore the weights from the checkpoint.


In [0]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

In [39]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (1, None, 256)            21248     
_________________________________________________________________
cu_dnnlstm_1 (CuDNNLSTM)     (1, None, 1024)           5251072   
_________________________________________________________________
dense_1 (Dense)              (1, None, 83)             85075     
Total params: 5,357,395
Trainable params: 5,357,395
Non-trainable params: 0
_________________________________________________________________


### The prediction loop

The following code block generates the text:

* Start by choosing a start string, initializing the RNN state, and setting the number of characters to generate.

* Get the prediction distribution of the next character using the start string and the RNN state.

* Then, use a multinomial distribution to calculate the index of the predicted character. Use this predicted character as our next input to the model.

* The RNN state is carried over to the next step, so the model has more context than just one character. Because the LSTM is stateful, Keras keeps this state inside the model between calls: after each predicted character, the updated state is used for the next prediction, which is how the model gets more context from the previously predicted characters.


![To generate text the model's output is fed back to the input](https://tensorflow.org/tutorials/sequences/images/text_generation_sampling.png)
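The loop described above can be sketched without TensorFlow. In this hypothetical toy version, a tiny randomly initialized RNN stands in for the trained model and the vocabulary is made up; the point is only the control flow: warm up the state on the start string, sample a character, feed it back in, and carry the state forward.

```python
import numpy as np

rng = np.random.RandomState(0)
gen_vocab = list("ABCDEFG|")          # hypothetical toy vocabulary
V, H = len(gen_vocab), 16

# Random, untrained weights standing in for a trained model (illustration only)
W_in = rng.normal(scale=0.5, size=(V, H))
W_rec = rng.normal(scale=0.5, size=(H, H))
W_out = rng.normal(scale=0.5, size=(H, V))

def step(idx, h):
    """One time step: read a character index, return (next-char logits, new state)."""
    h = np.tanh(W_in[idx] + h.dot(W_rec))
    return h.dot(W_out), h

def generate(start_string, generation_length):
    """Assumes a non-empty start_string made of gen_vocab characters."""
    h = np.zeros(H)                    # fresh state, like model.reset_states()
    for ch in start_string:            # warm up the state on the start string
        logits, h = step(gen_vocab.index(ch), h)
    generated = []
    for _ in range(generation_length):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        idx = rng.choice(V, p=probs)   # sample the next character
        generated.append(gen_vocab[idx])
        logits, h = step(idx, h)       # feed it back in, carrying the state h
    return start_string + "".join(generated)

print(generate("A", 20))
```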

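Looking at the generated text, you'll see that the model has picked up the structure of ABC notation: header fields such as `X:`, `T:`, `M:`, `L:` and `K:`, bar lines, and repeat signs. With the small number of training epochs, it has not yet learned to use this structure consistently; for example, some header fields are malformed (`K:G Mojor`) and some tunes break off mid-line.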

In [0]:
def generate_text(model, start_string, generation_length=1000):
  # Evaluation step (generating text using the learned model)

  # Converting our start string to numbers (vectorizing) 
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty list to store the generated characters
  text_generated = []

  # Here batch size == 1
  model.reset_states()
  for i in tqdm(range(generation_length)):
      predictions = model(input_eval)
      
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # Sample the next character index from a categorical (multinomial)
      # distribution over the logits; [-1, 0] keeps the sample for the last time step
      predicted_id = tf.multinomial(predictions, num_samples=1)[-1,0].numpy()

      # Pass the predicted character as the next input to the model; the
      # stateful LSTM keeps its hidden state from the previous call
      input_eval = tf.expand_dims([predicted_id], 0)
      
      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

In [49]:
# Experiment by changing the start string
print(generate_text(model, start_string="X"))

X8">L:llissdy id:hss
Z: L:dcdc-reel-f455
M:C
:1/8
K:G Major
K:G/G AcB Ac|cAd cBA|A3 D2:|!

X:372oTenos Novn
Z:
d:dce|afag g2ed:|!
e|gefg bgag|agbe fge2|dBBG BGAG|dBAG DBGB|AEFG ogf|d2Bg edg2|!
cdBd efga|efef e2f|BFB f2gufag gfa|fagb ed:|!
faa be2|fedg bef2|g2fd efge|gegf ggg:|!

X:18
T:Boste Shhesor
Z:16
L:1/8
K:G Mojor
GF2|c2fd efd|FAD DEF|EDA DFG ecBA|EDD G2:|!
GEG FADG|G3AG GGA2|ADFG EFA2:|!
fefg e2de|AcAF|GEGF E2AG|FABA d2d:|!
A2de a2ge|ced2 edfd|gfd^c d2:|!
A
BA|FAc2 (3aA|ced cA AF=|dBA cEA|EFE D2:|!

X:148
T:Gasopeoros
Z: i:dc--1iear
e,2
M:6/8
T: Maony
 F|F'D2 DFG|AFA ec|efd G2E|!
AGB c^dBc|e2a^ Aca^g|(faeg dgef|d4AB dBcA|!
d^cdd cdBA|(AedB defe|ffde fdAB|!
AAce^c2ccB ggfe|b2d2 gfd (eAFG E2dB|!
f3^d f2ge|edegB d2cB|GDcc d2B|!
gfaf agfg|gaaf agde|!
AGF2 G2B|efdc f2EB|cAdB cAFD|A2dBd geg2|!
B2cA ^ggg|f3g geed FCZ:1s/8
L:1/8
K:D Major
Z:2 itddc-
M:1/8
L:1/8
K:EMaMan
an cafe|E2Ac BcBd|fBGA FdcA|edgf (fdeg aga|fgf fef|!
d2ge g2ed|afaf gedf|afg^dB AABc|!
ege agf|de^a dfe|gaf efe|fee ed

The easiest thing you can do to improve the results is to train for longer (try `EPOCHS=30`).

You can also experiment with a different start string, try adding another RNN layer to improve the model's accuracy, or divide the logits by a temperature before sampling to generate more or less random predictions.
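A minimal NumPy sketch of temperature sampling, with made-up logits. Dividing the logits by a temperature below 1 sharpens the distribution (more conservative output), while a temperature above 1 flattens it (more random output). In `generate_text`, the same idea would amount to dividing `predictions` by a temperature just before the `tf.multinomial` call; this is a suggested modification, not part of the lab's code.

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())   # shift by the max for stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.RandomState(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])   # toy logits
cold = [sample_with_temperature(logits, 0.05, rng) for _ in range(200)]
hot = [sample_with_temperature(logits, 5.0, rng) for _ in range(200)]
print(np.bincount(cold, minlength=4), np.bincount(hot, minlength=4))
```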