<table align="center">
  <td align="center">
    <a target="_blank" href="http://inspiredk.org">
    <img align="center" src="https://i.ibb.co/Z6HZPSbH/Inspired-K-org-Logo-No-Whitespace-Extra-Small.png">InspiredK.org Website</a>
  </td>
  
  <td align="center">
    <a target="_blank" href="https://colab.research.google.com/github/InspiredK-organization/MITintrotodeeplearning/blob/master/lab1/solutions/Lab1.4 - Music Generation with RNNs and PyTorch Solution.ipynb">
    <img align="center" src="https://i.ibb.co/2P3SLwK/colab.png"/>Run in Google Colab</a>
  </td>
</table>

# Copyright Information

In [None]:
# Copyright 2025 MIT Introduction to Deep Learning. All Rights Reserved.
#
# Licensed under the MIT License. You may not use this file except in compliance
# with the License. Use and/or modification of this code outside of MIT Introduction
# to Deep Learning must reference:
#
# © MIT Introduction to Deep Learning
# http://introtodeeplearning.com
#
# Original lab is adapted from http://introtodeeplearning.com
# Lab is edited by http://InspiredK.org

# Lab 1: Intro to PyTorch and Music Generation with RNNs

# Part 2: Music Generation with RNNs

In this portion of the lab, we will explore building a Recurrent Neural Network (RNN) for music generation using PyTorch. We will train a model to learn the patterns in raw sheet music in [ABC notation](https://en.wikipedia.org/wiki/ABC_notation) and then use this model to generate new music.

## 2.1 Dependencies
First, let's install the MIT Introduction to Deep Learning package and the other dependencies, and import the relevant packages we'll need for this lab.

In [None]:
# Import PyTorch and other relevant libraries
import torch
import torch.nn as nn
import torch.optim as optim

# Download and import the MIT Introduction to Deep Learning package
!pip install mitdeeplearning --quiet
import mitdeeplearning as mdl

import numpy as np # For nparrays
import os # For filepath joining
import time # Dependency for timing of different tasks
import functools # Dependency for functions that output other functions
from IPython import display as ipythondisplay # For built-in song playback
from tqdm import tqdm # For textual progress bars
from scipy.io.wavfile import write # Dependency for creation of downloadable music files
!apt-get install abcmidi timidity > /dev/null 2>&1 # Dependencies for conversion between text and audio music formats

# Check that you are using a GPU. If not, switch runtimes using Runtime > Change Runtime Type > GPU
assert torch.cuda.is_available(), "Please enable GPU from runtime settings"

## 2.2 Dataset

![Let's Dance!](http://33.media.tumblr.com/3d223954ad0a77f4e98a7b87136aa395/tumblr_nlct5lFVbF1qhu7oio1_500.gif)

We've gathered a dataset of thousands of Irish folk songs, represented in the ABC notation. Let's download the dataset and inspect it:


In [None]:
# Download the dataset
songs = mdl.lab1.load_training_data()

# Print one of the songs to inspect it in greater detail!
example_song = songs[0]
print("\nExample song: ")
print(example_song)

We can easily convert a song in ABC notation to an audio waveform and play it back. Be patient while this conversion runs; it can take some time.

In [None]:
# Convert the ABC notation to audio file and listen to it.
mdl.lab1.play_song(example_song)

One important thing to think about is that this notation of music does not simply contain information on the notes being played, but additionally there is meta information such as the song title, key, and tempo. How does the number of different characters that are present in the text file impact the complexity of the learning problem? This will become important soon, when we generate a numerical representation for the text data.

In [None]:
# Join our list of song strings into a single string containing all songs.
songs_joined = "\n\n".join(songs)

# Find all unique characters in the joined string.
vocab = sorted(set(songs_joined))
print("There are", len(vocab), "unique characters in the dataset")

## 2.3 Process the dataset for the learning task

Let's take a step back and consider our prediction task. We're trying to train an RNN model to learn patterns in ABC music, and then use this model to generate (i.e., predict) a new piece of music based on this learned information.

Breaking this down, what we're really asking the model is: given a character, or a sequence of characters, what is the most probable next character? We'll train the model to perform this task.

To achieve this, we will input a sequence of characters to the model, and train the model to predict the output, that is, the following character at each time step. RNNs maintain an internal state that depends on previously seen elements, so information about all characters seen up until a given moment will be taken into account in generating the prediction.

### Vectorize the text

Before we begin training our RNN model, we'll need to create a numerical representation of our text-based dataset. To do this, we'll generate two lookup tables: one that maps characters to numbers, and a second that maps numbers back to characters. Recall that we just identified the unique characters present in the text.
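To make the two lookup tables concrete, here is a minimal sketch on a toy string. The names `toy_text`, `toy_vocab`, `toy_char2idx`, and `toy_idx2char` are illustrative assumptions and are not part of the lab; the sketch uses a plain Python list for the reverse mapping, whereas the lab itself uses a NumPy array.

```python
# Toy illustration of the two lookup tables; all names here are hypothetical.
toy_text = "X:1\nK:D"

# Sorted unique characters form the vocabulary.
toy_vocab = sorted(set(toy_text))

# Character -> index, and index -> character (a list indexed by position).
toy_char2idx = {ch: i for i, ch in enumerate(toy_vocab)}
toy_idx2char = list(toy_vocab)

# Encode the text as integers, then decode it back to a string.
encoded = [toy_char2idx[ch] for ch in toy_text]
decoded = "".join(toy_idx2char[i] for i in encoded)

print(encoded)
print(repr(decoded))
```

Decoding the encoded sequence should recover the original string exactly, which is a handy sanity check for any vectorization function.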


In [None]:
### Define numerical representation of text ###

# Create a mapping from character to unique index.
# For example, to get the index of the character "d", we can use `char2idx["d"]`.
char2idx = {u:i for i, u in enumerate(vocab)}

# Create a mapping from indices to characters. This is the inverse of char2idx and allows us to convert back from unique index to the character in our vocabulary.
idx2char = np.array(vocab)

This gives us an integer representation for each character. Observe that the unique characters (i.e., our vocabulary) in the text are mapped to indices from 0 to `len(vocab) - 1`. Let's take a peek at this numerical representation of our dataset:

In [None]:
print('{')
for char, _ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char])) # Display each character with its respective index.
print('  ...\n}')

In [None]:
### Vectorize the songs string ###

'''TODO: Write a function to convert the all-songs string to a vectorized (i.e., numeric) representation. Use the appropriate mapping above to convert from vocab characters to the corresponding indices.
   NOTE: the output of the `vectorize_string` function should be a np.array with `N` elements, where `N` is the number of characters in the input string
'''
def vectorize_string(string):
  vectorized_output = np.array([char2idx[char] for char in string]) # Use numpy to store vectorized string as nparray.
  return vectorized_output

# def vectorize_string(string):
  # TODO

vectorized_songs = vectorize_string(songs_joined)

We can also look at how the first part of the text is mapped to an integer representation:


In [None]:
print('{} ---- characters mapped to int ----> {}'.format(repr(songs_joined[:10]), vectorized_songs[:10])) # Visualize string to vectorized string transformation.
assert isinstance(vectorized_songs, np.ndarray), "returned result should be a numpy array"

### Create training examples and targets

Our next step is to actually divide the text into example sequences that we'll use during training. Each input sequence that we feed into our RNN will contain `seq_length` characters from the text. We'll also need to define a target sequence for each input sequence, which will be used in training the RNN to predict the next character. For each input, the corresponding target will contain the same length of text, except shifted one character to the right.

To do this, we'll break the text into chunks of `seq_length+1`. Suppose `seq_length` is 4 and our text is "Hello". Then, our input sequence is "Hell" and the target sequence is "ello".

The batch method will then let us convert this stream of character indices to sequences of the desired size.
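As a small sketch of the "Hello" example above, the shift between input and target can be written with plain string slicing. The variable names are illustrative only, and the lab's own batching works on index arrays rather than strings.

```python
# Toy illustration of input/target construction; names are hypothetical.
text = "Hello"
seq_length = 4

# Take a chunk of seq_length + 1 characters starting at position 0.
chunk = text[0 : seq_length + 1]

# The input drops the last character; the target drops the first.
input_seq = chunk[:-1]
target_seq = chunk[1:]

# At each time step t, the model sees input_seq[t] and should predict target_seq[t].
for t, (inp, tgt) in enumerate(zip(input_seq, target_seq)):
    print(f"step {t}: input {inp!r} -> target {tgt!r}")
```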


In [None]:
### Batch definition to create training examples ###

def get_batch(vectorized_songs, seq_length, batch_size):
  # Get the length of the vectorized song string.
  n = vectorized_songs.shape[0] - 1
  # Randomly choose `batch_size` starting indices from a range of 0 to `n-seq_length` for the training batch.
  idx = np.random.choice(n-seq_length, batch_size)

  '''TODO: Construct a list of input sequences for the training batch.'''
  input_batch = [vectorized_songs[i : i+seq_length] for i in idx] # Input sequences to the model
  # input_batch = # TODO
  '''TODO: Construct a list of output sequences for the training batch.'''
  output_batch = [vectorized_songs[i+1 : i+seq_length+1] for i in idx] # Expected output sequences the model should predict
  # output_batch = # TODO

  # x_batch and y_batch provide the inputs and expected outputs for network training in the correct shape.
  x_batch = np.reshape(input_batch, [batch_size, seq_length])
  y_batch = np.reshape(output_batch, [batch_size, seq_length])
  return x_batch, y_batch

# Perform some tests to make sure the batch function is working properly.
test_args = (vectorized_songs, 10, 2)
if not mdl.lab1.test_batch_func_types(get_batch, test_args) or \
   not mdl.lab1.test_batch_func_shapes(get_batch, test_args) or \
   not mdl.lab1.test_batch_func_next_step(get_batch, test_args):
   print("======\n[FAIL] could not pass tests")
else:
   print("======\n[PASS] passed all tests!")

For each of these vectors, each index is processed at a single time step. So, for the input at time step 0, the model receives the index for the first character in the sequence, and tries to predict the index of the next character. At the next timestep, it does the same thing, but the RNN considers the information from the previous step, i.e., its updated state, in addition to the current input.

We can make this concrete by taking a look at how this works over the first several characters in our text:

In [None]:
x_batch, y_batch = get_batch(vectorized_songs, seq_length=5, batch_size=1) # Create a test batch with 5 characters.

for i, (input_idx, target_idx) in enumerate(zip(x_batch[0], y_batch[0])): # Visualize each character in the x and y batches.
    print("Step {:3d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(str(idx2char[input_idx]))))
    print("  expected output: {} ({:s})".format(target_idx, repr(str(idx2char[target_idx]))))

## 2.4 The Recurrent Neural Network (RNN) model

Now we're ready to define and train an RNN model on our ABC music dataset, and then use that trained model to generate a new song. We'll train our RNN using batches of song snippets from our dataset, which we generated in the previous section.

The model is based off the LSTM architecture, where we use a state vector to maintain information about the temporal relationships between consecutive characters. The final output of the LSTM is then fed into a fully connected linear [`nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) layer where we'll output a softmax over each character in the vocabulary, and then sample from this distribution to predict the next character.

As we introduced in the first portion of this lab, we'll be using PyTorch's [`nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) to define the model. Three components are used to define the model:

* [`nn.Embedding`](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html): This is the input layer, consisting of a trainable lookup table that maps the numbers of each character to a vector with `embedding_dim` dimensions.
* [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html): Our LSTM network, with size `hidden_size`.
* [`nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html): The output layer, with `vocab_size` outputs.
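To see how these three components fit together, here is a rough sketch that pushes a random batch of indices through each layer and prints the resulting shapes. The toy sizes are assumptions picked for readability and are not the lab's hyperparameters.

```python
import torch
import torch.nn as nn

# Toy sizes chosen only for illustration; they are not the lab's hyperparameters.
toy_vocab_size, toy_embedding_dim, toy_hidden_size = 10, 8, 16
toy_batch_size, toy_seq_length = 2, 5

embedding = nn.Embedding(toy_vocab_size, toy_embedding_dim)
lstm = nn.LSTM(toy_embedding_dim, toy_hidden_size, batch_first=True)
fc = nn.Linear(toy_hidden_size, toy_vocab_size)

# A batch of random character indices, shape (batch, seq).
x = torch.randint(0, toy_vocab_size, (toy_batch_size, toy_seq_length))

embedded = embedding(x)               # (batch, seq, embedding_dim)
lstm_out, (h, c) = lstm(embedded)     # lstm_out: (batch, seq, hidden_size)
logits = fc(lstm_out)                 # (batch, seq, vocab_size)

print(embedded.shape, lstm_out.shape, logits.shape)
```

Note that the linear layer is applied at every time step, so the model produces one vector of `vocab_size` scores per input character.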

<img src="https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2019/lab1/img/lstm_unrolled-01-01.png" alt="Drawing"/>




### Define the RNN model

Let's define our model as an `nn.Module`. Fill in the `TODOs` to define the RNN model.


In [None]:
### Defining the RNN Model ###

'''TODO: Add LSTM and Linear layers to define the RNN model using nn.Module'''
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_size):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size

        # Layer 1: The embedding layer to transform indices into dense vectors of a fixed embedding size.
        self.embedding = nn.Embedding(vocab_size, embedding_dim)

        # Layer 2: The LSTM layer with hidden_size `hidden_size`. Note: the number of layers defaults to 1.
        # TODO: Use the nn.LSTM() module from PyTorch.
        self.lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True) # Long short-term memory (LSTM) is good for tasks that require a long context memory.
        # self.lstm = nn.LSTM('''TODO''')

        # Layer 3: The linear (fully-connected) layer that transforms the LSTM output into the vocabulary size.
        # TODO: Add the Linear layer.
        self.fc = nn.Linear(hidden_size, vocab_size)
        # self.fc = nn.Linear('''TODO''')

    def init_hidden(self, batch_size, device):
        # Initialize the hidden state and cell state with all zeros.
        return (torch.zeros(1, batch_size, self.hidden_size).to(device),
                torch.zeros(1, batch_size, self.hidden_size).to(device))

    # Pass an input of x all the way through the model to get a probability distribution output.
    def forward(self, x, state=None, return_state=False):
        x = self.embedding(x) # Embed the input using our embedding layer.

        if state is None: # If the hidden state has not been created yet,
            state = self.init_hidden(x.size(0), x.device) # Then initialize it with our previous function.
        out, state = self.lstm(x, state) # Use the LSTM to update the state and get an output.

        out = self.fc(out) # Get the final output from the linear layer.
        return out if not return_state else (out, state)

The time has come! Let's instantiate the model!

In [None]:
# Build a simple model with default hyperparameters. You will get the chance to change these later.
vocab_size = len(vocab)
embedding_dim = 256
hidden_size = 1024
batch_size = 32

device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # Use the GPU if one is available; otherwise fall back to the CPU.

model = LSTMModel(vocab_size, embedding_dim, hidden_size).to(device) # Initialize the model using the GPU.

# Print out a summary of the model.
print(model)

### Test out the RNN model

It's always a good idea to run a few simple checks on our model to see that it behaves as expected.  

We can quickly check the layers in the model, the shape of the output of each of the layers, the batch size, and the dimensionality of the output. Note that the model can be run on inputs of any length.

In [None]:
# Test the model with some sample data
x, y = get_batch(vectorized_songs, seq_length=100, batch_size=32) # Get a batch with 32 sequences of 100 characters each for a test prediction.
# Put the input on our GPU to be passed to the model.
x = torch.tensor(x).to(device)
y = torch.tensor(y).to(device)

pred = model(x) # Pass the input batch into the model for prediction.
print("Input shape:      ", x.shape, " # (batch_size, sequence_length)")
print("Prediction shape: ", pred.shape, "# (batch_size, sequence_length, vocab_size)")

### Predictions from the untrained model

Let's take a look at what our untrained model is predicting.

To get actual predictions from the model, we sample from the output distribution, which is defined by a torch.softmax over our character vocabulary. This will give us actual character indices. This means we are using a [categorical distribution](https://en.wikipedia.org/wiki/Categorical_distribution) to sample over the example prediction. This gives a prediction of the next character (specifically its index) at each timestep. [`torch.multinomial`](https://pytorch.org/docs/stable/generated/torch.multinomial.html#torch.multinomial) samples over a categorical distribution to generate predictions.

Note here that we sample from this probability distribution, as opposed to simply taking the `argmax`, which can cause the model to get stuck in a repetitive loop.
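The difference between sampling and taking the `argmax` can be sketched on a made-up distribution. The logits below are hypothetical values, not outputs of the lab's model.

```python
import torch

torch.manual_seed(0)  # Fixed seed so the sketch is repeatable.

# Hypothetical logits over a 4-character vocabulary for a single time step.
toy_logits = torch.tensor([2.0, 1.5, 0.5, 0.1])
probs = torch.softmax(toy_logits, dim=-1)

# argmax always returns the same index for the same distribution.
greedy = [int(torch.argmax(probs)) for _ in range(10)]

# multinomial draws indices in proportion to their probabilities.
sampled = torch.multinomial(probs, num_samples=10, replacement=True).tolist()

print("greedy: ", greedy)
print("sampled:", sampled)
```

The greedy list repeats a single index, while the sampled list will typically contain a mix of indices, which is what gives generated text its variety.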

Let's try this sampling out for the first example in the batch.

In [None]:
sampled_indices = torch.multinomial(torch.softmax(pred[0], dim=-1), num_samples=1)
sampled_indices = sampled_indices.squeeze(-1).cpu().numpy()
sampled_indices # One character index per time step, sampled from the predicted probability distribution.

We can now decode these to see the text predicted by the untrained model:

In [None]:
print("Input: \n", repr("".join(idx2char[x[0].cpu()]))) # An example input of the first sequence from our batch of 32.
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices]))) # The predicted output

As you can see, the text predicted by the untrained model is pretty nonsensical! How can we do better? Well, we can train the network!

## 2.5 Training the model: loss and training operations

Now it's time to train the model!

At this point, we can think of our next character prediction problem as a standard classification problem. Given the previous state of the RNN, as well as the input at a given time step, we want to predict the class of the next character -- that is, to actually predict the next character.

To train our model on this classification task, we can use a form of the `crossentropy` loss (i.e., negative log likelihood loss). Specifically, we will use PyTorch's [`CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html), as it combines the application of a log-softmax ([`LogSoftmax`](https://pytorch.org/docs/stable/generated/torch.nn.LogSoftmax.html#torch.nn.LogSoftmax)) and negative log-likelihood ([`NLLLoss`](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss)) in a single class and accepts integer targets for categorical classification tasks. We will want to compute the loss using the true targets -- the `labels` -- and the model's raw, unnormalized predictions -- the `logits`.
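As a quick check of that claim, the sketch below compares `CrossEntropyLoss` against an explicit `LogSoftmax` followed by `NLLLoss` on random toy tensors. The tensor sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy logits for 6 predictions over a 5-character vocabulary, with integer targets.
toy_logits = torch.randn(6, 5)
toy_targets = torch.randint(0, 5, (6,))

# One-step version: CrossEntropyLoss takes raw logits directly.
ce = nn.CrossEntropyLoss()(toy_logits, toy_targets)

# Two-step version: log-softmax followed by negative log-likelihood.
log_probs = nn.LogSoftmax(dim=-1)(toy_logits)
nll = nn.NLLLoss()(log_probs, toy_targets)

print(ce.item(), nll.item())
print(torch.allclose(ce, nll))
```

Because the softmax is applied inside the loss, the model's output layer should emit raw logits rather than probabilities.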

Let's define a function to compute the loss, and then use that function to compute the loss using our example predictions from the untrained model.

In [None]:
### Defining the loss function ###

'''TODO: Define the compute_loss function to compute and return the loss between the true labels and predictions (logits). '''
cross_entropy = nn.CrossEntropyLoss() # Expects raw logits and integer class targets; averages the loss over all predictions by default.
def compute_loss(labels, logits):
    """Inputs:
    * (batch_size, sequence_length) - labels
    * (batch_size, sequence_length, vocab_size) - logits

    Output:
    * scalar cross entropy loss over the batch and sequence length - loss
    """

    # Put the labels into batches so that their shape is (B * L).
    batched_labels = labels.view(-1)

    # Put the logits into batches so that their shape is (B * L, V).
    batched_logits = logits.view(-1, logits.size(-1))

    '''TODO: Compute the cross-entropy loss using the batched next characters and predictions. Hint: It is defined directly above the function.'''
    loss = cross_entropy(batched_logits, batched_labels)
    # loss = # TODO
    return loss

In [None]:
# Compute the loss on the predictions from the untrained model.

'''TODO: Compute the loss using the true next characters from the example batch
    and the predictions from the untrained model we created before the loss function.'''
example_batch_loss = compute_loss(y, pred)
# example_batch_loss = compute_loss('''TODO''', '''TODO''') # TODO

print(f"Prediction shape: {pred.shape} # (batch_size, sequence_length, vocab_size)")
print(f"scalar_loss:      {example_batch_loss.mean().item()}")

Let's start by defining some hyperparameters for training the model. To start, we have provided some reasonable values for some of the parameters. It is up to you to use what we've learned in class to help optimize the parameter selection here!

In [None]:
### Hyperparameter setting and optimization ###

vocab_size = len(vocab)

# Model parameters:
params = dict(
  num_training_iterations = 5000,  # Increase this to train longer
  batch_size = 16,  # Experiment between 1 and 64
  seq_length = 100,  # Experiment between 50 and 500
  learning_rate = 1e-4,  # Experiment between 1e-5 and 1e-1
  embedding_dim = 256,
  rnn_units = 1024,  # Experiment between 1 and 2048
)

# Create a file to store our model's weights.
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "my_ckpt.weights.h5")
os.makedirs(checkpoint_dir, exist_ok=True)

Now, we are ready to define our training operation -- the optimizer and duration of training -- and use this function to train the model. You will experiment with the choice of optimizer and the duration for which you train your models, and see how these changes affect the network's output. Some optimizers you may like to try are [`Adam`](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) and [`Adagrad`](https://pytorch.org/docs/stable/generated/torch.optim.Adagrad.html).

First, we will instantiate a new model and an optimizer, and ready them for training. Then, we will use [`loss.backward()`](https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html), enabled by PyTorch's [autograd](https://pytorch.org/docs/stable/generated/torch.autograd.grad.html) engine, to perform the backpropagation. Finally, to update the model's parameters based on the computed gradients, we will take a step with the optimizer, using [`optimizer.step()`](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.step.html).

We will also generate a print-out of the model's progress through training, which will help us easily visualize whether or not we are minimizing the loss.
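Before filling in the full training cell, it may help to see the same four-step pattern (zero the gradients, forward pass, backward pass, optimizer step) on a deliberately tiny problem. Everything here is a stand-in: `toy_model` is a single linear layer rather than the lab's LSTM, and the data is random.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A deliberately tiny stand-in model and a fixed toy batch (both hypothetical).
toy_model = nn.Linear(4, 3)
toy_x = torch.randn(8, 4)
toy_y = torch.randint(0, 3, (8,))

loss_fn = nn.CrossEntropyLoss()
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-2)

toy_history = []
for step in range(50):
    toy_optimizer.zero_grad()                  # Clear gradients from the previous step.
    loss = loss_fn(toy_model(toy_x), toy_y)    # Forward pass and loss.
    loss.backward()                            # Backpropagate to compute gradients.
    toy_optimizer.step()                       # Update the parameters.
    toy_history.append(loss.item())

print(f"first loss: {toy_history[0]:.4f}, last loss: {toy_history[-1]:.4f}")
```

Storing `loss.item()` rather than the loss tensor keeps the history as plain floats, so the computation graph from each step can be freed.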

In [None]:
### Define optimizer and training operation ###

'''TODO: instantiate a new LSTMModel model for training using the hyperparameters
    created above.'''
model = LSTMModel(vocab_size, params["embedding_dim"], params["rnn_units"])
# model = LSTMModel('''TODO: arguments''')

# Move the model to the GPU.
model.to(device)

'''TODO: Create an optimizer with the set learning rate.
  The PyTorch website has a list of all the optimizers available, try some others to see what works best.
  https://pytorch.org/docs/stable/optim.html
  Try using the Adam optimizer to start.'''
optimizer = torch.optim.Adam(model.parameters(), lr=params["learning_rate"])
# optimizer = # TODO

def train_step(x, y):
  # Set the model's mode to the training stage.
  model.train()

  # Reset all gradients for every training step to ensure nothing interferes with our current step.
  optimizer.zero_grad()

  '''TODO: Feed the provided input into the model and generate predictions'''
  y_hat = model(x) # TODO
  # y_hat = model('''TODO''')

  '''TODO: Compute the loss based on these predictions.'''
  loss = compute_loss(y, y_hat) # TODO
  # loss = compute_loss('''TODO''', '''TODO''')

  '''TODO: Complete the gradient computation and update the model using the optimizer and the gradient.
    The steps to do this are:
      1. Backpropagating the loss with .backward()
      2. Update the model parameters using the optimizer with .step()
  '''
  loss.backward() # TODO
  optimizer.step() # TODO

  return loss

##################
# Begin training!#
##################

history = [] # Create a list to store all losses for graphing.
plotter = mdl.util.PeriodicPlotter(sec=2, xlabel='Iterations', ylabel='Loss') # Initialize the graph that will continuously update for every new loss.

if hasattr(tqdm, '_instances'): tqdm._instances.clear() # Clear any previous progress bars if they exist.
for iter in tqdm(range(params["num_training_iterations"])): # Create a new progress bar to track training progress.

    # Grab a batch and propagate it through the network.
    x_batch, y_batch = get_batch(vectorized_songs, params["seq_length"], params["batch_size"])

    # Convert numpy arrays to PyTorch tensors to be compatible with the model.
    x_batch = torch.tensor(x_batch).to(device)
    y_batch = torch.tensor(y_batch).to(device)

    loss = train_step(x_batch, y_batch) # Use the batch to optimize the weights using the previous function.

    # Update the progress bar and also visualize within notebook.
    history.append(loss.item()) # Add the computed loss to the graphing list.
    plotter.plot(history) # Plot the new graph for the new loss.

    # Periodically save a checkpoint of the current weights.
    if iter % 100 == 0:
        torch.save(model.state_dict(), checkpoint_prefix) # Every 100 training iterations, save the weights.
                                              # If training is interrupted, the most recently saved weights can still be used for music generation.

# Save the trained model and the weights.
torch.save(model.state_dict(), checkpoint_prefix) # If training is not interrupted, save the final weights. They will be used for music generation later.

## 2.6 Generate music using the RNN model

Now, we can use our trained RNN model to generate some music! When generating music, we'll have to feed the model some sort of seed to get it started (because it can't predict anything without something to start with!).

Once we have a generated seed, we can then iteratively predict each successive character (remember, we are using the ABC representation for our music) using our trained RNN. More specifically, recall that our RNN outputs a `softmax` over possible successive characters. For inference, we iteratively sample from these distributions, and then use our samples to encode a generated song in the ABC format.

Then, all we have to do is write it to a file and listen!

### The prediction procedure

Now, we're ready to write the code to generate text in the ABC music format:

* Initialize a "seed" start string and the RNN state, and set the number of characters we want to generate.

* Use the start string and the RNN state to obtain the probability distribution over the next predicted character.

* Sample from a multinomial distribution to obtain the index of the predicted character. This predicted character is then used as the next input to the model.

* At each time step, the updated RNN state is fed back into the model, so that it now has more context in making the next prediction. After predicting the next character, the updated RNN states are again fed back into the model, which is how it learns sequence dependencies in the data, as it gets more information from the previous predictions.
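The steps above can be sketched as a short loop. This is only an illustration of the control flow under stated assumptions: the layers are tiny and untrained, the sizes and seed indices are made up, and the output is a list of indices rather than ABC text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny untrained stand-in layers (hypothetical sizes), used only to show the loop structure.
toy_vocab_size, toy_hidden_size = 6, 12
embedding = nn.Embedding(toy_vocab_size, 4)
lstm = nn.LSTM(4, toy_hidden_size, batch_first=True)
fc = nn.Linear(toy_hidden_size, toy_vocab_size)

seed_indices = torch.tensor([[0, 3]])  # Shape (1, seed_length): a seed of two character indices.
state = None                           # nn.LSTM initializes a zero state when given None.
generated = []

with torch.no_grad():
    inputs = seed_indices
    for _ in range(8):
        out, state = lstm(embedding(inputs), state)   # Carry the updated state forward.
        logits = fc(out[:, -1, :])                    # Logits for the last time step only.
        next_idx = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)
        generated.append(next_idx.item())
        inputs = next_idx                             # The sample becomes the next input, shape (1, 1).

print(generated)
```

The key detail is that `state` is reassigned on every iteration; if the returned state were discarded, each prediction would depend only on the single most recent character.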

![LSTM inference](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2019/lab1/img/lstm_inference.png)

Complete and experiment with this code block (as well as some of the aspects of network definition and training!), and see how the model performs. How do songs generated after training with a small number of epochs compare to those generated after a longer duration of training?

In [None]:
### Prediction of a generated song ###

def generate_text(model, start_string, generation_length=1000):
  # Evaluation step (generating ABC text using the learned RNN model)

  '''TODO: Convert the start string to numbers (vectorize)'''
  input_eval = [char2idx[s] for s in start_string] # Vectorize the given start string.
  # input_eval = ['''TODO''']
  input_eval = torch.tensor([input_eval], dtype=torch.long).to(device) # Change input shape for compatibility with the generative model.

  text_generated = [] # Create an empty list to store all generated characters.

  state = model.init_hidden(input_eval.size(0), device) # Makes sure that previous LSTM predictions don't interfere with this one.
  if hasattr(tqdm, '_instances'): tqdm._instances.clear() # Clear any previous progress bars if they exist.

  for i in tqdm(range(generation_length)): # Create a new progress bar to track generation progress.
    '''TODO: Evaluate the inputs and generate the next character predictions.'''
    predictions, state = model(input_eval, state, return_state=True) # Get predictions for the current input and carry the updated LSTM state into the next iteration.
    # predictions, state = model('''TODO''', '''TODO''', return_state=True)

    predictions = predictions[:, -1, :] # Keep only the logits for the last time step, shape (1, vocab_size), so start strings longer than one character also work.

    '''TODO: Use a multinomial distribution to sample.'''
    input_eval = torch.multinomial(torch.softmax(predictions, dim=-1), num_samples=1) # As before, sample the next character index from the predicted probability distribution.
    # input_eval = torch.multinomial(torch.softmax('''TODO''', dim=-1), num_samples=1)

    '''TODO: Add the predicted character to the generated text.'''
    # Hint: Consider what format the prediction is in vs. the output.
    text_generated.append(idx2char[input_eval].item()) # Also add the predicted character to the generation list.
    # text_generated.append('''TODO''')

  return (start_string + ''.join(text_generated))  # Once generation is finished, output the entire generated text.

In [None]:
'''TODO: Use the model and the function defined above to generate ABC format text of length 1000!
    As you may notice, ABC files start with "X" - this may be a good start string.'''
generated_text = generate_text(model, start_string="X", generation_length=1000) # Generate 1000 characters with a start string of "X."
                                                                                # "X" is usually at the start of our music files, making it a good choice.
# generated_text = generate_text('''TODO''', start_string="X", generation_length=1000)

### Play back the generated music!

We can now call a function to convert the ABC format text to an audio file, and then play that back to check out our generated music! Try training longer if the resulting song is not long enough, or re-generating the song!

We will save the song to the Files area in Google Colab -- you will be able to find your songs in the left sidebar by clicking on the folder icon.

In [None]:
### Play back generated songs ###

generated_songs = mdl.lab1.extract_song_snippet(generated_text) # Separate the generated text into individual songs.

for i, song in enumerate(generated_songs):
  waveform = mdl.lab1.play_song(song) # Convert the song from text format to an audio file.

  if waveform: # If the song is in the correct format, play it.
    print("Generated song", i) # Identify each song with a number.
    ipythondisplay.display(waveform) # Display the audio file in the output to be played without the need for downloading.

    # Save the song to Google Colab's files if you would like to download your songs.
    numeric_data = np.frombuffer(waveform.data, dtype=np.int16)
    wav_file_path = f"output_{i}.wav"
    write(wav_file_path, 88200, numeric_data)

## 2.7 Experiment and try to make the best songs!

Congrats on making your first sequence model in PyTorch! It's a pretty big accomplishment, and hopefully you have some sweet tunes to show for it.

Consider how you may improve your model and what seems to be most important in terms of performance. Here are some ideas to get you started:

*  How does the number of training epochs affect the performance?
*  What if you alter or augment the dataset?
*  Does the choice of start string significantly affect the result?

Have fun and happy listening!

![Let's Dance!](http://33.media.tumblr.com/3d223954ad0a77f4e98a7b87136aa395/tumblr_nlct5lFVbF1qhu7oio1_500.gif)
