<a href="https://colab.research.google.com/github/Mehulgoyal353/Music-Generation-using-RNN/blob/main/Music_generation_using_RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Importing required libraries
The model is trained to learn the patterns in raw sheet music written in [ABC notation](https://en.wikipedia.org/wiki/ABC_notation) and is then used to generate new music.

Comet is used to track model development and training runs. A personal API key can be generated after logging in to [Comet ML](https://www.comet.com/docs/v2/).

In [None]:
!pip install comet_ml > /dev/null 2>&1
import comet_ml
COMET_API_KEY = ""

import tensorflow as tf
!pip install mitdeeplearning --quiet
import mitdeeplearning as mdl

import numpy as np
import os
import time
import functools
from IPython import display as ipythondisplay
from tqdm import tqdm
from scipy.io.wavfile import write
!apt-get install abcmidi timidity > /dev/null 2>&1

# Fail early if the API key has not been filled in above
assert COMET_API_KEY != "", "Please insert your Comet API key"

#Downloading and inspecting the dataset
The dataset is a large collection of Irish music. It is downloaded using the mdl library.

In [None]:
# Download the dataset
songs = mdl.lab1.load_training_data()

# Print one of the songs to inspect it in greater detail!
example_song = songs[0]
print("\nExample song: ")
print(example_song)

The mdl library also has functions to convert the ABC strings into WAV files, which can be played directly in Colab.

In [None]:
# Convert the ABC notation to audio file and listen to it
mdl.lab1.play_song(example_song)

The ABC notation of these songs contains not only the notes being played but **also meta information** such as the song title, key and tempo. The vocabulary is therefore built from every unique character in the dataset, header fields included.

In [None]:
# Join our list of song strings into a single string containing all songs
songs_joined = "\n\n".join(songs)

# Find all unique characters in the joined string
vocab = sorted(set(songs_joined))
print("There are", len(vocab), "unique characters in the dataset")
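
To make the point about meta information concrete, the sketch below splits a small ABC string into its header fields and its note lines. The tune `sample_abc` and the helper `split_abc` are hypothetical and written only for this illustration; they are not part of the dataset or the mdl library, and the "single letter followed by a colon" rule is a simplification of the full ABC syntax.

```python
# Hypothetical ABC snippet, written only for illustration (not taken from the dataset).
sample_abc = "X:1\nT:Example Reel\nM:4/4\nL:1/8\nK:D\n|:DFAd fdAF|GBdg bgdB:|"


def split_abc(abc_text):
    """Separate 'X:'-style header fields from the lines that hold the notes."""
    header, body = {}, []
    for line in abc_text.splitlines():
        # Simplifying assumption: a header field is one letter followed by a colon, e.g. "K:D".
        if len(line) >= 2 and line[0].isalpha() and line[1] == ":":
            header[line[0]] = line[2:].strip()
        else:
            body.append(line)
    return header, "\n".join(body)


sample_header, sample_body = split_abc(sample_abc)
sample_vocab = sorted(set(sample_abc))

print(sample_header)
print(repr(sample_body))
print(len(sample_vocab), "unique characters in the sample")
```

Characters such as `:` and the newline only appear because of the header fields, which is why the vocabulary is taken from the full text rather than from the notes alone.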

The aim is for the RNN model to learn the patterns in ABC music and then to generate (i.e., predict) a new piece of music based on what it has learned.

What is required from the model is therefore: **given a character, or a sequence of characters**, what is the **most probable next character**?

RNNs maintain an **internal state that depends on previously seen elements**, so **information about all characters seen up until a given moment will be taken into account in generating the prediction**.
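
The role of the internal state can be illustrated with a tiny hand-written recurrence. This is only a sketch of a plain tanh RNN cell with random weights, not the LSTM used later in the notebook; the names `W_x`, `W_h` and `run_rnn` and the sizes are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_vocab_size, toy_hidden_size = 4, 3

# Random, untrained weights (scaled down to keep tanh away from saturation).
W_x = 0.5 * rng.normal(size=(toy_hidden_size, toy_vocab_size))
W_h = 0.5 * rng.normal(size=(toy_hidden_size, toy_hidden_size))


def run_rnn(indices):
    """Return the hidden state after reading a sequence of character indices."""
    h = np.zeros(toy_hidden_size)
    for idx in indices:
        x = np.zeros(toy_vocab_size)
        x[idx] = 1.0  # one-hot encoding of the current character
        # The new state mixes the current input with the previous state.
        h = np.tanh(W_x @ x + W_h @ h)
    return h


# Two sequences that end with the same two characters but start differently.
h_a = run_rnn([0, 1, 2])
h_b = run_rnn([3, 1, 2])

print(h_a)
print(h_b)
```

Although both sequences end with the same characters, the final states differ, because each state carries a trace of everything read before it.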

#Vectorization and Processing
Before training the RNN model, **numerical representations** of the text-based dataset need to be created.

For this, **two lookup tables** can be created: **one that maps characters to numbers, and a second that maps numbers back to characters.**

In [None]:
### Define numerical representation of text ###

# Create a mapping from character to unique index.
# For example, to get the index of the character "d", we can evaluate `char2idx["d"]`.
char2idx = {u:i for i, u in enumerate(vocab)}

# Create a mapping from indices to characters. This is
#  the inverse of char2idx and allows us to convert back
#   from unique index to the character in our vocabulary.
idx2char = np.array(vocab)

In [None]:
print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

In [None]:
### Vectorize the songs string ###

def vectorize_string(string):
  # Convert each character in the string to its corresponding index
  vectorized = [char2idx[char] for char in string]
  # Convert the list to a numpy array
  return np.array(vectorized)

vectorized_songs = vectorize_string(songs_joined)

In [None]:
print ('{} ---- characters mapped to int ----> {}'.format(repr(songs_joined[:10]), vectorized_songs[:10]))
# check that vectorized_songs is a numpy array
assert isinstance(vectorized_songs, np.ndarray), "returned result should be a numpy array"
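
A quick way to check that the two lookup tables are consistent is a round trip: encode a string to indices, decode it again, and compare. The sketch below does this on a short hypothetical string, with `toy_`-prefixed names so that it does not overwrite the notebook's own `vocab`, `char2idx` and `idx2char`.

```python
import numpy as np

# Hypothetical text used only for this check.
toy_text = "X:1\nK:D\nDFA"

toy_vocab = sorted(set(toy_text))
toy_char2idx = {ch: i for i, ch in enumerate(toy_vocab)}  # character -> index
toy_idx2char = np.array(toy_vocab)                        # index -> character

# Encode the text to integer indices, then decode it back to a string.
toy_encoded = np.array([toy_char2idx[ch] for ch in toy_text])
toy_decoded = "".join(toy_idx2char[toy_encoded])

print(toy_encoded)
print(repr(toy_decoded))
```

If the decoded string matches the input, the two tables are inverses of each other on that text.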

In [None]:
### Batch definition to create training examples ###

def get_batch(vectorized_songs, seq_length, batch_size):
  # index of the last element of the vectorized songs array
  n = vectorized_songs.shape[0] - 1
  # randomly choose the starting indices for the examples in the training batch
  idx = np.random.choice(n-seq_length, batch_size)

  input_batch = [vectorized_songs[i:i + seq_length] for i in idx]
  output_batch = [vectorized_songs[i + 1:i + seq_length + 1] for i in idx]

  # x_batch, y_batch provide the true inputs and targets for network training
  x_batch = np.reshape(input_batch, [batch_size, seq_length])
  y_batch = np.reshape(output_batch, [batch_size, seq_length])
  return x_batch, y_batch


# Perform some simple tests to make sure your batch function is working properly!
test_args = (vectorized_songs, 10, 2)
if not mdl.lab1.test_batch_func_types(get_batch, test_args) or \
   not mdl.lab1.test_batch_func_shapes(get_batch, test_args) or \
   not mdl.lab1.test_batch_func_next_step(get_batch, test_args):
   print("======\n[FAIL] could not pass tests")
else:
   print("======\n[PASS] passed all tests!")

In [None]:
x_batch, y_batch = get_batch(vectorized_songs, seq_length=5, batch_size=1)

for i, (input_idx, target_idx) in enumerate(zip(np.squeeze(x_batch), np.squeeze(y_batch))):
    print("Step {:3d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

In [None]:
def LSTM(rnn_units):
  return tf.keras.layers.LSTM(
    rnn_units,
    return_sequences=True,
    recurrent_initializer='glorot_uniform',
    recurrent_activation='sigmoid',
    stateful=True,
  )

In [None]:
import tensorflow as tf

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        # Layer 1: Embedding layer to transform indices into dense vectors
        tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]),

        # Layer 2: LSTM with `rnn_units` number of units, built by the LSTM helper defined above.
        LSTM(rnn_units),

        # Layer 3: Dense (fully-connected) layer that transforms the LSTM output into the vocabulary size.
        tf.keras.layers.Dense(vocab_size)
    ])

    return model

# Build a simple model with default hyperparameters. You will get the chance to change these later.
model = build_model(len(vocab), embedding_dim=256, rnn_units=1024, batch_size=32)

model.summary()


In [None]:
x, y = get_batch(vectorized_songs, seq_length=100, batch_size=32)
pred = model(x)
print("Input shape:      ", x.shape, " # (batch_size, sequence_length)")
print("Prediction shape: ", pred.shape, "# (batch_size, sequence_length, vocab_size)")

In [None]:
sampled_indices = tf.random.categorical(pred[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()
sampled_indices

In [None]:
print("Input: \n", repr("".join(idx2char[x[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices])))
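
The cells above draw the next character from a categorical distribution rather than always taking the highest-scoring one. The NumPy sketch below contrasts the two choices on a hypothetical three-character logit vector; the values and the `sample_`-prefixed names are assumptions for illustration, and `rng.choice` stands in for `tf.random.categorical`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logits for a vocabulary of three characters.
sample_logits = np.array([2.0, 1.0, 0.1])

# Softmax turns logits into probabilities (subtracting the max for stability).
sample_probs = np.exp(sample_logits - sample_logits.max())
sample_probs /= sample_probs.sum()

# Greedy choice: always the single most probable index.
greedy_index = int(np.argmax(sample_logits))

# Sampling: indices are drawn in proportion to their probabilities.
drawn = rng.choice(len(sample_probs), size=1000, p=sample_probs)
draw_counts = np.bincount(drawn, minlength=len(sample_probs))

print("probabilities:", sample_probs)
print("greedy index: ", greedy_index)
print("sample counts:", draw_counts)
```

The greedy choice is deterministic, whereas sampling still visits the less likely characters some of the time, which keeps the generated text varied.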

In [None]:
import tensorflow as tf

# Define the loss function
def compute_loss(labels, logits):
    # Compute the sparse categorical cross-entropy loss
    loss = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
    return loss

# Generate some example input data for testing
example_input_batch = np.random.randint(0, len(vocab), (32, 10))  # Example shape: (batch_size, sequence_length)

# Make predictions with the untrained model
pred = model(example_input_batch)

# Generate some example labels for testing (same shape as example_input_batch)
example_labels = np.random.randint(0, len(vocab), (32, 10))

# Compute the loss between the random example labels and the predictions (a smoke test; the labels are not real next characters)
example_batch_loss = compute_loss(example_labels, pred)

print("Prediction shape: ", pred.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())
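
As a sanity check on the number printed above, the sparse categorical cross-entropy can be worked out by hand. The NumPy function below is a sketch of the same computation, not the Keras implementation; `sparse_ce_from_logits` and the `demo_` arrays are hypothetical names. With all logits equal the prediction is uniform, so the loss at every position is the natural log of the vocabulary size.

```python
import numpy as np


def sparse_ce_from_logits(labels, logits):
    """Cross-entropy between integer labels and unnormalized logits, per position."""
    # Subtract the maximum along the vocabulary axis for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of each true label and negate it.
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return -picked.squeeze(-1)


# Hypothetical batch: 2 sequences of length 3 over a 5-character vocabulary.
demo_logits = np.zeros((2, 3, 5))  # all-equal logits give a uniform prediction
demo_labels = np.array([[0, 1, 2], [3, 4, 0]])

demo_loss = sparse_ce_from_logits(demo_labels, demo_logits)

print(demo_loss.shape)
print(demo_loss.mean(), np.log(5))
```

By the same reasoning, an untrained model whose logits are close to uniform should report a mean loss in the neighbourhood of `np.log(len(vocab))`.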


In [None]:
### Hyperparameter setting and optimization ###

vocab_size = len(vocab)

# Model parameters:
params = dict(
  num_training_iterations = 3000,  # Increase this to train longer
  batch_size = 8,  # Experiment between 1 and 64
  seq_length = 100,  # Experiment between 50 and 500
  learning_rate = 5e-3,  # Experiment between 1e-5 and 1e-1
  embedding_dim = 256,
  rnn_units = 1024,  # Experiment between 1 and 2048
)

# Checkpoint location:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "my_ckpt")

In [None]:
import os
import tensorflow as tf
from tqdm import tqdm
import numpy as np
import comet_ml as comet

# Define hyperparameters
vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 1024
batch_size = 32
learning_rate = 0.001

# Instantiate a new model for training
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size)

# Instantiate the optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        # Feed the current input into the model and generate predictions
        y_hat = model(x)

        # Compute the loss
        loss = compute_loss(y, y_hat)

    # Compute the gradients
    grads = tape.gradient(loss, model.trainable_variables)

    # Apply the gradients to the model's trainable variables using the optimizer
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    return loss

# Begin training

# Parameters for training (these replace the `params` dictionary defined in the earlier hyperparameter cell)
params = {
    "num_training_iterations": 1000,
    "seq_length": 100,
    "batch_size": batch_size
}

# Initialize Comet experiment using the API key set at the top of the notebook
experiment = comet.Experiment(api_key=COMET_API_KEY)

history = []
plotter = mdl.util.PeriodicPlotter(sec=2, xlabel='Iterations', ylabel='Loss')

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{iter}")

if hasattr(tqdm, '_instances'): tqdm._instances.clear() # clear if it exists
for iter in tqdm(range(params["num_training_iterations"])):
    # Grab a batch and propagate it through the network
    x_batch, y_batch = get_batch(vectorized_songs, params["seq_length"], params["batch_size"])
    loss = train_step(x_batch, y_batch)

    # Log the loss to the Comet interface
    experiment.log_metric("loss", loss.numpy().mean(), step=iter)

    # Update the progress bar and also visualize within the notebook
    history.append(loss.numpy().mean())
    plotter.plot(history)

    # Save the model weights every 100 iterations
    if iter % 100 == 0:
        model.save_weights(checkpoint_prefix.format(iter=iter))

# Save the final trained model weights
model.save_weights(checkpoint_prefix.format(iter=params["num_training_iterations"] - 1))
# Flush pending metrics but keep the experiment open: generated songs are logged to it later
experiment.flush()


In [None]:
# Rebuild the model using a batch_size=1
# Use the same vocab_size, embedding_dim, and rnn_units as used during training
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

# Restore the model weights for the last checkpoint after training
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

# Build the model with the new batch size (batch_size=1)
model.build(tf.TensorShape([1, None]))

# Print the model summary
model.summary()

In [None]:
### Prediction of a generated song ###

def generate_text(model, start_string, generation_length=1000):
    # Evaluation step (generating ABC text using the learned RNN model)

    # Convert the start string to numbers (vectorize)
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty string to store our results
    text_generated = []

    # Here batch size == 1
    model.reset_states()
    if hasattr(tqdm, '_instances'): tqdm._instances.clear() # clear if it exists

    for i in tqdm(range(generation_length)):
        # Evaluate the inputs and generate the next character predictions
        predictions = model(input_eval)

        # Remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        # Use a multinomial distribution to sample
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

        # Pass the prediction along with the previous hidden state
        # as the next inputs to the model
        input_eval = tf.expand_dims([predicted_id], 0)

        # Add the predicted character to the generated text
        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))

In [None]:
### Play back generated songs ###

# ABC tunes begin with an "X:" reference-number field, so generation starts from "X"
generated_text = generate_text(model, start_string="X", generation_length=1000)
generated_songs = mdl.lab1.extract_song_snippet(generated_text)

for i, song in enumerate(generated_songs):
  # Synthesize the waveform from a song
  waveform = mdl.lab1.play_song(song)

  # If it's a valid song (correct syntax), let's play it!
  if waveform:
    print("Generated song", i)
    ipythondisplay.display(waveform)

    numeric_data = np.frombuffer(waveform.data, dtype=np.int16)
    wav_file_path = f"output_{i}.wav"
    write(wav_file_path, 88200, numeric_data)

    # save your song to the Comet interface -- you can access it there
    experiment.log_asset(wav_file_path)

In [None]:
# when done, end the comet experiment
experiment.end()