## To do:

1. Data
    1. Use more music files from other artists
    2. Don't just concatenate all the text files together; make sure different pieces of music are properly separated
    3. Batch the sequences in a better way (not just [1,2,3], [4,5,6] but overlapping windows: [1,2,3], [2,3,4], [3,4,5], [4,5,6])

2. Model
    1. Choose from top k tokens, not just the highest probability one (DONE)
    2. Try optimizing the top k words selection by simulating only the j-th element
    3. Try using fewer hops
    4. Write the music decoder to auto decode the music
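
The batching idea in the Data items can be sketched as follows. This is a minimal illustration, not the notebook's code: `sliding_windows` and `windows_per_piece` are hypothetical helper names, and the sketch assumes each piece of music is available as its own list of integer tokens.

```python
import numpy as np

def sliding_windows(tokens, seq_len, stride=1):
    """Return overlapping windows of length seq_len, one every `stride` tokens."""
    tokens = np.asarray(tokens)
    n_windows = (len(tokens) - seq_len) // stride + 1
    if n_windows <= 0:
        # The piece is shorter than one window
        return np.empty((0, seq_len), dtype=tokens.dtype)
    starts = np.arange(n_windows) * stride
    # Broadcasting builds an index matrix of shape (n_windows, seq_len)
    idx = starts[:, None] + np.arange(seq_len)[None, :]
    return tokens[idx]

def windows_per_piece(pieces, seq_len, stride=1):
    """Window each piece separately so that no sequence spans two pieces."""
    return np.concatenate(
        [sliding_windows(p, seq_len, stride) for p in pieces], axis=0)

print(sliding_windows([1, 2, 3, 4, 5, 6], seq_len=3))
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]
#  [4 5 6]]
```

With `stride=1` the number of training sequences grows to roughly the number of tokens, so a larger stride may be a reasonable compromise between coverage and memory.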

In [31]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


<center><h3>**Music generation using GPT architecture**</h3></center>

We are using a Transformer network, from the __[Attention is all you need](http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf)__ paper.
We are going to:
- Process sheet music into sub-word tokens, to avoid a fixed vocabulary size and UNK tokens.
- Implement the key conceptual blocks of a Transformer.
- Train a Transformer to generate new tokens.
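
The `GPT` class imported from `transformer` is not shown in this notebook, so the following is only a sketch of one conceptual block a GPT-style model relies on: scaled dot-product self-attention with a causal mask, so that position i can only attend to positions up to i. The function name and shapes are assumptions made for illustration, written in plain NumPy rather than TensorFlow.

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Scaled dot-product attention where position i only sees positions <= i."""
    seq_len, d_k = q.shape
    scores = q @ k.T / np.sqrt(d_k)
    # Mask out future positions (strict upper triangle)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)
    # Row-wise softmax, shifted by the row maximum for numerical stability
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

rng = np.random.RandomState(0)
x = rng.randn(4, 8)                      # 4 positions, 8 features each
out, weights = causal_self_attention(x, x, x)
print(np.round(weights, 2))              # upper triangle is all zeros
```

Because the first position can only attend to itself, the first output row equals the first row of `v`.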

# Library imports

In [32]:
from transformer import GPT
import sentencepiece as spm
import tensorflow as tf
import numpy as np
import json
import capita

root_folder = ""

In [33]:
# Load the word piece model that will be used to tokenize the texts into
# word pieces with a vocabulary size of 10000

sp = spm.SentencePieceProcessor()
sp.Load(root_folder+"dataset/wp_vocab10000.model")

with open(root_folder+"dataset/wp_vocab10000.vocab", "r") as vocab_file:
    vocab = [line.split('\t')[0] for line in vocab_file]
pad_index = vocab.index('#')

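def pad_sequence(numerized, pad_index, to_length):
    # Truncate to to_length, then right-pad with pad_index
    pad = numerized[:to_length]
    padded = pad + [pad_index] * (to_length - len(pad))
    # Build the mask from the length, so a real token equal to pad_index is not masked out
    mask = [i < len(pad) for i in range(len(padded))]
    return padded, mask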

# Creating a GPT

Now that all the blocks of the Transformer are implemented, we can create a full model with placeholders and a loss.

In [1]:
class GPTTrainer():

    def __init__(self, vocab_size, d_model, output_length, n_layers, d_filter, learning_rate=1e-3):

        self.target_sequence = tf.placeholder(tf.int32, shape=(None,output_length),name="target_sequence")
        self.decoder_mask = tf.placeholder(tf.bool, shape=(None,output_length),name="decoder_mask")

        self.model = GPT(vocab_size=vocab_size, d_model=d_model, n_layers=n_layers, d_filter=d_filter)

        self.decoded_logits = self.model(self.target_sequence, decoder_mask=self.decoder_mask)
        self.global_step = tf.train.get_or_create_global_step()
        
        # Language-modeling loss: cross-entropy averaged over the non-masked positions
        self.loss = tf.losses.sparse_softmax_cross_entropy(self.target_sequence, self.decoded_logits, tf.cast(self.decoder_mask, tf.float32))
        self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        self.train_op = self.optimizer.minimize(self.loss, global_step=self.global_step)
        self.saver = tf.train.Saver()
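
To make the loss above more concrete, here is a NumPy sketch of a masked cross-entropy. It assumes the default reduction of `tf.losses.sparse_softmax_cross_entropy`, which averages over the positions with non-zero weight; `masked_cross_entropy` is a hypothetical name used only for this illustration.

```python
import numpy as np

def masked_cross_entropy(targets, logits, mask):
    """Mean negative log-likelihood over the positions where mask is True."""
    targets = np.asarray(targets)
    logits = np.asarray(logits, dtype=np.float64)
    weights = np.asarray(mask, dtype=np.float64)
    # Log-softmax over the vocabulary axis, shifted for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of each target token
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    return -(picked * weights).sum() / weights.sum()

logits = np.zeros((1, 3, 4))             # uniform over a 4-token vocabulary
targets = np.array([[0, 1, 2]])
mask = np.array([[True, True, False]])   # last position is ignored
print(masked_cross_entropy(targets, logits, mask))  # log(4), about 1.386
```

A model that predicts uniformly over a vocabulary of size V scores log(V), which gives a baseline to compare training losses against.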

# Loading music data

In [35]:
# Dataset related parameters
vocab_size = len(vocab)
ilength = 128 # Length of each token sequence fed to the model (previously 400)
olength  = 100 # Not used below

# Model related parameters, feel free to modify these.
n_layers = 6  # Originally 12
d_model  = 104
d_filter = 416

model = GPTTrainer(vocab_size, d_model, ilength, n_layers, d_filter)

In [36]:
# replace with any text file containing full set of data
mozart_data = 'txt-files/notewise/custom/mozart.txt'

with open(mozart_data, 'r') as file:
    text = file.read()

In [8]:
# get vocabulary set
vocab = sorted(tuple(set(text.split())))
n = len(vocab)

# create word-integer encoder/decoder
word2int = dict(zip(vocab, list(range(n))))
int2word = dict(zip(list(range(n)), vocab))

# encode all words in dataset into integers
encoded = np.array([word2int[word] for word in text.split()])

# get vocab_size
vocab_size=len(vocab)

In [9]:
# split dataset into 90% train and 10% validation using an index
val_idx = int(len(encoded) * (1 - 0.1))
train_data, val_data = encoded[:val_idx], encoded[val_idx:]

# Preparing data

In [10]:
def prepare_dataset(dataset, seq_len):
    
    # trim data set to be multiple of seq_len
    num_seq = len(dataset)//seq_len
    trim_len = num_seq*seq_len
    dataset = dataset[:trim_len]
    
    # reshape dataset into sequences
    dataset = np.reshape(dataset, (num_seq, seq_len))

    return dataset

In [11]:
def build_batch(dataset, batch_size):
    indices = list(np.random.randint(0, dataset.shape[0], size=batch_size))
    seq_len = dataset.shape[1]
    
    batch_output = dataset[indices,:]
    batch_output_mask = np.ones((batch_size, seq_len), dtype=bool)
    
    return batch_output, batch_output_mask

In [12]:
d_train = prepare_dataset(train_data, 128)
d_valid = prepare_dataset(val_data, 128)

print("Train Set Shape: {}, Test Set Shape: {}".format(d_train.shape, d_valid.shape))

Train Set Shape: (79853, 128), Test Set Shape: (8872, 128)


# Training model

In [None]:
# Skeleton code, as in the previous notebook.
# Write the training code and save your best performing model on the
# validation set. We will be testing the loss on a held-out test dataset.


with tf.Session() as sess:
    # This is how you randomly initialize the Transformer weights.
    sess.run(tf.global_variables_initializer())
    
    epochs = 20 #previously 50
    
    for epoch in range(epochs):
        
        batch_size = 128
        iterations = d_train.shape[0] // batch_size
        
        # build validation set
        e_output, e_output_mask = build_batch(d_valid, 200)
        
        for iteration in range(iterations):       

            # Create a random mini-batch from the training dataset
            batch_output, batch_output_mask = build_batch(d_train, batch_size)
            # Build the feed-dict connecting placeholders and mini-batch
            feed = {model.target_sequence: batch_output, model.decoder_mask: batch_output_mask}

            # Obtain the loss. Run train_op only on training batches, never when evaluating.
            train_loss, _, step = sess.run([model.loss, model.train_op, model.global_step], feed_dict=feed)
            
            if iteration % 50 == 0:
                
                # get validation loss
                feed_val = {model.target_sequence: e_output, model.decoder_mask: e_output_mask}
                valid_loss = sess.run(model.loss, feed_dict=feed_val)
                
                print("Epoch {} Iteration {}, Train Loss: {}, Val Loss: {}".format(epoch, iteration, train_loss, valid_loss))
                
#                 print("Epoch {} Iteration {}, Train Loss: {}".format(epoch, iteration, train_loss))
                
#                 This is how you save model weights into a file
                model.saver.save(sess, root_folder+"models/gpt_music")    

#                 # This is how you restore a model previously saved
#                 model.saver.restore(sess, root_folder+"models/transformer_summarizer")


Epoch 0 Iteration 0, Train Loss: 14.469578742980957, Val Loss: 11.785544395446777
Epoch 0 Iteration 50, Train Loss: 7.486458778381348, Val Loss: 7.417486667633057
Epoch 0 Iteration 100, Train Loss: 7.582685470581055, Val Loss: 7.410603046417236
Epoch 0 Iteration 150, Train Loss: 7.377954483032227, Val Loss: 7.195443153381348
Epoch 0 Iteration 200, Train Loss: 6.044005393981934, Val Loss: 5.789759635925293
Epoch 0 Iteration 250, Train Loss: 5.636773109436035, Val Loss: 5.298721790313721
Epoch 0 Iteration 300, Train Loss: 5.255577564239502, Val Loss: 4.793505668640137
Epoch 0 Iteration 350, Train Loss: 4.646119117736816, Val Loss: 4.2281107902526855
Epoch 0 Iteration 400, Train Loss: 4.030297756195068, Val Loss: 3.804192543029785
Epoch 0 Iteration 450, Train Loss: 3.8890271186828613, Val Loss: 3.352569818496704
Epoch 0 Iteration 500, Train Loss: 3.4022698402404785, Val Loss: 3.048938035964966
Epoch 0 Iteration 550, Train Loss: 3.040712356567383, Val Loss: 2.8604846000671387
...

Epoch 7 Iteration 400, Train Loss: 1.396698236465454, Val Loss: 1.4821534156799316
Epoch 7 Iteration 450, Train Loss: 1.4335107803344727, Val Loss: 1.4759868383407593
Epoch 7 Iteration 500, Train Loss: 1.3463845252990723, Val Loss: 1.4627560377120972
Epoch 7 Iteration 550, Train Loss: 1.4227793216705322, Val Loss: 1.478621244430542
Epoch 7 Iteration 600, Train Loss: 1.3950715065002441, Val Loss: 1.4647849798202515
Epoch 8 Iteration 0, Train Loss: 1.4105567932128906, Val Loss: 1.4455336332321167
Epoch 8 Iteration 50, Train Loss: 1.3231462240219116, Val Loss: 1.452998399734497
Epoch 8 Iteration 100, Train Loss: 1.380520224571228, Val Loss: 1.4401969909667969
Epoch 8 Iteration 150, Train Loss: 1.3742295503616333, Val Loss: 1.4467943906784058
Epoch 8 Iteration 200, Train Loss: 1.335031270980835, Val Loss: 1.443242073059082
Epoch 8 Iteration 250, Train Loss: 1.3730876445770264, Val Loss: 1.4479875564575195
Epoch 8 Iteration 300, Train Loss: 1.312730312347412, Val Loss: 1.4532997608184814
...
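
The skeleton comment asks for the best performing model on the validation set, while the loop above saves every 50 iterations regardless of the validation loss. One possible way to keep only the best checkpoint is sketched below; `BestCheckpoint` is a hypothetical helper, and the loss values in the demo are rounded from the epoch 7 log lines above.

```python
class BestCheckpoint:
    """Call `save_fn` only when the validation loss improves."""

    def __init__(self, save_fn):
        self.save_fn = save_fn
        self.best_loss = float("inf")

    def update(self, valid_loss):
        # Save and report True only on a strict improvement
        if valid_loss < self.best_loss:
            self.best_loss = valid_loss
            self.save_fn()
            return True
        return False

saved = []
tracker = BestCheckpoint(save_fn=lambda: saved.append("checkpoint"))
for loss in [1.4821, 1.4759, 1.4627, 1.4786, 1.4647]:
    tracker.update(loss)
print(tracker.best_loss, len(saved))  # 1.4627 3
```

In the training loop, `save_fn` could wrap the existing call, for example `lambda: model.saver.save(sess, root_folder+"models/gpt_music")`, with `tracker.update(valid_loss)` replacing the unconditional save.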

# Using the music generation gpt model

In [13]:
# Put the file path to your best performing model in the string below.

model_file = root_folder+"models/gpt_music"
# model_file = root_folder+"models/transformer_summarizer"

## The validation loss

Measure the validation loss of your model. We will use the code here with the unreleased test-set to evaluate your model.

In [14]:
with tf.Session() as sess:
    model.saver.restore(sess, model_file)

    e_output, e_output_mask = build_batch(d_valid, 200)
    feed = {model.target_sequence: e_output, model.decoder_mask: e_output_mask}
    valid_loss = sess.run(model.loss, feed_dict=feed)
    print("Validation loss:", valid_loss)

INFO:tensorflow:Restoring parameters from models/gpt_music
Validation loss: 1.41856
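
Since the loss is a mean cross-entropy in natural-log units, it can be read as a perplexity by exponentiating it. The short calculation below uses the value printed above; note that it comes from a random sample of 200 validation sequences, so it is an estimate rather than the loss over the full validation set.

```python
import math

valid_loss = 1.41856                 # value printed above
perplexity = math.exp(valid_loss)
print(round(perplexity, 2))          # 4.13
```

A perplexity of about 4 means that, on average, the model is roughly as uncertain as if it were choosing uniformly among four tokens at each step.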


## Generating music

In [274]:
# takes significantly longer than just taking the max

def choose_top_words(arr, k):
    # arr: logits of shape (batch, seq_len, vocab_size)

    # get top k indexes
    # argsort in increasing order, get last k highest elements, reverse
    top_k_words = np.flip(arr.argsort(axis=2)[:,:,-k:], axis=2)

    # get corresponding logits, in the same decreasing order
    top_k_logits = np.take_along_axis(arr, top_k_words, axis=2)

    # softmax top k logits into probabilities
    # (subtract the max first for numerical stability)
    exp_logits = np.exp(top_k_logits - top_k_logits.max(axis=2, keepdims=True))
    p = exp_logits / exp_logits.sum(axis=2, keepdims=True)

    # hold cumulative distribution
    c = p.cumsum(axis=2)

    # generate one random uniform sample per position
    u = np.random.rand(arr.shape[0], arr.shape[1], 1)

    # get indexes of selected logits (first bucket whose cumulative probability exceeds u)
    choices = (u < c).argmax(axis=2)

    # map selected indexes back to top_k_words
    chosen_words = np.take_along_axis(top_k_words, choices[:, :, np.newaxis], axis=2)[:, :, 0]

    return chosen_words
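
Only the token at position `j` is used at each generation step, so sorting every position is more work than needed. A possible optimization is sketched below: it samples from the top k logits at a single position and uses `np.argpartition` instead of a full sort. `sample_top_k_at` and its `rng` parameter are hypothetical names, not part of the notebook.

```python
import numpy as np

def sample_top_k_at(logits, j, k, rng=np.random):
    """Sample a token id from the top k logits at position j.

    logits has shape (1, seq_len, vocab_size); k must not exceed vocab_size.
    """
    row = logits[0, j]
    # argpartition finds the k largest entries without a full sort
    top_ids = np.argpartition(row, -k)[-k:]
    top_logits = row[top_ids]
    # Softmax restricted to the top k logits
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    return int(rng.choice(top_ids, p=probs))

# Toy logits: at position 1, only token ids 2 and 4 have high scores
logits = np.full((1, 2, 5), -10.0)
logits[0, 1, [2, 4]] = [3.0, 2.0]
rng = np.random.RandomState(0)
samples = {sample_top_k_at(logits, 1, 2, rng) for _ in range(50)}
print(samples)  # only ids 2 and 4 can appear
```

In the generation loop, a call such as `sample_top_k_at(logits[0], j, 3)` would take the place of `choose_top_words` followed by indexing with `[0, j]`.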


Here, we are generating one composition with our model. To evaluate it, we are going to generate a large number of them (100+) and then apply some statistics such as the BLEU score to estimate how well our model is doing.

In [275]:
output_length = 128

with tf.Session() as sess:
    model.saver.restore(sess, model_file)

    decoded_so_far = [0]
    
    for j in range(output_length):
        padded_decoder_input, decoder_mask = pad_sequence(decoded_so_far, pad_index, output_length)
        padded_decoder_input = [padded_decoder_input]
        decoder_mask = [decoder_mask]
#         print("========================")
#         print(padded_decoder_input)
        # Use the model to find the distribution over the vocabulary for the next token
        feed = {model.target_sequence: padded_decoder_input,
                model.decoder_mask: decoder_mask}
        logits = sess.run([model.decoded_logits], feed_dict=feed)
    
#        chosen_words = np.argmax(logits[0], axis=2) # Take the argmax, getting the most likely next word
        chosen_words = choose_top_words(logits[0], 3)
        next_word = int(chosen_words[0, j])
        decoded_so_far.append(next_word) # We add it to the sequence generated so far


print("The final summary:")
print(" ".join([vocab[i] for i in decoded_so_far]).replace("▁", " "))


INFO:tensorflow:Restoring parameters from models/gpt_music
The final summary:
endp0 wait1 wait1 p23 p24 p40 p45 wait5 endp45 endp45 wait1 wait3 p42 p43 wait3 wait1 endp42 endp42 wait4 wait2 endp23 p45 endp24 wait5 wait1 endp45 p19 wait3 p47 p40 wait3 endp40 p47 endp47 wait5 p48 endp19 wait5 wait1 endp47 p19 p52 wait3 wait2 endp19 endp43 endp52 wait1 wait1 p50 p50 wait5 wait5 endp38 endp45 endp50 endp47 wait1 wait6 p48 p21 wait6 wait5 p48 endp45 endp48 wait2 wait2 endp21 p47 p47 wait2 wait3 endp47 p48 wait4 endp48 p47 wait1 endp47 endp21 wait2 wait5 endp47 p18 wait1 p42 p50 wait2 wait2 endp50 endp42 wait1 wait1 p48 p48 wait5 wait3 endp18 endp48 endp48 wait1 wait1 p24 p19 p45 p35 wait3 p47 p47 wait3 wait1 endp47 p48 wait5 wait1 endp45 p52 endp47 wait5 wait5 endp57 endp45 wait3 p52 p48 wait3 wait1 endp52
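
As a starting point for the music decoder mentioned in the to-do list, the generated tokens can be turned into note events. The sketch below rests on an assumption about the token format that the notebook does not confirm: `pN` starts pitch N, `endpN` stops it, and `waitN` advances time by N steps. `decode_notewise` is a hypothetical name, and converting the resulting events to MIDI is left out.

```python
def decode_notewise(tokens):
    """Turn notewise tokens into (pitch, start_step, duration_steps) tuples.

    Assumed token meanings (not confirmed by the notebook):
    pN starts pitch N, endpN stops it, waitN advances time by N steps.
    """
    time = 0
    active = {}   # pitch -> step at which the note started
    notes = []
    for token in tokens:
        if token.startswith("wait"):
            time += int(token[4:])
        elif token.startswith("endp"):
            pitch = int(token[4:])
            start = active.pop(pitch, None)
            if start is not None:   # ignore note-offs with no matching note-on
                notes.append((pitch, start, time - start))
        elif token.startswith("p"):
            pitch = int(token[1:])
            active.setdefault(pitch, time)   # keep the earlier start if already sounding
    return notes

sample = "p23 p24 wait5 endp23 wait1 endp24".split()
print(decode_notewise(sample))  # [(23, 0, 5), (24, 0, 6)]
```

Notes that are still sounding when the sequence ends are dropped in this sketch; a fuller decoder might close them at the final time step instead.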
