# Music Generation Using Char-RNN

In this tutorial we train a char-RNN, a character-level language model, on a collection of music files written in ABC notation (https://en.wikipedia.org/wiki/ABC_notation). The following diagram shows how a char-RNN predicts the next character at each time-step.

<img src="https://i.imgur.com/ZS1iuCh.png" style="height:100px; width:100px"></img>

An example RNN with 4-dimensional input and output layers, and a hidden layer of 3 units (neurons). This diagram shows the activations in the forward pass when the RNN is fed the characters "hell" as input. The output layer contains the confidences the RNN assigns to each candidate next character (the vocabulary is "h,e,l,o"); we want the green numbers to be high and the red numbers to be low.
Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
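
To make the diagram concrete, here is a minimal NumPy sketch of the forward pass it depicts: a plain RNN with a 4-character vocabulary and 3 hidden units, fed the characters "hell". The weight names (`W_xh`, `W_hh`, `W_hy`), the random untrained weights and the omission of bias terms are assumptions made for illustration; this is not the model trained later in the tutorial.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
c2i = {ch: i for i, ch in enumerate(vocab)}

rng = np.random.RandomState(0)
hidden_size = 3
# Randomly initialised (untrained) weights; the names are illustrative.
W_xh = rng.randn(hidden_size, len(vocab)) * 0.1    # input -> hidden
W_hh = rng.randn(hidden_size, hidden_size) * 0.1   # hidden -> hidden
W_hy = rng.randn(len(vocab), hidden_size) * 0.1    # hidden -> output

def one_hot(index, size):
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec

def rnn_forward(text):
    """Return one vector of next-character scores per input character."""
    h = np.zeros(hidden_size)
    scores = []
    for ch in text:
        x = one_hot(c2i[ch], len(vocab))
        # The new hidden state mixes the current input with the previous state.
        h = np.tanh(W_xh.dot(x) + W_hh.dot(h))
        scores.append(W_hy.dot(h))
    return np.array(scores)

scores = rnn_forward("hell")
print(scores.shape)  # (4, 4): one row of 4 scores per input character
```

Training adjusts the weights so that, at each time-step, the score of the character that actually comes next is the largest in its row.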

In [0]:
import tensorflow as tf
import numpy as np
import urllib
import os.path

The following function reads the text file and prepares a vocabulary of its unique characters, ```i2c``` (index to character), along with the reverse mapping ```c2i``` (character to index). It then encodes the text as a list of vocabulary indices, one per character. You can call this function and print ```i2c```, ```c2i``` and ```data``` to understand the data structures used.

In [0]:
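def load_data(file_path="input.txt"):
    # Download the corpus on first use.
    if not os.path.isfile(file_path):
        # urllib.urlretrieve is the Python 2 API; Python 3 moved it to urllib.request.urlretrieve.
        urllib.urlretrieve("https://paarthneekhara.github.io/input.txt", filename=file_path)
        print("downloaded file")
    with open(file_path) as f:
        raw_data = f.read()

    # The dictionary keys are the unique characters in the file.
    vocab = {ch : True for ch in raw_data}

    # i2c and c2i iterate over the same dictionary, so their orderings agree.
    i2c = [ch for ch in vocab]
    c2i = {ch : i for i, ch in enumerate(vocab)}

    # Encode every character of the file as its vocabulary index.
    data = [c2i[ch] for ch in raw_data]

    return data, i2c, c2i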
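
As a quick illustration of these three structures, the sketch below builds them for a toy string instead of the downloaded file. Sorting the vocabulary is an assumption made here so that the printed result is reproducible; `load_data` itself relies on dictionary iteration order.

```python
raw_data = "abcab"

# Sorted only so that the output is reproducible (an assumption of this sketch).
i2c = sorted(set(raw_data))
c2i = {ch: i for i, ch in enumerate(i2c)}
data = [c2i[ch] for ch in raw_data]

print(i2c)   # ['a', 'b', 'c']
print(data)  # [0, 1, 2, 0, 1]

# Decoding with i2c recovers the original text.
decoded = ''.join(i2c[i] for i in data)
print(decoded)  # abcab
```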

The function below is used during training to get a sentence (a list of 25 contiguous characters) from the data. The target sentence is the source sentence shifted by one index, because at each time-step the model is trained to predict the character that follows. Refer to the diagram above to see why.

In [0]:
def get_sentence(sentence_index, sentence_length, data):
    si = sentence_index * sentence_length
    ei = min(si + sentence_length, len(data)-1)
    source = np.array([data[si:ei]], 'int32')
    target = np.array([data[si+1:ei+1]], 'int32')

    return source, target
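
To check the offset between source and target, the sketch below repeats the function on a toy list of integers standing in for encoded characters (`toy_data` is a hypothetical name used only here). It also shows that the last sentence is shortened so that every source index still has a target.

```python
import numpy as np

def get_sentence(sentence_index, sentence_length, data):
    si = sentence_index * sentence_length
    ei = min(si + sentence_length, len(data) - 1)
    source = np.array([data[si:ei]], 'int32')
    target = np.array([data[si + 1:ei + 1]], 'int32')
    return source, target

toy_data = list(range(10, 20))  # stand-in for encoded characters

source, target = get_sentence(0, 4, toy_data)
print(source)  # [[10 11 12 13]]
print(target)  # [[11 12 13 14]]

# The final sentence stops one element early so a target still exists.
last_source, last_target = get_sentence(2, 4, toy_data)
print(last_source)  # [[18]]
print(last_target)  # [[19]]
```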

We define our language model below. The model is an implementation of the char-RNN described above. Instead of a simple RNN cell we use an LSTM (long short-term memory) cell, an improved recurrent unit that is popular for language modelling tasks.

In [0]:
class Model:
    def __init__(self, options):
        self.embedding_matrix = tf.get_variable('embedding_matrix', 
                [options['vocab_size'], options['hidden_size']],
                initializer=tf.truncated_normal_initializer(stddev=0.02))
        self.options = options

        self.lstm_init_value = tf.placeholder(
                tf.float32,
                shape=(None, 2 * options['hidden_size']),
                name="lstm_init_value"
            )

    def forward_pass(self, sentence):
        sentence_embedding = tf.nn.embedding_lookup(self.embedding_matrix, 
            sentence, name = "sentence_embedding")
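        # With state_is_tuple=False the cell state and hidden state are concatenated,
        # which is why lstm_init_value has width 2 * hidden_size.
        cell = tf.nn.rnn_cell.LSTMCell(num_units=self.options['hidden_size'], state_is_tuple=False)
        outputs, last_states = tf.nn.dynamic_rnn(
                cell=cell,
                dtype=tf.float32,
                initial_state=self.lstm_init_value,
                inputs=sentence_embedding)
        # Flatten (batch, time, hidden) to (batch * time, hidden) before the output projection.
        outputs = tf.reshape(outputs, shape = (-1, self.options['hidden_size']))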
        logits = tf.layers.dense(outputs, self.options['vocab_size'])
        self.last_states = last_states
        return logits

In [0]:
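# IMPLEMENT THIS FUNCTION
# activations is a NumPy array of shape (vocab_size,) holding the logits for the next character.
# temp is the sampling temperature.
# generate_sample expects the return value to be the sampled character (a key of c2i).
def sample(activations, temp):
    # Implement this function
    pass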
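
One common way to fill in a function like this is temperature sampling: divide the logits by the temperature, apply a softmax, and draw an index from the resulting distribution. The sketch below is one possible approach, not a reference solution from the tutorial. The name `sample_with_temperature`, the explicit `i2c` argument and the optional `rng` argument are assumptions made so that the example is self-contained.

```python
import numpy as np

def sample_with_temperature(activations, temp, i2c, rng=None):
    """Draw one character from softmax(activations / temp)."""
    if rng is None:
        rng = np.random.RandomState()
    scaled = np.asarray(activations, dtype=np.float64) / temp
    scaled = scaled - scaled.max()   # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs = probs / probs.sum()
    index = rng.choice(len(probs), p=probs)
    return i2c[index]

toy_i2c = ['a', 'b', 'c']             # hypothetical 3-character vocabulary
toy_logits = np.array([1.0, 2.0, 3.0])
rng = np.random.RandomState(0)
print(sample_with_temperature(toy_logits, 0.7, toy_i2c, rng))
```

A temperature below 1 sharpens the distribution, so the most likely character is chosen more often; a temperature above 1 flattens it and makes the samples more varied.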
        

In [0]:
def generate_sample(T = 1.0, sample_length = 1000):
    # seed to start with
    source_np = np.array( [[c2i['<'], c2i['s'], c2i['t'], c2i['a'], c2i['r']]], dtype = 'int32')
    generation = '<star'
    init_state =  np.zeros((1, 2 * HIDDEN_SIZE))
    for i in range(sample_length):
        if i != 0:
            init_state = next_hidden
        
        logits_np, next_hidden = sess.run([logits, model.last_states], 
            feed_dict={
                input_tensor : source_np,
                model.lstm_init_value : init_state
            }
        )

        ch_sampled = sample(logits_np[-1,:], T)
        generation = generation + ch_sampled
        
        source_np = np.array( [[c2i[ch_sampled]]])
        
    return generation

In [0]:
MAX_EPOCHS = 100 
HIDDEN_SIZE = 200 # Hidden Units in the LSTM
SENTENCE_LENGTH = 25 # Sentence Length Used In Training
LR = 0.0004 # Learning Rate
TEMP = 0.7 # Temperature parameter used to control stochasticity in sampling
SAMPLE_EVERY = 1000 # Generate a sample every x iterations
SAMPLE_LENGTH = 1000 # Max Length of sample to be sampled from the model
BATCH_SIZE = 1

In [0]:
data, i2c, c2i = load_data()

input_tensor = tf.placeholder(tf.int32, [BATCH_SIZE, None])
target_tensor = tf.placeholder(tf.int32, [BATCH_SIZE, None])

options = {
    'vocab_size' : len(i2c),
    'hidden_size' : HIDDEN_SIZE,
    'sentence_length' : None
}   

model = Model(options)
logits = model.forward_pass(input_tensor)
target_tensor_flat = tf.reshape(target_tensor, shape = (-1,))
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = target_tensor_flat, logits = logits))


train_op = tf.train.AdamOptimizer(LR).minimize(loss)


downloaded file


In [0]:
print "Vocab"
print i2c[0:50]
print "Data"
print data[0:50]


Vocab
[' ', '(', ',', '0', '4', '8', '<', '@', 'D', 'H', 'L', 'P', 'T', 'X', '\\', 'd', 'h', 'l', 'p', 't', 'x', '|', '#', "'", '+', '/', '3', '7', '?', 'C', 'G', 'K', 'O', 'S', 'W', '[', '_', 'c', 'g', 'k', 'o', 's', 'w', '{', '\n', '"', '&', '*', '.', '2']
Data
[6, 41, 19, 86, 65, 19, 52, 70, 44, 13, 51, 74, 70, 44, 12, 51, 0, 10, 86, 0, 81, 40, 64, 19, 62, 86, 65, 88, 64, 87, 70, 44, 59, 51, 12, 65, 86, 64, 41, 37, 65, 88, 19, 0, 87, 19, 25, 40, 91, 0]


In [0]:
sess = tf.InteractiveSession()
tf.initialize_all_variables().run()
init_state =  np.zeros((1, 2 * HIDDEN_SIZE))
for epoch in range(MAX_EPOCHS):
    for sentence_index in xrange(len(data)/SENTENCE_LENGTH):
        source, target = get_sentence(sentence_index, SENTENCE_LENGTH, data)
        if sentence_index != 0:
            init_state = next_hidden

        _, loss_val, next_hidden = sess.run([train_op, loss, model.last_states], 
            feed_dict={
                input_tensor : source,
                target_tensor : target,
                model.lstm_init_value : init_state
            }
        )

        if sentence_index % SAMPLE_EVERY == 0:
            print sentence_index, epoch, loss_val
            sample_generation = generate_sample(T = TEMP, sample_length = SAMPLE_LENGTH)
            print sample_generation

Instructions for updating:
Use `tf.global_variables_initializer` instead.
0 0 4.542019
Kzd7*d'+hye!
fDzLmG:JNi~rvHqacB@/xOL>OE*"6'er*Kc=M=-r5d})9cK+}DQ~uVsqDFc\*6FGhHic/gHlbdw.8{Y_zR<c6BeM?=Kh7:O&w):CdU4W-b2O>
h.bS.f{=\?]Khw2 #Fb}>3.q( {7HoI*rw2&}=_~?"S"01wHfL{F^#d#UyK
M!e>"N9K>]v6 2)&f!r=PCaPyYOmC+sw]ETc=S_v2zbw?Q"xVR/cJa[aMFv)/J6)3
S~UgAT
Gc!26qJ	<r-,|!xS8_XvL1R
cU1QAD+ROtq 9o]?zre_",vd1DN]sF8CJ]Kn4J
9m? \@qx'6jaE|@0ufn"(A]>A.,V/a+Mi0L1!Il}@::+l7
_ADpr/GmkNA|NODz>RnH"jJOJT(7T*6?."nMT
IfW4JgOEn>':M7w8:eQE\b69bE8JYH}f<Vfa}9TxV^weL*}RSw#fjQhV8,(TOW1 T_q|k64KRxiV\p5wo<3
aGhs-a=4?WxcKjnlMq	,kW=:.H,>At\W25sw1S=Bwn,S!RVe~
'ne# q{2	rr|Y'a9q/o4XzWs
1000 0 2.311238
<starsaieieonpleec D2oheru tanlorire MMa70tavertae
ZG/B | B>2ed |
MrC:uinurt
:taton er doge dc arri tirLioloriailvieullc pvoonmain -tariiileeenoir catLi
B:!ort t CBorn ?
:0:::conetoiierbe
m:?or
Z::2it cerer er 4F
::-ur
<an-
Z:Pru
r:Fta gvaailleee Zrovsvvosaevee cout MLe5ltrtri coaoouier or t-osauiuugr |e3 d>e f2B2B2 | fAd> | 
B3 .ed

Generate MIDI files with http://mandolintab.net/abcconverter.php by pasting in a generated sample from start to end.