# Music Generation Using Char-RNN

In this tutorial we train a char-RNN on a collection of tunes written in ABC notation (https://en.wikipedia.org/wiki/ABC_notation). The char-RNN acts as a character-level language model: given the characters seen so far, it predicts the next one. The following diagram shows a char-RNN predicting a character at each time step.

<img src="https://i.imgur.com/ZS1iuCh.png" style="height:100px; width:100px"></img>

An example RNN with 4-dimensional input and output layers and a hidden layer of 3 units (neurons). The diagram shows the activations in the forward pass when the RNN is fed the characters "hell" as input. The output layer contains the confidences the RNN assigns to each candidate next character (the vocabulary is "h, e, l, o"). We want the green numbers to be high and the red numbers to be low.
Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

In [1]:
import tensorflow as tf
import numpy as np
import urllib
import os.path
import urllib.request



The following function reads all the characters in the input text file and prepares a vocabulary of the unique characters, ```i2c``` (index to character), together with the reverse mapping ```c2i``` (character to index). It then encodes the text file as a list of vocabulary indices, one per character. You can call this function and print ```i2c```, ```c2i``` and ```data``` to understand the data structures used.

In [2]:
def load_data(file_path="input.txt"):
    if not os.path.isfile(file_path):
        urllib.request.urlretrieve("https://paarthneekhara.github.io/input.txt", filename=file_path)
        print("downloaded file")
    with open(file_path) as f:
        raw_data = f.read()
    # A dict keeps insertion order, so characters are indexed by first appearance
    vocab = {ch : True for ch in raw_data}

    i2c = [ch for ch in vocab]
    c2i = {ch : i for i, ch in enumerate(vocab)}

    data = [c2i[ch] for ch in raw_data]
    
    return data, i2c, c2i

In [3]:
data, i2c, c2i = load_data()

In [4]:
print(c2i)

{'<': 0, 's': 1, 't': 2, 'a': 3, 'r': 4, '>': 5, '\n': 6, 'X': 7, ':': 8, '1': 9, 'T': 10, ' ': 11, 'L': 12, 'M': 13, 'o': 14, 'n': 15, 'f': 16, 'i': 17, 'e': 18, 'Z': 19, 'c': 20, '/': 21, 'u': 22, 'g': 23, '?': 24, 'p': 25, 'h': 26, 'l': 27, 'B': 28, 'E': 29, 'O': 30, 'N': 31, '-': 32, '2': 33, '0': 34, '5': 35, '7': 36, '4': 37, 'P': 38, 'b': 39, 'v': 40, 'm': 41, '@': 42, '.': 43, '8': 44, 'Q': 45, '=': 46, '6': 47, 'F': 48, 'G': 49, 'A': 50, '{': 51, '}': 52, '|': 53, 'D': 54, 'C': 55, '3': 56, ',': 57, '_': 58, 'd': 59, 'S': 60, 'V': 61, '(': 62, 'I': 63, ')': 64, 'K': 65, 'j': 66, 'z': 67, 'w': 68, 'x': 69, ']': 70, '"': 71, '!': 72, '+': 73, '[': 74, "'": 75, 'U': 76, '9': 77, '^': 78, 'R': 79, '\\': 80, 'y': 81, 'H': 82, 'q': 83, 'J': 84, 'W': 85, 'k': 86, '~': 87, '\t': 88, 'Y': 89, '*': 90, '#': 91, '&': 92}
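
To make the three data structures concrete, here is a minimal sketch that applies the same steps to the toy string "hello" from the diagram. The helper `build_vocab` and the `toy_*` names are illustrative and not part of the notebook.

```python
def build_vocab(text):
    # dict.fromkeys keeps the first occurrence of each character, in order
    # (dicts preserve insertion order from Python 3.7 onwards)
    vocab = dict.fromkeys(text)
    i2c = list(vocab)                            # index -> character
    c2i = {ch: i for i, ch in enumerate(i2c)}    # character -> index
    data = [c2i[ch] for ch in text]              # text encoded as indices
    return data, i2c, c2i

toy_data, toy_i2c, toy_c2i = build_vocab("hello")
print(toy_i2c)   # ['h', 'e', 'l', 'o']
print(toy_c2i)   # {'h': 0, 'e': 1, 'l': 2, 'o': 3}
print(toy_data)  # [0, 1, 2, 2, 3]
```

Decoding is the inverse lookup: joining `toy_i2c[i]` for every `i` in `toy_data` gives back the original string.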


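The function below is used during training to get a sentence (a list of ```sentence_length``` contiguous characters, 25 in this tutorial) from the data. The target sentence is the source sentence shifted by one index, because at every time step the model is trained to predict the character that follows the current one, as in the diagram above.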

In [5]:
def get_sentence(sentence_index, sentence_length, data):
    si = sentence_index * sentence_length
    ei = min(si + sentence_length, len(data)-1)
    source = np.array([data[si:ei]], 'int32')
    target = np.array([data[si+1:ei+1]], 'int32')

    return source, target
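
The one-character offset is easiest to see on a short string. The sketch below uses the same indexing as `get_sentence` but works on plain lists; `get_sentence_plain` and the toy data are illustrative names, not part of the notebook.

```python
def get_sentence_plain(sentence_index, sentence_length, data):
    # Same index arithmetic as get_sentence, returning lists instead of NumPy arrays
    si = sentence_index * sentence_length
    # Stop one element early at the end of the data so every source
    # character still has a following target character
    ei = min(si + sentence_length, len(data) - 1)
    return data[si:ei], data[si + 1:ei + 1]

toy = list("hello world")
src, tgt = get_sentence_plain(0, 4, toy)
print(src, tgt)    # ['h', 'e', 'l', 'l'] ['e', 'l', 'l', 'o']

# The last sentence is shortened rather than running past the end of the data
src2, tgt2 = get_sentence_plain(2, 4, toy)
print(src2, tgt2)  # ['r', 'l'] ['l', 'd']
```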

We define our language model below. The model is an implementation of the char-RNN described above. Instead of a plain RNN cell we use an LSTM (long short-term memory) cell, a gated variant that retains information over longer sequences and is popular for language modelling tasks.

In [6]:
class Model:
    def __init__(self, options):
        self.embedding_matrix = tf.get_variable('embedding_matrix', 
                [options['vocab_size'], options['hidden_size']],
                initializer=tf.truncated_normal_initializer(stddev=0.02))
        self.options = options

        self.lstm_init_value = tf.placeholder(
                tf.float32,
                shape=(None, 2 * options['hidden_size']),
                name="lstm_init_value"
            )

    def forward_pass(self, sentence):
        sentence_embedding = tf.nn.embedding_lookup(self.embedding_matrix, 
            sentence, name = "sentence_embedding")
        cell = tf.nn.rnn_cell.LSTMCell(num_units=self.options['hidden_size'], state_is_tuple=False)
        outputs, last_states = tf.nn.dynamic_rnn(
                cell=cell,
                dtype=tf.float32,
                initial_state=self.lstm_init_value,
                inputs=sentence_embedding)
        outputs = tf.reshape(outputs, shape = (-1, self.options['hidden_size']))
        logits = tf.layers.dense(outputs, self.options['vocab_size'])
        self.last_states = last_states
        return logits
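
The `lstm_init_value` placeholder has width `2 * hidden_size` because, with `state_is_tuple=False`, the cell state and the hidden output travel together as one concatenated vector. The following pure-Python sketch of a single LSTM step shows where the two halves come from. It follows the standard LSTM equations and assumes the packed state is ordered cell state first, then hidden output. The gate ordering, the missing forget-gate bias and all `toy_*` names are illustrative choices, so TensorFlow's internals may differ in detail.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, state, weights, biases, hidden_size):
    """One LSTM step on plain lists; state is the concatenation [c, h]."""
    c_prev, h_prev = state[:hidden_size], state[hidden_size:]
    z = x + h_prev  # list concatenation: input followed by previous hidden output
    # A single affine map produces all four gate pre-activations (4 * hidden_size values)
    pre = [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, biases)]
    n = hidden_size
    i_gate = [sigmoid(p) for p in pre[0:n]]          # input gate
    cand = [math.tanh(p) for p in pre[n:2 * n]]      # candidate cell values
    f_gate = [sigmoid(p) for p in pre[2 * n:3 * n]]  # forget gate
    o_gate = [sigmoid(p) for p in pre[3 * n:4 * n]]  # output gate
    c = [f * cp + i * g for f, cp, i, g in zip(f_gate, c_prev, i_gate, cand)]
    h = [o * math.tanh(ck) for o, ck in zip(o_gate, c)]
    return h, c + h  # output, and new packed state of width 2 * hidden_size

random.seed(0)
toy_hidden, toy_input = 3, 4
toy_weights = [[random.uniform(-0.1, 0.1) for _ in range(toy_input + toy_hidden)]
               for _ in range(4 * toy_hidden)]
toy_biases = [0.0] * (4 * toy_hidden)
toy_state = [0.0] * (2 * toy_hidden)  # all-zero initial state, as in the notebook
toy_output, toy_state = lstm_step([1.0, 0.0, 0.0, 0.0], toy_state,
                                  toy_weights, toy_biases, toy_hidden)
print(len(toy_output), len(toy_state))
```

Feeding the returned state back in as the next call's `state` is the same pattern the notebook uses when it passes `last_states` back through `lstm_init_value`.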

In [7]:
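# Sample the next character from the model's output activations (logits).
# activations is a NumPy array of shape (vocab_size,); temp is the sampling temperature.
def sample(activations, temp):
    # temp == 0 picks the most likely character; otherwise sample from the temperature-scaled softmax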
    if temp == 0.0:
        sample = np.argmax(activations)
    else:
        scale = activations / temp
        exp = np.exp(scale - np.max(scale))
        soft = exp / np.sum(exp)

        sample = np.random.choice(len(soft), p=soft)
    return i2c[sample]
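
To see what the temperature does, the sketch below computes the temperature-scaled softmax for a small set of made-up activations. It mirrors the arithmetic in `sample` without NumPy; `softmax_with_temperature` and `toy_logits` are illustrative names.

```python
import math

def softmax_with_temperature(activations, temp):
    scaled = [a / temp for a in activations]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.0]
for temp in (0.2, 0.7, 1.0, 5.0):
    probs = softmax_with_temperature(toy_logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

Low temperatures concentrate the probability mass on the highest activation, which approaches the `argmax` branch, while high temperatures flatten the distribution towards uniform.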

In [8]:
def generate_sample(T = 1.0, sample_length = 1000):
    # seed to start with
    source_np = np.array( [[c2i['<'], c2i['s'], c2i['t'], c2i['a'], c2i['r']]], dtype = 'int32')
    generation = '<star'
    init_state =  np.zeros((1, 2 * HIDDEN_SIZE))
    for i in range(sample_length):
        if i != 0:
            init_state = next_hidden
        
        logits_np, next_hidden = sess.run([logits, model.last_states], 
            feed_dict={
                input_tensor : source_np,
                model.lstm_init_value : init_state
            }
        )

        ch_sampled = sample(logits_np[-1,:], T)
        generation = generation + ch_sampled
        
        source_np = np.array( [[c2i[ch_sampled]]])
        
    return generation
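
`generate_sample` is an autoregressive loop: sample a character, append it to the output, and feed it back as the next input. The sketch below shows the same loop with a deliberately simpler stand-in model, a table of bigram counts instead of the LSTM, so that it runs without TensorFlow. All names and the toy text are illustrative.

```python
import random
from collections import Counter, defaultdict

def fit_bigram_counts(text):
    # counts[a][b] = how often character b follows character a
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def generate_from_bigrams(counts, seed, length, rng):
    generation = seed
    current = seed[-1]
    for _ in range(length):
        followers = counts.get(current)
        if not followers:
            break  # no known continuation for this character
        chars = list(followers)
        weights = [followers[ch] for ch in chars]
        # Sample the next character, then feed it back in as the new input
        current = rng.choices(chars, weights=weights, k=1)[0]
        generation += current
    return generation

toy_text = "|:G2AB c2BA|G2AB A2G2:|"
toy_counts = fit_bigram_counts(toy_text)
print(generate_from_bigrams(toy_counts, "|:", 20, random.Random(0)))
```

The LSTM version differs in that the model also carries a hidden state from step to step, which is why `generate_sample` passes `next_hidden` back in alongside the sampled character.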

In [9]:
MAX_EPOCHS = 100 
HIDDEN_SIZE = 200 # Hidden Units in the LSTM
SENTENCE_LENGTH = 25 # Sentence Length Used In Training
LR = 0.0004 # Learning Rate
TEMP = 0.7 # Temperature parameter used to control stochasticity in sampling
SAMPLE_EVERY = 1000 # Generate a sample every x iterations
SAMPLE_LENGTH = 1000 # Max Length of sample to be sampled from the model
BATCH_SIZE = 1

In [10]:
data, i2c, c2i = load_data()

input_tensor = tf.placeholder(tf.int32, [BATCH_SIZE, None])
target_tensor = tf.placeholder(tf.int32, [BATCH_SIZE, None])

options = {
    'vocab_size' : len(i2c),
    'hidden_size' : HIDDEN_SIZE,
    'sentence_length' : None
}   

model = Model(options)
logits = model.forward_pass(input_tensor)
target_tensor_flat = tf.reshape(target_tensor, shape = (-1,))
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = target_tensor_flat, logits = logits))


train_op = tf.train.AdamOptimizer(LR).minimize(loss)
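
A quick sanity check on the cross-entropy loss: a model that assigns equal probability to every character has a loss of ln(vocab_size). With the 93-character vocabulary printed earlier this is about 4.53, which is close to the first loss value in the training log (4.532488), as expected for a freshly initialised network. The sketch below computes this baseline and converts a loss into perplexity; the function names are illustrative.

```python
import math

def uniform_cross_entropy(vocab_size):
    # Cross-entropy (in nats) when every character gets probability 1 / vocab_size
    return -math.log(1.0 / vocab_size)

def perplexity(loss):
    # Effective number of equally likely choices per character
    return math.exp(loss)

print(uniform_cross_entropy(93))  # baseline loss for an untrained model
print(perplexity(1.1565655))      # perplexity at the last loss shown in the log
```

Read this way, the drop in loss over training means the model goes from choosing among all 93 characters to choosing among roughly three per step.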




In [11]:
print ("Vocab")
print (i2c[0:50])
print ("Data")
print (data[0:50])


Vocab
['<', 's', 't', 'a', 'r', '>', '\n', 'X', ':', '1', 'T', ' ', 'L', 'M', 'o', 'n', 'f', 'i', 'e', 'Z', 'c', '/', 'u', 'g', '?', 'p', 'h', 'l', 'B', 'E', 'O', 'N', '-', '2', '0', '5', '7', '4', 'P', 'b', 'v', 'm', '@', '.', '8', 'Q', '=', '6', 'F', 'G']
Data
[0, 1, 2, 3, 4, 2, 5, 6, 7, 8, 9, 6, 10, 8, 11, 12, 3, 11, 13, 14, 15, 2, 16, 3, 4, 17, 15, 18, 6, 19, 8, 10, 4, 3, 15, 1, 20, 4, 17, 2, 11, 18, 2, 21, 14, 22, 11, 20, 14, 4]


In [None]:
sess = tf.InteractiveSession()
# tf.initialize_all_variables() is deprecated; global_variables_initializer() replaces it
tf.global_variables_initializer().run()
init_state =  np.zeros((1, 2 * HIDDEN_SIZE))
for epoch in range(MAX_EPOCHS):
    for sentence_index in range(len(data) // SENTENCE_LENGTH):
        source, target = get_sentence(sentence_index, SENTENCE_LENGTH, data)
        if sentence_index != 0:
            init_state = next_hidden

        _, loss_val, next_hidden = sess.run([train_op, loss, model.last_states], 
            feed_dict={
                input_tensor : source,
                target_tensor : target,
                model.lstm_init_value : init_state
            }
        )

        if sentence_index % SAMPLE_EVERY == 0:
            print("\n\n*****************************\n\n")
            print (sentence_index, epoch, loss_val,"\n")
            sample_generation = generate_sample(T = TEMP, sample_length = SAMPLE_LENGTH)
            print (sample_generation)





*****************************


0 0 4.532488 

<star)6VU4x7wQr@#~G(@[{"56lXWZP*T-Kowumr&rj4_u'c.W&{
ESu2IfMX'&>2Hzgnq]yFSD=O<wx+#9Uk-fOo*|mNc0M+Q&C[pEJ?+-e5tsd>X^L:-T,6R@j06V1|Jq^b>W>3TNy7c 5(+8VbwDp0WtEVv,d.09+},}lWw\,1Bf~I>u}v}.rDuhatH+9k5IZgc.GQ?yywHS4IUhJ|nu=^]h_\B|C}=a=/hpS(k)OCD^:"~Mv>O[kM48ZyOR ^|U-1/*7,*nl|Wo.>Tl+.]#{paZ1EU0=OL_[9Oa]Pt?Z_t?*_np'}}&C)Y)_)*H|sVPe!7QAGoqk=}<nq9n|z(6(&E(69	
4 c(RD9Eo.h?.'c#<4:erKdd\U4"*LQ6S[AoA"?jdwqXw
a^\WOddX7h}ez.g({^*jR[tCm{Q56}lWwX}4Y/>4UB2~UscvYDJi@\F>4>w1C-I0zBlljhYZ1'90W4[Mtz ->b>{Q_7|W	T/ZG-xb02zCQMIaZ21,:KCXzUcD@.&9xrMjP>uktyhU\E6	9u	.W(lb.("n=!Ij7Hp#v5I]_	I'cte?1qm<i&>33E+	e	TMIk":#c>^EduVvxHg=CKH(@3&GS5g:gHVGNN4-R7X7Cx}ilEG[JD#PUPHQ=vH4^J|{{\o.#(vQq_lzG8DluEVc=7s{FIADkM?2]RYlw)v&d	9n|QseyhCbqBRp>+yM/0[CS>il7Onl_b@:!i@mv^2nPe-*9s Nl]wf}@FP!7	S=e)(tl9+{j2Z-sVtNpM+-7A.ZHcD5?.B]91@Oo,1Wygg:Z[LuPgbWQG/z
5gX?A
e
v6XL99vHco9McXSVoS5_pNADvz+'
hyA?J6&.L<p9HTLu/30}7z{s^I| ![OUKf&M">2[bYRj_J-D]{	dH!a?h4w-lD&N@uvK*ypHK@ns)q^9:K}^A',Q4tL9&*oA)NTPG



*****************************


8000 0 1.5971962 

<starhe
O:Brance
C:Transcog
C:Brance
A:Crance
A:Gransce
A:Branlese@
T:Transcrit et/ou corrig? par Michel BELLON - 2006-1
Z:Pour toute observation mailtion Car Hhaulteon origlle@free.fr
M:4/4
L:1/4
Q:1/4=168
K:C
V:1
V:1/4
K:G
c2d2|e2d4|A2e4|
cd e2d2 | d2d c2|B2A2B2|dccdBeg | de c2B2 | d2c2B2 | GGG2G2 | G2F2 |
G2G2A2|G2B2G2c2|A2c2 | B2B2B2d2e2|G2c2|f2d2 | B3A | G2B2F2E2|cddc2d2c2|d2B2|G2B2E2G2|A2d2d2d2|d4d2d2|d2B2|A2G2A2B2|d2dc | e2d2f2|d2B2d|2B2B2G2 ||
V:G2
d2B2 | G2G2 | B2A2A2 | G2d2d2 |
d4c2A2A2||G2G2 |
G2G2G2|G2A2A2|G2A2 | G2 G2A2|B4d2cBd | d2c2c2 |: cBBc | e2d2c2|Bcc d2d2 | d2dz | c2d2d2 |c4e2d2 | g2c3c2 | G2G2d2|c4B2G2|A2B2B2G2:|
V:2G
G2G2G2A2|A2B2c2|GBG/2G/2G/2G/2G/2B/2G/2B/2G/2F/2G/2d/2c/2d/2d/2e/2g/2d/2d/2d/2d/2g/2d/2e/2g/2d/2 d2d | fd fd | c3c2c2d2|d2d2 |
e4c2B2 |
B3GA | G2G2A2 | F2Gd4|
^2c3|d2e2|c2d2 | dfc | c2e2 | ec d/2d/2d/2d/2B/2c/2F/2 | Bc d2c2|d4d2d2|dcBB | A2G2G2G|F3D4z2 | c4c2 | dd/2e/2ed/2g/2g/2g/2 | efedede | e6 |




*****************************


16000 0 1.9706047 

<star
X:33
T:E2so noth & ond the Bray the the K #317
Z:id:hn-reel-29
M:C|
K:D
D2FD E2:|
BA~F>B BGAB|AGFA FE~D2:|
|:BABG BA~B2|B2BA FE~F2|AFGE FAFA|BA~FA BGFA|
B2FA GF~F2|(3BAB cFEF:|
|:FABA BABA|B2FE E2Bd||
|:fdBd ~e3d|eded BA~A2:|
|:g'fag afeg|
AGEF FAFA|BABd FE~EG:|
|:g2af eAfe|dBAF FAFA|AFAF EFGA:|
P:Variations:
|:G~E2 ~D3D:|
|:BdBA BAFA|BA~F2 AFGB:|
|:B2df e2ag|e2ef f2de|f2de a2fa|g2 (3gfe fd|f2af gedB|AFDF EF~D2:|
<end>
<start>
X:33
T:Pamo trot F.ner tone 2, Bang af or the is oair Co #196.
Z:id:hn-reel-33
M:C|
K:D
D2DF F2AF|ABdc d2dB|AFFA EFFF|E2B2 dcAF|AAFA B2AF:|
|:edBd dBAF|E2FA BAFA|BG~F2 GDFE|~G2~E2 EFGB|GFFE EFEF|F2FG EFAF||
<end>
<start>
X:39
T:Halrmann Che waderen.
Z:id:hn-breel-33
M:C|
K:D
BEEF GBdF|AF~F2 F2AG|B2AF E2BDE:|2 DEDD D2:|
|:FD~F2 ~G3G|F2FG BEFA|AF~A2 FABA|1 AFDF GFGE:|2 dAFA GEFA|B4|
|:dg~~g2 g2ec|d2BA AFEA|BA~B2 BcdB|1 A2FA F2:|
P:Variations:
|:2Bcdg edcA|AB ABdd | ~f3f bgeg | fdcB ABdc|
BGFA BAFA | Bded d



*****************************


3000 1 1.6346502 

<start>
X:101
T:Lar bourie
O:France
A:Provence
C:Marrce
S:Carne du taisourie.
S:Sarnet du tambourinaire Ginas (1924)
Z:Transcrit et/ou corrig? par Michel BELLON - 2005-02-12
Z:Pour toute obscration mailto:galouvielle@free.fr
M:6/8
L:1/8
K:Bb
V:Galoute
B3 | B4 | A2A2 | f4 | f4 | g4 | g2f | g4 | e2 | d2B | B2B2 | d4 | d2 BB |
c2B2 | d2B>B | B2 B2 | B4 | BB/B/B | d2B2 | B2 B2B2 | B2B2 | c2 c2 | e2d>d | c4 | c2A2 | A2B2 | B2B2 | B2B BBB | B2B2 | B4 | B4 | cBA | f2 d>c | d2f | e4d2 | dB B2 | A2A2 | B2 c2 | B2f2e | f2f | e2dc | A2F zz | d4 | d2B2 | BA B2!F2 | f2 fff | f2ef | e2c2d | e2 cde | f2a gfed | g2 f2 | f2 ^c2 | e2g | d2B | Bd/f/ g2g | f2 fed | c3- | c2 d2 | f2c | A2 F2 | d4 | d2d2 | d3 z2zz | B2B2 | B3B | B2B2 | B2B2 | B2 B2z | B2B2 | B2B2 | B2 (3BA B2A | B2BB | B2B2 | d2c B2A2 | F2G2 | A2B2 | B2BB | A2F>B | G2F Bc | B3 z>B | c2B2 | G2z2 | F4 | B2B2 | B2B2 | B4 |
B4 | B2 B2A | B2BB | B2B2 | BB/B/A/ BA | B2BB | B2B2 | B4 | B2BB | A2



*****************************


11000 1 1.2570751 

<starine)
D|Ad cAFA|dfed cBAG|A2A2 A2:|
|: eA|BA BA B2 (3ABc|defe f2de|fgaf edcB|AGFA E2:|
<end>
<start>
X:31
T:Mursipe played of (Cadcly ares of werson & the set iw Cheet ad strate trom The alsonpen tunes ot of of Browes to Jofn of the sameun it fit husst
Z:id:hn-hornpipe-41
M:C|
K:G
(3FGF|D2 G2 A2:|
|: AB | cBAG A2de | fdfe f3e | d2ef edef | edef gfed | e2f2 g2 | fedB G2AB | A2A2 (3cde ~d3c | BAGB A2FE | A2AB A2 (3dfe | (3def ge (3def (3efg | faaf efgf | gfed e2dc | A2FA f2fe | d2BA (3Bcd (3cBA | A2A2 A2 :|
|: (3afa | (3bag (3fed (3cBA (3Bdg | (3fed (3dBc (3def | (3gfe (3de (3def (3gfe | fedc dcBc |
dcBA B2cB | cBAG ABcA | BFAB A2 (3efe | d2G2 (3FAf :|
<end>
<start>
X:44
T:Leit and in wuts on to boun the bott
W:Jornpind pols, The
R:hornpipe
Z:id:hn-hornpipe-48
M:C|
K:G
(3Bcd|efec BABc|dfdc BAGE|G2AB cAAF|DEFA A2AB|c2A2 (3Bcd cBAG|A2G2 A2:|
|: (3Bcd gfef|edec dcBA|
(3FEF GA A2fe|dcde fedc|(3Bcd (3efe (3dBA (3Bcd|(3efg (3fed (3Bcd (



*****************************


19000 1 1.1565655 

<star
X:15
T:Gapres Hun Ba Roud to "The Jigh Lean Candy
R:slide
Z:id:hn-slide-7
M:6/8
K:G
d2d d2B|A2B cAG|A2B A2G|~G3 AGD|1 G3 G2A:|2 G3 G2A||
<end>
<start>
X:7
T:Sand had My Finny Bress se in A, the cown Padyy Gle
R:slide
H:See also #11
D:Nanges Kings
W:Prady Geane: \'als Of Hack id in the the Peadce of the part le ald in the Padd air By watle in Collightmee sling frat Marts
Z:id:hn-slide-12
M:6/8
K:D
F2A d2B|BAB B2A|G3 A2B|c2B A2B|c2B c2A|B2A BAB|c2B cBA|f2e d2B|c3 A2e:|
|:f2a fef|g2e f2g|a2f efe|d2d c2d|c2A A2D|F2A A2d|cAG FED:|
|:d3 efe|d2e dBA|Bcd c2B|B2e f2e|f2f efe|1 d2B A2B:|2 A3 =cAG||
|:~G3 ABG|A2G AGA|G3 AGE|D2B AGE|GAB ~A3|BGE G2A|B2e d2B|c2A B2A|G2g g2B|1 d2e fed|cAA A2A|G2F G2A|G2A Bcd:|
|:g2e d2B|A2B c2d|e2d cAG|A2B d2e|d2e d2e|f2e f2e|d2B d2B|D3 ~g3|
|:g2f g2f|g3 g2e|f2d efe|
d2e f2b|a2g fed|c2e fed|e2d efe|1 d3 d2A:|2 A3 A2e||
P:variations
|:F2A GAB|ABd e2f|g2f g2e|d2B d2e|f2d c3|A2B c2A|BAB d3|d2e f2A|B2G A2c|
d2f 

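To listen to a generated tune, convert it to a MIDI file with http://mandolintab.net/abcconverter.php by pasting in one complete sample, that is, the text from a ```<start>``` marker to the matching ```<end>``` marker.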
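
Generated samples usually begin and end mid-tune, so it helps to pull out only the complete tunes before pasting. The helper below is a minimal sketch, assuming the `<start>` and `<end>` markers are delimiters rather than part of the ABC tune itself. `extract_tunes` and the toy sample are illustrative.

```python
import re

def extract_tunes(generation):
    # Match text between <start> and <end>; the lookahead keeps a match from
    # swallowing a later <start>, and DOTALL lets "." span newlines
    pattern = r"<start>((?:(?!<start>).)*?)<end>"
    return [t.strip() for t in re.findall(pattern, generation, flags=re.DOTALL)]

toy_generation = (
    "<star\nX:1\nK:D\nD2FD|\n<end>\n"   # truncated seed, no full <start> marker
    "<start>\nX:2\nK:G\nG2AB|\n<end>\n"  # one complete tune
    "<start>\nX:3\nK:D\nBE"              # cut off before <end>
)
tunes = extract_tunes(toy_generation)
print(len(tunes))
print(tunes[0])
```

Only the middle tune is returned here, because the first fragment lacks a full `<start>` marker and the last one was cut off by the sample length limit.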