# Generating ABC notation files with LSTM-RNN Text Generation

In the past few years, AI has significantly changed how music can be generated. This technology has made it progressively easier for artists to realize their creative visions, and AI is increasingly seen as a powerful tool and partner for artists. Despite previous studies of music generation with machine learning, there is still room to explore and build more sophisticated models. In this work, we train an LSTM recurrent neural network (RNN) on text in ABC notation to generate music efficiently.

The code below trains an LSTM RNN on our dataset, the Nottingham Music Database, which contains over 1000 folk tunes in ABC notation.

```python
#!/usr/bin/env python
# Requires TensorFlow 1.x: tf.contrib was removed in TensorFlow 2.
from __future__ import print_function
import codecs
import collections
import os
import time

import numpy as np
import tensorflow as tf
from six.moves import cPickle
from tensorflow.contrib import rnn
from tensorflow.contrib import legacy_seq2seq
```

# Data Preprocessing

Because we treat the ABC notation files as plain text, we preprocess them like any other text file.
Along with preprocessing the data, the following code also prepares the batches used for training.
To preprocess the data, we map each character to an integer: characters are ranked by frequency, so the most frequent character gets 0, the next most frequent gets 1, and so on.
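
For illustration, here is the same frequency-based encoding as a standalone sketch in plain Python, run on a short ABC fragment. The helpers `build_vocab`, `encode` and `decode` are hypothetical names of our own; they mirror the ranking in `TextLoader.preprocess` but are not part of the original code.

```python
import collections


def build_vocab(text):
    """Rank characters by frequency: the most frequent character gets id 0."""
    counter = collections.Counter(text)
    # sorted() is stable, so on Python 3.7+ (where Counter keeps insertion order)
    # characters with equal counts stay in first-seen order
    count_pairs = sorted(counter.items(), key=lambda x: -x[1])
    chars = [c for c, _ in count_pairs]
    vocab = {c: i for i, c in enumerate(chars)}
    return chars, vocab


def encode(text, vocab):
    """Turn a string into a list of integer ids."""
    return [vocab[c] for c in text]


def decode(ids, chars):
    """Turn a list of integer ids back into a string."""
    return "".join(chars[i] for i in ids)


fragment = 'K:D\n"D"d2 dA|'
chars, vocab = build_vocab(fragment)
ids = encode(fragment, vocab)
assert decode(ids, chars) == fragment  # the encoding is lossless
print(chars[:3], ids)
```

Because the ids depend only on character counts, the vocabulary must be saved (as `vocab.pkl` and `chars_vocab.pkl` are in the notebook) so that generation maps ids back to the same characters.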

```python
class TextLoader():
    """Reads data/input.txt, encodes it as integer ids and splits it into training batches."""

    def __init__(self, data_dir="data", batch_size=50, seq_length=50, encoding='utf-8'):
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.encoding = encoding

        input_file = os.path.join(data_dir, "input.txt")
        vocab_file = os.path.join(data_dir, "vocab.pkl")
        tensor_file = os.path.join(data_dir, "data.npy")

        if not (os.path.exists(vocab_file) and os.path.exists(tensor_file)):
            print("reading text file")
            self.preprocess(input_file, vocab_file, tensor_file)
        else:
            print("loading preprocessed files")
            self.load_preprocessed(vocab_file, tensor_file)
        self.create_batches()
        self.reset_batch_pointer()

    # Preprocess the data the first time: rank characters by frequency
    # (most frequent -> 0), then save the vocabulary and the encoded text.
    def preprocess(self, input_file, vocab_file, tensor_file):
        with codecs.open(input_file, "r", encoding=self.encoding) as f:
            data = f.read()
        counter = collections.Counter(data)
        count_pairs = sorted(counter.items(), key=lambda x: -x[1])
        self.chars, _ = zip(*count_pairs)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        with open(vocab_file, 'wb') as f:
            cPickle.dump(self.chars, f)
        self.tensor = np.array(list(map(self.vocab.get, data)))
        np.save(tensor_file, self.tensor)

    # Load the preprocessed data if it has been processed before.
    def load_preprocessed(self, vocab_file, tensor_file):
        with open(vocab_file, 'rb') as f:
            self.chars = cPickle.load(f)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        self.tensor = np.load(tensor_file)

    # Separate the whole data into num_batches batches of shape (batch_size, seq_length).
    def create_batches(self):
        chars_per_batch = self.batch_size * self.seq_length
        self.num_batches = int(self.tensor.size / chars_per_batch)

        # When the data (tensor) is too small, give a clear error message.
        if self.num_batches == 0:
            raise ValueError("Not enough data. Make seq_length and batch_size smaller.")

        # Truncate the data to num_batches * batch_size * seq_length characters.
        self.tensor = self.tensor[:self.num_batches * chars_per_batch]
        xdata = self.tensor
        ydata = np.copy(self.tensor)

        # ydata is xdata shifted by one position: each target is the next character.
        ydata[:-1] = xdata[1:]
        ydata[-1] = xdata[0]
        self.x_batches = np.split(xdata.reshape(self.batch_size, -1),
                                  self.num_batches, 1)
        self.y_batches = np.split(ydata.reshape(self.batch_size, -1),
                                  self.num_batches, 1)

    def next_batch(self):
        x, y = self.x_batches[self.pointer], self.y_batches[self.pointer]
        self.pointer += 1
        return x, y

    def reset_batch_pointer(self):
        self.pointer = 0
```
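
To see what `create_batches` produces, the sketch below applies the same reshaping to 13 fake character ids with toy sizes (batch_size=2, seq_length=3) instead of 50 and 50. `make_batches` is a hypothetical standalone copy of the logic, written only for this illustration.

```python
import numpy as np


def make_batches(tensor, batch_size, seq_length):
    """Standalone copy of the create_batches logic (illustration only)."""
    num_batches = tensor.size // (batch_size * seq_length)
    if num_batches == 0:
        raise ValueError("not enough data")
    tensor = tensor[:num_batches * batch_size * seq_length]  # drop the tail
    xdata = tensor
    ydata = np.copy(tensor)
    ydata[:-1] = xdata[1:]  # each target is the next character
    ydata[-1] = xdata[0]    # the very last target wraps around
    x_batches = np.split(xdata.reshape(batch_size, -1), num_batches, 1)
    y_batches = np.split(ydata.reshape(batch_size, -1), num_batches, 1)
    return x_batches, y_batches


data = np.arange(13)  # 13 fake character ids; the 13th is dropped
xs, ys = make_batches(data, batch_size=2, seq_length=3)
print(len(xs))
print(xs[0])
print(ys[0])
```

Row i of the second batch continues row i of the first batch (for example `[3, 4, 5]` follows `[0, 1, 2]`), which is why `train()` carries the LSTM state from one batch to the next instead of resetting it.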

# Training Model

The following code defines our model. As mentioned before, we use a recurrent neural network with LSTM (long short-term memory) cells.
The core of the model is an LSTM cell that processes one character at a time and computes probabilities for the possible values of the next character in the sequence.
Each character is embedded into a dense vector representation before it is fed into the LSTM.

The RNN has two stacked LSTM layers with 128 units each.
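
To make the layer sizes concrete, the sketch below counts the trainable weights, assuming the default configuration of `tf.contrib.rnn.LSTMCell` (no peepholes, no projection), where each layer has one kernel of shape [input_size + num_units, 4 × num_units] and a bias of 4 × num_units. The helper names are our own, and the vocabulary size of 100 is a hypothetical placeholder; the real value depends on the characters in `input.txt`.

```python
def lstm_layer_params(input_size, num_units):
    """Weights in one LSTM layer: a kernel for the 4 gates plus a bias."""
    kernel = (input_size + num_units) * 4 * num_units
    bias = 4 * num_units
    return kernel + bias


def model_params(vocab_size, rnn_size=128, num_layers=2):
    """Embedding + stacked LSTM layers + softmax output layer."""
    embedding = vocab_size * rnn_size
    # the embedding width equals rnn_size, so every layer sees rnn_size inputs
    lstm = sum(lstm_layer_params(rnn_size, rnn_size) for _ in range(num_layers))
    softmax = rnn_size * vocab_size + vocab_size
    return embedding + lstm + softmax


print(lstm_layer_params(128, 128))
print(model_params(vocab_size=100))  # hypothetical vocabulary size
```

Most of the weights sit in the two LSTM layers (131,584 each); the embedding and softmax layers grow linearly with the vocabulary size.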

The output layer uses a softmax activation to turn the LSTM output into a probability distribution over all characters in the vocabulary.

Loss Function:
Our aim is to minimize the average negative log probability of the target characters.
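
The loss can be worked through on a small example: softmax turns the logits into probabilities, and the loss is the mean of -log p(target) over all positions. In the model this is what `sequence_loss_by_example` with unit weights, followed by dividing the sum by 50 × 50, computes. A NumPy sketch, with helper names of our own:

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def average_nll(logits, targets):
    """Average negative log probability of the target ids.

    logits: array of shape [N, vocab_size]; targets: N integer ids.
    """
    probs = softmax(logits)
    picked = probs[np.arange(len(targets)), targets]  # probability of each correct character
    return float(-np.log(picked).mean())


# With uniform logits over a 5-character vocabulary every target has
# probability 1/5, so the loss equals log(5).
uniform = np.zeros((4, 5))
print(average_nll(uniform, np.array([0, 1, 2, 3])))
```

A model that puts almost all probability on the correct next character drives the loss toward 0, while uniform guessing gives log(vocabulary size).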

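Optimizer:
We use the Adam optimizer, which adapts a separate step size for each weight using exponentially decaying averages of past gradients and of past squared gradients. Adam tends to work well with noisy gradients, which is why we chose it. The learning rate starts at 0.002 and decays by a factor of 0.97 per epoch, and gradients are clipped to a global norm of 5.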
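
A single Adam update can be written out in a few lines of NumPy. This is a sketch of the textbook update with TensorFlow's default hyperparameters (beta1=0.9, beta2=0.999, epsilon=1e-8), not TensorFlow's own code: `tf.train.AdamOptimizer` folds the bias correction into the step size, so epsilon enters slightly differently. `learning_rate` reproduces the schedule `0.002 * (0.97 ** e)` from `train()`; both helper names are our own.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; t is the 1-based step number."""
    m = beta1 * m + (1 - beta1) * grad        # decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    # per-weight step: lr scaled by the gradient's recent magnitude
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


def learning_rate(epoch, base=0.002, decay=0.97):
    """Learning-rate schedule used in train()."""
    return base * decay ** epoch


param = np.zeros(3)
grad = np.array([0.5, -2.0, 1e-3])
m, v = np.zeros(3), np.zeros(3)
param, m, v = adam_step(param, grad, m, v, t=1)
# on the first step each weight moves by about lr, opposite to its gradient's sign
print(param)
```

The first step shows the per-weight scaling: gradients of -2.0 and 0.001 both produce a step of roughly 0.002, so weights with tiny gradients still get updated.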


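```python
class Model():
    def __init__(self, vocab_size, training=True, batch_size=50, seq_length=50,
                 rnn_size=128, num_layers=2):
        # When sampling we feed one character at a time.
        if not training:
            batch_size, seq_length = 1, 1

        # Stack num_layers LSTM cells of rnn_size units each.
        cells = []
        for _ in range(num_layers):
            cell = rnn.LSTMCell(rnn_size)
            # keep_prob = 1.0 leaves dropout disabled
            cell = rnn.DropoutWrapper(cell,
                                      input_keep_prob=1.0,
                                      output_keep_prob=1.0)
            cells.append(cell)
        self.cell = cell = rnn.MultiRNNCell(cells, state_is_tuple=True)

        # input/target data (int32 since input is char-level)
        self.input_data = tf.placeholder(tf.int32, [batch_size, seq_length])
        self.targets = tf.placeholder(tf.int32, [batch_size, seq_length])
        self.initial_state = cell.zero_state(batch_size, tf.float32)

        # softmax output layer parameters
        with tf.variable_scope('rnnlm'):
            softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size])
            softmax_b = tf.get_variable("softmax_b", [vocab_size])

        # transform character ids into dense embedding vectors
        embedding = tf.get_variable("embedding", [vocab_size, rnn_size])
        inputs = tf.nn.embedding_lookup(embedding, self.input_data)

        # unstack [batch, seq, dim] into a list of seq_length tensors of shape [batch, dim]
        inputs = tf.split(inputs, seq_length, 1)
        inputs = [tf.squeeze(input_, [1]) for input_ in inputs]

        # loop function for rnn_decoder: turns the i-th step's output into the
        # (i+1)-th step's input by feeding back the most likely character
        def loop(prev, _):
            prev = tf.matmul(prev, softmax_w) + softmax_b
            prev_symbol = tf.stop_gradient(tf.argmax(prev, 1))
            return tf.nn.embedding_lookup(embedding, prev_symbol)

        # rnn_decoder generates the outputs and final state; the loop function
        # is only used when the model is not being trained
        outputs, last_state = legacy_seq2seq.rnn_decoder(
            inputs, self.initial_state, cell,
            loop_function=loop if not training else None, scope='rnnlm')
        output = tf.reshape(tf.concat(outputs, 1), [-1, rnn_size])

        # output layer: logits and softmax probabilities over the vocabulary
        self.logits = tf.matmul(output, softmax_w) + softmax_b
        self.probs = tf.nn.softmax(self.logits)

        # loss: average negative log probability of the target characters
        loss = legacy_seq2seq.sequence_loss_by_example(
            [self.logits],
            [tf.reshape(self.targets, [-1])],
            [tf.ones([batch_size * seq_length])])
        with tf.name_scope('cost'):
            self.cost = tf.reduce_sum(loss) / batch_size / seq_length
        self.final_state = last_state
        self.lr = tf.Variable(0.0, trainable=False)
        tvars = tf.trainable_variables()

        # clip gradients to a global norm of 5 to avoid exploding gradients
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars), 5.)
        with tf.name_scope('optimizer'):
            optimizer = tf.train.AdamOptimizer(self.lr)

        # apply the clipped gradients to all trainable variables
        self.train_op = optimizer.apply_gradients(zip(grads, tvars))

    def sample(self, sess, chars, vocab, num=200, prime='X:', sampling_type=1):
        """Generate num characters after prime (use a model built with training=False).

        sampling_type 0 always takes the most likely character; any other
        value samples from the predicted distribution.
        """
        state = sess.run(self.cell.zero_state(1, tf.float32))
        # feed all but the last prime character to warm up the LSTM state
        for char in prime[:-1]:
            x = np.array([[vocab[char]]])
            state = sess.run(self.final_state,
                             {self.input_data: x, self.initial_state: state})

        ret, char = prime, prime[-1]
        for _ in range(num):
            x = np.array([[vocab[char]]])
            probs, state = sess.run([self.probs, self.final_state],
                                    {self.input_data: x, self.initial_state: state})
            p = probs[0]
            if sampling_type == 0:
                idx = int(np.argmax(p))
            else:
                cumulative = np.cumsum(p)
                idx = int(np.searchsorted(cumulative, np.random.rand() * cumulative[-1],
                                          side='right'))
                idx = min(idx, len(p) - 1)
            char = chars[idx]
            ret += char
        return ret
```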

# Training

The following code trains the model above on our dataset.

We use TensorBoard to visualize how the training loss changes during training.

To keep training manageable, we train for 50 epochs. More epochs would likely give better predictions, although 50 epochs on a relatively small dataset were already enough to produce valid ABC notation.

The model is checkpointed every 1000 training steps and after the final batch, so a usable model is available even if training is halted early. Generation uses the most recently saved checkpoint.
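
The checkpoint condition in `train()` (save when the global step is a multiple of 1000, or at the last batch of the last epoch) can be checked without TensorFlow. `checkpoint_steps` is a hypothetical helper for this illustration, and 30 batches per epoch is an assumed value; the real number depends on the size of `input.txt`.

```python
def checkpoint_steps(num_batches, num_epochs=50, every=1000):
    """Global steps at which train() saves a checkpoint.

    train() saves when step % every == 0 and after the very last batch.
    """
    last = num_epochs * num_batches - 1
    steps = list(range(0, last + 1, every))
    if steps[-1] != last:
        steps.append(last)
    return steps


# with a hypothetical 30 batches per epoch
print(checkpoint_steps(30))
```

Note that `tf.train.Saver` keeps only the five most recent checkpoints by default (`max_to_keep=5`), and `tf.train.get_checkpoint_state` returns the latest one, which is what generation restores.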

```python
def train(num_epochs=50):
    # start from an empty graph so re-running this cell does not duplicate variables
    tf.reset_default_graph()
    data_loader = TextLoader()
    vocab_size = data_loader.vocab_size
    print("batches per epoch:", data_loader.num_batches)

    if not os.path.isdir("save"):
        os.makedirs("save")
    # save the vocabulary so that generation can map characters to ids and back
    with open(os.path.join("save", 'chars_vocab.pkl'), 'wb') as f:
        cPickle.dump((data_loader.chars, data_loader.vocab), f)

    model = Model(vocab_size=vocab_size)
    # merge_all() returns None if no summary ops exist, so log the loss explicitly
    tf.summary.scalar('train_loss', model.cost)

    with tf.Session() as sess:
        # instrument for TensorBoard
        summaries = tf.summary.merge_all()
        writer = tf.summary.FileWriter(
                os.path.join("logs", time.strftime("%Y-%m-%d-%H-%M-%S")))
        writer.add_graph(sess.graph)

        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver(tf.global_variables())

        for e in range(num_epochs):
            # learning rate starts at 0.002 and decays by 3% per epoch
            sess.run(tf.assign(model.lr, 0.002 * (0.97 ** e)))
            data_loader.reset_batch_pointer()
            state = sess.run(model.initial_state)
            for b in range(data_loader.num_batches):
                start = time.time()
                step = e * data_loader.num_batches + b
                x, y = data_loader.next_batch()
                feed = {model.input_data: x, model.targets: y}
                # carry the LSTM state over from the previous batch
                for i, (c, h) in enumerate(model.initial_state):
                    feed[c] = state[i].c
                    feed[h] = state[i].h

                summ, train_loss, state, _ = sess.run(
                    [summaries, model.cost, model.final_state, model.train_op], feed)
                writer.add_summary(summ, step)

                end = time.time()
                print("{}/{} (epoch {}), train_loss = {:.3f}, time/batch = {:.3f}"
                      .format(step, num_epochs * data_loader.num_batches,
                              e, train_loss, end - start))
                # save every 1000 steps and after the very last batch
                if step % 1000 == 0 or (e == num_epochs - 1 and
                                        b == data_loader.num_batches - 1):
                    checkpoint_path = os.path.join("save", 'model.ckpt')
                    saver.save(sess, checkpoint_path, global_step=step)
                    print("model saved to {}".format(checkpoint_path))
        writer.close()


if __name__ == '__main__':
    train()
```

# New Music Generation

After training the model, all we need to do is generate new characters with it.
After training for 50 epochs on the Nottingham Music Database, our RNN produces valid ABC notation that can be converted to music, though not every time.
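
A cheap first filter for "valid" output is to check each generated tune's header. The ABC 2.1 standard says a tune header starts with an `X:` (reference number) field followed by a `T:` (title) field and ends with a `K:` (key) field. The sketch below is a rough heuristic under that rule; the function names are our own, and it does not check the music body, so a tune that passes (for example one containing an unclosed `"` chord symbol) may still fail to convert to audio.

```python
import re

# A header field line such as "T:Dirne's Reel" or "K:Am"
FIELD = re.compile(r"^[A-Za-z]:")


def split_tunes(text):
    """Split ABC text into tunes, starting a new tune at every 'X:' line."""
    tunes, current = [], []
    for line in text.splitlines():
        if line.startswith("X:") and current:
            tunes.append(current)
            current = []
        if line.strip():
            current.append(line)
    if current:
        tunes.append(current)
    return tunes


def header_ok(tune_lines):
    """Header must start with X:, then T:, and end with K: followed by music.

    Lines starting with % are comments and are ignored.
    """
    lines = [l for l in tune_lines if not l.startswith("%")]
    if len(lines) < 2 or not lines[0].startswith("X:") or not lines[1].startswith("T:"):
        return False
    for i, line in enumerate(lines):
        if line.startswith("K:"):
            return i + 1 < len(lines)  # the tune body must follow K:
        if not FIELD.match(line):
            return False  # music appeared before the K: field
    return False  # no K: field at all


text = 'X: 1\nT:Dirne\'s Reel\n% Nottingham Music Database\nS:Trad\nK:Am\ng|"Am"f/2e/2|\n\nX: 2\nT:No Key\nM:4/4\n'
print([header_ok(t) for t in split_tunes(text)])
```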

The following code loads the most recently saved checkpoint and uses it to predict a new sequence of characters, which is a new piece of ABC notation.
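
The call `model.sample(sess, chars, vocab, 1000, "X:", 1)` primes the network with `X:` (the field that starts every ABC tune) and then picks one character at a time from the softmax output. We assume the last argument selects random sampling from the predicted distribution rather than always taking the most likely character. The sketch below shows both strategies in NumPy; `greedy_pick` and `weighted_pick` are illustrative names, not part of any library.

```python
import numpy as np


def greedy_pick(probs):
    """Always choose the most likely next character."""
    return int(np.argmax(probs))


def weighted_pick(probs, rng=np.random):
    """Choose index i with probability probs[i] / sum(probs)."""
    cumulative = np.cumsum(probs)
    # a uniform draw in [0, total) lands in exactly one cumulative bin;
    # side='right' ensures zero-probability entries are never chosen
    idx = int(np.searchsorted(cumulative, rng.rand() * cumulative[-1], side="right"))
    return min(idx, len(probs) - 1)  # guard against floating-point round-off at the top end


probs = np.array([0.1, 0.7, 0.2])
print(greedy_pick(probs), weighted_pick(probs))
```

Greedy picking always produces the same continuation and tends to fall into repetitive loops, while weighted sampling gives a different tune on each run, which suits music generation.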

A pre-trained model is already provided in the "save" folder, so new music can be produced by running the following code.

```python
def generate(num_chars=1000, prime="X:"):
    # start from an empty graph so the variables do not clash with a model
    # built earlier in the same notebook session (e.g. by train())
    tf.reset_default_graph()
    with open(os.path.join("save", 'chars_vocab.pkl'), 'rb') as f:
        chars, vocab = cPickle.load(f)
    # train() does not save a config.pkl, so rebuild the model from the vocabulary size
    model = Model(len(chars), training=False)
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        saver = tf.train.Saver(tf.global_variables())
        ckpt = tf.train.get_checkpoint_state("save")
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
            # num_chars: number of characters produced; prime: "X:" starts a new ABC tune
            result = model.sample(sess, chars, vocab, num_chars, prime, 1)
            # append the generated ABC notation to new.txt
            with codecs.open("new.txt", "a+", encoding="utf-8") as f:
                f.write(result)
        else:
            print("No checkpoint found in the 'save' folder; run train() first.")


if __name__ == '__main__':
    generate()
```

# Results

The following are some of the tunes in ABC notation produced when generate.py is run:

```
X: 13
T:Radmy Ball
% Nottingham Music Database
P:AAB
S:Chris Dewhurst 19
M:4/4
L:1/8
R:Hornpipe
K:D
P:A
cf|"D"f2ed f2f2|"Em"g2f2 B2^d2|"Em"efgg "D"fede|"G"dcBA BdcB|"gbe/2^f/2|"Em"edc|"Em"e2f|
"Am"e3/2d/2c/2B/2|"Am"A3/2B/2A|"D7"A3/2F/2A|"G"B3/2c/2d|"Em"g2e|"F"f2a|"Bm"f3|"C/e"ed2 "Am"f2e|"D7"d2c|
"G"dBG|"Am"F2A "E7"FED|"D7"A2G "Am"G2G|"G"F2G "D7/a"AFG|
"C"c2e "F" a3|"Bb"dcB "Dm"A2F|"G"G2D G2B|"Am"e3 "G7"cBA|"F#m"B4 ||
```

```
X: 1
T:Dirne's Reel
% Nottingham Music Database
S:Trad
M:3/4
L:1/4
K:Am
P:A
g|"Am"f/2e/2d/2c/2d/2c/2|"G"BB/2G/2F/2B/2|"Em"B/2A/2G/2E/2F/2G/2|"Am"A/2B/2A/2F/2G/2E/2:|
"D"D/2E/2D/2D/2D/2E/2 d/2c/2A/2B/2A/2F/2||/2D/2D/2F/2A/2D/2 GD/2D/2|\
"G"E/2D
```

```
X: 12
T:The Welm Brband P1y
% Nottingham Music Database
S:Chris Dewhurst 1955, via Phil Rowe
M:6/8
K:Gm
D|"G"BGB GGB|"C"c2A G3|"C"(3cBcA G2A|"Cm"dcB ecG|"D7"F2D cFD|
"G"DGB dBG|"G"GFG "E7"BAG|"Am"Ace "Dm"fed|"Em"edB "G"G\
:|
K:Am
P:/8C/8C/8Ep8g/8f/8"A7"d/2(3e d/2+ec cAA|"Em"e3 g3:|
P:B
(3G/2G/2B/2|"A"A2A "C"[c3g3|"D"F2A B3-|2c2 B2A|A2B A2B|"A/c"c2A "E7"^G2A|
"Am"cBA c3|"G"dBG "3"G3:|
```

```
X: 67
T:The The Fluenssamle Sireby
% Nottingham Music Database
S:Mick Peat
M:4/4
L:1/4
K:D
P:A
:|:A|"D"df dg|"A""f#"fd dB|"Em"d2 fe|"D"d2 -d2|\
"A7"c/2B/2=c fe|"F#7"Ag2e|"D"f2 f3/2f/2|
"Em"ee "E7"ge|"D"a2 fA|"D"d2 d2|"C"eg gc|A3g|"G"d4|"Em"g2 -e/2d/2c|
"G"BG "D"FE|"D"F2 FG|"D"F3/2G/2 Ad|"Bm"de dc|"G"B/2dB/2 Bc|"C"dc cB-|"C"eG2G|B2c|B4|"Am"c3/2d/2 ec|\
"G"G4-|
"G"B3/2c/2 -d2|"C"=ee c3/2a/2|"G"B/2^A/2B/2g/2 gf/2d/2|\
"D7"d2 c2| "G"G2-|D/2D/2E/2d/2 =B3/2d/2|
"G"dd d||
K:G
"G7"b2 "A"a2|"G"g4|"G"a/2g/2f/2e/2 de|"D7/b"d2 "G"AG|"C"E/2D/2E "G"Dd|
```

Therefore, by training a text-generating LSTM RNN on an ABC notation dataset, we have been able to produce valid ABC notation, although, as the samples above show, some of the output still contains malformed fragments.