# Music Generation Using Deep Learning

![sheet%20music.jpg](attachment:sheet%20music.jpg)

## Can machines even generate music?

## Real World Problem
- The objective is to explore how deep learning can be applied to music composition.
- The case study focuses on generating music automatically using Recurrent Neural Networks (RNNs).
- You do not need to be a music expert to do this: even a non-expert like me can generate creative music of decent quality with an RNN.
- Composing music has long been seen as a uniquely human activity, but deep learning has made notable progress in this area.

## Objective
- Build a model that takes existing music data as input, learns its patterns, and generates "new" music.
![1_VpTpLc-JYqVOA159c5PiVw.png](attachment:1_VpTpLc-JYqVOA159c5PiVw.png)
- The generated music need not be of professional quality, as long as it is melodious and pleasant to hear.
- The model must not simply copy from the training data; it has to learn the patterns of existing music that humans enjoy.

## Now, what is music?
- Basically, music is a sequence of musical components/events.
- Input: a sequence of musical events/notes
- Output: a new sequence of musical events/notes
- In this case study, I limit myself to single-instrument music, but the approach can be extended to multiple instruments.
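Framed this way, generation is next-event prediction: the model sees the events so far and predicts the one that comes next. The sketch below shows how a single sequence yields many (context, next event) training pairs. It is an illustration under assumptions: each character of a made-up ABC fragment is treated as one event, and `next_step_pairs` is a hypothetical helper, not part of the original project.

```python
# A made-up fragment of ABC-style note text (illustrative only).
melody = "GAG GBd|"

def next_step_pairs(sequence, context=3):
    """Hypothetical helper: split a sequence into (context, next element) pairs."""
    pairs = []
    for i in range(len(sequence) - context):
        # The `context` elements seen so far, and the element that follows them.
        pairs.append((sequence[i:i + context], sequence[i + context]))
    return pairs

for ctx, nxt in next_step_pairs(melody):
    print(repr(ctx), "->", repr(nxt))
```

The model trained later in this notebook uses the same idea, except that every position in a fixed-length window predicts its own next character.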

## Music Representation
- Sheet music: a visual representation that can be used for both single-instrument and multi-instrument music.
- ABC notation: a popular text-based notation.
- MIDI: a popular format for storing musical events.
- MP3: an actual (compressed) audio recording.

In this case study, we focus on ABC notation because it is the simplest: a tune is written entirely with ordinary text characters (letters, digits and a few symbols such as `|` and `:`).
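Because ABC notation is plain text, a tune can be handled as an ordinary Python string, and the vocabulary of a character-level model is simply the set of distinct characters it contains. The tune below is a short, hypothetical example written for illustration; it is not taken from the Nottingham dataset.

```python
# A short, hypothetical jig in ABC notation (illustrative, not from the dataset).
tune = """X:1
T:Example Jig
M:6/8
L:1/8
K:G
D|GAG GBd|ege dBG|GAG GBd|cAF G2:|
"""

# The vocabulary of a character-level model is the sorted set of distinct characters.
vocab = sorted(set(tune))
# Each character gets an integer index, in the same spirit as char_to_idx later on.
char_to_idx = {ch: i for i, ch in enumerate(vocab)}
encoded = [char_to_idx[ch] for ch in tune]

print(len(tune), "characters,", len(vocab), "distinct")
print(encoded[:10])
```

Decoding is just the reverse lookup, so no information is lost by working with integer indices.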


## Obtaining the Data
Source: http://abc.sourceforge.net/NMD
- The page hosts the "ABC version of the Nottingham Music Database", which contains over 1000 folk tunes stored in ABC text format.
- Training on all 1000+ tunes takes a long time, so I use the jigs subset, which contains about 340 tunes.
- The jigs file is a single text file containing multiple tunes.
- Copy its contents into a text file named input.txt, which is used as the training input.
- Each tune has a metadata (header) section and a music section.
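Training in this notebook treats input.txt as one long stream of characters, so the file never has to be split. For inspecting the data, however, it can help to make the tune and header structure visible. The sketch below assumes that every tune starts with an `X:` (reference number) line and that the header ends with the `K:` (key) field, as in standard ABC; `split_tunes` and `split_header` are hypothetical helpers, and the two tunes are made up.

```python
# Hypothetical two-tune excerpt in the style of the jigs file (illustrative only).
abc_text = """X:1
T:First Example
M:6/8
K:G
GAG GBd|ege dBG|

X:2
T:Second Example
M:6/8
K:D
FAA dAA|BAF EFD|
"""

def split_tunes(text):
    """Split a multi-tune ABC file, assuming each tune starts with an 'X:' line."""
    tunes, current = [], []
    for line in text.splitlines():
        if line.startswith("X:") and current:
            tunes.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        tunes.append("\n".join(current).strip())
    return tunes

def split_header(tune):
    """Return (header fields as a dict, music lines as one string)."""
    header, music, in_header = {}, [], True
    for line in tune.splitlines():
        if in_header and len(line) > 1 and line[1] == ":":
            header[line[0]] = line[2:].strip()
            if line[0] == "K":  # in ABC, the K: (key) field ends the header
                in_header = False
        elif line.strip():
            music.append(line)
    return header, "\n".join(music)

for t in split_tunes(abc_text):
    header, music = split_header(t)
    print(header.get("T"), "| key:", header.get("K"), "| music:", music)
```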

In [7]:
import os

from keras.models import Sequential, load_model
from keras.layers import LSTM, Dropout, TimeDistributed, Dense, Activation, Embedding


MODEL_DIR = r'D:\Courses\0\1 Programming\Music-Generation-using-deep-learning-main\modelx'

def save_weights(epoch, model):
    # Create the checkpoint directory if needed, then save weights.<epoch>.h5 into it.
    os.makedirs(MODEL_DIR, exist_ok=True)
    model.save_weights(os.path.join(MODEL_DIR, 'weights.{}.h5'.format(epoch)))

def load_weights(epoch, model):
    model.load_weights(os.path.join(MODEL_DIR, 'weights.{}.h5'.format(epoch)))


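def build_model(batch_size, seq_len, vocab_size):
    model = Sequential()
    # Map each character index to a learned 512-dimensional vector.
    # Stateful LSTMs need a fixed batch size, hence batch_input_shape.
    model.add(Embedding(vocab_size, 512, batch_input_shape=(batch_size, seq_len)))
    for i in range(3):
        # Three stacked LSTM layers with 256 units each. return_sequences=True emits an
        # output at every time step; stateful=True carries the state across batches.
        model.add(LSTM(256, return_sequences=True, stateful=True))
        # Randomly drop 20% of the activations during training to reduce overfitting.
        model.add(Dropout(0.2))

    # Apply the same Dense layer at every time step, then a softmax over the vocabulary.
    model.add(TimeDistributed(Dense(vocab_size)))
    model.add(Activation('softmax'))
    return model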

if __name__ == '__main__':
    model = build_model(16, 64, 86)
    model.summary()


Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (16, 64, 512)             44032     
                                                                 
 lstm_3 (LSTM)               (16, 64, 256)             787456    
                                                                 
 dropout_3 (Dropout)         (16, 64, 256)             0         
                                                                 
 lstm_4 (LSTM)               (16, 64, 256)             525312    
                                                                 
 dropout_4 (Dropout)         (16, 64, 256)             0         
                                                                 
 lstm_5 (LSTM)               (16, 64, 256)             525312    
                                                                 
 dropout_5 (Dropout)         (16, 64, 256)             0
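The summary output above is cut off after the last Dropout row. The parameter counts can be reproduced by hand, which also confirms the total. A Keras LSTM layer has four gates, and each gate has an input weight matrix, a recurrent weight matrix and a bias, giving 4 * (units * (input_dim + units) + units) parameters; Dropout and Activation layers have none. The helper functions below are hypothetical, written only for this check.

```python
def embedding_params(vocab_size, embed_dim):
    # One trainable vector of length embed_dim per vocabulary entry.
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias; TimeDistributed shares them across all time steps.
    return input_dim * units + units

VOCAB, EMBED, UNITS = 86, 512, 256
counts = [
    embedding_params(VOCAB, EMBED),  # embedding
    lstm_params(EMBED, UNITS),       # first LSTM reads 512-d embeddings
    lstm_params(UNITS, UNITS),       # second LSTM
    lstm_params(UNITS, UNITS),       # third LSTM
    dense_params(UNITS, VOCAB),      # TimeDistributed(Dense(86))
]
print(counts, sum(counts))
```

This reproduces the rows shown in the summary (44,032, 787,456 and 525,312), gives 22,102 parameters for the `TimeDistributed(Dense(86))` layer, and 1,904,214 parameters in total.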

In [4]:
import os
import json
import pandas as pd
import numpy as np



DATA_DIR = r'D:\Courses\0\1 Programming\Music-Generation-using-deep-learning-main\data/'
LOG_DIR = r'D:\Courses\0\1 Programming\Music-Generation-using-deep-learning-main\logs\log.csv'

BATCH_SIZE = 16 #batch_size
SEQ_LENGTH = 64 #sequence length

def read_batches(T, vocab_size):
    length = T.shape[0]                 # e.g. 129,665 characters
    batch_chars = length // BATCH_SIZE  # e.g. 8,104 characters per batch row

    # Each row of a batch reads its own contiguous slice of the text, and consecutive
    # batches continue where the previous ones stopped (required by the stateful LSTMs).
    for start in range(0, batch_chars - SEQ_LENGTH, SEQ_LENGTH):  # e.g. range(0, 8040, 64)
        X = np.zeros((BATCH_SIZE, SEQ_LENGTH), dtype=np.int32)   # 16 x 64 character indices
        Y = np.zeros((BATCH_SIZE, SEQ_LENGTH, vocab_size))       # 16 x 64 x 86 one-hot targets
        for batch_idx in range(BATCH_SIZE):
            for i in range(SEQ_LENGTH):
                pos = batch_chars * batch_idx + start + i
                X[batch_idx, i] = T[pos]
                Y[batch_idx, i, T[pos + 1]] = 1  # target = the next character
        yield X, Y

def train(text, epochs=100, save_freq=10):
    # text is the full contents of input.txt; weights are checkpointed every save_freq epochs

    # character-to-index and index-to-character mappings
    char_to_idx = {ch: i for (i, ch) in enumerate(sorted(set(text)))}
    print("Number of unique characters: " + str(len(char_to_idx)))  # 86 for the jigs file

    with open(os.path.join(DATA_DIR, 'char_to_idx.json'), 'w') as f:
        json.dump(char_to_idx, f)

    idx_to_char = {i: ch for (ch, i) in char_to_idx.items()}
    vocab_size = len(char_to_idx)

    # model architecture (build_model and save_weights come from the model cell,
    # which must be run before this one)
    model = build_model(BATCH_SIZE, SEQ_LENGTH, vocab_size)
    model.summary()
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    # training data: convert the complete text into numerical indices
    T = np.asarray([char_to_idx[c] for c in text], dtype=np.int32)
    print("Length of text: " + str(T.size))  # 129,665

    steps_per_epoch = (len(text) // BATCH_SIZE - 1) // SEQ_LENGTH  # 126 batches per epoch
    print("Batches per epoch: " + str(steps_per_epoch))

    epoch_number, loss_num, acc_num = [], [], []

    for epoch in range(epochs):
        print('\nEpoch {}/{}'.format(epoch + 1, epochs))
        # Start each pass over the text with fresh LSTM states.
        model.reset_states()
        losses, accs = [], []

        for i, (X, Y) in enumerate(read_batches(T, vocab_size)):
            loss, acc = model.train_on_batch(X, Y)
            print('Batch {}: loss = {}, acc = {}'.format(i + 1, loss, acc))
            losses.append(loss)
            accs.append(acc)

        # one log row per epoch: mean loss and accuracy over that epoch's batches
        epoch_number.append(epoch + 1)
        loss_num.append(np.mean(losses))
        acc_num.append(np.mean(accs))

        # checkpoint every save_freq epochs, and always after the final epoch
        if (epoch + 1) % save_freq == 0 or epoch + 1 == epochs:
            save_weights(epoch + 1, model)
            print('Saved checkpoint to', 'weights.{}.h5'.format(epoch + 1))

    # record the loss and accuracy of each epoch in a CSV log
    log_frame = pd.DataFrame({"Epoch": epoch_number, "Loss": loss_num, "Accuracy": acc_num})
    log_frame.to_csv(LOG_DIR, index=False)
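To see what `read_batches` produces, it helps to run the same slicing logic on a tiny array. The function below is a hypothetical variant that takes the batch size and sequence length as arguments instead of reading the `BATCH_SIZE` and `SEQ_LENGTH` globals, and the "text" is just the indices 0 to 19 so that positions are easy to read.

```python
import numpy as np

def read_batches_small(T, vocab_size, batch_size, seq_length):
    """Hypothetical variant of read_batches with the sizes passed in as arguments."""
    batch_chars = len(T) // batch_size  # length of each row's contiguous stream
    for start in range(0, batch_chars - seq_length, seq_length):
        X = np.zeros((batch_size, seq_length), dtype=np.int32)
        Y = np.zeros((batch_size, seq_length, vocab_size))
        for b in range(batch_size):
            for i in range(seq_length):
                pos = batch_chars * b + start + i
                X[b, i] = T[pos]
                Y[b, i, T[pos + 1]] = 1  # one-hot target: the next element
        yield X, Y

T = np.arange(20)  # toy "text": each element is its own index, vocab_size = 20
batches = list(read_batches_small(T, vocab_size=20, batch_size=2, seq_length=3))
for X, Y in batches:
    # Y.argmax recovers the target index from each one-hot vector.
    print(X.tolist(), Y.argmax(axis=-1).tolist())
```

Row 0 of the three batches covers positions 0 to 2, 3 to 5 and 6 to 8, while row 1 covers 10 to 12, 13 to 15 and 16 to 18. Each row continues exactly where it stopped in the previous batch, which is what allows a stateful LSTM to carry its hidden state from one batch to the next. Each target is simply the input shifted one position to the right.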



In [5]:
with open(os.path.join(DATA_DIR, 'input.txt'), mode='r') as file:
    data = file.read()

if __name__ == "__main__":
    train(data, epochs=1, save_freq=10)  # use epochs=100 for the full training run

Number of unique characters: 87


NameError: name 'build_model' is not defined

In [None]:
log = pd.read_csv(LOG_DIR)
log

- Batch Size = 16
- Sequence length = 64
- Total length of characters in input.txt file = 129,665
- Number of unique characters = 86
- `char_to_idx` maps every character to an integer index: the character (`ch`) is the key and its index is the value.
    - It is also saved as `char_to_idx.json`, which looks like this:
    {"\n": 0, " ": 1, "!": 2, "\"": 3, "#": 4, "%": 5, "&": 6, "'": 7, "(": 8, ")": 9, "+": 10, ",": 11, "-": 12, ".": 13, "/": 14, "0": 15, "1": 16, "2": 17, "3": 18, "4": 19, "5": 20, "6": 21, "7": 22, "8": 23, "9": 24, ":": 25, "=": 26, "?": 27, "A": 28, "B": 29, "C": 30, "D": 31, "E": 32, "F": 33, "G": 34, "H": 35, "I": 36, "J": 37, "K": 38, "L": 39, "M": 40, "N": 41, "O": 42, "P": 43, "Q": 44, "R": 45, "S": 46, "T": 47, "U": 48, "V": 49, "W": 50, "X": 51, "Y": 52, "[": 53, "\\": 54, "]": 55, "^": 56, "_": 57, "a": 58, "b": 59, "c": 60, "d": 61, "e": 62, "f": 63, "g": 64, "h": 65, "i": 66, "j": 67, "k": 68, "l": 69, "m": 70, "n": 71, "o": 72, "p": 73, "q": 74, "r": 75, "s": 76, "t": 77, "u": 78, "v": 79, "w": 80, "x": 81, "y": 82, "z": 83, "|": 84, "~": 85}
    - The indices run from 0 to 85 because the vocabulary has 86 unique characters.
- A new batch is produced on every iteration by the generator function `read_batches`.
- X is a matrix of shape (BATCH_SIZE, SEQ_LENGTH) = (16, 64) holding character indices.
- Y is a 3D tensor of shape (BATCH_SIZE, SEQ_LENGTH, vocab_size) = (16, 64, 86); the vocabulary size appears because each target character is one-hot encoded.
- After the embedding layer, the shape is (BATCH_SIZE, SEQ_LENGTH, embedding_dim) = (16, 64, 512).
- Y is one-hot encoded because the model ends in a softmax over the vocabulary, and the categorical cross-entropy loss compares that probability distribution with a one-hot target.
- The task is to predict the next character, which must be one of the 86 unique characters, so this is a multi-class classification problem. The last layer is therefore a softmax with 86 outputs, applied at every time step.
- Each batch is generated and passed to `train_on_batch`, and the categorical cross-entropy loss and accuracy are printed for every batch.
- According to the model summary, the model has 1,904,214 parameters in total.
- Because the model has so many parameters, each LSTM layer is followed by dropout with a drop rate of 0.2 (i.e. a keep probability of 0.8).
- After about 100 epochs, training accuracy reaches roughly 90% or more: the model predicts the next character correctly about nine times out of ten on the training batches.
- The weights are saved every 10 epochs (`save_freq`); they are used later to rebuild the model and generate new music.
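The one-hot targets, softmax output, loss and accuracy described in the list above can be worked through on a toy example. The sketch below uses NumPy with a hypothetical 4-character vocabulary and hand-picked logits; it mirrors what Keras computes for `categorical_crossentropy` and the `accuracy` metric, up to numerical details such as clipping.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over positions of -sum(y_true * log(y_pred)); with one-hot targets this
    # is -log(probability assigned to the correct character).
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def accuracy(y_true, y_pred):
    # Fraction of positions where the most probable character is the correct one.
    return float(np.mean(y_true.argmax(axis=-1) == y_pred.argmax(axis=-1)))

vocab_size = 4                          # toy vocabulary instead of 86 characters
targets = np.array([2, 0, 3])           # correct next-character indices at 3 positions
y_true = np.eye(vocab_size)[targets]    # one-hot encoding, shape (3, 4)
logits = np.array([[0.1, 0.2, 2.0, 0.0],
                   [1.5, 0.3, 0.1, 0.2],
                   [0.2, 2.2, 0.1, 0.4]])
y_pred = softmax(logits)
print(categorical_crossentropy(y_true, y_pred), accuracy(y_true, y_pred))
```

Two of the three positions are predicted correctly, so the accuracy is 2/3. With one-hot targets, the loss at each position reduces to the negative log of the probability given to the correct character, which is why confident wrong predictions are penalized heavily. The accuracy printed by `train_on_batch` is measured on the training batches themselves.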