# **Soni - do**
## **Generating Music with Machine Learning**


#### Author: Sonia Cobo
#### Date: July 2021

### Though this project doesn't have a hypothesis per se, it was done to show how far AI has advanced: it is now able to generate music, something that has long been associated with emotions and human capabilities.

In [12]:
# data augmentation - split songs and modify them to get more data

# Data

### The input to the model will be a series of notes from a MIDI file. MIDI (Musical Instrument Digital Interface) is a technical standard that describes a communications protocol, digital interface, and electrical connectors that connect a wide variety of electronic musical instruments and computers. MIDI files don't contain actual audio data and are small in size. Instead, they describe which notes are played, when they're played, and how long or loud each note should be.

### To keep the project simple, only files with one instrument were chosen. In this case the instrument is the piano and the genre is classical.
### These songs have been obtained from the following datasets: http://www.piano-midi.de/ and https://www.mfiles.co.uk/classical-midi.htm


In [13]:
# not downloaded yet: https://github.com/Skuldur/Classical-Piano-Composer/tree/master/midi_songs
# https://drive.google.com/file/d/1qnQVK17DNVkU19MgVA4Vg88zRDvwCRXw/view

### Import all libraries

In [2]:
# data manipulation
import numpy as np
import pandas as pd 

# manipulate midi files
import glob
from music21 import converter, instrument, note, chord, meter, stream, duration, corpus
import pygame

# visualization
import seaborn as sns
import matplotlib.pyplot as plt

# route files
import os
import sys

# ml model
import pickle

import tensorflow as tf
from tensorflow import keras

from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.layers import Activation
from keras.layers import BatchNormalization as BatchNorm
from keras.callbacks import ModelCheckpoint
from keras.layers import Bidirectional


pygame 2.0.1 (SDL 2.0.14, Python 3.7.4)
Hello from the pygame community. https://www.pygame.org/contribute.html


In [15]:
len(tf.config.experimental.list_physical_devices('GPU'))

0

In [16]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7525799484492965909
]


### Paths

In [3]:
# The route of this file is added to the sys path to be able to import/export functions
sep = os.sep
def route (steps):
    """
    This function appends the route of the file to the sys path
    to be able to import files from/to other folders within the EDA project folder.
    """
    route = os.getcwd()
    for i in range(steps):
        route = os.path.dirname(route)
    sys.path.append(route)
    return route

In [4]:
# paths
path = route(1) + sep + "data" + sep + "raw_data" + sep
path_1 = route(1) + sep + "data" + sep + "converted_data" + sep
path_2 = route(1) + sep + "data" + sep + "notes" + sep
path_3 = route(1) + sep + "models" + sep
path_4 = route(1) + sep + "reports" + sep

## Midi file exploration

To do: talk about frequency and the Fourier transform

In [4]:
# All information from the midi file (i.e. notes, pitch, chord, time signature, etc) is contained within the component list
def info_midi (path):
    """
    It returns all midi file information given its path

    """
    file = converter.parse(path)
    components = []
    for element in file.recurse():  
        components.append(element)
    return components

components = info_midi(path + "alb_esp1.mid")
#components

In [None]:
# Objects stored in a Stream are generally spaced in time; each stored object has an offset, usually representing how many quarter notes 
# it lies from the beginning of the Stream. For instance, in a 4/4 measure of two half notes, the first note will be at offset 0.0  
# and the second at offset 2.0.
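
To make the offset convention concrete, here is a minimal sketch that derives offsets from a list of durations expressed in quarter notes. It does not use music21, and the helper name `offsets_from_durations` is hypothetical; it only assumes that events follow one another without gaps.

```python
def offsets_from_durations(quarter_lengths):
    """Return the start offset (in quarter notes) of each consecutive event."""
    offsets = []
    position = 0.0
    for length in quarter_lengths:
        offsets.append(position)   # the event starts where the previous one ended
        position += length
    return offsets


# Two half notes (2 quarter notes each) filling one 4/4 measure
half_notes = [2.0, 2.0]
print(offsets_from_durations(half_notes))
```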

### Now that the MIDI file has been studied and its structure is known, the data will be split into three object types: notes, rests and chords.

### Note objects contain information about the pitch, octave, and offset of the note.
### Pitch refers to the frequency of the sound, or how high or low it is and is represented with the letters [A, B, C, D, E, F, G].
### Octave refers to which set of pitches you use on a piano.
### Offset refers to where the note is located in the piece.
### Rests are the silences in the piece.
### Chord objects are a set of notes that are played at the same time.
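
The relationship between pitch name, octave and frequency can be sketched in a few lines. This is an illustration only: it assumes equal temperament with A4 tuned to 440 Hz, the music21 spelling where `-` marks a flat, and non-negative octave numbers. The helper names `pitch_to_midi` and `midi_to_frequency` are hypothetical.

```python
import re

# Semitone distance of each natural pitch letter above C
LETTER_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_to_midi(name):
    """Convert a pitch string such as 'C4', 'F#3' or 'E-5' to a MIDI note number."""
    match = re.fullmatch(r"([A-G])([#-]*)(\d+)", name)
    if match is None:
        raise ValueError("unsupported pitch name: " + name)
    letter, accidentals, octave = match.groups()
    semitone = LETTER_TO_SEMITONE[letter]
    semitone += accidentals.count("#") - accidentals.count("-")
    # MIDI numbers place C4 (middle C) at 60
    return 12 * (int(octave) + 1) + semitone


def midi_to_frequency(midi_number):
    """Frequency in Hz, assuming equal temperament and A4 = 440 Hz."""
    return 440.0 * 2 ** ((midi_number - 69) / 12)


for pitch in ["C4", "A4", "E-5"]:
    number = pitch_to_midi(pitch)
    print(pitch, number, round(midi_to_frequency(number), 2))
```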


### Songs are transposed to ease predictions: songs in a major key are moved to C major and songs in a minor key to A minor

In [11]:
def convert_to_midi(path, path_1):
    """
    This function saves a copy of each MIDI file transposed to C major or A minor.

    Params: Path of the folder with the original MIDI files and path where to save the converted files.

    """
    import music21

    # half steps needed to move each tonic to C (major keys) or A (minor keys)
    majors = dict([("A-", 4),("G#", 4),("A", 3),("A#", 2),("B-", 2),("B", 1),("C", 0),("C#", -1),("D-", -1),("D", -2),("D#", -3),("E-", -3),("E", -4),("F", -5),("F#", 6),("G-", 6),("G", 5)])
    minors = dict([("G#", 1), ("A-", 1),("A", 0),("A#", -1),("B-", -1),("B", -2),("C", -3),("C#", -4),("D-", -4),("D", -5),("D#", 6),("E-", 6),("E", 5),("F", 4),("F#", 3),("G-", 3),("G", 2)])       

    for file in glob.glob(path + "*.mid"):
        score = music21.converter.parse(file)
        key = score.analyze('key')
        
        if key.mode == "major":
            halfSteps = majors[key.tonic.name]
        elif key.mode == "minor":
            halfSteps = minors[key.tonic.name]
        else:
            # skip files whose detected key is neither major nor minor
            continue
        
        newscore = score.transpose(halfSteps)

        # prefix the original file name instead of slicing the path at a fixed position
        newFileName = "C_" + os.path.basename(file)
        newscore.write("midi", path_1 + newFileName)

convert_to_midi(path, path_1)
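
The two lookup tables in `convert_to_midi` follow a simple rule: move the tonic to C (major) or A (minor) by the smallest number of half steps, using a range of -5 to 6. The sketch below reproduces that rule arithmetically as a cross-check. The helper names are hypothetical, and the code assumes tonic names use `#` for sharps and `-` for flats, as music21 does.

```python
# Pitch class (semitones above C) of each natural letter
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_class(tonic):
    """Pitch class (0-11) of a tonic name such as 'F#' or 'B-'."""
    value = PITCH_CLASS[tonic[0]]
    for accidental in tonic[1:]:
        value += 1 if accidental == "#" else -1   # anything else is read as a flat
    return value % 12


def half_steps_to_target(tonic, mode):
    """Smallest shift in the range -5..6 that moves the tonic to C (major) or A (minor)."""
    target = 0 if mode == "major" else 9
    shift = (target - pitch_class(tonic)) % 12
    return shift - 12 if shift > 6 else shift


print(half_steps_to_target("A", "major"))   # same value as majors["A"]
print(half_steps_to_target("D", "minor"))   # same value as minors["D"]
```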

## Data preparation

### Relevant information from midi file is encoded and saved into an array.

### Every note object is appended using the string notation of its pitch, since the most significant parts of the note can be recreated from that string. Every chord is appended by encoding the ids of its notes into a single string, with each note separated by a dot. Rests are stored as the string "NULL".
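
As a small illustration of the chord encoding, the sketch below joins a list of integers into a dot-separated token and splits it back. The helper names `encode_chord` and `decode_chord` are hypothetical. Note that the token only records the integers it is given, so any octave information not contained in them is not recoverable from the string.

```python
def encode_chord(normal_order):
    """Join the integers of a chord into one token, e.g. [0, 4, 7] becomes '0.4.7'."""
    return ".".join(str(n) for n in normal_order)


def decode_chord(token):
    """Split a chord token back into its list of integers."""
    return [int(part) for part in token.split(".")]


token = encode_chord([0, 4, 7])
print(token)
print(decode_chord(token))
```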

In [5]:
# Each midi file contains notes and chords. These two properties will be the input and output of the LSTM network so 
# they need to be taken out from all midi files. 

def get_notes_per_song(path, filename):
    """
    This function extracts all the notes, rests and chords from one midi file
    and saves them as a list in the notes folder (path_2).

    Param: Path of the midi file, filename (str)
    """
    components = info_midi(path + filename)
    note_list = []
    
    for element in components:
        # note pitches are extracted
        if isinstance(element, note.Note):
            note_list.append(str(element.pitch))
        # chords are extracted
        elif isinstance(element, chord.Chord):
            note_list.append(".".join(str(n) for n in element.normalOrder))    
        # rests are extracted
        elif isinstance(element, note.Rest):
            note_list.append("NULL")    #further transformation needs this value as str rather than np.nan

    with open(path_2 + "notes", "wb") as filepath:
        pickle.dump(note_list, filepath)
    
    return note_list

In [6]:
note_list = get_notes_per_song(path_1, "C_alb_esp1.mid")

In [7]:
len(note_list)

687

In [8]:
# Load notes and chords previously separated
def load_notes (path, filename):
    """
    Load the note list containing pitches, rests and chords.
    
    Param: Path of the saved note list, and its name as string
    """
    with open(path + filename, "rb") as f:
        loaded_notes = pickle.load(f)
        return loaded_notes

note_list = load_notes(path_2, "notes")
#note_list

### The model will first be trained with a small proportion of the songs to save time. Once the model is tuned properly, all songs will be used to improve its training.

### Now that all notes, rests and chords are in a list, they will be transformed from categorical data to numerical data: each distinct value is mapped to an integer, which is then one-hot encoded. It is necessary to create input sequences for the network and their respective outputs. The output for each input sequence is the note, rest or chord that immediately follows that sequence in the list of notes.
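
The windowing idea can be shown on a toy list before applying it to a whole song. This is a minimal sketch using plain Python lists; the function name `sliding_windows` and the six-token example are made up for illustration.

```python
def sliding_windows(tokens, sequence_length, step=1):
    """Return (inputs, targets): each input is a window, each target the token that follows it."""
    inputs, targets = [], []
    for i in range(0, len(tokens) - sequence_length, step):
        inputs.append(tokens[i:i + sequence_length])
        targets.append(tokens[i + sequence_length])
    return inputs, targets


# Toy sequence mixing notes, a rest and a chord token
tokens = ["C4", "E4", "G4", "NULL", "0.4.7", "C5"]

# Map every distinct token to an integer, in sorted order
vocabulary = sorted(set(tokens))
token_to_int = {token: index for index, token in enumerate(vocabulary)}

inputs, targets = sliding_windows(tokens, sequence_length=3)
encoded_inputs = [[token_to_int[t] for t in window] for window in inputs]
encoded_targets = [token_to_int[t] for t in targets]

for window, target in zip(inputs, targets):
    print(window, "->", target)
```

With a window of 3 tokens and a step of 1, six tokens yield three training pairs.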

In [9]:
def prepare_sequences(notes, sequence_length, step):
    """ 
    Prepare the sequences used by the neural network 

    """
    
    # get all pitch names
    #pitchnames = sorted(set(item for item in notes)) 
    pitchnames = sorted(set(notes))
    print('Total unique notes:', len(pitchnames))

    # create a dictionary to convert pitches (strings) to integers
    note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
    # (rests are included)   

    network_input = []
    network_output = []

    #sequence_in = []
    #sequence_out = []
    # create input sequences and the corresponding outputs
    for i in range(0, len(notes) - sequence_length, step):    
        network_input.append(notes[i:i + sequence_length])
        network_output.append(notes[i + sequence_length])
        # exchange their values for their integer-code
        #network_input.append([note_to_int[elem] for elem in sequence_in])
        #network_output.append(note_to_int[sequence_out])

    x = np.zeros((len(network_input), sequence_length, len(pitchnames)))
    y = np.zeros((len(network_input), len(pitchnames)))
    for i, sequence in enumerate(network_input):
        for j, note in enumerate(sequence):
            x[i, j, note_to_int[note]] = 1
        y[i, note_to_int[network_output[i]]] = 1

    #n_patterns = len(network_input)

    # reshape the input into a format compatible with LSTM layers
    #network_input = np.reshape(network_input, (n_patterns, sequence_length, 1)) 
    # normalize input
    #network_input = network_input / float(len(set(notes)))  

    #network_output = np_utils.to_categorical(network_output) # used to convert array of labeled data to one-hot vector

    return x, y

### The length of each sequence will be 100 notes/chords for now. This means that to predict the next note in the sequence, the network has the previous 100 notes to help make the prediction.

In [10]:
x, y = prepare_sequences(notes=note_list, sequence_length=100, step=3)  # length and step can vary


Total unique notes: 50


In [11]:
print(x.shape)
print(y.shape)

(196, 100, 50)
(196, 50)
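
The shapes above can be checked by hand from the numbers already shown in this notebook: 687 elements in the note list, a sequence length of 100, a step of 3 and 50 unique values. The sketch below is only that arithmetic; `count_windows` is a hypothetical helper name.

```python
import math


def count_windows(n_tokens, sequence_length, step):
    """Number of start positions visited by range(0, n_tokens - sequence_length, step)."""
    return max(0, math.ceil((n_tokens - sequence_length) / step))


n_windows = count_windows(n_tokens=687, sequence_length=100, step=3)
x_shape = (n_windows, 100, 50)   # (sequences, time steps, one-hot width)
y_shape = (n_windows, 50)        # (sequences, one-hot width)
print(x_shape, y_shape)
```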


# Creation of the model

There are four different types of layers:

LSTM layers are recurrent neural network layers that take a sequence as input and can return either sequences (return_sequences=True) or a matrix.

Dropout layers are a regularisation technique that sets a fraction of the input units to 0 at each update during training to prevent overfitting. The fraction is determined by the parameter passed to the layer.

Dense layers, or fully connected layers, connect each input node to each output node.

Activation layers determine which activation function the network uses to calculate the output of a node.
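
The model defined in the next cell uses a softmax activation together with a categorical cross-entropy loss. As a rough illustration of what those two pieces compute, here is a dependency-free sketch; it is not how Keras implements them internally, and the example scores are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)                       # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, probabilities) if t > 0)


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(round(categorical_crossentropy([1, 0, 0], probs), 3))
```

The loss shrinks as the probability given to the correct note grows, which is what training pushes towards.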

In [12]:
def create_network(sequence_length=100):
    # number of distinct notes, rests and chords (50 for the current note list)
    n_vocab = len(set(note_list))

    model = Sequential()
    model.add(LSTM(512, input_shape=(sequence_length, n_vocab)))
    model.add(Dense(n_vocab))
    model.add(Activation("softmax"))
    model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

    return model
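
To get a feel for the size of this network, the number of trainable weights can be estimated by hand. The sketch below assumes the standard LSTM formulation with bias terms enabled (the Keras default) and the shapes used here: 50 input features, 512 LSTM units and 50 output classes. The helper names are hypothetical, and `model.summary()` remains the authoritative check.

```python
def lstm_parameters(input_dim, units):
    """Weights of a standard LSTM layer: 4 gates, each with input, recurrent and bias terms."""
    return 4 * (units * (input_dim + units) + units)


def dense_parameters(input_dim, units):
    """Weights of a fully connected layer: one weight per input-output pair plus one bias per output."""
    return input_dim * units + units


lstm_count = lstm_parameters(input_dim=50, units=512)
dense_count = dense_parameters(input_dim=512, units=50)
total_count = lstm_count + dense_count
print(lstm_count, dense_count, total_count)
```

With only 196 training sequences, a model with more than a million weights can memorise the data easily, which is one reason to train on more songs later.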

In [13]:
model = create_network()

In [14]:
model.fit(x, y, epochs=10)#, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x296b86011c8>

In [15]:
# save the model
model.save(path_3 + "model_5.h5")

In [16]:
# load the model 
model_4 = tf.keras.models.load_model(path_3 + "model_5.h5")


In [17]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
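
The effect of `temperature` in `sample` is easier to see without the random draw. The sketch below applies the same log, divide and renormalise steps to a made-up distribution; `apply_temperature` is a hypothetical helper and it assumes every probability is strictly positive.

```python
import math


def apply_temperature(probabilities, temperature):
    """Rescale a probability distribution: low temperature sharpens it, high temperature flattens it."""
    logits = [math.log(p) / temperature for p in probabilities]
    peak = max(logits)                        # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.7, 0.2, 0.1]
sharp = apply_temperature(probs, 0.5)   # favours the most likely note even more
same = apply_temperature(probs, 1.0)    # leaves the distribution unchanged
flat = apply_temperature(probs, 2.0)    # gives unlikely notes more of a chance

for label, dist in [("0.5", sharp), ("1.0", same), ("2.0", flat)]:
    print(label, [round(p, 3) for p in dist])
```

Lower temperatures tend to produce more repetitive, predictable output, while higher ones produce more varied and more error-prone output.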

In [79]:
def generate_notes(model, notes, sequence_length=100, n_notes=200, temperature=1.0):
    """ 
    Generate notes from the neural network based on a sequence of notes 
    """
    pitchnames = sorted(set(notes))
    note_to_int = dict((elem, number) for number, elem in enumerate(pitchnames)) 
    int_to_note = dict((number, elem) for number, elem in enumerate(pitchnames))

    # pick a random sequence from the note list as a starting point for the prediction
    start = np.random.randint(0, len(notes) - sequence_length)
    pattern = list(notes[start: start + sequence_length])

    prediction_output = []
    patterns = []
    # generate n_notes notes (200 by default)
    for note_index in range(n_notes):
        # one-hot encode the current window
        prediction_input = np.zeros((1, sequence_length, len(pitchnames)))
        for j, elem in enumerate(pattern):
            prediction_input[0, j, note_to_int[elem]] = 1.0
        # predict returns one row per input sequence; only one sequence is passed
        preds = model.predict(prediction_input, verbose=0)[0]
        next_index = sample(preds, temperature=temperature)
        next_note = int_to_note[next_index]
        prediction_output.append(next_note)
        patterns.append(next_index)

        # slide the window: drop the oldest note and append the predicted one
        pattern = pattern[1:] + [next_note]

    return prediction_output, patterns

In [80]:
prediction_output, patterns = generate_notes(model, note_list, temperature=1)
print(prediction_output)


# Output

In [56]:
def create_midi(prediction_output, patterns, path, filename="test_output_9.mid"):
    """ convert the output from the prediction to notes and create a midi file from the notes"""
    # patterns (the predicted integer codes) is kept in the signature but is not needed here
    
    offset = 0
    output_notes = []

    # create note, rest and chord objects based on the values generated by the model
    for pattern in prediction_output:
        # pattern is a chord
        if ('.' in pattern) or pattern.isdigit():
            notes_in_chord = pattern.split('.')
            notes = []
            for current_note in notes_in_chord:
                new_note = note.Note(int(current_note))
                new_note.storedInstrument = instrument.Piano()
                notes.append(new_note)
            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            output_notes.append(new_chord)
        # pattern is a rest
        elif pattern == "NULL":
            new_rest = note.Rest()
            new_rest.offset = offset
            output_notes.append(new_rest)
        # pattern is a note
        else:
            new_note = note.Note(pattern)   
            new_note.offset = offset
            new_note.storedInstrument = instrument.Piano()
            output_notes.append(new_note)

        # increase offset each iteration so that notes do not stack
        offset += 0.5

    midi_stream = stream.Stream(output_notes)

    midi_stream.write("midi", fp=path + filename)   # first output 01/07/2021

    return midi_stream
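
The branching in `create_midi` relies on the string encoding used earlier, and the fixed step of 0.5 quarter notes determines how long the generated piece lasts. The sketch below restates both points without music21. The helper names are hypothetical, and the tempo of 120 beats per minute is an assumption made only for the estimate, since the notebook does not set a tempo.

```python
def classify_token(token):
    """Return 'chord', 'rest' or 'note' using the same rules as the string encoding."""
    if token == "NULL":
        return "rest"
    if "." in token or token.isdigit():
        return "chord"
    return "note"


def estimated_seconds(n_events, step_quarter_length=0.5, tempo_bpm=120):
    """Rough duration of n_events spaced step_quarter_length apart (tempo is an assumption)."""
    return n_events * step_quarter_length * 60.0 / tempo_bpm


for token in ["0.4.7", "7", "NULL", "E-4"]:
    print(token, classify_token(token))

print(estimated_seconds(200))
```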

In [57]:
# store the result under a different name so the create_midi function is not overwritten
midi_stream = create_midi(prediction_output, patterns, path_4)


In [None]:
def play_music(music_file):
    """
    Play music given a midi file path
    """
    # the mixer must be initialised before a file can be loaded
    freq = 44100    # audio CD quality
    bitsize = -16   # signed 16 bit (a negative value means signed samples)
    channels = 2    # 1 is mono, 2 is stereo
    buffer = 1024   # number of samples
    pygame.mixer.init(freq, bitsize, channels, buffer)

    try:
        clock = pygame.time.Clock() 
        pygame.mixer.music.load(music_file)
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():
            # wait until playback has finished
            clock.tick(10)

    except KeyboardInterrupt:
        # interrupting the kernel stops the piece
        pygame.mixer.music.fadeout(1000)
        pygame.mixer.music.stop()

In [None]:
play_music(path_4 + "test_output_9.mid")

In [None]:
# the code below has not been used yet to produce any MIDI file

In [76]:
def prepare_sequences_out(notes, pitchnames, n_vocab):
    """ Prepare the sequences used by the Neural Network """
    # map between notes and integers and back
    note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

    sequence_length = 100
    network_input = []
    output = []
    for i in range(0, len(notes) - sequence_length, 1):
        sequence_in = notes[i:i + sequence_length]
        sequence_out = notes[i + sequence_length]
        network_input.append([note_to_int[char] for char in sequence_in])
        output.append(note_to_int[sequence_out])

    n_patterns = len(network_input)

    # reshape the input into a format compatible with LSTM layers
    normalized_input = np.reshape(network_input, (n_patterns, sequence_length, 1))
    # normalize input
    normalized_input = normalized_input / float(n_vocab)

    return (network_input, normalized_input)

In [89]:
# not used yet - the logic was moved out of this function into the cells above
def generate():
    """ Generate a piano midi file """

    # Load the saved note list and the trained model
    notes = load_notes(path_2, "notes")
    model = tf.keras.models.load_model(path_3 + "model_5.h5")

    # generate_notes builds its own one-hot input from the note list;
    # only the first two returned values are needed here
    result = generate_notes(model, notes)
    prediction_output, patterns = result[0], result[1]
    return create_midi(prediction_output, patterns, path_4)

In [None]:
# GAN
# neural network that knows the sounds
# network that predicts, also recurrent

In [None]:
# Read h5 format files 
import h5py
filename = "test.h5"

with h5py.File(filename, "r") as f:
    # List all groups
    print("Keys: %s" % f.keys())
    a_group_key = list(f.keys())[0]

    # Get the data
    data = list(f[a_group_key])