# Model Insights
Now that we have shown that this model is capable of generating its own music, we want to investigate whether it is also capable of learning some 'music theory'.

For that, we will have to re-create only the model's 'Embeddings' portion, after which we will use cosine similarity to investigate which notes/chords and durations are strongly correlated to each other.
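
As a reminder of what cosine similarity measures, here is a minimal sketch in plain Python. The helper name `cosine_sim` and the 2-D toy vectors are assumptions made for illustration; the notebook itself uses `sklearn.metrics.pairwise.cosine_similarity` on the 512-dimensional embedding vectors.

```python
import math


def cosine_sim(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 2-D "embeddings", for illustration only.
reference = [1.0, 0.0]
same_direction = [2.0, 0.0]   # same direction, different length
orthogonal = [0.0, 3.0]       # at a right angle to the reference
opposite = [-1.0, 0.0]        # pointing the other way

print(cosine_sim(reference, same_direction))  # 1.0
print(cosine_sim(reference, orthogonal))      # 0.0
print(cosine_sim(reference, opposite))        # -1.0
```

Only the direction of the vectors matters, not their length, which is why values always fall between -1 and 1.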

In [1]:
# import necessary libraries

import pickle
import numpy as np
from music21 import instrument, note, stream, chord, converter

# for model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding
from tensorflow.keras.utils import to_categorical

# for cosine similarity
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
# import note stuff

with open('../assets_notes_times/notes', 'rb') as filepath:
    notes = pickle.load(filepath)

with open('../assets_notes_times/note_to_int', 'rb') as filepath:
    note_to_int = pickle.load(filepath)
    
with open('../assets_notes_times/song_notes', 'rb') as filepath:
    song_notes = pickle.load(filepath)
    

# import time stuff

with open('../assets_notes_times/time', 'rb') as filepath:
    time = pickle.load(filepath)

with open('../assets_notes_times/time_to_int', 'rb') as filepath:
    time_to_int = pickle.load(filepath)
    
with open('../assets_notes_times/song_times', 'rb') as filepath:
    song_times = pickle.load(filepath)

In [3]:
# create pitchnames, int_to_note and timenames, int_to_time 
pitchnames = sorted(set(item for item in notes))
pitchnames.insert(0,'unkw') # add unknown variable
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

timenames = sorted(set(item for item in time))
timenames.insert(0,'unkw') # add unknown variable
int_to_time = dict((number, time) for number, time in enumerate(timenames))

In [4]:
# check to ensure the lengths are the same
# notes
print(len(note_to_int)) 
print(len(int_to_note))

# durations
print(len(time_to_int)) 
print(len(int_to_time))

383
383
174
174


In [5]:
# notes vocab
n_vocab = len(note_to_int)
print(n_vocab)

# duration vocab
t_vocab = len(time_to_int)
print(t_vocab)

383
174


In [6]:
# recreate only the embedding layer

def create_network(network_input, n_vocab):
    model = Sequential()
    model.add(Embedding(
        n_vocab,
        512,
        input_length=100,
    ))
    
    return model

In [7]:
def prepare_prediction_input(some_notes, something_to_dict):

    sequence_length = 100
    network_input = []
    output = []
    for element in some_notes:
    
        for i in range(0, len(element) - sequence_length, 1):
            sequence_in = element[i:i + sequence_length]
            sequence_out = element[i + sequence_length]
            network_input.append([something_to_dict[char] for char in sequence_in])
            output.append(something_to_dict[sequence_out])

    n_patterns = len(network_input)

    # reshape the input into a format compatible with LSTM layers
    model_input = np.reshape(network_input, (n_patterns, sequence_length, 1))

    return (network_input, model_input)
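
To make the windowing in `prepare_prediction_input` easier to follow, here is a small sketch of the same sliding-window idea on a toy song. The function name `sliding_windows` and the five-note song are hypothetical; the notebook uses a window of 100 and maps each token to its integer ID.

```python
def sliding_windows(tokens, sequence_length):
    """Return (input window, next token) pairs for one song."""
    pairs = []
    for i in range(len(tokens) - sequence_length):
        window = tokens[i:i + sequence_length]
        target = tokens[i + sequence_length]
        pairs.append((window, target))
    return pairs


# Hypothetical five-note song, window of 3 instead of 100.
toy_song = ["A2", "D3", "A3", "E4", "G4"]
for window, target in sliding_windows(toy_song, sequence_length=3):
    print(window, "->", target)
# ['A2', 'D3', 'A3'] -> E4
# ['D3', 'A3', 'E4'] -> G4
```

A song with `n` tokens therefore yields `n - sequence_length` training windows, and songs shorter than the window yield none.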

### Cosine Similarity - Notes

In [8]:
network_input_notes, model_input_notes = prepare_prediction_input(song_notes, note_to_int)

In [9]:
# create the notes model first
model_notes = create_network(model_input_notes, n_vocab)
model_notes.load_weights('../weights/notes/weights-58-0.1027.hdf5', by_name=True) # by_name lets us load a partial model from the original

In [10]:
# Create a list of the integer values in the note_to_int so we can investigate the cosine similarity
note_vectors = [note_to_int[item] for item in pitchnames]
len(note_vectors)

383

In [11]:
# get cosine similarity dataframe for notes
notes_similarity = pd.DataFrame(cosine_similarity(model_notes.predict(note_vectors)))
print(notes_similarity.shape)
notes_similarity.head()

(383, 383)


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,373,374,375,376,377,378,379,380,381,382
0,1.0,0.038551,-0.07287,-0.004193,-0.015369,0.029168,0.010707,-0.06086,0.015822,-0.019779,...,-0.046776,-0.108165,0.044953,0.022247,0.024396,0.045351,0.015869,0.046526,0.101601,0.059466
1,0.038551,1.0,-0.24303,0.072804,0.064204,0.085998,-0.14763,0.029056,-0.426236,0.102819,...,-0.030847,0.061613,-0.302708,-0.196858,0.318325,-0.095286,-0.025605,-0.168689,-0.106408,0.081434
2,-0.07287,-0.24303,1.0,0.126948,0.040737,-0.051254,0.097663,0.223988,-0.018082,0.128144,...,0.122071,0.056241,-0.092967,0.086803,-0.001486,-0.092801,0.079039,0.00363,-0.024719,-0.023601
3,-0.004193,0.072804,0.126948,1.0,0.00512,0.064446,0.057941,0.23646,-0.216687,0.222427,...,-0.22699,0.103168,-0.325608,0.09646,0.088987,-0.047852,0.049606,-0.023607,-0.022178,-0.140522
4,-0.015369,0.064204,0.040737,0.00512,1.0,-0.018293,-0.135898,0.046457,0.16267,-0.138097,...,-0.33027,-0.163145,0.019966,0.15168,0.037263,-0.255129,0.055996,0.042339,0.239201,0.221208


In [12]:
# rename the columns to the actual note/chord
notes_similarity.rename(columns=int_to_note, index=int_to_note, inplace=True)
notes_similarity.head()

Unnamed: 0,unkw,0,0.1,0.1.3,0.1.5,0.1.6,0.2,0.2.3.7,0.2.4.7,0.2.5,...,G#3,G#4,G#5,G#6,G1,G2,G3,G4,G5,G6
unkw,1.0,0.038551,-0.07287,-0.004193,-0.015369,0.029168,0.010707,-0.06086,0.015822,-0.019779,...,-0.046776,-0.108165,0.044953,0.022247,0.024396,0.045351,0.015869,0.046526,0.101601,0.059466
0,0.038551,1.0,-0.24303,0.072804,0.064204,0.085998,-0.14763,0.029056,-0.426236,0.102819,...,-0.030847,0.061613,-0.302708,-0.196858,0.318325,-0.095286,-0.025605,-0.168689,-0.106408,0.081434
0.1,-0.07287,-0.24303,1.0,0.126948,0.040737,-0.051254,0.097663,0.223988,-0.018082,0.128144,...,0.122071,0.056241,-0.092967,0.086803,-0.001486,-0.092801,0.079039,0.00363,-0.024719,-0.023601
0.1.3,-0.004193,0.072804,0.126948,1.0,0.00512,0.064446,0.057941,0.23646,-0.216687,0.222427,...,-0.22699,0.103168,-0.325608,0.09646,0.088987,-0.047852,0.049606,-0.023607,-0.022178,-0.140522
0.1.5,-0.015369,0.064204,0.040737,0.00512,1.0,-0.018293,-0.135898,0.046457,0.16267,-0.138097,...,-0.33027,-0.163145,0.019966,0.15168,0.037263,-0.255129,0.055996,0.042339,0.239201,0.221208


We will now investigate the notes/chords most similar to the top 2 most common notes/chords in the train dataset.<br>
A quick recap of the top notes/chords:

In [13]:
note_df = pd.DataFrame(notes)
note_df.rename(columns={0:'notes/chords'}, inplace=True)
note_df['count'] = 1
note_df.groupby(by='notes/chords').count().sort_values(by='count', ascending=False).head()

Unnamed: 0_level_0,count
notes/chords,Unnamed: 1_level_1
A2,1592
D3,1376
A3,1351
E4,1350
G4,1341
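
The same frequency count can also be sketched with the standard library. The toy list below is hypothetical and stands in for `notes`:

```python
from collections import Counter

# Hypothetical stand-in for the `notes` list loaded above.
toy_notes = ["A2", "D3", "A2", "E4", "A2", "D3"]

note_counts = Counter(toy_notes)
print(note_counts.most_common(2))  # [('A2', 3), ('D3', 2)]
```

In the notebook this would be `Counter(notes).most_common(5)`, assuming `notes` is a flat list of note/chord strings.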


We will investigate notes **A2** and **D3**.

In [14]:
# first look at A2

A2_top_3 = notes_similarity[['A2']].sort_values(by='A2', ascending=False).head(4)
A2_top_3 = A2_top_3.iloc[1:4]
A2_top_3

Unnamed: 0,A2
9.11.1,0.400902
7.10.1.3,0.39942
11,0.369358
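
The cell above takes the top four rows and drops the first, which relies on the note itself ranking first with a similarity of 1.0. A slightly more explicit variant is sketched below; the helper `top_k_similar` and the toy scores are hypothetical.

```python
def top_k_similar(similarity_row, label, k=3):
    """Return the k entries most similar to `label`, excluding `label` itself."""
    others = [(name, score) for name, score in similarity_row.items() if name != label]
    others.sort(key=lambda pair: pair[1], reverse=True)
    return others[:k]


# Hypothetical similarity scores of every label against "X".
toy_row = {"X": 1.0, "P": 0.4, "Q": 0.1, "R": -0.2, "S": 0.3}
print(top_k_similar(toy_row, "X"))
# [('P', 0.4), ('S', 0.3), ('Q', 0.1)]
```

With the dataframe above, a row in this shape could be obtained with `notes_similarity['A2'].to_dict()`.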


As these are all chords (encoded as dot-separated pitch-class numbers), we need to convert them back into the notes that make up each chord.

In [15]:
def notes_in_chord(chord):
    some_notes = []
    for current_note in chord.split('.'):
        new_note = note.Note(int(current_note)).name
        some_notes.append(new_note)
        
    return some_notes

In [16]:
A2_top_chords = [notes_in_chord(chord) for chord in list(A2_top_3.index)]
A2_top_chords

[['A', 'B', 'C#'], ['G', 'B-', 'C#', 'E-'], ['B']]
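
For readers without music21 installed, the same decoding can be sketched with a lookup table. The table `PITCH_CLASS_NAMES` and the function `pitch_classes_to_names` are assumptions for illustration; the spellings (for example `E-` for E flat) were chosen to match the outputs shown in this notebook.

```python
# Pitch class 0 is C, 1 is C#, ... 11 is B.
# Spellings are an assumption chosen to match the outputs in this notebook.
PITCH_CLASS_NAMES = ["C", "C#", "D", "E-", "E", "F", "F#", "G", "G#", "A", "B-", "B"]


def pitch_classes_to_names(chord_label):
    """Turn a dot-separated pitch-class string such as '9.11.1' into note names."""
    return [PITCH_CLASS_NAMES[int(pc) % 12] for pc in chord_label.split(".")]


print(pitch_classes_to_names("9.11.1"))    # ['A', 'B', 'C#']
print(pitch_classes_to_names("7.10.1.3"))  # ['G', 'B-', 'C#', 'E-']
```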

'A', 'B', 'C#' make up the **Aadd(2)** chord, which is a variation of the A chord. Furthermore, all 3 notes are in the key of A major. Hence, we can see why there is high cosine similarity.

'G', 'B-', 'C#', 'E-' make up **Eb7/G** (reading C# enharmonically as Db). This combination of notes is rarely seen in Western music but is more common in Eastern music, and is likely a blend of D harmonic minor and A harmonic minor, both of which feature the **A** note. Since the dataset comes from JRPGs such as Final Fantasy, it is no surprise that this combination of notes and chords is used, which may explain the high cosine similarity.

'B' is a prominent note in both A major and A minor, so we can expect it to have strong cosine similarity as well.

In [17]:
# now look at D3

D3_top_3 = notes_similarity[['D3']].sort_values(by='D3', ascending=False).head(4)
D3_top_3 = D3_top_3.iloc[1:4]
D3_top_3

Unnamed: 0,D3
6.11,0.38304
2.5.8,0.372655
0.3.5,0.352305


In [18]:
D3_top_chords = [notes_in_chord(chord) for chord in list(D3_top_3.index)]
D3_top_chords

[['F#', 'B'], ['D', 'F', 'G#'], ['C', 'E-', 'F']]

'F#', 'B' are both notes in the key of D major. Hence there is high cosine similarity with D3 when these 2 notes are used to form a chord.

'D', 'F', 'G#' make up the **Ddim** chord. Hence it would have high cosine similarity with D3.

'C', 'E-', 'F' is likely a **Cm11** chord, and is in the key of C minor, which also contains the D note. This would explain its high cosine similarity with D3.

As can be observed from the cosine similarities above, it would appear as if the model did learn some music theory!

### Cosine Similarity - Durations

In [19]:
network_input_times, model_input_times = prepare_prediction_input(song_times, time_to_int)

In [20]:
# create the durations model first
model_time = create_network(model_input_times, t_vocab)
model_time.load_weights('../weights/times/weights-52-0.0870.hdf5', by_name=True) # by_name lets us load a partial model from the original

In [21]:
# Create a list of the integer values in the time_to_int so we can investigate the cosine similarity
time_vectors = [time_to_int[item] for item in timenames]
len(time_vectors)

174

In [22]:
# get cosine similarity dataframe for durations
time_similarity = pd.DataFrame(cosine_similarity(model_time.predict(time_vectors)))
print(time_similarity.shape)
time_similarity.head()

(174, 174)


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,164,165,166,167,168,169,170,171,172,173
0,1.0,-0.008981,-0.014426,-0.020388,-0.023268,-0.019359,0.061009,0.016278,0.004741,-0.002624,...,-0.066995,-0.004702,0.058776,-0.04953,0.006857,0.010318,0.017834,0.022846,-0.021683,0.051327
1,-0.008981,1.0,0.002205,0.01634,0.023554,0.102654,-0.078735,-0.03886,0.041682,0.020514,...,-0.005553,-0.072074,-0.031138,0.063865,0.011901,0.039276,-0.00446,-0.039776,-0.013121,-0.009629
2,-0.014426,0.002205,1.0,-0.020053,-0.012992,-0.013104,-0.001285,-0.035276,-0.009312,-0.006195,...,-0.068667,0.037512,-0.022069,-0.015956,-0.002632,0.02344,0.042823,-0.072119,-0.111735,0.061786
3,-0.020388,0.01634,-0.020053,1.0,0.002317,-0.041119,0.031271,0.021083,-0.02041,0.068147,...,-0.006544,-0.054134,0.025427,0.026314,0.063754,0.044528,-0.016823,0.059861,-0.000606,0.024932
4,-0.023268,0.023554,-0.012992,0.002317,1.0,0.023776,-0.03618,-0.024473,0.054268,-0.076596,...,-0.0016,0.003242,0.042838,-0.061985,0.021929,0.038735,0.000115,-0.008129,0.037848,-0.026743


In [23]:
# rename the columns to the actual duration
time_similarity.rename(columns=int_to_time, index=int_to_time, inplace=True)
time_similarity.head()

Unnamed: 0,unkw,0.25,0.5,0.75,1.0,1.25,1.5,1.75,1/12,1/3,...,88/3,89.0,89.75,9.75,90.75,93.75,94.75,95.75,96.0,99.75
unkw,1.0,-0.008981,-0.014426,-0.020388,-0.023268,-0.019359,0.061009,0.016278,0.004741,-0.002624,...,-0.066995,-0.004702,0.058776,-0.04953,0.006857,0.010318,0.017834,0.022846,-0.021683,0.051327
0.25,-0.008981,1.0,0.002205,0.01634,0.023554,0.102654,-0.078735,-0.03886,0.041682,0.020514,...,-0.005553,-0.072074,-0.031138,0.063865,0.011901,0.039276,-0.00446,-0.039776,-0.013121,-0.009629
0.5,-0.014426,0.002205,1.0,-0.020053,-0.012992,-0.013104,-0.001285,-0.035276,-0.009312,-0.006195,...,-0.068667,0.037512,-0.022069,-0.015956,-0.002632,0.02344,0.042823,-0.072119,-0.111735,0.061786
0.75,-0.020388,0.01634,-0.020053,1.0,0.002317,-0.041119,0.031271,0.021083,-0.02041,0.068147,...,-0.006544,-0.054134,0.025427,0.026314,0.063754,0.044528,-0.016823,0.059861,-0.000606,0.024932
1.0,-0.023268,0.023554,-0.012992,0.002317,1.0,0.023776,-0.03618,-0.024473,0.054268,-0.076596,...,-0.0016,0.003242,0.042838,-0.061985,0.021929,0.038735,0.000115,-0.008129,0.037848,-0.026743


We will now investigate the durations most similar to the top 2 most common durations in the train dataset.<br>
A quick recap of the top durations:

In [24]:
time_df = pd.DataFrame(time)
time_df.rename(columns={0:'durations'}, inplace=True)
time_df['count'] = 1
time_df.groupby(by='durations').count().sort_values(by='count', ascending=False).head()

Unnamed: 0_level_0,count
durations,Unnamed: 1_level_1
0.5,25729
0.25,9640
0.75,7990
1.0,4190
1/3,3541


In [25]:
# first look at 0.5

half_top_3 = time_similarity[['0.5']].sort_values(by='0.5', ascending=False).head(4)
half_top_3 = half_top_3.iloc[1:4]
half_top_3

Unnamed: 0,0.5
10/3,0.101279
32.75,0.089962
78.75,0.088906


In [26]:
# now look at 0.25

quarter_top_3 = time_similarity[['0.25']].sort_values(by='0.25', ascending=False).head(4)
quarter_top_3 = quarter_top_3.iloc[1:4]
quarter_top_3

Unnamed: 0,0.25
1.25,0.102654
8.0,0.100462
134.75,0.085124


We can see that even the 3 most similar durations score very low, at less than 0.11 cosine similarity. This is likely because most songs have little variation in note duration within a song; too much variation would make the song sound messy and not musical at all! Therefore we see very low cosine similarity between the various durations.
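
One way to put these values in context is to compare them with what random vectors would give. For random vectors in 512 dimensions, cosine similarities cluster around zero with a spread of roughly 1/sqrt(512), or about 0.044. The sketch below is an illustration only and does not use the trained weights: it draws 174 random vectors (matching the duration vocabulary size and the embedding width used above) and reports the largest similarity to the first one. The helper `cosine_sim` and the random data are assumptions.

```python
import math
import random


def cosine_sim(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


random.seed(0)      # fixed seed so the sketch is repeatable
dim, vocab = 512, 174

# Random stand-ins for embedding vectors (NOT the trained weights).
vectors = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab)]

# Similarity of the first vector to every other vector.
sims_to_first = [cosine_sim(vectors[0], v) for v in vectors[1:]]

print(f"expected spread ~ 1/sqrt(dim) = {1 / math.sqrt(dim):.3f}")
print(f"largest similarity to the first vector: {max(sims_to_first):.3f}")
```

If the largest random similarity lands close to 0.1, the duration similarities above are in the range that chance alone can produce, which may be worth checking before reading musical meaning into them.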

# Conclusion & Recommendations

As can be seen, or rather heard, from the music generated, neural networks are quite capable of generating their own music. The music generated using the best model (highest accuracy and lowest loss) sounds pleasing and contains a few good musical ideas.

While not a complete replacement of the human musician, such generated music can no doubt assist the human musician in compositions, which is one of the key objectives of this project. 

A further investigation of the embedding layer in the model suggests that the model is even capable of learning some music theory; given a larger dataset, it should be able to generate music of greater length and musicality.

That said, there are a number of improvements that can be made to this model, which can be explored with future iterations of this project:
1. The songs still consist of a single part, and it would be great to feed multiple parts into the model. The limitation of the current LSTM setup is that the input has to be one-dimensional. There is, however, a way to feed more than one input into the LSTM, which is described here: https://keras.io/guides/functional_api/. If more instruments can be added, the generated music will gain a greater degree of complexity and dynamics.<br>


2. Only piano music is used for the model. The reason is that piano music is the least dynamic, with most notes being staccato, where each note is sharply detached or separated from the others. Incorporating legato, where notes are played in a smooth, flowing manner without breaks between notes, or other techniques used on instruments such as guitar or violin, would require another dimension of input, which strengthens the case for the multi-dimensional LSTM input mentioned in point 1.<br>


3. LSTMs are increasingly being replaced by transformers, so adapting the model in this project to incorporate transformers rather than LSTMs would be something to explore in the future.<br>


4. One of the other goals of this project was to generate a melody from a chord progression. While I ultimately did not do this, it is something that can be pursued in the near future: models such as the seq2seq architecture, which is widely used in translation, can potentially fulfil this task, with the input 'language' being the chord progression and the output 'language' being the melody.