# AAI 511 Final Team Project

Group 2

Members: Daniel Shifrin, Jory Hamilton, Alden Caterio

## Introduction

Music is a form of art that is ubiquitous and has a rich history. Different composers have created music with their unique styles and compositions. However, identifying the composer of a particular piece of music can be a challenging task, especially for novice musicians or listeners. The proposed project aims to use deep learning techniques to identify the composer of a given piece of music accurately.




## Objective

The primary objective of this project is to develop a deep learning model that can predict the composer of a given musical score accurately. The project aims to accomplish this objective by using two deep learning techniques: Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN).

In [148]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [149]:
# mido is a python library to works with MIDI files/data (music in this project)
! pip install mido



In [150]:
import pandas as pd
import os
import librosa
import numpy as np
import mido
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Embedding, TimeDistributed, Flatten, Conv2D, MaxPooling2D
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

In [189]:
# get directory of dataset
data_dir = '/content/drive/MyDrive/ColabNotebooks/AAI511/FinalProject/Composer_Dataset'
initial_filelist = os.listdir(data_dir)
initial_filelist

['midiclassics.zip',
 'Tchaikovsky Lake Of The Swans Act 1 6mov.mid',
 'Tchaikovsky Lake Of The Swans Act 1 3mov.mid',
 'Tchaikovsky Lake Of The Swans Act 1 5mov.mid',
 'Tchaikovsky Lake Of The Swans Act 1 4mov.mid',
 'Tchaikovsky Lake Of The Swans Act 1 2mov.mid',
 'Tchaikovsky Lake Of The Swans Act 1 1mov.mid',
 'Rothchlid Symphony Rmw12 3mov.mid',
 'Tchaicovsky Waltz of the Flowers.MID',
 'Rothchild Symphony Rmw12 2mov.mid',
 'midiclassics',
 'Arndt',
 'Arensky',
 'Ambroise',
 'Alkan',
 'Albe╠üniz',
 'Sibelius Kuolema Vals op44.mid',
 'Wagner Ride of the valkyries.mid',
 'Tchaikovsky Lake Of The Swans Act 2 14mov.mid',
 'Tchaikovsky Lake Of The Swans Act 1 9mov.mid',
 'Tchaikovsky Lake Of The Swans Act 2 12mov.mid',
 'Tchaikovsky Lake Of The Swans Act 2 13mov.mid',
 'Tchaikovsky Lake Of The Swans Act 1 7-8movs.mid',
 'Tchaikovsky Lake Of The Swans Act 2 11mov.mid',
 'Tchaikovsky Lake Of The Swans Act 2 10mov.mid']

In [197]:
# composer folders containing midi files are in 'midiclassics' folder
filelist = os.listdir(data_dir + '/midiclassics')
composers = ['Bach', 'Beethoven', 'Chopin', 'Mozart']
for name in composers:
  if name in filelist:
    print(f"{name} is found in the directory")

Bach is found in the directory
Beethoven is found in the directory
Chopin is found in the directory
Mozart is found in the directory


In [204]:
# map songs to composers
# 0 = bach, 1 = beethoven, 2 = chopin, 3 = mozart
data_dir_new = data_dir + '/midiclassics'
midi_file = []
composer_key = []
for i in range(len(composers)):
  print(f"Getting files for {composers[i]}")
  filenames = os.listdir(data_dir_new + '/' + composers[i])
  for filename in filenames:
    if '.mid' in filename:
      midi_file.append(filename)
      composer_key.append(i)

Getting files for Bach
Getting files for Beethoven
Getting files for Chopin
Getting files for Mozart


In [205]:
# Create array with all files and accompanying composer key
files_keys = np.array(list(zip(midi_file,composer_key)))

# Create dataframe to map composers to songs
df = pd.DataFrame(files_keys, columns = ['Song', 'Composer Key'])

# Print shape and preview
print(f"Shape of df = {np.shape(df)}")
df.head()

Shape of df = (481, 2)


Unnamed: 0,Song,Composer Key
0,Bwv1014 Harpsicord and Violin Sonata 2mov.mid,0
1,Bwv1014 Harpsicord and Violin Sonata 1mov.mid,0
2,Bwv1005 Violin Sonata n3 4mov Allegro.mid,0
3,Bwv0998 Prelude Fugue Allegro for Lute 2mov.mid,0
4,Bwv0997 Partita for Lute 3mov.mid,0


In [207]:
# Create list of datapaths
datapaths = []
for i in range(len(df)):
  temp_str = data_dir_new + '/' + composers[int(df['Composer Key'].loc[i])] + '/' + df['Song'].iloc[i]
  datapaths.append(temp_str)

datapaths[0:5]

['/content/drive/MyDrive/ColabNotebooks/AAI511/FinalProject/Composer_Dataset/midiclassics/Bach/Bwv1014 Harpsicord and Violin Sonata 2mov.mid',
 '/content/drive/MyDrive/ColabNotebooks/AAI511/FinalProject/Composer_Dataset/midiclassics/Bach/Bwv1014 Harpsicord and Violin Sonata 1mov.mid',
 '/content/drive/MyDrive/ColabNotebooks/AAI511/FinalProject/Composer_Dataset/midiclassics/Bach/Bwv1005 Violin Sonata n3 4mov Allegro.mid',
 '/content/drive/MyDrive/ColabNotebooks/AAI511/FinalProject/Composer_Dataset/midiclassics/Bach/Bwv0998 Prelude Fugue Allegro for Lute 2mov.mid',
 '/content/drive/MyDrive/ColabNotebooks/AAI511/FinalProject/Composer_Dataset/midiclassics/Bach/Bwv0997 Partita for Lute 3mov.mid']

In [208]:
def get_notes_from_midi(datapath):
  """
  Get the notes from a Midi file
  """
  notes_from_midi = []
  try:
    midifile = mido.MidiFile(datapath, clip=True)
    for msg in midifile:
      if msg.type == 'note_on' and msg.velocity > 0:
        notes_from_midi.append((msg.time, msg.note, msg.velocity))
  except:
    notes_from_midi = np.zeros((1,3))

  return notes_from_midi

In [209]:
# Find the average length of the midi file note sequences
notes_arr = []
avg_len = []
for path_i in range(len(datapaths)):
  temp_notes_arr = get_notes_from_midi(datapaths[path_i])
  avg_len.append(len(temp_notes_arr))

notes_arr_avg = int(np.mean(avg_len))

In [210]:
print(f"Average length of midi file note sequences = {notes_arr_avg}")

# this number will be used for the padding

Average length of midi file note sequences = 3501


In [211]:
def pad_sequence(note_sequence, maxlen):
  """
  Pad/trim the note sequence to a fixed length (maxlen).
  """
  padded_sequence = np.zeros((maxlen, 3))
  sequence_length = min(len(note_sequence), maxlen)
  padded_sequence[:sequence_length] = note_sequence[:sequence_length]
  return padded_sequence

In [212]:
# Initialize variables
midi_notes = []
rowstodrop = []
print_flag = 1;
key_match = 0;

# iterate through all datapaths
for path_i in range(len(datapaths)):
  # Print what composer's music is being processed
  current_composer = int(df['Composer Key'].iloc[path_i])
  if current_composer == key_match:
    print_flag = 1
    key_match = key_match+1
  if print_flag == 1:
    print(f"Getting midi files for {composers_folders[current_composer]}")
    print_flag = 0

  # Read midi file
  notes_arr = get_notes_from_midi(datapaths[path_i])

  # Pad the file
  notes_arr_padded = pad_sequence(notes_arr, notes_arr_avg)

  # If file is completely empty (all values are zeros),
  # do not add to midi_notes list
  if not np.any(notes_arr_padded):
    # need to drop the row on df to keep it the same length as the midi_notes list
    rowstodrop.append(path_i)
    continue

  # Append to midi list
  midi_notes.append(notes_arr_padded)

# Drop rows of the invalid data to keep dimensions the same as the midi list
df_new = df.drop(rowstodrop)

# Checking if shape is consistent
print(f"""
Shape of midi data = {np.shape(midi_notes)}
Shape of df_new = {np.shape(df_new)}
      """)

Getting midi files for midiclassics.zip
Getting midi files for Tchaikovsky Lake Of The Swans Act 1 6mov.mid
Getting midi files for Tchaikovsky Lake Of The Swans Act 1 3mov.mid
Getting midi files for Tchaikovsky Lake Of The Swans Act 1 5mov.mid

Shape of midi data = (480, 3501, 3)
Shape of df_new = (480, 2)
      


## Long Short-Term Memory (LSTM)

In [213]:
def build_lstm_model(input_shape, num_classes):
    """
      Define the LSTM model
    """
    model = Sequential()
    model.add(LSTM(128, input_shape=input_shape, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(128))
    model.add(Dropout(0.3))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [214]:
# Define input shapes for LSTM
input_shape = (notes_arr_avg, 3)  # (sequence_length, feature_dim)
num_classes = 4  # Number of composers

# Build the model
model_LSTM = build_lstm_model(input_shape, num_classes)

# Output summary
model_LSTM.summary()

  super().__init__(**kwargs)


In [215]:
# Define X and y
# X = independent variables
# y = dependent variable
X = np.array(midi_notes)
y = np.array(df_new['Composer Key'])  # Convert composer names to numeric labels

# Convert labels to one-hot encoding
y = to_categorical(y, num_classes=num_classes)

In [216]:
np.shape(X)

(480, 3501, 3)

In [217]:
np.shape(y)

(480, 4)

In [218]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [219]:
# Train the LSTM model
model_LSTM.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))

Epoch 1/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 351ms/step - accuracy: 0.3260 - loss: 1.3521 - val_accuracy: 0.2812 - val_loss: 1.3907
Epoch 2/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 306ms/step - accuracy: 0.3090 - loss: 1.3444 - val_accuracy: 0.3438 - val_loss: 1.3315
Epoch 3/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 318ms/step - accuracy: 0.3835 - loss: 1.3333 - val_accuracy: 0.3542 - val_loss: 1.3255
Epoch 4/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 301ms/step - accuracy: 0.3648 - loss: 1.3028 - val_accuracy: 0.3750 - val_loss: 1.3574
Epoch 5/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 301ms/step - accuracy: 0.3971 - loss: 1.2976 - val_accuracy: 0.3750 - val_loss: 1.3216
Epoch 6/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 304ms/step - accuracy: 0.3908 - loss: 1.2934 - val_accuracy: 0.3542 - val_loss: 1.3141
Epoch 7/10
[1m6/6[0m [32m━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x7991ba0f6950>

In [220]:
# Evaluate the model on the test set
test_loss, test_accuracy = model_LSTM.evaluate(X_test, y_test)
print(f'Test Accuracy: {test_accuracy * 100:.2f}%')

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 83ms/step - accuracy: 0.3828 - loss: 1.3539
Test Accuracy: 40.62%


## Convolutional Neural Network (CNN)

In [221]:
np.shape(midi_notes[0])

(3501, 3)

In [222]:
midi_notes[0][0:5]

array([[  0.,  78., 100.],
       [  0.,  62.,  70.],
       [  0.,  71.,  70.],
       [  0.,  47., 100.],
       [  0.,  79., 100.]])

In [223]:
print(midi_notes[0][0][1])
print(midi_notes[0][0][2])

78.0
100.0


In [224]:
# midi note data consists of (msg.time, msg.note, msg.velocity)
# use note and velocity as data for CNN training

trim_to_len = 3200
midi_notes_new = []
for resize_i in range(len(midi_notes)):
  trimmed_notes = midi_notes[resize_i][0:trim_to_len]
  midi_notes_new.append(trimmed_notes)

np.shape(midi_notes_new)

(480, 3200, 3)

In [225]:
x = np.shape(midi_notes_new)

In [226]:
x[1]

3200

In [227]:
# Reshape data
datashape_new = np.shape(midi_notes_new)
dim_x = 80
dim_y = 120
midi_notes_reshaped = []
for reshape_i in range(len(midi_notes_new)):
  # reshape each iteration into 1D array
  data_1D = np.reshape(midi_notes_new[0], datashape_new[1]*datashape_new[2])
  # reshape data into
  data_reshaped = np.reshape(data_1D, (dim_x, dim_y))
  midi_notes_reshaped.append(data_reshaped)

In [228]:
np.shape(midi_notes_reshaped)

(480, 80, 120)

In [229]:
X = np.array(midi_notes_reshaped)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [230]:
# Reshape data for CNN
datashape = np.shape(midi_notes_reshaped)
X_train = X_train.reshape((X_train.shape[0], datashape[1], datashape[2], 1))
X_test = X_test.reshape((X_test.shape[0], datashape[1], datashape[2], 1))
# X_train, X_test = X_train / 255.0, X_test / 255.0

In [231]:
# Preview of the shape of the training (and test) data
X_train.shape

(384, 80, 120, 1)

In [232]:
def build_cnn_model(input_shape, num_classes):
    model = Sequential()

    # First convolutional layer
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Second convolutional layer
    model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Flatten and fully connected layer
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))

    # Output layer
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [233]:
# Define input shape for CNN
input_shape = (datashape[1], datashape[2], 1)

# Build the model
model_CNN = build_cnn_model(input_shape, num_classes)

# Output model summary
model_CNN.summary()

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


In [234]:
# Train the CNN model
model_CNN.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))

Epoch 1/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 212ms/step - accuracy: 0.2195 - loss: 152.5482 - val_accuracy: 0.3021 - val_loss: 32.0459
Epoch 2/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 28ms/step - accuracy: 0.1990 - loss: 20.8417 - val_accuracy: 0.2083 - val_loss: 1.4186
Epoch 3/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step - accuracy: 0.2794 - loss: 1.3986 - val_accuracy: 0.2083 - val_loss: 1.4075
Epoch 4/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 21ms/step - accuracy: 0.2770 - loss: 1.3717 - val_accuracy: 0.2500 - val_loss: 1.3883
Epoch 5/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step - accuracy: 0.2621 - loss: 1.3748 - val_accuracy: 0.2500 - val_loss: 1.3893
Epoch 6/10
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step - accuracy: 0.2970 - loss: 1.3787 - val_accuracy: 0.2500 - val_loss: 1.3923
Epoch 7/10
[1m6/6[0m [32m━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x7991e39ecf10>

In [235]:
# Evaluate the model on the test set
test_loss, test_accuracy = model_CNN.evaluate(X_test, y_test)
print(f'Test Accuracy: {test_accuracy * 100:.2f}%')

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - accuracy: 0.2617 - loss: 1.4178 
Test Accuracy: 25.00%
