# CS89 Final Project: Chord Recognition Model 

### Introduction and Objective: 

Chord recognition in music has long been a topic of interest at the intersection of music and computer science. In this project, I attempt to create my own model that can accurately infer the chord that is being played at different points in a song, given the audio features of that song. 

This has been done before by a few significant research projects, so I began by reading these and learning about the techniques used. See these links if you're interested|: 

- Recognition of Complex Chords: https://ismir2010.ismir.net/proceedings/ismir2010-25.pdf
- Chroma Features for Chord Recognition: https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=54F08AABDAAF6C0780BF6D9B73E93024?doi=10.1.1.400.507&rep=rep1&type=pdf
- Deep Learning for Chord Recognition: https://iopscience.iop.org/article/10.1088/1742-6596/2083/4/042017/meta
- Neural Networks for Chord Recognition: https://www.researchgate.net/publication/252067543_Neural_networks_for_musical_chords_recognition
- Hidden Markov Models for Chord Recognition: https://ccrma.stanford.edu/~kglee/pubs/klee-ismir06.pdf



Although there are many different ways to approach chord recognition, I decided to begin with a simple approach using the scikit learn models we used in class this year with the [Billboard Datset](https://ddmal.music.mcgill.ca/research/The_McGill_Billboard_Project_(Chord_Analysis_Dataset)/). This project is ultimately a classifier, so the McGill Billboard set is perfect because it provides both labels and features, already pre-processed, for the same set of songs. 

I downloaded two main directories from this dataset, one for the chord labels, and the other for the audio features (`chord labels` and `audio features`). 

The chord labels set contains a directory of about 1300 songs from the last century, with chord annotations in the LAB format (including their, start time, end time, and label). These are pretty simple to read-- they contain the root tone of the chord, followed by a colon, and then the version of that chord. This is what they look like: 
```
1.8015782305999999	3.529687074200001	A:min
3.529687074200001	5.257795917800002	A:min
5.257795917800002	6.985904761400003	C:maj
6.985904761400003	8.714013605000009	C:maj
```

The audio features set contains a directory for the same 1300 songs, but instead contains timestamps throughout the song, as well as the chroma features for that timestamp. As explained on the documentation site linked above, the set contains Non-Negative Least Squares (NLSS) chroma features. I didn't read too much into the specific methods used for this feature extraction and the preprocessing methods, but the representation generally works as follows (including details for those less familiar with music theory):

- In the vast majority of music made in the last few hundred years, music is written and played under 12-Tone Equal temperment. This temperment is centered around the concept of an octave, which is the interval between two notes where one is double the frequency of the other. Two notes an octave apart are perceived to be the same note to humans, but with one tone being higher than the other. Thus, octave intervals are labeled as the same note. 
- Each octave interval is then divided equally into 12 notes, usually centered around a frequency of 440hz, which is marked as A4. Therefore, A3 has a frequency of 220hz, while A5 has a frequency of 880hz. Each interval between is divided equally, giving us A, A#, B, C, C#, D, D#, E, F, F#, G, and G#. 
-Chroma features essentially divide all frequencies within a sound into these 12 notes, resulting in a 12-dimensional vector representing the strength of each 12-Tone note (in this case, the dataset contains 24-Dimensional vectors to span two octaves, since the labeling of chords containing the same notes can sometimes vary according to the distribution of these notes or relatively placement to one another). 
- The features in the dataset are adjusted for tuning differences between songs, so the frequencies will be directed to the correct chroma-bins. 

### Part 1: Parsing and Storing the Labels DataFrame

My general approach here is pretty simple: 
```
for each file in the labels directory
    for each line in the file
        create a new row in the data
        add the directory name (which is the song name), start time, end time, and chord label
```

Import Libraries: 

In [3]:
import numpy as np
import pandas as pd
import os
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, classification_report, f1_score
from sklearn.preprocessing import LabelEncoder

Get the chord labels from the `chord labels` directory by looping through each file and splicing the labels into the new dataframe with the timestamps

In [3]:
def read_lab_files(root_folder):
    # Initialize empty lists to store data
    all_chords = []
    all_start_times = []
    all_end_times = []
    song_numbers = []

    # Iterate through subdirectories in the root folder
    for subdir, _, files in os.walk(root_folder):
        # get the song number
        song_number = os.path.basename(subdir)  # Extract only the base name of the directory
        for file in files:
            # Check if the file is 'full.lab'
            if file == 'full.lab':
                file_path = os.path.join(subdir, file)

                # Read the 'full.lab' file and extract data
                with open(file_path, 'r') as lab_file:
                    for line in lab_file:
                        # split the line
                        columns = line.strip().split('\t')

                        # skip empty lines
                        if len(columns) != 3: continue

                        # get the start time, end time, and chord                        
                        start_time, end_time, chord = float(columns[0]), float(columns[1]), columns[2]
                            
                        # Check if the chord is not 'N'
                        if chord != 'N':
                            # Append the data to the lists
                            all_start_times.append(start_time)
                            all_end_times.append(end_time)
                            all_chords.append(chord)
                            song_numbers.append(song_number)

    # Create the DataFrame
    df_result = pd.DataFrame({'song': song_numbers, 'Chord': all_chords, 'Start_Time': all_start_times, 'End_Time': all_end_times})

    return df_result

# Example usage
root_folder_path = 'chord labels'
labels_df = read_lab_files(root_folder_path)

# Display the resulting DataFrame
labels_df


Unnamed: 0,song,Chord,Start_Time,End_Time
0,0003,A:min,1.801578,3.529687
1,0003,A:min,3.529687,5.257796
2,0003,C:maj,5.257796,6.985905
3,0003,C:maj,6.985905,8.714014
4,0003,A:min,8.714014,10.438509
...,...,...,...,...
120097,1300,Cb:maj(9),261.871497,263.559552
120098,1300,Cb:maj(9),263.559552,265.247608
120099,1300,Cb:maj(9),265.247608,266.935663
120100,1300,Cb:maj(9),266.935663,268.412712


Now we have our chord labels, with about 121,000 different samples. 

### Part 2: Adding extracted audio features to the dataset

Instead of making two different datasets for the features and the labels, I decided to keep them all in one dataset and then split them upon training because I didn't want the labeles for the songs to get out of order at all when pulling information from different directories. Thus, I decided to use the following approach:

```
for each row (chord) in the chord labels dataframe
    get the song number for that chord
    if that is not the current song number
        make a new dataframe holding the chroma features at intervals throughout that song
    get the interval that the current chord was played in 
    average all of the chroma features for that interval in the song (using the dataframe made from the chroma features files)
    add them onto the end of that song's row in the dataframe
```

The result of this code is that we have a dataframe containing every chord from the labels dataframe, as well as which song the chord came from, its start and end times, and its averaged chroma features for that time range in the song. 

Test: Make a dataframe with the audio features and time stamps

In [4]:
# Read data into a DataFrame
audio_features_path = 'audio features/0003/bothchroma.csv'

# get the first line of data from of the above file path
df = pd.read_csv(audio_features_path, header=None)

# remove the first column of the dataframe
df = df.drop(df.columns[0], axis=1)


In [5]:
df

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,16,17,18,19,20,21,22,23,24,25
0,0.000000,0.198482,0.000000,0.000000,0.635556,0.741292,1.00430,0.814440,0.029282,0.141189,...,0.197437,0.860228,1.165150,1.13561,0.428420,0.112475,1.492970,0.556156,0.562561,0.864485
1,0.046440,0.310882,0.000000,0.000000,0.693876,0.628553,1.08004,0.676368,0.017598,0.140079,...,0.390680,0.939165,1.053370,1.30823,0.322710,0.067595,1.538040,0.566076,0.627636,0.904673
2,0.092880,0.404969,0.000000,0.037238,0.682770,0.591140,1.14683,0.575229,0.014624,0.128743,...,0.392881,0.924133,0.946938,1.36967,0.234190,0.056174,1.429350,0.532472,0.686886,0.906338
3,0.139320,0.480218,0.000000,0.005002,0.435639,0.450297,1.21112,0.458671,0.006372,0.102629,...,0.323281,0.917475,0.679013,1.48532,0.186736,0.024474,1.227170,0.529464,0.717978,1.057680
4,0.185760,0.539064,0.146614,0.010891,0.444361,0.196939,1.29815,0.239054,0.023305,0.208090,...,0.537718,1.209160,0.135402,1.61202,0.097325,0.089518,0.648249,0.520988,0.703834,1.287430
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3245,150.697506,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3246,150.743946,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3247,150.790385,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3248,150.836825,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


Get the audio features for all songs in a dataframe (see pseudo above)

In [None]:
# initialize the song variable to keep from having to reload data
song = None
features_df = None

# for each row in the labels_df
for index, row in labels_df.iterrows():

    # get important information from the row
    start_time = row['Start_Time']
    end_time = row['End_Time']
    print(row['song'])

    # if this is a different song than the last one, update song and get the new features df
    if song != row['song']:
        song = row['song']

        audio_features_path = 'audio features/' + str(song) + '/bothchroma.csv'
        features_df = pd.read_csv(audio_features_path, header=None)

        # remove the first column of the dataframe
        features_df = features_df.drop(features_df.columns[0], axis=1)

    # Find indices corresponding to the timestamp range
    start_index = features_df[features_df[1].astype(float) >= start_time].index.min()
    end_index = features_df[features_df[1].astype(float) <= end_time].index.max()

    # Extract features from the current row based on the start and end indices
    selected_features = features_df.loc[start_index:end_index]

    # Calculate the sum of each column (starting from column 2) and add them as new columns
    column_sums = selected_features.iloc[:, 1:].mean()
    new_columns = pd.Series(column_sums.values, index=[f'avg_{i}' for i in range(0, len(column_sums))])

    # Append the new columns to the original DataFrame
    labels_df.loc[index, new_columns.index] = new_columns.values


In [7]:
labels_df

Unnamed: 0,song,Chord,Start_Time,End_Time,avg_0,avg_1,avg_2,avg_3,avg_4,avg_5,...,avg_14,avg_15,avg_16,avg_17,avg_18,avg_19,avg_20,avg_21,avg_22,avg_23
0,0003,A:min,1.801578,3.529687,1.296338,0.096233,0.044460,0.014869,0.132211,0.016741,...,0.175255,0.249544,1.671471,0.182295,0.319548,1.617817,0.232052,0.493018,1.028739,0.196863
1,0003,A:min,3.529687,5.257796,1.090067,0.142989,0.229574,0.159840,0.121408,0.186105,...,0.696055,0.523500,0.931865,0.280313,0.503893,1.918059,0.409933,0.244223,0.970336,0.377233
2,0003,C:maj,5.257796,6.985905,0.036607,0.045573,0.072026,2.188380,0.098114,0.130671,...,0.384420,2.203376,0.261813,0.166969,0.285419,2.181979,0.446434,0.436596,1.336964,0.357736
3,0003,C:maj,6.985905,8.714014,0.144676,0.094927,0.347970,0.652635,0.166066,0.592804,...,0.500397,0.873766,0.559218,0.472823,0.352675,1.636583,0.284842,0.416114,0.657960,0.692985
4,0003,A:min,8.714014,10.438509,0.867339,0.000000,0.000000,0.127949,0.063239,0.006907,...,0.137837,0.802652,0.340745,0.067628,0.071948,2.307925,0.213207,0.150309,0.773141,0.019857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120097,1300,Cb:maj(9),261.871497,263.559552,0.370145,0.648166,1.763381,0.161766,0.062197,0.064927,...,2.514952,0.155326,0.427778,0.169117,1.394679,0.298624,0.192295,1.514244,0.336416,0.396198
120098,1300,Cb:maj(9),263.559552,265.247608,0.247420,0.655271,1.907759,0.211228,0.063261,0.115921,...,2.275539,0.160866,0.355767,0.618172,0.639412,0.388607,0.306658,1.446998,0.470964,0.246756
120099,1300,Cb:maj(9),265.247608,266.935663,0.219294,0.804724,1.859394,0.142114,0.009017,0.095662,...,2.146290,0.278917,0.297197,0.378818,1.179582,0.273272,0.254434,1.899630,0.235320,0.157554
120100,1300,Cb:maj(9),266.935663,268.412712,0.032748,0.161259,0.529614,0.006026,0.000000,0.003509,...,0.325127,0.022831,0.174218,0.076228,0.275325,0.061588,0.085875,0.464677,0.042717,0.105184


In [8]:
# save this dataframe to a csv
labels_df.to_csv('labels.csv')

### Part 3: Training the Models: 

Since loading this set takes forever, I've saved it in `labels.csv` for easy reloading. I do that here. 

In [2]:
# load the labels dataframe from a csv
labels_df = pd.read_csv('labels.csv', index_col=0)

# Drop the index column
labels_df.reset_index(drop=True, inplace=True)

In [3]:
labels_df

Unnamed: 0,song,Chord,Start_Time,End_Time,avg_0,avg_1,avg_2,avg_3,avg_4,avg_5,...,avg_14,avg_15,avg_16,avg_17,avg_18,avg_19,avg_20,avg_21,avg_22,avg_23
0,3,A:min,1.801578,3.529687,1.296338,0.096233,0.044460,0.014869,0.132211,0.016741,...,0.175255,0.249544,1.671471,0.182295,0.319548,1.617817,0.232052,0.493018,1.028739,0.196863
1,3,A:min,3.529687,5.257796,1.090067,0.142989,0.229574,0.159840,0.121408,0.186105,...,0.696055,0.523500,0.931865,0.280313,0.503893,1.918059,0.409933,0.244223,0.970336,0.377233
2,3,C:maj,5.257796,6.985905,0.036607,0.045573,0.072026,2.188380,0.098114,0.130671,...,0.384420,2.203376,0.261813,0.166969,0.285419,2.181979,0.446434,0.436596,1.336964,0.357736
3,3,C:maj,6.985905,8.714014,0.144676,0.094927,0.347970,0.652635,0.166066,0.592804,...,0.500397,0.873766,0.559218,0.472823,0.352675,1.636583,0.284842,0.416114,0.657960,0.692985
4,3,A:min,8.714014,10.438509,0.867339,0.000000,0.000000,0.127949,0.063239,0.006907,...,0.137837,0.802652,0.340745,0.067628,0.071948,2.307925,0.213207,0.150309,0.773141,0.019857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120097,1300,Cb:maj(9),261.871497,263.559552,0.370145,0.648166,1.763381,0.161766,0.062197,0.064927,...,2.514952,0.155326,0.427778,0.169117,1.394679,0.298624,0.192295,1.514244,0.336416,0.396198
120098,1300,Cb:maj(9),263.559552,265.247608,0.247420,0.655271,1.907759,0.211228,0.063261,0.115921,...,2.275539,0.160866,0.355767,0.618172,0.639412,0.388607,0.306658,1.446998,0.470964,0.246756
120099,1300,Cb:maj(9),265.247608,266.935663,0.219294,0.804724,1.859394,0.142114,0.009017,0.095662,...,2.146290,0.278917,0.297197,0.378818,1.179582,0.273272,0.254434,1.899630,0.235320,0.157554
120100,1300,Cb:maj(9),266.935663,268.412712,0.032748,0.161259,0.529614,0.006026,0.000000,0.003509,...,0.325127,0.022831,0.174218,0.076228,0.275325,0.061588,0.085875,0.464677,0.042717,0.105184


**Splitting the dataset:**

Split the data into features and labels:

In [4]:
# Assuming columns are named as avg_0, avg_1, ..., avg_25
avg_columns = [f'avg_{i}' for i in range(0, 24)]

# Separate features (X) and labels (y)
X = labels_df[avg_columns]
y = labels_df['Chord']


Print the features:

In [5]:
X

Unnamed: 0,avg_0,avg_1,avg_2,avg_3,avg_4,avg_5,avg_6,avg_7,avg_8,avg_9,...,avg_14,avg_15,avg_16,avg_17,avg_18,avg_19,avg_20,avg_21,avg_22,avg_23
0,1.296338,0.096233,0.044460,0.014869,0.132211,0.016741,0.154689,0.321275,0.182780,0.386259,...,0.175255,0.249544,1.671471,0.182295,0.319548,1.617817,0.232052,0.493018,1.028739,0.196863
1,1.090067,0.142989,0.229574,0.159840,0.121408,0.186105,0.291755,0.743278,0.325715,0.235025,...,0.696055,0.523500,0.931865,0.280313,0.503893,1.918059,0.409933,0.244223,0.970336,0.377233
2,0.036607,0.045573,0.072026,2.188380,0.098114,0.130671,0.048456,0.566124,0.466513,0.455565,...,0.384420,2.203376,0.261813,0.166969,0.285419,2.181979,0.446434,0.436596,1.336964,0.357736
3,0.144676,0.094927,0.347970,0.652635,0.166066,0.592804,0.367234,0.747481,0.173354,0.448949,...,0.500397,0.873766,0.559218,0.472823,0.352675,1.636583,0.284842,0.416114,0.657960,0.692985
4,0.867339,0.000000,0.000000,0.127949,0.063239,0.006907,0.079671,0.191587,0.127434,0.268038,...,0.137837,0.802652,0.340745,0.067628,0.071948,2.307925,0.213207,0.150309,0.773141,0.019857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120097,0.370145,0.648166,1.763381,0.161766,0.062197,0.064927,0.225371,0.227636,0.233065,0.683639,...,2.514952,0.155326,0.427778,0.169117,1.394679,0.298624,0.192295,1.514244,0.336416,0.396198
120098,0.247420,0.655271,1.907759,0.211228,0.063261,0.115921,0.073898,0.360983,0.389042,0.365939,...,2.275539,0.160866,0.355767,0.618172,0.639412,0.388607,0.306658,1.446998,0.470964,0.246756
120099,0.219294,0.804724,1.859394,0.142114,0.009017,0.095662,0.128853,0.085343,0.285248,0.693693,...,2.146290,0.278917,0.297197,0.378818,1.179582,0.273272,0.254434,1.899630,0.235320,0.157554
120100,0.032748,0.161259,0.529614,0.006026,0.000000,0.003509,0.067913,0.071245,0.142333,0.098923,...,0.325127,0.022831,0.174218,0.076228,0.275325,0.061588,0.085875,0.464677,0.042717,0.105184


Print the labels:

In [6]:
y

0             A:min
1             A:min
2             C:maj
3             C:maj
4             A:min
            ...    
120097    Cb:maj(9)
120098    Cb:maj(9)
120099    Cb:maj(9)
120100    Cb:maj(9)
120101    Cb:maj(9)
Name: Chord, Length: 120102, dtype: object

Drop NA values and and check shapes:

In [7]:
print(labels_df.shape)
# Drop rows with NaN values (luckily there are only 8 of these, as shown by the shape print statements)
labels_df_cleaned = labels_df.dropna(subset=avg_columns + ['Chord'])
print(labels_df_cleaned.shape)

# Separate features (X) and target variable (y) from the cleaned dataset
X_cleaned = labels_df_cleaned[avg_columns]
y_cleaned = labels_df_cleaned['Chord']

print(X_cleaned.shape)
print(y_cleaned.shape)



(120102, 28)
(120094, 28)
(120094, 24)
(120094,)


Prepare for training using the Label Encoder to work with string labels:

In [8]:
# Initialize the label encoder
label_encoder = LabelEncoder()

# Fit and transform the labels on the cleaned dataset
y_cleaned_encoded = label_encoder.fit_transform(y_cleaned)

### Part 3a: Logistic Regression Model:

Take a subset of the data (this dataset is too large-- didn't run the LR in 30 minutes on full size), and split the data

In [51]:
# downsize the data because the dataset is too large and crashes the kernel
LR_X_subset, _, LR_y_subset, _ = train_test_split(X_cleaned, y_cleaned_encoded, test_size=0.5, random_state=0)

# Split the cleaned data into training and testing sets
LR_X_train, LR_X_test, LR_y_train_encoded, LR_y_test_encoded = train_test_split(LR_X_subset, LR_y_subset, test_size=0.2, random_state=0)

In [52]:
print(LR_X_subset.shape)

(60047, 24)


Train the model:

In [49]:
# Initialize the Logistic Regression Model:
LR_model = LogisticRegression(max_iter=10000)

# fit the model on the training data
LR_model.fit(LR_X_train, LR_y_train_encoded)


This takes a while to train, so save it in the directory: 

In [76]:
import pickle

# save the model to disk
filename = 'LR_model.sav'
pickle.dump(LR_model, open(filename, 'wb'))

Make Predictions on the Test Set:

In [69]:
# Make predictions on the test set
LR_y_pred_encoded = LR_model.predict(LR_X_test)

# Convert numerical predictions back to chord names
LR_y_test_original = label_encoder.inverse_transform(LR_y_test_encoded)
LR_y_pred_original = label_encoder.inverse_transform(LR_y_pred_encoded)

Evaluate the Predictions:

In [70]:
# Print the classification report
print("Classification Report:")
print(classification_report(LR_y_test_original, LR_y_pred_original))

# print the weighted f1 score
print("Weighted F1 Score:")
print(f1_score(LR_y_test_original, LR_y_pred_original, average='weighted'))

Classification Report:


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


                   precision    recall  f1-score   support

           A#:dim       0.00      0.00      0.00         1
          A#:dim7       0.00      0.00      0.00         1
            A:1/1       0.29      0.14      0.19        28
             A:11       0.00      0.00      0.00        20
             A:13       1.00      0.12      0.22         8
              A:5       0.00      0.00      0.00        40
          A:5(b7)       1.00      0.17      0.29         6
        A:5(b7)/2       0.00      0.00      0.00         3
        A:5(b7)/3       0.00      0.00      0.00         2
              A:7       0.44      0.20      0.28       100
          A:7(b9)       0.00      0.00      0.00         2
        A:7(b9)/3       0.00      0.00      0.00         1
            A:7/3       0.00      0.00      0.00         1
            A:7/4       0.00      0.00      0.00         2
            A:7/5       0.00      0.00      0.00         1
              A:9       0.00      0.00      0.00       

In [60]:
# make a dataframe with two columns: the actual and predicted chords
LR_comparison_df = pd.DataFrame({'Actual': LR_y_test_original, 'Predicted': LR_y_pred_original})

# add two more columns, by taking the string before the colon in each column and adding it as a new column
LR_comparison_df['Actual Root'] = LR_comparison_df['Actual'].str.split(':', expand=True)[0]
LR_comparison_df['Predicted Root'] = LR_comparison_df['Predicted'].str.split(':', expand=True)[0]

# display the dataframe
LR_comparison_df

Unnamed: 0,Actual,Predicted,Actual Root,Predicted Root
0,Eb:maj,Eb:maj,Eb,Eb
1,B:min,B:min,B,B
2,A:min,A:min,A,A
3,D:min6,D:maj,D,D
4,G:min9,G:min7,G,G
...,...,...,...,...
12005,G:maj,Ab:maj,G,Ab
12006,B:7,B:maj,B,B
12007,A:maj,A:maj,A,A
12008,C:maj,C:maj,C,C


In [61]:
# find the percentage of the predicted roots that are correct
print("Root Accuracy:")
print(LR_comparison_df['Actual Root'].eq(LR_comparison_df['Predicted Root']).mean())

Root Accuracy:
0.7967527060782681


### Part 3b: Random Forest Classifier Model:

Split and do the subset:

In [116]:
# downsize the data because the dataset is too large and crashes the kernel
RF_X_subset, _, RF_y_subset, _ = train_test_split(X_cleaned, y_cleaned_encoded, test_size=0.5, random_state=0)

# Split the cleaned data into training and testing sets
RF_X_train, RF_X_test, RF_y_train_encoded, RF_y_test_encoded = train_test_split(RF_X_subset, RF_y_subset, test_size=0.2, random_state=0)

In [117]:
# Initialize the RandomForestClassifier (you can replace this with your desired model)
RFmodel = RandomForestClassifier()

# Fit the model on the training data
RFmodel.fit(RF_X_train, RF_y_train_encoded)

This takes a while to train, so save it in the directory: (this was a lot of space so it failed)

In [13]:
import pickle

# save the model to disk
filename = 'RF_model.sav'
pickle.dump(RFmodel, open(filename, 'wb'))

OSError: [Errno 28] No space left on device

In [14]:
# Make predictions on the test set
RF_y_pred_encoded = RFmodel.predict(RF_X_test)

# Convert numerical predictions back to chord names
RF_y_test_original = label_encoder.inverse_transform(RF_y_test_encoded)
RF_y_pred_original = label_encoder.inverse_transform(RF_y_pred_encoded)

Print the classification report and the weighted f1 score:

In [15]:
# Print the classification report
print("Classification Report:")
print(classification_report(RF_y_test_original, RF_y_pred_original))

# print the weighted f1 score
print("Weighted F1 Score:")
print(f1_score(RF_y_test_original, RF_y_pred_original, average='weighted'))

Classification Report:


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


                   precision    recall  f1-score   support

           A#:dim       0.00      0.00      0.00         1
          A#:dim7       1.00      1.00      1.00         1
            A:1/1       1.00      0.50      0.67        28
             A:11       1.00      0.60      0.75        20
             A:13       1.00      0.62      0.77         8
              A:5       0.90      0.23      0.36        40
          A:5(b7)       1.00      0.33      0.50         6
        A:5(b7)/2       0.00      0.00      0.00         3
        A:5(b7)/3       1.00      0.50      0.67         2
              A:7       0.74      0.37      0.49       100
          A:7(b9)       0.00      0.00      0.00         2
        A:7(b9)/3       0.00      0.00      0.00         1
            A:7/3       0.00      0.00      0.00         1
            A:7/4       0.00      0.00      0.00         2
            A:7/5       0.00      0.00      0.00         1
              A:9       0.00      0.00      0.00       

Compare the predicted chords to the actual chords in a dataframe:

In [16]:
# make a dataframe with two columns: the actual and predicted chords
RF_comparison_df = pd.DataFrame({'Actual': RF_y_test_original, 'Predicted': RF_y_pred_original})

# add two more columns, by taking the string before the colon in each column and adding it as a new column
RF_comparison_df['Actual Root'] = RF_comparison_df['Actual'].str.split(':', expand=True)[0]
RF_comparison_df['Predicted Root'] = RF_comparison_df['Predicted'].str.split(':', expand=True)[0]

# display the dataframe
RF_comparison_df

Unnamed: 0,Actual,Predicted,Actual Root,Predicted Root
0,Eb:maj,Eb:maj,Eb,Eb
1,B:min,B:min,B,B
2,A:min,A:min,A,A
3,D:min6,D:1/1,D,D
4,G:min9,G:min7,G,G
...,...,...,...,...
12005,G:maj,G:maj,G,G
12006,B:7,B:7,B,B
12007,A:maj,A:maj,A,A
12008,C:maj,C:maj,C,C


In [17]:
# find the percentage of the predicted roots that are correct
print("Root Accuracy:")
print(RF_comparison_df['Actual Root'].eq(RF_comparison_df['Predicted Root']).mean())

Root Accuracy:
0.8572023313905079


### Part 3c: Random Forest Classifier Model on Full dataset:

**NOTE:** This was not actually executed or used in the end becuase the model was massive so it took too long to train and also ran out of space on my hard drive. 

I did write the code for it, and I assume it would achieve much better accuracy than the first model becuase the ones above train-test split only *half* of the original dataset.

Split and do the subset:

In [10]:
# Split the cleaned data into training and testing sets
Full_RF_X_train, Full_RF_X_test, Full_RF_y_train_encoded, Full_RF_y_test_encoded = train_test_split(X_cleaned, y_cleaned_encoded, test_size=0.2, random_state=0)

In [11]:
# Initialize the RandomForestClassifier (you can replace this with your desired model)
Full_RFmodel = RandomForestClassifier()

# Fit the model on the training data
Full_RFmodel.fit(Full_RF_X_train, Full_RF_y_train_encoded)

MemoryError: could not allocate 497025024 bytes



This takes a while to train, so save it in the directory: 


In [None]:
import pickle

# save the model to disk
filename = 'Full_RF_model.sav'
pickle.dump(Full_RF_model, open(filename, 'wb'))

Make Predictions on the Test Set:

In [None]:
# Make predictions on the test set
Full_RF_y_pred_encoded = Full_RFmodel.predict(Full_RF_X_test)

# Convert numerical predictions back to chord names
Full_RF_y_test_original = label_encoder.inverse_transform(Full_RF_y_test_encoded)
Full_RF_y_pred_original = label_encoder.inverse_transform(Full_RF_y_pred_encoded)

Print the classification report and the weighted f1 score:

In [None]:
# Print the classification report
print("Classification Report:")
print(classification_report(Full_RF_y_test_original, Full_RF_y_pred_original))

# print the weighted f1 score
print("Weighted F1 Score:")
print(f1_score(Full_RF_y_test_original, Full_RF_y_pred_original, average='weighted'))

Classification Report:


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


                   precision    recall  f1-score   support

           A#:dim       0.00      0.00      0.00         1
          A#:dim7       1.00      1.00      1.00         1
            A:1/1       0.93      0.50      0.65        28
             A:11       0.92      0.60      0.73        20
             A:13       1.00      0.75      0.86         8
              A:5       1.00      0.23      0.37        40
          A:5(b7)       1.00      0.67      0.80         6
        A:5(b7)/2       0.00      0.00      0.00         3
        A:5(b7)/3       0.00      0.00      0.00         2
              A:7       0.71      0.40      0.51       100
          A:7(b9)       0.00      0.00      0.00         2
        A:7(b9)/3       0.00      0.00      0.00         1
            A:7/3       0.00      0.00      0.00         1
            A:7/4       0.00      0.00      0.00         2
            A:7/5       0.00      0.00      0.00         1
              A:9       0.00      0.00      0.00       

Compare the predicted chords to the actual chords in a dataframe:

In [None]:
# make a dataframe with two columns: the actual and predicted chords
RF_comparison_df = pd.DataFrame({'Actual': RF_y_test_original, 'Predicted': RF_y_pred_original})

# add two more columns, by taking the string before the colon in each column and adding it as a new column
RF_comparison_df['Actual Root'] = RF_comparison_df['Actual'].str.split(':', expand=True)[0]
RF_comparison_df['Predicted Root'] = RF_comparison_df['Predicted'].str.split(':', expand=True)[0]

# display the dataframe
RF_comparison_df

Unnamed: 0,Actual,Predicted,Actual Root,Predicted Root
0,Eb:maj,Eb:maj,Eb,Eb
1,B:min,B:min,B,B
2,A:min,A:min,A,A
3,D:min6,D:1/1,D,D
4,G:min9,G:min,G,G
...,...,...,...,...
12005,G:maj,G:maj,G,G
12006,B:7,B:7,B,B
12007,A:maj,A:maj,A,A
12008,C:maj,C:maj,C,C


In [None]:
# find the percentage of the predicted roots that are correct
print("Root Accuracy:")
print(RF_comparison_df['Actual Root'].eq(RF_comparison_df['Predicted Root']).mean())

Root Accuracy:
0.8574521232306411


### Part 4: Chord Transition Identification and Labeling Using the Model

The RF model was very successful, so we use that instead of the LR model to try and predict the chords for a given song (Here Comes the Sun by The Beatles). 

I wanted to test this model's function on a song of my choice, outside of the dataset. However, I needed a way to find the chord timestamp annotations (explained more below), and I eventually arrived at just using the sonic visualizer to mark these on my own. I know here comes the sun so this was pretty easy to do accurately. 

First, we import the model we want to use using pickle: (this didn't work because my machine ran out of space on the drive every time I tried to save this model since the model takes up more space in saved format). Therefore, I just had to do this step directly after running the RF model above

In [None]:
import pickle

# import the RF model
filename = 'RF_model.sav'

# load the model from disk
model = pickle.load(open(filename, 'rb'))


#### Annotations: 

I wanted to try this model out on songs that I actually know to see if the chords it outputs are accurate and can be used practically. However, after some research, I decided it would be a separate project to develop something to detect *when* the chord changes are, so I decided to annotate the chord changes manually using Sonic Visualizer (the same software that both McGill's dataset and I used to extract the chroma features). Thus, I have to redo the averaged vectors for each time range that a chord is being played in. 

Read in the chord change timings:

In [99]:
def get_chord_changes(file_path):
    # Read chord change points
    chord_changes = pd.read_csv(file_path, header=None, names=['timestamp', ''])

    # drop the second column
    chord_changes = chord_changes.drop(chord_changes.columns[1], axis=1)

    return chord_changes

chord_changes = get_chord_changes('hcs_timings.csv')
chord_changes

Unnamed: 0,timestamp
0,3.501950
1,5.647392
2,7.481610
3,10.919070
4,12.796168
...,...
116,178.682358
117,179.369342
118,180.181814
119,180.826871


Read in the features Dataframe: (all the values are zeros because the song starts and ends with silence and that's all that shows up in the preview)

In [100]:
# Read features data
def get_features(file_path):
    # Read the features data
    features = pd.read_csv(file_path, header=None)

    return features

features = get_features('hcs_features.csv')
features


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,15,16,17,18,19,20,21,22,23,24
0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.046440,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.092880,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.139320,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.185760,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4000,185.759637,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4001,185.806077,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4002,185.852517,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4003,185.898957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Average the vectors for each timeframe and make a new dataframe:

In [101]:
def get_averaged_features(timings, features):
    # Initialize a DataFrame to store the averaged vectors
    averaged_vectors = pd.DataFrame()
    previous_timestamp = 0

    # Iterate through chord change points
    for index, row in chord_changes.iterrows():
        timestamp = row['timestamp']

        # Find the corresponding indices in the features DataFrame
        selected_features = features[
            (features[0].astype(float) >= previous_timestamp) & 
            (features[0].astype(float) <= timestamp)
        ]

        # If features are found for the timestamp range, calculate the mean
        if not selected_features.empty:
            # Calculate the mean of each column (excluding the timestamp column)
            averaged_vector = selected_features.iloc[:, 1:].mean(axis=0)

            # Append the averaged vector to the new DataFrame
            averaged_vectors = pd.concat([averaged_vectors, averaged_vector.to_frame().T], ignore_index=True)

        # Update the previous timestamp
        previous_timestamp = timestamp

    return averaged_vectors

averaged_vectors = get_averaged_features(chord_changes, features)

In [102]:
averaged_vectors

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,15,16,17,18,19,20,21,22,23,24
0,0.305837,0.220323,0.140283,0.050559,0.067687,0.068751,0.077468,0.331604,0.095178,0.126946,...,0.573559,0.013299,1.241581,0.053218,0.056283,1.748938,0.195071,0.264753,0.111941,0.430536
1,0.147062,0.267066,0.300512,0.045796,0.089711,1.055696,0.072961,0.122940,0.048593,0.244235,...,1.152237,0.022119,0.357331,2.891696,0.038202,0.209391,0.029314,2.092565,0.138717,0.467935
2,0.185950,0.168294,0.371642,0.206185,0.137045,0.091321,0.099670,0.755296,0.017308,0.097803,...,2.058543,0.076407,0.241332,1.799922,0.379164,1.691299,0.008413,0.755467,0.104670,1.480187
3,0.270492,0.247323,0.187593,0.112199,0.117829,0.237820,0.140693,0.394209,0.146781,0.120027,...,1.009341,0.031355,1.372785,0.141800,0.076582,1.815634,0.193436,0.231199,0.137206,0.481900
4,0.364036,0.404669,0.314562,0.033203,0.107459,0.960382,0.089106,0.161010,0.057320,0.193089,...,1.343767,0.009796,0.712894,2.106066,0.074584,0.322960,0.038718,2.395824,0.104163,0.586315
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
116,0.071674,0.129855,0.757752,0.128901,0.116807,0.121086,0.216524,1.078237,0.390479,0.074554,...,1.213195,0.262701,0.152785,0.294732,0.162984,2.592958,0.167124,0.669634,0.096723,1.565816
117,0.146788,0.045926,0.156564,1.516276,0.243394,0.047071,0.069705,0.417726,0.264011,0.088127,...,0.230480,2.085351,0.162459,0.288974,0.077296,1.623091,0.271792,0.046922,1.215296,0.150611
118,0.091358,0.022814,0.145600,1.314087,0.225952,0.143281,0.037280,0.042932,0.189198,0.198734,...,0.593788,1.540709,0.311844,1.655614,0.184766,0.761021,0.123983,0.439552,1.249135,0.166130
119,0.012283,0.404819,0.274997,0.737840,0.119020,0.324558,0.033959,0.008117,0.051442,0.191210,...,0.228681,0.963080,0.592186,2.339270,0.011460,0.752252,0.014328,1.325643,0.309356,0.339290


Use these features averaged over the correct time periods to predict the sequence of chords in the song:

In [107]:
def get_chord_predictions(model, averaged_vectors):
    # Make predictions on the averaged vectors
    predictions = model.predict(averaged_vectors)

    # Convert numerical predictions back to chord names
    predicted_chords = label_encoder.inverse_transform(predictions)

    # return it as a df
    return pd.DataFrame({'Predicted': predicted_chords})

predictions_df = get_chord_predictions(RFmodel, averaged_vectors)



In [106]:
predictions_df

Unnamed: 0,Predicted
0,A:maj
1,D:maj
2,E:7
3,A:maj
4,D:maj
...,...
116,E:maj
117,C:maj
118,C:maj
119,D:maj


Function to do all of this for a given song features and timings path:

In [112]:
def predict_chords(timings_path, features_path, model):
    # get the chord changes and features
    chord_changes = get_chord_changes(timings_path)
    features = get_features(features_path)

    # get the averaged features
    averaged_vectors = get_averaged_features(chord_changes, features)

    # get the predictions
    return get_chord_predictions(model, averaged_vectors)

Make Predictions on Dancing in the Moonlight- King Harvest

In [111]:
dml_predictions = predict_chords('dml_timings.csv', 'dml_features.csv', RFmodel)
dml_predictions



Unnamed: 0,Predicted
0,F:maj
1,Bb:maj
2,Eb:maj
3,F:min
4,Bb:maj
...,...
116,E:maj
117,E:maj
118,E:maj
119,E:maj


Make predictions on Vanilla Tobacco- Eloise

In [114]:
vtb_predictions = predict_chords('vtb_timings.csv', 'vtb_features.csv', RFmodel)
vtb_predictions



Unnamed: 0,Predicted
0,G:maj
1,D:maj
2,E:maj
3,E:min
4,E:maj
...,...
101,C:maj
102,D:maj
103,C:maj
104,D:maj


## Discussion and Results:

Overall, I'm pretty satisfied with the outcome of this project. 

The main takeaway that I have right now is that this is truly a resource-intensive project-- compiling the data for, training, and storing these models takes massive amounts of space and is very slow, even when I ran this on my PC. Even trying to save the Random Forest Model on *half* of the original dataset took 20GB. 

Unfortunately, there's not a real way around this-- unlike binary classifiers that are quite easy to run, there are a few hundred chord labels here, so having a huge set is pretty important. 
I tried very hard to work around this by enlisting my GPU for computation in addition to the CPU for model training, but my the setup wasn't working and I figured my time was better spent researching other ways to approach the problem. 

I tried training the LR and RF models on larger portions of the dataset, but this inevitably lead to repeated crashes related memory and storage issues. Therefore, I ended up reducing the set to 50% of its original size, which allowed me to actually train both models (while sacrificing some accuracy). 

The logistic regression model from part 3a performed pretty poorly, only scoring 51% on the weighted f1. However, I made some dataframes comparing the predicted chords with the actual chords, and found that a lot of the time there were just minor errors, such as labeling a C:Maj as a C5 (which is just a Cmaj without the E (the 3)), or would get the root right but just identify another note as part of the chord when it was likely part of the melody. Thus, I decided to test the accuracy of the model's predictions at just guessing the root note of the chord, which the logistic regression scored 80% on (weighted f1). This wasn't bad at all, for a basic model at least. 

The random forest did much better, scoring a 66% on the weighted f1. I did the same method on the RF as I did for the LR to see the accuracy of the model only considering the root notes, and it scored over 85% correct. 

These numbers aren't extremely high, but I'm very satisfied with them given the simplicity of the model and the complexity of this task. The leading research teams in the past 10 years were very excited to get accuracy scores in the mid 70s, even using myriad other techniques to achieve higher accuracy. I looked into trying some of these, but a lot of them are extremely complicated-- such as Hidden Markov Models to incorporate transition probabilties and use the sequence of chords to infer subsequent chords. These are very cool to read about, but actually finding the data for a model like this and implementing it effectively would've been a much larger project than this one. 

I also would've liked to have tried to infer the chord transition times without supervision, but I read into this and also determined it would be a many-hour type of task, so I decided to do the changes manually to test a few songs on my own. 

At the end (Part 4), you can see the functions I wrote to use the model on 3 different songs-- Here Comes the Sun- The beatles, Dancing in the Moonlight- King Harvest, and Vanilla Tobacco- Eloise. I did these in order of increasing difficulty and differences from the songs in the billboard set, and my predictions were correct. Here comes the sun was incredibly accurate-- I was able to play along with the song nearly perfectly using just the outputs from my model (If you look up the chord sheet for this online, it's usually a transposed version in D that is easier for noobie guitar players, so don't compare it with mine. It is in A.). It was also a pretty standard pop song format with familiar instruments, simple bass lines, etc., which made it easier for the model. Dancing in the moonlight was pretty good as well, but was a little worse than Here comes the sun. There were a decent amount of errors from the model, but they were usually errors that were pretty close to the correct chord- either sharing roots or other important tones. Lastly, Vanilla Tobacco was pretty tough. I suspect some of this was due to the inaccuracy generated from me trying to tap my keyboard as the song played so it would perfectly capture only the correct snippets. Furthermore, the song uses a fair amount of unrooted voicings (this essentially means a shape that is harder to accurately recognize for a model like this), and I think the instrumentation probably also made it a bit harder. She also sings loudly and does a lot of scatting which completely throws of the chromagram. The model got a few of the chords right here, but as I suspected, this song pushed its limits. 



Although I didn't get to do any of these on this project since they are much more complex issues, I am interested to try and implement the use of some sort of sequence-based model such as an HMM to help with this, as well as other more advanced pre-processing for more accurate reads. 

Mainly though, this model would be a lot more effective if generated with access to more computing resources-- a much larger dataset or perhaps even genre-specific models would be extremely helpful, and this isn't really possible without massive amounts of ram and stable GPU integration. 

That being said, I am very pleased with how it turned out even given the limited resources I had on this project. 