# ChorismaAI: Training a Chord Recognition Model

## Introduction to Project Scenario
**Problem Statement**

As avid guitarists, Liam and I love experimenting with new guitar tunings and chords. However, while creating music we sometimes play a chord without knowing its name. For musicians like us, this is a real barrier to using music theory knowledge to build on nice-sounding chords.

**Proposed Solution**

Our team, consisting of Dillon de Silva, Liam Ling, Lachlan Liu and Bernard Tam, has designed the app *Chorisma*, which aims to provide users with a simple and intuitive way to leverage an accurate and expandable ML model for chord recognition. Beyond offering a scalable ML solution, we aim to give musicians the maximum benefit while creating music, making our tool an essential for production.

## Model Training

ChorismaAI takes a raw time-domain audio recording as user input and returns a categorical label naming the chord recognized in the recording. At a high level, we constructed our model by extracting useful features from this signal and applying classical machine learning approaches to obtain a high-accuracy model.

### Feature Extraction

Chords are composed of several musical notes (typically $\ge$ 3 notes). Each note corresponds to a musical pitch and therefore to a vibrational frequency. However, a time-domain signal alone is not directly useful for chord recognition, because the frequencies it contains are not explicit in the raw waveform.

With these facts in mind, we use a Fourier Transform to analyze the spectral content of the signal. From this we obtain the chromagram of the signal, which gives the spectral energy in each of the 12 musical pitch classes. Since different chords have different intensities in each pitch class, we can use the chromagram as the feature set for classification.
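
To make the idea of pitch classes concrete, here is a minimal sketch, using only the Python standard library, of how a frequency maps onto one of the 12 chroma bins. It assumes equal temperament with A4 = 440 Hz and bin 0 = C (the default ordering of librosa's chroma features). The helper `pitch_class` and the example frequencies are illustrative assumptions and are not part of ChorismaAI.

```python
import math

# Pitch-class names, assuming bin 0 = C
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch-class index 0..11 (hypothetical helper)."""
    # MIDI note number under equal temperament, rounded to the nearest semitone
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi_note % 12

# Approximate fundamentals of an A minor triad: A3, C4, E4
a_minor_hz = [220.00, 261.63, 329.63]
a_minor_classes = sorted(PITCH_NAMES[pitch_class(f)] for f in a_minor_hz)
print(a_minor_classes)
```

Notes an octave apart fall into the same bin, which is why a chromagram needs only 12 values regardless of where on the fretboard a chord is played.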

Note: the Short-Time Fourier Transform (STFT) was used, so that the Fourier transform is computed over short windowed frames of the recording rather than over the whole signal at once.
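
As a rough illustration of what windowing means for the feature shape, the sketch below estimates how many STFT frames a recording produces. It assumes librosa's documented defaults (a 22050 Hz sampling rate for `librosa.load` and a hop length of 512 samples) together with centered frames. `n_stft_frames` is a hypothetical helper, and a different configuration would change the numbers.

```python
def n_stft_frames(duration_s, sr=22050, hop_length=512):
    """Estimate the number of centered STFT frames for a clip (hypothetical helper)."""
    n_samples = int(duration_s * sr)
    # With centered (padded) frames, one frame is produced per hop, starting at sample 0
    return 1 + n_samples // hop_length

frames = n_stft_frames(2.0)
print(f"A 2 s clip yields {frames} frames of 12 chroma values each")
```

Under these assumptions the chromagram has shape (12, frames), and `get_chromagram` averages over the frame axis to reduce it to a single 12-value feature vector.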

In [7]:
# Installing dependencies for feature extraction
%pip install pandas librosa numpy tqdm --quiet

# Import modules
from tqdm import tqdm

import librosa
import numpy as np
import os
import pandas as pd
import pickle

# Configuring path to training data
TRAINING_PATH = "Data/Training"


In [10]:
# Getting chromagram from raw audio file
def get_chromagram(raw_audio_path):
    # librosa.load returns the audio time series and its sampling rate
    raw_audio_ts, sr = librosa.load(raw_audio_path)
    # chroma_stft returns a float array of shape (12, n_frames)
    chromagram = librosa.feature.chroma_stft(y=raw_audio_ts, sr=sr, center=True)
    # Average over time frames to get one mean intensity per pitch class (12 values)
    return np.mean(chromagram, axis=1)

# Gets name of chord based on subdirectory folder name
def get_chord_from_subdir(subdir):
    # Use the OS path helpers rather than a hard-coded "/" separator
    return os.path.basename(os.path.normpath(subdir))

In [11]:
# Aggregating our chromagram to chord feature set
chromagram_to_chord_data = []

# Loop through chord directories in training data
for subdir, dirs, files in tqdm(os.walk(TRAINING_PATH)):
    chord_name = get_chord_from_subdir(subdir)
    for file in files:
        raw_audio_path = os.path.join(subdir, file)
        chromagram = get_chromagram(raw_audio_path)
        # Row layout: chord label followed by the 12 mean pitch-class intensities
        feature_data = [chord_name, *chromagram]
        chromagram_to_chord_data.append(feature_data)

9it [00:29,  3.28s/it]


In [12]:
# Exporting to pickled object/csv for training
chromagram_to_chord_df = pd.DataFrame(data=chromagram_to_chord_data)
chromagram_to_chord_df.to_csv('TrainingData.csv')
chromagram_to_chord_df.to_pickle('TrainingData.pkl')

chromagram_to_chord_df.head(n=6)

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Am | 0.544287 | 0.150015 | 0.062872 | 0.291705 | 0.57869 | 0.165433 | 0.073326 | 0.089554 | 0.139484 | 0.314026 | 0.457331 | 0.546481 |
| 1 | Am | 0.477938 | 0.15954 | 0.078124 | 0.217118 | 0.757103 | 0.235772 | 0.070104 | 0.206644 | 0.27098 | 0.513716 | 0.444653 | 0.706167 |
| 2 | Am | 0.29792 | 0.117693 | 0.047192 | 0.235275 | 0.913052 | 0.269611 | 0.079305 | 0.302352 | 0.216573 | 0.346569 | 0.235025 | 0.291966 |
| 3 | Am | 0.34331 | 0.127324 | 0.049306 | 0.214362 | 0.883645 | 0.234823 | 0.086475 | 0.29938 | 0.235693 | 0.430943 | 0.284206 | 0.359154 |
| 4 | Am | 0.416165 | 0.251908 | 0.26586 | 0.153626 | 0.819963 | 0.738397 | 0.08305 | 0.099816 | 0.180761 | 0.394568 | 0.429308 | 0.353965 |
| 5 | Am | 0.086091 | 0.148181 | 0.082484 | 0.203551 | 0.734694 | 0.381427 | 0.066975 | 0.080422 | 0.164592 | 0.461862 | 0.269435 | 0.110513 |


### Initial Model Architecture Choice: Random Forest Classifier

Currently, ChorismaAI uses only a Random Forest Classifier, chosen for its simplicity of implementation and reliable performance on simple classification problems with annotated datasets. In future, however, Audio Spectrogram Transformers (ASTs) could be interesting to explore, as they are among the state-of-the-art approaches for audio ML tasks.

Training was performed using the pickled training data object from feature extraction. We used the default hyperparameters of scikit-learn's RandomForestClassifier with an 80/20 train-test split (the familiar 80/20 ratio of the Pareto principle).
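
One thing a plain random 80/20 split does not guarantee is that every chord appears in both partitions in similar proportions. The sketch below shows the idea of a stratified split in plain Python. `stratified_split` and the toy labels are hypothetical and only illustrate the concept; in scikit-learn, the comparable option is the `stratify` argument of `train_test_split`.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), holding out roughly test_size of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one test sample per class
        n_test = max(1, round(len(indices) * test_size))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels: 10 recordings each of two chords
toy_labels = ["Am"] * 10 + ["Bb"] * 10
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))
```

Whether stratification matters here depends on how balanced the recordings are across chords, which this notebook does not report.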

In [21]:
# Installing dependencies for model training
# (the PyPI package is named scikit-learn; it is imported as sklearn)
%pip install scikit-learn --quiet

# Import modules
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

import pandas as pd
import numpy as np


In [22]:
# Initializing random forest classifier (with default hyperparameters) 
# and performing model training
rf_clf = RandomForestClassifier(
    random_state=44
)

train_data = pd.read_pickle('TrainingData.pkl').reset_index(drop=True)

Y = train_data.iloc[0:, 0]
X = train_data.iloc[0:, 1:]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

rf_clf.fit(X_train, y_train)

In [23]:
# score() on a classifier returns mean accuracy, not R-squared
accuracy = rf_clf.score(X_test, y_test)
print(f"Accuracy on Test Split: {accuracy}")

Accuracy on Test Split: 0.96875
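
For a scikit-learn classifier, `score` reports mean accuracy: the fraction of test samples whose predicted label matches the true label. A minimal sketch of that computation follows. The labels are made up for illustration and are not the project's actual test set; a value of 0.96875 would arise, for example, from 31 correct predictions out of 32, although the real test set size is not stated here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical example: 32 test chords with a single misclassification
y_true = ["Am"] * 16 + ["Bb"] * 16
y_pred = ["Am"] * 16 + ["Bb"] * 15 + ["Am"]
print(accuracy(y_true, y_pred))
```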


Our reported test accuracy is $\approx 97\%$. Whilst this is a satisfactory result, let's also write an interactive set of code blocks that allow us to experiment with our own guitar recordings and "spot-check" whether the model is working as expected.

In [24]:
# -- MODIFY THIS ---
MY_CHORD_PATH = 'Data/Experimental/Random_Bb_Chord.wav' # Put the path to your recorded chord file here

In [26]:
# -- DO NOT CHANGE THIS ---
# Run this block to find predicted chord based on your file
my_chord = get_chromagram(MY_CHORD_PATH).reshape(1, -1)
pred = rf_clf.predict(my_chord)
print(f"PREDICTED CHORD: {pred}")

PREDICTED CHORD: ['Bb']
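
A single predicted label hides how confident the forest is. scikit-learn's random forest exposes `predict_proba` and `classes_`, and the sketch below shows how such output could be turned into a ranked list of candidate chords. The chord names and probability values are invented for illustration, and `top_k_chords` is a hypothetical helper.

```python
def top_k_chords(classes, probabilities, k=3):
    """Return the k most probable (chord, probability) pairs, highest first."""
    ranked = sorted(zip(classes, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up values standing in for rf_clf.classes_ and rf_clf.predict_proba(my_chord)[0]
classes = ["Am", "Bb", "Bdim", "C"]
probabilities = [0.05, 0.80, 0.10, 0.05]

for chord, prob in top_k_chords(classes, probabilities, k=2):
    print(f"{chord}: {prob:.2f}")
```

Showing the runner-up chords can help when spot-checking recordings that fall between two similar-sounding chords.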
