# Carnatic Beat Detection: what is that beat?


### Overview

This project develops an ML/AI classifier that identifies the Carnatic beat cycle (taalam) of a mridangam solo.

### Background
In South Indian classical music (also called Carnatic music), rhythm is highly developed and sophisticated. Every song follows a beat cycle called a taalam. During a concert the percussionist, who typically plays a hand drum called the mridangam, is given an opportunity to perform a solo that brings out the intricacies of the taalam of the preceding song. A solo can last anywhere from 5 to 10 minutes up to 30 to 40 minutes.


### Goal

Given a clip of a drum solo, identify the taalam (beat cycle) in which it is performed.

While 5 main taalam types are commonly performed (and thousands of taalams are possible in principle), I have restricted the scope of this project to 3 well-known and often-used taalams: aadi taalam (8-beat cycle), mishra-chapu taalam (7-beat cycle) and khanda-chapu taalam (5-beat cycle).

### Data

In [68]:
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import seaborn as sns
import os
import random
import math

from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import PolynomialFeatures, StandardScaler, OneHotEncoder, TargetEncoder, LabelEncoder
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import make_column_transformer, TransformedTargetRegressor, ColumnTransformer
from sklearn.feature_selection import SequentialFeatureSelector, RFE
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression 
from sklearn.metrics import mean_squared_error, accuracy_score

from sklearn.linear_model import LinearRegression, Ridge, Lasso, LogisticRegression, LogisticRegressionCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

import time

In [38]:
# Load the data
beatsDf1 = pd.read_csv("data/beats.csv")
beatsDf1.sample(7)

Unnamed: 0,FileName,Beat,1,2,3,4,5,6,7,8,...,491,492,493,494,495,496,497,498,499,500
7,Aadi-1007,A,0.001736,0.004051,0.010127,0.030382,0.036748,0.050058,0.054977,0.057002,...,,,,,,,,,,
60,Aadi-2Kalai-3,A,0.00407,0.006105,0.010756,0.015407,0.024419,0.033721,0.040116,0.04186,...,,,,,,,,,,
0,Aadi-1000,A,0.002612,0.005514,0.008996,0.013349,0.016831,0.019733,0.023506,0.027858,...,,,,,,,,,,
1,Aadi-1001,A,0.0,0.001448,0.003476,0.005214,0.006952,0.011008,0.014195,0.017961,...,,,,,,,,,,
29,Aadi-1029,A,0.001165,0.002621,0.00495,0.008445,0.011066,0.01456,0.017764,0.021549,...,,,,,,,,,,
55,UKS-KhandaChapu-101,K,0.001181,0.004135,0.009746,0.013881,0.018311,0.024808,0.030715,0.056999,...,,,,,,,,,,
26,Aadi-1026,A,0.00029,0.002902,0.005223,0.007545,0.010157,0.011898,0.01451,0.017992,...,,,,,,,,,,


In [39]:
beatsDf1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Columns: 502 entries, FileName to 500
dtypes: float64(500), object(2)
memory usage: 259.0+ KB


#### Cleanup 1:
We keep the first 250 numeric (time-lapse) features and drop the columns named 251 through 500, which are empty (NaN) for many clips.

In [40]:
# Column names are strings, so build the names '251' .. '500' to drop
featuresToDrop = [str(nn) for nn in range(251, 501)]

In [41]:
# Drop the columns above
beatsDf1 = beatsDf1.drop(featuresToDrop, axis = 1)

In [42]:
beatsDf1.sample(7)

Unnamed: 0,FileName,Beat,1,2,3,4,5,6,7,8,...,241,242,243,244,245,246,247,248,249,250
35,MisraChapu-1001,M,0.00377,0.007831,0.009571,0.011891,0.016241,0.019722,0.022042,0.025522,...,0.845998,0.851798,0.854408,0.857599,0.861079,0.863979,0.866299,0.868039,0.87007,0.87239
58,Aadi-2,A,0.003766,0.014774,0.019409,0.024623,0.030127,0.032735,0.035342,0.037949,...,0.772016,0.77752,0.780417,0.783024,0.785632,0.793453,0.79635,0.800985,0.803882,0.8146
1,Aadi-1001,A,0.0,0.001448,0.003476,0.005214,0.006952,0.011008,0.014195,0.017961,...,0.756083,0.75956,0.761298,0.762746,0.765643,0.76825,0.769988,0.775203,0.77752,0.779258
2,Aadi-1002,A,0.00204,0.00641,0.009324,0.011364,0.013695,0.017191,0.019522,0.025058,...,0.795746,0.79866,0.800699,0.803322,0.806527,0.809441,0.81148,0.813228,0.815268,0.827214
26,Aadi-1026,A,0.00029,0.002902,0.005223,0.007545,0.010157,0.011898,0.01451,0.017992,...,0.813407,0.81776,0.820371,0.824144,0.827046,0.830238,0.83314,0.837783,0.840104,0.842716
62,Aadi-3,A,0.001157,0.003762,0.006655,0.009259,0.011574,0.014178,0.017361,0.023727,...,0.725116,0.729745,0.73206,0.737558,0.740451,0.743345,0.746238,0.748553,0.760706,0.7636
25,Aadi-1025,A,0.00029,0.002613,0.004355,0.006969,0.011034,0.014228,0.016551,0.018873,...,0.796167,0.797909,0.802555,0.807491,0.813589,0.820267,0.8223,0.825203,0.828107,0.830139


In [43]:
beatsDf1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Columns: 252 entries, FileName to 250
dtypes: float64(250), object(2)
memory usage: 130.1+ KB


#### Cleanup 2:
Drop rows that still contain NaNs. These come from very short clips that do not fill all 250 features and probably do not contain sufficient data for analysis. There is no meaningful way to fill in the missing values.

In [44]:
beatsDf = beatsDf1.dropna()

In [45]:
beatsDf.info()

<class 'pandas.core.frame.DataFrame'>
Index: 52 entries, 0 to 64
Columns: 252 entries, FileName to 250
dtypes: float64(250), object(2)
memory usage: 102.8+ KB


In [46]:
beatsDf.sample(5)

Unnamed: 0,FileName,Beat,1,2,3,4,5,6,7,8,...,241,242,243,244,245,246,247,248,249,250
25,Aadi-1025,A,0.00029,0.002613,0.004355,0.006969,0.011034,0.014228,0.016551,0.018873,...,0.796167,0.797909,0.802555,0.807491,0.813589,0.820267,0.8223,0.825203,0.828107,0.830139
11,Aadi-1011,A,0.001448,0.004343,0.008107,0.010133,0.012449,0.015924,0.020556,0.023451,...,0.786624,0.791257,0.794152,0.797047,0.799942,0.802837,0.805443,0.808049,0.812102,0.816734
10,Aadi-1010,A,0.007817,0.013318,0.017661,0.022293,0.02403,0.025767,0.028083,0.030689,...,0.784019,0.787203,0.790677,0.794152,0.797626,0.800521,0.802837,0.804574,0.806891,0.809786
49,Palghat mani iyer Aadi Taalam - 102,A,0.000872,0.00407,0.006977,0.010465,0.014244,0.018314,0.021512,0.024419,...,0.887209,0.890407,0.89186,0.893895,0.896221,0.899128,0.902326,0.905814,0.907849,0.911047
0,Aadi-1000,A,0.002612,0.005514,0.008996,0.013349,0.016831,0.019733,0.023506,0.027858,...,0.799187,0.80325,0.806152,0.807893,0.809344,0.811956,0.814568,0.816599,0.819501,0.820952


In [47]:
# Drop the "FileName" column
beatsDf = beatsDf.drop("FileName", axis = 1)

In [48]:
beatsDf.value_counts('Beat')

Beat
A    36
M    15
K     1
Name: count, dtype: int64

In [49]:
# In the current data set only one 'K' (khanda-chapu) clip remains.
# Drop it so that we have a two-class (binary) classification problem: 'A' vs 'M'.
beatsDf = beatsDf.drop(beatsDf[beatsDf.Beat == 'K'].index)

In [50]:
beatsDf.value_counts('Beat')

Beat
A    36
M    15
Name: count, dtype: int64
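
The class counts above are imbalanced, so it helps to know what a trivial classifier would score before judging any model. The following is a minimal sketch, not part of the original analysis: it rebuilds the labels from the counts shown (36 `A`, 15 `M`) and uses scikit-learn's `DummyClassifier` to always predict the majority class. The names `y_all` and `X_placeholder` are placeholders introduced here for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Labels rebuilt from the value_counts output: 36 'A' clips (0) and 15 'M' clips (1).
y_all = np.array([0] * 36 + [1] * 15)

# DummyClassifier ignores the features, so a single placeholder column is enough.
X_placeholder = np.zeros((len(y_all), 1))

# Always predict the most frequent class seen during fit.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_placeholder, y_all)

baseline_accuracy = baseline.score(X_placeholder, y_all)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

On the full data set this baseline equals 36/51, roughly 0.71, so an accuracy near 70% is about what always answering "Aadi" would achieve.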

### Split data into training and test sets

In [52]:
# Features (independent variables): every column except the label
X = beatsDf.drop(['Beat'], axis = 1)

# target
labelEnc = LabelEncoder()
y = labelEnc.fit_transform(beatsDf['Beat'])

In [53]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
       0, 1, 1, 0, 0, 0, 1])

In [56]:
# Names of the 250 numeric feature columns ('1' .. '250') used by the preprocessing pipeline
numeric_feats = [str(nn) for nn in range(1, 251)]

In [57]:
preprocPipe = ColumnTransformer(
    transformers=[
        ('numeric', StandardScaler(), numeric_feats)
    ])

In [58]:
# Data, split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 32)
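
With only 51 clips and a 36/15 class imbalance, a purely random split can leave very few `M` clips in the test set. One option, shown here as a sketch on synthetic stand-in data rather than the notebook's `X` and `y`, is to pass `stratify=` to `train_test_split` so both subsets keep roughly the same class ratio. `X_demo` and `y_demo` are hypothetical placeholders with the same shape as the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the cleaned data: 51 clips x 250 features.
rng = np.random.default_rng(32)
X_demo = rng.random((51, 250))
y_demo = np.array([0] * 36 + [1] * 15)  # 36 'A' clips, 15 'M' clips

# stratify=y_demo keeps the A/M proportion roughly equal in the train and test subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=32, stratify=y_demo
)

print("train class counts:", np.bincount(y_tr))
print("test class counts: ", np.bincount(y_te))
```

Switching the real split to a stratified one would change which clips land in the test set, so the scores reported in this notebook would change as well.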

### Building a model for each classifier

In [82]:
# Model pipelines
logRegPipe = make_pipeline(preprocPipe, LogisticRegression(max_iter=10000))
decTreePipe = make_pipeline(preprocPipe, DecisionTreeClassifier())
knnPipe = make_pipeline(preprocPipe, KNeighborsClassifier())
svmPipe = make_pipeline(preprocPipe, SVC())
logRegCVPipe = make_pipeline(preprocPipe, LogisticRegressionCV(cv=5, random_state=32, max_iter=10000))

pipelines = [logRegPipe, decTreePipe, knnPipe, svmPipe, logRegCVPipe]

### Model evaluation

In [83]:
# Evaluating the models
model_performance = []

for pipe in pipelines:
    # Start a timer
    start_time = time.time()

    # Fit the pipeline on the training data
    pipe.fit(X_train, y_train)

    # Stop the timer and compute the elapsed fit time
    end_time = time.time()
    fit_time = end_time - start_time

    # Predict on the test set and measure the accuracy
    y_pred = pipe.predict(X_test)
    score = accuracy_score(y_test, y_pred)

    # Class name of the final estimator (pipe[-1] is the public way to reach it)
    modelName = type(pipe[-1]).__name__

    model_performance.append({
        'Model': modelName,
        'Score': score,
        'Time': fit_time
    })

In [84]:

# Dataframe out of the results
performDf = pd.DataFrame(model_performance)

In [85]:
performDf

Unnamed: 0,Model,Score,Time
0,LogisticRegression,0.692308,0.030917
1,DecisionTreeClassifier,0.615385,0.012002
2,KNeighborsClassifier,0.384615,0.006988
3,SVC,0.461538,0.007216
4,LogisticRegressionCV,0.461538,0.262327


In [86]:
pFig1 = px.bar(performDf, x = 'Model', y = 'Score')
pFig1.update_layout(
            title={
            'text' : 'Model Accuracy',
            'x':0.5,
            'xanchor': 'center'
        })
pFig1.show()

In [87]:
pFig2 = px.bar(performDf, x = 'Model', y = 'Time',
                labels = {
                     "Model": "Model",
                     "Time": "Time (seconds)",
                 })

pFig2.update_layout(
            title={
            'text' : 'Model Computation Time',
            'x':0.5,
            'xanchor': 'center'
        })
pFig2.show()

### Evaluation

#### Model Accuracy
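Most models fared poorly: three of the five (KNeighborsClassifier, SVC and LogisticRegressionCV) scored below 50% on the 13-clip test set, which is worse than random chance for a two-class problem. The best result came from <b>Logistic Regression</b>, with approximately 69% accuracy (9 of 13 test clips), followed by the decision tree at about 62%.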
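
Because the test set holds only 13 clips, each clip shifts the accuracy by almost 8 percentage points, so a single split gives a noisy estimate. A common alternative is stratified k-fold cross-validation. The sketch below uses synthetic stand-in data (`X_demo`, `y_demo` are hypothetical placeholders, not the notebook's variables), so the number it prints says nothing about the real clips; it only illustrates the mechanics.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 51 clips x 250 increasing values per row, 36 'A' and 15 'M' labels.
rng = np.random.default_rng(32)
X_demo = np.sort(rng.random((51, 250)), axis=1)
y_demo = np.array([0] * 36 + [1] * 15)

# Same kind of pipeline as in the notebook: scaling followed by logistic regression.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))

# Five stratified folds, so every clip is used for testing exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=32)
scores = cross_val_score(pipe, X_demo, y_demo, cv=cv, scoring="accuracy")

print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} folds")
```

Reporting the mean and spread across folds makes it easier to tell whether one model is really better than another or whether the gap is within the noise.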

#### Computational performance

LogisticRegressionCV took the longest to fit (about 0.26 seconds). Our best performer on accuracy, LogisticRegression, was slower than the remaining classifiers but still acceptably fast at about 0.03 seconds.

### Recommendations

Based on the results of our machine-learning models, considering both model accuracy and computational performance, we recommend <b>Logistic Regression</b> as the best of the models tried here for predicting the beat of a Carnatic drum solo.

### Next Steps

#### Sound Clip Vectorization
At a meta level, is there a better approach to vectorizing a sound clip to select out drum beats, other than intensity analysis? Experts in the domain (music) might have some suggestions about how they discern different beat cycles, which might be translatable into a vectorization technique. 
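
One possible direction, offered only as a sketch: if each feature row holds increasing, normalized strike times (an assumption; the notebook does not state exactly how the columns were derived), the gaps between consecutive strikes may describe the rhythm better than the raw times. The hypothetical helper `inter_onset_features` below turns one row into a tempo-independent histogram of relative gaps, using NumPy only.

```python
import numpy as np

def inter_onset_features(onset_times, n_bins=8):
    """Histogram of gaps between consecutive onsets, relative to the median gap.

    Hypothetical helper for illustration; assumes `onset_times` is increasing.
    """
    onset_times = np.asarray(onset_times, dtype=float)
    intervals = np.diff(onset_times)        # gap between consecutive strikes
    intervals = intervals[intervals > 0]    # ignore repeated or out-of-order times
    if intervals.size == 0:
        return np.zeros(n_bins)
    # Dividing by the median gap removes the dependence on overall tempo.
    relative = intervals / np.median(intervals)
    hist, _ = np.histogram(relative, bins=n_bins, range=(0.0, 4.0))
    return hist / hist.sum()                # normalize so the bins sum to 1

# First eight feature values of clip Aadi-1000, copied from the data preview.
example_row = [0.002612, 0.005514, 0.008996, 0.013349,
               0.016831, 0.019733, 0.023506, 0.027858]
features = inter_onset_features(example_row)
print(features)
```

Whether such features actually separate the taalams would have to be checked against the real clips.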
 
#### Neural Nets
Since we have fairly high-dimensional data to begin with (250 features per clip), a neural-network approach might give better results than the classic ML classification techniques applied here. With only 51 clips, overfitting is a real risk, so more data would likely be needed. This path is worth exploring further.
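
As a starting point, a small neural network can be dropped into the same kind of scikit-learn pipeline. The sketch below is an assumption-laden illustration, not a tested recommendation: it uses `MLPClassifier` with one small hidden layer and strong L2 regularization (`alpha`), and fits on synthetic stand-in data (`X_demo`, `y_demo` are hypothetical placeholders), so its predictions say nothing about the real clips.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the cleaned data: 51 clips x 250 features.
rng = np.random.default_rng(32)
X_demo = rng.random((51, 250))
y_demo = np.array([0] * 36 + [1] * 15)

# One small hidden layer and a large alpha to limit overfitting on so few samples.
mlpPipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), alpha=1.0, max_iter=2000, random_state=32),
)
mlpPipe.fit(X_demo, y_demo)

pred = mlpPipe.predict(X_demo)
print("predicted class counts:", np.bincount(pred, minlength=2))
```

The hidden-layer size and `alpha` here are arbitrary starting values; they would need tuning, ideally with cross-validation, on the real data.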
