# Carnatic Beat Detection: what is that beat?


### Overview

This project develops an ML/AI classifier that identifies the Carnatic beat cycle (taalam) of a mridangam solo.

### Background
In South Indian classical music (also called Carnatic music), the concept of rhythm is highly developed and sophisticated. Every song follows a beat cycle called a taalam. During a concert the percussionist, who typically plays a hand drum called the mridangam, is given an opportunity to perform a drum solo that brings out the intricacies of the taalam of the preceding song. The solo can last anywhere from 5 to 10 minutes to as long as 30 to 40 minutes.


### Goal

Given a clip of a drum solo, identify the taalam (beat cycle) in which it is performed.

While there are 5 main taalam types that are commonly performed (and in principle thousands of possible taalams), I have restricted the scope of this project to 3 well-known and often-used taalams: aadi taalam (8-beat cycle), mishra-chapu taalam (7-beat cycle), and khanda-chapu taalam (5-beat cycle).

### Data

In [88]:
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import seaborn as sns
import os
import random
import math

from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import PolynomialFeatures, StandardScaler, OneHotEncoder, TargetEncoder, LabelEncoder
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import make_column_transformer, TransformedTargetRegressor, ColumnTransformer
from sklearn.feature_selection import SequentialFeatureSelector, RFE
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression 
from sklearn.metrics import mean_squared_error, accuracy_score

from sklearn.linear_model import LinearRegression, Ridge, Lasso, LogisticRegression, LogisticRegressionCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

import time

In [89]:
# Load the data
beatsDf1 = pd.read_csv("data/beats.csv")
beatsDf1.sample(7)

Unnamed: 0,FileName,Beat,1,2,3,4,5,6,7,8,...,491,492,493,494,495,496,497,498,499,500
70,MisraChapu-1018,M,0.000291,0.003781,0.006108,0.007853,0.011344,0.014834,0.017743,0.026178,...,,,,,,,,,,
45,MisraChapu-1011,M,0.003187,0.006373,0.00956,0.011877,0.016512,0.01883,0.020568,0.029258,...,,,,,,,,,,
3,Aadi-1003,A,0.003196,0.006392,0.009587,0.028181,0.031668,0.035445,0.03864,0.04968,...,,,,,,,,,,
40,MisraChapu-1006,M,0.003472,0.005787,0.007234,0.010417,0.0136,0.016204,0.019097,0.021991,...,,,,,,,,,,
58,Aadi-2,A,0.003766,0.014774,0.019409,0.024623,0.030127,0.032735,0.035342,0.037949,...,,,,,,,,,,
56,UKS-KhandaChapu-102,K,0.0,0.00232,0.00493,0.009861,0.015371,0.020302,0.024942,0.030742,...,,,,,,,,,,
54,Palghat raghu misra chapu - 103,M,0.001448,0.004922,0.011581,0.014765,0.01824,0.025188,0.028662,0.031847,...,,,,,,,,,,


In [90]:
beatsDf1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 78 entries, 0 to 77
Columns: 502 entries, FileName to 500
dtypes: float64(500), object(2)
memory usage: 306.0+ KB


#### Cleanup 1:
We will keep the first 250 numeric (time-lapse) features and drop the columns named 251 through 500.

In [91]:
# We will drop features with names 251 -- 500
featuresToDrop = [str(nn) for nn in range(251, 501)]

In [92]:
# Drop the columns above
beatsDf1 = beatsDf1.drop(featuresToDrop, axis = 1)

In [93]:
beatsDf1.sample(7)

Unnamed: 0,FileName,Beat,1,2,3,4,5,6,7,8,...,241,242,243,244,245,246,247,248,249,250
20,Aadi-1020,A,0.002031,0.005514,0.007835,0.012478,0.018862,0.020894,0.022635,0.027568,...,,,,,,,,,,
55,UKS-KhandaChapu-101,K,0.001181,0.004135,0.009746,0.013881,0.018311,0.024808,0.030715,0.056999,...,,,,,,,,,,
47,MisraChapu-1013,M,0.009059,0.016949,0.02367,0.031268,0.038574,0.04588,0.049971,0.053185,...,0.789889,0.793103,0.797779,0.800994,0.802747,0.805961,0.808884,0.812098,0.814144,0.815605
66,MisraChapu-1014,M,0.003198,0.006977,0.009012,0.011919,0.015988,0.020058,0.024419,0.032558,...,0.846802,0.848837,0.852035,0.85436,0.85814,0.861337,0.863372,0.865407,0.867733,0.871221
3,Aadi-1003,A,0.003196,0.006392,0.009587,0.028181,0.031668,0.035445,0.03864,0.04968,...,0.938698,0.941894,0.944509,0.946833,0.948286,0.949739,0.951772,0.957873,0.96165,0.964265
0,Aadi-1000,A,0.002612,0.005514,0.008996,0.013349,0.016831,0.019733,0.023506,0.027858,...,0.799187,0.80325,0.806152,0.807893,0.809344,0.811956,0.814568,0.816599,0.819501,0.820952
29,Aadi-1029,A,0.001165,0.002621,0.00495,0.008445,0.011066,0.01456,0.017764,0.021549,...,0.72452,0.728014,0.730635,0.735585,0.739662,0.741701,0.74403,0.747816,0.75131,0.754805


In [94]:
beatsDf1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 78 entries, 0 to 77
Columns: 252 entries, FileName to 250
dtypes: float64(250), object(2)
memory usage: 153.7+ KB


#### Cleanup 2:
Drop rows with NaNs. These come from very short clips that probably do not contain enough data for analysis, and there is no meaningful way to fill in the missing values.

In [95]:
beatsDf = beatsDf1.dropna()
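
Before dropping rows, it can be useful to see how many clips of each taalam would be lost. The following is a minimal sketch on a tiny synthetic frame; `demo_df` and its values are hypothetical stand-ins for `beatsDf1`, not the real data.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for beatsDf1 (hypothetical values, not the real data)
demo_df = pd.DataFrame({
    "Beat": ["A", "A", "M", "K", "K"],
    "1": [0.01, 0.02, 0.01, 0.00, 0.01],
    "2": [0.03, np.nan, 0.02, np.nan, np.nan],
})

# Flag rows that contain at least one NaN in any column
has_nan = demo_df.isna().any(axis=1)

# Count, per beat label, how many rows dropna() would remove and keep
rows_lost = has_nan.groupby(demo_df["Beat"]).sum()
rows_kept = (~has_nan).groupby(demo_df["Beat"]).sum()
print(pd.DataFrame({"lost": rows_lost, "kept": rows_kept}))

# The rows that survive the cleanup
complete_df = demo_df.dropna()
```

Running the same grouping on `beatsDf1` would show which taalam loses the most clips to this cleanup step.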

In [96]:
beatsDf.info()

<class 'pandas.core.frame.DataFrame'>
Index: 63 entries, 0 to 77
Columns: 252 entries, FileName to 250
dtypes: float64(250), object(2)
memory usage: 124.5+ KB


In [97]:
beatsDf.sample(5)

Unnamed: 0,FileName,Beat,1,2,3,4,5,6,7,8,...,241,242,243,244,245,246,247,248,249,250
36,MisraChapu-1002,M,0.002024,0.008965,0.011278,0.014748,0.020532,0.023713,0.02834,0.030943,...,0.91845,0.92192,0.924234,0.926836,0.930596,0.93262,0.93609,0.93956,0.941874,0.945633
76,MisraChapu-1024,M,0.018841,0.023188,0.025797,0.028696,0.031594,0.035942,0.037681,0.04058,...,0.793333,0.795652,0.79971,0.801739,0.803188,0.805797,0.810725,0.814203,0.821739,0.823478
53,Palghat raghu misra chapu - 102,M,0.004345,0.006373,0.00956,0.013326,0.021147,0.023465,0.027231,0.033604,...,0.795191,0.797798,0.800695,0.803592,0.806199,0.809096,0.811414,0.813152,0.816049,0.818366
32,Aadi-1032,A,0.000958,0.00447,0.006386,0.009898,0.013729,0.016922,0.020115,0.023308,...,0.856641,0.859834,0.863346,0.866858,0.869413,0.873244,0.875798,0.878991,0.882822,0.884738
40,MisraChapu-1006,M,0.003472,0.005787,0.007234,0.010417,0.0136,0.016204,0.019097,0.021991,...,0.678819,0.681713,0.684028,0.6875,0.690104,0.693576,0.699653,0.703993,0.705729,0.70978


In [98]:
# Drop the "FileName" column
beatsDf = beatsDf.drop("FileName", axis = 1)

In [99]:
beatsDf.value_counts('Beat')

Beat
A    36
M    26
K     1
Name: count, dtype: int64

In [100]:
# After cleanup, only one 'K' (khanda-chapu) clip remains.
# Drop it so that this becomes a binary classification problem.
beatsDf = beatsDf.drop(beatsDf[beatsDf.Beat == 'K'].index)

In [101]:
beatsDf.value_counts('Beat')

Beat
A    36
M    26
Name: count, dtype: int64

### Split data into training and test sets

In [102]:
# Features (independent variables)
X = beatsDf.drop(['Beat'], axis = 1)

# Target (dependent variable), encoded as integers
labelEnc = LabelEncoder()
y = labelEnc.fit_transform(beatsDf['Beat'])

In [103]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
       0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [104]:
# Names of the numeric feature columns ('1' through '250') for the preprocessing pipeline
numeric_feats = [str(nn) for nn in range(1, 251)]

In [105]:
preprocPipe = ColumnTransformer(
    transformers=[
        ('numeric', StandardScaler(), numeric_feats)
    ])

In [114]:
rand_state = 44
# Data, split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = rand_state)
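
The split above does not pass `stratify`, so with only 62 clips the class proportions in the test set can drift from those of the full data. The following is a hedged sketch of a stratified split. `X_demo` and `y_demo` are synthetic stand-ins with the same class counts (36 and 26) as the notebook, not the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 62 clips x 250 features, 36 of class 0 ('A') and 26 of class 1 ('M')
rng = np.random.default_rng(44)
X_demo = rng.random((62, 250))
y_demo = np.array([0] * 36 + [1] * 26)

# stratify=y_demo keeps the class ratio roughly equal in the train and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=44, stratify=y_demo
)

# Class counts in each split
print("train:", np.bincount(y_tr), "test:", np.bincount(y_te))
```

Applying this to the real data would change which clips land in the test set, so the scores reported below would not be directly comparable.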

### Building a model for each classifier

In [116]:
# Model pipelines
logRegPipe = make_pipeline(preprocPipe, LogisticRegression(max_iter=10000, random_state = rand_state))
decTreePipe = make_pipeline(preprocPipe, DecisionTreeClassifier(random_state = rand_state))
knnPipe = make_pipeline(preprocPipe, KNeighborsClassifier())
svmPipe = make_pipeline(preprocPipe, SVC(random_state = rand_state))
logRegCVPipe = make_pipeline(preprocPipe, LogisticRegressionCV(cv=5, random_state = rand_state, max_iter=10000))

pipelines = [logRegPipe, decTreePipe, knnPipe, svmPipe, logRegCVPipe]

### Model evaluation

In [117]:
# Evaluating the models
model_performance = []

for pipe in pipelines:
    # Start a timer
    start_time = time.time()

    # Fit the pipeline on the training data
    pipe.fit(X_train, y_train)

    # Stop the timer and compute the elapsed fit time
    end_time = time.time()
    fit_time = end_time - start_time

    # Predict on the test set and measure accuracy
    y_pred = pipe.predict(X_test)
    score = accuracy_score(y_test, y_pred)

    # Name of the final estimator (public indexing instead of the private _final_estimator)
    modelName = type(pipe[-1]).__name__

    model_performance.append({
        'Model': modelName,
        'Score': score,
        'Time': fit_time
    })
        

In [118]:

# Dataframe out of the results
performDf = pd.DataFrame(model_performance)

In [119]:
performDf

Unnamed: 0,Model,Score,Time
0,LogisticRegression,0.6875,0.026623
1,DecisionTreeClassifier,0.6875,0.017951
2,KNeighborsClassifier,0.625,0.007979
3,SVC,0.875,0.006981
4,LogisticRegressionCV,0.625,0.505644


In [120]:
pFig1 = px.bar(performDf, x = 'Model', y = 'Score')
pFig1.update_layout(
            title={
            'text' : 'Model Accuracy',
            'x':0.5,
            'xanchor': 'center'
        })
pFig1.show()

In [121]:
pFig2 = px.bar(performDf, x = 'Model', y = 'Time',
                labels = {
                     "Model": "Model",
                     "Time": "Time (seconds)",
                 })

pFig2.update_layout(
            title={
            'text' : 'Model Computation Time',
            'x':0.5,
            'xanchor': 'center'
        })
pFig2.show()

### Evaluation

#### Model Accuracy
All models fared better than random chance, but most were not very impressive, scoring between 62.5% and 68.75% on the 16-clip test set. The exception was <b>Support Vector Classifier (SVC)</b>, which scored 87.5%!
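
With 36 aadi and 26 mishra-chapu clips, always guessing the majority class is a stronger reference point than a coin flip. The following is a minimal sketch of such a baseline using scikit-learn's `DummyClassifier`. The label arrays are hypothetical (one possible 46/16 split), not the notebook's actual `y_train` and `y_test`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical label split (assumption): 27 'A' + 19 'M' for training, 9 'A' + 7 'M' for testing
y_train_demo = np.array([0] * 27 + [1] * 19)
y_test_demo = np.array([0] * 9 + [1] * 7)

# DummyClassifier ignores the features, so placeholder columns are enough
X_train_demo = np.zeros((len(y_train_demo), 1))
X_test_demo = np.zeros((len(y_test_demo), 1))

# Always predict the most frequent class seen during training
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train_demo, y_train_demo)

baseline_acc = accuracy_score(y_test_demo, baseline.predict(X_test_demo))
print(f"Majority-class baseline accuracy: {baseline_acc:.4f}")
```

Fitting the same baseline on the real `X_train`, `y_train` and scoring it on `X_test`, `y_test` would give the number that the models in the table need to beat.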

#### Computational performance

LogisticRegressionCV took by far the longest to fit (about 0.5 seconds, versus under 0.03 seconds for each of the others). Our best performer on accuracy, SVC, was also the fastest.

#### Caveat
The above analysis was done with a very small dataset (62 clips after cleanup, 16 of them in the test set)! The results might change as more data is generated and brought into the analysis.
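
One way to make the estimate less dependent on a single 16-clip test set is stratified k-fold cross-validation, which scores the model on several different held-out folds. The sketch below uses synthetic random features as a stand-in for `X` and `y` (an assumption for illustration), so its scores say nothing about the real data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the notebook's shape and class counts (not the real data)
rng = np.random.default_rng(44)
X_demo = rng.random((62, 250))
y_demo = np.array([0] * 36 + [1] * 26)

# Scaling lives inside the pipeline, so it is re-fit on each training fold only
svc_pipe = make_pipeline(StandardScaler(), SVC(random_state=44))

# 5 stratified folds: every clip is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=44)
scores = cross_val_score(svc_pipe, X_demo, y_demo, cv=cv, scoring="accuracy")

print(f"Fold accuracies: {np.round(scores, 3)}")
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```

The spread across folds gives a rough sense of how much a single train/test split can move the reported accuracy.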

### Recommendations

Based on the results of our Machine Learning models, and considering both Model Accuracy and Computational Performance, we can recommend <b>SVC</b> as a promising model for identifying the beat of a Carnatic drum solo.

### Next Steps

#### Data, data, data
There is an old adage that the 3 most important aspects in real estate are location, location, location. One could say that in an AI/ML project, the 3 most important aspects are data, data, data! Without a lot of high-quality data, the results of any analysis have to be taken with a grain of salt. We need **a lot** more data to analyze before we can confidently assert that we have built a reliable AI model to identify a Carnatic beat cycle.

#### Sound Clip Vectorization
At a meta level, is there a better approach to vectorizing a sound clip to select out drum beats, other than intensity analysis? Experts in the domain (music) might have some suggestions about how they discern different beat cycles, which might be translatable into a vectorization technique. 
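
As one small illustration of an alternative representation, the sketch below converts a sequence of beat times into tempo-independent inter-onset intervals. It assumes (this is not confirmed by the data description) that the numeric columns hold cumulative, normalized onset times. The `onset_times` values are hypothetical.

```python
import numpy as np

# Hypothetical clip: cumulative onset times on a normalized 0-to-1 time axis
onset_times = np.array([0.00, 0.05, 0.10, 0.20, 0.25, 0.30, 0.40])

# Inter-onset intervals: the gap between each stroke and the next
intervals = np.diff(onset_times)

# Divide by the median gap so the pattern no longer depends on tempo or clip length
rel_intervals = intervals / np.median(intervals)

print(np.round(rel_intervals, 2))
```

Relative intervals describe the rhythmic pattern itself rather than where in the clip each stroke falls, which may or may not suit the classifiers used here.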
 
#### Neural Nets
Since we have fairly high-dimensional data to begin with (250 features per clip), a Neural Net approach might give better results than the classic ML classification techniques that we have applied here. This path is definitely worth exploring further.
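
A first experiment could reuse the same scikit-learn pipeline pattern with `MLPClassifier`. This is only a sketch: the features are synthetic random numbers, and the layer size and `alpha` are arbitrary assumptions rather than tuned values.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the notebook's shape and class counts (not the real data)
rng = np.random.default_rng(44)
X_demo = rng.random((62, 250))
y_demo = np.array([0] * 36 + [1] * 26)

# One small hidden layer; alpha adds L2 regularization, which matters with so few clips
mlp_pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), alpha=1e-2, max_iter=2000, random_state=44),
)
mlp_pipe.fit(X_demo, y_demo)

# Training accuracy only; on 62 clips this says little about generalization
train_acc = mlp_pipe.score(X_demo, y_demo)
print(f"Training accuracy: {train_acc:.3f}")
```

With a dataset this small, a network can memorize the training clips, so any comparison against SVC should be made on held-out data.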
