# Melodic Pattern Recognition in Carnatic Music

In [1]:
%load_ext autoreload

%autoreload 2

Melodic patterns (motifs and phrases), known as sañcāras, play a crucial structural and expressive role in Carnatic Music. These melodic patterns are the means through which the character of the rāga is expressed, and they form the basis of various improvisatory and compositional formats in the style [2,9]. There exists no definitive list of all possible sañcāras in each rāga; rather, the body of existing compositions and the living oral tradition of rāga performance act as repositories for that knowledge.

This notebook will walk through the task of identifying and annotating sañcāras from audio using the [compIAM tools repository](https://github.com/MTG/compIAM).

The methodology follows that presented in [_Nuttall, Thomas, Genís Plaja-Roglans, Lara Pearson, and Xavier Sierra. "In search of Sañcāras: tradition-informed repeated melodic pattern recognition in carnatic music." (2022)._](https://repositori.upf.edu/handle/10230/54155)

In [2]:
from compiam import load_model
from compiam.visualisation.waveform_player import Player
from compiam.melody.pattern.sancara_search.extraction.evaluation import load_annotations_brindha, to_aeneas

## 1. Performance

As an example, we use a performance of insert_raga_here by insert_performer_here, in the rāga insert_raga_here, taken from the Saraga dataset.

In [3]:
audio_path = "/Volumes/Shruti/asplab2/cae-invar/audio/multitrack/Sharanu Janakana.mp3"
title = 'Sharanu Janakana'

In [16]:
Player(title, audio_path)


In [5]:
# Maybe include: Source separation?

# TODO, for each stage include a visualisation [VIS]
# [VIS] Audio waveform and playback

## 2. Feature Extraction

Two feature sets are extracted from each recording in SCV: (1) an automated transcription of the predominant sung pitch, in cents, from which we derive a mask marking silent and stable regions, and (2) transformation-invariant melodic features, extracted with a Complex Autoencoder (CAE) from a CQT representation of the audio, which we use to compute self-similarity.
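
To make the unit concrete, the sketch below converts a pitch value in Hz to cents relative to a reference frequency. The function name `hz_to_cents`, the `tonic_hz` argument and the example values are assumptions made for illustration; they are not part of compIAM. The sketch also assumes that unvoiced frames are reported as 0 Hz.

```python
import numpy as np

def hz_to_cents(pitch_hz, tonic_hz):
    """Convert pitch in Hz to cents above tonic_hz; frames at 0 Hz become NaN."""
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    cents = np.full(pitch_hz.shape, np.nan)
    voiced = pitch_hz > 0
    # 1200 cents per octave, i.e. per doubling of frequency
    cents[voiced] = 1200.0 * np.log2(pitch_hz[voiced] / tonic_hz)
    return cents

# Hypothetical values: a 200 Hz tonic and four frames, one of them silent
example_hz = np.array([200.0, 400.0, 0.0, 300.0])
example_cents = hz_to_cents(example_hz, tonic_hz=200.0)
```

Working in cents relative to the tonic makes pitch intervals comparable across performers who sing at different absolute pitches.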

### 2.1 Predominant Pitch Track

Load the FTA-Net predominant pitch extraction model, trained on a Carnatic dataset.

In [6]:
ftanet = load_model('melody:ftanet-carnatic')


Extract the predominant pitch track.

In [7]:
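# Column 0 of the output holds timestamps, column 1 holds pitch values
pitch_track = ftanet.predict(audio_path)
time = pitch_track[:, 0]
pitch = pitch_track[:, 1]
# Time between consecutive pitch frames (assumes a uniform hop)
timestep = time[1] - time[0]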


In [8]:
# [VIS] Pitch track + waveform + audio playback

# Pitch distribution using pypeaks

Relative to other musical styles, Carnatic Music is heavily ornamented. This ornamentation is not superficial decoration but is integral to musical meaning [2]. The ornaments, known as gamakas, can greatly alter the sound of the notated svaras (notes); for example, some gamakas do not rest at all on the theoretical pitch of the notated svara, and instead involve oscillations between pitches on either side of it [3,4]. This oscillatory movement is particularly characteristic of the style, and can often subsume individual svaras [4]. The surface effect on the melodic line is that it typically has fewer stable pitch regions than many other styles. Such qualities make it difficult for researchers who are not themselves Carnatic musicians to reliably identify svaras from audio recordings.

#### 2.1.1 Silence and Stability

In [10]:
# Stability Track
from compiam.utils.pitch import extract_stability_mask
import numpy as np

stability_mask = extract_stability_mask(
    pitch=pitch, 
    min_stab_sec=1.0, 
    hop_sec=0.2,
    var=60,
    timestep=timestep)

# Silence Mask
silence_mask = pitch==0

exclusion_mask = np.logical_or(silence_mask==1, stability_mask==1)
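
For intuition about what a stability mask captures, here is a minimal NumPy sketch. It is not compIAM's implementation of `extract_stability_mask`. It assumes one plausible reading of the parameters: a window of `min_stab_sec` seconds is moved in steps of `hop_sec` seconds, and a window counts as stable when every frame lies within `max_dev` of the window mean. The name `stability_mask_sketch`, the `max_dev` argument and the toy signal are all hypothetical.

```python
import numpy as np

def stability_mask_sketch(pitch_cents, min_stab_sec, hop_sec, max_dev, timestep):
    """Mark frames (1) that fall inside at least one window of near-constant pitch."""
    pitch_cents = np.asarray(pitch_cents, dtype=float)
    win = max(1, int(round(min_stab_sec / timestep)))  # window length in frames
    hop = max(1, int(round(hop_sec / timestep)))       # hop between windows in frames
    mask = np.zeros(len(pitch_cents), dtype=int)
    for start in range(0, len(pitch_cents) - win + 1, hop):
        window = pitch_cents[start:start + win]
        # Stable if no frame deviates from the window mean by more than max_dev
        if np.all(np.abs(window - window.mean()) <= max_dev):
            mask[start:start + win] = 1
    return mask

# Toy signal: 2 s of a held note followed by 2 s of a wide 3 Hz oscillation
step = 0.01
frames = np.arange(400)
toy_pitch = np.where(
    frames < 200,
    700.0,
    700.0 + 200.0 * np.sin(2 * np.pi * 3 * frames * step),
)
toy_mask = stability_mask_sketch(
    toy_pitch, min_stab_sec=1.0, hop_sec=0.2, max_dev=60, timestep=step
)
```

In this toy signal only the held note is flagged, which is the kind of region the exclusion mask is meant to keep out of the pattern search.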

#### 2.1.2 Visualising Pitch

In [11]:
# Pitch curve
# Load expected svaras for this rāga
# Can you see more peaks than svaras?
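
One way to explore the question of peaks versus svaras is to histogram the pitch values and count the local maxima. The sketch below is a NumPy-only stand-in and does not use the pypeaks API; `pitch_histogram_peaks`, `bin_width`, `min_fraction` and the toy data are assumptions made for illustration.

```python
import numpy as np

def pitch_histogram_peaks(pitch_cents, bin_width=10, min_fraction=0.05):
    """Histogram voiced pitch values and return the bin centres that are local maxima."""
    voiced = np.asarray(pitch_cents, dtype=float)
    voiced = voiced[~np.isnan(voiced)]  # drop unvoiced frames
    edges = np.arange(voiced.min(), voiced.max() + 2 * bin_width, bin_width)
    counts, edges = np.histogram(voiced, bins=edges)
    centres = (edges[:-1] + edges[1:]) / 2
    # Pad with zeros so the first and last bins can also be peaks
    padded = np.concatenate(([0], counts, [0]))
    is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] >= padded[2:])
    # Ignore peaks that hold only a small share of the frames
    is_peak &= counts >= min_fraction * counts.sum()
    return centres[is_peak], counts[is_peak]

# Toy data: frames concentrated at three pitch values, plus one unvoiced frame
toy_cents = np.concatenate(
    [np.full(50, 0.0), np.full(30, 204.0), np.full(40, 702.0), [np.nan]]
)
peak_centres, peak_counts = pitch_histogram_peaks(toy_cents)
```

On real Carnatic pitch tracks, gamakas spread energy between the theoretical svara positions, so the number and location of peaks need not match the svaras of the rāga.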

### 2.2 Melodic Feature Embeddings

In [12]:
# Pattern Extraction for a Given Audio
from compiam import load_model

# Feature Extraction
# CAE features 
cae = load_model("melody:cae-carnatic")

ampl, phase = cae.extract_features(audio_path)

# [VIS] Feature activation


#### 2.2.1 Self Similarity Matrix

In [13]:
# [VIS] Stability and silence annotation

# Self Similarity
from compiam.melody.pattern import self_similarity

ss = self_similarity(ampl, exclusion_mask=exclusion_mask, timestep=timestep, hop_length=cae.hop_length, sr=cae.sr)
X, orig_sparse_lookup, sparse_orig_lookup, boundaries_orig, boundaries_sparse = ss
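
The idea of a self-similarity matrix computed only over non-excluded frames can be sketched with plain NumPy. This is an illustration under stated assumptions, not the code behind `self_similarity`: it uses cosine similarity, assumes the mask and the features share the same time resolution, and returns a single lookup from reduced indices back to original frames. All names are hypothetical.

```python
import numpy as np

def masked_cosine_self_similarity(features, exclusion_mask):
    """Cosine self-similarity over the frames that are not excluded.

    features: array of shape (n_frames, n_dims)
    exclusion_mask: boolean array of shape (n_frames,), True means drop the frame
    """
    features = np.asarray(features, dtype=float)
    keep = ~np.asarray(exclusion_mask, dtype=bool)
    sparse_to_orig = np.flatnonzero(keep)    # reduced index -> original frame index
    kept = features[keep]
    norms = np.linalg.norm(kept, axis=1, keepdims=True)
    unit = kept / np.maximum(norms, 1e-12)   # guard against all-zero frames
    return unit @ unit.T, sparse_to_orig

# Toy features: frame 2 is excluded; frames 0 and 3 point in the same direction
toy_features = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [2.0, 0.0]])
toy_exclusion = np.array([False, False, True, False])
toy_sim, toy_lookup = masked_cosine_self_similarity(toy_features, toy_exclusion)
```

Dropping excluded frames before computing similarity keeps the matrix smaller, and the lookup allows positions in the reduced matrix to be mapped back to times in the recording.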

In [14]:
# [VIS] Self similarity matrix
# [VIS] Self similarity matrix with boundaries annotated
# [VIS] Self similarity matrix with silent and stable regions fully annotated

# Segment Extraction

# Segment Grouping

## 3. Melodic Pattern Recognition

Another important feature of the style is the structural and expressive significance of motifs and phrases known as sañcāras, which can be defined as coherent segments of melodic movement that follow the grammar of the rāga (melodic framework) [2,8]. Some musicians use the term in a narrower sense to refer to phrases that are particularly characteristic of the rāga, but here we use the term in its broader sense, referring to any melodic segment that is coherent from a Carnatic performer’s perspective. These melodic patterns are the means through which the character of the rāga is expressed, and they form the basis of various improvisatory and compositional formats in the style [2,9]. There exists no definitive list of all possible sañcāras in each rāga; rather, the body of existing compositions and the living oral tradition of rāga performance act as repositories for that knowledge.

In [None]:
# Segment extraction
# TODO: call the segment extractor here (segment_extractor is not yet defined in this notebook)
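
Repeated melodic material shows up in a self-similarity matrix as runs of high values along off-diagonals. As a rough illustration, the sketch below finds such runs in an already binarised square matrix. It is a simplified stand-in and is not claimed to be the extraction procedure of the referenced paper or of compIAM; `diagonal_segments` and `min_length` are hypothetical names.

```python
import numpy as np

def diagonal_segments(binary_matrix, min_length):
    """Return (row_start, col_start, length) for runs of ones above the main diagonal."""
    binary_matrix = np.asarray(binary_matrix, dtype=bool)
    n = binary_matrix.shape[0]
    segments = []
    for offset in range(1, n):
        diag = np.diagonal(binary_matrix, offset=offset)  # elements [i, i + offset]
        run_start = None
        # A trailing False acts as a sentinel that closes a run reaching the edge
        for i, value in enumerate(np.append(diag, False)):
            if value and run_start is None:
                run_start = i
            elif not value and run_start is not None:
                if i - run_start >= min_length:
                    segments.append((run_start, run_start + offset, i - run_start))
                run_start = None
    return segments

# Toy matrix: frames 0-2 repeat at frames 4-6; one isolated match is too short to keep
toy_matrix = np.zeros((8, 8), dtype=bool)
toy_matrix[[0, 1, 2], [4, 5, 6]] = True
toy_matrix[3, 5] = True
toy_segments = diagonal_segments(toy_matrix, min_length=2)
```

Each returned segment pairs two time spans of equal length, one starting at `row_start` and one at `col_start`, which are candidate occurrences of the same pattern.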

In [None]:
# Grouping 
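
A simple way to think about grouping is as finding connected components: if span A matches span B and span B matches span C, all three are treated as occurrences of one pattern. The sketch below does this with a small union-find structure. It is an assumption-laden illustration rather than the grouping logic of the referenced work, and `group_matches` is a hypothetical helper.

```python
def group_matches(matches):
    """Group intervals linked by any chain of matches.

    matches: list of (interval_a, interval_b) pairs, each interval a (start, end) tuple.
    Returns a sorted list of groups, each a sorted list of intervals.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for interval in parent:
        groups.setdefault(find(interval), []).append(interval)
    return sorted(sorted(group) for group in groups.values())

# Toy matches: the first two pairs share the interval (4, 7), so they merge
toy_matches = [((0, 3), (4, 7)), ((4, 7), (10, 13)), ((20, 25), (30, 35))]
toy_groups = group_matches(toy_matches)
```

A practical system would probably also merge intervals that overlap heavily without being identical; that step is omitted here to keep the sketch short.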

## 4. Evaluation

Ground-truth annotations of sañcāras in certain audio recordings were created by the professional Carnatic vocalist Brindha Manickavasakan. Annotations were created in Carnatic notation, known as sargam [36]. As there is no definitive list of sañcāras in a given rāga, the segmentations are based on the annotator’s experience as a professional performer and student of a highly esteemed musical lineage. These annotations are therefore subjective to some degree, but have the benefit of being based on expert performer knowledge rather than on an externally imposed metric that may be irrelevant to musical concepts held by culture bearers.

In [None]:
# load annotations

In [None]:
# explore annotations 

In [None]:
# metrics for evaluation
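
As a starting point for this cell, the sketch below scores returned patterns against annotated ones using interval overlap. A returned pattern counts as correct when its intersection-over-union with some annotation reaches a threshold. The 0.66 threshold, the function names and the toy intervals are assumptions made for illustration and are not taken from the referenced paper.

```python
def overlap_ratio(a, b):
    """Intersection length divided by union length for two (start, end) intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return intersection / union if union > 0 else 0.0

def precision_recall_f1(annotated, returned, threshold=0.66):
    """Precision, recall and F1 of returned intervals against annotated intervals."""
    matched_returned = sum(
        any(overlap_ratio(r, a) >= threshold for a in annotated) for r in returned
    )
    matched_annotated = sum(
        any(overlap_ratio(a, r) >= threshold for r in returned) for a in annotated
    )
    precision = matched_returned / len(returned) if returned else 0.0
    recall = matched_annotated / len(annotated) if annotated else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy intervals in seconds (hypothetical values)
toy_annotated = [(0.0, 2.0), (5.0, 7.0), (10.0, 12.0)]
toy_returned = [(0.1, 2.0), (5.0, 6.0), (20.0, 22.0), (10.0, 12.0)]
toy_precision, toy_recall, toy_f1 = precision_recall_f1(toy_annotated, toy_returned)
```

Because the annotations reflect one expert's segmentation, a low precision may partly indicate musically valid patterns that were simply not annotated, so the scores are best read alongside listening to the results.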

In [None]:
Player(title, audio_path)

In [None]:
# visualise annotations vs returned patterns

## 5. Exploring Results