# Instrument Classification in Carnatic Music (ICCM)
Group 4 - Guillem Gauchia - Àlex Herrero - Gerard San Miguel - Roddie Mc Guinness

# Dataset Creation

In [2]:
%load_ext autoreload
%autoreload 2

### Explore Dataset

You can access the Saraga Carnatic dataset using the [mirdata API](https://github.com/mir-dataset-loaders/mirdata). You should already have the dataset downloaded on your machine in the mirdata repository.

In [3]:
pip install mirdata

Note: you may need to restart the kernel to use updated packages.


In [4]:
import mirdata, librosa
import matplotlib.pyplot as plt
import soundfile as sf
import IPython
import IPython.display as ipd

In [5]:
data_home = '/Users/alex/mir_datasets/saraga_carnatic'

In [6]:
saraga = mirdata.initialize('saraga_carnatic', data_home=data_home)

In [7]:
all_tracks = saraga.load_tracks()
list_keys = all_tracks.keys()
count = 0
for key in list_keys:
    print(key)
    count += 1
    if count == 5:
        break

0_Dorakuna
1_Ganamuda_Panam
2_Chidambara_Natarajam
3_Vandalum
4_Thiruveragane_Saveri_Varnam


You can choose a random track using `.choice_track()`. This returns a Track object.

In [8]:
example_track = saraga.choice_track()

You can load all tracks and information to a dict using `.load_tracks()`

In [9]:
all_tracks = saraga.load_tracks()

This returns a dict of `unique track identifier` : `track` object for each track.
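Because `.load_tracks()` returns an ordinary dict, the first few identifiers can be previewed without a manual counter. A minimal sketch follows; `first_keys` and `demo_tracks` are hypothetical names, and the stand-in dict only mimics the shape of the real `all_tracks`:

```python
from itertools import islice

def first_keys(tracks, n=5):
    """Return the first n keys of a dict-like object, in insertion order."""
    return list(islice(tracks, n))

# Hypothetical stand-in for the dict returned by saraga.load_tracks()
demo_tracks = {
    "0_Dorakuna": None,
    "1_Ganamuda_Panam": None,
    "2_Chidambara_Natarajam": None,
}
print(first_keys(demo_tracks, n=2))
```

With the real dataset this would be called as `first_keys(all_tracks)`.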

Track objects contain all filepaths of audios and metadata associated with the chosen track, and some information related to the recording itself (such as artist names and instruments). Remember, that for many recordings, we have 4 audio files relevant to our task...


The path of the final mixed performance:

In [10]:
example_track.audio_path

'/Users/alex/mir_datasets/saraga_carnatic\\saraga1.5_carnatic/Kanakadurga Venkatesh at Arkay by Kanakadurga Venkatesh/Lokavana Chatura/Lokavana Chatura.mp3.mp3'

The path of the vocal microphone:

In [11]:
example_track.audio_vocal_path

'/Users/alex/mir_datasets/saraga_carnatic\\saraga1.5_carnatic/Kanakadurga Venkatesh at Arkay by Kanakadurga Venkatesh/Lokavana Chatura/Lokavana Chatura.multitrack-vocal.mp3'

The path of the violin microphone:

In [12]:
example_track.audio_violin_path

'/Users/alex/mir_datasets/saraga_carnatic\\saraga1.5_carnatic/Kanakadurga Venkatesh at Arkay by Kanakadurga Venkatesh/Lokavana Chatura/Lokavana Chatura.multitrack-violin.mp3'

And two mridangam microphones (one for each head):

In [13]:
example_track.audio_mridangam_left_path

'/Users/alex/mir_datasets/saraga_carnatic\\saraga1.5_carnatic/Kanakadurga Venkatesh at Arkay by Kanakadurga Venkatesh/Lokavana Chatura/Lokavana Chatura.multitrack-mridangam-left.mp3'

In [14]:
example_track.audio_mridangam_right_path

'/Users/alex/mir_datasets/saraga_carnatic\\saraga1.5_carnatic/Kanakadurga Venkatesh at Arkay by Kanakadurga Venkatesh/Lokavana Chatura/Lokavana Chatura.multitrack-mridangam-right.mp3'

Navigate to these files and listen to the audio. What do you notice about them? Are they the same intensity? Are there any undesirable artifacts such as leakage or noise?

Take note: the `mirdata` `Track` object will not have an `audio_vocal_path` (or violin or mridangam) attribute if the given track has no multi-microphone recordings. Depending on the `mirdata` version, the attribute may instead be present but set to `None`, so it is worth checking for both. Can you use this information to determine how many tracks we have multi-microphone recordings for? (HINT: You can check if an object has a specific attribute using the `hasattr` function: `hasattr(obj, "<attribute_to_check_for>")`.)

In [15]:
# How many tracks with multitrack recordings?
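One possible way to answer this is sketched below. It assumes that a missing stem shows up either as an absent attribute or as an attribute set to `None`, and covers both cases with `getattr`. The helper names (`has_multitrack`, `count_multitrack`) and the `SimpleNamespace` stand-ins are hypothetical, used only so the example runs without the dataset:

```python
from types import SimpleNamespace

# Attribute names as they appear on the Track objects shown above
STEM_ATTRS = (
    "audio_vocal_path",
    "audio_violin_path",
    "audio_mridangam_left_path",
    "audio_mridangam_right_path",
)

def has_multitrack(track, attrs=STEM_ATTRS):
    """True if every stem attribute exists on the track and is not None or empty."""
    return all(getattr(track, name, None) for name in attrs)

def count_multitrack(tracks):
    """Count the tracks in a {track_id: track} dict that have all stems."""
    return sum(has_multitrack(t) for t in tracks.values())

# Hypothetical stand-ins for mirdata Track objects
demo = {
    "a": SimpleNamespace(audio_vocal_path="a.vocal.mp3", audio_violin_path="a.violin.mp3",
                         audio_mridangam_left_path="a.left.mp3",
                         audio_mridangam_right_path="a.right.mp3"),
    "b": SimpleNamespace(audio_vocal_path=None, audio_violin_path=None,
                         audio_mridangam_left_path=None, audio_mridangam_right_path=None),
    "c": SimpleNamespace(),  # attributes missing altogether
}
print(count_multitrack(demo))
```

On the real dataset the call would be `count_multitrack(all_tracks)`.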

Another important path is the metadata_path:

In [16]:
metadata_path = example_track.metadata_path

Here you will find information relating to the recording such as artist names, instruments, raaga.

Can you create some functions to explore these tracks and metadata? Perhaps it would be useful to know that JSON can be loaded in python using the `json` library:

In [17]:
import json

with open(metadata_path, 'r') as f:
    loaded_json = json.loads(f.read())

In [19]:
def get_metadata(track_id):
    """
    For <track_id>, return the associated metadata (a dict loaded from the track's JSON file)
    """
    metadata = saraga.track(track_id).metadata
    return metadata

def get_performer(track_id):
    """
    For <track_id>, return the performer
    """
    # code here
    performer = saraga.track(track_id).metadata['artists']
    return performer

def get_performance(track_id):
    """
    For <track_id>, return the performance name
    """
    # code here
    performance = saraga.track(track_id).metadata['title']
    return performance

def get_raga(track_id):
    """
    For <track_id>, return the raga name
    """
    # code here
    raga = saraga.track(track_id).metadata['raaga']
    return raga


def get_tonic(track_id):
    """
    For <track_id>, return the tonic in hertz
    """
    # code here
    tonic = saraga.track(track_id).tonic
    return tonic

How many ragas/performers/performances are available? How does that break down between performances for which we have multi-track recordings and those for which we don't?
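A rough sketch of how such counts could be gathered is shown here. It assumes that each metadata dict stores `raaga` and `artists` as lists of dicts with a `name` field; that layout is an assumption and should be checked against `loaded_json`. `names`, `summarise` and the demo data are hypothetical:

```python
from collections import Counter

def names(entries):
    """Extract the 'name' fields from a list of dicts; tolerate missing or empty entries."""
    return [e["name"] for e in (entries or []) if isinstance(e, dict) and "name" in e]

def summarise(metadata_by_track):
    """Count how many tracks each raga and each artist appears in."""
    ragas, artists = Counter(), Counter()
    for meta in metadata_by_track.values():
        ragas.update(set(names(meta.get("raaga"))))
        artists.update(set(names(meta.get("artists"))))
    return ragas, artists

# Hypothetical metadata, shaped like the assumed JSON layout
demo = {
    "t1": {"title": "A", "raaga": [{"name": "Saveri"}], "artists": [{"name": "X"}, {"name": "Y"}]},
    "t2": {"title": "B", "raaga": [{"name": "Saveri"}], "artists": [{"name": "X"}]},
    "t3": {"title": "C", "raaga": [], "artists": []},
}
ragas, artists = summarise(demo)
print(len(ragas), "ragas;", len(artists), "artists")
```

Running the same function separately on the multi-track and the mix-only subsets gives the requested breakdown.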

In [20]:
# get dataset statistics
key = '49_Shankari_Shankuru'
get_metadata(key)
get_performer(key)
get_performance(key)
get_raga(key)
get_tonic(key)

195.997718

In [21]:
track = saraga.track(key)
track

Track(
  audio_ghatam_path="...kkarai Sisters at Arkay by Akkarai Sisters/Shankari Shankuru/Shankari Shankuru.multitrack-ghatam.mp3",
  audio_mridangam_left_path="...isters at Arkay by Akkarai Sisters/Shankari Shankuru/Shankari Shankuru.multitrack-mridangam-left.mp3",
  audio_mridangam_right_path="...sters at Arkay by Akkarai Sisters/Shankari Shankuru/Shankari Shankuru.multitrack-mridangam-right.mp3",
  audio_path="...1.5_carnatic/Akkarai Sisters at Arkay by Akkarai Sisters/Shankari Shankuru/Shankari Shankuru.mp3.mp3",
  audio_violin_path="...kkarai Sisters at Arkay by Akkarai Sisters/Shankari Shankuru/Shankari Shankuru.multitrack-violin.mp3",
  audio_vocal_path="...Akkarai Sisters at Arkay by Akkarai Sisters/Shankari Shankuru/Shankari Shankuru.multitrack-vocal.mp3",
  audio_vocal_s_path="...karai Sisters at Arkay by Akkarai Sisters/Shankari Shankuru/Shankari Shankuru.multitrack-vocal-s.mp3",
  ctonic_path="..._carnatic/Akkarai Sisters at Arkay by Akkarai Sisters/Shankari Shankuru/Shan

### Load Audio

The mirdata API returns paths to audio files associated with each track. Can you create some loaders to load audio for a given track name?

**Hint**: The `librosa` library contains functions to load audio from file into an array of amplitude values: `y, sr = librosa.load(audio_path, sr=44100)`. Here `sr` is the sampling rate of the audio, i.e. how many amplitude samples there are per second (typically 44100 Hz). Keep this resolution in mind when converting between element indices in the returned array and time in the track.
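The relationship between array indices and time can be made explicit with two small helpers. This is only an illustrative sketch; the function names are hypothetical and not part of `librosa`:

```python
SR = 44100  # sampling rate in Hz, as used throughout this notebook

def seconds_to_samples(t, sr=SR):
    """Index of the sample at time t (in seconds)."""
    return int(round(t * sr))

def samples_to_seconds(n, sr=SR):
    """Time in seconds corresponding to sample index n."""
    return n / sr

# Slice boundaries for the excerpt from 1:30 to 1:45
start, end = seconds_to_samples(90), seconds_to_samples(105)
print(start, end, samples_to_seconds(end - start))
```

An excerpt can then be taken with `audio_array[start:end]`.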

In [22]:
def load_mixed_audio(track_id):
    """
    For <track_id>, return the loaded audio
    """
    # code here
    audio_path = saraga.track(track_id).audio_path
    audio_array, sr = librosa.load(audio_path, sr=44100)
    return audio_array

def load_violin_audio(track_id):
    """
    For <track_id>, return the isolated violin track
    """
    # code here
    audio_path = saraga.track(track_id).audio_violin_path
    audio_array, sr = librosa.load(audio_path, sr=44100)
    return audio_array

def load_voice_audio(track_id):
    """
    For <track_id>, return the isolated voice track
    """
    # code here
    audio_path = saraga.track(track_id).audio_vocal_path
    audio_array, sr = librosa.load(audio_path, sr=44100)
    return audio_array

def load_mridangam_audio_right(track_id):
    """
    For <track_id>, return the isolated mridangam track (right-head microphone)
    """
    audio_path = saraga.track(track_id).audio_mridangam_right_path
    audio_array, sr = librosa.load(audio_path, sr=44100)
    return audio_array

def load_mridangam_audio_left(track_id):
    """
    For <track_id>, return the isolated mridangam track (left-head microphone)
    """
    audio_path = saraga.track(track_id).audio_mridangam_left_path
    audio_array, sr = librosa.load(audio_path, sr=44100)
    return audio_array

In [77]:
load_mixed_audio(key)
load_violin_audio(key)
load_voice_audio(key)
load_mridangam_audio_left(key)
load_mridangam_audio_right(key)

array([0.0003028 , 0.00037034, 0.00031961, ..., 0.00024857, 0.0008841 ,
       0.00077049], dtype=float32)

### Listen to Audio

Let's write some functions to listen and visualise these audio arrays in the notebook. 

**Hint**: You should find `IPython.display.Audio` useful for playing audio inline in a Jupyter notebook.

**Hint2**: Using the `matplotlib` library you can plot in two dimensions like so:

```
import matplotlib.pyplot as plt

plt.plot(x, y)
```
More information on enhancing these plots (e.g. with titles, axis labels and gridlines) can be found [here](https://matplotlib.org/stable/gallery/lines_bars_and_markers/simple_plot.html).

In [23]:
def plot_waveform(audio_array):
    """
    Plot waveform for <audio_array> using matplotlib.pyplot
    """
    plt.plot(audio_array)


def play_audio(audio_array):
    """
    Generate audio player for <audio_array> using the IPython library
    """
    sf.write('aux_audio.wav', audio_array, 44100)
    return ipd.Audio('aux_audio.wav')
    

In [24]:
x1 = 90*44100 #1:30
x2 = 105*44100 #1:45
key = '49_Shankari_Shankuru'
plot_waveform(load_voice_audio(key)[x1:x2])

In [25]:
play_audio(load_mridangam_audio_right(key)[x1:x2])

Are there any important observations about the mixed or isolated instrument tracks? What is the quality like? Do you hear all of the instruments clearly? Are there any differences between the audio of the individual instrument tracks?

### Processing

Are the isolated vocal tracks sufficiently isolated? Libraries like [`spleeter`](https://github.com/deezer/spleeter) can help separate singing sources from background instruments. Does it help here?

In [26]:
pip install spleeter

Collecting spleeter
  Using cached spleeter-2.3.2-py3-none-any.whl (51 kB)
Collecting typer<0.4.0,>=0.3.2
  Using cached typer-0.3.2-py3-none-any.whl (21 kB)
Collecting tensorflow<3.0.0,>=2.5.0
  Using cached tensorflow-2.12.0-cp38-cp38-win_amd64.whl (1.9 kB)
Collecting ffmpeg-python==0.2.0
  Using cached ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
Collecting httpx[http2]<0.20.0,>=0.19.0
  Using cached httpx-0.19.0-py3-none-any.whl (77 kB)
Collecting llvmlite<0.39.0,>=0.38.0
  Using cached llvmlite-0.38.1-cp38-cp38-win_amd64.whl (23.2 MB)
Collecting librosa<0.9.0,>=0.8.0
  Using cached librosa-0.8.1-py3-none-any.whl (203 kB)
Collecting norbert==0.2.1
  Using cached norbert-0.2.1-py2.py3-none-any.whl (11 kB)
Collecting httpcore<0.14.0,>=0.13.3
  Using cached httpcore-0.13.7-py3-none-any.whl (58 kB)
Collecting h2<5,>=3
  Using cached h2-4.1.0-py3-none-any.whl (57 kB)
Collecting hpack<5,>=4.0
  Using cached hpack-4.0.0-py3-none-any.whl (32 kB)
Collecting hyperframe<7,>=6.0
  Using cached

ERROR: Cannot uninstall 'llvmlite'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.


How does the quality compare? Does spleeter work effectively? Do we lose any important information?

### Tagging Audio

We want to tag our audios with whether or not a particular instrument is sounding. We can do this by identifying non-silent regions in the isolated tracks and tagging the mixed tracks with the instrument. The `librosa` library contains functionality for identifying silent regions in audio (`librosa.effects.split`).

In [171]:
# Load audios for each instrument track using previously defined functions

In [172]:
# Plot samples of audios (remember the relationship between elements and sampling rate)

Define a function to identify silent regions in an audio array. Look at the documentation for `librosa.effects.split` ([here](https://librosa.org/doc/main/generated/librosa.effects.split.html)). 

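**Hint** - The `top_db` parameter sets how far below the reference level (by default the loudest point of the signal) audio must fall, in decibels, to count as silence: a lower value treats louder regions as "silent", while a higher value keeps quieter regions as non-silent. Experiment with this value and compare the results with the audio plots. Do they correspond to what you visualise/hear?

**Remember** - `librosa.effects.split` returns NON-silent intervals.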
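To build some intuition for the threshold, here is a simplified NumPy stand-in for the idea of peak-relative energy thresholding. It is not the `librosa` implementation: the function name `nonsilent_mask`, the non-overlapping framing and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def nonsilent_mask(y, top_db, frame_length=4410):
    """
    Mark a frame as non-silent if its RMS level is within top_db decibels
    of the loudest frame. Returns a per-sample array of 0 (silent) / 1 (non-silent).
    Assumes len(y) >= frame_length; trailing samples beyond the last full frame stay 0.
    """
    n_frames = len(y) // frame_length
    usable = n_frames * frame_length
    frames = np.asarray(y[:usable], dtype=float).reshape(n_frames, frame_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    db = 20 * np.log10(np.maximum(rms, 1e-10))  # clamp to avoid log(0)
    keep = db > db.max() - top_db               # threshold relative to the peak frame
    mask = np.zeros(len(y), dtype=int)
    mask[:usable] = np.repeat(keep, frame_length)
    return mask

# Synthetic signal: 1 s loud tone, 1 s tone 30 dB quieter, 1 s digital silence
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)
y = np.concatenate([tone, tone * 10 ** (-30 / 20), np.zeros(sr)])

for top_db in (20, 40):
    print(top_db, nonsilent_mask(y, top_db).mean())
```

In this sketch `top_db=20` keeps only the loud second, whereas `top_db=40` also keeps the quieter second as non-silent.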

In [43]:
import librosa
import numpy as np

track = saraga.choice_track()
print(track.track_id)

# load audio
sr = 44100 # sampling rate
y, sr = librosa.load(track.audio_mridangam_left_path, sr=sr)

239_Bhavamulona


In [44]:
t1 = 0 # seconds
t2 = 20 # seconds
sample = y

In [45]:
def build_silence_arr(y,
                      top_db,
                      t1=0,
                      t2=None,
                      sr=44100):
    """
    Return a binary array for y between t1 and t2 (in seconds):
    1 where the signal is non-silent, 0 where it is silent.
    """
    start = int(t1 * sr)
    end = len(y) if t2 is None else int(t2 * sr)
    sample = y[start:end]

    # librosa.effects.split returns the NON-silent intervals
    out = librosa.effects.split(y=sample, top_db=top_db, frame_length=int(0.1 * sr))

    # create binarized array (1 = sounding, 0 = silent)
    silence_array = np.zeros(len(sample), int)
    for start_idx, end_idx in out:
        silence_array[start_idx:end_idx] = 1
    return silence_array, top_db

In [None]:
top_db=22

key = '49_Shankari_Shankuru'
track_arr = load_mridangam_audio_left(key)
print(key)

t1 = 1195
t2 = 1200

silence_array, top_db = build_silence_arr(track_arr, top_db, t1, t2)

sample = track_arr[int(t1*sr):int(t2*sr)]

# plot waveform with silence array
fig, ax1 = plt.subplots()

ax2 = ax1.twinx()

print('Plotting results...')
plt.title(f'top_db={top_db}')
ax1.plot(sample, color='blue')
ax2.plot(silence_array, color='red')

In [None]:
plot_waveform(sample)

In [None]:
play_audio(sample)

In [None]:
sum(mridangam_left_silence)/len(mridangam_left_silence)

In [None]:
sum(mridangam_right_silence)/len(mridangam_right_silence)

In [None]:
combined = [1 in [x,y] for x,y in zip(mridangam_left_silence, mridangam_right_silence)]

In [None]:
sum(combined)/len(combined)

In [None]:
mridangam_right_silence

Do these regions correspond to what you hear when playing the audio with `play_audio` or what you see with `plot_waveform`?

In [186]:
mridangam_left_array = load_mridangam_audio_left(key)
mridangam_right_array = load_mridangam_audio_right(key) 

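y_mridangam_left, _ = build_silence_arr(mridangam_left_array, top_db, t1, t2)
y_mridangam_right, _ = build_silence_arr(mridangam_right_array, top_db, t1, t2)
# 1 wherever either mridangam microphone is sounding (trim to the shorter array first)
n = min(len(y_mridangam_left), len(y_mridangam_right))
mridangam = np.logical_or(y_mridangam_left[:n], y_mridangam_right[:n]).astype(int)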

### Extracting Samples

We should now have all the tools necessary to load and annotate audio. We now want to extract small snippets of audio from the mixed tracks across the dataset and annotate each of these snippets as containing voice, mridangam, violin or none of the above (a single snippet can have more than one tag).

It is important that we have examples for all combinations of tags (violin, voice, mridangam, none). Each sample should be of the same length. What should that length be? Think about the two extreme cases, very short and very long samples: what problems would arise in each?

Each sample should have a unique identifier (index). The information relating to their tags should be stored in a metadata DataFrame where you can also find information about the performance.

These should all be saved in individual audio files.

Let us try with just one track to begin with...

1. For a certain track id, load all audio files (mix, violin, etc...)

In [219]:
key = '49_Shankari_Shankuru'
x1 = 900 * 44100
x2 = 1200 * 44100
mix_array = load_mixed_audio(key)[x1:x2]
violin_array = load_violin_audio(key)[x1:x2]
vocal_array = load_voice_audio(key)[x1:x2]
mridangam_left_array = load_mridangam_audio_left(key)[x1:x2]
mridangam_right_array = load_mridangam_audio_right(key)[x1:x2]

2. Create a silent/non-silent array using `build_silence_arr()` defined earlier. 

      **Remember**: The mridangam has two tracks corresponding to it, you must combine them to identify whether either is sounding
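Combining the two mridangam microphones amounts to an element-wise OR of their binary arrays. A small sketch with a hypothetical helper, `combine_masks`, which trims the inputs to the shortest length in case the two files differ slightly in duration:

```python
import numpy as np

def combine_masks(*masks):
    """Element-wise OR of binary arrays; inputs are trimmed to the shortest length."""
    n = min(len(m) for m in masks)
    combined = np.zeros(n, dtype=bool)
    for m in masks:
        combined |= np.asarray(m[:n]).astype(bool)
    return combined.astype(int)

left = np.array([0, 1, 1, 0, 0])
right = np.array([0, 0, 1, 1, 0, 0])  # one sample longer than `left`
print(combine_masks(left, right))
```

The vectorised OR avoids looping over millions of samples in Python, which matters for full-length recordings.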

In [220]:
top_db = 15        # threshold for violin and vocal
top_db_mrid = 22   # threshold for mridangam
# build_silence_arr also returns the threshold it used; discard it with `_`
# so that top_db is not overwritten by top_db_mrid
print('Now processing silence in mridangam left...\n')
mridangam_left_silence, _ = build_silence_arr(mridangam_left_array, top_db_mrid)
print('Now processing silence in mridangam right...\n')
mridangam_right_silence, _ = build_silence_arr(mridangam_right_array, top_db_mrid)
print('Merging them together...\n')
# 1 wherever either mridangam microphone is sounding (trim to the shorter array first)
n = min(len(mridangam_left_silence), len(mridangam_right_silence))
mridangam_silence = np.logical_or(mridangam_left_silence[:n], mridangam_right_silence[:n]).astype(int)
print('Now processing silence in violin...\n')
violin_silence, _ = build_silence_arr(violin_array, top_db)
print('Now processing silence in vocal...\n')
vocal_silence, _ = build_silence_arr(vocal_array, top_db)

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...



3. Split mixed audio into small chunks using [numpy array indexing](https://numpy.org/doc/stable/user/basics.indexing.html) (the size of these chunks should be informed by the literature)

In [221]:
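chunk_size = int(0.5 * 44100)  # samples per chunk (0.5 s at 44.1 kHz)
num_chunks = len(mix_array) // chunk_size
print(num_chunks)

# drop the trailing partial chunk so that every chunk has exactly chunk_size samples
mix_chunks = np.array_split(mix_array[:num_chunks * chunk_size], num_chunks)

600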


4. Determine from your silent/non-silent arrays in Step 2 whether the chunk contains each instrument (voice, violin, mridangam)
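One way to approach this step is to walk through the arrays with a running offset, so that every chunk is matched to exactly the same sample range in each silent/non-silent array. The sketch below uses toy data and a hypothetical helper, `tag_chunks`; it is an illustration rather than the notebook's own implementation:

```python
import numpy as np

def tag_chunks(masks, chunk_size):
    """
    masks: dict of {instrument_name: binary per-sample array}, trimmed to the shortest.
    Returns one dict per full chunk with its start sample and a bool per instrument.
    """
    n = min(len(m) for m in masks.values())
    tags = []
    for start in range(0, n - chunk_size + 1, chunk_size):
        end = start + chunk_size
        row = {"start_sample": start}
        for name, mask in masks.items():
            # the chunk contains the instrument if any sample in it is non-silent
            row[f"contains_{name}"] = bool(np.any(mask[start:end]))
        tags.append(row)
    return tags

# Toy example with 4-sample chunks (the last 2 samples form a partial chunk and are dropped)
masks = {
    "voice":     np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0]),
    "violin":    np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0]),
    "mridangam": np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0]),
}
for row in tag_chunks(masks, chunk_size=4):
    print(row)
```

Deriving the position from the offset, rather than searching for the chunk's first sample value in the mix, keeps the lookup unambiguous when the same value occurs more than once.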

In [222]:
# Slice silence arrays identically to mix slice and determine yes/no does chunk contain instrument
instrument_tags = []
time_stamp = x1/44100
start_index = 0  # running sample offset of the current chunk within mix_array

for chunk in mix_chunks:
    end_index = start_index + len(chunk)

    # a chunk contains an instrument if any of its samples is marked non-silent
    contains_voice = bool(np.any(vocal_silence[start_index:end_index]))
    contains_violin = bool(np.any(violin_silence[start_index:end_index]))
    contains_mridangam = bool(np.any(mridangam_silence[start_index:end_index]))
    instrument_tags.append({
        'index': len(instrument_tags),
        'track_id': key,
        'time_stamp': time_stamp,
        'contains_voice': contains_voice,
        'contains_violin': contains_violin,
        'contains_mridangam': contains_mridangam
    })
    time_stamp = time_stamp + chunk_size/44100
    start_index = end_index  # advance to the next chunk
instrument_tags

[{'index': 0,
  'track_id': '49_Shankari_Shankuru',
  'time_stamp': 900.0,
  'contains_voice': False,
  'contains_violin': True,
  'contains_mridangam': False},
 {'index': 1,
  'track_id': '49_Shankari_Shankuru',
  'time_stamp': 900.5,
  'contains_voice': False,
  'contains_violin': True,
  'contains_mridangam': False},
 {'index': 2,
  'track_id': '49_Shankari_Shankuru',
  'time_stamp': 901.0,
  'contains_voice': False,
  'contains_violin': True,
  'contains_mridangam': False},
 {'index': 3,
  'track_id': '49_Shankari_Shankuru',
  'time_stamp': 901.5,
  'contains_voice': False,
  'contains_violin': True,
  'contains_mridangam': False},
 {'index': 4,
  'track_id': '49_Shankari_Shankuru',
  'time_stamp': 902.0,
  'contains_voice': False,
  'contains_violin': True,
  'contains_mridangam': False},
 {'index': 5,
  'track_id': '49_Shankari_Shankuru',
  'time_stamp': 902.5,
  'contains_voice': False,
  'contains_violin': True,
  'contains_mridangam': False},
 {'index': 6,
  'track_id': '49_Sh

5. Save each audio with a unique index.

    **Hint**: Audio arrays can be saved to file using the `soundfile` library:
    `sf.write('<filename>.wav', <audio_array>, <sampling rate>)`
    
    **Remember**: Each audio chunk  needs to be assigned a unique index so as to be managed correctly later on. Feel free to use numbers, hashes or uuids
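Two simple options for generating identifiers are sketched here: a zero-padded running number, which sorts correctly as a filename, and a random UUID, which needs no shared counter. The helper names are hypothetical:

```python
import uuid

def sequential_id(n, width=6):
    """Zero-padded running index, e.g. 42 -> '000042'."""
    return f"{n:0{width}d}"

def random_id():
    """Random 32-character hexadecimal identifier (UUID4)."""
    return uuid.uuid4().hex

print(sequential_id(42))
print(random_id())
```

Either string can be used as the filename stem in `sf.write` and as the key column in the metadata table.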

In [223]:
# store audio with soundfile
import os

os.makedirs('test/', exist_ok=True)
for i, chunk in enumerate(mix_chunks):
    sf.write(f'test/{i}.wav', chunk, 44100)

6. Add row to metadata table containing relevant track information, index, and instrument annotations.

    **Hint** - A `pandas` dataframe is a suitable place to store information relating to track and instrument annotations. You can create one using: 

    `import pandas as pd`

    `df = pd.DataFrame(columns=<list of column names>)`
    
    Add new rows with `pd.concat` (`DataFrame.append` was removed in pandas 2.0):
    
    `df = pd.concat([df, pd.DataFrame([{<column_name>: <value>}])], ignore_index=True)`
    
    And save using:
    
    `df.to_csv('<path.csv>', index=False)`
    
    **Remember** - This table should include the metadata relating to the track, the unique chunk index and a column indicating whether or not it includes each instrument 
    

In [None]:
# metadata dataframe
import pandas as pd

metadata = pd.DataFrame(instrument_tags)
metadata.to_csv('metadata.csv', index=False)

7. Repeat for many tracks and many chunks. You have now written the code to do this for one track/chunk; let's combine it and apply it to a large number of tracks/chunks, storing each chunk with a unique index and a row in the metadata dataframe.

### Load Dataset

With our dataset created and saved in an intuitive and accessible format, let's create some loaders to load the files and get metadata.

In [None]:
import os
import pandas as pd

def load_sample(index, sample_dir='test'):
    """
    Load sample with index, <index>, from <sample_dir>
    (files are expected to be named '<index>.wav', as written in Step 5)
    """
    # Load the audio file using the index
    audio_path = os.path.join(sample_dir, f'{index}.wav')
    audio, sr = librosa.load(audio_path, sr=None)
    return audio, sr

def get_metadata(index):
    """
    Get metadata for sample with index, <index>
    Note: this redefines the earlier get_metadata(track_id) helper
    """
    # Load the metadata CSV file
    metadata = pd.read_csv('metadata.csv')
    
    # Get the metadata for the specified index
    sample_metadata = metadata.loc[metadata['index'] == index].squeeze()
    return sample_metadata


### Main Code for Dataset Creation
<p>Run all the previous cells first so that the function definitions and the dataset source files are available. The spleeter cells can be skipped.</p>
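The main cell routes every chunk into one of eight folders (`voice`, `violin`, `mridangam`, their combinations, and `none`) through an `if`/`elif` chain. As an illustration only, the same folder names can be derived from the three flags with a small helper; `chunk_label` is a hypothetical name and is not used by the cell itself:

```python
def chunk_label(contains_voice, contains_violin, contains_mridangam):
    """Folder name for a chunk, e.g. 'voice+violin', or 'none' if nothing is sounding."""
    flags = (
        ("voice", contains_voice),
        ("violin", contains_violin),
        ("mridangam", contains_mridangam),
    )
    present = [name for name, on in flags if on]
    return "+".join(present) if present else "none"

print(chunk_label(True, True, False))
print(chunk_label(False, False, False))
```

A chunk could then be written with `sf.write(f'samples/{label}/{chunk_id}.wav', chunk, 44100)` after creating the folder.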

In [54]:
#IMPORTS
import pandas as pd
import os
import librosa
import numpy as np
#from spleeter.separator import Separator
import mirdata, librosa
import matplotlib.pyplot as plt
import soundfile as sf
import IPython
import IPython.display as ipd
import json
import shutil

# Choose the number of songs to be in the dataset (number between 1 and 248)
num_songs = 100

# Define the desired size of each audio chunk
chunk_size = 0.25*44100

# Choose if you want to process the entire song or only an interval specified below
full_song = True 

# Take into account that some songs are longer than others when selecting the interval time
x1 = 30 * 44100
x2 = 31 * 44100

# Delete a metadatafile.csv
file_path = 'metadata.csv'
if os.path.exists(file_path):
    os.remove(file_path)
    print(f"File '{file_path}' deleted.")

# Delete the folder with chunks
folder_path = 'samples'
if os.path.exists(folder_path):
    shutil.rmtree(folder_path)
    print(f"Folder '{folder_path}' and its contents deleted.")

total_chunks = 0

# Global counters
count = 0
c_vo = 0
c_vi = 0
c_mr = 0
c_vo_vi = 0
c_vo_mr = 0
c_vi_mr = 0
c_vo_vi_mr = 0
c_none = 0

# Select manually the range of keys you want to select
selected_keys = []
for key in list_keys:
    selected_keys.append(key)
selected_keys = selected_keys[:]
selected_keys

for n_song, key in enumerate(selected_keys):
    print('Key: 'f'{key}\n')
    try:
        # Step 1 - Load the audio arrays
        if full_song:
            mix_array = load_mixed_audio(key)
            violin_array = load_violin_audio(key)
            vocal_array = load_voice_audio(key)
            mridangam_left_array = load_mridangam_audio_left(key)
            mridangam_right_array = load_mridangam_audio_right(key)
        else:
            mix_array = load_mixed_audio(key)[x1:x2]
            violin_array = load_violin_audio(key)[x1:x2]
            vocal_array = load_voice_audio(key)[x1:x2]
            mridangam_left_array = load_mridangam_audio_left(key)[x1:x2]
            mridangam_right_array = load_mridangam_audio_right(key)[x1:x2]


    except Exception as e:
        print('An error occurred for key:', key)
        print('Error:', str(e))
        continue

    # Step 2 - Build the binary arrays that mark where each instrument is sounding
    top_db = 15        # threshold for violin and vocal
    top_db_mrid = 22   # threshold for mridangam
    # build_silence_arr also returns the threshold it used; discard it with `_`
    # so that top_db is not overwritten by top_db_mrid
    print('Now processing silence in mridangam left...\n')
    mridangam_left_silence, _ = build_silence_arr(mridangam_left_array, top_db_mrid)
    print('Now processing silence in mridangam right...\n')
    mridangam_right_silence, _ = build_silence_arr(mridangam_right_array, top_db_mrid)
    print('Merging them together...\n')
    # 1 wherever either mridangam microphone is sounding (trim to the shorter array first)
    n = min(len(mridangam_left_silence), len(mridangam_right_silence))
    mridangam_silence = np.logical_or(mridangam_left_silence[:n], mridangam_right_silence[:n]).astype(int)
    print('Now processing silence in violin...\n')
    violin_silence, _ = build_silence_arr(violin_array, top_db)
    print('Now processing silence in vocal...\n')
    vocal_silence, _ = build_silence_arr(vocal_array, top_db)

    # Step 3 - Determine the total number of chunks for each song
    num_chunks = len(mix_array) // chunk_size
    print('Number of chunks: 'f'{num_chunks}\n')

    mix_chunks = np.array_split(mix_array, num_chunks)

    # Step 4 - Slice silence arrays identically to mix slice and determine yes/no does chunk contain instrument
    instrument_tags = []
    # time_stamp is the start time (in seconds) of the current chunk
    if not full_song:
        time_stamp = x1/44100
    else:
        time_stamp = 0.0

    start_index = 0  # running sample offset of the current chunk within mix_array
    for i, chunk in enumerate(mix_chunks):
        # Use the global running counter as the unique chunk ID
        chunk_id = total_chunks

        end_index = start_index + len(chunk)
        contains_voice = bool(np.any(vocal_silence[start_index:end_index]))
        contains_violin = bool(np.any(violin_silence[start_index:end_index]))
        contains_mridangam = bool(np.any(mridangam_silence[start_index:end_index]))
        start_index = end_index  # advance to the next chunk

        instrument_tags.append({
            'chunk_id': chunk_id,
            'track_id': key,
            'time_stamp': time_stamp,
            'performance': get_performance(key),
            'contains_voice': contains_voice,
            'contains_violin': contains_violin,
            'contains_mridangam': contains_mridangam
        })

        time_stamp = (time_stamp + (chunk_size/44100))
        # Step 5 - Create the chunk audio files
        # store audio with soundfile
        
        if contains_voice and not contains_violin and not contains_mridangam:
            os.makedirs('samples/voice/', exist_ok=True)
            sf.write('samples/voice/'f'{(total_chunks)}.wav', chunk, 44100)
            c_vo += 1
            total_chunks = total_chunks + 1
        elif not contains_voice and contains_violin and not contains_mridangam:
            os.makedirs('samples/violin/', exist_ok=True)
            sf.write('samples/violin/'f'{(total_chunks)}.wav', chunk, 44100)
            c_vi += 1
            total_chunks = total_chunks + 1
        elif not contains_voice and not contains_violin and contains_mridangam:
            os.makedirs('samples/mridangam/', exist_ok=True)
            sf.write('samples/mridangam/'f'{(total_chunks)}.wav', chunk, 44100)
            c_mr += 1
            total_chunks = total_chunks + 1
        elif contains_voice and contains_violin and not contains_mridangam:
            os.makedirs('samples/voice+violin/', exist_ok=True)
            sf.write('samples/voice+violin/'f'{(total_chunks)}.wav', chunk, 44100)
            c_vo_vi += 1
            total_chunks = total_chunks + 1
        elif contains_voice and not contains_violin and contains_mridangam:
            os.makedirs('samples/voice+mridangam/', exist_ok=True)
            sf.write('samples/voice+mridangam/'f'{(total_chunks)}.wav', chunk, 44100)
            c_vo_mr += 1
            total_chunks = total_chunks + 1
        elif not contains_voice and  contains_violin and contains_mridangam:
            os.makedirs('samples/violin+mridangam/', exist_ok=True)
            sf.write('samples/violin+mridangam/'f'{(total_chunks)}.wav', chunk, 44100)
            c_vi_mr += 1
            total_chunks = total_chunks + 1
        elif contains_voice and  contains_violin and contains_mridangam:
            os.makedirs('samples/voice+violin+mridangam/', exist_ok=True)
            sf.write('samples/voice+violin+mridangam/'f'{(total_chunks)}.wav', chunk, 44100)
            c_vo_vi_mr += 1
            total_chunks = total_chunks + 1
        else:
            os.makedirs('samples/none/', exist_ok=True)
            sf.write('samples/none/'f'{(total_chunks)}.wav', chunk, 44100)
            c_none += 1
            total_chunks = total_chunks + 1
    print('All chunks processed...\n')
    # Step 6 - Create the metadata.csv
    # metadata dataframe
    metadata_file = 'metadata.csv'

    if os.path.isfile(metadata_file):
        metadata = pd.read_csv(metadata_file)
    else:
        metadata = pd.DataFrame()

    new_metadata = pd.DataFrame(instrument_tags)
    metadata = pd.concat([metadata, new_metadata], ignore_index=True)
    metadata.to_csv(metadata_file, index=False)

    count += 1
    if count == num_songs: 
        break
print('Dataset created!\n')

File 'metadata.csv' deleted.
Folder 'samples' and its contents deleted.
Key: 0_Dorakuna

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 8249.0

All chunks processed...

Key: 1_Ganamuda_Panam

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 7150.0

All chunks processed...

Key: 2_Chidambara_Natarajam

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 2930.0

All chunks processed...

Key: 3_Vandalum

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 1351.0

All chunks processed...

Key: 51_Pavamana_Suthudu_Mangalam

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 341.0

All chunks processed...

Key: 52_Geeta_Nayakan

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 1506.0

All chunks processed...

Key: 53_Siddhi_Vinayakam

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 3019.0

All chunks processed...

Key: 54_Thillana_Pahadi

Now processing silence in mridangam left...



Key: 81_Enthavedukonthu_Raghava

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 1819.0

All chunks processed...

Key: 82_Koluvamaregatha

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 12010.0

All chunks processed...

Key: 83_Neevada_Negana

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 1426.0

All chunks processed...

Key: 84_Vara_Leela_Gana_Lola

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now proces

All chunks processed...

Key: 119_Shlokam_-_Shivah_Shaktyayukto

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 371.0

All chunks processed...

Key: 120_Velum_Mayilume

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 2936.0

All chunks processed...

Key: 121_Tolinenu_Jesina

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence in violin...

Now processing silence in vocal...

Number of chunks: 1035.0

All chunks processed...

Key: 122_Kailasapathe

Now processing silence in mridangam left...

Now processing silence in mridangam right...

Merging them together...

Now processing silence