(music-segmentation)=
# Music segmentation
As seen in the musicological introduction, [Carnatic](carnatic-formats) and Hindustani performances come in different formats. These formats must be taken into account when designing strategies to segment the sections of these musical pieces.

In [None]:
## Importing compiam to the project
import compiam

# Import extras and suppress warnings to keep the tutorial clean
import os
from pprint import pprint

import warnings
warnings.filterwarnings('ignore')

Let's first list the available tools for music segmentation in `compiam`.

In [None]:
compiam.structure.segmentation.list_tools()


## Dhrupad Bandish segmentation

In this section we will showcase a tool that uses rhythmic features to identify the different sections of a Dhrupad Bandish performance {cite}`rohit_2020`. Dhrupad Bandish is one of the main formats in Hindustani music. As seen in the [documentation](https://mtg.github.io/compIAM/source/structure.html#dhrupad-bandish-segmentation), this segmentation model is based on PyTorch. Therefore, we proceed to install ``torch``.

In [None]:
%pip install torch==1.8.0

This tool may be accessed from ``structure.segmentation``. However, the tool name has an ``*`` appended in the list above, which means that we can use the wrapper for models to quickly initialize it with the pre-trained weights loaded.

```{tip}
Get the correct code for the wrapper by running ``compiam.list_models()``.
```
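
The ``*`` convention can be handled programmatically. The following is a minimal sketch, assuming the listing is a plain list of strings in which wrapper-loadable tools end with ``*``. The helper `split_tools` and the entries in `example_listing` are hypothetical and only illustrate the idea; they are not part of `compiam`.

```python
def split_tools(tool_names):
    """Split a tool listing into direct tools and wrapper-loadable tools.

    Assumption: names ending in "*" can be loaded through the model wrapper.
    """
    direct = [name for name in tool_names if not name.endswith("*")]
    wrapped = [name.rstrip("*") for name in tool_names if name.endswith("*")]
    return direct, wrapped


# Hypothetical listing, used only to exercise the helper
example_listing = ["some-direct-tool", "dhrupad-bandish-segmentation*"]
direct, wrapped = split_tools(example_listing)
print("Import directly:", direct)
print("Load with the wrapper:", wrapped)
```
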

In [None]:
dbs = compiam.load_model("structure:dhrupad-bandish-segmentation")

The documentation shows that this model has quite a number of attributes. Two of them are particularly relevant here:
* ``mode``
* ``fold``

These attributes are important because they define the training pipeline that has been used and, therefore, a different way of operating with this model. ``mode`` has three options: *net*, *voc*, or *pakh*, which indicate the source for surface tempo multiple (s.t.m.) estimation. *net* mode is for an input mixture signal, *voc* is for clean or source-separated singing voice recordings, and *pakh* is for pakhawaj tracks (the pakhawaj is a percussion instrument from Northern India). ``fold`` is an integer indicating which validation fold was considered during training.

These configuration variables default to ``net`` and ``0`` respectively, but they can be easily changed.

In [None]:
dbs.update_mode(mode="voc")
dbs.update_fold(fold=1)

At this point, the ``mode`` and ``fold`` have been updated and, consequently, the class has automatically loaded the model weights corresponding to ``mode=voc`` and ``fold=1``.
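
Since each configuration corresponds to a different set of weights, it can help to validate the values before passing them to the model. The following is a minimal sketch: `check_config` is a hypothetical helper, not part of `compiam`, and the only constraint it assumes on ``fold`` is that it is a non-negative integer (the actual number of folds is not stated here).

```python
# The three modes described above
VALID_MODES = ("net", "voc", "pakh")


def check_config(mode, fold):
    """Return (mode, fold) if they look valid, otherwise raise ValueError."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # bool is a subclass of int, so it is rejected explicitly
    if isinstance(fold, bool) or not isinstance(fold, int) or fold < 0:
        raise ValueError(f"fold must be a non-negative integer, got {fold!r}")
    return mode, fold


mode, fold = check_config("voc", 1)
print(mode, fold)
```

The validated values could then be passed to ``dbs.update_mode(mode=mode)`` and ``dbs.update_fold(fold=fold)`` as shown above.
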

```{note}
Typically in `compiam`, whether you import a model from its module or initialize it using the wrapper can make an important difference in how the loaded instance works. Generally speaking, if you use the wrapper you will probably only be interested in running inference. If your goal is to train or deep-dive into a particular model, you should avoid the model wrapper and start from a clean model instance.
```

Let's now run prediction on an input file. Our mode is now ``voc``, so the model expects a clean or source-separated vocal signal. Isolated singing voice signals are not commonly available for Carnatic and Hindustani music. We will use a state-of-the-art, out-of-the-box model, [`Spleeter`](https://github.com/deezer/spleeter), to try to separate the singing voice from the accompaniment.

In [None]:
%pip install spleeter

We will now directly download the pre-trained models for `Spleeter`, and use these for inference in this walkthrough.

In [None]:
!wget https://github.com/deezer/spleeter/releases/download/v1.4.0/2stems.tar.gz

In [None]:
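# Importing the "tarfile" module
import tarfile

# Create the target folder (no error if it already exists)
target_dir = os.path.join("pretrained_models", "2stems")
os.makedirs(target_dir, exist_ok=True)

# Open the archive and extract it; the context manager closes the file
with tarfile.open("2stems.tar.gz") as archive:
    archive.extractall(target_dir)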
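
When re-running the notebook, it can be convenient to skip extraction if the target folder is already populated. The following is a minimal sketch of that idea: `extract_once` is a hypothetical helper, not part of `compiam` or `Spleeter`, and the demo runs on a throwaway archive created in a temporary folder rather than on the real ``2stems.tar.gz``.

```python
import os
import tarfile
import tempfile


def extract_once(archive_path, target_dir):
    """Extract archive_path into target_dir unless target_dir already has files.

    Returns True if extraction happened, False if it was skipped.
    """
    if os.path.isdir(target_dir) and os.listdir(target_dir):
        return False  # already populated, nothing to do
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path) as archive:
        archive.extractall(target_dir)
    return True


# Self-contained demo on a small placeholder archive
with tempfile.TemporaryDirectory() as tmp:
    payload = os.path.join(tmp, "weights.txt")
    with open(payload, "w") as handle:
        handle.write("placeholder")

    demo_archive = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(demo_archive, "w:gz") as archive:
        archive.add(payload, arcname="weights.txt")

    target = os.path.join(tmp, "pretrained_models", "demo")
    first = extract_once(demo_archive, target)   # extracts
    second = extract_once(demo_archive, target)  # skipped
    extracted = sorted(os.listdir(target))

print(first, second, extracted)
```
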

`Spleeter` is based on `TensorFlow`. We disable GPU usage and the `TensorFlow`-related warnings, just like we did in the [pitch extraction walkthrough](melody-extraction).

In [None]:
# Disabling tensorflow warnings and debugging info
import os 
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3" 

# Importing tensorflow and disabling GPU usage
import tensorflow as tf
tf.config.set_visible_devices([], "GPU")

We may now load the `Spleeter` separator, which will automatically load the pre-trained weights for the model. We will use the ``2stems`` model, which has been trained to separate vocals and accompaniment.

In [None]:
from spleeter.separator import Separator

# Load default 2-stem spleeter separation
separator = Separator('spleeter:2stems')

The `Separator` class in `Spleeter` has a method that separates the singing voice directly from an audio file and stores the prediction in a given output folder. Let's use this method to get a source-separated version of our file.

In [None]:
# Separating!
separator.separate_to_file(
    os.path.join(
        "..", "audio", "mir_datasets", "CMR_full_dataset_1.0",
        "audio", "10001_05_Thunga_Theera_Virajam.wav"
    ),
    os.path.join("..", "audio")
)
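
To locate the separated stems afterwards, it helps to know where `Spleeter` writes them. The sketch below assumes Spleeter's default naming scheme for `separate_to_file`, ``{filename}/{instrument}.{codec}`` with ``wav`` as the codec. This is an assumption about the installed version, so list the output folder to confirm the layout. `expected_stem_path` is a hypothetical helper and not part of `Spleeter`.

```python
import os


def expected_stem_path(audio_path, output_dir, instrument="vocals", codec="wav"):
    """Build the path where a separated stem is expected to be written.

    Assumes the layout <output_dir>/<track name>/<instrument>.<codec>.
    """
    track_name = os.path.splitext(os.path.basename(audio_path))[0]
    return os.path.join(output_dir, track_name, f"{instrument}.{codec}")


input_path = os.path.join(
    "..", "audio", "mir_datasets", "CMR_full_dataset_1.0",
    "audio", "10001_05_Thunga_Theera_Virajam.wav"
)
vocals_path = expected_stem_path(input_path, os.path.join("..", "audio"))
print(vocals_path)
```

If the file found on disk has a different name or location, use that path when calling the segmentation model.
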

In [None]:
%ls ../audio/

In [None]:
dbs.predict_stm(path_to_file=os.path.join("..", "audio", "test_1_vocals.wav"))