# Tutorial on EMMIT framework

This notebook shows how to use the `EMMIT` framework, which explains the behaviour of a deep learning model.

The model used in this example is a CNN trained on the [Ballroom](http://mtg.upf.edu/ismir2004/contest/tempoContest/node5.html) dataset, which contains audio recordings of 8 ballroom dances: Samba, Jive, Rumba, Quickstep, Tango, Cha-cha-cha, Viennese Waltz and Waltz. For details about the model's architecture and training procedure, see the `models` folder. Here we demonstrate how to use two pipelines:

1. The pipeline that transforms audio files based on the configuration file.
2. The pipeline that generates confusion matrices based on model predictions.


## Transformation pipeline

The first step is to create a configuration file (an example is in `configuration.yml`) that describes which transformations should be applied to the data and to what extent. A target classification model is then run on all of the transformed files; in our case, the model predicts the dance type. As a result, a YAML file is generated that stores information about both the transformations and the predictions.

### Set up configuration for audio augmentation pipeline

Here are the parameters that need to be defined in a configuration file:

- `augmented_audio_save_path`: The path to save the augmented audio outputs.
- `augmented_meta_save_path`: The path to save the augmented metadata.
- `mir_dataset_path`: The path where the source data is saved.
- `hpss`: Set up harmonic/percussive source separation.
- `tempo_factor`: Set up log-spaced time stretching; for parameter details, see [muda](https://muda.readthedocs.io/en/stable/#).
- `keys`: Set up linear pitch shifting; for parameter details, see [muda](https://muda.readthedocs.io/en/stable/#).
- `drc`: Set up dynamic range compression; for parameter details, see [muda](https://muda.readthedocs.io/en/stable/#).
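The `tempo_factor` and `keys` settings expand into grids of transformation values. The sketch below illustrates which stretch rates and semitone shifts a pair of hypothetical `lower`/`upper`/`n_samples` settings would produce. It mirrors our understanding of muda's `LogspaceTimeStretch` and `LinearPitchShift` deformers rather than calling muda itself, and the real keys and nesting in `configuration.yml` may differ.

```python
import numpy as np

# Hypothetical settings for illustration only; the real configuration.yml schema may differ
example_tempo_factor = {"lower": -0.5, "upper": 0.5, "n_samples": 3}
example_keys = {"lower": -1.0, "upper": 1.0, "n_samples": 3}


def logspace_rates(lower, upper, n_samples):
    """Stretch rates 2**x for x evenly spaced in [lower, upper] (log-spaced time stretch)."""
    return np.logspace(lower, upper, num=n_samples, endpoint=True, base=2.0)


def linear_semitones(lower, upper, n_samples):
    """Pitch shifts in semitones evenly spaced in [lower, upper] (linear pitch shift)."""
    return np.linspace(lower, upper, num=n_samples, endpoint=True)


example_rates = logspace_rates(**example_tempo_factor)
example_semitones = linear_semitones(**example_keys)
print([round(r, 4) for r in example_rates.tolist()])  # [0.7071, 1.0, 1.4142]
print(example_semitones.tolist())                     # [-1.0, 0.0, 1.0]
```

With these example settings the slowest rate is 2^-0.5 ≈ 0.7071, which happens to match the rate used later in this notebook to filter the predictions.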


A safe way to install the library is to create a dedicated virtual environment:

```
conda create --name emmit_env python=3.11
conda activate emmit_env
pip install -r requirements.txt
python -m ipykernel install --user --name emmit_env
```

In [None]:
%pip install .

In [None]:
import numpy as np

# Compatibility shim: np.float_ was removed in NumPy 2.0, but some dependencies still reference it
np.float_ = np.float64

In [None]:
from emmit import aitah

# Create an instance of AudioTransformation from the aitah module, configured via 'configuration.yml'
audio_bank = aitah.AudioTransformation(config_file="configuration.yml")
# Generate augmented audio samples based on the specified number of samples in the configuration
audio_bank.synthesis(n_samples=1)

### Feeding augmented audio samples to the model

Let's load the model used in this tutorial, run it on the augmented files and save the prediction results. Replace it with your own model to analyse that model instead.
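The prediction loop below calls `model(audio_file)` and reads `class_names` and `class_ids` from its output, which is specific to the tutorial's SavedModel. If your model instead returns one score per class, a small adapter such as the hypothetical `predict_dance` helper below (not part of EMMIT) can produce the dance name and class id stored in the `y` and `y_id` columns. It assumes that the order of `CLASS_NAMES` matches the output order of your model.

```python
import numpy as np

# Assumed class order; it must match the order of your model's output scores
CLASS_NAMES = [
    "ChaChaCha", "Jive", "Quickstep", "Rumba",
    "Samba", "Tango", "VienneseWaltz", "Waltz",
]


def predict_dance(audio_file, predict_proba, class_names=CLASS_NAMES):
    """Return (class_name, class_id) for one audio file.

    `predict_proba` is a hypothetical callable that takes a file path and
    returns one score per class; wrap your own model's inference in it.
    """
    scores = np.asarray(predict_proba(audio_file)).ravel()
    class_id = int(np.argmax(scores))  # index of the highest-scoring class
    return class_names[class_id], class_id
```

Inside the loop, `int_df.at[i, "y"], int_df.at[i, "y_id"] = predict_dance(audio_file, my_predict_proba)` would then replace the two assignments, where `my_predict_proba` stands for your own inference function.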

In [None]:
import tensorflow as tf

# Load the pre-trained model from the directory specified
model = tf.saved_model.load("models")

In [None]:
import os

import pandas as pd
import yaml  # PyYAML, used to read the configuration file
from tqdm import tqdm

# Load the same configuration that was used for the augmentation step
with open("configuration.yml") as f:
    config = yaml.safe_load(f)

# Metadata of the augmented files written by audio_bank.synthesis().
# Assumption: `augmented_meta_save_path` points to a CSV file; adjust this line
# if your configuration stores the metadata elsewhere or in another format.
int_df = pd.read_csv(config["augmented_meta_save_path"])

SAV_PATH = "metadata_with_predict_results.csv"

if os.path.exists(SAV_PATH):
    sav_df = pd.read_csv(SAV_PATH)
    # Get the list of files that already have a saved prediction
    existing_filenames = sav_df["file_name"].tolist()

    # Skip those files so that an interrupted run can be resumed
    int_df = int_df[~int_df["file_name"].isin(existing_filenames)]
    int_df.reset_index(drop=True, inplace=True)
else:
    print(f"File {SAV_PATH} does not exist.")


for i, row in tqdm(int_df.iterrows(), total=int_df.shape[0]):

    # Build the path of the augmented audio file for this row
    audio_file = os.path.join(
        config["augmented_audio_save_path"], row["type"], row["file_name"] + ".wav"
    )
    # You might need to replace this call depending on how your model's inference is triggered
    prediction = model(audio_file)

    int_df.at[i, "y"] = prediction["class_names"].numpy().decode("utf-8")
    int_df.at[i, "y_id"] = prediction["class_ids"].numpy()
    # Append row i to the results file; write the header only when the file is created
    int_df.loc[[i]].to_csv(
        SAV_PATH, mode="a", index=False, header=not os.path.exists(SAV_PATH)
    )

Load the predictions together with the transformation metadata.


In [None]:
import pandas as pd

df = pd.read_csv("metadata_with_predict_results_tempo_only.csv")

print(df.shape)
df

## Interpretation pipeline

The interpretation pipeline of EMMIT provides techniques for understanding how the model reacts to the transformed files, including an **accuracy impact summary table**, a **confusion matrix** and a **LIME plot**.
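This notebook does not show how the accuracy impact summary table is produced. As a rough illustration of the idea, here is a minimal pandas sketch (not EMMIT's implementation) that groups the prediction metadata by one transformation parameter and reports the share of files whose predicted dance `y` matches the true dance `type`. The function name `accuracy_by_transformation` and the toy rows are hypothetical.

```python
import pandas as pd


def accuracy_by_transformation(df, param):
    """Accuracy of predictions `y` against true classes `type`, per value of `param`."""
    correct = df["y"] == df["type"]
    return (
        correct.groupby(df[param])
        .agg(accuracy="mean", n_files="size")  # mean of booleans = accuracy
        .reset_index()
    )


# Toy rows for illustration only
toy = pd.DataFrame(
    {
        "type": ["Waltz", "Waltz", "Tango", "Tango"],
        "y": ["Waltz", "Tango", "Tango", "Tango"],
        "rate": [0.7071, 1.4142, 0.7071, 1.4142],
    }
)
toy_table = accuracy_by_transformation(toy, "rate")
print(toy_table)
```

For example, `accuracy_by_transformation(df, "rate")` on the predictions loaded above would give one row per time-stretch rate, which makes it easy to see at which rates accuracy drops.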


### Plotting confusion matrix

You can plot the confusion matrix with the following functions:

- `visualize_in_notebook`: Plot the confusion matrix in the notebook.
- `save_as_file`: Save the confusion matrix as a file.
- `visualize_subtracted_in_notebook`: Plot the confusion matrix with the target confusion matrix subtracted from it.
- `save_subtracted_as_file`: Save the confusion matrix with the target confusion matrix subtracted from it as a file.
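For reference, the quantities behind these plots can be computed directly. The numpy sketch below uses a hypothetical helper, not the `palun` implementation. It counts a confusion matrix with rows as true classes and columns as predicted classes in a fixed class order, then subtracts a target matrix from it, as described for `visualize_subtracted_in_notebook`. Whether `palun` normalizes the matrices is not stated here, so the sketch uses raw counts, which are only comparable when both matrices cover the same files.

```python
import numpy as np


def count_confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes and columns are predicted classes, both in the order of `classes`."""
    position = {name: k for k, name in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for true_label, predicted_label in zip(true_labels, predicted_labels):
        matrix[position[true_label], position[predicted_label]] += 1
    return matrix


# Toy example with two classes and three files
toy_classes = ["Tango", "Waltz"]
toy_true = ["Tango", "Waltz", "Waltz"]
toy_target = count_confusion_matrix(toy_true, ["Tango", "Waltz", "Waltz"], toy_classes)  # original audio
toy_augmented = count_confusion_matrix(toy_true, ["Tango", "Tango", "Waltz"], toy_classes)  # transformed audio

# Positive cells gained files after the transformation; negative cells lost them
toy_difference = toy_augmented - toy_target
print(toy_difference)
```

Here one Waltz recording moves from the correct cell to "predicted Tango" after the transformation, so the Waltz row of the difference reads `[1, -1]`.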

In [None]:
import numpy as np
from emmit import palun

# Create an instance of the ConfusionMatrix class from the palun module.
# A separate variable name avoids shadowing the `palun` module itself.
cm = palun.ConfusionMatrix()

# Keep only the files time-stretched by a rate of 2 ** -0.5 (about 0.7071).
# np.isclose avoids an exact float comparison on values read from the CSV.
filtered_df = df[np.isclose(df["rate"], 2 ** -0.5)]


# Display the confusion matrix in the notebook.
# The method takes two arguments: the true labels and the predicted labels.
cm.visualize_in_notebook(filtered_df["type"], filtered_df["y"])

In [None]:
# Save the confusion matrix for all files in `df` as a PNG file named "confusion_matrix.png".
# The 'type' column holds the true classes and the 'y' column holds the predictions.
cm.save_as_file(
    music_class=df["type"], prediction=df["y"], filename="confusion_matrix.png"
)

In [None]:
import numpy as np

# Input here the confusion matrix that the model originally produced on the same
# (untransformed) data that was fed into the EMMIT framework
target_confusion_matrix = np.array(
    [
        [42, 0, 3, 0, 1, 3, 0, 0],  # True ChaChaCha classified as each dance type
        [2, 12, 4, 1, 1, 0, 2, 0],  # True Jive classified as each dance type
        [1, 0, 30, 3, 0, 0, 2, 0],  # True Quickstep classified as each dance type
        [0, 1, 5, 30, 0, 0, 5, 4],  # True Rumba classified as each dance type
        [0, 2, 4, 0, 30, 2, 0, 0],  # True Samba classified as each dance type
        [2, 0, 1, 1, 0, 27, 3, 0],  # True Tango classified as each dance type
        [0, 0, 0, 4, 0, 0, 26, 0],  # True VienneseWaltz classified as each dance type
        [0, 0, 1, 2, 0, 0, 10, 35],  # True Waltz classified as each dance type
    ]
)

# List of class names in the same order as the rows and columns of the confusion matrix.
classes = [
    "ChaChaCha",
    "Jive",
    "Quickstep",
    "Rumba",
    "Samba",
    "Tango",
    "VienneseWaltz",
    "Waltz",
]

# Visualize the difference between the confusion matrix derived from the 'type' and 'y'
# columns of `filtered_df` and the target confusion matrix. This shows how the
# classifications on the transformed audio deviate from those on the original audio.
cm.visualize_subtracted_in_notebook(
    target_classes=classes,
    target_confusion_matrix=target_confusion_matrix,
    music_class=filtered_df["type"],
    prediction=filtered_df["y"],
)

### Plotting LIME explanation

`EMMIT` also implements LIME (Local Interpretable Model-agnostic Explanations) for interpreting the MIR model. The method trains a logistic regression with Lasso (L1) regularization on the transformation attributes of the audio to approximate the predictions of the underlying CNN. Its goal is to explain a specific prediction of the CNN: the surrogate model keeps its loss and complexity low while closely imitating the original model's predictions.

You can plot the LIME explanation with the following functions:

- `show_lime_explanation`: Plot the LIME explanation in the notebook.
- `save_lime_as_file`: Save the LIME explanation as a file.
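To make the idea concrete, here is a minimal sketch of a LIME-style local surrogate built with scikit-learn rather than EMMIT's `LimeExplainer`, whose internals are not shown here. It assumes that `X` holds numeric transformation attributes, such as the `features` columns used below, and that `predictions` holds the model's class ids. Each file is weighted by an exponential kernel on its standardized distance to the explained instance. An L1-regularized logistic regression then learns whether the model keeps the instance's predicted class, and its non-zero coefficients point to the attributes that matter locally. The function name, kernel choice and default width are assumptions of this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def lime_style_weights(X, predictions, instance_idx, kernel_width=None, C=1.0):
    """Per-attribute weights explaining the prediction for row `instance_idx`."""
    X = np.asarray(X, dtype=float)
    predictions = np.asarray(predictions)
    # Binary target: did the model predict the same class as for the explained instance?
    target = (predictions == predictions[instance_idx]).astype(int)
    if target.min() == target.max():
        return np.zeros(X.shape[1])  # only one class present, nothing to explain
    # Standardize so that no single attribute dominates the distances
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(X.shape[1])
    # Exponential kernel: files close to the instance get weights near 1
    distances = np.linalg.norm(Z - Z[instance_idx], axis=1)
    sample_weight = np.exp(-(distances ** 2) / kernel_width ** 2)
    # L1 (Lasso) penalty keeps the surrogate sparse and therefore interpretable
    surrogate = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    surrogate.fit(Z, target, sample_weight=sample_weight)
    return surrogate.coef_.ravel()


# Toy usage: this "model" depends only on the first attribute
rng = np.random.default_rng(0)
toy_X = rng.normal(size=(200, 3))
toy_predictions = (toy_X[:, 0] > 0).astype(int)
toy_coef = lime_style_weights(toy_X, toy_predictions, instance_idx=0, kernel_width=10.0)
print(toy_coef)
```

In the toy example the first coefficient dominates, because only the first attribute decides the prediction.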


In [None]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()


df = pd.read_csv("metadata_with_predict_results_all.csv")
df["preset_id"] = le.fit_transform(df["preset"])
print(df.columns)
df.head()

In [None]:
from emmit import palun

lime_explanation = palun.LimeExplainer()
features = ["n_semitones", "rate", "hpss", "preset_id"]
instance = df.iloc[0][features]
print(instance)

In [None]:
lime_explanation.show_lime_explanation(
    local_data=df[features],
    predictions=df["y_id"],
    instance=instance,
    features=features,
)

In [None]:
lime_explanation.save_lime_as_file(
    filename="lime_explanation.png",
    local_data=df[features],
    predictions=df["y_id"],
    instance=instance,
    features=features,
)