# Arab-Andalusian Corpus - Nawba Recognition using Templates from Scores

This notebook runs several experiments that evaluate how well templates recognize the nawba of an Arab-Andalusian recording. Each template is synthesized from several folded pitch class distributions belonging to one nawba, using Gaussian distributions. The folded pitch distribution of a track is compared with every template, and the best-matching template predicts the nawba. 
The experiments test different distance measures and standard deviation values.
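
The template synthesis described above can be sketched as follows. This is a minimal illustration and not the code in `utilities.experiments`: it assumes that pitch classes are expressed in cents above the tonic, that folding means wrapping into a single 1200-cent octave, and that the standard deviation is also given in cents. The function name `gaussian_template`, the bin width and the pitch-class values are hypothetical.

```python
import math

def gaussian_template(centers_cents, std_cents, bin_width=10):
    """Sum one wrapped Gaussian per pitch class over a 1200-cent octave.

    Hypothetical helper: the bin layout and the normalization are assumptions.
    """
    n_bins = 1200 // bin_width
    template = [0.0] * n_bins
    for i in range(n_bins):
        x = i * bin_width
        for center in centers_cents:
            # circular distance, so that 1190 cents and 10 cents are 20 cents apart
            d = abs(x - center) % 1200
            d = min(d, 1200 - d)
            template[i] += math.exp(-0.5 * (d / std_cents) ** 2)
    # normalize so that the template sums to 1, like a distribution
    total = sum(template)
    return [value / total for value in template]

# made-up pitch classes (cents above the tonic), for illustration only
template = gaussian_template([0, 200, 400, 500, 700, 900, 1100], std_cents=30)
```

A larger standard deviation widens each peak, which makes the template more tolerant of intonation differences but blurs neighbouring pitch classes together.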

## Initialization (MANDATORY)

This cell loads all the libraries. 
It then checks whether the metadata of the Arab-Andalusian corpus of Dunya has already been downloaded, and downloads it if necessary. 
Finally, it creates an object to manage the Dunya metadata.

#### NB: Before running, remember to add the Dunya token to the costants.py file, which is located in the "utilities" directory.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.ioff()
import numpy as np

from shutil import copyfile
from utilities.recordingcomputation import *
from utilities.dunyautilities import *
from utilities.metadataStatistics import *
from utilities.generalutilities import *
from utilities.experiments import *

# download metadata from Dunya
if not check_dunya_metadata():
    print("Downloading metadata from Dunya...")
    collect_metadata()

# create an object with all the well-structured metadata
print("Analyzing Dunya Metadata...")
cm = CollectionMetadata()
print("Collection of metadata created")

## Dataset creation (MANDATORY)

An empty object to manage the dataset of the experiments is created.
Then, a list of recordings is imported from a CSV file and added to the dataset.

In [None]:
# create an empty dataset object
do = DataSet(cm)
# alternative input file: "dataset_test.csv"
csv_filename = "dataset_nawba_77_recordings.csv"
# add the recording MBIDs listed in the external CSV file to the dataset
do.import_dataset_from_csv(csv_filename)

## Nawba Recognition

The parameters of the experiments are defined here. One object is created for each experiment and appended to a list.

The available distance measures are: "city block (L1)", "euclidean (L2)", "correlation", "canberra".

The standard deviation values tested are 20, 30 and 40, but they could be changed.
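
For reference, the four distance measures and the best-match rule can be sketched in plain Python. This is only an illustration using the standard textbook definitions; the actual implementation in `utilities.experiments` may differ. The helper `predict_nawba`, the `DISTANCES` mapping and the toy distributions are hypothetical.

```python
import math

def city_block(p, q):
    # L1 distance: sum of absolute differences
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    # L2 distance: square root of the summed squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def correlation(p, q):
    # 1 - Pearson correlation; undefined if either vector is constant
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    num = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p)
                    * sum((b - mean_q) ** 2 for b in q))
    return 1.0 - num / den

def canberra(p, q):
    # bins where both values are zero are skipped (0/0 is treated as 0)
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(p, q) if a != 0 or b != 0)

DISTANCES = {
    "city block (L1)": city_block,
    "euclidean (L2)": euclidean,
    "correlation": correlation,
    "canberra": canberra,
}

def predict_nawba(track, templates, measure):
    """Return the name of the template closest to the track distribution."""
    distance = DISTANCES[measure]
    return min(templates, key=lambda name: distance(track, templates[name]))

# toy distributions with four bins, for illustration only
templates = {"nawba_a": [0.5, 0.1, 0.3, 0.1], "nawba_b": [0.1, 0.5, 0.1, 0.3]}
track = [0.45, 0.15, 0.3, 0.1]
predicted = predict_nawba(track, templates, "euclidean (L2)")
```

The four measures weight the bins differently: canberra, for example, normalizes each bin by its magnitude, so small bins contribute as much as the main peaks.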


In [None]:
distance_measures_list = ["euclidean (L2)"] #["city block (L1)", "euclidean (L2)", "correlation", "canberra"]
random_state = 20
std_list = [30] #[20,30,40]
experiment_name = "nr_L2_20_30_with_correct_nawba"
source_dir = os.path.join(EXPERIMENT_DIR, experiment_name)
sub_experiment_prefix = "exp"

# create seven experiment objects, named exp_1 ... exp_7
experiment_list = list()
for i_element in range(7):
    sub_experiment_name = "{}_{}".format(sub_experiment_prefix, i_element + 1)
    experiment_list.append(Nawba_Recognition_Experiment(do, i_element, random_state, std_list, distance_measures_list, sub_experiment_name, source_dir))

This cell runs the experiments. If plot_flag is True, the plots of the templates and of the best matches are saved as PNG files in the experiment directory.

In [None]:
plot_flag = True
zip_path = "dataset_nawba_77_recordings.zip"

# check that all the files needed for each recording of the dataset are available
recordings_with_missing_files = experiment_list[0].get_recordings_without_experiment_files()
# if some are missing, extract them from the zip archive
if len(recordings_with_missing_files) != 0:
    extract_files_from_zip(RECORDINGS_DIR, zip_path)
    # second check after unzipping
    recordings_with_missing_files = experiment_list[0].get_recordings_without_experiment_files()
    if len(recordings_with_missing_files) != 0:
        raise FileNotFoundError("Files are still missing for: {}".format(recordings_with_missing_files))

# run the experiments and print the summary of each one
for counter, experiment in enumerate(experiment_list, start=1):
    print()
    print("exp_{} results: ".format(counter))
    experiment.run(plot_flag=plot_flag)
    experiment.compute_summary()
    print()
    print(experiment.df_summary)

The overall results are computed, and the results of every experiment are exported as CSV files stored in the experiment directory. The confusion matrix of the best parameter combination is plotted to a PNG file.  

In [None]:
# compute and print overall results
df_overall =  experiment_list[0].df_summary
for index in range(len(experiment_list)-1):
    df_overall = df_overall.add(experiment_list[index+1].df_summary)
df_overall = df_overall.divide(len(experiment_list))
print(df_overall)

# export the results
export_overall_experiment(experiment_list)
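
As a point of reference for the confusion matrix mentioned above, the following sketch tallies one from lists of true and predicted nawba labels. It is a generic illustration, not the code behind `export_overall_experiment`; the function names and the label values are hypothetical.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Rows are true labels, columns are predicted labels."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix

def accuracy(matrix):
    # correct predictions lie on the diagonal
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# made-up labels, for illustration only
true_nawba = ["a", "a", "b", "b", "c"]
predicted_nawba = ["a", "b", "b", "b", "c"]
labels, matrix = confusion_matrix(true_nawba, predicted_nawba)
overall_accuracy = accuracy(matrix)
```

Off-diagonal cells show which nawbas are confused with each other, which a single accuracy figure hides.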

## Test

#### Export the dataset to JSON so that it can be used in the machine learning experiments

In [None]:
import json
import os
import random

rmbid_list = do.get_rmbid_list()
list_of_dicts = list()

for rmbid in rmbid_list:

    df_sections = cm.get_description_of_single_recording(rmbid)
    column_list = ["tab", "tonic", "start_time", "end_time"]

    # one dict per section of the recording (assumes a default 0..n-1 row index)
    section_dict = list()
    for row_index in range(len(df_sections.index)):
        row_dict = dict()
        for column in ("tab", "start_time", "end_time"):
            row_dict[column] = df_sections.loc[row_index, column]
        # TODO: add the real value; a random placeholder tonic is used for now
        row_dict["tonic"] = random.uniform(100, 200)
        section_dict.append(row_dict)

    #df_reduced = pd.DataFrame(0, columns = column_list, index = df_sections.index.values.tolist())
    #for row_index in range(len(df_sections.index.values.tolist())):
    #    df_reduced.loc[row_index, column_list[0]] = df_sections.loc[row_index, column_list[0]] 
    #    df_reduced.loc[row_index, column_list[2]] = df_sections.loc[row_index, column_list[2]] 
    #    df_reduced.loc[row_index, column_list[3]] = df_sections.loc[row_index, column_list[3]] 
        # TODO: load real value
    #    df_reduced.loc[row_index, column_list[3]] = random.uniform(100, 200)

    main_dict = dict()
    main_dict["mbid"] = rmbid
    main_dict["nawba"] = df_sections.loc[0,"nawba"]
    main_dict["section"] = section_dict
    #print(main_dict["section"])
    list_of_dicts.append(main_dict)

json_path = os.path.join(DATA_DIR, "dataset_77_tab_tonic.json")
with open(json_path, 'w') as outfile:
    json.dump(list_of_dicts, outfile)
    

#### Extract the pitch distributions and XML scores of the dataset

In [None]:
rmbid_list = do.get_rmbid_list()
dir_name = "dataset"
main_path = os.path.join(DATA_DIR, dir_name)
for rmbid in rmbid_list:
    # create a directory for each rmbid (no error if it already exists)
    rmbid_dir = os.path.join(main_path, rmbid)
    os.makedirs(rmbid_dir, exist_ok=True)
    input_dir = os.path.join(RECORDINGS_DIR, rmbid)
    input_pd = os.path.join(input_dir, FN_PD)
    score_name = "{}.xml".format(rmbid)
    input_score = os.path.join(input_dir, score_name)
    output_pd = os.path.join(rmbid_dir, FN_PD)
    output_score = os.path.join(rmbid_dir, score_name)
    
    copyfile(input_pd, output_pd)
    copyfile(input_score, output_score)
    

In [None]:
import zipfile
import os