# Arab-Andalusian Collection - Metadata
This notebook groups, visualizes and analyzes the metadata of the Arab-Andalusian corpus in order to extract general statistics. It has four main parts: (1) a widget for analyzing the corpus metadata grouped by one musical characteristic (nawba, tab, mizan or form); (2) a widget that combines two musical characteristics; (3) code that extracts general statistics for the whole collection; and (4) a widget for analyzing a single recording.

## Initialization (MANDATORY)
This cell loads the required libraries.
It then checks whether the metadata of the Arab-Andalusian corpus in Dunya has already been downloaded and, if not, downloads all of it.
Finally, it creates an object that manages the Dunya metadata.

#### NB: Before running the notebook, remember to add your Dunya token to the constants.py file in the "utilities" directory.
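The "download only if missing" step works like a local cache. The sketch below is a hypothetical illustration, not the code of `check_dunya_metadata()` or `collect_metadata()`: it assumes the metadata fits in a single JSON file, and both the file path and the `download` callback are placeholders.

```python
import json
import os
from typing import Callable


def ensure_metadata(path: str, download: Callable[[], dict]) -> dict:
    """Return cached metadata, calling `download` only if no local copy exists.

    Hypothetical sketch: `path` and `download` are placeholders, not the
    notebook's real helpers.
    """
    if not os.path.exists(path):
        # no local copy yet: fetch everything once and store it on disk
        data = download()
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
        return data
    # local copy found: reuse it without contacting the server
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

On the first call the callback runs and its result is written to disk; later calls read the file and never call `download` again.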

In [37]:
%load_ext autoreload

%autoreload 2

In [1]:
from utilities.recordingcomputation import *
from utilities.dunyautilities import *
from utilities.metadataStatistics import *
from utilities.generalutilities import *

from gui.gui_corpora import *
from gui.gui_metadata import *

# download metadata from Dunya
if not check_dunya_metadata():
    print("Downloading metadata from Dunya...")
    collect_metadata()

# create an object with all the well-structured metadata
print("Analyzing Dunya Metadata...")
cm = CollectionMetadata()
print("Collection of metadata created")

Analyzing Dunya Metadata...
Collection of metadata created


## Statistics by nawba, tab, mizan and form
This cell groups all the metadata by each characteristic (nawba, tab, mizan and form). For each value of the selected characteristic, it shows the number of recordings and sections of that type, together with the overall and average section duration. These values are also plotted in a histogram below the table.
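The per-characteristic summary amounts to a group-by over sections. The sketch below is illustrative only and does not reproduce `VisualizeDataframeGui`: it assumes each section is a plain dict with an `mbid`, the four characteristics and a `duration` in seconds. Recordings without sections produce no section records, so they drop out automatically.

```python
from collections import defaultdict


def stats_by(sections, key):
    """Group section records by `key` and summarize each group.

    Each section is assumed (for illustration) to look like
    {"mbid": ..., "nawba": ..., "tab": ..., "mizan": ..., "form": ..., "duration": seconds}.
    """
    groups = defaultdict(list)
    for s in sections:
        groups[s[key]].append(s)

    summary = {}
    for value, secs in groups.items():
        total = sum(s["duration"] for s in secs)
        summary[value] = {
            "recordings": len({s["mbid"] for s in secs}),  # distinct recordings
            "sections": len(secs),
            "total_time": total,
            "avg_section_time": total / len(secs),
        }
    return summary
```

For example, two sections of 120 s and 60 s from the same recording with nawba "A" give `{"recordings": 1, "sections": 2, "total_time": 180, "avg_section_time": 90.0}` for "A".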
#### Visualize - NB: recordings without section descriptions are not considered.

In [2]:
vd = VisualizeDataframeGui(cm)


## Cross Statistics

This cell extracts statistics for combinations of two musical characteristics (for example, nawba and mizan).
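Combining two characteristics is essentially a cross tabulation. The sketch below counts sections for every pair of values; it is not the implementation of `CrossMetadataVisualization`, and the dict-per-section layout is an assumption made for illustration.

```python
from collections import Counter


def cross_counts(sections, row_key, col_key):
    """Count sections for every (row_key, col_key) pair, e.g. (nawba, mizan).

    Returns sorted row labels, sorted column labels and a nested list of
    counts; pairs that never occur are reported as 0.
    """
    pairs = Counter((s[row_key], s[col_key]) for s in sections)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for missing pairs without adding them
    table = [[pairs[(r, c)] for c in cols] for r in rows]
    return rows, cols, table
```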

#### Visualize - NB: recordings without section descriptions are not considered.

In [5]:
cr = CrossMetadataVisualization(cm)


## General Statistics
These statistics are computed over all the recordings in the collection.
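Two of these checks, recordings missing a field and recordings whose sections disagree on a characteristic, reduce to simple scans over the metadata. The helpers below are hypothetical stand-ins for `get_recordings_without()` and `get_recordings_with_diff_()`; the data layout (an MBID-to-metadata dict and a list of section dicts) is assumed.

```python
def recordings_without(recordings, field):
    """Return the MBIDs of recordings whose `field` is missing or empty.

    `recordings` maps an MBID to a dict of metadata fields (assumed layout).
    """
    return [mbid for mbid, meta in recordings.items() if not meta.get(field)]


def recordings_with_mixed(sections, key):
    """Return the MBIDs whose sections carry more than one distinct `key` value."""
    values = {}
    for s in sections:
        values.setdefault(s["mbid"], set()).add(s[key])
    return sorted(mbid for mbid, vals in values.items() if len(vals) > 1)
```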

In [6]:
n_recordings = len(cm.df_recording)
print("Overall computable time (only recordings with sections) = " + get_time(cm.get_overall_sections_time()))
print(f"Number of recordings without sections = {len(cm.mbid_no_sections)}/{n_recordings}")
# possible arguments of get_recordings_without(): 'archive_url', 'musescore_url', 'title', 'transliterated_title'
print(f"Number of recordings without score = {len(cm.get_recordings_without('musescore_url'))}/{n_recordings}")
print(f"Number of recordings without archive_url = {len(cm.get_recordings_without('archive_url'))}/{n_recordings}")
print()
print(f"Recordings with different nawbas in the same track:{cm.get_recordings_with_diff_('nawba')}")
print(f"Recordings with different tab in the same track:{cm.get_recordings_with_diff_('tab')}")

Overall computable time (only recordings with sections) = 100:11:01
Number of recordings without sections = 6/164
Number of recordings without score = 6/164
Number of recordings without archive_url = 6/164

Recordings with different nawbas in the same track:['0386e377-7212-43e5-89b6-7f4c42d0ae74']
Recordings with different tab in the same track:['0386e377-7212-43e5-89b6-7f4c42d0ae74']
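The overall time is printed as hours:minutes:seconds, with the hour count allowed to exceed 24 (100:11:01). A hypothetical helper producing that format from a number of seconds could look like this; it is not the notebook's `get_time`, whose implementation is not shown here.

```python
def format_hms(total_seconds):
    """Format a duration in seconds as H:MM:SS, without wrapping hours at 24."""
    hours, rem = divmod(int(round(total_seconds)), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print(format_hms(360661))  # 100:11:01
```

Plain `str(datetime.timedelta(seconds=360661))` would instead give "4 days, 4:11:01", which is why the hours are computed with `divmod`.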


## Single recording information
This cell lets you browse and select a single recording in order to find its MusicBrainz ID (MBID), which is needed in the next cell.
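Finding a recording by its characteristics is a filter over the metadata. This is a hypothetical sketch rather than the logic of `SelectionGui`: it assumes a dict mapping each MBID to its metadata fields and keeps the MBIDs matching every requested value.

```python
def select_mbids(recordings, **criteria):
    """Return the MBIDs of recordings matching every given characteristic.

    Example call: select_mbids(recs, nawba="...", tab="...").
    `recordings` maps MBID -> metadata dict (assumed layout).
    """
    return [
        mbid
        for mbid, meta in recordings.items()
        if all(meta.get(k) == v for k, v in criteria.items())
    ]
```

With no criteria every MBID is returned; each extra keyword narrows the selection.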

In [31]:
selector = SelectionGui(cm, 10)


Set the variable "rmbid" to a MusicBrainz ID to visualize the characteristics of that recording.
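MusicBrainz IDs are UUIDs, so a mistyped "rmbid" can be caught before any lookup. The check below is an optional, hypothetical addition built on Python's standard `uuid` module; it accepts only the canonical lower-case, hyphenated form.

```python
import uuid


def is_valid_mbid(value):
    """Return True if `value` is a canonical (lower-case, hyphenated) UUID string."""
    try:
        # uuid.UUID also accepts braces or upper case; comparing with str()
        # keeps only the canonical spelling
        return str(uuid.UUID(value)) == value
    except (ValueError, AttributeError, TypeError):
        return False
```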

In [39]:
rmbid = 'b3059282-a235-4fa4-9093-cb16a70d4b5d'  # MusicBrainz ID of the recording to inspect
srv = SingleRecordingVisualization(cm, rmbid)


## Test 