# Intro
 
 This notebook shows how to use the utility code provided in this repository.
 
 ### TIP: don't forget to set your API token (`api_token`) before running the cells
 


In [3]:
import os
import sys
import pdb
import pandas as pd
import numpy as np
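# make the repository root importable; '__file__' is a plain string here,
# so this resolves to '../' relative to the notebook's working directory
sys.path.append(os.path.join(os.path.dirname('__file__'), '../'))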
from saraga_utils.saraga import Saraga
from saraga_utils.dataset import Dataset

# Get metadata and file statistics of a dataset

In [4]:
dataset_slug = 'dunya-hindustani-cc'
api_token = '' # get from https://dunya.compmusic.upf.edu/
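
Rather than pasting the token into the notebook, one option is to read it from an environment variable so it never ends up in version control. The sketch below assumes a variable named `DUNYA_API_TOKEN`; that name and the `load_api_token` helper are hypothetical and are not part of this repository.

```python
import os

def load_api_token(env_var="DUNYA_API_TOKEN"):
    """Return the token stored in env_var (hypothetical name), or raise if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Dunya API token."
        )
    return token
```

With the variable set in your shell, `api_token = load_api_token()` can replace the empty string above.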

### Get metadata stats for the dataset

In [5]:
# first create an object of the Dataset class
obj_dataset = Dataset(tradition_slug=dataset_slug, api_token=api_token)

# compute metadata stats
meta_stats = obj_dataset.get_metadata_stats()

# print them to view in notebook
obj_dataset.print_metadata_stats()

Computing metadata stats now...
----------------------------------------
Stats for hindustani tradition:
Total number of unique release are:36
Total number of unique works are:113
Total number of unique raags are:61
Total number of unique taals are:9
Total number of unique forms are:5
Total number of unique layas are:2
Total number of unique artists are:36
Total number of unique album_artists are:11
Total length of the recordings: 43.59 hrs
Total number of recordings 108
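
As a quick sanity check on these numbers, the average recording length follows from the last two lines of the output. The values below are copied by hand from the printed stats; this is plain arithmetic and does not rely on the structure of `meta_stats`, which is not shown here.

```python
total_hours = 43.59     # "Total length of the recordings" from the output above
num_recordings = 108    # "Total number of recordings" from the output above

# convert hours to minutes and divide by the number of recordings
avg_minutes = total_hours * 60 / num_recordings
print(f"Average recording length: {avg_minutes:.1f} min")
```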


### Get file stats for the dataset
NOTE: this step takes quite a bit of time, be patient :) 

In [6]:
# compute file stats (stored separately so that meta_stats is not overwritten)
file_stats = obj_dataset.get_file_stats()

# print them to view in notebook
obj_dataset.print_file_stats()

Computing file stats now...
This function might take some time...
-------------------------------------------------------------
-------------------------------------------------------------
These are the stats for annotation type of files
|                   |   0 |
|:------------------|----:|
| sama-manual       |  75 |
| bpm-manual        |  67 |
| tempo-manual      |  75 |
| sections-manual-p |  75 |
| mphrases-manual   |  53 |
-------------------------------------------------------------
-------------------------------------------------------------
These are the stats for audio_stereo type of files
|     |   0 |
|:----|----:|
| mp3 | 108 |
-------------------------------------------------------------
-------------------------------------------------------------
These are the stats for descriptor type of files
|        |   0 |
|:-------|----:|
| pitch  | 108 |
| ctonic | 108 |


## Don't understand all these file types? Let's fetch an explanation

In [7]:
obj_dataset.explain_filetypes()

There are 3 types of file types in this collection
------------
File type: annotation
There are 5 types of files within this file type.
Slug of the file: sama-manual		, description: Manually annotated sama locations
Slug of the file: bpm-manual		, description: Manually annotated BPM
Slug of the file: tempo-manual		, description: Manually annotated tempo of the recording
Slug of the file: sections-manual-p		, description: Manually annotated sections in the recording
Slug of the file: mphrases-manual		, description: Manually annotated melodic phrases
------------
------------
File type: audio_stereo
There are 1 types of files within this file type.
Slug of the file: mp3		, description: Stereo mix of the recording
------------
------------
File type: descriptor
There are 2 types of files within this file type.
Slug of the file: pitch		, description: Automatically extracted predominant melody
Slug of the file: ctonic		, description: Automatically extracted tonic of the recording
----------

# Now that we understand the file types, let's download them

In [None]:
# note: you can choose any file type that you want to download
# NOTE: if no file types are selected, this downloads ALL the files in the entire dataset.
# Audio and pitch files can be huge, so you need a good amount of disk space (~20GB for both the datasets)!
# We suggest selecting only the file types you actually need.
dir_name = 'temp'
obj_dataset.download_files(dir_name)
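
Since a full download needs roughly 20GB, it can help to check the free disk space first. This is a minimal sketch using only the standard library; `has_free_space` is a hypothetical helper and is not provided by `saraga_utils`.

```python
import shutil

def has_free_space(path=".", required_gb=20):
    """Return True if the filesystem containing `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb
```

For example, you could guard the download cell with `if has_free_space('.', 20): obj_dataset.download_files(dir_name)`.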

### You can also download selected file types

In [16]:
# You can filter in two ways:
# file_types: any of 'annotation', 'audio_stereo', 'descriptor' or 'multitrack'
# thetype: slugs of individual file types, for example 'mphrases-manual' within the annotation file type
obj_dataset.download_files(dir_name, thetype=['mphrases-manual'])
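
After a download finishes, you may want to confirm what actually landed on disk. The directory layout that `download_files` creates is not documented in this notebook, so the sketch below makes no assumption about it and simply walks the target directory recursively, counting files by extension. `count_files_by_suffix` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path

def count_files_by_suffix(root):
    """Count regular files under `root` (recursively), grouped by file extension."""
    return Counter(
        p.suffix or "<none>"          # files without an extension are grouped together
        for p in Path(root).rglob("*")
        if p.is_file()
    )
```

Calling `count_files_by_suffix(dir_name)` returns a `Counter` that maps each extension to the number of files found.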