In this notebook, we track the changes in annotations between _master_ and [v3.0.0-alfa](https://github.com/MTG/otmm_tonic_dataset/tree/v3.0.0-alfa). _v3.0.0-alfa_ is the snapshot of the dataset taken when all the tonic test datasets had been merged but the annotation verification had not yet started.


In [1]:
import urllib2
import json
import numpy as np
from unittests.validate_annotations import test_annotations

In [2]:
# load master data; the context manager closes the file after reading
with open('../annotations.json') as f:
    anno_master = json.load(f)

# load v3.0.0-alfa data
anno_alfa_url = 'https://raw.githubusercontent.com/MTG/otmm_tonic_dataset/v3.0.0-alfa/annotations.json'
response = urllib2.urlopen(anno_alfa_url)
anno_alfa = json.load(response)
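The cell above relies on `urllib2`, which exists only in Python 2. As a minimal sketch, assuming a Python 3 kernel, the same download could be written as follows. `load_json_from_url` is a hypothetical helper name and is not part of the repository.

```python
import json
from urllib.request import urlopen


def load_json_from_url(url):
    """Fetch a URL and parse the response body as JSON (hypothetical helper)."""
    response = urlopen(url)
    try:
        # decode explicitly so the parser always receives text, not bytes
        return json.loads(response.read().decode('utf-8'))
    finally:
        response.close()
```

With this helper, the second assignment in the cell would become `anno_alfa = load_json_from_url(anno_alfa_url)`.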


First, we run the [automatic validation tests](https://github.com/MTG/otmm_tonic_dataset/blob/master/unittests/validate_annotations.py#L11) on _v3.0.0-alfa_ to find how many recordings had annotation inconsistencies before verification.

In [3]:
try:
    test_annotations(anno_alfa)
except AssertionError as err:
    # get the number of mismatches
    print(err)
    num_err = [int(s) for s in err.args[0].split() if s.isdigit()][0]


INFO:root:- Validating 2007 recordings
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/5a44e069-3b0e-4a28-b515-b7b257b6373a
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/428a80a9-d545-4ad3-9324-237368bcf027
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/5f11df16-816d-4253-ae6d-04f6ed0ed309
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/52fb0080-4c80-44c0-858a-46e4a4bded0a
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/d622b79a-a6f8-40f9-b002-3b68890e4e25
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/b0ff86ee-25ca-415e-b377-84038efe43a9
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/9c9d3440-011b-4357-9eb7-303a8aeb5e8f
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/d2ebf684-8b56-418c-866a-0fa6b8acda1e
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/632656b7-6a0f-476

Annotations in 23 recording(s) are inconsistent


As can be seen, _v3.0.0-alfa_ has annotation inconsistencies in 23 of the 2007 recordings according to the automatic tests. Nevertheless, there are additional cases that the automatic validation does not detect:

- In _v3.0.0-alfa_, there were several recordings in which the tonic frequency/symbol varies throughout the recording (e.g. [Isfahan Peşrev by Mesut Cemil](http://musicbrainz.org/recording/ed189797-5c50-4fde-abfa-cb1c8a2a2571)). This variation was not annotated, so the tonic information is only partially true.

  Most of these recordings have been removed, either because of the effort required to re-annotate them or because of the ambiguity about where the change occurs (e.g. in geçiş taksims). See [removed.json](https://github.com/MTG/otmm_tonic_dataset/blob/master/removed.json) for the detailed reasons.
  
- XX

Therefore, the automatic validation result above must be interpreted as a lower bound on the number of recordings with inconsistencies.
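To put that count in proportion, the number of mismatches can be extracted from the error message and expressed as a share of the validated recordings. This is only a sketch: the regular expression assumes the message wording shown in the output above, and `count_mismatches` is a hypothetical helper name.

```python
import re


def count_mismatches(message):
    """Return the mismatch count from the validation message, or None."""
    # assumed wording: "Annotations in <N> recording(s) are inconsistent"
    match = re.search(r'Annotations in (\d+) recording', message)
    return int(match.group(1)) if match else None


message = 'Annotations in 23 recording(s) are inconsistent'
num_err = count_mismatches(message)
num_recs = 2007  # number of recordings reported by the validation log

share = 100.0 * num_err / num_recs
print('%d of %d recordings (%.2f%%) fail the automatic tests'
      % (num_err, num_recs, share))
```

Matching on the surrounding words rather than taking the first integer in the message keeps the count from being confused with any other number that might appear in the text.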

Below, we compare the annotations in _v3.0.0-alfa_ with those in _master_ to see what has changed.

In [4]:
changed_recs = {}
for aa_key, aa_val in anno_alfa.items():
    try:
        # get the relevant recording entry in master
        am_val = anno_master[aa_key]
        changed_recs[aa_key] = {'num_deleted_anno': 0, 'status': 'kept', 
                                'num_added_anno': 0, 'num_unchanged_anno': 0,
                                'num_modified_anno': 0, 'num_auto_anno': 0}
        
        # note automatic annotations in master; they did not exist in v3.0.0-alfa
        for jj, am_anno in reversed(list(enumerate(am_val['annotations']))):
            if 'jointanalyzer' in am_anno['source']:
                changed_recs[aa_key]['num_auto_anno'] += 1
                am_val['annotations'].pop(jj)
        
        # start comparison from v3.0.0-alfa to master
        for ii, aa_anno in reversed(list(enumerate(aa_val['annotations']))):
            for jj, am_anno in reversed(list(enumerate(am_val['annotations']))):
                if aa_anno['source'] == am_anno['source']:  # annotation exists
                    # unchanged anno; allow a change less than 0.06 Hz due to
                    # decimal point rounding
                    if abs(aa_anno['value'] - am_anno['value']) < 0.06:
                        changed_recs[aa_key]['num_unchanged_anno'] += 1
                    else:  # modified anno (by a human verifier)
                        changed_recs[aa_key]['num_modified_anno'] += 1
                        
                        # TODO: note the deviation
                    
                    # pop annotations
                    am_val['annotations'].pop(jj)
                    aa_val['annotations'].pop(ii)
                    break
                    
        # the remainders are human additions and deletions
        changed_recs[aa_key]['num_added_anno'] = len(am_val['annotations'])
        changed_recs[aa_key]['num_deleted_anno'] = len(aa_val['annotations'])
                              
    except KeyError:  # recording is missing from master, i.e. removed
        changed_recs[aa_key] = {'num_deleted_anno': len(aa_val['annotations']),
                                'status': 'removed', 'num_added_anno': 0,
                                'num_modified_anno': 0, 'num_unchanged_anno': 0,
                                'num_auto_anno': 0}
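The matching logic in the cell above can be illustrated on toy data. The sketch below is a simplified variant, not the notebook's exact procedure: it matches annotations by `source` in forward order, works on a copy instead of popping from the loaded dictionaries, and uses the same 0.06 Hz tolerance. The function name and the sample sources are hypothetical.

```python
def compare_annotations(old_annos, new_annos, tol=0.06):
    """Count unchanged, modified, deleted and added annotations.

    Annotations are paired by their 'source'; the inputs are not modified.
    """
    remaining = list(new_annos)  # shallow copy, the caller's list survives
    counts = {'unchanged': 0, 'modified': 0, 'deleted': 0, 'added': 0}
    for old in old_annos:
        # first not-yet-paired new annotation from the same source, if any
        match = next((new for new in remaining
                      if new['source'] == old['source']), None)
        if match is None:
            counts['deleted'] += 1
            continue
        remaining.remove(match)
        if abs(old['value'] - match['value']) < tol:
            counts['unchanged'] += 1
        else:
            counts['modified'] += 1
    # whatever was never paired exists only in the new version
    counts['added'] = len(remaining)
    return counts


# toy data with made-up sources and frequencies
old = [{'source': 'a', 'value': 220.0},
       {'source': 'b', 'value': 440.0},
       {'source': 'c', 'value': 330.0}]
new = [{'source': 'a', 'value': 220.04},
       {'source': 'b', 'value': 446.2},
       {'source': 'd', 'value': 110.0}]
counts = compare_annotations(old, new)
```

Here source `a` counts as unchanged, `b` as modified, `c` as deleted and `d` as added. Because the inputs are left intact, the comparison can be re-run without reloading the JSON files.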


There are also a few new recordings in master; let's add them to the comparison:

In [5]:
new_recs = set(anno_master.keys()) - set(anno_alfa.keys())
for am_key in new_recs:
    am_val = anno_master[am_key]
    changed_recs[am_key] = {'num_deleted_anno': 0, 'status': 'new', 
                            'num_added_anno': 0, 'num_unchanged_anno': 0,
                            'num_modified_anno': 0, 'num_auto_anno': 0}

    # note automatic annotations; they did not exist in v3.0.0-alfa
    for jj, am_anno in reversed(list(enumerate(am_val['annotations']))):
        if 'jointanalyzer' in am_anno['source']:
            changed_recs[am_key]['num_auto_anno'] += 1
            am_val['annotations'].pop(jj)
    
    # the remainders are human additions
    changed_recs[am_key]['num_added_anno'] = len(am_val['annotations'])
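Both cells above separate automatic annotations from human ones with the same loop. As a sketch, that step could be factored into a small helper that returns two lists instead of popping in place. `split_automatic` and the sample source strings are hypothetical; only the `'jointanalyzer'` marker is taken from the code above.

```python
def split_automatic(annotations, marker='jointanalyzer'):
    """Split annotations into (human, automatic) by a marker in 'source'."""
    human = [anno for anno in annotations if marker not in anno['source']]
    auto = [anno for anno in annotations if marker in anno['source']]
    return human, auto


# toy annotations; the source strings are placeholders
annotations = [{'source': 'human_annotator_1', 'value': 220.0},
               {'source': 'jointanalyzer_run', 'value': 220.1},
               {'source': 'human_annotator_2', 'value': 219.9}]
human, auto = split_automatic(annotations)
num_auto_anno = len(auto)
num_added_anno = len(human)
```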

Finally, we report all the differences:

In [None]:
# TODO add statistics

# how many recordings have changed
# how many annotation modifications/additions/deletions in how many recordings
# how many automatic annotations in how many recordings
# how many human modifications/additions/deletions in how many recordings
# distribution of human frequency modifications
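One possible way to produce the first few statistics listed above is to aggregate `changed_recs` by status and by counter field. The sketch below assumes the dictionary layout built in the earlier cells; `summarize` is a hypothetical name, and the sample entries are made up for illustration and are not results from the dataset.

```python
from collections import Counter

FIELDS = ['num_added_anno', 'num_deleted_anno', 'num_modified_anno',
          'num_unchanged_anno', 'num_auto_anno']


def summarize(changed_recs):
    """Aggregate the per-recording counters into dataset-level statistics."""
    recs = list(changed_recs.values())
    # number of recordings per status ('kept', 'removed', 'new')
    status = Counter(rec['status'] for rec in recs)
    # total number of annotations per counter field
    totals = {f: sum(rec[f] for rec in recs) for f in FIELDS}
    # number of recordings in which each counter is non-zero
    affected = {f: sum(1 for rec in recs if rec[f] > 0) for f in FIELDS}
    return status, totals, affected


# made-up entries in the same layout as changed_recs
toy_recs = {
    'rec_a': {'status': 'kept', 'num_added_anno': 1, 'num_deleted_anno': 0,
              'num_modified_anno': 2, 'num_unchanged_anno': 1,
              'num_auto_anno': 1},
    'rec_b': {'status': 'removed', 'num_added_anno': 0, 'num_deleted_anno': 3,
              'num_modified_anno': 0, 'num_unchanged_anno': 0,
              'num_auto_anno': 0},
    'rec_c': {'status': 'new', 'num_added_anno': 2, 'num_deleted_anno': 0,
              'num_modified_anno': 0, 'num_unchanged_anno': 0,
              'num_auto_anno': 1},
}
status, totals, affected = summarize(toy_recs)
```

The distribution of human frequency modifications is not covered here, since the comparison cell does not yet record the deviations (see its TODO).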