In this notebook, we track the changes in the annotations between the _master_ branch and [v3.0.0-alpha](https://github.com/MTG/otmm_tonic_dataset/tree/v3.0.0-alpha). _v3.0.0-alpha_ is the pre-release of the dataset in which all the tonic test datasets had been merged, before the annotations were verified.

In [1]:
import urllib2
import json
import numpy as np
from unittests.validate_annotations import test_annotations
import warnings

In [2]:
# load master data
anno_master = json.load(open('../annotations.json'))

# load v3.0.0-alpha data
anno_alpha_url = 'https://raw.githubusercontent.com/MTG/otmm_tonic_dataset/v3.0.0-alpha/annotations.json'
response = urllib2.urlopen(anno_alpha_url)
anno_alpha = json.load(response)
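The cell above targets Python 2. In Python 3, `urllib2` was split into `urllib.request` and `urllib.error`. If you run the notebook on Python 3, the sketch below is one way to load both files. The helper name `load_annotations` is hypothetical and is not part of the dataset's code.

```python
import json
from urllib.request import urlopen


def load_annotations(source):
    """Load an annotations JSON from a local path or from a URL.

    Hypothetical helper: anything containing '://' is treated as a URL
    (http, https, file, ...), and everything else as a local file path.
    """
    if '://' in source:
        with urlopen(source) as response:
            # json.load accepts the binary response stream (Python 3.6+)
            return json.load(response)
    with open(source, encoding='utf-8') as fp:
        return json.load(fp)


# usage, with the same path and URL as in the cell above:
# anno_master = load_annotations('../annotations.json')
# anno_alpha = load_annotations(
#     'https://raw.githubusercontent.com/MTG/otmm_tonic_dataset/'
#     'v3.0.0-alpha/annotations.json')
```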


Next, we run the [automatic validation tests](https://github.com/MTG/otmm_tonic_dataset/blob/master/unittests/validate_annotations.py#L11) on _master_ to find out how many of its recordings have inconsistent annotations.

In [3]:
100-108600/1994.0

45.53660982948846

In [4]:
test_annotations(anno_master)

INFO:root:- Validating 1994 recordings


Next, we run the [automatic validation tests](https://github.com/MTG/otmm_tonic_dataset/blob/master/unittests/validate_annotations.py#L11) on _v3.0.0-alpha_ to find how many recordings had annotation inconsistencies before verification.
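The exact consistency criterion is defined in `unittests/validate_annotations.py` and is not reproduced here. The hypothetical sketch below only illustrates what such a check involves. It assumes two things for illustration: a recording can be tested only when it has at least two annotations, and a recording counts as inconsistent when any two of its tonic frequencies differ by more than `max_cents`. The 20-cent tolerance is an arbitrary illustrative value, and the real test may use different rules.

```python
import math
from itertools import combinations


def cent_distance(f1, f2):
    """Absolute distance between two frequencies (Hz) in cents."""
    return abs(1200.0 * math.log2(f1 / f2))


def find_inconsistent(annotations, max_cents=20.0):
    """Return (sorted ids of inconsistent recordings, number tested).

    Hypothetical criteria, not necessarily those of validate_annotations.py.
    """
    inconsistent = []
    num_tested = 0
    for rec_id, rec in annotations.items():
        values = [anno['value'] for anno in rec['annotations']]
        if len(values) < 2:
            continue  # nothing to compare against
        num_tested += 1
        if any(cent_distance(a, b) > max_cents
               for a, b in combinations(values, 2)):
            inconsistent.append(rec_id)
    return sorted(inconsistent), num_tested


# toy data: 'a' differs by ~4 cents, 'b' by ~100 cents, 'c' is untestable
toy = {
    'a': {'annotations': [{'value': 220.0}, {'value': 220.5}]},
    'b': {'annotations': [{'value': 220.0}, {'value': 233.1}]},
    'c': {'annotations': [{'value': 146.8}]},
}
print(find_inconsistent(toy))  # (['b'], 2)
```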

In [5]:
try:
    test_annotations(anno_alpha)
except AssertionError as err:
    print(err)
    # get the number of mismatches from the error message
    num_err = [int(s) for s in err.args[0].split() if s.isdigit()][0]
    # number of tested recordings, taken from the penultimate warning
    num_tested = 2007 - 1589
    inconsistent_percent = 100.0 * num_err / num_tested
    print("The human annotators show inconsistencies in {:.1f}% of "
          "the {:d} tested recordings.".format(inconsistent_percent, num_tested))
    


INFO:root:- Validating 2007 recordings
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/5a44e069-3b0e-4a28-b515-b7b257b6373a
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/428a80a9-d545-4ad3-9324-237368bcf027
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/5f11df16-816d-4253-ae6d-04f6ed0ed309
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/52fb0080-4c80-44c0-858a-46e4a4bded0a
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/d622b79a-a6f8-40f9-b002-3b68890e4e25
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/b0ff86ee-25ca-415e-b377-84038efe43a9
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/9c9d3440-011b-4357-9eb7-303a8aeb5e8f
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/d2ebf684-8b56-418c-866a-0fa6b8acda1e
ERROR:root:> Mismatch in http://dunya.compmusic.upf.edu/makam/recording/632656b7-6a0f-476

Annotations in 23 recording(s) are inconsistent
The human annotators show inconsistencies in 5.5% of the 418 tested recordings.


As the output shows, the automatic tests find inconsistent annotations in 23 recordings of _v3.0.0-alpha_. These 23 are out of the 418 recordings that could be tested, among 2007 in total. There are, however, additional cases that the automatic validation does not detect. The result above should therefore be read as a lower bound on the number of recordings with inconsistencies. The two main reasons are explained below:

- In _v3.0.0-alpha_, several recordings had a tonic frequency/symbol that varies throughout the recording (e.g. [Isfahan Peşrev by Mesut Cemil](http://musicbrainz.org/recording/ed189797-5c50-4fde-abfa-cb1c8a2a2571)). This variation was not annotated, so the tonic information for these recordings is only partially correct.
    
  Most of these recordings have been removed, for one of two reasons. Either re-annotating them rigorously would have required too much effort, or it is ambiguous where the change occurs (e.g. in geçiş taksims). See [removed.json](https://github.com/MTG/otmm_tonic_dataset/blob/master/removed.json) for the detailed reasons.
  
- As the warning output by the validation test shows, only 2007 - 1589 = 418 recordings (~21% of the dataset) could be validated. The automatic validation therefore does not cover enough of _v3.0.0-alpha_. Note that the human annotators are inconsistent in 5.5% of the tested recordings.

  To compensate, we added the automatic tonic annotations obtained by joint analysis. Then a human annotator carried out a final verification of the recordings with automatic annotations. After verification, 789 automatic annotations (in 782 recordings; see the statistics printed by the last cell below) were found to be correct. The remaining automatic annotations were discarded (<20 annotations, i.e. less than 2.4% error). The accuracy of the automatic annotation method is thus slightly lower than, but comparable to, the 99% accuracy reported in the original paper (Şentürk, S., Gulati, S., and Serra, X., 2013). It is also better than that of the human annotators.
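The percentages quoted in the list above follow from the counts in the cell outputs. The check below treats "<20" as at most 19 discarded automatic annotations, so the automatic error rate it computes is an upper bound.

```python
# counts taken from the validation output and the final statistics
num_recordings_alpha = 2007   # recordings in v3.0.0-alpha
num_untested = 1589           # recordings the validation could not test
num_inconsistent = 23         # recordings with inconsistent human annotations
num_auto_correct = 789        # automatic annotations kept after verification
max_auto_discarded = 19       # "<20" discarded automatic annotations

num_tested = num_recordings_alpha - num_untested
coverage = 100.0 * num_tested / num_recordings_alpha
human_inconsistency = 100.0 * num_inconsistent / num_tested
# at most 19 of the (789 + 19) verified automatic annotations were wrong
max_auto_error = 100.0 * max_auto_discarded / (num_auto_correct + max_auto_discarded)

print('tested: %d (%.1f%% of the dataset)' % (num_tested, coverage))  # tested: 418 (20.8% of the dataset)
print('human inconsistency: %.1f%%' % human_inconsistency)           # human inconsistency: 5.5%
print('automatic error: at most %.2f%%' % max_auto_error)            # automatic error: at most 2.35%
```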

Reference
----------
Şentürk, S., Gulati, S., and Serra, X. (2013). Score informed tonic identification for makam music of Turkey. In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR 2013), pages 175–180, Curitiba, Brazil.

Below, we compare the annotations in _v3.0.0-alpha_ with those in _master_ to see what has changed.

In [6]:
rec_stats = {}
for aa_key, aa_val in anno_alpha.items():
    if aa_key not in anno_master:  # recording removed from master
        rec_stats[aa_key] = {'num_deleted_anno': len(aa_val['annotations']),
                             'status': 'removed', 'num_added_anno': 0,
                             'num_modified_anno': 0, 'num_unchanged_anno': 0,
                             'num_auto_anno': 0, 'verified': True}
        continue

    # get the relevant recording entry in master
    am_val = anno_master[aa_key]
    rec_stats[aa_key] = {'num_deleted_anno': 0, 'status': 'kept',
                         'num_added_anno': 0, 'num_unchanged_anno': 0,
                         'num_modified_anno': 0, 'num_auto_anno': 0,
                         'verified': am_val['verified']}

    # work on copies so that anno_alpha and anno_master are left intact
    aa_annos = list(aa_val['annotations'])
    # automatic annotations did not exist in v3.0.0-alpha; count and exclude them
    am_annos = [a for a in am_val['annotations']
                if 'jointanalyzer' not in a['source']]
    rec_stats[aa_key]['num_auto_anno'] = len(am_val['annotations']) - len(am_annos)

    # start comparison from v3.0.0-alpha to master
    for ii, aa_anno in reversed(list(enumerate(aa_annos))):
        for jj, am_anno in reversed(list(enumerate(am_annos))):
            if aa_anno['source'] == am_anno['source']:  # annotation exists
                # unchanged anno; allow a deviation below 0.06 Hz due to
                # decimal point rounding
                if abs(aa_anno['value'] - am_anno['value']) < 0.06:
                    rec_stats[aa_key]['num_unchanged_anno'] += 1
                else:  # modified anno (by a human verifier)
                    rec_stats[aa_key]['num_modified_anno'] += 1
                    # TODO: note the deviation

                # remove the matched pair from further comparison
                am_annos.pop(jj)
                aa_annos.pop(ii)
                break

    # the remainders are human additions and deletions
    rec_stats[aa_key]['num_added_anno'] = len(am_annos)
    rec_stats[aa_key]['num_deleted_anno'] = len(aa_annos)
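The per-recording classification in the cell above can be illustrated on toy data with a small standalone function. `diff_recording` is a hypothetical helper, and the source names in the example are made up. One difference in tie-breaking: this sketch pairs each _v3.0.0-alpha_ annotation with the first unmatched _master_ annotation from the same source, while the cell above scans in reverse order. The two only diverge if a source appears more than once in a recording.

```python
def diff_recording(alpha_annos, master_annos, tol_hz=0.06):
    """Count how the annotations of one recording changed between versions.

    Master annotations whose source contains 'jointanalyzer' are counted as
    automatic; the rest are paired with v3.0.0-alpha annotations by 'source'.
    """
    counts = {'unchanged': 0, 'modified': 0, 'added': 0,
              'deleted': 0, 'auto': 0}

    # automatic annotations did not exist in v3.0.0-alpha
    human_master = []
    for anno in master_annos:
        if 'jointanalyzer' in anno['source']:
            counts['auto'] += 1
        else:
            human_master.append(anno)

    for aa_anno in alpha_annos:
        # pair with the first unmatched master annotation from the same source
        match = next((am for am in human_master
                      if am['source'] == aa_anno['source']), None)
        if match is None:
            counts['deleted'] += 1  # present only in v3.0.0-alpha
            continue
        human_master.remove(match)
        if abs(aa_anno['value'] - match['value']) < tol_hz:
            counts['unchanged'] += 1  # difference explained by rounding
        else:
            counts['modified'] += 1  # changed by a human verifier

    counts['added'] = len(human_master)  # present only in master
    return counts


# toy recording with hypothetical source names
alpha = [{'source': 'annotator_a', 'value': 146.83},
         {'source': 'annotator_b', 'value': 293.7},
         {'source': 'annotator_c', 'value': 110.0}]
master = [{'source': 'annotator_a', 'value': 146.8},
          {'source': 'annotator_b', 'value': 146.8},
          {'source': 'annotator_d', 'value': 146.8},
          {'source': 'jointanalyzer', 'value': 146.9}]
print(diff_recording(alpha, master))
# {'unchanged': 1, 'modified': 1, 'added': 1, 'deleted': 1, 'auto': 1}
```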


_master_ also contains a few recordings that are not in _v3.0.0-alpha_; let's add them to the comparison as well:

In [7]:
new_recs = set(anno_master.keys()) - set(anno_alpha.keys())
for am_key in new_recs:
    am_val = anno_master[am_key]
    # count automatic annotations; they did not exist in v3.0.0-alpha
    num_auto = sum('jointanalyzer' in a['source'] for a in am_val['annotations'])
    rec_stats[am_key] = {'num_deleted_anno': 0, 'status': 'new',
                         # the remainders are human additions
                         'num_added_anno': len(am_val['annotations']) - num_auto,
                         'num_unchanged_anno': 0, 'num_modified_anno': 0,
                         'num_auto_anno': num_auto,
                         'verified': am_val['verified']}

Finally, we report all the differences:

In [8]:
# removed 
rm_recs_in_json = json.load(open('../removed.json')).keys()

# TODO add statistics
num_removed_rec = 0
num_new_rec = 0

num_changed_rec = 0  # num recordings with changes, incl. automatic annotations
num_human_changed_rec = 0  # num recordings with human changes

num_anno = 0  # total number of annotations
num_verified_anno = 0  # total number of verified annotations
num_human_verified_anno = 0  # total number of annotations verified by humans

num_additions = 0  # number of added annotations
num_deletions = 0  # number of deleted annotations
num_modifications = 0  # number of modified annotations
num_unchanged = 0  # number of unchanged annotations
num_auto = 0  # number of automatic annotations

num_rec_add = 0  # number of recordings with additions
num_rec_del = 0  # number of recordings with deletions
num_rec_mod = 0  # number of recordings with modification
num_rec_auto = 0  # number of recordings with automatic annotations

for rk, rs in rec_stats.items():
    # get the number of removed and new recordings
    if rs['status'] == 'removed':
        num_removed_rec += 1
        if rk not in rm_recs_in_json:  # verify they are listed in removed.json
            warnings.warn('%s is removed but not listed in removed.json' % rk)
    elif rs['status'] == 'new':
        num_new_rec += 1
        
    num_anno += (rs['num_added_anno'] + rs['num_auto_anno'] + 
                 rs['num_modified_anno'] + rs['num_unchanged_anno'])
    
    # how many recordings have changed
    if any([rs['num_added_anno'], rs['num_auto_anno'], 
            rs['num_deleted_anno'], rs['num_modified_anno']]):
        num_changed_rec += 1
        num_verified_anno += (rs['num_added_anno'] + rs['num_auto_anno'] + 
                              rs['num_modified_anno'] + rs['num_unchanged_anno'])
        
    # how many recordings have changed only by humans
    if any([rs['num_added_anno'], rs['num_deleted_anno'], rs['num_modified_anno']]):
        num_human_changed_rec += 1
        num_human_verified_anno += (rs['num_added_anno'] + rs['num_auto_anno'] + 
                                   rs['num_modified_anno'] + rs['num_unchanged_anno'])
        if not rs['verified']:
            warnings.warn("%s has changes but verified flag is False" % rk)
        
    # how many automatic annotations in how many recordings
    num_auto += rs['num_auto_anno']
    num_rec_auto += rs['num_auto_anno'] > 0
    
    # how many annotation modifications/additions/deletions in how many recordings
    num_additions += rs['num_added_anno']
    num_rec_add += rs['num_added_anno'] > 0
    num_deletions += rs['num_deleted_anno']
    num_rec_del += rs['num_deleted_anno'] > 0
    num_modifications += rs['num_modified_anno']
    num_rec_mod += rs['num_modified_anno'] > 0
    
    # how many unchanged annotations
    num_unchanged += rs['num_unchanged_anno']
    
    # distribution of human frequency modifications
    
# print 
print('In master, there are %d annotations in total in %d recordings.' 
      % (num_anno, len(anno_master)))
print('Since v3.0.0-alpha, %d recordings are removed and %d new recordings are added.'
      % (num_removed_rec, num_new_rec))

print('%d recordings are changed (incl. automatic annotations). '
      '%d of the annotations are verified in these recordings.' 
      % (num_changed_rec, num_verified_anno))

print('%d annotations in %d recordings are changed by humans in total. '
      '%d of the annotations are verified by humans in these recordings.' 
      % (num_additions + num_deletions + num_modifications, 
         num_human_changed_rec, num_human_verified_anno))

print('%d annotations are added to %d recordings by humans.' %(num_additions, num_rec_add))
print('%d annotations are deleted from %d recordings by humans.' %(num_deletions, num_rec_del))
print('%d annotations are modified in %d recordings by humans.' %(num_modifications, num_rec_mod))

print('%d automatic annotations are added to %d recordings.' %(num_auto, num_rec_auto))


In master, there are 3262 annotations in total in 1994 recordings.
Since v3.0.0-alpha, 15 recordings are removed and 2 new recordings are added.
805 recordings are changed (incl. automatic annotations). 1932 of the annotations are verified in these recordings.
98 annotations in 83 recordings are changed by humans in total. 174 of the annotations are verified by humans in these recordings.
12 annotations are added to 11 recordings by humans.
25 annotations are deleted from 23 recordings by humans.
61 annotations are modified in 56 recordings by humans.
789 automatic annotations are added to 782 recordings.
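The printed statistics can be cross-checked against each other and against the recording counts reported by the validation cells. The short check below uses only numbers that appear in the outputs above.

```python
# counts copied from the outputs of the validation cells and the last cell
num_rec_alpha = 2007   # recordings validated in v3.0.0-alpha
num_rec_master = 1994  # recordings validated in master
num_removed, num_new = 15, 2
human_changes = {'added': 12, 'deleted': 25, 'modified': 61}

# v3.0.0-alpha minus the removed recordings plus the new ones gives master
assert num_rec_alpha - num_removed + num_new == num_rec_master
# the per-type human changes add up to the reported total of 98
assert sum(human_changes.values()) == 98
print('recording and change counts are consistent')
```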
