# Calculating Agreement for Brat annotations

Now that your annotations are ready and you have learned the agreement formulas, let's work through some exercises to calculate the agreement between you and another annotator.

Although the formulas are simple, efficiently counting the cells of the contingency table is not trivial. We have provided an optimized function for you (if you are interested in how we implemented it, see [compare_utils.py](./compare_utils.py)). Let's try it out.


In [1]:
!pip install intervaltree

Collecting intervaltree
  Downloading https://files.pythonhosted.org/packages/e8/f9/76237755b2020cd74549e98667210b2dd54d3fb17c6f4a62631e61d31225/intervaltree-3.0.2.tar.gz
Building wheels for collected packages: intervaltree
  Building wheel for intervaltree (setup.py) ... done
  Stored in directory: /home/gastonq/.cache/pip/wheels/08/99/c0/5a5942f5b9567c59c14aac76f95a70bf11dccc71240b91ebf5
Successfully built intervaltree
Installing collected packages: intervaltree
Successfully installed intervaltree-3.0.2


In [2]:
# import packages
import os
from compare_utils import compare_projects,show_annotations
from IPython.display import HTML

## 1. Initiate the directories and read the annotations

First, we need to specify whose annotations to compare against whose. In Brat, annotations are saved in directories, so this amounts to choosing which directory to compare against which.

If you are not sure what directories you should look for, check the list here:
https://brat.jupyter.med.utah.edu/#/student_folders/

In [3]:
# specify the projects to compare; replace these with your project name and the reference project name
import getpass
annotator_a=getpass.getuser()
annotator_b='goldstandard'

In [4]:
# convert the project names to actual directory paths

brat_projects_loc=os.path.join(os.path.expanduser('~'),'BRAT')
annotator_a_dir=os.path.join(brat_projects_loc, annotator_a)
annotator_b_dir=os.path.join(brat_projects_loc, annotator_b)

# you can print annotator_a_dir and annotator_b_dir to see where they point
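
To make concrete what sits inside those directories, here is a minimal sketch of how the text-bound annotations in a Brat `.ann` file could be read. It assumes the standard Brat standoff layout (`ID<TAB>TYPE START END<TAB>TEXT`). The `Span` tuple and the `parse_ann` helper are hypothetical names used only for illustration; they are not part of `compare_utils.py`, which may read the files differently.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """One text-bound annotation (hypothetical container for this sketch)."""
    ann_id: str
    ann_type: str
    start: int
    end: int
    text: str


def parse_ann(content: str) -> List[Span]:
    """Parse text-bound ('T') annotations from Brat standoff text."""
    spans = []
    for line in content.splitlines():
        parts = line.split('\t', 2)
        # Skip attributes, relations, notes, and malformed lines.
        if len(parts) < 3 or not parts[0].startswith('T'):
            continue
        ann_id, type_and_offsets, text = parts
        ann_type, offsets = type_and_offsets.split(' ', 1)
        # Discontinuous annotations look like "0 5;10 15"; keep the outer bounds.
        fragments = [fragment.split() for fragment in offsets.split(';')]
        start = int(fragments[0][0])
        end = int(fragments[-1][1])
        spans.append(Span(ann_id, ann_type, start, end, text))
    return spans


# Made-up sample content, not taken from the course data.
sample = (
    "T1\tPNEUMONIA 10 19\tpneumonia\n"
    "A1\tNegated T1\n"
    "T2\tCONSOLIDATION 30 43\tconsolidation\n"
)
for span in parse_ann(sample):
    print(span)
```

Collapsing a discontinuous annotation to its outer bounds is a simplification chosen for this sketch; a stricter reader would keep every fragment.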


## 2. Strict comparison

**compare_projects** is the function that wraps the core logic. It takes two to four parameters:
1. Your directory
2. The directory that you want to compare against
3. The comparison method ('strict' or 'relax')
4. Optionally, a list of the annotation types to compare (used in section 3)

Among its return values is a dictionary of evaluators, with the annotation types as keys and an Evaluator object as each value. The Evaluator class holds all the contingency-table numbers we need.

In [5]:
doc_map, evaluators = compare_projects(annotator_a_dir, annotator_b_dir, 'strict')

**compare_projects** returns two values:
1. *doc_map* contains a dictionary that maps a document name to its content text
2. *evaluators* contains a dictionary that maps an annotation type to the corresponding compared results--an object of [Evaluator](./compare_utils.py)
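
As a rough illustration of what the strict counts mean, the sketch below derives a, b, and c with plain set operations. It is not the implementation in `compare_utils.py`: the `strict_counts` function and the `(doc_name, ann_type, start, end)` tuple layout are assumptions made for this example, and the spans are made up. The value d is left as `None`, mirroring the outputs shown in this notebook.

```python
def strict_counts(spans_a, spans_b):
    """Count contingency cells for exact-match comparison.

    Each span is a hashable tuple: (doc_name, ann_type, start, end).
    """
    set_a, set_b = set(spans_a), set(spans_b)
    a = len(set_a & set_b)  # annotated identically by both A and B
    b = len(set_a - set_b)  # annotated by A only
    c = len(set_b - set_a)  # annotated by B only
    d = None                # "neither" is not counted in this sketch
    return a, b, c, d


# Made-up spans for illustration.
spans_a = [
    ('doc1', 'PNEUMONIA', 10, 19),
    ('doc1', 'PNEUMONIA', 40, 49),
    ('doc2', 'CONSOLIDATION', 5, 18),
]
spans_b = [
    ('doc1', 'PNEUMONIA', 10, 19),
    ('doc2', 'CONSOLIDATION', 0, 18),
]
print(strict_counts(spans_a, spans_b))
```

Note that the two CONSOLIDATION spans differ by their start offset, so under strict matching they count once in b and once in c rather than in a.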

Next, let's take a look at what's inside evaluators:

In [15]:
for type_name, evaluator in evaluators.items():
    print(type_name)
    a, b, c, d = evaluator.get_values()
    # now you can print these numbers
    print(a, b, c, d)
    # or display them in a contingency table
    display(evaluator.display_values())

PNEUMONIA_DOC_NO
4 2 0 None


| | B+ | B- |
|----|----|----|
| A+ | 4 | 2.0 |
| A- | 0 | |


PNEUMONIA_DOC_YES
4 0 7 None


| | B+ | B- |
|----|----|----|
| A+ | 4 | 0.0 |
| A- | 7 | |


Now you can calculate your IAA:

In [15]:
# your code goes here:
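
One possible starting point, assuming that F1 (also known as positive specific agreement) is among the agreement formulas you want to use: since d is `None` here, this measure is convenient because it needs only a, b, and c. The helper name `precision_recall_f1` is hypothetical, and annotator B is treated as the reference, so b counts the false positives and c the false negatives.

```python
def precision_recall_f1(a, b, c):
    """Agreement of annotator A against reference B, without using d.

    a: annotated by both, b: by A only, c: by B only.
    """
    precision = a / (a + b) if (a + b) else 0.0
    recall = a / (a + c) if (a + c) else 0.0
    # F1 equals positive specific agreement: 2a / (2a + b + c)
    f1 = 2 * a / (2 * a + b + c) if (2 * a + b + c) else 0.0
    return precision, recall, f1


# Counts taken from the strict comparison output above.
for type_name, (a, b, c) in {
    'PNEUMONIA_DOC_NO': (4, 2, 0),
    'PNEUMONIA_DOC_YES': (4, 0, 7),
}.items():
    precision, recall, f1 = precision_recall_f1(a, b, c)
    print(type_name, round(precision, 3), round(recall, 3), round(f1, 3))
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; you may prefer to report such cases as undefined.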


## 3. Relaxed comparison
When comparing mention-level annotations, relaxed comparison is often more useful: an annotation by annotator A counts as a match if it overlaps with an annotation by annotator B. For instance, "Left lower lobe pneumonia" and "pneumonia" would match.
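
The overlap rule can be sketched as follows. This is an illustration under assumptions, not the code in `compare_utils.py`: spans are `(start, end)` character offsets with an exclusive end, both lists are taken to belong to the same document and annotation type, and the names `overlaps` and `relaxed_counts` are hypothetical.

```python
def overlaps(span_a, span_b):
    """True if two (start, end) spans share at least one character."""
    return span_a[0] < span_b[1] and span_b[0] < span_a[1]


def relaxed_counts(spans_a, spans_b):
    """Count a, b, c where any overlap counts as a match."""
    a = sum(1 for sa in spans_a if any(overlaps(sa, sb) for sb in spans_b))
    b = len(spans_a) - a  # A's spans with no overlapping span from B
    c = sum(1 for sb in spans_b if not any(overlaps(sa, sb) for sa in spans_a))
    return a, b, c, None


text = "Left lower lobe pneumonia"
span_a = (0, len(text))                     # "Left lower lobe pneumonia"
span_b = (text.index("pneumonia"), len(text))  # "pneumonia"

print(span_a == span_b)          # strict: exact offsets required
print(overlaps(span_a, span_b))  # relaxed: overlap is enough
print(relaxed_counts([span_a], [span_b]))
```

This pairwise check is quadratic in the number of annotations, which is fine for a handful of spans. For large projects an interval index, such as the `intervaltree` package installed at the top of this notebook, is one way to speed up the lookup.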

In [10]:
# the code is very similar to the above
doc_map,evaluators = compare_projects(annotator_a_dir, annotator_b_dir, 'relax')

In [11]:
for type_name, evaluator in evaluators.items():
    print(type_name)
    a, b, c, d = evaluator.get_values()
    # now you can print these numbers
    print(a, b, c, d)
    # or display them in a contingency table
    display(evaluator.display_values())

CONSOLIDATION
1 3 1 None


| | B+ | B- |
|----|----|----|
| A+ | 1 | 3.0 |
| A- | 1 | |


EVIDENCE_OF_PNEUMONIA
0 0 13 None


| | B+ | B- |
|----|----|----|
| A+ | 0 | 0.0 |
| A- | 13 | |


LOCAL_INFILTRATE
0 1 1 None


| | B+ | B- |
|----|----|----|
| A+ | 0 | 1.0 |
| A- | 1 | |


PNEUMONIA
0 9 0 None


| | B+ | B- |
|----|----|----|
| A+ | 0 | 9.0 |
| A- | 0 | |


PNEUMONIA_DOC_NO
4 2 0 None


| | B+ | B- |
|----|----|----|
| A+ | 4 | 2.0 |
| A- | 0 | |


PNEUMONIA_DOC_YES
4 0 7 None


| | B+ | B- |
|----|----|----|
| A+ | 4 | 0.0 |
| A- | 7 | |


Now, you can try to calculate your IAA:

If you only want to compare some types, here is the code you can use:

In [12]:
doc_map,evaluators = compare_projects(annotator_a_dir, annotator_b_dir, 'relax',['PNEUMONIA_DOC_NO','PNEUMONIA_DOC_YES'])

In [21]:
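# your code goes here: observed agreement
# a, b, c still hold the values of the last type printed in the loop above (PNEUMONIA_DOC_YES)
# contingency table layout:
#   a b
#   c d
# get_values() returned d as None, so derive it from the total count
# (11 is hard-coded here; replace it with your own total)
d = 11 - (a + b + c)

obs = (a + d) / (a + b + c + d)
print(obs)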



0.36363636363636365
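
Once d is known, chance-corrected measures become available too. The sketch below computes observed agreement and Cohen's kappa from the four cells, assuming kappa is one of the formulas you want to try. The function names are hypothetical, and the second set of counts is made up for illustration.

```python
def observed_agreement(a, b, c, d):
    """Proportion of items on which both annotators agree."""
    return (a + d) / (a + b + c + d)


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 contingency table laid out as [[a, b], [c, d]]."""
    n = a + b + c + d
    p_o = (a + d) / n
    # Agreement expected by chance, from the row and column totals.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if p_e == 1:
        return float('nan')  # kappa is undefined when chance agreement is 1
    return (p_o - p_e) / (1 - p_e)


# Counts from the cell above: a=4, b=0, c=7, d=0.
print(observed_agreement(4, 0, 7, 0))
print(cohen_kappa(4, 0, 7, 0))

# Made-up counts where both cells of agreement are populated.
print(cohen_kappa(20, 5, 10, 15))
```

With a=4, b=0, c=7, d=0, annotator A marks only 4 of the 11 items while B marks all of them, so the chance agreement equals the observed agreement and kappa comes out at (approximately) zero.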


## 4. Show the disagreement

Next, let's find out where the annotators disagree. The Evaluator stores that information as well, so let's display it.

### 4.1 Show the annotations in annotator_a, but not annotator_b (false positive)

In [16]:
for type_name, evaluator in evaluators.items():
    print(type_name)
    print(evaluator.get_values())
    fps=evaluator.get_fps()
    show_annotations(fps, doc_map,annotator_a,annotator_b,900,200)

PNEUMONIA_DOC_NO
(4, 2, 0, None)


HTML(value='<html><div style="background-color:#f9f9f9;padding-left:10px;width: 877px; "><table width=100% ><c…

PNEUMONIA_DOC_YES
(4, 0, 7, None)
	No documents to display.


### 4.2 Show the annotations in annotator_b, but not annotator_a (false negative)

In [14]:
for type_name, evaluator in evaluators.items():
    print(type_name)
    fns=evaluator.get_fns()
    print(evaluator.get_values())
    show_annotations(fns, doc_map,annotator_a,annotator_b,900,200)

PNEUMONIA_DOC_NO
(4, 2, 0, None)
	No documents to display.
PNEUMONIA_DOC_YES
(4, 0, 7, None)


HTML(value='<html><div style="background-color:#f9f9f9;padding-left:10px;width: 877px; "><table width=100% ><c…

<br/><br/>This material was presented as part of the DeCART Data Science for the Health Science Summer Program at the University of Utah in 2019.<br/>
Presenters: Dr. Wendy Chapman, Kelly Peterson, Alec Chapman, Jianlin Shi <br> Acknowledgement: Many thanks to Olga Patterson, from whose previous work part of these materials is adapted.