# How to read human annotations

In [1]:
import json
import pandas as pd

## Raw annotations in CoNLL 

In [2]:
raw_annotations_conll = pd.read_csv('raw_annotations_conll_dev.tsv', sep='\t')
raw_annotations_conll.head()

Unnamed: 0,id,task,token,worker,label,class_label
0,5206:0,0002769553--650327fb84b28317ce825e08-0,Photo,8cbc1164ecffabf180b33958b13904d8,0,Kill
1,5206:0,0002769553--650327fb84b28317ce825e08-1,",",8cbc1164ecffabf180b33958b13904d8,0,Kill
2,5206:0,0002769553--650327fb84b28317ce825e08-2,Dueling,8cbc1164ecffabf180b33958b13904d8,0,Kill
3,5206:0,0002769553--650327fb84b28317ce825e08-3,Over,8cbc1164ecffabf180b33958b13904d8,0,Kill
4,5206:0,0002769553--650327fb84b28317ce825e08-4,a,8cbc1164ecffabf180b33958b13904d8,0,Kill


Each row in the dataframe corresponds to one token.

| Field | Explanation |
| ------------- |:-------------:|
| **id** | two numbers separated by a colon: the first is the document id in the dataset, the second is the index of the relation within the document |
| **task** | id of the crowdsourcing task; the last number is the index of the token in the document |
| **token** | the token text |
| **worker** | id of the worker, unique for each worker |
| **label** | binary human annotation of the token: 1 means the token is important, 0 means it is not |
| **class_label** | relation label of the document |
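The `id` and `task` fields pack several values into one string, so it can help to split them before grouping or joining. The following is a minimal sketch based only on the format visible in the rows above; `split_annotation_ids` is a hypothetical helper name, not part of the dataset tooling.

```python
def split_annotation_ids(doc_rel_id: str, task_id: str):
    """Split the `id` and `task` fields of one raw-annotation row."""
    # `id` looks like "5206:0": document id, then relation index.
    doc_id, relation_idx = doc_rel_id.split(":")
    # `task` ends in "-<token index>". Split on the last hyphen only,
    # because the task prefix itself contains hyphens.
    task_prefix, token_idx = task_id.rsplit("-", 1)
    return int(doc_id), int(relation_idx), task_prefix, int(token_idx)


# Values taken from the third row of the table above.
parts = split_annotation_ids("5206:0", "0002769553--650327fb84b28317ce825e08-2")
print(parts)  # (5206, 0, '0002769553--650327fb84b28317ce825e08', 2)
```

The same function could be applied row by row to the dataframe, for example with `DataFrame.apply`, if separate columns are needed.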

## Aggregated annotations 

In [5]:
aggregated_annotations_conll = pd.read_csv('aggregated_annotations_conll_dev.tsv', sep='\t')
aggregated_annotations_conll.set_index('INPUT:id', inplace=True)
aggregated_annotations_conll.head()

INPUT:id,INPUT:task_id,INPUT:text,INPUT:sentence,INPUT:agg_annotation,INPUT:label,INPUT:golden_label,INPUT:agg_annotation_str
477:0,0002769553--650327f984b28317ce825bf6,Prime Minister Felipe Gonzalez of Spain told r...,## What is the relation type between ''Felipe ...,"[{'offset': 0, 'text': 'Prime', 'length': 5, '...",Work_For,Live_In,"[{""offset"": 0, ""text"": ""Prime"", ""length"": 5, ""..."
3559:0,0002769553--650327f984b28317ce825bf8,"( Text ) Taipei , Jan. 20 ( CNA ) -- Foreign C...",## What is the relation type between ''Chien F...,"[{'offset': 0, 'text': '(', 'length': 1, 'weig...",Live_In,Live_In,"[{""offset"": 0, ""text"": ""("", ""length"": 1, ""weig..."
5318:0,0002769553--650327f984b28317ce825bfa,"Meanwhile , the historical society in Beaumont...",## What is the relation type between ''Richard...,"[{'offset': 0, 'text': 'Meanwhile', 'length': ...",Live_In,Live_In,"[{""offset"": 0, ""text"": ""Meanwhile"", ""length"": ..."
2365:0,0002769553--650327f984b28317ce825bfc,` ` I don 't think there are any good answers ...,## What is the relation type between ''Virgini...,"[{'offset': 0, 'text': '`', 'length': 1, 'weig...",Live_In,Live_In,"[{""offset"": 0, ""text"": ""`"", ""length"": 1, ""weig..."
1895:5,0002769553--650327f984b28317ce825bfe,"Michael Henley Jr. , 10 , disappeared on a tur...",## What is the relation type between ''Ms. Cal...,"[{'offset': 0, 'text': 'Michael', 'length': 7,...",Live_In,Live_In,"[{""offset"": 0, ""text"": ""Michael"", ""length"": 7,..."


Each row in the dataframe corresponds to one document in the dataset.

| Field | Explanation |
| ------------- |:-------------:|
| **INPUT:id** | id of the document, same as in the raw annotations|
| **INPUT:task_id** | id of the crowdsourcing task |
| **INPUT:text** | text of the document |
| **INPUT:sentence** | the question that the crowd annotators were asked |
| **INPUT:agg_annotation** | annotation of the document aggregated by majority vote |
| **INPUT:label** | aggregated relation label of the document |
| **INPUT:golden_label** | golden label of the document (a.k.a. ground truth) |
| **INPUT:agg_annotation_str** | the same aggregated annotation serialized as a JSON string, so it can be parsed with `json.loads` |
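To make "aggregated by majority vote" concrete, here is a rough sketch of how per-token votes from the raw annotations could be tallied. It is an illustration, not the procedure used to build this file: the rows are made up, the task and worker ids are placeholders, and treating ties as 0 is an assumption.

```python
from collections import defaultdict

# Made-up rows in the raw-annotation layout: (task, worker, label).
# The task and worker ids are placeholders, not values from the dataset.
toy_rows = [
    ("task-0", "worker_a", 1), ("task-0", "worker_b", 1), ("task-0", "worker_c", 0),
    ("task-1", "worker_a", 0), ("task-1", "worker_b", 1), ("task-1", "worker_c", 0),
]


def majority_vote(rows):
    """Return {task: 0 or 1}; 1 means more than half of the workers voted 1."""
    votes = defaultdict(list)
    for task, _worker, label in rows:
        votes[task].append(label)
    # Strict majority: a tie counts as 0 (an assumption of this sketch).
    return {task: int(sum(labels) * 2 > len(labels)) for task, labels in votes.items()}


print(majority_vote(toy_rows))  # {'task-0': 1, 'task-1': 0}
```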

In [12]:
json.loads(aggregated_annotations_conll.loc['477:0']['INPUT:agg_annotation_str'])

[{'offset': 0, 'text': 'Prime', 'length': 5, 'weight': 2, 'total': 3},
 {'offset': 6, 'text': 'Minister', 'length': 8, 'weight': 2, 'total': 3},
 {'offset': 15, 'text': 'Felipe', 'length': 6, 'weight': 3, 'total': 3},
 {'offset': 22, 'text': 'Gonzalez', 'length': 8, 'weight': 3, 'total': 3},
 {'offset': 31, 'text': 'of', 'length': 2, 'weight': 3, 'total': 3},
 {'offset': 34, 'text': 'Spain', 'length': 5, 'weight': 3, 'total': 3},
 {'offset': 40, 'text': 'told', 'length': 4, 'weight': 0, 'total': 3},
 {'offset': 45, 'text': 'reporters', 'length': 9, 'weight': 0, 'total': 3},
 {'offset': 55, 'text': ',', 'length': 1, 'weight': 0, 'total': 3},
 {'offset': 57, 'text': '`', 'length': 1, 'weight': 0, 'total': 3},
 {'offset': 59, 'text': '`', 'length': 1, 'weight': 0, 'total': 3},
 {'offset': 61, 'text': 'The', 'length': 3, 'weight': 0, 'total': 3},
 {'offset': 65, 'text': 'police', 'length': 6, 'weight': 0, 'total': 3},
 {'offset': 72, 'text': 'actions', 'length': 7, 'weight': 0, 'total': 3}
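Each entry in the parsed list carries `weight` and `total` counts. The notebook does not define them; the sketch below assumes that `weight` is the number of annotators who marked the token as important and `total` is the number of annotators who saw it, which is consistent with the values shown. `important_tokens` and its `threshold` parameter are hypothetical names introduced for illustration.

```python
import json

# Three entries copied from the output above, serialized the way
# INPUT:agg_annotation_str stores them.
agg_annotation_str = json.dumps([
    {"offset": 0, "text": "Prime", "length": 5, "weight": 2, "total": 3},
    {"offset": 15, "text": "Felipe", "length": 6, "weight": 3, "total": 3},
    {"offset": 40, "text": "told", "length": 4, "weight": 0, "total": 3},
])


def important_tokens(annotation_str, threshold=0.5):
    """Return the tokens whose share of positive votes exceeds `threshold`."""
    tokens = json.loads(annotation_str)
    # Assumption: weight / total is the fraction of annotators who marked the token.
    return [t["text"] for t in tokens if t["total"] and t["weight"] / t["total"] > threshold]


print(important_tokens(agg_annotation_str))  # ['Prime', 'Felipe']
```

Raising `threshold` to a value such as 0.9 would keep only the tokens that every one of the three annotators marked.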