## Creating an annotation schema

Manual annotation is the process of marking up documents with annotations that can be used to train or test a natural language processing system. We are using the brat annotation tool to acquire annotations for your final project. Setting up a project in brat requires a set of files:

*    *.txt: text files to be annotated
*    *.ann: annotation files that store the annotations
*    annotation.conf: annotation type configuration
*    visual.conf: annotation display configuration
*    tools.conf: annotation tool configuration
*    kb_shortcuts.conf: keyboard shortcut configuration

Each annotation project typically defines its own annotation.conf. Defining visual.conf, tools.conf and kb_shortcuts.conf is not necessary, and the system falls back on simple default visuals, tools and shortcuts if these files are not present.
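
As a rough illustration of the file layout described above, the following sketch reports which configuration files a project folder contains and which .txt files have no matching .ann file. The function name `inspect_project` and the demo file names are made up for this example, and the pairing check assumes that every .txt file should come with an .ann file of the same base name.

```python
import tempfile
from pathlib import Path

# Configuration file names taken from the list above.
CONFIG_FILES = ["annotation.conf", "visual.conf", "tools.conf", "kb_shortcuts.conf"]

def inspect_project(project_dir):
    """Return (config files found, .txt files that have no matching .ann file)."""
    root = Path(project_dir)
    found = [name for name in CONFIG_FILES if (root / name).is_file()]
    unpaired = sorted(
        txt.name for txt in root.glob("*.txt")
        if not txt.with_suffix(".ann").is_file()
    )
    return found, unpaired

# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["doc1.txt", "doc1.ann", "doc2.txt", "annotation.conf"]:
        (Path(tmp) / name).touch()
    found, unpaired = inspect_project(tmp)
    print(found)     # ['annotation.conf']
    print(unpaired)  # ['doc2.txt']
```

Pointing `inspect_project` at your own project folder would show at a glance whether annotation.conf is in place and whether any document is missing its annotation file.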

We will first define the schema as a string variable and then write it to a file. The annotation schema will be saved as the ***annotation.conf*** file, which has the following format:

---

     [entities]
     [relations]
     [events]
     [attributes]
     # This is a comment line. 
---
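
Since the schema is built as a plain string, it can be sanity-checked before it is written to disk. The sketch below is not brat's own parser; it is a simplified reader that assumes section headers of the form `[name]`, `#` comment lines, and relation arguments written as `ArgN:Type`. The helper names and the example schema (`Finding`, `Measurement`) are hypothetical.

```python
import re

def parse_sections(conf_text):
    """Split brat-style configuration text into {section name: [non-comment lines]}."""
    sections = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            current = header.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections

def missing_sections(conf_text, expected=("entities", "relations", "events", "attributes")):
    """List the expected section headers that do not appear in the text."""
    found = parse_sections(conf_text)
    return [name for name in expected if name not in found]

def undefined_relation_types(conf_text):
    """List types used as relation arguments that are not declared under [entities]."""
    sections = parse_sections(conf_text)
    entities = {line.split()[0] for line in sections.get("entities", [])}
    unknown = set()
    for line in sections.get("relations", []):
        for arg in re.findall(r"Arg\d+:([^,\s]+)", line):
            for type_name in arg.split("|"):
                # Names in angle brackets such as <ENTITY> are macros, not types.
                if not type_name.startswith("<") and type_name not in entities:
                    unknown.add(type_name)
    return sorted(unknown)

# Hypothetical schema with one deliberate mistake: Measurement is never declared.
example_schema = """
[entities]
Finding
Value

[relations]
Finding_Value   Arg1:Finding, Arg2:Measurement

[events]

[attributes]
Negation   Arg:Finding
"""

print(missing_sections(example_schema))          # []
print(undefined_relation_types(example_schema))  # ['Measurement']
```

Running the same two checks on your own schema string before writing it can catch a missing section or a misspelled type name early.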


Let's define a schema for four annotation types: mentions of **peripheral arterial disease** (PAD), **peripheral vascular disease** (PVD), and **ankle-brachial index** (ABI), plus a numeric **value**.

Also, let's specify a **link** between an ankle-brachial index mention and its value.

Finally, let's define **attributes** that mark a mention as either Affirmed or Negated.



In [33]:
# brat annotation schema; it is written to annotation.conf in a later cell.
annotation_schema = """
[entities]
Peripheral_arterial_disease
Peripheral_vascular_disease
Ankle-brachial_index
Value

[relations]
# Link an ankle-brachial index mention to its measured value.
Concept_Value   Arg1:Ankle-brachial_index, Arg2:Value

[events]

[attributes]
# Binary flags that can be set on a concept mention.
Affirmed   Arg:Peripheral_arterial_disease|Peripheral_vascular_disease
Negation   Arg:Ankle-brachial_index|Peripheral_arterial_disease|Peripheral_vascular_disease

# This is a comment line.
"""

Annotations look pretty when we add some color. 
The visual configuration file, ***visual.conf***, is divided into the following sections:

---
    [labels]
    [drawing]
---

Each of these sections must be present in the configuration file, but they can be empty. 

The format of the [labels] section is simple: each line contains a set of strings, separated by pipe characters ("|"). The first string should correspond to a type defined in annotation.conf; the second is taken to be the preferred, full form to use for the type, and the remaining (if any) should correspond to its progressively shorter abbreviations. 
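
To make the pipe-separated layout concrete, here is a small sketch that splits one [labels] line into its parts. The function name `parse_label_line` is made up for this example, and falling back to the type name when no display form is given is an assumption, not documented brat behavior.

```python
def parse_label_line(line):
    """Split one [labels] line into the type name, full form, and abbreviations."""
    parts = [part.strip() for part in line.split("|")]
    type_name, forms = parts[0], parts[1:]
    return {
        "type": type_name,
        # Assumption: with no display forms given, reuse the type name itself.
        "full": forms[0] if forms else type_name,
        "abbreviations": forms[1:],
    }

print(parse_label_line("Ankle-brachial_index | Ankle-brachial index | ABI"))
# {'type': 'Ankle-brachial_index', 'full': 'Ankle-brachial index', 'abbreviations': ['ABI']}
```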

The [drawing] section specifies colors for your annotations.
The recognized visual configuration keys, their values, and their purposes are:

* fgColor: any HTML color specification (e.g. "black"), sets the color of a span text in the visualization.
* bgColor: any HTML color specification (e.g. "white"), sets the color of a span "box" background in the visualization.
* borderColor: any HTML color specification (e.g. "black"), sets the color of a span "box" border in the visualization. Also supports special value "darken", which specifies to use a darker shade of bgColor for the border.
* color: any HTML color specification (e.g. "black"), sets the color of an arc in the visualization. 

For an overview of HTML color specifications, see [https://www.w3schools.com/html/html_colors.asp](https://www.w3schools.com/html/html_colors.asp)
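
A [drawing] line pairs a type name with comma-separated `key:value` settings. The sketch below reads one such line into a dictionary, which can help when checking a configuration by hand. It is a simplified reader written for this example (the name `parse_drawing_line` is hypothetical) and it assumes the type name is separated from the settings by whitespace.

```python
def parse_drawing_line(line):
    """Split one [drawing] line into (type name, {key: value})."""
    # The type name is the first whitespace-delimited token; the rest is the settings.
    type_name, spec = line.split(None, 1)
    options = {}
    for item in spec.split(","):
        key, _, value = item.partition(":")
        options[key.strip()] = value.strip()
    return type_name, options

name, options = parse_drawing_line(
    "SPAN_DEFAULT\tfgColor:black, bgColor:lightgreen, borderColor:darken"
)
print(name)     # SPAN_DEFAULT
print(options)  # {'fgColor': 'black', 'bgColor': 'lightgreen', 'borderColor': 'darken'}
```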



In [34]:
visual = """
[labels]
Peripheral_arterial_disease | Peripheral_arterial_disease | PAD
Peripheral_vascular_disease | Peripheral_vascular_disease | PVD
Ankle-brachial_index | Ankle-brachial_index | ABI
Value | Value | V


[drawing]
SPAN_DEFAULT	fgColor:black, bgColor:lightgreen, borderColor:darken
ARC_DEFAULT	color:black, bgColor:lightgreen, dashArray:-, arrowHead:triangle-5

Peripheral_arterial_disease bgColor:green, borderColor:red
Peripheral_vascular_disease bgColor:purple, borderColor:red
Ankle-brachial_index bgColor:pink, borderColor:red
Value fgColor:blue, bgColor:#ee82ee


Concept_Value color:darkgreen

Negation  box:crossed
"""

There are two other configuration files that you can specify, tools.conf and kb_shortcuts.conf, but for now we will not create them and will rely on the defaults.

## Writing configurations into files

To keep your documents separate from everyone else's, enter your UNID.

In [35]:
unid = 'u1166466'
project_name = 'Project_pad'


Check the folder just to see that you have prepared your workspace.

In [36]:
%%bash  -s "$unid" "$project_name"
echo ~/BRAT/$1/$2
ls   ~/BRAT/$1/$2 

/home/u1166466/BRAT/u1166466/Project_pad
10083_67.ann
10083_67.txt
10594_30.ann
10594_30.txt
10594_47.ann
10594_47.txt
12272_8.ann
12272_8.txt
12403_18.ann
12403_18.txt
12403_19.ann
12403_19.txt
12573_45.ann
12573_45.txt
1266_34.ann
1266_34.txt
1266_43.ann
1266_43.txt
1266_57.ann
1266_57.txt
13260_29.ann
13260_29.txt
13625_33.ann
13625_33.txt
1369_68.ann
1369_68.txt
1371_55.ann
1371_55.txt
14566_11.ann
14566_11.txt
15011_0.ann
15011_0.txt
1604_31.ann
1604_31.txt
1604_48.ann
1604_48.txt
16993_50.ann
16993_50.txt
17411_37.ann
17411_37.txt
17411_51.ann
17411_51.txt
17758_28.ann
17758_28.txt
17758_61.ann
17758_61.txt
1795_9.ann
1795_9.txt
1822_20.ann
1822_20.txt
18600_3.ann
18600_3.txt
18839_23.ann
18839_23.txt
18839_27.ann
18839_27.txt
18992_26.ann
18992_26.txt
18992_32.ann
18992_32.txt
19468_46.ann
19468_46.txt
1979_24.ann
1979_24.txt
20794_17.ann
20794_17.txt
21223_5.ann
21223_5.txt
21380_2.ann
21380_2.txt
21380_40.ann
21380_40.txt
22043_16.ann
22043_16.txt
22043_65.ann
22043_65.txt
223

Now let's write the files into the correct folder.

In [37]:
# Build the project path from the UNID and project name entered above.
path = f"/home/{unid}/BRAT/{unid}/{project_name}"
path

'/home/u1166466/BRAT/u1166466/Project_pad'

In [38]:
# Write the annotation schema to annotation.conf in the project folder.
annotation_conf_file = path + "/annotation.conf"
with open(annotation_conf_file, "w") as f:
    f.write(annotation_schema)

In [39]:
# Write the display settings to visual.conf in the project folder.
visual_conf_file = path + "/visual.conf"
with open(visual_conf_file, "w") as f:
    f.write(visual)
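
The two write cells follow the same pattern, so they could be folded into one helper. The following is a minimal sketch using `pathlib`; the helper name `write_conf` is made up for this example, and the demo writes a tiny placeholder schema to a temporary directory rather than to the real project folder.

```python
import tempfile
from pathlib import Path

def write_conf(project_dir, file_name, text):
    """Write text to project_dir/file_name and return the resulting path."""
    target = Path(project_dir) / file_name
    target.write_text(text, encoding="utf-8")
    return target

# Demonstration in a throwaway directory with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    written = write_conf(tmp, "annotation.conf", "[entities]\nValue\n")
    print(written.name)                                      # annotation.conf
    print(written.read_text(encoding="utf-8").splitlines())  # ['[entities]', 'Value']
```

In the notebook, the equivalent calls would be `write_conf(path, "annotation.conf", annotation_schema)` and `write_conf(path, "visual.conf", visual)`.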

Check to make sure that the files got written to the correct folder.

In [40]:
%%bash  -s "$unid" "$project_name"
echo ~/BRAT/$1/$2
ls   ~/BRAT/$1/$2 

/home/u1166466/BRAT/u1166466/Project_pad
10083_67.ann
10083_67.txt
10594_30.ann
10594_30.txt
10594_47.ann
10594_47.txt
12272_8.ann
12272_8.txt
12403_18.ann
12403_18.txt
12403_19.ann
12403_19.txt
12573_45.ann
12573_45.txt
1266_34.ann
1266_34.txt
1266_43.ann
1266_43.txt
1266_57.ann
1266_57.txt
13260_29.ann
13260_29.txt
13625_33.ann
13625_33.txt
1369_68.ann
1369_68.txt
1371_55.ann
1371_55.txt
14566_11.ann
14566_11.txt
15011_0.ann
15011_0.txt
1604_31.ann
1604_31.txt
1604_48.ann
1604_48.txt
16993_50.ann
16993_50.txt
17411_37.ann
17411_37.txt
17411_51.ann
17411_51.txt
17758_28.ann
17758_28.txt
17758_61.ann
17758_61.txt
1795_9.ann
1795_9.txt
1822_20.ann
1822_20.txt
18600_3.ann
18600_3.txt
18839_23.ann
18839_23.txt
18839_27.ann
18839_27.txt
18992_26.ann
18992_26.txt
18992_32.ann
18992_32.txt
19468_46.ann
19468_46.txt
1979_24.ann
1979_24.txt
20794_17.ann
20794_17.txt
21223_5.ann
21223_5.txt
21380_2.ann
21380_2.txt
21380_40.ann
21380_40.txt
22043_16.ann
22043_16.txt
22043_65.ann
22043_65.txt
223

### Now your project is set up and ready for annotation.