# Adding additional elements to help annotation

This notebook assumes that you are familiar with the basic usage of PyLighter. If not, read the README.md or the Simple_usage notebook first.

The aim of this notebook is to show how to add elements specific to each document when annotating. There are two types of information you may want to add:
- Additional information
- Additional outputs

Additional information is specific to each document and is not meant to be changed during the annotation. For instance, the source and the date of a document are additional information.

Additional outputs are values that can be changed during the annotation. For instance, you may want inputs to record your level of confidence in your annotation, to write comments, to flag the document as inappropriate, etc.

The first part of this notebook covers additional information while the second part covers additional outputs.

In [None]:
# Defining the corpus of documents to use throughout this notebook
corpus = [
    "PyLighter is an annotation tool for NER tasks directly on Jupyter. "
    + "It aims on helping data scientists easily and quickly annotate datasets. "
    + "This tool was developed by Paylead.",
    "PayLead is a fintech company specializing in transaction data analysis. "
    + "Paylead brings retail and banking together, so customers get rewarded when they buy. " 
    + "Welcome to the data-for-value economy."
]

## Adding additional information

The goal of this section is to show the source and the date of each document during the annotation.

Additional information needs to be provided as a pandas DataFrame. The column names are used as labels when displaying the additional text. Make sure that your DataFrame has _len(corpus)_ rows.

In [None]:
# Defining the information to add
import pandas as pd
import datetime
additional_infos = pd.DataFrame({
    "source":["Github", "paylead.fr"],
    "date":[datetime.datetime.now(), datetime.datetime.now() - datetime.timedelta(days=365)]
})
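Since the DataFrame must have one row per document, a quick check before starting the annotation can catch a mismatch early. The snippet below is a minimal sketch: `check_additional_infos` is a hypothetical helper written for this notebook, not part of PyLighter, and the example data only mirrors the shapes used above.

```python
import pandas as pd


def check_additional_infos(corpus, additional_infos):
    """Hypothetical helper: raise ValueError unless there is one row per document."""
    if len(additional_infos) != len(corpus):
        raise ValueError(
            f"Expected {len(corpus)} rows, got {len(additional_infos)}"
        )
    return additional_infos


# Stand-in data with the same shape as the corpus and DataFrame defined above
example_corpus = ["first document", "second document"]
example_infos = pd.DataFrame({"source": ["Github", "paylead.fr"]})

checked = check_additional_infos(example_corpus, example_infos)
```

Calling the helper on the real `corpus` and `additional_infos` just before creating the `Annotation` should work the same way.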

## Start annotating

In [None]:
from pylighter import Annotation
annotation = Annotation(corpus, additional_infos=additional_infos)

## Adding additional outputs

The goal of this section is to add three inputs: one to flag a document as inappropriate, one to record our confidence in our annotation, and one to add comments.

Note: A good way to handle:
- boolean values is to use a checkbox.
- float values is to use float_text (int_text also exists for ints).
- text is to use text (text_area also exists for large texts).
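The rule of thumb in this note can be sketched as a small function that picks a display type from the Python type of a default value. `suggest_display_type` and its `long_text_threshold` parameter are hypothetical names introduced here for illustration; they are not part of PyLighter, and the threshold is an arbitrary heuristic.

```python
def suggest_display_type(default_value, long_text_threshold=80):
    """Hypothetical helper: map a default value to one of the display types named above."""
    # bool must be tested before int because bool is a subclass of int
    if isinstance(default_value, bool):
        return "checkbox"
    if isinstance(default_value, int):
        return "int_text"
    if isinstance(default_value, float):
        return "float_text"
    if isinstance(default_value, str):
        # Arbitrary heuristic: long default texts get the larger text area
        if len(default_value) > long_text_threshold:
            return "text_area"
        return "text"
    raise TypeError(f"No display type suggested for {type(default_value).__name__}")


flag_type = suggest_display_type(False)
score_type = suggest_display_type(0.5)
comment_type = suggest_display_type("")
```

An empty default string maps to `text` here, so choose `text_area` explicitly when you expect annotators to write long comments.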

To add an input to PyLighter, use AdditionalOutputElement.

In [None]:
from pylighter import AdditionalOutputElement, Annotation

In [None]:
flag_element = AdditionalOutputElement(
    name="is_inappropriate",  # Name of the field
    display_type="checkbox",  # Type of display in ["checkbox", "int_text", "float_text", "text", "text_area"]
    description="Is this document inappropriate",  # Description to display
    default_value=False,  # Default value to use
)

confidence_element = AdditionalOutputElement(
    name="confidence_score",  # Name of the field
    display_type="float_text",  # Type of display in ["checkbox", "int_text", "float_text", "text", "text_area"]
    description="Confidence in your annotation",  # Description to display
    default_value=1,  # Default value to use
)

comment_element = AdditionalOutputElement(
    name="comment",  # Name of the field
    display_type="text_area",  # Type of display in ["checkbox", "int_text", "float_text", "text", "text_area"]
    description="Comment",  # Description to display
    default_value="",  # Default value to use
)

additional_outputs_elements = [flag_element, confidence_element, comment_element]

In [None]:
annotation = Annotation(corpus, additional_outputs_elements=additional_outputs_elements)

### Retrieving the results

At this point, you should have finished your annotation, but you may wonder how to get your annotations and your additional outputs. There are two ways:
- Clicking on the save button
- Accessing the labelled corpus directly

When clicking on the save button, the additional outputs will be automatically added to the csv.

Or, if you want them right away, you can access the attribute _annotation.additional_outputs_values_, which is a pandas DataFrame.

In [None]:
from copy import deepcopy
my_annotation = deepcopy(annotation.labels)
my_additional_outputs = annotation.additional_outputs_values.copy()
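Once the additional outputs are available as a DataFrame, ordinary pandas operations can be used to act on them, for example to keep only the documents that were not flagged and that were annotated with enough confidence. The sketch below uses a made-up stand-in DataFrame with the three column names defined earlier in this notebook; the values, the threshold, and the assumption that rows follow the order of the corpus are illustrative only.

```python
import pandas as pd

# Made-up stand-in for annotation.additional_outputs_values
# (assumption: one row per document, in the same order as the corpus)
outputs = pd.DataFrame({
    "is_inappropriate": [False, True, False],
    "confidence_score": [1.0, 0.4, 0.9],
    "comment": ["", "spam", "unsure about the last entity"],
})

min_confidence = 0.8  # arbitrary threshold chosen for this example

# Keep rows that are not flagged and whose confidence meets the threshold
keep_mask = ~outputs["is_inappropriate"] & (outputs["confidence_score"] >= min_confidence)
kept_positions = outputs.index[keep_mask].tolist()
```

The resulting positions can then be used to select the matching documents and labels.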

### Using an already annotated corpus with additional outputs

In some cases, you may want to annotate an already annotated corpus and thus use the already filled additional outputs. To do so, you can use the _additional_outputs_values_ argument.

It works the same way as the _labels_ argument. You can read more about this in the associated notebook or in the README.md.

Note: Do not forget to still pass your additional outputs elements, and make sure that _additional_outputs_values_ has the correct size (i.e. len(corpus) rows) and the same column names as the names given in _additional_outputs_elements_.

In [None]:
already_annotated = Annotation(corpus, 
                               labels=my_annotation, 
                               additional_outputs_values=my_additional_outputs, 
                               additional_outputs_elements=additional_outputs_elements)
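The two constraints from the note above (row count and column names) can be checked before reusing saved values. This is a minimal sketch: `check_outputs_values` is a hypothetical helper, not a PyLighter function, and it takes the element names as plain strings so that it does not depend on how `AdditionalOutputElement` stores them.

```python
import pandas as pd


def check_outputs_values(values, corpus, element_names):
    """Hypothetical helper: validate the row count and column names of saved outputs."""
    if len(values) != len(corpus):
        raise ValueError(f"Expected {len(corpus)} rows, got {len(values)}")
    missing = sorted(set(element_names) - set(values.columns))
    unexpected = sorted(set(values.columns) - set(element_names))
    if missing or unexpected:
        raise ValueError(f"Missing columns: {missing}, unexpected columns: {unexpected}")
    return values


# Stand-in data using the element names defined in this notebook
example_docs = ["first document", "second document"]
names = ["is_inappropriate", "confidence_score", "comment"]
saved_values = pd.DataFrame({
    "is_inappropriate": [False, False],
    "confidence_score": [1.0, 1.0],
    "comment": ["", ""],
})

validated = check_outputs_values(saved_values, example_docs, names)
```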