<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_import/text.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/annotation_import/text.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Text Annotation Import
* This notebook provides examples of each supported annotation type for text assets. It covers the following:
    * Model-assisted labeling (MAL): provides pre-annotated data for your labelers, which reduces the total time needed to label your assets. Model-assisted labeling does not submit labels automatically; a labeler must review them before submission.
    * Label import: provides ground truth labels. These can be compared against prediction labels, or used as benchmarks to see how your labelers are doing.

* For information on which annotation types are supported for each data type, refer to this [documentation](https://docs.labelbox.com/docs/model-assisted-labeling#option-1-import-via-python-annotation-types-recommended).

* Notes:
    * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly.

# Installs

In [2]:
!pip install -q 'labelbox[data]'



# Imports

In [3]:
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option
from labelbox import Client, LabelingFrontend, LabelImport, MALPredictionImport
from labelbox.data.annotation_types import (
    Label, TextData, Checklist, Radio, ObjectAnnotation, TextEntity,
    ClassificationAnnotation, ClassificationAnswer
)
from labelbox.data.serialization import NDJsonConverter
import uuid
import json
import numpy as np
import copy

# API Key and Client
Provide a valid API key below to connect to the Labelbox client.

In [4]:
# Add your API key (None is only a placeholder; replace it with your key)
API_KEY = None
client = Client(api_key=API_KEY)
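
A common alternative to pasting the key into the notebook is to read it from an environment variable. The sketch below is only an illustration: `resolve_api_key` is a hypothetical helper that is not part of the Labelbox SDK, and the variable name `LABELBOX_API_KEY` is an assumption.

```python
import os


def resolve_api_key(explicit_key=None, env=os.environ):
    """Return the explicit key if given, else fall back to an environment variable.

    Hypothetical helper, not part of the Labelbox SDK.
    ASSUMPTION: the key is stored under the name LABELBOX_API_KEY.
    """
    key = explicit_key or env.get("LABELBOX_API_KEY")
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise ValueError(
            "No API key found; set LABELBOX_API_KEY or pass a key explicitly.")
    return key
```

With this helper, the client above could be created as `Client(api_key=resolve_api_key())`, which keeps the key out of the saved notebook.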

---- 
### Steps
1. Make sure the project is set up
2. Collect annotations
3. Upload

### Project setup

We will create two projects: one for model-assisted labeling and one for label imports.

In [5]:
ontology_builder = OntologyBuilder(
    tools=[
        Tool(tool=Tool.Type.NER, name="named_entity")
        ],
    classifications=[
        Classification(class_type=Classification.Type.CHECKLIST, instructions="checklist", options=[
            Option(value="first_checklist_answer"),
            Option(value="second_checklist_answer")            
        ]),
        Classification(class_type=Classification.Type.RADIO, instructions="radio", options=[
            Option(value="first_radio_answer"),
            Option(value="second_radio_answer")
        ])])

In [6]:
mal_project = client.create_project(name="text_mal_project")
li_project = client.create_project(name="text_label_import_project")


dataset = client.create_dataset(name="text_annotation_import_demo_dataset")
test_txt_url = "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt"
data_row = dataset.create_data_row(row_data=test_txt_url)
editor = next(client.get_labeling_frontends(where=LabelingFrontend.name == "Editor"))

mal_project.setup(editor, ontology_builder.asdict())
mal_project.datasets.connect(dataset)

li_project.setup(editor, ontology_builder.asdict())
li_project.datasets.connect(dataset)

### Create Label using Annotation Type Objects
* It is recommended to use the Python SDK's annotation types for importing into Labelbox.

### Object Annotations

In [7]:
# A named entity covering the character range from start to end of the text
named_entity_annotation = ObjectAnnotation(
    value=TextEntity(start=10, end=20),
    name="named_entity"
)

### Classification Annotations

In [8]:
checklist_annotation = ClassificationAnnotation(
    value=Checklist(answer=[
        ClassificationAnswer(name="first_checklist_answer"),
        ClassificationAnswer(name="second_checklist_answer")
    ]),
    name="checklist"
)

In [9]:
radio_annotation = ClassificationAnnotation(
    value=Radio(answer=ClassificationAnswer(name="second_radio_answer")),
    name="radio"
)

### Create a Label object with all of our annotations

In [10]:
# Reference the text data row created during project setup
text_data = TextData(uid=data_row.uid)

label = Label(
    data=text_data,
    annotations=[
        named_entity_annotation, checklist_annotation, radio_annotation
    ]
)

dict(label)



{'annotations': [ObjectAnnotation(name='named_entity', feature_schema_id=None, extra={}, value=TextEntity(start=10, end=20, extra={}), classifications=[]),
  ClassificationAnnotation(name='checklist', feature_schema_id=None, extra={}, value=Checklist(name='checklist', answer=[ClassificationAnswer(name='first_checklist_answer', feature_schema_id=None, extra={}, keyframe=None), ClassificationAnswer(name='second_checklist_answer', feature_schema_id=None, extra={}, keyframe=None)])),
  ClassificationAnnotation(name='radio', feature_schema_id=None, extra={}, value=Radio(answer=ClassificationAnswer(name='second_radio_answer', feature_schema_id=None, extra={}, keyframe=None)))],
 'data': TextData(file_path=None,text=None,url=None),
 'extra': {},
 'uid': None}
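
The `start` and `end` values of a `TextEntity` are character offsets into the text. A minimal sketch of how such offsets could be derived from a phrase is shown below. The helper `find_entity_span` and the sample string are made up for illustration (the string is not the content of the demo data row), and the sketch assumes that `end` is the index of the last character of the entity (inclusive); verify that convention against the Labelbox documentation before relying on it.

```python
def find_entity_span(text, phrase):
    """Return (start, end) character offsets of the first occurrence of phrase.

    Hypothetical helper for illustration only.
    ASSUMPTION: `end` is inclusive, i.e. the index of the last character.
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    end = start + len(phrase) - 1
    return start, end


# Made-up sample text, not the contents of the demo data row.
sample_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
start, end = find_entity_span(sample_text, "dolor sit")

# Because `end` is treated as inclusive, slicing needs end + 1.
covered = sample_text[start:end + 1]
```

If the inclusive-end assumption holds, the resulting pair could be passed as `TextEntity(start=start, end=end)`.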

### Model Assisted Labeling 

To do model-assisted labeling, we need to convert a Label object into NDJSON.

This is easily done using the `NDJsonConverter` class.

We will create a Label called `mal_label` which has the same structure as the label above.

Notes:
* Each label requires a valid feature schema id. We will assign it using the built-in `assign_feature_schema_ids` method.
* The `NDJsonConverter` takes in a list of labels.

In [11]:
# Work on a copy so that the schema ids assigned for this project are not stored on the original label
mal_label = copy.deepcopy(label)

In [12]:
mal_label.assign_feature_schema_ids(OntologyBuilder.from_project(mal_project))

ndjson_labels = list(NDJsonConverter.serialize([mal_label]))

ndjson_labels

[{'classifications': [],
  'dataRow': {'id': 'cl1bb4e2h0wew0zc0a6xu15gj'},
  'location': {'end': 20, 'start': 10},
  'schemaId': 'cl1bb4elg02z80z9hcozp9awl',
  'uuid': '33c9aee2-7902-4f2f-90b1-428b4ffbec0d'},
 {'answer': [{'schemaId': 'cl1bb4elh02zb0z9h66l02vju'},
   {'schemaId': 'cl1bb4elh02zd0z9hc6rf1x26'}],
  'dataRow': {'id': 'cl1bb4e2h0wew0zc0a6xu15gj'},
  'schemaId': 'cl1bb4elh02za0z9h27bpam3h',
  'uuid': '1a5a30b1-77ab-42b4-b3ea-67af0f1262ab'},
 {'answer': {'schemaId': 'cl1bb4elh02zj0z9he5macy00'},
  'dataRow': {'id': 'cl1bb4e2h0wew0zc0a6xu15gj'},
  'schemaId': 'cl1bb4elh02zg0z9hafb0a053',
  'uuid': '21f6021f-0763-4e37-a7bf-496730b12a46'}]
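
Each serialized row in the output above carries a `uuid`, a `schemaId`, and a `dataRow` id. Before uploading, it can be useful to sanity-check rows for those fields. The sketch below is an illustration in plain Python, not an SDK feature: `check_ndjson_rows` and `to_ndjson` are hypothetical helpers, and the sample rows use placeholder ids.

```python
import json


def check_ndjson_rows(rows):
    """Return a list of human-readable problems found in NDJSON-style rows."""
    problems = []
    seen_uuids = set()
    for index, row in enumerate(rows):
        for key in ("uuid", "schemaId"):
            if not row.get(key):
                problems.append(f"row {index}: missing {key}")
        if not (row.get("dataRow") or {}).get("id"):
            problems.append(f"row {index}: missing dataRow.id")
        row_uuid = row.get("uuid")
        if row_uuid and row_uuid in seen_uuids:
            problems.append(f"row {index}: duplicate uuid {row_uuid}")
        seen_uuids.add(row_uuid)
    return problems


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


# Placeholder rows shaped like the serialized output; the ids are not real.
sample_rows = [
    {"uuid": "u-1", "schemaId": "tool-schema-id",
     "dataRow": {"id": "data-row-id"},
     "location": {"start": 10, "end": 20}, "classifications": []},
    {"uuid": "u-2", "schemaId": "radio-schema-id",
     "dataRow": {"id": "data-row-id"},
     "answer": {"schemaId": "radio-answer-schema-id"}},
]

problems = check_ndjson_rows(sample_rows)
ndjson_text = to_ndjson(sample_rows)
```

An empty `problems` list only means that the basic fields are present; it does not guarantee that the upload will succeed.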

In [13]:
upload_job = MALPredictionImport.create_from_objects(
    client = client, 
    project_id = mal_project.uid, 
    name="upload_label_import_job", 
    predictions=ndjson_labels)

In [14]:
# Errors will appear for each annotation that failed.
# An empty list means that there were no errors.
# Errors are only available once the upload job is complete, so there is no need to rerun this cell.
print("Errors:", upload_job.errors)

Errors: []


### Label Import

Label import is very similar to model-assisted labeling. We need to re-assign the feature schema ids before continuing,
but we can continue to use the `NDJsonConverter`.

We will create a Label called li_label which has the same original structure as the label above
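
The ids need to be re-assigned because each project gets its own feature schema ids, even when both projects are set up from the same ontology definition. The sketch below illustrates the idea with plain dictionaries. `schema_ids_by_name` is a hypothetical helper, and the two sample dictionaries only imitate the shape of a project's normalized ontology; all ids are placeholders.

```python
def schema_ids_by_name(normalized):
    """Map tool, classification, and option names to their feature schema ids."""
    ids = {}
    for tool in normalized.get("tools", []):
        ids[tool["name"]] = tool["featureSchemaId"]
    for classification in normalized.get("classifications", []):
        name = classification["name"]
        ids[name] = classification["featureSchemaId"]
        for option in classification.get("options", []):
            # Options are keyed as "<classification>/<option label>".
            ids[f"{name}/{option['label']}"] = option["featureSchemaId"]
    return ids


def make_sample_ontology(prefix):
    """Build a placeholder ontology dict; ids differ only by prefix."""
    return {
        "tools": [{"name": "named_entity",
                   "featureSchemaId": f"{prefix}-tool"}],
        "classifications": [{
            "name": "radio",
            "featureSchemaId": f"{prefix}-radio",
            "options": [{"label": "second_radio_answer",
                         "featureSchemaId": f"{prefix}-radio-answer"}],
        }],
    }


mal_ids = schema_ids_by_name(make_sample_ontology("mal"))
li_ids = schema_ids_by_name(make_sample_ontology("li"))

# Same names in both projects, but every id differs.
same_names = mal_ids.keys() == li_ids.keys()
shared_ids = set(mal_ids.values()) & set(li_ids.values())
```

Since the names match while the ids do not, a label prepared for one project cannot be reused for another without assigning the ids again.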

In [15]:
# Work on a copy so that the schema ids assigned for this project are not stored on the original label
li_label = copy.deepcopy(label)

In [16]:
li_label.assign_feature_schema_ids(OntologyBuilder.from_project(li_project))

ndjson_labels = list(NDJsonConverter.serialize([li_label]))

ndjson_labels, li_project.ontology().normalized

([{'classifications': [],
   'dataRow': {'id': 'cl1bb4e2h0wew0zc0a6xu15gj'},
   'location': {'end': 20, 'start': 10},
   'schemaId': 'cl1bb4fas0prw0zae4n9r5tn7',
   'uuid': '6161e297-2b3a-41fd-b5d6-ab6a8344c233'},
  {'answer': [{'schemaId': 'cl1bb4fat0prz0zae3wwwcpu0'},
    {'schemaId': 'cl1bb4fat0ps10zae1dr8g9eg'}],
   'dataRow': {'id': 'cl1bb4e2h0wew0zc0a6xu15gj'},
   'schemaId': 'cl1bb4fat0pry0zaeelu6bhxj',
   'uuid': '231c19e4-21a1-47e2-9c47-4fdab5c30c26'},
  {'answer': {'schemaId': 'cl1bb4fat0ps70zae2yvt0ngm'},
   'dataRow': {'id': 'cl1bb4e2h0wew0zc0a6xu15gj'},
   'schemaId': 'cl1bb4fat0ps40zaefb2o7p7j',
   'uuid': '65a415d8-7fcc-4f31-81ee-8e27027e2793'}],
 {'classifications': [{'archived': 0,
    'featureSchemaId': 'cl1bb4fat0pry0zaeelu6bhxj',
    'instructions': 'checklist',
    'name': 'checklist',
    'options': [{'featureSchemaId': 'cl1bb4fat0prz0zae3wwwcpu0',
      'label': 'first_checklist_answer',
      'schemaNodeId': 'cl1bb4fat0ps00zaeaomnd9ca',
      'value': 'first_che

In [17]:
upload_job = LabelImport.create_from_objects(
    client = client, 
    project_id = li_project.uid, 
    name="upload_label_import_job", 
    labels=ndjson_labels)

In [18]:
print("Errors:", upload_job.errors)

Errors: []
