<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/static/images/logo-v4.svg" width=190/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_import/text.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/annotation_import/text.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Text Annotation Import
* This notebook will provide examples of each supported annotation type for text assets. It will cover the following:
    * Model-assisted labeling - used to provide pre-annotated data for your labelers. This will enable a reduction in the total amount of time to properly label your assets. Model-assisted labeling does not submit the labels automatically, and will need to be reviewed by a labeler for submission.
    * Label Import - used to provide ground truth labels. These can in turn be used and compared against prediction labels, or used as benchmarks to see how your labelers are doing.

* For information on what types of annotations are supported per data type, refer to this [documentation](https://docs.labelbox.com/docs/model-assisted-labeling#option-1-import-via-python-annotation-types-recommended)

* Notes:
    * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly.

# Installs

In [None]:
!pip install -q 'labelbox[data]'

# Imports

In [2]:
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option
from labelbox import Client, LabelingFrontend, LabelImport, MALPredictionImport, MediaType
from labelbox.schema.queue_mode import QueueMode
from labelbox.data.annotation_types import (
    Label, TextData, Checklist, Radio, ObjectAnnotation, TextEntity,
    ClassificationAnnotation, ClassificationAnswer, LabelList
)
from labelbox.data.serialization import NDJsonConverter
import uuid
import json
import numpy as np

# API Key and Client
Provide a valid api key below in order to properly connect to the Labelbox Client.

In [3]:
# Add your api key
API_KEY=None
client = Client(api_key=API_KEY)

---- 
### Steps
1. Make sure project is setup
2. Collect annotations
3. Upload

### Project setup

We will be creating two projects, one for model-assisted labeling, and one for label imports

In [4]:
ontology_builder = OntologyBuilder(
    tools=[
        Tool(tool=Tool.Type.NER, name="named_entity")
        ],
    classifications=[
        Classification(class_type=Classification.Type.CHECKLIST, instructions="checklist", options=[
            Option(value="first_checklist_answer"),
            Option(value="second_checklist_answer")            
        ]),
        Classification(class_type=Classification.Type.RADIO, instructions="radio", options=[
            Option(value="first_radio_answer"),
            Option(value="second_radio_answer")
        ])])

In [5]:
# Project defaults to batch mode with benchmark quality settings if this argument is not provided
# Queue mode will be deprecated once dataset mode is deprecated

mal_project = client.create_project(name="text_mal_project_demo",
                                    queue_mode=QueueMode.Batch,
                                    auto_audit_percentage=1,
                                    auto_audit_number_of_labels=1,
                                    media_type=MediaType.Text)

li_project = client.create_project(name="text_label_import_project_demo",
                                    queue_mode=QueueMode.Batch,
                                    auto_audit_percentage=1,
                                    auto_audit_number_of_labels=1,
                                    media_type=MediaType.Text)


dataset = client.create_dataset(name="text_annotation_import_demo_dataset")

test_txt_url = {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
    "global_key": "TEST-ID-%id" % uuid.uuid1()
    }
data_row = dataset.create_data_row(test_txt_url)
print(data_row)
print(data_row.uid)



######################### DATASET CONSENSUS OPTION ########################
#Note that dataset base projects will be deprecated in the near future.

#To use Datasets/Consensus instead of Batches/Benchmarks use the following query: 
#In this case, 10% of all data rows need to be annotated by three labelers.

# dataset_project = client.create_project(name="dataset-test-project",
#                                 description="a description",
#                                 media_type=MediaType.Text,
#                                 auto_audit_percentage=0.1,
#                                 auto_audit_number_of_labels=3,
#                                 queue_mode=QueueMode.Dataset)

# dataset_project.datasets.connect(dataset)

<DataRow {'created_at': datetime.datetime(2022, 10, 28, 13, 49, 2, tzinfo=datetime.timezone.utc), 'external_id': None, 'global_key': 'TEST-ID-95211973871544529747766652579222388738d', 'media_attributes': {}, 'metadata': [], 'metadata_fields': [], 'row_data': 'https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt', 'uid': 'cl9sjvxht5wu707trh6wabl57', 'updated_at': datetime.datetime(2022, 10, 28, 13, 49, 2, tzinfo=datetime.timezone.utc)}>
cl9sjvxht5wu707trh6wabl57


In [6]:
# Setup Batches and Ontology

# We need the data row ID to create a batch
batch_datarows = [dr.uid for dr in list(dataset.export_data_rows())]

# Create a batch to send to your MAL project
batch_mal = mal_project.create_batch(
  "first-batch-MAL-text-demo", # Each batch in a project must have a unique name
  batch_datarows, # A list of data rows or data row ids
  5 # priority between 1(Highest) - 5(lowest)
)

# Create a batch to send to you LI project
batch_li = li_project.create_batch(
    "first-batch-LI-text-demo", # Each batch in a project must have a unique name
    batch_datarows, # A list of data rows or data row ids
    5 # priority between 1(Highest) - 5(lowest)
)

# Setup your ontology / labeling editor
editor = next(client.get_labeling_frontends(where=LabelingFrontend.name == "Editor")) # Unless using a custom editor,
# Connect your ontology and editor to your MAL and LI project
mal_project.setup(editor, ontology_builder.asdict())
li_project.setup(editor, ontology_builder.asdict())

print("Batch Li: ", batch_li)
print("Batch Mal: ", batch_mal)

Batch Li:  <Batch {'created_at': datetime.datetime(2022, 10, 28, 13, 49, 10, tzinfo=datetime.timezone.utc), 'name': 'first-batch-LI-text-demo', 'size': 1, 'uid': '4cd63220-56c7-11ed-a38a-0d6305c21022', 'updated_at': datetime.datetime(2022, 10, 28, 13, 49, 10, tzinfo=datetime.timezone.utc)}>
Batch Mal:  <Batch {'created_at': datetime.datetime(2022, 10, 28, 13, 49, 7, tzinfo=datetime.timezone.utc), 'name': 'first-batch-MAL-text-demo', 'size': 1, 'uid': '4ab8d060-56c7-11ed-8034-7b5989fe5ecd', 'updated_at': datetime.datetime(2022, 10, 28, 13, 49, 7, tzinfo=datetime.timezone.utc)}>


### Create Label using Annotation Type Objects
* It is recommended to use the Python SDK's annotation types for importing into Labelbox.

### Object Annotations

In [7]:
def create_objects():
  named_enity = TextEntity(start=10,end=20)
  named_enity_annotation = ObjectAnnotation(value=named_enity, name="named_entity")
  return named_enity_annotation

### Classification Annotations

In [8]:
def create_classifications():
  checklist = Checklist(answer=[ClassificationAnswer(name="first_checklist_answer"),ClassificationAnswer(name="second_checklist_answer")])
  checklist_annotation = ClassificationAnnotation(value=checklist, name="checklist")
  radio = Radio(answer = ClassificationAnswer(name = "second_radio_answer"))
  radio_annotation = ClassificationAnnotation(value=radio, name="radio")
  return checklist_annotation, radio_annotation

### Create a Label object with all of our annotations

In [9]:
image_data = TextData(uid=data_row.uid)

named_enity_annotation = create_objects()
checklist_annotation, radio_annotation = create_classifications()

label = Label(
    data=image_data,
    annotations = [
        named_enity_annotation, checklist_annotation, radio_annotation
    ]
)

label.__dict__

{'uid': None,
 'data': TextData(file_path=None,text=None,url=None),
 'annotations': [ObjectAnnotation(name='named_entity', feature_schema_id=None, extra={}, value=TextEntity(start=10, end=20, extra={}), classifications=[]),
  ClassificationAnnotation(name='checklist', feature_schema_id=None, extra={}, value=Checklist(name='checklist', answer=[ClassificationAnswer(name='first_checklist_answer', feature_schema_id=None, extra={}, keyframe=None), ClassificationAnswer(name='second_checklist_answer', feature_schema_id=None, extra={}, keyframe=None)])),
  ClassificationAnnotation(name='radio', feature_schema_id=None, extra={}, value=Radio(answer=ClassificationAnswer(name='second_radio_answer', feature_schema_id=None, extra={}, keyframe=None)))],
 'extra': {}}

### Model Assisted Labeling 

To do model-assisted labeling, we need to convert a Label object into an NDJSON. 

This is easily done with using the NDJSONConverter class

We will create a Label called mal_label which has the same original structure as the label above

Notes:
* the NDJsonConverter takes in a list of labels

In [10]:
mal_label = Label(
    data=image_data,
    annotations = [
        named_enity_annotation, checklist_annotation, radio_annotation
    ]
)
mal_label_list = [mal_label]

mal_ndjson = list(NDJsonConverter.serialize(mal_label_list))

mal_ndjson

[{'uuid': 'e827d42d-4221-4f29-bc52-3dee2ddb8b21',
  'dataRow': {'id': 'cl9sjvxht5wu707trh6wabl57'},
  'name': 'named_entity',
  'classifications': [],
  'location': {'start': 10, 'end': 20}},
 {'name': 'checklist',
  'uuid': '1d522d16-87da-406a-a736-3e0488c7b300',
  'dataRow': {'id': 'cl9sjvxht5wu707trh6wabl57'},
  'answer': [{'name': 'first_checklist_answer'},
   {'name': 'second_checklist_answer'}]},
 {'name': 'radio',
  'answer': {'name': 'second_radio_answer'},
  'uuid': '32ce5a4f-b936-4a87-b196-6ce3447ef897',
  'dataRow': {'id': 'cl9sjvxht5wu707trh6wabl57'}}]

In [11]:
upload_job = MALPredictionImport.create_from_objects(
    client = client, 
    project_id = mal_project.uid, 
    name="upload_label_import_job_demo", 
    predictions=mal_ndjson)

In [12]:
# Errors will appear for each annotation that failed.
# This will provide information only after the upload_job is complete, so we do not need to worry about having to rerun
upload_job.wait_until_done();
print("Errors:", upload_job.errors)

Errors: []


### Label Import

Label import is very similar to model-assisted labeling. We will create a Label called li_label which has the same original structure as the label above

In [14]:
#for the purpose of this notebook, we will need to reset the schema ids of our checklist and radio answers
image_data = TextData(uid=data_row.uid)

named_enity_annotation = create_objects()
checklist_annotation, radio_annotation = create_classifications()

li_label = Label(
    data=image_data,
    annotations = [
        named_enity_annotation, checklist_annotation, radio_annotation
    ]
)
li_label_list = [li_label]

li_ndjson = list(NDJsonConverter.serialize(li_label_list))

li_ndjson

[{'uuid': '7c26b3a6-15fe-4ca2-b031-5ec13e128cbb',
  'dataRow': {'id': 'cl9sjvxht5wu707trh6wabl57'},
  'name': 'named_entity',
  'classifications': [],
  'location': {'start': 10, 'end': 20}},
 {'name': 'checklist',
  'uuid': '3b7e603c-d92c-43b4-b709-b39db6725dd0',
  'dataRow': {'id': 'cl9sjvxht5wu707trh6wabl57'},
  'answer': [{'name': 'first_checklist_answer'},
   {'name': 'second_checklist_answer'}]},
 {'name': 'radio',
  'answer': {'name': 'second_radio_answer'},
  'uuid': '9ff3c585-4cfb-4805-8a79-f7c565d8f924',
  'dataRow': {'id': 'cl9sjvxht5wu707trh6wabl57'}}]

In [15]:
li_upload_job = LabelImport.create_from_objects(
    client = client, 
    project_id = li_project.uid, 
    name="upload_label_import_job_demo", 
    labels=li_ndjson)

In [16]:
# Errors will appear for each annotation that failed.
# This will provide information only after the upload_job is complete, so we do not need to worry about having to rerun
li_upload_job.wait_until_done();
print("Errors:", li_upload_job.errors)

Errors: []


## Cleanup

In [None]:
# li_project.delete()
# mal_project.delete()
# dataset.delete()
