<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/static/images/logo-v4.svg" width=190/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_import/text.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/annotation_import/text.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Text Annotation Import
* This notebook provides examples of each supported annotation type for text assets and covers both the MAL and Label Import methods.

Supported annotations that can be uploaded through the SDK: 

* Entity
* Classification radio 
* Classification checklist 
* Classification free-form text 

**Not** supported:
* Segmentation mask
* Polygon
* Bounding box 
* Polyline
* Point 

MAL and Label Import: 

* Model-assisted labeling (MAL) - used to provide pre-annotated data for your labelers, which reduces the total time needed to label your assets. MAL does not submit labels automatically; a labeler must review and submit them.
* Label Import - used to provide ground truth labels. These can be compared against prediction labels or used as benchmarks to evaluate how your labelers are doing.

For information on what types of annotations are supported per data type, refer to the Import text annotations [documentation](https://docs.labelbox.com/reference/import-text-annotations).

Notes:
  * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly.
  * You may need to refresh your browser in order to see the results of the import job.

### Setup


In [36]:
!pip install -q 'labelbox[data]'

In [37]:
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option
from labelbox import Client, LabelingFrontend, LabelImport, MALPredictionImport, MediaType
from labelbox.schema.queue_mode import QueueMode
from labelbox.data.annotation_types import (
    Label, TextData, Checklist, Radio, ObjectAnnotation, TextEntity,
    ClassificationAnnotation, ClassificationAnswer, LabelList, Text, ImageData
)
from labelbox.data.serialization import NDJsonConverter
import uuid
import json

### Replace with your API key
Guides on [Create an API key](https://docs.labelbox.com/docs/create-an-api-key)

In [38]:
# Add your api key
API_KEY = None
client = Client(API_KEY)
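
A key pasted directly into a notebook is easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The helper `load_api_key` and the variable name `LABELBOX_API_KEY` are illustrative choices, not part of this notebook.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error.

    The variable name is an assumption for this sketch; use whatever
    name your environment defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Labelbox API key.")
    return key


# Usage (requires the labelbox package and a valid key):
# from labelbox import Client
# client = Client(load_api_key())
```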

## Supported annotations for text

### Supported Python annotation types and NDJSON

In [39]:
########## Entities ##########

# Python annotation
named_entity = TextEntity(start=10, end=20)
named_entitity_annotation = ObjectAnnotation(value=named_entity, name = "named_entity")


# NDJSON
entities_ndjson = { 
    "name": "named_entity",
    "location": { 
        "start": 67, 
        "end": 128 
    }
}
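
The `start` and `end` values above are character offsets into the text asset. As a rough sketch, offsets for a known phrase can be computed rather than counted by hand. The helper `entity_location` is hypothetical, and whether `end` should be inclusive or exclusive is an assumption here (exposed as a flag); confirm it against the import documentation linked above.

```python
def entity_location(text: str, phrase: str, inclusive_end: bool = True) -> dict:
    """Return the character offsets of the first occurrence of `phrase`.

    `inclusive_end=True` makes `end` point at the last character of the
    phrase; set it to False for a Python-slice-style exclusive end.
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    end = start + len(phrase)  # exclusive end, as in text[start:end]
    if inclusive_end:
        end -= 1
    return {"start": start, "end": end}


sample = "Lorem ipsum dolor sit amet"
entity = {"name": "named_entity", "location": entity_location(sample, "ipsum")}
print(entity)  # {'name': 'named_entity', 'location': {'start': 6, 'end': 10}}
```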

In [40]:
########## Classification - Radio (single choice) ##########

# Python annotation 
radio_annotation = ClassificationAnnotation(
    name="radio_question",
    value=Radio(answer = 
        ClassificationAnswer(name = "first_radio_answer")
    )
)


# NDJSON
radio_annotation_ndjson = {
  'name': 'radio_question',
  'answer': {'name': 'first_radio_answer'}
} 

In [41]:
########## Classification - Radio (with subclassifications) is only supported with NDJSON tools ##########

# NDJSON
radio_annotation_ndjson_with_subclass = {
  'name': 'radio_question_sub',
  'answer': {
      'name': 'first_radio_answer',
      'classifications': [{
          'name':'sub_radio_question',
          'answer': { 'name' : 'first_sub_radio_answer'}
        }]
    }
}

In [42]:
########## Classification - Checklist (Multi-choice) ##########

# Python annotation
checklist_annotation = ClassificationAnnotation(
    name="checklist_question",
    value=Checklist(answer = [
        ClassificationAnswer(name = "first_checklist_answer"),
        ClassificationAnswer(name = "second_checklist_answer"),
        ClassificationAnswer(name = "third_checklist_answer")
    ])
  )


# NDJSON
checklist_annotation_ndjson = {
  'name': 'checklist_question',
  'answer': [
    {'name': 'first_checklist_answer'},
    {'name': 'second_checklist_answer'},
    {'name': 'third_checklist_answer'},
  ]
}

In [43]:
########## Classification Free-Form text  ##########

# Python annotation
text_annotation = ClassificationAnnotation(
    name = "free_text", 
    value = Text(answer="sample text")
)

#  NDJSON
text_annotation_ndjson = {
  'name': 'free_text',
  'answer': 'sample text',
}
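
The three classification payloads above differ only in the shape of `answer`: a single object for radio, a list of objects for checklist, and a plain string for free-form text. A small sketch of builder functions that reproduce those shapes follows; the function names are hypothetical helpers, not SDK calls.

```python
from typing import Iterable


def radio_ndjson(name: str, answer: str) -> dict:
    """Radio: `answer` is a single object holding the chosen option name."""
    return {"name": name, "answer": {"name": answer}}


def checklist_ndjson(name: str, answers: Iterable[str]) -> dict:
    """Checklist: `answer` is a list with one object per selected option."""
    return {"name": name, "answer": [{"name": a} for a in answers]}


def text_ndjson(name: str, answer: str) -> dict:
    """Free-form text: `answer` is the raw string."""
    return {"name": name, "answer": answer}


print(radio_ndjson("radio_question", "first_radio_answer"))
print(checklist_ndjson("checklist_question",
                       ["first_checklist_answer", "second_checklist_answer"]))
print(text_ndjson("free_text", "sample text"))
```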

## Upload Annotations - putting it all together

### Step 1: Import data rows into Catalog

In [44]:
# You can now include other fields like attachments, media type and metadata in the data row creation step: https://docs.labelbox.com/reference/text-file
text_asset = {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
    "global_key": "TEST-ID-%s" % uuid.uuid4(),
    "media_type": "TEXT",
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
    }

dataset = client.create_dataset(name="text_annotation_import_demo_dataset")
data_row = dataset.create_data_row(text_asset)
print(data_row)
print(data_row.uid)

<DataRow {
    "created_at": "2023-01-30 18:06:23+00:00",
    "external_id": null,
    "global_key": "TEST-ID-321668750509639104503407808903937477958d",
    "media_attributes": {},
    "metadata": [],
    "metadata_fields": [],
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
    "uid": "cldj4gyf60fy207xh3z1y2g1h",
    "updated_at": "2023-01-30 18:06:23+00:00"
}>
cldj4gyf60fy207xh3z1y2g1h


### Step 2:  Create/select an ontology
Your project's ontology must include every tool and classification used by your annotations, and each tool or classification name must match the `name` field of the corresponding annotation so that the correct feature schemas are matched.

For example, the checklist annotation created above uses the `name` `checklist_question`, so the checklist classification in the ontology must also be named `checklist_question`. The same alignment must hold for every other tool and classification in the ontology.

[Documentation for reference ](https://docs.labelbox.com/reference/import-text-annotations)
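
A name mismatch between an annotation and the ontology is easy to miss by eye. The sketch below is one way to check the alignment before uploading, working on plain dictionaries and strings only. The helper `unmatched_names` is hypothetical, and it inspects top-level `name` fields only, not nested subclassifications.

```python
from typing import Iterable, List


def unmatched_names(annotations: Iterable[dict],
                    ontology_names: Iterable[str]) -> List[str]:
    """Return annotation names that have no same-named ontology feature."""
    known = set(ontology_names)
    return sorted({a["name"] for a in annotations if a["name"] not in known})


# Names of the tools and classifications defined in this notebook's ontology.
ontology_names = ["radio_question", "radio_question_sub",
                  "checklist_question", "free_text", "named_entity"]

payload = [
    {"name": "checklist_question", "answer": [{"name": "first_checklist_answer"}]},
    {"name": "free_txt", "answer": "sample text"},  # deliberate typo
]

print(unmatched_names(payload, ontology_names))  # ['free_txt']
```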

In [45]:
## Set up the ontology and link the tools created above.

ontology_builder = OntologyBuilder(
  classifications=[ # List of Classification objects
    Classification( 
      class_type=Classification.Type.RADIO, 
      name="radio_question", 
      options=[Option(value="first_radio_answer")]
    ),
    Classification( 
      class_type=Classification.Type.RADIO, 
      name="radio_question_sub", 
      options=[
        Option(value="first_radio_answer",
          options=[
              Classification(
                class_type=Classification.Type.RADIO,
                name="sub_radio_question",
                options=[
                  Option(value="first_sub_radio_answer")
                ]
            ),
          ]
        )
      ],
    ),
    Classification( 
      class_type=Classification.Type.CHECKLIST, 
      name="checklist_question", 
      options=[
        Option(value="first_checklist_answer"),
        Option(value="second_checklist_answer"), 
        Option(value="third_checklist_answer")            
      ]
    ), 
    Classification( # Free-form text classification given the name "free_text"
      class_type=Classification.Type.TEXT,
      name="free_text"
    )
  ],
  tools=[ # List of Tool objects
         Tool(tool=Tool.Type.NER, 
              name="named_entity")
    ]
)

ontology = client.create_ontology("Ontology Text Annotations", ontology_builder.asdict())


### Step 3: Create a labeling project 
Connect the ontology to the labeling project 

In [46]:
# Project defaults to batch mode with benchmark quality settings if this argument is not provided
# Queue mode will be deprecated once dataset mode is deprecated

project = client.create_project(name="text_project_demo",
                                    queue_mode=QueueMode.Batch,
                                    media_type=MediaType.Text)


project.setup_editor(ontology)

######################### DATASET CONSENSUS OPTION ########################
#Note that dataset-based projects will be deprecated in the near future.

#To use Datasets/Consensus instead of Batches/Benchmarks use the following query: 
#In this case, 10% of all data rows need to be annotated by three labelers.

# dataset_project = client.create_project(name="dataset-test-project",
#                                 description="a description",
#                                 media_type=MediaType.Text,
#                                 auto_audit_percentage=0.1,
#                                 auto_audit_number_of_labels=3,
#                                 queue_mode=QueueMode.Dataset)

# dataset_project.datasets.connect(dataset)

### Step 4: Send a batch of data rows to the project 

In [47]:
# Setup Batches and Ontology

# Create a batch to send to your MAL project
batch = project.create_batch(
  "first-batch-text-demo", # Each batch in a project must have a unique name
  dataset.export_data_rows(), # A list of data rows or data row ids
  5 # Priority from 1 (highest) to 5 (lowest)
)

print("Batch: ", batch)

Batch:  <Batch {
    "consensus_settings_json": "{\"numberOfLabels\":1,\"coveragePercentage\":0}",
    "created_at": "2023-01-30 18:06:28+00:00",
    "name": "first-batch-text-demo",
    "size": 1,
    "uid": "d14be9f0-a0c8-11ed-a7e5-f9da6146996d",
    "updated_at": "2023-01-30 18:06:28+00:00"
}>


### Step 5: Create the annotations payload

Create the annotations payload using the snippets of code above.

Labelbox supports two formats for the annotations payload: NDJSON and Python annotation types. Both are described below. If you are using Python annotation types, compose your annotations into Labels attached to the data rows.

#### Python annotations

In [48]:
# Create a Label
label = Label(
    data=TextData(  # Text assets use TextData, not ImageData
        uid=data_row.uid),
    annotations = [
     named_entitity_annotation, 
     radio_annotation, 
     checklist_annotation, 
     text_annotation
    ]
)


# Create urls to mask data for upload
def signing_function(obj_bytes: bytes) -> str:
    url = client.upload_data(content=obj_bytes, sign=True)
    return url    

label.add_url_to_masks(signing_function)

# Convert our label from a Labelbox class object to the underlying NDJSON format required for upload 
label_ndjson = list(NDJsonConverter.serialize([label]))

#### NDJSON annotations

In [49]:
label_ndjson_method2 = []
for annotations in [entities_ndjson, 
                   radio_annotation_ndjson,  
                   radio_annotation_ndjson_with_subclass,
                   checklist_annotation_ndjson,
                   text_annotation_ndjson] :
  annotations.update({
      'uuid': str(uuid.uuid4()),
      'dataRow': {
          'id': data_row.uid
      }
  })                   
  label_ndjson_method2.append(annotations)
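
The loop above calls `update` on the annotation dictionaries themselves, so reusing the same dictionaries for a second data row would overwrite the first row's `uuid` and `dataRow`. A minimal sketch of a non-mutating variant follows; the helper `attach_data_row` and the placeholder IDs are illustrative and not part of the SDK.

```python
import copy
import json
import uuid


def attach_data_row(annotation: dict, data_row_id: str) -> dict:
    """Return a copy of `annotation` with a fresh uuid and a data row reference."""
    row = copy.deepcopy(annotation)  # leave the template untouched
    row["uuid"] = str(uuid.uuid4())
    row["dataRow"] = {"id": data_row_id}
    return row


templates = [
    {"name": "free_text", "answer": "sample text"},
    {"name": "radio_question", "answer": {"name": "first_radio_answer"}},
]

# Placeholder IDs; in the notebook these would come from data_row.uid.
data_row_ids = ["example-data-row-1", "example-data-row-2"]

rows = [attach_data_row(t, dr_id) for dr_id in data_row_ids for t in templates]

# NDJSON is one JSON object per line.
ndjson_text = "\n".join(json.dumps(r) for r in rows)
print(ndjson_text)
```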

### Step 6: Upload annotations to a project as pre-labels or completed labels
For the purpose of this tutorial, upload only one of the two payloads at a time: `label_ndjson` (built from Python annotation types) or `label_ndjson_method2` (built from NDJSON).




#### Model-Assisted Labeling (MAL)

In [50]:
# Upload MAL label for this data row in project
upload_job_mal = MALPredictionImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name="mal_import_job"+str(uuid.uuid4()), 
    # Pass label_ndjson_method2 for the NDJSON payload, or label_ndjson for the payload built from Python annotation types
    predictions=label_ndjson_method2)

upload_job_mal.wait_until_done()
print("Errors:", upload_job_mal.errors)
print("   ")

Errors: []
   


#### Label Import 

In [51]:
# Upload label for this data row in project
upload_job_label_import = LabelImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name="label_import_job"+str(uuid.uuid4()),  
    labels=label_ndjson_method2)

upload_job_label_import.wait_until_done()
print("Errors:", upload_job_label_import.errors)

Errors: []
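
Printing the error list works for interactive use, but in a script an empty list is easy to overlook. A small sketch of a stricter check is shown below, assuming only that the job's `errors` attribute is a list, as in the output above. The helper `raise_on_import_errors` is hypothetical.

```python
from typing import Sequence


def raise_on_import_errors(job_name: str, errors: Sequence) -> None:
    """Raise if an import job reported any errors; otherwise do nothing."""
    if errors:
        details = "\n".join(f"  - {e}" for e in errors)
        raise RuntimeError(
            f"{job_name}: {len(errors)} annotation(s) failed to import:\n{details}")


# With the empty list shown above, the check passes silently.
raise_on_import_errors("label_import_job", [])

# In the notebook this could be called as:
# raise_on_import_errors("label_import_job", upload_job_label_import.errors)
```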


### Optional deletions for cleanup

In [52]:
# project.delete()
# dataset.delete()