<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/static/images/logo-v4.svg" width=190/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/annotation_import/text.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/master/examples/annotation_import/text.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Text Annotation Import
* This notebook provides examples of each supported annotation type for text assets and covers the MAL and Label Import methods.

Supported annotations that can be uploaded through the SDK: 

* Entity
* Classification radio 
* Classification checklist 
* Classification free-form text 


**Not** supported:
* Relationships
* Segmentation mask
* Polygon
* Bounding box 
* Polyline
* Point 

MAL and Label Import: 

* Model-assisted labeling (MAL) - used to provide pre-annotated data for your labelers, which reduces the total time needed to label your assets. Model-assisted labeling does not submit the labels automatically; a labeler must review them before submission.
* Label Import - used to provide ground truth labels. These can be compared against prediction labels or used as benchmarks to evaluate how your labelers are doing.

For information on what types of annotations are supported per data type, refer to the Import text annotations [documentation](https://docs.labelbox.com/reference/import-text-annotations).

Notes:
  * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly.
  * You may need to refresh your browser in order to see the results of the import job.

### Setup


In [None]:
!pip install -q "labelbox[data]"

In [None]:
import labelbox as lb
import labelbox.types as lb_types
import uuid
import json

### Replace with your API key
Guides on [Create an API key](https://docs.labelbox.com/docs/create-an-api-key)

In [None]:
# Add your API key
API_KEY=""
client = lb.Client(API_KEY)


## Supported annotations for text

### Supported Python annotation types and NDJSON

In [None]:
########## Entities ##########

# Python annotation
named_entity = lb_types.TextEntity(start=10, end=20)
named_entitity_annotation = lb_types.ObjectAnnotation(value=named_entity, name = "named_entity")


# NDJSON
entities_ndjson = { 
    "name": "named_entity",
    "location": { 
        "start": 67, 
        "end": 128 
    }
}
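
The `start` and `end` values above are character offsets into the text asset. As a rough sketch, offsets for a known phrase could be derived with a plain string search. The helper name `find_entity_span` and the sample sentence are hypothetical, and whether `end` should be inclusive or exclusive is an assumption exposed as a parameter; confirm the offset convention in the import documentation before relying on it.

```python
def find_entity_span(text: str, phrase: str, inclusive_end: bool = True) -> tuple[int, int]:
    """Return (start, end) character offsets of the first occurrence of phrase.

    Hypothetical helper for illustration; not part of the Labelbox SDK.
    """
    if not phrase:
        raise ValueError("phrase must not be empty")
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    end = start + len(phrase)  # exclusive end, Python-slice style
    if inclusive_end:
        end -= 1  # index of the last character of the phrase
    return start, end


sample_text = "Lorem ipsum dolor sit amet"  # hypothetical sample, not the demo asset
span = find_entity_span(sample_text, "ipsum")
print(span)
```

The resulting pair could then be passed as `start` and `end` to `lb_types.TextEntity` or placed in the NDJSON `location` field.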

In [None]:
########## Classification - Radio (single choice) ##########

# Python annotation 
radio_annotation = lb_types.ClassificationAnnotation(
    name="radio_question",
    value=lb_types.Radio(answer = 
        lb_types.ClassificationAnswer(name = "first_radio_answer")
    )
)


# NDJSON
radio_annotation_ndjson = {
  "name": "radio_question",
  "answer": {"name": "first_radio_answer"}
} 

In [None]:
########## Classification - Radio and Checklist (with subclassifications) ##########

# Python annotation (nested radio)
nested_radio_annotation = lb_types.ClassificationAnnotation(
  name="nested_radio_question",
  value=lb_types.Radio(
    answer=lb_types.ClassificationAnswer(
      name="first_radio_answer",
      classifications=[
        lb_types.ClassificationAnnotation(
          name="sub_radio_question",
          value=lb_types.Radio(
            answer=lb_types.ClassificationAnswer(
              name="first_sub_radio_answer"
            )
          )
        )
      ]
    )
  )
)
# NDJSON
nested_radio_annotation_ndjson= {
  "name": "nested_radio_question",
  "answer": {
      "name": "first_radio_answer",
      "classifications": [{
          "name":"sub_radio_question",
          "answer": { "name" : "first_sub_radio_answer"}
        }]
    }
}

# Python annotation (nested checklist)
nested_checklist_annotation = lb_types.ClassificationAnnotation(
  name="nested_checklist_question",
  value=lb_types.Checklist(
    answer=[lb_types.ClassificationAnswer(
      name="first_checklist_answer",
      classifications=[
        lb_types.ClassificationAnnotation(
          name="sub_checklist_question",
          value=lb_types.Checklist(
            answer=[lb_types.ClassificationAnswer(
            name="first_sub_checklist_answer"
          )]
        ))
      ]
    )]
  )
)

# NDJSON
nested_checklist_annotation_ndjson = {
  "name": "nested_checklist_question",
  "answer": [{
      "name": "first_checklist_answer", 
      "classifications" : [
        {
          "name": "sub_checklist_question", 
          "answer": {"name": "first_sub_checklist_answer"}
        }          
      ]         
  }]
}

In [None]:
########## Classification - Checklist (Multi-choice) ##########

# Python annotation
checklist_annotation = lb_types.ClassificationAnnotation(
    name="checklist_question",
    value=lb_types.Checklist(answer = [
        lb_types.ClassificationAnswer(name = "first_checklist_answer"),
        lb_types.ClassificationAnswer(name = "second_checklist_answer"),
        lb_types.ClassificationAnswer(name = "third_checklist_answer")
    ])
  )


# NDJSON
checklist_annotation_ndjson = {
  "name": "checklist_question",
  "answer": [
    {"name": "first_checklist_answer"},
    {"name": "second_checklist_answer"},
    {"name": "third_checklist_answer"},
  ]
}

In [None]:
########## Classification - Free-form text ##########

# Python annotation
text_annotation = lb_types.ClassificationAnnotation(
    name = "free_text", 
    value = lb_types.Text(answer="sample text")
)

#  NDJSON
text_annotation_ndjson = {
  "name": "free_text",
  "answer": "sample text",
}

## Upload Annotations - putting it all together

### Step 1: Import data rows into Catalog

In [None]:
# You can now include other fields like attachments, media type, and metadata in the data row creation step: https://docs.labelbox.com/reference/text-file
global_key = "lorem-ipsum.txt"
text_asset = {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
    "global_key": global_key,
    "media_type": "TEXT",
    "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
    }

dataset = client.create_dataset(
    name="text_annotation_import_demo_dataset", 
    iam_integration=None # Removing this argument will default to the organization's default IAM integration
)
task = dataset.create_data_rows([text_asset])
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows:", task.failed_data_rows)

### Step 2: Create/select an ontology
Your project should have an ontology set up with all the tools and classifications your annotations require. The `name` of each tool and classification must match the `name` field in your annotations so that the correct feature schemas are matched.

For example, the checklist annotation created above uses the `name` `checklist_question`. When setting up the ontology, the checklist classification must therefore also be named `checklist_question`. The same alignment must hold for every other tool and classification in the ontology.

[Documentation for reference](https://docs.labelbox.com/reference/import-text-annotations)
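
The name alignment described in this step can be checked locally before uploading. The following is a minimal sketch in plain Python: it walks NDJSON-style annotation dicts, collects the tool and classification names (including nested subclassifications), and reports any that are absent from a set of ontology names. The helpers `collect_names` and `missing_names`, the sample annotations, and the hand-written `ontology_names` set are all hypothetical; nothing here calls the Labelbox SDK, and answer option names are not checked.

```python
def collect_names(annotation: dict) -> set[str]:
    """Collect an annotation's name plus the names of nested subclassifications."""
    names = {annotation["name"]}
    answer = annotation.get("answer")
    # Radio answers are a dict, checklist answers a list, free text a string
    answers = answer if isinstance(answer, list) else [answer]
    for ans in answers:
        if isinstance(ans, dict):
            for sub in ans.get("classifications", []):
                names |= collect_names(sub)
    return names


def missing_names(annotations: list[dict], ontology_names: set[str]) -> set[str]:
    """Return annotation names that have no matching ontology name."""
    found: set[str] = set()
    for annotation in annotations:
        found |= collect_names(annotation)
    return found - ontology_names


# Hypothetical sample data for illustration
sample_annotations = [
    {"name": "checklist_question", "answer": [{"name": "first_checklist_answer"}]},
    {
        "name": "nested_radio_question",
        "answer": {
            "name": "first_radio_answer",
            "classifications": [
                {"name": "sub_radio_question", "answer": {"name": "first_sub_radio_answer"}}
            ],
        },
    },
    {"name": "named_entity", "location": {"start": 67, "end": 128}},
]
ontology_names = {"checklist_question", "nested_radio_question", "sub_radio_question"}

unmatched = missing_names(sample_annotations, ontology_names)
print("Names missing from the ontology:", unmatched)
```

In this sample, `named_entity` is reported because it was deliberately left out of the hand-written set.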

In [None]:
# Set up the ontology with tools and classifications whose names match the annotations created above.

ontology_builder = lb.OntologyBuilder(
  classifications=[ # List of Classification objects
    lb.Classification( 
      class_type=lb.Classification.Type.RADIO, 
      name="radio_question", 
      options=[lb.Option(value="first_radio_answer")]
    ),
    lb.Classification( 
      class_type=lb.Classification.Type.RADIO, 
      name="nested_radio_question", 
      options=[
        lb.Option(value="first_radio_answer",
          options=[
              lb.Classification(
                class_type=lb.Classification.Type.RADIO,
                name="sub_radio_question",
                options=[
                  lb.Option(value="first_sub_radio_answer")
                ]
            ),
          ]
        ),
      ], 
    ),
    lb.Classification(
      class_type=lb.Classification.Type.CHECKLIST,
      name="nested_checklist_question",
      options=[
          lb.Option("first_checklist_answer",
            options=[
              lb.Classification(
                  class_type=lb.Classification.Type.CHECKLIST,
                  name="sub_checklist_question", 
                  options=[lb.Option("first_sub_checklist_answer")]
              )
          ]
        )
      ]
    ),
    lb.Classification( 
      class_type=lb.Classification.Type.CHECKLIST, 
      name="checklist_question", 
      options=[
        lb.Option(value="first_checklist_answer"),
        lb.Option(value="second_checklist_answer"), 
        lb.Option(value="third_checklist_answer")            
      ]
    ), 
    lb.Classification( # Free-form text classification given the name "free_text"
      class_type=lb.Classification.Type.TEXT,
      name="free_text"
    )
  ],
  tools=[ # List of Tool objects
         lb.Tool(
            tool=lb.Tool.Type.NER, 
            name="named_entity"
          ),
    ]
)

ontology = client.create_ontology("Ontology Text Annotations", ontology_builder.asdict())


### Step 3: Create a labeling project 
Connect the ontology to the labeling project 

In [None]:
# The project defaults to batch mode with benchmark quality settings when no queue mode argument is provided
# Queue mode will be deprecated once dataset mode is deprecated

project = client.create_project(name="Text Annotation Import Demo",
                                    media_type=lb.MediaType.Text)


project.setup_editor(ontology)

### Step 4: Send a batch of data rows to the project 

In [None]:
# Set up the batch

# Create a batch to send to your MAL project
batch = project.create_batch(
  "first-batch-text-demo", # Each batch in a project must have a unique name
  global_keys=[global_key], # Data rows can be given as a paginated collection of data row objects, a list of data row IDs, or global keys
  priority=5 # Priority between 1 (highest) and 5 (lowest)
)

print("Batch: ", batch)

### Step 5: Create the annotations payload

Create the annotations payload using the code snippets above.

Labelbox supports two formats for the annotations payload: NDJSON and Python annotation types. Both are described below. If you are using Python annotation types, compose your annotations into Labels attached to the data rows.

#### Python annotations

In [None]:
# Create a Label
labels = []
labels.append(
    lb_types.Label(
        data=lb_types.TextData(
            global_key=global_key),
        annotations = [
            named_entitity_annotation, 
            radio_annotation, 
            checklist_annotation, 
            text_annotation,
            nested_checklist_annotation,
            nested_radio_annotation
        ]
    )
)

#### NDJSON annotations

In [None]:
label_ndjson = []
annotations: list[dict] = [
  entities_ndjson, 
  radio_annotation_ndjson,  
  checklist_annotation_ndjson,
  text_annotation_ndjson,
  nested_radio_annotation_ndjson,
  nested_checklist_annotation_ndjson,
]

for annotation in annotations:
  annotation.update({
      "dataRow": { "globalKey": global_key }
  })                   
  label_ndjson.append(annotation)
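
NDJSON means newline-delimited JSON: one JSON object per line. As a small sketch for inspecting or saving a payload like `label_ndjson`, the list of dicts can be serialized with the standard `json` module. The helper name `to_ndjson` and the sample records are hypothetical, and this serialization is shown only for illustration; it is not a required step in this tutorial.

```python
import json


def to_ndjson(records: list[dict]) -> str:
    """Serialize a list of dicts as newline-delimited JSON (one object per line)."""
    return "\n".join(json.dumps(record) for record in records)


# Hypothetical sample records shaped like the annotations in this notebook
sample_records = [
    {
        "name": "free_text",
        "answer": "sample text",
        "dataRow": {"globalKey": "lorem-ipsum.txt"},
    },
    {
        "name": "radio_question",
        "answer": {"name": "first_radio_answer"},
        "dataRow": {"globalKey": "lorem-ipsum.txt"},
    },
]

ndjson_text = to_ndjson(sample_records)
print(ndjson_text)
```

Each printed line parses back to one annotation with `json.loads`, which makes the payload easy to diff or log.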

### Step 6: Upload annotations to a project as pre-labels or ground truth
For this tutorial, import only one of the annotation payloads at a time (NDJSON or Python annotation types).

Option A: Upload to a labeling project as pre-labels (MAL)

In [None]:
# Upload MAL label for this data row in project
upload_job_mal = lb.MALPredictionImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name="mal_import_job"+str(uuid.uuid4()), 
    predictions=labels)

upload_job_mal.wait_until_done()
print("Errors:", upload_job_mal.errors)
print("Status of uploads: ", upload_job_mal.statuses)
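
Once `wait_until_done()` returns, the `statuses` list can be condensed into a quick pass/fail count. This is a minimal sketch, assuming each entry is a dict with a `status` field such as `"SUCCESS"` or `"FAILURE"` and a `uuid` field. That shape, the helper name `summarize_statuses`, and the sample entries are assumptions for illustration, not output from a real import job.

```python
from collections import Counter


def summarize_statuses(statuses: list[dict]) -> Counter:
    """Count entries per status value; entries without a status count as UNKNOWN."""
    return Counter(entry.get("status", "UNKNOWN") for entry in statuses)


# Hypothetical sample entries; the real shape of `statuses` is an assumption here
sample_statuses = [
    {"uuid": "a", "status": "SUCCESS"},
    {"uuid": "b", "status": "FAILURE", "errors": [{"message": "hypothetical error"}]},
    {"uuid": "c", "status": "SUCCESS"},
]

summary = summarize_statuses(sample_statuses)
failed_ids = [entry["uuid"] for entry in sample_statuses if entry.get("status") == "FAILURE"]
print("Summary:", dict(summary))
print("Failed annotation uuids:", failed_ids)
```

With a real job, `upload_job_mal.statuses` would take the place of `sample_statuses`, provided the entries match the assumed shape.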

Option B: Upload to a labeling project as ground truth

In [None]:
# Upload label for this data row in project 

# upload_job_label_import = lb.LabelImport.create_from_objects(
#     client = client, 
#     project_id = project.uid, 
#     name="label_import_job"+str(uuid.uuid4()),  
#     labels=labels)

# upload_job_label_import.wait_until_done()
# print("Errors:", upload_job_label_import.errors)
# print("Status of uploads: ", upload_job_label_import.statuses)

### Optional deletions for cleanup

In [None]:
# project.delete()
# dataset.delete()