# Text Prediction Import
This notebook walks through uploading model predictions to a Model Run, with an example of each prediction type supported for text assets.

Supported annotations that can be uploaded through the SDK: 

* Entity
* Classification radio 
* Classification checklist 
* Classification free-form text 

**Not** supported:
* Segmentation mask
* Polygon
* Bounding box 
* Polyline
* Point 


A Model Run is a container for the predictions, annotations and metrics of a specific experiment in your ML model development cycle.



## Setup

In [1]:
!pip install -q 'labelbox[data]'



In [2]:
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option
from labelbox import Client, MALPredictionImport, LabelImport
from labelbox.data.serialization import NDJsonConverter
from labelbox.schema.media_type import MediaType
from labelbox.data.annotation_types import (
    Label, TextData, Checklist, Radio, ObjectAnnotation, TextEntity,
    ClassificationAnnotation, ClassificationAnswer, LabelList, Text, ImageData
)
import uuid
import numpy as np
from labelbox.schema.queue_mode import QueueMode

## Replace with your API Key 
See the guide [Create an API key](https://docs.labelbox.com/docs/create-an-api-key) for instructions.

In [None]:
API_KEY = None
client = Client(API_KEY)
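
Rather than pasting the key into the notebook, one option is to read it from an environment variable. The sketch below is a minimal example under stated assumptions: the helper `load_api_key` is hypothetical and not part of the Labelbox SDK, and the variable name `LABELBOX_API_KEY` is an assumed convention that you can change.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    Hypothetical helper: the environment variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Labelbox API key."
        )
    return key
```

With this in place, `API_KEY = load_api_key()` could replace the hard-coded assignment before calling `Client(API_KEY)`.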

## Supported Predictions

In [4]:
########## Entities ##########

# Python annotation (same character offsets as the NDJSON version below)
named_entity = TextEntity(start=67, end=128)
entities_prediction = ObjectAnnotation(value=named_entity, name="named_entity", confidence=0.5)


# NDJSON
entities_prediction_ndjson = { 
    "name": "named_entity",
    "confidence": 0.5, 
    "location": { 
        "start": 67, 
        "end": 128 
    }
}
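
The `start` and `end` values are character offsets into the text asset. As a rough sketch of how such offsets might be derived from a known phrase, the hypothetical helper `entity_location` below searches the text and returns a `location` dictionary. Whether `end` should point at the last character of the entity (inclusive) or one past it is an assumption here, exposed as the `inclusive_end` parameter; check the Labelbox documentation before relying on either setting.

```python
def entity_location(text: str, phrase: str, inclusive_end: bool = True) -> dict:
    """Return the character offsets of the first occurrence of `phrase` in `text`.

    Hypothetical helper; `inclusive_end` is an assumption about how the
    `end` offset is interpreted and should be verified.
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    end = start + len(phrase) - (1 if inclusive_end else 0)
    return {"start": start, "end": end}


sample_text = "Lorem ipsum dolor sit amet"
sample_entity_ndjson = {
    "name": "named_entity",
    "confidence": 0.5,
    "location": entity_location(sample_text, "ipsum"),
}
print(sample_entity_ndjson["location"])
```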

In [5]:
########## Classification - Radio (single choice) ##########

# Python annotation 
radio_prediction = ClassificationAnnotation(
    name="radio_question",
    value=Radio(answer = 
        ClassificationAnswer(name = "first_radio_answer", confidence=0.5)
    )
)


# NDJSON
radio_prediction_ndjson = {
  'name': 'radio_question',
  'confidence': 0.5,
  'answer': {'name': 'first_radio_answer', 'confidence': 0.5}
} 

In [6]:
#### Nested classifications are only supported in the NDJSON format ######

nested_radio_prediction_ndjson = {
  'name': 'radio_question_sub',
  'answer': {
      'name': 'first_radio_answer',
      "confidence": 0.5,
      'classifications': [{
          'name':'sub_radio_question',
          'answer': { 'name' : 'first_sub_radio_answer', 'confidence': 0.5 }
        }]
    }
}

nested_checklist_prediction_ndjson = {
  "name": "nested_checklist_question",
  "confidence": 0.01,
  "answer": [{
      "name": "first_checklist_answer", 
      "confidence": 0.01,
      "classifications" : [
        {
          "name": "sub_checklist_question", 
          "answer": {"name": "first_sub_checklist_answer", "confidence": 0.01 }
        }          
      ]         
  }]
}
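
Nested payloads carry a `confidence` value at several levels, so a mistyped value is easy to miss. The sketch below walks a nested NDJSON dictionary and reports every `confidence` that is not a number between 0 and 1. The helper names are hypothetical, and the [0, 1] range is an assumption based on the values used in this notebook.

```python
def iter_confidences(node, path="prediction"):
    """Yield (path, value) for every 'confidence' key in a nested dict/list."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if key == "confidence":
                yield child_path, value
            else:
                yield from iter_confidences(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_confidences(item, f"{path}[{index}]")


def invalid_confidences(prediction):
    """Return the (path, value) pairs whose confidence is not a number in [0, 1]."""
    return [
        (path, value)
        for path, value in iter_confidences(prediction)
        if isinstance(value, bool)
        or not isinstance(value, (int, float))
        or not 0 <= value <= 1
    ]


# Illustrative payload with one deliberately invalid confidence (1.5)
example_prediction = {
    "name": "radio_question_sub",
    "answer": {
        "name": "first_radio_answer",
        "confidence": 0.5,
        "classifications": [{
            "name": "sub_radio_question",
            "answer": {"name": "first_sub_radio_answer", "confidence": 1.5},
        }],
    },
}
print(invalid_confidences(example_prediction))
```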

In [7]:
########## Checklist ##########

# Python annotation
checklist_prediction = ClassificationAnnotation(
    name="checklist_question",
    value=Checklist(
        answer=[
            ClassificationAnswer(
                name="first_checklist_answer",
                confidence=0.5
            ),
            ClassificationAnswer(
                name="second_checklist_answer",
                confidence=0.5
            ),
            ClassificationAnswer(
                name="third_checklist_answer",
                confidence=0.5
            )
        ]
    )
)


# NDJSON
checklist_prediction_ndjson = {
  'name': 'checklist_question',
  'confidence': 0.5,
  'answer': [
    {'name': 'first_checklist_answer', 'confidence': 0.5}
  ]
}




In [8]:
########## Classification Free-Form text  ##########

# Python annotation
text_prediction = ClassificationAnnotation(
    name = "free_text", 
    value = Text(answer="sample text")
)

#  NDJSON
text_prediction_ndjson = {
  'name': 'free_text',
  'answer': 'sample text'
}

## Step 1: Import data rows into Catalog

In [9]:
# Create a dataset and add a sample text file to it as a data row
text_data_row = {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
    "global_key": str(uuid.uuid4())
}
dataset = client.create_dataset(name="text_prediction_import")
data_row = dataset.create_data_row(text_data_row)
print(data_row)

<DataRow {
    "created_at": "2023-01-26 15:41:43+00:00",
    "external_id": null,
    "global_key": "7f9e8601-a206-4e39-a8d6-b6cf6a31e933",
    "media_attributes": {},
    "metadata": [],
    "metadata_fields": [],
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
    "uid": "cldd9jik93ntp07z75ztshfkr",
    "updated_at": "2023-01-26 15:41:43+00:00"
}>


## Step 2: Create/select an Ontology for your model predictions
Your ontology must include every tool and classification used by your annotations. Tool and classification names must match the `name` field in your annotations so that each annotation is matched to the correct feature schema.


In [10]:
# Set up the ontology with the tool and classifications referenced by the predictions above.

ontology_builder = OntologyBuilder(
  classifications=[ # List of Classification objects
    Classification( 
      class_type=Classification.Type.RADIO, 
      name="radio_question", 
      options=[Option(value="first_radio_answer")]
    ),
    Classification( 
      class_type=Classification.Type.RADIO, 
      name="radio_question_sub", 
      options=[
        Option(value="first_radio_answer",
          options=[
              Classification(
                class_type=Classification.Type.RADIO,
                name="sub_radio_question",
                options=[
                  Option(value="first_sub_radio_answer")
                ]
            ),
          ]
        )
      ],
    ),
    Classification( 
      class_type=Classification.Type.CHECKLIST, 
      name="checklist_question", 
      options=[
        Option(value="first_checklist_answer"),
        Option(value="second_checklist_answer"), 
        Option(value="third_checklist_answer")            
      ]
    ), 
    Classification(
      class_type=Classification.Type.TEXT,
      name="free_text"
    ),
    Classification(
      class_type=Classification.Type.CHECKLIST, 
      name="nested_checklist_question",
      options=[
          Option("first_checklist_answer",
            options=[
              Classification(
                  class_type=Classification.Type.CHECKLIST, 
                  name="sub_checklist_question", 
                  options=[Option("first_sub_checklist_answer")]
              )
          ]
        )
      ]
    )
  ],
  tools=[ # List of Tool objects
         Tool(tool=Tool.Type.NER, 
              name="named_entity")
    ]
)

ontology = client.create_ontology("Ontology Text Predictions", ontology_builder.asdict() , media_type=MediaType.Text)
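
Because annotations are matched to the ontology by name, a quick local check can catch typos before upload. The sketch below is a minimal example: `unmatched_names` is a hypothetical helper, it only compares top-level `name` fields, and the set of ontology names is typed out by hand rather than read from the SDK.

```python
def unmatched_names(annotations, ontology_names):
    """Return the top-level annotation names that are missing from the ontology."""
    annotation_names = {annotation["name"] for annotation in annotations}
    return sorted(annotation_names - set(ontology_names))


# Top-level tool and classification names defined in the ontology above
known_names = {
    "named_entity",
    "radio_question",
    "radio_question_sub",
    "checklist_question",
    "free_text",
    "nested_checklist_question",
}

# Illustrative annotations; the second name contains a deliberate typo
sample_annotations = [
    {"name": "radio_question", "answer": {"name": "first_radio_answer"}},
    {"name": "free_txt", "answer": "sample text"},
]
print(unmatched_names(sample_annotations, known_names))
```

An empty list means every top-level name has a counterpart in the ontology; nested classification names are not checked by this sketch.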


## Step 3: Create a Model and Model Run

In [11]:
# create Model
model = client.create_model(name="text_model_run_"+ str(uuid.uuid4()), 
                            ontology_id=ontology.uid)
# create Model Run
model_run = model.create_model_run("iteration 1")

## Step 4: Send data rows to the Model Run

In [20]:
model_run.upsert_data_rows([data_row.uid])

True

## Step 5: Create the predictions payload

Create the prediction payload using the code snippets in the **Supported Predictions** section.

Labelbox supports two formats for the annotations payload: NDJSON and Python annotation types. Both approaches are shown below for composing your annotations into Labels attached to the data rows.

For annotation types supported by both formats, the resulting NDJSON payloads should have the same content, except for the generated `uuid` strings.

In [21]:
# Create a Label for predictions
label_prediction = Label(
    data=TextData(uid=data_row.uid),
    annotations = [
      entities_prediction, 
      radio_prediction, 
      checklist_prediction,
      text_prediction
    ]
)

# Create a label list 
label_list_prediction = [label_prediction]

# Convert the prediction labels from Labelbox class objects to the NDJSON format required for upload.
# Payloads can also be written directly in NDJSON, as shown in the next cell.
ndjson_prediction = list(NDJsonConverter.serialize(label_list_prediction))

If using NDJSON, add a `uuid` and the data row ID to each prediction dictionary:

In [22]:

ndjson_prediction_method2 = []
for annot in [
    entities_prediction_ndjson, 
    radio_prediction_ndjson, 
    nested_radio_prediction_ndjson,
    checklist_prediction_ndjson,
    text_prediction_ndjson, 
    nested_checklist_prediction_ndjson
  ]:
  annot.update({
      'uuid': str(uuid.uuid4()),
      'dataRow': {'id': data_row.uid},
  })
  ndjson_prediction_method2.append(annot)
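
Note that `annot.update(...)` modifies the original dictionaries in place, so re-running the cell or reusing the templates for another data row overwrites their `uuid` and `dataRow` fields. A possible alternative, sketched below with hypothetical helper names, copies each template first. The `to_ndjson` helper only illustrates what "newline-delimited JSON" looks like as text; the upload calls in this notebook take the list of dictionaries directly.

```python
import copy
import json
import uuid


def build_payload(templates, data_row_id):
    """Return copies of `templates`, each with a fresh uuid and the data row ID."""
    payload = []
    for template in templates:
        row = copy.deepcopy(template)  # leave the original template untouched
        row["uuid"] = str(uuid.uuid4())
        row["dataRow"] = {"id": data_row_id}
        payload.append(row)
    return payload


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


# Illustrative templates and a placeholder data row ID
example_templates = [
    {"name": "free_text", "answer": "sample text"},
    {"name": "radio_question", "answer": {"name": "first_radio_answer", "confidence": 0.5}},
]
example_rows = build_payload(example_templates, "hypothetical-data-row-id")
print(to_ndjson(example_rows))
```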

## Step 6: Upload the predictions payload to the Model Run

In [23]:
# Upload the prediction label to the Model Run
upload_job_prediction = model_run.add_predictions(
    name="prediction_upload_job"+str(uuid.uuid4()),
    predictions=ndjson_prediction_method2)

# Errors will appear for annotation uploads that failed.
print("Errors:", upload_job_prediction.errors)
print(" ")

Errors: []
 


## Step 7: Send annotations to the Model Run 
To send annotations to a Model Run, we must first import them into a project, create a label payload and then send them to the Model Run.

##### 7.1. Create a Labelbox project

In [14]:
# Create a Labelbox project
project = client.create_project(
    name="Text Prediction Import",
    queue_mode=QueueMode.Batch,
    # Quality settings
    auto_audit_percentage=1,
    auto_audit_number_of_labels=1,
    media_type=MediaType.Text
)
project.setup_editor(ontology)

##### 7.2. Create a batch to send to the project 

In [15]:
project.create_batch(
  "batch_text_prediction_demo", # Each batch in a project must have a unique name
  dataset.export_data_rows(), # A list of data rows or data row IDs
  5 # Priority between 1 (highest) and 5 (lowest)
)

<Batch ID: 054c4790-9d90-11ed-97d9-e553096b5905>

##### 7.3. Create the annotations payload

In [16]:
entities_ndjson = { 
    "name": "named_entity",
    "location": { 
        "start": 67, 
        "end": 128 
    }
}

radio_annotation_ndjson = {
  "name": "radio_question",
  "answer": {"name": "first_radio_answer"}
} 

radio_annotation_ndjson_with_subclass = {
  "name": "radio_question_sub",
  "answer": {
      "name": "first_radio_answer",
      "classifications": [{
          "name":"sub_radio_question",
          "answer": { "name" : "first_sub_radio_answer"}
        }]
    }
}

checklist_annotation_ndjson = {
  "name": "checklist_question",
  "answer": [
    {"name": "first_checklist_answer"},
    {"name": "second_checklist_answer"},
    {"name": "third_checklist_answer"},
  ]
}

text_annotation_ndjson = {
  "name": "free_text",
  "answer": "sample text",
}

# Named *_annotation_* so it does not overwrite the prediction payload defined earlier
nested_checklist_annotation_ndjson = {
  "name": "nested_checklist_question",
  "answer": [{
      "name": "first_checklist_answer", 
      "classifications" : [
        {
          "name": "sub_checklist_question", 
          "answer": {"name": "first_sub_checklist_answer"}
        }          
      ]         
  }]
}

##### 7.4. Create the label object

In [17]:
# Build the annotation payload by attaching a uuid and the data row ID to each annotation
ndjson_annotation = []
for annot in [
    entities_ndjson, 
    radio_annotation_ndjson,  
    radio_annotation_ndjson_with_subclass,
    checklist_annotation_ndjson,
    text_annotation_ndjson,
    nested_checklist_annotation_ndjson    
  ]:
  annot.update({
      'uuid': str(uuid.uuid4()),
      'dataRow': {'id': data_row.uid},
  })
  ndjson_annotation.append(annot)
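
The annotation dictionaries in 7.3 have the same shape as the prediction payloads, minus the `confidence` keys. If you already have a prediction dictionary, a sketch like the one below could derive the annotation form instead of retyping it. The helper `strip_confidence` is hypothetical, and the approach assumes that `confidence` is the only difference between the two payloads, as it is in this notebook.

```python
def strip_confidence(node):
    """Return a copy of a nested dict/list with every 'confidence' key removed."""
    if isinstance(node, dict):
        return {
            key: strip_confidence(value)
            for key, value in node.items()
            if key != "confidence"
        }
    if isinstance(node, list):
        return [strip_confidence(item) for item in node]
    return node


# Illustrative prediction in the same shape as the nested radio example
sample_prediction = {
    "name": "radio_question_sub",
    "answer": {
        "name": "first_radio_answer",
        "confidence": 0.5,
        "classifications": [{
            "name": "sub_radio_question",
            "answer": {"name": "first_sub_radio_answer", "confidence": 0.5},
        }],
    },
}
sample_annotation = strip_confidence(sample_prediction)
print(sample_annotation)
```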

##### 7.5. Upload annotations to the project using Label Import

In [18]:
upload_job_annotation = LabelImport.create_from_objects(
    client = client,
    project_id = project.uid,
    name="text_label_import_job"+ str(uuid.uuid4()),
    labels=ndjson_annotation)

upload_job_annotation.wait_until_done()
# Errors will appear for annotation uploads that failed.
print("Errors:", upload_job_annotation.errors)


Errors: []


##### 7.6. Send the annotations to the Model Run

In [19]:
# Get the label IDs from the project
label_ids = [x['ID'] for x in project.export_labels(download=True)]
model_run.upsert_labels(label_ids)

True

## Optional deletions for cleanup 


In [53]:

# project.delete()
# dataset.delete()