<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/static/images/logo-v4.svg" width=190/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/prediction_upload/text-prediction.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/blob/develop/examples/prediction_upload/text-prediction.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Text Prediction Import
* This notebook walks you through uploading model predictions to a Model Run and provides an example for each prediction type supported for text assets.

A Model Run is a container for the predictions, annotations and metrics of a specific experiment in your ML model development cycle.

* For information on what types of predictions are supported per data type, refer to this documentation:
    * https://docs.labelbox.com/docs/upload-model-predictions#step-6-create-the-predictions-payload

* Notes:
    * If you are importing more than 1,000 mask predictions at a time, consider submitting separate jobs, as they can take longer than other prediction types to import.
    * Running this notebook creates a complete Model Run with predictions in your organization.
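The note about splitting large mask imports into separate jobs can be sketched as a simple chunking helper. This is a minimal sketch under assumptions: `chunked` and `MAX_MASKS_PER_JOB` are hypothetical names, not Labelbox SDK functions. Each chunk would be submitted as its own import job, for example one `model_run.add_predictions(...)` call per chunk.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical constant: the 1,000 threshold comes from the note above.
MAX_MASKS_PER_JOB = 1000


def chunked(items: Sequence[T], size: int = MAX_MASKS_PER_JOB) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Usage: 2,500 placeholder predictions split into jobs of at most 1,000.
predictions = [{"id": i} for i in range(2500)]
chunks = list(chunked(predictions))
print([len(c) for c in chunks])  # [1000, 1000, 500]
```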

# Installs

In [1]:
!pip install -q 'labelbox[data]'

# Imports

In [2]:
import labelbox
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option
from labelbox.schema.queue_mode import QueueMode
from labelbox import Client, LabelingFrontend, LabelImport, MediaType
from labelbox.data.annotation_types import (
    Label, TextData, ObjectAnnotation, TextEntity,
    Radio, Checklist, Text,
    ClassificationAnnotation, ClassificationAnswer
)
from labelbox.data.serialization import NDJsonConverter
import json
import uuid
import copy
import numpy as np
print(labelbox.__version__)

3.33.1


# API Key and Client
Provide a valid API key below to connect to the Labelbox client.

In [3]:
# Add your api key as a string
API_KEY = ""
client = Client(api_key=API_KEY)
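You can also keep the key out of the notebook by reading it from an environment variable. This is a minimal sketch that assumes a variable named `LABELBOX_API_KEY`. Both that name and the `get_api_key` helper are hypothetical conventions, not something the SDK requires.

```python
import os


def get_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in `env_var` (hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Labelbox API key")
    return key


# Usage in the notebook (assumes the variable is set in your shell):
# client = Client(api_key=get_api_key())
```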

---- 
### Steps
1. Make sure the project is set up
2. Collect annotations
3. Upload

### Create a Model Run (for predictions) and a Project (for annotations)

We will create:
- a Model and a Model Run to contain model predictions
- a project to contain annotations

First, we create an ontology with all the tools and classifications supported for text. The official lists of what can be uploaded are:
- [predictions that can be uploaded to a Model Run](https://docs.labelbox.com/docs/upload-model-predictions#step-6-create-the-predictions-payload)
- [annotations that can be imported in a project as ground-truths](https://docs.labelbox.com/docs/import-ground-truth)

Note: the ontology of the Model Run does not need to match the ontology of the project. However, only the features present in the Model Run ontology can be uploaded as predictions and annotations to the Model Run.
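Because only features in the Model Run ontology can be uploaded, it can help to compare feature names before uploading. This is a minimal sketch that assumes features are matched by the `name` strings used in this notebook. The annotation types also accept a `feature_schema_id`, which is not covered here. `unknown_features` and `MODEL_RUN_FEATURES` are hypothetical names, not SDK functions.

```python
from typing import Iterable, Set

# Feature names mirroring this notebook's ontology (hypothetical constant).
MODEL_RUN_FEATURES = {"named_entity", "text", "checklist", "radio"}


def unknown_features(prediction_names: Iterable[str], ontology_names: Set[str]) -> Set[str]:
    """Return prediction feature names that do not appear in the ontology."""
    return set(prediction_names) - set(ontology_names)


print(unknown_features(["named_entity", "radio"], MODEL_RUN_FEATURES))  # set()
print(unknown_features(["bounding_box", "radio"], MODEL_RUN_FEATURES))  # {'bounding_box'}
```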

In [4]:
ontology_builder = OntologyBuilder(
    tools=[
        Tool(tool=Tool.Type.NER, name="named_entity")
    ],
    classifications=[  # List of Classification objects
        Classification(  # Text classification given the name "text"
            class_type=Classification.Type.TEXT,
            instructions="text"),
        Classification(  # Checklist classification given the name "checklist" with two options: "first_checklist_answer" and "second_checklist_answer"
            class_type=Classification.Type.CHECKLIST,
            instructions="checklist",
            options=[
                Option(value="first_checklist_answer"),
                Option(value="second_checklist_answer")
            ]
        ),
        Classification(  # Radio classification given the name "radio" with two options: "first_radio_answer" and "second_radio_answer"
            class_type=Classification.Type.RADIO,
            instructions="radio",
            options=[
                Option(value="first_radio_answer"),
                Option(value="second_radio_answer")
            ]
        )
    ]
)

ontology = client.create_ontology("Ontology Text", ontology_builder.asdict())

We create a Model and a Model Run to contain the predictions.

In [6]:
# create Model
model = client.create_model(name="text_model_run", 
                            ontology_id=ontology.uid)
# create Model Run
model_run = model.create_model_run("iteration 1")

We create a project to contain the annotations.

In [7]:
# Create a Labelbox project
project = client.create_project(name="text_project",
                                queue_mode=QueueMode.Batch,
                                # Quality Settings setup
                                auto_audit_percentage=1,
                                auto_audit_number_of_labels=1,
                                media_type=MediaType.Text)
project.setup_editor(ontology)

### Create a dataset with a data row
We will upload predictions and annotations on this data row. 

In [8]:
# Create one Labelbox dataset
dataset = client.create_dataset(name="text_prediction_import_demo_dataset")
# Grab an example text and create a Labelbox data row in the dataset
uploads = {
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
    # To learn more about Global Keys: https://docs.labelbox.com/docs/global-keys
    "global_key": "TEST-ID-%id" % uuid.uuid1()
}
data_row = dataset.create_data_row(uploads)
print(data_row)

<DataRow {
    "created_at": "2022-12-22 12:50:56+00:00",
    "external_id": null,
    "global_key": "TEST-ID-94148969945369179048273565686632808460d",
    "media_attributes": {},
    "metadata": [],
    "metadata_fields": [],
    "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
    "uid": "clbz312d2212e07wldd4zf01r",
    "updated_at": "2022-12-22 12:50:56+00:00"
}>
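In the cell above, `"TEST-ID-%id" % uuid.uuid1()` formats the UUID as an integer with `%i` and then appends a literal `d`. That is why the printed global key ends in `d`. Global keys are meant to be unique identifiers; see the Global Keys documentation linked above for the exact rules. A more conventional way to build such a key is an f-string with a random UUID. `make_global_key` is a hypothetical helper, not part of the SDK.

```python
import uuid


def make_global_key(prefix: str = "TEST-ID") -> str:
    """Build a key such as 'TEST-ID-<uuid4>' that is very unlikely to collide."""
    return f"{prefix}-{uuid.uuid4()}"


key = make_global_key()
print(key)
```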


### Send the data row to the Model Run and to the project

Get the data row IDs that we just uploaded

In [9]:
# Data row ID(s) to send to the Model Run and to the project.
datarow_ids = [dr.uid for dr in list(dataset.export_data_rows())]
print("datarow_ids: ",datarow_ids)

datarow_ids:  ['clbz312d2212e07wldd4zf01r']


Send the data row to the Model Run

In [10]:
model_run.upsert_data_rows(datarow_ids)

True

Send the data row to the project

In [11]:
project.create_batch(
  "first-batch",  # Each batch in a project must have a unique name
  datarow_ids,  # A list of data rows or data row IDs
  5  # Priority between 1 (highest) and 5 (lowest)
)

<Batch ID: 4c011d00-81f7-11ed-a456-496beef28f21>
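Each batch in a project must have a unique name, so rerunning the cell above with `"first-batch"` can collide with the existing batch. This minimal sketch plans batch names that combine a prefix, a running index and a short random tag. `plan_batches` is a hypothetical helper, and the group size is an arbitrary parameter rather than a documented Labelbox limit.

```python
import uuid
from typing import List, Sequence, Tuple


def plan_batches(data_row_ids: Sequence[str], size: int,
                 prefix: str = "batch") -> List[Tuple[str, List[str]]]:
    """Split IDs into (unique_name, ids) groups, one group per batch."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    run_tag = uuid.uuid4().hex[:8]  # differs on every run, so names are not reused
    plan = []
    for index, start in enumerate(range(0, len(data_row_ids), size)):
        name = f"{prefix}-{index}-{run_tag}"
        plan.append((name, list(data_row_ids[start:start + size])))
    return plan


# Usage: each pair could be passed to project.create_batch(name, ids, priority).
for name, ids in plan_batches(["a", "b", "c"], size=2):
    print(name, ids)
```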

### Create the predictions payload
We will upload it to the Model Run.


It is recommended to use the Python SDK's annotation types when importing labels into Labelbox.

Object predictions

In [12]:
# Confidence scores are optional.
# If no confidence is provided, 
# the prediction will be treated as if the confidence score equals 1

named_entity = TextEntity(start=12,end=22)
named_entity_prediction = ObjectAnnotation(value=named_entity, name="named_entity")
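The comment above says a prediction without a confidence score is treated as if its confidence were 1. This minimal sketch restates that rule. The 0-to-1 range check is an added assumption for illustration, not a documented SDK check, and `effective_confidence` is a hypothetical helper.

```python
from typing import Optional


def effective_confidence(confidence: Optional[float]) -> float:
    """Return the confidence a prediction is treated as having."""
    if confidence is None:
        return 1.0  # missing confidence counts as 1, per the comment above
    if not 0.0 <= confidence <= 1.0:  # assumed valid range for this sketch
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    return float(confidence)


print(effective_confidence(None))  # 1.0
print(effective_confidence(0.5))   # 0.5
```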

Classification predictions

In [13]:
# Confidence scores are optional.
# If no confidence is provided, 
# the prediction will be treated as if the confidence score equals 1

checklist_prediction=ClassificationAnnotation(
    value=Checklist(
        answer=[ # List of the checklist answers in your ontology
            ClassificationAnswer(
                name="first_checklist_answer",
                confidence=0.5
            ),
            ClassificationAnswer(
                name="second_checklist_answer",
                confidence=0.5
            )
        ]
    ), 
    name="checklist" # Name of the classification in your ontology
)

radio_prediction=ClassificationAnnotation(
    value=Radio(
        answer=ClassificationAnswer(
            name="first_radio_answer", # Name of the radio answer in your ontology
            confidence=0.5
        )
    ), 
    name="radio" # Name of the classification in your ontology
)

# Confidence is not supported for text prediction
text_prediction=ClassificationAnnotation(
    value=Text( # String value for the text annotation
        answer="the answer to the text question",
    ), 
    name="text" # Name of the classification in your ontology
)


Create a Label object with all of the predictions created previously.

In [14]:
# Create a Label object by identifying the applicable data row in Labelbox and providing a list of annotations
label_prediction = Label(
    data=TextData(
        uid=data_row.uid),
    annotations = [
        named_entity_prediction,
        text_prediction, checklist_prediction, radio_prediction,
    ]
)

# Create signed URLs for mask data before upload (text labels contain no masks, so this call leaves them unchanged)
def signing_function(obj_bytes: bytes) -> str:
    url = client.upload_data(content=obj_bytes, sign=True)
    return url

label_prediction.add_url_to_masks(signing_function)

label_prediction.__dict__

{'uid': None,
 'data': TextData(file_path=None,text=None,url=None),
 'annotations': [ObjectAnnotation(confidence=None, name='named_entity', feature_schema_id=None, extra={}, value=TextEntity(start=12, end=22, extra={}), classifications=[]),
  ClassificationAnnotation(name='text', feature_schema_id=None, extra={}, value=Text(answer='the answer to the text question')),
  ClassificationAnnotation(name='checklist', feature_schema_id=None, extra={}, value=Checklist(name='checklist', answer=[ClassificationAnswer(confidence=0.5, name='first_checklist_answer', feature_schema_id=None, extra={}, keyframe=None), ClassificationAnswer(confidence=0.5, name='second_checklist_answer', feature_schema_id=None, extra={}, keyframe=None)])),
  ClassificationAnnotation(name='radio', feature_schema_id=None, extra={}, value=Radio(answer=ClassificationAnswer(confidence=0.5, name='first_radio_answer', feature_schema_id=None, extra={}, keyframe=None)))],
 'extra': {}}

### Create the annotations payload
We will upload it to the project.

It is recommended to use the Python SDK's annotation types when importing labels into Labelbox.

Object annotations

In [15]:
named_entity = TextEntity(start=10,end=20)
named_entity_annotation = ObjectAnnotation(value=named_entity, name="named_entity")

Classification annotations

In [16]:
text_annotation=ClassificationAnnotation(
    value=Text( # String value for the text annotation
        answer="the answer to the text question" 
    ), 
    name="text" # Name of the classification in your ontology
)

checklist_annotation=ClassificationAnnotation(
    value=Checklist(
        answer=[ # List of the checklist answers in your ontology
            ClassificationAnswer(name="first_checklist_answer"),
            ClassificationAnswer(name="second_checklist_answer")
        ]
    ), 
    name="checklist" # Name of the classification in your ontology
)

radio_annotation=ClassificationAnnotation(
    value=Radio(
        answer=ClassificationAnswer(
            name="second_radio_answer" # Name of the radio answer in your ontology
        )
    ), 
    name="radio" # Name of the classification in your ontology
)

Create a Label object with all of the annotations created previously.

In [17]:
# Create a Label object by identifying the applicable data row in Labelbox and providing a list of annotations
label_annotation = Label(
    data=TextData(
        uid=data_row.uid),
    annotations = [
        named_entity_annotation,
        text_annotation, checklist_annotation, radio_annotation
    ]
)

# Create signed URLs for mask data before upload (text labels contain no masks, so this call leaves them unchanged)
def signing_function(obj_bytes: bytes) -> str:
    url = client.upload_data(content=obj_bytes, sign=True)
    return url

label_annotation.add_url_to_masks(signing_function)

label_annotation.__dict__

{'uid': None,
 'data': TextData(file_path=None,text=None,url=None),
 'annotations': [ObjectAnnotation(confidence=None, name='named_entity', feature_schema_id=None, extra={}, value=TextEntity(start=10, end=20, extra={}), classifications=[]),
  ClassificationAnnotation(name='text', feature_schema_id=None, extra={}, value=Text(answer='the answer to the text question')),
  ClassificationAnnotation(name='checklist', feature_schema_id=None, extra={}, value=Checklist(name='checklist', answer=[ClassificationAnswer(confidence=None, name='first_checklist_answer', feature_schema_id=None, extra={}, keyframe=None), ClassificationAnswer(confidence=None, name='second_checklist_answer', feature_schema_id=None, extra={}, keyframe=None)])),
  ClassificationAnnotation(name='radio', feature_schema_id=None, extra={}, value=Radio(answer=ClassificationAnswer(confidence=None, name='second_radio_answer', feature_schema_id=None, extra={}, keyframe=None)))],
 'extra': {}}

### Import the annotations payload in the project

In [18]:
## Create a label list 
label_list_annotation = [label_annotation]

# Convert the annotation label from a Labelbox class object to the underlying NDJSON format required for upload - uploads can be directly built in this syntax as well
ndjson_annotation = list(NDJsonConverter.serialize(label_list_annotation))

# Upload the annotation label to the project using Label Import
upload_job_annotation = LabelImport.create_from_objects(
    client = client,
    project_id = project.uid,
    name="annotation_import_job",
    labels=ndjson_annotation)

# Block until the import job finishes so that the errors printed below are complete
upload_job_annotation.wait_until_done()
# Errors will appear for annotations that failed to upload.
print("Errors:", upload_job_annotation.errors)

Errors: []
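The annotation import cell calls `wait_until_done()` before reading `errors`, so the error list reflects a finished job. This sketch shows the general polling idea behind such a call: ask for the job state until it reaches a terminal state or a timeout expires. It is not the SDK's implementation. The state names, interval and timeout are assumptions, and `wait_for` is a hypothetical helper.

```python
import time
from typing import Callable, Tuple


def wait_for(get_state: Callable[[], str],
             done_states: Tuple[str, ...] = ("FINISHED", "FAILED"),
             interval: float = 1.0,
             timeout: float = 60.0) -> str:
    """Poll `get_state` until it returns a terminal state, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in done_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout} s")
        time.sleep(interval)


# Usage with a fake job that finishes on the third poll.
states = iter(["RUNNING", "RUNNING", "FINISHED"])
print(wait_for(lambda: next(states), interval=0.01))  # FINISHED
```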


### Send the annotations to the Model Run

Get the label IDs that we just uploaded


In [19]:
# Get the label IDs from the project
label_ids = [x['ID'] for x in project.export_labels(download=True)]
print("label_ids: ",label_ids)

label_ids:  ['clbz31a0f07k90g3ehh4pgghf']


In [20]:
model_run.upsert_labels(label_ids)

True

### Upload the predictions payload to the Model Run

In [21]:
## Create a label list 
label_list_prediction = [label_prediction]

# Convert the prediction label from a Labelbox class object to the underlying NDJSON format required for upload - uploads can be directly built in this syntax as well
ndjson_prediction = list(NDJsonConverter.serialize(label_list_prediction))

# Upload the prediction label to the Model Run
upload_job_prediction = model_run.add_predictions(
    name="prediction_upload_job" + str(uuid.uuid4()),
    predictions=ndjson_prediction)

# Block until the import job finishes so that the errors printed below are complete
upload_job_prediction.wait_until_done()
# Errors will appear for predictions that failed to upload.
print("Errors:", upload_job_prediction.errors)


Errors: []
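Both upload cells convert `Label` objects with `NDJsonConverter.serialize`, which yields dictionaries that the cells collect with `list()`. NDJSON (newline-delimited JSON) stores one JSON object per line. This sketch round-trips a few records through that format using only the standard library. The dictionaries are illustrative placeholders, not the exact payload `NDJsonConverter` emits, and `to_ndjson`/`from_ndjson` are hypothetical helpers.

```python
import json
from typing import Dict, Iterable, List


def to_ndjson(records: Iterable[Dict]) -> str:
    """Serialize each record as one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def from_ndjson(text: str) -> List[Dict]:
    """Parse NDJSON text back into a list of dictionaries, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = [
    {"name": "radio", "answer": {"name": "first_radio_answer"}},
    {"name": "text", "answer": "the answer to the text question"},
]
payload = to_ndjson(records)
print(payload.count("\n") + 1)  # 2
assert from_ndjson(payload) == records
```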


## Cleanup 

In [22]:
# project.delete()
# dataset.delete()