<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_import/conversational.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/annotation_import/conversational.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Conversational Text Annotation Import
* This notebook provides examples of each supported annotation type for conversational text assets and covers both the MAL and Label Import methods.

Supported annotations that can be uploaded through the SDK

* Classification Radio 
* Classification Checklist 
* Classification Free Text 
* NER

**Not** supported annotations

* Bounding box 
* Polygon 
* Point
* Polyline 
* Segmentation Mask 

MAL and Label Import:

* Model-assisted labeling (MAL) - used to provide pre-annotated data for your labelers, which reduces the total time needed to label your assets. MAL does not submit the labels automatically; a labeler must review them before submission.
* Label Import - used to provide ground truth labels. These can be compared against prediction labels or used as benchmarks to see how your labelers are doing.



* For information on what types of annotations are supported per data type, refer to this documentation:
    * https://docs.labelbox.com/docs/model-assisted-labeling#option-1-import-via-python-annotation-types-recommended

* Notes:
    * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly.

In [None]:
!pip install -q 'labelbox[data]'



# Setup

In [None]:
from labelbox.schema.ontology import OntologyBuilder, Tool, Classification, Option
from labelbox import Client, LabelingFrontend, MALPredictionImport, LabelImport
from labelbox.data.serialization import NDJsonConverter
from labelbox.schema.media_type import MediaType
import uuid
import json

# Replace with your API key
See the guide on how to [create an API key](https://docs.labelbox.com/docs/create-an-api-key).

In [None]:
# Add your API key
API_KEY = None
client = Client(api_key=API_KEY)
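
Hard-coding the key in a notebook cell is fine for a quick test, but a common alternative is to read it from an environment variable so it never lands in version control. The sketch below assumes a variable named `LABELBOX_API_KEY`; that name and the `load_api_key` helper are illustrative choices, not part of the Labelbox SDK.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Both this helper and the default variable name are hypothetical.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of creating a client with no key
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The returned value could then be passed to the client as `Client(api_key=load_api_key())`.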

## Supported annotations for conversational text

### NDJSON Annotations 

In [None]:
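# Message-based annotations: each one targets a message through its "messageId"

# NER (entity) annotation: "location" holds the start and end offsets of the entity
ner_annotation = {
    "name": "ner",
    "location": {
        "start": 0,
        "end": 8
    },
    "messageId": "4"
}

# Free-text classification
text_annotation = {
    'name': 'text_convo',
    'answer': 'the answer to the text questions right here',
    'messageId': "0"
}

# Checklist classification: one or more selected options
checklist_annotation = {
    'name': 'checklist_convo',
    'answers': [
        {'name': 'first_checklist_answer'},
        {'name': 'second_checklist_answer'}
    ],
    'messageId': '2'
}

# Radio classification: a single selected option
radio_annotation = {
    'name': 'radio_convo',
    'answer': {
        'name': 'first_radio_answer'
    },
    "messageId": "0",
}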
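
The four payloads above differ only in which key carries the value: `location` for NER, `answers` for a checklist, and `answer` holding either a dict (radio) or a string (free text). As a minimal sketch, the hypothetical helper below classifies a payload by those keys and rejects one that lacks `name` or `messageId`. It is based only on the shapes shown in this notebook and is not a Labelbox SDK function.

```python
def annotation_kind(annotation: dict) -> str:
    """Guess which annotation type a payload dict represents.

    Hypothetical helper; it only knows the four shapes used in this notebook.
    """
    missing = {"name", "messageId"} - set(annotation)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    if "location" in annotation:
        return "ner"
    if "answers" in annotation:
        return "checklist"
    answer = annotation.get("answer")
    if isinstance(answer, dict):
        return "radio"
    if isinstance(answer, str):
        return "text"
    raise ValueError("unrecognized annotation shape")


# Small stand-ins for the payloads defined in the cell above
examples = [
    {"name": "ner", "location": {"start": 0, "end": 8}, "messageId": "4"},
    {"name": "text_convo", "answer": "free text", "messageId": "0"},
    {"name": "checklist_convo",
     "answers": [{"name": "first_checklist_answer"}], "messageId": "2"},
    {"name": "radio_convo",
     "answer": {"name": "first_radio_answer"}, "messageId": "0"},
]
kinds = [annotation_kind(a) for a in examples]
print(kinds)
```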

## Upload Annotations - putting it all together 

## Step 1: Import data rows into Catalog

In [None]:
# Create one Labelbox dataset
dataset = client.create_dataset(name="conversational_annotation_import_demo_dataset")

asset = {
    "row_data": "https://storage.googleapis.com/labelbox-developer-testing-assets/conversational_text/1000-conversations/conversation-1.json",
    "global_key": str(uuid.uuid1())
}


data_row = dataset.create_data_row(asset)
print(data_row)


<DataRow {
    "created_at": "2022-12-23 20:18:48+00:00",
    "external_id": null,
    "global_key": "0206acac-82ff-11ed-a415-0242ac1c000c",
    "media_attributes": {},
    "metadata": [],
    "metadata_fields": [],
    "row_data": "https://storage.googleapis.com/labelbox-developer-testing-assets/conversational_text/1000-conversations/conversation-1.json",
    "uid": "clc0ygvde029307yn96gv2byu",
    "updated_at": "2022-12-23 20:18:48+00:00"
}>


## Step 2: Create/select an ontology
Your project should have the correct ontology setup with all the tools and classifications supported for your annotations, and the tool and classification names should match the `name` field in your annotations to ensure the correct feature schemas are matched.

For example, when we created the text classification annotation [above](https://colab.research.google.com/drive/1rFv-VvHUBbzFYamz6nSMRJz1mEg6Ukqq#scrollTo=3umnTd-MfI0o&line=1&uniqifier=1), we provided the `name` as `text_convo`. Now, when we set up our ontology, we must ensure that the name of our text classification is also `text_convo`. The same alignment must hold true for the other tools and classifications we create in our ontology.

In [None]:
ontology_builder = OntologyBuilder(
  tools=[ 
    Tool( # NER tool given the name "ner"
      tool=Tool.Type.NER, 
      name="ner")], 
  classifications=[ 
    Classification( # Text classification given the name "text_convo"
      class_type=Classification.Type.TEXT,
      scope=Classification.Scope.INDEX,          
      name="text_convo"), 
    Classification( # Checklist classification given the name "checklist_convo" with two options: "first_checklist_answer" and "second_checklist_answer"
      class_type=Classification.Type.CHECKLIST, 
      scope=Classification.Scope.INDEX,                     
      name="checklist_convo", 
      options=[
        Option(value="first_checklist_answer"),
        Option(value="second_checklist_answer")            
      ]
    ), 
    Classification( # Radio classification given the name "radio_convo" with two options: "first_radio_answer" and "second_radio_answer"
      class_type=Classification.Type.RADIO, 
      name="radio_convo", 
      scope=Classification.Scope.INDEX,          
      options=[
        Option(value="first_radio_answer"),
        Option(value="second_radio_answer")
      ]
    )
  ]
)
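
Because features are matched by `name`, a misspelled name in a payload will not line up with the ontology. A minimal sketch of a pre-upload check is shown below. The `unmatched_names` helper and the hand-written list of ontology names are assumptions made for illustration; they are not read from the `OntologyBuilder` object.

```python
def unmatched_names(annotations, ontology_names):
    """Return the annotation names that have no same-named ontology feature."""
    return sorted({a["name"] for a in annotations} - set(ontology_names))


# Names used for the tool and classifications in the ontology cell above
ontology_names = ["ner", "text_convo", "checklist_convo", "radio_convo"]

# Sample payloads; the third name is misspelled on purpose
sample_annotations = [
    {"name": "ner"},
    {"name": "text_convo"},
    {"name": "checklist_conv"},
]

print(unmatched_names(sample_annotations, ontology_names))  # ['checklist_conv']
```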


## Step 3: Create a labeling project
Connect the ontology to the labeling project

In [None]:
# Create Labelbox project
project = client.create_project(name="conversational_mal_project", 
                                    media_type=MediaType.Conversational)

# Setup your ontology / labeling editor
editor = next(client.get_labeling_frontends(where=LabelingFrontend.name == "Editor")) # Unless using a custom editor, do not modify this

project.setup(editor, ontology_builder.asdict()) # Connect your ontology and editor to your project




## Step 4: Send a batch of data rows to the project

In [None]:
# Setup Batches and Ontology

# Create a batch to send to your MAL project
batch = project.create_batch(
  "first-batch-convo-demo", # Each batch in a project must have a unique name
  [data_row.uid], # List of data row IDs to include in the batch
  5 # Priority between 1 (highest) and 5 (lowest)
)

print("Batch: ", batch)

Batch:  <Batch {
    "consensus_settings_json": "{\"numberOfLabels\":1,\"coveragePercentage\":0}",
    "created_at": "2022-12-23 20:20:51+00:00",
    "name": "first-batch-convo-demo",
    "size": 1,
    "uid": "4bceaa60-82ff-11ed-b68f-3b1759fe9ddf",
    "updated_at": "2022-12-23 20:20:51+00:00"
}>
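
Since each batch in a project must have a unique name, re-running the cell with the fixed name `"first-batch-convo-demo"` would reuse that name. One way around this, sketched here under the assumption that a short random suffix is acceptable in a batch name, is to generate the name per run. `unique_batch_name` is a hypothetical helper, not an SDK function.

```python
import uuid


def unique_batch_name(prefix: str = "batch-convo-demo") -> str:
    """Build a batch name with a short random suffix (hypothetical helper)."""
    # uuid4().hex is 32 hexadecimal characters; the first 8 keep the name short
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


print(unique_batch_name())
```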


## Step 5: Create the annotations payload
Create the annotations payload using the snippets of code above.

Labelbox supports two formats for the annotations payload: NDJSON and Python annotation types. However, for conversational text, NDJSON is the only supported format.

### NDJSON annotations 
Here we create the complete label payload in NDJSON format. It contains one entry for each annotation that we created [above](https://colab.research.google.com/drive/1rFv-VvHUBbzFYamz6nSMRJz1mEg6Ukqq#scrollTo=3umnTd-MfI0o&line=1&uniqifier=1), with a `uuid` and a `dataRow` reference added to each.

In [None]:
label_ndjson = []
for annotations in [ner_annotation,
                    text_annotation,
                    checklist_annotation,
                    radio_annotation]:
  annotations.update({
      'uuid': str(uuid.uuid4()),
      'dataRow': {
          'id': data_row.uid
      }
  })
  label_ndjson.append(annotations)

In [None]:
label_ndjson

[{'name': 'ner',
  'location': {'start': 0, 'end': 8},
  'messageId': '4',
  'uuid': 'ce5805b9-1353-432e-9f7a-38cdfa901d5d',
  'dataRow': {'id': 'clc0ygvde029307yn96gv2byu'}},
 {'name': 'text_convo',
  'answer': 'the answer to the text questions right here',
  'messageId': '0',
  'uuid': '2852bb2d-9355-42df-bb95-3db48247cbf7',
  'dataRow': {'id': 'clc0ygvde029307yn96gv2byu'}},
 {'name': 'checklist_convo',
  'answers': [{'name': 'first_checklist_answer'},
   {'name': 'second_checklist_answer'}],
  'messageId': '2',
  'uuid': '623fa806-166e-436b-8d1b-dbdc30f23ee5',
  'dataRow': {'id': 'clc0ygvde029307yn96gv2byu'}},
 {'name': 'radio_convo',
  'answer': {'name': 'first_radio_answer'},
  'messageId': '0',
  'uuid': 'a256e84a-5012-4fd4-833f-637935a22fd4',
  'dataRow': {'id': 'clc0ygvde029307yn96gv2byu'}}]
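
NDJSON (newline-delimited JSON) stores one JSON object per line. The upload calls in this notebook take the Python list directly, so serializing is not required here. The sketch below only illustrates what the same payload looks like as NDJSON text, for example when saving it to a file for inspection. `to_ndjson` and `from_ndjson` are hypothetical helpers built on the standard `json` module, and the data row ID is a placeholder.

```python
import json


def to_ndjson(records: list) -> str:
    """Serialize a list of dicts as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def from_ndjson(text: str) -> list:
    """Parse NDJSON text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Placeholder payload with the same shape as the entries in label_ndjson
sample = [
    {"name": "ner", "location": {"start": 0, "end": 8}, "messageId": "4",
     "uuid": "placeholder-uuid-1", "dataRow": {"id": "placeholder-data-row-id"}},
    {"name": "text_convo", "answer": "free text", "messageId": "0",
     "uuid": "placeholder-uuid-2", "dataRow": {"id": "placeholder-data-row-id"}},
]

ndjson_text = to_ndjson(sample)
print(ndjson_text)
```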

## Step 6: Upload annotations to a project as pre-labels or complete labels

#### Model Assisted Labeling (MAL)
For the purpose of this tutorial, upload the `label_ndjson` payload created in Step 5. NDJSON is the only format supported for conversational text, so there is no separate Python annotation types payload to upload.

In [None]:
# Upload our label using Model-Assisted Labeling
upload_job = MALPredictionImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name=f"mal_job-{str(uuid.uuid4())}", 
    predictions=label_ndjson)

upload_job.wait_until_done();
print("Errors:", upload_job.errors)
print(" ")

Errors: []
 



#### Label Import

In [None]:
# Upload label for this data row in project 
upload_job = LabelImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name=f"label_import_job-{str(uuid.uuid4())}",
    # label_ndjson is the NDJSON payload created in Step 5
    labels=label_ndjson)

upload_job.wait_until_done();
print("Errors:", upload_job.errors)

Errors: []
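
Both upload cells print `upload_job.errors`, which is an empty list when every annotation was imported. In a script rather than a notebook, it can be safer to stop when that list is not empty. The helper below is a minimal sketch of that idea; `raise_on_errors` is a hypothetical name, and it makes no assumption about the structure of individual error entries.

```python
def raise_on_errors(errors: list, job_name: str = "import job") -> None:
    """Raise if an import job reported any errors (hypothetical helper)."""
    if errors:
        # Show only the first entry to keep the message short
        raise RuntimeError(
            f"{job_name} reported {len(errors)} error(s); first: {errors[0]!r}"
        )
```

After `upload_job.wait_until_done()`, it could be called as `raise_on_errors(upload_job.errors)`.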


### Optional deletions for cleanup 

In [None]:
# Uncomment to delete the project and dataset created in this notebook
# project.delete()
# dataset.delete()