<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/annotation_import/conversational.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/master/examples/annotation_import/conversational.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Conversational Text Annotation Import
* This notebook provides examples of each supported annotation type for conversational text assets and covers both the MAL and Label Import methods.

Supported annotations that can be uploaded through the SDK

* Classification Radio 
* Classification Checklist 
* Classification Free Text 
* NER

**Not** supported annotations

* Bounding box 
* Polygon 
* Point
* Polyline 
* Segmentation Mask 

MAL and Label Import:

* Model-assisted labeling (MAL) - used to provide pre-annotated data for your labelers, which reduces the total time needed to label your assets. MAL does not submit the labels automatically; a labeler must review and submit them.
* Label Import - used to provide ground truth labels. These can be compared against prediction labels or used as benchmarks to see how your labelers are doing.



* For information on what types of annotations are supported per data type, refer to this documentation:
    * https://docs.labelbox.com/docs/model-assisted-labeling#option-1-import-via-python-annotation-types-recommended

* Notes:
    * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly.

In [None]:
# !pip install -q 'labelbox[data]'

# Setup

In [None]:
import labelbox as lb
import uuid
import labelbox.types as lb_types

# Replace with your API key
See the guide on how to [create an API key](https://docs.labelbox.com/docs/create-an-api-key).

In [None]:
# Add your API key
API_KEY = ""
client = lb.Client(api_key=API_KEY)
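
To avoid pasting a key directly into the notebook, the key can be read from an environment variable instead. The sketch below is a minimal example of that idea; the helper `load_api_key` is hypothetical, and the variable name `LABELBOX_API_KEY` is an assumption you can change.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return key


# Usage (assumes the variable has been exported in your shell):
# API_KEY = load_api_key()
# client = lb.Client(api_key=API_KEY)
```

Keeping the key out of the notebook makes it safer to share or commit the file.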

## Supported annotations for conversational text

### Python annotation types and NDJSON annotations

In [None]:
##### Named entity (NER) annotation, scoped to a single message #####
ner_annotation = lb_types.ObjectAnnotation(
    name="ner",
    value=lb_types.ConversationEntity(
        start=0,
        end=8,
        message_id="4"
    )
)

ner_annotation_ndjson = { 
        "name": "ner",
        "location": { 
            "start": 0, 
            "end": 8 
        },
        "messageId": "4"
    }
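
The `start` and `end` values above are character offsets within the message identified by `messageId`. As a rough sketch, the offsets can be computed from the message text rather than counted by hand. The helper `entity_ndjson` and the message text are hypothetical; whether `end` points at the last character or one past it is left as an explicit parameter, because this sketch does not assume either convention. Check the Labelbox documentation for the one that applies.

```python
def entity_ndjson(name, message_id, message_text, phrase, end_inclusive):
    """Build an NDJSON entity dict for the first occurrence of `phrase`.

    `end_inclusive` states whether `end` is the index of the last character
    (True) or one past it (False).
    """
    start = message_text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in message {message_id}")
    end = start + len(phrase) - (1 if end_inclusive else 0)
    return {
        "name": name,
        "location": {"start": start, "end": end},
        "messageId": message_id,
    }


# Hypothetical message text, used for illustration only.
message = "Labelbox makes labeling easier"
print(entity_ndjson("ner", "4", message, "Labelbox", end_inclusive=False))
```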

In [None]:
##### Classification free text #####

text_annotation = lb_types.ClassificationAnnotation(
    name="text_convo",
    value=lb_types.Text(answer="the answer to the text questions right here"),
    message_id="0"
)


text_annotation_ndjson = {
    'name': 'text_convo',
    'answer': 'the answer to the text questions right here',
    'messageId': "0"
}

In [None]:
##### Checklist Classification ####### 

checklist_annotation = lb_types.ClassificationAnnotation(
  name="checklist_convo", # must match your ontology feature's name
  value=lb_types.Checklist(
      answer = [
        lb_types.ClassificationAnswer(
            name = "first_checklist_answer"
        ), 
        lb_types.ClassificationAnswer(
            name = "second_checklist_answer"
        )
      ]
    ),
  message_id="2"
 )


checklist_annotation_ndjson = {
    'name': 'checklist_convo',
    'answers': [
        {'name': 'first_checklist_answer'},
        {'name': 'second_checklist_answer'}
    ],
    'messageId': '2'
}

In [None]:
######## Radio Classification ######

radio_annotation = lb_types.ClassificationAnnotation(
    name='radio_convo', 
    value=lb_types.Radio(answer = lb_types.ClassificationAnswer(name = 'first_radio_answer')),
    message_id="0"
)

radio_annotation_ndjson = {
    'name': 'radio_convo',
    'answer': {
        'name': 'first_radio_answer'
    },
    'messageId': '0',
}

In [None]:
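##### Relationship (NDJSON) between two NER entities #####

# Each entity carries its own uuid so the relationship can reference it
uuid_source = str(uuid.uuid4())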
uuid_target = str(uuid.uuid4())

entity_source = {
  'name': 'ner',
  'uuid': uuid_source,
  'location': {
          "start" : 9,
          "end": 11          
      },
  'messageId': '0'
}

entity_target = {
  'name': 'ner',
  'uuid': uuid_target,
  'location': {
    "start": 14,
    "end": 19
  },
  'messageId': '0'
}

relationship_annotation_ndjson = {
    "name": "relationship", 
    "relationship": {
      "source": uuid_source,
      "target": uuid_target,
      "type": "unidirectional"
    }
}
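
The relationship above refers to its two entities by `uuid`, so a mistyped or missing uuid leaves the relationship pointing at nothing. The sketch below is one possible local sanity check to run before uploading; `find_dangling_relationships` is a hypothetical helper, not part of the Labelbox SDK.

```python
import uuid


def find_dangling_relationships(annotations):
    """Return messages for relationship ends that match no entity uuid."""
    entity_uuids = {
        a["uuid"] for a in annotations if "uuid" in a and "relationship" not in a
    }
    problems = []
    for a in annotations:
        rel = a.get("relationship")
        if rel is None:
            continue
        for role in ("source", "target"):
            if rel.get(role) not in entity_uuids:
                problems.append(
                    f"{a.get('name')}: {role} {rel.get(role)!r} matches no entity uuid"
                )
    return problems


# Demo payload shaped like the dictionaries in the cell above.
source_id, target_id = str(uuid.uuid4()), str(uuid.uuid4())
demo_payload = [
    {"name": "ner", "uuid": source_id, "location": {"start": 9, "end": 11}, "messageId": "0"},
    {"name": "ner", "uuid": target_id, "location": {"start": 14, "end": 19}, "messageId": "0"},
    {"name": "relationship",
     "relationship": {"source": source_id, "target": target_id, "type": "unidirectional"}},
]
print(find_dangling_relationships(demo_payload))  # prints []
```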

## Upload Annotations - putting it all together 

## Step 1: Import data rows into Catalog

In [None]:
# Create one Labelbox dataset

global_key = "conversation-1.json"

asset = {
    "row_data": "https://storage.googleapis.com/labelbox-developer-testing-assets/conversational_text/1000-conversations/conversation-1.json",
    "global_key": global_key
}

dataset = client.create_dataset(name="conversational_annotation_import_demo_dataset")
task = dataset.create_data_rows([asset])
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows: ", task.failed_data_rows)

There are errors present. Please look at `task.errors` for more details


Errors: Duplicate global keys found: conversation-1.json
Failed data rows:  [{'message': 'Duplicate global keys found: conversation-1.json', 'failedDataRows': [{'globalKey': 'conversation-1.json', 'rowData': 'https://storage.googleapis.com/labelbox-developer-testing-assets/conversational_text/1000-conversations/conversation-1.json', 'attachmentInputs': []}]}]
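
The output above reports a duplicate global key, which suggests the key `conversation-1.json` already existed when the cell was run (for example, from an earlier run of this notebook). One way to avoid the collision while experimenting is to add a random suffix to the key. This is a minimal sketch; `unique_global_key` is a hypothetical helper, not an SDK function.

```python
import uuid


def unique_global_key(base_name: str) -> str:
    """Append a short random suffix so repeated runs do not reuse a global key."""
    stem, dot, ext = base_name.rpartition(".")
    suffix = uuid.uuid4().hex[:8]
    if not dot:  # no file extension present
        return f"{base_name}-{suffix}"
    return f"{stem}-{suffix}.{ext}"


# Usage: the result looks like "conversation-1-<8 hex chars>.json"
print(unique_global_key("conversation-1.json"))
```

Alternatively, reuse the existing data row by keeping the original global key and skipping the dataset creation step.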


## Step 2: Create/select an ontology

Your project should have an ontology set up with all the tools and classifications that your annotations require. The tool names and classification instructions must match the `name` fields in your annotations so that the correct feature schemas are matched.

For example, when we created the text annotation, we provided the `name` as `text_convo`. Now, when we set up our ontology, we must ensure that the instructions of the text classification are also `text_convo`. The same alignment must hold for the other tools and classifications we create in our ontology.
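
A quick local check can catch name mismatches before anything is uploaded. The sketch below compares the `name` fields of a payload with a set of ontology feature names; `names_missing_from_ontology` is a hypothetical helper, and the name set is typed by hand from this notebook rather than read from the SDK.

```python
def names_missing_from_ontology(annotations, ontology_names):
    """Return annotation names that have no matching ontology feature name."""
    used = {a["name"] for a in annotations}
    return sorted(used - set(ontology_names))


# Feature names used by the ontology in this notebook.
ontology_names = {"ner", "relationship", "text_convo", "checklist_convo", "radio_convo"}

# Demo payload with one deliberate typo ("radio_conv").
demo_annotations = [{"name": "ner"}, {"name": "text_convo"}, {"name": "radio_conv"}]
print(names_missing_from_ontology(demo_annotations, ontology_names))  # prints ['radio_conv']
```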

In [None]:
ontology_builder = lb.OntologyBuilder(
  tools=[ 
    lb.Tool( # NER tool given the name "ner"
      tool=lb.Tool.Type.NER, 
      name="ner"), 
    lb.Tool(
      tool=lb.Tool.Type.RELATIONSHIP,
      name="relationship"
    )
    ], 
  classifications=[ 
    lb.Classification( 
      class_type=lb.Classification.Type.TEXT,
      scope=lb.Classification.Scope.INDEX,          
      instructions="text_convo"), 
    lb.Classification( 
      class_type=lb.Classification.Type.CHECKLIST, 
      scope=lb.Classification.Scope.INDEX,                     
      instructions="checklist_convo", 
      options=[
        lb.Option(value="first_checklist_answer"),
        lb.Option(value="second_checklist_answer")            
      ]
    ), 
    lb.Classification( 
      class_type=lb.Classification.Type.RADIO, 
      instructions="radio_convo", 
      scope=lb.Classification.Scope.INDEX,          
      options=[
        lb.Option(value="first_radio_answer"),
        lb.Option(value="second_radio_answer")
      ]
    )
  ]
)

ontology = client.create_ontology("Ontology Conversation Annotations", ontology_builder.asdict())




## Step 3: Create a labeling project
Connect the ontology to the labeling project

In [None]:
# Create Labelbox project
project = client.create_project(name="conversational_project", 
                                    media_type=lb.MediaType.Conversational)

# Setup your ontology 
project.setup_editor(ontology) # Connect your ontology and editor to your project

Default createProject behavior will soon be adjusted to prefer batch projects. Pass in `queue_mode` parameter explicitly to opt-out for the time being.


## Step 4: Send a batch of data rows to the project

In [None]:
# Create a batch to send to your project
batch = project.create_batch(
  "first-batch-convo-demo", # Each batch in a project must have a unique name
  global_keys=[global_key], # Paginated collection of data row objects, list of data row ids or global keys
  priority=5 # priority between 1 (highest) and 5 (lowest)
)

print("Batch: ", batch)

Batch:  <Batch {
    "consensus_settings_json": "{\"numberOfLabels\":1,\"coveragePercentage\":0}",
    "created_at": "2023-03-27 19:05:37+00:00",
    "name": "first-batch-convo-demo",
    "size": 0,
    "uid": "5bf477e0-ccd2-11ed-96a9-65c39364316f",
    "updated_at": "2023-03-27 19:05:37+00:00"
}>


## Step 5: Create the annotations payload
Create the annotations payload using the snippets of code above

Labelbox supports two formats for the annotations payload: NDJSON and Python annotation types. However, for conversational text, NDJSON is the only supported format.
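
NDJSON (newline-delimited JSON) stores one JSON object per line. The import calls in this notebook take a list of Python dictionaries, so you do not need to serialize anything yourself; the sketch below only illustrates what the format looks like on disk. The helper names `to_ndjson` and `from_ndjson` are hypothetical.

```python
import json


def to_ndjson(records):
    """Serialize a list of dicts as NDJSON: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def from_ndjson(text):
    """Parse NDJSON text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


demo_records = [
    {"name": "text_convo", "answer": "example answer", "messageId": "0"},
    {"name": "radio_convo", "answer": {"name": "first_radio_answer"}, "messageId": "0"},
]
ndjson_text = to_ndjson(demo_records)
print(ndjson_text)
```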

### Python annotations
Here we create the complete label payload using only Python annotation types. The label includes one entry for each Python annotation we created above.

In [None]:
label = []
label.append(
  lb_types.Label(
    data=lb_types.ConversationData(
      global_key=global_key
    ),
    annotations=[
      ner_annotation,
      text_annotation,
      checklist_annotation,
      radio_annotation
    ]
  )
)

### NDJSON annotations 
Here we create the complete label payload using only the NDJSON format. The payload includes one entry for each NDJSON annotation we created [above](https://colab.research.google.com/drive/1rFv-VvHUBbzFYamz6nSMRJz1mEg6Ukqq#scrollTo=3umnTd-MfI0o&line=1&uniqifier=1).

In [None]:
label_ndjson = []
for annotations in [ner_annotation_ndjson,
                    text_annotation_ndjson,
                    checklist_annotation_ndjson,
                    radio_annotation_ndjson,
                    entity_source,
                    entity_target,
                    relationship_annotation_ndjson
                    ]:
  annotations.update({
      'dataRow': {
          'globalKey': global_key
      }
  })
  label_ndjson.append(annotations)
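
The loop above calls `update` on each dictionary, so the original annotation dictionaries are modified in place. If you want to reuse them for another data row, a variant that copies each annotation first may be safer. The sketch below shows one way to do that; `attach_data_row` is a hypothetical helper.

```python
import copy


def attach_data_row(annotations, global_key):
    """Return copies of `annotations`, each tagged with the data row's global key."""
    payload = []
    for annotation in annotations:
        entry = copy.deepcopy(annotation)  # leave the caller's dict untouched
        entry["dataRow"] = {"globalKey": global_key}
        payload.append(entry)
    return payload


demo_annotation = {"name": "text_convo", "answer": "example answer", "messageId": "0"}
demo_payload = attach_data_row([demo_annotation], "conversation-1.json")
print(demo_payload[0]["dataRow"])  # prints {'globalKey': 'conversation-1.json'}
```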

## Step 6: Upload annotations to a project as pre-labels or complete labels

#### Model Assisted Labeling (MAL)
For the purpose of this tutorial, upload only one type of payload at a time (NDJSON or Python annotation types). Delete the previous labels before uploading labels that use the second method (NDJSON).

In [None]:
# Upload our label using Model-Assisted Labeling
upload_job = lb.MALPredictionImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name=f"mal_job-{str(uuid.uuid4())}", 
    predictions=label_ndjson)

upload_job.wait_until_done();
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)

Errors: []
Status of uploads:  [{'uuid': '8ab3c75b-bd79-4495-950b-62f26d64789d', 'dataRow': {'id': 'clfbnskoxb9q9076fbyjngwhm', 'globalKey': 'conversation-1.json'}, 'status': 'SUCCESS'}, {'uuid': 'b5e4cfd7-3b08-441b-aff1-b40af6daa13a', 'dataRow': {'id': 'clfbnskoxb9q9076fbyjngwhm', 'globalKey': 'conversation-1.json'}, 'status': 'SUCCESS'}, {'uuid': '9fa4ba63-f837-40c9-9ed2-d693a442c36f', 'dataRow': {'id': 'clfbnskoxb9q9076fbyjngwhm', 'globalKey': 'conversation-1.json'}, 'status': 'SUCCESS'}, {'uuid': '6bbbf815-cb0c-405f-8808-4b24d2940d96', 'dataRow': {'id': 'clfbnskoxb9q9076fbyjngwhm', 'globalKey': 'conversation-1.json'}, 'status': 'SUCCESS'}, {'uuid': '9c09e912-c702-4ee5-9818-9ee67fc6fa5f', 'dataRow': {'id': 'clfbnskoxb9q9076fbyjngwhm', 'globalKey': 'conversation-1.json'}, 'status': 'SUCCESS'}, {'uuid': 'aa41472e-818e-46dc-91a9-edce9e7c0b3c', 'dataRow': {'id': 'clfbnskoxb9q9076fbyjngwhm', 'globalKey': 'conversation-1.json'}, 'status': 'SUCCESS'}]
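
The status list above grows with the number of annotations, so a short summary is easier to read than the raw output. The sketch below counts entries by their `status` field and collects those that are not `SUCCESS`. `summarize_statuses` is a hypothetical helper, and the `FAILURE` value in the demo data is an assumption used only to exercise the code.

```python
from collections import Counter


def summarize_statuses(statuses):
    """Return (counts per status, list of entries whose status is not SUCCESS)."""
    counts = Counter(s.get("status", "UNKNOWN") for s in statuses)
    failures = [s for s in statuses if s.get("status") != "SUCCESS"]
    return counts, failures


# Demo data shaped like the entries printed above; "FAILURE" is an assumed value.
demo_statuses = [
    {"uuid": "a", "dataRow": {"globalKey": "conversation-1.json"}, "status": "SUCCESS"},
    {"uuid": "b", "dataRow": {"globalKey": "conversation-1.json"}, "status": "FAILURE"},
]
counts, failures = summarize_statuses(demo_statuses)
print(dict(counts))
print([f["uuid"] for f in failures])
```

With a real job, the same call would be `summarize_statuses(upload_job.statuses)`.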


#### Label Import

In [None]:
# Upload labels for this data row in the project.
# Uncomment only if your payload does not contain relationships.
# Relationships will be supported during label import in the near future. 


# upload_job = lb.LabelImport.create_from_objects(
#     client = client, 
#     project_id = project.uid, 
#     name="label_import_job"+str(uuid.uuid4()),  
#     labels=label_ndjson)

# upload_job.wait_until_done();
# print("Errors:", upload_job.errors)
# print("Status of uploads: ", upload_job.statuses)
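
Because the comments in the cell above say that label import does not yet accept relationships, the relationship entries would need to be dropped from the payload before uncommenting it. A minimal sketch, assuming relationship entries are the ones that carry a `relationship` key as in this notebook; `without_relationships` is a hypothetical helper.

```python
def without_relationships(annotations):
    """Return the payload with every relationship entry removed."""
    return [a for a in annotations if "relationship" not in a]


demo_payload = [
    {"name": "ner", "uuid": "source-uuid", "location": {"start": 9, "end": 11}, "messageId": "0"},
    {"name": "relationship",
     "relationship": {"source": "source-uuid", "target": "target-uuid", "type": "unidirectional"}},
]
filtered = without_relationships(demo_payload)
print([a["name"] for a in filtered])  # prints ['ner']
```

The filtered list can then be passed as `labels=` in place of `label_ndjson`.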

### Optional deletions for cleanup 

In [None]:
# project.delete()
# dataset.delete()