<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/llm_import/conversational.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/master/examples/llm_import/conversational.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# LLM pairwise comparison with Conversational text using MAL and Ground truth
This demo shows how to upload conversational data rows that contain model outputs for pairwise comparison analysis.


In [None]:
!pip install -q "labelbox[data]"

# Setup

In [None]:
import labelbox as lb
import uuid

# Replace with your API Key

In [None]:
API_KEY = ""
client = lb.Client(api_key=API_KEY)

# Step 1: Create annotations

In [None]:
# Create a global radio annotation and a global text annotation
radio_annotation_ndjson = {
    "name": "Choose the best response",
    "answer": {
      "name": "Response B"
    }
}

text_annotation_ndjson = {
    "name": "Provide a reason for your choice",
    "answer": "This is the more concise answer"
}

# Step 2: Set up a project

## Import data rows with "modelOutputs" into Catalog
In addition to your message-based data, add a list of model outputs to your JSON file. Each entry describes one candidate response:

```
"modelOutputs" : [
  {
      "title": "Name of the response option",
      "content": "Content of the response",
      "modelConfigName": "Name of model configuration"
  }
]
```
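
As a minimal sketch of the structure above, the snippet below attaches a `modelOutputs` list to a conversational payload and checks that each entry carries the three keys shown. The helper `add_model_outputs`, the placeholder contents, and the `model-a`/`model-b` names are assumptions made for illustration; they are not part of the Labelbox SDK.

```python
import json

# Keys each model output carries in the structure shown above.
REQUIRED_KEYS = {"title", "content", "modelConfigName"}


def add_model_outputs(conversation, outputs):
    """Hypothetical helper: return a copy of `conversation` with modelOutputs attached."""
    for output in outputs:
        missing = REQUIRED_KEYS - output.keys()
        if missing:
            raise ValueError(f"model output is missing keys: {sorted(missing)}")
    # Build a new dict so the caller's payload is left untouched.
    return {**conversation, "modelOutputs": list(outputs)}


conversation = {
    "type": "application/vnd.labelbox.conversational",
    "version": 1,
    "messages": [],  # message objects omitted for brevity
}

row = add_model_outputs(conversation, [
    {"title": "Response A", "content": "First candidate reply", "modelConfigName": "model-a"},
    {"title": "Response B", "content": "Second candidate reply", "modelConfigName": "model-b"},
])

# Serialize to the JSON text that would be stored in the hosted file.
print(json.dumps(row, indent=2))
```

The serialized output is what would be saved as the cloud-hosted JSON file referenced by `row_data`.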


### Example of row_data with model outputs

In [None]:
pairwise_shopping_2 =  """
 {
  "type": "application/vnd.labelbox.conversational",
  "version": 1,
  "messages": [
    {
      "messageId": "message-0",
      "timestampUsec": 1530718491,
      "content": "Hi! How can I help?",
      "user": {
        "userId": "Bot 002",
        "name": "Bot"
      },
      "align": "left",
      "canLabel": false
    },
    {
      "messageId": "message-1",
      "timestampUsec": 1530718503,
      "content": "I just bought a vacuum cleaner from you guys like a week ago and it's already broken!!",
      "user": {
        "userId": "User 00686",
        "name": "User"
      },
      "align": "right",
      "canLabel": true
    }

  ],
  "modelOutputs": [
    {
      "title": "Response A",
      "content": "I'm really sorry to hear that your vacuum cleaner is not working as expected. We certainly understand how frustrating this can be, especially with a recent purchase. I assure you, we're here to help!\n\n To proceed with resolving this issue, could you please provide some details about the problem you're experiencing with the vacuum cleaner? Additionally, if you could share your order number or receipt details, that would be very helpful. We will definitely look into the available options for you, which may include a replacement or a refund, depending on the specific circumstances and our return policy.\n\n Your satisfaction is our priority, and we'll do our best to resolve this matter promptly and to your satisfaction.",
      "modelConfigName": "GPT-3.5 with temperature 0"
    },
    {
      "title": "Response B",
      "content": "I'm sorry about the issue with your vacuum cleaner. Please send us the order number or receipt details so we can quickly assist you with a replacement. Your satisfaction is important to us!",
      "modelConfigName": "Fine Tuned GPT-3.5 with demo data"
    }
  ]
}
"""


### Create dataset and data rows using a cloud hosted JSON file with "modelOutputs"

In [None]:
# Generate dummy global keys
global_key_1 = str(uuid.uuid4())
global_key_2 = str(uuid.uuid4())
global_key_3 = str(uuid.uuid4())

# Create a dataset
dataset = client.create_dataset(
    name="pairwise_demo_" + str(uuid.uuid4()),
    iam_integration=None
)

# Upload data rows
task = dataset.create_data_rows([
    {
        "row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_1.json",
        "global_key": global_key_1
    },
    {
        "row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_2.json",
        "global_key": global_key_2
    },
    {
        "row_data": "https://storage.googleapis.com/labelbox-datasets/conversational-sample-data/pairwise_shopping_3.json",
        "global_key": global_key_3
    }
])
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows:", task.failed_data_rows)


### Create/select an ontology

In [None]:
# Create an ontology with relevant classifications

ontology_builder = lb.OntologyBuilder(
  classifications=[
    lb.Classification(
      class_type=lb.Classification.Type.RADIO,
      scope=lb.Classification.Scope.GLOBAL,
      name="Choose the best response",
      options=[lb.Option(value="Response A"), lb.Option(value="Response B"), lb.Option(value="Tie")]
    ),
    lb.Classification(
      class_type=lb.Classification.Type.TEXT,
      name="Provide a reason for your choice"
    )
  ]
)

ontology = client.create_ontology("Pairwise comparison ontology", ontology_builder.asdict(), media_type=lb.MediaType.Conversational)
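
The annotations from Step 1 refer to these classifications by name, so a mismatch in a name or a radio option is an easy mistake to make. The sketch below is one way to catch it locally before uploading. It assumes a hand-copied, plain-Python mirror of the ontology (`ontology_names`) and a hypothetical helper `check_annotation`; neither comes from the Labelbox SDK.

```python
# Hand-copied mirror of the ontology above: classification name -> allowed
# radio options, or None for a free-text classification.
ontology_names = {
    "Choose the best response": {"Response A", "Response B", "Tie"},
    "Provide a reason for your choice": None,
}


def check_annotation(annotation):
    """Hypothetical helper: return a list of problems found in one annotation dict."""
    name = annotation.get("name")
    if name not in ontology_names:
        return [f"unknown classification name: {name!r}"]

    problems = []
    allowed = ontology_names[name]
    answer = annotation.get("answer")
    if allowed is None:
        # Free-text classifications carry a plain string answer.
        if not isinstance(answer, str):
            problems.append(f"{name!r} expects a text answer")
    elif not isinstance(answer, dict) or answer.get("name") not in allowed:
        # Radio classifications carry {"name": <option>}.
        problems.append(f"{name!r} has an answer outside {sorted(allowed)}")
    return problems


radio = {"name": "Choose the best response", "answer": {"name": "Response B"}}
text = {"name": "Provide a reason for your choice", "answer": "This is the more concise answer"}
typo = {"name": "Choose the best response", "answer": {"name": "Response C"}}

for annotation in (radio, text, typo):
    print(check_annotation(annotation))
```

The first two annotations produce empty lists; the third reports that `Response C` is not one of the radio options.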



### Create a labeling project and send a batch of data rows to the project

In [None]:
project = client.create_project(name="Pairwise Conversational Text Demo",
                                media_type=lb.MediaType.Conversational)
project.setup_editor(ontology)

In [None]:
project.create_batch(
  "batch_conversational",
  global_keys=[global_key_1, global_key_2, global_key_3],
  priority=5
)

# Step 3: Upload annotations to a project as pre-labels or complete labels

Set up the payload by attaching each annotation created in Step 1 to every data row through its global key.

In [None]:
labels = []
for key in [global_key_1, global_key_2, global_key_3]:
  for ann in [radio_annotation_ndjson, text_annotation_ndjson]:
    ann_copy = ann.copy()
    ann_copy.update({
        "dataRow": {
            "globalKey": key
        }
    })
    labels.append(ann_copy)
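
To make the shape of the resulting payload concrete, here is a self-contained sketch of the same loop with placeholder keys. The names `radio_template`, `text_template`, `example_keys`, and `payload` are illustrative only. It uses `copy.deepcopy` rather than a shallow copy so that nested dicts such as `answer` are not shared between entries; this is a defensive variation on the loop above, not something the upload requires.

```python
import copy

# Annotation templates in the same shape as the ones defined in Step 1.
radio_template = {"name": "Choose the best response", "answer": {"name": "Response B"}}
text_template = {"name": "Provide a reason for your choice", "answer": "This is the more concise answer"}

# Placeholder keys for illustration; the notebook uses generated UUIDs.
example_keys = ["key-1", "key-2", "key-3"]

payload = []
for key in example_keys:
    for template in (radio_template, text_template):
        # deepcopy keeps nested dicts such as "answer" independent per entry.
        entry = copy.deepcopy(template)
        entry["dataRow"] = {"globalKey": key}
        payload.append(entry)

# Three data rows times two annotations gives six payload entries.
print(len(payload))
print(payload[0])
```

Each entry is a flat dict holding the classification name, its answer, and a `dataRow` reference, and the original templates are left unchanged.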

### Model Assisted Labeling (MAL)

In [None]:
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"mal_job-{uuid.uuid4()}",
    predictions=labels)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)

### Label Import

In [None]:
upload_job = lb.LabelImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="label_import_job" + str(uuid.uuid4()),
    labels=labels)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)