<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>


<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-notebooks/blob/main/annotation_import/prompt_response.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-notebooks/tree/main/annotation_import/prompt_response.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Prompt and response projects with MAL and Ground Truth

This notebook shows how to import prompts and responses for fine-tuning large language models (LLMs) using model-assisted labeling (MAL) and ground truth.

## Annotation payload types

Labelbox supports two formats for the annotations payload:

- Python annotation types (recommended)
  - Provides a seamless transition between third-party platforms, machine learning pipelines, and Labelbox.
  - Allows you to build annotations locally with local file paths, numpy arrays, or URLs.
  - Supports easy conversion to NDJSON format to quickly import annotations to Labelbox.
  - Supports one-level nested classification (radio, checklist, or free-form text) under a tool or classification annotation.
- JSON
  - Skips formatting the annotation payload as Labelbox Python annotation types.
  - Supports any levels of nested classification (radio, checklist, or free-form text) under a tool or classification annotation.
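
To make the JSON option more concrete, here is a minimal sketch of NDJSON (newline-delimited JSON) using only the Python standard library. The helper names `to_ndjson` and `from_ndjson` are hypothetical and not part of the Labelbox SDK; the two records mirror payloads defined later in this notebook.

```python
import json

# Illustrative records; the "name" and "answer" fields mirror the NDJSON
# payloads defined later in this notebook.
records = [
    {"name": "prompt text", "answer": "This is an example of a prompt"},
    {"name": "response radio feature", "answer": {"name": "first_radio_answer"}},
]


def to_ndjson(items):
    """Serialize a list of dicts as NDJSON: one compact JSON object per line."""
    return "\n".join(json.dumps(item, separators=(",", ":")) for item in items)


def from_ndjson(text):
    """Parse NDJSON text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


ndjson_text = to_ndjson(records)
print(ndjson_text)
```

Each record occupies exactly one line, so a payload can be streamed or split line by line without parsing the whole file.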

## Label Import Types

Labelbox supports two types of label imports:

- [Model-assisted labeling (MAL)](https://docs.labelbox.com/docs/model-assisted-labeling) allows you to import computer-generated predictions and simple annotations created outside of Labelbox as pre-labels on an asset.
- [Ground truth](https://docs.labelbox.com/docs/import-ground-truth) allows you to bulk import ground truth annotations from an external or third-party labeling system into Labelbox _Annotate_. Using the label import API to import external data can consolidate and migrate all annotations into Labelbox as a single source of truth.

## Set up 

In [None]:
%pip install -q --upgrade "labelbox[data]"

In [None]:
import labelbox as lb
import labelbox.types as lb_types
import time
import uuid

### Replace with your API key

Replace the value of `API_KEY` with a valid [API key](https://docs.labelbox.com/reference/create-api-key) to connect to the Labelbox client.
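
If you prefer not to paste the key into the notebook, one option is to read it from an environment variable. This is a minimal sketch: the helper `load_api_key` and the variable name `LABELBOX_API_KEY` are assumptions made for illustration, not names required by Labelbox.

```python
import os


def load_api_key(env_var="LABELBOX_API_KEY"):
    """Return the API key stored in an environment variable.

    The default variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API key.")
    return api_key
```

With the variable set, `API_KEY = load_api_key()` could replace the hardcoded value in the cell below.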

In [None]:
API_KEY = None
client = lb.Client(api_key=API_KEY)

## Supported Annotations

Prompt and response generation projects support the following annotations:

- Prompt and response creation projects
  - Prompt text
  - Radio
  - Checklist
  - Response text

- Prompt creation projects
  - Prompt text

- Response creation projects
  - Radio
  - Checklist

### Prompt

#### Prompt text

In [None]:
prompt_annotation = lb_types.PromptClassificationAnnotation(
    name="prompt text",
    value=lb_types.PromptText(answer="This is an example of a prompt"),
)

prompt_annotation_ndjson = {
    "name": "prompt text",
    "answer": "This is an example of a prompt",
}

### Responses

#### Radio (single-choice)

In [None]:
response_radio_annotation = lb_types.ClassificationAnnotation(
    name="response radio feature",
    value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
        name="first_radio_answer")),
)

response_radio_annotation_ndjson = {
    "name": "response radio feature",
    "answer": {
        "name": "first_radio_answer"
    },
}

#### Checklist (multi-choice)

In [None]:
response_checklist_annotation = lb_types.ClassificationAnnotation(
    name="response checklist feature",
    value=lb_types.Checklist(answer=[
        lb_types.ClassificationAnswer(name="option_1"),
        lb_types.ClassificationAnswer(name="option_2"),
    ]),
)

response_checklist_annotation_ndjson = {
    "name": "response checklist feature",
    "answer": [{
        "name": "option_1"
    }, {
        "name": "option_2"
    }],
}

#### Response text

In [None]:
response_text_annotation = lb_types.ClassificationAnnotation(
    name="response text",
    value=lb_types.Text(answer="This is an example of a response text"),
)

response_text_annotation_ndjson = {
    "name": "response text",
    "answer": "This is an example of a response text",
}

#### Nested classifications

In [None]:
nested_response_radio_annotation = lb_types.ClassificationAnnotation(
    name="nested_response_radio_question",
    value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
        name="first_radio_answer",
        classifications=[
            lb_types.ClassificationAnnotation(
                name="sub_radio_question",
                value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
                    name="first_sub_radio_answer")),
            )
        ],
    )),
)

nested_response_checklist_annotation = lb_types.ClassificationAnnotation(
    name="nested_response_checklist_question",
    value=lb_types.Checklist(answer=[
        lb_types.ClassificationAnswer(
            name="first_checklist_answer",
            classifications=[
                lb_types.ClassificationAnnotation(
                    name="sub_checklist_question",
                    value=lb_types.Checklist(answer=[
                        lb_types.ClassificationAnswer(
                            name="first_sub_checklist_answer")
                    ]),
                )
            ],
        )
    ]),
)

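# A radio question has a single answer, so "answer" is an object (not a list),
# matching the non-nested radio payload above.
nested_response_radio_annotation_ndjson = {
    "name": "nested_response_radio_question",
    "answer": {
        "name": "first_radio_answer",
        "classifications": [{
            "name": "sub_radio_question",
            "answer": {
                "name": "first_sub_radio_answer"
            },
        }],
    },
}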

nested_response_checklist_annotation_ndjson = {
    "name":
        "nested_response_checklist_question",
    "answer": [{
        "name":
            "first_checklist_answer",
        "classifications": [{
            "name": "sub_checklist_question",
            "answer": {
                "name": "first_sub_checklist_answer"
            },
        }],
    }],
}
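
The nesting in the payloads above can be inspected with a small recursive helper, which is handy when deciding whether a payload fits the one-level limit of the Python annotation types. This is a sketch only: `nesting_depth` is a hypothetical helper, not an SDK function, and it assumes the NDJSON-style layout used in this notebook (`answer` is a string, an object, or a list of objects, with sub-questions under `classifications`).

```python
def nesting_depth(annotation):
    """Return the number of nested classification levels in an NDJSON-style dict.

    0 means the annotation has no nested classification.
    """
    answer = annotation.get("answer")
    if isinstance(answer, dict):
        answers = [answer]
    elif isinstance(answer, list):
        answers = [item for item in answer if isinstance(item, dict)]
    else:
        answers = []  # free-form text or missing answer: nothing to descend into

    depth = 0
    for item in answers:
        for child in item.get("classifications", []):
            depth = max(depth, 1 + nesting_depth(child))
    return depth


flat = {"name": "response text", "answer": "some text"}
nested = {
    "name": "nested_response_checklist_question",
    "answer": [{
        "name": "first_checklist_answer",
        "classifications": [{
            "name": "sub_checklist_question",
            "answer": {"name": "first_sub_checklist_answer"},
        }],
    }],
}
print(nesting_depth(flat), nesting_depth(nested))  # prints: 0 1
```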

## Step 1: Create a project and data rows using the Labelbox SDK

Each type of prompt and response generation project requires a different setup. See [prompt and response project](https://docs.labelbox.com/reference/prompt-and-response-projects) for more details on the differences.

In this tutorial, we will show how to import annotations for a prompt and response creation (humans generate prompts and responses) project. The process is also similar for prompt creation (humans generate prompts) and response creation (humans generate responses to uploaded prompts) projects. See [import prompt and response annotations](https://docs.labelbox.com/reference/import-prompt-and-response-annotations) for a tutorial and more examples on other project types.

### Prompt response and prompt creation

A prompt and response creation project automatically generates empty data rows when it is created.

In [None]:
prompt_response_project = client.create_prompt_response_generation_project(
    name="Demo prompt response project",
    media_type=lb.MediaType.LLMPromptResponseCreation,
    dataset_name="Demo prompt response dataset",
    data_row_count=1,
)

## Step 2: Set up ontology

Your project ontology needs to support the classifications required by your annotations. To ensure accurate schema feature mapping, the value used as the `name` parameter needs to match the value of the `name` field in your annotation.  

For example, if you provide the name `annotation_name` for your created annotation, you need to name the corresponding classification `annotation_name` when setting up your ontology. The same alignment must hold for every other classification that you create in the ontology.
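
A quick local check can catch name mismatches before an import is attempted. The sketch below is an assumption-laden illustration: `find_unmatched_names` is a hypothetical helper that compares plain lists of names, and it does not query Labelbox.

```python
def find_unmatched_names(annotation_names, ontology_names):
    """Return annotation names that have no identically named ontology feature."""
    known = set(ontology_names)
    return [name for name in annotation_names if name not in known]


ontology_names = ["prompt text", "response radio feature", "response text"]
# "response_text" uses an underscore, so it does not match "response text".
annotation_names = ["prompt text", "response radio feature", "response_text"]

print(find_unmatched_names(annotation_names, ontology_names))
# prints: ['response_text']
```

The comparison is exact and case-sensitive, which mirrors the requirement that the two `name` values be identical.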

This example shows how to create an ontology containing all the [annotation types](#supported-annotations) supported by prompt and response projects.

In [None]:
ontology_builder = lb.OntologyBuilder(
    tools=[],
    classifications=[
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.PROMPT,
            name="prompt text",
            character_min=1,  # Minimum character count of prompt field (optional)
            character_max=50,  # Maximum character count of prompt field (optional)
        ),
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,
            name="response checklist feature",
            options=[
                lb.ResponseOption(value="option_1", label="option_1"),
                lb.ResponseOption(value="option_2", label="option_2"),
            ],
        ),
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,
            name="response radio feature",
            options=[
                lb.ResponseOption(value="first_radio_answer"),
                lb.ResponseOption(value="second_radio_answer"),
            ],
        ),
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.RESPONSE_TEXT,
            name="response text",
            character_min=1,  # Minimum character count of response text field (optional)
            character_max=50,  # Maximum character count of response text field (optional)
        ),
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,
            name="nested_response_radio_question",
            options=[
                lb.ResponseOption(
                    "first_radio_answer",
                    options=[
                        lb.PromptResponseClassification(
                            class_type=lb.PromptResponseClassification.Type.RESPONSE_RADIO,
                            name="sub_radio_question",
                            options=[
                                lb.ResponseOption("first_sub_radio_answer")
                            ],
                        )
                    ],
                )
            ],
        ),
        lb.PromptResponseClassification(
            class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,
            name="nested_response_checklist_question",
            options=[
                lb.ResponseOption(
                    "first_checklist_answer",
                    options=[
                        lb.PromptResponseClassification(
                            class_type=lb.PromptResponseClassification.Type.RESPONSE_CHECKLIST,
                            name="sub_checklist_question",
                            options=[
                                lb.ResponseOption("first_sub_checklist_answer")
                            ],
                        )
                    ],
                )
            ],
        ),
    ],
)

# Create ontology
ontology = client.create_ontology(
    "Prompt and response ontology",
    ontology_builder.asdict(),
    media_type=lb.MediaType.LLMPromptResponseCreation,
)

# Attach ontology to project
prompt_response_project.connect_ontology(ontology)

## Step 3: Export for `global_keys`

Next, obtain either the `global_keys` or the `data_row_ids` of the generated data rows by exporting them from the created project. Because data rows are generated asynchronously, you need to wait until they have been created before exporting.

In [None]:
time.sleep(20)

export_task = prompt_response_project.export()
export_task.wait_till_done()

# Check export for any errors
if export_task.has_errors():
    export_task.get_buffered_stream(stream_type=lb.StreamType.ERRORS).start(
        stream_handler=lambda error: print(error))

stream = export_task.get_buffered_stream()

# Obtain global keys to be used later on
global_keys = [dr.json["data_row"]["global_key"] for dr in stream]
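
The fixed `time.sleep(20)` above is the simplest way to wait, but it can be too short or needlessly long. A polling loop with a timeout is one alternative. This is a minimal sketch: `wait_until` is a hypothetical helper, and `fake_export` stands in for a function that would run the export and return the collected global keys. Whether an export started too early returns an empty result is an assumption here, not documented behavior.

```python
import time


def wait_until(condition, timeout=120.0, interval=5.0):
    """Call `condition` until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for an export call: returns keys only from the third attempt on.
attempts = []


def fake_export():
    attempts.append(1)
    return ["key-1"] if len(attempts) >= 3 else []


keys = wait_until(fake_export, timeout=5.0, interval=0.01)
print(keys)  # prints: ['key-1']
```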

## Step 4: Create the annotations payload

For pre-labels (model-assisted labeling), pass your payload as the value of the `predictions` parameter. For ground truth, pass the payload to the `labels` parameter.

In [None]:
# Python annotation objects
label = []
annotations = [
    prompt_annotation,
    response_radio_annotation,
    response_checklist_annotation,
    response_text_annotation,
    nested_response_radio_annotation,
    nested_response_checklist_annotation,
]
label.append(
    lb_types.Label(data={"global_key": global_keys[0]},
                   annotations=annotations))

# NDJSON
label_ndjson = []
annotations = [
    prompt_annotation_ndjson,
    response_radio_annotation_ndjson,
    response_checklist_annotation_ndjson,
    response_text_annotation_ndjson,
    nested_response_radio_annotation_ndjson,
    nested_response_checklist_annotation_ndjson,
]
for annotation in annotations:
    annotation.update({
        "dataRow": {
            "globalKey": global_keys[0]
        },
    })
    label_ndjson.append(annotation)
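
The loop above calls `update` on the original dictionaries, so each one keeps the `dataRow` it was last given. If the same annotations are reused for several data rows, copying them first avoids every entry ending up with the last key. A minimal sketch, where `attach_data_row` is a hypothetical helper and the global key is a placeholder:

```python
import copy


def attach_data_row(annotations, global_key):
    """Return deep copies of NDJSON-style annotations with a dataRow reference."""
    labeled = []
    for annotation in annotations:
        entry = copy.deepcopy(annotation)  # leave the template untouched
        entry["dataRow"] = {"globalKey": global_key}
        labeled.append(entry)
    return labeled


templates = [{
    "name": "response text",
    "answer": "This is an example of a response text",
}]
payload = attach_data_row(templates, "example-global-key")
print(payload[0]["dataRow"], "dataRow" in templates[0])
# prints: {'globalKey': 'example-global-key'} False
```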

#### Option A: Upload as [prelabels (model-assisted labeling)](https://docs.labelbox.com/docs/model-assisted-labeling)

This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets.

In [None]:
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=prompt_response_project.uid,
    name=f"mal_job-{str(uuid.uuid4())}",
    predictions=label,
)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)

#### Option B: Upload to a labeling project as [ground truth](https://docs.labelbox.com/docs/import-ground-truth)

This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort.

In [None]:
upload_job = lb.LabelImport.create_from_objects(
    client=client,
    project_id=prompt_response_project.uid,
    name="label_import_job" + str(uuid.uuid4()),
    labels=label_ndjson,
)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)
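
For larger imports, printing every status can be hard to read, and a short summary may be easier to scan. The sketch below assumes that the statuses are a list of dictionaries with a `status` key whose value is `"SUCCESS"` for successful entries; that shape, the example entries, and the helper `summarize_statuses` are assumptions for illustration.

```python
from collections import Counter


def summarize_statuses(statuses):
    """Count entries per status and collect the entries that are not SUCCESS."""
    counts = Counter(entry.get("status", "UNKNOWN") for entry in statuses)
    failures = [entry for entry in statuses if entry.get("status") != "SUCCESS"]
    return dict(counts), failures


# Hypothetical status entries, for illustration only.
example_statuses = [
    {"uuid": "a", "status": "SUCCESS"},
    {"uuid": "b", "status": "FAILURE", "errors": [{"message": "example error"}]},
]
counts, failures = summarize_statuses(example_statuses)
print(counts)  # prints: {'SUCCESS': 1, 'FAILURE': 1}
print([entry["uuid"] for entry in failures])  # prints: ['b']
```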

## Clean up

Uncomment and run the cell below to delete the Labelbox objects created in this notebook.

In [None]:
# prompt_response_project.delete()
# client.delete_unused_ontology(ontology.uid)