<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>




<td>

<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_import/conversational_LLM_data_generation.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>

<a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/annotation_import/conversational_LLM_data_generation.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# LLM Data Generation with MAL and Ground Truth
This demo shows how to generate prompts and responses for fine-tuning large language models (LLMs) using Model Assisted Labeling (MAL) and ground truth labels.

In [None]:
!pip install -q "labelbox[data]"

## Set up 

In [None]:
import labelbox as lb
import uuid

## Replace with your API key

In [None]:
API_KEY = ""
client = lb.Client(api_key=API_KEY)

## Supported annotations for LLM data generation
Currently, only the NDJSON format is supported for prompts and responses.
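
For context, NDJSON (newline-delimited JSON) is a text format with one JSON object per line. The sketch below uses only the standard library to show how annotation dictionaries like the ones in this notebook would look when serialized that way. The `to_ndjson` helper and the sample values are illustrative and are not part of the Labelbox SDK; the import calls later in this notebook are given the Python dictionaries directly.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON (one object per line)."""
    return "\n".join(json.dumps(record) for record in records)


# Sample annotations in the same shape as the ones defined in this notebook.
sample_annotations = [
    {"name": "response_radio", "answer": {"name": "response_a"}},
    {"name": "response_checklist", "answer": [{"name": "response_a"}, {"name": "response_c"}]},
]

ndjson_text = to_ndjson(sample_annotations)
print(ndjson_text)
```

Each printed line parses back into one annotation with `json.loads`.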

## Prompt:

### Classification: Free-form text

In [None]:
prompt_annotation_ndjson = {
  "name": "Follow the prompt and select answers",
  "answer": "This is an example of a prompt"
}

## Responses:

### Classification: Radio (single-choice)

In [None]:
response_radio_annotation_ndjson = {
  "name": "response_radio",
  "answer": {
      "name": "response_a"
    }
}

### Classification: Free-form text

In [None]:
# Only NDJSON is currently supported
response_text_annotation_ndjson = {
  "name": "provide_a_reason_for_your_choice",
  "answer": "This is an example of a response text"
}


### Classification: Checklist (multi-choice)

In [None]:
response_checklist_annotation_ndjson = {
  "name": "response_checklist",
  "answer": [
    {
      "name": "response_a"
    },
    {
      "name": "response_c"
    }
  ]
}

## Step 1: Create a project and data rows in Labelbox UI

Currently we do not support this workflow through the SDK.
#### Workflow:

1. Navigate to ***Annotate*** and select ***New project***

2. Select ***LLM data generation*** and then select ***Humans generate prompts and responses***

3. Name your project, select ***create a new dataset***, and name your dataset. Data rows will be generated automatically in this step.




In [None]:
# Enter the project id
project_id = ""

# Select one of the global keys from the data rows generated
global_key = ""

## Step 2: Create/select an ontology in Labelbox UI

Currently we do not support this workflow through the SDK.
#### Workflow: 
1. In your project, navigate to ***Settings*** and ***Label editor***

2. Click on ***Edit***

3. Create a new ontology and add the features used in this demo



#### For this demo the following ontology was generated in the UI: 

In [None]:
ontology_json = """
{
 "tools": [],
 "relationships": [],
 "classifications": [
  {
   "schemaNodeId": "clpvq9d0002yt07zy0khq42rp",
   "featureSchemaId": "clpvq9d0002ys07zyf2eo9p14",
   "type": "prompt",
   "name": "Follow the prompt and select answers",
   "archived": false,
   "required": true,
   "options": [],
   "instructions": "Follow the prompt and select answers",
   "minCharacters": 5,
   "maxCharacters": 100
  },
  {
   "schemaNodeId": "clpvq9d0002yz07zy0fjg28z7",
   "featureSchemaId": "clpvq9d0002yu07zy28ik5w3i",
   "type": "response-radio",
   "name": "response_radio",
   "instructions": "response_radio",
   "scope": "global",
   "required": true,
   "archived": false,
   "options": [
    {
     "schemaNodeId": "clpvq9d0002yw07zyci2q5adq",
     "featureSchemaId": "clpvq9d0002yv07zyevmz1yoj",
     "value": "response_a",
     "label": "response_a",
     "position": 0,
     "options": []
    },
    {
     "schemaNodeId": "clpvq9d0002yy07zy8pe48zdj",
     "featureSchemaId": "clpvq9d0002yx07zy0jvmdxk8",
     "value": "response_b",
     "label": "response_b",
     "position": 1,
     "options": []
    }
   ]
  },
  {
   "schemaNodeId": "clpvq9d0002z107zygf8l62ys",
   "featureSchemaId": "clpvq9d0002z007zyg26115f9",
   "type": "response-text",
   "name": "provide_a_reason_for_your_choice",
   "instructions": "Provide a reason for your choice",
   "scope": "global",
   "required": true,
   "archived": false,
   "options": [],
   "minCharacters": 5,
   "maxCharacters": 100
  },
  {
   "schemaNodeId": "clpvq9d0102z907zy8b10hjcj",
   "featureSchemaId": "clpvq9d0002z207zy6xla7f82",
   "type": "response-checklist",
   "name": "response_checklist",
   "instructions": "response_checklist",
   "scope": "global",
   "required": true,
   "archived": false,
   "options": [
    {
     "schemaNodeId": "clpvq9d0102z407zy0adq0rfr",
     "featureSchemaId": "clpvq9d0002z307zy6dqb8xsw",
     "value": "response_a",
     "label": "response_a",
     "position": 0,
     "options": []
    },
    {
     "schemaNodeId": "clpvq9d0102z607zych8b2z5d",
     "featureSchemaId": "clpvq9d0102z507zyfwfgacrn",
     "value": "response_c",
     "label": "response_c",
     "position": 1,
     "options": []
    },
    {
     "schemaNodeId": "clpvq9d0102z807zy03y7gysp",
     "featureSchemaId": "clpvq9d0102z707zyh61y5o3u",
     "value": "response_d",
     "label": "response_d",
     "position": 2,
     "options": []
    }
   ]
  }
 ],
 "realTime": false
}

"""
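
Before building the payload, it can help to check that each annotation refers to a feature and option that exists in the ontology. The sketch below is a minimal local check, assuming that an annotation's `name` must match a classification's `name` field and that radio/checklist answers must match an option's `value`; both are assumptions drawn from the export shape above, not a statement of how Labelbox validates imports. The helper names and the trimmed-down sample ontology are hypothetical.

```python
import json

# Trimmed-down, hypothetical ontology in the same shape as the export above.
sample_ontology_json = """
{
 "classifications": [
  {"type": "prompt", "name": "prompt_text", "options": []},
  {"type": "response-radio", "name": "response_radio",
   "options": [{"value": "response_a"}, {"value": "response_b"}]}
 ]
}
"""


def allowed_answers_by_feature(ontology):
    """Map each classification name to the set of its option values (empty for text features)."""
    return {
        feature["name"]: {option["value"] for option in feature.get("options", [])}
        for feature in ontology.get("classifications", [])
    }


def find_payload_problems(ontology, annotations):
    """Return human-readable problems; an empty list means nothing was flagged."""
    allowed = allowed_answers_by_feature(ontology)
    problems = []
    for annotation in annotations:
        name = annotation["name"]
        if name not in allowed:
            problems.append(f"unknown feature name: {name!r}")
            continue
        answer = annotation["answer"]
        if isinstance(answer, str):
            continue  # free-form text: nothing to compare against
        # Radio answers are a single dict; checklist answers are a list of dicts.
        chosen = [answer] if isinstance(answer, dict) else answer
        for choice in chosen:
            if choice["name"] not in allowed[name]:
                problems.append(f"{name!r}: unknown option {choice['name']!r}")
    return problems


sample_annotations = [
    {"name": "response_radio", "answer": {"name": "response_a"}},
    {"name": "response_radio", "answer": {"name": "response_z"}},
    {"name": "Prompt text", "answer": "free-form text"},
]
problems = find_payload_problems(json.loads(sample_ontology_json), sample_annotations)
print(problems)
```

To run the same check on this notebook's data, pass `json.loads(ontology_json)` and the list of annotation dictionaries.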

## Step 3: Create the annotations payload

In [None]:
label_ndjson = []
# Attach the target data row to each annotation (the dictionaries are updated in place)
for annotation in [
    prompt_annotation_ndjson,
    response_radio_annotation_ndjson,
    response_text_annotation_ndjson,
    response_checklist_annotation_ndjson
    ]:
  annotation.update({
      "dataRow": {
          "globalKey": global_key
      }
  })
  label_ndjson.append(annotation)
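
The loop above updates the annotation dictionaries in place, which is fine for a single data row. If the same annotations should be reused for several data rows, one option is to build shallow copies instead. The sketch below is one way to do that; `attach_data_row`, `templates`, and the example global keys are hypothetical names used only for illustration.

```python
def attach_data_row(annotations, global_key):
    """Return new annotation dicts that each reference the given global key."""
    return [
        {**annotation, "dataRow": {"globalKey": global_key}}
        for annotation in annotations
    ]


templates = [
    {"name": "response_radio", "answer": {"name": "response_a"}},
]

# "example-key-1" and "example-key-2" are placeholder global keys.
payload = attach_data_row(templates, "example-key-1") + attach_data_row(templates, "example-key-2")
print(payload)
```

Because the copies are shallow, the nested `answer` values are still shared between the template and the payload entries; only the top-level `dataRow` field differs.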

## Step 4: Upload annotations to a project as pre-labels or complete labels

In [None]:
project = client.get_project(project_id=project_id)

#### Model Assisted Labeling (MAL)

In [None]:
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"mal_job-{uuid.uuid4()}",
    predictions=label_ndjson)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)

#### Label Import

In [None]:
upload_job = lb.LabelImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"label_import_job-{uuid.uuid4()}",
    labels=label_ndjson)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)
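
For larger uploads, printing every status entry becomes hard to read. The sketch below summarizes a list of status entries, assuming each entry is a dictionary with a `"status"` key and an optional `"errors"` list. That shape is an assumption and the sample entries are made up, so inspect the actual `upload_job.statuses` output before relying on it. `summarize_statuses` is a hypothetical helper, not part of the Labelbox SDK.

```python
from collections import Counter


def summarize_statuses(statuses):
    """Count entries per status and collect those that are not successful."""
    counts = Counter(entry.get("status", "UNKNOWN") for entry in statuses)
    failures = [entry for entry in statuses if entry.get("status") != "SUCCESS"]
    return counts, failures


# Hypothetical entries; the real shape of `upload_job.statuses` may differ.
sample_statuses = [
    {"uuid": "a1", "status": "SUCCESS"},
    {"uuid": "b2", "status": "FAILURE",
     "errors": [{"name": "ValidationError", "message": "example"}]},
]

counts, failures = summarize_statuses(sample_statuses)
print(dict(counts))
for entry in failures:
    print(entry["uuid"], entry.get("errors"))
```

With a real job, the call would be `summarize_statuses(upload_job.statuses)` after `wait_until_done()` returns.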