In [2]:
from utils import chat_completion_request

# Lesson: Structured output

- Structured output is data generated in a specific, predefined format such as JSON or XML.

- It often carries type information for more precise data handling, described with tools such as JSON Schema, Pydantic models, or TypeScript types.
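To make "type information" concrete, here is a minimal sketch using only the standard library. The `WeatherReport` record, its field names, and the `from_dict` helper are hypothetical and exist only for illustration. A library such as Pydantic does this kind of checking far more thoroughly.

```python
from dataclasses import dataclass, fields


@dataclass
class WeatherReport:
    """Hypothetical typed record; the field names are illustrative."""
    city: str
    temperature_c: float
    condition: str


def from_dict(data: dict) -> WeatherReport:
    """Build a WeatherReport, rejecting missing keys and wrong types."""
    kwargs = {}
    for f in fields(WeatherReport):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        # JSON does not distinguish 20 from 20.0, so accept ints for float fields.
        if f.type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name} should be {f.type.__name__}")
        kwargs[f.name] = value
    return WeatherReport(**kwargs)


report = from_dict({"city": "Paris", "temperature_c": 20, "condition": "Sunny"})
print(report)
```

With a typed record, a value such as `"20°C"` in a numeric field is rejected at the boundary instead of surfacing later as a bug.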

There are several ways to get structured output from LLMs. We are going to discuss three in this class:

1. Prompting
2. Domain Specific Language (DSL)
3. Model Fine-tuning

## 1. Prompting

In [3]:
# Example messages for weather information retrieval in JSON format
messages = [
    {"role": "system", "content": "You are a model that provides weather information. Respond in JSON format with keys: 'city', 'temperature', 'condition', and 'humidity'."},
    {"role": "user", "content": "What is the weather like in Paris today?"}
]

# Example call to the chat_completion_request function
weather_response = chat_completion_request(messages=messages)
print(weather_response)

INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


{
  "city": "Paris",
  "temperature": "20°C",
  "condition": "Sunny",
  "humidity": "50%"
}


In [4]:
messages = [
    {"role": "system", "content": "You are a model that performs sentiment analysis. Respond in XML format with elements <sentiment> and <score>."},
    {"role": "user", "content": "Analyze the sentiment of this statement: 'I love sunny days but hate the extreme heat.'"}
]

# Example call to the chat_completion_request function
sentiment_response = chat_completion_request(messages=messages)
print(sentiment_response)

INFO:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


<sentiment>negative</sentiment>
<score>-0.6</score>
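
Note that the reply above is two sibling elements without a single enclosing root, so a strict XML parser rejects it as a document. A minimal sketch of one workaround, assuming the reply has exactly this shape: wrap it in a synthetic root element (the name `response` is arbitrary) before parsing with the standard library.

```python
import xml.etree.ElementTree as ET

# The reply printed above: two sibling elements with no enclosing root.
raw_reply = "<sentiment>negative</sentiment>\n<score>-0.6</score>"

# Wrap the fragment in a synthetic root so the parser accepts it.
root = ET.fromstring(f"<response>{raw_reply}</response>")

sentiment = root.findtext("sentiment")
score = float(root.findtext("score"))
print(sentiment, score)
```

Asking for a root element in the system prompt would avoid the wrapping step, but the prompt alone does not guarantee that the model complies.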


### Limitations of Prompting

1. Dependence on Model Understanding: The output is only as good as the model's interpretation of the prompt. A misread prompt yields incorrect or irrelevant output.

2. Need for Precise Prompting: Vague or poorly structured prompts can produce output that doesn't match the intended structure or content, so crafting them takes skill and experience.

3. Inconsistency in Responses: Responses often vary between calls, especially for complex structures or for tasks the model has seen little of in training, which makes results inconsistent and unpredictable.
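
Because of these limitations, code that consumes prompted output usually validates it before use. The sketch below checks the weather reply from earlier and retries on failure. `parse_weather_reply` and `ask_until_valid` are hypothetical helpers, and `ask` stands for any zero-argument callable that returns the reply text. In this notebook it could wrap `chat_completion_request(messages=messages)`, assuming that function returns a string.

```python
import json

REQUIRED_KEYS = {"city", "temperature", "condition", "humidity"}


def parse_weather_reply(reply: str) -> dict:
    """Parse a model reply as JSON and check that the requested keys exist."""
    data = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def ask_until_valid(ask, max_attempts: int = 3) -> dict:
    """Call ask() until its reply validates, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_weather_reply(ask())
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"no valid reply after {max_attempts} attempts") from last_error


# Stand-in for the model: the first reply is prose, the second is valid JSON.
replies = iter([
    "Sure! The weather in Paris is sunny.",
    '{"city": "Paris", "temperature": "20°C", "condition": "Sunny", "humidity": "50%"}',
])
result = ask_until_valid(lambda: next(replies))
print(result["city"])
```

Retrying costs extra calls and still offers no guarantee, which is part of the motivation for the approaches in the next sections.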

## 2. Domain Specific Language (DSL)

A DSL-based approach sits a level above raw prompting: the user declares the desired output format, for example as a template, a regular expression, or a Pydantic model, and the library steers generation to match it.
Some popular examples of this approach are:

- Microsoft's Guidance (https://github.com/guidance-ai/guidance)
- Outlines (https://github.com/outlines-dev/outlines)
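
Libraries of this kind typically work by restricting which tokens the model may produce at each step, so the output cannot leave the declared format. The toy below illustrates that idea only. The vocabulary, scores, and `constrained_choice` function are made up, and this is not how Guidance or Outlines are implemented internally.

```python
import math

# Toy vocabulary and made-up next-token scores (logits); all values are illustrative.
vocab = ["Technology", "Science", "banana", "Art", "maybe"]
logits = [1.2, 2.0, 3.5, 0.3, 2.8]

# The declared output format allows only these labels.
allowed = {"Technology", "Science", "Art"}


def constrained_choice(vocab, logits, allowed):
    """Pick the highest-scoring token among those the format allows."""
    masked = [
        score if token in allowed else -math.inf
        for token, score in zip(vocab, logits)
    ]
    best = max(range(len(vocab)), key=lambda i: masked[i])
    return vocab[best]


print(constrained_choice(vocab, logits, allowed))  # Science
```

Without the mask, the highest-scoring token would be `banana`, which is not a valid label. With the mask, an invalid label can never be selected.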

In [3]:
# Run the code on Colab here:

# https://colab.research.google.com/drive/1PH_keLGxyDf0NJga4aPeynkPHFWXrCjb?authuser=1#scrollTo=0_UlQJmrE3sR

In [None]:
from enum import Enum

import outlines
import torch
from pydantic import BaseModel, constr

# Enumeration of the allowed categories
class CategoryEnum(str, Enum):
    technology = "Technology"
    science = "Science"
    art = "Art"
    history = "History"
    literature = "Literature"

# Pydantic model describing the desired JSON output
class Categorization(BaseModel):
    text: constr(max_length=200)
    category: CategoryEnum
    confidence_score: float

# Load the chosen model (device="cuda" requires a CUDA-capable GPU)
model = outlines.models.transformers("mistralai/Mistral-7B-v0.1", device="cuda")

# Construct guided sequence generator
generator = outlines.generate.json(model, Categorization, max_tokens=100)

# Prepare the input text for categorization
input_text = "The discovery of the Higgs boson was a monumental step forward in particle physics."

# Seed a random number generator so that sampling is reproducible
rng = torch.Generator(device="cuda")
rng.manual_seed(12345)

# Generate the categorization output
sequence = generator(f"Categorize this text: {input_text}", rng=rng)
print(sequence)


## 3. Model Fine-tuning

Fine-tuning continues training a pre-trained model on examples of the desired inputs and outputs. Its main benefits:

1. Task-Specific Adaptation: Fine-tuning allows the model to adapt to specific tasks or domains, enhancing its performance on targeted applications.

2. Efficiency in Learning: Since the model is already pre-trained on a large dataset, specializing it for a new task requires far less data and time than training a model from scratch.

3. Improved Accuracy: Fine-tuned models often generate responses that follow a specific schema or function signature more accurately.
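
In practice, the examples used for fine-tuning toward a schema pair each input with the exact structured reply the model should learn to produce. The sketch below writes such pairs as JSON Lines with one `messages` list per line, the layout used for chat-model fine-tuning on the OpenAI API. The example data, `SYSTEM_PROMPT`, and `to_jsonl` are invented for illustration, and other providers may expect a different layout.

```python
import json

SYSTEM_PROMPT = "You are a model that provides weather information. Respond in JSON format."

# Invented training pairs: (user question, desired structured reply).
examples = [
    ("What is the weather like in Paris today?",
     {"city": "Paris", "temperature": "20°C", "condition": "Sunny", "humidity": "50%"}),
    ("How is the weather in Oslo?",
     {"city": "Oslo", "temperature": "5°C", "condition": "Cloudy", "humidity": "80%"}),
]


def to_jsonl(examples) -> str:
    """Serialize (question, answer) pairs, one chat transcript per line."""
    lines = []
    for question, answer in examples:
        record = {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
                # The target reply is itself a JSON string in the desired schema.
                {"role": "assistant", "content": json.dumps(answer, ensure_ascii=False)},
            ]
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


training_data = to_jsonl(examples)
print(training_data)
```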

### Limitations

1. Resource Intensive: Fine-tuning requires additional computational resources and data, which can be a limitation for smaller organizations or individual developers.

2. Limited Flexibility: Adapting the model to new input or output schemas can necessitate retraining, making it less flexible to rapid changes or diverse requirements.

3. Provider Specific: Fine-tuned models are often specific to the provider's ecosystem, reducing portability and limiting their application across different platforms or LLMs.