# 4. Chapter: Structured Outputs

Structured outputs are useful when a model's response must fit a specific format so that it is easy to process, analyze, or store. Instead of a blob of free text, you get well-organized data that your application can use directly.

Here are some common types of structured outputs that LLMs can generate:
- **String**: Simple text, like "Hello" or "World".
- **Integer**: Whole numbers, such as `123` or `-456`.
- **Float**: Numbers with decimals, like `3.14` or `-0.001`.
- **Boolean**: True/False values, useful for binary decisions.
- **Binary**: A string of bits, like `10101010`.
- **Date**: Calendar dates in formats like `YYYY-MM-DD`.
- **Timestamp**: Specific points in time, like `2023-10-05 14:48:00`.
- **Array**: Lists of elements, which can be strings, numbers, structs, or even other arrays.
- **Struct**: Collections of key-value pairs, such as a JSON object.
- **Object**: Typed instances with specified attributes, such as a validated Pydantic model.

By generating structured outputs, LLMs can:
- **Integrate with systems**: Easily slot into existing databases, APIs, or other software.
- **Enable automation**: Power automated reporting, decision-making engines, and other workflows.
- **Enhance accuracy**: Ensure data consistency and integrity.

In [1]:
import re
import dirtyjson as json
from typing import Any
from pydantic import BaseModel, EmailStr, ValidationError
from datetime import datetime
from language_models.models.llm import OpenAILanguageModel, ChatMessage, ChatMessageRole
from language_models.proxy_client import ProxyClient
from language_models.settings import settings

In [2]:
proxy_client = ProxyClient(
    client_id=settings.CLIENT_ID,
    client_secret=settings.CLIENT_SECRET,
    auth_url=settings.AUTH_URL,
    api_base=settings.API_BASE,
)

In [3]:
llm = OpenAILanguageModel(
    proxy_client=proxy_client,
    model="gpt-4",
    max_tokens=500,
    temperature=0.2,
)

String output type.

In [4]:
system_prompt = """You are an AI assistant designed to help users with a variety of tasks.

### Instructions ###

Your goal is to solve the problem you will be provided with

You should respond with:
<response to the prompt>

Your <response to the prompt> should be the final answer to the user's query and must be a string"""

prompt = "What is the capital city of France?"

output = llm.get_completion([
    ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
    ChatMessage(role=ChatMessageRole.USER, content=prompt)
])

print(output)

The capital city of France is Paris.


Integer output type.

In [5]:
system_prompt = """You are an AI assistant designed to help users with a variety of tasks.

### Instructions ###

Your goal is to solve the problem you will be provided with

You should respond with:
<response to the prompt>

Your <response to the prompt> should be the final answer to the user's query and must be an integer"""

prompt = "How many continents are there on Earth?"

output = llm.get_completion([
    ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
    ChatMessage(role=ChatMessageRole.USER, content=prompt)
])

print(output)

7


In [6]:
try:
    output = int(output)
    print(output)
except ValueError as error:
    print(error)

7
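
The `int(output)` call above fails as soon as the model wraps the number in a sentence. A minimal sketch of a more tolerant parser follows. `parse_int` is a hypothetical helper, not part of the notebook's library, and the rule of accepting exactly one integer token is an assumption you may want to tighten.

```python
import re


def parse_int(text: str) -> int:
    """Parse an integer from a model reply, tolerating surrounding prose."""
    cleaned = text.strip()
    try:
        # Fast path: the reply is nothing but the number.
        return int(cleaned)
    except ValueError:
        pass
    # Fallback: accept the reply only if it contains exactly one integer token.
    matches = re.findall(r"[-+]?\d+", cleaned)
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one integer, found {len(matches)}: {text!r}"
        )
    return int(matches[0])


print(parse_int("7"))
print(parse_int("There are 7 continents."))
```

Replies that contain several numbers, such as `3.14` or `4,401`, are rejected rather than guessed at, which keeps the fallback from silently returning the wrong value.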


Float output type.

In [7]:
system_prompt = """You are an AI assistant designed to help users with a variety of tasks.

### Instructions ###

Your goal is to solve the problem you will be provided with

You should respond with:
<response to the prompt>

Your <response to the prompt> should be the final answer to the user's query and must be a float"""

prompt = "What is the value of Pi up to two decimal places?"

output = llm.get_completion([
    ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
    ChatMessage(role=ChatMessageRole.USER, content=prompt)
])

print(output)

3.14


In [8]:
try:
    output = float(output)
    print(output)
except ValueError as error:
    print(error)

3.14


Boolean output type.

In [9]:
system_prompt = """You are an AI assistant designed to help users with a variety of tasks.

### Instructions ###

Your goal is to solve the problem you will be provided with

You should respond with:
<response to the prompt>

Your <response to the prompt> should be the final answer to the user's query and must be a boolean (true, false)"""

prompt = "Is the number 5 greater than 3?"

output = llm.get_completion([
    ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
    ChatMessage(role=ChatMessageRole.USER, content=prompt)
])

print(output)

True


In [10]:
normalized = output.strip().lower()  # tolerate stray whitespace and capitalization
if normalized == "true":
    print(True)
elif normalized == "false":
    print(False)
else:
    print("Could not parse output")

True
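
The check above only prints a value. To use the result downstream, it helps to return a real `bool` and raise on anything else. The following is a small sketch under that assumption; `parse_bool` is a hypothetical name, and stripping a trailing period is a guess at a common model habit rather than documented behavior.

```python
def parse_bool(text: str) -> bool:
    """Convert a 'true'/'false' reply into a Python bool."""
    # Drop surrounding whitespace and full stops, then ignore case.
    normalized = text.strip().strip(".").lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"Could not parse boolean from {text!r}")


print(parse_bool("True"))
print(parse_bool(" false.\n"))
```

Raising `ValueError` keeps the helper consistent with `int()` and `float()`, so all scalar parsers can share one `except ValueError` handler.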


Binary output type.

In [11]:
system_prompt = """You are an AI assistant designed to help users with a variety of tasks.

### Instructions ###

Your goal is to solve the problem you will be provided with

You should respond with:
<response to the prompt>

Your <response to the prompt> should be the final answer to the user's query and must be a binary string"""

prompt = "What is the binary representation of the number 15?"

output = llm.get_completion([
    ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
    ChatMessage(role=ChatMessageRole.USER, content=prompt)
])

print(output)

1111


In [12]:
if re.fullmatch(r"[01]+", output):
    print(output)
else:
    print("Could not parse output")

1111
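
Once the reply is confirmed to be a bit string, it can be turned into a number with the built-in `int(text, 2)`. A brief sketch, where `parse_binary` is a hypothetical helper name:

```python
import re


def parse_binary(text: str) -> int:
    """Validate a bit string such as '1111' and return its integer value."""
    cleaned = text.strip()
    if not re.fullmatch(r"[01]+", cleaned):
        raise ValueError(f"Not a binary string: {text!r}")
    # Base 2 conversion; int() would also accept a '0b' prefix, which the
    # regex above deliberately rejects.
    return int(cleaned, 2)


value = parse_binary("1111")
print(value)               # integer value of the bit string
print(format(value, "b"))  # back to a bit string
```

Converting back with `format(value, "b")` gives a quick round-trip check against the model's answer.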


Date output type.

In [13]:
system_prompt = """You are an AI assistant designed to help users with a variety of tasks.

Extract the date from the user's input text.

### Instructions ###

Your goal is to solve the problem you will be provided with

You should respond with:
<response to the prompt>

Your <response to the prompt> should be the final answer to the user's query and must be a date with the format: %Y-%m-%d"""

prompt = "We are excited to announce that our annual company retreat will be held on April 15, 2024. This event will be a great opportunity for team building and strategic planning."

output = llm.get_completion([
    ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
    ChatMessage(role=ChatMessageRole.USER, content=prompt)
])

print(output)

2024-04-15


In [14]:
try:
    output = datetime.strptime(output, "%Y-%m-%d")
    print(output)
except ValueError as error:
    print(error)

2024-04-15 00:00:00
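
Note that `datetime.strptime` returns a `datetime`, which is why the printed value carries a `00:00:00` time component. If only the calendar date is wanted, calling `.date()` on the result drops it. A minimal sketch, with `parse_date` as a hypothetical helper name:

```python
from datetime import date, datetime


def parse_date(text: str) -> date:
    """Parse a YYYY-MM-DD reply into a datetime.date."""
    # strptime raises ValueError if the reply does not match the format.
    return datetime.strptime(text.strip(), "%Y-%m-%d").date()


print(parse_date("2024-04-15"))
```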


Timestamp output type.

In [15]:
system_prompt = """You are an AI assistant designed to help users with a variety of tasks.

Extract the date from the user's input text.

### Instructions ###

Your goal is to solve the problem you will be provided with

You should respond with:
<response to the prompt>

Your <response to the prompt> should be the final answer to the user's query and must be a timestamp with the format: %Y-%m-%d %H:%M:%S"""

prompt = "We are excited to announce that our annual company retreat will be held on April 15, 2024. This event will be a great opportunity for team building and strategic planning."

output = llm.get_completion([
    ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
    ChatMessage(role=ChatMessageRole.USER, content=prompt)
])

print(output)

2024-04-15 00:00:00


In [16]:
try:
    output = datetime.strptime(output, "%Y-%m-%d %H:%M:%S")
    print(output)
except ValueError as error:
    print(error)

2024-04-15 00:00:00
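
The parsed timestamp is timezone-naive, which can be ambiguous once it is stored or compared with other systems. The sketch below attaches a timezone after parsing. Treating the value as UTC is purely an assumption for illustration (the source text names no timezone), and `parse_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(text: str, tz: timezone = timezone.utc) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS' reply and attach a timezone."""
    naive = datetime.strptime(text.strip(), TIMESTAMP_FORMAT)
    # Assumption: the reply is expressed in `tz`; no conversion is performed.
    return naive.replace(tzinfo=tz)


stamp = parse_timestamp("2024-04-15 00:00:00")
print(stamp.isoformat())
```

An ISO 8601 string with an explicit offset is usually safer to pass to databases and APIs than the naive form.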


Array output type.

In [17]:
system_prompt = """You are an AI assistant designed to help users with a variety of tasks.

Extract all numbers from the user's input text.

### Instructions ###

Your goal is to solve the problem you will be provided with

You should respond with:
<response to the prompt>

Your <response to the prompt> should be the final answer to the user's query and must be an array of integers"""

prompt = """Last weekend, six of us went on a 15-kilometer hike, starting at 7 AM.

By noon, we had covered 10 kilometers and reached Mount Elbert's 4,401-meter summit by 2 PM, with a temperature of 12°C.

We camped 5 kilometers away by 6 PM with 12 others and returned home by 5 PM the next day."""

output = llm.get_completion([
    ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
    ChatMessage(role=ChatMessageRole.USER, content=prompt)
])

print(output)

[6, 15, 7, 10, 4401, 2, 12, 5, 6, 12, 5]


In [18]:
try:
    output = json.loads(output)
    output = list(output)
    if all(isinstance(entry, int) for entry in output):
        print(output)
    else:
        raise ValueError("Could not parse output")
except Exception as error:
    print(error)

[6, 15, 7, 10, 4401, 2, 12, 5, 6, 12, 5]
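
Two edge cases are worth keeping in mind with the check above: `bool` is a subclass of `int` in Python, so `true` would pass an `isinstance(entry, int)` test, and calling `list()` on a parsed JSON object would silently return its keys. The following sketch is a stricter variant that uses the standard-library `json` module instead of `dirtyjson`; `parse_int_array` is a hypothetical helper.

```python
import json


def parse_int_array(text: str) -> list[int]:
    """Parse a JSON array of integers, rejecting anything else."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, list):
        raise ValueError(f"Expected a JSON array, got {type(data).__name__}")
    for entry in data:
        # bool is a subclass of int, so it has to be excluded explicitly.
        if isinstance(entry, bool) or not isinstance(entry, int):
            raise ValueError(f"Non-integer entry: {entry!r}")
    return data


print(parse_int_array("[6, 15, 7, 10, 4401]"))
```

The trade-off is that strict parsing rejects slightly malformed replies that `dirtyjson` would accept, so it suits pipelines where a failed parse can trigger a retry.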


Struct output type.

In [19]:
def get_schema_from_args(args: dict[str, Any]) -> dict[str, Any]:
    schema = {}
    for field, details in args.items():
        field_type = details.get("type")
        items_type = details.get("items", {}).get("type")
        format_type = details.get("items", {}).get("format") or details.get("format")
        if field_type == "string":
            if format_type == "date":
                schema[field] = "<date>"
            elif format_type == "date-time":
                schema[field] = "<timestamp>"
            elif format_type == "email":
                schema[field] = "<email>"
            else:
                schema[field] = "<string>"
        elif field_type == "integer":
            schema[field] = "<integer>"
        elif field_type == "number":
            schema[field] = "<float>"
        elif field_type == "boolean":
            schema[field] = "<true or false>"
        elif field_type == "array":
            if items_type == "string":
                if format_type == "date":
                    schema[field] = ["<date>"]
                elif format_type == "date-time":
                    schema[field] = ["<timestamp>"]
                elif format_type == "email":
                    schema[field] = ["<email>"]
                else:
                    schema[field] = ["<string>"]
            elif items_type == "integer":
                schema[field] = ["<integer>"]
            elif items_type == "number":
                schema[field] = ["<float>"]
            else:
                schema[field] = []
        else:
            schema[field] = None
    return schema

In [20]:
class Email(BaseModel):
    to: list[EmailStr]
    subject: str
    body: str

args = Email.model_json_schema()["properties"]
output_schema = get_schema_from_args(args)
print(output_schema)

{'to': ['<email>'], 'subject': '<string>', 'body': '<string>'}
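
The printed schema is a Python `repr` with single quotes, which is not valid JSON, and a model shown that form may mirror it in its reply. One option, sketched here, is to serialize the schema with the standard-library `json.dumps` before placing it in the prompt. The alias `std_json` is only there because the notebook binds the name `json` to `dirtyjson`; the instruction text is an illustrative assumption.

```python
import json as std_json  # alias: the notebook uses `json` for dirtyjson

# Same structure as the schema printed above.
output_schema = {"to": ["<email>"], "subject": "<string>", "body": "<string>"}

# Double-quoted, valid JSON that the model can imitate directly.
schema_text = std_json.dumps(output_schema)
print(schema_text)

# Illustrative instruction line for the system prompt.
instruction = f"Your response must be a JSON object with the keys: {schema_text}"
print(instruction)
```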


In [21]:
system_prompt = f"""You are an AI assistant designed to help users with a variety of tasks.

Write an email.

### Instructions ###

Your goal is to solve the problem you will be provided with

You should respond with:
<response to the prompt>

Your <response to the prompt> should be the final answer to the user's query and must be a JSON object with the keys: {output_schema}"""

prompt = """Send the email to:
1. johndoe@example.com
2. janedoe@example.com
3. alice.smith@company.org

Here is what we did: Weekend Hiking Trip Recap

Here is some context:
Last weekend, six of us went on a 15-kilometer hike, starting at 7 AM.
By noon, we had covered 10 kilometers and reached Mount Elbert's 4,401-meter summit by 2 PM, with a temperature of 12°C.
We camped 5 kilometers away by 6 PM with 12 others and returned home by 5 PM the next day."""

output = llm.get_completion([
    ChatMessage(role=ChatMessageRole.SYSTEM, content=system_prompt),
    ChatMessage(role=ChatMessageRole.USER, content=prompt)
])

print(output)

{'to': ['johndoe@example.com', 'janedoe@example.com', 'alice.smith@company.org'], 'subject': 'Weekend Hiking Trip Recap', 'body': "Hello,

I hope this email finds you well. I'm writing to provide a recap of our exciting hiking trip last weekend.

Six of us embarked on a 15-kilometer hike early in the morning at 7 AM. By noon, we had already covered 10 kilometers of our journey. We reached the summit of Mount Elbert, standing tall at 4,401 meters, by 2 PM. The weather was quite pleasant with a temperature of 12°C.

Later, we set up our camp 5 kilometers away from the summit by 6 PM. We were joined by 12 other fellow hikers, making the evening even more enjoyable.

The next day, we packed up and returned home by 5 PM, bringing our adventurous weekend to a close.

Looking forward to more such trips in the future!

Best regards,"}


In [22]:
try:
    output = json.loads(output)
    output = Email.model_validate(output)
    print(output.model_dump())
except ValidationError as error:
    print(error)

{'to': ['johndoe@example.com', 'janedoe@example.com', 'alice.smith@company.org'], 'subject': 'Weekend Hiking Trip Recap', 'body': "Hello,\n\nI hope this email finds you well. I'm writing to provide a recap of our exciting hiking trip last weekend.\n\nSix of us embarked on a 15-kilometer hike early in the morning at 7 AM. By noon, we had already covered 10 kilometers of our journey. We reached the summit of Mount Elbert, standing tall at 4,401 meters, by 2 PM. The weather was quite pleasant with a temperature of 12°C.\n\nLater, we set up our camp 5 kilometers away from the summit by 6 PM. We were joined by 12 other fellow hikers, making the evening even more enjoyable.\n\nThe next day, we packed up and returned home by 5 PM, bringing our adventurous weekend to a close.\n\nLooking forward to more such trips in the future!\n\nBest regards,"}
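
Validation tells you when a reply is unusable, but not what to do next. A common pattern is to send the error back to the model and ask again. The sketch below shows the control flow only: `complete` is a hypothetical stand-in for a call such as `llm.get_completion`, messages are plain strings rather than `ChatMessage` objects, and the feedback wording is an assumption.

```python
import json
from typing import Any, Callable


def complete_with_retries(
    complete: Callable[[list[str]], str],
    parse: Callable[[str], Any],
    prompt: str,
    max_attempts: int = 3,
) -> Any:
    """Call the model until `parse` accepts the reply or attempts run out."""
    messages = [prompt]
    last_error = None
    for _ in range(max_attempts):
        raw = complete(messages)
        try:
            return parse(raw)
        except ValueError as error:
            last_error = error
            # Feed the failed reply and the error back so the model can correct it.
            feedback = f"Your reply could not be parsed: {error}. Reply with valid output only."
            messages = messages + [raw, feedback]
    raise ValueError(f"No valid output after {max_attempts} attempts") from last_error


# Fake model for demonstration: the first reply is invalid, the second is valid JSON.
replies = iter(["not json", '{"subject": "Weekend Hiking Trip Recap"}'])
result = complete_with_retries(lambda msgs: next(replies), json.loads, "Write an email.")
print(result)
```

Any parser that raises `ValueError` on failure can be passed as `parse`; Pydantic v2's `ValidationError` is a `ValueError` subclass, so a function that wraps `Email.model_validate` should also fit.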


Object output type.

In [23]:
try:
    output = Email.model_validate(output)
    print(f"To: {output.to}")
    print(f"Subject: {output.subject}")
    print(f"Body: {output.body}")
except ValidationError as error:
    print(error)

To: ['johndoe@example.com', 'janedoe@example.com', 'alice.smith@company.org']
Subject: Weekend Hiking Trip Recap
Body: Hello,

I hope this email finds you well. I'm writing to provide a recap of our exciting hiking trip last weekend.

Six of us embarked on a 15-kilometer hike early in the morning at 7 AM. By noon, we had already covered 10 kilometers of our journey. We reached the summit of Mount Elbert, standing tall at 4,401 meters, by 2 PM. The weather was quite pleasant with a temperature of 12°C.

Later, we set up our camp 5 kilometers away from the summit by 6 PM. We were joined by 12 other fellow hikers, making the evening even more enjoyable.

The next day, we packed up and returned home by 5 PM, bringing our adventurous weekend to a close.

Looking forward to more such trips in the future!

Best regards,
