In [1]:
import os
import instructor
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from groq import Groq
from pprint import pprint

model = "llama-3.3-70b-versatile"

# Load the Groq API key from .env file
load_dotenv()
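GROQ_API_KEY = os.environ.get("GROQ_API_KEY")
if not GROQ_API_KEY:
    # Fail early with a clear message instead of a TypeError on the slice below
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
print(f"GROQ API Key exists and begins {GROQ_API_KEY[:14]}...")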

GROQ API Key exists and begins gsk_11hFN1EMfj...


In [2]:
# Describe the desired output schema using pydantic models
class UserInfo(BaseModel):
    name: str
    age: int
    email: str


# The text to extract data from
text = """
John Doe, a 35-year-old software engineer from New York, has been working with large language models for several years.
His email address is johndoe@example.com.
"""

# Patch the Groq client with instructor; this is where the magic happens!
client = instructor.from_groq(Groq(), mode=instructor.Mode.JSON)

# Call the API
user_info = client.chat.completions.create(
    model=model,
    response_model=UserInfo,  # Specify the response model
    messages=[
        {
            "role": "system",
            "content": "Your job is to extract user information from the given text.",
        },
        {"role": "user", "content": text},
    ],
    temperature=0.65,
)
print(type(user_info))
print(f"Name: {user_info.name}")
print(f"Age: {user_info.age}")
print(f"Email: {user_info.email}")

<class '__main__.UserInfo'>
Name: John Doe
Age: 35
Email: johndoe@example.com


In the example above, we defined a simple pydantic model `UserInfo` that specifies a person's name (a string), age (an integer), and email (a string). The `instructor` library validates the Groq model's output against this schema and returns a `UserInfo` instance rather than raw text, as the printed type shows. This removes the need for manual parsing and validation and reduces the likelihood of malformed data creeping into your pipeline.
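
To make the validation step concrete, here is a minimal sketch of the pydantic check on its own, with no API call. The two JSON strings are made-up stand-ins for model output, and the snippet assumes pydantic v2 (`model_validate_json`). How `instructor` wires this check into the request is not shown.

```python
from pydantic import BaseModel, ValidationError


class UserInfo(BaseModel):
    name: str
    age: int
    email: str


# Hypothetical raw model outputs: one that fits the schema, one that does not
good_output = '{"name": "John Doe", "age": 35, "email": "johndoe@example.com"}'
bad_output = '{"name": "John Doe", "age": "thirty-five"}'

# A conforming payload becomes a typed object
user = UserInfo.model_validate_json(good_output)
print(type(user.age), user.age)

try:
    UserInfo.model_validate_json(bad_output)
except ValidationError as exc:
    # Collect the fields that failed: age has the wrong type, email is missing
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
    print("Rejected fields:", failed_fields)
```

The second payload never reaches your code as a `UserInfo`; it fails at the schema boundary, which is the behavior the prose above relies on.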

## 2. A More Serious Use Case: Generating Synthetic Data

Imagine you are designing a weather agent capable of calling functions (tools). This agent is given a `get_weather_info` tool to retrieve the latest weather information about a location. The JSON schema for this tool is provided here:

```json
{
    "name": "get_weather_info",
    "description": "Get the weather information for any location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location for which we want to get the weather information (e.g., New York)" 
            }
        },
        "required": ["location"]
    }
}
```
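
To show what conforming to this schema means in practice, here is a minimal sketch that checks a set of call arguments against it by hand. `check_arguments` is a hypothetical helper written for this illustration; it only understands `required` and the `string` type, whereas a full JSON Schema validator covers far more.

```python
tool_schema = {
    "name": "get_weather_info",
    "description": "Get the weather information for any location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location for which we want to get the weather information (e.g., New York)",
            }
        },
        "required": ["location"],
    },
}

# Map JSON Schema type names to Python types (only what this schema needs)
JSON_TYPES = {"string": str}


def check_arguments(arguments: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments fit."""
    params = schema["parameters"]
    problems = []
    # Every required parameter must be present
    for key in params.get("required", []):
        if key not in arguments:
            problems.append(f"missing required parameter: {key}")
    # Every supplied parameter must be declared and have the declared type
    for key, value in arguments.items():
        spec = params["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected parameter: {key}")
        elif not isinstance(value, JSON_TYPES[spec["type"]]):
            problems.append(f"wrong type for {key}: expected {spec['type']}")
    return problems


print(check_arguments({"location": "New York"}, tool_schema))
print(check_arguments({"city": "New York"}, tool_schema))
print(check_arguments({"location": 10001}, tool_schema))
```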

Our goal is to create a structured dataset of realistic examples that simulate how a user might request weather information in various scenarios. We want to use a large language model (LLM) to generate these examples for us and use them as an evaluation set to test our agent's capabilities. Without such an evaluation, we lack a way to understand the effects of our prompt adjustments. These examples will not only help us evaluate the agent's ability to use the `get_weather_info` tool correctly but also make it easy to detect if any prompt changes have negative effects.

Now, let's use the `instructor` library with Groq to generate synthetic examples for our weather agent.

### Defining the Task and Schema

To generate these examples, we need to write a prompt that instructs the model to create scenarios where an agent would use the `get_weather_info` tool. We can use the following system prompt for this task:

In [3]:
prompt = """
I am designing a weather agent. This agent can talk to the user and also fetch latest weather information.
It has access to the `get_weather_info` tool with the following JSON schema:
{json_schema}

I want you to write some examples for `get_weather_info` and see if this functionality works correctly and can handle all the cases. 
Now given the information so far and the JSON schema of the provided tool, write {num} examples.
Make sure each example is varied enough to cover common ways of requesting for this functionality.
Make sure you fill the function parameters with the correct types when generating the output examples. 
Make sure your output is valid JSON.
"""

We now need to specify the structure of the output. For this task, I want the output to include the example text, the tool to call, and the tool's parameters, something like the following:

```json
{
    "examples": [
        {
            "input_text": "Get the weather information for San Francisco.",
            "tool_name": "get_weather_info",
            "tool_parameters": "{\"location\":\"San Francisco\"}"
        },
        ...
    ]
}
```
We can easily translate this structure into a Pydantic model like the following:

In [4]:
class Example(BaseModel):
    input_text: str = Field(description="The example text")
    tool_name: str = Field(description="The tool name to call for this example")
    tool_parameters: str = Field(
        description="An object containing the key-value pairs for the parameters of this tool as a JSON serializable STRING, make sure it is valid JSON and parameter values are of the correct type according to the tool schema"
    )


class ResponseModel(BaseModel):
    examples: list[Example]
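
Because `tool_parameters` is declared as a plain `str`, pydantic accepts any string there, valid JSON or not. One possible way to tighten this is sketched below, assuming pydantic v2's `field_validator`. `CheckedExample` and `must_be_json_object` are hypothetical names used only for this illustration, not part of the original notebook.

```python
import json

from pydantic import BaseModel, ValidationError, field_validator


class CheckedExample(BaseModel):
    input_text: str
    tool_name: str
    tool_parameters: str

    @field_validator("tool_parameters")
    @classmethod
    def must_be_json_object(cls, value: str) -> str:
        # Reject strings that do not decode to a JSON object
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError as exc:
            raise ValueError(f"tool_parameters is not valid JSON: {exc}") from exc
        if not isinstance(parsed, dict):
            raise ValueError("tool_parameters must encode a JSON object")
        return value


ok = CheckedExample(
    input_text="What is the weather in Paris?",
    tool_name="get_weather_info",
    tool_parameters='{"location": "Paris"}',
)
print(json.loads(ok.tool_parameters)["location"])

try:
    CheckedExample(
        input_text="What is the weather in Paris?",
        tool_name="get_weather_info",
        tool_parameters="{location: Paris}",  # unquoted keys: not valid JSON
    )
    rejected = False
except ValidationError:
    rejected = True
print("Malformed parameters rejected:", rejected)
```

The field stays a string, so the output shape described above is unchanged; the validator only adds a check at construction time.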

### Generating the Examples
Now let's call the Groq API with our custom prompt and ask it to generate 5 examples for us:

In [5]:
import json

# The schema for get_weather_info tool
tool_schema = {
    "name": "get_weather_info",
    "description": "Get the weather information for any location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location for which we want to get the weather information (e.g., New York)",
            }
        },
        "required": ["location"],
    },
}

# Patch the Groq client with instructor; this is where the magic happens!
client = instructor.from_groq(Groq(), mode=instructor.Mode.JSON)

# Call the API with our custom prompt and ResponseModel
response = client.chat.completions.create(
    model=model,
    response_model=ResponseModel,  # Specify the response model
    messages=[
        {
            "role": "system",
            # Serialize the schema with json.dumps so the prompt contains real JSON
            # rather than the Python repr of a dict (single quotes)
            "content": prompt.format(
                json_schema=json.dumps(tool_schema, indent=4), num=5
            ),  # Pass the tool schema and number of examples to the prompt
        },
    ],
    temperature=0.65,
    max_tokens=8000,
)

print(type(response))
pprint(response.examples)

<class '__main__.ResponseModel'>
[Example(input_text='What is the weather like in New York?', tool_name='get_weather_info', tool_parameters='{"location": "New York"}'),
 Example(input_text='Get me the weather information for London', tool_name='get_weather_info', tool_parameters='{"location": "London"}'),
 Example(input_text='I want to know the weather in Paris', tool_name='get_weather_info', tool_parameters='{"location": "Paris"}'),
 Example(input_text="What's the weather like in Sydney today?", tool_name='get_weather_info', tool_parameters='{"location": "Sydney"}'),
 Example(input_text='Can you tell me the weather forecast for Tokyo?', tool_name='get_weather_info', tool_parameters='{"location": "Tokyo"}')]
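
Records in this shape can then serve as an evaluation set. The snippet below is a rough sketch of scoring an agent by comparing its tool call with the expected one. `toy_agent`, `evaluate`, and the two sample records are hypothetical stand-ins written for this illustration; a real agent would call an LLM instead of splitting strings.

```python
import json

# Hypothetical evaluation records in the same shape as the generated examples
eval_set = [
    {
        "input_text": "What is the weather like in London?",
        "tool_name": "get_weather_info",
        "tool_parameters": '{"location": "London"}',
    },
    {
        "input_text": "Get the weather information for San Francisco.",
        "tool_name": "get_weather_info",
        "tool_parameters": '{"location": "San Francisco"}',
    },
]


def toy_agent(input_text: str) -> dict:
    """Hypothetical stand-in for a real agent: treats the last word as the location."""
    location = input_text.rstrip("?.!").split()[-1]
    return {"tool_name": "get_weather_info", "tool_parameters": {"location": location}}


def evaluate(agent, examples) -> float:
    """Return the fraction of examples where the tool name and parameters both match."""
    correct = 0
    for example in examples:
        expected_params = json.loads(example["tool_parameters"])
        call = agent(example["input_text"])
        if (
            call["tool_name"] == example["tool_name"]
            and call["tool_parameters"] == expected_params
        ):
            correct += 1
    return correct / len(examples)


accuracy = evaluate(toy_agent, eval_set)
print(f"Tool-call accuracy: {accuracy:.0%}")
```

The toy agent gets the single-word city right and truncates the two-word one, which is the kind of regression an evaluation set like this is meant to surface after a prompt change.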
