### Structured Output with Groq and Instructor

While Large Language Models (LLMs) are often used to build chatbots or conversational agents, many real-world applications need something different: structured, machine-readable output rather than free-form dialogue.

Consider a typical scenario: we want to produce structured JSON data from an LLM. Python's json module can parse this data, but it does not validate data types or ensure consistency across outputs, and checking these aspects by hand is tedious and error-prone. LLMs also occasionally omit a comma or a closing brace ('}') somewhere in the generated JSON, which invalidates the whole output.
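
To make these failure modes concrete, here is a minimal sketch that uses only the standard library. The two raw strings are hand-written stand-ins for model output, and `check_output` and `EXPECTED_TYPES` are hypothetical names introduced for this illustration only.

```python
import json

# Hand-written stand-ins for model output (not real API responses).
truncated = '{"name": "John Doe", "age": 35, "email": "johndoe@example.com"'  # no closing brace
wrong_type = '{"name": "John Doe", "age": "thirty-five", "email": "johndoe@example.com"}'

# Field names and types we expect in this illustration.
EXPECTED_TYPES = {"name": str, "age": int, "email": str}


def check_output(raw: str) -> list[str]:
    """Return the problems found in a raw JSON string (empty list if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # A single missing brace or comma makes the whole string unparseable.
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            actual = type(data[field]).__name__
            problems.append(f"{field} should be {expected.__name__}, got {actual}")
    return problems


print(check_output(truncated))
print(check_output(wrong_type))
```

Even for three flat fields the manual check is already a dozen lines, and it would need to grow for nested objects, optional fields, or lists. That is the bookkeeping a schema library is meant to take over.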

#### A Very Simple Use Case
Let's dive right into how you can set up the instructor library with models powered by Groq to generate structured JSON outputs. We'll keep it simple and straightforward so you can get up and running quickly.

### Let's Dive into Code

#### Installing the Necessary Libraries
Install the required Python libraries. You'll need:
<ul>
<li>groq </li>
<li>instructor </li>
<li>python-dotenv (for loading environment variables) </li>
</ul>

In [1]:
pip install -U instructor groq python-dotenv

In [None]:
import instructor
from dotenv import load_dotenv
from pydantic import BaseModel
from groq import Groq

# Load the Groq API key from .env file
load_dotenv()

# Describe the desired output schema using pydantic models
class UserInfo(BaseModel):
    name: str
    age: int
    email: str

# The text to extract data from
text = """
John Doe, a 35-year-old software engineer from New York, has been working with large language models for several years.
His email address is johndoe@example.com.
"""

# Patch Groq() with instructor, this is where the magic happens!
client = instructor.from_groq(Groq(), mode=instructor.Mode.JSON)

# Call the API
user_info = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    response_model=UserInfo, # Specify the response model
    messages=[
        {"role": "system", "content": "Your job is to extract user information from the given text."},
        {"role": "user", "content": text}
    ],
    temperature=0.65,
)

print(f"Name: {user_info.name}")
print(f"Age: {user_info.age}")
print(f"Email: {user_info.email}")

Name: John Doe
Age: 35
Email: johndoe@example.com
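
To see what the `UserInfo` schema buys us without calling the API, the sketch below validates two hand-written JSON strings directly with Pydantic. It assumes Pydantic v2 (`model_validate_json`). The strings are illustrative stand-ins for a model reply, and this is only roughly the kind of check applied to the response, not a description of instructor's internals.

```python
from pydantic import BaseModel, ValidationError


class UserInfo(BaseModel):
    name: str
    age: int
    email: str


# Hand-written stand-ins for a model reply (no API call is made here).
good = '{"name": "John Doe", "age": 35, "email": "johndoe@example.com"}'
bad = '{"name": "John Doe", "age": "thirty-five", "email": "johndoe@example.com"}'

# A well-formed reply is parsed into a typed object.
user = UserInfo.model_validate_json(good)
print(user.age)

# A reply with a wrong type raises ValidationError instead of passing silently.
try:
    UserInfo.model_validate_json(bad)
except ValidationError as exc:
    # Each error entry records which field failed and why.
    failed_fields = [err["loc"][0] for err in exc.errors()]
    print(failed_fields)
```

Because the failure is an exception that names the offending field, the caller can log the error, retry the request, or reject the sample instead of letting bad data flow downstream.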


#### A More Complex Use Case

Our goal is to create a structured dataset of realistic examples that simulate how a user might request weather information in various scenarios. We want to use a large language model (LLM) to generate these examples for us and use them as an evaluation set to test our agent's capabilities. Without such an evaluation, we lack a way to understand the effects of our prompt adjustments. These examples will not only help us evaluate the agent's ability to use the get_weather_info tool correctly but also make it easy to detect if any prompt changes have negative effects.

Now, let's use the instructor library with Groq to generate synthetic examples for our weather agent.

In [3]:
from pprint import pprint

import instructor
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from groq import Groq

# Load the Groq API key from .env file
load_dotenv()

prompt = """
I am designing a weather agent. This agent can talk to the user and also fetch latest weather information.
It has access to the `get_weather_info` tool with the following JSON schema:
{json_schema}

I want you to write some examples for `get_weather_info` and see if this functionality works correctly and can handle all the cases. 
Now given the information so far and the JSON schema of the provided tool, write {num} examples.
Make sure each example is varied enough to cover common ways of requesting for this functionality.
Make sure you fill the function parameters with the correct types when generating the output examples. 
Make sure your output is valid JSON.
"""

In [4]:
class Example(BaseModel):
    input_text: str = Field(description="The example text")
    tool_name: str = Field(description="The tool name to call for this example")
    tool_parameters: str = Field(description="An object containing the key-value pairs for the parameters of this tool, as a JSON-serializable STRING. Make sure it is valid JSON and that parameter values are of the correct type according to the tool schema")

class ResponseModel(BaseModel):
    examples: list[Example]
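
One detail about the `{json_schema}` placeholder in the prompt template is worth noting before it is filled in. `str.format` renders a Python dict with its `repr` (single quotes), which is not valid JSON, whereas `json.dumps` produces the JSON text the prompt promises. The sketch below shows the difference. The cut-down `schema` and the one-line `template` are hypothetical and exist only for this illustration.

```python
import json

# A cut-down, hypothetical tool schema used only for this illustration.
schema = {"name": "get_weather_info", "parameters": {"required": ["location"]}}

template = "The tool has the following JSON schema:\n{json_schema}"

# Passing the dict directly inserts its Python repr (single-quoted keys).
as_repr = template.format(json_schema=schema)

# Serializing first inserts valid JSON (double-quoted keys).
as_json = template.format(json_schema=json.dumps(schema))

print(as_repr)
print(as_json)
```

Models usually cope with either form, so treat this as a possible tidy-up rather than a required fix. Serializing with `json.dumps` simply keeps the prompt consistent with its own wording.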

In [5]:
# The schema for get_weather_info tool
tool_schema = {
    "name": "get_weather_info",
    "description": "Get the weather information for any location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location for which we want to get the weather information (e.g. New York)"
            }
        },
        "required": ["location"]
    }
}

# Patch Groq() with instructor, this is where the magic happens!
client = instructor.from_groq(Groq(), mode=instructor.Mode.JSON)

# Call the API with our custom prompt and ResponseModel
response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    response_model=ResponseModel, # Specify the response model
    messages=[
        {
            "role": "system", 
            "content": prompt.format(json_schema=tool_schema, num=5), # Pass the tool schema and number of examples to the prompt
        },
    ],
    temperature=0.65,
    max_tokens=8000,
)

print(type(response))
pprint(response.examples)

<class '__main__.ResponseModel'>
[Example(input_text="What's the weather like in New York?", tool_name='get_weather_info', tool_parameters='{"location": "New York"}'),
 Example(input_text="I'm going to London tomorrow, what's the weather forecast?", tool_name='get_weather_info', tool_parameters='{"location": "London"}'),
 Example(input_text='Can you tell me the current weather in Paris?', tool_name='get_weather_info', tool_parameters='{"location": "Paris"}'),
 Example(input_text="What's the weather like in Sydney, Australia?", tool_name='get_weather_info', tool_parameters='{"location": "Sydney"}'),
 Example(input_text="I'm planning a trip to Tokyo, what's the weather forecast?", tool_name='get_weather_info', tool_parameters='{"location": "Tokyo"}')]
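
Since `tool_parameters` comes back as a JSON string, each generated example can be parsed and compared against the tool schema before it goes into the evaluation set. The following is a minimal sketch under stated assumptions: `check_tool_call` and `JSON_TYPES` are hypothetical helpers, the schema is a trimmed copy of the one above, and only flat parameters with basic types are handled.

```python
import json

# Trimmed copy of the get_weather_info schema, without descriptions.
tool_schema = {
    "name": "get_weather_info",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

# Maps JSON Schema type names to Python types (only the basic ones).
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_tool_call(tool_name: str, tool_parameters: str, schema: dict) -> list[str]:
    """Hypothetical helper: list the problems found in one generated example."""
    if tool_name != schema["name"]:
        return [f"unexpected tool: {tool_name}"]
    try:
        params = json.loads(tool_parameters)
    except json.JSONDecodeError:
        return ["tool_parameters is not valid JSON"]
    if not isinstance(params, dict):
        return ["tool_parameters is not a JSON object"]
    spec = schema["parameters"]
    problems = [
        f"missing required parameter: {key}"
        for key in spec["required"]
        if key not in params
    ]
    for key, value in params.items():
        prop = spec["properties"].get(key)
        if prop is None:
            problems.append(f"unknown parameter: {key}")
        elif not isinstance(value, JSON_TYPES.get(prop["type"], object)):
            # Types outside JSON_TYPES fall back to `object` and are not checked.
            problems.append(f"{key} should be of type {prop['type']}")
    return problems


print(check_tool_call("get_weather_info", '{"location": "New York"}', tool_schema))
print(check_tool_call("get_weather_info", '{"city": "Paris"}', tool_schema))
```

The first call returns an empty list. The second reports both the missing `location` and the unknown `city`, which is the kind of silent drift such a check is meant to surface.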


## Conclusion

In this way, we can get structured output from the model.
Using the instructor library, we can also validate the response so that it adheres to the schema we defined.