## Token Setting


In [1]:
import os
from dotenv import load_dotenv


env_file = "../.env"
if os.path.exists(env_file):
    load_dotenv(env_file)
else:
    os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"
    os.environ["HF_TOKEN"] = "<YOUR-HUGGINGFACE-API-KEY>"
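
The placeholder values above only cause a failure later, at the first API call. As a rough sketch, an early check could flag them up front; `find_unset_keys` is a hypothetical helper written for this notebook, not part of `dotenv` or any other library used here.

```python
import os


def find_unset_keys(required, env=None):
    """Return the names in `required` that are missing or still placeholders.

    Hypothetical helper for this notebook only.
    """
    env = os.environ if env is None else env
    unset = []
    for name in required:
        value = env.get(name, "")
        # Treat empty values and "<YOUR-...>" style placeholders as unset.
        if not value or (value.startswith("<") and value.endswith(">")):
            unset.append(name)
    return unset


missing = find_unset_keys(["OPENAI_API_KEY", "HF_TOKEN"])
if missing:
    print("Please set:", ", ".join(missing))
```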

## Model Setting


In [2]:
from langchain_community.chat_models import ChatLiteLLM
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage


# openai_llm = ChatLiteLLM(model="openai/gpt-3.5-turbo")
openai_llm = ChatOpenAI(model="gpt-3.5-turbo")
messages = [HumanMessage(content="what model are you")]
openai_llm.invoke(messages).pretty_print()


I am a language model created by OpenAI called GPT-3.


In [3]:
hug_llm = ChatLiteLLM(
    model="huggingface/meta-llama/Llama-3.1-8B-Instruct",
    temperature=0.2,
)
messages = [HumanMessage(content="what model are you")]
hug_llm.invoke(messages).pretty_print()


I’m a large language model. When you ask me a question or provide me with a prompt, I analyze what you say and generate a response that is relevant and accurate. I'm constantly learning and improving, so over time I'll be even better at assisting you. Is there anything I can help you with?


## Structured Output

This tutorial shows how structured output works.

### LangChain

In LangChain, you can call the `with_structured_output` method on a chat model to get structured output.

- https://python.langchain.com/docs/how_to/structured_output/


In [4]:
from typing import Optional

from pydantic import BaseModel, Field


# Pydantic
class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10"
    )

The cell above defines the `Joke` structure as a Pydantic model.

In [5]:
# With OpenAI, both models support `with_structured_output`
for model_name in ["gpt-3.5-turbo", "gpt-4o-mini"]:
    temp_llm = ChatOpenAI(model=model_name)
    structured_llm = temp_llm.with_structured_output(Joke)
    response = structured_llm.invoke("Tell me a joke about cats")
    print(model_name, response, sep="\t")

gpt-3.5-turbo	setup='Why was the cat sitting on the computer?' punchline='To keep an eye on the mouse!' rating=8
gpt-4o-mini	setup='Why was the cat sitting on the computer?' punchline='Because it wanted to keep an eye on the mouse!' rating=7


Both `gpt-3.5-turbo` and `gpt-4o-mini` generate output in the given format.

### OpenAI

According to the [OpenAI documentation](https://platform.openai.com/docs/guides/structured-outputs), however, Structured Outputs is only supported by `gpt-4o-mini`, `gpt-4o-2024-08-06`, and later models.

> Structured Outputs is the evolution of JSON mode. While both ensure valid JSON is produced, only Structured Outputs ensure schema adherence. Both Structured Outputs and JSON mode are supported in the Chat Completions API, Assistants API, Fine-tuning API and Batch API.
> 
> |                        | Structured Outputs                                           | JSON Mode                                        |
> | :--------------------- | :----------------------------------------------------------- | :----------------------------------------------- |
> | **Outputs valid JSON** | Yes                                                          | Yes                                              |
> | **Adheres to schema**  | Yes (see [supported schemas](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas)) | No                                               |
> | **Compatible models**  | `gpt-4o-mini`, `gpt-4o-2024-08-06`, and later                | `gpt-3.5-turbo`, `gpt-4-*` and `gpt-4o-*` models |
> | **Enabling**           | `response_format: { type: "json_schema", json_schema: {"strict": true, "schema": ...} }` | `response_format: { type: "json_object" }`       |


This is one reason to use LangChain: it provides a unified interface across different models.
However, some LLM endpoints do not support this interface, and the simplicity of the abstraction can leave users confused when it fails.
Let's look at what happens behind the LangChain interface.

#### GPT-4o-mini and later

`gpt-4o-mini` and later models support Structured Outputs directly, so the usage is much the same as in LangChain.

In [6]:
from openai import OpenAI

client = OpenAI()

Pass the Pydantic class as `response_format`.
The call returns structured output as we saw earlier.
With this `response_format`, the schema is enforced on every generation.

In [7]:
for i in range(5):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Extract the event information."},
            {
                "role": "user",
                "content": "Tell me a joke about cats",
            },
        ],
        response_format=Joke,
    )
    event = completion.choices[0].message.parsed
    print(event)

setup='Why was the cat sitting on the computer?' punchline='Because it wanted to keep an eye on the mouse!' rating=7
setup='Why was the cat sitting on the computer?' punchline='Because it wanted to keep an eye on the mouse!' rating=7
setup='Why was the cat sitting on the computer?' punchline='Because it wanted to keep an eye on the mouse!' rating=7
setup='Why was the cat sitting on the computer?' punchline='Because it wanted to keep an eye on the mouse!' rating=7
setup='Why did the cat sit on the computer?' punchline='Because it wanted to keep an eye on the mouse!' rating=7


#### GPT-3.5-turbo and earlier

Older models such as `gpt-3.5-turbo`, however, cannot take a Pydantic class as `response_format`.

In [8]:
try:
    completion = client.beta.chat.completions.parse(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Extract the event information."},
            {
                "role": "user",
                "content": "Tell me a joke about cats",
            },
        ],
        response_format=Joke,
    )
except Exception as e:
    print(e)

Error code: 400 - {'error': {'message': "Invalid parameter: 'response_format' of type 'json_schema' is not supported with this model. Learn more about supported models at the Structured Outputs guide: https://platform.openai.com/docs/guides/structured-outputs", 'type': 'invalid_request_error', 'param': None, 'code': None}}


So we need another way to request JSON output.
There are two options:
- Tool calling
- JSON mode

##### Tool calling

The OpenAI endpoint accepts tool-calling parameters.
The basic tool-calling flow looks like this:

<img src="../assets/function-calling-diagram.png" width=50%>


OpenAI does not execute tools itself.
It only tells the client that the LLM wants to call a tool in order to produce a proper response.

For structured output, however, nothing needs to be executed: the arguments of the tool call are themselves the result.
A representative example is the Pydantic function tool.

In [9]:
import openai

openai.pydantic_function_tool(Joke)

{'type': 'function',
 'function': {'name': 'Joke',
  'strict': True,
  'parameters': {'description': 'Joke to tell user.',
   'properties': {'setup': {'description': 'The setup of the joke',
     'title': 'Setup',
     'type': 'string'},
    'punchline': {'description': 'The punchline to the joke',
     'title': 'Punchline',
     'type': 'string'},
    'rating': {'anyOf': [{'type': 'integer'}, {'type': 'null'}],
     'description': 'How funny the joke is, from 1 to 10',
     'title': 'Rating'}},
   'required': ['setup', 'punchline', 'rating'],
   'title': 'Joke',
   'type': 'object',
   'additionalProperties': False},
  'description': 'Joke to tell user.'}}

`openai.pydantic_function_tool` converts the Pydantic class into JSON in OpenAI's function-calling format.
This tool definition can be passed directly to OpenAI's LLM.
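
Compared with the plain JSON Schema that Pydantic generates for `Joke`, the output above lists every property under `required`, carries no `default`, and sets `additionalProperties` to `False`. The sketch below imitates that transformation for a flat schema. `to_strict_schema` is a hypothetical helper, not the SDK's implementation, and it ignores nested objects.

```python
import copy


def to_strict_schema(schema):
    """Return a copy of a flat object schema with strict-style constraints."""
    strict = copy.deepcopy(schema)
    # Every property becomes required; optional fields stay nullable via anyOf.
    strict["required"] = list(strict.get("properties", {}))
    # No keys other than the declared properties are allowed.
    strict["additionalProperties"] = False
    for prop in strict.get("properties", {}).values():
        # Defaults are dropped, since the model must always emit the field.
        prop.pop("default", None)
    return strict


plain = {
    "type": "object",
    "properties": {
        "setup": {"type": "string"},
        "punchline": {"type": "string"},
        "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": None},
    },
    "required": ["setup", "punchline"],
}
strict = to_strict_schema(plain)
print(strict["required"])
```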

In [10]:
tools = [openai.pydantic_function_tool(Joke)]


for i in range(5):
    completion = client.beta.chat.completions.parse(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Extract the event information."},
            {
                "role": "user",
                "content": "Tell me a joke about cats",
            },
        ],
        tools=tools,
    )
    event = completion.choices[0].message.tool_calls[0].function
    print(event.arguments)

{"setup":"Why was the cat sitting on the computer?","punchline":"He wanted to keep an eye on the mouse!","rating":8}
{"setup":"Why don't cats play poker in the jungle?","punchline":"Too many cheetahs!","rating":7}
{"setup":"Why was the cat sitting on the computer?","punchline":"He wanted to keep an eye on the mouse!","rating":8}
{"setup":"Why was the cat sitting on the computer?","punchline":"It wanted to keep an eye on the mouse!","rating":8}
{"setup":"Why was the cat sitting on the computer?","punchline":"He wanted to keep an eye on the mouse!","rating":8}


You can see that this reliably generates JSON in the expected format.
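
The `arguments` field is a JSON string, not a Python object, so it still has to be decoded. Here is a minimal sketch using only the standard library. The `JokeRecord` dataclass is a hypothetical stand-in for the Pydantic `Joke` class, and the sample string is copied from the output above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class JokeRecord:
    """Hypothetical stand-in for the Pydantic Joke model."""

    setup: str
    punchline: str
    rating: Optional[int] = None


def parse_tool_arguments(arguments):
    """Decode a tool call's JSON `arguments` string into a JokeRecord."""
    data = json.loads(arguments)
    return JokeRecord(**data)


arguments = '{"setup":"Why don\'t cats play poker in the jungle?","punchline":"Too many cheetahs!","rating":7}'
joke = parse_tool_arguments(arguments)
print(joke.punchline)
```

With the Pydantic class itself, `Joke.model_validate_json(event.arguments)` should do the decoding and validation in one step (Pydantic v2).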

##### JSON mode

For models or endpoints where function calling is not available, we can use `response_format={"type": "json_object"}` instead.
Many older tutorials explain this usage:

In [11]:
completion = client.beta.chat.completions.parse(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "Extract the event information.\nReturn the output as JSON",
        },
        {
            "role": "user",
            "content": "Tell me a joke about cats",
        },
    ],
    response_format={"type": "json_object"},
)
event = completion.choices[0].message.content
print(event)

{
    "type": "joke",
    "category": "cats",
    "content": "Why was the cat sitting on the computer? Because it wanted to keep an eye on the mouse!"
}


With this `response_format`, the prompt must explicitly mention JSON.
Otherwise, the request fails with an error:

In [12]:
try:
    completion = client.beta.chat.completions.parse(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "Extract the event information.",
            },
            {
                "role": "user",
                "content": "Tell me a joke about cats",
            },
        ],
        response_format={"type": "json_object"},
    )
except Exception as e:
    print(e)

Error code: 400 - {'error': {'message': "'messages' must contain the word 'json' in some form, to use 'response_format' of type 'json_object'.", 'type': 'invalid_request_error', 'param': 'messages', 'code': None}}
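
The error shows that JSON mode rejects requests whose messages never mention JSON. A small pre-flight check can catch this before the request is sent. `mentions_json` is a hypothetical helper and only approximates the server-side rule quoted in the error message.

```python
def mentions_json(messages):
    """Return True if any message content contains 'json' (case-insensitive)."""
    return any("json" in str(m.get("content", "")).lower() for m in messages)


ok_messages = [
    {"role": "system", "content": "Extract the event information.\nReturn the output as JSON"},
    {"role": "user", "content": "Tell me a joke about cats"},
]
bad_messages = [
    {"role": "system", "content": "Extract the event information."},
    {"role": "user", "content": "Tell me a joke about cats"},
]

print(mentions_json(ok_messages), mentions_json(bad_messages))
```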


Without any instruction about the desired keys and values, however, the model generates arbitrary JSON like this:

In [13]:
completion = client.beta.chat.completions.parse(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "Extract the event information.\nReturn the output as JSON",
        },
        {
            "role": "user",
            "content": "Tell me a joke about cats",
        },
    ],
    response_format={"type": "json_object"},
)
event = completion.choices[0].message.content
print(event)

{
    "topic": "cats",
    "type": "joke"
}


So we have to give instructions that describe the desired format.
Let's use the schema generated by the OpenAI tool helper as the instruction.

In [14]:
pydantic_parse = openai.pydantic_function_tool(Joke)["function"]["parameters"]
pydantic_parse

{'description': 'Joke to tell user.',
 'properties': {'setup': {'description': 'The setup of the joke',
   'title': 'Setup',
   'type': 'string'},
  'punchline': {'description': 'The punchline to the joke',
   'title': 'Punchline',
   'type': 'string'},
  'rating': {'anyOf': [{'type': 'integer'}, {'type': 'null'}],
   'description': 'How funny the joke is, from 1 to 10',
   'title': 'Rating'}},
 'required': ['setup', 'punchline', 'rating'],
 'title': 'Joke',
 'type': 'object',
 'additionalProperties': False}

But even with this schema in the prompt, there is no guarantee that every value is filled in:

In [15]:
pydantic_parse = openai.pydantic_function_tool(Joke)["function"]["parameters"]

for _ in range(3):
    completion = client.beta.chat.completions.parse(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": f"Extract the event information.\n{pydantic_parse}\nReturn the output as JSON",
            },
            {
                "role": "user",
                "content": "Tell me a joke about cats",
            },
        ],
        response_format={"type": "json_object"},
    )
    event = completion.choices[0].message.content
    print(event)

{
  "setup": "Why was the cat sitting on the computer?",
  "punchline": "To keep an eye on the mouse!",
  "rating": null
}
{
  "setup": "Why was the cat sitting on the computer?",
  "punchline": "To keep an eye on the mouse!",
  "rating": null
}
{
    "setup": "Why was the cat sitting on the computer?",
    "punchline": "To keep an eye on the mouse!",
    "rating": 8
}
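
Because JSON mode does not enforce the schema, the returned object should be checked before use. The following is a minimal sketch, assuming a flat schema like the one above. `check_against_schema` is a hypothetical helper that only covers required keys, unexpected keys, and `null` values; it is not full JSON Schema validation, and treating `null` as a problem is stricter than the schema itself.

```python
import json


def check_against_schema(raw, schema):
    """Return a list of problems found in a JSON string for a flat schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing key: {key}")
        elif data[key] is None:
            # Valid per the schema, but reported here as an unfilled value.
            problems.append(f"null value: {key}")
    for key in data:
        if key not in properties:
            problems.append(f"unexpected key: {key}")
    return problems


schema = {
    "properties": {"setup": {}, "punchline": {}, "rating": {}},
    "required": ["setup", "punchline", "rating"],
}
raw = '{"setup": "Why was the cat sitting on the computer?", "punchline": "To keep an eye on the mouse!", "rating": null}'
print(check_against_schema(raw, schema))
```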


##### LangChain with JSON Mode

In this case, we can use LangChain to generate the output instructions.

In [16]:
from langchain_core.output_parsers import JsonOutputParser


parser = JsonOutputParser(pydantic_object=Joke)
parser_instruction = parser.get_format_instructions()
print(parser_instruction)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user.", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline to the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "How funny the joke is, from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}
```


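More detailed instructions make it more likely that every field is filled in. Note that the next cell uses `gpt-4o-mini`, while the previous JSON mode cells used `gpt-3.5-turbo`.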

In [17]:
pydantic_parse = openai.pydantic_function_tool(Joke)["function"]["parameters"]

for _ in range(3):
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": parser_instruction,
            },
            {
                "role": "user",
                "content": "Tell me a joke about cats",
            },
        ],
        response_format={"type": "json_object"},
    )
    event = completion.choices[0].message.content
    print(event)

{
  "setup": "Why was the cat sitting on the computer?",
  "punchline": "Because it wanted to keep an eye on the mouse!",
  "rating": 7
}
{
  "setup": "Why was the cat sitting on the computer?",
  "punchline": "Because it wanted to keep an eye on the mouse!",
  "rating": 7
}
{
  "setup": "Why was the cat sitting on the computer?",
  "punchline": "Because it wanted to keep an eye on the mouse!",
  "rating": 7
}


##### Conclusion

In my opinion, for structured output with OpenAI, tools are easier to use than `response_format`.

### Hugging Face

But what if neither tools nor `response_format` is supported, as with Hugging Face?

In [18]:
from litellm import get_supported_openai_params

response = get_supported_openai_params("huggingface/meta-llama/Llama-3.1-8B-Instruct")

print(response)
# ['stream', 'temperature', 'max_tokens', 'max_completion_tokens', 'top_p', 'stop', 'n', 'echo']

['stream', 'temperature', 'max_tokens', 'max_completion_tokens', 'top_p', 'stop', 'n', 'echo']


This endpoint cannot use the `with_structured_output` method.

In [19]:
# With Huggingface
structured_llm = hug_llm.with_structured_output(Joke)
try:
    structured_llm.invoke("Tell me a joke about cats")
except Exception as e:
    print(e)


Provider List: https://docs.litellm.ai/docs/providers

litellm.UnsupportedParamsError: huggingface does not support parameters: {'tools': [{'type': 'function', 'function': {'name': 'Joke', 'description': 'Joke to tell user.', 'parameters': {'properties': {'setup': {'description': 'The setup of the joke', 'type': 'string'}, 'punchline': {'description': 'The punchline to the joke', 'type': 'string'}, 'rating': {'anyOf': [{'type': 'integer'}, {'type': 'null'}], 'default': None, 'description': 'How funny the joke is, from 1 to 10'}}, 'required': ['setup', 'punchline'], 'type': 'object'}}}], 'tool_choice': 'any'}, for model=meta-llama/Llama-3.1-8B-Instruct. To drop these, set `litellm.drop_params=True` or for proxy:

`litellm_settings:
 drop_params: true`



So we have to use prompt instructions to get JSON output.
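
The supported-parameter list can also drive the choice of strategy. The sketch below assumes the list has the shape printed above. `choose_strategy` and its return labels are hypothetical names used only for illustration.

```python
def choose_strategy(supported_params):
    """Pick a structured-output strategy from a list of supported parameters."""
    supported = set(supported_params)
    if "tools" in supported:
        return "tool_calling"
    if "response_format" in supported:
        return "response_format"
    # Neither is available: fall back to prompt instructions plus parsing.
    return "prompt_and_parse"


huggingface_params = [
    "stream", "temperature", "max_tokens", "max_completion_tokens",
    "top_p", "stop", "n", "echo",
]
print(choose_strategy(huggingface_params))
```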

#### LiteLLM Completion

To keep the usage the same as with OpenAI, we will use the `litellm.completion` function.

First, let's use the same prompt that we used with OpenAI.

In [20]:
from langchain_core.output_parsers import JsonOutputParser


parser = JsonOutputParser(pydantic_object=Joke)
parser_instruction = parser.get_format_instructions()
print(parser_instruction)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user.", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline to the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "How funny the joke is, from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}
```


In [21]:
import litellm


for i in range(3):
    completion = litellm.completion(
        model="huggingface/meta-llama/Llama-3.1-8B-Instruct",
        messages=[
            {
                "role": "system",
                "content": parser_instruction,
            },
            {
                "role": "user",
                "content": "Tell me a joke about cats",
            },
        ],
        max_tokens=4048,
        temperature=0.2,
    )

    event = completion.choices[0].message.content
    print(event)

Here's one:

**Setup:** Why did the cat join a band?

**Punchline:** Because it wanted to be the purr-cussionist!

How's that?
Here's one:

**Setup:** Why did the cat join a band?

**Punchline:** Because it wanted to be the purr-cussionist!

How's that?
Here's one:

**Setup:** Why did the cat join a band?

**Punchline:** Because it wanted to be the purr-cussionist!

How's that?


You can see that this is not in JSON format.
We have to parse this output as JSON.
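
Parsing here means finding a JSON object inside free-form text, which may be wrapped in a Markdown code fence or surrounded by commentary. The sketch below shows the general idea with the standard library. `extract_json` is a hypothetical helper, not LangChain's implementation, and it returns `None` for replies like the ones above that contain no JSON at all.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text):
    """Return the first JSON object found in `text`, or None if there is none."""
    # Prefer the contents of a fenced code block if the reply contains one.
    fenced = FENCED_BLOCK.search(text)
    candidate = fenced.group(1) if fenced else text
    decoder = json.JSONDecoder()
    # Try to decode an object starting at each "{" until one succeeds.
    for match in re.finditer(r"\{", candidate):
        try:
            value, _ = decoder.raw_decode(candidate, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


chatty_reply = "Here's one:\n\n**Setup:** Why did the cat join a band?"
fenced_reply = "Sure!\n" + FENCE + 'json\n{"setup": "a", "punchline": "b", "rating": 8}\n' + FENCE
print(extract_json(chatty_reply))
print(extract_json(fenced_reply))
```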

#### LangChain Output Parser

In this case, we can use LangChain to parse the output.

In [22]:
from langchain_core.prompts import ChatPromptTemplate


prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an AI assistant.\nFollow the format and answer \nFormat: {format_instructions}",
        ),
        ("user", "Question: {question}"),
    ]
)

prompt = prompt.partial(format_instructions=parser.get_format_instructions())

chain = prompt | hug_llm | parser

In [23]:
for i in range(5):
    response = chain.invoke({"question": "Tell me a joke about cats"})
    print(response)

{'setup': 'Why did the cat join a band?', 'punchline': 'Because it wanted to be the purr-cussionist!', 'rating': 8}
{'setup': 'Why did the cat join a band?', 'punchline': 'Because it wanted to be the purr-cussionist!', 'rating': 8}
{'setup': 'Why did the cat join a band?', 'punchline': 'Because it wanted to be the purr-cussionist!', 'rating': 8}
{'setup': 'Why did the cat join a band?', 'punchline': 'Because it wanted to be the purr-cussionist!', 'rating': 8}
{'setup': 'Why did the cat join a band?', 'punchline': 'Because it wanted to be the purr-cussionist!', 'rating': 8}


You can see that the output is in JSON format.
But this does not guarantee that the model always generates its answer in JSON format.
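
One common mitigation is to validate each reply and retry when parsing fails. This is a minimal sketch, assuming `generate` is any zero-argument callable that returns the model's text (for example, a wrapper around the prompt and model without the parser). The names `generate_json_with_retry` and `max_attempts` are hypothetical.

```python
import json


def generate_json_with_retry(generate, required_keys, max_attempts=3):
    """Call `generate` until it returns a JSON object with all required keys."""
    last_error = None
    for _ in range(max_attempts):
        text = generate()
        try:
            data = json.loads(text)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if isinstance(data, dict) and all(key in data for key in required_keys):
            return data
        last_error = ValueError(f"missing keys in {data!r}")
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error


# Fake model for demonstration: fails once, then returns valid JSON.
replies = iter(["Here's one: ...", '{"setup": "a", "punchline": "b", "rating": 8}'])
result = generate_json_with_retry(lambda: next(replies), ["setup", "punchline"])
print(result)
```

Retrying costs extra calls, so `max_attempts` should stay small; at a low temperature the model may also repeat the same invalid reply.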

The [OpenAI documentation](https://platform.openai.com/docs/guides/function-calling#edge-cases) suggests handling such edge cases by checking `finish_reason`, like this:

```python
# `finish_reason` is a field of the choice, not of the message.
# The handle_* functions and `our_api_request_forced_a_tool_call` are placeholders for your own code.
finish_reason = response['choices'][0]['finish_reason']

# Check if the conversation was too long for the context window
if finish_reason == "length":
    print("Error: The conversation was too long for the context window.")
    # Handle the error as needed, e.g., by truncating the conversation or asking for clarification
    handle_length_error(response)

# Check if the model's output included copyright material (or similar)
elif finish_reason == "content_filter":
    print("Error: The content was filtered due to policy violations.")
    # Handle the error as needed, e.g., by modifying the request or notifying the user
    handle_content_filter_error(response)

# Check if the model has made a tool_call. This is the case either if the "finish_reason" is "tool_calls" or if the "finish_reason" is "stop" and our API request had forced a function call
elif (finish_reason == "tool_calls" or
      # This handles the edge case where if we forced the model to call one of our functions, the finish_reason will actually be "stop" instead of "tool_calls"
      (our_api_request_forced_a_tool_call and finish_reason == "stop")):
    # Handle tool call
    print("Model made a tool call.")
    # Your code to handle tool calls
    handle_tool_call(response)

# Else finish_reason is "stop", in which case the model was just responding directly to the user
elif finish_reason == "stop":
    # Handle the normal stop case
    print("Model responded directly to the user.")
    # Your code to handle normal responses
    handle_normal_response(response)

# Catch any other case, this is unexpected
else:
    print("Unexpected finish_reason:", finish_reason)
    # Handle unexpected cases as needed
    handle_unexpected_case(response)
```
