## Using LangChain to get structured outputs


In [5]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_anthropic import ChatAnthropic
from langchain_ollama import ChatOllama
from langchain_fireworks import ChatFireworks

from langchain_core.output_parsers import JsonOutputParser, PydanticOutputParser

from typing import Optional
from pydantic import BaseModel, Field
from typing_extensions import Annotated, TypedDict

In [6]:
ANTHROPIC_API_KEY = "<API KEY>"
FIREWORKS_API_KEY = "<API KEY>"

Let's start by creating an LLM to run our structured output queries. Use a temperature of 0 to make structured output generation more reliable (at the cost of "creativity").


In [7]:
temperature = 0

Define an LLM below:


In [None]:
llm_model = ChatAnthropic(
    model="claude-3-5-haiku-20241022",
    temperature=temperature,
    api_key=ANTHROPIC_API_KEY,
)
# llm_model = ChatOllama(model="llama3.2", temperature=temperature)
# llm_model = ChatFireworks(
#     model_name="accounts/fireworks/models/llama-v3p1-70b-instruct",
#     temperature=temperature,
#     api_key=FIREWORKS_API_KEY,
# )

Check that it works:


In [9]:
print(llm_model.invoke("Tell me a joke about zebras").content)

Why did the zebra refuse to play poker?

Because he always got striped of his money! (get it?)


### Structured output methods


We can define a Pydantic model, and the output will be returned as a validated Pydantic object.


In [10]:
class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: int = Field(description="How funny the joke is, from 1 to 10")
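
To make "with validation" concrete, here is a minimal sketch that validates plain dicts against the same model without calling an LLM. The two payloads are made up for illustration. Note that the "from 1 to 10" range in the description is only a hint for the model: Pydantic does not enforce it unless constraints such as `ge` and `le` are added to the field.

```python
from pydantic import BaseModel, Field, ValidationError


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: int = Field(description="How funny the joke is, from 1 to 10")


# Hypothetical payloads standing in for LLM output.
good = {"setup": "Why?", "punchline": "Because.", "rating": 7}
bad = {"setup": "Why?", "rating": "very funny"}

joke = Joke.model_validate(good)
print(joke.rating)

try:
    Joke.model_validate(bad)
except ValidationError as e:
    # One error for the missing punchline, one for the non-integer rating.
    failed_fields = sorted(err["loc"][0] for err in e.errors())
    print(failed_fields)
```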

##### Method 1: Function calling


In [11]:
structured_llm = llm_model.with_structured_output(Joke, method="function_calling")

try:
    output = structured_llm.invoke("Tell me a joke about dogs")

    if output is None:
        print("Structured output call failed")
    else:
        print(output)
except Exception as e:
    # Report the exception class name along with its message
    print(f"  Parsing error \n{type(e).__name__}: {e}")

setup='Because he was feeling ruff!' punchline='Why did the dog go to the vet?' rating=8


##### Method 2: JSON Mode

Note that for JSON mode we need to include the structure in the prompt as well as passing it to the `.with_structured_output` method. Here I pass the JSON schema dict rather than the Pydantic model itself, purely because this method often fails schema validation and it's instructive to see the raw JSON output from the model.


In [12]:
output_parser = PydanticOutputParser(pydantic_object=Joke)
format_instructions = output_parser.get_format_instructions()
structured_llm = llm_model.with_structured_output(
    Joke.model_json_schema(), method="json_mode"
)

try:
    output = structured_llm.invoke(
        f"Tell me a joke about rabbits\n {format_instructions}"
    )
    print(output)
except Exception as e:
    print(f"  Parsing error \n{type(e).__name__}: {e}")

{'properties': {'setup': {'title': 'Why did the rabbit go to the doctor?', 'description': 'Because it had hare loss!', 'type': 'string'}, 'punchline': {'title': 'Punchline', 'description': 'The punchline to the joke', 'type': 'string'}, 'rating': {'title': 'Rating', 'description': 'How funny the joke is, from 1 to 10', 'type': 'integer'}}, 'required': ['setup', 'punchline', 'rating']}
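
In the output above, the model filled in a copy of the schema (note the top-level `properties` and `required` keys) instead of returning an instance of it. A minimal sketch of a check for this failure mode follows; `looks_like_schema_echo` is a hypothetical helper, not part of LangChain, and the heuristic is an assumption that may need tuning for other schemas.

```python
def looks_like_schema_echo(output: dict, expected_fields: set[str]) -> bool:
    """Return True if `output` resembles a JSON schema rather than an instance."""
    has_schema_keys = "properties" in output
    has_expected_fields = expected_fields.issubset(output)
    return has_schema_keys and not has_expected_fields


expected = {"setup", "punchline", "rating"}

# Abbreviated version of the output above.
echoed = {
    "properties": {"setup": {"title": "Why did the rabbit go to the doctor?"}},
    "required": ["setup", "punchline", "rating"],
}
# Hypothetical well-formed instance for comparison.
instance = {"setup": "Why?", "punchline": "Because.", "rating": 5}

print(looks_like_schema_echo(echoed, expected))    # True
print(looks_like_schema_echo(instance, expected))  # False
```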


##### Method 3: JSON Schema


In [13]:
structured_llm = llm_model.with_structured_output(Joke, method="json_schema")
output = structured_llm.invoke("Tell me a joke about frogs")
print(output)

setup='Why did the frog go to the doctor?' punchline='Because it had a ribbiting cough!' rating=8


### Validation of the returned JSON


Passing a Pydantic model directly returns a chain that includes an output parser (such as `PydanticOutputParser`), which uses Pydantic to validate the data against the schema.

If this is not the desired behaviour, define the schema as a `TypedDict` instead. The JSON output is then parsed into a plain Python dict rather than a Pydantic object, so no schema validation is performed.


In [14]:
class JokeTD(TypedDict):
    """Joke to tell user."""

    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], ..., "How funny the joke is, from 1 to 10"]


structured_llm = llm_model.with_structured_output(JokeTD, method="json_schema")
structured_llm.invoke("Tell me a joke about monkeys")

{'setup': 'Why did the monkey get kicked out of the library?',
 'punchline': 'Because he was caught monkeying around!',
 'rating': 8}

If you already have a Pydantic object specifying the schema but want to validate or fix the data yourself, you can extract the JSON Schema object from the Pydantic model:


In [15]:
structured_llm = llm_model.with_structured_output(
    Joke.model_json_schema(), method="json_schema"
)
structured_llm.invoke("Tell me a joke about reindeer")

{'setup': 'Why did the reindeer go to the party?',
 'punchline': "Because he heard it was going to be a 'hoof' event!",
 'rating': 6}
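
One reason to take the raw dict is to repair values before validating them. The sketch below shows one way to do that under assumed rules: coerce `rating` to an integer and clamp it to the 1 to 10 range from the field description. `repair_joke` is a hypothetical helper, and the clamping policy is an assumption rather than something the schema enforces.

```python
REQUIRED_FIELDS = ("setup", "punchline", "rating")


def repair_joke(raw: dict) -> dict:
    """Return a cleaned copy of `raw`, or raise ValueError if it cannot be repaired."""
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"Missing fields: {missing}")

    fixed = dict(raw)
    # A model may return a number as a string, e.g. "8".
    try:
        rating = int(fixed["rating"])
    except (TypeError, ValueError) as e:
        raise ValueError(f"rating is not an integer: {fixed['rating']!r}") from e
    # Clamp to the range stated in the field description.
    fixed["rating"] = max(1, min(10, rating))
    return fixed


repaired = repair_joke({"setup": "Why?", "punchline": "Because.", "rating": "12"})
print(repaired["rating"])
```

The repaired dict can then be passed to `Joke.model_validate` for a final check.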

### Error handling


These methods can go wrong in several different ways.

To tell the failures apart, I find it useful to return the raw message so that the LLM response is directly available for inspection. This can be done with `include_raw=True`.

The result is then a dict with the keys `raw`, `parsed`, and `parsing_error`, where:

- `output["parsing_error"]` is not `None` if there was a parsing error, most likely because the output did not conform to the schema

- `output["parsed"]` is `None` if there was an error returning any output (most common with Method 1, function calling)


In [16]:
class ArticleResponse(BaseModel):
    """A clear and concise answer to the user's question."""

    title: str = Field(description="Title of the article")
    context: str = Field(
        description="Provide a brief historical context to answer the question."
    )
    historical_timeline: list[str] = Field(
        description="Provide a list of historical events relevant to the question"
    )

In [17]:
structured_llm = llm_model.with_structured_output(
    ArticleResponse, method="json_mode", include_raw=True
)
output = structured_llm.invoke("Tell me the history of the state of Texas in America")

if output["parsing_error"] is not None:
    print("Error: Parsing failed")
    print(output["parsing_error"])
    print("---")
    print("Raw output:")
    print(output["raw"])
elif output["parsed"] is None:
    print("Error: No output")
else:
    print("Success!")
    print(output["parsed"])

Error: Parsing failed
Failed to parse ArticleResponse from completion {}. Got: 3 validation errors for ArticleResponse
title
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
context
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
historical_timeline
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE 
---
Raw output:
content='{} \n\n   \n\n  \n\n\n\n\n\n  \n\n\n\n\n\n  \n\n\n\n\n\n  \n\n\n\n\n\n  \n\n\n\n\n\n  \n\n\n\n\n\n  \n\n\n\n\n\n  \n\n\n\n\n\n  \n\n\n\n\n\n' additional_kwargs={} response_metadata={} id='run-7fc5fddc-5576-4a3a-9149-4ec21e72c5bf-0'
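
Because each result carries its own error information, several attempts can be checked in turn until one parses. The sketch below assumes the dict shape described above (`raw`, `parsed`, `parsing_error`) and uses stand-in functions rather than real LLM calls; `first_parsed` and both attempt functions are hypothetical.

```python
from typing import Any, Callable, Optional


def first_parsed(attempts: list[Callable[[], dict]]) -> Optional[Any]:
    """Run each attempt in order and return the first successfully parsed output."""
    for attempt in attempts:
        try:
            output = attempt()
        except Exception as e:
            print(f"Call failed: {type(e).__name__}: {e}")
            continue
        if output.get("parsing_error") is None and output.get("parsed") is not None:
            return output["parsed"]
    return None


# Stand-ins for structured_llm.invoke(...) calls made with include_raw=True.
def failing_attempt() -> dict:
    return {"raw": "{}", "parsed": None, "parsing_error": ValueError("bad schema")}


def working_attempt() -> dict:
    return {"raw": "...", "parsed": {"title": "Texas"}, "parsing_error": None}


result = first_parsed([failing_attempt, working_attempt])
print(result)
```

With real models, each attempt could wrap a call to a chain built with a different `method` value, so that a failure in JSON mode falls back to function calling, for example.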


We can also pass the JSON schema generated from the Pydantic model, in which case we get the raw dict output without Pydantic validation:


In [18]:
structured_llm_js = llm_model.with_structured_output(
    ArticleResponse.model_json_schema(), method="function_calling"
)
structured_llm_js.invoke("Tell me the history of wombats")

### Under the hood: How Pydantic models are converted to JSON Schema


The JSON Schema representation is quite straightforward:


In [19]:
Joke.model_json_schema()

{'description': 'Joke to tell user.',
 'properties': {'setup': {'description': 'The setup of the joke',
   'title': 'Setup',
   'type': 'string'},
  'punchline': {'description': 'The punchline to the joke',
   'title': 'Punchline',
   'type': 'string'},
  'rating': {'description': 'How funny the joke is, from 1 to 10',
   'title': 'Rating',
   'type': 'integer'}},
 'required': ['setup', 'punchline', 'rating'],
 'title': 'Joke',
 'type': 'object'}

Note that the same schema is contained in the format instructions, except for the top-level 'title' and 'type':


In [20]:
from langchain_core.output_parsers import PydanticOutputParser

output_parser = PydanticOutputParser(pydantic_object=Joke)
print(output_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user.", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline to the joke", "title": "Punchline", "type": "string"}, "rating": {"description": "How funny the joke is, from 1 to 10", "title": "Rating", "type": "integer"}}, "required": ["setup", "punchline", "rating"]}
```
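
The relationship between the two outputs can be reproduced by hand. The sketch below starts from the schema dict printed earlier, drops the top-level `title` and `type` keys, and serialises the rest with `json.dumps`. It mirrors what the printed instructions show and is not a claim about how the parser is implemented internally.

```python
import json

# Schema as printed by Joke.model_json_schema() above.
schema = {
    "description": "Joke to tell user.",
    "properties": {
        "setup": {
            "description": "The setup of the joke",
            "title": "Setup",
            "type": "string",
        },
        "punchline": {
            "description": "The punchline to the joke",
            "title": "Punchline",
            "type": "string",
        },
        "rating": {
            "description": "How funny the joke is, from 1 to 10",
            "title": "Rating",
            "type": "integer",
        },
    },
    "required": ["setup", "punchline", "rating"],
    "title": "Joke",
    "type": "object",
}

# Drop only the top-level keys; per-field titles and types are kept.
reduced = {k: v for k, v in schema.items() if k not in ("title", "type")}
schema_str = json.dumps(reduced)
print(schema_str)
```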
