## Using LangChain to get structured outputs


In [1]:
 %xmode minimal

Exception reporting mode: Minimal


In [2]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_anthropic import ChatAnthropic
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import JsonOutputParser, PydanticOutputParser
import streamlit as st

In [3]:
# Placeholder for a hard-coded key; it is overridden by the Streamlit secret on the next line.
claude_api_key = "<API KEY>"
claude_api_key = st.secrets["api_keys"]["ANTHROPIC_API_KEY"]

Let's start by creating a chat model to run our structured output queries.


In [24]:
# Uncomment one of the Ollama lines to run against a local model instead.
# llm_model = ChatOllama(model="llama3.2", temperature=0.5)
llm_model = ChatAnthropic(model="claude-3-5-haiku-20241022", api_key=claude_api_key)
# llm_model = ChatOllama(model="nemotron-mini", temperature=0.8)
# llm_model = ChatOllama(model="gemma2", temperature=0.8)

Check that it works:


In [26]:
print(llm_model.invoke("Tell me a joke about zebras").content)

Here's a zebra joke for you:

Why do zebras always argue?
Because they're always seeing things in black and white!


### Structured output with `with_structured_output`


We can define a Pydantic model; the output is then returned as a validated instance of that model.


In [27]:
from typing import Optional
from pydantic import BaseModel, Field


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: int = Field(description="How funny the joke is, from 1 to 10")

Method 1: Function calling


In [28]:
structured_llm = llm_model.with_structured_output(Joke, method="function_calling")

try:
    output = structured_llm.invoke("Tell me a joke about dogs")

    # The result can be None if the model did not produce a tool call
    if output is None:
        print("Structured output call failed")
    else:
        print(output)
except Exception as e:
    print(f"  Parsing error \n{type(e).__name__}: {e}")

setup='Why do dogs make terrible dancers?' punchline='Because they have two left feet!' rating=7


Method 2: JSON Schema


In [29]:
structured_llm = llm_model.with_structured_output(Joke, method="json_schema")
structured_llm.invoke("Tell me a joke about cats")

Joke(setup='Why are cats so good at keeping secrets?', punchline="Because they're purr-fect at not letting the cat out of the bag!", rating=7)

Method 3: JSON Mode


In [30]:
output_parser = PydanticOutputParser(pydantic_object=Joke)
format_instructions = output_parser.get_format_instructions()
structured_llm = llm_model.with_structured_output(Joke, method="json_mode")

try:
    # JSON mode needs the schema spelled out in the prompt itself
    output = structured_llm.invoke(f"Tell me a joke about cats\n {format_instructions}")
    print(output)
except Exception as e:
    print(f"  Parsing error \n{type(e).__name__}: {e}")

setup='Why are cats such terrible storytellers?' punchline="Because they only have one tale to tell - and it's always about themselves!" rating=7


If the schema is defined as a TypedDict instead, the JSON output is parsed into a plain Python dict rather than a Pydantic object, so there is no schema validation.


In [12]:
from typing_extensions import Annotated, TypedDict


class JokeTD(TypedDict):
    """Joke to tell user."""

    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], ..., "How funny the joke is, from 1 to 10"]


structured_llm = llm_model.with_structured_output(JokeTD, method="json_schema")
structured_llm.invoke("Tell me a joke about monkeys")

{'setup': 'Why did the monkey get kicked out of the library?',
 'punchline': 'Because he was caught monkeying around!',
 'rating': 8}
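To make the "no validation" point concrete, here is a minimal standard-library sketch that runs without any LLM call. It redefines a simplified `JokeTD` without the `Annotated` descriptions, and `missing_keys` is a hypothetical helper written for this illustration, not part of LangChain.

```python
from typing import Optional, TypedDict


class JokeTD(TypedDict):
    """Joke to tell user."""

    setup: str
    punchline: str
    rating: Optional[int]


# A TypedDict is only a type hint: at runtime this is a plain dict, so the
# wrong type for `rating` and the missing `punchline` are both accepted.
bad_joke: JokeTD = {"setup": "Why?", "rating": "very funny"}  # type: ignore

print(type(bad_joke))  # <class 'dict'>


def missing_keys(payload: dict, schema: type) -> list[str]:
    """Return the keys declared on the TypedDict that are absent from the payload."""
    return [key for key in schema.__annotations__ if key not in payload]


print(missing_keys(bad_joke, JokeTD))
```

If validation matters, a manual check like this (or the Pydantic version of the schema) has to be applied to the returned dict.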

Or we can pass the JSON Schema dict generated from the Pydantic model, which also returns a plain dict:


In [13]:
structured_llm = llm_model.with_structured_output(
    Joke.model_json_schema(), method="json_schema"
)
structured_llm.invoke("Tell me a joke about reindeer")

{'setup': 'Why did the reindeer go to the party?',
 'punchline': "Because he heard it was a 'hoof' event!",
 'rating': 4}

Let's try a more complicated structure that includes a list field.


In [14]:
class ArticleResponse(BaseModel):
    """A clear and concise answer to the user's question."""

    title: str = Field(description="Title of the article")
    context: str = Field(
        description="Provide a brief historical context to answer the question."
    )
    historical_timeline: list[str] = Field(
        description="Provide a list of historical events relevant to the question"
    )


structured_llm = llm_model.with_structured_output(ArticleResponse)
structured_llm.invoke("Tell me the history of the state of Texas in America")

ArticleResponse(title='The History of the State of Texas in America', context='Located in the south-central region of the United States, Texas has a rich and diverse history that spans thousands of years.', historical_timeline=['1519: Spanish explorer Francisco Vásquez de Coronado arrives in present-day Texas', '1821: Mexico gains independence from Spain, and Texas becomes part of the new nation', '1835-1836: The Texas Revolution leads to the establishment of the Republic of Texas', '1845: Texas is annexed by the United States and becomes the 28th state'])

In [15]:
structured_llm = llm_model.with_structured_output(ArticleResponse, method="json_schema")
structured_llm.invoke("Tell me the history of the state of Texas in America")

ArticleResponse(title='History of Texas', context='Statehood and Beyond', historical_timeline=['Pre-Columbian Era (10,000 BCE - 1528 CE)', 'Spanish Colonial Period (1528-1821)', 'Mexican Independence and Early Republic (1821-1836)', 'Texas Revolution and Statehood (1836-1845)', 'Republic of Texas and Annexation by the US (1845-1861)', 'Civil War and Reconstruction (1861-1877)', "Late 19th Century and World's Fair (1878-1900)", '20th Century and Oil Boom (1901-1945)', 'Modern Era and Contemporary Issues (1946-Present)'])

In [16]:
structured_llm = llm_model.with_structured_output(ArticleResponse, method="json_mode")
output_parser = PydanticOutputParser(pydantic_object=ArticleResponse)
format_instructions = output_parser.get_format_instructions()
structured_llm.invoke(
    f"Tell me the history of the state of Texas in America \n {format_instructions}"
)

OutputParserException: Failed to parse ArticleResponse from completion {"properties": {"title": {"description": "The title of the state and its history", "title": "Texas History", "type": "string"}, "context": {"description": "A brief historical context to answer the question.", "title": "Context", "type": "string"}, "historical_timeline": {"description": "Provide a list of historical events relevant to the question", "items": {"type": "string"}, "title": "Historical Timeline", "type": "array"}}, "required": ["title", "context", "historical_timeline"]}. Got: 3 validation errors for ArticleResponse
title
  Field required [type=missing, input_value={'properties': {'title': ... 'historical_timeline']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
context
  Field required [type=missing, input_value={'properties': {'title': ... 'historical_timeline']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
historical_timeline
  Field required [type=missing, input_value={'properties': {'title': ... 'historical_timeline']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE 

### Error handling


These methods can fail in several different ways.

To diagnose a failure, I find it useful to return the raw message alongside the parsed result, so that the LLM response can be inspected directly. This is done with `include_raw=True`.

The output is then a dict with `raw`, `parsed` and `parsing_error` keys, which we can check as follows:

- `output["parsing_error"]` is not `None` if there was a parsing error, most likely the output did not conform to the schema

- `output["parsed"]` is `None` if there was an error returning any output (most common with Method 1, function calling)


In [17]:
structured_llm = llm_model.with_structured_output(
    ArticleResponse, method="function_calling", include_raw=True
)
output = structured_llm.invoke("Tell me the history of the state of Texas in America")

if output["parsing_error"] is not None:
    print("Error: Parsing failed")
    print(output["parsing_error"])
    print("---")
    print("Raw output:")
    print(output["raw"])
elif output["parsed"] is None:
    print("Error: No output")
else:
    print("Success!")
    print(output["parsed"])

Success!
title='The History of Texas in America' context='Located in the south-central region of the United States, Texas has a rich and diverse history dating back to its early days as a Spanish colony.' historical_timeline=['1821: Texas becomes part of Mexico after gaining independence from Spain', '1836: The Texas Revolution leads to the establishment of the Republic of Texas', '1845: Texas joins the United States as the 28th state', '1860s: Texas plays a significant role in the American Civil War', '1945: World War II ends with the defeat of Nazi Germany and Imperial Japan']
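The same two checks can be wrapped in a small retry helper. This is only a sketch under stated assumptions: `invoke_with_retry` and `fake_invoke` are hypothetical names, and the helper assumes the callable it receives returns the `raw` / `parsed` / `parsing_error` dict produced with `include_raw=True`. A stand-in function replaces the real model so the example runs offline.

```python
from typing import Any, Callable


def invoke_with_retry(
    invoke: Callable[[str], dict],
    prompt: str,
    max_attempts: int = 3,
) -> Any:
    """Call `invoke` until it yields a parsed result or the attempts run out."""
    last_error: Any = None
    for _ in range(max_attempts):
        output = invoke(prompt)
        if output.get("parsing_error") is not None:
            # The model answered, but the answer did not fit the schema
            last_error = output["parsing_error"]
        elif output.get("parsed") is None:
            # The model returned nothing usable (e.g. no tool call)
            last_error = "no structured output returned"
        else:
            return output["parsed"]
    raise RuntimeError(f"No valid output after {max_attempts} attempts: {last_error}")


# Stand-in for structured_llm.invoke: fails once, then succeeds.
_responses = iter(
    [
        {"raw": "...", "parsed": None, "parsing_error": ValueError("bad JSON")},
        {"raw": "...", "parsed": {"title": "Texas"}, "parsing_error": None},
    ]
)


def fake_invoke(prompt: str) -> dict:
    return next(_responses)


result = invoke_with_retry(fake_invoke, "Tell me the history of Texas")
print(result)
```

In the notebook, the first argument would be `structured_llm.invoke` from the cell above. Whether retrying helps depends on the model and the temperature setting.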


We can also pass the JSON schema generated directly from the Pydantic model. In that case we get the raw dict output, without Pydantic validation.


In [19]:
structured_llm_js = llm_model.with_structured_output(
    ArticleResponse.model_json_schema(), method="function_calling"
)
structured_llm_js.invoke("Tell me the history of wombats")

{'context': 'Wombats are burrowing marsupials native to Australia.',
 'historical_timeline': ['Prehistoric Era: Wombats evolve from ancient marsupial ancestors',
  'Middle Ages: European settlers arrive in Australia and discover wombats',
  '19th Century: Wombats become an important food source for Australian colonizers',
  '20th Century: Conservation efforts begin to protect wombat habitats'],
 'title': 'Wombats History'}

### Which models support what?


In [22]:
llm_models = {
    "Anthropic_Haiku": ChatAnthropic(
        model="claude-3-5-haiku-20241022", api_key=claude_api_key
    ),
    "Ollama_llama32": ChatOllama(model="llama3.2", temperature=1),
    "Ollama_nemotron": ChatOllama(model="nemotron-mini", temperature=1),
    "Ollama_gemma2": ChatOllama(model="gemma2", temperature=1),
    "Ollama_phi3": ChatOllama(model="phi3", temperature=1),
    "Ollama_phi4": ChatOllama(model="phi4", temperature=1),
}

In [23]:
structured_support = {}

# Use a separate loop variable so the `llm_model` defined earlier is not overwritten
for model in llm_models.values():
    model_name = repr(model)
    print(f"Model: {model_name}")
    # Assume no support until a call succeeds
    ss_model = {"function_calling": False, "json_mode": False, "json_schema": False}
    try:
        structured_llm = model.with_structured_output(
            ArticleResponse.model_json_schema(), method="function_calling"
        )
        output = structured_llm.invoke("Tell the history of New Zealand")
        ss_model["function_calling"] = True
        print("  Tool use support")
    except Exception:
        print("  No tool use")

    try:
        structured_llm = model.with_structured_output(
            ArticleResponse.model_json_schema(), method="json_mode"
        )
        # JSON mode needs the schema spelled out in the prompt itself
        output_parser = PydanticOutputParser(pydantic_object=ArticleResponse)
        format_instructions = output_parser.get_format_instructions()
        output = structured_llm.invoke(
            f"Tell the history of New Zealand \n {format_instructions}"
        )
        ss_model["json_mode"] = True
        print("  JSON mode support")
    except Exception:
        print("  No JSON mode")

    try:
        structured_llm = model.with_structured_output(
            ArticleResponse.model_json_schema(), method="json_schema"
        )
        output = structured_llm.invoke("Tell the history of New Zealand")
        ss_model["json_schema"] = True
        print("  JSON schema support")
    except Exception:
        print("  No JSON schema")

    structured_support[model_name] = ss_model

Model: ChatAnthropic(model='claude-3-5-haiku-20241022', anthropic_api_url='https://api.anthropic.com', anthropic_api_key=SecretStr('**********'), model_kwargs={})
  Tool use support
  JSON mode support
  JSON schema support
Model: ChatOllama(model='llama3.2', temperature=1.0)
  Tool use support
  JSON mode support
  JSON schema support
Model: ChatOllama(model='gemma2', temperature=1.0)
  No tool use
  JSON mode support
  JSON schema support
Model: ChatOllama(model='phi3', temperature=1.0)
  No tool use
  JSON mode support
  JSON schema support
Model: ChatOllama(model='phi3', temperature=1.0)
  No tool use
  JSON mode support
  JSON schema support
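The `structured_support` dict collected by the loop is never displayed, so a compact summary can help when comparing models. The sketch below is a hypothetical formatter (`format_support_table` is not a LangChain function). It uses made-up model names rather than the recorded results, and treats a missing key as "no support".

```python
def format_support_table(support: dict[str, dict[str, bool]]) -> str:
    """Render one row per model and one column per structured-output method."""
    methods = ["function_calling", "json_mode", "json_schema"]
    header = ["model"] + methods
    rows = [header]
    for name, results in support.items():
        # A missing or False entry both count as "no"
        rows.append([name] + ["yes" if results.get(m) else "no" for m in methods])
    # Pad each column to the width of its longest cell
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


# Placeholder data in the same shape as `structured_support`
example = {
    "model_a": {"function_calling": True, "json_mode": True, "json_schema": True},
    "model_b": {"json_mode": True, "json_schema": True},
}
table = format_support_table(example)
print(table)
```

In the notebook this would be called as `print(format_support_table(structured_support))`. Since the keys there are full `repr` strings, shorter labels would make the table easier to read.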


#### Under the hood: How Pydantic models are converted to JSON Schema


The JSON Schema representation is quite straightforward:


In [None]:
Joke.model_json_schema()

{'description': 'Joke to tell user.',
 'properties': {'setup': {'description': 'The setup of the joke',
   'title': 'Setup',
   'type': 'string'},
  'punchline': {'description': 'The punchline to the joke',
   'title': 'Punchline',
   'type': 'string'},
  'rating': {'description': 'How funny the joke is, from 1 to 10',
   'title': 'Rating',
   'type': 'integer'}},
 'required': ['setup', 'punchline', 'rating'],
 'title': 'Joke',
 'type': 'object'}

Note that the same schema is contained in the format instructions, except for the top-level 'title' and 'type' keys.


In [None]:
from langchain_core.output_parsers import PydanticOutputParser

output_parser = PydanticOutputParser(pydantic_object=Joke)
print(output_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user.", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline to the joke", "title": "Punchline", "type": "string"}, "rating": {"description": "How funny the joke is, from 1 to 10", "title": "Rating", "type": "integer"}}, "required": ["setup", "punchline", "rating"]}
```
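The difference between the two outputs can be reproduced with a few lines of plain Python. This is a sketch of the visible effect, assuming only the top-level `title` and `type` keys are removed. It is not a copy of the parser's implementation, and `reduce_schema` is a hypothetical name. The schema dict is copied from the `Joke.model_json_schema()` output above.

```python
import json

# Copied from the Joke.model_json_schema() output
joke_schema = {
    "description": "Joke to tell user.",
    "properties": {
        "setup": {
            "description": "The setup of the joke",
            "title": "Setup",
            "type": "string",
        },
        "punchline": {
            "description": "The punchline to the joke",
            "title": "Punchline",
            "type": "string",
        },
        "rating": {
            "description": "How funny the joke is, from 1 to 10",
            "title": "Rating",
            "type": "integer",
        },
    },
    "required": ["setup", "punchline", "rating"],
    "title": "Joke",
    "type": "object",
}


def reduce_schema(schema: dict) -> dict:
    """Drop the top-level 'title' and 'type' keys; nested keys are left alone."""
    return {key: value for key, value in schema.items() if key not in ("title", "type")}


reduced = reduce_schema(joke_schema)
print(json.dumps(reduced))
```

The per-field `title` and `type` entries inside `properties` survive, which matches the schema shown in the format instructions.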
