# Structured Output in LangChain

**Purpose:**  
Demonstrate how to request and enforce structured model output using explicit schemas, enabling deterministic parsing, validation, and downstream automation.


## Structured output

Structured output constrains the model to return responses that conform to a predefined schema.  
This enables:
- Reliable parsing into typed objects,
- Automatic validation,
- Safe integration into pipelines and agents.

LangChain supports structured output via multiple schema representations:
- Pydantic models,
- TypedDict,
- Dataclasses.

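The options trade expressiveness for simplicity: Pydantic is the most capable, while TypedDict and dataclasses are lighter but perform no runtime validation.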

## Pydantic schemas

Pydantic provides runtime validation, type coercion, nested models, and rich field metadata.
It is the preferred choice when correctness and structure are critical.
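
Validation and coercion can be observed without calling a model. The following is a minimal sketch, assuming Pydantic's default (lax) mode; the `Movie` class is a trimmed copy of the schema used in this notebook, and the input values are made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class Movie(BaseModel):
    title: str = Field(description="The title of the movie")
    year: int = Field(description="The year the movie was released")
    rating: float = Field(description="The movie's rating out of 10")


# In lax mode, numeric strings are coerced to the declared types.
movie = Movie(title="Inception", year="2010", rating="8.8")
print(type(movie.year).__name__, type(movie.rating).__name__)

# Values that cannot be coerced raise ValidationError.
try:
    Movie(title="Inception", year="twenty ten", rating=8.8)
    failed = False
except ValidationError as exc:
    failed = True
    print(f"{len(exc.errors())} validation error(s)")
```

The same checks apply when LangChain parses a model's tool-call arguments into the schema, which is why malformed output surfaces as an error rather than as silently wrong data.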

In [1]:
import os
from langchain.chat_models import init_chat_model

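# Fail early with a clear message if the API key is not set in the environment
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("Set the GROQ_API_KEY environment variable first")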
model = init_chat_model("groq:qwen/qwen3-32b")
model

ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 16384, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x000001FCE9F29940>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001FCE9F2A660>, model_name='qwen/qwen3-32b', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [2]:
from pydantic import BaseModel, Field

class Movie(BaseModel):
    title: str = Field(description="The title of the movie")
    year: int = Field(description="The year the movie was released")
    director: str = Field(description="The director of the movie")
    rating: float = Field(description="The movie's rating out of 10")

In [3]:
model_with_structure = model.with_structured_output(Movie)
model_with_structure

RunnableBinding(bound=ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 16384, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x000001FCE9F29940>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001FCE9F2A660>, model_name='qwen/qwen3-32b', model_kwargs={}, groq_api_key=SecretStr('**********')), kwargs={'tools': [{'type': 'function', 'function': {'name': 'Movie', 'description': '', 'parameters': {'properties': {'title': {'description': 'The title of the movie', 'type': 'string'}, 'year': {'description': 'The year the movie was released', 'type': 'integer'}, 'director': {'description': 'The director of the movie', 'type': 'string'}, 'rating': {'description': "The movie's rating out of 10", 'type': 'number'}}, 'required': ['title', 'year', '

In [4]:
# Unstructured response (baseline)
model.invoke("Provide details about the movie Inception")

AIMessage(content='<think>\nOkay, so I need to provide details about the movie Inception. Let me start by recalling what I know about it. It\'s a 2010 film directed by Christopher Nolan, right? The main actor is Leonardo DiCaprio. The title, Inception, refers to the concept of planting an idea into someone\'s mind, which is the main plot. \n\nThe story is about a thief who enters people\'s dreams to steal information. His name is Dom Cobb. He\'s really good at this but has a criminal past. There\'s a group of professionals who help him with the dream heists. One of them is Arthur, played by Joseph Gordon-Levitt, who is the right-hand man. Then there\'s Ariadne, played by Ellen Page, who\'s an architect designing the dream spaces. The concept of Inception is the opposite of extraction; instead of stealing an idea, you plant one. \n\nThe antagonist is Mal, played by Marion Cotillard, who is Cobb\'s wife. There\'s a lot about her being a projection in the dream world. The idea that Cobb c

In [5]:
# Structured response parsed into a Movie instance
response = model_with_structure.invoke("Provide details about the movie Inception")
response

Movie(title='Inception', year=2010, director='Christopher Nolan', rating=8.8)

## Including raw model output

Setting `include_raw=True` returns a dictionary holding the original model message under `raw`, the parsed object under `parsed`, and any parsing exception under `parsing_error`.
This is useful for debugging, auditing, and handling parsing failures.

In [6]:
class Movie(BaseModel):
    """A movie with details."""
    title: str = Field(..., description="The title of the movie")
    year: int = Field(..., description="The year the movie was released")
    director: str = Field(..., description="The director of the movie")
    rating: float = Field(..., description="The movie's rating out of 10")

model_with_structure = model.with_structured_output(Movie, include_raw=True)
response = model_with_structure.invoke("Provide details about the movie Inception")
response


{'raw': AIMessage(content='', additional_kwargs={'reasoning_content': "Okay, the user is asking for details about the movie Inception. Let me check the tools available. There's a Movie function that requires title, year, director, and rating. I need to make sure I have all that information.\n\nFirst, the title is Inception. The year it was released was 2010. The director is Christopher Nolan. As for the rating, I think it's around 8.8 on IMDb. Let me confirm that. Yes, IMDb gives it 8.8/10. \n\nSo all the required parameters are there. I should structure the tool call with these details. Make sure the JSON is correctly formatted with the right data types: year as integer, rating as a number. No need for any optional parameters here. Alright, that should cover it.\n", 'tool_calls': [{'id': 'yhrbwdtmx', 'function': {'arguments': '{"director":"Christopher Nolan","rating":8.8,"title":"Inception","year":2010}', 'name': 'Movie'}, 'type': 'function'}]}, response_metadata={'token_usage': {'com
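
A result returned with `include_raw=True` carries the keys `raw`, `parsed`, and `parsing_error`, so downstream code can decide what to do when parsing fails. The sketch below shows one way to handle that; `unwrap_parsed` is a hypothetical helper name, and the two dictionaries are hand-built stand-ins rather than real model outputs.

```python
from typing import Any


def unwrap_parsed(result: dict[str, Any]) -> Any:
    """Return result["parsed"], or raise with the raw output attached.

    Hypothetical helper; assumes the dict shape produced by
    with_structured_output(..., include_raw=True).
    """
    error = result.get("parsing_error")
    if error is not None:
        raise ValueError(f"Parsing failed; raw output: {result.get('raw')!r}") from error
    if result.get("parsed") is None:
        raise ValueError(f"No structured output; raw output: {result.get('raw')!r}")
    return result["parsed"]


# Stand-in results (assumed shape, not real model responses).
ok = {"raw": "<AIMessage>", "parsed": {"title": "Inception"}, "parsing_error": None}
bad = {"raw": "<AIMessage>", "parsed": None, "parsing_error": ValueError("bad JSON")}

print(unwrap_parsed(ok))
try:
    unwrap_parsed(bad)
except ValueError as exc:
    print("failed:", exc)
```

Keeping the raw message in the error makes it possible to log exactly what the model produced when the schema was not satisfied.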

## Nested schemas

Schemas can be arbitrarily nested, allowing hierarchical data extraction.

In [7]:
class Actor(BaseModel):
    name: str
    role: str

class MovieDetails(BaseModel):
    title: str
    year: int
    cast: list[Actor]
    genres: list[str]
    budget: float | None = Field(None, description="Budget in millions USD")

model_with_structure = model.with_structured_output(MovieDetails)
response = model_with_structure.invoke("Provide details about the movie Inception")
response


MovieDetails(title='Inception', year=2010, cast=[Actor(name='Leonardo DiCaprio', role='Dom Cobb'), Actor(name='Joseph Gordon-Levitt', role='Arthur'), Actor(name='Ellen Page', role='Ariadne'), Actor(name='Tom Hardy', role='Balthazar')], genres=['Action', 'Sci-Fi', 'Thriller'], budget=160.0)
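
Nested validation can also be checked offline. This is a minimal sketch, assuming Pydantic v2 (`model_validate`); the payload is hand-written to stand in for a model's tool-call arguments, and `Optional[float]` is used in place of `float | None` only to stay compatible with older Python versions.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Actor(BaseModel):
    name: str
    role: str


class MovieDetails(BaseModel):
    title: str
    year: int
    cast: list[Actor]
    genres: list[str]
    budget: Optional[float] = Field(None, description="Budget in millions USD")


# Hand-written payload (illustrative, not a real model response).
payload = {
    "title": "Inception",
    "year": 2010,
    "cast": [{"name": "Leonardo DiCaprio", "role": "Dom Cobb"}],
    "genres": ["Sci-Fi"],
}

details = MovieDetails.model_validate(payload)

# Nested dicts are converted into Actor instances.
print(type(details.cast[0]).__name__)
# The omitted optional field falls back to its default.
print(details.budget)
```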

## TypedDict schemas

TypedDict provides lightweight type hints without runtime validation.
Use this when structure matters but strict validation is unnecessary.

In [8]:
from typing_extensions import TypedDict, Annotated

class MovieDict(TypedDict):
    title: Annotated[str, ..., "The title of the movie"]
    year: Annotated[int, ..., "The year the movie was released"]
    director: Annotated[str, ..., "The director of the movie"]
    rating: Annotated[float, ..., "The movie's rating out of 10"]

model_with_typedict = model.with_structured_output(MovieDict)
response = model_with_typedict.invoke("Provide details about the movie Avengers")
response


{'director': 'Joss Whedon', 'rating': 8, 'title': 'The Avengers', 'year': 2012}
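
In the output above, `rating` comes back as the integer `8` even though the field is declared as `float`: a TypedDict result is a plain `dict`, and nothing coerces or checks its values. When a light check is still wanted, the required keys can be compared by hand. The sketch below uses only the standard library; `missing_keys` is a hypothetical helper, and the sample dictionaries are made up.

```python
from typing import TypedDict


class MovieDict(TypedDict):
    title: str
    year: int
    director: str
    rating: float


def missing_keys(data: dict, schema: type) -> set[str]:
    """Return the required keys of a TypedDict class that are absent from data."""
    return set(schema.__required_keys__) - set(data)


# A TypedDict is a plain dict at runtime; the string "2012" is not rejected.
result: MovieDict = {
    "title": "The Avengers",
    "year": "2012",
    "director": "Joss Whedon",
    "rating": 8,
}
print(type(result).__name__)
print(missing_keys(result, MovieDict))
print(missing_keys({"title": "The Avengers"}, MovieDict))
```

A static type checker would flag the wrong value types, but at runtime only a manual check like this one (or a switch to Pydantic) catches them.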

In [9]:
class ActorDict(TypedDict):
    name: str
    role: str

class MovieDetailsDict(TypedDict):
    title: str
    year: int
    cast: list[ActorDict]
    genres: list[str]
    budget: float | None

model_with_structure = model.with_structured_output(MovieDetailsDict)
response = model_with_structure.invoke("Provide details about the movie Avengers")
response


{'budget': 220000000,
 'cast': [{'name': 'Robert Downey Jr.', 'role': 'Iron Man'},
  {'name': 'Chris Evans', 'role': 'Captain America'},
  {'name': 'Mark Ruffalo', 'role': 'Hulk'},
  {'name': 'Chris Hemsworth', 'role': 'Thor'},
  {'name': 'Scarlett Johansson', 'role': 'Black Widow'},
  {'name': 'Jeremy Renner', 'role': 'Hawkeye'}],
 'genres': ['Action', 'Sci-Fi', 'Adventure'],
 'title': 'Avengers',
 'year': 2012}

## Structured output with agents

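Agents can enforce a response schema directly: pass the schema as `response_format` to `create_agent`, then read the parsed object from the `structured_response` key of the result.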

In [13]:
from pydantic import BaseModel, Field
from langchain.agents import create_agent

class ContactInfo(BaseModel):
    name: str = Field(description="The name of the person")
    email: str = Field(description="The email address of the person")
    phone: str = Field(description="The phone number of the person")

agent = create_agent(
    model=model,
    response_format=ContactInfo
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "Extract contact info from: John Doe, john@example.com, (555) 123-4567"}]
})

result["structured_response"]


ContactInfo(name='John Doe', email='john@example.com', phone='(555) 123-4567')

## TypedDict with agents

In [14]:
from typing_extensions import TypedDict
from langchain.agents import create_agent

class ContactInfoDict(TypedDict):
    name: str
    email: str
    phone: str

agent = create_agent(
    model=model,
    response_format=ContactInfoDict
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "Extract contact info from: John Doe, john@example.com, (555) 123-4567"}]
})

result["structured_response"]


{'name': 'John Doe', 'email': 'john@example.com', 'phone': '(555) 123-4567'}

## Dataclasses

Dataclasses provide simple, typed containers without validation or coercion.
They are useful for lightweight structured output.

In [15]:
from dataclasses import dataclass
from langchain.agents import create_agent

@dataclass
class ContactInfo:
    name: str
    email: str
    phone: str

agent = create_agent(
    model=model,
    response_format=ContactInfo
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "Extract contact info from: John Doe, john@example.com, (555) 123-4567"}]
})

result["structured_response"]


ContactInfo(name='John Doe', email='john@example.com', phone='(555) 123-4567')
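
The lack of validation in a plain dataclass is easy to demonstrate with the standard library alone. This is a minimal sketch with made-up values; it also shows `dataclasses.asdict`, which is one way to hand the result to code that expects a dictionary.

```python
from dataclasses import asdict, dataclass


@dataclass
class ContactInfo:
    name: str
    email: str
    phone: str


# Type hints are not enforced: an int is stored for `phone` unchanged.
contact = ContactInfo(name="John Doe", email="john@example.com", phone=5551234567)
print(type(contact.phone).__name__)

# asdict() converts the instance into a plain dict, e.g. for JSON serialization.
print(asdict(contact))
```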

## Summary

- Structured output enforces schema-conformant responses from models.
- Pydantic provides validation and rich metadata.
- TypedDict offers lightweight typing without validation.
- Dataclasses provide simple containers with minimal overhead.
- Agents can enforce structured output as part of execution.