## Structured Output

Structured output is a LangChain feature that lets you define a schema for a model's output. The schema guides the model toward responses in the expected format, and it lets you validate each response against that format before your code relies on it.



### Pydantic Models

A Pydantic model defines the output schema as a Python class built with the Pydantic library: each field declares a name, a type, and optionally a description. When such a class is passed to `with_structured_output`, the model's response comes back as a validated instance of that class.
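As a minimal sketch of what the schema provides, independent of any model call: the `MovieSketch` class and the hand-written payloads below are hypothetical and exist only for illustration. Pydantic coerces compatible values into the declared type and raises `ValidationError` when a value cannot be converted.

```python
from pydantic import BaseModel, Field, ValidationError


class MovieSketch(BaseModel):
    """Hypothetical schema used only for this illustration."""
    name: str = Field(description="The name of the movie")
    year: int = Field(description="The year the movie was released")


# A well-formed payload: the numeric string "2015" is coerced to the int 2015.
good = MovieSketch(name="Example Film", year="2015")

# A malformed payload: "unknown" cannot be parsed as an integer.
try:
    MovieSketch(name="Example Film", year="unknown")
    validation_failed = False
except ValidationError:
    validation_failed = True
```

The same check runs on whatever the model returns, so a response that does not fit the schema surfaces as an error instead of silently passing through.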

In [1]:
import os
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model

load_dotenv()

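# load_dotenv() has already exported GROQ_API_KEY; fail early if it is missing
if not os.getenv('GROQ_API_KEY'):
    raise RuntimeError('GROQ_API_KEY is not set; add it to your .env file')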

model = init_chat_model('groq:qwen/qwen3-32b')
model


ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 16384, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x000001768D19CC20>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001768D19D6A0>, model_name='qwen/qwen3-32b', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [2]:
from pydantic import BaseModel, Field

class Movie(BaseModel):
    name: str = Field(description="The name of the movie")
    director: str = Field(description='The director of the movie')
    year: int = Field(description="The year the movie was released")
    rating: float = Field(description="The rating of the movie on a scale of 1 to 10")
    genre: str = Field(description="The genre of the movie")


In [3]:
model_with_structure = model.with_structured_output(Movie)

model_with_structure

RunnableBinding(bound=ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 16384, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x000001768D19CC20>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001768D19D6A0>, model_name='qwen/qwen3-32b', model_kwargs={}, groq_api_key=SecretStr('**********')), kwargs={'tools': [{'type': 'function', 'function': {'name': 'Movie', 'description': '', 'parameters': {'properties': {'name': {'description': 'The name of the movie', 'type': 'string'}, 'director': {'description': 'The director of the movie', 'type': 'string'}, 'year': {'description': 'The year the movie was released', 'type': 'integer'}, 'rating': {'description': 'The rating of the movie on a scale of 1 to 10', 'type': 'number'}, 'genre': {'descri

In [4]:
response = model.invoke('Provide details about the movie Bahubali')

print(response.content)  # returns the full answer as free text, not in the structured format


<think>
Okay, so I need to provide details about the movie Bahubali. Let me start by recalling what I know. Bahubali is a big Indian film, part of a two-part series. I think it's directed by S.S. Rajamouli, who's known for his epic films. The title might be a mix of Telugu and Sanskrit words, but I'm not sure exactly what it means. The main characters are probably named Bahubali and maybe his family, perhaps some rivalry between two brothers.

The first part, Bahubali: The Beginning, was released in 2015. The sequel, Bahubali 2: The Conclusion, came out in 2017. Both were massive hits in India and internationally. The film is in Telugu but was dubbed into other languages like Hindi, Tamil, and Malayalam. It's a mythological drama with elements of action, romance, and political intrigue. The story is set in a fictional kingdom called Mahishmati.

The lead actors are Prabhas and Rana Daggubati. Prabhas plays the protagonist, Bahubali, while Rana Daggubati plays his half-brother Bhallala 

In [5]:
response = model_with_structure.invoke('Provide details about the movie Bahubali')

response

Movie(name='Bahubali', director='S.S. Rajamouli', year=2015, rating=8.5, genre='Action, Epic')

### Message Output Structure

In [6]:
class Movie(BaseModel):
    name: str = Field(..., description="The name of the movie")
    director: str = Field(..., description='The director of the movie')
    year: int = Field(..., description="The year the movie was released")
    rating: float = Field(..., description="The rating of the movie on a scale of 1 to 10")
    genre: str = Field(..., description="The genre of the movie")

model_with_structure = model.with_structured_output(Movie, include_raw=True)

response = model_with_structure.invoke('Provide the details of the movie RRR and its famous song')

response

{'raw': AIMessage(content='', additional_kwargs={'reasoning_content': 'Okay, the user is asking for details about the movie RRR and its famous song. Let me start by recalling what I know about RRR. It\'s a 2022 Indian film directed by SS Rajamouli. The genre is action and drama, and it\'s known for its high-octane sequences. The main actors are Ram Charan and Jr. NTR. Now, the user also mentioned the famous song. RRR has a song called "Naatu Naatu" which became very popular.\n\nWait, the tools provided require specific parameters: name, director, year, rating, genre. The user wants the movie details and the famous song. But the function is for the movie, not the song. So I need to focus on the movie\'s details. The song might be part of the movie\'s description, but the function doesn\'t have a parameter for songs. Hmm, maybe just include the song in the genre or another field if allowed. But according to the function\'s required parameters, it\'s name, director, year, rating, genre. T
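With `include_raw=True` the call returns a dictionary rather than a bare `Movie`. According to the LangChain documentation, its keys are `raw` (the original `AIMessage`), `parsed` (the schema instance, or `None` on failure) and `parsing_error`. The sketch below shows one way to consume that shape. The helper name `unwrap_structured` and the stand-in dictionaries are assumptions for illustration, not part of LangChain.

```python
from typing import Any


def unwrap_structured(result: dict[str, Any]) -> Any:
    """Return the parsed object, or raise if parsing failed."""
    error = result.get("parsing_error")
    if error is not None:
        raise ValueError(f"structured output could not be parsed: {error}")
    return result["parsed"]


# Stand-in dictionaries (assumptions) shaped like the include_raw=True result.
ok_result = {"raw": "<AIMessage ...>", "parsed": {"name": "RRR"}, "parsing_error": None}
bad_result = {"raw": "<AIMessage ...>", "parsed": None, "parsing_error": "missing field 'year'"}

parsed = unwrap_structured(ok_result)
```

Keeping `raw` around is useful for logging token usage or inspecting the reasoning content when `parsed` turns out to be empty.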

### Nested Structured Output

In [7]:
from pydantic import BaseModel, Field

class Actor(BaseModel):
    name: str
    role: str

class MovieDetails(BaseModel):
    title: str
    year: int
    cast: list[Actor]
    genres: list[str]
    budget: float | None = Field(None, description="Budget in millions USD")

model_with_structure = model.with_structured_output(MovieDetails)

response = model_with_structure.invoke('Provide details of the movie "Bahubali"')

response

MovieDetails(title='Bahubali', year=2015, cast=[Actor(name='Prabhas', role='Bahubali'), Actor(name='Rana Daggubati', role='Bhallala Deva'), Actor(name='Tamannaah', role='Avanthika'), Actor(name='Sushmita Sen', role='Devasena')], genres=['Action', 'Adventure', 'Fantasy'], budget=100.0)
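The nested result above contains real `Actor` instances, not plain dictionaries. A minimal sketch of that behaviour without a model call, assuming a hand-written payload in place of the model's tool-call arguments:

```python
from typing import Optional

from pydantic import BaseModel


class Actor(BaseModel):
    name: str
    role: str


class MovieDetails(BaseModel):
    title: str
    year: int
    cast: list[Actor]
    genres: list[str]
    budget: Optional[float] = None


# Hand-written payload (an assumption), shaped like a tool-call argument dict.
payload = {
    "title": "Example Film",
    "year": 2015,
    "cast": [{"name": "Actor One", "role": "Lead"}],
    "genres": ["Action"],
}

details = MovieDetails(**payload)
first_actor = details.cast[0]  # already an Actor instance, not a dict
```

Because `budget` has a default of `None`, the payload may omit it. Every other field is required.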

### TypedDict
TypedDict provides a simpler alternative using Python’s built-in typing, ideal when you don’t need runtime validation.

In [8]:
from typing_extensions import TypedDict, Annotated

class MovieDict(TypedDict):
    """A movie with details"""
    title: Annotated[str, ..., 'The title of the movie']
    director: Annotated[str, ..., 'The director of the movie']
    year: Annotated[int, ..., 'The year the movie was released']
    rating: Annotated[float, ..., 'The rating of the movie']

model_with_typedict = model.with_structured_output(MovieDict)

response = model_with_typedict.invoke("Provide me the detail of the movie RRR")
response

{'director': 'SS Rajamouli', 'rating': 8.5, 'title': 'RRR', 'year': 2022}
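The trade-off with `TypedDict` is that nothing checks the values at runtime. The sketch below uses the standard-library `typing` module (the notebook imports the same names from `typing_extensions`) and a hand-written dictionary, both chosen for illustration.

```python
from typing import Annotated, TypedDict, get_type_hints


class MovieDict(TypedDict):
    """A movie with details"""
    title: Annotated[str, ..., "The title of the movie"]
    year: Annotated[int, ..., "The year the movie was released"]


# No error is raised here even though `year` is not an int:
# at runtime a TypedDict is a plain dict.
unchecked: MovieDict = {"title": "RRR", "year": "twenty twenty-two"}

# The Annotated metadata (default marker and description) stays introspectable.
hints = get_type_hints(MovieDict, include_extras=True)
year_metadata = hints["year"].__metadata__
```

A static type checker would flag the bad `year`, but the program itself runs unchanged, which is why a Pydantic model is the safer choice when the output feeds further code.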

In [9]:
class Actor(TypedDict):
    name: str
    role: str
    description: str

class MovieDetails(TypedDict):
    title: str
    year: int
    actors: list[Actor]
    genres: list[str]
    # A TypedDict field cannot take a pydantic Field default; use Annotated[type, default, description]
    budget: Annotated[float | None, None, "Budget in millions USD"]

model_with_structure = model.with_structured_output(MovieDetails)

model_with_structure.invoke("Provide the details of the movie RRR")

{'actors': [{'description': "A tribal leader and revolutionary in the movie's dual narrative.",
   'name': 'Ram Charan',
   'role': 'Komaram Bheem'},
  {'description': 'A freedom fighter and key figure in the historical events depicted.',
   'name': 'Jr. NTR',
   'role': 'Alluri Sitarama Raju'},
  {'description': "A woman whose life intertwines with both main characters' stories.",
   'name': 'Alia Bhatt',
   'role': 'Sita'}],
 'budget': 45000000,
 'genres': ['Action', 'Drama', 'History'],
 'title': 'RRR',
 'year': 2022}

In [10]:
model.profile

{'max_input_tokens': 131072,
 'max_output_tokens': 16384,
 'image_inputs': False,
 'audio_inputs': False,
 'video_inputs': False,
 'image_outputs': False,
 'audio_outputs': False,
 'video_outputs': False,
 'reasoning_output': True,
 'tool_calling': True}

### DataClasses
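A data class is a class that mainly holds data, although nothing restricts it to that. You create one with the `@dataclass` decorator from the standard-library `dataclasses` module. The cells below pass the same `ContactInfo` schema to `create_agent` through `response_format`, first as a Pydantic model, then as a TypedDict, and finally as a dataclass.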

In [12]:
from pydantic import BaseModel, Field
from langchain.agents import create_agent
from langchain_groq import ChatGroq

class ContactInfo(BaseModel):
    name: str
    email: str
    phone: str

llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0
)

agent = create_agent(
    model=llm,
    response_format=ContactInfo
)

result = agent.invoke({
    "messages": [
        {
            "role": "user",
            "content": "Extract contact info from: Mohan Das, mohan.das@example.com, 1234567890"
        }
    ]
})

result

{'messages': [HumanMessage(content='Extract contact info from: Mohan Das, mohan.das@example.com, 1234567890', additional_kwargs={}, response_metadata={}, id='cf2faa8f-033c-43bc-8146-89986fb5072a'),
  AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'kex53vj0h', 'function': {'arguments': '{"email":"mohan.das@example.com","name":"Mohan Das","phone":"1234567890"}', 'name': 'ContactInfo'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 251, 'total_tokens': 285, 'completion_time': 0.039916421, 'completion_tokens_details': None, 'prompt_time': 0.020508501, 'prompt_tokens_details': None, 'queue_time': 0.056624769, 'total_time': 0.060424922}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_ff2b098aaf', 'service_tier': 'on_demand', 'finish_reason': 'tool_calls', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019c07a9-12c7-7ac0-be0e-7d1456b7363c-0', tool_calls=[{'name': 'ContactInfo', 'args': {'email': '

In [13]:
result['structured_response']

ContactInfo(name='Mohan Das', email='mohan.das@example.com', phone='1234567890')

In [14]:
# TypedDict
from typing_extensions import TypedDict
from langchain.agents import create_agent
from langchain_groq import ChatGroq

class ContactInfo(TypedDict):
    """Contact information for a person."""

    name: str
    email: str
    phone: str

# llm = ChatGroq(
#     model='llama-3.1-8b-instant',
#     temperature=0
# )

agent = create_agent(
    model=llm,
    response_format=ContactInfo
)


result = agent.invoke({
    "messages": [
        {
            "role": "user",
            "content": "Extract contact info from: Mohan Das, mohan.das@example.com, 1234567890"
        }
    ]
})


result

{'messages': [HumanMessage(content='Extract contact info from: Mohan Das, mohan.das@example.com, 1234567890', additional_kwargs={}, response_metadata={}, id='9a426e41-9819-45da-9d40-1dbaf6a03168'),
  AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'brc2czen7', 'function': {'arguments': '{"email":"mohan.das@example.com","name":"Mohan Das","phone":"1234567890"}', 'name': 'ContactInfo'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 265, 'total_tokens': 299, 'completion_time': 0.033893887, 'completion_tokens_details': None, 'prompt_time': 0.021507042, 'prompt_tokens_details': None, 'queue_time': 0.056164067, 'total_time': 0.055400929}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_f757f4b0bf', 'service_tier': 'on_demand', 'finish_reason': 'tool_calls', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019c07a9-22fd-7af1-a91d-19a5a2c62c01-0', tool_calls=[{'name': 'ContactInfo', 'args': {'email': '

In [15]:
result['structured_response']

{'name': 'Mohan Das', 'email': 'mohan.das@example.com', 'phone': '1234567890'}

In [16]:
# Dataclass

from dataclasses import dataclass
from langchain.agents import create_agent

@dataclass
class ContactInfo:
    name: str
    email: str
    phone: str

agent = create_agent(
    model=llm,
    response_format=ContactInfo
)



result = agent.invoke({
    "messages": [
        {
            "role": "user",
            "content": "Extract contact info from: Mohan Das, mohan.das@example.com, 1234567890"
        }
    ]
})

result['structured_response']

ContactInfo(name='Mohan Das', email='mohan.das@example.com', phone='1234567890')
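Once the agent returns a dataclass instance, the standard library can inspect and serialize it. The sketch below builds the instance by hand as a stand-in for `result['structured_response']`. Note that dataclasses, like TypedDicts, do not validate field types at runtime.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class ContactInfo:
    name: str
    email: str
    phone: str


# Hand-built instance standing in for result['structured_response'].
contact = ContactInfo(name="Mohan Das", email="mohan.das@example.com", phone="1234567890")

field_names = [f.name for f in fields(ContactInfo)]  # declared field order
as_dict = asdict(contact)                            # plain dict copy of the data
as_json = json.dumps(as_dict)                        # ready to store or send
```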