# **Structured Outputs**

For many applications, such as chatbots, models need to respond to users directly in natural language. However, there are scenarios where we need models to output in a structured format. For example, we might want to store the model output in a database and ensure that the output conforms to the database schema. This need motivates the concept of structured output, where models can be instructed to respond with a particular output structure.

## **Output Parser**
Often we need the output of an LLM in a particular format, for example a Python `datetime` object or a JSON object. LangChain comes with parser utilities that let you convert model output into precise data types, or even into an instance of your own custom class with Pydantic.

Output parsers are responsible for taking the output of an LLM and transforming it into a more suitable format. This is very useful when you are using LLMs to generate any form of structured data.

An output parser exposes two key methods, plus one optional method:
- `get_format_instructions()`: returns a string containing instructions for how the output of a language model should be formatted.
- `parse()`: takes in a string (assumed to be the response from a language model) and parses it into some structure.
- `parse_with_prompt()` (optional): takes in a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated such a response) and parses it into some structure. The prompt is provided mainly in case the output parser wants to retry or fix the output in some way and needs information from the prompt to do so.
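
To make the two required methods concrete, here is a minimal sketch in plain Python. `YesNoParser` is a hypothetical class written for illustration only; it does not subclass any LangChain base class and simply mirrors the method names described above.

```python
class YesNoParser:
    """Hypothetical parser that turns a yes/no answer into a bool."""

    def get_format_instructions(self) -> str:
        # Text meant to be embedded in the prompt sent to the model.
        return "Answer with a single word: `yes` or `no`."

    def parse(self, text: str) -> bool:
        # Normalize the raw model reply before interpreting it.
        cleaned = text.strip().strip(".!").lower()
        if cleaned == "yes":
            return True
        if cleaned == "no":
            return False
        raise ValueError(f"Expected 'yes' or 'no', got: {text!r}")


parser = YesNoParser()
instructions = parser.get_format_instructions()
answer = parser.parse(" Yes.\n")  # whitespace and punctuation are tolerated

print(instructions)
print(answer)
```

The format instructions go into the prompt, and `parse()` is applied to whatever the model sends back. Raising an error on unexpected text is one possible design choice; it makes malformed replies visible instead of silently passing them on.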

Output Parser Types:
- Comma Separated List Parser
- Datetime Parser
- JSON Parser
- Pydantic Parser
- and others

**Question: What is Pydantic?**  
Pydantic is a library for defining data models, validating data against them, and coercing values to the declared types.  
Coercion in Pydantic refers to its ability to automatically convert input data into the types specified in the model, as long as the conversion is reasonable. For example, the string `"42"` can be converted to the integer `42` for a field declared as `int`.
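
The idea of coercion can be imitated with the standard library alone. The sketch below is a hand-rolled stand-in, not Pydantic itself: `Order` is a hypothetical dataclass that converts each field to its annotated type on construction. Pydantic performs this kind of conversion for you and applies its own, more careful rules about which conversions count as reasonable.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Order:
    # Hypothetical record; the annotations double as the target types.
    quantity: int
    price: float

    def __post_init__(self):
        hints = get_type_hints(type(self))
        # Convert each field to its annotated type if it is not one already.
        for f in fields(self):
            target = hints[f.name]
            value = getattr(self, f.name)
            if not isinstance(value, target):
                setattr(self, f.name, target(value))


order = Order(quantity="3", price="9.5")  # strings are converted
print(order)

try:
    Order(quantity="three", price="9.5")  # cannot be converted to int
except ValueError as exc:
    print(f"Rejected: {exc}")
```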

## **Comma Separated List Parser**

This output parser can be used when you want to return a list of comma-separated items.

In [1]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

csv_output_parser = CommaSeparatedListOutputParser()



In [2]:
# As discussed above, let's experiment with get_format_instructions()

csv_output_parser.get_format_instructions()

'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'

In [3]:
# prompt -> generate a list of modules one must study to become a data scientist

example_input = "Python, DA, SQL, ML, DL"

print(type(example_input))

<class 'str'>


In [4]:
example_input = "Python, DA, SQL, ML, DL"

# using parse() method
parsed_output = csv_output_parser.parse(example_input)

print(type(parsed_output))
print(parsed_output)

<class 'list'>
['Python', 'DA', 'SQL', 'ML', 'DL']


## **Building an AI Powered Chef Assistant using CommaSeparatedListOutputParser**

In [5]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

output_parser = CommaSeparatedListOutputParser()

In [6]:
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate(
                messages=[
                      ("system", """You are a helpful AI Chef Assistant. 
                      Given a dish name by user, you can provide the ingredients to prepare the dish.
                      Output Format Instructions:
                      {output_format_instructions}"""),
                      ("human", "Give me the list of ingredients for cooking {dish_name}."),
                  ],
                partial_variables={"output_format_instructions": output_parser.get_format_instructions()}
)

prompt_template

ChatPromptTemplate(input_variables=['dish_name'], input_types={}, partial_variables={'output_format_instructions': 'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['output_format_instructions'], input_types={}, partial_variables={}, template='You are a helpful AI Chef Assistant. \n                      Given a dish name by user, you can provide the ingredients to prepare the dish.\n                      Output Format Instructions:\n                      {output_format_instructions}'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['dish_name'], input_types={}, partial_variables={}, template='Give me the list of ingredients for cooking {dish_name}.'), additional_kwargs={})])

In [7]:
# Import Google ChatModel
from langchain_google_genai import ChatGoogleGenerativeAI

# Read the API key from a local file and strip any trailing newline
with open('keys/.gemini.txt') as f:
    GOOGLE_API_KEY = f.read().strip()

# Set the Google API key and initialize a ChatModel
chat_model = ChatGoogleGenerativeAI(api_key=GOOGLE_API_KEY, model="gemini-2.0-flash-exp")



In [8]:
chain = prompt_template | chat_model | output_parser
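
The `|` operator in the cell above composes the three stages so that each one receives the output of the previous one. The following sketch imitates that behaviour with plain Python and no API key. `Step` and the three `fake_*` objects are hypothetical stand-ins written for illustration; they are not LangChain classes, and the "model" returns a canned reply.

```python
class Step:
    """Hypothetical wrapper that lets plain functions be chained with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-ins for prompt, model and parser.
fake_prompt = Step(
    lambda inputs: f"Give me the list of ingredients for cooking {inputs['dish_name']}."
)
fake_model = Step(lambda prompt_text: "rice, salt, water")  # canned reply
fake_parser = Step(lambda reply: [item.strip() for item in reply.split(",")])

demo_chain = fake_prompt | fake_model | fake_parser
result = demo_chain.invoke({"dish_name": "plain rice"})
print(result)
```

Data flows left to right: a dict of inputs becomes prompt text, the prompt text becomes a raw reply, and the raw reply becomes a Python list.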

In [9]:
raw_input = {"dish_name": "paneer biryani"}

response = chain.invoke(raw_input)

print(response)

['paneer', 'basmati rice', 'onions', 'tomatoes', 'ginger-garlic paste', 'green chilies', 'mint leaves', 'coriander leaves', 'yogurt', 'biryani masala', 'turmeric powder', 'red chili powder', 'garam masala', 'saffron', 'milk', 'ghee', 'oil', 'salt', 'whole spices (bay leaf', 'cinnamon stick', 'cloves', 'cardamom)']


In [10]:
print(type(response))

<class 'list'>
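
Note the entries `'whole spices (bay leaf'` and `'cardamom)'` in the output above: the model grouped several spices in parentheses, and the commas inside the parentheses were treated as separators as well. The sketch below reproduces the effect with a naive split and shows one possible workaround. Both helper functions are hypothetical and are not part of LangChain; tightening the prompt so the model avoids parentheses is another option.

```python
def split_on_commas(text: str) -> list[str]:
    """Naive split: every comma is a separator."""
    return [part.strip() for part in text.split(",")]


def split_outside_parens(text: str) -> list[str]:
    """Split on commas that are not inside parentheses."""
    items, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            # Top-level comma: close the current item.
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return items


sample = "salt, whole spices (bay leaf, cinnamon stick, cloves, cardamom)"
naive = split_on_commas(sample)
grouped = split_outside_parens(sample)

print(naive)
print(grouped)
```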


## **Building an AI Powered Historical Event Teller using Datetime Parser**

This output parser can be used to parse LLM output into a Python `datetime` object.

In [11]:
from langchain.output_parsers import DatetimeOutputParser

output_parser = DatetimeOutputParser()




In [12]:
# As discussed above, let's experiment with get_format_instructions()

output_parser.get_format_instructions()

"Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.\n\nExamples: 0716-07-02T15:45:40.110435Z, 0495-05-18T00:17:02.037880Z, 1152-11-17T05:11:55.348175Z\n\nReturn ONLY this string, no other words!"

In [13]:
from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate(
    messages=[("system", """You are a knowledgeable AI assistant specialized in providing 
                            accurate dates and times for historical events. When asked about 
                            a past event, provide the exact date and time. 
                            If the exact date is uncertain, mention the most widely accepted 
                            estimate along with sources or references. 

                            You generate output while following the below mentioned format.
                            Output Format Instructions:
                            {output_format_instructions}"""), 
              ("user", """Answer the user's question:
                          Question:
                          {question}""")],
    partial_variables={"output_format_instructions": output_parser.get_format_instructions()}
)

In [15]:
chain = chat_template | chat_model | output_parser

raw_input = {"question": "What is Indian Independence Day?"}

response = chain.invoke(raw_input)

print(response)

1947-08-15 00:00:00


In [16]:
print(type(response))

<class 'datetime.datetime'>
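
The conversion from text to `datetime` can be reproduced with the standard library, using the pattern quoted in the format instructions. The raw model reply is not shown in this notebook, so `raw_reply` below is an assumed value that matches the pattern.

```python
from datetime import datetime

# Pattern quoted in the format instructions above.
PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"

# Assumed raw reply; the actual model output is not shown in this notebook.
raw_reply = "1947-08-15T00:00:00.000000Z"

parsed = datetime.strptime(raw_reply.strip(), PATTERN)
print(parsed)
print(type(parsed))

# A reply that ignores the requested pattern cannot be parsed.
try:
    datetime.strptime("15 August 1947", PATTERN)
except ValueError as exc:
    print(f"Could not parse: {exc}")
```

This is why the format instructions insist on returning only the datetime string: any extra words around it would make the pattern match fail.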


## **Building an AI Powered Song Recommender using Pydantic Parser**

This output parser allows users to specify an arbitrary Pydantic Model and query LLMs for outputs that conform to that schema.

Use Pydantic to declare your data model. Pydantic’s `BaseModel` is like a Python dataclass, but with runtime type validation and coercion.

Some familiarity with Pydantic is needed to use this parser.

`pip install pydantic`

In [17]:
from langchain.output_parsers import PydanticOutputParser

from pydantic import BaseModel, Field

class Song(BaseModel):
    name: str = Field(description="Name of a Song")
    geners: list = Field(description="List of Geners")

output_parser = PydanticOutputParser(pydantic_object=Song)

In [18]:
print(output_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "Name of a Song", "title": "Name", "type": "string"}, "geners": {"description": "List of Geners", "items": {}, "title": "Geners", "type": "array"}}, "required": ["name", "geners"]}
```


In [19]:
from langchain_core.prompts import ChatPromptTemplate

# Template
chat_template = ChatPromptTemplate(
                            messages=[("system", """You are a helpful AI Song Recommendation Engine.
                                        You generate output while following the below mentioned format.
                                        Output Format Instructions:
                                        {output_format_instructions}"""), 
                                      ("human", "What is the most famous song by {singer_name}.")],
                            partial_variables={"output_format_instructions": output_parser.get_format_instructions()}
)

chat_template

ChatPromptTemplate(input_variables=['singer_name'], input_types={}, partial_variables={'output_format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"name": {"description": "Name of a Song", "title": "Name", "type": "string"}, "geners": {"description": "List of Geners", "items": {}, "title": "Geners", "type": "array"}}, "required": ["name", "geners"]}\n```'}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['output_format_instructions'], input_types={}, partial_variables={}, template='You are a helpful AI Song Recommendatio

In [20]:
chain = chat_template | chat_model | output_parser

raw_input = {"singer_name": "sonu nigam"}

chain.invoke(raw_input)

Song(name='Kal Ho Naa Ho', geners=['Bollywood', 'Indian Pop'])
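
Conceptually, the parser reads the model's JSON reply, checks it against the schema, and builds an object from it. The sketch below imitates those steps with the standard library only. `SongRecord` and `parse_song` are hypothetical names, `raw_reply` is an assumed model reply, and the checks are far simpler than real Pydantic validation.

```python
import json
from dataclasses import dataclass


@dataclass
class SongRecord:
    # Hypothetical stdlib mirror of the Song model defined above.
    name: str
    geners: list


def parse_song(reply: str) -> SongRecord:
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    # Both fields are listed as required in the schema.
    missing = {"name", "geners"} - set(data)
    if missing:
        raise ValueError(f"Missing required fields: {sorted(missing)}")
    if not isinstance(data["name"], str) or not isinstance(data["geners"], list):
        raise ValueError("Field types do not match the schema")
    return SongRecord(name=data["name"], geners=data["geners"])


# Assumed raw reply; the actual model output is not shown in this notebook.
raw_reply = '{"name": "Kal Ho Naa Ho", "geners": ["Bollywood", "Indian Pop"]}'
song = parse_song(raw_reply)
print(song)
```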

## **JSONParser without Pydantic**

In [21]:
from langchain_core.output_parsers import JsonOutputParser

output_parser = JsonOutputParser()

In [22]:
output_parser.get_format_instructions()

'Return a JSON object.'


In [26]:
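# without pydantic
# chat_template still carries the Pydantic parser's format instructions as a
# partial variable; passing "output_format_instructions" at invoke time
# supplies this parser's instructions instead.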

chain = chat_template | chat_model | output_parser

raw_input = {"singer_name": "sonu nigam", "output_format_instructions": output_parser.get_format_instructions()}

response = chain.invoke(raw_input)

response

{'artist': 'Sonu Nigam',
 'famous_song': 'Kal Ho Naa Ho',
 'reason': "This song is one of Sonu Nigam's most popular and critically acclaimed songs. It won numerous awards and is widely recognized for its emotional depth and memorable melody."}

In [27]:
print(type(response))

<class 'dict'>
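
Models frequently wrap JSON in a Markdown code fence or add a sentence before it, so calling `json.loads` directly on the reply can fail. The sketch below shows one way to tolerate that using only the standard library. `extract_json` and `raw_reply` are hypothetical names, the reply text is made up, and this is not the code that `JsonOutputParser` runs internally.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def extract_json(reply: str) -> dict:
    """Parse JSON from a reply, tolerating an optional Markdown code fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, reply, re.DOTALL)
    # Fall back to the whole reply when no fence is present.
    payload = match.group(1) if match else reply
    return json.loads(payload.strip())


# Made-up reply with a leading sentence and a fenced JSON block.
raw_reply = (
    "Here you go:\n"
    + FENCE + "json\n"
    + '{"artist": "Sonu Nigam", "famous_song": "Kal Ho Naa Ho"}\n'
    + FENCE
)

parsed = extract_json(raw_reply)
print(parsed)
print(type(parsed))
```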


## **JSONParser with Pydantic**

In [28]:
output_parser = JsonOutputParser(pydantic_object=Song)

print(output_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "Name of a Song", "title": "Name", "type": "string"}, "geners": {"description": "List of Geners", "items": {}, "title": "Geners", "type": "array"}}, "required": ["name", "geners"]}
```


In [29]:
chain = chat_template | chat_model | output_parser

In [30]:
raw_input = {"singer_name": "sonu nigam", "output_format_instructions": output_parser.get_format_instructions()}

response = chain.invoke(raw_input)

response

{'name': 'Kal Ho Naa Ho', 'geners': ['Bollywood', 'Indian Pop']}

In [31]:
print(type(response))

<class 'dict'>
