# **Structured Outputs**

For many applications, such as chatbots, models need to respond to users directly in natural language. In other scenarios, however, we need models to produce output in a structured format. For example, we might want to store the model output in a database and need it to conform to the database schema. This need motivates the concept of structured output, where a model is instructed to respond in a particular output structure.

## **Output Parser**
Often we need the output of an LLM in a particular format, for example a Python `datetime` object or a JSON object. LangChain comes with parsing utilities that let you convert model output into precise data types, or even into an instance of your own custom class defined with Pydantic.

Output parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very useful when you are using LLMs to generate any form of structured data.

An output parser implements two key methods, plus an optional third:
- `get_format_instructions()`: returns a string containing instructions for how the output of a language model should be formatted.
- `parse()`: takes a string (assumed to be the response from a language model) and parses it into some structure.
- (Optional) `parse_with_prompt()`: takes a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated that response) and parses the string into some structure. The prompt is provided mainly in case the output parser wants to retry or fix the output and needs information from the prompt to do so.
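
The two required methods can be sketched in plain Python. The `YesNoParser` class below is hypothetical and does not subclass any LangChain class; it only mirrors the `get_format_instructions()` / `parse()` shape described above.

```python
class YesNoParser:
    """Hypothetical parser that turns a yes/no reply into a bool."""

    def get_format_instructions(self) -> str:
        # Text meant to be placed in the prompt so the model knows the expected format.
        return "Answer with a single word: YES or NO."

    def parse(self, text: str) -> bool:
        # Normalize the raw model reply before interpreting it.
        cleaned = text.strip().strip(".").upper()
        if cleaned == "YES":
            return True
        if cleaned == "NO":
            return False
        raise ValueError(f"Expected YES or NO, got: {text!r}")


parser = YesNoParser()
print(parser.get_format_instructions())
print(parser.parse(" yes. "))
```

The format instructions and the parsing logic live in the same class, so the format the prompt asks for and the format `parse()` accepts are defined in one place.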

Commonly used output parser types include:
- CSV Parser
- Datetime Parser
- JSON Parser
- Pydantic Parser

**Question: What is Pydantic?**  
Pydantic is a library for defining data models, validating data against them, and coercing values to the declared types.  
Coercion in Pydantic refers to its ability to automatically convert input data into the types specified in the model, as long as the conversion is reasonable. For example, the string `"42"` passed to an `int` field becomes the integer `42`.
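
To make the idea of coercion concrete, here is a rough imitation using only the standard library. This is not how Pydantic is implemented, and the `Ingredient` class is hypothetical; with Pydantic you would get comparable behavior by subclassing `BaseModel`, without writing `__post_init__` yourself.

```python
from dataclasses import dataclass, fields


@dataclass
class Ingredient:
    name: str
    quantity: int

    def __post_init__(self):
        # Imitate coercion: convert each value to its annotated type
        # when it is not already an instance of that type.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                setattr(self, f.name, f.type(value))


item = Ingredient(name="paneer", quantity="200")
print(item)  # quantity is now the int 200, not the string "200"

# A value with no reasonable conversion is rejected instead of stored as-is.
try:
    Ingredient(name="rice", quantity="one cup")
except ValueError as err:
    print("rejected:", err)
```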

## **CSV Parser**

This output parser can be used when you want to return a list of comma-separated items.

In [5]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

csv_output_parser = CommaSeparatedListOutputParser()

In [2]:
# As discussed above, let's experiment with get_format_instructions()

csv_output_parser.get_format_instructions()

'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'

In [3]:
# prompt -> generate a list of modules one must study to become data scientist

example_input = "Python, DA, SQL, ML, DL"

type(example_input)

str

In [4]:
example_input = "Python, DA, SQL, ML, DL"

# using parse() method
csv_output_parser.parse(example_input)

['Python', 'DA', 'SQL', 'ML', 'DL']

In [5]:
from langchain_openai import ChatOpenAI

# Read the API key; strip() drops any trailing newline
with open('keys/.openai_api_key.txt') as f:
    OPENAI_API_KEY = f.read().strip()

# Set the OpenAI Key and initialize a ChatModel
chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-4o-mini")

In [6]:
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful AI Chef Assistant. 
                  Given a dish name by user, you can provide the ingredients to prepare the dish.
                  Output Format Instructions:
                  {output_format_instructions}"""),
    ("human", "Give me the ingredients for cooking {dish_name}."),
])

prompt_template

ChatPromptTemplate(input_variables=['dish_name', 'output_format_instructions'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['output_format_instructions'], template='You are a helpful AI Chef Assistant. \n                  Given a dish name by user, you can provide the ingredients to prepare the dish.\n                  Output Format Instructions:\n                  {output_format_instructions}')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['dish_name'], template='Give me the ingredients for cooking {dish_name}.'))])

In [7]:
chain = prompt_template | chat_model | csv_output_parser

In [8]:
user_input = {"dish_name": "paneer biryani", "output_format_instructions": csv_output_parser.get_format_instructions()}

response = chain.invoke(user_input)

response

['paneer',
 'basmati rice',
 'yogurt',
 'onions',
 'tomatoes',
 'ginger',
 'garlic',
 'green chilies',
 'spices (such as turmeric',
 'cumin',
 'coriander',
 'garam masala)',
 'mint leaves',
 'coriander leaves',
 'ghee',
 'oil',
 'salt',
 'water']
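
Notice that `'spices (such as turmeric'` and `'garam masala)'` came back as separate items: the model's parenthesised list contained commas, and a comma-separated parser cannot tell them apart from item separators. The sketch below reproduces the effect with Python's standard `csv` module. It is an illustration under that assumption, not LangChain's actual implementation, and `split_csv_line` is a hypothetical helper.

```python
import csv
from io import StringIO


def split_csv_line(text: str) -> list[str]:
    # Read a single comma-separated line; skipinitialspace drops the blank after each comma.
    reader = csv.reader(StringIO(text), skipinitialspace=True)
    return next(reader)


plain = "ghee, spices (such as turmeric, cumin), salt"
quoted = 'ghee, "spices (such as turmeric, cumin)", salt'

print(split_csv_line(plain))   # the parenthesised item is split at its comma
print(split_csv_line(quoted))  # the quoted item stays in one piece
```

In practice, asking the model in the prompt to avoid commas inside an item is a simple way to sidestep this.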

In [9]:
print(type(response))

<class 'list'>


## **JSON Parser**

In [10]:
from langchain_core.output_parsers import JsonOutputParser

json_output_parser = JsonOutputParser()

In [13]:
json_output_parser.get_format_instructions()

'Return a JSON object.'

In [20]:
from langchain_core.pydantic_v1 import BaseModel, Field

class QueryResponse(BaseModel):
    item_i: str = Field(description="Brief description and in how much quantity ith item should be used")


json_output_parser = JsonOutputParser(pydantic_object=QueryResponse)

print(json_output_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"item_i": {"title": "Item I", "description": "Brief description and in how much quantity ith item should be used", "type": "string"}}, "required": ["item_i"]}
```


In [21]:
json_chain = prompt_template | chat_model | json_output_parser

In [22]:
user_input = {"dish_name": "paneer biryani", "output_format_instructions": json_output_parser.get_format_instructions()}

response = json_chain.invoke(user_input)

response

{'properties': {'paneer': {'title': 'Paneer',
   'description': '200 grams, cubed',
   'type': 'string'},
  'basmati rice': {'title': 'Basmati Rice',
   'description': '1 cup, soaked for 30 minutes',
   'type': 'string'},
  'onion': {'title': 'Onion',
   'description': '1 large, thinly sliced',
   'type': 'string'},
  'tomato': {'title': 'Tomato',
   'description': '1 large, chopped',
   'type': 'string'},
  'curd': {'title': 'Curd', 'description': '1/2 cup', 'type': 'string'},
  'ginger-garlic paste': {'title': 'Ginger-Garlic Paste',
   'description': '1 tablespoon',
   'type': 'string'},
  'green chilli': {'title': 'Green Chilli',
   'description': '1, slit',
   'type': 'string'},
  'biryani masala': {'title': 'Biryani Masala',
   'description': '1 tablespoon',
   'type': 'string'},
  'saffron strands': {'title': 'Saffron Strands',
   'description': 'a pinch, soaked in 2 tablespoons of warm milk',
   'type': 'string'},
  'mint leaves': {'title': 'Mint Leaves',
   'description': 'a ha... [output truncated]

In [23]:
print(type(response))

<class 'dict'>
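
The response above echoes the layout of the schema (`properties`, `title`, `type`) instead of returning a plain instance with an `item_i` key. It was still returned as a `dict` because it is valid JSON. If conformance to the schema matters, one option is an explicit check after parsing. A minimal stdlib sketch, where `missing_keys` and the sample strings are hypothetical:

```python
import json


def missing_keys(reply: str, required: list[str]) -> list[str]:
    # Parse the model's reply and report which required top-level keys are absent.
    data = json.loads(reply)
    return [key for key in required if key not in data]


schema_like = '{"properties": {"paneer": {"description": "200 grams, cubed"}}}'
instance = '{"item_i": "200 grams of paneer, cubed"}'

print(missing_keys(schema_like, ["item_i"]))  # ['item_i']
print(missing_keys(instance, ["item_i"]))     # []
```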


## **Datetime Parser**

This output parser converts LLM output into a Python `datetime` object.

In [24]:
from langchain.output_parsers import DatetimeOutputParser

output_parser = DatetimeOutputParser()

In [25]:
# As discussed above, let's experiment with get_format_instructions()

output_parser.get_format_instructions()

"Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.\n\nExamples: 0980-01-21T10:18:18.427517Z, 0253-06-07T08:17:14.738751Z, 0592-03-29T10:46:14.149807Z\n\nReturn ONLY this string, no other words!"

In [26]:
from langchain_core.prompts import PromptTemplate

template = """Answer the user's question:
Question:
{question}

Output Format Instructions:
{output_format_instructions}"""

prompt_template = PromptTemplate.from_template(template)

In [27]:
from langchain_openai import OpenAI

# Set the OpenAI key and initialize a completion-style LLM (not a chat model)
model = OpenAI(openai_api_key=OPENAI_API_KEY)

In [29]:
chain = prompt_template | model | output_parser

# Named user_input to avoid shadowing Python's built-in input()
user_input = {"question": "What is Indian Independence Day?", "output_format_instructions": output_parser.get_format_instructions()}

response = chain.invoke(user_input)

response

datetime.datetime(1947, 8, 15, 0, 0)

In [30]:
print(type(response))

<class 'datetime.datetime'>
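
The pattern in the format instructions is a standard `strptime` pattern, so the conversion from the model's text to a `datetime` can be reproduced with the standard library. A minimal sketch; the `reply` string is a hypothetical model output that follows the instructed pattern:

```python
from datetime import datetime

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"

reply = "1947-08-15T00:00:00.000000Z\n"  # hypothetical raw model reply

# Strip surrounding whitespace, then parse with the instructed pattern.
parsed = datetime.strptime(reply.strip(), PATTERN)
print(repr(parsed))

# A reply that ignores the pattern fails loudly instead of yielding a wrong date.
try:
    datetime.strptime("15 August 1947", PATTERN)
except ValueError as err:
    print("could not parse:", err)
```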


## **Pydantic Parser**

This output parser allows users to specify an arbitrary Pydantic Model and query LLMs for outputs that conform to that schema.

Use Pydantic to declare your data model. Pydantic’s `BaseModel` is like a Python dataclass, but with actual type validation and coercion.

Some familiarity with Pydantic helps when using this parser.

`pip install pydantic`

In [31]:
from langchain.output_parsers import PydanticOutputParser

from langchain_core.pydantic_v1 import BaseModel, Field

class Song(BaseModel):
    name: str = Field(description="Name of a Song")
    genres: list = Field(description="List of genres")

parser = PydanticOutputParser(pydantic_object=Song)

In [32]:
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"title": "Name", "description": "Name of a Song", "type": "string"}, "genres": {"title": "Genres", "description": "List of genres", "type": "array", "items": {}}}, "required": ["name", "genres"]}
```


In [33]:
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

# System Template
system_prompt_template = SystemMessagePromptTemplate.from_template("You are a helpful AI Song Recommendation Engine.")

# Human Template
human_prompt_template = HumanMessagePromptTemplate.from_template("What is the most famous song by {singer_name}? \n{output_format_instructions}")

# Compile a chat prompt
chat_template = ChatPromptTemplate.from_messages(
    [system_prompt_template, human_prompt_template]
)

In [34]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)

In [35]:
chain = chat_template | chat_model | parser

# Named user_input to avoid shadowing Python's built-in input()
user_input = {"singer_name": "sonu nigam", "output_format_instructions": parser.get_format_instructions()}

chain.invoke(user_input)

Song(name='Kal Ho Naa Ho', genres=['Bollywood', 'Romantic'])
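
Conceptually, this parser does two things with the model's reply: it decodes the JSON text, then builds a typed object from it, failing if the fields do not match. The stdlib sketch below imitates those two steps with a dataclass. `Movie` and `parse_movie` are hypothetical names, and none of this is `PydanticOutputParser`'s actual code.

```python
import json
from dataclasses import dataclass


@dataclass
class Movie:
    title: str
    year: int


def parse_movie(reply: str) -> Movie:
    # Step 1: decode the JSON text. Step 2: map its keys onto the dataclass fields.
    data = json.loads(reply)
    return Movie(**data)  # raises TypeError on missing or unexpected keys


print(parse_movie('{"title": "Lagaan", "year": 2001}'))

# A reply that omits a required field is rejected rather than half-filled.
try:
    parse_movie('{"title": "Lagaan"}')
except TypeError as err:
    print("invalid reply:", err)
```

Unlike a plain `dict`, the returned object has named, declared fields, which is the main reason to prefer a model-based parser when downstream code depends on the structure.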