# **Structured Outputs**

For many applications, such as chatbots, models need to respond to users directly in natural language. However, there are scenarios where we need models to output in a structured format. For example, we might want to store the model output in a database and ensure that the output conforms to the database schema. This need motivates the concept of structured output, where models can be instructed to respond with a particular output structure.

## **Output Parser**
Often we need the output of an LLM in a particular format, for example a Python `datetime` object or a JSON object. LangChain comes with parsing utilities that convert model output into precise data types, or even into instances of your own custom classes with Pydantic.

Output parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very useful when you are using LLMs to generate any form of structured data.

A parser exposes two key methods, plus one optional method:
- `get_format_instructions()`: Returns a string containing instructions for how the output of a language model should be formatted.
- `parse()`: Takes a string (assumed to be the response from a language model) and parses it into some structure.
- `parse_with_prompt()` (optional): Takes a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated that response) and parses it into some structure. The prompt is provided mainly in case the output parser wants to retry or fix the output and needs information from the prompt to do so.
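
To make the three methods concrete, here is a minimal sketch of a parser written in plain Python. It does not subclass any LangChain class; `YesNoParser` and its behaviour are hypothetical and only mirror the method names described above.

```python
class YesNoParser:
    """Hypothetical parser that turns a yes/no answer into a bool."""

    def get_format_instructions(self) -> str:
        # Text meant to be embedded in the prompt sent to the model.
        return "Answer with a single word: yes or no."

    def parse(self, text: str) -> bool:
        # Normalize the raw model text before interpreting it.
        cleaned = text.strip().strip(".!").lower()
        if cleaned == "yes":
            return True
        if cleaned == "no":
            return False
        raise ValueError(f"Expected 'yes' or 'no', got: {text!r}")

    def parse_with_prompt(self, text: str, prompt: str) -> bool:
        # The prompt is only used here to add context when parsing fails.
        try:
            return self.parse(text)
        except ValueError as exc:
            raise ValueError(f"{exc} (prompt was: {prompt!r})") from exc


parser = YesNoParser()
print(parser.get_format_instructions())
print(parser.parse(" Yes. "))
```

A real LangChain parser follows the same shape, with the format instructions injected into the prompt and `parse()` applied to the model's reply.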

Output parser types covered here:
- Comma Separated List Parser
- Pydantic Parser
- JSON Parser with Pydantic
- Datetime Parser

**Question: What is Pydantic?**  
Pydantic is a library for defining data models, validating data against them, and coercing values to the declared types.  
Coercion in Pydantic refers to its ability to automatically convert input data into the types specified in the model, as long as the conversion is reasonable. 
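
As a small illustration of coercion, the sketch below passes strings where numbers are declared. The `Item` model is a hypothetical example, and the snippet assumes Pydantic's default (non-strict) validation mode.

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    # Hypothetical model used only to demonstrate coercion.
    sku: str
    quantity: int
    price: float


# Numeric strings are converted to the declared types.
item = Item(sku="ABC123", quantity="2", price="19.99")
print(item.quantity, type(item.quantity).__name__)
print(item.price, type(item.price).__name__)

# A value that cannot reasonably be converted is rejected.
try:
    Item(sku="ABC123", quantity="two", price="19.99")
except ValidationError as exc:
    print("rejected with", len(exc.errors()), "error(s)")
```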

## **Comma Separated List Parser**

This output parser can be used when you want to return a list of comma-separated items.

In [1]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

csv_output_parser = CommaSeparatedListOutputParser()

In [2]:
# As discussed above, let's experiment with get_format_instructions()

csv_output_parser.get_format_instructions()

'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'

In [3]:
# prompt -> generate a list of modules one must study to become a data scientist

example_input = "Python, DA, SQL, ML, DL"

print(type(example_input))

<class 'str'>


In [4]:
example_input = "Python, DA, SQL, ML, DL"

# using parse() method
parsed_output = csv_output_parser.parse(example_input)

print(type(parsed_output))
print(parsed_output)

<class 'list'>
['Python', 'DA', 'SQL', 'ML', 'DL']
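
To see roughly what this kind of parsing involves, here is a standard-library approximation. It is a sketch, not LangChain's actual implementation, and `split_comma_separated` is a hypothetical helper name.

```python
import csv
from io import StringIO


def split_comma_separated(text: str) -> list[str]:
    # csv.reader handles quoted items that themselves contain commas;
    # skipinitialspace drops the space that follows each delimiter.
    reader = csv.reader(StringIO(text), skipinitialspace=True)
    return [item for row in reader for item in row]


print(split_comma_separated("Python, DA, SQL, ML, DL"))
print(split_comma_separated('"Data Modeling, Advanced", SQL'))
```

Using a CSV reader instead of a plain `str.split(",")` keeps a quoted item such as `"Data Modeling, Advanced"` together as one entry.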


## **Building an AI System to Auto-Extract Skills from Job Descriptions** 

**Use Case:** Extract a list of required skills from a job description so you can auto-fill a checklist or tag candidate profiles.

In [5]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

output_parser = CommaSeparatedListOutputParser()

In [15]:
# Example: extract skills from a job description
from langchain_core.prompts import PromptTemplate

prompt_template = PromptTemplate(template="""Extract the key technical skills from the following job description.
Return them as a comma-separated list (no extra text).

Job description:
{job}
""")

prompt_template

PromptTemplate(input_variables=['job'], input_types={}, partial_variables={}, template='Extract the key technical skills from the following job description.\nReturn them as a comma-separated list (no extra text).\n\nJob description:\n{job}\n')

In [16]:
# Import Google ChatModel
from langchain_google_genai import ChatGoogleGenerativeAI

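# Read the key inside a context manager so the file is closed,
# and strip any trailing newline from the stored key
with open('keys/.gemini.txt') as f:
    GOOGLE_API_KEY = f.read().strip()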

# Set the GoogleAI Key and initialize a ChatModel
chat_model = ChatGoogleGenerativeAI(api_key=GOOGLE_API_KEY, model="gemini-2.0-flash")

In [17]:
chain = prompt_template | chat_model | output_parser

In [18]:
# Read the job description and close the file when done
with open("data/job_desc.txt") as jd_file:
    raw_input = {"job": jd_file.read()}

response = chain.invoke(raw_input)

print(response)

['Snowflake', 'SQL', 'Data Modeling', 'ETL/ELT', 'Python', 'Scala', 'JSON', 'Parquet', 'Airflow', 'DBT', 'AWS Glue', 'Informatica', 'AWS', 'Azure', 'GCP', 'CI/CD', 'Git']
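
Because the chain returns a plain Python list, it can feed the checklist use case directly. The sketch below is one possible way to do that; `build_checklist` and the candidate's skills are hypothetical, and the required skills are a subset of the list printed above.

```python
def build_checklist(required_skills, candidate_skills):
    # Compare case-insensitively so "python" matches "Python".
    known = {skill.strip().lower() for skill in candidate_skills}
    return {skill: skill.strip().lower() in known for skill in required_skills}


required = ["Snowflake", "SQL", "Python", "Airflow"]  # subset of the extracted skills
candidate = ["python", "sql", "Spark"]  # hypothetical candidate profile

checklist = build_checklist(required, candidate)
for skill, has_it in checklist.items():
    print(f"[{'x' if has_it else ' '}] {skill}")
```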


In [19]:
print(type(response))

<class 'list'>


## **Building an AI Powered Song Recommender using Pydantic Parser**

This output parser allows users to specify an arbitrary Pydantic Model and query LLMs for outputs that conform to that schema.

Use Pydantic to declare your data model. Pydantic’s `BaseModel` is similar to a Python dataclass, but it also validates and coerces field types at runtime.

Some familiarity with Pydantic is assumed.

`pip install pydantic`

In [24]:
from langchain_core.output_parsers import PydanticOutputParser

from pydantic import BaseModel, Field

class Song(BaseModel):
    name: str = Field(description="Name of a Song")
    geners: list = Field(description="List of Geners")

output_parser = PydanticOutputParser(pydantic_object=Song)

In [26]:
print(output_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "Name of a Song", "title": "Name", "type": "string"}, "geners": {"description": "List of Geners", "items": {}, "title": "Geners", "type": "array"}}, "required": ["name", "geners"]}
```


In [27]:
from langchain_core.prompts import ChatPromptTemplate

# Template
chat_template = ChatPromptTemplate(
                            messages=[("system", """You are a helpful AI Song Recommendation Engine.
                                        You generate output while following the below mentioned format.
                                        Output Format Instructions:
                                        {output_format_instructions}"""), 
                                      ("human", "What is the most famous song by {singer_name}.")],
                            partial_variables={"output_format_instructions": output_parser.get_format_instructions()}
)

chat_template

ChatPromptTemplate(input_variables=['singer_name'], input_types={}, partial_variables={'output_format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"name": {"description": "Name of a Song", "title": "Name", "type": "string"}, "geners": {"description": "List of Geners", "items": {}, "title": "Geners", "type": "array"}}, "required": ["name", "geners"]}\n```'}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['output_format_instructions'], input_types={}, partial_variables={}, template='You are a helpful AI Song Recommendatio

In [28]:
chain = chat_template | chat_model | output_parser

raw_input = {"singer_name": "sonu nigam"}

chain.invoke(raw_input)

Song(name='Kal Ho Naa Ho', geners=['Bollywood', 'Indian Pop'])

## **Building an Intelligent Parser to Translate User Requests into Order Objects**

**Use case:** Parse complex structured user requests into typed Python objects (e.g., Order object used by downstream business logic).

In [29]:
from pydantic import BaseModel, Field, ValidationError

# Define the pydantic model for expected output
class OrderItem(BaseModel):
    sku: str = Field(description="")
    quantity: int = Field(description="", gt=0)
    price: float = Field(description="")

class Order(BaseModel):
    order_id: str = Field(description="")
    customer_email: str = Field(description="")
    items: list[OrderItem] = Field(description="")
    total: float = Field(description="")

# Create parser from pydantic model
output_parser = PydanticOutputParser(pydantic_object=Order)

In [30]:
print(output_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"OrderItem": {"properties": {"sku": {"description": "", "title": "Sku", "type": "string"}, "quantity": {"description": "", "exclusiveMinimum": 0, "title": "Quantity", "type": "integer"}, "price": {"description": "", "title": "Price", "type": "number"}}, "required": ["sku", "quantity", "price"], "title": "OrderItem", "type": "object"}}, "properties": {"order_id": {"description": "", "title": "Order Id", "type": "string"}, "customer_email": {"description": "", "title": "Customer Email", "type": "string"}, "items": {"description": "", 

In [43]:
prompt_template = PromptTemplate(
    input_variables=["text"],
    template="""User provided this order information in text. 
    Extract into a JSON format that matches the output format instructions provided below.
    Return only a JSON format.
    Text:
    {text}
    
    Output Format Instructions:
    {output_format_instructions}
    """,
    partial_variables={"output_format_instructions": output_parser.get_format_instructions()}
)

In [44]:
pd_file = open("data/prod_desc_asr_output.txt")

pd_file.read()

'uh yeah, for order number ORD 1001, the email is jane at example dot com.\nNeed two of item ABC one twenty three, price is nineteen ninety nine each,\nand one of the X Y Z nine nine nine for hundred twenty nine point five.\ntotal should be one sixty nine forty eight.\n'

In [45]:
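chain = prompt_template | chat_model | output_parser

# pd_file was already consumed by the read() in the previous cell, so a second
# read() would return an empty string. Rewind the handle before reading again.
pd_file.seek(0)
raw_input = {"text": pd_file.read()}

parsed_response = chain.invoke(raw_input)

In [47]:
type(parsed_response)

__main__.Order

In [49]:
parsed_response

Order(order_id='12345', customer_email='test@example.com', items=[OrderItem(sku='SKU123', quantity=2, price=10.0), OrderItem(sku='SKU456', quantity=1, price=25.5)], total=45.5)

**Note:** The output recorded above does not match the order text (ORD 1001, jane at example dot com). It appears to come from a run in which `{text}` was empty because the file handle had already been read, so the model filled in placeholder values. Rerunning with the order text supplied should yield values taken from the transcript.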

In [38]:
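# Parse and validate a raw model response by hand.
# PydanticOutputParser raises OutputParserException when the text is not valid JSON
# or fails Pydantic validation (it wraps the underlying ValidationError).
from langchain_core.exceptions import OutputParserException

llm_response = (prompt_template | chat_model).invoke(raw_input).content
try:
    order_obj = output_parser.parse(llm_response)
    print(order_obj)
    print(order_obj.items[0].sku)
except OutputParserException as e:
    print("Validation failed:", e)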
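
Schema validation checks types and constraints such as `gt=0`, but it does not check that the stated total agrees with the line items. A minimal sketch of that extra business check follows. `total_matches` is a hypothetical helper, and the stand-in order uses `SimpleNamespace` objects with figures as I read them from the transcript, so the snippet runs without calling the model.

```python
import math
from types import SimpleNamespace


def total_matches(order, tolerance=0.005):
    # Recompute the total from the line items and compare within half a cent.
    expected = sum(item.quantity * item.price for item in order.items)
    return math.isclose(expected, order.total, abs_tol=tolerance)


# Stand-in for a parsed Order (assumed reading of the spoken order text).
order = SimpleNamespace(
    items=[
        SimpleNamespace(sku="ABC123", quantity=2, price=19.99),
        SimpleNamespace(sku="XYZ999", quantity=1, price=129.5),
    ],
    total=169.48,
)
print(total_matches(order))

# The same items with a wrong total should fail the check.
wrong = SimpleNamespace(items=order.items, total=170.0)
print(total_matches(wrong))
```

The function only relies on the `items`, `quantity`, `price`, and `total` attributes, so it should work the same way on the Pydantic `Order` object returned by the chain.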

## **JSON Parser with Pydantic**

In [59]:
from langchain_core.output_parsers import JsonOutputParser

output_parser = JsonOutputParser(pydantic_object=Song)

print(output_parser.get_format_instructions())

STRICT OUTPUT FORMAT:
- Return only the JSON value that conforms to the schema. Do not include any additional text, explanations, headings, or separators.
- Do not wrap the JSON in Markdown or code fences (no ``` or ```json).
- Do not prepend or append any text (e.g., do not write "Here is the JSON:").
- The response must be a single top-level JSON value exactly as required by the schema (object/array/etc.), with no trailing commas or comments.

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]} the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema (shown in a code block for readability only — do not include any backticks or Markdown in your output):


In [54]:
chain = chat_template | chat_model | output_parser

In [30]:
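# chat_template already carries format instructions as a partial variable;
# passing output_format_instructions explicitly overrides it with the JSON parser's instructions
raw_input = {"singer_name": "sonu nigam", "output_format_instructions": output_parser.get_format_instructions()}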

response = chain.invoke(raw_input)

response

{'name': 'Kal Ho Naa Ho', 'geners': ['Bollywood', 'Indian Pop']}
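
The result is a plain `dict` rather than a `Song` instance because a JSON parser only decodes the text and does not build a model object. The sketch below shows the general idea with the standard library. It is not LangChain's implementation; `parse_json_reply` is a hypothetical helper and `reply` is an assumed raw model reply built from the output above.

```python
import json
import re


def parse_json_reply(text: str):
    # Models sometimes wrap JSON in Markdown code fences; unwrap them if present.
    match = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


reply = '```json\n{"name": "Kal Ho Naa Ho", "geners": ["Bollywood", "Indian Pop"]}\n```'
song = parse_json_reply(reply)
print(type(song).__name__, song["name"])
```

Fields are read with `song["name"]` here, whereas the Pydantic parser's result is read with attribute access such as `song.name`.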

In [31]:
print(type(response))

<class 'dict'>


## **Building an AI Powered Historical Event Teller using Datetime Parser**

This output parser converts LLM output into a Python `datetime` object.

In [23]:
# DatetimeOutputParser is not part of langchain_core; it ships with the langchain package.
# If this import fails on LangChain 1.x, try langchain_classic.output_parsers instead.
from langchain.output_parsers import DatetimeOutputParser

output_parser = DatetimeOutputParser()

In [12]:
# As discussed above, let's experiment with get_format_instructions()

output_parser.get_format_instructions()

"Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.\n\nExamples: 0716-07-02T15:45:40.110435Z, 0495-05-18T00:17:02.037880Z, 1152-11-17T05:11:55.348175Z\n\nReturn ONLY this string, no other words!"

In [13]:
from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate(
    messages=[("system", """You are a knowledgeable AI assistant specialized in providing 
                            accurate dates and times for historical events. When asked about 
                            a past event, provide the exact date and time. 
                            If the exact date is uncertain, mention the most widely accepted 
                            estimate along with sources or references. 

                            You generate output while following the below mentioned format.
                            Output Format Instructions:
                            {output_format_instructions}"""), 
              ("user", """Answer the users question:
                          Question:
                          {question}""")],
    partial_variables={"output_format_instructions": output_parser.get_format_instructions()}
)

In [15]:
chain = chat_template | chat_model | output_parser

raw_input = {"question": "What is Indian Independence Day?"}

response = chain.invoke(raw_input)

print(response)

1947-08-15 00:00:00
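
The pattern in the format instructions is a standard `strftime`/`strptime` format string, so the conversion step can be reproduced with the standard library alone. This is a sketch of that step only; the `raw` string is an assumed model reply, not captured output.

```python
from datetime import datetime

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"

raw = "1947-08-15T00:00:00.000000Z"  # assumed model reply in the requested format
parsed = datetime.strptime(raw, PATTERN)
print(parsed)  # 1947-08-15 00:00:00

# Text that does not follow the pattern raises ValueError.
try:
    datetime.strptime("15 August 1947", PATTERN)
except ValueError as exc:
    print("rejected:", exc)
```

Note that the trailing `Z` is matched as a literal character, so the resulting `datetime` is naive (it carries no timezone information).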


In [16]:
print(type(response))

<class 'datetime.datetime'>
