# **Output Parsing**

We often need the output of an LLM in a particular format, for example a Python `datetime` object or a JSON object. LangChain comes with output-parsing utilities that let you convert model output into precise data types, or even into an instance of your own custom class via Pydantic.

Output parsers are responsible for taking the output of an LLM and transforming it into a more suitable format. This is very useful when you are using LLMs to generate any form of structured data.

A parser exposes two key methods, plus an optional third:
- `get_format_instructions()` method: Returns a string containing instructions for how the output of a language model should be formatted.
- `parse()` method: Takes in a string (assumed to be the response from a language model) and parses it into some structure.
- (Optional) `parse_with_prompt()` method: Takes in a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated that response) and parses it into some structure. The prompt is provided mainly in case the output parser wants to retry or fix the output in some way and needs information from the prompt to do so.
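
To make the two required methods concrete, here is a minimal sketch that mirrors the interface in plain Python. `ToyListParser` and `fake_llm_reply` are hypothetical names invented for illustration; this is not a LangChain class, only a stand-in showing how the format instructions go into the prompt and how the reply is then parsed.

```python
from typing import List


class ToyListParser:
    """Hypothetical stand-in that mirrors the two-method parser interface."""

    def get_format_instructions(self) -> str:
        # Text meant to be appended to the prompt so the model knows the format.
        return "Your response should be a list of semicolon-separated values, e.g. `foo; bar; baz`"

    def parse(self, text: str) -> List[str]:
        # Turn the raw model reply into a Python list, dropping empty items.
        return [item.strip() for item in text.split(";") if item.strip()]


parser = ToyListParser()

# The instructions are embedded in the prompt that would be sent to the model.
prompt = f"List three primary colors. {parser.get_format_instructions()}"

# Assumption: this string stands in for a real model response.
fake_llm_reply = "red; yellow; blue"

colors = parser.parse(fake_llm_reply)
print(colors)
```

The built-in parsers used in the rest of this notebook follow the same two-step pattern: ask for a format, then parse the reply.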

Output parser types include:
- CSV Parser
- Datetime Parser
- JSON Parser
- Pydantic Parser
- and others

## **CSV Parser**

This output parser can be used when you want to return a list of comma-separated items.

In [1]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

csv_output_parser = CommaSeparatedListOutputParser()

In [2]:
# As discussed above, let's experiment with get_format_instructions()

csv_output_parser.get_format_instructions()

'Your response should be a list of comma separated values, eg: `foo, bar, baz`'

In [3]:
example_input = "Python, DA, SQL, ML, DL"

# using parse() method
csv_output_parser.parse(example_input)

['Python', 'DA', 'SQL', 'ML', 'DL']

In [4]:
from langchain_openai import ChatOpenAI

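# Read the API key from a local file; the context manager closes the file,
# and strip() removes any trailing newline
with open('keys/.openai_api_key.txt') as f:
    OPENAI_API_KEY = f.read().strip()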

# Set the OpenAI Key and initialize a ChatModel
chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY)

In [5]:
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI Chef."),
    ("human", "Give me the ingredients for cooking {dish_name}. {output_format_instructions}"),
])

prompt_template

ChatPromptTemplate(input_variables=['dish_name', 'output_format_instructions'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a helpful AI Chef.')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['dish_name', 'output_format_instructions'], template='Give me the ingredients for cooking {dish_name}. {output_format_instructions}'))])

In [6]:
chain = prompt_template | chat_model | csv_output_parser

In [7]:
# Named chain_input to avoid shadowing Python's built-in input()
chain_input = {"dish_name": "paneer biryani", "output_format_instructions": csv_output_parser.get_format_instructions()}

response = chain.invoke(chain_input)

response

['paneer',
 'basmati rice',
 'yogurt',
 'onions',
 'tomatoes',
 'ginger',
 'garlic',
 'green chilies',
 'spices (such as cumin',
 'coriander',
 'turmeric',
 'garam masala)',
 'cilantro',
 'mint',
 'ghee',
 'salt',
 'water']
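
Notice that the parenthetical `spices (such as cumin, coriander, turmeric, garam masala)` was broken into several items in the output above, which suggests the reply is split on every comma. The sketch below reproduces that effect with a plain `str.split` and contrasts it with a hypothetical helper, `split_top_level`, that ignores commas inside parentheses. The helper is an illustration only, not part of LangChain; tightening the prompt (for example, asking for no parentheses) is another way to avoid the problem.

```python
from typing import List

raw = "ghee, spices (such as cumin, coriander), salt"

# Naive approach: every comma is a separator, so the parenthetical is broken up.
naive = [item.strip() for item in raw.split(",")]
print(naive)


def split_top_level(text: str) -> List[str]:
    """Hypothetical helper: split on commas that are not inside parentheses."""
    items, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            # A top-level comma ends the current item.
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]


print(split_top_level(raw))
```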

In [8]:
print(type(response))

<class 'list'>


## **JSON Parser**

In [9]:
from langchain_core.output_parsers import JsonOutputParser

json_output_parser = JsonOutputParser()

In [10]:
chain = prompt_template | chat_model | json_output_parser

In [11]:
chain_input = {"dish_name": "paneer biryani", "output_format_instructions": json_output_parser.get_format_instructions()}

response = chain.invoke(chain_input)

response

{'Main Ingredients': ['Paneer (Indian cottage cheese)',
  'Basmati rice',
  'Onion',
  'Tomato',
  'Green chili',
  'Ginger',
  'Garlic',
  'Yogurt',
  'Fresh cilantro',
  'Mint leaves',
  'Spices (such as cumin, coriander, turmeric, garam masala)',
  'Oil or ghee',
  'Salt'],
 'Optional Ingredients': ['Mixed vegetables (like carrots, peas, bell peppers)',
  'Cashews',
  'Raisins']}
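
Conceptually, the final step of this chain turns JSON text into a Python dictionary. The sketch below shows that step with the standard-library `json` module on a hand-written string. The `reply` string and the `try_parse` helper are assumptions made for illustration; LangChain's own parser may do more than this.

```python
import json

# Assumption: this string stands in for the model's JSON reply.
reply = '{"Main Ingredients": ["Paneer", "Basmati rice"], "Optional Ingredients": ["Cashews"]}'

parsed = json.loads(reply)
print(type(parsed))
print(parsed["Main Ingredients"])


def try_parse(text: str):
    """Hypothetical helper: return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# A free-text reply is not valid JSON, so parsing fails and the helper returns None.
print(try_parse("Sure! Here are the ingredients: paneer, rice"))
```

Because a model is not guaranteed to return valid JSON, it is worth deciding up front how a parse failure should be handled.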

In [12]:
print(type(response))

<class 'dict'>


## **Datetime Parser**

This output parser can be used to parse LLM output into a Python `datetime` object.

In [13]:
from langchain.output_parsers import DatetimeOutputParser

output_parser = DatetimeOutputParser()

In [14]:
# As discussed above, let's experiment with get_format_instructions()

output_parser.get_format_instructions()

"Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.\n\nExamples: 0766-08-12T18:44:49.184246Z, 1953-02-26T13:12:24.798157Z, 1621-07-14T00:16:29.137230Z\n\nReturn ONLY this string, no other words!"

In [15]:
from langchain_core.prompts import PromptTemplate

template = """Answer the user's question:

{question}

{output_format_instructions}"""

prompt_template = PromptTemplate.from_template(template)

In [16]:
from langchain_openai import OpenAI

# Set the OpenAI key and initialize a completion-style LLM (not a chat model)
model = OpenAI(openai_api_key=OPENAI_API_KEY)

In [17]:
chain = prompt_template | model | output_parser

chain_input = {"question": "What is Indian Independence Day?", "output_format_instructions": output_parser.get_format_instructions()}

response = chain.invoke(chain_input)

response

datetime.datetime(1947, 8, 15, 0, 0)
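
The pattern quoted in the format instructions above, `%Y-%m-%dT%H:%M:%S.%fZ`, is a standard `strptime` format string. As a rough sketch of the conversion, the standard library can parse a matching string directly. The `reply` value here is an assumed model response, not captured output.

```python
from datetime import datetime

# Pattern taken from the format instructions shown earlier.
PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"

# Assumption: a model reply that follows the requested pattern.
reply = "1947-08-15T00:00:00.000000Z"

# strip() guards against stray whitespace or a trailing newline in the reply.
parsed = datetime.strptime(reply.strip(), PATTERN)
print(repr(parsed))
```

A reply that does not match the pattern makes `strptime` raise `ValueError`, which is why the instructions ask the model to return only the datetime string.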

In [18]:
print(type(response))

<class 'datetime.datetime'>


## **Pydantic Parser**

This output parser allows users to specify an arbitrary Pydantic Model and query LLMs for outputs that conform to that schema.

Use Pydantic to declare your data model. Pydantic’s BaseModel is like a Python dataclass, but with actual type checking and coercion.

Some familiarity with Pydantic is needed to use this parser.

`pip install pydantic`
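
To see what "type checking and coercion" adds, consider what a plain dataclass does not do. The standard-library sketch below shows a dataclass accepting values that contradict its annotations, followed by a hand-written `checked_song` function that checks and coerces its inputs. `PlainSong` and `checked_song` are hypothetical names, and the function is only a rough stand-in for the kind of validation a Pydantic model performs, not Pydantic's actual behavior.

```python
from dataclasses import dataclass


@dataclass
class PlainSong:
    name: str
    genres: list


# Dataclass annotations are not enforced at runtime: this raises no error.
unchecked = PlainSong(name=123, genres="rock")
print(type(unchecked.name), type(unchecked.genres))


def checked_song(name, genres) -> PlainSong:
    """Hypothetical check-and-coerce step, written by hand for illustration."""
    if not isinstance(genres, (list, tuple)):
        raise TypeError("genres must be a list or tuple")
    # Coerce the inputs to the annotated types before building the object.
    return PlainSong(name=str(name), genres=list(genres))


print(checked_song(123, ("rock", "pop")))
```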

In [19]:
from langchain.output_parsers import PydanticOutputParser

# Only BaseModel and Field are needed for this model
from langchain_core.pydantic_v1 import BaseModel, Field

class Song(BaseModel):
    name: str = Field(description="Name of a Song")
    geners: list = Field(description="List of Geners")

parser = PydanticOutputParser(pydantic_object=Song)

In [20]:
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"title": "Name", "description": "Name of a Song", "type": "string"}, "geners": {"title": "Geners", "description": "List of Geners", "type": "array", "items": {}}}, "required": ["name", "geners"]}
```


In [21]:
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

# System Template
system_prompt_template = SystemMessagePromptTemplate.from_template("You are a helpful AI Song Recommendation Engine.")

# Human Template
human_prompt_template = HumanMessagePromptTemplate.from_template("What is the most famous song by {singer_name}?\n{output_format_instructions}")

# Compile a chat prompt
chat_template = ChatPromptTemplate.from_messages(
    [system_prompt_template, human_prompt_template]
)

In [22]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)

In [23]:
chain = chat_template | chat_model | parser

chain_input = {"singer_name": "sonu nigam", "output_format_instructions": parser.get_format_instructions()}

chain.invoke(chain_input)

Song(name='Kal Ho Naa Ho', geners=['Bollywood', 'Romantic'])