# Output Parsers for LLM Input/Output with LangChain

### Enter API Tokens

In [1]:
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

In [2]:
import os
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

## Chat Models and LLMs

LangChain does not build its own Large Language Models (LLMs). Instead, it wraps existing models from providers such as OpenAI and Hugging Face behind a standard interface, so the same code can interact with different LLMs.

### Accessing Commercial LLMs like ChatGPT

In [3]:
from langchain_openai import ChatOpenAI

# instantiate the model
llm = ChatOpenAI(
    model='gpt-3.5-turbo',
    temperature=0
)

## Output Parsers

Output parsers in LangChain convert the raw text returned by a language model into structured data. This notebook covers three of LangChain's parser types:

*   **PydanticOutputParser**:
    - Validates outputs against a Pydantic model and returns an instance of that model, with type checking and coercion of field values.

*   **JsonOutputParser**:
    - Parses outputs as JSON and returns a Python dictionary; a Pydantic model can optionally be supplied to declare the expected structure in the format instructions.

*   **CommaSeparatedListOutputParser**:
    - Splits the model output on commas and returns a Python list of strings, useful for lists of short items.
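
To make the difference between the three parser types concrete, here is a minimal standard-library sketch of the three result shapes: a typed object, a plain dictionary, and a list of strings. It is not LangChain's implementation; the `Review` schema and the `parse_as_*` helper names are hypothetical and exist only for this illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical schema used only for this illustration.
    title: str
    rating: int


def parse_as_dict(raw: str) -> dict:
    """JSON-style parsing: return a plain dictionary."""
    return json.loads(raw)


def parse_as_object(raw: str) -> Review:
    """Schema-style parsing: build a typed object; missing or extra keys raise TypeError."""
    return Review(**json.loads(raw))


def parse_as_list(raw: str) -> list[str]:
    """List-style parsing: split on commas and trim surrounding whitespace."""
    return [item.strip() for item in raw.split(",")]


raw_json = '{"title": "Dune", "rating": 5}'
print(parse_as_dict(raw_json))
print(parse_as_object(raw_json))
print(parse_as_list("red, green, blue"))
```

Unlike Pydantic, a dataclass does not check or coerce field types, so the object variant here only illustrates the shape of the result, not the validation.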


### PydanticOutputParser

In [4]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field

In [5]:
# define the desired data structure
class QueryResponse(BaseModel):
    description: str = Field(description="A brief description of the topic asked by the user")
    pros: str = Field(description='three points showing the pros of the topic asked by the user')
    cons: str = Field(description='three points showing the cons of the topic asked by the user')
    conclusion: str = Field(description="summary of topic asked by the user")

# Set up a parser and add instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=QueryResponse)
parser

PydanticOutputParser(pydantic_object=<class '__main__.QueryResponse'>)

In [6]:
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"description": {"title": "Description", "description": "A brief description of the topic asked by the user", "type": "string"}, "pros": {"title": "Pros", "description": "three points showing the pros of the topic asked by the user", "type": "string"}, "cons": {"title": "Cons", "description": "three points showing the cons of the topic asked by the user", "type": "string"}, "conclusion": {"title": "Conclusion", "description": "summary of topic asked by the user", "type": "string"}}, "required": ["description", "pros", "cons", "c
```

In [7]:
# create final prompt with formatting instructions from the parser

prompt_txt = """
              Answer the user query and generate the response based on the following formatting instructions:

              Formatting instructions:
              {format_instructions}

              Query:
              {query}
            """

In [8]:
prompt = PromptTemplate(
    template=prompt_txt,
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

prompt

PromptTemplate(input_variables=['query'], partial_variables={'format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"description": {"title": "Description", "description": "A brief description of the topic asked by the user", "type": "string"}, "pros": {"title": "Pros", "description": "three points showing the pros of the topic asked by the user", "type": "string"}, "cons": {"title": "Cons", "description": "three points showing the cons of the topic asked by the user", "type": "string"}, "conclusion": {"title": "Conclusion", "description": "summary of 
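
The prompt above fixes `format_instructions` once through `partial_variables` and leaves `query` to be supplied at call time. A minimal sketch of that idea using only `str.format` follows; `MiniPrompt` is a hypothetical stand-in, not LangChain's `PromptTemplate`, and the template text is invented for the example.

```python
TEMPLATE = "Instructions:\n{format_instructions}\n\nQuery:\n{query}"


class MiniPrompt:
    """Hypothetical stand-in for a prompt template with pre-filled variables."""

    def __init__(self, template: str, partial_variables: dict[str, str]):
        self.template = template
        self.partial_variables = partial_variables

    def format(self, **kwargs: str) -> str:
        # Values supplied at call time are merged with the pre-filled ones.
        return self.template.format(**self.partial_variables, **kwargs)


prompt_sketch = MiniPrompt(TEMPLATE, {"format_instructions": "Reply in JSON."})
text = prompt_sketch.format(query="What is an output parser?")
print(text)
```

Pre-filling the instructions this way means callers of the chain only ever pass the user query.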

In [9]:
chain = (prompt | llm | parser)
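
The `|` operator composes the three components so that each one's output becomes the next one's input. A rough sketch of how such piping can be built with Python's `__or__` method is shown below. The `Step` class and the `fake_*` components are hypothetical, and the "model" is a canned function rather than an LLM, so this illustrates the composition pattern only, not LangChain's actual classes.

```python
import json


class Step:
    """Hypothetical minimal runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b produces a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


# Stand-ins for the prompt, the model, and the parser.
fake_prompt = Step(lambda inputs: f"Query: {inputs['query']}")
fake_llm = Step(lambda prompt_text: '{"echo": "' + prompt_text + '"}')
fake_parser = Step(json.loads)

pipeline = fake_prompt | fake_llm | fake_parser
result = pipeline.invoke({"query": "hello"})
print(result)
```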

In [10]:
question = "Tell me about the carbon sequestration"

# invoke chain
response = chain.invoke({"query": question})

In [11]:
# get the response
response

QueryResponse(description='Carbon sequestration is the process of capturing and storing carbon dioxide to mitigate its presence in the atmosphere and reduce the impact of climate change.', pros='1. Helps reduce greenhouse gas emissions. 2. Can help restore degraded lands. 3. Provides economic opportunities in carbon offset markets.', cons='1. Requires significant investment and technology. 2. Long-term storage risks and uncertainties. 3. Potential for negative environmental impacts if not managed properly.', conclusion='Overall, carbon sequestration has the potential to play a significant role in addressing climate change, but careful planning and monitoring are essential to ensure its effectiveness and sustainability.')

In [12]:
response.description

'Carbon sequestration is the process of capturing and storing carbon dioxide to mitigate its presence in the atmosphere and reduce the impact of climate change.'

In [13]:
# convert the response to a dictionary
response.dict()

{'description': 'Carbon sequestration is the process of capturing and storing carbon dioxide to mitigate its presence in the atmosphere and reduce the impact of climate change.',
 'pros': '1. Helps reduce greenhouse gas emissions. 2. Can help restore degraded lands. 3. Provides economic opportunities in carbon offset markets.',
 'cons': '1. Requires significant investment and technology. 2. Long-term storage risks and uncertainties. 3. Potential for negative environmental impacts if not managed properly.',
 'conclusion': 'Overall, carbon sequestration has the potential to play a significant role in addressing climate change, but careful planning and monitoring are essential to ensure its effectiveness and sustainability.'}

In [14]:
for key, value in response.dict().items():
  print(f"{key}:\n{value}\n")

description:
Carbon sequestration is the process of capturing and storing carbon dioxide to mitigate its presence in the atmosphere and reduce the impact of climate change.

pros:
1. Helps reduce greenhouse gas emissions. 2. Can help restore degraded lands. 3. Provides economic opportunities in carbon offset markets.

cons:
1. Requires significant investment and technology. 2. Long-term storage risks and uncertainties. 3. Potential for negative environmental impacts if not managed properly.

conclusion:
Overall, carbon sequestration has the potential to play a significant role in addressing climate change, but careful planning and monitoring are essential to ensure its effectiveness and sustainability.



### JsonOutputParser

In [15]:
from typing import List

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field

In [16]:
# define the data structure
class QueryResponse(BaseModel):
    description: str = Field(description="A brief description of the topic asked by the user")
    pros: str = Field(description='three points showing the pros of the topic asked by the user')
    cons: str = Field(description='three points showing the cons of the topic asked by the user')
    conclusion: str = Field(description="summary of topic asked by the user")

# set up parser
parser = JsonOutputParser(pydantic_object=QueryResponse)
parser

JsonOutputParser(pydantic_object=<class '__main__.QueryResponse'>)
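
Chat models often wrap a JSON reply in a Markdown code fence, so a JSON parser has to cope with both fenced and bare replies. The sketch below approximates that behaviour with the standard library. `extract_json` is a hypothetical helper and the sample replies are invented; LangChain's own parsing logic is more thorough than this.

```python
import json
import re

TICKS = "`" * 3  # a Markdown code fence marker
FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS, re.DOTALL)


def extract_json(reply: str) -> dict:
    """Return the JSON object in a reply, whether or not it is fenced."""
    match = FENCE.search(reply)
    payload = match.group(1) if match else reply
    return json.loads(payload)


fenced = "Here you go:\n" + TICKS + 'json\n{"pros": "cheap", "cons": "slow"}\n' + TICKS
bare = '{"pros": "cheap", "cons": "slow"}'

print(extract_json(fenced))
print(extract_json(bare))
```

Either way the result is a plain dictionary, which is why the responses later in this section are read with `.items()` rather than attribute access.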

In [17]:
# create final prompt with formatting instructions from the parser

prompt_txt = """
              Answer the user query and generate the response based on the following formatting instructions:

              Formatting instructions:
              {format_instructions}

              Query:
              {query}
            """

In [18]:
# create a template for a string prompt
prompt = PromptTemplate(
    template=prompt_txt,
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

prompt

PromptTemplate(input_variables=['query'], partial_variables={'format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"description": {"title": "Description", "description": "A brief description of the topic asked by the user", "type": "string"}, "pros": {"title": "Pros", "description": "three points showing the pros of the topic asked by the user", "type": "string"}, "cons": {"title": "Cons", "description": "three points showing the cons of the topic asked by the user", "type": "string"}, "conclusion": {"title": "Conclusion", "description": "summary of 

In [19]:
# create a chain
chain = (prompt | llm | parser)

In [20]:
chain

PromptTemplate(input_variables=['query'], partial_variables={'format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"description": {"title": "Description", "description": "A brief description of the topic asked by the user", "type": "string"}, "pros": {"title": "Pros", "description": "three points showing the pros of the topic asked by the user", "type": "string"}, "cons": {"title": "Cons", "description": "three points showing the cons of the topic asked by the user", "type": "string"}, "conclusion": {"title": "Conclusion", "description": "summary of 

In [21]:
queries = [
    "Tell me about the carbon sequestration",
    "Tell me about backpropagation algorithm in machine learning"
]

In [22]:
queries_formatted = [{"query": subject} for subject in queries]

In [23]:
queries_formatted

[{'query': 'Tell me about the carbon sequestration'},
 {'query': 'Tell me about backpropagation algorithm in machine learning'}]

In [24]:
# run the chain once per query; returns a list with one parsed dict per query
responses = chain.map().invoke(queries_formatted)
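
Conceptually, `chain.map()` turns a chain that handles one input into one that handles a list of inputs and returns the results in the same order. The sketch below imitates that with a thread pool. `fake_chain` and `map_invoke` are hypothetical names, and running the calls on threads is an assumption of this sketch rather than a statement about LangChain's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_chain(inputs: dict) -> dict:
    """Hypothetical stand-in for a prompt | llm | parser chain."""
    return {"description": f"About: {inputs['query']}"}


def map_invoke(chain_fn, batch: list[dict]) -> list[dict]:
    # Executor.map preserves input order even though calls may overlap in time.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(chain_fn, batch))


batch = [{"query": "topic A"}, {"query": "topic B"}]
results = map_invoke(fake_chain, batch)
print(results)
```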

In [25]:
import pandas as pd

# convert response to DataFrame
data = pd.DataFrame(responses)
data

|   | description | pros | cons | conclusion |
|---|-------------|------|------|------------|
| 0 | Carbon sequestration is the process of capturi... | 1. Helps reduce greenhouse gas emissions. 2. C... | 1. Requires significant investment and technol... | Carbon sequestration has the potential to play... |
| 1 | Backpropagation is a key algorithm used in tra... | 1. Backpropagation allows neural networks to l... | 1. Backpropagation can suffer from the vanishi... | In conclusion, backpropagation is a powerful a... |


In [26]:
for response in responses:
  for key, val in response.items():
    print(f"{key}:\n{val}\n")
  print('-----------------------------------------------------------')

description:
Carbon sequestration is the process of capturing and storing carbon dioxide to mitigate its presence in the atmosphere and combat climate change.

pros:
1. Helps reduce greenhouse gas emissions. 2. Can help improve soil quality. 3. Provides potential economic opportunities in carbon trading.

cons:
1. Requires significant investment and technology. 2. Some methods may have limited effectiveness. 3. Long-term storage risks and uncertainties.

conclusion:
Carbon sequestration has the potential to play a significant role in addressing climate change, but it also comes with challenges and uncertainties that need to be carefully considered.

-----------------------------------------------------------
description:
Backpropagation is a key algorithm used in training artificial neural networks in machine learning. It involves calculating the gradient of a loss function with respect to the weights of the network, and then using this gradient to update the weights in order to minimi

### CommaSeparatedListOutputParser

In [27]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

In [28]:
# output parser
output_parser = CommaSeparatedListOutputParser()

# get the format instructions
format_instructions = output_parser.get_format_instructions()
format_instructions

'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'

In [29]:
# create final prompt with formatting instructions from the parser

prompt_txt = """
              List 5 real-world use cases where object detection can be used:

              output format instructions:
              {format_instructions}

            """

In [30]:
prompt = PromptTemplate.from_template(template=prompt_txt)
prompt

PromptTemplate(input_variables=['format_instructions'], template='\n              List 5 real-world use cases where object detection can be used:\n\n              output format instructions:\n              {format_instructions}\n\n            ')

In [31]:
chain = (prompt | llm | output_parser)

In [32]:
response = chain.invoke({'format_instructions': format_instructions})

In [33]:
# the parser returns a list, so iterate over its items
for r in response:
  print(r)

1. Autonomous vehicles for detecting pedestrians
cyclists
and other vehicles on the road
2. Retail stores for tracking inventory levels and monitoring product placement
3. Security systems for identifying unauthorized individuals entering restricted areas
4. Healthcare for analyzing medical images and detecting abnormalities or diseases
5. Agriculture for monitoring crop health and identifying pests or diseases in plants
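
In the output above, the model appears to have replied with a numbered list of sentences rather than short comma-separated values, so the only commas were inside the first sentence and that sentence was split into three items. The sketch below reproduces the effect with naive comma splitting. `split_on_commas` is a hypothetical approximation of the parser, and both sample strings are invented for illustration.

```python
def split_on_commas(text: str) -> list[str]:
    """Naive comma splitting, approximating what a comma-separated parser does."""
    return [part.strip() for part in text.split(",")]


sentence_style = "1. Vehicles for detecting pedestrians, cyclists, and cars"
keyword_style = "self-driving cars, retail analytics, security cameras"

print(split_on_commas(sentence_style))
print(split_on_commas(keyword_style))
```

Asking the model for short phrases that contain no commas of their own is one way to get one list item per use case.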
