# LangChain Output Parser

* Feel free to adjust the prompts to get the desired results
* The prompts may NOT work for certain models
* Although the code was tested successfully with OpenAI GPT-3.5, it may break with future releases
* If a prompt fails, please fix it and share the fix with others on the Q&A forum

https://python.langchain.com/docs/modules/model_io/output_parsers/

https://python.langchain.com/docs/modules/model_io/output_parsers/quick_start

https://python.langchain.com/docs/modules/model_io/output_parsers/types/json

https://api.python.langchain.com/en/stable/langchain_api_reference.html#module-langchain.output_parsers




#### Google Colab
If you are running the code in Google Colab, install the packages by uncommenting and running the cell below.

* The API key file will not be available
* You will be prompted to provide the API token

Uncomment and run the code in the cell below:

In [None]:
## The script is downloaded and run to set up the utils folder

# !curl -H "Accept: application/vnd.github.VERSION.raw" https://raw.githubusercontent.com/acloudfan/gen-ai-app-dev/main/Setup/gcsetup.sh  > gcsetup.sh
# !chmod u+x gcsetup.sh
# !./gcsetup.sh -l



## Set up the environment

In [1]:
from IPython.display import JSON
from dotenv import load_dotenv
import os

import warnings

warnings.filterwarnings("ignore")

# Load the file that contains the API keys
# (this is the author's local path; change it to the location of your own .env file)
load_dotenv('C:\\Users\\raj\\.jupyter\\.env')

True
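On Colab there is no local `.env` file, so the key has to come from somewhere else. Below is a minimal stdlib sketch of the "environment first, then prompt" pattern. The helper name `get_api_key` and the variable name `COHERE_API_KEY` are assumptions for illustration; this is not necessarily how `utils/create_llm.py` obtains the key.

```python
import os
from getpass import getpass


def get_api_key(var_name: str = "COHERE_API_KEY") -> str:
    """Hypothetical helper: return the key from the environment, prompting only if it is missing.

    The default variable name is an assumption, not taken from utils/create_llm.py.
    """
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed value, which suits shared notebooks such as Colab
        key = getpass(f"Enter {var_name}: ")
        os.environ[var_name] = key  # cache it for later cells in this session
    return key
```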

## Create the LLM

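The notebook was originally tested with OpenAI GPT-3.5. The cell below creates a Cohere LLM; the GPT-3.5 and Ollama options are left commented out.

**Note**

The utility code provides methods for creating different LLMs. Check out the code available under **utils/create_llm.py**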

In [7]:
import sys
 
# setting path
sys.path.append('../')

from utils.create_llm import create_gpt_llm, create_cohere_llm, create_ollama_llm

# OpenAI GPT args
# openai_args = {"max_tokens": -1, "temperature": 0.5}
# llm = create_gpt_llm(openai_args)

llm = create_cohere_llm()

# Ensure Ollama is running on your machine
# On Colab you need to install it; downloading the model will take some time
# llm = create_ollama_llm()

## 1. Default output parser 

Without an output parser, the chain's result is a plain string.

In [8]:
from langchain.prompts import PromptTemplate
from langchain.chains  import LLMChain
from IPython.display   import JSON

In [9]:
template_1 = """
generate {number} random valid word and number pairs.
The number should be an odd number between 5 & 50
"""

# Create the prompt template
prompt_template_1 = PromptTemplate(
    template = template_1,
    input_variables = ["number",]
)

print(prompt_template_1.format(number=5))

# Create the LLM Chain
llm_chain_word_number_pair = LLMChain(
    prompt = prompt_template_1,
    llm = llm
)

response = llm_chain_word_number_pair.invoke(5)


generate 5 random valid word and number pairs.
The number should be an odd number between 5 & 50



In [10]:
print("response :", response)
print("response type :", type(response['text']))

response : {'number': 5, 'text': ' Sure, here are 5 random valid word and odd number pairs:\n\n1. "vizir" (11)\n2. "zodiac" (41)\n3. "tachy" (29)\n4. "kwela" (17)\n5. "mesa" (41) \n\nAll of these words are valid English words, and the numbers are odd and between 5 and 50. '}
response type : <class 'str'>
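Because the default result is just text, any structure has to be recovered by hand. A minimal sketch, assuming the model keeps the numbered `1. "word" (number)` layout seen in the response above; `extract_pairs` is a hypothetical helper, and a reply in a different layout simply yields an empty list. That brittleness is what the output parsers in the following sections address.

```python
import re

# Raw text copied verbatim from the response above
raw = ' Sure, here are 5 random valid word and odd number pairs:\n\n1. "vizir" (11)\n2. "zodiac" (41)\n3. "tachy" (29)\n4. "kwela" (17)\n5. "mesa" (41) \n\nAll of these words are valid English words, and the numbers are odd and between 5 and 50. '

# Matches list lines such as: 1. "vizir" (11)
PAIR_RE = re.compile(r'^\s*\d+\.\s+"([^"]+)"\s+\((\d+)\)', re.MULTILINE)


def extract_pairs(text: str) -> list[tuple[str, int]]:
    """Pull (word, number) pairs out of free-form LLM text; returns [] if the layout differs."""
    return [(word, int(num)) for word, num in PAIR_RE.findall(text)]


sample_pairs = extract_pairs(raw)
print(sample_pairs)
# [('vizir', 11), ('zodiac', 41), ('tachy', 29), ('kwela', 17), ('mesa', 41)]
```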


## 2. Use CSV Output Parser
https://python.langchain.com/docs/modules/model_io/output_parsers/types/csv

In [11]:
from langchain.output_parsers import CommaSeparatedListOutputParser

format_instructions = CommaSeparatedListOutputParser().get_format_instructions()

format_instructions

'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'

In [12]:
template_1 = """
Generate 3 random odd numbers.
The numbers should be between 5 & 50.

Format instructions:{format_instructions}
""" 

prompt_template_1 = PromptTemplate(
    template = template_1,
    input_variables = [],
    partial_variables = {"format_instructions": format_instructions}
)
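`partial_variables` pre-fills `{format_instructions}` when the template is created, which is why `input_variables` can be empty here. The hypothetical `MiniPromptTemplate` class below is a stdlib sketch of that idea only; it is not LangChain's `PromptTemplate`.

```python
from string import Formatter
from typing import Optional


class MiniPromptTemplate:
    """Hypothetical stand-in for PromptTemplate, showing how partial variables are pre-bound."""

    def __init__(self, template: str, partial_variables: Optional[dict] = None):
        self.template = template
        self.partial_variables = dict(partial_variables or {})

    @property
    def input_variables(self) -> list:
        # Every {placeholder} in the template that has not been pre-filled
        fields = {name for _, name, _, _ in Formatter().parse(self.template) if name}
        return sorted(fields - set(self.partial_variables))

    def format(self, **kwargs) -> str:
        # Caller-supplied values override the partials if both are given
        return self.template.format(**{**self.partial_variables, **kwargs})


instructions = "Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`"
mini = MiniPromptTemplate(
    "Generate 3 random odd numbers.\nFormat instructions:{format_instructions}\n",
    partial_variables={"format_instructions": instructions},
)
print(mini.input_variables)  # []
print(mini.format())
```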



In [13]:
# Create the LLM Chain
# Output parser specified: the result is parsed and ready to be consumed
# Output parser not specified: the result is a string
llm_chain_three_odd_numbers = LLMChain(
    prompt = prompt_template_1,
    llm = llm,
    output_key = "result",
    output_parser = CommaSeparatedListOutputParser(),
)

response = llm_chain_three_odd_numbers.invoke({})

In [14]:
print('result :', response['result'])
print('result type:', type(response['result']))

result : ['35', '27', '21']
result type: <class 'list'>
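The parser splits the text into a list but leaves the items as strings (`'35'`, not `35`), and it does not check the constraints stated in the prompt. A small post-processing step, sketched with a hypothetical `to_odd_ints` helper:

```python
def to_odd_ints(values: list, low: int = 5, high: int = 50) -> list:
    """Hypothetical helper: convert parsed string items to ints and enforce the prompt's constraints."""
    numbers = [int(v.strip()) for v in values]  # raises ValueError on non-numeric items
    bad = [n for n in numbers if n % 2 == 0 or not (low <= n <= high)]
    if bad:
        raise ValueError(f"numbers violating the constraints: {bad}")
    return numbers


print(to_odd_ints(['35', '27', '21']))  # [35, 27, 21]
```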


## 3. EnumOutputParser

* Return type is a member of the specified `Enum` class

https://python.langchain.com/docs/modules/model_io/output_parsers/types/enum

Python `enum` module

https://docs.python.org/3/library/enum.html


In [15]:
from langchain.output_parsers import EnumOutputParser
from enum import Enum

# Create a class with a set of options
class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
    WHITE = "white"
    UNKNOWN = "others"

# Create the output parser
output_parser_enum = EnumOutputParser(enum=Colors)

format_instructions = output_parser_enum.get_format_instructions()

format_instructions

'Select one of the following options: red, green, blue, white, others'
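The instruction string above can be derived directly from the enum values, and parsing amounts to looking the answer up by value. This stdlib sketch approximates that behavior; it is not `EnumOutputParser`'s source, and the helper names are hypothetical.

```python
from enum import Enum


# Same members as the Colors class above, repeated so this snippet runs on its own
class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
    WHITE = "white"
    UNKNOWN = "others"


def enum_format_instructions(enum_cls) -> str:
    # Build the option list from the enum values, in definition order
    return "Select one of the following options: " + ", ".join(m.value for m in enum_cls)


def parse_enum(text: str, enum_cls):
    # Lookup by value; raises ValueError unless the stripped text is exactly one of the values
    return enum_cls(text.strip())


print(enum_format_instructions(Colors))
print(parse_enum(" blue\n", Colors))  # Colors.BLUE
```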

In [18]:

# Works for OpenAI GPT but not for Cohere
template_1 = """
Answer the question using the following format instructions.

Format instructions:
{format_instructions}

If the color is not there in the options just say "others"

Question:blood
Answer: red

Question:Cucumbers
Answer:green

Question:lavender
Answer:others

Question:orange
Answer:others

Question:car
Answer:others

Question:shoes
Answer:others


Question:{object}
Answer:
"""  

prompt_template_1 = PromptTemplate(
    template = template_1,
    input_variables = ["object",],
    partial_variables = {"format_instructions": output_parser_enum.get_format_instructions()}
)

# Legacy : Create the LLM Chain
# llm_chain_get_color = LLMChain(
#     prompt = prompt_template_1,
#     llm = llm,
#     output_key = "result",
#     output_parser = output_parser_enum,
# )

# Use LCEL to setup the chain
llm_chain_get_color = prompt_template_1 | llm | output_parser_enum

print(prompt_template_1.format(object="onion"))


Answer the question using the following format instructions.

Format instructions:
Select one of the following options: red, green, blue, white, others

If the color is not there in the options just say "others"

Question:blood
Answer: red

Question:Cucumbers
Answer:green

Question:lavender
Answer:others

Question:orange
Answer:others

Question:car
Answer:others

Question:shoes
Answer:others


Question:onion
Answer:
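The `|` operator in `prompt_template_1 | llm | output_parser_enum` feeds each step's output into the next one. The toy `Step` class below is a hypothetical, stdlib-only illustration of that composition order; LangChain's runnables do much more (batching, streaming, async, configuration), and the fake LLM step simply returns a fixed string.

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal 'runnable': wraps a function and composes with `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b returns a new Step that passes a's output to b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


prompt_step = Step(lambda inputs: f"Question:{inputs['object']}\nAnswer:")
fake_llm_step = Step(lambda prompt_text: " blue\n")  # fixed reply in place of a real model call
parser_step = Step(lambda text: text.strip())

demo_chain = prompt_step | fake_llm_step | parser_step
print(demo_chain.invoke({"object": "blueberries"}))  # blue
```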



In [19]:
response = llm_chain_get_color.invoke(input={"object": "blueberries"})

In [23]:
print('response :', response)
print('response type :', type(response))

response : Colors.BLUE
response type : <enum 'Colors'>
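Since the parser matches the model's reply against the enum values, a reply such as `Blue.` or `purple` from a less cooperative model would likely make the chain fail rather than return `Colors.UNKNOWN`. One option is a lenient post-processing step in place of the strict parser; `lenient_color` is a hypothetical sketch of that idea, not a LangChain API.

```python
from enum import Enum


# Same members as the Colors class above, repeated so this snippet runs on its own
class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
    WHITE = "white"
    UNKNOWN = "others"


def lenient_color(text: str) -> Colors:
    """Normalize the raw model text, then fall back to Colors.UNKNOWN instead of raising."""
    cleaned = text.strip().strip(".!\"'").lower()
    try:
        return Colors(cleaned)
    except ValueError:
        return Colors.UNKNOWN


print(lenient_color("Blue."))   # Colors.BLUE
print(lenient_color("purple"))  # Colors.UNKNOWN
```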


## 4. JSON Output Parser

* Return type is a `dict`

https://python.langchain.com/docs/modules/model_io/output_parsers/types/json
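At its core, JSON output parsing means locating the JSON payload in the reply (models often wrap it in a markdown code fence) and loading it into a `dict`. A stdlib sketch of that idea with the hypothetical helper `parse_json_reply`; LangChain's own parser handles more edge cases than this.

```python
import json
import re

# Built at runtime so this listing contains no literal triple backtick
FENCE = "`" * 3
# An optional "json" language tag after the opening fence, then the payload (non-greedy)
FENCED_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_json_reply(text: str) -> dict:
    """Parse a JSON object from model text, with or without a surrounding markdown fence."""
    match = FENCED_RE.search(text)
    payload = match.group(1) if match else text.strip()
    return json.loads(payload)  # raises json.JSONDecodeError on malformed JSON


reply = FENCE + 'json\n{"result": [{"word": "Treacle", "odd_number": 17}]}\n' + FENCE
print(parse_json_reply(reply))
```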


In [24]:
from langchain_core.output_parsers import JsonOutputParser
from IPython.display import Markdown, JSON

from langchain_core.pydantic_v1 import BaseModel, Field

template_2 = """
generate {number} random valid word and odd number pairs.
The numbers should be between 5 & 50

Format instructions:
{format_instructions}
"""

class WordNumberCombo(BaseModel):
    word: str = Field(description="this is a random word")
    odd_number: int = Field(description="this is an odd number between 5 and 50")

class ArrayWordNumberCombo(BaseModel):
    result: list[WordNumberCombo]

# Set up the parser
output_parser_json = JsonOutputParser(pydantic_object=ArrayWordNumberCombo)

In [25]:
prompt_template_2 = PromptTemplate(
    template = template_2,
    input_variables = ["number",],
    partial_variables = {"format_instructions": output_parser_json.get_format_instructions()}
)

print(prompt_template_2.format(number=5))


generate 5 random valid word and odd number pairs.
The numbers should be between 5 & 50

Format instructions:
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"result": {"title": "Result", "type": "array", "items": {"$ref": "#/definitions/WordNumberCombo"}}}, "required": ["result"], "definitions": {"WordNumberCombo": {"title": "WordNumberCombo", "type": "object", "properties": {"word": {"title": "Word", "description": "this is a random word", "type": "string"}, "odd_number": {"title": "Odd Number", "description": "this is an odd number between 5 and 50", "type": "integer"}}
```

In [26]:
# Create the LLM Chain
# With output_parser set, response['result'] is a dict (<class 'dict'>)
llm_chain_json_five_numbers = LLMChain(
    prompt = prompt_template_2,
    llm = llm,
    output_key = "result",
    output_parser = output_parser_json,
)

response = llm_chain_json_five_numbers.invoke(5)

In [27]:
print('result: ',response)
print('type: ',type(response['result']))

result:  {'number': 5, 'result': {'result': [{'word': 'Ephemeral', 'odd_number': 31}, {'word': 'Luncheon', 'odd_number': 39}, {'word': 'Treacle', 'odd_number': 17}, {'word': ' Whirlpool', 'odd_number': 41}, {'word': 'Lumberjack', 'odd_number': 29}]}}
type:  <class 'dict'>
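The output shows two things worth handling: the parsed dict is nested under `result` twice (once for the chain's `output_key`, once for the schema field of the same name), and values are not cleaned (note `' Whirlpool'`). A short sketch using the recorded output:

```python
# Recorded output from the cell above, copied verbatim
recorded = {'number': 5, 'result': {'result': [{'word': 'Ephemeral', 'odd_number': 31}, {'word': 'Luncheon', 'odd_number': 39}, {'word': 'Treacle', 'odd_number': 17}, {'word': ' Whirlpool', 'odd_number': 41}, {'word': 'Lumberjack', 'odd_number': 29}]}}

# Outer "result" is the chain's output_key; inner "result" is the schema's list field
items = recorded['result']['result']

# The parser does not clean values, so strip stray whitespace such as ' Whirlpool'
word_to_number = {item['word'].strip(): item['odd_number'] for item in items}
print(word_to_number)
```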


## 5. Using PydanticOutputParser

* Return type is an instance of the specified Pydantic class

In [28]:
from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser

template_3 = """
generate {number} pairs of a random word and an odd number.
the numbers should be between 5 and 50.

Format instructions:
{format_instructions}
"""

# Create the class to represent the word-number pair
class WordNumberCombo(BaseModel):
    word: str = Field(description="this is a random word")
    odd_number: int = Field(description="this is an odd number between 5 and 50")

# Create the actual class for output. It's a list of word-number pairs
class ArrayWordNumberCombo(BaseModel):
    result: list[WordNumberCombo]

output_parser_pydantic = PydanticOutputParser(pydantic_object=ArrayWordNumberCombo)

# Create the prompt template
# Partial variables can be passed at the time of template creation
prompt_template_pydantic = PromptTemplate(
    template = template_3,
    input_variables = ["number",],
    partial_variables = {"format_instructions": output_parser_pydantic.get_format_instructions()}
)

print(prompt_template_pydantic.format(number=5))


generate 5 pairs of a random word and an odd number.
the numbers should be between 5 and 50.

Format instructions:
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"WordNumberCombo": {"properties": {"word": {"description": "this is a random word", "title": "Word", "type": "string"}, "odd_number": {"description": "this is an odd number between 5 and 50", "title": "Odd Number", "type": "integer"}}, "required": ["word", "odd_number"], "title": "WordNumberCombo", "type": "object"}}, "properties": {"result": {"items": {"$ref": "#/$defs/WordNumberCombo"}, "title": "Result", "type": "arr
```

In [29]:
# Create the LLM Chain
llm_chain_pydantic_five_numbers = LLMChain(
    prompt = prompt_template_pydantic,
    llm = llm,
    output_key = 'result',
    output_parser = output_parser_pydantic
)

response = llm_chain_pydantic_five_numbers.invoke(5)

In [30]:
print(response['result'])
print('type: ',type(response['result']))

result=[WordNumberCombo(word='intent', odd_number=47), WordNumberCombo(word='velocity', odd_number=39), WordNumberCombo(word='articulate', odd_number=17), WordNumberCombo(word='fetish', odd_number=41), WordNumberCombo(word='ecliptic', odd_number=29)]
type:  <class '__main__.ArrayWordNumberCombo'>
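Conceptually, `PydanticOutputParser` parses the JSON and then validates it into the model class, which is why the result is an `ArrayWordNumberCombo` object rather than a `dict`. The stdlib sketch below imitates that with dataclasses. Unlike Pydantic it does no type coercion (a string `"47"` is rejected rather than converted to `47`), and `parse_combo` is a hypothetical helper.

```python
import json
from dataclasses import dataclass


@dataclass
class WordNumberCombo:
    word: str
    odd_number: int


@dataclass
class ArrayWordNumberCombo:
    result: list[WordNumberCombo]


def parse_combo(text: str) -> ArrayWordNumberCombo:
    """json.loads, then build typed objects.

    Raises KeyError if "result" is absent, TypeError if item fields are missing,
    extra, or odd_number is not an int.
    """
    data = json.loads(text)
    items = [WordNumberCombo(**item) for item in data["result"]]
    for item in items:
        # Dataclasses do not validate types, so check the one field that matters here
        if not isinstance(item.odd_number, int):
            raise TypeError(f"odd_number must be an int, got {item.odd_number!r}")
    return ArrayWordNumberCombo(result=items)


parsed = parse_combo('{"result": [{"word": "intent", "odd_number": 47}, {"word": "velocity", "odd_number": 39}]}')
print(parsed.result[0].word, parsed.result[0].odd_number)  # intent 47
```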


## 6. StructuredOutputParser

In [31]:
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

In [32]:
response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(
        name="source",
        description="source used to answer the user's question, should be a website.",
    ),
]
output_parser_structured = StructuredOutputParser.from_response_schemas(response_schemas)
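The format instructions that `StructuredOutputParser` produces (printed a few cells below) are essentially one tab-indented line per `ResponseSchema`, wrapped in a `json` code fence. The hypothetical `Schema` dataclass and `build_format_instructions` function below reproduce that layout as a sketch; they are not LangChain classes.

```python
from dataclasses import dataclass

# Built at runtime so this listing contains no literal triple backtick
FENCE = "`" * 3


@dataclass
class Schema:
    """Hypothetical stand-in for ResponseSchema: a field name, a description and a type label."""
    name: str
    description: str
    type: str = "string"


def build_format_instructions(schemas: list[Schema]) -> str:
    # One tab-indented line per field, in the same layout as the printed instructions
    fields = "\n".join(f'\t"{s.name}": {s.type}  // {s.description}' for s in schemas)
    return (
        "The output should be a markdown code snippet formatted in the following schema, "
        f'including the leading and trailing "{FENCE}json" and "{FENCE}":\n\n'
        f"{FENCE}json\n{{\n{fields}\n}}\n{FENCE}"
    )


schemas = [
    Schema("answer", "answer to the user's question"),
    Schema("source", "source used to answer the user's question, should be a website."),
]
instructions_text = build_format_instructions(schemas)
print(instructions_text)
```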

In [33]:
format_instructions = output_parser_structured.get_format_instructions()
prompt_template_structured = PromptTemplate(
    template="answer the user's question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)

In [34]:
print(prompt_template_structured.format(question="where is paris?"))

answer the user's question as best as possible.
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"answer": string  // answer to the user's question
	"source": string  // source used to answer the user's question, should be a website.
}
```
where is paris?


In [37]:
# Legacy: create the LLM Chain
# llm_chain_structured = LLMChain(
#     prompt = prompt_template_structured,
#     llm = llm,
#     output_key = 'result',
#     output_parser = output_parser_structured
# )

# Use LCEL notation to create the chain
chain = prompt_template_structured | llm | output_parser_structured

In [38]:
response = chain.invoke({"question":"where is paris?"})

In [39]:
response

{'answer': '48.85341', 'source': 'https://www.openstreetmap.org/place/Paris'}

In [40]:
print('result: ',response)
print('type: ',type(response))

result:  {'answer': '48.85341', 'source': 'https://www.openstreetmap.org/place/Paris'}
type:  <class 'dict'>
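The parser guarantees the shape of the answer, not its quality: here `answer` is a bare number (presumably a latitude), yet the result is a perfectly valid `dict`. A sketch of an extra shape check with the hypothetical `check_structured` helper, using the recorded output; judging whether the content actually answers the question still needs a separate step.

```python
from urllib.parse import urlparse

# Recorded output from the cells above, copied verbatim
recorded = {'answer': '48.85341', 'source': 'https://www.openstreetmap.org/place/Paris'}


def check_structured(resp: dict, required=("answer", "source")) -> list:
    """Return a list of shape problems; an empty list means the keys and URL look fine."""
    problems = [f"missing key: {key}" for key in required if key not in resp]
    source = resp.get("source", "")
    if urlparse(source).scheme not in ("http", "https"):
        problems.append(f"source is not a web URL: {source!r}")
    return problems


print(check_structured(recorded))  # []
```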
