# OutputParsers

#### Output parsers turn the raw text that an LLM returns into a structured form that your code can use. <br />
#### A parser is typically wired into the prompt as follows:
1. Define a class (user-defined or from a library) that describes the expected output
2. Pass the class as an argument to a parser object. Alternatively, LangChain also provides parsers that take no arguments, for cases where the parsing does not need much complexity or flexibility
3. Pass the parser's format instructions to the PromptTemplate as a partial variable (`partial_variables`)

This lesson takes on two major types of parsers:
    <ol>
    <li><b>Output Parsers</b></li>
    <ul>
        <li>Pydantic Parser</li>
        <li>Comma Separated List Output Parser </li>
        <li>Structured Output Parser </li>
    </ul>
    <li><b>Error fixing parsers</b></li>
    <ul>
        <li>Output Fixing Parsers</li>
        <li>Retry Output Parser </li>
    </ul>
    </ol>

# A. Output Parsers

## 1. Pydantic parser

Pydantic is a widely used data validation library for Python. Data validation checks that your data complies with the rules, schemas or constraints you define for each attribute, so that your code ingests and returns data in exactly the expected shape.

The PydanticOutputParser class instructs the model to generate its output as JSON and then extracts the information from the response into an instance of your Pydantic model. You can then read each field as an attribute (here, a list of strings) and index into it without worrying about formatting.
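Conceptually, the parser does two things: it decodes the JSON text the model returned, then builds a validated object from it. The sketch below imitates that with the standard library only. `SynonymList` and `parse_synonyms` are hypothetical names, and a dataclass stands in for the Pydantic model, so this is an illustration rather than LangChain's implementation.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SynonymList:
    """Hypothetical stand-in for a Pydantic model with one list field."""
    words: List[str]

    def __post_init__(self):
        # Validate at construction time, as a Pydantic validator would.
        for word in self.words:
            if word and word[0].isnumeric():
                raise ValueError("The word cannot start with numbers")


def parse_synonyms(llm_text: str) -> SynonymList:
    """Decode the JSON returned by the model and build a validated object."""
    data = json.loads(llm_text)
    return SynonymList(words=data["words"])


result = parse_synonyms('{"words": ["conduct", "manner"]}')
print(result.words[0])  # fields are read as attributes, then indexed
```

The point of the design is that validation happens while the object is built, so code further down can rely on the fields being well formed.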

In [1]:
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI



### a. Creating the parser

In [2]:
from pydantic import BaseModel, Field, validator
from typing import List

In [3]:
class Validator(BaseModel):
    #words is a list of strings with a (metadata) description
    words: List[str] = Field(description="list of substitute words based on context")
        
    @validator('words') #Validate the 'words' list as follows
    def not_start_with_number(cls, field):
        for item in field:
            if item[0].isnumeric(): #Check if the first letter in a string is a number
                raise ValueError('The word cannot start with numbers')
        return field      

In [4]:
pyparser=PydanticOutputParser(pydantic_object=Validator) #Create a PydanticOutputParser with the additional checker as defined by the Validator class

In [5]:
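print(pyparser.get_format_instructions())
print('Object schema')
#Call the validator directly. A plain string is iterated character by character, so only its first character is effectively checked
print(pyparser.pydantic_object.not_start_with_number('This is not a number'))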

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"words": {"title": "Words", "description": "list of substitute words based on context", "type": "array", "items": {"type": "string"}}}, "required": ["words"]}
```
Object schema
This is not a number


In [6]:
#Checking if the validator works. Expecting a ValueError
print(pyparser.pydantic_object.not_start_with_number('123Hello'))

ValueError: The word cannot start with numbers

### b. Create the prompt

In [7]:
template='Output synonyms for the word {input_word} using the instructions {format_instructions} as per the {context}'
syn_template=PromptTemplate(
            template=template,
            input_variables=['input_word','context'],
            partial_variables={'format_instructions':pyparser.get_format_instructions()} #get_format_instructions returns the text that tells the LLM how to format its output (the JSON schema). The @validator check is not part of these instructions; it runs later, when the response is parsed
)

In [8]:
model_input = syn_template.format_prompt(
    input_word="behaviour",
    context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

In [9]:
llm=OpenAI(model='text-davinci-003',temperature=0)

In [10]:
response=llm(prompt=model_input.to_string())

In [11]:
pyparser.parse(response)

Validator(words=['conduct', 'manner', 'action', 'demeanor', 'response', 'reaction', 'attitude'])

In [12]:
model_input2 = syn_template.format_prompt(
    input_word="behaviour",
    context="The behaviour of the the planets with respect to the sun has been well studied and tracked and has been shown to traverse an elliptical orbit"
)

In [13]:
response2=llm(prompt=model_input2.to_string())

In [14]:
pyparser.parse(response2)

Validator(words=['conduct', 'action', 'demeanor', 'manner', 'attitude', 'response', 'reaction', 'activity', 'habit', 'practice'])

Compared with the first run, only 'activity', 'habit' and 'practice' have been added, but what we were looking for was mostly around characteristics or laws or something similar. 'manner' seems to be appropriate though.

### c. Requesting and parsing multiple outputs

In this exercise, we will ask the LLM for two outputs: the synonym AND the reasoning for why the synonym is appropriate.
In this case, we will run two checks on the output:
<ol>
    <li> Check that all synonyms have the appropriate structure, i.e. they do not start with a number (as before) </li>
    <li> Check that each reasoning ends with a '.', indicating a complete sentence, and add one if it is missing </li>
</ol>

In [15]:
class Validation2(BaseModel):
    word:List[str]=Field(description="list of substitute words based on context")
    response:List[str]=Field(description='reasoning why the substitute word is appropriate in the context')
    
    @validator('word')
    def not_start_with_number(cls, field):
        for item in field:
            if item[0].isnumeric(): #Check if the first letter in a string is a number
                raise ValueError('The word cannot start with numbers')
        return field
        
    @validator('response')
    def check_period(cls, field):
        #Return a new list; rebinding the loop variable would leave the list unchanged
        return [item if item.endswith('.') else item + '.' for item in field]
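
A validator that wants to normalise list items has to return a new list. Rebinding the loop variable, as in `item = item + '.'`, creates a new string and leaves the list untouched. The stand-alone sketch below (hypothetical function names, no Pydantic involved) shows the difference.

```python
from typing import List


def add_period_rebinding(sentences: List[str]) -> List[str]:
    # Rebinding `item` only changes the local name, not the list element.
    for item in sentences:
        if not item.endswith("."):
            item = item + "."
    return sentences


def add_period_new_list(sentences: List[str]) -> List[str]:
    # Build and return a new list so the normalised values are kept.
    return [s if s.endswith(".") else s + "." for s in sentences]


sample = ["It fits the context", "It is formal."]
print(add_period_rebinding(sample))  # first sentence still has no period
print(add_period_new_list(sample))   # every sentence now ends with a period
```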
        

In [16]:
pyparser2=PydanticOutputParser(pydantic_object=Validation2)

In [17]:
pyparser2.get_format_instructions()

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"word": {"title": "Word", "description": "list of substitute words based on context", "type": "array", "items": {"type": "string"}}, "response": {"title": "Response", "description": "reasoning why the substitute word is appropriate in the context", "type": "array", "items": {"type": "string"}}}, "required": ["word", "response"]}\n```'

In [18]:
template2='Output synonyms for the word {input_word} using the instructions {format_instructions} as per the {context} and provide appropriate explanation for why the synonym is appropriate'
syn_template2=PromptTemplate(template=template2,
                    input_variables=['input_word','context'],
                    partial_variables={'format_instructions':pyparser2.get_format_instructions()})

In [19]:
model_input3 = syn_template2.format_prompt(
    input_word="behaviour",
    context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

In [20]:
response3=llm(prompt=model_input3.to_string())

In [21]:
print(response3)



Here is the output instance:
{"word": ["conduct", "action", "demeanor", "manner"], "response": ["The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'conduct' is an appropriate synonym.", "The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'action' is an appropriate synonym.", "The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'demeanor' is an appropriate synonym.", "The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'manner' is an appropriate synonym."]}


In [22]:
parsed2=pyparser2.parse(response3)

In [23]:
print(parsed2.dict()['word'])
print(parsed2.dict()['response'])

['conduct', 'action', 'demeanor', 'manner']
["The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'conduct' is an appropriate synonym.", "The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'action' is an appropriate synonym.", "The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'demeanor' is an appropriate synonym.", "The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'manner' is an appropriate synonym."]


In [24]:
for i,w in enumerate(zip(parsed2.dict()['word'],parsed2.dict()['response'])):
    print(f'{i}.Word: {w[0]}\tExplanation:{w[1]}\n')

0.Word: conduct	Explanation:The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'conduct' is an appropriate synonym.

1.Word: action	Explanation:The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'action' is an appropriate synonym.

2.Word: demeanor	Explanation:The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'demeanor' is an appropriate synonym.

3.Word: manner	Explanation:The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'manner' is an appropriate synonym.



In [25]:
model_input4 = syn_template2.format_prompt(
    input_word="behaviour",
    context="The behaviour of the the planets with respect to the sun has been well studied and tracked and has been shown to traverse an elliptical orbit"
)

In [26]:
response4=llm(prompt=model_input4.to_string())

In [27]:
print(response4)



Here is the output instance:

{"word": ["action", "conduct", "demeanor", "manner"], "response": ["Action, conduct, demeanor, and manner are all synonyms for behaviour that can be used in the context of the planets' orbits around the sun, as they all refer to the way in which something is done or carried out."]}


In [28]:
parsed4=pyparser2.parse(response4)
print(parsed4)

word=['action', 'conduct', 'demeanor', 'manner'] response=["Action, conduct, demeanor, and manner are all synonyms for behaviour that can be used in the context of the planets' orbits around the sun, as they all refer to the way in which something is done or carried out."]


In [29]:
for i,w in enumerate(zip(parsed4.dict()['word'],parsed4.dict()['response'])):
    print(f'{i}.Word: {w[0]}\tExplanation:{w[1]}\n')

0.Word: action	Explanation:Action, conduct, demeanor, and manner are all synonyms for behaviour that can be used in the context of the planets' orbits around the sun, as they all refer to the way in which something is done or carried out.



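#### Interesting shortcut! :)
The LLM returned a single explanation covering all four words, so the loop prints only one word/explanation pair.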
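
Pairing the two lists with `zip` hides this kind of mismatch, because `zip` stops at the shortest input without raising an error. The sketch below uses made-up data, and `pair_words` is a hypothetical helper that fails loudly when the lists differ in length, which is one possible way to catch the case.

```python
from typing import List, Tuple


def pair_words(words: List[str], reasons: List[str]) -> List[Tuple[str, str]]:
    """Pair each word with its explanation, refusing mismatched lengths."""
    if len(words) != len(reasons):
        raise ValueError(
            f"{len(words)} words but {len(reasons)} explanations"
        )
    return list(zip(words, reasons))


words = ["action", "conduct", "demeanor", "manner"]
reasons = ["One explanation covering all four words."]

# zip silently truncates to the shorter list: only one pair survives.
print(len(list(zip(words, reasons))))

try:
    pair_words(words, reasons)
except ValueError as err:
    print(f"Mismatch detected: {err}")
```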

## 2. Comma Separated List Output Parser

The CommaSeparatedListOutputParser parses the comma-separated text returned by an LLM call into a Python list. Unlike the Pydantic parser, it does not need to be given a class, because its objective is limited.
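
A rough idea of what such a parser does, sketched with plain string methods. `parse_comma_list` is a hypothetical function, and the real LangChain class may split and trim differently depending on the version.

```python
from typing import List


def parse_comma_list(text: str) -> List[str]:
    """Split comma-separated text into a list of trimmed, non-empty items."""
    return [part.strip() for part in text.split(",") if part.strip()]


print(parse_comma_list("\n\nUnruly, rowdy, boisterous"))
```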

In [30]:
from langchain.output_parsers import CommaSeparatedListOutputParser

### a. Setting up the parser

In [31]:
cslop=CommaSeparatedListOutputParser()

In [32]:
template1='Output synonyms for the word {input_word} using the instructions {format_instructions} as per the {context}'

In [33]:
prompt_syn_comma=PromptTemplate(template=template1,
                               input_variables=['input_word', 'context'],
                               partial_variables={'format_instructions':cslop.get_format_instructions()})

In [34]:
model_syn_comma=prompt_syn_comma.format_prompt(input_word='behaviour',
                                              context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

In [35]:
model_syn_comma2 = prompt_syn_comma.format_prompt(
    input_word="behaviour",
    context="The behaviour of the the planets with respect to the sun has been well studied and tracked and has been shown to traverse an elliptical orbit"
)

### b. Set up LLM

In [36]:
llm=OpenAI(model='text-davinci-003',temperature=0)

### c. Run the model

In [37]:
response_syn_comma=llm(model_syn_comma.to_string())

In [38]:
print(response_syn_comma)



Unruly, rowdy, boisterous, ungovernable, unmanageable, wild, uncontrolled, rambunctious, turbulent, obstreperous.


In [39]:
cslop.parse(response_syn_comma)

['Unruly',
 'rowdy',
 'boisterous',
 'ungovernable',
 'unmanageable',
 'wild',
 'uncontrolled',
 'rambunctious',
 'turbulent',
 'obstreperous.']

In [40]:
response_syn_comma2=llm(model_syn_comma2.to_string())

In [41]:
print(response_syn_comma2)



Conduct, deportment, demeanor, comportment, bearing, mien, attitude, posture, carriage, air, demeanor, demeanour


In [42]:
cslop.parse(response_syn_comma2)

['Conduct',
 'deportment',
 'demeanor',
 'comportment',
 'bearing',
 'mien',
 'attitude',
 'posture',
 'carriage',
 'air',
 'demeanor',
 'demeanour']
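
The parsed lists above keep whatever the model wrote, including the trailing period in `'obstreperous.'` and the repeated `'demeanor'`. If that matters downstream, a small post-processing step can tidy the list. The sketch below is one possible approach; `clean_items` is a hypothetical helper, not part of LangChain.

```python
from typing import List


def clean_items(items: List[str]) -> List[str]:
    """Strip trailing punctuation and drop duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for item in items:
        word = item.strip().rstrip(".,;")
        key = word.lower()
        if word and key not in seen:
            seen.add(key)
            cleaned.append(word)
    return cleaned


print(clean_items(["air", "demeanor", "demeanor", "obstreperous."]))
```

Spelling variants such as 'demeanor' and 'demeanour' are treated as different words here; merging them would need an explicit mapping.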

## 3. Structured Output Parser

This output parser can be used when you want to return <b>multiple fields</b>. The Pydantic/JSON parser is more powerful, but the StructuredOutputParser is a simpler option for data structures that have text fields only.

In [43]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

### a. Setup parser

In [44]:
schemas=[
    ResponseSchema(name='synonym',description='Output a synonym for the word provided'),
    ResponseSchema(name='reason',description="share the reason why the model arrived at this synonym as an appropriate answer")
]

In [45]:
SOP_syn=StructuredOutputParser.from_response_schemas(schemas)

### b. Set up prompt

In [46]:
template='Output synonyms for the word {input_word} using the instructions {format_instructions} as per the {context} and provide appropriate explanation for why the synonym is appropriate'

In [47]:
prompt_SOP1=PromptTemplate(template=template,
                              input_variables=['input_word','context'],
                              partial_variables={'format_instructions':SOP_syn.get_format_instructions()})

In [48]:
model_SOP1=prompt_SOP1.format_prompt(input_word="behaviour",
    context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.")

### c. Setup LLM

In [49]:
llm=OpenAI(model='text-davinci-003',temperature=0)

### d. Run the model

In [50]:
response_SOP1=llm(model_SOP1.to_string())

In [51]:
print(response_SOP1)



```json
{
	"synonym": "conduct",
	"reason": "The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson. This synonym is appropriate as it implies the same meaning as behaviour, which is the action or manner of conducting oneself."
}
```


In [52]:
SOP_syn.parse(response_SOP1)

{'synonym': 'conduct',
 'reason': 'The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson. This synonym is appropriate as it implies the same meaning as behaviour, which is the action or manner of conducting oneself.'}

In [53]:
model_SOP2=prompt_SOP1.format_prompt(input_word="behaviour",
    context="The behaviour of the the planets with respect to the sun has been well studied and tracked and has been shown to traverse an elliptical orbit")

In [54]:
model_SOP2

StringPromptValue(text='Output synonyms for the word behaviour using the instructions The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"synonym": string  // Output a synonym for the word provided\n\t"reason": string  // share the reason why the model arrived at this synonym as an appropriate answer\n}\n``` as per the The behaviour of the the planets with respect to the sun has been well studied and tracked and has been shown to traverse an elliptical orbit and provide appropriate explanation for why the synonym is appropriate')

In [55]:
response_SOP2=llm(model_SOP2.to_string())

In [56]:
print(response_SOP2)



```json
{
	"synonym": "conduct",
	"reason": "The behaviour of the planets with respect to the sun has been well studied and tracked and has been shown to traverse an elliptical orbit, providing an appropriate explanation for why the synonym 'conduct' is appropriate."
}
```


In [57]:
SOP_syn.parse(response_SOP2)

{'synonym': 'conduct',
 'reason': "The behaviour of the planets with respect to the sun has been well studied and tracked and has been shown to traverse an elliptical orbit, providing an appropriate explanation for why the synonym 'conduct' is appropriate."}
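
The parsing step for this format can be pictured as: find the fenced `json` block, decode it, and check that every expected key is present. The sketch below does that with the standard library; `parse_json_block` is a hypothetical function written for illustration, not the library's code.

```python
import json
import re
from typing import Dict, List

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_json_block(text: str, expected_keys: List[str]) -> Dict[str, str]:
    """Extract the fenced json block from text and check the expected keys."""
    match = re.search(FENCE + r"json\s*(.*?)" + FENCE, text, re.DOTALL)
    if match is None:
        raise ValueError("No fenced json block found")
    data = json.loads(match.group(1))
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"Missing keys: {missing}")
    return data


# A shortened, made-up reply in the same shape as the model output above.
reply = f'\n\n{FENCE}json\n{{\n\t"synonym": "conduct",\n\t"reason": "It fits."\n}}\n{FENCE}\n'
print(parse_json_block(reply, ["synonym", "reason"]))
```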

# B. Error Fixing Parsers

I took the examples in this section from LangChain's documentation, since they are more illustrative:<br />
<ul>
    <li>Output fixing parser: https://python.langchain.com/docs/modules/model_io/output_parsers/output_fixing_parser</li>
    <li>Retry parser: https://python.langchain.com/docs/modules/model_io/output_parsers/retry</li>
    </ul>

## 1. OutputFixingParser

We will create another Pydantic parser to attempt to parse an incorrectly formatted output from the LLM

In [58]:
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser
from pydantic import BaseModel, Field
from typing import List

In [59]:
class Revalid(BaseModel):
    name:List[str]= Field(description="name of an actor")
    film_names:List[str]=Field(description="list of names of films they starred in")
        
pyrevalid=PydanticOutputParser(pydantic_object=Revalid)

In [60]:
actor_query = "Generate the filmography for a random actor."

Let us now deliberately and manually create an incorrectly formatted LLM output (simulated)

In [61]:
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

In [62]:
pyrevalid.parse(misformatted)

OutputParserException: Failed to parse Revalid from completion {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}. Got: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

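<b><font color="red">The reason that this parsing throws an error is that JSON requires double quotes around strings and property names.</font><br /></b>
So instead of <b>'Tom Hanks'</b>, we need to write <b>"Tom Hanks"</b>. The corrected version below also wraps the name in a list, because the <b>name</b> field is declared as <b>List[str]</b>.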
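
The quoting rule can be checked with Python's own `json` module, independent of LangChain. A minimal sketch:

```python
import json

single_quoted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
double_quoted = '{"name": ["Tom Hanks"], "film_names": ["Forrest Gump"]}'

try:
    json.loads(single_quoted)
except json.JSONDecodeError as err:
    # JSON only accepts double-quoted strings and property names.
    print(f"Invalid JSON: {err}")

print(json.loads(double_quoted))
```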

In [63]:
misformatted_correct = '{"name": ["Tom Hanks"], "film_names": ["Forrest Gump"]}'

In [64]:
pyrevalid.parse(misformatted_correct)

Revalid(name=['Tom Hanks'], film_names=['Forrest Gump'])

#### To automate the correction of such formatting errors in the LLM output, we use an OutputFixingParser

In [65]:
from langchain.output_parsers import OutputFixingParser

In [66]:
ofp=OutputFixingParser.from_llm(parser=pyrevalid,llm=llm) #from_llm builds the internal retry chain that asks this LLM to repair output the wrapped parser rejects

In [67]:
ofp.parse(misformatted)

Revalid(name=['Tom Hanks'], film_names=['Forrest Gump'])

TADA! The OutputFixingParser first tries the wrapped PydanticOutputParser. When that fails, it sends the misformatted output, together with the format instructions and the error, to the LLM and asks it to fix the formatting, then parses the corrected output.
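
The general pattern is: try the wrapped parser first, and only when it fails ask something else to repair the text before parsing again. The sketch below shows that control flow with the standard library. In LangChain the repair step is an LLM call; here `repair_quotes` is a hypothetical, deterministic stand-in so the example runs offline.

```python
import ast
import json
from typing import Callable, Dict


def parse_json(text: str) -> Dict:
    """The 'wrapped parser': strict JSON decoding."""
    return json.loads(text)


def repair_quotes(text: str, error: Exception) -> str:
    """Stand-in for the LLM repair call: re-serialise a Python-style dict as JSON."""
    return json.dumps(ast.literal_eval(text))


def parse_with_fix(text: str,
                   parse: Callable[[str], Dict],
                   repair: Callable[[str, Exception], str]) -> Dict:
    """Try to parse; on failure, repair the text once and parse again."""
    try:
        return parse(text)
    except ValueError as err:  # json.JSONDecodeError is a ValueError
        return parse(repair(text, err))


misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
print(parse_with_fix(misformatted, parse_json, repair_quotes))
```

Well-formed input never reaches the repair step, so the extra cost is only paid when the first parse fails.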

## 2. RetryWithErrorOutputParser

In some cases, the parser needs access to both the output and the prompt to process the full context. An example of this is when __the output is not just in the incorrect format, but is also incomplete.__ Take one example:

### Scenario 1: Parsing with simple PydanticOutputParser

In [68]:
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from pydantic import BaseModel, validator, Field
from typing import List

In [69]:
class Action(BaseModel):
    action:str=Field(description="action to take")
    trigger:str=Field(description="input for action")

pyaction=PydanticOutputParser(pydantic_object=Action)

In [70]:
prompt_action=PromptTemplate(template='Answer the user query.\n{format_instructions}\n{query}\n',
                     input_variables=['query'],
                     partial_variables={'format_instructions':pyaction.get_format_instructions()})

In [71]:
llm=OpenAI(model='text-davinci-003', temperature=0)

In [72]:
prompt_action.format_prompt(query="who is leo di caprios gf?").to_string()

'Answer the user query.\nThe output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"action": {"title": "Action", "description": "action to take", "type": "string"}, "trigger": {"title": "Trigger", "description": "input for action", "type": "string"}}, "required": ["action", "trigger"]}\n```\nwho is leo di caprios gf?\n'

In [73]:
response_action=llm(prompt_action.format_prompt(query="who is leo di caprios gf?").to_string())

In [74]:
response_action

'\n{"action": "find", "trigger": "Leo DiCaprio\'s girlfriend"}'

In [75]:
pyaction.parse(response_action)

Action(action='find', trigger="Leo DiCaprio's girlfriend")

__This is the baseline answer__, obtained from a complete LLM response. <br />
***
Now let us take an example of an incomplete response from the LLM

In [76]:
bad_answer='\n{"action": "find"}' #Only the action response is received

If we try to parse this, we should get a validation error stating that the required field 'trigger' is missing

In [77]:
pyaction.parse(bad_answer)

OutputParserException: Failed to parse Action from completion 
{"action": "find"}. Got: 1 validation error for Action
trigger
  field required (type=value_error.missing)

Hence, we know that PydanticOutputParser alone cannot solve this issue. What if we try a simple __OutputFixingParser__, as before?

### Scenario 2: Parsing the output using OutputFixingParser to help the PydanticOutputParser

In [78]:
from langchain.output_parsers import OutputFixingParser

In [79]:
ofp_action=OutputFixingParser.from_llm(parser=pyaction,llm=llm)

In [80]:
ofp_action.parse(bad_answer)

Action(action='find', trigger='file')

OutputFixingParser has simply filled in the value 'file': it never sees the original prompt, so the LLM has nothing to tell it what the trigger should be.

Now, let us try RetryWithErrorOutputParser

### Scenario 3: Using the RetryWithErrorOutputParser

__The RetryWithErrorOutputParser is created in the same way, but when we parse we also pass the prompt (as a prompt value) through `parse_with_prompt`, so that it can reference the original prompt to assess what is missing and fill it in appropriately.__

In [81]:
from langchain.output_parsers import RetryWithErrorOutputParser

In [82]:
rweop_action=RetryWithErrorOutputParser.from_llm(parser=pyaction,llm=llm)

In [83]:
prompt_value=prompt_action.format_prompt(query="who is leo di caprios gf?")

In [84]:
response_rweop=rweop_action.parse_with_prompt(bad_answer, prompt_value)

In [85]:
response_rweop

Action(action='find', trigger="Leo DiCaprio's girlfriend")

__This is the same as the original answer we obtained from the LLM's complete response__
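
The difference from the fixing pattern is what the repair step gets to see: the original prompt travels with the bad completion and the error, so the model can answer the question again rather than guess. A minimal offline sketch of that flow follows. `fake_llm`, `parse_action` and `retry_with_error` are hypothetical names, and `fake_llm` is a canned stand-in for a real model call.

```python
import json
from typing import Callable, Dict

REQUIRED = ("action", "trigger")


def parse_action(text: str) -> Dict[str, str]:
    """Decode JSON and require both fields, like the Action model."""
    data = json.loads(text)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"field required: {missing}")
    return data


def fake_llm(retry_prompt: str) -> str:
    """Canned stand-in for a model: it can only answer if it sees the query."""
    if "leo di caprios gf" in retry_prompt:
        return '{"action": "find", "trigger": "Leo DiCaprio\'s girlfriend"}'
    return '{"action": "find", "trigger": "unknown"}'


def retry_with_error(prompt: str, completion: str,
                     llm: Callable[[str], str], max_retries: int = 1) -> Dict[str, str]:
    """Parse the completion; on failure, re-ask with prompt, completion and error."""
    for _ in range(max_retries):
        try:
            return parse_action(completion)
        except ValueError as err:
            completion = llm(
                f"Prompt:\n{prompt}\nCompletion:\n{completion}\nError:\n{err}"
            )
    return parse_action(completion)


result = retry_with_error("who is leo di caprios gf?", '{"action": "find"}', fake_llm)
print(result)
```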