<a href = "https://www.pieriantraining.com"><img src="../PT Centered Purple.png"> </a>

<em style="text-align:center">Copyrighted by Pierian Training</em>

# Parsing Output

Let's set up a Chat Model:

In [59]:
from langchain.prompts import PromptTemplate, SystemMessagePromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
import os
api_key = os.getenv('OPENAI_API_KEY')
model = ChatOpenAI(openai_api_key=api_key)
llm = OpenAI(openai_api_key=api_key)

## List Parsing 

In [9]:
from langchain.output_parsers import CommaSeparatedListOutputParser

In [10]:
output_parser = CommaSeparatedListOutputParser()

In [11]:
format_instructions = output_parser.get_format_instructions()

In [12]:
print(format_instructions)

Your response should be a list of comma separated values, eg: `foo, bar, baz`


In [13]:
reply = "one, two, three"
output_parser.parse(reply)

['one', 'two', 'three']
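
To make the behaviour above concrete, here is a minimal plain-Python sketch of comma-separated parsing. The helper name `parse_comma_separated` is hypothetical and is not part of LangChain; the library's own implementation may differ in details such as quoting or whitespace handling.

```python
def parse_comma_separated(text: str) -> list[str]:
    """Hypothetical helper: split a comma-separated reply into clean items."""
    # Strip whitespace around each item and drop empty entries.
    return [item.strip() for item in text.split(",") if item.strip()]


print(parse_comma_separated("one, two, three"))
print(parse_comma_separated("Loyal, playful,protective , social"))
```

The point is that the parser only post-processes text; getting the model to emit that text in the first place is the job of the format instructions.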

In [14]:
human_template = '{request} {format_instructions}'
human_prompt = HumanMessagePromptTemplate.from_template(human_template)

In [15]:
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])

chat_prompt.format_prompt(request="give me 5 characteristics of dogs",
                   format_instructions = output_parser.get_format_instructions())

ChatPromptValue(messages=[HumanMessage(content='give me 5 characteristics of dogs Your response should be a list of comma separated values, eg: `foo, bar, baz`')])

In [16]:
request = chat_prompt.format_prompt(request="give me 5 characteristics of dogs",
                   format_instructions = output_parser.get_format_instructions()).to_messages()

In [17]:
result = model(request)

In [18]:
result.content

'Loyal, playful, protective, social, trainable'

In [19]:
# Convert to desired output:
output_parser.parse(result.content)

['Loyal', 'playful', 'protective', 'social', 'trainable']

## Datetime Parser 

In [20]:
from langchain.output_parsers import DatetimeOutputParser

In [21]:
output_parser = DatetimeOutputParser()

In [22]:
print(output_parser.get_format_instructions())

Write a datetime string that matches the 
            following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". Examples: 1547-12-27T04:18:38.683708Z, 463-11-11T10:00:37.423515Z, 830-11-28T21:56:13.695923Z


In [23]:
template_text = "{request}\n{format_instructions}"
human_prompt=HumanMessagePromptTemplate.from_template(template_text)

In [24]:
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])

In [25]:
print(chat_prompt.format(request="When was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ))

Human: When was the 13th Amendment ratified in the US?
Write a datetime string that matches the 
            following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". Examples: 200-02-05T11:59:25.621079Z, 73-08-27T07:39:39.318277Z, 934-03-26T17:05:53.265385Z


In [29]:
request = chat_prompt.format_prompt(request="What date was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ).to_messages()

In [30]:
result = model(request,temperature=0)

In [31]:
# Careful: the reply sometimes includes extra text around the datetime!
result.content

'The 13th Amendment was ratified in the US on December 6, 1865.\n\nThe datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'

In [33]:
output_parser.parse(result.content)

OutputParserException: Could not parse datetime string: The 13th Amendment was ratified in the US on December 6, 1865.

The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"
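
The failure above is easy to reproduce with the standard library alone. The sketch below assumes the parser matches the whole reply strictly against the pattern from the format instructions, which is what the error message suggests; `try_parse` is a hypothetical helper used only for illustration.

```python
from datetime import datetime

# Pattern copied from the format instructions printed earlier.
PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def try_parse(text: str):
    """Return a datetime if the whole string matches PATTERN, else None."""
    try:
        return datetime.strptime(text.strip(), PATTERN)
    except ValueError:
        return None


clean = "1865-12-06T00:00:00.000000Z"
chatty = 'The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'

print(try_parse(clean))   # parses: the string is exactly the pattern
print(try_parse(chatty))  # None: extra prose around the timestamp breaks strict matching
```

Any extra sentence in the reply, even a helpful one, is enough to make strict parsing fail.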

---

# Methods to Fix Parsing Issues

## Auto-Fix Parser

In [34]:
from langchain.output_parsers import OutputFixingParser

output_parser = DatetimeOutputParser()

misformatted = result.content

In [35]:
misformatted

'The 13th Amendment was ratified in the US on December 6, 1865.\n\nThe datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'

In [36]:
new_parser = OutputFixingParser.from_llm(parser=output_parser, llm=model)

In [38]:
print(new_parser.get_format_instructions())

Write a datetime string that matches the 
            following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". Examples: 1460-12-01T00:42:46.151176Z, 232-07-06T08:19:26.106051Z, 1968-08-08T11:54:38.843948Z


In [40]:
new_parser.parse(misformatted)

OutputParserException: Could not parse datetime string: The datetime string that matches the given pattern is: "1837-02-13T03:04:53.723236Z"
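
When the auto-fix parser also fails, as it did here, one deterministic fallback is to pull the timestamp out of the reply with a regular expression before parsing it. This is a sketch of a technique swapped in for illustration, not something LangChain provides; `extract_datetime` and `TIMESTAMP_RE` are hypothetical names, and the regex assumes a four-digit year.

```python
import re
from datetime import datetime

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"
# Matches e.g. 1865-12-06T00:00:00.000000Z anywhere inside a longer string.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6}Z")


def extract_datetime(text: str) -> datetime:
    """Find the first timestamp in `text` and parse it strictly."""
    match = TIMESTAMP_RE.search(text)
    if match is None:
        raise ValueError(f"No timestamp found in: {text!r}")
    return datetime.strptime(match.group(0), PATTERN)


sample_reply = (
    "The 13th Amendment was ratified in the US on December 6, 1865.\n\n"
    'The datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'
)
print(extract_datetime(sample_reply))
```

Unlike a second LLM call, this cannot invent a different date: it either finds the timestamp that is already in the reply or raises.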

____
### Fixing via System Prompt:

In [41]:
system_prompt = SystemMessagePromptTemplate.from_template("You always reply to questions only in datetime patterns.")
template_text = "{request}\n{format_instructions}"
human_prompt=HumanMessagePromptTemplate.from_template(template_text)

In [42]:
chat_prompt = ChatPromptTemplate.from_messages([system_prompt,human_prompt])

In [43]:
print(chat_prompt.format(request="When was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ))

System: You always reply to questions only in datetime patterns.
Human: When was the 13th Amendment ratified in the US?
Write a datetime string that matches the 
            following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". Examples: 1928-05-31T13:29:22.065620Z, 823-05-22T05:45:52.746304Z, 1003-01-16T13:30:32.709351Z


In [44]:
request = chat_prompt.format_prompt(request="What date was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ).to_messages()

In [45]:
result = model(request,temperature=0)

In [46]:
result.content

'1865-12-06T00:00:00.000000Z'

In [47]:
output_parser.parse(result.content)

datetime.datetime(1865, 12, 6, 0, 0)

Be careful! This answer could technically be construed as wrong. The full details from Wikipedia:

    27th state to ratify was Georgia: December 6, 1865

    Having been ratified by the legislatures of three-fourths of the states (27 of the 36 states, including those that had been in rebellion), Secretary of State Seward, on December 18, 1865, certified that the Thirteenth Amendment had become valid, to all intents and purposes, as a part of the Constitution.

You also have the issue of states leaving the Union, which complicates what full ratification meant at that time. The right answer depends on what is meant by the word "ratified"!
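
A separate, purely technical caveat about the parsed value shown above: the result `datetime.datetime(1865, 12, 6, 0, 0)` carries no timezone. With this pattern the trailing `Z` is matched as a literal character, so the result is a naive datetime. A minimal sketch using only the standard library, assuming you want to treat the value as UTC:

```python
from datetime import datetime, timezone

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"

parsed = datetime.strptime("1865-12-06T00:00:00.000000Z", PATTERN)
print(parsed.tzinfo)  # None: the literal "Z" does not attach a timezone

# Assumption: the "Z" was meant as UTC, so attach it explicitly.
aware = parsed.replace(tzinfo=timezone.utc)
print(aware.isoformat())
```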

## Pydantic JSON Parser
You should also be aware of OpenAI's own JSON offerings (which are still quite new at this time!): https://platform.openai.com/docs/guides/gpt/function-calling


In [53]:
#pip install pydantic

In [48]:
from langchain.output_parsers import PydanticOutputParser

In [49]:
from pydantic import BaseModel, Field

In [50]:
class Scientist(BaseModel):
    
    name: str = Field(description="Name of a Scientist")
    discoveries: list = Field(description="Python list of discoveries")

In [51]:
query = 'Name a famous scientist and a list of their discoveries' 

In [52]:
parser = PydanticOutputParser(pydantic_object=Scientist)

In [53]:
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "Name of a Scientist", "title": "Name", "type": "string"}, "discoveries": {"description": "Python list of discoveries", "items": {}, "title": "Discoveries", "type": "array"}}, "required": ["name", "discoveries"]}
```


In [55]:
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

_input = prompt.format_prompt(query="Tell me about a famous scientist")

In [57]:
print(_input.to_string())

Answer the user query.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "Name of a Scientist", "title": "Name", "type": "string"}, "discoveries": {"description": "Python list of discoveries", "items": {}, "title": "Discoveries", "type": "array"}}, "required": ["name", "discoveries"]}
```
Tell me about a famous scientist



In [58]:
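# ChatOpenAI expects a list of messages; calling model(...) with a raw string
# raises "TypeError: Got unknown type A". Use the completion-style `llm`
# defined at the top of the notebook, which accepts and returns plain strings.
output = llm(_input.to_string())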

In [None]:
parser.parse(output)
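
Conceptually, the parse step above amounts to locating a JSON object in the reply, loading it, and checking it against the schema. The sketch below imitates that with the standard library only. `ScientistRecord` and `parse_scientist` are hypothetical stand-ins, not the Pydantic model or the LangChain parser, and `sample` is a hand-written string rather than real model output.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class ScientistRecord:
    """Hypothetical stand-in for the Pydantic `Scientist` model."""
    name: str
    discoveries: list


def parse_scientist(text: str) -> ScientistRecord:
    # Models sometimes wrap JSON in prose, so grab the outermost {...} span.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in model output")
    data = json.loads(match.group(0))
    # Minimal validation of the two required fields from the schema.
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    if not isinstance(data.get("discoveries"), list):
        raise ValueError("'discoveries' must be a list")
    return ScientistRecord(name=data["name"], discoveries=data["discoveries"])


sample = 'Here you go:\n{"name": "Marie Curie", "discoveries": ["polonium", "radium"]}'
print(parse_scientist(sample))
```

Pydantic does considerably more than this (type coercion, nested models, detailed error reports), which is why the notebook uses it rather than hand-rolled checks.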