
# Parsing Output

Let's set up a Chat Model:

In [1]:
!pip install openai
!pip install langchain

Collecting openai
  Downloading openai-0.27.8-py3-none-any.whl (73 kB)
Installing collected packages: openai
Successfully installed openai-0.27.8
Collecting langchain
  Downloading langchain-0.0.263-py3-none-any.whl (1.5 MB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.5.14-py3-none-any.whl (26 kB)
Collecting langsmith<0.1.0,>=0.0.11 (from langchain)
  Downloading langsmith-0.0.22-py3-none-any.whl (32 kB)
Collecting openapi-schema-pydantic<2.0,>=1.2 (from la

In [2]:
import openai
import os

In [4]:
from langchain.prompts import PromptTemplate, SystemMessagePromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

# Read the OpenAI API key from the environment variable named "key"
# (alternatively, read it from a local text file as in the commented line).
# api_key = open("C://Users//Marcial//Desktop//desktop_openai.txt").read()
api_key = os.environ['key']
model = ChatOpenAI(openai_api_key=api_key)

## List Parsing

In [26]:
from langchain.output_parsers import CommaSeparatedListOutputParser

In [6]:
output_parser = CommaSeparatedListOutputParser()

In [7]:
format_instructions = output_parser.get_format_instructions()

In [8]:
print(format_instructions)

Your response should be a list of comma separated values, eg: `foo, bar, baz`


In [9]:
# Parse a hand-written reply to see what the parser returns
reply = "one, two, three"
output_parser.parse(reply)

['one', 'two', 'three']
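
For intuition, the parsing step amounts to little more than a split on the separator. The sketch below is a plain-Python stand-in, not LangChain's implementation: `parse_comma_list` is a hypothetical helper, and it assumes that items never contain commas themselves.

```python
def parse_comma_list(text: str) -> list[str]:
    """Split a comma-separated reply into trimmed, non-empty items."""
    # Hypothetical helper for illustration only; not part of LangChain.
    return [item.strip() for item in text.split(",") if item.strip()]


print(parse_comma_list("one, two, three"))
print(parse_comma_list(" loyal,friendly ,  playful, "))
```

Trimming each item makes the sketch tolerant of uneven spacing and a trailing comma, both of which can show up in model replies.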

In [10]:
human_template = '{request} {format_instructions}'
human_prompt = HumanMessagePromptTemplate.from_template(human_template)

In [11]:
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])

chat_prompt.format_prompt(request="give me 5 characteristics of dogs",
                   format_instructions = output_parser.get_format_instructions())

ChatPromptValue(messages=[HumanMessage(content='give me 5 characteristics of dogs Your response should be a list of comma separated values, eg: `foo, bar, baz`', additional_kwargs={}, example=False)])
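
For simple placeholders like these, the substitution produces the same text as Python's own `str.format`. The sketch below reproduces the message content shown above without LangChain; the variable `prompt_text` is introduced here only for illustration.

```python
# Same template and format instructions as in the cells above.
human_template = "{request} {format_instructions}"
format_instructions = (
    "Your response should be a list of comma separated values, "
    "eg: `foo, bar, baz`"
)

# Plain string formatting fills both placeholders.
prompt_text = human_template.format(
    request="give me 5 characteristics of dogs",
    format_instructions=format_instructions,
)
print(prompt_text)
```

The format instructions are simply appended to the request, so the model sees them as part of the user's message.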

In [12]:
request = chat_prompt.format_prompt(request="give me 5 characteristics of dogs",
                   format_instructions = output_parser.get_format_instructions()).to_messages()

In [13]:
result = model(request)

In [None]:
result.content

'loyal, friendly, playful, protective, trainable'

In [14]:
# Convert to desired output:
output_parser.parse(result.content)

['loyal', 'friendly', 'playful', 'protective', 'trainable']

## Datetime Parser

In [15]:
from langchain.output_parsers import DatetimeOutputParser

In [16]:
output_parser = DatetimeOutputParser()

In [17]:
print(output_parser.get_format_instructions())

Write a datetime string that matches the 
            following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". Examples: 1990-08-20T12:21:15.957169Z, 1756-06-27T07:20:38.597328Z, 1166-04-05T21:58:17.254494Z


In [18]:
template_text = "{request}\n{format_instructions}"
human_prompt = HumanMessagePromptTemplate.from_template(template_text)

In [19]:
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])

In [20]:
print(chat_prompt.format(request="When was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ))

Human: When was the 13th Amendment ratified in the US?
Write a datetime string that matches the 
            following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". Examples: 1825-05-19T22:23:53.103584Z, 1974-12-30T00:37:55.083908Z, 1012-05-11T16:35:45.641132Z


In [21]:
request = chat_prompt.format_prompt(request="What date was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ).to_messages()

In [22]:
result = model(request, temperature=0)

In [23]:
# Careful: the model sometimes wraps the datetime string in extra text,
# which the parser cannot handle.
result.content

'The 13th Amendment was ratified in the US on December 6, 1865.\n\nThe datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'

In [25]:
output_parser.parse(result.content)

OutputParserException: ignored
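
The failure is easier to see with the standard library alone. Assuming the parser hands the whole reply to `datetime.strptime` (which appears to be the case, but treat it as an assumption), the surrounding sentence is enough to make strict parsing fail. The sketch also shows a regex fallback for comparison; `extract_datetime` is a hypothetical helper, not a LangChain function.

```python
import re
from datetime import datetime

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"
reply = (
    "The 13th Amendment was ratified in the US on December 6, 1865.\n\n"
    "The datetime string that matches the given pattern is: "
    '"1865-12-06T00:00:00.000000Z"'
)

# strptime requires the entire string to match the pattern, so the extra
# sentence around the timestamp raises ValueError.
try:
    datetime.strptime(reply.strip(), PATTERN)
    strict_ok = True
except ValueError:
    strict_ok = False


def extract_datetime(text: str) -> datetime:
    """Find the first timestamp-shaped substring and parse only that part."""
    # Hypothetical helper for illustration only.
    match = re.search(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6}Z", text)
    if match is None:
        raise ValueError("no datetime string found in reply")
    return datetime.strptime(match.group(0), PATTERN)


print(strict_ok)
print(extract_datetime(reply))
```

A regex like this only helps when the reply contains a correctly formatted timestamp somewhere; it cannot repair a reply that uses a different date format.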

---

# Methods to Fix Parsing Issues

## Auto-Fix Parser

In [None]:
from langchain.output_parsers import OutputFixingParser

output_parser = DatetimeOutputParser()

misformatted = result.content

In [None]:
misformatted

'The 13th Amendment was ratified in the US on December 6, 1865.\n\nThe datetime string that matches the given pattern is: "1865-12-06T00:00:00.000000Z"'

In [None]:
new_parser = OutputFixingParser.from_llm(parser=output_parser, llm=model)

In [None]:
new_parser.parse(misformatted)

datetime.datetime(1865, 12, 6, 0, 0)
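
Conceptually, an auto-fixing parser tries the strict parser first and, only on failure, asks for a corrected string and tries again. The sketch below illustrates that control flow under stated assumptions: `parse_with_fix` and `fake_fix` are hypothetical names, and `fake_fix` stands in for the LLM call with a simple string operation so that the example runs offline.

```python
from datetime import datetime
from typing import Callable

PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def strict_parse(text: str) -> datetime:
    """Parse a reply that must consist of the timestamp and nothing else."""
    return datetime.strptime(text.strip(), PATTERN)


def parse_with_fix(
    text: str,
    parse: Callable[[str], datetime],
    fix: Callable[[str, str], str],
    max_retries: int = 1,
) -> datetime:
    """Try to parse; on failure, ask `fix` for a corrected string and retry."""
    for attempt in range(max_retries + 1):
        try:
            return parse(text)
        except ValueError as error:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            text = fix(text, str(error))


def fake_fix(text: str, error: str) -> str:
    """Stand-in for the LLM: return the first double-quoted part of the reply."""
    return text.split('"')[1]


misformatted = (
    "The 13th Amendment was ratified in the US on December 6, 1865.\n\n"
    "The datetime string that matches the given pattern is: "
    '"1865-12-06T00:00:00.000000Z"'
)
print(parse_with_fix(misformatted, strict_parse, fake_fix))
```

In the real setup the fix step is another model call, so each repair costs an extra request and is not guaranteed to succeed.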

____
### Fixing via System Prompt:

In [None]:
system_prompt = SystemMessagePromptTemplate.from_template("You always reply to questions only in datetime patterns.")
template_text = "{request}\n{format_instructions}"
human_prompt = HumanMessagePromptTemplate.from_template(template_text)

In [None]:
chat_prompt = ChatPromptTemplate.from_messages([system_prompt, human_prompt])

In [None]:
print(chat_prompt.format(request="When was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ))

System: You always reply to questions only in datetime patterns.
Human: When was the 13th Amendment ratified in the US?
Write a datetime string that matches the 
            following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". Examples: 1796-02-26T04:04:10.673088Z, 0754-04-24T13:43:26.442719Z, 0382-07-21T05:34:03.561213Z


In [None]:
request = chat_prompt.format_prompt(request="What date was the 13th Amendment ratified in the US?",
                   format_instructions=output_parser.get_format_instructions()
                   ).to_messages()

In [None]:
result = model(request, temperature=0)

In [None]:
result.content

'1865-12-06T00:00:00.000000Z'

In [None]:
output_parser.parse(result.content)

datetime.datetime(1865, 12, 6, 0, 0)

Be careful! A reply that parses cleanly is not necessarily correct, and this date is arguably wrong. The full details from Wikipedia:

    27th state to ratify was Georgia: December 6, 1865

    Having been ratified by the legislatures of three-fourths of the states (27 of the 36 states, including those that had been in rebellion), Secretary of State Seward, on December 18, 1865, certified that the Thirteenth Amendment had become valid, to all intents and purposes, as a part of the Constitution.

There is also the issue of states that had left the Union, which complicates what full ratification meant at the time. The right answer depends on what is meant by the word "ratified"!

## Pydantic JSON Parser
You should also be aware of OpenAI's own JSON offerings (still quite new at the time of writing): https://platform.openai.com/docs/guides/gpt/function-calling


In [None]:
#pip install pydantic

In [None]:
from langchain.output_parsers import PydanticOutputParser

In [None]:
from pydantic import BaseModel, Field

In [None]:
class Scientist(BaseModel):

    name: str = Field(description="Name of a Scientist")
    discoveries: list = Field(description="Python list of discoveries")

In [None]:
query = 'Name a famous scientist and a list of their discoveries'

In [None]:
parser = PydanticOutputParser(pydantic_object=Scientist)

In [None]:
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"title": "Name", "description": "Name of a Scientist", "type": "string"}, "discoveries": {"title": "Discoveries", "description": "Python list of discoveries", "type": "array", "items": {}}}, "required": ["name", "discoveries"]}
```


In [None]:
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

_input = prompt.format_prompt(query="Tell me about a famous scientist")

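# A chat model expects a list of messages and returns a message object,
# so pass messages in and parse the text content of the reply.
output = model(_input.to_messages())

parser.parse(output.content)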

Scientist(name='Albert Einstein', discoveries=['Theory of Relativity', 'Photoelectric Effect', 'Brownian Motion'])
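
The parsing step can be pictured as two stages: pull the JSON object out of the reply, then check it against the expected fields. The sketch below uses only the standard library, with a plain dataclass standing in for the Pydantic model so that it runs without extra dependencies. `ScientistRecord`, `parse_scientist`, and the sample `reply` are all made up for illustration and are not LangChain or Pydantic APIs.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class ScientistRecord:
    """Dependency-free stand-in for the Scientist model above."""
    name: str
    discoveries: list


def parse_scientist(text: str) -> ScientistRecord:
    """Extract the first {...} span from the reply and validate its fields."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    data = json.loads(match.group(0))
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    if not isinstance(data.get("discoveries"), list):
        raise ValueError("'discoveries' must be a list")
    return ScientistRecord(name=data["name"], discoveries=data["discoveries"])


# Made-up reply with some text before the JSON, as chat models often produce.
reply = (
    "Here is the result:\n"
    '{"name": "Albert Einstein", '
    '"discoveries": ["Theory of Relativity", "Photoelectric Effect"]}'
)
print(parse_scientist(reply))
```

Pydantic performs this kind of field validation from the type annotations automatically, which is the main reason to prefer it over hand-written checks.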