In [60]:
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser, RetryOutputParser, RetryWithErrorOutputParser

from pydantic import BaseModel, Field, validator
from typing import List

## Setting the LLM

In [38]:
with open("openai_api.txt", "r") as f:
    # strip() drops a trailing newline that would otherwise end up in the key
    OPENAI_API = f.read().strip()

llm = OpenAI(
    model_name = "gpt-3.5-turbo-instruct",
    temperature = 1.1,
    openai_api_key = OPENAI_API
)

## What are Output Parsers

Language models output text, but you often want more structured information than raw text. This is where output parsers come in.

Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:
* **get_format_instructions() -> str**: A method which returns a string containing instructions for how the output of a language model should be formatted.

* **parse(str) -> Any**: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.

And there is one optional method:
* **parse_with_prompt(str, PromptValue) -> Any**: A method which takes in a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated that response) and parses it into some structure. The prompt is provided mainly so that an output parser that wants to retry or fix the output can draw on information from the prompt to do so.
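
To make the two required methods concrete, here is a minimal sketch that uses only the standard library. `KeyedJsonParser` is a hypothetical class invented for illustration; it is not part of LangChain and does not inherit from its base classes. It only mirrors the shape of the interface described above.

```python
import json
from typing import Any, Dict, Iterable


class KeyedJsonParser:
    """Hypothetical stand-in for an output parser (not a LangChain class)."""

    def __init__(self, required_keys: Iterable[str]):
        self.required_keys = list(required_keys)

    def get_format_instructions(self) -> str:
        # Text meant to be added to the prompt so the model knows the target format.
        keys = ", ".join(f'"{k}"' for k in self.required_keys)
        return f"Respond with a single JSON object containing the keys: {keys}."

    def parse(self, text: str) -> Dict[str, Any]:
        # Turn the raw model text into a structure, or fail with a ValueError.
        try:
            data = json.loads(text.strip())
        except json.JSONDecodeError as exc:
            raise ValueError(f"Could not parse output as JSON: {exc}") from exc
        if not isinstance(data, dict):
            raise ValueError("Expected a JSON object")
        missing = [k for k in self.required_keys if k not in data]
        if missing:
            raise ValueError(f"Missing keys: {missing}")
        return data


demo_parser = KeyedJsonParser(["setup", "punchline"])
instructions = demo_parser.get_format_instructions()
parsed = demo_parser.parse('\n{"setup": "Why?", "punchline": "Because."}')
```

The instructions string goes into the prompt, and `parse` is applied to whatever text the model returns.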


## Different Types of Output Parsers

### `PydanticOutputParser`

This output parser allows users to specify an arbitrary `JSON schema` and query LLMs for JSON outputs that conform to that schema.



In [39]:
## Define the Desired Structure

class Joke_Structure(BaseModel):
    # Fields the model's JSON output must contain
    setup: str = Field(description = "question to set up a joke")
    punchline: str = Field(description = "answer to resolve the joke")

    # Custom validation logic: the setup must be phrased as a question
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        # endswith() also handles an empty string, where field[-1] would raise IndexError
        if not field.endswith("?"):
            raise ValueError("Badly formed question!")
        return field

In [40]:
## Initializing the Parser

parser = PydanticOutputParser(pydantic_object = Joke_Structure)

In [41]:
## Setting Prompt Template

prompt = PromptTemplate(
    template = "Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables = ["query"],
    partial_variables = {"format_instructions": parser.get_format_instructions()}
)

for k in prompt:
    print(k)

('input_variables', ['query'])
('output_parser', None)
('partial_variables', {'format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}\n```'})
('template', 'Answer the user query.\n{format_instructions}\n{query}\n')
('template_format', 'f-string')
('validate_template', True)


In [42]:
joke_query = "Tell me a joke."

_input = prompt.format_prompt(query=joke_query)

print(_input.text)

Answer the user query.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}
```
Tell me a joke.



In [43]:
output = llm(_input.to_string())

output

'\n{"setup": "Why did the chicken go to the séance?", "punchline": "To get to the other side!"}'

In [44]:
parser.parse(output)

Joke_Structure(setup='Why did the chicken go to the séance?', punchline='To get to the other side!')

### `OutputFixingParser`

This output parser wraps another output parser and, when that parser fails, calls an LLM to fix the mistakes.

The `PydanticOutputParser` shown above simply tries to parse the LLM response. If the response does not parse correctly, it raises an error.

But we can do other things besides raising errors. Specifically, we can pass the misformatted output, along with the format instructions, to the model and ask it to fix it.
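
The idea of "parse, and on failure ask the model to repair the output" can be sketched with the standard library alone. Everything here is an assumption made for illustration: `parse_with_fix`, `fake_model`, and the wording of the repair prompt are hypothetical and are not LangChain's implementation. `fake_model` stands in for a real LLM call.

```python
import json
from typing import Callable


def parse_with_fix(text: str, format_instructions: str,
                   ask_model: Callable[[str], str]) -> dict:
    """Parse `text` as JSON; on failure, ask the model once to repair it."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Build a repair prompt from the instructions, the bad output, and the error.
        repair_prompt = (
            f"Instructions:\n{format_instructions}\n\n"
            f"Completion:\n{text}\n\n"
            f"Error:\n{exc}\n\n"
            "Rewrite the completion so that it satisfies the instructions."
        )
        # A second failure is not handled here and simply propagates.
        return json.loads(ask_model(repair_prompt))


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call: always returns a corrected completion.
    return '{"answer": 42}'


fixed = parse_with_fix(
    "{'answer': 42}",  # single quotes: not valid JSON
    "Return a JSON object with the key answer.",
    fake_model,
)
```

Note that the repair step sees only the instructions and the bad output, not the user's original question.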



In [46]:
## Let's get an Error

class Actor(BaseModel):
    name: str = Field(description = "name of an actor")
    film_names: List[str] = Field(description = "list of names of films they starred in")
        
parser = PydanticOutputParser(pydantic_object=Actor)

In [49]:
# Single-quoted keys and strings are not valid JSON, so parsing fails
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

try:
    parser.parse(misformatted)
except ValueError:
    print("Error...")

Error...
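
The failure above can be reproduced without any LLM library, assuming the parser ultimately relies on JSON decoding: JSON requires double-quoted keys and strings, so the single-quoted text is rejected. The variable names below are chosen for this illustration only.

```python
import json

misformatted_text = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

try:
    json.loads(misformatted_text)
    json_error = None
except json.JSONDecodeError as exc:
    # The decoder rejects the single-quoted property name.
    json_error = exc

# The same data written with double quotes parses cleanly.
well_formed = json.loads('{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}')
```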


In [53]:
## Fixing the Misformatted Problem

new_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)

new_parser.parse(misformatted)

Actor(name='Tom Hanks', film_names=['Forrest Gump'])

### `RetryOutputParser`

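While in some cases it is possible to fix parsing mistakes by looking only at the output, in other cases it is not. One example is an output that is not merely misformatted but incomplete: the missing information cannot be recovered from the output alone.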

In [57]:
## Getting an Error

class Action(BaseModel):
    action: str = Field(description = "action to take")
    action_input: str = Field(description = "input to the action")
        
parser = PydanticOutputParser(pydantic_object=Action)

prompt = PromptTemplate(
    template = "Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables = ["query"],
    partial_variables = {"format_instructions": parser.get_format_instructions()}
)

prompt_value = prompt.format_prompt(query="who is leo di caprios gf?")

bad_response = '{"action": "search"}'

try:
    parser.parse(bad_response)
except ValueError:
    print("Error...")

Error...


In [58]:
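## Trying to fix the error from the output alone
## (RetryOutputParser only supports parse_with_prompt(); calling parse() on it
## raises NotImplementedError, so an OutputFixingParser is used for this step)

fix_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)

fix_parser.parse(bad_response)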

Action(action='search', action_input='term')

We can see that although the error is resolved, the model is confused and doesn't know what to put for the action input, because it never saw the original question.

### `RetryWithErrorOutputParser`

Instead, we can use the `RetryWithErrorOutputParser`, which passes the prompt, the original output, and the parsing error back to the model to try again for a better response.

In [61]:
retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=llm)

retry_parser.parse_with_prompt(bad_response, prompt_value)

Action(action='search', action_input='who is leo di caprios gf?')
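
Why the prompt matters can be sketched with the standard library. This is a rough illustration under stated assumptions, not LangChain's code: `parse_action`, `parse_with_retry`, `fake_model`, and the retry prompt layout are all hypothetical. The stand-in model can fill in `action_input` only because the original question is included in the retry prompt.

```python
import json
from typing import Callable, List

REQUIRED_KEYS: List[str] = ["action", "action_input"]


def parse_action(text: str) -> dict:
    # json.JSONDecodeError is a subclass of ValueError, so callers catch one type.
    data = json.loads(text)
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


def parse_with_retry(completion: str, prompt_text: str,
                     ask_model: Callable[[str], str], max_retries: int = 1) -> dict:
    for attempt in range(max_retries + 1):
        try:
            return parse_action(completion)
        except ValueError as exc:
            if attempt == max_retries:
                raise
            # Send the original prompt, the bad completion, and the error back.
            retry_prompt = (
                f"Prompt:\n{prompt_text}\n\n"
                f"Completion:\n{completion}\n\n"
                f"Error:\n{exc}\n\n"
                "Please try again."
            )
            completion = ask_model(retry_prompt)


def fake_model(prompt: str) -> str:
    # Stand-in for an LLM: reads the question back out of the retry prompt.
    question = prompt.split("Prompt:\n", 1)[1].split("\n", 1)[0]
    return json.dumps({"action": "search", "action_input": question})


result = parse_with_retry('{"action": "search"}', "who is leo di caprios gf?", fake_model)
```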