<a href="https://colab.research.google.com/github/HoseinBahmany/learning-llms/blob/main/langchain/05_output_parsers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install langchain openai chromadb tiktoken numpy faiss-cpu

In [2]:
import os
from getpass import getpass

# Prompt for the keys at runtime instead of hardcoding secrets in the notebook.
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
os.environ["SERPAPI_API_KEY"] = getpass("SerpAPI API key: ")

Language models output text, but you often want more structured information than plain text back. This is where output parsers come in.

Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:

* "Get format instructions": A method which returns a string containing instructions for how the output of a language model should be formatted.
* "Parse": A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.

There is also one optional method:

* "Parse with prompt": A method which takes in a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated that response) and parses it into some structure. The prompt is provided mainly in case the OutputParser wants to retry or fix the output in some way and needs information from the prompt to do so.
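The three methods above can be sketched in plain Python. This is only an illustration of the interface: `YesNoParser` is a hypothetical class invented for this example, and it does not subclass anything from LangChain.

```python
from typing import Optional


class YesNoParser:
    """Hypothetical parser that turns a yes/no answer into a bool."""

    def get_format_instructions(self) -> str:
        # Text meant to be appended to the prompt sent to the model.
        return "Answer with a single word: yes or no."

    def parse(self, text: str) -> bool:
        # Normalize whitespace, case, and trailing punctuation.
        cleaned = text.strip().lower().rstrip(".!")
        if cleaned == "yes":
            return True
        if cleaned == "no":
            return False
        raise ValueError(f"Could not parse {text!r} as yes/no")

    def parse_with_prompt(self, text: str, prompt: Optional[str] = None) -> bool:
        # The prompt is unused here; a retrying parser could resend it.
        return self.parse(text)


parser = YesNoParser()
print(parser.get_format_instructions())
print(parser.parse("\nYes."))
```

The format instructions and the `parse` method are two halves of one contract: the first tells the model what shape to produce, the second assumes that shape when reading the response.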

# Pydantic (JSON) parser

This output parser allows users to specify an arbitrary JSON schema and query LLMs for JSON outputs that conform to that schema.

Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do this reliably, but Curie's ability already drops off dramatically.

Use Pydantic to declare your data model. Pydantic's BaseModel is like a Python dataclass, but with actual type checking and coercion.

In [6]:
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

model_name = "text-davinci-003"
temperature = 0.0
model = OpenAI(model=model_name, temperature=temperature)

# Define the desired data structure
class Joke(BaseModel):
  setup: str = Field(description="question to setup a joke")
  punchline: str = Field(description="answer to resolve the joke")

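  # Custom validation: the setup must be phrased as a question
  @validator('setup')
  def question_ends_with_question_mark(cls, field):
    # endswith() also handles an empty string, where field[-1] would raise IndexError
    if not field.endswith('?'):
      raise ValueError("Badly formatted question!")
    return field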

parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

joke_query = "Tell me a joke"
print(prompt.format(query=joke_query))

output = model(prompt.format(query=joke_query))

parsed_obj = parser.parse(output)

print("parsed object: ", parsed_obj)

Answer the user query.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"setup": {"title": "Setup", "description": "question to setup a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}
```
Tell me a joke

parsed object:  setup='Why did the chicken cross the road?' punchline='To get to the other side!'
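To make the parsing step less opaque, here is a rough standard-library sketch of what happens to the raw completion: find the JSON object in the text, load it, and check the fields. The names `JokeRecord` and `parse_joke` are hypothetical and a dataclass stands in for the Pydantic model; this is not the library's implementation.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class JokeRecord:  # hypothetical stand-in for the Pydantic Joke model
    setup: str
    punchline: str


def parse_joke(text: str) -> JokeRecord:
    # Grab the outermost {...} span so surrounding chatter is ignored.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object found in {text!r}")
    data = json.loads(match.group())
    # Missing keys raise KeyError here; Pydantic would report a validation error.
    joke = JokeRecord(setup=data["setup"], punchline=data["punchline"])
    if not joke.setup.endswith("?"):
        raise ValueError("Badly formatted question!")
    return joke


raw = 'Sure! {"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side!"}'
print(parse_joke(raw))
```

Unlike Pydantic, this sketch performs no type coercion; it only shows where a malformed or incomplete response would fail.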


# List Parser

This output parser can be used when you want to return a list of comma-separated items.

In [7]:
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI

output_parser = CommaSeparatedListOutputParser()

prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()}
)

llm = OpenAI(temperature=0)

input = prompt.format(subject="ice cream flavors")
output = llm(input)
parsed_output = output_parser.parse(output)

print("input: ", input)
print("output: ", output)
print("parsed_output: ", parsed_output)

input:  List five ice cream flavors.
Your response should be a list of comma separated values, eg: `foo, bar, baz`
output:  

Vanilla, Chocolate, Strawberry, Mint Chocolate Chip, Cookies and Cream
parsed_output:  ['Vanilla', 'Chocolate', 'Strawberry', 'Mint Chocolate Chip', 'Cookies and Cream']
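The parsing step for a comma-separated response is simple enough to sketch directly. The helper `parse_comma_separated` below is a hypothetical name for illustration and may differ in detail from what the library does.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    # Split on commas, trim whitespace, and drop empty items.
    return [item.strip() for item in text.split(",") if item.strip()]


raw = "\n\nVanilla, Chocolate, Strawberry, Mint Chocolate Chip, Cookies and Cream"
print(parse_comma_separated(raw))
```

Note the limitation of this approach: an item that itself contains a comma would be split into two entries.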


# DateTime Parser

This OutputParser shows how to parse LLM output into a datetime object.

In [8]:
from langchain.prompts import PromptTemplate
from langchain.output_parsers import DatetimeOutputParser
from langchain.chains import LLMChain
from langchain.llms import OpenAI

output_parser = DatetimeOutputParser()

prompt = PromptTemplate.from_template(
    "Answer the user question:\n\n{question}\n\n{format_instructions}",
    partial_variables={"format_instructions": output_parser.get_format_instructions()}
)

chain = LLMChain(prompt=prompt, llm=OpenAI(temperature=0), output_parser=output_parser)

output = chain.run("Around when was Bitcoin founded?")
print(output)

2008-01-03 18:15:05
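Conceptually, this kind of parser asks the model for a timestamp in a fixed format and then reads it back with `strptime`. The sketch below assumes an ISO-like format string; both `DATETIME_FORMAT` and `parse_datetime` are illustrative names, and the format actually requested depends on the format instructions in use.

```python
from datetime import datetime

# Assumed format for this sketch; adjust it to match the format instructions in use.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str) -> datetime:
    try:
        # Strip the leading newlines that completions often start with.
        return datetime.strptime(text.strip(), DATETIME_FORMAT)
    except ValueError as exc:
        raise ValueError(f"Could not parse datetime string: {text!r}") from exc


print(parse_datetime("\n2008-01-03T18:15:05.000000Z"))
```

The parser only checks that the text matches the format. It cannot tell whether the date the model chose is factually right, so the result still needs to be verified independently.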


# Enum Parser

In [12]:
from langchain.prompts import PromptTemplate
from langchain.output_parsers import EnumOutputParser
from langchain.llms import OpenAI
from enum import Enum

# The enum values are the exact strings the model is asked to choose from
class Sentiment(Enum):
  POSITIVE = "Positive"
  NEGATIVE = "Negative"

output_parser = EnumOutputParser(enum=Sentiment)

prompt = PromptTemplate.from_template(
    "Determine the sentiment of the user's sentence.\n{format_instructions}\nsentence: {sentence}",
    partial_variables={"format_instructions": output_parser.get_format_instructions()}
)

llm = OpenAI(temperature=0)

input = prompt.format(sentence="Today I had a very rough day. My boss just kept berating me and I was so stressed out!")
output = llm(input)
parsed_output = output_parser.parse(output)

print("input: ", input)
print("output: ", output)
print("parsed_output: ", parsed_output)

input:  Determine the sentiment of the user's sentence.
Select one of the following options: Positive, Negative
sentence: Today I had a very rough day. My boss just kept berating me and I was so stressed out!
output:  
Negative
parsed_output:  Sentiment.NEGATIVE
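The enum lookup itself can be sketched with the standard library alone. The `Sentiment` enum and `parse_enum` helper here are hypothetical names for this example; the point is that the stripped response must match one of the enum values exactly.

```python
from enum import Enum


class Sentiment(Enum):  # hypothetical enum for this sketch
    POSITIVE = "Positive"
    NEGATIVE = "Negative"


def parse_enum(text: str) -> Sentiment:
    try:
        # Look the member up by value after stripping surrounding whitespace.
        return Sentiment(text.strip())
    except ValueError:
        valid = ", ".join(member.value for member in Sentiment)
        raise ValueError(f"{text!r} is not one of: {valid}")


print(parse_enum("\nNegative"))
```

In this sketch the match is case-sensitive, so a response of `negative` would be rejected.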


# Auto-fixing Parser

This output parser wraps another output parser. If the wrapped parser fails, it calls out to an LLM to fix the errors.

Instead of simply raising an error, it passes the misformatted output, along with the format instructions, to the model and asks it to fix the output.

In [14]:
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser
from pydantic import BaseModel, Field, validator
from typing import List

class Actor(BaseModel):
  name: str = Field(description="name of an actor")
  film_names: List[str] = Field(description="list of names of films they starred in")

pydantic_parser = PydanticOutputParser(pydantic_object=Actor)

output_parser = OutputFixingParser.from_llm(parser=pydantic_parser, llm=ChatOpenAI(temperature=0))

# Single-quoted keys and strings are not valid JSON, so the Pydantic parser alone would fail here
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

print(output_parser.parse(misformatted))


name='Tom Hanks' film_names=['Forrest Gump']
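The control flow of a fixing wrapper can be sketched without any model call. In the example below, `parse_with_fix` and `fake_fixer` are hypothetical names, and `fake_fixer` is a stand-in for the LLM that only swaps the quote style; a real fixing parser would send the bad completion and the error to a model instead.

```python
import json
from typing import Callable


def parse_actor(text: str) -> dict:
    # Strict JSON parsing: single-quoted strings are rejected.
    return json.loads(text)


def parse_with_fix(text: str, fix: Callable[[str, str], str]) -> dict:
    try:
        return parse_actor(text)
    except json.JSONDecodeError as exc:
        # Hand the bad completion and the error to the fixer, then parse once more.
        return parse_actor(fix(text, str(exc)))


def fake_fixer(completion: str, error: str) -> str:
    # Hypothetical stand-in for the LLM call; it only swaps the quote style.
    return completion.replace("'", '"')


misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
print(parse_with_fix(misformatted, fake_fixer))
```

This sketch attempts a single repair. If the fixed text is still invalid, the second `parse_actor` call raises and the error reaches the caller.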


# Retry parser

While in some cases it is possible to fix parsing mistakes by looking only at the output, in other cases it is not. One example is when the output is not just incorrectly formatted but also incomplete. Consider the example below.



In [18]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser, RetryWithErrorOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

class Action(BaseModel):
  action: str = Field(description="action to take")
  action_input: str = Field(description="input to the action")

pydantic_parser = PydanticOutputParser(pydantic_object=Action)

prompt = PromptTemplate.from_template(
    "Based on the user question, provide an Action and Action Input for what step should be taken.\n{format_instructions}\nQuestion: {query}\nResponse: ",
    partial_variables={"format_instructions": pydantic_parser.get_format_instructions()}
)

prompt_value = prompt.format_prompt(query="Who is Leo Di Caprio's GF?")

# suppose the LLM has returned a bad and partial response like this
bad_response = '{"action": "search"}'

# This would fail because the required field "action_input" is missing
# pydantic_parser.parse(bad_response)

retry_parser = RetryWithErrorOutputParser.from_llm(
    llm=OpenAI(temperature=0),
    parser=pydantic_parser
)

retry_parser.parse_with_prompt(bad_response, prompt_value)



Action(action='search', action_input="Leo Di Caprio's GF")
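The difference from the fixing approach is that the retry needs the original prompt, because the missing value cannot be recovered from the bad output alone. A minimal sketch of that flow follows; `parse_with_retry` and `fake_llm` are hypothetical names, the retry prompt wording is invented for this example, and `fake_llm` returns a canned answer in place of a real model call.

```python
import json
from typing import Callable

REQUIRED_KEYS = ("action", "action_input")


def parse_action(text: str) -> dict:
    data = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    return data


def parse_with_retry(completion: str, prompt: str, llm: Callable[[str], str]) -> dict:
    try:
        return parse_action(completion)
    except ValueError as exc:
        # Re-ask with the original prompt, the bad completion, and the error.
        retry_prompt = (
            f"Prompt:\n{prompt}\nCompletion:\n{completion}\n"
            f"Error:\n{exc}\nPlease try again."
        )
        return parse_action(llm(retry_prompt))


def fake_llm(retry_prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return '{"action": "search", "action_input": "Leo Di Caprio\'s GF"}'


prompt = "Question: Who is Leo Di Caprio's GF?\nResponse: "
print(parse_with_retry('{"action": "search"}', prompt, fake_llm))
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, the same `except` branch covers both malformed JSON and missing fields.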