## Chat Models - Output Parsing


Language models output text, but you often want something more structured than a raw string back. This is where output parsers come in.

Output parsers are classes that help structure language model responses. An output parser must implement two main methods:

- "Get format instructions": returns a string containing instructions for how the output of a language model should be formatted.
- "Parse": takes a string (assumed to be the response from a language model) and parses it into some structure.

It may also implement one optional method:

- "Parse with prompt": takes a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated that response) and parses it into some structure. The prompt is provided mainly so that the output parser can retry or fix the output when it needs information from the prompt to do so.

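To make this interface concrete, here is a minimal sketch in plain Python. It does not use LangChain's base classes; the class name `CommaListParser` and its behavior are hypothetical, chosen only to illustrate the three methods described above.

```python
from typing import List


class CommaListParser:
    """Hypothetical parser (not a LangChain class): turns 'a, b, c' into a list."""

    def get_format_instructions(self) -> str:
        # Text meant to be placed in the prompt so the model knows the expected format.
        return "Your response should be a list of comma separated values, e.g. `foo, bar, baz`."

    def parse(self, text: str) -> List[str]:
        # Split the raw model response and drop surrounding whitespace and empty items.
        items = [item.strip() for item in text.split(",")]
        items = [item for item in items if item]
        if not items:
            raise ValueError(f"Could not parse a list from: {text!r}")
        return items

    def parse_with_prompt(self, completion: str, prompt: str) -> List[str]:
        # The prompt is unused in this sketch; a retrying parser could
        # resend it to the model when parsing fails.
        return self.parse(completion)


sketch_parser = CommaListParser()
print(sketch_parser.parse("red, green , blue"))
```

The rest of this notebook uses LangChain's `PydanticOutputParser`, which follows the same pattern but targets a Pydantic model instead of a plain list.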

In [1]:
%pip install langchain langchain_openai langchain-community --upgrade

Collecting langchain
  Downloading langchain-0.1.7-py3-none-any.whl (815 kB)
Collecting langchain_openai
  Downloading langchain_openai-0.0.6-py3-none-any.whl (29 kB)
Collecting langchain-core<0.2,>=0.1.22
  Downloading langchain_core-0.1.23-py3-none-any.whl (241 kB)
Collecting langchain-community<0.1,>=0.0.20
  Downloading langchain_community-0.0.20-py3-none-any.whl (1.7 MB)
Collecting openai<2.0.0,>=1.10.0
  Downloading openai-1.12.0-py3-none-any.whl (226 kB)
Collecting langs

In [1]:
import os
os.environ['OPENAI_API_KEY'] = 'API_KEY_HERE'

In [1]:
from langchain_core.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain_openai.chat_models import ChatOpenAI

from langchain.output_parsers import PydanticOutputParser
from pydantic.v1 import BaseModel, Field

In [2]:
# chat = ChatOpenAI(openai_api_key="...")

# If you have an environment variable set for OPENAI_API_KEY, you can just do:
chat = ChatOpenAI(temperature=0)

In [3]:
from typing import List


class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")


class Jokes(BaseModel):
    jokes: List[Joke] = Field(description="list of jokes")

In [4]:
# Set up a parser; its format instructions are injected into the prompt template later.
parser = PydanticOutputParser(pydantic_object=Jokes)

In [5]:
template = "Answer the user query.\n{format_instructions}\n{query}\n"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt])
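
The template has two placeholders, `{format_instructions}` and `{query}`. As a rough illustration using plain `str.format` (an assumption for demonstration only; `ChatPromptTemplate` does its own formatting), the sketch below shows why the schema text is passed in as a variable value: braces inside a substituted value are left untouched, whereas JSON braces written directly into the template would be read as placeholders. The stand-in instruction string is made up.

```python
template = "Answer the user query.\n{format_instructions}\n{query}\n"

# Made-up stand-in for parser.get_format_instructions(); note the JSON braces.
fake_instructions = 'Return JSON like {"jokes": []}.'

filled = template.format(
    format_instructions=fake_instructions,
    query="Tell me a joke.",
)
print(filled)
```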

In [6]:
parser.get_format_instructions()

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"jokes": {"title": "Jokes", "description": "list of jokes", "type": "array", "items": {"$ref": "#/definitions/Joke"}}}, "required": ["jokes"], "definitions": {"Joke": {"title": "Joke", "type": "object", "properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}}}\n```'

In [7]:
# Format the chat prompt:
messages = chat_prompt.format_prompt(
    format_instructions=parser.get_format_instructions(),
    query="What's really funny about Python programming?",
).to_messages()

In [8]:
result = chat.invoke(messages)

In [9]:
print(result.content)

{"jokes": [{"setup": "The fact that it's named after a snake", "punchline": "Python is not a snake, it's a language!"}, {"setup": "Why do Python programmers prefer using snake_case?", "punchline": "Because they don't like Java!"}]}


In [10]:
joke_pydantic_object = parser.parse(result.content)
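
Conceptually, this parse step decodes the JSON text and validates it against the `Jokes` schema. The following standard-library sketch mimics that idea with dataclasses. It is an approximation for illustration, not `PydanticOutputParser`'s actual implementation, and the names `JokeSketch`, `JokesSketch`, and `parse_jokes` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class JokeSketch:
    setup: str
    punchline: str


@dataclass
class JokesSketch:
    jokes: List[JokeSketch]


def parse_jokes(text: str) -> JokesSketch:
    """Decode JSON text and check that it has the expected shape."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("jokes"), list):
        raise ValueError("Expected a JSON object with a 'jokes' list")
    jokes = []
    for item in data["jokes"]:
        # Require both fields to be present and to be strings.
        if not isinstance(item, dict) or not all(
            isinstance(item.get(key), str) for key in ("setup", "punchline")
        ):
            raise ValueError(f"Malformed joke entry: {item!r}")
        jokes.append(JokeSketch(setup=item["setup"], punchline=item["punchline"]))
    return JokesSketch(jokes=jokes)


sample = '{"jokes": [{"setup": "Why do Python programmers prefer using snake_case?", "punchline": "Because they don\'t like Java!"}]}'
parsed = parse_jokes(sample)
print(parsed.jokes[0].punchline)
```

Raising an error on malformed output, rather than returning partial data, is what gives a caller the chance to retry or repair the response.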

In [16]:
# Pydantic v2 models expose .model_dump(); pydantic.v1 models only have .dict().
try:
    print(joke_pydantic_object.model_dump())
except AttributeError:
    print(joke_pydantic_object.dict())

{'jokes': [{'setup': "The fact that it's named after a snake", 'punchline': "Python is not a snake, it's a language!"}, {'setup': 'Why do Python programmers prefer using snake_case?', 'punchline': "Because they don't like Java!"}]}


In [17]:
joke_pydantic_object.jokes

[Joke(setup="The fact that it's named after a snake", punchline="Python is not a snake, it's a language!"),
 Joke(setup='Why do Python programmers prefer using snake_case?', punchline="Because they don't like Java!")]

In [18]:
joke_pydantic_object.jokes[0].punchline

"Python is not a snake, it's a language!"