# PydanticOutputParser

`PydanticOutputParser` is a class that converts the output of a language model into **structured information**. Instead of a free-form text response, you get the information you need in a **clear, systematic form**.

The parser maps the model's output onto a specific data model (a Pydantic class), which makes the result easier to validate, process, and reuse.

## Key Methods

`PydanticOutputParser`, like most output parsers, is built around **two core methods**.

- **`get_format_instructions()`**: Returns a string of instructions describing the format the language model should output, such as the fields and their types. Including these instructions in the prompt steers the model toward output that fits the data model.
- **`parse()`**: Takes the output of the language model (assumed to be a string), validates it against a predefined Pydantic schema, and converts it into a data structure that follows that schema.
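
To make the two-method contract concrete, here is a minimal sketch built on Pydantic alone. `TinyParser` and `Contact` are hypothetical names invented for this illustration, and the code is a toy stand-in rather than LangChain's actual implementation. It assumes Pydantic v2.

```python
import json

from pydantic import BaseModel, Field, ValidationError


class Contact(BaseModel):
    # Hypothetical schema used only for this illustration.
    name: str = Field(description="Full name")
    email: str = Field(description="Email address")


class TinyParser:
    """Toy stand-in for an output parser; not LangChain's implementation."""

    def __init__(self, model: type[BaseModel]):
        self.model = model

    def get_format_instructions(self) -> str:
        # Describe the expected output by embedding the model's JSON schema.
        schema = json.dumps(self.model.model_json_schema())
        return f"Return a JSON object that conforms to this schema:\n{schema}"

    def parse(self, text: str) -> BaseModel:
        # Validate the raw string against the schema and build a model instance.
        return self.model.model_validate_json(text)


tiny = TinyParser(Contact)
instructions = tiny.get_format_instructions()
contact = tiny.parse('{"name": "Ada", "email": "ada@example.com"}')
print(contact.name)

try:
    tiny.parse('{"name": "Ada"}')  # the required "email" field is missing
except ValidationError as err:
    print(f"rejected with {err.error_count()} validation error(s)")
```

The instructions go into the prompt, and `parse()` runs on whatever the model sends back, so a reply that does not match the schema fails loudly instead of passing through as text.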

## References

- [Pydantic Official Documentation](https://docs.pydantic.dev/latest/)


In [16]:
from dotenv import load_dotenv
import sys
# sys.path entries must be directories, so add the folder that contains AbbyUtils.py
sys.path.append('.')
# Load the API key from the .env file
load_dotenv()

True

In [17]:
from AbbyUtils import langsmith

langsmith("Ch04_A_OutputParser")

LangSmith is enabled.
[Project_Name]:Ch04_A_OutputParser


In [18]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from AbbyUtils import stream_response


llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")

In [19]:
email_content = """

From: John Surjadi <surjadigroup@gmail.com> 
Sent: Friday, July 5, 2024 6:31 PM
To: Ross Siemens <RSiemens@abbotsford.ca>
Subject: RE: gasoline gouging

I would like the gouging in Abbotsford to stop, today the prices were cheaper in Vancouver than they were in Abbotsford and as the mayor of Abbotsford I would think that you would have something to say about this... I'm not happy to know that our Mayor is not concerned enough to regulate how business is being conducted in Abbotsford BC... We are already taxed to death and this is completely out of control when gasoline prices in Abbotsford are higher than Metro Vancouver...
I have reached out to you in the past, I have sent you emails regarding this problem and just like every other politician, you have pass the buck and told me that you do not control the gas pricing in Abbotsford... Notes you are also in the gasoline business... well it's about time our Mayor took interest and the concerns of your voters and started to do something to contain this matter... 

I will no longer vote for you in the future if something is not done about this immediately this year... 
It's bad enough we have a carbon tax...

Yours truly Richard Williams

"""

In [20]:
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "Extract important information from {email_content}"
)
chain = prompt | llm

answer = chain.stream({"email_content": email_content})

output = stream_response(answer, return_output=True)

**Important Information Extracted:**

- **Sender:** Richard Williams
- **Recipient:** Ross Siemens (Mayor of Abbotsford)
- **Date:** July 5, 2024
- **Subject:** Gasoline pricing concerns in Abbotsford
- **Main Issues Raised:**
  - Gasoline prices in Abbotsford are higher than in Vancouver.
  - Richard Williams expresses dissatisfaction with the Mayor's lack of action regarding gas price regulation.
  - He feels that the situation is exacerbated by high taxes and the carbon tax.
  - Williams has previously reached out to the Mayor about this issue but feels his concerns have been ignored.
  - He accuses the Mayor of not taking the concerns of voters seriously, especially given the Mayor's involvement in the gasoline business.
- **Consequences Mentioned:**
  - Williams threatens to not vote for Ross Siemens in future elections if the issue is not addressed promptly.

In [21]:
class EmailSummary(BaseModel):
    person: str = Field(description="Sender")
    email: str = Field(description="Sender's email address")
    subject: str = Field(description="Email subject")
    summary: str = Field(description="Summary of the email body")
    date: str = Field(description="Date and time mentioned in the email body")

# Create the PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=EmailSummary)

In [22]:
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"person": {"description": "Sender", "title": "Person", "type": "string"}, "email": {"description": "Sender's email address", "title": "Email", "type": "string"}, "subject": {"description": "Email subject", "title": "Subject", "type": "string"}, "summary": {"description": "Summary of the email body", "title": "Summary", "type": "string"}, "date": {"description": "Date and time mentioned in the email body", "title": "Date", "type": "string"}}, "required": ["person", "email", "subject", "summary", "date"]}
```


In [23]:
prompt = PromptTemplate.from_template(
 """
You are a helpful assistant. Please answer the following questions.

QUESTION:
{question}

EMAIL CONVERSATION:
{email_content}

FORMAT:
{format}
"""
)
prompt = prompt.partial(format=parser.get_format_instructions())

In [24]:
chain = prompt | llm

In [25]:

response = chain.stream(
    {
        "email_content": email_content,
        "question": "Extract the important information from the email.",
    }
)

output = stream_response(response, return_output=True)

```json
{
  "person": "Richard Williams",
  "email": "surjadigroup@gmail.com",
  "subject": "RE: gasoline gouging",
  "summary": "Richard Williams expresses frustration over high gasoline prices in Abbotsford compared to Vancouver, criticizes the mayor for inaction, and threatens to not vote for the mayor in the future if the issue is not addressed.",
  "date": "Friday, July 5, 2024 6:31 PM"
}
```

In [11]:
structured_output = parser.parse(output)
print(structured_output)

person='Richard Williams' email='surjadigroup@gmail.com' subject='RE: gasoline gouging' summary='Richard Williams expresses frustration over high gasoline prices in Abbotsford compared to Vancouver, criticizes the mayor for inaction, and threatens to not vote for the mayor in the future if the issue is not addressed.' date='Friday, July 5, 2024 6:31 PM'
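
Note that the model's reply arrived wrapped in a Markdown `json` fence, yet `parse()` still produced an `EmailSummary`. The sketch below shows, with the standard library only, the kind of preprocessing that makes this possible. `extract_json` is a hypothetical helper written for this illustration, not the code LangChain runs internally.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built at runtime to avoid nesting fences


def extract_json(text: str) -> dict:
    """Return the JSON object in text, unwrapping a Markdown fence if present."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    # Fall back to the whole string when the reply is not fenced.
    payload = match.group(1) if match else text.strip()
    return json.loads(payload)


raw = (
    f"{FENCE}json\n"
    '{"person": "Richard Williams", "subject": "RE: gasoline gouging"}\n'
    f"{FENCE}"
)
data = extract_json(raw)
print(data["person"])
```

Once the fence is removed, the remaining string is plain JSON that can be validated against the Pydantic schema.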


## Creating a Chain with the Parser
Appending the parser to the chain makes the chain return an `EmailSummary` object, the Pydantic model that defines the output format, instead of raw text.

In [12]:
chain = prompt | llm | parser

In [13]:
response = chain.invoke(
    {
        "email_content": email_content,
        "question": "Extract the important information from the email.",
    }
)
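
Because the chain ends with the parser, `response` should be an `EmailSummary` instance rather than a string. The sketch below shows how such an object might be used, assuming Pydantic v2. So that it runs without an API call, it builds a stand-in object by hand from the values shown earlier, with a shortened summary; in the notebook you would use `response` directly.

```python
from pydantic import BaseModel, Field


class EmailSummary(BaseModel):
    person: str = Field(description="Sender")
    email: str = Field(description="Sender's email address")
    subject: str = Field(description="Email subject")
    summary: str = Field(description="Summary of the email body")
    date: str = Field(description="Date and time mentioned in the email body")


# Stand-in for the chain's return value (summary shortened for illustration).
response = EmailSummary(
    person="Richard Williams",
    email="surjadigroup@gmail.com",
    subject="RE: gasoline gouging",
    summary="Frustration over high gasoline prices in Abbotsford.",
    date="Friday, July 5, 2024 6:31 PM",
)

# Fields are ordinary attributes.
print(response.person)
print(response.subject)

# model_dump() converts the object to a plain dict, e.g. for saving or logging.
record = response.model_dump()
print(record["email"])
```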

In [14]:
llm_with_structured = ChatOpenAI(
    temperature=0, model_name="gpt-4o-mini"
).with_structured_output(EmailSummary)

In [15]:
answer = llm_with_structured.invoke(email_content)
answer

EmailSummary(person='Richard Williams', email='surjadigroup@gmail.com', subject='RE: gasoline gouging', summary='Richard Williams expresses his frustration to Mayor Ross Siemens regarding the high gasoline prices in Abbotsford compared to Vancouver. He criticizes the lack of action from the mayor and threatens to not vote for him in the future if the issue is not addressed. He also mentions previous communications on the matter and highlights the burden of taxes and the carbon tax.', date='July 5, 2024 6:31 PM')

### Streaming function is not supported