# **PydanticOutputParser**

**Overview**

This tutorial covers how to use `PydanticOutputParser` with `pydantic`.

**PydanticOutputParser** is a class that converts the output of a language model into **structured information**. Instead of a plain text response, it gives you the information you need in a clear and organized form.

With this class, you convert the model's output to fit a specific data model, which makes the information easier to process and use.

**Main methods**

A **PydanticOutputParser** mainly needs to implement two core methods.

* `get_format_instructions()`: Provides instructions that define the format of the information the language model should output. For example, it can return a string describing the data fields the model should output and how they should be formatted. These instructions are essential for getting the model to structure its output so that it fits your data model.
* `parse()`: Takes the output of the language model (assumed to be a string) and parses it into a specific structure. It uses a tool such as Pydantic to validate the input string against a predefined schema and convert it into a data structure that follows that schema.
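
To make this two-method contract concrete, the sketch below implements a toy parser using only the standard library. `ToyEmailParser` and its two-field schema are hypothetical stand-ins for illustration. This is not how LangChain implements `PydanticOutputParser`, which delegates validation to Pydantic.

```python
import json


class ToyEmailParser:
    """Hypothetical stand-in that mimics the two-method contract (not LangChain code)."""

    # Field name -> expected Python type; an assumed, simplified "schema".
    schema = {"person": str, "email": str}

    def get_format_instructions(self) -> str:
        # Describe the expected output so the text can be pasted into a prompt.
        fields = ", ".join(
            f'"{name}" ({tp.__name__})' for name, tp in self.schema.items()
        )
        return f"Return a JSON object with exactly these keys: {fields}."

    def parse(self, text: str) -> dict:
        # Decode the raw model output and check it against the schema.
        data = json.loads(text)
        for name, tp in self.schema.items():
            if name not in data:
                raise ValueError(f"missing field: {name}")
            if not isinstance(data[name], tp):
                raise ValueError(f"field {name!r} should be {tp.__name__}")
        return data


toy_parser = ToyEmailParser()
print(toy_parser.get_format_instructions())

# A hand-written string plays the role of the model's reply.
parsed = toy_parser.parse('{"person": "John", "email": "John@bikecorporation.me"}')
print(parsed["person"])
```

The real parser follows the same division of labor: one method produces text for the prompt, the other turns the reply into validated data.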


## Setup environment

In [3]:
import os
from dotenv import load_dotenv

load_dotenv(override=True, dotenv_path="../.env")

# os.getenv("OPENAI_API_KEY")

True

## Setup model

In [4]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.0)

In [5]:
email_conversation = """
From: John (John@bikecorporation.me)
To: Kim (Kim@teddyinternational.me)
Subject: “ZENESIS” bike distribution cooperation and meeting schedule proposal
Dear Mr. Kim,

I am John, Senior Executive Director at Bike Corporation. I recently learned about your new bicycle model, "ZENESIS," through your press release. Bike Corporation is a company that leads innovation and quality in the field of bicycle manufacturing and distribution, with long-time experience and expertise in this field.

We would like to request a detailed brochure for the ZENESIS model. In particular, we need information on technical specifications, battery performance, and design aspects. This information will help us further refine our proposed distribution strategy and marketing plan.

Additionally, to discuss the possibilities for collaboration in more detail, I propose a meeting next Tuesday, January 15th, at 10:00 AM. Would it be possible to meet at your office to have this discussion?

Thank you.

Best regards,
John
Senior Executive Director
Bike Corporation
"""

In [14]:
from langchain_core.prompts import PromptTemplate
from langchain_core.messages import AIMessageChunk
from langchain_core.output_parsers import StrOutputParser

prompt = PromptTemplate.from_template(
  "Please extract the important parts of the following email.\n\n{email_conversation}"
)

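# Chain the prompt, the model, and a parser that returns the reply as plain text.
chain = prompt | llm | StrOutputParser()
# stream() returns a lazy iterator of text chunks; stream_response() below consumes it.
answer = chain.stream({"email_conversation": email_conversation})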

In [15]:
def stream_response(response, return_output=False):
    """
    Streams the response from the AI model, processing and printing each chunk.

    This function iterates over each item in the 'response' iterable. If an item is an instance of AIMessageChunk, it extracts and prints the content.
    If the item is a string, it prints the string directly.
    Optionally, the function can return the concatenated string of all response chunks.

    Args:
    - response (iterable): An iterable of response chunks, which can be AIMessageChunk objects or strings.
    - return_output (bool, optional): If True, the function returns the concatenated response string. The default is False.

    Returns:
    - str: If `return_output` is True, the concatenated response string. Otherwise, nothing is returned.
    """
    answer = ""
    for token in response:
        if isinstance(token, AIMessageChunk):
            answer += token.content
            print(token.content, end="", flush=True)
        elif isinstance(token, str):
            answer += token
            print(token, end="", flush=True)
    if return_output:
        return answer

In [16]:
output = stream_response(answer, return_output=True)

- John is from Bike Corporation and is interested in collaborating with Teddy International on their new bicycle model "ZENESIS"
- Bike Corporation is a company with experience in bicycle manufacturing and distribution
- Request for a detailed brochure on the ZENESIS model, specifically technical specifications, battery performance, and design aspects
- Proposal for a meeting on Tuesday, January 15th at 10:00 AM to discuss collaboration possibilities

In [17]:
print(output)

- John is from Bike Corporation and is interested in collaborating with Teddy International on their new bicycle model "ZENESIS"
- Bike Corporation is a company with experience in bicycle manufacturing and distribution
- Request for a detailed brochure on the ZENESIS model, specifically technical specifications, battery performance, and design aspects
- Proposal for a meeting on Tuesday, January 15th at 10:00 AM to discuss collaboration possibilities


## Use PydanticOutputParser

In [19]:
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class EmailSummary(BaseModel):
    person: str = Field(description="The sender of the email")
    email: str = Field(description="The email address of the sender")
    subject: str = Field(description="The subject of the email")
    summary: str = Field(description="A summary of the email content")
    date: str = Field(
        description="The meeting date and time mentioned in the email content"
    )
    
parser = PydanticOutputParser(pydantic_object=EmailSummary)
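
Before involving a model, it can help to see what such a schema enforces. The sketch below defines a trimmed-down, hypothetical `MiniEmailSummary` (two fields only, so the block is self-contained) and shows that Pydantic rejects data with a required field missing. The sample values are made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class MiniEmailSummary(BaseModel):
    # Hypothetical two-field variant of the schema, for illustration only.
    person: str = Field(description="The sender of the email")
    email: str = Field(description="The email address of the sender")


# All required fields present: the object is created normally.
ok = MiniEmailSummary(person="John", email="John@bikecorporation.me")
print(ok.person)

# A required field is missing: Pydantic raises ValidationError.
try:
    MiniEmailSummary(person="John")
except ValidationError as exc:
    missing = [err["loc"][0] for err in exc.errors()]
    print("missing fields:", missing)
```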

In [21]:
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"person": {"description": "The sender of the email", "title": "Person", "type": "string"}, "email": {"description": "The email address of the sender", "title": "Email", "type": "string"}, "subject": {"description": "The subject of the email", "title": "Subject", "type": "string"}, "summary": {"description": "A summary of the email content", "title": "Summary", "type": "string"}, "date": {"description": "The meeting date and time mentioned in the email content", "title": "Date", "type": "string"}}, "required": ["person", "email"
```

Define the prompt with three input variables:

1. `question`: Receives the user's question.
2. `email_conversation`: Receives the content of the email conversation.
3. `format`: Receives the format instructions generated by the parser.

In [22]:
prompt = PromptTemplate.from_template(
    """
You are a helpful assistant. 

QUESTION:
{question}

EMAIL CONVERSATION:
{email_conversation}

FORMAT:
{format}
"""
)

# Pre-fill the `format` variable with the parser's format instructions
prompt = prompt.partial(format=parser.get_format_instructions())

In [23]:
chain = prompt | llm 

In [24]:
# Execute the chain and print the result.
response = chain.stream(
    {
        "email_conversation": email_conversation,
        "question": "Extract the main content of the email.",
    }
)

# The result is provided in JSON format.
output = stream_response(response, return_output=True)

{
  "person": "John",
  "email": "John@bikecorporation.me",
  "subject": "ZENESIS bike distribution cooperation and meeting schedule proposal",
  "summary": "Bike Corporation is interested in collaborating with Teddy International for distributing the ZENESIS bike model. They are requesting a detailed brochure for the ZENESIS model to refine their distribution strategy and marketing plan. A meeting is proposed for next Tuesday, January 15th, at 10:00 AM.",
  "date": "Tuesday, January 15th, 10:00 AM"
}

In [25]:
structured_output = parser.parse(output)
print(structured_output)

person='John' email='John@bikecorporation.me' subject='ZENESIS bike distribution cooperation and meeting schedule proposal' summary='Bike Corporation is interested in collaborating with Teddy International for distributing the ZENESIS bike model. They are requesting a detailed brochure for the ZENESIS model to refine their distribution strategy and marketing plan. A meeting is proposed for next Tuesday, January 15th, at 10:00 AM.' date='Tuesday, January 15th, 10:00 AM'


In [26]:
# Append the parser so the chain returns an EmailSummary object directly.
chain = prompt | llm | parser

In [27]:
# Execute the chain and print the results.
response = chain.invoke(
    {
        "email_conversation": email_conversation,
        "question": "Extract the main content of the email.",
    }
)

# The results are output in the form of an EmailSummary object.
print(response)

person='John' email='John@bikecorporation.me' subject='ZENESIS bike distribution cooperation and meeting schedule proposal' summary='Bike Corporation is interested in collaborating with Teddy International for distributing the ZENESIS bike model. They are requesting a detailed brochure for the ZENESIS model to refine their distribution strategy and marketing plan. A meeting is proposed for next Tuesday, January 15th, at 10:00 AM.' date='Tuesday, January 15th, 10:00 AM'


### `with_structured_output()`

Calling `.with_structured_output(EmailSummary)` on the chat model returns a runnable that produces an `EmailSummary` object directly, so you do not need to add a separate output parser to the chain.

In [28]:
llm_with_structure = ChatOpenAI(
    model_name="gpt-3.5-turbo", temperature=0.0
).with_structured_output(EmailSummary)

In [29]:
answer = llm_with_structure.invoke(email_conversation)

In [30]:
answer

EmailSummary(person='John', email='John@bikecorporation.me', subject='“ZENESIS” bike distribution cooperation and meeting schedule proposal', summary='John, Senior Executive Director at Bike Corporation, is interested in collaborating with Teddy International on distributing their new bicycle model, ZENESIS. He requests a detailed brochure for the ZENESIS model to refine their distribution strategy and marketing plan. John also proposes a meeting on Tuesday, January 15th, at 10:00 AM, to discuss collaboration possibilities.', date='Tuesday, January 15th, 10:00 AM')
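
Whichever route produced it, the result is a regular Pydantic object, so downstream code can read fields as attributes or convert the object to plain data. The sketch below builds an instance by hand instead of calling the model. `DemoSummary` is a hypothetical three-field variant of the schema, and `model_dump()` assumes Pydantic v2.

```python
from pydantic import BaseModel, Field


class DemoSummary(BaseModel):
    # Hypothetical subset of the EmailSummary fields, for illustration only.
    person: str = Field(description="The sender of the email")
    email: str = Field(description="The email address of the sender")
    date: str = Field(description="The meeting date and time mentioned in the email")


# Hand-built instance standing in for the object returned by the model call.
sample = DemoSummary(
    person="John",
    email="John@bikecorporation.me",
    date="Tuesday, January 15th, 10:00 AM",
)

# Fields are ordinary attributes.
print(sample.person)

# model_dump() (Pydantic v2) converts the object to a plain dict.
as_dict = sample.model_dump()
print(as_dict["date"])
```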