## Output Parser

- **Output parsers for type conversion (partial list)**
    - <font color="indianred">Pydantic (JSON) parser: converts the LLM response into JSON (a data model defined with Pydantic)</font>
    - Structured output parser: a simpler version of the Pydantic (JSON) parser
    - <font color="indianred">BooleanOutputParser: converts Yes/No into True/False</font>
    - <font color="indianred">List parser: converts the LLM response into a list</font>
    - Enum parser: converts the LLM response into an Enum
    - ~~Datetime parser: converts the LLM response into a datetime~~
---
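- **Output parsers for exception handling**
    - <font color="indianred">Auto-fixing parser: when the LLM response cannot be converted into the specified type, asks another LLM to repair it</font>
    - ~~Retry parser: when the LLM response cannot be converted into the specified type, has another LLM redo the answer~~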
 
##### References
- [What is LangChain's Output Parser? Features and usage of the Output Parser explained (in Japanese)](https://book.st-hakky.com/data-science/langchain-outputparser/)

## Comment
- To implement a Yes/No branch, use `BooleanOutputParser` or the `Pydantic (JSON) parser`.
- For branching across several options, the `Pydantic (JSON) parser` is the safer choice.
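
As a rough illustration of the multi-way case, the sketch below uses only the standard library to turn a JSON reply into a typed value and dispatch on it. It stands in for what a Pydantic model combined with `PydanticOutputParser` would do. `Route`, `Decision`, `parse_decision`, and the sample reply are hypothetical names and data invented for this example, not part of LangChain.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    # Hypothetical branch labels the LLM is asked to choose from.
    SEARCH = "search"
    CALCULATE = "calculate"
    REFUSE = "refuse"


@dataclass
class Decision:
    route: Route
    reason: str


def parse_decision(raw: str) -> Decision:
    """Parse a JSON reply and validate the branch label."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    # Route(...) raises ValueError if the label is not one of the members.
    return Decision(route=Route(data["route"]), reason=str(data["reason"]))


# Hypothetical LLM reply, hard-coded so the sketch runs offline.
reply = '{"route": "calculate", "reason": "The question is arithmetic."}'
decision = parse_decision(reply)

if decision.route is Route.SEARCH:
    action = "call the search tool"
elif decision.route is Route.CALCULATE:
    action = "call the calculator"
else:
    action = "decline to answer"
print(action)
```

Keeping the label as an enum means an unexpected value fails at parse time instead of silently falling through to the last branch.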

## Config

In [2]:
import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder: set your own key here and never commit a real one

In [82]:
# Show the LLM input and output
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)
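
With the root logger at `DEBUG`, connection-level messages are printed alongside the request payload. The sketch below is one possible way to keep the payload lines and quiet the rest, assuming the logger names that appear in the DEBUG output of this notebook (`openai._base_client` and `httpcore.*`). It uses only the standard `logging` module.

```python
import logging
import sys

# Same root configuration as the notebook cell above.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)

# Raise the threshold for the "httpcore" logger; its children
# (e.g. "httpcore.connection", "httpcore.http11") inherit the level.
logging.getLogger("httpcore").setLevel(logging.WARNING)
```

Loggers under `openai` are left untouched, so they still inherit `DEBUG` from the root logger.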

In [69]:
import numpy as np
from typing import Any,Callable,Dict,Iterable,List,Optional,Sized,Tuple,Union

from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_openai import OpenAI,ChatOpenAI
from langchain.prompts import PromptTemplate

# output parser
from langchain.output_parsers.enum import EnumOutputParser
from langchain.output_parsers.list import CommaSeparatedListOutputParser
from langchain.output_parsers.pydantic import PydanticOutputParser
from langchain.output_parsers.fix import OutputFixingParser
from langchain.output_parsers.boolean import BooleanOutputParser

In [40]:
# Build the LLM (model)
# model = OpenAI(temperature=0, model="gpt-3.5-turbo", verbose=False)
model = ChatOpenAI(temperature=0, model="gpt-3.5-turbo", verbose=False)

# test
# model.invoke("hello").content

## Pydantic (JSON) parser

Possibly simpler than the Structured output parser, since the schema takes less code to define?

In [55]:
from pydantic import BaseModel, Field

# Define the schema with Pydantic
class ANSWER(BaseModel):
    answer: bool = Field(description="Yes/No")
    reason: str  = Field(description="The reason why you answer yes/no.")

# Prepare the output parser
output_parser = PydanticOutputParser(pydantic_object=ANSWER)

# Build the format instructions
format_instructions = output_parser.get_format_instructions()
print(f"format_instructions: {format_instructions}")

# Build the prompt template
prompt_template = PromptTemplate(
    template="{text}は、正しい？ .\n{format_instructions}", # prompt template (Japanese for "Is {text} correct?")
    input_variables=["text"],
    partial_variables={"format_instructions": format_instructions}, # inject the format instructions into the prompt
    # output_parser = ...
)

format_instructions: The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"answer": {"description": "Yes/No", "title": "Answer", "type": "boolean"}, "reason": {"description": "The reason why you answer yes/no.", "title": "Reason", "type": "string"}}, "required": ["answer", "reason"]}
```


### Output Example

In [58]:
# Build the prompt
prompt = prompt_template.format(text="1+100/2=50.5")
prompt

'1+100/2=50.5は、正しい？ .\nThe output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"answer": {"description": "Yes/No", "title": "Answer", "type": "boolean"}, "reason": {"description": "The reason why you answer yes/no.", "title": "Reason", "type": "string"}}, "required": ["answer", "reason"]}\n```'

In [59]:
# Have the LLM generate a response
output = model.invoke(prompt)
print(output.content)

{
    "answer": false,
    "reason": "The calculation is incorrect. The correct answer is 50.5."
}


In [62]:
# Parse the output
result = output_parser.parse(output.content)
result

ANSWER(answer=False, reason='The calculation is incorrect. The correct answer is 50.5.')

In [63]:
result.answer

False
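
For reference, the claim in the prompt can be checked directly in Python. Under standard operator precedence the division is evaluated first, so the expression is 51 and `False` is the right verdict, even though the reason the model returned does not match the arithmetic.

```python
# Division binds tighter than addition: 1 + (100 / 2)
value = 1 + 100 / 2
print(value)          # 51.0
print(value == 50.5)  # False
```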

## BooleanOutputParser

<font color="indianred">**Note: format_instructions is not implemented for this parser yet, so you have to write the instructions yourself.**</font>

In [96]:
# Prepare the output parser
output_parser = BooleanOutputParser()

In [97]:
# Write the format instructions by hand
format_instructions = 'Please answer with "yes" or "no."'
print(f"format_instructions: {format_instructions}")

# Build the prompt template
prompt_template = PromptTemplate(
    template="{text}は、正しい？ .\n{format_instructions}", # prompt template (Japanese for "Is {text} correct?")
    input_variables=["text"],
    partial_variables={"format_instructions": format_instructions}, # inject the format instructions into the prompt
    # output_parser = ...
)

format_instructions: Please answer with "yes" or "no."


### Output Example

In [98]:
# Build the prompt
prompt = prompt_template.format(text="1+100/2=50.5")
prompt

'1+100/2=50.5は、正しい？ .\nPlease answer with "yes" or "no."'

In [99]:
# Have the LLM generate a response
output = model.invoke(prompt)

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'content': '1+100/2=50.5は、正しい？ .\nPlease answer with "yes" or "no."', 'role': 'user'}], 'model': 'gpt-3.5-turbo', 'n': 1, 'stream': False, 'temperature': 0.0}}
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='api.openai.com' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x130e44af0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x13087f5c0> server_hostname='api.openai.com' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.sync.SyncStream object at 0x1308f7ee0>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_header

In [100]:
output_parser.parse(output.content)

False

In [93]:
output_parser.parse("Yes, I do.")

True

In [94]:
output_parser.parse("Sure.")

ValueError: BooleanOutputParser expected output value to include either YES or NO. Received Sure..

In [92]:
output_parser.parse("Yes/No")

ValueError: Ambiguous response. Both YES and NO in received: Yes/No.
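
The four results above suggest a simple rule: a reply containing only YES maps to `True`, only NO maps to `False`, and a reply with both or neither is rejected. The sketch below approximates that observed behavior with a regular expression. It is not the library's implementation, and `parse_yes_no` is a hypothetical helper name.

```python
import re


def parse_yes_no(text: str) -> bool:
    """Map a free-text reply to True/False, rejecting unclear replies."""
    # Collect whole-word YES / NO matches, ignoring case.
    words = set(re.findall(r"\b(YES|NO)\b", text.upper()))
    if words == {"YES", "NO"}:
        raise ValueError(f"Ambiguous response, both YES and NO found: {text}")
    if not words:
        raise ValueError(f"Expected YES or NO in the response, got: {text}")
    return "YES" in words


print(parse_yes_no("Yes, I do."))  # True
for bad in ("Sure.", "Yes/No"):
    try:
        parse_yes_no(bad)
    except ValueError as err:
        print(f"rejected: {err}")
```

Because replies such as "Sure." are rejected, the hand-written format instructions matter: they are what keeps the model's wording within what the parser accepts.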

## EnumOutputParser

In [10]:
# Define the type as an Enum beforehand
from enum import Enum

class ANSWER(Enum):
    YES = "yes"
    NO  = "no"

# Prepare the output parser
output_parser = EnumOutputParser(enum=ANSWER) # pass the Enum class as the argument

# Build the format instructions
format_instructions = output_parser.get_format_instructions()
print(f"format_instructions: {format_instructions}") # format_instructions: Select one of the following options: yes, no

# Build the prompt template
prompt_template = PromptTemplate(
    template="{text}は、正しい？ .\n{format_instructions}", # prompt template (Japanese for "Is {text} correct?")
    input_variables=["text"],
    partial_variables={"format_instructions": format_instructions} # inject the format instructions into the prompt
)

format_instructions: Select one of the following options: yes, no


### Output Example

In [45]:
# Build the prompt
prompt = prompt_template.format(text="1+1=2")
prompt

'1+1=2は、正しい？ .\nSelect one of the following options: yes, no'

In [46]:
# Have the LLM generate a response
output = model.invoke(prompt)
print(output.content)

yes


In [47]:
# Parse the output
result = output_parser.parse(output.content)
print(result)
# The result is an Enum member

ANSWER.YES
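
Conceptually, this step is a lookup of an Enum member by its value. The standard-library sketch below shows that lookup and how the result can drive a branch. `Answer` and `to_answer` are hypothetical names, and the whitespace trimming is an assumption about reasonable normalization rather than a statement of what `EnumOutputParser` does internally.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"


def to_answer(text: str) -> Answer:
    """Look up an enum member by value after trimming whitespace."""
    return Answer(text.strip())  # raises ValueError for any other string


result = to_answer("yes\n")
if result is Answer.YES:
    print("claim accepted")
else:
    print("claim rejected")
```

In this sketch the lookup is case-sensitive, so a reply of `"Yes"` would raise `ValueError`.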


## Auto-fixing parser（OutputFixingParser）

When converting the LLM (model) response with another output parser raises an exception, the auto-fixing parser (OutputFixingParser) can ask another LLM to repair the response.

### Error

In [67]:
class Person(BaseModel):
    name: str = Field(description="人物名")  # "person's name"
    address: str = Field(description="住所")  # "address"

parser = PydanticOutputParser(pydantic_object=Person)

# misformatted_output stands for a response that cannot be parsed (single quotes are not valid JSON)
misformatted_output = "{'name': 'NameName', 'address': 'AddressAddress'}"

# Try to parse the response
parser.parse(misformatted_output)

OutputParserException: Invalid json output: {'name': 'NameName', 'address': 'AddressAddress'}
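
The failure comes from the quoting: JSON requires double-quoted strings, while this text is written like a Python dict literal. A small standard-library check, independent of LangChain, makes the difference visible.

```python
import ast
import json

misformatted_output = "{'name': 'NameName', 'address': 'AddressAddress'}"

try:
    json.loads(misformatted_output)
except json.JSONDecodeError as err:
    print(f"not valid JSON: {err.msg}")

# The same text is a valid Python dict literal, which is why it looks plausible.
as_dict = ast.literal_eval(misformatted_output)
print(json.dumps(as_dict))  # {"name": "NameName", "address": "AddressAddress"}
```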

### Fix

In [83]:
# Wrap the PydanticOutputParser in an OutputFixingParser
output_fixing_parser = OutputFixingParser.from_llm(parser=parser, llm=model)

# Parse with the OutputFixingParser
result = output_fixing_parser.parse(misformatted_output)

print(result)

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'content': 'Instructions:\n--------------\nThe output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"name": {"description": "\\u4eba\\u7269\\u540d", "title": "Name", "type": "string"}, "address": {"description": "\\u4f4f\\u6240", "title": "Address", "type": "string"}}, "required": ["name", "address"]}\n```\n--------------\nCompletion:\n--------------\n{\'name\': \'NameName\', \'address\': \'AddressAddress\'}\n--------------\n\nAbove, the Completi

In [84]:
# Prompt template sent to the LLM
print(output_fixing_parser.retry_chain.prompt.template)

Instructions:
--------------
{instructions}
--------------
Completion:
--------------
{completion}
--------------

Above, the Completion did not satisfy the constraints given in the Instructions.
Error:
--------------
{error}
--------------

Please try again. Please only respond with an answer that satisfies the constraints laid out in the Instructions:
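
The template has three slots: the format instructions, the failed completion, and the parser's error message. The sketch below shows the general parse, fail, ask-for-a-fix, parse-again flow this implies, using a shortened template and a stub in place of the LLM. It is a minimal sketch, not LangChain's implementation: `parse_with_fix`, `fake_llm`, and `RETRY_TEMPLATE` are hypothetical names.

```python
import json
from typing import Callable

# Shortened stand-in for the retry template printed above.
RETRY_TEMPLATE = (
    "Instructions:\n{instructions}\n"
    "Completion:\n{completion}\n"
    "Error:\n{error}\n"
    "Please try again."
)


def parse_with_fix(completion: str, instructions: str,
                   fixer: Callable[[str], str]) -> dict:
    """Parse JSON; on failure, ask `fixer` once for a corrected completion."""
    try:
        return json.loads(completion)
    except json.JSONDecodeError as err:
        prompt = RETRY_TEMPLATE.format(
            instructions=instructions, completion=completion, error=err
        )
        return json.loads(fixer(prompt))  # a second failure propagates


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the fixing LLM: returns a hard-coded repair."""
    return '{"name": "NameName", "address": "AddressAddress"}'


bad = "{'name': 'NameName', 'address': 'AddressAddress'}"
fixed = parse_with_fix(bad, "Return a JSON object with name and address.", fake_llm)
print(fixed)
```

Passing the error text along with the failed completion gives the fixing model the same information a human would use to spot the problem, at the cost of one extra LLM call per failure.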
