# Quickstart

Language models output text. But many times you may want more structured information than just raw text. This is where output parsers come in.

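Output parsers are classes that help structure language model responses. An output parser must implement two main methods:

- `get_format_instructions`: a method that returns a string containing instructions for how the output of a language model should be formatted.
- `parse`: a method that takes in a string (assumed to be the response from a language model) and parses it into some structure.

There is also an optional one:

- `parse_with_prompt`: a method that takes in a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated that response) and parses it into some structure. The prompt is provided mainly in case the OutputParser wants to retry or fix the output in some way and needs information from the prompt to do so.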
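
The two required methods can be illustrated without any LangChain dependency. What follows is a minimal sketch, not LangChain's implementation: `JokeParser`, the `Joke` dataclass, and the instruction text are hypothetical names chosen for illustration, and the sketch assumes the model replies with a JSON object.

```python
import json
from dataclasses import dataclass


@dataclass
class Joke:
    setup: str
    punchline: str


class JokeParser:
    """Hypothetical parser exposing the two required methods."""

    def get_format_instructions(self) -> str:
        # Text meant to be injected into the prompt sent to the model.
        return 'Reply with a JSON object with the keys "setup" and "punchline".'

    def parse(self, text: str) -> Joke:
        # Turn the raw model response into a structured object.
        data = json.loads(text)
        return Joke(setup=data["setup"], punchline=data["punchline"])


parser = JokeParser()
reply = '{"setup": "Why did the tomato turn red?", "punchline": "It saw the salad dressing!"}'
joke = parser.parse(reply)
print(joke.punchline)
```

The instructions string and the `parse` logic have to agree with each other: whatever format the first one asks for is the format the second one must be able to read.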



## Get started
Below we go over the main type of output parser, the `PydanticOutputParser`.

In [2]:
from dotenv import load_dotenv, find_dotenv
from langchain.globals import set_debug

load_dotenv(find_dotenv())
set_debug(False)

In [3]:
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import OpenAI

model = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0.0)

In [4]:
# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field

In [10]:
# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)
parser.get_format_instructions()

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}\n```'

In [9]:
parser.get_input_schema()

pydantic.v1.main.PydanticOutputParserInput

In [11]:
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

In [13]:
res = prompt.invoke({"query": "Tell me a joke."})
res.to_string()

'Answer the user query.\nThe output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}\n```\nTell me a joke.\n'

In [14]:


# And a query intended to prompt a language model to populate the data structure.
prompt_and_model = prompt | model
output = prompt_and_model.invoke({"query": "Tell me a joke."})
output

'\n{\n    "setup": "Why did the tomato turn red?",\n    "punchline": "Because it saw the salad dressing!"\n}'

In [15]:
parser.invoke(output)

Joke(setup='Why did the tomato turn red?', punchline='Because it saw the salad dressing!')

### LCEL
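Output parsers implement the Runnable interface, the basic building block of the LangChain Expression Language (LCEL). This means they support `invoke`, `ainvoke`, `stream`, `astream`, `batch`, `abatch`, and `astream_log` calls.

Output parsers accept a string or `BaseMessage` as input and can return an arbitrary type.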




In [16]:
parser.invoke(output)

Joke(setup='Why did the tomato turn red?', punchline='Because it saw the salad dressing!')

Instead of invoking the parser manually, we can also add it directly to a Runnable sequence:

In [17]:
chain = prompt | model | parser
chain.invoke({"query": "Tell me a joke."})

Joke(setup='Why did the tomato turn red?', punchline='Because it saw the salad dressing!')

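While all parsers support the streaming interface, only certain parsers can stream partially parsed objects, since this depends heavily on the output type. Parsers that cannot construct partial objects simply yield the fully parsed output.

The `SimpleJsonOutputParser`, for example, can stream partial outputs: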

In [18]:
from langchain.output_parsers.json import SimpleJsonOutputParser

json_prompt = PromptTemplate.from_template(
    "Return a JSON object with an `answer` key that answers the following question: {question}"
)
json_parser = SimpleJsonOutputParser()
json_chain = json_prompt | model | json_parser
list(json_chain.stream({"question": "Who invented the microscope?"}))

[{},
 {'answer': ''},
 {'answer': 'Ant'},
 {'answer': 'Anton'},
 {'answer': 'Antonie'},
 {'answer': 'Antonie van'},
 {'answer': 'Antonie van Lee'},
 {'answer': 'Antonie van Leeu'},
 {'answer': 'Antonie van Leeuwen'},
 {'answer': 'Antonie van Leeuwenho'},
 {'answer': 'Antonie van Leeuwenhoek'}]

The `PydanticOutputParser`, by contrast, cannot:

In [22]:
list(chain.stream({"query": "Tell me a joke."}))

OutputParserException: Invalid json output: 

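## Custom output parsers

In some situations you may want to implement a custom parser to structure the model output into a custom format.

There are two ways to implement a custom parser:
1. Using `RunnableLambda` or `RunnableGenerator` in LCEL, which we strongly recommend for most use cases
2. Inheriting from one of the base classes for output parsing, which is the hard way of doing things

The difference between the two approaches is mostly superficial. It mainly concerns which callbacks are triggered (e.g., `on_chain_start` vs. `on_parser_start`) and how a runnable lambda versus a parser is visualized in a tracing platform like LangSmith.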


### Runnable Lambdas and Generators
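The recommended way to parse is using runnable lambdas and runnable generators!  
Here, we will make a simple parser that inverts the case of the output from the model.  
For example, if the model outputs "Meow", the parser will produce "mEOW".  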

In [3]:
from typing import Iterable

from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, AIMessageChunk

model = ChatOpenAI()

def parse(ai_message: AIMessage) -> str:
    """Parse the AI message."""
    print(ai_message)
    return ai_message.content.swapcase()


chain = model | parse
chain.invoke("hello")

content='Hello! How can I assist you today?' response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 8, 'total_tokens': 17}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_a450710239', 'finish_reason': 'stop', 'logprobs': None} id='run-1ae9de5f-42e2-4a38-aae4-c5e548e4fea6-0'


'hELLO! hOW CAN i ASSIST YOU TODAY?'

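> LCEL automatically upgrades the function `parse` to `RunnableLambda(parse)` when it is composed using the `|` syntax. If you don't like that, you can manually import `RunnableLambda` and then run `parse = RunnableLambda(parse)`.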
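
To give a feel for how a plain function can take part in a `|` pipeline, here is a toy sketch built on Python's `__or__` operator hook. `MiniRunnable` and `fake_model` are hypothetical names invented for this illustration. This is not how LangChain implements its runnables, only the general idea of wrapping a callable on composition.

```python
from typing import Any, Callable


class MiniRunnable:
    """Toy stand-in for a runnable; not LangChain's implementation."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "MiniRunnable":
        # Wrap plain callables so that `runnable | function` works.
        nxt = other if isinstance(other, MiniRunnable) else MiniRunnable(other)
        # The composed step feeds this step's output into the next one.
        return MiniRunnable(lambda value: nxt.invoke(self.invoke(value)))


# Hypothetical "model" that just echoes its prompt.
fake_model = MiniRunnable(lambda prompt: f"Echo: {prompt}")
chain = fake_model | str.swapcase
print(chain.invoke("hello"))
```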

Does streaming work?

In [4]:
for chunk in chain.stream("tell me about yourself in one sentence"):
    print(chunk, end="|", flush=True)

content='I am a dedicated and passionate individual who strives to make a positive impact in the world.' response_metadata={'finish_reason': 'stop'} id='run-b30a9a18-589c-4ba4-b2c6-10beb3a55401'
i AM A DEDICATED AND PASSIONATE INDIVIDUAL WHO STRIVES TO MAKE A POSITIVE IMPACT IN THE WORLD.|

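No, it does not, because the parser aggregates the input before parsing the output.

If we want to implement a streaming parser, we can have the parser accept an iterable over the input instead and yield the results as they become available.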
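
The difference between the two behaviours can be sketched in plain Python, independent of LangChain. In this illustrative sketch, `fake_stream` is a hypothetical stand-in for a model's chunk stream, and the chunks are plain strings rather than message objects.

```python
from typing import Iterable, Iterator


def aggregate_then_parse(chunks: Iterable[str]) -> str:
    # Waits for every chunk before producing a single result.
    return "".join(chunks).swapcase()


def streaming_swapcase(chunks: Iterable[str]) -> Iterator[str]:
    # Yields one result per incoming chunk, as soon as it arrives.
    for chunk in chunks:
        yield chunk.swapcase()


def fake_stream() -> Iterator[str]:
    # Hypothetical stand-in for a model's token stream.
    yield from ["Hel", "lo ", "World"]


print(aggregate_then_parse(fake_stream()))
for piece in streaming_swapcase(fake_stream()):
    print(piece, end="|")
```

One pitfall worth keeping in mind: calling `list(chunks)` on a one-shot iterator before the loop (for example, to print it while debugging) drains it, so a loop that follows would have nothing left to yield.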

In [7]:
from langchain_core.runnables import RunnableGenerator

def streaming_parse(chunks: Iterable[AIMessageChunk]) -> Iterable[str]:
    # Yield each chunk as soon as it arrives. Do not call list(chunks) here:
    # it would drain the iterator and leave nothing for the loop to yield.
    for chunk in chunks:
        yield chunk.content.swapcase()

streaming_parse = RunnableGenerator(streaming_parse)
streaming_parse

RunnableGenerator(streaming_parse)

In [8]:
chain = model | streaming_parse
chain.invoke("hello")

Let's confirm that streaming works:

In [9]:
for chunk in chain.stream("tell me about yourself in one sentence"):
    print(chunk, end="|", flush=True)

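### Inheriting from Parsing Base Classes
Another approach to implementing a parser is to inherit from `BaseOutputParser`, `BaseGenerationOutputParser`, or another of the base parsers, depending on what you need to do.

In general, we do not recommend this approach for most use cases, as it results in more code to write without significant benefits.

The simplest kind of output parser extends the `BaseOutputParser` class and must implement the following methods:
- `parse`: takes the string output from the model and parses it
- (optional) `_type`: identifies the name of the parser

When the output from the chat model or LLM is malformed, the parser can raise an `OutputParserException` to indicate that parsing failed because of bad input. Using this exception allows code that uses the parser to handle the exceptions in a consistent manner.

Because `BaseOutputParser` implements the `Runnable` interface, any custom parser you create this way becomes a valid LangChain `Runnable` and benefits from automatic async support, the batch interface, logging support, and so on.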

#### Simple Parser
Here is a simple parser that parses a string representation of a boolean (e.g., `YES` or `NO`) and converts it into the corresponding `bool` value.

In [14]:
from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import BaseOutputParser


# The [bool] describes a parameterization of a generic.
# It's basically indicating what the return type of parse is
# in this case the return type is either True or False
class BooleanOutputParser(BaseOutputParser[bool]):
    """Custom boolean parser."""

    true_val: str = "YES"
    false_val: str = "NO"

    def parse(self, text: str) -> bool:
        cleaned_text = text.strip().upper()
        print('------', cleaned_text)
        print('******', self.true_val.upper())
        print('++++++', self.false_val.upper())
        if cleaned_text not in (self.true_val.upper(), self.false_val.upper()):
            raise OutputParserException(
                f"BooleanOutputParser expected output value to either be "
                f"{self.true_val} or {self.false_val} (case-insensitive). "
                f"Received {cleaned_text}."
            )
        return cleaned_text == self.true_val.upper()

    @property
    def _type(self) -> str:
        return "boolean_output_parser"

In [15]:
parser = BooleanOutputParser()
parser.invoke("YES")

------ YES
****** YES
++++++ NO


True

In [16]:
try:
    parser.invoke("MEOW")
except Exception as e:
    print(f"Triggered an exception of type: {type(e)}")

------ MEOW
****** YES
++++++ NO
Triggered an exception of type: <class 'langchain_core.exceptions.OutputParserException'>


Let's test changing the parameterization:

In [17]:
parser = BooleanOutputParser(true_val="OKAY")
parser.invoke("OKAY")

------ OKAY
****** OKAY
++++++ NO


True

Let's confirm that the other LCEL methods are present:

In [18]:
parser.batch(["OKAY", "NO"])

------ OKAY
****** OKAY
++++++ NO
------ NO
****** OKAY
++++++ NO


[True, False]

In [19]:
await parser.abatch(["OKAY", "NO"])

------------ NO
****** OKAY
++++++ NO
 OKAY
****** OKAY
++++++ NO


[True, False]

In [None]:
from langchain_anthropic.chat_models import ChatAnthropic

anthropic = ChatAnthropic(model_name="claude-2.1")
anthropic.invoke("say OKAY or NO")

In [20]:
from langchain_openai import ChatOpenAI

# Despite the variable name, this is an OpenAI chat model.
anthropic = ChatOpenAI()
anthropic.invoke("say OKAY or NO")

AIMessage(content='OKAY', response_metadata={'token_usage': {'completion_tokens': 2, 'prompt_tokens': 12, 'total_tokens': 14}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-ed1828fb-6559-4a75-bc03-f3ba43be6a17-0')

Let's test that our parser works:

In [21]:
chain = anthropic | parser
chain.invoke("say OKAY or NO")

------ OKAY
****** OKAY
++++++ NO


True

### Parsing Raw Model Outputs
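Sometimes the model output carries important metadata besides the raw text. One example is tool calling, where the arguments intended for the called functions are returned in a separate property. If you need this finer-grained control, you can instead subclass the `BaseGenerationOutputParser` class.

This class requires a single method, `parse_result`. The method takes raw model output (e.g., a list of `Generation` or `ChatGeneration`) and returns the parsed output.

Supporting both `Generation` and `ChatGeneration` allows the parser to work with both regular LLMs and chat models.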

In [24]:
from typing import List

from langchain_core.exceptions import OutputParserException
from langchain_core.messages import AIMessage
from langchain_core.output_parsers import BaseGenerationOutputParser
from langchain_core.outputs import ChatGeneration, Generation


class StrInvertCase(BaseGenerationOutputParser[str]):
    """An example parser that inverts the case of the characters in the message.

    This is an example parser shown just for demonstration purposes and to keep
    the example as simple as possible.
    """

    def parse_result(self, result: List[Generation], *, partial: bool = False) -> str:
        """Parse a list of model Generations into a specific format.

        Args:
            result: A list of Generations to be parsed. The Generations are assumed
                to be different candidate outputs for a single model input.
                Many parsers assume that only a single generation is passed in.
                We will assert for that.
            partial: Whether to allow partial results. This is used for parsers
                     that support streaming
        """
        if len(result) != 1:
            raise NotImplementedError(
                "This output parser can only be used with a single generation."
            )
        generation = result[0]
        print('-----', generation)
        if not isinstance(generation, ChatGeneration):
            # Say that this one only works with chat generations
            raise OutputParserException(
                "This output parser can only be used with a chat generation."
            )
        return generation.message.content.swapcase()


chain = anthropic | StrInvertCase()

In [25]:
chain.invoke("Tell me a short sentence about yourself")

----- text='I am a curious and creative individual always seeking to learn and grow.' message=AIMessage(content='I am a curious and creative individual always seeking to learn and grow.', response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 14, 'total_tokens': 28}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-a1ba3c36-cc45-42d0-a67b-c09a21b6e35c-0')


'i AM A CURIOUS AND CREATIVE INDIVIDUAL ALWAYS SEEKING TO LEARN AND GROW.'

## Other parsers
### CSV parser
This output parser can be used when you want to return a list of comma-separated items.

In [26]:
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()
print('-----', format_instructions)
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

model = ChatOpenAI(temperature=0)

chain = prompt | model | output_parser

----- Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`


In [27]:
chain.invoke({"subject": "ice cream flavors"})

['Vanilla',
 'Chocolate',
 'Strawberry',
 'Mint Chocolate Chip',
 'Cookies and Cream']

In [28]:
for s in chain.stream({"subject": "ice cream flavors"}):
    print(s)

['vanilla']
['chocolate']
['strawberry']
['mint chocolate chip']
['cookies and cream']
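
The item-by-item streaming shown above can be approximated in plain Python. The following is a rough sketch under stated assumptions, not LangChain's implementation: `stream_comma_separated` is a hypothetical helper, the input is an iterable of text chunks, and items never contain commas themselves.

```python
from typing import Iterable, Iterator, List


def stream_comma_separated(chunks: Iterable[str]) -> Iterator[List[str]]:
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last comma is complete; the tail may still grow.
        *complete, buffer = buffer.split(",")
        for item in complete:
            if item.strip():
                yield [item.strip()]
    # Whatever remains once the stream ends is the final item.
    if buffer.strip():
        yield [buffer.strip()]


for items in stream_comma_separated(["van", "illa, choc", "olate,", " mint"]):
    print(items)
```

An item is only emitted once its trailing comma (or the end of the stream) has been seen, which is why a chunk boundary in the middle of a word does not produce a broken item.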


### Datetime parser
This OutputParser can be used to parse LLM output into datetime format.



In [29]:
from langchain.output_parsers import DatetimeOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

output_parser = DatetimeOutputParser()
print("------", output_parser.get_format_instructions())
template = """Answer the users question:

{question}

{format_instructions}"""
prompt = PromptTemplate.from_template(
    template,
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

------ Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 0516-09-30T21:08:17.621725Z, 1104-07-18T13:47:46.991974Z, 1901-10-19T15:54:08.968262Z

Return ONLY this string, no other words!


In [31]:
chain = prompt | OpenAI() | output_parser
output = chain.invoke({"question": "when was bitcoin founded?"})

In [32]:
output

datetime.datetime(2009, 1, 3, 18, 15, 5)
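
The pattern in the format instructions above is a standard `strptime` format string, so the parsing step can be reproduced with the standard library alone. This is a minimal sketch. `parse_datetime` is a hypothetical helper, and the sample string is made up to match the pattern.

```python
from datetime import datetime

# Same pattern as printed in the format instructions above.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str) -> datetime:
    # strptime raises ValueError when the text does not match the pattern.
    return datetime.strptime(text.strip(), DATETIME_FORMAT)


parsed = parse_datetime("2009-01-03T18:15:05.000000Z\n")
print(parsed)
```

A reply such as "January 3rd, 2009" would raise `ValueError`, which is why the instructions insist that the model return only the formatted string.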

### Enum parser
This notebook shows how to use an Enum output parser.

In [34]:
from langchain.output_parsers.enum import EnumOutputParser

from enum import Enum

class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
    
parser = EnumOutputParser(enum=Colors)
print("--------", parser.get_format_instructions())
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

prompt = PromptTemplate.from_template(
    """What color eyes does this person have?

> Person: {person}

Instructions: {instructions}"""
).partial(instructions=parser.get_format_instructions())

chain = prompt | ChatOpenAI() | parser

chain.invoke({"person": "Frank Sinatra"})

-------- Select one of the following options: red, green, blue


<Colors.BLUE: 'blue'>
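
Conceptually, mapping a model reply onto an enum member is a lookup by value. The sketch below uses only the standard library. `parse_enum` is a hypothetical helper, `Colors` is re-declared so the block is self-contained, and the error message wording is an assumption rather than what LangChain emits.

```python
from enum import Enum
from typing import Type, TypeVar

E = TypeVar("E", bound=Enum)


class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def parse_enum(text: str, enum_cls: Type[E]) -> E:
    try:
        # Enum classes support lookup by value: Colors("blue") is Colors.BLUE.
        return enum_cls(text.strip())
    except ValueError as exc:
        valid = ", ".join(str(member.value) for member in enum_cls)
        raise ValueError(f"{text!r} is not one of: {valid}") from exc


print(parse_enum(" blue\n", Colors))
```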

### JSON parser

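This output parser allows users to specify an arbitrary JSON schema and query LLMs for outputs that conform to that schema.

Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do this reliably, but Curie's ability already drops off dramatically.

You can optionally use Pydantic to declare your data model.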

In [35]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0)

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")
    
# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = JsonOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser



In [36]:
parser.get_format_instructions()

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}\n```'

In [37]:
chain.invoke({"query": joke_query})

{'setup': "Why couldn't the bicycle stand up by itself?",
 'punchline': 'Because it was two tired!'}

#### Streaming
This output parser supports streaming.

In [38]:
for s in chain.stream({"query": joke_query}):
    print(s)

{}
{'setup': ''}
{'setup': 'Why'}
{'setup': 'Why couldn'}
{'setup': "Why couldn't"}
{'setup': "Why couldn't the"}
{'setup': "Why couldn't the bicycle"}
{'setup': "Why couldn't the bicycle find"}
{'setup': "Why couldn't the bicycle find its"}
{'setup': "Why couldn't the bicycle find its way"}
{'setup': "Why couldn't the bicycle find its way home"}
{'setup': "Why couldn't the bicycle find its way home?"}
{'setup': "Why couldn't the bicycle find its way home?", 'punchline': ''}
{'setup': "Why couldn't the bicycle find its way home?", 'punchline': 'Because'}
{'setup': "Why couldn't the bicycle find its way home?", 'punchline': 'Because it'}
{'setup': "Why couldn't the bicycle find its way home?", 'punchline': 'Because it lost'}
{'setup': "Why couldn't the bicycle find its way home?", 'punchline': 'Because it lost its'}
{'setup': "Why couldn't the bicycle find its way home?", 'punchline': 'Because it lost its bearings'}
{'setup': "Why couldn't the bicycle find its way home?", 'punchline': '
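
The growing dictionaries above suggest one way partial JSON streaming can work: after each chunk, close whatever strings and brackets are still open and try to parse the result. The sketch below illustrates that idea with the standard library only. It is an assumption about the general technique, not LangChain's actual algorithm, and `close_partial_json` and `stream_partial_json` are hypothetical names.

```python
import json
from typing import Any, Iterable, Iterator, List


def close_partial_json(buffer: str) -> str:
    """Append the quote and brackets needed to close a truncated JSON prefix."""
    closers: List[str] = []  # closing brackets for currently open objects/arrays
    in_string = False
    escaped = False
    for ch in buffer:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    return buffer + ('"' if in_string else "") + "".join(reversed(closers))


def stream_partial_json(chunks: Iterable[str]) -> Iterator[Any]:
    buffer = ""
    last: Any = object()  # sentinel that never equals a parsed value
    for chunk in chunks:
        buffer += chunk
        try:
            parsed = json.loads(close_partial_json(buffer))
        except json.JSONDecodeError:
            continue  # this prefix cannot be completed into valid JSON yet
        if parsed != last:
            last = parsed
            yield parsed


for partial in stream_partial_json(["{", '"answer": "Ant', 'onie"}']):
    print(partial)
```

Prefixes that stop in the middle of a key, such as `{"ans`, cannot be closed into valid JSON, so the sketch simply waits for more input before yielding again.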

#### Without Pydantic
You can also use this parser without Pydantic. This prompts the model to return JSON but does not specify what the schema should be.

In [39]:
joke_query = "Tell me a joke."

parser = JsonOutputParser()

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)



In [40]:
parser.get_format_instructions()

'Return a JSON object.'

In [41]:
chain = prompt | model | parser

chain.invoke({"query": joke_query})

{'response': "Why couldn't the bicycle stand up by itself? Because it was two tired!"}

### OpenAI Functions
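These output parsers use OpenAI function calling to structure their outputs. This means they can only be used with models that support function calling. There are a few different variants:

- `JsonOutputFunctionsParser`: returns the arguments of the function call as JSON
- `PydanticOutputFunctionsParser`: returns the arguments of the function call as a Pydantic model
- `JsonKeyOutputFunctionsParser`: returns the value of a specific key in the function call as JSON
- `PydanticAttrOutputFunctionsParser`: returns the value of a specific key in the function call as a Pydantic model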
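
To make the difference between "all arguments" and "a specific key" concrete, here is a small standard-library sketch. It assumes a payload shaped like an OpenAI function-calling response, where the arguments arrive as a JSON-encoded string. The `function_call` dictionary and the `parse_function_arguments` helper are hypothetical and only illustrate the idea.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical payload: the function name plus its arguments as a JSON string.
function_call: Dict[str, str] = {
    "name": "Joke",
    "arguments": '{"setup": "Why did the tomato turn red?", "punchline": "It saw the salad dressing!"}',
}


def parse_function_arguments(call: Dict[str, str], key_name: Optional[str] = None) -> Any:
    # Decode the JSON-encoded arguments string into a dictionary.
    arguments = json.loads(call["arguments"])
    # Return everything, or only the requested key when one is given.
    return arguments if key_name is None else arguments[key_name]


print(parse_function_arguments(function_call))
print(parse_function_arguments(function_call, key_name="setup"))
```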

In [42]:
from langchain_community.utils.openai_functions import (
    convert_pydantic_to_openai_function,
)
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import ChatOpenAI

In [None]:
class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")


openai_functions = [convert_pydantic_to_openai_function(Joke)]
model = ChatOpenAI(temperature=0)
prompt = ChatPromptTemplate.from_messages(
    [("system", "You are helpful assistant"), ("user", "{input}")]
)




#### JsonOutputFunctionsParser


In [None]:
from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser
parser = JsonOutputFunctionsParser()
chain = prompt | model.bind(functions=openai_functions) | parser
chain.invoke({"input": "tell me a joke"})
for s in chain.stream({"input": "tell me a joke"}):
    print(s)

#### JsonKeyOutputFunctionsParser
This merely extracts a single key from the returned response. It is useful when you want to return a list of things.

In [None]:
from typing import List

from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser

class Jokes(BaseModel):
    """Jokes to tell user."""

    joke: List[Joke]
    funniness_level: int

parser = JsonKeyOutputFunctionsParser(key_name="joke")
openai_functions = [convert_pydantic_to_openai_function(Jokes)]
chain = prompt | model.bind(functions=openai_functions) | parser
chain.invoke({"input": "tell me two jokes"})
for s in chain.stream({"input": "tell me two jokes"}):
    print(s)
    

#### PydanticOutputFunctionsParser
This builds on top of `JsonOutputFunctionsParser` but passes the results to a Pydantic model. This allows for further validation should you choose.

In [None]:
from langchain.output_parsers.openai_functions import PydanticOutputFunctionsParser
class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field


parser = PydanticOutputFunctionsParser(pydantic_schema=Joke)
openai_functions = [convert_pydantic_to_openai_function(Joke)]
chain = prompt | model.bind(functions=openai_functions) | parser
chain.invoke({"input": "tell me a joke"})

### OpenAI Tools
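These output parsers extract tool calls from OpenAI's function calling API responses. This means they can only be used with models that support function calling, and specifically the latest `tools` and `tool_choice` parameters. We recommend familiarizing yourself with function calling before reading this guide.

There are a few different variants of output parsers:
- `JsonOutputToolsParser`: returns the arguments of the function call as JSON
- `JsonOutputKeyToolsParser`: returns the value of a specific key in the function call as JSON
- `PydanticToolsParser`: returns the arguments of the function call as a Pydantic model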



In [43]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import ChatOpenAI

class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0).bind_tools([Joke])
model.kwargs["tools"]

[{'type': 'function',
  'function': {'name': 'Joke',
   'description': 'Joke to tell user.',
   'parameters': {'type': 'object',
    'properties': {'setup': {'description': 'question to set up a joke',
      'type': 'string'},
     'punchline': {'description': 'answer to resolve the joke',
      'type': 'string'}},
    'required': ['setup', 'punchline']}}}]

In [45]:
prompt = ChatPromptTemplate.from_messages(
    [("system", "You are helpful assistant"), ("user", "{input}")]
)
prompt

ChatPromptTemplate(input_variables=['input'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are helpful assistant')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='{input}'))])

#### JsonOutputToolsParser


In [47]:
from langchain.output_parsers.openai_tools import JsonOutputToolsParser
parser = JsonOutputToolsParser()
chain = prompt | model | parser
chain.invoke({"input": "tell me a joke"})

[{'args': {'setup': "Why couldn't the bicycle stand up by itself?",
   'punchline': 'Because it was two tired!'},
  'type': 'Joke'}]

To include the tool call id, we can specify `return_id=True`:

In [48]:
parser = JsonOutputToolsParser(return_id=True)
chain = prompt | model | parser
chain.invoke({"input": "tell me a joke"})

[{'args': {'setup': "Why couldn't the bicycle stand up by itself?",
   'punchline': 'It was two tired!'},
  'id': 'call_cP6GiMGRPEXYiEiUccHRFUfK',
  'type': 'Joke'}]

#### JsonOutputKeyToolsParser
This merely extracts a single key from the returned response. It is useful when you pass in a single tool and just want its arguments.



In [None]:
from typing import List

from langchain.output_parsers.openai_tools import JsonOutputKeyToolsParser
parser = JsonOutputKeyToolsParser(key_name="Joke")
chain = prompt | model | parser
chain.invoke({"input": "tell me a joke"})




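Certain models can return multiple tool calls per invocation, so by default the output is a list. If we only want the first tool call returned, we can specify `first_tool_only=True`: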

In [None]:
parser = JsonOutputKeyToolsParser(key_name="Joke", first_tool_only=True)
chain = prompt | model | parser
chain.invoke({"input": "tell me a joke"})
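
The selection logic described above can be sketched in plain Python, operating on the list-of-dicts shape shown in the `JsonOutputToolsParser` output. `select_tool_args` and the sample `tool_calls` data are hypothetical. Returning `None` when nothing matches is an assumption of this sketch, not documented behaviour.

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical parsed tool calls: two calls to "Joke" and one to another tool.
tool_calls: List[Dict[str, Any]] = [
    {"type": "Joke", "args": {"setup": "Why?", "punchline": "Because."}},
    {"type": "Weather", "args": {"city": "Paris"}},
    {"type": "Joke", "args": {"setup": "How?", "punchline": "Like this."}},
]


def select_tool_args(
    calls: List[Dict[str, Any]], key_name: str, first_tool_only: bool = False
) -> Union[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    # Keep only the arguments of calls made to the named tool.
    matches = [call["args"] for call in calls if call["type"] == key_name]
    if first_tool_only:
        # A single dict (or None) instead of a list.
        return matches[0] if matches else None
    return matches


print(select_tool_args(tool_calls, "Joke"))
print(select_tool_args(tool_calls, "Joke", first_tool_only=True))
```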

#### PydanticToolsParser
This builds on top of `JsonOutputToolsParser` but passes the results to a Pydantic model. This allows for further validation should you choose.

In [None]:
from langchain.output_parsers.openai_tools import PydanticToolsParser
class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field


parser = PydanticToolsParser(tools=[Joke])
model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0).bind_tools([Joke])
chain = prompt | model | parser
chain.invoke({"input": "tell me a joke"})

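### Output-fixing parser
This output parser wraps another output parser, and in the event that the first one fails, it calls out to another LLM to fix any errors.

But we can do other things besides throw errors. Specifically, we can pass the misformatted output, along with the format instructions, to the model and ask it to fix it.

For this example, we'll use the Pydantic output parser above. Here is what happens if we pass it a result that does not comply with the schema: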

In [49]:
from typing import List

from langchain.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")

actor_query = "Generate the filmography for a random actor."
parser = PydanticOutputParser(pydantic_object=Actor)
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
parser.parse(misformatted)



OutputParserException: Invalid json output: {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}

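Now we can construct and use an `OutputFixingParser`. This output parser takes another output parser as an argument, as well as an LLM with which to try to correct any formatting mistakes.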

In [50]:
from langchain.output_parsers import OutputFixingParser

new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())

In [51]:
new_parser.parse(misformatted)

Actor(name='Tom Hanks', film_names=['Forrest Gump'])
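
The wrap-and-fix pattern itself is small enough to sketch without a model. In the illustration below, `parse_with_fix` is a hypothetical helper and `fake_fixer` is a stand-in for the LLM call: it only swaps single quotes for double quotes, which happens to repair this particular input. A real fixer would send the bad output and the error to a model.

```python
import json
from typing import Any, Callable


def parse_with_fix(
    text: str,
    parse: Callable[[str], Any],
    fix: Callable[[str, str], str],
    max_retries: int = 1,
) -> Any:
    for attempt in range(max_retries + 1):
        try:
            return parse(text)
        except ValueError as error:  # json.JSONDecodeError is a ValueError
            if attempt == max_retries:
                raise
            # Ask the fixer for a corrected version, then try again.
            text = fix(text, str(error))


def fake_fixer(bad_output: str, error: str) -> str:
    # Hypothetical stand-in for an LLM call; ignores the error message.
    return bad_output.replace("'", '"')


misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
actor = parse_with_fix(misformatted, json.loads, fake_fixer)
print(actor)
```

Passing the error message to the fixer mirrors the idea of giving the model both the bad output and the reason it failed. Capping the retries keeps a persistently bad reply from looping forever.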

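### Pandas DataFrame Parser
A Pandas DataFrame is a popular data structure in the Python programming language, commonly used for data manipulation and analysis. It provides a comprehensive set of tools for working with structured data, making it a versatile option for tasks such as data cleaning, transformation, and analysis.

This output parser allows users to specify an arbitrary Pandas DataFrame and query LLMs for data in the form of a formatted dictionary that extracts data from the corresponding DataFrame. Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate a well-formed query according to the defined format instructions.

Use a Pandas DataFrame object to declare the DataFrame you wish to query.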

In [None]:
import pprint
from typing import Any, Dict

import pandas as pd
from langchain.output_parsers import PandasDataFrameOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0)

# Solely for documentation purposes.
def format_parser_output(parser_output: Dict[str, Any]) -> None:
    for key in parser_output.keys():
        parser_output[key] = parser_output[key].to_dict()
    return pprint.PrettyPrinter(width=4, compact=True).pprint(parser_output)


# Define your desired Pandas DataFrame.
df = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
    }
)

# Set up a parser + inject instructions into the prompt template.
parser = PandasDataFrameOutputParser(dataframe=df)

# Here's an example of a column operation being performed.
df_query = "Retrieve the num_wings column."

# Set up the prompt.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
parser_output = chain.invoke({"query": df_query})

format_parser_output(parser_output)



In [None]:
# Here's an example of a row operation being performed.
df_query = "Retrieve the first row."

# Set up the prompt.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
parser_output = chain.invoke({"query": df_query})

format_parser_output(parser_output)

In [None]:
# Here's an example of a random Pandas DataFrame operation limiting the number of rows
df_query = "Retrieve the average of the num_legs column from rows 1 to 3."

# Set up the prompt.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
parser_output = chain.invoke({"query": df_query})

print(parser_output)

In [None]:
# Here's an example of a poorly formatted query
df_query = "Retrieve the mean of the num_fingers column."

# Set up the prompt.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
parser_output = chain.invoke({"query": df_query})

### Pydantic parser
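This output parser allows users to specify an arbitrary Pydantic model and query LLMs for outputs that conform to that schema.

Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do this reliably, but Curie's ability already drops off dramatically.

Use Pydantic to declare your data model. Pydantic's `BaseModel` is like a Python dataclass, but with actual type checking and coercion.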

In [None]:
from typing import List

from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0)

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field


# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": joke_query})

In [None]:
# Here's another example, but with a compound typed field.
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": actor_query})

#### Retry parser
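In some cases you can fix a parsing error by looking only at the output, but in other cases you cannot. One example is when the output is not just incorrectly formatted but also only partially complete. Consider the example below.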



In [None]:
from langchain.output_parsers import (
    OutputFixingParser,
    PydanticOutputParser,
)
from langchain_core.prompts import (
    PromptTemplate,
)
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI, OpenAI

template = """Based on the user question, provide an Action and Action Input for what step should be taken.
{format_instructions}
Question: {query}
Response:"""


class Action(BaseModel):
    action: str = Field(description="action to take")
    action_input: str = Field(description="input to the action")


parser = PydanticOutputParser(pydantic_object=Action)

prompt = PromptTemplate(
    template=template,
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

prompt_value = prompt.format_prompt(query="who is leo di caprios gf?")

bad_response = '{"action": "search"}'

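# The response omits the required `action_input` field, so parsing raises an OutputParserException.
parser.parse(bad_response)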
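The failure here is not a syntax problem. A minimal standard-library sketch shows that the response is valid JSON and only lacks a required field. `required_fields` is a hypothetical stand-in for the schema of the `Action` model.

```python
import json

bad_response = '{"action": "search"}'
required_fields = ("action", "action_input")  # mirrors the Action model's fields

data = json.loads(bad_response)  # valid JSON, so this step succeeds
missing = [name for name in required_fields if name not in data]
print(missing)
```

Because the missing value cannot be inferred from the output alone, any repair needs information from the original prompt.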

If we try to use the `OutputFixingParser` to fix this error, it will be confused: it does not know what to actually put for the action input.

In [None]:
fix_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())
fix_parser.parse(bad_response)

Instead, we can use the `RetryOutputParser`, which passes in the prompt (as well as the original output) to try again to get a better response.

In [None]:
from langchain.output_parsers import RetryOutputParser
retry_parser = RetryOutputParser.from_llm(parser=parser, llm=OpenAI(temperature=0))
retry_parser.parse_with_prompt(bad_response, prompt_value)
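Conceptually, a retry parser re-asks the model with both the original prompt and the failed completion. The sketch below illustrates that idea with the standard library only. It is not LangChain's implementation: `parse_action`, `parse_with_retry`, `fake_llm`, and the retry prompt wording are all hypothetical.

```python
import json
from typing import Callable

REQUIRED = ("action", "action_input")


def parse_action(text: str) -> dict:
    """Parse JSON and insist on every required key (stand-in for a real parser)."""
    data = json.loads(text)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


def parse_with_retry(completion: str, prompt: str,
                     llm: Callable[[str], str], max_retries: int = 1) -> dict:
    """On failure, re-ask the model with the original prompt plus the bad completion."""
    for attempt in range(max_retries + 1):
        try:
            return parse_action(completion)
        except ValueError as error:  # also covers json.JSONDecodeError
            if attempt == max_retries:
                raise
            retry_prompt = (
                f"Prompt:\n{prompt}\n"
                f"Completion:\n{completion}\n"
                f"The completion did not satisfy the prompt ({error}). Please try again:"
            )
            completion = llm(retry_prompt)
    raise AssertionError("unreachable")


def fake_llm(retry_prompt: str) -> str:
    # Hypothetical stub: a real model would read the question from retry_prompt.
    return '{"action": "search", "action_input": "leo di caprio girlfriend"}'


result = parse_with_retry('{"action": "search"}', "who is leo di caprios gf?", fake_llm)
print(result)
```

The key design point is that the retry prompt carries the question itself, which is exactly the information a fixer that sees only the output is missing.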

We can also add the `RetryOutputParser` easily with a custom chain that transforms the raw LLM/ChatModel output into a more workable format.

In [None]:
from langchain_core.runnables import RunnableLambda, RunnableParallel

completion_chain = prompt | OpenAI(temperature=0)

main_chain = RunnableParallel(
    completion=completion_chain, prompt_value=prompt
) | RunnableLambda(lambda x: retry_parser.parse_with_prompt(**x))


main_chain.invoke({"query": "who is leo di caprios gf?"})
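The chain works because the keys produced by the parallel step match the keyword arguments of `parse_with_prompt`. The plain-Python sketch below mimics only that data flow. Every function in it is a hypothetical stub, and it does not use or reproduce the LangChain runnables.

```python
def run_parallel(steps: dict, inputs: dict) -> dict:
    """Call every step with the same inputs and collect the results by key."""
    return {key: step(inputs) for key, step in steps.items()}


def fake_prompt(inputs: dict) -> str:
    return f"Answer the user query.\n{inputs['query']}\n"


def fake_completion(inputs: dict) -> str:
    # Hypothetical stub standing in for the prompt-plus-model branch.
    return '{"action": "search"}'


def fake_parse_with_prompt(completion: str, prompt_value: str) -> dict:
    # Only records what it received; a real retry parser would parse and retry.
    return {"completion": completion, "prompt_value": prompt_value}


branches = run_parallel(
    {"completion": fake_completion, "prompt_value": fake_prompt},
    {"query": "who is leo di caprios gf?"},
)
# The dict keys become keyword arguments, so they must match the parameter names.
received = fake_parse_with_prompt(**branches)
print(sorted(received))
```

Renaming either key would make the `**` unpacking fail with a `TypeError`, which is the most likely mistake when adapting this pattern.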

#### Structured output parser
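Use this output parser when you want to return multiple fields. Although the `Pydantic/JSON` parser is more powerful, this one is useful for less capable models.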

In [None]:
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(
        name="source",
        description="source used to answer the user's question, should be a website.",
    ),
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

We now get a string containing instructions for how the response should be formatted, and insert it into the prompt.

In [None]:
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)

model = ChatOpenAI(temperature=0)
chain = prompt | model | output_parser

In [None]:
chain.invoke({"question": "what's the capital of france?"})

In [None]:
for s in chain.stream({"question": "what's the capital of france?"}):
    print(s)
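To show what parsing such a response involves, here is a minimal standard-library sketch. It assumes the model replies with JSON wrapped in a fenced markdown snippet. The reply text, the `parse_markdown_json` helper, and the example URL are hypothetical, and this is not the parser's actual implementation.

```python
import json
import re

FENCE = "`" * 3  # built programmatically to avoid writing a literal fence here

# Hypothetical model reply wrapping JSON in a markdown code snippet.
reply = (
    "Sure, here is the answer.\n"
    f"{FENCE}json\n"
    '{"answer": "Paris", "source": "https://example.com/paris"}\n'
    f"{FENCE}\n"
)

expected_keys = ["answer", "source"]  # mirrors the two ResponseSchema names


def parse_markdown_json(text: str, keys: list) -> dict:
    """Pull the first fenced JSON snippet out of text and check its keys."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced JSON snippet found")
    data = json.loads(match.group(1))
    missing = [key for key in keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


parsed = parse_markdown_json(reply, expected_keys)
print(parsed["answer"])
```

The result is a plain dictionary keyed by field name, with no type validation beyond the presence of each key.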

#### XML parser
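This output parser lets you obtain results from an LLM in the popular XML format.

Keep in mind that large language models are leaky abstractions! You must use an LLM with sufficient capacity to generate well-formed XML.

In the following example we use the [`Claude Model`](https://docs.anthropic.com/claude/docs), which works well with XML tags.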

In [None]:
from langchain.output_parsers import XMLOutputParser
from langchain_community.chat_models import ChatAnthropic
from langchain_core.prompts import PromptTemplate

model = ChatAnthropic(model="claude-2", max_tokens_to_sample=512, temperature=0.1)

Let's start with a simple request to the model.

In [None]:
actor_query = "Generate the shortened filmography for Tom Hanks."
output = model.invoke(
    f"""{actor_query}
Please enclose the movies in <movie></movie> tags"""
)
print(output.content)

Finally, let's add some tags to tailor the output to our needs.

In [None]:
parser = XMLOutputParser(tags=["movies", "actor", "film", "name", "genre"])
prompt = PromptTemplate(
    template="""{query}\n{format_instructions}""",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)


chain = prompt | model | parser

output = chain.invoke({"query": actor_query})

print(output)

In [None]:
for s in chain.stream({"query": actor_query}):
    print(s)
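To illustrate how tagged text can become a nested Python structure, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The reply string and the `element_to_obj` helper are hypothetical, and the resulting shape is an assumption for illustration rather than a guarantee about what `XMLOutputParser` returns.

```python
import xml.etree.ElementTree as ET

# Hypothetical model reply; real output will differ.
reply = """<movies>
  <actor>
    <name>Tom Hanks</name>
    <film><name>Forrest Gump</name><genre>Drama</genre></film>
    <film><name>Toy Story</name><genre>Animation</genre></film>
  </actor>
</movies>"""


def element_to_obj(element: ET.Element):
    """Leaf elements become their text; others become a list of child dicts."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return [{child.tag: element_to_obj(child)} for child in children]


root = ET.fromstring(reply)
parsed = {root.tag: element_to_obj(root)}
film_names = [film.findtext("name") for film in root.iter("film")]
print(film_names)
```

A list of single-key dictionaries keeps repeated tags such as `film` in document order, which a plain dictionary keyed by tag name would not.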

#### YAML parser
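This output parser lets you specify an arbitrary schema and query an LLM for output that conforms to that schema, using YAML to format the response.

Keep in mind that large language models are leaky abstractions! You must use an LLM with sufficient capacity to generate well-formed YAML. In the OpenAI family, DaVinci can do this reliably, but Curie's ability already drops off dramatically.

You can optionally use Pydantic to declare your data model.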

In [None]:
from typing import List

from langchain.output_parsers import YamlOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0)

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")
    
# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = YamlOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": joke_query})