# How to return structured data from a model
:::info Prerequisites
This guide assumes familiarity with the following concepts:
- [Chat models](/docs/concepts/chat_models)
- [Function/tool calling](/docs/concepts/tool_calling)
:::
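It is often useful to have a model return output that matches a specific [schema](/docs/concepts/structured_outputs/). One common use case is extracting data from text to insert into a database or use with some other downstream system. This guide covers a few strategies for getting structured outputs from a model.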
## The `.with_structured_output()` method
<span data-heading-keywords="with_structured_output"></span>

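:::info Supported models
You can find a [list of models that support this method here](/docs/integrations/chat/).
:::
This is the easiest and most reliable way to get structured outputs. `with_structured_output()` is implemented for [models that provide native APIs for structuring outputs](/docs/integrations/chat/), such as tool/function calling or JSON mode, and uses these capabilities under the hood.
The method takes a schema as input, which specifies the names, types, and descriptions of the desired output attributes. It returns a model-like Runnable, except that instead of outputting strings or [messages](/docs/concepts/messages/) it outputs objects corresponding to the given schema. The schema can be specified as a TypedDict class, a [JSON Schema](https://json-schema.org/), or a Pydantic class. If a TypedDict or JSON Schema is used, the Runnable returns a dictionary; if a Pydantic class is used, it returns a Pydantic object.
As an example, let's have the model generate a joke and separate the setup from the punchline: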
import ChatModelTabs from "@theme/ChatModelTabs";
<ChatModelTabs
customVarName="llm"/>

In [1]:
# | output: false
# | echo: false

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

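### Pydantic class
If we want the model to return a Pydantic object, we just need to pass in the desired Pydantic class. The key advantage of using Pydantic is that the model-generated output is validated. Pydantic raises an error if any required field is missing or if any field has the wrong type.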

In [2]:
from typing import Optional

from pydantic import BaseModel, Field


# Pydantic
class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10"
    )


structured_llm = llm.with_structured_output(Joke)

structured_llm.invoke("Tell me a joke about cats")

Joke(setup='Why was the cat sitting on the computer?', punchline='Because it wanted to keep an eye on the mouse!', rating=7)
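
To make the validation point concrete without calling a model, here is a minimal sketch that feeds hand-written payloads to the same `Joke` class. The helper name `try_parse` and the sample payloads are hypothetical, and the sketch assumes `pydantic` is installed; it only illustrates the kind of error Pydantic raises, not how `with_structured_output` surfaces it.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10"
    )


def try_parse(payload: dict) -> Optional[Joke]:
    """Return a Joke, or None if the payload fails validation (hypothetical helper)."""
    try:
        return Joke(**payload)
    except ValidationError as err:
        # err.errors() lists each failing field and the reason.
        print(f"Rejected payload: {len(err.errors())} validation error(s)")
        return None


ok = try_parse({"setup": "Why?", "punchline": "Because."})
missing = try_parse({"setup": "Why?"})  # required field `punchline` is absent
wrong_type = try_parse({"setup": "Why?", "punchline": "Because.", "rating": "high"})
```

Only the first payload produces a `Joke`; the other two are rejected because of a missing required field and a non-integer `rating`.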

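:::tip
Beyond the structure of the Pydantic class, the name of the class, its docstring, and the names and descriptions of its parameters are very important. Most of the time `with_structured_output` uses the model's function/tool calling API, so you can effectively think of all of this information as being added to the model prompt.
:::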


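### TypedDict or JSON Schema
If you don't want to use Pydantic, explicitly don't want validation of the arguments, or want to be able to stream the model outputs, you can define your schema with a TypedDict class. You can optionally use a special `Annotated` syntax supported by LangChain that lets you specify the default value and description of a field. Note that the default value is *not* filled in automatically if the model doesn't generate the field; it is only used in the schema definition that is passed to the model.
:::info Requirements
- Core: `langchain-core>=0.2.26`
- Typing extensions: it is highly recommended to import `Annotated` and `TypedDict` from `typing_extensions` instead of `typing` to ensure consistent behavior across Python versions.
:::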

In [3]:
from typing import Optional

from typing_extensions import Annotated, TypedDict


# TypedDict
class Joke(TypedDict):
    """Joke to tell user."""

    setup: Annotated[str, ..., "The setup of the joke"]

    # Alternatively, we could have specified setup as:

    # setup: str                    # no default, no description
    # setup: Annotated[str, ...]    # no default, no description
    # setup: Annotated[str, "foo"]  # default, no description

    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]


structured_llm = llm.with_structured_output(Joke)

structured_llm.invoke("Tell me a joke about cats")

{'setup': 'Why was the cat sitting on the computer?',
 'punchline': 'Because it wanted to keep an eye on the mouse!',
 'rating': 7}
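
Because the defaults declared in the TypedDict are not filled in for you, you may want to apply them yourself after the call. The following is a minimal sketch of one way to do that with plain dictionaries; `JOKE_DEFAULTS` and `with_defaults` are hypothetical names, and the defaults table has to be kept in sync with the schema by hand.

```python
from typing import Any, Dict

# Hypothetical defaults table; it mirrors the default declared for `rating`.
JOKE_DEFAULTS: Dict[str, Any] = {"rating": None}


def with_defaults(result: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `result` with missing keys filled from `defaults`."""
    # Keys present in `result` win; only absent keys fall back to the default.
    return {**defaults, **result}


# A stand-in for a response where the model omitted the optional `rating` field.
partial = {
    "setup": "Why was the cat sitting on the computer?",
    "punchline": "Because it wanted to keep an eye on the mouse!",
}
complete = with_defaults(partial, JOKE_DEFAULTS)
```

The original dictionary is left untouched, so the raw model output stays available if you need it.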

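Equivalently, we can pass in a [JSON Schema](https://json-schema.org/) dict. This requires no imports or classes and makes it very clear how each parameter is documented, at the cost of being a bit more verbose.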

In [4]:
json_schema = {
    "title": "joke",
    "description": "Joke to tell user.",
    "type": "object",
    "properties": {
        "setup": {
            "type": "string",
            "description": "The setup of the joke",
        },
        "punchline": {
            "type": "string",
            "description": "The punchline to the joke",
        },
        "rating": {
            "type": "integer",
            "description": "How funny the joke is, from 1 to 10",
            "default": None,
        },
    },
    "required": ["setup", "punchline"],
}
structured_llm = llm.with_structured_output(json_schema)

structured_llm.invoke("Tell me a joke about cats")

{'setup': 'Why was the cat sitting on the computer?',
 'punchline': 'Because it wanted to keep an eye on the mouse!',
 'rating': 7}
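
Dictionary outputs are not validated the way Pydantic objects are, so a light sanity check can be useful before passing results downstream. The sketch below is a hand-rolled check of required keys and basic types. It is not a full JSON Schema validator, `check_against_schema` is a hypothetical helper, and it only understands the `string` and `integer` types used in this example.

```python
from typing import Any, Dict, List

# Maps JSON Schema type names to Python types (only those used here).
_JSON_TYPES = {"string": str, "integer": int}


def check_against_schema(result: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    required = schema.get("required", [])
    for key in required:
        if key not in result:
            problems.append(f"missing required key: {key}")
    for key, value in result.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            problems.append(f"unexpected key: {key}")
            continue
        # None is tolerated for optional fields such as `rating`.
        if value is None and key not in required:
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "setup": {"type": "string"},
        "punchline": {"type": "string"},
        "rating": {"type": "integer"},
    },
    "required": ["setup", "punchline"],
}
good = check_against_schema({"setup": "a", "punchline": "b", "rating": 7}, schema)
bad = check_against_schema({"setup": "a", "rating": "seven"}, schema)
```

For anything beyond simple flat schemas, a dedicated validation library is likely a better fit than extending this helper.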

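### Choosing between multiple schemas
The simplest way to let the model choose from multiple schemas is to create a parent schema that has a Union-typed attribute.
#### Using Pydantic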

In [7]:
from typing import Union


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10"
    )


class ConversationalResponse(BaseModel):
    """Respond in a conversational manner. Be kind and helpful."""

    response: str = Field(description="A conversational response to the user's query")


class FinalResponse(BaseModel):
    final_output: Union[Joke, ConversationalResponse]


structured_llm = llm.with_structured_output(FinalResponse)

structured_llm.invoke("Tell me a joke about cats")

FinalResponse(final_output=Joke(setup='Why was the cat sitting on the computer?', punchline='Because it wanted to keep an eye on the mouse!', rating=7))

In [8]:
structured_llm.invoke("How are you today?")

FinalResponse(final_output=ConversationalResponse(response="I'm just a computer program, so I don't have feelings, but I'm here and ready to help you with whatever you need!"))

#### Using TypedDict

In [9]:
from typing import Optional, Union

from typing_extensions import Annotated, TypedDict


class Joke(TypedDict):
    """Joke to tell user."""

    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]


class ConversationalResponse(TypedDict):
    """Respond in a conversational manner. Be kind and helpful."""

    response: Annotated[str, ..., "A conversational response to the user's query"]


class FinalResponse(TypedDict):
    final_output: Union[Joke, ConversationalResponse]


structured_llm = llm.with_structured_output(FinalResponse)

structured_llm.invoke("Tell me a joke about cats")

{'final_output': {'setup': 'Why was the cat sitting on the computer?',
  'punchline': 'Because it wanted to keep an eye on the mouse!',
  'rating': 7}}

In [10]:
structured_llm.invoke("How are you today?")

{'final_output': {'response': "I'm just a computer program, so I don't have feelings, but I'm here and ready to help you with whatever you need!"}}

The responses should be identical to those shown in the Pydantic example.
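
With the TypedDict variant the result is a plain dictionary, so there is no class to test with `isinstance`. One way to tell which branch the model chose is to look at the keys. The following is a minimal sketch under that assumption; `render` is a hypothetical helper and the sample inputs are hand-written.

```python
from typing import Any, Dict


def render(result: Dict[str, Any]) -> str:
    """Format a FinalResponse-style dict for display, based on which keys it has."""
    final = result["final_output"]
    if "response" in final:
        # ConversationalResponse branch
        return final["response"]
    if "setup" in final and "punchline" in final:
        # Joke branch
        return f"{final['setup']} {final['punchline']}"
    raise ValueError(f"Unrecognized final_output keys: {sorted(final)}")


joke_text = render(
    {"final_output": {"setup": "Knock knock.", "punchline": "Who's there?", "rating": 3}}
)
chat_text = render({"final_output": {"response": "Doing well, thanks!"}})
```

Key-based dispatch only works when the member schemas have distinguishing field names, which is the case for `Joke` and `ConversationalResponse` here.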

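Alternatively, you can use tool calling directly to let the model choose between options, provided [your chosen model supports it](/docs/integrations/chat/). This involves a bit more parsing and setup, but in some cases it leads to better performance because you don't have to use nested schemas. See [this how-to guide](/docs/how_to/tool_calling) for more details.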

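### Streaming
We can stream outputs from the structured model when the output type is a dict (i.e., when the schema is specified as a TypedDict class or a JSON Schema dict).
:::info
Note that what is yielded is already-aggregated chunks, not deltas.
:::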

In [11]:
from typing_extensions import Annotated, TypedDict


# TypedDict
class Joke(TypedDict):
    """Joke to tell user."""

    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]


structured_llm = llm.with_structured_output(Joke)

for chunk in structured_llm.stream("Tell me a joke about cats"):
    print(chunk)

{}
{'setup': ''}
{'setup': 'Why'}
{'setup': 'Why was'}
{'setup': 'Why was the'}
{'setup': 'Why was the cat'}
{'setup': 'Why was the cat sitting'}
{'setup': 'Why was the cat sitting on'}
{'setup': 'Why was the cat sitting on the'}
{'setup': 'Why was the cat sitting on the computer'}
{'setup': 'Why was the cat sitting on the computer?'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': ''}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an'}
{'setup': 'Why was the cat sitting on the computer?', 'punc
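
Since each streamed chunk is the aggregate so far, a UI that wants to append only the new text has to diff consecutive chunks itself. Here is a minimal sketch of that idea; `text_deltas` is a hypothetical helper, the input list is a hand-written stand-in for `structured_llm.stream(...)`, and only string fields are handled.

```python
from typing import Any, Dict, Iterable, Iterator


def text_deltas(chunks: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, str]]:
    """Yield only the newly added text per string field for each aggregated chunk."""
    previous: Dict[str, Any] = {}
    for chunk in chunks:
        delta = {}
        for key, value in chunk.items():
            old = previous.get(key, "")
            # Only string fields that extend their previous value are diffed;
            # other fields (such as an integer rating) are skipped.
            if isinstance(value, str) and isinstance(old, str) and value.startswith(old):
                if value != old:
                    delta[key] = value[len(old):]
        if delta:
            yield delta
        previous = chunk


# Stand-in for structured_llm.stream(...): a few aggregated chunks.
aggregated = [
    {},
    {"setup": ""},
    {"setup": "Why"},
    {"setup": "Why was"},
    {"setup": "Why was", "punchline": "Because"},
]
deltas = list(text_deltas(aggregated))
```

Chunks that add nothing new, such as the empty dict and the empty `setup` string at the start, produce no delta.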

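### Few-shot prompting
For more complex schemas it is very useful to add few-shot examples to the prompt. This can be done in a few ways.
The simplest and most universal way is to add examples to a system message in the prompt: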

In [12]:
from langchain_core.prompts import ChatPromptTemplate

system = """You are a hilarious comedian. Your specialty is knock-knock jokes. \
Return a joke which has the setup (the response to "Who's there?") and the final punchline (the response to "<setup> who?").

Here are some examples of jokes:

example_user: Tell me a joke about planes
example_assistant: {{"setup": "Why don't planes ever get tired?", "punchline": "Because they have rest wings!", "rating": 2}}

example_user: Tell me another joke about planes
example_assistant: {{"setup": "Cargo", "punchline": "Cargo 'vroom vroom', but planes go 'zoom zoom'!", "rating": 10}}

example_user: Now about caterpillars
example_assistant: {{"setup": "Caterpillar", "punchline": "Caterpillar really slow, but watch me turn into a butterfly and steal the show!", "rating": 5}}"""

prompt = ChatPromptTemplate.from_messages([("system", system), ("human", "{input}")])

few_shot_structured_llm = prompt | structured_llm
few_shot_structured_llm.invoke("what's something funny about woodpeckers")

{'setup': 'Woodpecker',
 'punchline': "Woodpecker you a joke, but I'm afraid it might be too 'hole-some'!",
 'rating': 7}
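
Note the doubled braces (`{{` and `}}`) in the system message above: they keep the JSON examples from being read as template variables. If you generate the examples from data rather than typing them by hand, the escaping can be automated. This is a minimal sketch with a hypothetical helper `format_examples`; it assumes the template uses f-string-style formatting, and it does not escape braces in the user text.

```python
import json
from typing import Dict, List, Tuple


def format_examples(examples: List[Tuple[str, Dict]]) -> str:
    """Render (user text, expected output) pairs as template-safe example lines."""
    lines = []
    for user_text, output in examples:
        rendered = json.dumps(output)
        # Double the braces so an f-string-style template treats them as literals.
        escaped = rendered.replace("{", "{{").replace("}", "}}")
        lines.append(f"example_user: {user_text}\nexample_assistant: {escaped}")
    return "\n\n".join(lines)


expected_output = {
    "setup": "Caterpillar",
    "punchline": "Caterpillar really slow!",
    "rating": 5,
}
examples_text = format_examples([("Now about caterpillars", expected_output)])

# str.format stands in for the template engine here: doubled braces collapse back.
rendered_once = examples_text.format()
```

The resulting `examples_text` can be appended to the system message before it is passed to `ChatPromptTemplate.from_messages`.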

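When the underlying method for structuring outputs is tool calling, we can pass in our examples as explicit tool calls. You can check whether the model you're using relies on tool calling in its API reference.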

In [13]:
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

examples = [
    HumanMessage("Tell me a joke about planes", name="example_user"),
    AIMessage(
        "",
        name="example_assistant",
        tool_calls=[
            {
                "name": "joke",
                "args": {
                    "setup": "Why don't planes ever get tired?",
                    "punchline": "Because they have rest wings!",
                    "rating": 2,
                },
                "id": "1",
            }
        ],
    ),
    # Most tool-calling models expect a ToolMessage(s) to follow an AIMessage with tool calls.
    ToolMessage("", tool_call_id="1"),
    # Some models also expect an AIMessage to follow any ToolMessages,
    # so you may need to add an AIMessage here.
    HumanMessage("Tell me another joke about planes", name="example_user"),
    AIMessage(
        "",
        name="example_assistant",
        tool_calls=[
            {
                "name": "joke",
                "args": {
                    "setup": "Cargo",
                    "punchline": "Cargo 'vroom vroom', but planes go 'zoom zoom'!",
                    "rating": 10,
                },
                "id": "2",
            }
        ],
    ),
    ToolMessage("", tool_call_id="2"),
    HumanMessage("Now about caterpillars", name="example_user"),
    AIMessage(
        "",
        tool_calls=[
            {
                "name": "joke",
                "args": {
                    "setup": "Caterpillar",
                    "punchline": "Caterpillar really slow, but watch me turn into a butterfly and steal the show!",
                    "rating": 5,
                },
                "id": "3",
            }
        ],
    ),
    ToolMessage("", tool_call_id="3"),
]
system = """You are a hilarious comedian. Your specialty is knock-knock jokes. \
Return a joke which has the setup (the response to "Who's there?") \
and the final punchline (the response to "<setup> who?")."""

prompt = ChatPromptTemplate.from_messages(
    [("system", system), ("placeholder", "{examples}"), ("human", "{input}")]
)
few_shot_structured_llm = prompt | structured_llm
few_shot_structured_llm.invoke({"input": "crocodiles", "examples": examples})

{'setup': 'Crocodile',
 'punchline': 'Crocodile be seeing you later, alligator!',
 'rating': 6}

For more on few-shot prompting when using tool calling, see [here](/docs/how_to/tools_few_shot/).

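### (Advanced) Specifying the method for structuring outputs
For models that support more than one way of structuring outputs (i.e., both tool calling and JSON mode), you can choose which one to use with the `method=` argument.
:::info JSON mode
If you use JSON mode, you still have to specify the desired schema in the model prompt. The schema you pass to `with_structured_output` is only used to parse the model output; it is not passed to the model the way it is with tool calling.

To see whether the model you're using supports JSON mode, check its entry in the [API reference](https://python.langchain.com/api_reference/langchain/index.html).
:::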

In [14]:
structured_llm = llm.with_structured_output(None, method="json_mode")

structured_llm.invoke(
    "Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys"
)

{'setup': 'Why was the cat sitting on the computer?',
 'punchline': 'Because it wanted to keep an eye on the mouse!'}
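
Because JSON mode leaves it to you to describe the schema in the prompt, it can help to build that part of the prompt from the schema itself so the two do not drift apart. The following is a minimal sketch; `build_json_mode_prompt` and the wording of the instructions are hypothetical, and the result is meant to be passed to `invoke` as a plain string.

```python
import json
from typing import Any, Dict


def build_json_mode_prompt(question: str, schema: Dict[str, Any]) -> str:
    """Append schema instructions to a question for use with JSON mode."""
    keys = ", ".join(f"`{name}`" for name in schema["properties"])
    return (
        f"{question}\n\n"
        f"Respond in JSON with the keys {keys}. "
        f"The output must conform to this JSON Schema:\n{json.dumps(schema, indent=2)}"
    )


joke_schema = {
    "type": "object",
    "properties": {
        "setup": {"type": "string", "description": "The setup of the joke"},
        "punchline": {"type": "string", "description": "The punchline to the joke"},
    },
    "required": ["setup", "punchline"],
}
prompt_text = build_json_mode_prompt("Tell me a joke about cats", joke_schema)
```

If you place this text inside a prompt template instead, remember that literal braces in the schema would need to be escaped.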

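### (Advanced) Raw outputs
LLMs aren't perfect at generating structured output, especially as schemas become complex. You can avoid raising exceptions and handle the raw output yourself by passing `include_raw=True`. This changes the output format to contain the raw message output, the parsed value (if parsing succeeded), and any resulting errors: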

In [17]:
structured_llm = llm.with_structured_output(Joke, include_raw=True)

structured_llm.invoke("Tell me a joke about cats")

{'raw': AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_f25ZRmh8u5vHlOWfTUw8sJFZ', 'function': {'arguments': '{"setup":"Why was the cat sitting on the computer?","punchline":"Because it wanted to keep an eye on the mouse!","rating":7}', 'name': 'Joke'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 33, 'prompt_tokens': 93, 'total_tokens': 126}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'stop', 'logprobs': None}, id='run-d880d7e2-df08-4e9e-ad92-dfc29f2fd52f-0', tool_calls=[{'name': 'Joke', 'args': {'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!', 'rating': 7}, 'id': 'call_f25ZRmh8u5vHlOWfTUw8sJFZ', 'type': 'tool_call'}], usage_metadata={'input_tokens': 93, 'output_tokens': 33, 'total_tokens': 126}),
 'parsed': {'setup': 'Why was the cat sitting on the computer?',
  'punchline': 'Because it wanted to keep an eye on the m
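
Once `include_raw=True` is set, your code is responsible for checking whether parsing succeeded. Here is a minimal sketch of that check. It assumes the returned dictionary uses the keys `raw`, `parsed`, and `parsing_error` (the last one is cut off in the output excerpt above), and `unwrap` and the sample dictionaries are hypothetical stand-ins rather than real model output.

```python
from typing import Any, Dict, Optional


def unwrap(result: Dict[str, Any]) -> Optional[Any]:
    """Return the parsed value, or None after reporting a parsing failure."""
    error = result.get("parsing_error")
    if error is not None:
        # The raw message is still available for logging or a retry.
        print(f"Parsing failed: {error!r}; raw output kept for inspection")
        return None
    return result.get("parsed")


# Hand-written stand-ins for the dict returned with include_raw=True.
success = {
    "raw": "<AIMessage ...>",
    "parsed": {"setup": "a", "punchline": "b"},
    "parsing_error": None,
}
failure = {
    "raw": "<AIMessage ...>",
    "parsed": None,
    "parsing_error": ValueError("bad JSON"),
}
parsed_ok = unwrap(success)
parsed_failed = unwrap(failure)
```

In a real application the failure branch is a natural place to log `result["raw"]` or to retry the call.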

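## Prompting and parsing model outputs directly
Not all models support `.with_structured_output()`, since not all models have tool calling or JSON mode support. For such models you need to prompt the model directly to use a specific format, and use an output parser to extract the structured response from the raw model output.
### Using `PydanticOutputParser`
The following example uses the built-in [`PydanticOutputParser`](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html) to parse the output of a chat model that has been prompted to match the given Pydantic schema. Note that we add `format_instructions` to the prompt directly from a method on the parser: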

In [31]:
from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: List[Person]


# Set up a parser
parser = PydanticOutputParser(pydantic_object=People)

# Prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user query. Wrap the output in `json` tags\n{format_instructions}",
        ),
        ("human", "{query}"),
    ]
).partial(format_instructions=parser.get_format_instructions())

Let's take a look at what information is sent to the model:

In [37]:
query = "Anna is 23 years old and she is 6 feet tall"

print(prompt.invoke({"query": query}).to_string())

System: Answer the user query. Wrap the output in `json` tags
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Identifying information about all people in a text.", "properties": {"people": {"title": "People", "type": "array", "items": {"$ref": "#/definitions/Person"}}}, "required": ["people"], "definitions": {"Person": {"title": "Person", "description": "Information about a person.", "type": "object", "properties": {"name": {"title": "Name", "description": "The name of the person", "type": "string"}, "height_in_meters": {"title": "Height In Meters", "description": "The heig

And now let's invoke it:

In [9]:
chain = prompt | llm | parser

chain.invoke({"query": query})

People(people=[Person(name='Anna', height_in_meters=1.8288)])

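For a deeper dive into using output parsers with prompting techniques for structured output, see [this guide](/docs/how_to/output_parser_structured).
### Custom parsing
You can also create a custom prompt and parser with [LangChain Expression Language (LCEL)](/docs/concepts/lcel), using a plain function to parse the output from the model: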

In [10]:
import json
import re
from typing import List

from langchain_core.messages import AIMessage
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: List[Person]


# Prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user query. Output your answer as JSON that  "
            "matches the given schema: ```json\n{schema}\n```. "
            "Make sure to wrap the answer in ```json and ``` tags",
        ),
        ("human", "{query}"),
    ]
).partial(schema=People.schema())


# Custom parser
def extract_json(message: AIMessage) -> List[dict]:
    """Extracts JSON content from a message where JSON is embedded between ```json and ``` tags.

    Parameters:
        message (AIMessage): The model message whose content contains the JSON.

    Returns:
        list: A list of parsed JSON objects, one per matched block.
    """
    text = message.content
    # Define the regular expression pattern to match JSON blocks
    pattern = r"```json(.*?)```"

    # Find all non-overlapping matches of the pattern in the string
    matches = re.findall(pattern, text, re.DOTALL)

    # Return the list of matched JSON strings, stripping any leading or trailing whitespace
    try:
        return [json.loads(match.strip()) for match in matches]
    except Exception:
        raise ValueError(f"Failed to parse: {message}")

Here is the prompt sent to the model:

In [11]:
query = "Anna is 23 years old and she is 6 feet tall"

print(prompt.format_prompt(query=query).to_string())

System: Answer the user query. Output your answer as JSON that  matches the given schema: ```json
{'title': 'People', 'description': 'Identifying information about all people in a text.', 'type': 'object', 'properties': {'people': {'title': 'People', 'type': 'array', 'items': {'$ref': '#/definitions/Person'}}}, 'required': ['people'], 'definitions': {'Person': {'title': 'Person', 'description': 'Information about a person.', 'type': 'object', 'properties': {'name': {'title': 'Name', 'description': 'The name of the person', 'type': 'string'}, 'height_in_meters': {'title': 'Height In Meters', 'description': 'The height of the person expressed in meters.', 'type': 'number'}}, 'required': ['name', 'height_in_meters']}}}
```. Make sure to wrap the answer in ```json and ``` tags
Human: Anna is 23 years old and she is 6 feet tall


And here's what it looks like when we invoke it:

In [12]:
chain = prompt | llm | extract_json

chain.invoke({"query": query})

[{'people': [{'name': 'Anna', 'height_in_meters': 1.8288}]}]
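
Note that the result is a list with one entry per fenced JSON block, which is why the output above is wrapped in `[...]`. To try the extraction logic without calling a model, here is a minimal sketch of the same regular-expression approach applied to a plain string. `extract_json_from_text` and the sample text are hypothetical, and the fence marker is built from a constant only to keep this example readable.

```python
import json
import re
from typing import List

FENCE = "`" * 3  # three backticks


def extract_json_from_text(text: str) -> List[dict]:
    """Parse every JSON block wrapped in a json-tagged fence within `text`."""
    pattern = FENCE + r"json(.*?)" + FENCE
    # re.DOTALL lets `.` match newlines, so multi-line JSON blocks are captured.
    matches = re.findall(pattern, text, re.DOTALL)
    return [json.loads(match.strip()) for match in matches]


sample = (
    f"Here you go:\n{FENCE}json\n"
    '{"people": [{"name": "Anna", "height_in_meters": 1.8288}]}'
    f"\n{FENCE}\nAnything else?"
)
parsed = extract_json_from_text(sample)
unfenced = extract_json_from_text('{"people": []}')
```

When the model forgets the fence, the function returns an empty list rather than raising, so callers may want to treat an empty result as a failure.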