# Environment Setup

In [1]:
from dotenv import load_dotenv
import os

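# Load the variables from .env into os.environ
load_dotenv('.env')
# load_dotenv() puts the .env values into os.environ, so the OpenAI client can read OPENAI_API_KEY directly.
# Fail early with a clear message if the key is missing.
if not os.getenv('OPENAI_API_KEY'):
    raise RuntimeError('OPENAI_API_KEY is not set; add it to the .env file')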

# Prompt

## Output Parsers

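Output parsers are classes that help structure language model responses. An output parser must implement two main methods:

- `get_format_instructions() -> str`: returns a string containing instructions for how the language model's output should be formatted.
- `parse(str) -> Any`: takes a string (assumed to be the language model's response) and parses it into some structure.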

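There is also one optional method:

- `parse_with_prompt(str, PromptValue) -> Any`: takes a string (assumed to be the language model's response) and a prompt (assumed to be the prompt that generated that response) and parses the string into some structure. The prompt is provided mainly in case the OutputParser wants to retry or fix the output and needs information from the prompt to do so.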
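The interface above can be sketched in plain Python with an abstract base class. This is a minimal illustration only: `SketchOutputParser` and `UpperWordsParser` are hypothetical names, not LangChain classes, and the prompt argument is simplified to a plain `str` instead of a `PromptValue`.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class SketchOutputParser(ABC):
    """Hypothetical stand-in for the interface described above (not a LangChain class)."""

    @abstractmethod
    def get_format_instructions(self) -> str:
        """Return text telling the model how to format its answer."""

    @abstractmethod
    def parse(self, text: str) -> Any:
        """Turn the raw model response into a structured value."""

    def parse_with_prompt(self, text: str, prompt: str) -> Any:
        # Optional hook: this default ignores the prompt; a retrying parser could use it.
        return self.parse(text)


class UpperWordsParser(SketchOutputParser):
    """Hypothetical toy parser: expects whitespace-separated words, returns them upper-cased."""

    def get_format_instructions(self) -> str:
        return "Reply with words separated by spaces."

    def parse(self, text: str) -> List[str]:
        return [word.upper() for word in text.split()]


toy_parser = UpperWordsParser()
print(toy_parser.get_format_instructions())
print(toy_parser.parse("  red green\nblue "))  # ['RED', 'GREEN', 'BLUE']
```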

Below we introduce the main type of output parser, `PydanticOutputParser`. See the `examples` folder for other options.

### PydanticOutputParser

This output parser lets users specify an arbitrary JSON schema and query an LLM (large language model) for JSON output that conforms to that schema.

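Keep in mind that large language models are leaky abstractions! You need an LLM with enough capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do this reliably, but Curie's ability already drops off dramatically.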

Use Pydantic to declare your data model. Pydantic's BaseModel is like a Python dataclass, but with actual type checking and coercion.

In [2]:
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
 
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

In [40]:
model = OpenAI(temperature=0.0)

In [5]:
# Define the desired data structure
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")
 
    # You can easily add custom validation logic with Pydantic.
    # Here we check that `setup` is phrased as a question.
    @validator('setup')
    def question_ends_with_question_mark(cls, field):
        # endswith() also copes with an empty string, where field[-1] would raise IndexError
        if not field.endswith('?'):
            raise ValueError("Badly formed question!")
        return field

In [6]:
# Set up a parser; its format instructions are injected into the prompt template below.
parser = PydanticOutputParser(pydantic_object=Joke)

In [26]:
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}
```


In [33]:
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
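`partial_variables` pre-fills `format_instructions` when the template is built, so only `query` has to be supplied later. The effect roughly resembles merging two dictionaries before calling `str.format`; the sketch below shows that idea with the standard library. `format_with_partials` is a hypothetical helper, not how `PromptTemplate` is actually implemented.

```python
from string import Formatter


def format_with_partials(template: str, partials: dict, **values: str) -> str:
    """Hypothetical helper: fill pre-bound partial variables plus per-call values."""
    merged = {**partials, **values}
    # Formatter().parse yields (literal, field_name, format_spec, conversion) tuples.
    needed = {field for _, field, _, _ in Formatter().parse(template) if field}
    missing = needed - set(merged)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return template.format(**merged)


demo_template = "Answer the user query.\n{format_instructions}\n{query}\n"
demo_partials = {"format_instructions": "Reply with a JSON object."}
print(format_with_partials(demo_template, demo_partials, query="Tell me a joke."))
```

Binding the instructions once keeps every later call focused on the part that actually changes, the user's query.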

In [34]:
# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."
_input = prompt.format_prompt(query=joke_query)

In [41]:
output = model(_input.to_string())

In [43]:
output

'\n{"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side!"}'

In [44]:
parser.parse(output)

setup='Why did the chicken cross the road?' punchline='To get to the other side!'
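Conceptually, `parse` has to locate the JSON in the completion (note the leading `\n` in the raw output above), decode it, check the required fields, and run the validator. The sketch below outlines those steps with the standard library only. `JokeSketch` and `parse_joke` are hypothetical stand-ins, a dataclass replaces Pydantic, and searching for the outermost `{...}` with a regex is an assumption about how the JSON is located, not LangChain's exact code.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class JokeSketch:
    """Hypothetical stdlib stand-in for the Pydantic `Joke` model."""
    setup: str
    punchline: str

    def __post_init__(self) -> None:
        # Same rule as the @validator on Joke.
        if not self.setup.endswith("?"):
            raise ValueError("Badly formed question!")


def parse_joke(completion: str) -> JokeSketch:
    # Assumed strategy: take everything from the first "{" to the last "}".
    match = re.search(r"\{.*\}", completion, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object found in {completion!r}")
    data = json.loads(match.group(0))
    missing = [key for key in ("setup", "punchline") if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return JokeSketch(setup=data["setup"], punchline=data["punchline"])


raw = '\n{"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side!"}'
print(parse_joke(raw))
```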


In [32]:
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

### Comma-Separated List Output Parser

In [45]:
from langchain.output_parsers import CommaSeparatedListOutputParser

In [50]:
output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List five {subject}. Please reply in Chinese.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions}
)

In [51]:
format_instructions

'Your response should be a list of comma separated values, eg: `foo, bar, baz`'

In [52]:
_input = prompt.format(subject="冰激凌口味")  # "ice cream flavors"
output = model(_input)
output_parser.parse(output)

['草莓', '芒果', '抹茶', '巧克力', '香草']

In [53]:
output

'\n\n草莓, 芒果, 抹茶, 巧克力, 香草'
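The raw output above arrives with leading newlines, so parsing amounts to trimming the text and splitting it on commas. A minimal sketch of that idea follows; `parse_comma_list` is a hypothetical helper that splits on `,` and trims each item, which may be more lenient than the library's own rule. It would not split on the full-width Chinese comma `，`, which a model replying in Chinese might produce.

```python
from typing import List


def parse_comma_list(text: str) -> List[str]:
    # Hypothetical helper: trim, split on ASCII commas, trim each item, drop empty items.
    return [item.strip() for item in text.strip().split(",") if item.strip()]


raw = '\n\n草莓, 芒果, 抹茶, 巧克力, 香草'
print(parse_comma_list(raw))  # ['草莓', '芒果', '抹茶', '巧克力', '香草']
```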

### Datetime Output Parser

This OutputParser shows how to parse LLM output into a datetime.

In [54]:
from langchain.prompts import PromptTemplate
from langchain.output_parsers import DatetimeOutputParser
from langchain.chains import LLMChain
from langchain.llms import OpenAI

In [55]:
output_parser = DatetimeOutputParser()
template = """Answer the user's question:

{question}

{format_instructions}"""
prompt = PromptTemplate.from_template(template, partial_variables={"format_instructions": output_parser.get_format_instructions()})

In [56]:
chain = LLMChain(prompt=prompt, llm=OpenAI())

In [57]:
output = chain.run("around when was bitcoin founded?")

In [59]:
output

'\n\n2008-01-03T18:15:05.000000Z'

In [60]:
output_parser.parse(output)

datetime.datetime(2008, 1, 3, 18, 15, 5)
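The parsed value above is what `datetime.strptime` produces for this string once the surrounding newlines are stripped, assuming the format `%Y-%m-%dT%H:%M:%S.%fZ`. Because the trailing `Z` is matched as a literal character, the result is a naive datetime with no time zone attached, as in the output above. `parse_datetime` is a hypothetical helper for illustration.

```python
from datetime import datetime

# Assumed format, chosen to match the model output above.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str) -> datetime:
    # Hypothetical helper: strptime would reject the leading newlines, so strip them first.
    return datetime.strptime(text.strip(), DATETIME_FORMAT)


print(repr(parse_datetime('\n\n2008-01-03T18:15:05.000000Z')))
# datetime.datetime(2008, 1, 3, 18, 15, 5)
```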

### Enum Output Parser

In [62]:
from langchain.output_parsers.enum import EnumOutputParser
from enum import Enum

In [63]:
class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
parser = EnumOutputParser(enum=Colors)

In [65]:
parser.parse('red')

<Colors.RED: 'red'>

In [69]:
# Surrounding whitespace and newlines are stripped before matching
parser.parse(" blue\n")

<Colors.BLUE: 'blue'>
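Both calls come down to stripping the response and looking it up by value in the enum. A minimal sketch of that behaviour, with an error message listing the allowed values; `parse_enum` is a hypothetical helper, not the library's implementation.

```python
from enum import Enum


class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def parse_enum(enum_cls, response: str):
    # Hypothetical helper: strip whitespace, then look the member up by value.
    value = response.strip()
    try:
        return enum_cls(value)  # e.g. Colors("blue") -> Colors.BLUE
    except ValueError:
        allowed = ", ".join(member.value for member in enum_cls)
        raise ValueError(f"{value!r} is not one of: {allowed}") from None


print(repr(parse_enum(Colors, " blue\n")))  # <Colors.BLUE: 'blue'>
```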

### OutputFixingParser

This output parser wraps another output parser and tries to fix any mistakes.

The Pydantic guardrail simply tries to parse the LLM response; if the response does not parse correctly, it raises an error.

But we can do more than raise errors. Specifically, we can pass the misformatted output, along with the format instructions, to the model and ask it to fix it.

For this example, we'll use the Pydantic output parser from above. Here's what happens if we pass it a result that does not comply with the schema:

In [72]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

In [73]:
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")
        
actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)

In [74]:
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

In [75]:
parser.parse(misformatted)

OutputParserException: Failed to parse Actor from completion {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}. Got: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
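The error text comes from Python's `json` module: JSON only allows double-quoted strings, and this completion uses Python-style single quotes. For this specific kind of mistake a deterministic repair is possible without an LLM, for example reading the text as a Python literal with `ast.literal_eval`. This is a swapped-in technique shown for comparison, not what `OutputFixingParser` does, and it only helps when the text happens to be a valid Python literal.

```python
import ast
import json

misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

try:
    json.loads(misformatted)
except json.JSONDecodeError as err:
    print(err)  # Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

# Non-LLM repair for this particular case: evaluate it as a Python literal, then re-encode as JSON.
data = ast.literal_eval(misformatted)
print(json.dumps(data))  # {"name": "Tom Hanks", "film_names": ["Forrest Gump"]}
```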

In [76]:
from langchain.output_parsers import OutputFixingParser

new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())

In [77]:
new_parser.parse(misformatted)

Actor(name='Tom Hanks', film_names=['Forrest Gump'])

In [78]:
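# Name and film are swapped here: 战狼 (Wolf Warrior) is a film, 吴京 (Wu Jing) is the actor
misformatted = "{'name': '战狼', 'film_names': ['吴京']}"
# The fixing LLM swaps them back, but also adds 流浪地球 (The Wandering Earth), which is not in the input
new_parser.parse(misformatted)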

Actor(name='吴京', film_names=['战狼', '流浪地球'])
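The fixing strategy can be outlined in a few lines: try the wrapped parser, and on failure send the format instructions, the bad completion, and the error message to a model, then parse its reply. This is a sketch under assumptions: `fix_and_parse`, `parse_actor`, and `fake_llm` are hypothetical, the repair prompt wording is invented, and `fake_llm` simply returns a corrected string so the example runs offline. The original prompt is never sent, so this approach cannot recover information that is missing from the completion itself.

```python
import json
from typing import Any, Callable, Dict, List

FORMAT_INSTRUCTIONS = 'Return JSON with a string "name" and a list of strings "film_names".'


def parse_actor(text: str) -> Dict[str, Any]:
    # Hypothetical parser; json.JSONDecodeError is a subclass of ValueError.
    data = json.loads(text)
    if not isinstance(data, dict) or not isinstance(data.get("name"), str) \
            or not isinstance(data.get("film_names"), list):
        raise ValueError("completion does not match the Actor schema")
    return data


def fix_and_parse(text: str, parse: Callable[[str], Any], instructions: str,
                  llm: Callable[[str], str]) -> Any:
    # Hypothetical fixing wrapper: one repair attempt, no original prompt.
    try:
        return parse(text)
    except ValueError as err:
        fix_prompt = (
            f"Instructions:\n{instructions}\n\n"
            f"Completion:\n{text}\n\n"
            f"The completion failed with this error:\n{err}\n\n"
            "Rewrite the completion so that it satisfies the instructions."
        )
        return parse(llm(fix_prompt))


prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call: records the prompt, returns a valid answer.
    prompts_seen.append(prompt)
    return '{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}'


misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
print(fix_and_parse(misformatted, parse_actor, FORMAT_INSTRUCTIONS, fake_llm))
```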

### RetryOutputParser
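In some cases a parsing error can be fixed just by looking at the output, but in others it cannot, for example when the output is not only in the wrong format but also incomplete. Consider the example below.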

In [88]:
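# Note: this template is not used below; the PromptTemplate in the next cell defines its own template string.
template = """Based on the user question, provide an Action and Action Input for what step should be taken. Please reply in Chinese.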
{format_instructions}
Question: {query}
Response:"""
class Action(BaseModel):
    action: str = Field(description="action to take")
    action_input: str = Field(description="input to the action")
        
parser = PydanticOutputParser(pydantic_object=Action)

In [89]:
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

In [90]:
# Query: who is Leo DiCaprio's girlfriend?
prompt_value = prompt.format_prompt(query="who is leo di caprios gf?")

In [92]:
bad_response = '{"action": "search"}'
parser.parse(bad_response)

OutputParserException: Failed to parse Action from completion {"action": "search"}. Got: 1 validation error for Action
action_input
  field required (type=value_error.missing)

In [93]:
fix_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())

In [94]:
fix_parser.parse(bad_response)

Action(action='search', action_input='')

In [95]:
from langchain.output_parsers import RetryWithErrorOutputParser

In [96]:
retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=OpenAI(temperature=0))

In [97]:
retry_parser.parse_with_prompt(bad_response, prompt_value)

Action(action='search', action_input='who is leo di caprios gf?')

In [102]:
print(prompt_value.to_string())

Answer the user query.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"action": {"title": "Action", "description": "action to take", "type": "string"}, "action_input": {"title": "Action Input", "description": "input to the action", "type": "string"}}, "required": ["action", "action_input"]}
```
who is leo di caprios gf?



In [103]:
model = OpenAI()
model(prompt_value.to_string())

'\n{"action": "find", "action_input": "Leo DiCaprio\'s girlfriend"}'
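The difference between the two recovery strategies is what the repair model gets to see. `OutputFixingParser` only receives the bad completion, so it had no way to know what `action_input` should be and filled in an empty string. `RetryWithErrorOutputParser.parse_with_prompt` also receives the original prompt, so the query is available to reconstruct the missing field. The sketch below illustrates that idea; `retry_with_error`, `parse_action`, and `fake_llm` are hypothetical, the retry wording is invented, and `fake_llm` can only fill `action_input` when the original question appears in its input.

```python
import json
from typing import Any, Callable, Dict


def parse_action(text: str) -> Dict[str, Any]:
    # Hypothetical parser requiring both fields of the Action schema.
    data = json.loads(text)
    missing = [key for key in ("action", "action_input") if key not in data]
    if missing:
        raise ValueError(f"field required: {missing}")
    return data


def retry_with_error(completion: str, prompt: str, parse: Callable[[str], Any],
                     llm: Callable[[str], str]) -> Any:
    # Hypothetical retry wrapper: unlike a fix-only approach, it resends the original prompt.
    try:
        return parse(completion)
    except ValueError as err:
        retry_prompt = (
            f"Prompt:\n{prompt}\n"
            f"Completion:\n{completion}\n"
            f"The completion failed with this error:\n{err}\n"
            "Answer the prompt again, following its format instructions:"
        )
        return parse(llm(retry_prompt))


QUESTION = "who is leo di caprios gf?"


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model: it can only use the question if it was given the question.
    action_input = QUESTION if QUESTION in prompt else ""
    return json.dumps({"action": "search", "action_input": action_input})


original_prompt = "Answer the user query with JSON keys action and action_input.\n" + QUESTION
bad_response = '{"action": "search"}'
print(retry_with_error(bad_response, original_prompt, parse_action, fake_llm))
```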

### Structured Output Parser
While the Pydantic/JSON parser is more powerful, we initially experimented with data structures that have text fields only.

In [104]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

In [105]:
response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source used to answer the user's question, should be a website.")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [107]:
print(output_parser.get_format_instructions())

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
	"answer": string  // answer to the user's question
	"source": string  // source used to answer the user's question, should be a website.
}
```


In [106]:
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="answer the user's question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)

In [108]:
model = OpenAI(temperature=0)
_input = prompt.format_prompt(question="what's the capital of france?")
output = model(_input.to_string())

In [109]:
output_parser.parse(output)

{'answer': 'Paris',
 'source': 'https://www.worldatlas.com/articles/what-is-the-capital-of-france.html'}
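Because the format instructions ask for a fenced `json` snippet, parsing means finding that snippet, decoding it, and checking that every `ResponseSchema` name is present. The sketch below is a hypothetical outline using the standard library: `parse_markdown_json` is an invented helper, falling back to the whole text when no fence is found is an assumption, and the sample completion is made up rather than the model output behind the result above.

```python
import json
import re
from typing import Any, Dict, List

FENCE = "`" * 3  # three backticks, built at runtime


def parse_markdown_json(text: str, required: List[str]) -> Dict[str, Any]:
    # Hypothetical helper: capture the body of a fenced block, optionally tagged "json".
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text  # assumed fallback: treat the whole text as JSON
    data = json.loads(payload)
    missing = [name for name in required if name not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Made-up sample completion, for illustration only.
sample = FENCE + 'json\n{\n  "answer": "Paris",\n  "source": "https://example.com"\n}\n' + FENCE
print(parse_markdown_json(sample, ["answer", "source"]))
```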