## Installing dependencies

In [None]:
!pip install python-dotenv
!pip install "openai<1"  # the code below uses the pre-1.0 openai.ChatCompletion interface

## Setting the OpenAI key environment variable

The OpenAI key below has been redacted; substitute your own OpenAI key.

In [75]:
%env OPENAI_API_KEY=sk-OWDAUVJgZKKRyNVUKWN9T3Bl====xbZP7XC0

env: OPENAI_API_KEY=sk-OWDAUVJgZKKRyNVUKWN9T3Bl====xbZP7XC0


## Loading the OpenAI key

In [6]:
import os
import openai

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']
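Indexing `os.environ` directly raises a bare `KeyError` when the variable is missing. A minimal sketch of a friendlier lookup follows; `require_env` and `DEMO_API_KEY` are hypothetical names used only for illustration and are not part of the notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or export it first")
    return value


# Placeholder value for demonstration only; never hard-code a real key.
os.environ.setdefault("DEMO_API_KEY", "sk-demo-placeholder")
print(require_env("DEMO_API_KEY"))
```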

## Wrapping a helper function: prompt -> response

In [7]:
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message["content"]

In [8]:
get_completion('what is 1+1?')

'As an AI language model, I can tell you that the answer to 1+1 is 2.'

## Chat API: OpenAI

The following example translates text into a target language with GPT, without using LangChain.

In [9]:
origin_text = """
我会发着呆，然后微微笑，接着闭上眼。
"""

In [10]:
target_language = "English"

In [23]:
prompt = f"""
Please help me translate the following text into {target_language}.

```
{origin_text}
```

You only need to output the translated result.
"""
prompt

'\nPlease help me translate the following text into English.\n\n```\n\n我会发着呆，然后微微笑，接着闭上眼。\n\n```\n\nYou only need to output the translated result.\n'

In [22]:
response = get_completion(prompt)
response

'"I will stare blankly, then smile slightly, and then close my eyes."'
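Without LangChain, the simplest way to reuse this prompt is to wrap the f-string in a function. The sketch below is one possible shape, not code from the notebook: `build_translation_prompt` is a hypothetical name, and it delimits the text with `<text>` tags instead of a backtick fence purely to keep the example easy to read.

```python
def build_translation_prompt(text: str, language: str) -> str:
    """Build the translation prompt from a piece of text and a target language."""
    return (
        f"Please help me translate the following text into {language}.\n\n"
        f"<text>\n{text.strip()}\n</text>\n\n"
        "You only need to output the translated result."
    )


origin_text = "我会发着呆，然后微微笑，接着闭上眼。"
for language in ("English", "French"):
    print(build_translation_prompt(origin_text, language))
    print("-" * 20)
```

Each resulting string could then be passed to `get_completion` as before.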

## Chat API: LangChain

The following reimplements the example above with LangChain.

In [None]:
!pip install --upgrade langchain

### Model

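LangChain wraps the ChatGPT API. The `temperature` parameter defaults to 0.7; the higher its value, the more random and creative (that is, prone to making things up) the output becomes, so we set it to 0.0 here.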
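To build some intuition for why a lower temperature gives less random output, here is a sketch of the textbook softmax-with-temperature calculation. How the OpenAI service applies temperature internally is not covered here, so treat this as an illustration under that assumption; the scores are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, probability mass spreads from the top candidate to the others, which is what makes sampled output more varied.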

In [21]:
from langchain.chat_models import ChatOpenAI

# To control the randomness and creativity of the generated
# text by an LLM, use temperature = 0.0
chat = ChatOpenAI(temperature=0.0)
chat

ChatOpenAI(verbose=False, callbacks=None, callback_manager=None, tags=None, client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, model_name='gpt-3.5-turbo', temperature=0.0, model_kwargs={}, openai_api_key='sk-OWDAUVJgZKKRyNVUKWN9T3Bl====xbZP7XC0', openai_api_base='', openai_organization='', openai_proxy='', request_timeout=None, max_retries=6, streaming=False, n=1, max_tokens=None)

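The output above shows that calling the `ChatOpenAI` constructor creates a model object. It uses the gpt-3.5-turbo model by default, along with the `temperature` we set and the OpenAI key bound to it; the remaining parameters are not needed for now.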

### Prompt template

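Terminology:

- prompt: usually a reusable template into which variables can be inserted
- message: the question finally sent to ChatGPT, message = prompt + variables

LangChain offers a more convenient way to work with prompts:

1. Define a prompt as a string; each `{}` placeholder in the string marks a variable to be inserted.
2. Pass that string to the `ChatPromptTemplate.from_template` factory function to create a prompt_template object.
3. Through that object you can inspect the variable names in the prompt as well as the template's source string.
4. Call `format_messages` on the object with the variables to produce a list of messages.
5. Pass the messages to the model object created earlier, `chat`, to get the final result.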
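To make the idea of a template object concrete, here is a rough standard-library sketch of the first four steps. `SimplePromptTemplate` is a hypothetical stand-in and not how LangChain implements `ChatPromptTemplate`; it only shows how variable names can be discovered from a format string and then filled in.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template; not LangChain's implementation."""

    def __init__(self, template: str):
        self.template = template
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
        names = {field for _, field, _, _ in Formatter().parse(template) if field}
        self.input_variables = sorted(names)

    def format(self, **variables: str) -> str:
        missing = [name for name in self.input_variables if name not in variables]
        if missing:
            raise KeyError(f"missing variables: {missing}")
        return self.template.format(**variables)


template = SimplePromptTemplate("Translate the following text into {language}:\n{text}")
print(template.input_variables)
print(template.format(language="English", text="你好"))
```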




In [50]:
prompt_template_string = """
Please help me translate the following text into {language}.

```
{text}
```

You only need to output the translated result.
"""

In [52]:
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(prompt_template_string)
prompt_template

ChatPromptTemplate(input_variables=['text', 'language'], output_parser=None, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['language', 'text'], output_parser=None, partial_variables={}, template='\nPlease help me translate the following text into {language}.\n\n```\n{text}\n```\n\nYou only need to output the translated result.\n', template_format='f-string', validate_template=True), additional_kwargs={})])

In [36]:
prompt_template.messages[0].prompt

PromptTemplate(input_variables=['language', 'text'], output_parser=None, partial_variables={}, template='\nPlease help me translate the following text into {language}.\n\n```\n{text}\n```\n\nYou only need to output the translated result.\n', template_format='f-string', validate_template=True)

In [37]:
prompt_template.messages[0].prompt.input_variables

['language', 'text']

In [54]:
origin_text = """
我会发着呆，然后微微笑，接着闭上眼。
"""
target_language = "English"

messages = prompt_template.format_messages(text = origin_text, language = target_language)
print(messages[0])

content='\nPlease help me translate the following text into English.\n\n```\n\n我会发着呆，然后微微笑，接着闭上眼。\n\n```\n\nYou only need to output the translated result.\n' additional_kwargs={} example=False


In [46]:
response = chat(messages)
print(response.content)

"I will stare blankly, then smile slightly, and then close my eyes."


To translate the text into French instead, just change `target_language` to `French`.

In [53]:
origin_text = """
我会发着呆，然后微微笑，接着闭上眼。
"""
target_language = "French"

messages = prompt_template.format_messages(text = origin_text, language = target_language)
response = chat(messages)

print(response.content)

Je vais rester là à ne rien faire, puis sourire légèrement, puis fermer les yeux.


### Output Parsers

Parsers convert the content returned by GPT into the format you need.

Below is the format we want GPT to produce: calling `chat(messages).content` should ideally give us a Python dictionary.

In [55]:
{
  "gift": False,
  "delivery_days": 5,
  "price_value": ["pretty affordable!"]
}

{'gift': False, 'delivery_days': 5, 'price_value': ['pretty affordable!']}

Without parsers, this is as far as we can get.

In [58]:
# A customer review
customer_review = """\
这个吹叶机真的很不错，有四档风力：蜡烛吹、轻柔微风、狂风暴雨和龙卷风。\
它两天内就到了，正好赶上我给妻子的周年纪念礼物。\
我觉得我妻子非常喜欢它，甚至被它惊艳到了。\
目前我是唯一使用它的人，我一直在每隔一天早上使用它来清理我们草坪上的落叶。\
虽然它比其他吹叶机略贵一些，但我认为它的额外功能很值得购买。
"""

# Create a template string that asks GPT to analyze the review and convert it to JSON
review_to_json_prompt_template_string = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price, \
and output them as a comma separated Python list.

Format the output as JSON with the following keys:
gift
delivery_days
price_value

text: {text}
"""

In [60]:
review_to_json_prompt_template = ChatPromptTemplate.from_template(review_to_json_prompt_template_string)
review_to_json_messages = review_to_json_prompt_template.format_messages(text = customer_review)

response = chat(review_to_json_messages)
print(response.content)

{
    "gift": true,
    "delivery_days": 2,
    "price_value": ["虽然它比其他吹叶机略贵一些，但我认为它的额外功能很值得购买。"]
}


Here `response.content` is still a string, not the dictionary we wanted.
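When the reply happens to be plain JSON, as it is here, the standard library's `json.loads` is enough to convert it. The sketch below uses a shortened, made-up reply string rather than a live API response; it would fail as soon as the model wraps the JSON in a markdown fence or adds commentary around it.

```python
import json

# A reply shaped like the one printed above (the review text is shortened here).
content = '{"gift": true, "delivery_days": 2, "price_value": ["worth the price"]}'

print(type(content).__name__)
data = json.loads(content)
print(type(data).__name__, data["delivery_days"] + 1)
```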

Next we use parsers to get that result.

In [61]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

Define each key of the JSON schema in turn, together with its description.

In [65]:
gift_schema = ResponseSchema(name="gift", description="Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.")
delivery_days_schema = ResponseSchema(name="delivery_days", description="How many days did it take for the product to arrive? If this information is not found, output -1.")
price_value_schema = ResponseSchema(name="price_value", description="Extract any sentences about the value or price, and output them as a comma separated Python list.")

response_schemas = [gift_schema, delivery_days_schema, price_value_schema] # collect the schemas in a list

`StructuredOutputParser.from_response_schemas` takes `response_schemas` and creates the parser.

In [67]:
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
response_format_instructions = output_parser.get_format_instructions()

print(response_format_instructions) # print the format instructions generated by the parser

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"gift": string  // Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.
	"delivery_days": string  // How many days did it take for the product to arrive? If this information is not found, output -1.
	"price_value": string  // Extract any sentences about the value or price, and output them as a comma separated Python list.
}
```
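The instructions printed above are derived mechanically from the schema names and descriptions. The following is a rough imitation of that step using plain `(name, description)` pairs; `build_format_instructions` is a hypothetical helper, not LangChain's own code, and the descriptions are shortened.

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def build_format_instructions(schemas):
    """Build format instructions from (name, description) pairs.

    Hypothetical helper that mimics the text printed above; it is not
    LangChain's own code.
    """
    fields = "\n".join(f'\t"{name}": string  // {description}' for name, description in schemas)
    return (
        "The output should be a markdown code snippet formatted in the following schema, "
        f'including the leading and trailing "{FENCE}json" and "{FENCE}":\n\n'
        f"{FENCE}json\n{{\n{fields}\n}}\n{FENCE}"
    )


schemas = [
    ("gift", "Was the item purchased as a gift for someone else?"),
    ("delivery_days", "How many days did it take for the product to arrive?"),
]
print(build_format_instructions(schemas))
```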


Since the parser already generates the formatting part of the prompt for us, the template string needs a small change:

In [68]:
# A customer review
customer_review = """\
这个吹叶机真的很不错，有四档风力：蜡烛吹、轻柔微风、狂风暴雨和龙卷风。\
它两天内就到了，正好赶上我给妻子的周年纪念礼物。\
我觉得我妻子非常喜欢它，甚至被它惊艳到了。\
目前我是唯一使用它的人，我一直在每隔一天早上使用它来清理我们草坪上的落叶。\
虽然它比其他吹叶机略贵一些，但我认为它的额外功能很值得购买。
"""

# Append response_format_instructions to the end of the template string
review_to_json_prompt_template_string = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price, \
and output them as a comma separated Python list.

text: {text}

{format_instructions}
"""

review_to_json_prompt_template = ChatPromptTemplate.from_template(review_to_json_prompt_template_string)
review_to_json_messages = review_to_json_prompt_template.format_messages(text = customer_review, format_instructions = response_format_instructions)

response = chat(review_to_json_messages)
print(response.content)

```json
{
	"gift": true,
	"delivery_days": "2",
	"price_value": ["虽然它比其他吹叶机略贵一些，但我认为它的额外功能很值得购买。"]
}
```


Finally, we use the parser to parse `response.content` and convert it into a dictionary:

In [74]:
output_dict = output_parser.parse(response.content)

print(output_dict)
type(output_dict)

{'gift': True, 'delivery_days': '2', 'price_value': ['虽然它比其他吹叶机略贵一些，但我认为它的额外功能很值得购买。']}


dict
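Conceptually, the parsing step strips the markdown fence and decodes the JSON inside it. Here is a standard-library approximation; `parse_fenced_json` is a hypothetical helper and is not the library's actual implementation. It also shows that `delivery_days` comes back as a string, because the schema declares every field as `string`, so numeric use needs an explicit cast.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def parse_fenced_json(text: str) -> dict:
    """Extract the JSON object from a fenced reply (or a bare one) and decode it."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


reply = f'{FENCE}json\n{{"gift": true, "delivery_days": "2"}}\n{FENCE}'
result = parse_fenced_json(reply)
print(result)

# The schema declares every field as a string, so numbers need an explicit cast.
delivery_days = int(result["delivery_days"])
print(delivery_days + 1)
```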