几句话聊论文.MD


Many of these papers overlap heavily, so a close read of one of them is usually enough; the rest mainly help broaden your thinking.

This file briefly summarizes papers not covered by the blog posts. I have only skimmed these, so if you spot errors or omissions, please point them out~

Papers covered

  • Chain-of-thought
    • Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
    • DecomP: Decomposed Prompting: A Modular Approach for Solving Complex Tasks
    • AutoCOT:AUTOMATIC CHAIN OF THOUGHT PROMPTING IN LARGE LANGUAGE MODELS
    • COMPLEXITY-BASED PROMPTING FOR MULTI-STEP REASONING
    • STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning
    • LARGE LANGUAGE MODELS CAN SELF-IMPROVE
    • Successive Prompting for Decomposing Complex Questions
    • Tree of Thoughts: Deliberate Problem Solving with Large Language Models
  • LLM Agent
    • Augmented Large Language Models with Parametric Knowledge Guiding
    • IRCOT: Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
    • ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
    • PAL: Program-aided Language Models
    • Faithful Chain-of-Thought Reasoning
    • REPLUG: Retrieval-Augmented Black-Box Language Models
    • TALM: Tool Augmented Language Models
    • Search-in-the-Chain: Towards Accurate, Credible and Traceable Large Language Models for Knowledge-intensive Tasks
    • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
    • ART: Automatic multi-step reasoning and tool-use for large language models
    • Query Rewriting for Retrieval-Augmented Large Language Models
    • RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit
  • Long Input
    • Lost in the Middle: How Language Models Use Long Contexts
  • Fast Inference
    • Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
  • Domain LLMs
    • FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis
  • Instruction Tuning
    • OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs

Chain-of-thought

LLM + CoT is roughly "a hooligan who knows kung fu: nobody can stop him."

1. Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

A zero-shot version of Least-to-Most. The following prompt elicits the model's ability to first decompose the problem and then solve it: Let's first understand the problem and devise a plan to solve the problem. Then, let's carry out the plan and solve the problem step by step

[Compare: LEAST-TO-MOST PROMPTING ENABLES COMPLEX REASONING IN LARGE LANGUAGE MODELS]

2. DecomP: Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Similar in spirit to Least-to-Most problem decomposition: a Decomposer few-shot prompt splits the question into several sub-questions, and the paper proposes a hierarchical and recursive decomposition scheme.

[Related: Least-to-Most, Self-Ask]

3. AutoCOT:AUTOMATIC CHAIN OF THOUGHT PROMPTING IN LARGE LANGUAGE MODELS

A scheme for automatically constructing CoT few-shot demonstrations. The paper first shows that few-shot works better when the demonstrations are both relevant to the question and diverse. The automated pipeline is as follows (a minimal sketch follows the list):

  • Step 1: encode the questions with Sentence-BERT and cluster them.
  • Step 2: within each cluster, walk outward from the cluster center and run zero-shot CoT on each question; the first rationale that passes the filters (<= 5 reasoning steps, <= 60 tokens) becomes that cluster's demonstration. The selected per-cluster samples are then used directly as the few-shot CoT prompt.
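A minimal sketch of that selection loop, assuming hypothetical helpers embed (a Sentence-BERT style encoder), zero_shot_cot, and count_steps:

```python
# AutoCoT-style demo selection (sketch; embed / zero_shot_cot / count_steps are hypothetical helpers)
import numpy as np
from sklearn.cluster import KMeans

def build_demos(questions, k=8, max_steps=5, max_tokens=60):
    vecs = np.array([embed(q) for q in questions])            # Sentence-BERT style embeddings
    km = KMeans(n_clusters=k, n_init=10).fit(vecs)
    demos = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        # walk outward from the cluster center
        order = idx[np.argsort(np.linalg.norm(vecs[idx] - km.cluster_centers_[c], axis=1))]
        for i in order:
            rationale = zero_shot_cot(questions[i])            # "Let's think step by step ..."
            if count_steps(rationale) <= max_steps and len(rationale.split()) <= max_tokens:
                demos.append((questions[i], rationale))        # this cluster's representative demo
                break
    return demos
```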

4. COMPLEXITY-BASED PROMPTING FOR MULTI-STEP REASONING

Explores how to further improve CoT performance.

  • Finding 1: harder few-shot demonstrations (more reasoning steps) work better; when step counts are hard to compute, question length is a usable proxy.
  • Finding 2: at inference time, chains of thought with more reasoning steps (my reading: chains that did not skip intermediate steps, since the model sometimes skips them) combine well with self-consistency, i.e. vote only over the most complex sampled chains (see the sketch below).
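A small sketch of complexity-based voting at inference time, with hypothetical helpers sample_cot, count_steps, and extract_answer:

```python
# Complexity-based self-consistency (sketch): vote only over the most complex sampled chains
from collections import Counter

def complexity_vote(question, n_samples=20, top_k=10):
    chains = [sample_cot(question) for _ in range(n_samples)]   # sample CoT chains with temperature > 0
    chains.sort(key=count_steps, reverse=True)                   # more reasoning steps = more "complex"
    answers = [extract_answer(c) for c in chains[:top_k]]        # majority vote over the complex chains
    return Counter(answers).most_common(1)[0][0]
```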

[Related: self-consistency]

5. STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning

A semi-supervised-style scheme: the model generates its own CoT rationales, which are then used for iterative CoT fine-tuning to improve its CoT ability. Step 1: use few-shot CoT to generate rationales on an unlabeled dataset and keep only the samples whose final answer is correct, since correct samples tend to have higher-quality rationales. Step 2: fine-tune the model on the rationales from step 1, then regenerate rationales with the fine-tuned model and go back to step 1. The two steps are iterated, with each fine-tuning run starting from scratch. For questions the model can never get right, a Hint is added so the model can reason backwards to a correct rationale. I was curious what the Hint prompt template looks like; for multiple-choice questions the authors simply write CORRECT after the right answer, ha, and the Hint is dropped when building the fine-tuning data. Essentially semi-supervised learning plus active learning (loop sketch below).
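A sketch of the bootstrap loop, with hypothetical helpers generate_cot, is_correct, add_hint, and finetune:

```python
# STaR-style bootstrapping (sketch; generate_cot / is_correct / add_hint / finetune are hypothetical)
def star_bootstrap(base_model, dataset, n_iters=5):
    model = base_model
    for _ in range(n_iters):
        train_set = []
        for question, gold in dataset:
            rationale, answer = generate_cot(model, question)
            if not is_correct(answer, gold):
                # rationalization: give the answer as a hint and let the model reason backwards
                rationale, answer = generate_cot(model, add_hint(question, gold))
            if is_correct(answer, gold):
                train_set.append((question, rationale, gold))   # the hint itself is dropped here
        model = finetune(base_model, train_set)                 # each iteration restarts from the base model
    return model
```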

[Related: CoT fine-tuning schemes such as The CoT Collection]

6. LARGE LANGUAGE MODELS CAN SELF-IMPROVE

Like STaR, a CoT fine-tuning scheme, except that Self-Consistency is used to generate the CoT samples. As in Specializing Smaller Language Models towards Multi-Step Reasoning, the paper mixes multiple CoT sample formats (few-shot CoT, zero-shot CoT, etc.) to keep the model from overfitting to any one format.

[Related: Self-Consistency, STaR]

7. Successive Prompting for Decomposing Complex Questions

Compare with Least-to-Most, which first decomposes the question into sub-questions and then answers them one by one: there the Decompose Problem (DP) module runs all at once while the Answer Problem (AP) module runs sequentially, with each sub-answer conditioning on all previous (Q, A) pairs. In Successive Prompting both modules are sequential and interleaved, so the second sub-question depends on the first sub-question plus its answer. The paper provides both a prompting variant and a fine-tuning variant.

[Related: Least-to-Most, DecomP, Plan-and-Solve, Faithful Chain-of-Thought]

8. Tree of Thoughts: Deliberate Problem Solving with Large Language Models [ToT]

Combines the single-path problem decomposition of Least-to-Most with the multi-path scored chains of self-consistency CoT into an improved tree-shaped decomposition, ToT. Where self-consistency lets each chain run to completion and then scores whole chains to pick the best one, ToT generates multiple candidate paths at every step, scores every step, and finally finds the best chain via DFS/BFS tree traversal, roughly Chain-of-Thought plus beam search. I think the idea is great but the implementation still has room to improve. ToT consists of four steps:

  1. Thought decomposition: not really a runtime step; for each task (the Game of 24, creative writing, etc. in the paper) a task-specific prompt defines what one decomposition step looks like.
  2. Thought generation: based on that prompt, generate several single-step candidate thoughts at each step, e.g.
propose_prompt = '''Input: 2 8 8 14
Possible next steps:
2 + 8 = 10 (left: 8 10 14)
8 / 2 = 4 (left: 4 8 14)
14 + 2 = 16 (left: 8 8 16)
2 * 8 = 16 (left: 8 14 16)
8 - 2 = 6 (left: 6 8 14)
14 - 8 = 6 (left: 2 6 8)
14 /  2 = 7 (left: 7 8 8)
14 - 2 = 12 (left: 8 8 12)
Input: {input}
Possible next steps:
'''
  3. State evaluation: score the current state of each node, either with an absolute LLM score or a relative one (the LLM answers a multiple-choice comparison).
  4. Tree search: BFS, keeping the top-K highest-scoring nodes at each level (a sketch follows the list).
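A minimal sketch of the BFS variant, with hypothetical LLM wrappers propose_thoughts and score_state:

```python
# ToT-style BFS (sketch; propose_thoughts / score_state are hypothetical LLM calls)
def tot_bfs(problem, max_depth=3, branch=5, keep_top_k=5):
    frontier = [""]                                             # partial chains of thought
    for _ in range(max_depth):
        candidates = []
        for state in frontier:
            for thought in propose_thoughts(problem, state, n=branch):
                new_state = state + thought + "\n"
                candidates.append((score_state(problem, new_state), new_state))
        candidates.sort(key=lambda x: x[0], reverse=True)       # keep top-k per level, beam-search style
        frontier = [s for _, s in candidates[:keep_top_k]]
    return frontier[0]
```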

[Related: self-consistency, Least-to-Most]

LLM Agent

Continuing the theme: a hooligan with kung fu can't be stopped, and a model with tools just works.

1. Augmented Large Language Models with Parametric Knowledge Guiding

Uses a large model as a plug-in knowledge base: build instruction-tuning samples from domain knowledge, fine-tune the LLM, have it generate a context passage for the question, then let GPT answer based on that context. In industry, though, people don't only look at answer accuracy; plugging in external search and knowledge-base citations is about giving users a chance to verify the answer (and giving you somewhere to shift the blame!). Still, this idea could replace some traditional slot-filling schemes.

[Related: REPLUG]

2. IRCOT: Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

Adds external knowledge retrieval and interleaves it with CoT reasoning, roughly ReAct with retrieval as the only action. It suits multi-step reasoning where each step depends on the previous step's retrieval and answer, e.g. multi-hop QA. The retriever first fetches K relevant documents as context for CoT generation; each round keeps only the first new reasoning sentence, which then serves as the query to retrieve more documents that are appended to the existing context (see the sketch below). This kind of pipeline is hard to use in industry, though, since the latency is hard to accept.
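A sketch of the interleaving loop, with hypothetical helpers retrieve, generate_cot, first_sentence, and is_final:

```python
# IRCoT-style interleaving (sketch; retrieve / generate_cot / first_sentence / is_final are hypothetical)
def ircot(question, k=4, max_steps=6):
    docs, chain = retrieve(question, k), []
    for _ in range(max_steps):
        cot = generate_cot(question, docs, chain)               # CoT conditioned on docs + partial chain
        step = first_sentence(cot)                              # keep only the first new reasoning step
        chain.append(step)
        if is_final(step):                                      # e.g. starts with "So the answer is"
            break
        docs += retrieve(step, k)                               # the new step becomes the next query
    return chain
```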

[Related: other multi-step open-domain QA schemes such as Self-Ask, ReAct, DecomP]

3. ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models

Takes ReAct-style chained pipelines, where each reasoning step depends on the previous reasoning plus a tool call and its result, and instead uses the LLM's reasoning ability to generate all reasoning and tool-call steps up front in a single pass [Planner], then calls all tools concurrently [Worker]. The tool results are filled into the slot-filling-style template produced by the Planner and the LLM generates the final answer. This saves LLM tokens and greatly reduces end-to-end latency (sketch below).
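A minimal sketch of the plan / work / solve split, assuming hypothetical plan, TOOLS, and solve wrappers and independent tool calls (steps that reference earlier evidence would need substitution before dispatch):

```python
# ReWOO-style decoupling (sketch): plan once, run tools concurrently, then solve
from concurrent.futures import ThreadPoolExecutor

def rewoo(question):
    steps = plan(question)          # e.g. [{"var": "#E1", "tool": "search", "args": "..."}, ...]
    with ThreadPoolExecutor() as pool:
        futures = {s["var"]: pool.submit(TOOLS[s["tool"]], s["args"]) for s in steps}
    evidence = {var: f.result() for var, f in futures.items()}
    return solve(question, steps, evidence)                     # fill tool outputs back into the plan
```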

[Related: ReAct, Self-Ask]

4. PAL: Program-aided Language Models

A scheme with a single tool: a program interpreter. Reasoning and code statements alternate within the CoT; the reasoning parts start with #, i.e. they are written as code comments that guide the code generated right after them, and the program is executed to get the answer (a small execution sketch follows the example). For example:

Q: Roger has 5 tennis balls. He buys 2 more cans of
tennis balls. Each can has 3 tennis balls. How many
tennis balls does he have now?
A: # Roger started with 5 tennis balls.
tennis_balls = 5
# 2 cans of 3 tennis balls each is
bought_balls = 2 * 3
# tennis balls. The answer is
answer = tennis_balls + bought_balls
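A minimal harness for running the generated program and reading the answer variable (not the paper's code, just a sketch; in practice you would sandbox the exec):

```python
# Run a PAL-style generated program and return the value bound to `answer`
def run_pal_program(code: str):
    scope = {}
    exec(code, scope)               # sandbox / time-limit this in any real system
    return scope["answer"]

print(run_pal_program("tennis_balls = 5\nbought_balls = 2 * 3\nanswer = tennis_balls + bought_balls"))  # 11
```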

5. Faithful Chain-of-Thought Reasoning

Similar to PAL, but extended beyond a single programming language to richer symbolic expressions: symbolic executable commands are embedded in the chain of thought. Given a question, the LM is prompted to generate a chain that interleaves NL and SL, where NL is the natural-language decomposition (like ReAct's Reason steps) and SL is the task-specific executable tool call. Unlike ReAct, generating tool calls and executing them are decoupled: all SL statements are executed together only after the full chain has been generated. Evaluated on Math Word Problems, multi-hop QA, PDDL planning, and logical inference. For example:

Q:If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the  parking lot?
# 1. How many cars are there in the beginning? (independent, support: ["there are 3 cars in the parking lot"])
n_cars_begin = 3
# 2. How many cars arrive? (independent, support: ["2 more cars arrive"])
n_cars_arrive = 2
# 3. Final answer: How many cars are in the parking lot? (depends on 1, 2)
n_cars_total = n_cars_begin + n_cars_arrive

[Related: ReAct, PAL]

6. REPLUG: Retrieval-Augmented Black-Box Language Models

Keeps the LLM frozen and optimizes the retriever. Earlier retriever training mostly targeted relevance, i.e. scoring query-document similarity more accurately. The paper proposes a new objective tailored to LLM decoding: a retrieved document, once fed to the LLM as context, should lower the perplexity of the model's target output. The loss is constructed as follows (a sketch follows the list):

  1. Retrieval likelihood: retrieve the top-K documents and normalize their similarity scores into a distribution.
  2. LM likelihood: concatenate each document with the query as context, compute the conditional probability of the correct output y, and normalize across the K documents.
  3. Loss: train the retriever to minimize the KL divergence between the retrieval distribution and the LM likelihood distribution, so that documents which raise the model's confidence in the correct tokens also receive higher similarity scores.
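A PyTorch-style sketch of that loss, assuming sims are the retriever's query-document similarities and lm_logprobs the per-document log-likelihoods of the gold output (the paper's temperature scaling is omitted):

```python
import torch
import torch.nn.functional as F

def replug_lsr_loss(sims: torch.Tensor, lm_logprobs: torch.Tensor) -> torch.Tensor:
    """sims: [K] retriever similarities (trainable); lm_logprobs: [K] log p_LM(y | d_k, x) from the frozen LLM."""
    p_r = F.softmax(sims, dim=-1)                         # retrieval likelihood P_R(d | x)
    q_lm = F.softmax(lm_logprobs, dim=-1).detach()        # LM likelihood Q_LM(d | x, y), no gradient
    return torch.sum(p_r * (torch.log(p_r) - torch.log(q_lm)))   # KL(P_R || Q_LM)
```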

7. TALM: Tool Augmented Language Models

A 2022 Google paper on tool use for language models, a predecessor of Toolformer. The overall idea is very similar to Toolformer, but model capability at the time couldn't keep up. One reusable idea is its self-play style of sample generation: annotate a small number of tool-use samples, fine-tune the model, then sample queries, call the tool to get results and model outputs, and keep a sample if the output is close to the annotation. The model is then fine-tuned again on the augmented samples and the process repeats, a weakly supervised training scheme.

[Related: Toolformer]

8. Search-in-the-Chain: Towards Accurate, Credible and Traceable Large Language Models for Knowledge-intensive Tasks

A more elaborate refinement of Self-Ask, still combining chain-of-thought reasoning with a search tool. The main goal is to have the LLM use external retrieval correctly and at the right places, so that faulty information retrieval does not corrupt the chain of thought.

  1. First have the LLM directly generate a global reasoning chain (Chain-of-Query) in which every node is a QA pair, using the following prompt:
Construct a global reasoning chain for this complex [Question] : Where do greyhound buses that are in the birthplace of Spirit If...'s performer leave
from? You should generate a query to the search engine based on what you already know at each step of the reasoning chain, starting with [Query].
If you know the answer for [Query], generate it starting with [Answer].
You can try to generate the final answer for the [Question] by referring to the [Query]-[Answer] pairs, starting with [Final Content].
If you don't know the answer, generate a query to search engine based on what you already know and do not know, starting with [Unsolved Query].
For example:
[Question]: Where do greyhound buses that are in the birthplace of Spirit If...'s performer leave from?
[Query 1]: Who is the performer of Spirit If... ?
If you don't know the answer:
[Unsolved Query]: Who is the performer of Spirit If... ?
If you know the answer:
[Answer 1]: The performer of Spirit If… is Kevin Drew
  2. Then walk over the nodes and bring in IR: for each node the LLM has already answered, retrieve the top-1 document for its Q, run an MRC model trained on open-domain QA to extract an answer G from the query and document, and use the following prompt to have the LLM reflect on and calibrate that node's answer:
continue constructing the reasoning chain for [Question]: Q. Reference: d_i
  3. For nodes the LLM could not answer, i.e. the "Unsolved Query" above, likewise extract an answer with the MRC model, then use the following prompt to have the LLM fill the answer into the right place:
According to the Reference, the answer for q_i should be g*, you can give your answer and continue constructing the reasoning chain for [Question]:
Q. Reference: d_i.

[Related: ReAct, Self-Ask]

9. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace

The LLM orchestrates all the models on Hugging Face, with the emphasis on multimodal interaction. Extensibility is relatively limited and everything is done with in-context prompting, so accuracy has a ceiling. Four steps:

  1. Task planning: from the user instruction, the LLM parses out a list of sub-task model calls, including which model to call, call dependencies, call order, and inputs, e.g.
Query=generate a video with the title "An astronaut is walking in space" and dub it.
Task Planning:
1: {"task": "text-to-video", "id": 0, "dep": [-1], "args": {"text": "An astronaut is walking in space" }}
2: {"task": "text-to-speech", "id": 1, "dep": [-1], "args": {"text": "An astronaut is walking in space" }}
  2. Model selection: based on the Model Card descriptions, the LLM picks a suitable model for each of the tasks above.
  3. Task execution: call the models in order and collect each step's result, e.g.
Execution Results:
1. {"generated video": "/videos/5696.mp4"}
2. {"generated audio": "/audios/3f9e.flac"}
  4. Response generation: compose the model-call trace and the generated results into a reply, e.g.
Response: Sure. I have read the image example.jpg for you. The inference result is 'INVOICE NO :
18301/102/T0305'. The model used for this task is microsoft/trocr-base-printed, which is best suited for
the task of image-to-text as it is fine-tuned on the SROIE dataset which is specifically designed for this task.
Then I have converted the text to audio, the generated audio is /audios/da5s.wav
and the model used for this task is facebook/fastspeech2-en-ljspeech, which is a FastSpeech 2 text-tospeech
model which is suitable for the task of text-to-speech. Is there anything else I can help you with?

[Related: TaskMatrix.AI, Gorilla]

10. TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

An LLM agent system design in which the LLM orchestrates arbitrary real-world APIs, with no limit on how many. Main components:

  1. A foundation model as the central system (MCFM): understands multimodal user context and instructions, generates the steps needed to complete the task (step 1) and the API-calling code (step 3).
  2. API standardization platform: every tool is registered in a standardized form, including API name, parameter list, documentation, a call demo, and extra notes, similar to Hugging Face Model Cards; the standardization is what makes the API set arbitrarily extensible (a hypothetical schema sketch follows the list).
  3. API selector: picks the APIs most relevant to the steps generated by the MCFM (step 2).
  4. API executor: runs the API calls.
  5. Feedback: not described in detail, since not enough data has been collected for this part yet.
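A hypothetical API-card schema in the spirit of the standardization platform (the field names and the example here are my own, not the paper's):

```python
# Hypothetical API-card schema; the paper only lists the fields, this concrete shape is an assumption
from dataclasses import dataclass

@dataclass
class APICard:
    name: str            # API Name
    description: str     # what the API does, in natural language
    parameters: dict     # parameter name -> type / description
    demo: str            # an example call the MCFM can imitate
    notes: str = ""      # extra caveats

weather_api = APICard(
    name="get_weather",
    description="Return the current weather for a city.",
    parameters={"city": "str, city name", "unit": "str, 'C' or 'F'"},
    demo='get_weather(city="Beijing", unit="C")',
)
```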

[Related: HuggingGPT, Gorilla]

11. ART: Automatic multi-step reasoning and tool-use for large language models

In short, a zero-fine-tuning AutoCoT with multiple kinds of tool calls added. Two main parts: building the Task Library and Task Retrieval.

  1. Task Library: a few-shot library of problem decompositions plus tool calls covering five task families: arithmetic, code, search, CoT reasoning, and string editing. Taking search as an example, the prompt library looks like:
In these examples, you are given a task description and an input. Break the input down into subtasks in order to solve the task.
You can use search functions like Google search in one or more of your substeps, if there is insufficient information. Other
functions like arithmetic and logical operations can also be used.
Description: (Known or Unknown) Choose the option that best answers the question. If the question does not have a known
answer, choose "Unknown".
Input: How many hairs were on Neil Armstrong’s head when he landed on the moon?
choice: Unknown
choice: Five million
Q1: [search] How many hairs were on Neil Armstrong’s head when he landed on the moon?
#1:
Apollo 11 (July 16–24, 1969) was the American spaceflight that first landed humans on the Moon. Commander Neil
Armstrong and lunar module pilot Buzz Aldrin.
Neil Alden Armstrong (August 5, 1930 – August 25, 2012) was an American astronaut and aeronautical engineer who became
the first person to walk on the Moon.
Q2: [subquestion] Does the information help answer the question? There could be no definitive answer because the question is
too specific, about personal details not in public record, because the answer is not yet known, or the question is opinion-based.
#2: No. The question is too specific
Q3: [compare] What is the final answer?
#3: Unknown
Q4: [EOQ]
Ans: Unknown
  2. Task Retrieval: for a new question, N task prompts are retrieved to form a dynamic in-context prompt. Retrieval options include using an LLM to compute similarity against existing prompts, and sweeping several task sets to pick the in-context combination that scores best on a test set. Personally I don't think a good solution for dynamic in-context selection exists yet; both similarity and test-set evaluation generalize poorly.

  3. Human feedback: exactly what it sounds like; a human can step into the intermediate reasoning, adding sub-tasks, editing sub-tasks and their return values, and so on, much like the interaction in https://godmode.space/.

[Related: AutoCoT]

12. Query Rewriting for Retrieval-Augmented Large Language Models

REPLUG improves the fit between retrieved content and the LLM by optimizing the retriever; Query Rewriting does it through query rewriting instead. The paper proposes Rewrite-Retrieve-Read, an RL-based fine-tuning scheme for a small query-rewriting model (a pipeline sketch follows the list):

  1. Initialize the rewriter: use an LLM to generate a set of query-rewrite samples and train an initial small rewriting model (mT5-large).
  2. Build the pipeline: Question -> query rewrite -> retrieval -> LLM reading. Retrieval uses the Bing API, concatenating the retrieved pages' snippets (relevant excerpts) as the LLM input; the reader is ChatGPT.
  3. PPO reinforcement learning: the reward function is aligned with the task's own answer metric, and RL keeps improving the rewriter so the final predictions improve.
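An inference-time sketch of the pipeline, with hypothetical wrappers rewrite, bing_search, and llm_read:

```python
# Rewrite-Retrieve-Read inference (sketch; rewrite / bing_search / llm_read are hypothetical wrappers)
def rewrite_retrieve_read(question: str) -> str:
    query = rewrite(question)                                   # small rewriter model, e.g. mT5-large
    snippets = bing_search(query, top_k=5)                      # web snippets as evidence
    context = "\n".join(snippets)
    return llm_read(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```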

[Related: WebCPM, WebGLM, WebGPT, REPLUG]

13. RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit

RETA-LLM further refines the traditional Retrieval-Read RAG pipeline. Besides a rewriting module similar to Query Rewriting above, it adds a passage-extraction module and a fact-checking module. The full RAG chain is: multi-turn query rewriting -> retrieve the top-3 documents -> the extraction module pulls the content relevant to Q out of the documents -> answer generation -> fact checking. A few notes (a pipeline sketch follows the list):

  1. The rewrite conditions on the previous dialogue turns, so besides improving the query it also resolves coreference across turns.
  2. Retrieval uses a disentangled retriever; I don't know this area well, but the technique does not feel very mature yet.
  3. Passage extraction is done directly with the LLM.
  4. Fact checking is also done with the LLM: given the generated answer and the cited context, it returns True/False for whether a factual error is present.
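A sketch of the whole chain, with every stage a hypothetical LLM or retriever wrapper:

```python
# RETA-LLM-style chain (sketch; all five stages are hypothetical wrappers)
def reta_llm(question, history):
    query = rewrite_with_history(question, history)             # resolve coreference across turns
    docs = retrieve(query, top_k=3)
    passages = [extract_relevant(doc, query) for doc in docs]   # LLM-based passage extraction
    answer = generate_answer(query, passages)
    if not fact_check(answer, passages):                        # LLM judges whether the answer is supported
        answer = "Sorry, no reliable answer was found in the retrieved documents."
    return answer
```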

[Related: WebCPM, WebGLM, WebGPT, REPLUG, Query Rewriting]

Long Input

The Longer the better

1. Lost in the Middle: How Language Models Use Long Contexts

Evaluates the long-context ability of closed models such as GPT-3.5 and Claude and of open long-context models such as LongChat-13B and MPT-30B, on multi-document (10-30 documents) question answering and retrieval. Findings:

  • Models understand the first and last documents best: when the relevant information appears at the beginning or end, performance is significantly better, while the middle suffers a drop of 20%+, giving a U-shaped curve. So when passing many documents as context, rerank them so the important ones sit at the head and tail (see the sketch below).
  • Even long-context-enhanced models degrade noticeably as the context grows (10 documents -> 30 documents); GPT-3.5-turbo and GPT-3.5-turbo (16K) show no very significant difference on very long inputs.
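A small sketch of that mitigation: sort documents by relevance and place them so the most relevant end up at the two ends of the context:

```python
# "Lost in the middle" mitigation (sketch): most relevant documents go to the head and tail of the context
def reorder_for_long_context(docs_with_scores):
    """docs_with_scores: list of (doc, relevance). Least relevant documents end up in the middle."""
    ranked = sorted(docs_with_scores, key=lambda x: x[1], reverse=True)
    front, back = [], []
    for i, (doc, _) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(doc)             # alternate between the two ends
    return front + back[::-1]
```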

Fast Inference

The Faster the better

1. Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding

Intuitively: have the model first decode a writing outline (the skeleton), then decode each outline point concurrently to speed up overall inference. A nice idea, similar in spirit to ReWOO above, but with limitations: the points are decoded independently of each other and lack cohesion, like an essay written one disconnected paragraph at a time that may read oddly when stitched together, and it requires the points to be expandable independently with no ordering dependency. Two stages (a concurrency sketch follows the prompts):

  • Skeleton stage: with an instruction plus few-shot examples, have the model generate the outline points for the question; the prompt is:
[User:] You’re an organizer responsible for only giving the skeleton (not the full content) for answering the question.
Provide the skeleton in a list of points (numbered 1., 2., 3., etc.) to answer the question. Instead of writing a full
sentence, each skeleton point should be very short with only 3∼5 words. Generally, the skeleton should have 3∼10
points.
Question:
What are the typical types of Chinese dishes?
Skeleton:
1. Dumplings.
2. Noodles.
3. Dim Sum.
4. Hot Pot.
5. Wonton.
6. Ma Po Tofu.
7. Char Siu.
8. Fried Rice.
Question:
What are some practical tips for individuals to reduce their carbon emissions?
Skeleton:
1. Energy conservation.
2. Efficient transportation.
3. Home energy efficiency.
4. Reduce water consumption.
5. Sustainable diet.
6. Sustainable travel.
Now, please provide the skeleton for the following question.
{question}
Skeleton:
[Assistant:] 1.
  • Point-expanding stage: with an instruction, expand each of the generated points:
[User:] You’re responsible for continuing the writing of one and only one point in the overall answer to the following
question.
{question}
The skeleton of the answer is
{skeleton}
Continue and only continue the writing of point {point index}. Write it **very shortly** in 1∼2 sentence and
do not continue with other points!
[Assistant:] {point index}. {point skeleton}
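A sketch of the parallel expansion, with hypothetical wrappers llm, skeleton_prompt, and expand_prompt:

```python
# Skeleton-of-Thought-style parallel decoding (sketch; llm / skeleton_prompt / expand_prompt are hypothetical)
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(question: str) -> str:
    skeleton = llm(skeleton_prompt(question))                   # stage 1: short numbered outline
    points = [p for p in skeleton.splitlines() if p.strip()]
    with ThreadPoolExecutor() as pool:                          # stage 2: expand every point concurrently
        bodies = list(pool.map(lambda i: llm(expand_prompt(question, skeleton, i + 1)),
                               range(len(points))))
    return "\n".join(bodies)
```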

[Related: Least-to-Most, Plan-and-Solve, DecomP, ReWOO]

Domain LLMs

1. FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis

I refuse to believe the authors don't trade stocks; if the model had a Chinese name I would call it the candlestick-chart technical-analysis LLM. The behavior: given a candlestick chart and a user question, the model answers based on the image. The base model is LLaVA, trained with a two-stage alignment scheme.

  1. Pretrain alignment: candlestick charts are built from China A-share price time series, with 3-, 6-, and 9-day moving averages added on top of the candlesticks to mimic real charts. This stage aligns the high-dimensional representations of the chart image and its textual description; the descriptions are generated by prompting ChatGPT. Each training sample is (image, describe-the-image instruction, ChatGPT-generated description). Continued pretraining uses batch_size=128, lr=2e-3, for 1 epoch.

  2. Instruction fine-tuning: a QA task over roughly 200K candlestick chart images, with 5 user questions generated per image and answers generated by ChatGPT. The main difference from pretraining is that ChatGPT not only answers the question about the current chart but also gives a trend prediction. Fine-tuning uses batch_size=128, lr=1e-5, for 3 epochs.

Instruction Tuning

1. OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs

I have read quite a few papers on removing model bias; this is the first time I have seen a scheme that amplifies it instead, and it actually feels more honest and trustworthy. Two reasons: first, to remove bias you must first figure out what kinds of bias exist; second, the opinions in this world are inherently contradictory and not necessarily reconcilable. So my personal preference is that rather than balancing data to get an unbiased model, it is better to align the biases explicitly, so that a simple instruction keyword can elicit differently-biased viewpoints on the same question.