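# Chapter 1: Task Description
Retrieval Augmented Generation (RAG) with LLMs is currently one of the most popular applications. It is not hard to implement, but the accuracy of the retrieved content remains a widespread problem. Improving accuracy requires attention to several details, for example:
*   How to ensure that document chunking does not lose related content
*   How to control the chunk size
*   How to ensure that the retrieved content is relevant to the question

Please provide a code implementation that addresses the low accuracy of RAG as far as possible.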

**Key Function**
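*   Langchain already provides APIs that can be called, but you must state clearly which problem each one solves, and you should also contribute your own improvements
*   Provide a demo that compares results before and after applying the solution, with a quantitative accuracy estimate, using at least 5 examples
*   Bonus: Tree-of-Thought, Graph-of-Thought, Knowledge-Graph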




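# Chapter 2: Solution Overview
This project builds a financial-domain QA demo on LLM + RAG, analyzes bad cases to identify the current weaknesses of RAG, and then improves the baseline step by step across the data, retrieval, and generation stages (**accuracy improves from 15% to over 75%**). Specifically:

**Chapter 3: Baseline**
Following the official Langchain example, we build a financial-domain QA demo on ChatGPT using the following data:
*   The HSBC 2022 annual report
*   FAQ question-answer pairs from the HSBC website, more than 300 in total
*   Articles from major securities firms commenting on the recent Central Financial Work Conference
*   Finance-related LLM papers

**Chapter 4: Evaluation Method**

Defines the dataset and metrics used to test and evaluate RAG.

**Chapter 5: Improvements**

The overall solution adopts a Mixture of Expert (MoE) architecture: an LLM first classifies the intent of the question, and the matching domain-expert RAG is then called. We analyze the bad cases and improve each domain-expert RAG accordingly.

**Chapter 6: Summary**

Summarizes the whole project.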

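Note: because time was limited, this demo was not validated with a proper Train/Test split. Instead, it starts from bad cases to explore and verify several methods for improving RAG.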
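
The routing idea behind the MoE-style architecture in the Chapter 5 summary can be sketched as follows. This is a minimal illustration rather than the project's implementation: `classify_intent` is a hypothetical keyword-based stand-in for the LLM intent classifier, the keyword lists are invented for the example, and the expert callables are placeholders for the domain-specific RAG chains.

```python
from typing import Callable, Dict, Tuple

# Hypothetical keyword rules standing in for an LLM-based intent classifier.
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "llm_paper": ("llama", "llm", "author"),
    "jinronghuiyi_article": ("中央金融工作会议", "证券"),
    "huifeng_annual_report": ("2022年", "年度", "不良贷款"),
}
DEFAULT_INTENT = "huifeng_faq"


def classify_intent(query: str) -> str:
    """Return the first intent whose keyword appears in the query."""
    lowered = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return DEFAULT_INTENT


def route(query: str, experts: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch the query to the expert registered for its intent."""
    return experts[classify_intent(query)](query)


# Placeholder experts; in the notebook these would be RAG chains.
experts = {
    name: (lambda q, name=name: f"[{name}] {q}")
    for name in list(INTENT_KEYWORDS) + [DEFAULT_INTENT]
}
```

Keeping the classifier separate from the experts means each expert can later be tuned (loader, chunk size, retriever) without touching the routing step.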



# Chapter 3: Baseline
We follow the default settings of the LangChain tutorial: https://python.langchain.com/docs/use_cases/question_answering/

In [1]:
!pip install -q openai==0.28.1
!pip install -q langchain==0.0.330
!pip install -q -U langchainhub
!pip install -q -U chromadb tiktoken pypdf pymupdf lark
!pip install -q -U FlagEmbedding sentence_transformers
!pip install -q -U transformers
!pip install -q rank_bm25 cohere
!pip install -q -U evaluate rouge_score
!pip install -q -U unstructured[all-docs] pydantic lxml


In [2]:
import os
import evaluate
import openai
import pandas as pd

from copy import deepcopy

from langchain import hub
from langchain.chains import RetrievalQA
from langchain.chat_models import AzureChatOpenAI
from langchain.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader
from langchain.embeddings import HuggingFaceBgeEmbeddings, OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

In [3]:
os.environ["OPENAI_API_TYPE"] = "OPENAI_API_TYPE"
os.environ["OPENAI_API_VERSION"] = "OPENAI_API_VERSION"
os.environ["OPENAI_API_BASE"] = "OPENAI_API_BASE"
os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"

In [4]:
OPENAI_API_KEY = "OPENAI_API_KEY"
OPENAI_DEPLOYMENT_NAME = "gpt-35-turbo"
MODEL_NAME = "gpt-35-turbo"

## 1. Loading the Data

In [7]:
from google.colab import drive
drive.mount('/content/drive')
PROJ_DIR = "/content/drive/My Drive/Colab Notebooks/HSBCRAG"
DATA_DIR = f"{PROJ_DIR}/data"

Mounted at /content/drive


In [8]:
faq_loader = DirectoryLoader(f'{DATA_DIR}/hsbc_faqs/', glob="./*.txt", loader_cls=TextLoader)
faqs = faq_loader.load()

In [9]:
annual_report_loader = DirectoryLoader(f'{DATA_DIR}/hsbc_annual_reports/', glob="./*.pdf", loader_cls=PyPDFLoader)
annual_reports = annual_report_loader.load()

In [10]:
article_loader = DirectoryLoader(f'{DATA_DIR}/jinronghuiyi_articles/', glob="./*.pdf", loader_cls=PyPDFLoader)
articles = article_loader.load()

In [11]:
paper_loader = DirectoryLoader(f'{DATA_DIR}/llm_papers/', glob="./*.pdf", loader_cls=PyPDFLoader)
papers = paper_loader.load()

In [12]:
documents = faqs + annual_reports + articles + papers

In [13]:
print(f"""faq={len(faqs)}
annual_reports={len(annual_reports)}
articles={len(articles)}
papers={len(papers)}
documents={len(documents)}""")

faq=6
annual_reports=172
articles=141
papers=230
documents=549


## 2. Splitting the Documents

In [14]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

In [15]:
len(texts)

2743
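
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch. It is an assumption-laden stand-in, not how `RecursiveCharacterTextSplitter` works internally (that splitter first tries to break on separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical helper that only cuts fixed-size character windows.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows sharing `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij"
no_overlap = split_with_overlap(sample, chunk_size=4, chunk_overlap=0)
with_overlap = split_with_overlap(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With `chunk_overlap=0`, as in the baseline, a sentence that straddles a boundary is cut in two; a non-zero overlap repeats the boundary text in both neighbors at the cost of more chunks.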

## 3. Building the Index
OpenAIEmbeddings is slow and frequently runs into RateLimitError, so we use BGE as the baseline embedding model.

In [16]:
# get around RateLimitError by increasing max_retries
# https://github.com/langchain-ai/langchain/issues/2493
# embedding = OpenAIEmbeddings(
#     deployment="text-embedding-ada-002",
#     show_progress_bar=True,
#     maxConcurrency=5,
#     # chunk_size=1,
#     disallowed_special=(),
#     max_retries=100,
# )

model_name = "BAAI/bge-small-zh"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True}
embedding = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    query_instruction="为这个句子生成表示以用于检索相关文章："
)

In [17]:
len(embedding.embed_query("text"))

512

In [18]:
Chroma().delete_collection()
baseline_vectordb = Chroma.from_documents(
    documents=texts,
    embedding=embedding,
    collection_name="baseline"
)

In [19]:
baseline_vectordb._collection.count()

2743

In [20]:
baseline_retriever = baseline_vectordb.as_retriever()

## 4. Prompt

In [21]:
default_rag_prompt = hub.pull("rlm/rag-prompt")

## 5. Configuring the LLM

In [22]:
chatgpt = AzureChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    deployment_name=OPENAI_DEPLOYMENT_NAME,
    model_name=MODEL_NAME,
    temperature=0
)

In [23]:
chatgpt.predict("申请汇丰中国信用卡有哪些步骤？ ")

'申请汇丰中国信用卡的步骤如下：\n\n1. 在汇丰中国官网或手机APP上选择信用卡产品，填写个人信息并提交申请。\n\n2. 提交申请后，等待汇丰银行的审核，通常需要1-2个工作日。\n\n3. 审核通过后，汇丰银行会联系您确认申请信息，并告知您信用卡的额度和发卡时间。\n\n4. 在收到信用卡后，需要激活并设置密码，可以通过汇丰中国官网或手机APP完成。\n\n5. 使用信用卡时，需要注意还款日期和还款方式，可以通过汇丰中国官网或手机APP进行还款。\n\n6. 使用信用卡时，需要注意信用卡的使用规则和注意事项，避免产生不必要的费用和风险。'

In [24]:
chatgpt.predict("汇丰中国信用卡有哪些密码？")

'汇丰中国信用卡有以下几种密码：\n\n1. 信用卡密码：用于在ATM机上取现金或进行其他操作时输入的密码，一般为6位数字。\n\n2. 网上银行密码：用于登录汇丰中国网上银行进行账户管理和交易的密码，一般为8-30位数字、字母或符号的组合。\n\n3. 手机银行密码：用于登录汇丰中国手机银行进行账户管理和交易的密码，一般为6-8位数字、字母或符号的组合。\n\n4. 短信验证码：用于在进行某些交易时接收的短信验证码，一般为6位数字。\n\n5. 动态密码：用于在进行某些高风险交易时生成的动态密码，一般为6位数字。'

## 6. Building the RAG Chain

In [25]:
def initialize_rag(llm, retriever, prompt=default_rag_prompt):
    chain_type_kwargs = {"prompt": prompt}
    rag = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True,
        chain_type_kwargs=chain_type_kwargs
    )
    return rag

In [26]:
baseline_rag = initialize_rag(llm=chatgpt, retriever=baseline_retriever)

# Chapter 4: Evaluation Method

In [27]:
class Evaluator:
    @staticmethod
    def score_any(predict_str, answers):
        for answer in answers:
            if answer in predict_str:
                return 1
        return 0

    @staticmethod
    def score_ratio(predict_str, answers):
        s = 0
        for answer in answers:
            if answer in predict_str:
                s += 1
        return s / len(answers)

    @staticmethod
    def score_string(predict_str, answer):
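        import re  # local import so this method is self-contained

        rouge = evaluate.load("rouge")
        # Drop newlines, then collapse runs of whitespace into a single space.
        predict_str = re.sub(r"\s+", " ", predict_str.replace("\n", ""))
        answer = re.sub(r"\s+", " ", answer.replace("\n", ""))
        rouge_results = rouge.compute(predictions=[predict_str], references=[[answer]])
        return rouge_results["rougeL"]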

    @staticmethod
    def score_metadata(source_documents, answer, field):
        for source in source_documents:
            if source.metadata.get(field) != answer:
                return 0
        return 1

    @staticmethod
    def test_rag(rag, query, verbose=0, return_source=False):
        response = rag(query)
        if verbose > 0:
            print("-"*200)
            print(f"Question: {query}")
            print(f"Response: {response['result']}")
            if verbose > 1 and "source_documents" in response:
                print('\n\nSources:')
                for source in response["source_documents"]:
                    print(f"page={source.metadata.get('page')}, source={source.metadata.get('source')}")
                    if verbose > 2:
                        print(source.page_content)
        if return_source:
            return response.get("result",""), response.get("source_documents",[])
        else:
            return response.get("result","")

    @staticmethod
    def test_rag_all(rag, question_answer_pairs, verbose=0):
        results = deepcopy(question_answer_pairs)
        for category,qas in results.items():
            for qa in qas:
                qa["response"], qa["source_documents"] = Evaluator.test_rag(
                    rag=rag, query=qa["question"], verbose=verbose, return_source=True
                    )
        return results

    @staticmethod
    def evaluate_rag_results(results):
        results = deepcopy(results)
        for category,qas in results.items():
            for qa in qas:
                w = qa.get("weight", 1.0)
                if qa["type"] == "any":
                    qa["score"] = w * Evaluator.score_any(qa["response"], qa["answer"])
                elif qa["type"] == "ratio":
                    qa["score"] = w * Evaluator.score_ratio(qa["response"], qa["answer"])
                elif qa["type"] == "string":
                    qa["score"] = w * Evaluator.score_string(qa["response"], qa["answer"])
                elif qa["type"].startswith("metadata"):
                    qa["score"] = w * Evaluator.score_metadata(qa["source_documents"], qa["answer"], qa["type"].split(".")[-1])
        return results

    @staticmethod
    def compute_rag_score(results):
        s = 0
        cnt = 0
        for category,qas in results.items():
            for qa in qas:
                s += qa["score"]
                cnt += 1
        return s/cnt

    @staticmethod
    def inspect_rag_results(results):
        for category,qas in results.items():
            for qa in qas:
                if qa["score"] != 1:
                    print(f"category={category}, question={qa['question']}, score={qa['score']}, response={qa['response']}")

In [28]:
question_answer_pairs = {
    "huifeng_faq": [
        {
            "question": "如何环球转账？",
            "answer": """汇丰环球转账功能，可通过登录网上银行\n\n\n在“我的银行” —“转账及货币兑换” —“转账及货币兑换”。\n在新交易中，“转出账户”选择外币储蓄账户。转入账户选择“我的账户”，并选择另一个汇丰其他国家 / 地区的外币账户。\n您可验证交易详情并确认，即可完成交易。\n\n\n您还可以通过登录“汇丰银行”手机App进行相关操作。""",
            "type": "string"
        },
        {
            "question": "如何查看汇丰环球转账历史记录？",
            "answer": "可以。您能查看过去 12 个月以内的环球转账记录。请在汇丰环球网上银行页面点击 “环球转账历史记录” ，然后选择转出国家和扣款账户便可查询。",
            "type": "string"
        },
        {
            "question": "汇丰信用卡有几个密码？",
            "answer": ["查询密码","交易密码"],
            "type": "ratio",
        },
        {
            "question": "如果赎回申请成功，我多久才可以取回资金？",
            "answer": "若您投资的是代客境外理财计划-开放式海外基金型产品，在海外基金管理人接受银行的赎回要求后，银行将在从海外基金管理人处收到基金赎回额后向您支付理财计划赎回额。银行通常在收到投资者有关理财计划赎回申请后10个营业日内向投资者付款。",
            "type": "string"
        },
        {
            "question": "卓越理财客户服务月费是多少？",
            "answer": "如您在汇丰中国的同一个卓越理财客户号码下的所有账户之月内日均总余额低于500,000元人民币/等值外币，本行将每月收取300元人民币或等值外币的服务月费。详情请参阅汇丰中国《账户和服务费率（个人客户适用）》。",
            "type": "string"
        }

    ],
    "huifeng_annual_report": [
        {
            # P7
            "question": "汇丰董事会下属哪些委员会？",
            "answer": ["审计委员会","风险及消费者权益保护委员会","关联交易控制委员会","薪酬委员会","提名委员会"],
            "type": "ratio"
        },
        {
            "question": "2022年度，汇丰中国获得《财富管理》杂志什么称号？",
            "answer": ["2022年度最佳中国外资私人银行"],
            "type": "ratio"
        },
        {
            "question": "汇丰在什么时候成为首家协助QFI完成北交所交易的外资托管行？",
            "answer": ["2022年1月"],
            "type": "any"
        },
        {
            "question": "汇丰的WPB是哪个部门？",
            "answer": ["财富管理及个人银行业务部"],
            "type": "any"
        },
        {
            "question": "财富管理及个人银行业务部总监是哪位？",
            "answer": ["孙丹莹"],
            "type": "any"
        },
        {
            # P27~P29
            "question": "与汇丰银行业务相关的主要风险有哪些？",
            "answer": ["信用风险", "市场风险", "财资风险", "操作风险", "抗逆力风险", "监管合规风险", "金融犯罪风险", "声誉风险"],
            "type": "ratio",
        },
        {
            "question": "截至2022年末，汇丰银行资产总计人民币多少亿元？",
            "answer": ["5,968.5","5,968.5亿元","人民币5,968.5亿元","596,845","人民币596,845百万元","596,845人民币百万元"],
            "type": "any",
        },
        {
            "question": "截至2022年末，汇丰银行负债合计人民币多少亿元？",
            "answer": ["5,386.7","5,386.7亿元","人民币5,386.7亿元","538,674","人民币538,674百万元","538,674人民币百万元"],
            "type": "any",
        },
        {
            "question": "汇丰银行在2022会计年度的营业收入为人民币多少亿元？",
            "answer": ["149.4","149.4亿元","人民币149.4亿元"],
            "type": "any",
        },
        {
            "question": "汇丰银行在2022会计年度的营业支出为人民币多少亿元？",
            "answer": ["80.7","80.7亿元","人民币80.7亿元"],
            "type": "any",
        },
        {
            "question": "汇丰银行在2022会计年度的净利润为人民币多少亿元？",
            "answer": ["60.4","60.4亿元","人民币60.4亿元"],
            "type": "any",
        },
        {
            "question": "2022年，汇丰银行不良贷款余额是人民币多少亿元？",
            "answer": ["5.1","5.1亿元","人民币5.1亿元"],
            "type": "any",
        },
        {
            "question": "2022年，汇丰银行不良贷款率是多少？",
            "answer": ["0.21%"],
            "type": "any",
        },
        {
            "question": "2022年吸收个人活期存款多少？",
            "answer": ["47,200,715","47,200,715千元","人民币47,200,715千元","47,200,715人民币千元"],
            "type": "any",
        },
        {
            "question": "2022年吸收个人定期存款多少？",
            "answer": ["37,742,065","37,742,065千元","人民币37,742,065千元","37,742,065人民币千元"],
            "type": "any",
        },
        {
            "question": "2021年吸收个人活期存款多少？",
            "answer": ["44,839,631","44,839,631千元","人民币44,839,631千元","44,839,631人民币千元"],
            "type": "any",
        },
        {
            "question": "2021年吸收个人定期存款多少？",
            "answer": ["26,185,953","26,185,953千元","人民币26,185,953千元","26,185,953人民币千元"],
            "type": "any",
        },
    ],
    "jinronghuiyi_article": [
        {
            "question": "中金对中央金融工作会议的观点有哪些？",
            "answer": "中金公司",
            "type": "metadata.公司",
        },
        {
            "question": "广发证券对中央金融工作会议的观点有哪些？",
            "answer": "广发证券",
            "type": "metadata.公司",
        },
        {
            "question": "平安证券对中央金融工作会议的观点有哪些？",
            "answer": "平安证券",
            "type": "metadata.公司",
        }
    ],
    "llm_paper": [
        {
            "question": "What is the number of training tokens for Llama 2?",
            "answer": ["2.0T","2 trillion"],
            "type": "any",
        },
        {
            "question": "What is the author of Llama?",
            "answer": ['Hugo Touvron','Thibaut Lavril','Gautier Izacard','Xavier Martinet','Marie-Anne Lachaux','Timothée Lacroix','Baptiste Rozière','Naman Goyal','Eric Hambro','Faisal Azhar','Aurelien Rodriguez','Armand Joulin','Edouard Grave','Guillaume Lample'],
            "type": "ratio",
        },
        {
            "question": "What is the affiliation of the first author of Llama?",
            "answer": ["Meta AI","GenAI, Meta"],
            "type": "any",
        },
        {
            "question": "What is the affiliation of the first author of DISC-FinLLM?",
            "answer": ["Fudan University and Huazhong University of Science and Technology","Huazhong University of Science and Technology and Fudan University"],
            "type": "any",
        },
        {
            "question": "What are the common authors of Llama and Llama 2?",
            "answer": ['Aurelien Rodriguez','Hugo Touvron','Marie-Anne Lachaux','Naman Goyal','Thibaut Lavril','Xavier Martinet'],
            "type": "ratio",
        },
        {
            "question": "Is the first author of Llama and Llama 2 the same? If yes, please output <<YES>>, otherwise output <<NO>>.",
            "answer": ["<<YES>>"],
            "type": "any",
        },
        {
            "question": "Is there any LLM in financial area? What are they?",
            "answer": ["BloombergGPT","FinGPT","DISC-FinLLM","ConFIRM","FinVis-GPT"],
            "type": "ratio"
        },
    ]
}

In [29]:
sum([len(qs) for qs in question_answer_pairs.values()])

32

In [30]:
evaluator = Evaluator()
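
As a worked example of the two substring-based metrics, the sketch below re-implements `score_any` and `score_ratio` as standalone functions (same logic as the `Evaluator` methods above) and applies them to a hypothetical response that was written purely for illustration.

```python
from typing import List


def score_any(predict_str: str, answers: List[str]) -> int:
    """1 if at least one reference answer is a substring of the response, else 0."""
    return int(any(answer in predict_str for answer in answers))


def score_ratio(predict_str: str, answers: List[str]) -> float:
    """Fraction of reference answers that appear as substrings of the response."""
    return sum(answer in predict_str for answer in answers) / len(answers)


# Hypothetical response, not produced by any model in this notebook.
response = "汇丰信用卡有查询密码，用于查询账户信息。"
references = ["查询密码", "交易密码"]

any_score = score_any(response, references)      # one reference found -> 1
ratio_score = score_ratio(response, references)  # one of two found -> 0.5
print(any_score, ratio_score)
```

Both metrics are exact, case-sensitive substring matches, so a correct answer phrased differently from the reference (for example a different number format) scores 0. This is why several questions list multiple accepted spellings.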

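# Chapter 5: Improvements
In this chapter we start from concrete cases and improve RAG in three main areas in turn: data, retrieval, and generation. After that, we present the complete implementation. Finally, we explore how COT, KG, and Tool use can help improve RAG.

## 1. Data Layer

### a. [done] Making Good Use of Metadata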

In [31]:
a = evaluator.test_rag(baseline_rag, "平安证券对中央金融工作会议的观点有哪些？", verbose=2)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Question: 平安证券对中央金融工作会议的观点有哪些？
Response: 平安证券对中央金融工作会议的观点没有在提供的文本中提到。


Sources:
page=0, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/jinronghuiyi_articles/国联证券-中央金融工作会议点评：中央金融工作会议传递了哪些重要信号？-231101.pdf
page=1, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/jinronghuiyi_articles/国联证券-中央金融工作会议点评：中央金融工作会议传递了哪些重要信号？-231101.pdf
page=0, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/jinronghuiyi_articles/国信证券-中央金融工作会议解读：推动高质量发展，强化金融监管-231101.pdf
page=0, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/jinronghuiyi_articles/东莞证券-中央金融工作会议点评：加快建设金融强国，释放积极信号，稳定市场预期-231101.pdf


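In the case above, the retrieved sources are unrelated to the question. Because the file names carry structured information, for example "XX证券-YY.pdf", we can add that information to the metadata and filter on it to improve retrieval relevance.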
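
The idea can be sketched without any retrieval library. The example below is a minimal illustration under stated assumptions: the file paths are hypothetical but follow the "company-title-date.pdf" pattern seen in the sources above, chunks are plain dicts, and `filter_by_metadata` is a hypothetical equality filter standing in for the filter that a self-query retriever would derive from the question.

```python
import os
from typing import Dict, List


def company_from_path(path: str) -> str:
    """Take the text before the first '-' in the file name as the company name."""
    title, _extension = os.path.splitext(os.path.basename(path))
    return title.split("-")[0]


def filter_by_metadata(chunks: List[Dict], field: str, value: str) -> List[Dict]:
    """Keep only the chunks whose metadata field equals the requested value."""
    return [chunk for chunk in chunks if chunk["metadata"].get(field) == value]


# Hypothetical paths following the "<company>-<title>-<date>.pdf" pattern.
paths = [
    "data/articles/平安证券-中央金融工作会议的五个关注点-231101.pdf",
    "data/articles/国信证券-中央金融工作会议解读-231101.pdf",
]
chunks = [
    {"text": f"chunk from {p}", "metadata": {"source": p, "公司": company_from_path(p)}}
    for p in paths
]

# Restrict the candidate set before any similarity ranking takes place.
pingan_chunks = filter_by_metadata(chunks, "公司", "平安证券")
```

Filtering first shrinks the candidate pool to one company, so near-duplicate commentary from other firms can no longer crowd out the relevant report.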

In [32]:
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever

In [33]:
metadata_field_info = [
    AttributeInfo(
        name="公司",
        description="公司",
        type="string",
    )
]
content_description = "资料"


def augment_metadata():
    for article in articles:
        title = article.metadata["source"].split("/")[-1].split(".pdf")[0]
        article.metadata["公司"] = title.split("-")[0]

    for faq in faqs:
        faq.metadata["公司"] = "汇丰银行"

    for report in annual_reports:
        report.metadata["公司"] = "汇丰银行"

    for paper in papers:
        paper.metadata["公司"] = ""

In [34]:
augment_metadata()

In [35]:
documents = faqs + annual_reports + articles + papers

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

Chroma().delete_collection()
metadata_vectordb = Chroma.from_documents(
    documents=texts,
    embedding=embedding,
    collection_name="add_metadata"
)
metadata_vectordb._collection.count()

metadata_retriever = SelfQueryRetriever.from_llm(
    chatgpt,
    metadata_vectordb,
    content_description,
    metadata_field_info,
    use_original_query=True,
    verbose=True,
)

metadata_rag = initialize_rag(llm=chatgpt, retriever=metadata_retriever)

In [36]:
a = evaluator.test_rag(metadata_rag, query="平安证券对中央金融工作会议的观点有哪些？", verbose=2)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Question: 平安证券对中央金融工作会议的观点有哪些？
Response: 平安证券认为，中央金融工作会议肯定了金融回归本源的成绩，但也指出了金融服务实体经济质效不高等问题。未来一段时间，强监管和防风险仍是重点工作，而“建设金融强国”、“金融高质量发展”等新提法，也为金融行业规范有序服务实体经济提出新要求。会议提出了7个方面的具体要求，包括营造良好货币政策环境、优化资金供给结构等。


Sources:
page=0, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/jinronghuiyi_articles/平安证券-首席宏观报告：中央金融工作会议的五个关注点-231101.pdf
page=3, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/jinronghuiyi_articles/平安证券-首席宏观报告：中央金融工作会议的五个关注点-231101.pdf
page=3, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/jinronghuiyi_articles/平安证券-首席宏观报告：中央金融工作会议的五个关注点-231101.pdf
page=1, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/jinronghuiyi_articles/平安证券-首席宏观报告：中央金融工作会议的五个关注点-231101.pdf


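As shown above, filtering on metadata yields much more relevant retrieval results.

### b. [done] Better PDF Parsing

The case below shows that the commonly used PyPDFLoader does not parse the annual report well (mainly its formatting and table data). The extracted text contains heavily duplicated characters, which hurts both retrieval and generation. Switching to a different PDF parsing tool avoids this problem and improves RAG quality.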

In [37]:
for report in annual_reports:
    if "风风险险" in report.page_content:
        print(report)

page_content='- 21 -汇丰银行 (中国)有限公司  \n \n董事会报告 –薪酬报告 (续) \n \n - 21 - \nRESTRICTED   \n薪薪酬酬报报告告(续)  \n \n董董事事、、监监事事、、高高级级管管理理层层及及其其他他关关键键管管理理人人员员薪薪酬酬  \n \n根据上述汇丰中国薪酬框架，汇丰中国于 2022年向高级管理层及其他关键管理人员支付的薪酬\n总额为人民币 1.45亿元。 \n \n2022年，汇丰中国向非执行董事和独立董事支付的董事费共计人民币 225.6万元。除此之外，本\n行非执行董事和独立董事 未从汇丰中国领取其它薪酬和福利。 2022年度， 本行监事未在汇丰中国\n领取监事费或其他薪酬和福利。  \n  \n薪薪酬酬递递延延支支付付和和非非现现金金薪薪酬酬情情况况，，包包括括因因故故扣扣回回的的情情况况  \n \n2022年度， 汇丰中国有 155位员工过往年度的递延绩效薪酬获得支付， 总额人民币 4,979万元，\n其中支付高级管理层及其他关键管理人员人民币 1,855万元。有39位员工因离职 取消递延绩效\n薪酬或因违规违纪等情形追索扣回其相应期限内的部分或全部可变绩效薪酬，涉及股份数\n49,882股和金额人民币 122万元。2022年未发生因故调整已授予但尚未归属的 递延可变绩效\n薪酬或扣回已归属或已经支付的 递延可变绩效薪酬案例。  \n \n2022绩效年度可变绩效薪酬的授予亦遵循上述递延规定，汇丰中国共计有 273位员工的可变绩\n效薪酬达到递延要求，他们 60%的绩效薪酬在 2023年3月发放，而其余 40%将按照汇丰中国递延\n支付政策循序发放。  \n \n年年度度薪薪酬酬方方案案制制定定、、备备案案及及经经济济、、风风险险和和社社会会责责任任指指标标完完成成情情况况  \n \n汇丰中国 2022年年度薪酬方案已通过董事会审批和监管备案。 2022年利润总额达到预算要求，\n得益于营业收入的增长及 对营业成本的有效控制。资本充足率、贷款覆盖率、拨备覆盖率及杠杆\n率均有效控制在最低监管指标要求之上，不良贷款率符合 本行风险偏好及容忍度要求。同时，汇\n丰中国也高度重视案防工作， 将案件风险作为全行的一项重要风险进行管控。 2022年未发生案件\n或案件风险事件。汇丰中国视环境

In [38]:
from langchain.document_loaders import PyMuPDFLoader

In [39]:
annual_report_loader = DirectoryLoader(f'{DATA_DIR}/hsbc_annual_reports/', glob="./*.pdf", loader_cls=PyMuPDFLoader)
annual_reports = annual_report_loader.load()

In [40]:
article_loader = DirectoryLoader(f'{DATA_DIR}/jinronghuiyi_articles/', glob="./*.pdf", loader_cls=PyMuPDFLoader)
articles = article_loader.load()

In [41]:
paper_loader = DirectoryLoader(f'{DATA_DIR}/llm_papers/', glob="./*.pdf", loader_cls=PyMuPDFLoader)
papers = paper_loader.load()

In [42]:
for report in annual_reports:
    if "风风险险" in report.page_content:
        print(report)

The cell above prints nothing, so the duplicated-character problem is resolved.
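
The check above looks for one specific string. As a complementary sketch, a regular expression can flag any run of doubled CJK characters. This is a heuristic of my own, not part of the original solution (which simply switches loaders), and it can misfire on legitimately reduplicated phrases, so it is better suited to detecting bad pages than to silently rewriting them.

```python
import re

# A run of at least two consecutive doubled CJK characters, e.g. "风风险险".
DOUBLED_CJK = re.compile(r"(?:([\u4e00-\u9fff])\1){2,}")


def has_doubled_text(text: str) -> bool:
    """Return True if the text contains a run of at least two doubled CJK characters."""
    return DOUBLED_CJK.search(text) is not None


def collapse_doubled_text(text: str) -> str:
    """Keep every second character of each doubled run, so "风风险险" becomes "风险"."""
    return DOUBLED_CJK.sub(lambda match: match.group(0)[::2], text)


print(has_doubled_text("薪薪酬酬报报告告(续)"))   # True
print(collapse_doubled_text("风风险险管理"))      # 风险管理
```

Requiring two doubled characters in a row keeps single reduplications such as "谢谢" from being flagged.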

In [43]:
documents = faqs + annual_reports + articles + papers

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

Chroma().delete_collection()
better_pdf_vectordb = Chroma.from_documents(
    documents=texts,
    embedding=embedding,
    collection_name="better_pdf"
)

better_pdf_retriever = better_pdf_vectordb.as_retriever()

better_pdf_rag = initialize_rag(llm=chatgpt, retriever=better_pdf_retriever)

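## 2. Retrieval Layer

### a. [done] A Better Embedding Model
A stronger embedding model yields better retrieval. Based on the tasks in the C-MTEB benchmark, we chose the larger BGE model `BAAI/bge-large-zh-v1.5` (https://github.com/FlagOpen/FlagEmbedding).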

In [44]:
model_name = "BAAI/bge-large-zh-v1.5"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True}
embedding = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    query_instruction="为这个句子生成表示以用于检索相关文章："
)

In [45]:
Chroma().delete_collection()
better_emb_vectordb = Chroma.from_documents(
    documents=texts,
    embedding=embedding,
    collection_name="better_emb"
)

better_emb_retriever = better_emb_vectordb.as_retriever()

better_emb_rag = initialize_rag(llm=chatgpt, retriever=better_emb_retriever)

In [46]:
a = evaluator.test_rag(baseline_rag, "2022年，汇丰银行不良贷款率是多少？", verbose=2)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Question: 2022年，汇丰银行不良贷款率是多少？
Response: I'm sorry, there is no information provided in the given context about the non-performing loan ratio of HSBC Bank in 2022.


Sources:
page=164, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/hsbc_annual_reports/汇丰银行(中国)有限公司2022年度报告.pdf
page=24, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/hsbc_annual_reports/汇丰银行(中国)有限公司2022年度报告.pdf
page=105, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/hsbc_annual_reports/汇丰银行(中国)有限公司2022年度报告.pdf
page=71, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/hsbc_annual_reports/汇丰银行(中国)有限公司2022年度报告.pdf


In [47]:
a = evaluator.test_rag(better_emb_rag, "2022年，汇丰银行不良贷款率是多少？", verbose=2)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Question: 2022年，汇丰银行不良贷款率是多少？
Response: 汇丰银行2022年不良贷款率为0.21%。


Sources:
page=23, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/hsbc_annual_reports/汇丰银行(中国)有限公司2022年度报告.pdf
page=152, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/hsbc_annual_reports/汇丰银行(中国)有限公司2022年度报告.pdf
page=153, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/hsbc_annual_reports/汇丰银行(中国)有限公司2022年度报告.pdf
page=24, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/hsbc_annual_reports/汇丰银行(中国)有限公司2022年度报告.pdf


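The results above show that a better embedding model improves retrieval relevance and therefore RAG quality. (Chroma appears to be non-deterministic, so results can differ between runs; see https://github.com/langchain-ai/langchain/issues/1946.)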

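### b. [done] Tuning Chunk Granularity
The chunk granularity should be chosen based on the granularity of the corpus, the quality of the embedding model, and the LLM context length, among other factors.
*   If the corpus consists mostly of short question-answer pairs such as FAQs, choose a smaller chunk size;
*   If the content spans multiple pages (for example, the section on "the main risks related to the bank's business" in the HSBC annual report spans three pages, P27 to P29), choose a larger chunk size;
*   To cover both cases, use Langchain's ParentDocumentRetriever: search with small child chunks, then return the larger parent chunks for generation;
*   Because LLMs usually have a context length limit, a larger chunk size is not always better (the context may be truncated). The chunk size generally has to be tuned for the application.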



In [48]:
# small chunk
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=50)
texts = text_splitter.split_documents(documents)

small_chunks_vectordb = Chroma.from_documents(
    documents=texts,
    embedding=embedding,
    collection_name="small_chunks"
)

small_chunks_retriever = small_chunks_vectordb.as_retriever()

small_chunks_rag = initialize_rag(llm=chatgpt, retriever=small_chunks_retriever)

In [49]:
# big chunk
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
texts = text_splitter.split_documents(documents)

big_chunks_vectordb = Chroma.from_documents(
    documents=texts,
    embedding=embedding,
    collection_name="big_chunks"
)

big_chunks_retriever = big_chunks_vectordb.as_retriever()

big_chunks_rag = initialize_rag(llm=chatgpt, retriever=big_chunks_retriever)

### c. [done] Multi-Level Retrieval to Preserve Retrieval Granularity

In [50]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

In [51]:
# text splitter for big chunks
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)

# text splitter for small chunks
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=50)

# vectorstore for small chunks
vectorstore = Chroma(collection_name="parent_chunks", embedding_function=embedding)

# storage for big chunks
store = InMemoryStore()

parent_chunks_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

parent_chunks_retriever.add_documents(documents)

len(list(parent_chunks_retriever.docstore.yield_keys()))

1459

In [52]:
parent_chunks_rag = initialize_rag(llm=chatgpt, retriever=parent_chunks_retriever)
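
The parent/child mechanism can be illustrated with a small standalone sketch. It assumes fixed-size character splitting and a toy character-overlap score in place of embedding similarity; `build_index`, `overlap_score`, and `retrieve_parents` are hypothetical helpers, not LangChain APIs.

```python
from typing import Dict, List, Tuple


def split_fixed(text: str, size: int) -> List[str]:
    """Split text into consecutive windows of `size` characters (no overlap)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_index(document: str, parent_size: int, child_size: int):
    """Return (parents keyed by id, list of (parent_id, child_text) pairs)."""
    parents: Dict[int, str] = dict(enumerate(split_fixed(document, parent_size)))
    children: List[Tuple[int, str]] = [
        (parent_id, child)
        for parent_id, parent in parents.items()
        for child in split_fixed(parent, child_size)
    ]
    return parents, children


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of distinct query characters found in the text."""
    return len(set(query) & set(text))


def retrieve_parents(query: str, parents, children, k: int = 1) -> List[str]:
    """Rank the small child chunks, then return their deduplicated parent chunks."""
    ranked = sorted(children, key=lambda item: overlap_score(query, item[1]), reverse=True)
    parent_ids: List[int] = []
    for parent_id, _child in ranked:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
        if len(parent_ids) == k:
            break
    return [parents[parent_id] for parent_id in parent_ids]


parents, children = build_index("aaaaabbbbbcccccddddd", parent_size=10, child_size=5)
print(retrieve_parents("d", parents, children))  # ['cccccddddd']
```

The query matches only the small child `"ddddd"`, yet the generator receives the whole parent `"cccccddddd"`, which is the point of the design: precise matching on small units, richer context for generation.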

### d. [wip] Multi-Route Retrieval to Improve Recall

In [53]:
# branches = [
#     "上海自贸试验区支行",
#     "深圳华侨城支行",
#     "深圳海天路支行",
#     "深圳华强北路支行",
#     "苏州玄妙广场支行",
#     "天津国际大厦支行",
#     "阳江支行",
# ]
# for branch in branches:
#     query = f"汇丰银行{branch}的地址和电话是多少？"
#     for rag in [baseline_rag, small_chunks_rag, big_chunks_rag, parent_chunks_rag]:
#         a = evaluator.test_rag(rag=rag, query=query, verbose=1)


# from langchain.retrievers import BM25Retriever, EnsembleRetriever

# bm25_retriever = BM25Retriever.from_documents(texts)
# bm25_retriever.k = 2

# bm25_retriever.get_relevant_documents("阳江支行")

# ensemble_retriever = EnsembleRetriever(
#     retrievers=[bm25_retriever, parent_chunks_retriever],
#     weights=[0.5, 0.5]
# )

# ensemble_rag = initialize_rag(llm=chatgpt, retriever=ensemble_retriever)

In [54]:
# a = evaluator.test_rag(rag=ensemble_rag, query=query, verbose=1)
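
Combining a keyword retriever with a dense retriever requires merging two ranked lists. LangChain's `EnsembleRetriever` documentation describes its fusion step as Reciprocal Rank Fusion; the sketch below shows a weighted variant of that technique in plain Python. The document ids are hypothetical and the constant `c = 60` is a commonly used default that I am assuming here, so treat this as an illustration rather than the library's exact code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(rankings: Sequence[List[str]], weights: Sequence[float], c: int = 60) -> List[str]:
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    if len(rankings) != len(weights):
        raise ValueError("rankings and weights must have the same length")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents (small rank) contribute more to the fused score.
            scores[doc_id] += weight / (rank + c)
    # Sort by fused score, descending; break ties by id to stay deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a BM25 retriever and a dense retriever.
bm25_ranking = ["doc_branch", "doc_faq", "doc_report"]
dense_ranking = ["doc_report", "doc_branch", "doc_paper"]
fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)  # ['doc_branch', 'doc_report', 'doc_faq', 'doc_paper']
```

Because only ranks are used, the two retrievers do not need comparable score scales, which is convenient when mixing BM25 scores with cosine similarities.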

### e. [wip] MultiQuery to Improve Retrieval Diversity

In [55]:
# from langchain.retrievers.multi_query import MultiQueryRetriever

In [56]:
# retriever_from_llm = MultiQueryRetriever.from_llm(
#     retriever=parent_chunks_retriever, llm=chatgpt
# )

In [57]:
# llm_rag = initialize_rag(llm=chatgpt, retriever=retriever_from_llm)

In [58]:
# query = "2022年的营收额是多少？相比2021增长了多少？"
# a = evaluator.test_rag(rag=llm_rag, query=query, verbose=1)

In [59]:
# from typing import List
# from langchain.chains import LLMChain
# from pydantic import BaseModel, Field
# from langchain.prompts import PromptTemplate
# from langchain.output_parsers import PydanticOutputParser


# # Output parser will split the LLM result into a list of queries
# class LineList(BaseModel):
#     # "lines" is the key (attribute name) of the parsed output
#     lines: List[str] = Field(description="Lines of text")


# class LineListOutputParser(PydanticOutputParser):
#     def __init__(self) -> None:
#         super().__init__(pydantic_object=LineList)

#     def parse(self, text: str) -> LineList:
#         lines = text.strip().split("\n")
#         return LineList(lines=lines)


# output_parser = LineListOutputParser()

# QUERY_PROMPT = PromptTemplate(
#     input_variables=["question"],
#     template="""You are an AI language model assistant. Your task is to extract the keywords
#     in the given user question.
#     Provide these keywords separated by newlines.

#     Examples:
#     Original question: 汇丰银行阳江支行的地址和电话是多少？
#     Output:
#     汇丰银行
#     阳江支行

#     Original question: {question}""",
# )

# # Chain
# llm_chain = LLMChain(llm=chatgpt, prompt=QUERY_PROMPT, output_parser=output_parser)

In [60]:
# llm_retriever = MultiQueryRetriever(
#     retriever=parent_chunks_retriever, llm_chain=llm_chain, parser_key="lines"
# )

In [61]:
# llm_rag = initialize_rag(llm=chatgpt, retriever=llm_retriever)

In [62]:
# query = "汇丰银行阳江支行的地址和电话是多少？"
# a = evaluator.test_rag(rag=llm_rag, query=query, verbose=1)

In [63]:
# llm_chain(query)
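
The MultiQuery experiment above runs several rewrites (or keywords) of the user question against the retriever and merges the hits. The merging step can be sketched without any library, as below. `multi_query_retrieve`, `toy_index` and `toy_retrieve` are hypothetical names used only for illustration; this is not how LangChain's `MultiQueryRetriever` is implemented internally.

```python
from typing import Callable, Iterable, List


def multi_query_retrieve(
    queries: Iterable[str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Run every query variant and return the de-duplicated union of hits.

    Hits keep the order in which they were first seen.
    """
    seen = set()
    merged = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Toy stand-in for a retriever: maps a keyword to document ids (assumption).
toy_index = {
    "HSBC": ["doc_branch_list", "doc_annual_report"],
    "Yangjiang branch": ["doc_branch_list", "doc_yangjiang_contact"],
}


def toy_retrieve(query: str) -> List[str]:
    return toy_index.get(query, [])


hits = multi_query_retrieve(["HSBC", "Yangjiang branch"], toy_retrieve)
print(hits)
```

Each additional query can only add documents to the pool, which is why this approach targets diversity rather than precision.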

### f. [wip] MultiVector to improve recall

In [64]:
# !apt install tesseract-ocr
# !apt-get install poppler-utils
# !pip install -q pytesseract

In [65]:
# from lxml import html
# from pydantic import BaseModel
# from typing import Any, Optional
# from unstructured.partition.pdf import partition_pdf

# # Get elements
# raw_pdf_elements = partition_pdf(filename=f"{DATA_DIR}/llm_papers/Llama 2- Open Foundation and Fine-Tuned Chat Models.pdf",
#                                  # Unstructured first finds embedded image blocks
#                                  extract_images_in_pdf=False,
#                                  # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles
#                                  # Titles are any sub-section of the document
#                                  infer_table_structure=True,
#                                  # Post processing to aggregate text once we have the title
#                                  chunking_strategy="by_title",
#                                  # Chunking params to aggregate text blocks
#                                  # Attempt to create a new chunk 3800 chars
#                                  # Attempt to keep chunks > 2000 chars
#                                  max_characters=4000,
#                                  new_after_n_chars=3800,
#                                  combine_text_under_n_chars=2000,
#                                  image_output_dir_path=f"{DATA_DIR}/llm_papers")

In [66]:
# class Element(BaseModel):
#     type: str
#     text: Any

# # Categorize by type
# categorized_elements = []
# for element in raw_pdf_elements:
#     if "unstructured.documents.elements.Table" in str(type(element)):
#         categorized_elements.append(Element(type="table", text=str(element)))
#     elif "unstructured.documents.elements.CompositeElement" in str(type(element)):
#         categorized_elements.append(Element(type="text", text=str(element)))

# # Tables
# table_elements = [e for e in categorized_elements if e.type == "table"]
# print(len(table_elements))

# # Text
# text_elements = [e for e in categorized_elements if e.type == "text"]
# print(len(text_elements))

In [67]:
# from langchain.docstore.document import Document
# from langchain.prompts import ChatPromptTemplate
# from langchain.schema.output_parser import StrOutputParser
# from langchain.chains.summarize import load_summarize_chain

# # Prompt
# prompt_text="""You are an assistant tasked with summarizing tables and text. \
# Give a concise summary of the table or text. Table or text chunk: {text} """
# prompt = ChatPromptTemplate.from_template(prompt_text)

# # Summary chain
# summarize_chain = load_summarize_chain(chatgpt, chain_type="stuff")

# # Apply to tables
# tables = [Document(page_content=i.text) for i in table_elements]
# table_summaries = [summarize_chain.run([table]) for table in tables]

# # Apply to texts
# texts = [Document(page_content=i.text) for i in text_elements]
# text_summaries = [summarize_chain.run([text]) for text in texts]

In [68]:
# len(tables), len(table_summaries), len(texts), len(text_summaries)

In [69]:
# import uuid
# from langchain.vectorstores import Chroma
# from langchain.storage import InMemoryStore
# from langchain.schema.document import Document
# from langchain.embeddings import OpenAIEmbeddings
# from langchain.retrievers.multi_vector import MultiVectorRetriever

# # The vectorstore to use to index the child chunks
# vectorstore = Chroma(
#     collection_name="summaries",
#     embedding_function=embedding
# )

# # The storage layer for the parent documents
# store = InMemoryStore()
# id_key = "doc_id"

# # The retriever (empty to start)
# multi_vector_retriever = MultiVectorRetriever(
#     vectorstore=vectorstore,
#     docstore=store,
#     id_key=id_key,
# )

# # Add texts
# doc_ids = [str(uuid.uuid4()) for _ in texts]
# summary_texts = [Document(page_content=s,metadata={id_key: doc_ids[i]}) for i, s in enumerate(text_summaries)]
# multi_vector_retriever.vectorstore.add_documents(summary_texts)
# multi_vector_retriever.docstore.mset(list(zip(doc_ids, texts)))

# # Add tables
# table_ids = [str(uuid.uuid4()) for _ in tables]
# summary_tables = [Document(page_content=s,metadata={id_key: table_ids[i]}) for i, s in enumerate(table_summaries)]
# multi_vector_retriever.vectorstore.add_documents(summary_tables)
# multi_vector_retriever.docstore.mset(list(zip(table_ids, tables)))

In [70]:
# multi_vector_rag = initialize_rag(llm=chatgpt, retriever=multi_vector_retriever)

In [71]:
# query = "What is the number of training tokens for Llama 2?"
# for rag in [baseline_rag, big_chunks_rag, multi_vector_rag]:
#     test_rag(rag, query, verbose=1)

### g. [done] MultiEmbedding

The LLM paper data is in English, so encoding it with an English embedding model should give better results. Following the MTEB leaderboard, we use BAAI/bge-large-en-v1.5 (https://github.com/FlagOpen/FlagEmbedding).
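
One way to decide which embedding model a document should use is to look at the share of CJK characters in it. The sketch below is an assumption for illustration only: the notebook assigns models per data source by hand, and `cjk_ratio`, `pick_embedding_model`, the `0.3` threshold and the `"zh-embedding-model"` label are all hypothetical.

```python
def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def pick_embedding_model(
    text: str,
    zh_model: str,
    en_model: str = "BAAI/bge-large-en-v1.5",
    threshold: float = 0.3,  # assumed cut-off, not tuned
) -> str:
    """Return the Chinese model name for mostly-Chinese text, else the English one."""
    return zh_model if cjk_ratio(text) >= threshold else en_model


print(pick_embedding_model("What is the number of training tokens for LLaMA2?", zh_model="zh-embedding-model"))
print(pick_embedding_model("如何环球转账？", zh_model="zh-embedding-model"))
```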

In [72]:
model_name = "BAAI/bge-large-en-v1.5"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True}
en_embedding = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    query_instruction="Represent this sentence for searching relevant passages: "
)

In [73]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
texts = text_splitter.split_documents(papers)

papers_en_emb_vectordb = Chroma.from_documents(
    documents=texts,
    embedding=en_embedding,
    collection_name="papers_en_emb"
)

papers_en_emb_retriever = papers_en_emb_vectordb.as_retriever()

papers_en_emb_rag = initialize_rag(llm=chatgpt, retriever=papers_en_emb_retriever)

In [74]:
a = evaluator.test_rag(baseline_rag, query="What is the number of training tokens for LLaMA2?", verbose=2)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Question: What is the number of training tokens for LLaMA2?
Response: The number of training tokens for Llama 2 is not explicitly stated in the given context.


Sources:
page=1, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/llm_papers/LLaMA- Open and Efficient Foundation Language Models.pdf
page=15, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/llm_papers/Llama 2- Open Foundation and Fine-Tuned Chat Models.pdf
page=5, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/llm_papers/Llama 2- Open Foundation and Fine-Tuned Chat Models.pdf
page=1, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/llm_papers/LLaMA- Open and Efficient Foundation Language Models.pdf


In [75]:
a = evaluator.test_rag(papers_en_emb_rag, query="What is the number of training tokens for LLaMA2?", verbose=2)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Question: What is the number of training tokens for LLaMA2?
Response: The number of training tokens for Llama 2 is 2.0 trillion tokens. This is stated in the table comparing the attributes of the Llama 2 models with the Llama 1 models. All models are trained with a global batch-size of 4M tokens. The bigger models, 34B and 70B, use Grouped-Query Attention (GQA) for improved inference scalability.


Sources:
page=5, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/llm_papers/Llama 2- Open Foundation and Fine-Tuned Chat Models.pdf
page=4, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/llm_papers/Llama 2- Open Foundation and Fine-Tuned Chat Models.pdf
page=4, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/llm_papers/Llama 2- Open Foundation and Fine-Tuned

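### h. [done] Rerank to improve retrieval relevance
Why use a reranker:
* Embedding-based vector retrieval can quickly recall candidates from a large corpus, but the returned candidates are not always relevant.
* HNSW-based retrieval is approximate and somewhat random, so repeated runs may return different results.
* LLMs have a limited context length. If the relevant content is not ranked near the top, it may be truncated, which hurts the accuracy of the extracted information.

A reranker improves relevance and stability, and it also eases the pressure from the LLM's context length limit.
*   Reranker: BAAI/bge-reranker-large, https://github.com/FlagOpen/FlagEmbedding
*   Reference article: https://mp.weixin.qq.com/s/4UoRi8VhQjfE7zcpFnre4A
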
In [76]:
from FlagEmbedding import FlagReranker
reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)

-1.5224609375
[-5.609375, 5.76171875]


In [77]:
import asyncio
from typing import Dict, Optional, Sequence

from langchain.callbacks.manager import Callbacks
from langchain.pydantic_v1 import Extra
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors.base import BaseDocumentCompressor
from langchain.schema import Document


class BGERerank(BaseDocumentCompressor):
    """Document compressor that uses BGE Reranker."""

    reranker: FlagReranker

    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid
        arbitrary_types_allowed = True

    def compress_documents(
        self,
        documents: Sequence[Document],
        query: str,
        callbacks: Optional[Callbacks] = None,
    ) -> Sequence[Document]:
        if len(documents) == 0:  # to avoid empty api call
            return []
        doc_list = list(documents)
        query_doc_pairs = [[query, d.page_content] for d in doc_list]
        scores = self.reranker.compute_score(query_doc_pairs)
        for doc,score in zip(doc_list, scores):
            doc.metadata["relevance_score"] = score
        doc_list = sorted(doc_list, key=lambda d: d.metadata["relevance_score"], reverse=True)
        return doc_list

    async def acompress_documents(
        self,
        documents: Sequence[Document],
        query: str,
        callbacks: Optional[Callbacks] = None,
    ) -> Sequence[Document]:
        """Compress retrieved documents given the query context."""
        return await asyncio.get_running_loop().run_in_executor(
            None, self.compress_documents, documents, query, callbacks
        )

In [78]:
compressor = BGERerank(reranker=reranker)

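# A better strategy is to have base_retriever recall a larger pool first (e.g. 10 documents),
# then let base_compressor rerank the pool and keep only the top k (e.g. 3).
# For convenience, this demo simply reuses base_retriever's top-k size as the output size of base_compressor.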
big_chunks_compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=big_chunks_retriever
)

big_chunks_compression_rag = initialize_rag(llm=chatgpt, retriever=big_chunks_compression_retriever)
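
The "large pool, then top k" strategy mentioned in the comment above is not implemented in this notebook. A minimal sketch of it follows, assuming any pairwise scoring function; `rerank_top_k` and `overlap_score` are hypothetical helpers, and `overlap_score` is only a toy stand-in for a cross-encoder such as `FlagReranker.compute_score`.

```python
from typing import Callable, List, Sequence, Tuple


def rerank_top_k(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep the top_k best."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # sorted() is stable, so ties keep their original retrieval order.
    scored = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Toy scorer: number of lowercase words shared by query and document."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


# Pretend this pool came from a first-stage retriever with a large k.
pool = [
    "global transfer supported currencies",
    "how to open an account",
    "global transfer steps in online banking",
    "credit card annual fee",
]
top = rerank_top_k("global transfer steps", pool, overlap_score, top_k=2)
print(top)
```

In the notebook's setting, the same truncation could be applied to the sorted list returned by `BGERerank.compress_documents`.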

In [79]:
a = evaluator.test_rag(big_chunks_rag, query="如何环球转账？", verbose=3)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Question: 如何环球转账？
Response: 以上常见问题旨在提供于中国大陆开展金融活动所可能面临的与金融产品、服务或监管要求相关的概略信息。因金融产品/服务及相关监管要求的多样性和复杂性，这些常见问题无法涵盖所有细节。有关信息仅供您参考，且不能将其作为法律、金融或任何其他方面的专业意见。汇丰银行对该等信息之准确性、及时性及/或完整性不作任何担保、陈述或保证，亦不承担与此相关的任何责任。


Sources:
page=None, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/hsbc_faqs/我想要进行国内外转账及汇款.txt
Question: 17. 我能进行什么形式的环球转账？
Answer: 环球转账同名速汇—对同名账户的跨境转账
1）从其他国家转入中国的同名账户的环球转账，您能在转出账户所在银行发起转账指令时选择立即转账、预约转账和循环转账，视该银行届时提供的转账形式为准。
2）对于从中国转出的环球转账，您只可选择立即转账或预约转账。
环球转账亲情速汇—对同一汇丰卓越理财家庭金融服务下的家庭成员持有的汇丰海外账户办理转账，您只可选择立即转账。
由于外币跨境汇款受中国相关政府部门监管，因此我行环球转账功能未能提供循环转账服务。

Question: 18. 我使用过其他国家/地区的汇丰环球网上银行转账服务，汇丰中国的和其他国家的页面一样吗？
Answer: 一般来说,页面设置和客户体验都是一样的。但对于从中国转出的环球转账,在您输入或选择了币种、金额、转出和转入账户后, 请如实选择交易编码/原因，这是根据中国国家外汇管理局的要求设定的。对于环球转账尚不支持的资金用途/来源类别，请您亲临柜台咨询办理。

Question: 19. 我能查询过去已被执行的汇丰环球转账吗？
Answ

In [80]:
a = evaluator.test_rag(big_chunks_compression_rag, query="如何环球转账？", verbose=3)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Question: 如何环球转账？
Response: 如何环球转账？您可以通过登录网上银行或手机银行进行操作。在网上银行中，选择“我的银行”-“转账及货币兑换”-“转账及货币兑换”，选择转出账户和转入账户，验证交易详情并确认即可完成交易。在手机银行中，选择“转账汇款”-“环球转账”，输入转出账户和转入账户信息，验证交易详情并确认即可完成交易。若您在海外或港澳台地区，请在号码前加拨中国地区码+86。


Sources:
page=None, source=/content/drive/My Drive/Colab Notebooks/HSBCRAG/data/hsbc_faqs/我想要进行国内外转账及汇款.txt
400-820-3090
若您在海外或港澳台地区，请在号码前加拨中国地区码+86

Question: 10. 我是否可在中国境内划转外币？
Answer: 可以。但根据国家外汇管理局相关规定，个人外汇账户内资金境内划转，仅限于本人账户之间、个人与近亲属账户之间。若有外汇境内划转需要，需要您携带相关证明文件亲临汇丰中国分支行进行办理。

Question: 11. 通过个人网上银行，如何才能连接不同国家的汇丰账户？
Answer: 首先,将被连接的国家/地区的网上银行账号必须是活跃的。在您选择要添加的国家/地区后，根据屏幕提示输入网上银行用户名、密码、安全密码/安全问题便可。

Question: 12. 什么是环球网上银行？我能否由此浏览我在世界各地的汇丰账户？
Answer: “环球网上银行”服务是专门为我行客户推出的网上银行增值服务。我行客户可以通过“环球网上银行”服务浏览其拥有的其他国家或地区的汇丰账户信息。

Question: 13. 环球转账
Answer: 环球转账支持币种包括：美元，英镑，欧元，日元，港元，加拿大元，澳元，新加坡元，瑞士法郎，

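After reranking, the content about "环球转账" (global transfer) moves up from third to second place, so it can be passed to the LLM for generation. (The LLM has a limited context, so the higher the relevant content ranks, the more likely the LLM is to extract accurate information.)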
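
To illustrate why rank matters under a context limit, the sketch below packs ranked documents into a fixed character budget and drops whatever does not fit. This is an assumption for illustration: the notebook does not show how the chain truncates its context, and `pack_context` is a hypothetical helper.

```python
from typing import List, Sequence


def pack_context(ranked_docs: Sequence[str], max_chars: int) -> List[str]:
    """Keep documents in rank order until the character budget is exhausted."""
    kept: List[str] = []
    used = 0
    for doc in ranked_docs:
        if used + len(doc) > max_chars:
            break  # everything ranked below this point is dropped
        kept.append(doc)
        used += len(doc)
    return kept


relevant = "R" * 40
filler_a = "a" * 40
filler_b = "b" * 40

# Relevant document ranked third: it does not fit in a 100-character budget.
before_rerank = pack_context([filler_a, filler_b, relevant], max_chars=100)
# Relevant document ranked second: it survives the cut.
after_rerank = pack_context([filler_a, relevant, filler_b], max_chars=100)
print(relevant in before_rerank, relevant in after_rerank)
```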

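## 3. Generation layer

### a. [done] Choose a better LLM for the scenario
The quality of the LLM itself largely sets the baseline for generation quality. (This demo does not verify this; it uses ChatGPT as the base model throughout.)

### b. [done] Optimize the prompt
Prompt engineering is an important factor in generation quality: it affects how well the LLM follows instructions, and adding extra information to the prompt can also help the LLM produce better and more factual responses.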

In [81]:
fin_rag_prompt = deepcopy(default_rag_prompt)
fin_rag_prompt.messages[0].prompt.template = """You are an experienced financial analyst for HSBC with an interest in Large Language Model. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:"""

In [82]:
prompt_dict = {
    "default_rag_prompt": default_rag_prompt,
    "fin_rag_prompt": fin_rag_prompt,
    "default_prompt": None,
}
retriever_dict = {
    "baseline_retriever": baseline_retriever,
    "better_pdf_retriever": better_pdf_retriever,
    "better_emb_retriever": better_emb_retriever,
    "small_chunks_retriever": small_chunks_retriever,
    "big_chunks_retriever": big_chunks_retriever,
    "parent_chunks_retriever": parent_chunks_retriever,
    "big_chunks_compression_retriever": big_chunks_compression_retriever,
}

In [83]:
rag_results = []
for prompt_name, prompt in prompt_dict.items():
    for retriever_name, retriever in retriever_dict.items():
        rag = initialize_rag(llm=chatgpt, retriever=retriever, prompt=prompt)
        res = evaluator.test_rag_all(rag=rag, question_answer_pairs=question_answer_pairs, verbose=0)
        result = {
            "prompt": prompt_name,
            "retriever": retriever_name,
            "results": res
        }
        rag_results.append(result)

In [84]:
for result in rag_results:
    result["results"] = evaluator.evaluate_rag_results(result["results"])
    result["score"] = evaluator.compute_rag_score(result["results"])

In [85]:
df = pd.DataFrame(rag_results).sort_values("score", ascending=False)

In [86]:
df = df[["prompt","retriever","score"]]
df["rag"] = "single_rag"
df = df[["prompt","retriever","rag","score"]]

In [87]:
best_prompt = df.iloc[0]["prompt"]
best_retriever = df.iloc[0]["retriever"]
best_score = df.iloc[0]["score"]

In [88]:
best_prompt, best_retriever, best_score

('default_prompt', 'big_chunks_compression_retriever', 0.772767857142857)

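## 4. Overall implementation

An LLM first classifies the intent of the query, and the query is then dispatched to the matching domain-expert RAG, similar to a Mixture of Experts (MoE) architecture.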

In [89]:
class EnsembleRAG:
    def __init__(self, intent_prompt, tools, llm):
        self.intent_prompt = intent_prompt
        self.tools = tools
        self.llm = llm

    def __call__(self, query):
        intent_prompt = self.intent_prompt.format(question=query)
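        category = self.llm.predict(intent_prompt)
        # Normalize the label: the LLM may add whitespace or change the casing.
        category = category.strip().lower()
        if category == "none" or category not in self.tools:
            print(f"{query}: {category}")
            # Echo the actual query back rather than the literal string "query".
            return {"query": query, "result": "I don't know."}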
        else:
            tool = self.tools[category]
            return tool(query)
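
The routing logic of `EnsembleRAG` can be exercised without an LLM by swapping in stub callables. The sketch below is illustrative only: `route_query`, `stub_classify` and `stub_tools` are hypothetical names, and the stub classifier stands in for the `llm.predict` call.

```python
from typing import Callable, Dict


def route_query(
    query: str,
    classify: Callable[[str], str],
    tools: Dict[str, Callable[[str], dict]],
    fallback: str = "I don't know.",
) -> dict:
    """Classify the query, then dispatch it to the matching expert."""
    # Normalize the raw label before the lookup.
    category = classify(query).strip().lower()
    tool = tools.get(category)
    if tool is None:
        # Covers both "none" and any label that has no registered expert.
        return {"query": query, "result": fallback}
    return tool(query)


def stub_classify(query: str) -> str:
    """Stand-in for the LLM; returns a noisy label on purpose."""
    return " LLM_PAPER\n" if "llama" in query.lower() else "NONE"


stub_tools = {
    "llm_paper": lambda q: {"query": q, "result": "answered by the paper expert"},
}

routed = route_query("What is Llama 2?", stub_classify, stub_tools)
unrouted = route_query("What's the weather?", stub_classify, stub_tools)
print(routed["result"], "|", unrouted["result"])
```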

In [90]:
# Use the LLM to classify the intent, then call the matching RAG
intent_domain_prompt = """你是一个有用的助手。
以下是用户可能提出问题的四个意图领域的描述。
对于给定的用户问题，请在这些意图领域中进行选择。
请仅返回意图领域。回答后不要返回任何其他内容。
如果您认为没有任何与之相关的领域，请返回 NONE。

域名：huifeng_faq
描述：对于回答与汇丰金融产品相关的问题非常有用，如账户、账单、转账、汇款、支付、数字银行、微信服务、存款、住房抵押贷款、投资、保险、信用卡等

域名：huifeng_annual_report
描述：对于回答与汇丰财报和年度报告相关的问题非常有用，例如营业收入、营业支出、成本、吸收存款、公司结构、部门结构、公司治理、薪酬结构、风险治理、部门业务范畴等。

域名：jinronghuiyi_article
说明：适用于回答中央金融工作会议的问题，比如不同证券公司的观点。

域名：llm_paper
描述：对于回答与大型语言模型相关的问题非常有用，如LLM、LLaMA、LLaMA2和金融LLM（FinGPT和FinLLM）等。

Question: {question}
Domain:
"""

In [91]:
best_rag = initialize_rag(llm=chatgpt, retriever=retriever_dict[best_retriever], prompt=prompt_dict[best_prompt])
paper_rag = initialize_rag(llm=chatgpt, retriever=papers_en_emb_retriever, prompt=prompt_dict[best_prompt])
tools = {
    "huifeng_faq": best_rag, # optimized: PDF parsing + zh embedding + chunking + rerank + prompt
    "huifeng_annual_report": best_rag, # optimized: PDF parsing + zh embedding + chunking + rerank + prompt
    "jinronghuiyi_article": metadata_rag, # optimized: metadata
    "llm_paper": paper_rag # optimized: PDF parsing + en embedding + chunking + prompt
}

In [92]:
intent_results = {}
for category,qas in question_answer_pairs.items():
    intent_results[category] = 0
    for qa in qas:
        i = chatgpt.predict(intent_domain_prompt.format(question=qa["question"]))
        intent_results[category] += int(i == category)
    intent_results[category] /= len(qas)
intent_results

{'huifeng_faq': 1.0,
 'huifeng_annual_report': 0.7647058823529411,
 'jinronghuiyi_article': 1.0,
 'llm_paper': 1.0}
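
The per-category intent accuracy computed above can be written as a small reusable function. The sketch below is a variant under one stated assumption: unlike the cell above, it strips and lowercases the predicted label before comparing, which mirrors what `EnsembleRAG` does at call time. `intent_accuracy` and `toy_classify` are hypothetical names.

```python
from typing import Callable, Dict, List


def intent_accuracy(
    labeled_questions: Dict[str, List[str]],
    classify: Callable[[str], str],
) -> Dict[str, float]:
    """Return, per category, the share of questions classified into that category."""
    accuracy: Dict[str, float] = {}
    for category, questions in labeled_questions.items():
        if not questions:
            continue  # avoid dividing by zero for empty categories
        hits = sum(classify(q).strip().lower() == category for q in questions)
        accuracy[category] = hits / len(questions)
    return accuracy


def toy_classify(question: str) -> str:
    """Keyword-based stand-in for the LLM classifier."""
    return "FAQ" if ("transfer" in question or "account" in question) else "report"


labeled = {
    "faq": ["transfer money", "open account"],
    "report": ["revenue 2022", "transfer costs"],
}
acc = intent_accuracy(labeled, toy_classify)
print(acc)
```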

In [93]:
ensemble_rag = EnsembleRAG(intent_prompt=intent_domain_prompt, tools=tools, llm=chatgpt)
ensemble_rag_results = evaluator.test_rag_all(rag=ensemble_rag, question_answer_pairs=question_answer_pairs, verbose=0)
ensemble_rag_results = evaluator.evaluate_rag_results(ensemble_rag_results)
evaluator.inspect_rag_results(ensemble_rag_results)

category=huifeng_faq, question=如何环球转账？, score=0.4, response=您可以通过以下步骤进行汇丰环球转账：

1. 登录汇丰银行网上银行，进入“我的银行”-“转账及货币兑换”-“转账及货币兑换”页面。

2. 在新交易中，“转出账户”选择外币储蓄账户。转入账户选择“我的账户”，并选择另一个汇丰其他国家/地区的外币账户。

3. 输入转账金额和币种，选择交易编码/原因，然后验证交易详情并确认，即可完成交易。

您还可以通过登录汇丰银行手机App进行相关操作。请注意，环球转账支持的币种包括美元、英镑、欧元、日元、港元、加拿大元、澳元、新加坡元、瑞士法郎和新西兰元。同时，环球转账仅支持外币现汇储蓄账户，不支持外币现钞储蓄账户。如果您需要进行环球转账，您可以在网上银行或柜台开立外币附属账号。
category=jinronghuiyi_article, question=广发证券对中央金融工作会议的观点有哪些？, score=0.0, response=广发证券对中央金融工作会议的观点无法从提供的文本中得出。
category=llm_paper, question=What is the author of Llama?, score=0.9285714285714286, response=The authors of Llama are Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothee Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample.
category=llm_paper, question=What is the affiliation of the first author of DISC-FinLLM?, score=0.0, response=The first author of DISC-FinLLM is affiliated with two i

In [94]:
score = evaluator.compute_rag_score(ensemble_rag_results)
score

0.835267857142857

In [95]:
df.loc[len(df.index)] = [f"{best_prompt}", f"{best_retriever}+SelfQueryRetriever", "ensemble_rag", score]

## 5. Further improvements

### a. [done] Add structured knowledge from a KG

In [96]:
paper_rag(question_answer_pairs["llm_paper"][5]["question"])

{'query': 'Is the first author of Llama and Llama 2 the same? If yes, please output <<YES>>, otherwise output <<NO>>.',
 'result': 'No.',
 'source_documents': [Document(page_content='Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗\nLouis Martin†\nKevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev\nPunit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich\nYinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra\nIgor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi\nAlan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing El

We simulate LLM+KG by adding KG information to the prompt. For more concrete approaches, see:

*   https://blog.langchain.dev/using-a-knowledge-graph-to-implement-a-devops-rag-application/
*   https://mp.weixin.qq.com/s/VJRG0MUaEGR6iM_xFRroyg
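
Instead of hard-coding the triples into the template, the KG block can be rendered from a list of triples, optionally filtered by the retrieved context. This is a minimal sketch under that assumption; `render_kg_block` and `select_triples` are hypothetical helpers, and the subject-substring filter is a deliberately naive stand-in for a real graph query.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]


def select_triples(triples: Sequence[Triple], text: str) -> List[Triple]:
    """Keep only triples whose subject is mentioned in the text (case-insensitive)."""
    lowered = text.lower()
    return [t for t in triples if t[0].lower() in lowered]


def render_kg_block(triples: Sequence[Triple]) -> str:
    """Format triples in the same style as the prompt used in this notebook."""
    lines = [f'("{s}", "{p}", "{o}")' for s, p, o in triples]
    return "KG:\n" + "\n".join(lines)


kg = [
    ("Hugo Touvron", "affiliated_with", "Meta AI"),
    ("Wei Chen", "affiliated_with", "Fudan University and Huazhong University of Science and Technology"),
]
context = "Llama 2: Open Foundation and Fine-Tuned Chat Models Hugo Touvron Louis Martin"
relevant = select_triples(kg, context)
kg_block = render_kg_block(relevant)
print(kg_block)
```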



In [97]:
paper_rag.combine_documents_chain.llm_chain.prompt = deepcopy(paper_rag.combine_documents_chain.llm_chain.prompt)
paper_rag.combine_documents_chain.llm_chain.prompt.messages[0].prompt.template = """Use the following pieces of context to answer the users question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
You are also provided the following Academic Knowledge Graph for checking affiliation. Please only use it when necessary.
----------------
KG:
("Hugo Touvron", "affiliated_with", "Meta AI")
("Wei Chen", "affiliated_with", "Fudan University and Huazhong University of Science and Technology")
----------------
{context}"""

paper_rag.combine_documents_chain.llm_chain.prompt.messages[1].prompt.template = """{question}"""

In [98]:
paper_rag(question_answer_pairs["llm_paper"][2]["question"])

{'query': 'What is the affiliation of the first author of Llama?',
 'result': 'The first author of Llama is Hugo Touvron and he is affiliated with Meta AI.',
 'source_documents': [Document(page_content='Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗\nLouis Martin†\nKevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev\nPunit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich\nYinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra\nIgor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi\nAlan Schelten Ruan Silva Eric Michael Smith Ranjan Sub

In [99]:
ensemble_rag = EnsembleRAG(intent_prompt=intent_domain_prompt, tools=tools, llm=chatgpt)
ensemble_rag_results = evaluator.test_rag_all(rag=ensemble_rag, question_answer_pairs=question_answer_pairs, verbose=0)
ensemble_rag_results = evaluator.evaluate_rag_results(ensemble_rag_results)
evaluator.inspect_rag_results(ensemble_rag_results)

category=huifeng_faq, question=如何环球转账？, score=0.4, response=您可以通过以下步骤进行汇丰环球转账：

1. 登录汇丰银行网上银行，进入“我的银行”-“转账及货币兑换”-“转账及货币兑换”页面。

2. 在新交易中，“转出账户”选择外币储蓄账户。转入账户选择“我的账户”，并选择另一个汇丰其他国家/地区的外币账户。

3. 输入转账金额和币种，选择交易编码/原因，然后验证交易详情并确认，即可完成交易。

您还可以通过登录汇丰银行手机App进行相关操作。请注意，环球转账支持的币种包括美元、英镑、欧元、日元、港元、加拿大元、澳元、新加坡元、瑞士法郎和新西兰元。同时，环球转账仅支持外币现汇储蓄账户，不支持外币现钞储蓄账户。如果您需要进行环球转账，您可以在网上银行或柜台开立外币附属账号。
category=jinronghuiyi_article, question=广发证券对中央金融工作会议的观点有哪些？, score=0.0, response=广发证券对中央金融工作会议的观点没有在提供的文本中提及。
category=llm_paper, question=What is the author of Llama?, score=0.7857142857142857, response=There are multiple authors of Llama, including Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothee Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, and Armand Joulin. They are all affiliated with Meta AI.
category=llm_paper, question=What are the common authors of Llama and Llama 2?, score=0.16666666666666666, response=Unfortunately, the prov

In [100]:
score = evaluator.compute_rag_score(ensemble_rag_results)
score

0.8672619047619047

In [101]:
df.loc[len(df.index)] = [f"{best_prompt}", f"{best_retriever}+SelfQueryRetriever", "ensemble_rag+KG", score]

### b. [done] Chain-of-Thought for complex queries

In [112]:
paper_rag(question_answer_pairs["llm_paper"][5]["question"])

{'query': 'Is the first author of Llama and Llama 2 the same? If yes, please output <<YES>>, otherwise output <<NO>>.',
 'result': 'NO',
 'source_documents': [Document(page_content='Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗\nLouis Martin†\nKevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev\nPunit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich\nYinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra\nIgor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi\nAlan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ell

In [103]:
print(paper_rag.combine_documents_chain.llm_chain.prompt.messages[0].prompt.template)

Use the following pieces of context to answer the users question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
You are also provided the following Academic Knowledge Graph for checking affiliation. Please only use it when necessary.
----------------
KG:
("Hugo Touvron", "affiliated_with", "Meta AI")
("Wei Chen", "affiliated_with", "Fudan University and Huazhong University of Science and Technology")
----------------
{context}


In [111]:
paper_rag.combine_documents_chain.llm_chain.prompt = deepcopy(paper_rag.combine_documents_chain.llm_chain.prompt)
paper_rag.combine_documents_chain.llm_chain.prompt.messages[0].prompt.template = """Use the following pieces of context to answer the users question.
You are also provided the following Academic Knowledge Graph for checking affiliation. Please only use it when necessary.
----------------
KG:
("Hugo Touvron", "affiliated_with", "Meta AI")
("Wei Chen", "affiliated_with", "Fudan University and Huazhong University of Science and Technology")
----------------
{context}"""

paper_rag.combine_documents_chain.llm_chain.prompt.messages[1].prompt.template = """{question}\nLet's think step by step."""

In [105]:
paper_rag(question_answer_pairs["llm_paper"][5]["question"])

{'query': 'Is the first author of Llama and Llama 2 the same? If yes, please output <<YES>>, otherwise output <<NO>>.',
 'result': '1. Look for the names of the authors of Llama and Llama 2.\n2. Identify if the first author of Llama is the same as the first author of Llama 2.\n3. Output <<YES>> if they are the same, otherwise output <<NO>>.\n\nBased on the given context, the first author of Llama is not explicitly mentioned. Therefore, we cannot determine if the first author of Llama and Llama 2 are the same.',
 'source_documents': [Document(page_content='Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗\nLouis Martin†\nKevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan M

In [106]:
ensemble_rag = EnsembleRAG(intent_prompt=intent_domain_prompt, tools=tools, llm=chatgpt)
ensemble_rag_results = evaluator.test_rag_all(rag=ensemble_rag, question_answer_pairs=question_answer_pairs, verbose=0)
ensemble_rag_results = evaluator.evaluate_rag_results(ensemble_rag_results)
evaluator.inspect_rag_results(ensemble_rag_results)

category=huifeng_faq, question=如何环球转账？, score=0.4, response=您可以通过以下步骤进行汇丰环球转账：

1. 登录汇丰银行网上银行，进入“我的银行”-“转账及货币兑换”-“转账及货币兑换”页面。

2. 在新交易中，“转出账户”选择外币储蓄账户。转入账户选择“我的账户”，并选择另一个汇丰其他国家/地区的外币账户。

3. 输入转账金额和币种，选择交易编码/原因，然后验证交易详情并确认，即可完成交易。

您还可以通过登录汇丰银行手机App进行相关操作。请注意，环球转账支持的币种包括美元、英镑、欧元、日元、港元、加拿大元、澳元、新加坡元、瑞士法郎和新西兰元。同时，环球转账仅支持外币现汇储蓄账户，不支持外币现钞储蓄账户。如果您需要进行环球转账，您可以在网上银行或柜台开立外币附属账号。
category=jinronghuiyi_article, question=广发证券对中央金融工作会议的观点有哪些？, score=0.0, response=广发证券对中央金融工作会议的观点没有在提供的文本中提及。
category=llm_paper, question=What is the author of Llama?, score=0.07142857142857142, response=1. The paper "LLaMA: Open and Efficient Foundation Language Models" has multiple authors.
2. The first author listed is Hugo Touvron.
3. However, the paper also mentions that Llama was developed by Meta AI.
4. The Academic Knowledge Graph provided shows that Hugo Touvron is affiliated with Meta AI.
5. Therefore, it can be concluded that both Hugo Touvron and Meta AI are authors of Llama.
category=llm_paper, question=Is

In [107]:
score = evaluator.compute_rag_score(ensemble_rag_results)
score

0.8709821428571428

In [108]:
df.loc[len(df.index)] = [f"{best_prompt}+COT", f"{best_retriever}+SelfQueryRetriever", "ensemble_rag+KG", score]

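### c. [todo] Add conversational context

For multi-turn QA, a memory component is needed to record the conversation context and improve generation. For dialogue, the Conversation Buffer can be used: https://python.langchain.com/docs/modules/memory/types/buffer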
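
The idea of a conversation buffer can be sketched in plain Python: keep the last few turns and prepend them to the new question. This is not LangChain's implementation; `ConversationBuffer`, `max_turns` and the `User:`/`Assistant:` prompt layout are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Tuple


class ConversationBuffer:
    """Keeps the most recent (question, answer) turns of a conversation."""

    def __init__(self, max_turns: int = 3) -> None:
        # deque(maxlen=...) silently drops the oldest turn when full.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def build_prompt(self, question: str) -> str:
        """Prepend the stored history to the new question."""
        history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)
        prefix = f"{history}\n" if history else ""
        return f"{prefix}User: {question}\nAssistant:"


buffer = ConversationBuffer(max_turns=2)
buffer.add("q1", "a1")
buffer.add("q2", "a2")
buffer.add("q3", "a3")  # evicts the (q1, a1) turn
prompt = buffer.build_prompt("q4")
print(prompt)
```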

### d. [todo] Self-RAG
Reference: https://github.com/AkariAsai/self-rag

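# Chapter 6: Summary

Using a financial LLM+RAG demo, this project explored several ideas for optimizing RAG. **Starting from an accuracy of 15%, a series of optimizations brought it to 75+%** (Chroma appears to be non-deterministic, so results vary between runs; see https://github.com/langchain-ai/langchain/issues/1946). The main insights are:

*   Data layer: the input to the whole RAG pipeline.
    - High-quality preprocessing helps the downstream retrieval and generation modules; *(positive effect in this demo)*
    - Using metadata improves the relevance of retrieval; *(positive effect in this demo)*
*   Retrieval layer: the core stage, responsible for fetching the raw material that is passed to the LLM for generation. The coverage, diversity and relevance of its results are critical to generation quality.
    - To keep retrieved results relevant, improve the embedding model and add a rerank step; *(positive effect in this demo)*
    - To improve coverage, tune the chunk size or use multi-granularity retrieval; *(positive effect in this demo)*
    - To improve diversity, methods such as MultiQuery and MultiVector can be used *(their effectiveness was not confirmed in this demo)*.
*   Generation layer: the last stage of RAG, responsible for producing the final result.
    - The quality of the LLM itself largely sets the baseline for generation quality; *(not verified in this demo, which uses ChatGPT as the base model throughout)*
    - A good prompt helps the LLM generate better answers; *(positive effect in this demo)*
*   Other
    - For complex questions, using CoT to decompose the problem and reason step by step can improve results; *(positive effect in this demo)*
    - Using an agent (Intent Classification + EnsembleRAG/Tool) makes it possible to call domain-optimized RAGs and improve overall results; *(positive effect in this demo)*
    - Adding external information such as a KG can help reduce LLM hallucination; *(positive effect in this demo)*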

In [109]:
df.sort_values("score", ascending=False)

| | prompt | retriever | rag | score |
|---|---|---|---|---|
| 23 | default_prompt+COT | big_chunks_compression_retriever+SelfQueryRetr... | ensemble_rag+KG | 0.870982 |
| 22 | default_prompt | big_chunks_compression_retriever+SelfQueryRetr... | ensemble_rag+KG | 0.867262 |
| 21 | default_prompt | big_chunks_compression_retriever+SelfQueryRetr... | ensemble_rag | 0.835268 |
| 20 | default_prompt | big_chunks_compression_retriever | single_rag | 0.772768 |
| 18 | default_prompt | big_chunks_retriever | single_rag | 0.697768 |
| 4 | default_rag_prompt | big_chunks_retriever | single_rag | 0.510268 |
| 13 | fin_rag_prompt | big_chunks_compression_retriever | single_rag | 0.48125 |
| 6 | default_rag_prompt | big_chunks_compression_retriever | single_rag | 0.480941 |
| 19 | default_prompt | parent_chunks_retriever | single_rag | 0.473624 |
| 11 | fin_rag_prompt | big_chunks_retriever | single_rag | 0.467187 |
