# Implementing RAG with SK

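For scenarios that need external knowledge, we usually implement RAG (Retrieval Augmented Generation), which typically involves document parsing, chunking, embedding, retrieval, and generating the output with an LLM.
In [Semantic ChatBot](./chatbot_with_sk.ipynb) we built a simple chatbot on Semantic Kernel and used `context variables` to store and record the chat history.
However, a very large external knowledge base may require an extremely long context, and we may not even be able to keep the full history. For this, SK provides the Memory type, which gives the LLM a memory capability.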
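The chunking step mentioned above can be illustrated with a minimal, SK-independent sketch. `chunk_text` is a hypothetical helper (not part of SK or the Qianfan SDK) that splits text into fixed-size character windows overlapping by `overlap` characters; real pipelines often split on sentences or tokens instead.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary partially visible in both neighbouring chunks, at the cost of embedding some text twice.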

In [None]:
! pip install semantic-kernel==0.4.5.dev0

Set up authentication, then import SK and the Qianfan classes adapted for SK.

In [1]:
import os 

os.environ["QIANFAN_ACCESS_KEY"] = "your_ak"
os.environ["QIANFAN_SECRET_KEY"] = "your_sk"

In [2]:
import semantic_kernel as sk
from qianfan.extensions.semantic_kernel import (
    QianfanChatCompletion,
    QianfanTextEmbedding,
)

### SK Memory

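SK Memory is a data framework that can connect to a variety of external data sources, such as web pages, databases, and email, through SK's built-in connectors. `QianfanTextEmbedding` then extracts embedding vectors from the text in these sources for later retrieval.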


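Here we use `VolatileMemoryStore` as the Memory implementation. `VolatileMemoryStore` provides temporary in-memory storage: internally it uses a `Dict[str, Dict[str, MemoryRecord]]` to implement key-value storage partitioned by collection.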
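To make the per-collection layout concrete, here is a toy store with the same shape (collection name, then record id, then record). `ToyRecord` and `ToyVolatileStore` are hypothetical, simplified stand-ins rather than SK classes; the real `MemoryRecord` carries more fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyRecord:
    # Hypothetical stand-in for MemoryRecord: only an id, the text, and its vector
    id: str
    text: str
    embedding: List[float]


class ToyVolatileStore:
    """Illustrative in-memory store: collection name -> {record id -> record}."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, ToyRecord]] = {}

    def create_collection(self, name: str) -> None:
        self._store.setdefault(name, {})

    def upsert(self, collection: str, record: ToyRecord) -> str:
        # Writing an existing id overwrites it, like a key-value put
        self.create_collection(collection)
        self._store[collection][record.id] = record
        return record.id

    def get(self, collection: str, key: str) -> Optional[ToyRecord]:
        return self._store.get(collection, {}).get(key)


store = ToyVolatileStore()
store.upsert("aboutMe", ToyRecord("info1", "我名字叫做小度", [0.1, 0.9]))
store.upsert("aboutMe", ToyRecord("info1", "updated text", [0.2, 0.8]))
print(store.get("aboutMe", "info1").text)  # updated text
print(store.get("other", "info1"))  # None
```

Because everything lives in a Python dict, the data disappears when the kernel process exits, which is what "volatile" refers to.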

In [3]:
from semantic_kernel.memory import VolatileMemoryStore
from semantic_kernel.core_skills import TextMemorySkill

kernel = sk.Kernel()

# Chat model used for generation, embedding model used for vectorization
qf_chat_service = QianfanChatCompletion(ai_model_id="ERNIE-Bot-4")
qf_text_embedding = QianfanTextEmbedding(ai_model_id="Embedding-V1")
kernel.add_chat_service("chat-qf", qf_chat_service)
kernel.add_text_embedding_generation_service("embed-eb", qf_text_embedding)

# In-process, non-persistent memory store; TextMemorySkill exposes recall/save to prompts
kernel.register_memory_store(memory_store=VolatileMemoryStore())
kernel.import_skill(TextMemorySkill())

{'recall': SKFunction(), 'save': SKFunction()}

Call the async function below to add data: it saves several pieces of personal information into a `collection` named `aboutMe`.

In [4]:
async def populate_memory(kernel: sk.Kernel) -> None:
    # Add some documents to the semantic memory
    await kernel.memory.save_information_async(collection="aboutMe", id="info1", text="我名字叫做小度")
    await kernel.memory.save_information_async(
        collection="aboutMe", id="info2", text="我工作在baidu"
    )
    await kernel.memory.save_information_async(
        collection="aboutMe", id="info3", text="我来自中国"
    )
    await kernel.memory.save_information_async(
        collection="aboutMe",
        id="info4",
        text="我曾去过北京，上海，深圳",
    )
    await kernel.memory.save_information_async(collection="aboutMe", id="info5", text="我爱打羽毛球")

In [5]:
await populate_memory(kernel)

[INFO] [02-01 13:50:38] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-01 13:50:39] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-01 13:50:39] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-01 13:50:39] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-01 13:50:40] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
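`populate_memory` awaits each `save_information_async` call in turn, so the five embedding requests in the log above go out one after another. If the service's rate limits allow it, independent saves could be scheduled together with `asyncio.gather`. The sketch below uses a hypothetical `fake_save` coroutine in place of a real embedding request, so it shows only the scheduling pattern.

```python
import asyncio
from typing import Dict

FACTS: Dict[str, str] = {
    "info1": "我名字叫做小度",
    "info2": "我工作在baidu",
    "info3": "我来自中国",
}


async def fake_save(store: Dict[str, str], key: str, text: str) -> None:
    # Hypothetical stand-in for "request an embedding, then write the record"
    await asyncio.sleep(0.1)
    store[key] = text


async def populate_concurrently(facts: Dict[str, str]) -> Dict[str, str]:
    store: Dict[str, str] = {}
    # All saves are scheduled at once, so the total wait is about one sleep, not one per fact
    await asyncio.gather(*(fake_save(store, k, v) for k, v in facts.items()))
    return store


result = asyncio.run(populate_concurrently(FACTS))
print(sorted(result))  # ['info1', 'info2', 'info3']
```

In a Jupyter cell, which already runs an event loop, write `await populate_concurrently(FACTS)` instead of `asyncio.run(...)`.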


Using the cosine similarity implemented in `TextMemoryBase` to compare vectors, we can search for the matching answer to each question:

In [6]:
async def search_memory_examples(kernel: sk.Kernel) -> None:
    questions = [
        "我的名字是？",
        "我在哪里工作？",
        "我去过哪些地方旅游?",
        "我的家乡是?",
        "我的爱好是？",
    ]

    for question in questions:
        print(f"Question: {question}")
        result = await kernel.memory.search_async("aboutMe", question)
        print(f"Answer: {result[0].text}\n")

In [7]:
await search_memory_examples(kernel)

[INFO] [02-01 13:40:18] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Question: 我的名字是？


[INFO] [02-01 13:40:19] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我名字叫做小度

Question: 我在哪里工作？


[INFO] [02-01 13:40:20] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我工作在baidu

Question: 我去过哪些地方旅游?


[INFO] [02-01 13:40:20] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我曾去过北京，上海，深圳

Question: 我的家乡是?


[INFO] [02-01 13:40:20] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我来自中国

Question: 我的爱好是？
Answer: 我爱打羽毛球
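The lookups above can be sketched without SK: score every stored vector by cosine similarity against the query vector and keep the best matches. The 3-dimensional vectors and the `search` helper are hypothetical illustrations; `VolatileMemoryStore` performs a comparable ranking over the real Embedding-V1 vectors.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    records: Dict[str, List[float]], query: List[float], limit: int = 1
) -> List[Tuple[str, float]]:
    """Rank records by cosine similarity to the query and return the top `limit`."""
    scored = [(rid, cosine_similarity(vec, query)) for rid, vec in records.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


# Made-up vectors; real embedding vectors are much higher-dimensional
records = {
    "info1": [1.0, 0.0, 0.0],
    "info2": [0.0, 1.0, 0.0],
    "info5": [0.7, 0.7, 0.0],
}
print(search(records, [0.9, 0.1, 0.0], limit=2))
```

Cosine similarity ignores vector length, so only the direction of each embedding matters when ranking.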



Let's now revisit our chat sample from the [previous notebook](04-context-variables-chat.ipynb).
If you remember, we used context variables to fill the prompt with a `history` that was continuously populated as we chatted with the bot. Let's add memory to it as well!

This is done by using the `TextMemorySkill` which exposes the `recall` native function.

`recall` takes an input question, performs a similarity search over the contents that have
been embedded in the Memory Store, and returns the most relevant memory.
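To illustrate how a prompt line such as `{{$fact1}} {{recall $fact1}}` turns into text, here is a hypothetical mini-renderer. It is not SK's template engine: it only substitutes `{{$var}}` blocks and calls a named function for `{{func $var}}` blocks, and its `recall` is a plain dictionary lookup standing in for the similarity search.

```python
import re
from typing import Callable, Dict

# Matches {{$var}} and {{func $var}} blocks
_BLOCK = re.compile(r"\{\{\s*(?:(\w+)\s+)?\$(\w+)\s*\}\}")


def render(
    template: str, variables: Dict[str, str], functions: Dict[str, Callable[[str], str]]
) -> str:
    """Hypothetical mini-renderer: substitute variables and call named functions."""

    def replace(match: re.Match) -> str:
        func_name, var_name = match.group(1), match.group(2)
        value = variables.get(var_name, "")
        if func_name is None:
            return value
        return functions[func_name](value)

    return _BLOCK.sub(replace, template)


memories = {"我的名字是？": "我名字叫做小度"}


def recall(question: str) -> str:
    # Stand-in for a similarity search: exact-key lookup, empty string if nothing matches
    return memories.get(question, "")


print(render("- {{$fact1}} {{recall $fact1}}", {"fact1": "我的名字是？"}, {"recall": recall}))
# - 我的名字是？ 我名字叫做小度
```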

In [13]:
from typing import Tuple
async def setup_chat_with_memory(
    kernel: sk.Kernel,
) -> Tuple[sk.SKFunctionBase, sk.SKContext]:
    sk_prompt = """
    你是一个人设分析师，你需要严格根据以下背景资料，以及历史聊天记录回答以下问题：
    
    背景资料：
    - {{$fact1}} {{recall $fact1}}
    - {{$fact2}} {{recall $fact2}}
    - {{$fact3}} {{recall $fact3}}
    - {{$fact4}} {{recall $fact4}}
    - {{$fact5}} {{recall $fact5}}

    聊天记录: {{$chat_history}}
    用户: {{$user_input}}
    回答:

     """.strip()

    chat_func = kernel.create_semantic_function(sk_prompt, temperature=0.8)

    context = kernel.create_new_context()
    context["fact1"] = "我的名字是？"
    context["fact2"] = "我在哪里工作？"
    context["fact3"] = "我去过哪些地方旅游?"
    context["fact4"] = "我的家乡是?"
    context["fact5"] = "我的爱好是？"

    context[sk.core_skills.TextMemorySkill.COLLECTION_PARAM] = "aboutMe"
    context[sk.core_skills.TextMemorySkill.RELEVANCE_PARAM] = "0.5"

    context["chat_history"] = ""

    return chat_func, context

The relevance parameter (`TextMemorySkill.RELEVANCE_PARAM`) sets the minimum relevance score used in the memory search, on a scale from 0.0 to 1.0 where 1.0 means a perfect match. We encourage users to experiment with different values.
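The effect of a minimum relevance can be shown with a small filter over already-scored matches. `filter_by_relevance` and the scores are hypothetical; the point is that a stricter threshold can leave nothing to return, which is presumably the case in which the chat log below prints `Memory not found in collection: aboutMe`.

```python
from typing import List, Tuple


def filter_by_relevance(
    scored: List[Tuple[str, float]], min_relevance: float, limit: int = 1
) -> List[Tuple[str, float]]:
    """Keep matches whose score is >= min_relevance, best first, at most `limit`."""
    kept = [item for item in scored if item[1] >= min_relevance]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


# Hypothetical similarity scores for one question
scored = [("info1", 0.82), ("info3", 0.55), ("info5", 0.41)]
for threshold in (0.5, 0.9):
    print(threshold, filter_by_relevance(scored, threshold))
# 0.5 [('info1', 0.82)]
# 0.9 []
```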

Now that we've included our memories, let's chat!

In [14]:
async def chat(kernel: sk.Kernel, chat_func: sk.SKFunctionBase, context: sk.SKContext) -> bool:
    try:
        user_input = input("用户:> ")
        context["user_input"] = user_input
        print(f"User:> {user_input}")
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":
        print("\n\nExiting chat...")
        return False

    answer = await kernel.run_async(chat_func, input_vars=context.variables)
    context["chat_history"] += f"\n用户:> {user_input}\n回答:> {answer}\n"

    print(f"ChatBot:> {answer}")
    return True
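`chat` appends every exchange to `context["chat_history"]`, so the prompt grows without bound, which is the long-context problem that motivates Memory in the first place. One simple mitigation is a sliding window over recent turns. `ChatHistory` below is a hypothetical helper, not an SK class; it reproduces the `用户:> / 回答:>` layout used above.

```python
from collections import deque
from typing import Deque, Tuple


class ChatHistory:
    """Hypothetical bounded history: keeps only the most recent `max_turns` exchanges."""

    def __init__(self, max_turns: int = 5) -> None:
        self._turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, user_input: str, answer: str) -> None:
        # Once full, the deque drops the oldest turn automatically
        self._turns.append((user_input, answer))

    def render(self) -> str:
        # Same layout that chat() appends to context["chat_history"]
        return "".join(f"\n用户:> {u}\n回答:> {a}\n" for u, a in self._turns)


history = ChatHistory(max_turns=2)
for i in range(3):
    history.add(f"q{i}", f"a{i}")
print(history.render())  # only the q1/a1 and q2/a2 turns remain
```

Inside `chat`, one could then set `context["chat_history"] = history.render()` after each turn; older facts that still matter would have to come from Memory instead.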

In [12]:
print("Populating memory...")
await populate_memory(kernel)

print("Asking questions... (manually)")
await search_memory_examples(kernel)

print("Setting up a chat (with memory!)")
chat_func, context = await setup_chat_with_memory(kernel)

print("Begin chatting (type 'exit' to exit):\n")
chatting = True
while chatting:
    chatting = await chat(kernel, chat_func, context)

[INFO] [02-01 13:55:41] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Populating memory...


[INFO] [02-01 13:55:42] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-01 13:55:42] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-01 13:55:43] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-01 13:55:43] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-01 13:55:44] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Asking questions... (manually)
Question: 我的名字是？


[INFO] [02-01 13:55:44] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我名字叫做小度

Question: 我在哪里工作？


[INFO] [02-01 13:55:45] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我工作在baidu

Question: 我去过哪些地方旅游?


[INFO] [02-01 13:55:45] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我曾去过北京，上海，深圳

Question: 我的家乡是?


[INFO] [02-01 13:55:46] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我来自中国

Question: 我的爱好是？
Answer: 我爱打羽毛球

Setting up a chat (with memory!)
Begin chatting (type 'exit' to exit):



[INFO] [02-01 13:56:03] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


User:> hi你好？


[INFO] [02-01 13:56:03] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-01 13:56:04] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Memory not found in collection: aboutMe
[INFO] [02-01 13:56:04] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Memory not found in collection: aboutMe
[INFO] [02-01 13:56:05] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Memory not found in collection: aboutMe
Variable `$chat_history` not found
[INFO] [02-01 13:56:05] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /chat/eb-instant


ChatBot:> 您好，根据您的背景资料，您是来自百度的一位人设分析师。您的工作地点在百度，具体的工作内容可能涉及到用户的人设分析。您目前没有提及您曾经去过哪些地方旅游，家乡和爱好等信息。如果您愿意分享这些信息，我会尽力回答您的问题。
User:> exit


Exiting chat...


### Adding documents to Memory

Often we have a large external knowledge base. Next, we use SK's `VolatileMemoryStore` to load external knowledge documents,
for example files from the Qianfan SDK repo:

In [11]:
github_files = {
    "https://github.com/baidubce/bce-qianfan-sdk/blob/main/README.md": "README: 千帆SDK介绍，安装，基础使用方法",
    "https://github.com/baidubce/bce-qianfan-sdk/blob/main/cookbook/finetune/trainer_finetune.ipynb": "Cookbook: 千帆SDK Trainer使用方法"
}

Now let's add these files to our VolatileMemoryStore using `save_reference_async`. We'll separate these memories from the chat memories by putting them in a different collection.

In [12]:
memory_collection_name = "QianfanGithub"
for i, (entry, value) in enumerate(github_files.items(), start=1):
    # Only the short description is embedded here, not the file contents;
    # the URL is kept as the external id of the reference
    await kernel.memory.save_reference_async(
        collection=memory_collection_name,
        description=value,
        text=value,
        external_id=entry,
        external_source_name="GitHub",
    )
    print("  已添加 {} saved".format(i))

[INFO] [02-01 13:41:11] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-01 13:41:11] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


  已添加 1 saved
  已添加 2 saved
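The cell above embeds only each file's one-line description. To index the actual contents, a file could be split into chunks, each saved as its own reference with a distinct id. `build_chunk_records` and the `#chunk-N` id scheme are hypothetical, and the README text is a placeholder rather than downloaded content.

```python
from typing import Dict, List


def build_chunk_records(url: str, content: str, chunk_size: int = 500) -> List[Dict[str, str]]:
    """Split one document into non-overlapping chunks, each with an id derived from its URL."""
    records = []
    for index, start in enumerate(range(0, len(content), chunk_size)):
        records.append(
            {
                "external_id": f"{url}#chunk-{index}",  # hypothetical id scheme
                "text": content[start:start + chunk_size],
            }
        )
    return records


readme = "x" * 1200  # placeholder for the README text
records = build_chunk_records("https://github.com/baidubce/bce-qianfan-sdk/blob/main/README.md", readme)
print(len(records), records[-1]["external_id"])
# 3 https://github.com/baidubce/bce-qianfan-sdk/blob/main/README.md#chunk-2
```

Each record could then be passed to `kernel.memory.save_reference_async` as `text=record["text"]` and `external_id=record["external_id"]`, mirroring the loop above.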


In [20]:
ask = "我希望整体了解千帆SDK，有什么办法？"
print("===========================\n" + "Query: " + ask + "\n")

results = await kernel.memory.search_async(memory_collection_name, ask, limit=5, min_relevance_score=0.7)

i = 0
for res in results:
    i += 1
    print(f"Result {i}:")
    print("  URL:     : " + res.id)
    print("  Title    : " + res.description)
    print("  Relevance: " + str(res.relevance))
    print()

[INFO] [02-01 13:42:12] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Query: 我希望整体了解千帆SDK，有什么办法？

Result 1:
  URL:     : https://github.com/baidubce/bce-qianfan-sdk/blob/main/README.md
  Title    : README: 千帆SDK介绍，安装，基础使用方法
  Relevance: 0.7502846678234273



The returned documents can then be used to generate the answer.
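In RAG, the generation step places the retrieved snippets into the prompt before asking the question. The sketch below assumes a hypothetical `Hit` container mirroring the `id`, `description` and `relevance` fields printed above; the prompt wording is illustrative, and sending the result to the chat model (for example through a semantic function, as in `setup_chat_with_memory`) is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    # Hypothetical container mirroring the fields printed by the search loop
    id: str
    description: str
    relevance: float


def build_rag_prompt(question: str, hits: List[Hit]) -> str:
    """Number each retrieved snippet, list it under a reference header, then append the question."""
    lines = [f"[{n}] {h.description} (source: {h.id})" for n, h in enumerate(hits, start=1)]
    context = "\n".join(lines) if lines else "(no relevant documents found)"
    return (
        "Answer the question using only the reference material below.\n"
        f"Reference material:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


hits = [
    Hit(
        id="https://github.com/baidubce/bce-qianfan-sdk/blob/main/README.md",
        description="README: 千帆SDK介绍，安装，基础使用方法",
        relevance=0.7502846678234273,
    )
]
print(build_rag_prompt("我希望整体了解千帆SDK，有什么办法？", hits))
```

Numbering the snippets lets the model cite its sources, and the explicit "no relevant documents" line gives it a signal to say it does not know.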

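Besides VolatileMemory, we can also back a large external knowledge base with an external vector database; SK officially provides implementations for common ones such as `chroma` and `pinecone`. Simply replacing the memory_store connects the kernel to Chroma: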

In [21]:
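# Requires the chromadb package (pip install chromadb)
from semantic_kernel.connectors.memory.chroma import (
    ChromaMemoryStore,
)

# Replace the volatile store with a Chroma store persisted under persist_directory;
# records from earlier runs stay on disk, so re-inserting the same ids may trigger
# "existing embedding ID" warnings like the ones below
kernel.register_memory_store(
    memory_store=ChromaMemoryStore(
        persist_directory="./"
    )
)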
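Swapping stores works because the calling code depends only on the store's interface, not on its backend. The sketch below shows that idea with a hypothetical two-method `MemoryStore` protocol (much smaller than SK's real store interface) and two toy backends: one in-process, one persisted as JSON files in a directory, loosely analogous to `persist_directory`.

```python
import json
import os
import tempfile
from typing import Dict, Protocol


class MemoryStore(Protocol):
    """Hypothetical minimal interface: callers depend only on these two methods."""

    def save(self, collection: str, key: str, text: str) -> None: ...

    def get(self, collection: str, key: str) -> str: ...


class DictStore:
    """In-process backend, similar in spirit to VolatileMemoryStore (lost on restart)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def save(self, collection: str, key: str, text: str) -> None:
        self._data.setdefault(collection, {})[key] = text

    def get(self, collection: str, key: str) -> str:
        return self._data[collection][key]


class JsonFileStore:
    """Persistent backend: one JSON file per collection under `directory`."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, collection: str) -> str:
        return os.path.join(self.directory, f"{collection}.json")

    def _load(self, collection: str) -> Dict[str, str]:
        path = self._path(collection)
        if not os.path.exists(path):
            return {}
        with open(path, encoding="utf-8") as f:
            return json.load(f)

    def save(self, collection: str, key: str, text: str) -> None:
        data = self._load(collection)
        data[key] = text
        with open(self._path(collection), "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)

    def get(self, collection: str, key: str) -> str:
        return self._load(collection)[key]


def populate(store: MemoryStore) -> None:
    # Identical call site no matter which backend is plugged in
    store.save("aboutMe", "info1", "我名字叫做小度")


for backend in (DictStore(), JsonFileStore(tempfile.mkdtemp())):
    populate(backend)
    print(type(backend).__name__, backend.get("aboutMe", "info1"))
```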

In [22]:
await populate_memory(kernel)

[INFO] [02-01 13:42:23] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Insert of existing embedding ID: info1
Add of existing embedding ID: info1
[INFO] [02-01 13:42:23] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Insert of existing embedding ID: info2
Add of existing embedding ID: info2
[INFO] [02-01 13:42:24] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Insert of existing embedding ID: info3
Add of existing embedding ID: info3
[INFO] [02-01 13:42:24] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Insert of existing embedding ID: info4
Add of existing embedding ID: info4
[INFO] [02-01 13:42:25] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Insert of existing embedding ID: info5
Add of existing embedding ID: info5


In [23]:
await search_memory_examples(kernel)

[INFO] [02-01 13:42:26] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Question: 我的名字是？


Chroma returns distance score not cosine similarity score.                So embeddings are automatically queried from database for calculation.
[INFO] [02-01 13:42:27] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我名字叫做小度

Question: 我在哪里工作？


Chroma returns distance score not cosine similarity score.                So embeddings are automatically queried from database for calculation.
[INFO] [02-01 13:42:27] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我工作在baidu

Question: 我去过哪些地方旅游?


Chroma returns distance score not cosine similarity score.                So embeddings are automatically queried from database for calculation.
[INFO] [02-01 13:42:28] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我曾去过北京，上海，深圳

Question: 我的家乡是?


Chroma returns distance score not cosine similarity score.                So embeddings are automatically queried from database for calculation.
[INFO] [02-01 13:42:28] openapi_requestor.py:166 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1


Answer: 我来自中国

Question: 我的爱好是？


Chroma returns distance score not cosine similarity score.                So embeddings are automatically queried from database for calculation.


Answer: 我爱打羽毛球
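The repeated log line notes that Chroma returns a distance rather than a cosine similarity, so SK queries the stored embeddings and computes the similarity itself. For unit-length vectors the two are directly related: the squared Euclidean distance satisfies |a - b|^2 = 2 - 2 cos(a, b), so both measures rank neighbours identically. The helper names below are hypothetical and only check that identity.

```python
import math
from typing import List


def normalize(v: List[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def squared_l2(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_from_squared_l2(d: float) -> float:
    # Valid only for unit-length vectors: |a - b|^2 = 2 - 2 cos(a, b)
    return 1.0 - d / 2.0


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
d = squared_l2(a, b)
cos_direct = sum(x * y for x, y in zip(a, b))
print(round(d, 4), round(cosine_from_squared_l2(d), 4), round(cos_direct, 4))
# 0.08 0.96 0.96
```

If the vectors are not normalized, this conversion does not hold, which is one reason to recompute cosine similarity from the embeddings directly.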

