# LlamaIndex + InternLM2 RAG in Practice

In this notebook, we use LlamaIndex and InternLM2-chat-1.8b to query a knowledge base. First, install LlamaIndex.

In [None]:
!pip install llama-index llama-index-llms-huggingface "transformers[torch]==4.41.1" "huggingface_hub[inference]==0.23.1" sentence-transformers sentencepiece
!pip install einops



## LlamaIndex HuggingFaceLLM

Before applying RAG, let's test how the model answers the question "xtuner是什么？" ("What is xtuner?").

In [None]:
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.llms import ChatMessage

# InternLM2 ships custom modeling code on the Hub, so trust_remote_code is
# required for both the model and the tokenizer.
llm = HuggingFaceLLM(
    model_name="internlm/internlm2-chat-1_8b",
    tokenizer_name="internlm/internlm2-chat-1_8b",
    model_kwargs={"trust_remote_code": True},
    tokenizer_kwargs={"trust_remote_code": True},
)

# Ask the model directly, with no retrieved context.
rsp = llm.chat(messages=[ChatMessage(content="xtuner是什么？")])
print(rsp)



assistant: xtuner是一款用于播放音乐的软件，它支持多种音频格式，包括MP3、WAV、WMA、FLAC、AAC、APE、OGG、WMA、WAV、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、WMA、W


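As the output shows, the answer is not what we want: the model claims xtuner is a music player and then degenerates into repeating "WMA" until the output is cut off. Next, let's see whether RAG helps.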
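
The core idea of RAG is to place retrieved reference text in front of the question so that the model answers from that text rather than from memory. The snippet below is a minimal sketch of that prompt-assembly step. The function name `build_rag_prompt` and the template wording are illustrative assumptions, not LlamaIndex's actual API or template, and the example chunk is hard-coded rather than retrieved.

```python
def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved text chunks and the user question into one prompt."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    # Hypothetical template: context first, then an instruction, then the query.
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Answer the query using only the context above.\n"
        f"Query: {question}\n"
        "Answer: "
    )


# Hard-coded stand-in for a chunk that a retriever would return.
chunks = ["XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库。"]
prompt = build_rag_prompt("xtuner是什么？", chunks)
print(prompt)
```

The LLM then receives `prompt` instead of the bare question, which gives it the facts it was missing in the test above.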

## LlamaIndex RAG
First, install the required dependencies.

In [None]:
!pip install llama-index-embeddings-huggingface llama-index-embeddings-instructor

Collecting llama-index-embeddings-huggingface
  Downloading llama_index_embeddings_huggingface-0.2.1-py3-none-any.whl (7.1 kB)
Collecting llama-index-embeddings-instructor
  Downloading llama_index_embeddings_instructor-0.1.3-py3-none-any.whl (3.6 kB)
Collecting instructorembedding<2.0.0,>=1.0.1 (from llama-index-embeddings-instructor)
  Downloading InstructorEmbedding-1.0.1-py2.py3-none-any.whl (19 kB)
Installing collected packages: instructorembedding, llama-index-embeddings-instructor, llama-index-embeddings-huggingface
Successfully installed instructorembedding-1.0.1 llama-index-embeddings-huggingface-0.2.1 llama-index-embeddings-instructor-0.1.3


Here we put xtuner's Chinese README (`README_zh-CN.md`) into the `data` directory to serve as the knowledge base.

In [None]:
!mkdir data
!git clone https://github.com/InternLM/xtuner.git
!mv xtuner/README_zh-CN.md ./data

Cloning into 'xtuner'...
remote: Enumerating objects: 8423, done.
remote: Counting objects: 100% (5457/5457), done.
remote: Compressing objects: 100% (884/884), done.
remote: Total 8423 (delta 5061), reused 4662 (delta 4571), pack-reused 2966
Receiving objects: 100% (8423/8423), 1.64 MiB | 14.76 MiB/s, done.
Resolving deltas: 100% (6455/6455), done.


In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Multilingual sentence-embedding model used to vectorize the documents and the query.
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)

# Register the embedding model and the LLM as global defaults.
# `llm` is the HuggingFaceLLM instance created in the earlier cell.
Settings.embed_model = embed_model
Settings.llm = llm

# Load every file under ./data, build an in-memory vector index, and query it.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("xtuner是什么?")

print(response)

XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库。
file_path: /content/data/README_zh-CN.md

XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库。
file_path: /content/data/README_zh-CN.md

XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库。
file_path: /content/data/README_zh-CN.md

XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库。
file_path: /content/data/README_zh-CN.md

XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库。
file_path: /content/data/README_zh-CN.md

XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库。
file_path: /content/data/README_zh-CN.md

XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库。
file_path: /content/data/README_zh-CN.md

XTuner 是一个高效、灵活、全能的轻量化大模型微调工具库


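This answer draws on the content of xtuner's README ("XTuner is an efficient, flexible, and full-featured lightweight toolkit for fine-tuning large models"), which is what we expected, although the small model repeats the sentence several times and echoes the `file_path` metadata.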
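
To give a rough intuition for what a vector index does when it answers a query, here is a toy sketch of similarity-based retrieval: each chunk and the query are turned into vectors, chunks are ranked by cosine similarity, and the top results are returned as context. Everything in it is an illustrative assumption: `embed` uses character-bigram counts as a stand-in for a neural embedding model, the English chunks are made up, and none of this is LlamaIndex's actual implementation.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: character-bigram counts (stand-in for a real embedding model)."""
    cleaned = text.lower()
    return Counter(cleaned[i:i + 2] for i in range(len(cleaned) - 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Return the top_k chunks most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up chunks standing in for pieces of a knowledge base.
chunks = [
    "XTuner is a lightweight toolkit for fine-tuning large models.",
    "Install the dependencies with pip before running the notebook.",
    "The weather was sunny all week.",
]
results = retrieve("What is XTuner?", chunks, top_k=2)
for score, chunk in results:
    print(f"{score:.3f}  {chunk}")
```

In a real pipeline the retrieved chunks are inserted into the prompt sent to the LLM, which is why the answer above quotes the README.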