# Calling Qdrant through the LlamaIndex framework

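Open issue:

- This notebook uses the same embedding as `simple.ipynb`, yet the cosine similarity differs: `0.5774549345488187` here versus `0.6095981240348884`.
    - Preliminary conclusion: the gap is an effect of the prompt; see [Comparing several ways of loading embeddings](./simple-llamaindex.ipynb).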
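
The scores compared above are cosine similarities between the query vector and a stored vector. As a reminder of how such a score is computed, and why embedding even slightly different input text shifts it, here is a minimal standard-library sketch. The vectors are toy values invented for illustration and are not real embeddings; `cosine_similarity` is a hypothetical helper, not part of LlamaIndex or Qdrant.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (assumption: invented values, only to show the mechanics).
query_vec = [1.0, 0.0, 0.0]
doc_vec_plain = [1.0, 1.0, 0.0]     # stands in for "description only"
doc_vec_prefixed = [1.0, 2.0, 0.0]  # stands in for "extra text + description"

score_plain = cosine_similarity(query_vec, doc_vec_plain)
score_prefixed = cosine_similarity(query_vec, doc_vec_prefixed)
print(score_plain, score_prefixed)
```

Any change to the text that is embedded moves the document vector, so two pipelines that share a model can still report different scores for the same query.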

## Setup

In [1]:
%%time
%%capture

persist_dir = "/tmp/qdrant_my_books"
!rm -rf $persist_dir

CPU times: user 7.29 ms, sys: 4.08 ms, total: 11.4 ms
Wall time: 111 ms


In [2]:
%%time
%%capture

!pip install llama-index-vector-stores-qdrant
!pip install qdrant_client
!pip install llama-index-embeddings-fastembed
!pip install llama-index-embeddings-huggingface
# !pip install fastembed-gpu
!pip install transformers

CPU times: user 50.6 ms, sys: 18.5 ms, total: 69.1 ms
Wall time: 19.6 s


In [3]:
%%time

books = [
    {
        "name": "围城",
        "description": "主人公方鸿渐留学回国后，面临找工作和个人感情的种种问题。他经历了几段感情波折，包括与鲍小姐的失败婚姻和与孙柔嘉的恋情，最终与孙柔嘉结婚。但婚后生活并不如意，他在事业上也遭遇挫折，未能实现自己的理想。",
        "author": "钱钟书",
    },
    {
        "name": "故乡",
        "description": "小说讲述了主人公“我”（即鲁迅的化身）在阔别多年后回到故乡接母亲到城里生活的故事。在故乡，他遇到了童年的玩伴闰土和老仆人杨二嫂。通过与他们的交谈和观察，主人公感受到故乡的变化和人们生活的困苦。",
        "author": "鲁迅",
    },
    {
        "name": "阿Q正传",
        "description": "讲述了阿Q这个贫苦农民在中国封建社会中的悲惨生活。他虽然穷困潦倒，但心态自负，总是以精神胜利法来安慰自己，逃避现实的困境。然而，随着社会动荡和革命的到来，阿Q的命运变得更加悲惨，最终被误认为是革命党人而被处死。",
        "author": "鲁迅",
    },
]

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 3.81 µs


In [4]:
%%time

from llama_index.core import Document

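documents = []

# Wrap each book as a LlamaIndex Document: the description becomes the text,
# name and author are stored as metadata.
for book in books:
    document = Document(
        text=book["description"],
        metadata={"name": book["name"], "author": book["author"]},
    )
    documents.append(document)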

CPU times: user 2.62 s, sys: 363 ms, total: 2.99 s
Wall time: 2.68 s


In [5]:
%%time

from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

# Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-zh-v1.5")  # also works
Settings.embed_model = HuggingFaceEmbedding(model_name="/models/bge-small-zh-v1.5")

CPU times: user 1.51 s, sys: 216 ms, total: 1.72 s
Wall time: 1.69 s


In [6]:
Settings.embed_model.max_length

512
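
`max_length` is the maximum input length in tokens. The size of the vectors the model returns is a separate property, and it is the one a Qdrant collection's `size` has to match. For `bge-small-zh-v1.5` both values are 512, but that does not hold in general. One way to read the dimension directly is to embed a probe string and take the length of the result. The sketch below uses a hypothetical helper `embedding_dimension` and a hypothetical stand-in embedder `fake_embed` so that it runs without loading a model.

```python
from typing import Callable, Sequence


def embedding_dimension(
    embed_text: Callable[[str], Sequence[float]],
    probe: str = "dimension probe",
) -> int:
    """Return the length of the vector that embed_text produces for a probe string."""
    return len(embed_text(probe))


# Hypothetical stand-in embedder (assumption: not a real model) that returns
# a fixed 512-dimensional vector.
def fake_embed(text: str) -> list:
    return [0.0] * 512


dim = embedding_dimension(fake_embed)
print(dim)

# With the model configured in the cell above, the call would look like:
# dim = embedding_dimension(Settings.embed_model.get_text_embedding)
```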

## Using Qdrant

### Set the storage location

In [7]:
%%time

collection_name="my_books"

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 3.58 µs


In [8]:
%%time

from qdrant_client import QdrantClient
from qdrant_client import models

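# In-memory Qdrant instance: nothing is written to disk and the data is lost
# when the process exits.
client = QdrantClient(":memory:")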


if not client.collection_exists(collection_name):
    client.create_collection(
        collection_name=collection_name,
        vectors_config=models.VectorParams(
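            # NOTE: max_length is the model's maximum input length in tokens, not
            # the embedding dimension. Both are 512 for bge-small-zh-v1.5, so this
            # works here; other models need the actual vector size.
            size=Settings.embed_model.max_length,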
            distance=models.Distance.COSINE,
        ),
    )

CPU times: user 794 ms, sys: 12.4 ms, total: 806 ms
Wall time: 805 ms


### Build the VectorStoreIndex

In [9]:
%%time

from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex
from qdrant_client import models

vector_store = QdrantVectorStore(
    client=client, 
    collection_name=collection_name,
    vectors_config=models.VectorParams(
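        # NOTE: max_length is a token limit, not the vector size; the two are
        # both 512 for bge-small-zh-v1.5.
        size=Settings.embed_model.max_length,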
        distance=models.Distance.COSINE,
    ),
)
storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
    # persist_dir=persist_dir,
)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)

CPU times: user 422 ms, sys: 54.8 ms, total: 476 ms
Wall time: 474 ms


### Save and load

In [10]:
%%time

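# Writes LlamaIndex's own storage files to persist_dir. The vectors stay in the
# in-memory Qdrant client, so the same vector_store has to be passed again on load.
index.storage_context.persist(persist_dir)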

CPU times: user 2.2 ms, sys: 0 ns, total: 2.2 ms
Wall time: 2.45 ms


In [11]:
%%time

from llama_index.core import load_index_from_storage

storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
    persist_dir=persist_dir
)
index = load_index_from_storage(storage_context)

CPU times: user 2.11 ms, sys: 0 ns, total: 2.11 ms
Wall time: 1.79 ms


### Query

In [12]:
%%time

retriever = index.as_retriever()
nodes = retriever.retrieve("方鸿渐")

nodes

CPU times: user 18.5 ms, sys: 0 ns, total: 18.5 ms
Wall time: 17.8 ms


[NodeWithScore(node=TextNode(id_='7a0737ef-ed8d-4a29-ab67-c44df797e19b', embedding=None, metadata={'name': '围城', 'author': '钱钟书'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='d4dfd37a-2039-4b85-a51e-0c8aee1b99db', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'name': '围城', 'author': '钱钟书'}, hash='56789afd35feb54babd9e0b0e5d2829f958d312139246f30ce119b1451af4a01')}, text='主人公方鸿渐留学回国后，面临找工作和个人感情的种种问题。他经历了几段感情波折，包括与鲍小姐的失败婚姻和与孙柔嘉的恋情，最终与孙柔嘉结婚。但婚后生活并不如意，他在事业上也遭遇挫折，未能实现自己的理想。', start_char_idx=0, end_char_idx=99, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.5774549345488187),
 NodeWithScore(node=TextNode(id_='a2d82a2c-9333-43b4-b727-dd00a4c26797', embedding=None, metadata={'name': '阿Q正传', 'author': '鲁迅'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(nod
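
The `TextNode` repr above carries three formatting fields: `text_template`, `metadata_template`, and `metadata_seperator`. The sketch below re-implements that formatting by hand to show what string they produce for the first book. It rests on an assumption: that LlamaIndex applies these templates when it prepares node text for embedding, with metadata included unless listed in `excluded_embed_metadata_keys`. `render_node_text` is a hypothetical helper, and the content is a placeholder rather than the full description.

```python
# Template values copied from the TextNode repr in the output above.
TEXT_TEMPLATE = "{metadata_str}\n\n{content}"
METADATA_TEMPLATE = "{key}: {value}"
METADATA_SEPARATOR = "\n"


def render_node_text(content, metadata):
    """Combine metadata and content the way the three templates describe."""
    metadata_str = METADATA_SEPARATOR.join(
        METADATA_TEMPLATE.format(key=key, value=value)
        for key, value in metadata.items()
    )
    return TEXT_TEMPLATE.format(metadata_str=metadata_str, content=content)


# Placeholder content (assumption: stands in for the real description text).
rendered = render_node_text(
    "<description text>",
    {"name": "围城", "author": "钱钟书"},
)
print(rendered)
```

If the assumption holds, the string that gets embedded is longer than the bare description, which is worth keeping in mind when comparing scores across notebooks.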

### Filter

In [13]:
%%time

from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="author", operator=FilterOperator.EQ, value="钱钟书"
        ),
    ]
)

CPU times: user 131 µs, sys: 0 ns, total: 131 µs
Wall time: 134 µs


In [14]:
%%time

retriever = index.as_retriever(filters=filters)
nodes = retriever.retrieve("方鸿渐")

nodes

CPU times: user 8.89 ms, sys: 0 ns, total: 8.89 ms
Wall time: 9.17 ms


[NodeWithScore(node=TextNode(id_='7a0737ef-ed8d-4a29-ab67-c44df797e19b', embedding=None, metadata={'name': '围城', 'author': '钱钟书'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='d4dfd37a-2039-4b85-a51e-0c8aee1b99db', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'name': '围城', 'author': '钱钟书'}, hash='56789afd35feb54babd9e0b0e5d2829f958d312139246f30ce119b1451af4a01')}, text='主人公方鸿渐留学回国后，面临找工作和个人感情的种种问题。他经历了几段感情波折，包括与鲍小姐的失败婚姻和与孙柔嘉的恋情，最终与孙柔嘉结婚。但婚后生活并不如意，他在事业上也遭遇挫折，未能实现自己的理想。', start_char_idx=0, end_char_idx=99, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.5774549345488187)]
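
Conceptually, an `EQ` metadata filter keeps only the nodes whose metadata value for the given key equals the given value, and similarity ranking then runs over what remains. That is consistent with the single result above. The plain-Python sketch below illustrates the idea on the notebook's three books; it is not how Qdrant evaluates filters internally, and `matches_eq` is a hypothetical helper.

```python
def matches_eq(metadata, key, value):
    """True when the metadata has the key and its value equals the target."""
    return metadata.get(key) == value


# Metadata of the three books defined earlier in the notebook.
books_metadata = [
    {"name": "围城", "author": "钱钟书"},
    {"name": "故乡", "author": "鲁迅"},
    {"name": "阿Q正传", "author": "鲁迅"},
]

# Keep only the books written by 钱钟书, mirroring the filter built above.
kept = [m["name"] for m in books_metadata if matches_eq(m, "author", "钱钟书")]
print(kept)
```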