# Using the keyword table index in `default` mode

Obsidian is used as the data source.

This notebook evaluates whether the keyword table index in `default` mode can complement embedding-based indexing.

Preliminary conclusions:

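- Index creation is slow. 2 notes (loaded as 21 documents) took `19.5 s`, roughly 5 times as long as building the embedding index.
- Query results look fine, and queries run faster than embedding-based retrieval.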

## Global settings

In [1]:
%%time

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(
    base_url="http://ape:11434",
    model="qwen2",
    is_chat_model=True,
    temperature=0.1,
    request_timeout=60.0
)

Settings.embed_model = OllamaEmbedding(
    model_name="quentinz/bge-large-zh-v1.5",
    base_url="http://ape:11434",
    ollama_additional_kwargs={"mirostat": 0}, # mirostat N: Mirostat sampling mode (0 = disabled)
)

CPU times: user 2.17 s, sys: 240 ms, total: 2.41 s
Wall time: 2.03 s


## Loading documents

In [3]:
%%time

from llama_index.readers.obsidian import ObsidianReader


documents = ObsidianReader(
    "/models/obsidian/kb/知识库"
).load_data() 

len(documents)

CPU times: user 2.22 ms, sys: 0 ns, total: 2.22 ms
Wall time: 2.21 ms


21
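The vault yields 21 documents, although the conclusions mention only 2 notes. One plausible explanation, which is an assumption not verified here, is that the Obsidian reader splits each markdown note at its headings, with one document per section. This would also fit the first source node shown later, whose text begins with a section title. The sketch below is a simplified, hypothetical splitter (`split_by_headings` is not a llama_index function) that illustrates the idea. It deliberately ignores lines inside fenced code blocks, so a shell comment such as `# tty=true -t` is not mistaken for a heading.

```python
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep this example self-contained
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> list[str]:
    """Split a markdown note into sections, starting a new one at each heading.

    Lines inside fenced code blocks are never treated as headings.
    """
    sections: list[list[str]] = [[]]
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
        elif not in_fence and HEADING.match(line):
            sections.append([])  # a real heading opens a new section
        sections[-1].append(line)
    # Drop sections that contain only whitespace (e.g. an empty preamble).
    joined = ["\n".join(section).strip() for section in sections]
    return [text for text in joined if text]


note = "\n".join([
    "# docker 容器在后台执行",
    FENCE + "bash",
    "# tty=true -t",
    "docker run -d -t --rm bash",
    FENCE,
    "# Docker 加速相关配置",
    "registry mirror settings",
])
print(len(split_by_headings(note)))  # 2
```

With this model, a handful of notes with many headings easily turns into a few dozen documents, and every document later costs at least one LLM call during indexing.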

## Creating the index

In [4]:
%%time

from llama_index.core import KeywordTableIndex

index = KeywordTableIndex.from_documents(
    documents
)

CPU times: user 1.29 s, sys: 27 ms, total: 1.32 s
Wall time: 19.5 s
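The CPU time (1.32 s) is small compared to the wall time (19.5 s), so most of the build is likely spent waiting on the Ollama server. In `default` mode, `KeywordTableIndex` asks the LLM to extract keywords for each node. With 21 documents, that works out to an average of at most roughly 19.5 / 21 ≈ 0.93 s per call, and less if the documents were split into more nodes. The sketch below is a simplified model of the resulting keyword table, not the library's implementation. `build_keyword_table` and `fake_llm_keywords` are hypothetical names, and the fake extractor stands in for the LLM call.

```python
from collections import defaultdict
from typing import Callable


def build_keyword_table(
    nodes: dict[str, str],
    extract_keywords: Callable[[str], set[str]],
) -> dict[str, set[str]]:
    """Map each (lower-cased) keyword to the ids of the nodes it came from."""
    table: dict[str, set[str]] = defaultdict(set)
    for node_id, text in nodes.items():
        # One extractor call per node: with a real LLM this is the slow part.
        for keyword in extract_keywords(text):
            table[keyword.lower()].add(node_id)
    return dict(table)


calls = 0


def fake_llm_keywords(text: str) -> set[str]:
    """Hypothetical stand-in for the LLM: alphabetic words longer than 2 chars."""
    global calls
    calls += 1
    return {word for word in text.split() if word.isalpha() and len(word) > 2}


nodes = {
    "n1": "docker run -d -t --rm bash",
    "n2": "Docker registry mirror settings",
}
table = build_keyword_table(nodes, fake_llm_keywords)
print(calls)                    # 2: one extractor call per node
print(sorted(table["docker"]))  # ['n1', 'n2']
```

Because the number of extractor calls grows linearly with the number of nodes, build time for a larger vault should grow at least proportionally when the LLM is the bottleneck.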


## Querying

In [5]:
%%time

query_engine = index.as_query_engine(
    streaming=True
)
streaming_response = query_engine.query("docker容器在后台执行的命令")  # "commands for running a docker container in the background"
streaming_response.print_response_stream()

The command to run a docker container in the background is `docker run -d`. For example:

```bash
docker run -d --rm bash
```

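Here the `-d` flag starts the container in the background, and `--rm` removes the container automatically when it exits. So this command starts a bash shell in the background, and the container is removed automatically once the work is done or an error occurs.
CPU times: user 186 ms, sys: 4.18 ms, total: 190 ms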
Wall time: 2.67 s


In [6]:
len(streaming_response.source_nodes)

10
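The query returned 10 source nodes, and the nodes inspected below have `score=None`. This is consistent with how the keyword-table retriever appears to work. It extracts keywords from the question, looks them up in the table, ranks nodes by how many query keywords they match, and keeps the top `num_chunks_per_query` nodes without computing a similarity score. The default of 10 is an assumption based on the llama_index versions we are aware of, so check it against the installed version. The sketch below models only that ranking step. `retrieve` is a hypothetical function, and the keywords are hand-picked from the query (后台 = background, 命令 = command).

```python
from collections import Counter


def retrieve(
    table: dict[str, set[str]],
    query_keywords: set[str],
    num_chunks_per_query: int = 10,
) -> list[str]:
    """Rank node ids by how many query keywords point at them."""
    hits: Counter = Counter()
    for keyword in query_keywords:
        for node_id in table.get(keyword.lower(), set()):
            hits[node_id] += 1
    # Most matches first; ties are broken by node id so the order is deterministic.
    ranked = sorted(hits, key=lambda node_id: (-hits[node_id], node_id))
    return ranked[:num_chunks_per_query]


table = {
    "docker": {"n1", "n2", "n3"},
    "后台": {"n1"},
    "命令": {"n1", "n3"},
}
print(retrieve(table, {"docker", "后台", "命令"}))           # ['n1', 'n3', 'n2']
print(retrieve(table, {"docker"}, num_chunks_per_query=2))  # ['n1', 'n2']
```

Because only a dictionary lookup happens per keyword, retrieval itself is cheap. The remaining query latency comes from the LLM calls for query keyword extraction and answer synthesis.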

In [7]:
streaming_response.source_nodes[0]

NodeWithScore(node=TextNode(id_='e4e3bdcd-5814-45a6-a70c-45219d9e414b', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='61f6ff28-ead9-45a5-8143-5bf6da72dbcd', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='c14c8b160d0f44e1fe479685616bd0d906b78ea2587e38eabfc96295ecca7f2d')}, text="docker 容器在后台执行\n\n```bash\n# tty=true -t\ndocker run -d -t --rm bash\n```\n\ndocker-compose.yaml:\n\n```yaml\nversion: '3'\nservices:\n  bash:\n    container_name: bash\n    image: bash:latest\n    tty: true\n```", mimetype='text/plain', start_char_idx=2, end_char_idx=198, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None)

In [8]:
streaming_response.source_nodes[5]

NodeWithScore(node=TextNode(id_='674f88e5-e6d2-4b5d-b7f4-cd5ce83d3872', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='aa2835a2-b111-46ca-8550-42bece16b290', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='84ad709a0fbb660867926152c5028e5e32e3c3556ad739945dc1431d6b076da2')}, text='Docker 加速相关配置', mimetype='text/plain', start_char_idx=2, end_char_idx=15, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None)
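Printing a raw `NodeWithScore` dumps every field, which makes it hard to compare the 10 sources at a glance. The hypothetical helper below prints one line per source node. It relies only on the `node.text` and `score` attributes visible in the output above, and uses duck typing so it can be tried with stand-in objects without llama_index installed.

```python
from types import SimpleNamespace


def summarize_sources(source_nodes, width: int = 40) -> list[str]:
    """Return one line per source node: index, score and the first line of its text."""
    lines = []
    for i, nws in enumerate(source_nodes):
        text = nws.node.text.strip()
        first_line = text.splitlines()[0] if text else ""
        score = "n/a" if nws.score is None else f"{nws.score:.3f}"
        lines.append(f"[{i}] score={score} {first_line[:width]}")
    return lines


# Stand-ins shaped like NodeWithScore objects (only the attributes used above).
demo = [
    SimpleNamespace(node=SimpleNamespace(text="docker 容器在后台执行\n\n..."), score=None),
    SimpleNamespace(node=SimpleNamespace(text="Docker 加速相关配置"), score=None),
]
for line in summarize_sources(demo):
    print(line)

# With the real response object: summarize_sources(streaming_response.source_nodes)
```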