# Using the keyword index in RAKE mode

Uses an Obsidian vault as the data source.

Evaluates whether the keyword table index in `rake` mode can complement the embedding-based approach.

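Preliminary conclusions:

- Results are poor: the query returns an empty response
- The cause is that the RAKE algorithm was designed for English and is not suited to Chinese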
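To illustrate why a RAKE-style extractor struggles with Chinese, here is a minimal sketch of the candidate-phrase step. It is a simplified stand-in, not the `rake_nltk` implementation: the `candidate_phrases` function and the tiny `STOPWORDS` set are assumptions made for illustration. The idea is that phrases are delimited by word boundaries and stopwords, which Chinese text without spaces does not provide.

```python
import re

# Tiny illustrative stopword list (assumption; real RAKE uses a full English list).
STOPWORDS = {"the", "a", "in", "of", "to", "is"}


def candidate_phrases(text):
    """Simplified RAKE-style candidate extraction.

    Splits the text into word tokens, then breaks phrases at stopwords.
    """
    words = re.findall(r"\w+", text.lower())
    phrases, current = [], []
    for word in words:
        if word in STOPWORDS:
            # A stopword closes the phrase collected so far.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))
    return phrases


english = candidate_phrases("command to run a docker container in the background")
chinese = candidate_phrases("docker容器在后台执行的命令")
print(english)
print(chinese)
```

In this sketch the English sentence breaks into several short phrases, while the Chinese query stays one unsegmented token, so it could only match a document that contains exactly the same string.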

## Global settings

In [1]:
%%time

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(
    base_url="http://ape:11434",
    model="qwen2",
    is_chat_model=True,
    temperature=0.1,
    request_timeout=60.0
)

Settings.embed_model = OllamaEmbedding(
    model_name="quentinz/bge-large-zh-v1.5",
    base_url="http://ape:11434",
    ollama_additional_kwargs={"mirostat": 0},  # mirostat: Mirostat sampling (0 = disabled)
)

CPU times: user 2.08 s, sys: 244 ms, total: 2.33 s
Wall time: 1.94 s


In [4]:
%%time
%%capture

!pip install rake_nltk

CPU times: user 8.13 ms, sys: 4.51 ms, total: 12.6 ms
Wall time: 1.88 s


In [7]:
%%time
%%capture

import nltk

nltk.set_proxy("http://myproxy:7890")
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...


CPU times: user 82.9 ms, sys: 26.1 ms, total: 109 ms
Wall time: 1.43 s


[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


## Load documents

In [2]:
%%time

from llama_index.readers.obsidian import ObsidianReader


documents = ObsidianReader(
    "/models/obsidian/kb/知识库"
).load_data() 

len(documents)

CPU times: user 148 ms, sys: 15.6 ms, total: 164 ms
Wall time: 167 ms


21

## Create the index

In [8]:
%%time

from llama_index.core import RAKEKeywordTableIndex

index = RAKEKeywordTableIndex.from_documents(
    documents,
)

CPU times: user 27.6 ms, sys: 3.44 ms, total: 31 ms
Wall time: 30.8 ms
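Conceptually, a keyword table maps each extracted keyword to the chunks that contain it, and a query retrieves only chunks that share a keyword with it. The following is a rough pure-Python sketch of that idea, not LlamaIndex's actual implementation: `build_table`, `lookup`, and `whitespace_keywords` are hypothetical helpers, and the two sample chunks are made up for illustration.

```python
from collections import defaultdict


def build_table(chunks, extract):
    """Map each extracted keyword to the ids of the chunks containing it."""
    table = defaultdict(set)
    for chunk_id, text in chunks.items():
        for keyword in extract(text):
            table[keyword].add(chunk_id)
    return table


def lookup(table, query, extract):
    """Return ids of chunks sharing at least one keyword with the query."""
    hits = set()
    for keyword in extract(query):
        hits |= table.get(keyword, set())
    return hits


def whitespace_keywords(text):
    # Stand-in extractor (assumption): lowercase and split on whitespace.
    return set(text.lower().split())


# Made-up sample chunks, one English and one Chinese.
chunks = {
    "n1": "docker run -d starts a container in the background",
    "n2": "使用 docker run -d 在后台运行容器",
}
table = build_table(chunks, whitespace_keywords)
spaced_hits = lookup(table, "docker background", whitespace_keywords)
unspaced_hits = lookup(table, "docker容器在后台执行的命令", whitespace_keywords)
print(spaced_hits, unspaced_hits)
```

With a whitespace-based extractor, the spaced query finds both chunks, whereas the unspaced Chinese query becomes a single keyword that is absent from the table and retrieves nothing.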


## Query

In [9]:
%%time

query_engine = index.as_query_engine(
    streaming=True
)
# Query in Chinese: "the command for running a docker container in the background"
streaming_response = query_engine.query("docker容器在后台执行的命令")
streaming_response.print_response_stream()

Empty Response
CPU times: user 2.71 ms, sys: 347 µs, total: 3.06 ms
Wall time: 2.61 ms
