# Testing an index refresh problem found while indexing Obsidian notes

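The problem encountered:

- After refreshing docs, a note that was renamed or moved to another directory keeps its old index records; they are not deleted.

Summary:

- Modifying a doc's content: the index is updated correctly.
- A doc that is left out of the refresh call is not deleted from the index.
- A doc whose id changed is added as a new doc.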
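The three observations above can be reproduced with a toy model. This is only a sketch of the observed behavior, assuming the refresh compares a content hash per doc id; `toy_refresh` and `store` are hypothetical names and this is not LlamaIndex code.

```python
import hashlib


def toy_refresh(store: dict, documents: list) -> list:
    """Toy model of the refresh behavior observed in this notebook.

    `store` maps doc_id -> content hash; `documents` is a list of
    (doc_id, text) pairs. Illustration only, not the library implementation.
    """
    status = []
    for doc_id, text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if store.get(doc_id) == digest:
            status.append(False)    # same id, same content: nothing to do
        else:
            store[doc_id] = digest  # new id or changed content: insert/update
            status.append(True)
    # Ids present in `store` but absent from `documents` are never removed.
    return status


store = {}
toy_refresh(store, [("doc_01", "test1"), ("doc_02", "test2"), ("doc_03", "test3")])

# Content change: only doc_03 is updated.
print(toy_refresh(store, [("doc_01", "test1"), ("doc_02", "test2"), ("doc_03", "test300")]))
# Omitted doc: nothing is reported, and doc_03 stays in the store.
print(toy_refresh(store, [("doc_01", "test1"), ("doc_02", "test2")]))
# Changed id: doc_002 is treated as new, doc_02 stays in the store.
print(toy_refresh(store, [("doc_01", "test1"), ("doc_002", "test2")]))
print(len(store))  # 4 entries: doc_01, doc_02, doc_03, doc_002
```

The printed status lists match the ones recorded in the cells below: `[False, False, True]`, `[False, False]`, and `[False, True]`.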

## Global settings

In [1]:
%%time

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm=Ollama(
    base_url="http://ape:11434",
    model="qwen2",
    is_chat_model=True,
    temperature=0.1,
    request_timeout=60.0
)

Settings.embed_model = OllamaEmbedding(
    model_name="quentinz/bge-large-zh-v1.5",
    base_url="http://ape:11434",
    ollama_additional_kwargs={"mirostat": 0},  # mirostat N selects the Mirostat sampling mode; 0 disables it
)

CPU times: user 2.16 s, sys: 272 ms, total: 2.44 s
Wall time: 2.06 s


## Creating documents manually

In [2]:
%%time

from llama_index.core import Document

documents = [
    Document( text="test1", doc_id="doc_01"),
    Document( text="test2", doc_id="doc_02"),
    Document( text="test3", doc_id="doc_03"),
]

documents[0].id_

CPU times: user 69 µs, sys: 0 ns, total: 69 µs
Wall time: 71.3 µs


'doc_01'

## Building and persisting the index

In [3]:
%%time

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)

CPU times: user 203 ms, sys: 32.3 ms, total: 235 ms
Wall time: 536 ms


In [4]:
%%time

index.storage_context.persist()

CPU times: user 12 ms, sys: 537 µs, total: 12.5 ms
Wall time: 11.8 ms


## Modifying document content - OK

In [5]:
%%time

documents = [
    Document( text="test1", doc_id="doc_01"),
    Document( text="test2", doc_id="doc_02"),
    Document( text="test300", doc_id="doc_03"),
]

status = index.refresh_ref_docs(documents)

status

CPU times: user 463 µs, sys: 4.05 ms, total: 4.51 ms
Wall time: 55.6 ms


[False, False, True]
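The status list is aligned with the input list, so `True` in the third position means `doc_03` was updated. A small helper can turn the flags back into ids. This is a sketch; `changed_ids` is a hypothetical name, and `SimpleNamespace` objects stand in for `Document` instances so the snippet runs without LlamaIndex.

```python
from types import SimpleNamespace


def changed_ids(documents, status):
    """Return the ids of the documents that the refresh reported as inserted or updated."""
    return [doc.id_ for doc, changed in zip(documents, status) if changed]


# Stand-ins for the three Document objects used above (only `id_` is needed).
docs = [SimpleNamespace(id_=f"doc_0{i}") for i in (1, 2, 3)]
print(changed_ids(docs, [False, False, True]))  # ['doc_03']
```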

## Deleting a document - it remains in the index

In [6]:
%%time

documents = [
    Document( text="test1", doc_id="doc_01"),
    Document( text="test2", doc_id="doc_02"),
    # Document( text="test300", doc_id="doc_03"),  # left out to simulate a deleted note
]

status = index.refresh_ref_docs(documents)

status

CPU times: user 93 µs, sys: 13 µs, total: 106 µs
Wall time: 109 µs


[False, False]

In [7]:
%%time

import json

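# On-disk state: this file only reflects the last persist() call.
file_path = './storage/docstore.json'

with open(file_path, 'r', encoding='utf-8') as file:
    data = json.load(file)

# The keys under 'docstore/data' are node ids (UUIDs), not the doc_id values.
list(data.get('docstore/data', {}).keys())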

CPU times: user 214 µs, sys: 30 µs, total: 244 µs
Wall time: 226 µs


['5c335419-aa2c-4cea-b35c-0a91957d6b6d',
 '2bf9031e-69ef-45e0-a4f5-50548ed62655',
 '89b6d84f-44d4-4fb1-8f90-a81a1ea63545']

In [8]:
%%time

index.storage_context.persist()

CPU times: user 11.8 ms, sys: 0 ns, total: 11.8 ms
Wall time: 11.1 ms


In [9]:
%%time

with open(file_path, 'r', encoding='utf-8') as file:
    data = json.load(file)

list(data.get('docstore/data', {}).keys())

CPU times: user 575 µs, sys: 0 ns, total: 575 µs
Wall time: 468 µs


['5c335419-aa2c-4cea-b35c-0a91957d6b6d',
 '2bf9031e-69ef-45e0-a4f5-50548ed62655',
 '9f0cfa71-0bac-4c1b-b45d-3c05f2d975a1']
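The keys listed above are node ids, which makes it hard to see which source doc each entry belongs to. The sketch below maps each doc id to its node ids instead. It assumes the persisted file also contains a `docstore/ref_doc_info` section whose entries carry a `node_ids` list, as the default simple docstore appears to write; check your own `docstore.json` before relying on it. `map_ref_docs` and `load_ref_docs` are hypothetical helper names, and the sample data is shortened for illustration.

```python
import json
from pathlib import Path


def map_ref_docs(data: dict) -> dict:
    """Map each source doc id to the node ids derived from it.

    Assumes the layout {'docstore/ref_doc_info': {doc_id: {'node_ids': [...]}}}.
    """
    ref_info = data.get("docstore/ref_doc_info", {})
    return {doc_id: info.get("node_ids", []) for doc_id, info in ref_info.items()}


def load_ref_docs(path: str) -> dict:
    """Read a persisted docstore file and return the doc id -> node ids mapping."""
    return map_ref_docs(json.loads(Path(path).read_text(encoding="utf-8")))


# Made-up sample with shortened ids, only to show the shape of the result.
sample = {
    "docstore/data": {"5c33": {}, "2bf9": {}, "9f0c": {}},
    "docstore/ref_doc_info": {
        "doc_01": {"node_ids": ["5c33"], "metadata": {}},
        "doc_02": {"node_ids": ["2bf9"], "metadata": {}},
        "doc_03": {"node_ids": ["9f0c"], "metadata": {}},
    },
}
print(map_ref_docs(sample))
```

On the real file, `load_ref_docs('./storage/docstore.json')` should show whether `doc_03` is still present after the refresh that omitted it.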

## Changing a document id - the old entry remains in the index

In [10]:
%%time

documents = [
    Document( text="test1", doc_id="doc_01"),
    Document( text="test2", doc_id="doc_002"),
]

status = index.refresh_ref_docs(documents)

status

CPU times: user 4.51 ms, sys: 0 ns, total: 4.51 ms
Wall time: 107 ms


[False, True]

In [11]:
%%time

index.storage_context.persist()

CPU times: user 11.6 ms, sys: 3.42 ms, total: 15 ms
Wall time: 14.7 ms


In [12]:
%%time

with open(file_path, 'r', encoding='utf-8') as file:
    data = json.load(file)

list(data.get('docstore/data', {}).keys())

CPU times: user 675 µs, sys: 94 µs, total: 769 µs
Wall time: 677 µs


['5c335419-aa2c-4cea-b35c-0a91957d6b6d',
 '2bf9031e-69ef-45e0-a4f5-50548ed62655',
 '9f0cfa71-0bac-4c1b-b45d-3c05f2d975a1',
 '73b7a675-8867-497a-a0c6-c096fb1f17f4']
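Since the refresh never removes anything, one possible workaround is to delete every indexed doc id that is missing from the current document list, which covers both the deleted and the renamed case. The sketch below relies on the `ref_doc_info` property and the `delete_ref_doc` method of the index; it assumes the default in-memory stores used in this notebook, and `ref_doc_info` may not be available with every vector store. `prune_missing_ref_docs` is a hypothetical helper, and `FakeIndex` is a stand-in used only so the snippet runs without LlamaIndex.

```python
from types import SimpleNamespace


def prune_missing_ref_docs(index, documents):
    """Delete indexed source docs whose id is not in `documents`; return the removed ids."""
    current_ids = {doc.id_ for doc in documents}
    # Copy the keys first, because deleting changes the underlying mapping.
    stale_ids = [doc_id for doc_id in list(index.ref_doc_info) if doc_id not in current_ids]
    for doc_id in stale_ids:
        index.delete_ref_doc(doc_id, delete_from_docstore=True)
    return stale_ids


class FakeIndex:
    """Hypothetical stand-in exposing only the two members the helper uses."""

    def __init__(self, ref_doc_ids):
        self.ref_doc_info = {doc_id: None for doc_id in ref_doc_ids}

    def delete_ref_doc(self, ref_doc_id, delete_from_docstore=False):
        self.ref_doc_info.pop(ref_doc_id, None)


# State at the end of this notebook: doc_02 and doc_03 are leftovers.
fake_index = FakeIndex(["doc_01", "doc_02", "doc_03", "doc_002"])
current_docs = [SimpleNamespace(id_="doc_01"), SimpleNamespace(id_="doc_002")]
removed = prune_missing_ref_docs(fake_index, current_docs)
print(removed)                        # ['doc_02', 'doc_03']
print(list(fake_index.ref_doc_info))  # ['doc_01', 'doc_002']
```

With a real index, the call would go after `index.refresh_ref_docs(documents)` and before `index.storage_context.persist()`, so that the pruned state is what gets written to disk.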