# Loading and Querying Feishu Docs

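Preliminary conclusions:

- Feishu documents can be loaded and queried.
- Query results can be traced back to the source document through the `document_id` stored in each source node's metadata.
- A document can only be added by giving its document ID.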
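
Because the reader takes document IDs rather than links, it can help to pull the ID out of a pasted link. The sketch below assumes that a Feishu document link carries the ID as its last path segment (for example `https://example.feishu.cn/docx/<document_id>`). That URL layout and the helper name `extract_doc_id` are assumptions for illustration; this notebook does not verify them.

```python
from urllib.parse import urlparse


def extract_doc_id(link_or_id: str) -> str:
    """Return the document ID from a link, or the input itself if it is already an ID.

    Assumption: the ID is the last non-empty path segment of the link.
    """
    text = link_or_id.strip()
    if "://" not in text:
        return text  # no URL scheme, so treat the input as a bare ID
    path = urlparse(text).path  # drops the query string and fragment
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError(f"no document ID found in {link_or_id!r}")
    return segments[-1]
```

The result can then be collected into the `doc_ids` list that is passed to the reader.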

## Global Settings

In [2]:
%%time
%%capture

!pip install llama-index-readers-feishu-docs

CPU times: user 12.5 ms, sys: 7.52 ms, total: 20 ms
Wall time: 2.56 s


In [3]:
%%time

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(
    base_url="http://ape:11434",
    model="qwen2",
    is_chat_model=True,
    temperature=0.1,
    request_timeout=60.0
)

Settings.embed_model = OllamaEmbedding(
    model_name="quentinz/bge-large-zh-v1.5",
    base_url="http://ape:11434",
    ollama_additional_kwargs={"mirostat": 0},  # mirostat N: use Mirostat sampling (0 disables it)
)

CPU times: user 2.26 ms, sys: 3.8 ms, total: 6.06 ms
Wall time: 5.45 ms
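
Both settings assume that the Ollama server at `http://ape:11434` already has the `qwen2` and `quentinz/bge-large-zh-v1.5` models pulled. A pre-flight check along the following lines could confirm that before any query is made. It calls Ollama's `GET /api/tags` endpoint, which lists local models. The helper names are hypothetical, and the tag handling assumes names are returned in the form `name:tag`.

```python
import json
from urllib.request import urlopen


def installed_models(payload: dict) -> set:
    """Collect model names (without the ':tag' suffix) from an /api/tags payload."""
    return {m["name"].split(":")[0] for m in payload.get("models", [])}


def missing_models(base_url: str, required: list) -> list:
    """Return the required models that the Ollama server does not list."""
    with urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        payload = json.load(resp)
    available = installed_models(payload)
    return [name for name in required if name.split(":")[0] not in available]
```

For this notebook the call would be `missing_models("http://ape:11434", ["qwen2", "quentinz/bge-large-zh-v1.5"])`; an empty list means both models are available.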


## Loading Documents

In [20]:
%%time

app_id = "cli_a6446f9440cad00c"
app_secret = "G0vGE6ZSWhj6rq1ae6UBlhwUViBuTRzY"
doc_ids = ["Kws2d8Y97oyu0wxadvvcwlbenNc"]

from llama_index.readers.feishu_docs import FeishuDocsReader

documents = FeishuDocsReader(app_id, app_secret).load_data(document_ids=doc_ids)

CPU times: user 8.1 ms, sys: 1.07 ms, total: 9.17 ms
Wall time: 500 ms
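
Hard-coding `app_id` and `app_secret` in a notebook makes them easy to leak when the notebook is shared. A minimal alternative, assuming the credentials are exported as environment variables (the names `FEISHU_APP_ID` and `FEISHU_APP_SECRET` are hypothetical), might look like this:

```python
import os


def load_feishu_credentials(env=None) -> tuple:
    """Read the app ID and secret from environment variables.

    The variable names are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    missing = [k for k in ("FEISHU_APP_ID", "FEISHU_APP_SECRET") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return env["FEISHU_APP_ID"], env["FEISHU_APP_SECRET"]
```

With this in place, `app_id, app_secret = load_feishu_credentials()` would replace the two literal assignments, and the rest of the cell stays the same.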


In [21]:
len(documents)

1

## Building the Index

In [22]:
%%time

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents
)

CPU times: user 20.1 ms, sys: 0 ns, total: 20.1 ms
Wall time: 1.32 s
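
Conceptually, a vector index stores one embedding per text chunk and answers a query by returning the chunks whose embeddings are most similar to the query embedding. The toy sketch below illustrates that idea with cosine similarity over hand-written vectors. It is not LlamaIndex's implementation, and the vectors and function names are made up for illustration.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks: dict, k: int = 2) -> list:
    """Return the IDs of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
    return ranked[:k]


# Made-up 3-dimensional "embeddings" for three chunks.
chunks = {
    "proxy": [0.9, 0.1, 0.0],
    "restart": [0.1, 0.9, 0.0],
    "timezone": [0.0, 0.1, 0.9],
}
print(top_k([0.8, 0.3, 0.0], chunks))  # ['proxy', 'restart']
```

In the real index the vectors come from the embedding model configured in `Settings.embed_model`, and the retrieved chunks are handed to the LLM as context.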


## Querying

In [23]:
%%time

query_engine = index.as_query_engine(
    streaming=True
)
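# Query (in Chinese, matching the indexed document): "the command for running a Docker container in the background"
streaming_response = query_engine.query("docker容器在后台执行的命令")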
streaming_response.print_response_stream()

The command for running a Docker container in the background is `docker run -d -t --rm bash`.
CPU times: user 80.2 ms, sys: 1.99 ms, total: 82.2 ms
Wall time: 2.77 s
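
Streamed output does not necessarily end with a newline, so anything printed afterwards (such as the `%%time` statistics) can run on directly after the answer. A small wrapper, sketched below, writes the tokens as they arrive, ends with a newline, and also returns the full text for later use. The helper name `print_stream` is hypothetical, and the usage note assumes the streaming response exposes its token generator as `response_gen`.

```python
import sys
from typing import Iterable, TextIO


def print_stream(tokens: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write tokens as they arrive, end with a newline, and return the full text."""
    parts = []
    for token in tokens:
        out.write(token)
        out.flush()  # show each token immediately
        parts.append(token)
    out.write("\n")
    return "".join(parts)
```

Under that assumption, `answer = print_stream(streaming_response.response_gen)` would take the place of `streaming_response.print_response_stream()`.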


In [16]:
%%time

# Query: "the command for restarting Docker"
streaming_response = query_engine.query("重启docker的命令")
streaming_response.print_response_stream()

To restart the Docker daemon, you can use the following command:

```bash
sudo systemctl daemon-reload && sudo systemctl restart docker
```

This command first reloads the system startup configuration files (`daemon-reload`) and then restarts the Docker service.
CPU times: user 94.2 ms, sys: 19 ms, total: 113 ms
Wall time: 1.36 s


In [17]:
%%time

# Query: "how to set up a proxy server for Docker"
streaming_response = query_engine.query("docker怎么设置代理服务器")
streaming_response.print_response_stream()

To set up a proxy server in Docker, edit the `~/.docker/config.json` file. Here is an example configuration:

```json
{
  "proxies": {
    "default": {
      "httpProxy": "http://10.1.1.100:8118",
      "httpsProxy": "http://10.1.1.100:8118",
      "noProxy": "localhost,*.test.example.com,.example2.com"
    }
  }
}
```

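Add the configuration above to `~/.docker/config.json`. Here `httpProxy` and `httpsProxy` specify the addresses of the HTTP and HTTPS proxy servers, while `noProxy` lists the domains or IP addresses that should not go through the proxy.

After editing, restart the Docker service for the change to take effect: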

```bash
sudo systemctl daemon-reload && sudo systemctl restart docker
```

This way, all HTTP and HTTPS requests made inside Docker containers will go through the specified proxy server.
CPU times: user 262 ms, sys: 26.3 ms, total: 289 ms
Wall time: 5.05 s


In [18]:
len(streaming_response.source_nodes)

2
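
The source nodes are what make an answer traceable: each one wraps a node whose `metadata` dict holds the `document_id` of the Feishu document it came from. The sketch below collects the distinct IDs and turns them into links. It takes plain metadata dicts so that it runs without LlamaIndex. The URL template is a placeholder and would need to be replaced with the real tenant's address.

```python
def source_links(metadatas, url_template="https://example.feishu.cn/docx/{doc_id}"):
    """Return one link per distinct document_id, in first-seen order."""
    seen = []
    for meta in metadatas:
        doc_id = meta.get("document_id")
        if doc_id and doc_id not in seen:
            seen.append(doc_id)
    # The template is a hypothetical placeholder, not a verified Feishu URL.
    return [url_template.format(doc_id=d) for d in seen]
```

For the response above this could be called as `source_links(n.node.metadata for n in streaming_response.source_nodes)`.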

In [19]:
streaming_response.source_nodes[0]

NodeWithScore(node=TextNode(id_='15df7a37-756c-4480-8adf-5d8f9b976dac', embedding=None, metadata={'document_id': 'Kws2d8Y97oyu0wxadvvcwlbenNc'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='d335c3bd-b64f-4d8c-a544-5e14996be5b1', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'document_id': 'Kws2d8Y97oyu0wxadvvcwlbenNc'}, hash='c2a181b6fbe00728ed32cd23a12724acaa6e25dc14c78dfe533d144a43ee820f'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='3823ba56-a2d5-4e19-bb34-8636e0945c0a', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='38ea7644d0fa218279b599e540d68d47665eb1a5499bf5af194218a923b6b0cd')}, text='Docker FAQ\nDocker 加速相关配置\n\ndockerfile 中常用加速\n\n# 使用nvidia作为基础镜像\nFROM nvidia/cuda:12.2.0-devel-ubuntu22.04\n\n# 设置时区环境变量\nENV TZ=Asia/Shanghai\n\n# 设置时区\n# 安装 python3/pip/jupyter相关\nRUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone \\\n    && sed -i s@/archi