# Kùzu Graph Store

This notebook shows how to configure `Kùzu` as the graph store backend for LlamaIndex.


In [None]:
%pip install llama-index
%pip install llama-index-llms-openai
%pip install llama-index-graph-stores-kuzu
%pip install pyvis

In [None]:
# My OpenAI key
import os

os.environ["OPENAI_API_KEY"] = "INSERT_API_KEY_HERE"
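
Hard-coding the key in a notebook makes it easy to leak. A minimal sketch of an alternative, assuming the key may already be exported in the shell: read it from the environment and prompt only when it is missing. `resolve_api_key` is a hypothetical helper, not part of LlamaIndex or the OpenAI client.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def resolve_api_key(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
    name: str = "OPENAI_API_KEY",
) -> str:
    """Return the key from `env`, prompting once and caching it if absent."""
    key = env.get(name, "").strip()
    if not key:
        # Nothing usable in the environment: ask the user without echoing input.
        key = prompt(f"{name}: ").strip()
        if not key:
            raise ValueError(f"{name} is empty")
        env[name] = key
    return key
```

In the notebook this would be called as `resolve_api_key()` before the first LLM call.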

## Prepare for Kùzu

### 1. Install Kùzu

You can install Kùzu with the following command:

```bash
pip install kuzu
```

### 2. Import Kùzu

In your Python file, import Kùzu with:

```python
import kuzu
```

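### 3. Prepare the data

Make sure the data you want to load is available before continuing. This notebook reads its documents from `../../../examples/data/paul_graham`.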


In [None]:
# Clean up all the directories used in this notebook
import shutil

shutil.rmtree("./test1", ignore_errors=True)
shutil.rmtree("./test2", ignore_errors=True)
shutil.rmtree("./test3", ignore_errors=True)

In [None]:
import kuzu

db = kuzu.Database("test1")

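## Using a Knowledge Graph with KuzuGraphStore

`KuzuGraphStore` is a graph store backed by a Kùzu graph database. It gives LlamaIndex a place to store knowledge graph triplets and to query them later. In this example we use it to build a knowledge graph and then query it.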
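
To make the idea concrete, here is a toy in-memory stand-in for a triplet store. It only illustrates the subject → (relation, object) layout; `ToyTripletStore` is hypothetical and far simpler than `KuzuGraphStore`, which persists the graph in a Kùzu database. The method names are modelled on the ones LlamaIndex graph stores expose (`upsert_triplet`, `get`, `delete`), which is an assumption worth checking against the library documentation.

```python
from collections import defaultdict
from typing import DefaultDict, List


class ToyTripletStore:
    """Hypothetical in-memory store: subject -> list of [relation, object]."""

    def __init__(self) -> None:
        self._rels: DefaultDict[str, List[List[str]]] = defaultdict(list)

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        # Insert only if this exact edge is not stored yet.
        if [rel, obj] not in self._rels[subj]:
            self._rels[subj].append([rel, obj])

    def get(self, subj: str) -> List[List[str]]:
        # Return a copy so callers cannot mutate internal state.
        return [list(pair) for pair in self._rels.get(subj, [])]

    def delete(self, subj: str, rel: str, obj: str) -> None:
        if subj in self._rels and [rel, obj] in self._rels[subj]:
            self._rels[subj].remove([rel, obj])


store = ToyTripletStore()
store.upsert_triplet("Interleaf", "added", "scripting language")
store.upsert_triplet("Interleaf", "added", "scripting language")  # duplicate, ignored
store.upsert_triplet("Interleaf", "made software for", "creating documents")
print(store.get("Interleaf"))
```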


In [None]:
from llama_index.graph_stores.kuzu import KuzuGraphStore

graph_store = KuzuGraphStore(db)

#### Building the Knowledge Graph


In [None]:
from llama_index.core import SimpleDirectoryReader, KnowledgeGraphIndex
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from IPython.display import Markdown, display
import kuzu

In [None]:
documents = SimpleDirectoryReader(
    "../../../examples/data/paul_graham"
).load_data()

In [None]:
# Define the LLM

llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = llm
Settings.chunk_size = 512

In [None]:

from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(graph_store=graph_store)

# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
)
# To reload from an existing graph store without recomputing each time, use:
# index = KnowledgeGraphIndex(nodes=[], storage_context=storage_context)
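
`max_triplets_per_chunk=2` caps how many triplets are kept for each text chunk. The sketch below shows one way such a cap could be applied to a model response that lists triplets as `(subject, relation, object)` lines. The response format and the `parse_triplets` helper are assumptions made for illustration; this is not LlamaIndex's actual parser.

```python
import re
from typing import List, Tuple

Triplet = Tuple[str, str, str]

# Matches "(a, b, c)" where no field contains a comma or a parenthesis.
_TRIPLET_RE = re.compile(r"\(([^(),]+),([^(),]+),([^(),]+)\)")


def parse_triplets(response: str, max_triplets: int) -> List[Triplet]:
    """Extract up to `max_triplets` unique triplets, in order of appearance."""
    found: List[Triplet] = []
    for match in _TRIPLET_RE.finditer(response):
        if len(found) >= max_triplets:
            break
        subj, rel, obj = (part.strip() for part in match.groups())
        triplet = (subj, rel, obj)
        # Skip triplets with an empty field and ones already collected.
        if all(triplet) and triplet not in found:
            found.append(triplet)
    return found


sample = """
(Interleaf, added, scripting language)
(Interleaf, added, scripting language)
(author, worked on, programming)
(software, generate, web sites)
"""
print(parse_triplets(sample, max_triplets=2))
```

A lower cap keeps the graph small and cheap to build, at the cost of dropping relations that appear later in a chunk.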

#### Querying the Knowledge Graph

First, we can query and send only the triplets to the LLM.


In [None]:
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)

In [None]:
display(Markdown(f"<b>{response}</b>"))

<b>Interleaf was involved in making software, added a scripting language, was inspired by Emacs, taught what not to do, built impressive technology, and made software that became obsolete and was replaced by a service. Additionally, Interleaf made software that could launch as soon as it was done and was affected by rapid changes in the industry.</b>

For more detailed answers, we can also send the text from which the retrieved triplets were extracted.


In [None]:
query_engine = index.as_query_engine(
    include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)

In [None]:
display(Markdown(f"<b>{response}</b>"))

<b>Interleaf was a company that made software for creating documents. They added a scripting language inspired by Emacs, making it a dialect of Lisp. Despite having smart people and building impressive technology, Interleaf ultimately faced challenges due to the rapid advancements in technology, as they were affected by Moore's Law. The software they created could be launched as soon as it was done, and they made use of software that was considered slick in 1996. Additionally, Interleaf's experience taught valuable lessons about the importance of being run by product people rather than sales people in technology companies, the risks of editing code by too many people, the significance of office environment on productivity, and the impact of conventional office hours on optimal hacking times.</b>
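
The difference between the two queries comes down to what is placed in the prompt context. A rough sketch with made-up data structures: with `include_text=False` only the matched triplets are passed along, while with `include_text=True` the source chunks those triplets came from are appended as well. `build_context`, its inputs, and the sample chunk text are hypothetical and do not reproduce how LlamaIndex formats its prompts.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]


def build_context(
    triplets: List[Triplet],
    source_text: Dict[Triplet, str],
    include_text: bool,
) -> str:
    """Join triplets into a context string, optionally followed by source chunks."""
    lines = [f"({s}, {r}, {o})" for s, r, o in triplets]
    if include_text:
        # Append each distinct source chunk once, in first-seen order.
        chunks: List[str] = []
        for t in triplets:
            chunk_text = source_text.get(t)
            if chunk_text and chunk_text not in chunks:
                chunks.append(chunk_text)
        lines.extend(chunks)
    return "\n".join(lines)


triplets = [
    ("Interleaf", "added", "scripting language"),
    ("Interleaf", "made software for", "creating documents"),
]
# Invented chunk text, used only to show the shape of the output.
chunk = "Interleaf made software for creating documents and added a scripting language."
source_text = {t: chunk for t in triplets}

print(build_context(triplets, source_text, include_text=False))
print(build_context(triplets, source_text, include_text=True))
```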

#### Query with embeddings


In [None]:
# NOTE: can take a while!
db = kuzu.Database("test2")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
new_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
    include_embeddings=True,
)

In [None]:
# Query using the top 5 triplets by embedding similarity plus keywords (duplicate triplets are removed)
query_engine = new_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)
response = query_engine.query(
    "Tell me more about what the author worked on at Interleaf",
)

In [None]:
display(Markdown(f"<b>{response}</b>"))

<b>The author worked on software at Interleaf, a company that made software for creating documents. The software the author worked on was an online store builder, which required a private launch before a public launch to recruit an initial set of users. The author also learned valuable lessons at Interleaf, such as the importance of having technology companies run by product people, the pitfalls of editing code by too many people, the significance of office environment on productivity, and the impact of big bureaucratic customers. Additionally, the author discovered that low-end software tends to outperform high-end software, emphasizing the importance of being the "entry level" option in the market.</b>
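
Hybrid mode combines keyword lookup with embedding similarity. The sketch below illustrates the general idea under simplifying assumptions: triplets whose subject matches a query keyword are merged with the `top_k` triplets closest to the query embedding, and duplicates are dropped. The vectors are toy values, and `hybrid_retrieve` is a hypothetical helper, not the retriever LlamaIndex uses.

```python
import math
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[str, str, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(
    keywords: List[str],
    query_vec: Sequence[float],
    embeddings: Dict[Triplet, Sequence[float]],
    top_k: int,
) -> List[Triplet]:
    """Union of keyword matches and the top_k most similar triplets, deduplicated."""
    wanted = {k.lower() for k in keywords}
    by_keyword = [t for t in embeddings if t[0].lower() in wanted]
    by_similarity = sorted(
        embeddings, key=lambda t: cosine(query_vec, embeddings[t]), reverse=True
    )[:top_k]
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return list(dict.fromkeys(by_keyword + by_similarity))


embeddings = {
    ("Interleaf", "added", "scripting language"): [1.0, 0.0],
    ("author", "worked on", "programming"): [0.8, 0.6],
    ("software", "generate", "web sites"): [0.0, 1.0],
}
print(hybrid_retrieve(["Interleaf"], [0.6, 0.8], embeddings, top_k=1))
```

Here the keyword match contributes the `Interleaf` triplet and the similarity search contributes the `author` triplet, so the merged result contains both.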

#### Visualizing the Graph


In [None]:
## create graph
from pyvis.network import Network

g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.show("kuzugraph_draw.html")

kuzugraph_draw.html


#### [Optional] Try building the graph and manually add triplets!


In [None]:
from llama_index.core.node_parser import SentenceSplitter

In [None]:
node_parser = SentenceSplitter()

In [None]:
nodes = node_parser.get_nodes_from_documents(documents)

In [None]:
# Initialize an empty database
db = kuzu.Database("test3")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
index = KnowledgeGraphIndex(
    [],
    storage_context=storage_context,
)

In [None]:
# Manually add keyword mappings and nodes
# Add triplets (subject, relationship, object)

# For node 0
node_0_tups = [
    ("author", "worked on", "writing"),
    ("author", "worked on", "programming"),
]
for tup in node_0_tups:
    index.upsert_triplet_and_node(tup, nodes[0])

# For node 1
node_1_tups = [
    ("Interleaf", "made software for", "creating documents"),
    ("Interleaf", "added", "scripting language"),
    ("software", "generate", "web sites"),
]
for tup in node_1_tups:
    index.upsert_triplet_and_node(tup, nodes[1])
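
Each call above ties a triplet to the text node it describes, so that a later query mentioning a keyword can find both the relation and its source chunk. A toy sketch of that bookkeeping, assuming a simple keyword-to-node-id table: `ToyKGIndex` is hypothetical and only mimics the shape of `upsert_triplet_and_node`, not its implementation.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Triplet = Tuple[str, str, str]


class ToyKGIndex:
    """Hypothetical index: stores triplets and maps keywords to node ids."""

    def __init__(self) -> None:
        self.triplets: List[Triplet] = []
        self.keyword_to_nodes: DefaultDict[str, Set[str]] = defaultdict(set)

    def upsert_triplet_and_node(self, triplet: Triplet, node_id: str) -> None:
        subj, _, obj = triplet
        if triplet not in self.triplets:
            self.triplets.append(triplet)
        # Both ends of the edge become lookup keywords for the node.
        self.keyword_to_nodes[subj].add(node_id)
        self.keyword_to_nodes[obj].add(node_id)

    def nodes_for(self, keyword: str) -> Set[str]:
        # Return a copy; unknown keywords yield an empty set.
        return set(self.keyword_to_nodes.get(keyword, set()))


toy = ToyKGIndex()
toy.upsert_triplet_and_node(("Interleaf", "added", "scripting language"), "node-1")
toy.upsert_triplet_and_node(("software", "generate", "web sites"), "node-1")
toy.upsert_triplet_and_node(("author", "worked on", "programming"), "node-0")
print(sorted(toy.nodes_for("Interleaf")))
```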

In [None]:
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)

In [None]:
str(response)

'Interleaf was involved in creating documents and also added a scripting language to its software.'