Labels
enhancement (New feature or request)
Description
Do you need to file a feature request?
- I have searched the existing feature requests and this feature request is not already filed.
- I believe this is a legitimate feature request, not just a question or bug.
Feature Request Description
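Save the entities and triples generated by the LLM to local .txt files, and add a counting feature.
Saving this data locally is necessary if it will be reused, for example in later analysis, model training, or knowledge graph construction.
The detailed steps and code are given below:
1. Save the LLM-generated entities to a local .txt file, extracting the entities and counting them. Place the following script in the ./dickens folder.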
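import json

# Input and output paths, relative to the ./dickens folder
input_file = 'vdb_entities.json'  # replace with the path to your JSON file
output_file = 'entity_names1.txt'  # path of the output text file

# Open the JSON file and load its contents
with open(input_file, 'r', encoding='utf-8') as f:
    data = json.load(f)

# Collect every entity_name value, dropping the surrounding double quotes
entity_names = [item['entity_name'].strip('"') for item in data['data']]

# Write one entity name per line
with open(output_file, 'w', encoding='utf-8') as f:
    for name in entity_names:
        f.write(name + '\n')

print(f"Extracted {len(entity_names)} entity names and saved them to {output_file}")

2. Save the LLM-generated triples to a local .txt file, extracting the triples and counting them. Place the following script in the ./dickens folder.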
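import xml.etree.ElementTree as ET

# Parse the GraphML file
tree = ET.parse('graph_chunk_entity_relation.graphml')
root = tree.getroot()

# GraphML uses a default namespace, so map it to a prefix for the queries below
ns = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Extracted relations, one formatted triple per edge
relations = []

# Walk every <edge> element
for edge in root.findall(".//g:edge", ns):
    source = edge.get("source").strip('"')  # drop the surrounding quotes
    target = edge.get("target").strip('"')
    # Read d5 (the relation type); check the <key> declarations if your file uses another id
    d5 = edge.find('.//g:data[@key="d5"]', ns)
    # Skip edges where d5 is missing or empty, since d5.text is None in that case
    if d5 is not None and d5.text:
        relation_type = d5.text.strip('"')  # drop the surrounding quotes
        relations.append(f"({source}, {relation_type}, {target})")

# Write one triple per line to a TXT file
with open("relations.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(relations))

print(f"Extracted {len(relations)} relations and saved them to relations.txt")

Additional Context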
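The second script hard-codes the key id d5. GraphML assigns these ids in the <key> declarations at the top of the file, so the id that holds the relation type may differ from one file to another. The sketch below is one possible way to look the id up by attribute name instead, and to count how often each relation type occurs. It assumes the relation label is stored in an edge attribute named keywords; that name, the helper functions edge_key_ids and extract_triples, and the inline SAMPLE document are all hypothetical and only serve the illustration, so check the <key> elements of your own graph_chunk_entity_relation.graphml before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def edge_key_ids(root):
    """Map attr.name -> key id for the edge attributes declared in <key> elements."""
    return {
        key.get("attr.name"): key.get("id")
        for key in root.findall("g:key", GRAPHML_NS)
        if key.get("for") == "edge"
    }


def extract_triples(root, attr_name):
    """Return (source, relation, target) tuples, reading the relation from attr_name."""
    key_id = edge_key_ids(root).get(attr_name)
    if key_id is None:
        raise KeyError(f"no edge attribute named {attr_name!r} is declared")
    triples = []
    for edge in root.iterfind(".//g:edge", GRAPHML_NS):
        data = edge.find(f'g:data[@key="{key_id}"]', GRAPHML_NS)
        if data is None or not data.text:
            continue  # this edge carries no value for the attribute
        triples.append((
            edge.get("source", "").strip('"'),
            data.text.strip().strip('"'),
            edge.get("target", "").strip('"'),
        ))
    return triples


# Hypothetical miniature GraphML document, used only to exercise the functions.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="weight" attr.type="double"/>
  <key id="d1" for="edge" attr.name="keywords" attr.type="string"/>
  <graph edgedefault="undirected">
    <node id="A"/><node id="B"/><node id="C"/>
    <edge source="&quot;A&quot;" target="B">
      <data key="d0">1.0</data><data key="d1">employs</data>
    </edge>
    <edge source="B" target="C"><data key="d1">employs</data></edge>
    <edge source="A" target="C"><data key="d0">2.0</data></edge>
  </graph>
</graphml>"""

root = ET.fromstring(SAMPLE)
triples = extract_triples(root, "keywords")  # "keywords" is an assumed attribute name
relation_counts = Counter(relation for _, relation, _ in triples)

for source, relation, target in triples:
    print(f"({source}, {relation}, {target})")
print(f"{len(triples)} triples, {len(relation_counts)} distinct relation types")
```

To run this against a real file, replace ET.fromstring(SAMPLE) with ET.parse('graph_chunk_entity_relation.graphml').getroot(). Looking the id up by name raises a clear error when the attribute is not declared, rather than silently producing an empty relations.txt.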