Save LLM generated entities and triples locally, and add counting function for entities and triples!!! #1260

@Idol-Dou2021

Description

Do you need to file a feature request?

  • I have searched the existing feature request and this feature request is not already filed.
  • I believe this is a legitimate feature request, not just a question or bug.

Feature Request Description

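Save the entities and triples generated by the LLM to local .txt files, and add a counting function.
If this data needs to be reused (for example, for later analysis, model training, or knowledge graph construction), saving it locally is necessary.
The detailed steps and code are given below:
1. Save the LLM-generated entities to a local .txt file: extract the entities and count them. Just place the following code in the ./dickens folder.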

import json

# Input and output paths
input_file = 'vdb_entities.json'  # replace with the path to your JSON file
output_file = 'entity_names1.txt'  # path of the output text file

# Open the JSON file and load its contents
with open(input_file, 'r', encoding='utf-8') as f:
    data = json.load(f)

# Collect every entity_name value, stripping surrounding double quotes
entity_names = [item['entity_name'].strip('"') for item in data['data']]

# Write one entity_name per line to the text file
with open(output_file, 'w', encoding='utf-8') as f:
    for name in entity_names:
        f.write(name + '\n')

print(f"Extracted {len(entity_names)} entity_name values and saved them to {output_file}")
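
The script above counts every entry, so an entity that appears more than once is counted each time. If a unique count is also wanted, the same logic can be wrapped in functions, roughly as sketched below. The function names `load_entity_names` and `count_entities` are hypothetical, and the JSON layout (a top-level `data` list whose items carry an `entity_name` field) is assumed to be the same one the script above reads.

```python
import json
from collections import Counter


def load_entity_names(path):
    """Read entity_name values from a JSON file shaped like {"data": [{"entity_name": ...}]}.

    The layout is an assumption taken from the script above.
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Strip surrounding double quotes, as the original script does
    return [item["entity_name"].strip('"') for item in data["data"]]


def count_entities(names):
    """Return (total entries, number of distinct names, per-name Counter)."""
    counts = Counter(names)
    return len(names), len(counts), counts
```

For example, `total, unique, counts = count_entities(load_entity_names('vdb_entities.json'))` gives both figures, and `counts.most_common(10)` lists the names that occur most often.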

2. Save the LLM-generated triples to a local .txt file: extract the triples and count them. Just place the following code in the ./dickens folder.

import xml.etree.ElementTree as ET

# Parse the GraphML file
tree = ET.parse('graph_chunk_entity_relation.graphml')
root = tree.getroot()

# Define the namespace (GraphML uses a default namespace)
ns = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Extracted relations are collected here
relations = []

# Iterate over all <edge> elements
for edge in root.findall(".//g:edge", ns):
    source = edge.get("source").strip('"')  # strip surrounding quotes
    target = edge.get("target").strip('"')

    # Extract d5 (the relation type)
    d5 = edge.find('.//g:data[@key="d5"]', ns)
    # Skip edges whose d5 element is missing or empty (text would be None)
    if d5 is not None and d5.text:
        relation_type = d5.text.strip('"')  # strip surrounding quotes
        relations.append(f"({source}, {relation_type}, {target})")

# Write the relations to a TXT file, one per line
with open("relations.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(relations))

print(f"Extracted {len(relations)} relations and saved them to relations.txt")
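
The script above hard-codes the key id `d5`. In GraphML, ids such as `d5` are declared by `<key>` elements that map each id to an `attr.name`, so the id can change if the set of attributes changes. A minimal sketch that looks the id up by attribute name instead is shown below. The helper names `edge_key_id` and `extract_triples` are hypothetical, and which attribute name corresponds to `d5` in your file is something to check in the file's `<key>` declarations.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def edge_key_id(root, attr_name):
    """Return the id of the edge <key> whose attr.name equals attr_name, or None."""
    for key in root.findall("g:key", GRAPHML_NS):
        if key.get("for") == "edge" and key.get("attr.name") == attr_name:
            return key.get("id")
    return None


def extract_triples(graphml_path, attr_name):
    """Return (source, relation, target) tuples for edges that carry attr_name.

    attr_name is supplied by the caller; this sketch does not assume
    which attribute holds the relation type.
    """
    root = ET.parse(graphml_path).getroot()
    key_id = edge_key_id(root, attr_name)
    if key_id is None:
        raise KeyError(f"no edge <key> with attr.name={attr_name!r}")

    triples = []
    for edge in root.findall(".//g:edge", GRAPHML_NS):
        data = edge.find(f'g:data[@key="{key_id}"]', GRAPHML_NS)
        # Skip edges that lack the attribute or have empty text
        if data is None or not data.text:
            continue
        source = edge.get("source", "").strip('"')
        target = edge.get("target", "").strip('"')
        triples.append((source, data.text.strip('"'), target))
    return triples
```

The number of triples is then `len(extract_triples(path, attr_name))`, and the tuples can be formatted and written to relations.txt in the same way as in the script above.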

Additional Context

1) Run script 1 to extract the entities and count them

Image

Image

2) Run script 2 to extract the triples and count them

Image

Image


Labels: enhancement
