# Overview

## Setup

Import the required modules.

In [6]:
import sys
from pprint import pprint
sys.path.append('../src')
from graph import DataSet, KnowledgeGraph
from query_creator import RandomQueryCreator
from retriever import LLMRetriever

## 1. Loading the dataset

Load a Text Attributed Knowledge Graph (TAKG) dataset, in which every node and relation of the knowledge graph is annotated with a natural-language description. The dataset is the one from [KG-BERT](https://github.com/yao8839836/kg-bert/tree/master).

In [2]:
dataset = DataSet()
dataset.from_files('../data/umls')

`dataset` contains training data and test data.

In [3]:
print(dataset.train)
print(dataset.test)

KnowledgeGraph with 135 nodes and 3105 edges
KnowledgeGraph with 131 nodes and 593 edges
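
As a rough picture of what a TAKG holds, the sketch below keeps a description per entity, a description per relation, and a list of triples. `TinyTAKG`, `parse_tsv`, and the rows are hypothetical and purely illustrative; they are not the `DataSet` or `KnowledgeGraph` classes used in this notebook, and the tab-separated layout is an assumption about the input files.

```python
from dataclasses import dataclass, field


@dataclass
class TinyTAKG:
    """Hypothetical stand-in for a text-attributed knowledge graph."""
    node_text: dict = field(default_factory=dict)      # entity name -> description
    relation_text: dict = field(default_factory=dict)  # relation name -> description
    triples: list = field(default_factory=list)        # (head, relation, tail)

    def __str__(self):
        return f"TinyTAKG with {len(self.node_text)} nodes and {len(self.triples)} edges"


def parse_tsv(text):
    """Split tab-separated lines into tuples, skipping blank lines."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line.strip()]


# Toy rows for illustration only; they are not taken from the UMLS files.
entity_rows = "drug\tA substance used to treat illness.\ndisease\tAn abnormal condition."
relation_rows = "treats\ttreats"
triple_rows = "drug\ttreats\tdisease"

kg = TinyTAKG(
    node_text=dict(parse_tsv(entity_rows)),
    relation_text=dict(parse_tsv(relation_rows)),
    triples=parse_tsv(triple_rows),
)
print(kg)
```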


Each node has two attributes: a name (`name`) and a description (`text`).

In [7]:
kg_train = dataset.train
pprint(kg_train.nodes[0])

{'name': 'idea_or_concept',
 'text': 'In philosophy, ideas are usually taken as mental representational '
         'images of some object. Ideas can also be abstract concepts that do '
         'not present as mental images. Many philosophers have considered '
         'ideas to be a fundamental ontological category of being. The '
         'capacity to create and understand the meaning of ideas is considered '
         'to be an essential and defining feature of human beings. In a '
         'popular sense, an idea arises in a reflexive, spontaneous manner, '
         'even without thinking or serious reflection, for example, when we '
         'talk about the idea of a person or a place. A new or original idea '
         'can often lead to innovation.'}


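## 2. Training the knowledge graph embedding model

Compute embedding vectors for the description (`text`) attribute of the knowledge graph nodes. The embedding vectors are computed with the OpenAI API.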

In [8]:
dataset.set_text_embeddings()

calculate text embedding for relations connected to specified entities.
calculate text embedding for specified entities.


100%|██████████| 135/135 [00:39<00:00,  3.39it/s]


calculate text embedding for relations connected to specified entities.


100%|██████████| 44/44 [00:12<00:00,  3.53it/s]


calculate text embedding for relations connected to specified entities.
calculate text embedding for specified entities.


100%|██████████| 131/131 [00:44<00:00,  2.96it/s]


calculate text embedding for relations connected to specified entities.


100%|██████████| 36/36 [00:10<00:00,  3.44it/s]
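
The internals of `set_text_embeddings` are not shown in this notebook. As a minimal sketch of the general pattern (embed each description once and store the vector under the entity or relation name), the code below uses a hypothetical `toy_embed` function as an offline stand-in. It does not call the OpenAI API, and its vectors carry no semantic meaning.

```python
import hashlib
import math


def toy_embed(text, dim=8):
    """Deterministic stand-in for a remote embedding call (NOT the OpenAI API)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # bucket each token by its hash
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit length, or all zeros for empty text


def embed_texts(name_to_text, embed_fn=toy_embed):
    """Embed every description once, reusing the result for repeated texts."""
    cache = {}
    embeddings = {}
    for name, text in name_to_text.items():
        if text not in cache:
            cache[text] = embed_fn(text)
        embeddings[name] = cache[text]
    return embeddings


descriptions = {
    "idea_or_concept": "mental representational images of some object",
    "method_of": "method of",
}
vectors = embed_texts(descriptions)
print(len(vectors["method_of"]))
```

Passing the embedder as `embed_fn` keeps the caching logic independent of the provider, so a real API client could be swapped in without changing the loop.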


Based on the structure of the knowledge graph and the embedding vectors above, compute the knowledge graph embedding model. [pykeen](https://github.com/pykeen/pykeen?tab=readme-ov-file) is used for this computation.

In [9]:
# Still being debugged.
# dataset.train_graph_embeddings(DistMultLiteral, training_kwargs={'num_epochs':1})
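
For orientation, DistMult, the model family named in the commented-out call, scores a triple as the sum of the elementwise products of the head, relation, and tail vectors. The pure-Python sketch below shows only that scoring rule on hand-written toy vectors. It does not use pykeen, and it does not cover how `DistMultLiteral` mixes in the text embeddings.

```python
def distmult_score(head, relation, tail):
    """DistMult score: sum over dimensions of h_i * r_i * t_i."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def rank_tails(head, relation, candidates):
    """Order candidate tails from highest to lowest score."""
    scored = [(name, distmult_score(head, relation, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors chosen by hand; a trained model would learn these.
head_vec = [1.0, 0.5]
relation_vec = [2.0, -1.0]
candidate_tails = {"a": [1.0, 0.0], "b": [0.0, 1.0]}

ranking = rank_tails(head_vec, relation_vec, candidate_tails)
print(ranking)
```

Because the product is symmetric in head and tail, plain DistMult gives `(h, r, t)` and `(t, r, h)` the same score, which is worth keeping in mind for directed relations such as `method_of`.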

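## 3. Creating queries

Create queries for obtaining the triples needed to improve the accuracy of the knowledge graph embedding.

> Only a random generation method is implemented so far → **This random approach is too weak. It should produce more plausible combinations of head and relation.**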

In [11]:
query_creator = RandomQueryCreator()
queries = query_creator.create(dataset.train)

A query is a triple whose tail is missing.

In [14]:
pprint(queries[0])

{'head_name': 'natural_phenomenon_or_process',
 'head_text': 'Types of natural phenomena include: Weather, fog, thunder, '
              'tornadoes; biological processes, decomposition, germination; '
              'physical processes, wave propagation, erosion; tidal flow, and '
              'natural disasters such as electromagnetic pulses, volcanic '
              'eruptions, and earthquakes.',
 'relation_name': 'method_of',
 'relation_text': 'method of'}
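
The logic inside `RandomQueryCreator` is not shown here. The sketch below contrasts a uniformly random pairing with one possible way to get more plausible head-relation combinations: sampling only pairs that already occur in the training triples. Both functions, the helper `make_query`, and the toy graph are hypothetical.

```python
import random


def make_query(head, relation, node_text, relation_text):
    """Build a tail-less query in the same shape as the one printed above."""
    return {
        "head_name": head,
        "head_text": node_text[head],
        "relation_name": relation,
        "relation_text": relation_text[relation],
    }


def random_queries(node_text, relation_text, n, rng):
    """Pair any head with any relation, uniformly at random."""
    heads, relations = sorted(node_text), sorted(relation_text)
    return [make_query(rng.choice(heads), rng.choice(relations), node_text, relation_text)
            for _ in range(n)]


def observed_pair_queries(triples, node_text, relation_text, n, rng):
    """Sample only (head, relation) pairs that already appear in the graph."""
    pairs = sorted({(h, r) for h, r, _ in triples})
    return [make_query(*rng.choice(pairs), node_text, relation_text) for _ in range(n)]


# Toy graph for illustration only.
node_text = {"drug": "A substance.", "disease": "A condition.", "gene": "A DNA unit."}
relation_text = {"treats": "treats", "causes": "causes"}
triples = [("drug", "treats", "disease"), ("gene", "causes", "disease")]

rng = random.Random(0)
naive = random_queries(node_text, relation_text, 5, rng)
plausible = observed_pair_queries(triples, node_text, relation_text, 5, rng)
```

The second sampler never emits a pairing such as `disease` with `treats`, which the uniform sampler can produce. The trade-off is that it cannot propose relations a head has never been seen with.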


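## 4. Retrieving tails with generative AI

Use generative AI to obtain the tail entity.
> + Currently a vanilla LLM is used.
> + Hallucinations are frequent, so this needs to change to a RAG-based method.
> + However, RAG requires preparing the data to retrieve from. → **Build pseudo-documents from the triples and the entity descriptions.**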

In [17]:
retriever = LLMRetriever()
triples = retriever.complete_triples(queries)

response
{'name': 'Tail', 'args': {'tail_name': 'scientific_analysis', 'tail_text': 'Scientific analysis is a method used to study and understand natural phenomena or processes. It involves systematic observation, measurement, experimentation, and the formulation of hypotheses to explain the underlying mechanisms and causes of these phenomena.'}, 'id': 'call_9c8Ko5rWU8Hml0gFnfqiYAFS', 'type': 'tool_call'}
response
{'name': 'Tail', 'args': {'tail_name': 'Ukrainian Centre for Islamic Studies', 'tail_text': 'The Ukrainian Centre for Islamic Studies is an organization focused on the study and promotion of Islamic culture, history, and religion in Ukraine. It operates within the Kyiv Islamic Cultural Centre and collaborates with other Islamic organizations to provide educational resources and support for the Muslim community in Ukraine.'}, 'id': 'call_v2ERQ0I2vWeNzi5qK6oxRcVM', 'type': 'tool_call'}
response
{'name': 'Tail', 'args': {'tail_name': 'country', 'tail_text': "A country is a disti

This yields triples with the tail filled in.

In [18]:
pprint(triples[0])

{'head_name': 'natural_phenomenon_or_process',
 'head_text': 'Types of natural phenomena include: Weather, fog, thunder, '
              'tornadoes; biological processes, decomposition, germination; '
              'physical processes, wave propagation, erosion; tidal flow, and '
              'natural disasters such as electromagnetic pulses, volcanic '
              'eruptions, and earthquakes.',
 'relation_name': 'method_of',
 'relation_text': 'method of',
 'tail_name': 'scientific_analysis',
 'tail_text': 'Scientific analysis is a method used to study and understand '
              'natural phenomena or processes. It involves systematic '
              'observation, measurement, experimentation, and the formulation '
              'of hypotheses to explain the underlying mechanisms and causes '
              'of these phenomena.'}
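
The pseudo-document idea noted earlier in this section could look roughly like the sketch below: one text per entity, made of its description plus one sentence per outgoing triple, with a naive token-overlap lookup standing in for a real retriever. All names and data are hypothetical; nothing here is implemented in `LLMRetriever`.

```python
def build_pseudo_documents(node_text, relation_text, triples):
    """Concatenate each entity's description with sentences for its triples."""
    docs = {name: [f"{name}: {text}"] for name, text in node_text.items()}
    for head, relation, tail in triples:
        sentence = f"{head} {relation_text.get(relation, relation)} {tail}."
        docs.setdefault(head, [f"{head}:"]).append(sentence)
    return {name: " ".join(parts) for name, parts in docs.items()}


def retrieve(query, docs, k=1):
    """Return the k documents sharing the most lowercase tokens with the query."""
    query_tokens = set(query.lower().split())

    def overlap(item):
        return len(query_tokens & set(item[1].lower().split()))

    return [name for name, _ in sorted(docs.items(), key=overlap, reverse=True)[:k]]


# Toy data for illustration only.
node_text = {
    "drug": "A substance used to treat illness.",
    "disease": "An abnormal condition of the body.",
}
relation_text = {"treats": "treats"}
triples = [("drug", "treats", "disease")]

docs = build_pseudo_documents(node_text, relation_text, triples)
print(docs["drug"])
print(retrieve("substance that treats illness", docs))
```

In a RAG setup the retrieved text would be placed in the prompt next to the query, so the model answers from graph-derived context rather than from memory alone.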


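## 5. Updating the knowledge graph embedding model

Add the triples obtained in step 4 to the knowledge graph and update the knowledge graph embedding model. (This should improve the model.)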

In [None]:
dataset.train.add_triples(triples)
# Work in progress.
#dataset.update_graph_embedding()
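
How `add_triples` treats new entities and duplicates is not shown in this notebook. The sketch below shows one plausible behavior on a graph held as a plain dict and set: register unseen heads and tails with their text, and skip edges that already exist. `merge_triples` is a hypothetical helper, not the method called above.

```python
def merge_triples(node_text, edges, triples):
    """Add completed triples to the graph; return how many new edges were added."""
    added = 0
    for triple in triples:
        # Register entities the graph has not seen, keeping existing descriptions.
        node_text.setdefault(triple["head_name"], triple["head_text"])
        node_text.setdefault(triple["tail_name"], triple["tail_text"])
        edge = (triple["head_name"], triple["relation_name"], triple["tail_name"])
        if edge not in edges:
            edges.add(edge)
            added += 1
    return added


# Toy graph and one completed triple, for illustration only.
node_text = {"drug": "A substance used to treat illness."}
edges = set()
completed = [{
    "head_name": "drug", "head_text": "A substance used to treat illness.",
    "relation_name": "treats", "relation_text": "treats",
    "tail_name": "disease", "tail_text": "An abnormal condition of the body.",
}]

first = merge_triples(node_text, edges, completed)
second = merge_triples(node_text, edges, completed)
print(first, second)
```

Any entity introduced this way would presumably need a text embedding of its own before the graph embedding model is updated.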