#### About

> Knowledge Graph Construction

Knowledge graph construction is the process of building a graph that represents knowledge in a particular domain. It involves identifying entities and the relationships between them, then storing them in a structured format, typically as (entity, relationship, entity) triples. The resulting graph can support applications such as question answering, recommendation systems, and other natural language processing tasks.
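As a minimal illustration of the structured format described above, a knowledge graph can be held as a set of (head, relation, tail) triples plus an adjacency index. The names `triples`, `build_index`, and `neighbors` are hypothetical and chosen only for this sketch.

```python
from collections import defaultdict

# Each fact is a (head, relation, tail) triple.
triples = {
    ("John", "marriage", "Jane"),
    ("Jane", "parent-child", "Tom"),
}

def build_index(facts):
    """Map each head entity to the (relation, tail) pairs that leave it."""
    index = defaultdict(set)
    for head, relation, tail in facts:
        index[head].add((relation, tail))
    return index

def neighbors(index, entity, relation=None):
    """Return the tails reachable from `entity`, optionally filtered by relation."""
    return sorted(
        tail for rel, tail in index.get(entity, set())
        if relation is None or rel == relation
    )

index = build_index(triples)
print(neighbors(index, "Jane", "parent-child"))  # ['Tom']
print(neighbors(index, "John"))                  # ['Jane']
```

Storing edges by head entity keeps lookups simple. A real graph would likely also need an index by tail entity to answer questions such as "who is Tom's parent?".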


Define a set of sentences that describe relationships between entities. Each tuple holds the sentence, its relationship type, and the two entities involved.

In [12]:
training_data = [
    ("John is married to Jane", "marriage", "John", "Jane"),
    ("Jane is the mother of Tom", "parent-child", "Jane", "Tom"),
    ("Tom is the son of Jane", "parent-child", "Jane", "Tom"),
    ("John and Jane have two children", "parent-child", "John", "child1"),
    ("John and Jane have two children", "parent-child", "Jane", "child2")
]
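Before training on such a small set, it can help to check how the examples are distributed. The sketch below counts examples per relationship type and lists entities that do not literally appear in their sentence. The names `relation_counts` and `unmentioned` are hypothetical, and the list repeats the tuples above only to keep the block self-contained.

```python
from collections import Counter

# Same tuples as above: (sentence, relation type, entity 1, entity 2).
training_data = [
    ("John is married to Jane", "marriage", "John", "Jane"),
    ("Jane is the mother of Tom", "parent-child", "Jane", "Tom"),
    ("Tom is the son of Jane", "parent-child", "Jane", "Tom"),
    ("John and Jane have two children", "parent-child", "John", "child1"),
    ("John and Jane have two children", "parent-child", "Jane", "child2"),
]

# Count examples per relation type to spot class imbalance.
relation_counts = Counter(rel for _, rel, _, _ in training_data)

# Collect entities that are placeholders rather than words in their sentence.
unmentioned = sorted({
    entity
    for sentence, _, e1, e2 in training_data
    for entity in (e1, e2)
    if entity not in sentence
})

print(relation_counts)
print(unmentioned)
```

Here four of the five examples are `parent-child`, and `child1` and `child2` are placeholders that the sentence text never mentions, so a model cannot learn them from the words alone.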


In [25]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np


In [26]:
# Define the maximum length of a sentence
MAX_SEQUENCE_LENGTH = 100

# Create a tokenizer to convert words to integers
tokenizer = Tokenizer()
tokenizer.fit_on_texts([x[0] for x in training_data])
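To make the word-to-integer mapping concrete, here is a plain-Python sketch of the same idea: words are lowercased, ranked by frequency, and numbered from 1 so that 0 stays free for padding. It illustrates the technique only and is not the Keras implementation. `build_word_index` and `to_sequence` are hypothetical helpers.

```python
from collections import Counter

sentences = [
    "John is married to Jane",
    "Jane is the mother of Tom",
    "Tom is the son of Jane",
    "John and Jane have two children",
    "John and Jane have two children",
]

def build_word_index(texts):
    """Rank lowercased words by frequency; index 0 is left free for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() orders by descending count; ties keep first-seen order.
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequence(text, word_index):
    """Replace each known word with its integer index, dropping unknown words."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

word_index = build_word_index(sentences)
print(len(word_index))                                      # 14
print(to_sequence("John is married to Jane", word_index))   # [2, 3, 11, 12, 1]
```

The most frequent word ("jane") receives index 1. The vocabulary has 14 entries, which is why an embedding layer over it needs 15 rows once the padding index is included.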


In [27]:
# Convert the sentences to sequences of integers
sequences = tokenizer.texts_to_sequences([x[0] for x in training_data])

# Pad the sequences to a fixed length
padded_sequences = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)

# Create one-hot vectors for the relationship types (columns: marriage, parent-child, anything else)
labels = np.zeros((len(training_data), 3))
for i, (_, rel_type, _, _) in enumerate(training_data):
    if rel_type == "marriage":
        labels[i, 0] = 1
    elif rel_type == "parent-child":
        labels[i, 1] = 1
    else:
        labels[i, 2] = 1
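By default, `pad_sequences` pads at the front of each sequence and, when a sequence is too long, also truncates from the front. The plain-Python sketch below mimics that default for a single sequence. `pad_left` is a hypothetical helper and the integer values are arbitrary.

```python
def pad_left(sequence, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items if too long."""
    trimmed = sequence[-maxlen:] if maxlen > 0 else []
    return [value] * (maxlen - len(trimmed)) + trimmed

print(pad_left([2, 3, 11, 12, 1], 8))  # [0, 0, 0, 2, 3, 11, 12, 1]
print(pad_left([1, 2, 3, 4, 5], 3))    # [3, 4, 5]
```

With `MAX_SEQUENCE_LENGTH = 100` and sentences of five or six words, each padded row is mostly zeros. A smaller maximum length would probably be enough for data like this.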


In [28]:
# Create multi-hot vectors marking which entities appear in each example
# (columns: John, Jane, Tom, any other entity); the order of the two entities is not encoded
entity_index = {"John": 0, "Jane": 1, "Tom": 2}
entities = np.zeros((len(training_data), 4))
for i, (_, _, entity1, entity2) in enumerate(training_data):
    for entity in (entity1, entity2):
        # Entities outside the lookup table share the last column
        entities[i, entity_index.get(entity, 3)] = 1
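The cell above sets both entities in a single four-element vector, so ("Jane", "Tom") and ("Tom", "Jane") produce the same row. If the direction of a relationship matters, one option is to encode head and tail separately and concatenate them. This is a sketch of that alternative, not the notebook's method. `ENTITY_INDEX`, `one_hot`, and `encode_pair` are hypothetical names.

```python
ENTITY_INDEX = {"John": 0, "Jane": 1, "Tom": 2}
OTHER = 3  # shared slot for any entity not listed above

def one_hot(entity, size=4):
    """Return a one-hot list for `entity`, using the shared slot if unknown."""
    vec = [0] * size
    vec[ENTITY_INDEX.get(entity, OTHER)] = 1
    return vec

def encode_pair(head, tail):
    """Concatenate head and tail one-hot vectors so their order is preserved."""
    return one_hot(head) + one_hot(tail)

print(encode_pair("Jane", "Tom"))  # [0, 1, 0, 0, 0, 0, 1, 0]
print(encode_pair("Tom", "Jane"))  # [0, 0, 1, 0, 0, 1, 0, 0]
```

Using this encoding would widen the entity input from 4 to 8 values, so the corresponding input layer shape would have to change as well.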


In [29]:
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, concatenate
from tensorflow.keras.models import Model


In [30]:
# Define the input layers
sentence_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
entity_input = Input(shape=(4,))



In [31]:
# Define the embedding layer for the sentences
embedding_layer = Embedding(len(tokenizer.word_index) + 1, 100)(sentence_input)

# Define the LSTM layer for the sentences
lstm_layer = LSTM(100)(embedding_layer)

# Define the dense layer for the relationship types
rel_type_layer = Dense(3, activation='softmax')(lstm_layer)

# Define the dense layer for the entities
entity_layer = Dense(4, activation='softmax')(entity_input)

# Concatenate the output of the LSTM layer and the entity layer
merged_layer = concatenate([lstm_layer, entity_layer])

# Define the output layer for the knowledge graph
kg_layer = Dense(1, activation='sigmoid')(merged_layer)

# Define the Keras model
model = Model(inputs=[sentence_input, entity_input], outputs=[rel_type_layer, kg_layer])
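To get a feel for the size of this architecture, the trainable parameters can be estimated by hand with the standard formulas for embedding, LSTM, and dense layers. The sketch below assumes a vocabulary of 14 distinct words (counted from the five training sentences) plus the padding index. The helper names are hypothetical, and the result is worth comparing against `model.summary()`.

```python
def embedding_params(vocab_size, dim):
    """One `dim`-sized vector per vocabulary entry."""
    return vocab_size * dim

def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    """A weight matrix plus one bias per unit."""
    return input_dim * units + units

vocab_size = 14 + 1  # assumption: 14 distinct words, plus index 0 for padding
counts = {
    "embedding": embedding_params(vocab_size, 100),
    "lstm": lstm_params(100, 100),
    "rel_type_dense": dense_params(100, 3),
    "entity_dense": dense_params(4, 4),
    "kg_dense": dense_params(100 + 4, 1),  # LSTM output concatenated with entity layer
}
total = sum(counts.values())
print(counts)
print(total)  # 82328
```

Roughly 82,000 parameters trained on five examples will memorize the data rather than generalize, so the model here is best read as a demonstration of the wiring.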


In [32]:
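# Use one loss per output: categorical for the 3-way softmax, binary for the single sigmoid unit
model.compile(loss=['categorical_crossentropy', 'binary_crossentropy'], optimizer='adam', metrics=[['accuracy'], ['accuracy']])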
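The model has two outputs of different kinds: a three-unit softmax and a one-unit sigmoid. The plain-Python sketch below illustrates why the loss should match the output type. It assumes that categorical cross-entropy renormalizes predictions to sum to 1 across units, as implementations commonly do. The functions are illustrative and are not Keras code.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy over a probability vector, renormalized to sum to 1."""
    total = sum(y_pred)
    probs = [min(max(p / total, eps), 1 - eps) for p in y_pred]
    return -sum(t * math.log(p) for t, p in zip(y_true, probs))

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for a single probability in (0, 1)."""
    p = min(max(y_pred, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# Three-way relation output: the loss depends on the predicted distribution.
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))

# Single-unit output with target 1: a good and a bad prediction.
print(categorical_cross_entropy([1], [0.9]), categorical_cross_entropy([1], [0.1]))
print(binary_cross_entropy(1, 0.9), binary_cross_entropy(1, 0.1))
```

Under that assumption, the categorical loss on a single unit is the same for both predictions, so it carries no training signal. The binary loss separates them.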


In [15]:
# The second target is an all-zeros placeholder for the sigmoid output
model.fit([padded_sequences, entities], [labels, np.zeros((len(training_data), 1))], epochs=10, batch_size=32)


Epoch 1/10


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f9cd0775e80>
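The notebook stops after training. As a hedged sketch of a possible next step, predicted relationship probabilities could be decoded into graph edges by taking the most likely class and keeping only confident predictions. The probabilities below are made up for illustration and are not model output. `RELATION_TYPES`, `decode_triples`, the label "other" for the third column, and the 0.5 threshold are all assumptions.

```python
# Assumed column order of the relationship labels; "other" names the catch-all column.
RELATION_TYPES = ["marriage", "parent-child", "other"]

def decode_triples(examples, relation_probs, threshold=0.5):
    """Turn per-example relation probabilities into (head, relation, tail) triples."""
    triples = set()
    for (head, tail), probs in zip(examples, relation_probs):
        # Index of the most probable relation type.
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold and RELATION_TYPES[best] != "other":
            triples.add((head, RELATION_TYPES[best], tail))
    return triples

# Hypothetical values standing in for the first output of model.predict(...).
examples = [("John", "Jane"), ("Jane", "Tom"), ("John", "Tom")]
relation_probs = [[0.8, 0.1, 0.1], [0.1, 0.85, 0.05], [0.3, 0.3, 0.4]]

for triple in sorted(decode_triples(examples, relation_probs)):
    print(triple)
```

The third pair is dropped because its best class falls below the threshold. Collecting the surviving triples in a set also removes duplicate facts extracted from different sentences.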