# 1. Knowledge Graphs

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ramshreyas/Ontodidact/blob/main/YouTube/1_Knowledge_Graphs/notebooks/1_Knowledge_Graphs.ipynb)

---

### Setup & Imports

In [None]:
%pip install llama-index networkx matplotlib pyvis python-dotenv llama-index-graph-stores-neo4j llama-index-embeddings-huggingface

In [None]:
# General imports
import os
from pprint import pprint
import dotenv
dotenv.load_dotenv()

# Async
import nest_asyncio
nest_asyncio.apply()

---

### A toy knowledge graph

A knowledge graph represents **knowledge** in the form of *entities* and the **relationships** between them. Let's use the family tree of House Lannister from Game of Thrones as an example:

In [None]:
# Import network visualization libraries
import networkx as nx
import matplotlib.pyplot as plt

Create a directed graph

In [None]:
G = nx.DiGraph()

Add nodes for each member of the Lannister family - these are our **entities**

In [None]:
characters = ["Tywin", "Joanna", "Cersei", "Jaime", "Tyrion", "Joffrey", "Myrcella", "Tommen"]
G.add_nodes_from(characters)

Now let's add their **relationships**.

Here we only consider parent-child relationships, so each edge points from a parent to a child and the graph is essentially a family tree.

In [None]:
# Adding relationships
relationships = [
    ("Tywin", "Cersei"), ("Joanna", "Cersei"),
    ("Tywin", "Jaime"), ("Joanna", "Jaime"),
    ("Tywin", "Tyrion"), ("Joanna", "Tyrion"),
    ("Cersei", "Joffrey"), ("Jaime", "Joffrey"),
    ("Cersei", "Myrcella"), ("Jaime", "Myrcella"),
    ("Cersei", "Tommen"), ("Jaime", "Tommen")
]
G.add_edges_from(relationships)

Visualize House Lannister

In [None]:
# Manually set positions for a family tree layout
pos = {
    "Tywin": (0.5, 1),
    "Joanna": (1.5, 1),
    "Cersei": (0, 0.5),
    "Jaime": (1, 0.5),
    "Tyrion": (2, 0.5),
    "Joffrey": (0, 0),
    "Myrcella": (1, 0),
    "Tommen": (2, 0)
}

# Draw nodes and edges
nx.draw(G, pos, with_labels=True, node_color='lightblue', edge_color='gray', node_size=3000, font_size=10, font_color="black")

plt.title('Lannister Family Tree')
plt.axis('off')  # Turn off the axis
plt.show()

So this family tree represents a particular piece of knowledge. 

It contains *explicit* knowledge, stored directly as an edge: for example, that Jaime (illegitimately) fathered Joffrey.

It also contains *implicit* knowledge - that Cersei and Jaime are siblings, which is not directly represented as a relationship between the entities - more on this later.

In this visualization, entities are represented by nodes or *vertices*, and relationships are represented by the lines or *edges* connecting them.

This is a simple knowledge graph.
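
To make the idea of *implicit* knowledge concrete, here is a minimal sketch in plain Python that derives sibling relationships from the parent-child edges alone. The names `parent_child`, `parents_of`, and `find_siblings` are made up for this illustration, and "sibling" is assumed to mean "shares at least one parent".

```python
from collections import defaultdict

# Same (parent, child) pairs as the edges of the family tree
parent_child = [
    ("Tywin", "Cersei"), ("Joanna", "Cersei"),
    ("Tywin", "Jaime"), ("Joanna", "Jaime"),
    ("Tywin", "Tyrion"), ("Joanna", "Tyrion"),
    ("Cersei", "Joffrey"), ("Jaime", "Joffrey"),
    ("Cersei", "Myrcella"), ("Jaime", "Myrcella"),
    ("Cersei", "Tommen"), ("Jaime", "Tommen"),
]

# Index the edges by child so we can look up each person's parents
parents_of = defaultdict(set)
for parent, child in parent_child:
    parents_of[child].add(parent)


def find_siblings(person):
    """Return everyone who shares at least one parent with `person`."""
    own_parents = parents_of.get(person, set())
    return sorted(
        other
        for other, parents in parents_of.items()
        if other != person and parents & own_parents
    )


print(find_siblings("Cersei"))   # ['Jaime', 'Tyrion']
print(find_siblings("Joffrey"))  # ['Myrcella', 'Tommen']
```

No sibling edge was ever added, yet the answer falls out of the structure. With the `networkx` graph `G`, the parents of a node could be read with `G.predecessors(node)` instead of the hand-built index.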

---

### LLMs can extract knowledge graphs directly from text

Now I'm sure you'll agree that was a bit tedious. Listing our relationships explicitly and building knowledge graphs can get overwhelming very quickly. 

The good news is that LLMs and LlamaIndex can help! This is where things get really interesting. 

Let's load a text description of House Lannister, which we will use as a source to create a knowledge graph with OpenAI and LlamaIndex.

In [None]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/").load_data()

In [None]:
pprint(documents[0].get_text())

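Before constructing the knowledge graph index with `PropertyGraphIndex`, define a schema-guided extractor that tells the LLM which entities and relationships to look for in the loaded documents.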

In [None]:
from typing import Literal
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# Best practice: use upper-case names for entity and relation types
entities = Literal["PERSON"]
relations = Literal["CHILD_OF", "PARENT_OF", "SIBLING_OF", "SPOUSE_OF"]

# define which entities can have which relations
validation_schema = {
    "PERSON": ["CHILD_OF", "PARENT_OF", "SIBLING_OF", "SPOUSE_OF"],
}

kg_extractor = SchemaLLMPathExtractor(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # If False, values outside of the schema are allowed,
    # which is useful when the schema is only a suggestion
    strict=True,
)
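
As a rough mental model of what the schema and `strict` do, here is a toy sketch in plain Python. It is not how `SchemaLLMPathExtractor` is implemented; the `filter_triples` helper, the tuple layout, and the candidate triples (including the `LIVES_IN` relation and `PLACE` type) are all made up for illustration.

```python
# Hypothetical illustration only: not LlamaIndex's actual implementation.
# Maps each entity type to the relations it is allowed to take part in.
allowed = {"PERSON": {"CHILD_OF", "PARENT_OF", "SIBLING_OF", "SPOUSE_OF"}}

# Made-up candidates: (subject, subject_type, relation, object, object_type)
candidate_triples = [
    ("Tywin", "PERSON", "PARENT_OF", "Cersei", "PERSON"),
    ("Tywin", "PERSON", "SPOUSE_OF", "Joanna", "PERSON"),
    ("Tyrion", "PERSON", "LIVES_IN", "Casterly Rock", "PLACE"),
]


def filter_triples(triples, schema, strict=True):
    """Keep only triples that fit the schema; keep everything if not strict."""
    if not strict:
        return list(triples)
    kept = []
    for subj, subj_type, rel, obj, obj_type in triples:
        types_ok = subj_type in schema and obj_type in schema
        if types_ok and rel in schema[subj_type]:
            kept.append((subj, subj_type, rel, obj, obj_type))
    return kept


for triple in filter_triples(candidate_triples, allowed, strict=True):
    print(triple)
```

With `strict=True` the `LIVES_IN` triple is dropped because neither the relation nor the `PLACE` type appears in the schema; with `strict=False` all three candidates are kept.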

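Initialize a Neo4j graph store and leave the vector store unset (`None`). This assumes a Neo4j instance is running locally and that `NEO4J_PASSWORD` is available in the environment, for example via the `.env` file loaded earlier.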

In [None]:
from llama_index.graph_stores.neo4j import Neo4jPGStore

graph_store = Neo4jPGStore(
    username="neo4j",
    password=os.getenv("NEO4J_PASSWORD"),
    url="bolt://localhost:7687",
)
vec_store = None

We can now create a `PropertyGraphIndex`, which will generate our knowledge graph automatically from the text!

In [None]:
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    property_graph_store=graph_store,
    vector_store=vec_store,
    show_progress=True,
)

---

### Viewing the knowledge graph with Neo4j (is awesome)

---

### Querying the knowledge graph (is deterministic)

Now let's create LLM-powered retrievers to query the knowledge graph using natural language.

In [None]:
from llama_index.core.indices.property_graph import (
    LLMSynonymRetriever,
    VectorContextRetriever,
)


llm_synonym = LLMSynonymRetriever(
    index.property_graph_store,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    include_text=False,
)

vector_context = VectorContextRetriever(
    index.property_graph_store,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    include_text=False,
)

In [None]:
retriever = index.as_retriever(
    sub_retrievers=[
        llm_synonym,
        vector_context,
    ]
)

Test the retriever

In [None]:
nodes = retriever.retrieve("Who is Tommen's mother?")

for node in nodes:
    print(node.text)
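
As a rough mental model of what comes back, a graph retriever returns the triples that touch the entities mentioned in the question. The toy sketch below uses plain substring matching in place of LLM-generated synonyms or embeddings; the `retrieve_triples` helper, the sample triples, and the `->` text format are assumptions made for illustration, not LlamaIndex internals.

```python
# Hypothetical illustration only: a keyword lookup over hand-written triples.
triples = [
    ("Cersei", "PARENT_OF", "Tommen"),
    ("Jaime", "PARENT_OF", "Tommen"),
    ("Tywin", "PARENT_OF", "Cersei"),
]


def retrieve_triples(question, triples):
    """Return every triple whose subject or object is named in the question."""
    return [
        f"{subj} -> {rel} -> {obj}"
        for subj, rel, obj in triples
        if subj in question or obj in question
    ]


for text in retrieve_triples("Who is Tommen's mother?", triples):
    print(text)
```

Only the two triples that mention Tommen are returned. Deciding which of those parents is the mother is left to whatever consumes the retrieved triples.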

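Finally, wrap the retrievers in a knowledge graph query engine, which passes the retrieved results to the LLM to generate a natural-language answer.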

In [None]:
query_engine = index.as_query_engine(
    sub_retrievers=[
        llm_synonym,
        vector_context,
    ],
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
)

response = query_engine.query("Who are Tyrion's siblings?")

print(str(response))

---