# Knowledge Graphs for Complex Topics

**Table of contents**

- [Introduction](#Introduction)
    - [What is a knowledge graph](#What-is-a-knowledge-graph)
    - [Knowledge graph applications](#Knowledge-graph-applications)
- [Setup and Dependencies](#Setup-and-Dependencies)
- [Defining the structures](#Defining-the-structures)
    - [Node and Edge Classes](#Node-and-Edge-Classes)
    - [KnowledgeGraph Class](#KnowledgeGraph-Class)
- [Generating the Knowledge Graph](#Generating-the-Knowledge-Graph)
    - [generate_graph function](#generate_graph-function)
    - [visualize_knowledge_graph function](#visualize_knowledge_graph-function)
    - [Graph Visualization](#graph-visualization)
- [Advanced Iterative Graph Generation](#Advanced-Iterative-Graph-Generation)
    - [What are the benefits of this approach?](#What-are-the-benefits-of-this-approach?)
    - [What's different?](#What's-different?)
    - [Generate iterative graph](#Generate-iterative-graph)
    - [Example Use Case](#Example-Use-Case)
- [Conclusion](#Conclusion)


## Introduction

### What is a knowledge graph

A knowledge graph, also known as a semantic network, represents a network of real-world entities (objects, events, situations, or concepts) and illustrates the relationships between them. This information is usually stored in a graph database and visualized as a graph structure, hence the term knowledge "graph."

A knowledge graph primarily consists of three elements: ``nodes``, ``edges``, and ``labels``. Nodes can represent any entity, be it an object, a location, or an individual. Edges establish the connection or relationship between these nodes, and labels name the nodes and describe the relationships. For instance, consider a node representing a popular author, "J.K. Rowling", and another node representing one of her books, "Harry Potter". The edge between these nodes could carry the label "author of", indicating that J.K. Rowling is the author of Harry Potter.
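
As a rough illustration of the node/edge/label idea, the author example can be written as a (source, label, target) triple in plain Python. The `Triple` type and the variable names are made up for this sketch and are not part of any library used later in the tutorial.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One labeled edge: source node, relationship label, target node."""
    source: str
    label: str
    target: str


triples = [
    Triple("J.K. Rowling", "author of", "Harry Potter"),
]

# The set of nodes is every entity that appears at either end of an edge.
nodes = sorted({t.source for t in triples} | {t.target for t in triples})

for t in triples:
    print(f"({t.source}) -[{t.label}]-> ({t.target})")
```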

### Knowledge graph applications

Automatically generated knowledge graphs can break a difficult topic into small, visually connected pieces, which makes the material easier to approach and learn.

Some widely used applications are:
- Search Engines: Knowledge graphs are used by search engines like Google to enhance search results with semantic-search information gathered from a wide variety of sources.
- Recommendation Systems: They are used in recommendation systems to suggest products or services based on users' behavior and preferences.
- Natural Language Processing: In NLP, knowledge graphs are used to understand and generate human language.
- Data Integration: Knowledge graphs help in integrating data from different sources by understanding the relationship between them.
- Artificial Intelligence and Machine Learning: They are used in AI and ML to provide context to data, which helps in better decision making.

----

## Setup and Dependencies

We're going to use the [`instructor`](https://github.com/jxnl/instructor) library to simplify the interaction between OpenAI and our code.

We are also going to use [Graphviz](https://graphviz.org) to render the resulting graph, which brings visual structure to intricate subjects.


In [None]:
!pip install instructor graphviz

In [2]:
import instructor 
from openai import OpenAI

client = instructor.patch(OpenAI())

Install Graphviz itself for your operating system from https://graphviz.org/download/. The `graphviz` Python package only wraps the Graphviz executables, so both are needed.

## Defining the structures

### Node and Edge Classes

We begin by modeling our knowledge graph with Node and Edge objects.

Node objects represent key concepts or entities, while Edge objects signify the relationships between them.

In [5]:
from pydantic import BaseModel, Field
from typing import List

# The Node class represents key concepts or entities in our knowledge graph.
# Each node has an id, a label, and a color.
class Node(BaseModel):
    id: int
    label: str
    color: str

# The Edge class signifies the relationships between nodes in our knowledge graph.
# Each edge has a source node, a target node, a label, and a color.
# By default, the color of an edge is set to "black".
class Edge(BaseModel):
    source: int
    target: int
    label: str
    color: str = "black"
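
To see what these models give us, here is a small sketch that builds two nodes and an edge by hand and then feeds `Node` a bad value. The classes are repeated so the snippet runs on its own, and the sample labels and colors are invented for illustration.

```python
from pydantic import BaseModel, ValidationError


class Node(BaseModel):
    id: int
    label: str
    color: str


class Edge(BaseModel):
    source: int
    target: int
    label: str
    color: str = "black"


author = Node(id=1, label="J.K. Rowling", color="blue")
book = Node(id=2, label="Harry Potter", color="green")

# No color is passed here, so the edge falls back to the default.
wrote = Edge(source=author.id, target=book.id, label="author of")
print(wrote.color)

# Pydantic rejects values that cannot be interpreted as the declared type.
error_count = 0
try:
    Node(id="not-a-number", label="Broken", color="red")
except ValidationError as exc:
    error_count = len(exc.errors())
print(error_count)
```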

### KnowledgeGraph Class

The KnowledgeGraph class integrates the nodes and edges, forming a comprehensive structure of our graph.

It contains a list of nodes and a list of edges.

Each node represents a key concept or entity, and each edge represents a relationship between two nodes.

In [15]:
class KnowledgeGraph(BaseModel):
    nodes: List[Node] = Field(default_factory=list)  # A list of nodes in the knowledge graph (empty by default).
    edges: List[Edge] = Field(default_factory=list)  # A list of edges in the knowledge graph (empty by default).

## Generating the Knowledge Graph

### generate_graph function

The ``generate_graph`` function uses OpenAI's model to create a KnowledgeGraph object from an input string.

It requests the model to interpret the input as a detailed knowledge graph and uses the response to form the KnowledgeGraph object.

In [7]:
def generate_graph(text: str) -> KnowledgeGraph:
    # `text` avoids shadowing Python's built-in `input` function.
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": f"Help me understand the following by describing it as a detailed knowledge graph: {text}",
            }
            }
        ],
        response_model=KnowledgeGraph,
    )
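
Because the graph comes back from a model, it can be worth a quick sanity check before drawing it. One possible check, sketched below, looks for edges whose `source` or `target` id matches no node; Graphviz would still draw such an edge and show the bare id as the node's label. The `dangling_edges` helper and the sample graph are hypothetical additions, and the classes are repeated so the snippet runs on its own.

```python
from typing import List

from pydantic import BaseModel, Field


class Node(BaseModel):
    id: int
    label: str
    color: str


class Edge(BaseModel):
    source: int
    target: int
    label: str
    color: str = "black"


class KnowledgeGraph(BaseModel):
    nodes: List[Node] = Field(default_factory=list)
    edges: List[Edge] = Field(default_factory=list)


def dangling_edges(kg: KnowledgeGraph) -> List[Edge]:
    """Return edges whose source or target id matches no node in the graph."""
    known_ids = {node.id for node in kg.nodes}
    return [
        edge
        for edge in kg.edges
        if edge.source not in known_ids or edge.target not in known_ids
    ]


# Hand-written stand-in for a model response; node 3 is deliberately missing.
sample = KnowledgeGraph(
    nodes=[
        Node(id=1, label="Quantum mechanics", color="blue"),
        Node(id=2, label="Superposition", color="green"),
    ],
    edges=[
        Edge(source=1, target=2, label="includes"),
        Edge(source=2, target=3, label="leads to"),
    ],
)

problems = dangling_edges(sample)
print([(edge.source, edge.target) for edge in problems])
```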

### visualize_knowledge_graph function

The `visualize_knowledge_graph` function visualizes a knowledge graph.

It accepts a `KnowledgeGraph` object as input, which includes nodes and edges.

The function uses the `graphviz` library to create a directed graph (`Digraph`).

Each node and edge from the `KnowledgeGraph` is added to the `Digraph` with their respective attributes (id, label, color).

The graph is then rendered and displayed.

In [13]:
from graphviz import Digraph

def visualize_knowledge_graph(kg: KnowledgeGraph):
    
    dot = Digraph(comment="Knowledge Graph")

    for node in kg.nodes:
        dot.node(str(node.id), node.label, color=node.color)
    for edge in kg.edges:
        dot.edge(str(edge.source), str(edge.target), label=edge.label, color=edge.color)

    dot.render("knowledge_graph.gv", view=True)
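
For intuition about what a `Digraph` describes, the sketch below writes a comparable graph description in the DOT language by hand, using only the standard library. It is an approximation for illustration: the `to_dot` helper is hypothetical, its output is not claimed to match what the `graphviz` package emits, and it does not escape quotes inside labels.

```python
from typing import Dict, List


def to_dot(nodes: List[Dict], edges: List[Dict]) -> str:
    """Build DOT source text for a directed graph (illustrative only)."""
    lines = ["digraph {"]
    for node in nodes:
        lines.append(
            f'    {node["id"]} [label="{node["label"]}" color={node["color"]}]'
        )
    for edge in edges:
        lines.append(
            f'    {edge["source"]} -> {edge["target"]} '
            f'[label="{edge["label"]}" color={edge["color"]}]'
        )
    lines.append("}")
    return "\n".join(lines)


dot_source = to_dot(
    nodes=[
        {"id": 1, "label": "J.K. Rowling", "color": "blue"},
        {"id": 2, "label": "Harry Potter", "color": "green"},
    ],
    edges=[
        {"source": 1, "target": 2, "label": "author of", "color": "black"},
    ],
)
print(dot_source)
```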

### Graph Visualization


Now we generate a knowledge graph for the query "Teach me about quantum mechanics" and visualize it.

Rendering writes the graph to disk and, because `view=True`, opens it in your default viewer, so you can explore the key concepts and their relationships in quantum mechanics.

In [16]:
graph: KnowledgeGraph = generate_graph("Teach me about quantum mechanics")
visualize_knowledge_graph(graph)

![Image](https://jxnl.github.io/instructor/examples/knowledge_graph.png)


## Advanced Iterative Graph Generation

When dealing with extensive or segmented text inputs, processing them all at once might be challenging due to limitations in prompt length or the complexity of the content.

In such scenarios, an iterative approach to building the knowledge graph proves beneficial.

This method involves processing the text in smaller, manageable chunks, updating the graph with new information from each chunk.
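
How the text gets split is left open here. One simple option, sketched below under the assumption that paragraphs are separated by blank lines, is to group paragraphs until a character budget is reached. The `chunk_paragraphs` helper and the `max_chars` parameter are hypothetical, and a token-based budget may suit real prompts better.

```python
from typing import List


def chunk_paragraphs(text: str, max_chars: int = 500) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # The budget would be exceeded: close this chunk and start a new one.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample_text = "\n\n".join(["A" * 30, "B" * 30, "C" * 30])
chunks = chunk_paragraphs(sample_text, max_chars=70)
print([len(chunk) for chunk in chunks])
```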

### What are the benefits of this approach?

- Scalability: This approach can handle large datasets by breaking them down into smaller, more manageable pieces.

- Flexibility: It allows for dynamic updates to the graph, accommodating new information as it becomes available.

- Efficiency: Processing smaller chunks of text can be more efficient and less prone to errors or omissions.

### What's different?

The previous example laid the foundation; this new example adds more complexity and functionality.

The `Node` and `Edge` classes have been augmented with a `__hash__` method, enabling these objects to be used in sets and thereby making it easier to handle duplicates.

In [27]:
class Node(BaseModel):
    id: int
    label: str
    color: str

    def __hash__(self) -> int:
        return hash((self.id, self.label))
    
class Edge(BaseModel):
    source: int
    target: int
    label: str
    color: str = "black"

    def __hash__(self) -> int:
        return hash((self.source, self.target, self.label))
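
A short sketch of how this plays out in a set. One point to keep in mind: a set uses equality as well as the hash, and Pydantic models compare all of their fields, so two nodes with the same `id` and `label` but different colors are still kept as separate entries. The sample values are invented for illustration.

```python
from pydantic import BaseModel


class Node(BaseModel):
    id: int
    label: str
    color: str

    def __hash__(self) -> int:
        return hash((self.id, self.label))


a = Node(id=1, label="Jason", color="blue")
b = Node(id=1, label="Jason", color="blue")   # identical fields: equal, same hash
c = Node(id=1, label="Jason", color="green")  # same hash, but not equal to `a`

unique_same = {a, b}
unique_mixed = {a, c}

print(len(unique_same), len(unique_mixed))
```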

The `KnowledgeGraph` class now has ``update`` and ``draw`` methods.

The `nodes` and `edges` fields in the `KnowledgeGraph` class are now optional, providing more flexibility.

``update``: combines two graphs and removes duplicate nodes and edges.

``draw``: takes a `prefix` parameter used as the output file name, so each iteration can be saved as a separate version of the graph.

In [34]:
from typing import Optional

class KnowledgeGraph(BaseModel):
    # Optional list of nodes and edges in the knowledge graph
    nodes: Optional[List[Node]] = Field(default_factory=list)
    edges: Optional[List[Edge]] = Field(default_factory=list)

    def update(self, other: "KnowledgeGraph") -> "KnowledgeGraph":
        # This method updates the current graph with the other graph, deduplicating nodes and edges.
        return KnowledgeGraph(
            nodes=list(set(self.nodes + other.nodes)),  # Combine and deduplicate nodes
            edges=list(set(self.edges + other.edges)),  # Combine and deduplicate edges
        )

    def draw(self, prefix: Optional[str] = None):
        # This method visualizes the knowledge graph using the Graphviz library
        dot = Digraph(comment="Knowledge Graph")

        # Add nodes to the graph
        for node in self.nodes:
            dot.node(str(node.id), node.label, color=node.color)

        # Add edges to the graph
        for edge in self.edges:
            dot.edge(
                str(edge.source), str(edge.target), label=edge.label, color=edge.color
            )
        
        # Render the graph as a .png file and display it
        dot.render(prefix, format="png", view=True)
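
Since `update` goes through `set`, the order of the merged nodes and edges is not guaranteed. If a stable order matters (for example, to keep successive drawings comparable), one possible alternative is an order-preserving merge based on `dict.fromkeys`. This is a sketch and not part of the class above: the `merge_unique` helper is hypothetical, and plain tuples stand in for hashable `Edge` objects.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def merge_unique(existing: Iterable[T], incoming: Iterable[T]) -> List[T]:
    """Concatenate two sequences, keeping only the first occurrence of each item."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys([*existing, *incoming]))


current_edges = [(1, 2, "knows"), (2, 3, "teaches")]
new_edges = [(2, 3, "teaches"), (3, 4, "studies at")]

merged = merge_unique(current_edges, new_edges)
print(merged)
```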

### Generate iterative graph

The new ``generate_graph`` function is designed to handle a list of inputs iteratively, updating the graph with each new piece of information.

In [37]:
def generate_graph(input: List[str]) -> KnowledgeGraph:
    # Initialize an empty KnowledgeGraph
    cur_state = KnowledgeGraph()

    # Get the number of iterations
    num_iterations = len(input)

    # Iterate over the input list
    for i, inp in enumerate(input):
        new_updates = client.chat.completions.create(
            model="gpt-3.5-turbo-16k",
            messages=[
                {
                    "role": "system",
                    "content": """You are an iterative knowledge graph builder.
                    You are given the current state of the graph, and you must append the nodes and edges
                    to it. Do not provide any duplicates and try to reuse nodes as much as possible.""",
                },
                {
                    "role": "user",
                    "content": f"""Extract any new nodes and edges from the following:
                    # Part {i + 1}/{num_iterations} of the input:

                    {inp}""",
                },
                {
                    "role": "user",
                    "content": f"""Here is the current state of the graph:
                    {cur_state.model_dump_json(indent=2)}""",
                },
            ],
            response_model=KnowledgeGraph,
        )  # type: ignore

        # Update the current state with the new updates
        cur_state = cur_state.update(new_updates)

        # Draw the current state of the graph
        cur_state.draw(prefix=f"iteration_{i}")
        
    # Return the final state of the KnowledgeGraph
    return cur_state
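
To inspect the prompts without calling the API, the message construction can be pulled into a small helper, as in the sketch below. The `build_messages` helper is hypothetical, parts are numbered from 1 here, and the JSON string is a placeholder for `cur_state.model_dump_json()`. Building the strings from adjacent literals also keeps source-code indentation out of the prompt text.

```python
from typing import Dict, List

SYSTEM_PROMPT = (
    "You are an iterative knowledge graph builder. "
    "You are given the current state of the graph, and you must append the nodes "
    "and edges to it. Do not provide any duplicates and try to reuse nodes as much "
    "as possible."
)


def build_messages(
    chunk: str, part: int, total: int, state_json: str
) -> List[Dict[str, str]]:
    """Assemble the chat messages for one chunk of the input."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": (
                "Extract any new nodes and edges from the following:\n"
                f"# Part {part}/{total} of the input:\n\n{chunk}"
            ),
        },
        {
            "role": "user",
            "content": f"Here is the current state of the graph:\n{state_json}",
        },
    ]


chunks = ["Jason is a professor.", "Sarah is a student of Jason."]
empty_state = '{"nodes": [], "edges": []}'  # placeholder for the serialized graph

all_messages = [
    build_messages(chunk, part, len(chunks), empty_state)
    for part, chunk in enumerate(chunks, start=1)
]
print(all_messages[1][1]["content"])
```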


### Example Use Case

In this approach, we process the text in manageable chunks, one at a time.

This method is particularly beneficial when dealing with extensive text that may not fit into a single prompt.

It is especially useful in scenarios such as constructing a knowledge graph for a complex topic, where the information is distributed across multiple documents or sections.

In [38]:
text_chunks = [
    "Jason knows a lot about quantum mechanics. He is a physicist. He is a professor.",
    "Professors are smart.",
    "Sarah knows Jason and is a student of his.",
    "Sarah is a student at the University of Toronto, and UofT is in Canada.",
]

graph: KnowledgeGraph = generate_graph(text_chunks)

graph.draw(prefix="final")

![Image](https://i.ibb.co/802Zm88/Clean-Shot-2023-11-12-at-16-31-13.png)

## Conclusion

This tutorial demonstrated how to generate and visualize a knowledge graph for complex topics.

By using knowledge graphs, you can break down intricate subjects into understandable components, enhancing your learning experience.

You also learned an iterative way of generating knowledge graphs, which improves on the first example by supporting incremental updates and a visualization of each intermediate graph.

These enhancements make the framework more robust and suitable for complex scenarios where data is received in parts or requires continuous updating.