# Knowledge Graph with LangChain

## Setup and Environment

In [None]:
%pip install --upgrade langchain langchain-experimental langchain-openai python-dotenv pyvis

In [6]:
from dotenv import load_dotenv
import os

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")
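
`os.getenv` returns `None` when the variable is not set, which may only show up as a less obvious error further on. A minimal sketch of failing early follows; `require_env` is a hypothetical helper introduced here for illustration, not part of python-dotenv or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both a missing variable and an empty string
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment."
        )
    return value


# Usage in this notebook (after load_dotenv()):
# api_key = require_env("OPENAI_API_KEY")
```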

## Initialize the LLM

In [7]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="openai/gpt-4.1-mini",
    api_key=api_key,
    base_url="https://models.github.ai/inference"
)

## Initialize the Graph Transformer

In [8]:
from langchain_experimental.graph_transformers import LLMGraphTransformer

graph_transformer = LLMGraphTransformer(llm=llm)

## Prepare the Input Text

In [13]:
text = """
Donald John Trump (born June 14, 1946) is an American politician, media personality, and businessman who is the 47th president of the United States. A member of the Republican Party, he served as the 45th president from 2017 to 2021.

Born into a wealthy New York City family, Trump graduated from the University of Pennsylvania in 1968 with a bachelor's degree in economics. He became the president of his family's real estate business in 1971, renamed it the Trump Organization, and began acquiring and building skyscrapers, hotels, casinos, and golf courses. He also launched side ventures, many licensing the Trump name, and filed for six business bankruptcies in the 1990s and 2000s. From 2004 to 2015, he hosted the reality television show The Apprentice, bolstering his fame as a billionaire. Presenting himself as a political outsider, Trump won the 2016 presidential election against Democratic Party nominee Hillary Clinton.

During his first presidency, Trump imposed a travel ban on seven Muslim-majority countries, expanded the Mexico–United States border wall, and enforced a family separation policy on the border. He rolled back environmental and business regulations, signed the Tax Cuts and Jobs Act, and appointed three Supreme Court justices. In foreign policy, Trump withdrew the U.S. from agreements on climate, trade, and Iran's nuclear program, and initiated a trade war with China. In response to the COVID-19 pandemic from 2020, he downplayed its severity, contradicted health officials, and signed the CARES Act. After losing the 2020 presidential election to Joe Biden, Trump attempted to overturn the result, culminating in the January 6 Capitol attack in 2021. He was impeached in 2019 for abuse of power and obstruction of Congress, and in 2021 for incitement of insurrection; the Senate acquitted him both times.

In 2023, Trump was found liable in civil cases for sexual abuse and defamation and for business fraud. He was found guilty in 34 counts of falsifying business records in 2024, making him the first U.S. president convicted of a felony. After winning the 2024 presidential election against then-vice president Kamala Harris, he was sentenced to a discharge, and two felony indictments against him for retention of classified documents and obstruction of the 2020 election were dismissed without prejudice.

Trump began his second presidency by initiating mass layoffs of federal workers. He imposed tariffs on nearly all countries at the highest level since the Great Depression and signed the One Big Beautiful Bill Act. His administration's actions—including the targeting of political opponents and civil society, the persecution of transgender people, the mass deportation of immigrants, and the extensive use of executive orders—have drawn over 300 lawsuits challenging their legality.

Since 2015, Trump's leadership style and political agenda—often referred to as Trumpism—have reshaped the Republican Party's identity. Many of his comments and actions have been characterized as racist or misogynistic. He has made many false or misleading statements during his campaigns and presidency, to a degree unprecedented in American politics. He promotes conspiracy theories. Trump's actions, especially in his second term, have been described as authoritarian and contributing to democratic backsliding. After his first term, scholars and historians ranked him as one of the worst presidents in American history."""

In [11]:
from langchain_core.documents import Document

documents = [Document(page_content=text)]

## Create the Knowledge Graph

In [18]:
graph_documents = await graph_transformer.aconvert_to_graph_documents(documents)

In [15]:
print(graph_documents[0].nodes)
print(graph_documents[0].relationships)

[Node(id='Donald John Trump', type='Person', properties={}), Node(id='United States', type='Location', properties={}), Node(id='Republican Party', type='Organization', properties={}), Node(id='University Of Pennsylvania', type='Organization', properties={}), Node(id='Trump Organization', type='Organization', properties={}), Node(id='The Apprentice', type='Creative work', properties={}), Node(id='Hillary Clinton', type='Person', properties={}), Node(id='Joe Biden', type='Person', properties={}), Node(id='Kamala Harris', type='Person', properties={}), Node(id='Covid-19 Pandemic', type='Event', properties={}), Node(id='Cares Act', type='Event', properties={}), Node(id='Tax Cuts And Jobs Act', type='Event', properties={}), Node(id='Trumpism', type='Concept', properties={}), Node(id='2020 Presidential Election', type='Event', properties={}), Node(id='2016 Presidential Election', type='Event', properties={}), Node(id='2024 Presidential Election', type='Event', properties={}), Node(id='Januar
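
The raw `repr` output above is hard to scan. As a rough sketch, the relationships can be flattened into `(source, relation, target)` triples for printing. The function only relies on the `.relationships`, `.source`, `.target`, `.type` and `.id` attributes used elsewhere in this notebook; the `SimpleNamespace` objects and the `MEMBER_OF` relation are hypothetical stand-ins so the snippet runs without calling the LLM.

```python
from types import SimpleNamespace


def to_triples(graph_document):
    """Return (source_id, relation_type, target_id) tuples for one graph document."""
    return [
        (rel.source.id, rel.type, rel.target.id)
        for rel in graph_document.relationships
    ]


# Hypothetical stand-in data shaped like a graph document
person = SimpleNamespace(id="Donald John Trump", type="Person")
party = SimpleNamespace(id="Republican Party", type="Organization")
demo_doc = SimpleNamespace(
    nodes=[person, party],
    relationships=[SimpleNamespace(source=person, target=party, type="MEMBER_OF")],
)

for source, relation, target in to_triples(demo_doc):
    print(f"{source} -[{relation}]-> {target}")
```

With real data, `to_triples(graph_documents[0])` should work the same way.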

## Visualize the Knowledge Graph

In [9]:
from pathlib import Path
import webbrowser

from pyvis.network import Network

def visualize_graph(graph_documents, output_file="knowledge_graph.html"):
    """Render all graph documents as a single interactive pyvis network."""
    net = Network(height="1200px", width="100%", directed=True,
                  notebook=False, bgcolor="#222222", font_color="white")

    # Merge nodes and relationships from every graph document (one per chunk),
    # not only the first one
    node_dict = {}
    relationships = []
    for graph_doc in graph_documents:
        for node in graph_doc.nodes:
            node_dict.setdefault(node.id, node)
        relationships.extend(graph_doc.relationships)

    # Keep only edges whose endpoints are both known nodes
    valid_edges = [
        rel for rel in relationships
        if rel.source.id in node_dict and rel.target.id in node_dict
    ]

    # Draw only nodes that take part in at least one valid edge
    valid_node_ids = set()
    for rel in valid_edges:
        valid_node_ids.update([rel.source.id, rel.target.id])

    for node_id in valid_node_ids:
        node = node_dict[node_id]
        try:
            net.add_node(node.id, label=node.id, title=node.type, group=node.type)
        except Exception:
            continue  # skip nodes that pyvis rejects

    for rel in valid_edges:
        try:
            net.add_edge(rel.source.id, rel.target.id, label=rel.type.lower())
        except Exception:
            continue  # skip edges whose endpoints were not added

    # Configure physics
    net.set_options("""
            {
                "physics": {
                    "forceAtlas2Based": {
                        "gravitationalConstant": -100,
                        "centralGravity": 0.01,
                        "springLength": 200,
                        "springConstant": 0.08
                    },
                    "minVelocity": 0.75,
                    "solver": "forceAtlas2Based"
                }
            }
            """)

    output_path = Path(output_file).resolve()
    net.save_graph(output_file)
    print(f"Graph saved to {output_path}")

    # Try to open in the default browser; webbrowser.open returns False on failure
    if not webbrowser.open(output_path.as_uri()):
        print("Could not open browser automatically")

In [None]:
# Run the function
visualize_graph(graph_documents)

## Add Constraints

In [17]:
allowed_nodes = ["Person", "Organization", "Award"]
allowed_relationships = [("Person", "WORKS_AT", "Organization"), ("Person", "SPOUSE", "Person"), ("Person", "AWARD", "Award")]

graph_transformer_with_constraints = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=allowed_nodes,
    allowed_relationships=allowed_relationships
)

graph_documents_constrained = await graph_transformer_with_constraints.aconvert_to_graph_documents(documents)

In [18]:
visualize_graph(graph_documents_constrained)

Graph saved to c:\Users\Induwara Gayashan\Knowledge_Graph_App\knowledge_graph.html
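
LLM output is not guaranteed to follow a schema, so a quick sanity check on the constrained result can be useful. The sketch below compares node types and `(source type, relation, target type)` triples against the allowed lists, ignoring case. `find_constraint_violations` is a hypothetical helper, and the `SimpleNamespace` objects (including the `PRESIDENT_OF` relation) are made-up stand-ins so the snippet runs without an API call.

```python
from types import SimpleNamespace


def find_constraint_violations(graph_document, allowed_nodes, allowed_relationships):
    """Return (bad_node_ids, bad_triples) that fall outside the allowed schema."""
    node_types = {t.lower() for t in allowed_nodes}
    triples = {(s.lower(), r.lower(), t.lower()) for s, r, t in allowed_relationships}

    bad_nodes = [
        node.id for node in graph_document.nodes
        if node.type.lower() not in node_types
    ]
    bad_triples = [
        (rel.source.id, rel.type, rel.target.id)
        for rel in graph_document.relationships
        if (rel.source.type.lower(), rel.type.lower(), rel.target.type.lower())
        not in triples
    ]
    return bad_nodes, bad_triples


# Same constraints as in the cell above
allowed_nodes = ["Person", "Organization", "Award"]
allowed_relationships = [
    ("Person", "WORKS_AT", "Organization"),
    ("Person", "SPOUSE", "Person"),
    ("Person", "AWARD", "Award"),
]

# Hypothetical stand-in data shaped like a graph document
person = SimpleNamespace(id="Donald John Trump", type="Person")
org = SimpleNamespace(id="Trump Organization", type="Organization")
place = SimpleNamespace(id="United States", type="Location")
demo_doc = SimpleNamespace(
    nodes=[person, org, place],
    relationships=[
        SimpleNamespace(source=person, target=org, type="WORKS_AT"),
        SimpleNamespace(source=person, target=place, type="PRESIDENT_OF"),
    ],
)

bad_nodes, bad_triples = find_constraint_violations(
    demo_doc, allowed_nodes, allowed_relationships
)
print("Nodes outside schema:", bad_nodes)
print("Relationships outside schema:", bad_triples)
```

On the real result, passing `graph_documents_constrained[0]` should return two empty lists if every node and relationship respects the constraints.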


## Generate the Text from Wikipedia Search

In [None]:
%pip install wikipedia

In [22]:
import wikipedia

def search_wikipedia(query: str) -> str:
    """Search Wikipedia for a given query and return the complete summary."""
    try:
        summary = wikipedia.summary(query)
        return summary
    except Exception as e:
        return f"An error occurred while searching Wikipedia: {str(e)}"

In [23]:
summary = search_wikipedia("Albert Einstein")
print(summary)

Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist best known for developing the theory of relativity. Einstein also made important contributions to quantum theory. His mass–energy equivalence formula E = mc2, which arises from special relativity, has been called "the world's most famous equation". He received the 1921 Nobel Prize in Physics for "his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect".
Born in the German Empire, Einstein moved to Switzerland in 1895, forsaking his German citizenship (as a subject of the Kingdom of Württemberg) the following year. In 1897, at the age of seventeen, he enrolled in the mathematics and physics teaching diploma program at the Swiss federal polytechnic school in Zurich, graduating in 1900. He acquired Swiss citizenship a year later, which he kept for the rest of his life, and afterwards secured a permanent position at the Swiss Patent Office in Bern. In

In [24]:
new_documents = [Document(page_content=summary)]
new_graph_documents = await graph_transformer.aconvert_to_graph_documents(new_documents)

In [25]:
visualize_graph(new_graph_documents)

Graph saved to c:\Users\Induwara Gayashan\Knowledge_Graph_App\knowledge_graph.html


## Chunking Strategy

In [31]:
text = wikipedia.page("Donald John Trump").content
text



In [32]:
from langchain_classic.text_splitter import RecursiveCharacterTextSplitter

# Estimate token count (roughly 4 characters per token)
estimated_tokens = len(text) // 4

# If the text exceeds ~3000 tokens (leaving buffer for the system prompt), split it into chunks
if estimated_tokens > 3000:
    # Split text into chunks with overlap to maintain context
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=10000,  # characters, ~2500 tokens per chunk (leaving buffer for prompt/response)
        chunk_overlap=2000,  # characters, ~500 tokens overlap
        length_function=len,
        separators=["\n\n", "\n", ". ", " ", ""]
    )
    chunks = text_splitter.split_text(text)
else:
    # Short text: process it as a single chunk so the code below still works
    chunks = [text]

documents = [Document(page_content=chunk) for chunk in chunks]

# Process each chunk in turn and collect the resulting graph documents
all_graph_docs = []
for doc in documents:
    graph_docs = await graph_transformer.aconvert_to_graph_documents([doc])
    all_graph_docs.extend(graph_docs)

visualize_graph(all_graph_docs)

Graph saved to c:\Users\Induwara Gayashan\Knowledge_Graph_App\knowledge_graph.html
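
Because the chunks overlap, the same entity or relationship can show up in several of the per-chunk graph documents. A minimal sketch of merging them into one deduplicated node list and edge list follows, assuming nodes with the same `id` are the same entity and edges are identified by `(source id, type, target id)`. `merge_graph_documents` is a hypothetical helper, and the `SimpleNamespace` objects and relation names are made-up stand-ins for real graph documents.

```python
from types import SimpleNamespace


def merge_graph_documents(graph_documents):
    """Merge several graph documents into deduplicated node and edge lists."""
    nodes = {}
    edges = {}
    for graph_doc in graph_documents:
        for node in graph_doc.nodes:
            nodes.setdefault(node.id, node)  # first occurrence wins
        for rel in graph_doc.relationships:
            # Make sure relationship endpoints are present as nodes too
            nodes.setdefault(rel.source.id, rel.source)
            nodes.setdefault(rel.target.id, rel.target)
            edges.setdefault((rel.source.id, rel.type, rel.target.id), rel)
    return list(nodes.values()), list(edges.values())


# Hypothetical stand-in data: two chunks that both mention the same fact
person = SimpleNamespace(id="Donald John Trump", type="Person")
party = SimpleNamespace(id="Republican Party", type="Organization")
show = SimpleNamespace(id="The Apprentice", type="Creative work")

chunk_1 = SimpleNamespace(
    nodes=[person, party],
    relationships=[SimpleNamespace(source=person, target=party, type="MEMBER_OF")],
)
chunk_2 = SimpleNamespace(
    nodes=[person, party, show],
    relationships=[
        SimpleNamespace(source=person, target=party, type="MEMBER_OF"),
        SimpleNamespace(source=person, target=show, type="HOSTED"),
    ],
)

merged_nodes, merged_edges = merge_graph_documents([chunk_1, chunk_2])
print(len(merged_nodes), "nodes,", len(merged_edges), "edges")
```

Exact-match deduplication will not catch spelling variants such as "Donald Trump" versus "Donald John Trump"; resolving those would need a separate entity-resolution step.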


## Create Knowledge Graph from a PDF

In [3]:
%pip install pypdf


In [12]:
from langchain_classic.document_loaders import PyPDFLoader
from langchain_classic.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("attention.pdf")
docs = loader.load()

# Concatenate the text of all pages into one string
text = "\n".join(doc.page_content for doc in docs)

# Estimate token count (roughly 4 characters per token)
estimated_tokens = len(text) // 4

# If the text exceeds ~3000 tokens (leaving buffer for the system prompt), split it into chunks
if estimated_tokens > 3000:
    # Split text into chunks with overlap to maintain context
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=10000,  # characters, ~2500 tokens per chunk (leaving buffer for prompt/response)
        chunk_overlap=2000,  # characters, ~500 tokens overlap
        length_function=len,
        separators=["\n\n", "\n", ". ", " ", ""]
    )
    chunks = text_splitter.split_text(text)
else:
    # Short text: process it as a single chunk so the code below still works
    chunks = [text]

documents = [Document(page_content=chunk) for chunk in chunks]

# Process each chunk in turn and collect the resulting graph documents
all_graph_docs = []
for doc in documents:
    graph_docs = await graph_transformer.aconvert_to_graph_documents([doc])
    all_graph_docs.extend(graph_docs)

visualize_graph(all_graph_docs)


Graph saved to c:\Users\Induwara Gayashan\Knowledge_Graph_App\knowledge_graph.html
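
The chunking cells use `chunk_size=10000` and `chunk_overlap=2000`, both measured in characters. To make the effect of the overlap concrete, here is a simplified fixed-window splitter. It is only an illustration of the arithmetic, not how `RecursiveCharacterTextSplitter` works (that class prefers to break on the listed separators), and `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows where consecutive windows overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    # Stop once the remaining tail is already covered by the previous window
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


# Synthetic 25,000-character text standing in for a long article
sample_text = "".join(str(i % 10) for i in range(25000))
chunks = sliding_window_chunks(sample_text, chunk_size=10000, chunk_overlap=2000)

for number, chunk in enumerate(chunks, 1):
    print(f"chunk {number}: {len(chunk)} characters")
```

Each window advances by `chunk_size - chunk_overlap` characters, so a larger overlap preserves more context across chunk boundaries at the cost of more chunks and therefore more LLM calls.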
