<a href="https://colab.research.google.com/github/AmirJlr/KnowledgeGraphs/blob/master/LLMKG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [28]:
from google.colab import userdata
import os

os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')

In [29]:
!pip install -qU langchain langchain-experimental langchain-google-genai pyvis

In [30]:
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(temperature=0, model="models/gemini-2.0-flash")

graph_transformer = LLMGraphTransformer(llm=llm)

In [31]:
text = """
Real Madrid Club de Fútbol, commonly referred to as Real Madrid, is a Spanish professional football club based in Madrid. The club competes in La Liga, the top tier of Spanish football.

Founded in 1902 as Madrid Football Club, the club has traditionally worn a white home kit since its inception. The honorific title 'Real' is Spanish for "Royal" and was bestowed to the club by King Alfonso XIII in 1920 along with the crown in the club crest. Real Madrid have played their home matches in the 78,297-capacity Santiago Bernabéu in Madrid since 1947. Unlike most European sporting clubs, Real Madrid's members (socios) have owned and operated the club throughout its history. The official Madrid anthem is the "Hala Madrid y nada más", written by RedOne and Manuel Jabois.[8] The club is one of the most widely supported in the world and is the most followed football club on social media according to the CIES Football Observatory as of 2024.[9][10] It was estimated to be worth $6.6 billion in 2024, making it the world's most valuable football club.[11] In 2024, Real Madrid became the first football club to make €1 billion ($1.08bn) in revenue according to the club's announcement.[12]

Real Madrid are one of the most successful football clubs in the world and most successful in Europe. In domestic football, the club has won 71 trophies; a record 36 La Liga titles, 20 Copa del Rey, 13 Supercopa de España, a Copa Eva Duarte and a Copa de la Liga.[13] In International football, Real Madrid have won a record 35 trophies: a record 15 European Cup/UEFA Champions League titles, a record six UEFA Super Cups, two UEFA Cups, a joint record two Latin Cups, a record one Iberoamerican Cup, and a record nine FIFA Club World championships.[note 1] Madrid was ranked joint first a record number of times in the International Federation of Football History & Statistics (IFFHS) Club World Ranking for the years 2000, 2002, 2014, 2017, and 2024.[17] In UEFA, Madrid ranks first in the all-time club ranking.[18][19]

Being one of the three founding members of La Liga that have never been relegated from the top division since its inception in 1929 (along with Athletic Bilbao and Barcelona), Real Madrid has many long-standing rivalries, most notably El Clásico with Barcelona and El Derbi Madrileño with Atlético Madrid. The club established itself as a major force in both Spanish and European football during the 1950s and 60s, winning five consecutive and six overall European Cups and reaching a further two finals. This success was replicated on the domestic front, with Madrid winning 12 league titles in 16 years. This team, which included Alfredo Di Stéfano, Ferenc Puskás, Paco Gento and Raymond Kopa is considered, by some in the sport, to be the greatest of all time.[20][21] Real Madrid is known for its Galácticos policy, which involves signing the world's best players, such as Ronaldo, Zinedine Zidane and David Beckham to create a superstar team.[22] The term 'Galácticos policy' generally refers to the two eras of Florentino Pérez's presidency of the club (2000–2006 and 2009–2018); however, players brought in just before his tenure are sometimes considered to be part of the Galácticos legacy. A notable example is Steve McManaman, who like many other players also succeeded under the policy.[23] On 26 June 2009, Madrid signed Cristiano Ronaldo for a record-breaking £80 million (€94 million);[24] he became both the club's and history's all-time top goalscorer.[25][26][27][28] Madrid have recently focused on signing young talents such as Vinícius Júnior, Rodrygo, and Jude Bellingham.[29]

Real Madrid is recognised as the greatest football club of the 20th century by FIFA and as the best European club during the same timeframe by the IFFHS,[30] while also receiving the FIFA Centennial Order of Merit in 2004.[31] Real Madrid has the highest participations in the European Cup/UEFA Champions League (55),[18] a tournament in which they hold the overall record for the most wins, most draws and most goals scored.[32] Real Madrid is the only club to have won three consecutive titles (three-peat) in the European Cup/UEFA Champions League twice, first in 1955–56, 1956–57, and 1957–58, and second in 2015–16, 2016–17 and 2017–18 and was the first and the only club to win La Décima (in 2013–14).[33] In June 2024, they won a record-extending 15th Champions League title (the sixth in eleven seasons), recognised as such by Guinness World Records.[34] Real Madrid is the first club across all of Europe's top-five leagues to win 100 trophies in all competitions.[35] As of February 2025, Real Madrid are ranked 1st in Europe, according to the UEFA club rankings as well as first in last 10 years (2013–2023) overall.[36][37]

"""

In [32]:
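documents = [Document(page_content=text)]
# Top-level `await` works in Jupyter/Colab; in a plain Python script, call the
# synchronous graph_transformer.convert_to_graph_documents(documents) instead.
graph_documents = await graph_transformer.aconvert_to_graph_documents(documents)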
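
Each `GraphDocument` returned by the transformer holds a `nodes` list and a `relationships` list. A node carries `id`, `type` and `properties`, and a relationship carries `source`, `target`, `type` and `properties`, as the printed output in the next cell shows. For a long text it helps to tally the types before reading the raw lists. The sketch below does that with `collections.Counter`. To stay runnable without LangChain it uses small stand-in dataclasses with the same field names; these classes, the `summarize_graph` helper and the toy edges (including the `PLAYS_AT` type) are illustrative assumptions, not part of the library.

```python
from collections import Counter
from dataclasses import dataclass, field


# Stand-ins mirroring the attribute names of LangChain's Node/Relationship.
# summarize_graph only reads .type, so it should also accept the real objects.
@dataclass
class Node:
    id: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: Node
    target: Node
    type: str
    properties: dict = field(default_factory=dict)


def summarize_graph(nodes, relationships):
    """Count nodes per node type and edges per relationship type."""
    node_types = Counter(node.type for node in nodes)
    rel_types = Counter(rel.type for rel in relationships)
    return node_types, rel_types


# Toy data for illustration only.
rm = Node("Real Madrid", "Organization")
madrid = Node("Madrid", "City")
stadium = Node("Santiago Bernabéu", "Stadium")
rels = [
    Relationship(rm, madrid, "LOCATED_IN"),
    Relationship(stadium, madrid, "LOCATED_IN"),
    Relationship(rm, stadium, "PLAYS_AT"),
]
node_types, rel_types = summarize_graph([rm, madrid, stadium], rels)
print(node_types.most_common())
print(rel_types.most_common())
```

On the real output this would be called as `summarize_graph(graph_documents[0].nodes, graph_documents[0].relationships)`.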

In [33]:
print(f"Nodes:{graph_documents[0].nodes}")
print(f"Relationships:{graph_documents[0].relationships}")

Nodes:[Node(id='Real Madrid Club De Fútbol', type='Organization', properties={}), Node(id='Madrid', type='City', properties={}), Node(id='La Liga', type='League', properties={}), Node(id='Madrid Football Club', type='Organization', properties={}), Node(id='King Alfonso Xiii', type='Person', properties={}), Node(id='Santiago Bernabéu', type='Stadium', properties={}), Node(id='Redone', type='Person', properties={}), Node(id='Manuel Jabois', type='Person', properties={}), Node(id='Cies Football Observatory', type='Organization', properties={}), Node(id='Athletic Bilbao', type='Organization', properties={}), Node(id='Barcelona', type='Organization', properties={}), Node(id='Atlético Madrid', type='Organization', properties={}), Node(id='Alfredo Di Stéfano', type='Person', properties={}), Node(id='Ferenc Puskás', type='Person', properties={}), Node(id='Paco Gento', type='Person', properties={}), Node(id='Raymond Kopa', type='Person', properties={}), Node(id='Ronaldo', type='Person', propert
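
Several IDs above differ in case from the source text: "Real Madrid Club de Fútbol" appears as 'Real Madrid Club De Fútbol', "RedOne" as 'Redone', "King Alfonso XIII" as 'King Alfonso Xiii' and "CIES" as 'Cies'. That pattern matches Python's `str.title()`, which suggests the transformer title-cases node IDs (an inference from this output, not something the notebook verifies). The sketch below shows that `str.title()` reproduces these IDs, and adds a hypothetical `restore_casing` helper that finds the ID case-insensitively in the source text and returns the original spelling. It assumes the entity name appears verbatim, apart from case, in the text.

```python
import re


def restore_casing(node_id: str, source_text: str) -> str:
    """Return the first case-insensitive match of node_id in source_text, else node_id unchanged.

    Hypothetical helper: assumes the entity name occurs verbatim (ignoring case) in the text.
    """
    match = re.search(re.escape(node_id), source_text, flags=re.IGNORECASE)
    return match.group(0) if match else node_id


# str.title() lowercases every letter that follows another letter,
# which yields the same IDs as in the output above.
for original in ["RedOne", "King Alfonso XIII", "CIES Football Observatory"]:
    print(original, "->", original.title())

# Two fragments copied from the source text.
sample = "written by RedOne and Manuel Jabois. according to the CIES Football Observatory"
print(restore_casing("Redone", sample))
print(restore_casing("Cies Football Observatory", sample))
```

Applied to the real output: `[restore_casing(node.id, text) for node in graph_documents[0].nodes]`. An ID that never occurs in the text is returned unchanged.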

## Visualization

In [34]:
import os
import webbrowser
from pyvis.network import Network

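def visualize_graph(graph_documents, file_name):
    """Render the first GraphDocument as an interactive pyvis HTML file.

    Only graph_documents[0] is drawn; nodes that take part in no valid
    relationship are omitted from the output.
    """
    # Create a directed network with a dark background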
    net = Network(height="1200px", width="100%", directed=True,
                  notebook=False, bgcolor="#222222", font_color="white")

    nodes = graph_documents[0].nodes
    relationships = graph_documents[0].relationships

    # Build lookup for valid nodes
    node_dict = {node.id: node for node in nodes}

    # Filter out invalid edges and collect valid node IDs
    valid_edges = []
    valid_node_ids = set()
    for rel in relationships:
        if rel.source.id in node_dict and rel.target.id in node_dict:
            valid_edges.append(rel)
            valid_node_ids.update([rel.source.id, rel.target.id])

    # Add only nodes that take part in at least one valid edge
    for node_id in valid_node_ids:
        node = node_dict[node_id]
        try:
            net.add_node(node.id, label=node.id, title=node.type, group=node.type)
        except Exception as e:
            print(f"Error adding node {node.id}: {str(e)}")

    # Add valid edges
    for rel in valid_edges:
        try:
            net.add_edge(rel.source.id, rel.target.id, label=rel.type.lower())
        except Exception as e:
            print(f"Error adding edge from {rel.source.id} to {rel.target.id}: {str(e)}")

    # Configure physics
    net.set_options("""
        {
            "physics": {
                "forceAtlas2Based": {
                    "gravitationalConstant": -100,
                    "centralGravity": 0.01,
                    "springLength": 200,
                    "springConstant": 0.08
                },
                "minVelocity": 0.75,
                "solver": "forceAtlas2Based"
            },
            "nodes": {
                "color": {
                    "highlight": {
                        "background": "rgba(255,0,0,0.5)",
                        "border": "rgba(255,0,0,1)"
                    },
                    "hover": {
                        "background": "rgba(0,255,0,0.5)",
                        "border": "rgba(0,255,0,1)"
                    }
                }
            }
        }
    """)

    net.save_graph(file_name)
    print(f"Graph saved to {os.path.abspath(file_name)}")

    # Try to open in a local browser; on a hosted runtime such as Colab this typically has no visible effect
    try:
        webbrowser.open(f"file://{os.path.abspath(file_name)}")
    except Exception as e:
        print(f"Could not open browser automatically: {str(e)}")


# Example usage:
# visualize_graph(graph_documents, "graph.html")
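
The filtering rule inside `visualize_graph` can be pulled out into a pure function and checked without pyvis or a browser: an edge is kept only if both endpoint IDs are in the node list, and only nodes touched by a kept edge are drawn, so isolated nodes never reach the HTML. A minimal sketch, assuming nodes are given as plain IDs and edges as `(source_id, rel_type, target_id)` tuples; `select_drawable` and the toy data (including the `RIVAL_OF` type) are hypothetical.

```python
def select_drawable(node_ids, edges):
    """Mirror visualize_graph's filtering on plain IDs.

    node_ids: iterable of node IDs returned by the extractor.
    edges: iterable of (source_id, rel_type, target_id) tuples.
    Returns (kept_edges, drawn_node_ids, isolated_node_ids).
    """
    known = set(node_ids)
    # Drop edges that point at an ID the extractor did not return as a node.
    kept_edges = [e for e in edges if e[0] in known and e[2] in known]
    drawn = {e[0] for e in kept_edges} | {e[2] for e in kept_edges}
    # Known nodes that no kept edge touches will not appear in the rendered graph.
    isolated = sorted(known - drawn)
    return kept_edges, drawn, isolated


# Toy data for illustration only.
nodes = ["Real Madrid", "Madrid", "Santiago Bernabéu", "Fifa"]
edges = [
    ("Real Madrid", "LOCATED_IN", "Madrid"),
    ("Santiago Bernabéu", "LOCATED_IN", "Madrid"),
    ("Real Madrid", "RIVAL_OF", "Barcelona"),  # dangling: Barcelona is not in `nodes`
]
kept, drawn, isolated = select_drawable(nodes, edges)
print(len(kept), sorted(drawn), isolated)
```

With real output, pass `[n.id for n in graph_documents[0].nodes]` and `[(r.source.id, r.type, r.target.id) for r in graph_documents[0].relationships]`; printing `isolated` shows which extracted entities will be missing from the rendered graph.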

In [35]:
# Run the function
visualize_graph(graph_documents, "graph_documents.html")

Graph saved to /content/graph_documents.html
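
`visualize_graph` reads only `graph_documents[0]`, which is enough here because a single `Document` was converted. If the text were split into several documents, their graphs would need merging before drawing. A minimal sketch, assuming nodes are identified by `(id, type)` and relationships by `(source id, type, target id)`; `merge_graphs`, the stand-in dataclasses and the toy `RIVAL_OF` edge are hypothetical. The merged result is a `SimpleNamespace` exposing `.nodes` and `.relationships`, the only two attributes `visualize_graph` reads.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class Node:  # stand-in mirroring the fields of LangChain's Node
    id: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:  # stand-in mirroring the fields of LangChain's Relationship
    source: Node
    target: Node
    type: str
    properties: dict = field(default_factory=dict)


def merge_graphs(graphs):
    """Merge (nodes, relationships) pairs, keeping the first copy of each duplicate.

    Nodes are keyed by (id, type); relationships by (source id, type, target id).
    """
    nodes, rels = {}, {}
    for graph_nodes, graph_rels in graphs:
        for node in graph_nodes:
            nodes.setdefault((node.id, node.type), node)
        for rel in graph_rels:
            rels.setdefault((rel.source.id, rel.type, rel.target.id), rel)
    return SimpleNamespace(nodes=list(nodes.values()), relationships=list(rels.values()))


# Toy data: two partial graphs that share a node and one edge.
rm = Node("Real Madrid", "Organization")
madrid = Node("Madrid", "Location")
barca = Node("Barcelona", "Organization")
graph_a = ([rm, madrid], [Relationship(rm, madrid, "LOCATED_IN")])
graph_b = ([rm, barca], [Relationship(rm, madrid, "LOCATED_IN"), Relationship(rm, barca, "RIVAL_OF")])
merged = merge_graphs([graph_a, graph_b])
print([n.id for n in merged.nodes], [r.type for r in merged.relationships])
```

With real output, `merged = merge_graphs((g.nodes, g.relationships) for g in graph_documents)` followed by `visualize_graph([merged], "merged.html")` should draw every document's graph in one file.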


# Extract specific types of nodes

In [36]:
allowed_nodes = ["Organization", "Location"]
graph_transformer_nodes_defined = LLMGraphTransformer(llm=llm, allowed_nodes=allowed_nodes)
graph_documents_nodes_defined = await graph_transformer_nodes_defined.aconvert_to_graph_documents(documents)

In [37]:
print(f"Nodes:{graph_documents_nodes_defined[0].nodes}")
print(f"Relationships:{graph_documents_nodes_defined[0].relationships}")

Nodes:[Node(id='Real Madrid', type='Organization', properties={}), Node(id='Madrid', type='Location', properties={}), Node(id='Santiago Bernabéu', type='Location', properties={}), Node(id='Fifa', type='Organization', properties={}), Node(id='Iffhs', type='Organization', properties={}), Node(id='Athletic Bilbao', type='Organization', properties={}), Node(id='Barcelona', type='Organization', properties={}), Node(id='Atlético Madrid', type='Organization', properties={})]
Relationships:[Relationship(source=Node(id='Real Madrid', type='Organization', properties={}), target=Node(id='Madrid', type='Location', properties={}), type='LOCATED_IN', properties={}), Relationship(source=Node(id='Santiago Bernabéu', type='Location', properties={}), target=Node(id='Madrid', type='Location', properties={}), type='LOCATED_IN', properties={}), Relationship(source=Node(id='Real Madrid', type='Organization', properties={}), target=Node(id='Santiago Bernabéu', type='Location', properties={}), type='HOME_STAD
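
`allowed_nodes` is enforced by the transformer itself during extraction. Trimming an already-extracted graph after the fact is cheaper (no second LLM call) but is not equivalent: in the first, unrestricted run "Santiago Bernabéu" was typed `Stadium` and "Madrid" `City`, so a post-hoc filter to `Organization`/`Location` drops both, whereas the restricted run above relabelled them as `Location`. A minimal sketch of such a post-hoc filter, assuming nodes as `(id, type)` pairs and edges as `(source_id, rel_type, target_id)` tuples; `keep_node_types` is hypothetical and the relationship types in the toy edges are made up.

```python
def keep_node_types(nodes, edges, allowed_types):
    """Drop nodes whose type is not allowed, then drop edges touching a removed node.

    nodes: list of (node_id, node_type); edges: list of (source_id, rel_type, target_id).
    Type comparison is case-insensitive (an assumption, since LLM output may vary in casing).
    """
    allowed = {t.lower() for t in allowed_types}
    kept_nodes = [(nid, ntype) for nid, ntype in nodes if ntype.lower() in allowed]
    kept_ids = {nid for nid, _ in kept_nodes}
    kept_edges = [e for e in edges if e[0] in kept_ids and e[2] in kept_ids]
    return kept_nodes, kept_edges


# Node IDs and types taken from the first, unrestricted run; edges are toy data.
nodes = [
    ("Real Madrid Club De Fútbol", "Organization"),
    ("Madrid", "City"),
    ("Santiago Bernabéu", "Stadium"),
    ("Barcelona", "Organization"),
]
edges = [
    ("Real Madrid Club De Fútbol", "LOCATED_IN", "Madrid"),
    ("Real Madrid Club De Fútbol", "RIVAL_OF", "Barcelona"),
]
kept_nodes, kept_edges = keep_node_types(nodes, edges, ["Organization", "Location"])
print(kept_nodes)
print(kept_edges)
```

The loss of "Madrid" and "Santiago Bernabéu" here is the reason to pass `allowed_nodes` to the transformer rather than filtering afterwards.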

In [38]:
visualize_graph(graph_documents_nodes_defined, "graph_documents_nodes_defined.html")

Graph saved to /content/graph_documents_nodes_defined.html


# Extract specific types of relationships

In [39]:
allowed_nodes = ["Person", "Organization", "Location"]

allowed_relationships = [
    ("Person", "WORKS_AT", "Organization"),
]
graph_transformer_rel_defined = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=allowed_nodes,
    allowed_relationships=allowed_relationships,
)
graph_documents_rel_defined = await graph_transformer_rel_defined.aconvert_to_graph_documents(documents)
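
Each entry in `allowed_relationships` is a `(source_type, relationship_type, target_type)` triple, so this configuration permits only `Person -WORKS_AT-> Organization` edges. The same triples can be reused to audit a graph extracted without the restriction, such as the first run above. A minimal sketch with a hypothetical `schema_violations` helper over plain tuples; the two example edges are toy data, and the schema is named `schema` so the notebook's `allowed_relationships` variable is not overwritten.

```python
def schema_violations(typed_edges, schema):
    """Return edges whose (source_type, rel_type, target_type) triple is not in the schema.

    typed_edges: list of (source_id, source_type, rel_type, target_id, target_type).
    schema: list of (source_type, rel_type, target_type) triples.
    """
    allowed = set(schema)
    return [edge for edge in typed_edges if (edge[1], edge[2], edge[4]) not in allowed]


schema = [("Person", "WORKS_AT", "Organization")]
edges = [
    ("Cristiano Ronaldo", "Person", "WORKS_AT", "Real Madrid", "Organization"),
    ("Real Madrid", "Organization", "LOCATED_IN", "Madrid", "Location"),
]
violations = schema_violations(edges, schema)
print(violations)
```

For the unrestricted graph, build the input with `[(r.source.id, r.source.type, r.type, r.target.id, r.target.type) for r in graph_documents[0].relationships]`.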

In [40]:
graph_documents_rel_defined[0].nodes

[Node(id='Real Madrid', type='Organization', properties={}),
 Node(id='King Alfonso Xiii', type='Person', properties={}),
 Node(id='Redone', type='Person', properties={}),
 Node(id='Manuel Jabois', type='Person', properties={}),
 Node(id='Florentino Pérez', type='Person', properties={}),
 Node(id='Cristiano Ronaldo', type='Person', properties={}),
 Node(id='Zinedine Zidane', type='Person', properties={}),
 Node(id='David Beckham', type='Person', properties={}),
 Node(id='Steve Mcmanaman', type='Person', properties={}),
 Node(id='Vinícius Júnior', type='Person', properties={}),
 Node(id='Rodrygo', type='Person', properties={}),
 Node(id='Jude Bellingham', type='Person', properties={})]

In [41]:
graph_documents_rel_defined[0].relationships

[Relationship(source=Node(id='Redone', type='Person', properties={}), target=Node(id='Real Madrid', type='Organization', properties={}), type='WORKS_AT', properties={}),
 Relationship(source=Node(id='Manuel Jabois', type='Person', properties={}), target=Node(id='Real Madrid', type='Organization', properties={}), type='WORKS_AT', properties={}),
 Relationship(source=Node(id='Cristiano Ronaldo', type='Person', properties={}), target=Node(id='Real Madrid', type='Organization', properties={}), type='WORKS_AT', properties={}),
 Relationship(source=Node(id='Zinedine Zidane', type='Person', properties={}), target=Node(id='Real Madrid', type='Organization', properties={}), type='WORKS_AT', properties={}),
 Relationship(source=Node(id='David Beckham', type='Person', properties={}), target=Node(id='Real Madrid', type='Organization', properties={}), type='WORKS_AT', properties={}),
 Relationship(source=Node(id='Steve Mcmanaman', type='Person', properties={}), target=Node(id='Real Madrid', type='O
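
Because the restricted graph contains only `WORKS_AT` edges, simple questions such as "who works at Real Madrid?" reduce to a scan of the relationship list. Note that the model also linked RedOne and Manuel Jabois, the writers of the club anthem, to the club through `WORKS_AT`, so the edges reflect the model's reading of the text rather than verified employment. A minimal sketch with a hypothetical `sources_of` helper, using the first five relationships visible in the output above as `(source, type, target)` tuples.

```python
def sources_of(edges, rel_type, target):
    """Return the source IDs linked to `target` through `rel_type`, in edge order."""
    return [src for src, rtype, tgt in edges if rtype == rel_type and tgt == target]


# First five relationships from the output above, reduced to (source, type, target).
edges = [
    ("Redone", "WORKS_AT", "Real Madrid"),
    ("Manuel Jabois", "WORKS_AT", "Real Madrid"),
    ("Cristiano Ronaldo", "WORKS_AT", "Real Madrid"),
    ("Zinedine Zidane", "WORKS_AT", "Real Madrid"),
    ("David Beckham", "WORKS_AT", "Real Madrid"),
]
print(sources_of(edges, "WORKS_AT", "Real Madrid"))
```

On the full output the tuples would come from `[(r.source.id, r.type, r.target.id) for r in graph_documents_rel_defined[0].relationships]`.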

In [42]:
# Visualize graph
visualize_graph(graph_documents_rel_defined, "graph_documents_rel_defined.html")

Graph saved to /content/graph_documents_rel_defined.html
