# Knowledge Graph Construction with Personality Modeling

This notebook demonstrates how to:
1. Extract a knowledge graph from unstructured text
2. Enrich the graph with personality traits of subjects
3. Visualize and query the knowledge graph

We use LangChain with Google's Gemini API as the LLM backbone.

## 1. Environment Setup

First, let's install the required dependencies:

In [1]:
!pip install langchain langchain-google-genai langchain-experimental neo4j networkx pyvis python-dotenv

Import necessary libraries and set up environment variables:

In [None]:
import os
import json
import networkx as nx
from dotenv import load_dotenv
from pyvis.network import Network
from IPython.display import display, HTML

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document

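# Load environment variables from a local .env file, if one exists
load_dotenv()

# Read the Google API key from the environment; replace the "xxx" fallback with your own key if you don't use a .env file
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY", "xxx")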
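
Keeping the key out of the notebook source avoids leaking it when the notebook is shared. A minimal sketch of a fail-fast lookup, assuming the key is stored under the environment variable name `GOOGLE_API_KEY` (for example in a `.env` file read by `load_dotenv()`); the helper name `require_env` is hypothetical:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Example usage (assumes the variable was exported or loaded from .env):
# GOOGLE_API_KEY = require_env("GOOGLE_API_KEY")
```

Failing at this point gives a clearer message than an authentication error raised later by the first model call.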

Initialize the Gemini model via LangChain:

In [13]:
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite", google_api_key=GOOGLE_API_KEY)

## 2. Document Preprocessing

Create functions to load and preprocess text documents:

In [4]:
def load_document(file_path):
    """Load document from file"""
    with open(file_path, "r", encoding="utf-8") as file:
        text = file.read()
    return text

def preprocess_text(text):
    """Normalize whitespace: collapse every run of spaces, tabs, and newlines into one space"""
    # Note: this also removes paragraph breaks
    text = " ".join(text.split())
    return text

def chunk_text(text, chunk_size=4000, chunk_overlap=200):
    """Split text into manageable chunks"""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    chunks = text_splitter.split_text(text)
    return chunks
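
To see how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-window sketch in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph breaks and spaces); the function name `sliding_chunks` is hypothetical:

```python
def sliding_chunks(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, so a fact that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.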

## 3. Knowledge Graph Construction

Set up the graph transformer to extract entities and relationships:

In [5]:
def extract_knowledge_graph(text_chunks):
    """Extract knowledge graph from text chunks"""
    # Initialize graph transformer with Gemini LLM
    transformer = LLMGraphTransformer(
        llm=llm,
        node_properties=True,  # Extract node properties
    )

    # Process each chunk and accumulate results
    all_triples = []

    for i, chunk in enumerate(text_chunks):
        print(f"Processing chunk {i+1}/{len(text_chunks)}...")
        doc = Document(page_content=chunk)
        # Convert text to graph triples
        graph_documents = transformer.convert_to_graph_documents([doc])

        # Extract triples
        for graph_doc in graph_documents:
            # Each relationship holds a source node, a target node, and a relationship type
            all_triples.extend([(r.source.id, r.type, r.target.id) for r in graph_doc.relationships])

    return all_triples
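
The loop above reads three fields from each relationship. The sketch below mimics that shape with stand-in dataclasses so the flattening step can be tried without an API call. `FakeNode`, `FakeRelationship`, and `to_triples` are hypothetical names, not LangChain classes, and the `CEO_OF` relationship type is only an illustration of what the model might return:

```python
from dataclasses import dataclass

@dataclass
class FakeNode:
    id: str
    type: str = "Node"

@dataclass
class FakeRelationship:
    source: FakeNode
    target: FakeNode
    type: str

def to_triples(relationships):
    """Flatten relationship objects into (subject, predicate, object) tuples."""
    return [(r.source.id, r.type, r.target.id) for r in relationships]

john = FakeNode("John Smith", "Person")
company = FakeNode("TechInnovate", "Organization")
triples = to_triples([FakeRelationship(john, company, "CEO_OF")])
print(triples)  # [('John Smith', 'CEO_OF', 'TechInnovate')]
```

Note that only the node `id` survives this step; node types and any extracted properties are dropped when the triples are built.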

Create functions to build and store the graph in NetworkX:

In [6]:
def build_networkx_graph(triples):
    """Build NetworkX graph from triples"""
    G = nx.DiGraph()

    for subj, pred, obj in triples:
        # Add nodes if they don't exist
        if not G.has_node(subj):
            G.add_node(subj, label=subj)
        if not G.has_node(obj):
            G.add_node(obj, label=obj)

        # Add edge with relationship as attribute
        G.add_edge(subj, obj, label=pred)

    return G

def store_graph(graph, filename="knowledge_graph.graphml"):
    """Store graph to GraphML file"""
    nx.write_graphml(graph, filename)
    print(f"Graph saved to {filename}")
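
Because `nx.DiGraph` keeps at most one edge per ordered node pair, a later triple with the same subject and object replaces the earlier edge's `label`. This is one likely reason a run can report more triples than edges. A dependency-free sketch of that collapse, using a plain dict in place of the graph (`collapse_triples` is a hypothetical helper, and the triples are made up):

```python
def collapse_triples(triples):
    """Keep the last predicate seen for each (subject, object) pair."""
    edges = {}
    for subj, pred, obj in triples:
        edges[(subj, obj)] = pred  # same pair again: the label is replaced
    return edges

triples = [
    ("A", "works_with", "B"),
    ("A", "reports_to", "B"),   # same direction, same pair: overwrites
    ("B", "works_with", "A"),   # opposite direction: a separate edge
]
edges = collapse_triples(triples)
print(len(triples), "triples ->", len(edges), "edges")  # 3 triples -> 2 edges
```

If every predicate must be preserved, `nx.MultiDiGraph` allows parallel edges and is one option, though the rest of this notebook has not been checked against it.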

## 4. Personality Modeling

Create functions to extract personality traits using Gemini and add them to the graph:

In [17]:
def extract_personality_traits(text, person_name):
    """Extract personality traits for a given person using Gemini"""
    prompt = f"""
    From the following text, analyze the personality of {person_name}.
    Return a JSON object with the following Big Five personality traits and scores between 0 and 1:
    1. Openness (curiosity, creativity, openness to new experiences)
    2. Conscientiousness (organization, responsibility, work ethic)
    3. Extraversion (sociability, assertiveness, talkativeness)
    4. Agreeableness (kindness, cooperation, empathy)
    5. Neuroticism (anxiety, emotional instability, negative emotions)

    Also include a list of 3-5 key personality descriptors (adjectives).

    Format your response as valid JSON like this example:
    {{"openness": 0.8, "conscientiousness": 0.7, "extraversion": 0.6, "agreeableness": 0.5, "neuroticism": 0.3, "descriptors": ["creative", "organized", "friendly"]}}

    TEXT: {text}
    """

    response = llm.invoke(prompt)
    content = response.content

    # Extract JSON from response (handle potential formatting issues)
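    try:
        # Take the span from the first "{" to the last "}" in case the JSON is wrapped in extra text
        start_idx = content.find("{")
        end_idx = content.rfind("}") + 1
        json_str = content[start_idx:end_idx]
        traits = json.loads(json_str)
        if not isinstance(traits, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        print(f"Error parsing JSON for {person_name}. Using default values.")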
        traits = {
            "openness": 0.5,
            "conscientiousness": 0.5,
            "extraversion": 0.5,
            "agreeableness": 0.5,
            "neuroticism": 0.5,
            "descriptors": ["unknown"]
        }

    # Ensure trait values are not None and are within the valid range [0, 1]
    for trait in ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]:
        if trait not in traits or traits[trait] is None:
            traits[trait] = 0.5  # Default to 0.5 if missing or None
        else:
            # Ensure the value is a float and clamp it between 0 and 1
            try:
                traits[trait] = float(traits[trait])
                traits[trait] = max(0.0, min(1.0, traits[trait]))
            except (ValueError, TypeError):
                traits[trait] = 0.5  # Default if conversion fails

    return traits

def identify_person_entities(graph):
    """Identify nodes that likely represent persons"""
    # Use Gemini to identify which nodes are persons
    persons = []
    nodes = list(graph.nodes())

    # Process in batches to avoid token limits
    batch_size = 20
    for i in range(0, len(nodes), batch_size):
        batch = nodes[i:i+batch_size]
        prompt = f"""
        From the following list of entities, identify which ones are likely to be persons (people).
        Return only the entities that represent people as a comma-separated list.

        Entities: {', '.join(batch)}
        """

        response = llm.invoke(prompt)
        # Filter out empty strings and entities not present in the graph
        batch_persons = [p.strip() for p in response.content.split(',') if p.strip() and p.strip() in nodes]
        persons.extend(batch_persons)

    # Remove duplicates
    persons = list(set(persons))
    return persons

def add_personality_to_graph(graph, text, persons=None):
    """Add personality traits to person entities in the graph"""
    if persons is None:
        persons = identify_person_entities(graph)

    print(f"Adding personality traits for {len(persons)} identified persons...")

    for person in persons:
        print(f"Processing personality for: {person}")
        traits = extract_personality_traits(text, person)

        # Add the Big Five scores as node attributes (any other keys in the response are ignored)
        big_five = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]
        for trait in big_five:
            graph.nodes[person][trait] = traits[trait]

        # Add descriptors as a single string (the GraphML writer does not accept list values)
        descriptors = traits.get("descriptors", [])
        if isinstance(descriptors, list):
            graph.nodes[person]["descriptors"] = ", ".join(str(d) for d in descriptors)
        else:
            graph.nodes[person]["descriptors"] = str(descriptors)  # Convert to string if not a list

        # Add a shared node per trait and link each person to it
        for trait in big_five:
            trait_node = trait.capitalize()
            if not graph.has_node(trait_node):
                graph.add_node(trait_node, label=trait_node)
            # The edge label names the trait; the edge weight stores the score
            graph.add_edge(person, trait_node, label=f"has_{trait}", weight=traits[trait])

    return graph
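
The parsing and clamping logic can be exercised without calling the model. The sketch below pulls the outermost `{...}` span out of a reply and clamps each score to [0, 1]. `parse_trait_scores` is a hypothetical helper, and the sample reply is made up for illustration:

```python
import json

TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")

def parse_trait_scores(reply: str, default: float = 0.5) -> dict:
    """Extract trait scores from an LLM reply, falling back to a default."""
    start, end = reply.find("{"), reply.rfind("}")
    data = {}
    if start != -1 and end > start:
        try:
            parsed = json.loads(reply[start:end + 1])
            if isinstance(parsed, dict):
                data = parsed
        except json.JSONDecodeError:
            pass  # keep the empty dict so every trait gets the default
    scores = {}
    for trait in TRAITS:
        try:
            # float(None) raises TypeError, so missing traits get the default
            scores[trait] = max(0.0, min(1.0, float(data.get(trait))))
        except (TypeError, ValueError):
            scores[trait] = default
    return scores

sample = 'Sure! {"openness": 1.4, "neuroticism": "0.2"} Hope that helps.'
scores = parse_trait_scores(sample)
print(scores)
```

Here the out-of-range `openness` is clamped to 1.0, the string `"0.2"` is converted to a float, and the three missing traits fall back to 0.5. A default of 0.5 is indistinguishable from a genuine mid-range score, so it may be worth logging which values were defaulted.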

## 5. Graph Visualization

Create functions to visualize the knowledge graph:

In [8]:
def visualize_graph(graph, output_file="knowledge_graph.html"):
    """Visualize graph using PyVis"""
    # Create PyVis network
    net = Network(notebook=True, width="100%", height="800px", directed=True)

    # Add nodes with labels and attributes
    for node, attrs in graph.nodes(data=True):
        # Prepare node attributes for visualization
        title = f"<b>{node}</b><br>"

        # Add personality traits to title if available
        personality_traits = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]
        for trait in personality_traits:
            if trait in attrs:
                title += f"{trait.capitalize()}: {attrs[trait]:.2f}<br>"

        if "descriptors" in attrs:
            title += f"Descriptors: {attrs['descriptors']}"

        # Determine if node is a person (has personality traits)
        is_person = any(trait in attrs for trait in personality_traits)

        # Choose node color based on entity type
        if is_person:
            color = "#ff6666"  # Red for persons
        elif node in {t.capitalize() for t in personality_traits}:
            color = "#66ff66"  # Green for traits
        else:
            color = "#6666ff"  # Blue for other entities

        # Add node to network
        net.add_node(node, label=attrs.get("label", node), title=title, color=color)

    # Add edges with labels
    for source, target, attrs in graph.edges(data=True):
        edge_label = attrs.get("label", "")
        edge_weight = attrs.get("weight", 1)

        # Use the trait score as label instead of "has_trait"
        if edge_label.startswith("has_"):
            # Use the weight (score) as the label
            edge_label = f"{edge_weight:.2f}"

        # Make personality trait edges thicker based on their value
        width = 1
        if attrs.get("label", "").startswith("has_"):
            width = edge_weight * 5  # Scale up for visibility

        net.add_edge(source, target, label=edge_label, width=width)

    # Disable physics so dragged nodes stay where they are dropped;
    # the solver settings are kept in case physics is re-enabled
    net.set_options("""
    {
      "physics": {
        "enabled": false,
        "stabilization": {
          "iterations": 100,
          "fit": true
        },
        "barnesHut": {
          "gravitationalConstant": -2000,
          "centralGravity": 0.1,
          "springLength": 95,
          "springConstant": 0.04
        },
        "solver": "barnesHut"
      },
      "interaction": {
        "dragNodes": true,
        "navigationButtons": true
      },
      "layout": {
        "improvedLayout": true
      }
    }
    """)

    # Save and display
    net.save_graph(output_file)
    return display(HTML(f"<a href='{output_file}' target='_blank'>Open visualization in new tab</a>"))

## 6. Graph Querying

Create functions to query the knowledge graph:

In [9]:
def query_graph_llm(graph, query_text):
    """Query the knowledge graph using natural language and Gemini"""
    # Collect node information including personality traits
    node_info = []
    for node, attrs in graph.nodes(data=True):
        # Get basic node information
        info = f"Node: {node}"

        # Add personality traits if they exist
        personality_traits = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]
        trait_info = []
        for trait in personality_traits:
            if trait in attrs:
                trait_info.append(f"{trait}={attrs[trait]:.2f}")

        if trait_info:
            info += f" [Traits: {', '.join(trait_info)}]"

        # Add descriptors if they exist
        if "descriptors" in attrs:
            info += f" [Descriptors: {attrs['descriptors']}]"

        node_info.append(info)

    # Collect edge information
    edge_info = []
    for src, tgt, attrs in graph.edges(data=True):
        edge_label = attrs.get('label', 'related_to')
        edge_weight = attrs.get('weight', None)

        if edge_weight is not None:
            edge_info.append(f"({src})-[{edge_label} {edge_weight:.2f}]->({tgt})")
        else:
            edge_info.append(f"({src})-[{edge_label}]->({tgt})")

    # Create prompt with detailed graph information
    prompt = f"""
    You are given a knowledge graph with the following nodes and relationships.
    Answer the question based on this graph information.

    NODES (including personality traits when available):
    {chr(10).join(node_info)}

    RELATIONSHIPS:
    {chr(10).join(edge_info)}

    Question: {query_text}

    When answering questions about personality traits, use the numeric values to make comparisons.
    For example, if asked who is most extraverted, compare the extraversion scores.
    """

    response = llm.invoke(prompt)
    return response.content

def find_entities_by_trait(graph, trait, threshold=0.7):
    """Find entities with a high score for a specific personality trait"""
    results = []
    for node, attrs in graph.nodes(data=True):
        if trait.lower() in attrs and attrs[trait.lower()] >= threshold:
            results.append((node, attrs[trait.lower()]))

    # Sort by trait value (descending)
    results.sort(key=lambda x: x[1], reverse=True)
    return results

## 7. Complete Pipeline

Let's put everything together in a complete pipeline:

In [10]:
def build_knowledge_graph_with_personality(text_or_file, is_file=False):
    """Build a knowledge graph with personality modeling from text"""
    # Load document if it's a file
    if is_file:
        text = load_document(text_or_file)
        # Extract filename without path or extension for output file naming
        base_filename = os.path.splitext(os.path.basename(text_or_file))[0]
        graph_filename = f"{base_filename}_graph.graphml"
        html_filename = f"{base_filename}_graph.html"
    else:
        text = text_or_file
        graph_filename = "knowledge_graph.graphml"
        html_filename = "knowledge_graph.html"

    print("1. Preprocessing text...")
    processed_text = preprocess_text(text)

    print("2. Chunking text...")
    chunks = chunk_text(processed_text)
    print(f"   Created {len(chunks)} chunks")

    print("3. Extracting knowledge graph...")
    triples = extract_knowledge_graph(chunks)
    print(f"   Extracted {len(triples)} triples")

    print("4. Building graph...")
    graph = build_networkx_graph(triples)
    print(f"   Graph has {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")

    print("5. Adding personality traits...")
    graph = add_personality_to_graph(graph, processed_text)

    print("6. Saving and visualizing graph...")
    store_graph(graph, graph_filename)
    visualize_graph(graph, html_filename)

    return graph, processed_text

## 8. Example Usage

Let's test our pipeline. A sample text is defined inline below; the recorded run reads its input from a local file, `test.txt`:

In [20]:
sample_text = """
John Smith is the CEO of TechInnovate. He's known for his visionary thinking and creative problem-solving approach.
John often works closely with Sarah Johnson, the CTO, who is highly analytical and detail-oriented.
Sarah joined the company in 2018 after leaving her role at DataSystems.

Michael Brown, TechInnovate's Head of Marketing, is outgoing and charismatic. He leads a team of 15 people
and reports directly to John. Michael previously worked at MediaCorp for 5 years.

TechInnovate was founded in 2015 and is headquartered in San Francisco. The company specializes in AI solutions
for healthcare and has partnerships with several major hospitals, including Metropolitan Hospital.

Dr. Emily Chen is the Chief Medical Officer at Metropolitan Hospital. She is calm, patient, and meticulous in her work.
Emily has been collaborating with TechInnovate on their latest AI diagnostic tool.
"""

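# Input file for the recorded run below (a local file; its contents are not shown here)
sample_file = 'test.txt'

# Run the pipeline on the file
graph, text = build_knowledge_graph_with_personality(sample_file, is_file=True)

# To run on the inline sample instead:
# graph, text = build_knowledge_graph_with_personality(sample_text)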

1. Preprocessing text...
2. Chunking text...
   Created 1 chunks
3. Extracting knowledge graph...
Processing chunk 1/1...
   Extracted 52 triples
4. Building graph...
   Graph has 44 nodes and 46 edges
5. Adding personality traits...
Adding personality traits for 4 identified persons...
Processing personality for: Albert Einstein
Processing personality for: Franklin D. Roosevelt
Processing personality for: Adolf Hitler
Processing personality for: Satyendra Nath Bose
6. Saving and visualizing graph...
Graph saved to test_graph.graphml


Now let's query the graph. The example questions mention John Smith and Sarah Johnson, so they assume the graph was built from `sample_text`:

In [None]:
# Find people with high openness scores
open_people = find_entities_by_trait(graph, "openness", 0.7)
print("People with high openness scores:")
for person, score in open_people:
    print(f"- {person}: {score:.2f}")

# Find relationships between key people
query_result = query_graph_llm(graph, "What is the relationship between John Smith and Sarah Johnson?")
print("\nRelationship query result:")
print(query_result)

# Find personality insights
query_result = query_graph_llm(graph, "Who is the most extroverted person in the graph and what is their role?")
print("\nPersonality insight query result:")
print(query_result)

## 9. Conclusion

This notebook demonstrates a complete pipeline for:
1. Extracting a knowledge graph from unstructured text using LangChain and Gemini
2. Enriching the graph with personality traits
3. Visualizing and querying the resulting knowledge graph

The approach can be extended with:
- More sophisticated entity resolution
- Integration with external knowledge bases
- Graph-based RAG for enhanced question answering
- Fine-tuning personality modeling for domain-specific applications