# Text Embeddings Example: Generating Text-Based Embeddings for Nodes and Edges

This notebook demonstrates how to:
1. Build a knowledge graph with nodes and edges
2. Generate text embeddings for nodes using their attributes (name, type, description)
3. Generate text embeddings for edges using their relationships (subject-predicate-object)
4. Store embeddings in a VectorStore
5. Query nodes and edges by semantic similarity using text

Node2Vec embeddings capture graph structure. Text embeddings instead capture the meaning of each node's description and each edge's relationship statement, which makes them well suited to finding nodes or edges with similar meanings or purposes.


In [1]:
import os
import json
from datetime import datetime
from pathlib import Path
from dotenv import load_dotenv
from spindle import (
    SpindleExtractor,
    create_ontology,
    GraphStore,
    ChromaVectorStore
)

# Load environment variables
load_dotenv()

# Check whether the optional sentence-transformers package is installed
try:
    from sentence_transformers import SentenceTransformer
    print("✓ sentence-transformers is available")
except ImportError:
    print("⚠ sentence-transformers not available, will use API fallback if configured")
    
print("✓ Required packages loaded")


⚠ sentence-transformers not available, will use API fallback if configured
✓ Required packages loaded


## Part 1: Building a Knowledge Graph

First, we need to build a knowledge graph with nodes and edges. We'll add nodes with descriptions and create relationships between them.


In [2]:
# Create a GraphStore and build a sample knowledge graph
graph_name = f"text_embeddings_example_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
print(f"Creating graph: {graph_name}")

with GraphStore(db_path=graph_name) as store:
    # Add nodes with descriptions
    store.add_node("Alice Johnson", "Person", description="Senior software engineer specializing in Python and machine learning")
    store.add_node("Bob Smith", "Person", description="Data scientist working on analytics and visualization")
    store.add_node("Carol Davis", "Person", description="Product manager focused on AI-powered features")
    store.add_node("David Chen", "Person", description="DevOps engineer managing cloud infrastructure")
    
    store.add_node("TechCorp", "Organization", description="Technology company developing AI solutions and cloud services")
    store.add_node("DataSystems Inc", "Organization", description="Data analytics company specializing in business intelligence")
    store.add_node("CloudTech", "Organization", description="Cloud services provider offering infrastructure and platform services")
    
    # Add edges to create relationships
    store.add_edge("Alice Johnson", "works_at", "TechCorp")
    store.add_edge("Bob Smith", "works_at", "TechCorp")
    store.add_edge("Carol Davis", "works_at", "TechCorp")
    store.add_edge("David Chen", "works_at", "CloudTech")
    
    store.add_edge("Bob Smith", "works_at", "DataSystems Inc")
    
    # Add some relationships between people
    store.add_edge("Alice Johnson", "knows", "Bob Smith")
    store.add_edge("Alice Johnson", "knows", "Carol Davis")
    store.add_edge("Bob Smith", "knows", "Carol Davis")
    
    print("✓ Graph created with nodes and edges")
    
    # Get statistics
    stats = store.get_statistics()
    print("\nGraph Statistics:")
    print(f"  Nodes: {stats['node_count']}")
    print(f"  Edges: {stats['edge_count']}")


Creating graph: text_embeddings_example_20251105_222818
✓ Graph created with nodes and edges

Graph Statistics:
  Nodes: 7
  Edges: 8


## Part 2: Generating Text Embeddings for Nodes

Now we'll create text embeddings for each node. We'll combine the node's name, type, and description into a text representation that captures its semantic meaning.
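
As a point of comparison, here is a minimal sketch of one way to phrase a node as text, assuming only the three fields described above (name, type, description). The helper name `describe_node` and the "Description:" label are illustrative choices, not part of spindle and not what this notebook's own cell does.

```python
def describe_node(name: str, node_type: str = "", description: str = "") -> str:
    """Build a short text for a node, one sentence per available fact."""
    sentences = []
    if node_type:
        # Approximate the a/an choice from the first letter of the type.
        article = "an" if node_type[0].lower() in "aeiou" else "a"
        sentences.append(f"{name} is {article} {node_type}.")
    else:
        sentences.append(f"{name}.")
    if description:
        # Hypothetical label; any consistent phrasing would do.
        sentences.append(f"Description: {description.rstrip('.')}.")
    return " ".join(sentences)


print(describe_node(
    "TechCorp",
    "Organization",
    "Technology company developing AI solutions and cloud services",
))
```

Keeping each fact in its own sentence avoids fragments when a field is missing. Whichever phrasing is chosen, it should stay the same for every node so that distances remain comparable.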


In [3]:
# Create VectorStore for storing text embeddings
vector_store = ChromaVectorStore(collection_name="text_embeddings")

def create_node_text(node: dict) -> str:
    """
    Create a text representation of a node for embedding generation.
    Combines name, type, and description into a semantic text string.
    """
    name = node.get('name', '')
    node_type = node.get('type', '')
    description = node.get('description', '')
    
    # Build text representation
    parts = [f"{name} is a {node_type}"]
    if description:
        parts.append(f"who/which {description}")
    
    return ". ".join(parts)

# Generate embeddings for all nodes
print("Generating text embeddings for nodes...")
print("-" * 70)

node_embeddings = {}
with GraphStore(db_path=graph_name) as store:
    # Get all nodes
    all_nodes_query = """
    MATCH (e:Entity)
    RETURN e.name, e.type, e.description
    """
    
    nodes_data = store.query_cypher(all_nodes_query)
    
    for node_row in nodes_data:
        node_name = node_row['e.name']
        node_info = {
            'name': node_name,
            'type': node_row.get('e.type', ''),
            'description': node_row.get('e.description', '')
        }
        
        # Create text representation
        node_text = create_node_text(node_info)
        
        # Generate embedding and store in VectorStore
        vector_index = vector_store.add(
            text=node_text,
            metadata={
                "type": "node",
                "entity_type": node_info['type'],
                "name": node_name,
                "embedding_method": "text",
                "text_representation": node_text
            }
        )
        
        node_embeddings[node_name] = vector_index
        
        print(f"✓ {node_name}: {node_text[:80]}...")
        print(f"  Vector Index: {vector_index[:30]}...")
        print()

print(f"\n✓ Generated embeddings for {len(node_embeddings)} nodes")


Generating text embeddings for nodes...
----------------------------------------------------------------------
✓ ALICE JOHNSON: ALICE JOHNSON is a Person. who/which Senior software engineer specializing in Py...
  Vector Index: e4c9a1df-7509-4184-bca0-3caf3c...

✓ BOB SMITH: BOB SMITH is a Person. who/which Data scientist working on analytics and visuali...
  Vector Index: 4e324cef-9a3e-4cba-9614-b6dacb...

✓ CAROL DAVIS: CAROL DAVIS is a Person. who/which Product manager focused on AI-powered feature...
  Vector Index: 588268c3-9a05-40ad-a7ca-d4125e...

✓ DAVID CHEN: DAVID CHEN is a Person. who/which DevOps engineer managing cloud infrastructure...
  Vector Index: 85ee65f8-d5c7-4653-a7a0-098462...

✓ TECHCORP: TECHCORP is a Organization. who/which Technology company developing AI solutions...
  Vector Index: 5b9b8ee1-4b67-4eb0-b68b-14088a...

✓ DATASYSTEMS INC: DATASYSTEMS INC is a Organization. who/which Data analytics company specializing...
  Vector Index: 969d3f30-2805-44da-8be8-8

## Part 3: Generating Text Embeddings for Edges

Now we'll create text embeddings for edges. We'll format each edge as a natural language statement (e.g., "Alice Johnson works at TechCorp") to capture the semantic meaning of the relationship.
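
The formatting step can be sketched on its own, without a graph store. This is a minimal illustration, assuming predicates are snake_case identifiers such as `works_at`. The names `humanize_predicate` and `describe_edge` are hypothetical and do not come from spindle.

```python
def humanize_predicate(predicate: str) -> str:
    """Turn an identifier like 'WORKS_AT' into the phrase 'works at'."""
    return predicate.strip().lower().replace("_", " ")


def describe_edge(subject: str, predicate: str, obj: str,
                  subject_desc: str = "", object_desc: str = "") -> str:
    """Format an edge as a sentence, optionally followed by node context."""
    sentences = [f"{subject} {humanize_predicate(predicate)} {obj}."]
    # Each piece of context becomes its own sentence so that the
    # statements do not run together in the embedded text.
    if subject_desc:
        sentences.append(f"{subject}: {subject_desc.rstrip('.')}.")
    if object_desc:
        sentences.append(f"{obj}: {object_desc.rstrip('.')}.")
    return " ".join(sentences)


print(describe_edge("Alice Johnson", "works_at", "TechCorp"))
print(describe_edge(
    "Bob Smith", "works_at", "DataSystems Inc",
    subject_desc="Data scientist working on analytics and visualization",
))
```

Adding node descriptions makes an edge's text more specific, but it also pulls the edge embedding toward the nodes' own embeddings. Whether that helps depends on the kind of query being served.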


In [4]:
from typing import Optional

def create_edge_text(edge: dict, subject_node: Optional[dict] = None, object_node: Optional[dict] = None) -> str:
    """
    Create a text representation of an edge for embedding generation.
    Formats as a natural language statement about the relationship.
    """
    subject = edge.get('subject', '')
    predicate = edge.get('predicate', '').lower().replace('_', ' ')
    obj = edge.get('object', '')
    
    # Format predicate for natural language
    # "works_at" -> "works at"
    # "knows" -> "knows"
    predicate_text = predicate
    
    # Build natural language statement
    # "Alice Johnson works at TechCorp"
    edge_text = f"{subject} {predicate_text} {obj}"
    
    # Optionally add context from node descriptions
    if subject_node and object_node:
        subject_desc = subject_node.get('description', '')
        object_desc = object_node.get('description', '')
        
        # Add context if available
        context_parts = []
        if subject_desc:
            context_parts.append(f"{subject} is {subject_desc}")
        if object_desc:
            context_parts.append(f"{obj} is {object_desc}")
        
        if context_parts:
            edge_text = f"{edge_text}. {' '.join(context_parts)}"
    
    return edge_text

# Generate embeddings for all edges
print("Generating text embeddings for edges...")
print("-" * 70)

edge_embeddings = {}
with GraphStore(db_path=graph_name) as store:
    # Get all edges
    all_edges = store.edges()
    
    for edge in all_edges:
        subject_name = edge['subject']
        object_name = edge['object']
        
        # Get node information for context
        subject_node = store.get_node(subject_name)
        object_node = store.get_node(object_name)
        
        # Create text representation
        edge_text = create_edge_text(edge, subject_node, object_node)
        
        # Generate embedding and store in VectorStore
        vector_index = vector_store.add(
            text=edge_text,
            metadata={
                "type": "edge",
                "subject": subject_name,
                "predicate": edge['predicate'],
                "object": object_name,
                "embedding_method": "text",
                "text_representation": edge_text
            }
        )
        
        edge_key = f"{subject_name}::{edge['predicate']}::{object_name}"
        edge_embeddings[edge_key] = vector_index
        
        print(f"✓ {edge['subject']} --[{edge['predicate']}]--> {edge['object']}")
        print(f"  Text: {edge_text[:100]}...")
        print(f"  Vector Index: {vector_index[:30]}...")
        print()

print(f"\n✓ Generated embeddings for {len(edge_embeddings)} edges")


Generating text embeddings for edges...
----------------------------------------------------------------------
✓ ALICE JOHNSON --[WORKS_AT]--> TECHCORP
  Text: ALICE JOHNSON works at TECHCORP. ALICE JOHNSON is Senior software engineer specializing in Python an...
  Vector Index: 2bddd8be-e4c3-4d3c-83a7-3a0df5...

✓ ALICE JOHNSON --[KNOWS]--> BOB SMITH
  Text: ALICE JOHNSON knows BOB SMITH. ALICE JOHNSON is Senior software engineer specializing in Python and ...
  Vector Index: 438ae0a3-56a1-4d4a-923b-27f91e...

✓ ALICE JOHNSON --[KNOWS]--> CAROL DAVIS
  Text: ALICE JOHNSON knows CAROL DAVIS. ALICE JOHNSON is Senior software engineer specializing in Python an...
  Vector Index: 08d7f824-2dda-48a0-917b-a9d2b6...

✓ BOB SMITH --[WORKS_AT]--> TECHCORP
  Text: BOB SMITH works at TECHCORP. BOB SMITH is Data scientist working on analytics and visualization TECH...
  Vector Index: d4d1dc51-ea3f-41df-95ac-82b98a...

✓ BOB SMITH --[WORKS_AT]--> DATASYSTEMS INC
  Text: BOB SMITH works at DATASYST

## Part 4: Querying Nodes by Text Similarity

We can now query nodes using natural language queries. The text embeddings will find nodes with similar semantic meaning.
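
To make "lower distance means more similar" concrete, the sketch below ranks a few toy vectors by cosine distance. The vectors are made up for illustration, and cosine distance is an assumption: this notebook does not state which metric `ChromaVectorStore` uses.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query_vec, items, top_k=5):
    """Sort (name, distance) pairs from closest to farthest, keeping top_k."""
    scored = [(name, cosine_distance(query_vec, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1])[:top_k]


# Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_vectors = {
    "engineer": [0.9, 0.1, 0.0],
    "analyst": [0.2, 0.9, 0.1],
    "cloud provider": [0.1, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

for name, dist in rank_by_distance(query, toy_vectors, top_k=2):
    print(f"{name}: {dist:.4f}")
```

A vector store does the same kind of ranking, typically with an approximate index instead of a full scan.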


In [5]:
# Query 1: Find nodes related to "software engineering"
print("Query 1: Finding nodes related to 'software engineering'")
print("-" * 70)

results = vector_store.query(
    text="software engineering and development",
    top_k=5,
    metadata_filter={"type": "node"}
)

print(f"Found {len(results)} similar node(s):\n")
for i, result in enumerate(results, 1):
    print(f"{i}. {result['metadata'].get('name', 'Unknown')} ({result['metadata'].get('entity_type', 'Unknown')})")
    print(f"   Text: {result['text'][:80]}...")
    print(f"   Similarity distance: {result['distance']:.4f} (lower = more similar)")
    print()

with GraphStore(db_path=graph_name) as store:
    # Show the actual node details
    if results:
        top_result = results[0]
        node_name = top_result['metadata'].get('name')
        if node_name:
            node = store.get_node(node_name)
            if node:
                print("Top result details:")
                print(f"  Name: {node.get('name')}")
                print(f"  Type: {node.get('type')}")
                print(f"  Description: {node.get('description', 'N/A')}")
                print()


Query 1: Finding nodes related to 'software engineering'
----------------------------------------------------------------------
Found 5 similar node(s):

1. DATASYSTEMS INC (Organization)
   Text: DATASYSTEMS INC is a Organization. who/which Data analytics company specializing...
   Similarity distance: 0.6473 (lower = more similar)

2. ALICE JOHNSON (Person)
   Text: ALICE JOHNSON is a Person. who/which Senior software engineer specializing in Py...
   Similarity distance: 0.6617 (lower = more similar)

3. DAVID CHEN (Person)
   Text: DAVID CHEN is a Person. who/which DevOps engineer managing cloud infrastructure...
   Similarity distance: 0.6983 (lower = more similar)

4. CAROL DAVIS (Person)
   Text: CAROL DAVIS is a Person. who/which Product manager focused on AI-powered feature...
   Similarity distance: 0.6990 (lower = more similar)

5. TECHCORP (Organization)
   Text: TECHCORP is a Organization. who/which Technology company developing AI solutions...
   Similarity distance: 0.70

In [6]:
# Query 2: Find nodes related to "data analytics"
print("Query 2: Finding nodes related to 'data analytics'")
print("-" * 70)

results = vector_store.query(
    text="data analytics and business intelligence",
    top_k=5,
    metadata_filter={"type": "node"}
)

print(f"Found {len(results)} similar node(s):\n")
for i, result in enumerate(results, 1):
    print(f"{i}. {result['metadata'].get('name', 'Unknown')} ({result['metadata'].get('entity_type', 'Unknown')})")
    print(f"   Text: {result['text'][:80]}...")
    print(f"   Similarity distance: {result['distance']:.4f}")
    print()


Query 2: Finding nodes related to 'data analytics'
----------------------------------------------------------------------
Found 5 similar node(s):

1. DATASYSTEMS INC (Organization)
   Text: DATASYSTEMS INC is a Organization. who/which Data analytics company specializing...
   Similarity distance: 0.3845

2. BOB SMITH (Person)
   Text: BOB SMITH is a Person. who/which Data scientist working on analytics and visuali...
   Similarity distance: 0.5584

3. CAROL DAVIS (Person)
   Text: CAROL DAVIS is a Person. who/which Product manager focused on AI-powered feature...
   Similarity distance: 0.6285

4. TECHCORP (Organization)
   Text: TECHCORP is a Organization. who/which Technology company developing AI solutions...
   Similarity distance: 0.6984

5. DAVID CHEN (Person)
   Text: DAVID CHEN is a Person. who/which DevOps engineer managing cloud infrastructure...
   Similarity distance: 0.7447



In [8]:
# Query 3: Find organizations
print("Query 3: Finding organizations")
print("-" * 70)

# ChromaDB requires the $and operator to combine multiple metadata conditions
results = vector_store.query(
    text="technology companies and organizations",
    top_k=5,
    metadata_filter={"$and": [{"type": "node"}, {"entity_type": "Organization"}]}
)

print(f"Found {len(results)} organization(s):\n")
for i, result in enumerate(results, 1):
    print(f"{i}. {result['metadata'].get('name', 'Unknown')}")
    print(f"   Text: {result['text'][:80]}...")
    print(f"   Similarity distance: {result['distance']:.4f}")
    print()


Query 3: Finding organizations
----------------------------------------------------------------------
Found 3 organization(s):

1. TECHCORP
   Text: TECHCORP is a Organization. who/which Technology company developing AI solutions...
   Similarity distance: 0.4403

2. CLOUDTECH
   Text: CLOUDTECH is a Organization. who/which Cloud services provider offering infrastr...
   Similarity distance: 0.5039

3. DATASYSTEMS INC
   Text: DATASYSTEMS INC is a Organization. who/which Data analytics company specializing...
   Similarity distance: 0.5643
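
The `$and` wrapper used in Query 3 can be generated rather than written by hand. Here is a minimal sketch, assuming equality-only conditions. `build_where` is a hypothetical helper, not a spindle or ChromaDB function.

```python
def build_where(conditions: dict):
    """Build a Chroma-style metadata filter from equality conditions.

    One condition is returned as-is; several are wrapped in "$and".
    An empty dict yields None, meaning "no filter".
    """
    clauses = [{key: value} for key, value in conditions.items()]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


print(build_where({"type": "node"}))
print(build_where({"type": "node", "entity_type": "Organization"}))
```

The result could then be passed as `metadata_filter`. Whether `ChromaVectorStore.query` accepts `None` for that argument is not confirmed by this notebook.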



## Part 5: Querying Edges by Text Similarity

We can also query edges using natural language to find relationships with similar semantic meaning.


In [9]:
# Query 1: Find employment relationships
print("Query 1: Finding employment relationships")
print("-" * 70)

results = vector_store.query(
    text="person works at company employment",
    top_k=5,
    metadata_filter={"type": "edge"}
)

print(f"Found {len(results)} similar edge(s):\n")
for i, result in enumerate(results, 1):
    metadata = result['metadata']
    print(f"{i}. {metadata.get('subject')} --[{metadata.get('predicate')}]--> {metadata.get('object')}")
    print(f"   Text: {result['text'][:100]}...")
    print(f"   Similarity distance: {result['distance']:.4f}")
    print()


Query 1: Finding employment relationships
----------------------------------------------------------------------
Found 5 similar edge(s):

1. BOB SMITH --[WORKS_AT]--> TECHCORP
   Text: BOB SMITH works at TECHCORP. BOB SMITH is Data scientist working on analytics and visualization TECH...
   Similarity distance: 0.6023

2. CAROL DAVIS --[WORKS_AT]--> TECHCORP
   Text: CAROL DAVIS works at TECHCORP. CAROL DAVIS is Product manager focused on AI-powered features TECHCOR...
   Similarity distance: 0.6274

3. DAVID CHEN --[WORKS_AT]--> CLOUDTECH
   Text: DAVID CHEN works at CLOUDTECH. DAVID CHEN is DevOps engineer managing cloud infrastructure CLOUDTECH...
   Similarity distance: 0.6300

4. ALICE JOHNSON --[WORKS_AT]--> TECHCORP
   Text: ALICE JOHNSON works at TECHCORP. ALICE JOHNSON is Senior software engineer specializing in Python an...
   Similarity distance: 0.6350

5. BOB SMITH --[WORKS_AT]--> DATASYSTEMS INC
   Text: BOB SMITH works at DATASYSTEMS INC. BOB SMITH is Data scientist wor

In [10]:
# Query 2: Find relationships involving TechCorp
print("Query 2: Finding relationships involving TechCorp")
print("-" * 70)

results = vector_store.query(
    text="TechCorp relationships and connections",
    top_k=5,
    metadata_filter={"type": "edge"}
)

print(f"Found {len(results)} similar edge(s):\n")
for i, result in enumerate(results, 1):
    metadata = result['metadata']
    print(f"{i}. {metadata.get('subject')} --[{metadata.get('predicate')}]--> {metadata.get('object')}")
    print(f"   Text: {result['text'][:100]}...")
    print(f"   Similarity distance: {result['distance']:.4f}")
    print()


Query 2: Finding relationships involving TechCorp
----------------------------------------------------------------------
Found 5 similar edge(s):

1. BOB SMITH --[WORKS_AT]--> TECHCORP
   Text: BOB SMITH works at TECHCORP. BOB SMITH is Data scientist working on analytics and visualization TECH...
   Similarity distance: 0.4865

2. ALICE JOHNSON --[WORKS_AT]--> TECHCORP
   Text: ALICE JOHNSON works at TECHCORP. ALICE JOHNSON is Senior software engineer specializing in Python an...
   Similarity distance: 0.5295

3. CAROL DAVIS --[WORKS_AT]--> TECHCORP
   Text: CAROL DAVIS works at TECHCORP. CAROL DAVIS is Product manager focused on AI-powered features TECHCOR...
   Similarity distance: 0.5394

4. DAVID CHEN --[WORKS_AT]--> CLOUDTECH
   Text: DAVID CHEN works at CLOUDTECH. DAVID CHEN is DevOps engineer managing cloud infrastructure CLOUDTECH...
   Similarity distance: 0.5906

5. BOB SMITH --[WORKS_AT]--> DATASYSTEMS INC
   Text: BOB SMITH works at DATASYSTEMS INC. BOB SMITH is Data scien

In [11]:
# Query 3: Find social connections
print("Query 3: Finding social connections and relationships between people")
print("-" * 70)

results = vector_store.query(
    text="people know each other social connections",
    top_k=5,
    metadata_filter={"type": "edge"}
)

print(f"Found {len(results)} similar edge(s):\n")
for i, result in enumerate(results, 1):
    metadata = result['metadata']
    print(f"{i}. {metadata.get('subject')} --[{metadata.get('predicate')}]--> {metadata.get('object')}")
    print(f"   Text: {result['text'][:100]}...")
    print(f"   Similarity distance: {result['distance']:.4f}")
    print()


Query 3: Finding social connections and relationships between people
----------------------------------------------------------------------
Found 5 similar edge(s):

1. ALICE JOHNSON --[KNOWS]--> BOB SMITH
   Text: ALICE JOHNSON knows BOB SMITH. ALICE JOHNSON is Senior software engineer specializing in Python and ...
   Similarity distance: 0.7650

2. BOB SMITH --[KNOWS]--> CAROL DAVIS
   Text: BOB SMITH knows CAROL DAVIS. BOB SMITH is Data scientist working on analytics and visualization CARO...
   Similarity distance: 0.7803

3. ALICE JOHNSON --[KNOWS]--> CAROL DAVIS
   Text: ALICE JOHNSON knows CAROL DAVIS. ALICE JOHNSON is Senior software engineer specializing in Python an...
   Similarity distance: 0.7975

4. BOB SMITH --[WORKS_AT]--> TECHCORP
   Text: BOB SMITH works at TECHCORP. BOB SMITH is Data scientist working on analytics and visualization TECH...
   Similarity distance: 0.8101

5. BOB SMITH --[WORKS_AT]--> DATASYSTEMS INC
   Text: BOB SMITH works at DATASYSTEMS INC. BOB SM

## Part 6: Comparing Text Embeddings with Node2Vec Embeddings

Text embeddings and Node2Vec embeddings serve different purposes:

- **Text Embeddings**: Capture semantic meaning from descriptions and relationships. Good for finding nodes/edges with similar meanings or purposes based on their textual content.
- **Node2Vec Embeddings**: Capture structural patterns in the graph. Good for finding nodes with similar connection patterns regardless of their text attributes.
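
To give a sense of where the structural signal comes from, here is a simplified sketch of the random-walk step that Node2Vec builds on. It uses uniform walks, which correspond to Node2Vec with p = q = 1; the real algorithm biases the walks with those two parameters and then trains a skip-gram model on them. The function name and the small edge list (an undirected subset of this notebook's graph) are chosen for illustration only.

```python
import random


def random_walks(adjacency, num_walks=10, walk_length=5, seed=0):
    """Generate uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Undirected view of a few edges from the example graph.
edges = [
    ("Alice Johnson", "TechCorp"),
    ("Bob Smith", "TechCorp"),
    ("Carol Davis", "TechCorp"),
    ("Alice Johnson", "Bob Smith"),
]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, []).append(b)
    adjacency.setdefault(b, []).append(a)

walks = random_walks(adjacency, num_walks=2, walk_length=4)
print(walks[0])
```

Nodes that appear in similar walk contexts end up with similar vectors. The walks never look at names or descriptions, only at connections.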

Let's see how they differ by comparing results.


In [12]:
# Compare: Text-based vs Structure-based similarity
print("Comparing Text Embeddings vs Node2Vec Embeddings")
print("=" * 70)

# Check if node2vec is available
try:
    import networkx as nx
    from node2vec import Node2Vec
    NODE2VEC_AVAILABLE = True
except ImportError:
    NODE2VEC_AVAILABLE = False
    print("⚠ Node2Vec not available (need: pip install node2vec networkx)")
    print("  Skipping Node2Vec comparison\n")

if NODE2VEC_AVAILABLE:
    # Create a separate vector store for Node2Vec embeddings
    node2vec_store = ChromaVectorStore(collection_name="node2vec_embeddings")
    
    with GraphStore(db_path=graph_name) as store:
        # Compute Node2Vec embeddings
        print("Computing Node2Vec embeddings...")
        node2vec_embeddings = store.compute_graph_embeddings(
            node2vec_store,
            dimensions=128,
            num_walks=10,
            walk_length=80
        )
        print(f"✓ Computed Node2Vec embeddings for {len(node2vec_embeddings)} nodes\n")
        
        # Query using text embeddings
        print("Text Embedding Query: 'software engineer'")
        print("-" * 70)
        text_results = vector_store.query(
            text="software engineer",
            top_k=3,
            metadata_filter={"type": "node"}
        )
        for i, result in enumerate(text_results, 1):
            print(f"{i}. {result['metadata'].get('name')} (distance: {result['distance']:.4f})")
        print()
        
        # Query using Node2Vec embeddings (structure-based)
        print("Node2Vec Query: Find nodes structurally similar to 'Alice Johnson'")
        print("-" * 70)
        alice = store.get_node("Alice Johnson")
        if alice and alice.get('vector_index'):
            # Get the embedding from node2vec store
            alice_emb = node2vec_store.get(alice['vector_index'])
            if alice_emb:
                # Query by embedding vector
                node2vec_results = node2vec_store.collection.query(
                    query_embeddings=[alice_emb['embedding']],
                    n_results=4,
                    where={"type": "node"},
                    include=["documents", "metadatas", "distances"]
                )
                
                if node2vec_results["ids"]:
                    for i in range(min(3, len(node2vec_results["ids"][0]))):
                        node_name = node2vec_results["metadatas"][0][i].get("name", "")
                        if node_name.upper() != "ALICE JOHNSON":
                            distance = node2vec_results["distances"][0][i]
                            print(f"{i+1}. {node_name} (distance: {distance:.4f})")
        
        print("\nNote: Text embeddings find semantically similar nodes,")
        print("      while Node2Vec finds structurally similar nodes.")
        
        # Cleanup
        node2vec_store.close()


Comparing Text Embeddings vs Node2Vec Embeddings
Computing Node2Vec embeddings...


Computing transition probabilities:   0%|          | 0/7 [00:00<?, ?it/s]

Generating walks (CPU: 1): 100%|██████████| 10/10 [00:00<00:00, 299.06it/s]


✓ Computed Node2Vec embeddings for 7 nodes

Text Embedding Query: 'software engineer'
----------------------------------------------------------------------
1. ALICE JOHNSON (distance: 0.5895)
2. DAVID CHEN (distance: 0.6736)
3. CAROL DAVIS (distance: 0.7007)

Node2Vec Query: Find nodes structurally similar to 'Alice Johnson'
----------------------------------------------------------------------
2. TECHCORP (distance: 0.0007)
3. BOB SMITH (distance: 0.0019)

Note: Text embeddings find semantically similar nodes,
      while Node2Vec finds structurally similar nodes.
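
The Node2Vec list above starts at "2." because the loop skips the query node but still prints the raw result index `i+1`. It also stops after the first three raw results, so only two neighbours are shown. A small sketch of filtering first and numbering afterwards follows. `rank_neighbors` is a hypothetical helper, and the first entry with distance 0.0 is assumed for illustration. The other two distances are taken from the output above.

```python
def rank_neighbors(names, distances, exclude, top_k=3):
    """Drop the query node, then number the remaining results from 1."""
    kept = [
        (name, dist)
        for name, dist in zip(names, distances)
        if name.upper() != exclude.upper()
    ]
    return [(rank, name, dist) for rank, (name, dist) in enumerate(kept[:top_k], start=1)]


# Raw results as a vector query might return them, query node included.
names = ["ALICE JOHNSON", "TECHCORP", "BOB SMITH"]
distances = [0.0, 0.0007, 0.0019]  # 0.0 for the query node is an assumption

for rank, name, dist in rank_neighbors(names, distances, exclude="Alice Johnson"):
    print(f"{rank}. {name} (distance: {dist:.4f})")
```

Requesting `top_k + 1` results and filtering afterwards keeps the number of neighbours shown at `top_k` even when the query node comes back as its own nearest match.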


## Part 7: Inspecting Embeddings

Let's inspect the embeddings we've created to understand their structure and properties.
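
Besides the dimension and the first few values, the vector's length (L2 norm) is a quick property to check. This is a minimal sketch with a made-up vector. `summarize_embedding` is a hypothetical helper, not part of spindle.

```python
import math


def summarize_embedding(vector, head=10):
    """Return the dimension, L2 norm, and first few rounded values."""
    return {
        "dimension": len(vector),
        "l2_norm": math.sqrt(sum(x * x for x in vector)),
        "head": [round(x, 4) for x in vector[:head]],
    }


toy_vector = [0.6, -0.8, 0.0]  # made-up values, not a real embedding
print(summarize_embedding(toy_vector))
```

If every stored vector has a norm close to 1, cosine and Euclidean distance produce the same ranking, which is worth knowing when comparing distance values across stores.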


In [13]:
# Inspect a node embedding
print("Inspecting Node Embedding:")
print("-" * 70)

with GraphStore(db_path=graph_name) as store:
    alice = store.get_node("Alice Johnson")
    if alice:
        # Get the embedding from our vector store
        if "ALICE JOHNSON" in node_embeddings:
            vector_index = node_embeddings["ALICE JOHNSON"]
            embedding_data = vector_store.get(vector_index)
            
            if embedding_data:
                print(f"Node: {alice['name']}")
                print(f"Type: {alice['type']}")
                print(f"Description: {alice.get('description', 'N/A')}")
                print(f"\nText Representation: {embedding_data['text']}")
                print(f"Vector Index: {vector_index[:30]}...")
                print(f"Embedding Dimension: {len(embedding_data['embedding'])}")
                print(f"Metadata: {json.dumps(embedding_data['metadata'], indent=2)}")
                print(f"\nFirst 10 embedding values: {[round(x, 4) for x in embedding_data['embedding'][:10]]}")
                print()

# Inspect an edge embedding
print("Inspecting Edge Embedding:")
print("-" * 70)

with GraphStore(db_path=graph_name) as store:
    # Get a sample edge
    edges = store.query_by_pattern(predicate="works_at")
    if edges:
        edge = edges[0]
        edge_key = f"{edge['subject']}::{edge['predicate']}::{edge['object']}"
        
        if edge_key in edge_embeddings:
            vector_index = edge_embeddings[edge_key]
            embedding_data = vector_store.get(vector_index)
            
            if embedding_data:
                print(f"Edge: {edge['subject']} --[{edge['predicate']}]--> {edge['object']}")
                print(f"\nText Representation: {embedding_data['text']}")
                print(f"Vector Index: {vector_index[:30]}...")
                print(f"Embedding Dimension: {len(embedding_data['embedding'])}")
                print(f"Metadata: {json.dumps(embedding_data['metadata'], indent=2)}")
                print(f"\nFirst 10 embedding values: {[round(x, 4) for x in embedding_data['embedding'][:10]]}")
                print()


Inspecting Node Embedding:
----------------------------------------------------------------------
Node: ALICE JOHNSON
Type: Person
Description: Senior software engineer specializing in Python and machine learning

Text Representation: ALICE JOHNSON is a Person. who/which Senior software engineer specializing in Python and machine learning
Vector Index: e4c9a1df-7509-4184-bca0-3caf3c...
Embedding Dimension: 1536
Metadata: {
  "text_representation": "ALICE JOHNSON is a Person. who/which Senior software engineer specializing in Python and machine learning",
  "embedding_method": "text",
  "type": "node",
  "entity_type": "Person",
  "name": "ALICE JOHNSON"
}

First 10 embedding values: [0.0143, -0.0303, 0.0426, 0.0294, 0.0279, -0.0067, 0.026, 0.0455, 0.0052, -0.0119]

Inspecting Edge Embedding:
----------------------------------------------------------------------
Edge: ALICE JOHNSON --[WORKS_AT]--> TECHCORP

Text Representation: ALICE JOHNSON works at TECHCORP. ALICE JOHNSON is Senior so

## Part 8: Summary

This notebook demonstrated:
1. **Building a knowledge graph** with nodes and edges
2. **Generating text embeddings for nodes** by combining name, type, and description
3. **Generating text embeddings for edges** by formatting relationships as natural language
4. **Querying by semantic similarity** using natural language queries
5. **Comparing text embeddings** with structure-based embeddings (Node2Vec)

### Key Takeaways

- **Text embeddings** are ideal for semantic similarity search based on descriptions and content
- **Edge embeddings** capture the meaning of relationships, not just the structure
- **Natural language queries** work well with text embeddings (e.g., "software engineer", "employment relationships")
- **Text embeddings** complement structure-based embeddings (Node2Vec); combining them covers both meaning and connectivity

### Use Cases

- **Semantic search**: Find nodes/edges by meaning ("find data scientists", "find employment relationships")
- **Content-based recommendations**: Suggest similar entities based on descriptions
- **Knowledge discovery**: Discover relationships with similar semantic meaning
- **Hybrid search**: Combine text embeddings with Node2Vec for comprehensive similarity analysis
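
The hybrid search idea can be sketched as a weighted combination of the two distance scores. The outputs in this notebook show text distances around 0.5 to 0.8 and Node2Vec distances around 0.001, so the sketch rescales each set to the range 0 to 1 before mixing. The function names, the weight, and the distance values are all made up for illustration.

```python
def min_max(scores):
    """Rescale a {name: distance} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def hybrid_rank(text_dist, struct_dist, text_weight=0.5):
    """Combine normalized distances; a lower combined score ranks first."""
    text_n, struct_n = min_max(text_dist), min_max(struct_dist)
    shared = text_n.keys() & struct_n.keys()  # only nodes scored by both
    combined = {
        name: text_weight * text_n[name] + (1 - text_weight) * struct_n[name]
        for name in shared
    }
    return sorted(combined.items(), key=lambda pair: pair[1])


# Illustrative distances to some query node (not taken from a real run).
text_dist = {"BOB SMITH": 0.60, "CAROL DAVIS": 0.70, "DAVID CHEN": 0.65}
struct_dist = {"BOB SMITH": 0.002, "CAROL DAVIS": 0.001, "DAVID CHEN": 0.010}

for name, score in hybrid_rank(text_dist, struct_dist):
    print(f"{name}: {score:.3f}")
```

Setting `text_weight` to 1.0 or 0.0 recovers a pure text or pure structure ranking, which makes it easy to see how much each signal contributes.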


In [14]:
# Cleanup: close the vector store and delete the example graph
vector_store.close()

with GraphStore(db_path=graph_name) as store:
    store.delete_graph()
    print("Graph deleted")


Graph deleted
