In [13]:

!pip install -q langchain langchain-groq langchain-community neo4j sentence-transformers python-dotenv


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m26.0[0m[39;49m -> [0m[32;49m26.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


# üõçÔ∏è GraphRAG for E-Commerce ‚Äî Knowledge Graph + RAG Pipeline

> **Stack:** Neo4j ¬∑ Groq (LLaMA-3) ¬∑ HuggingFace Embeddings ¬∑ LangChain

## What This Project Does
This notebook builds a **GraphRAG system** for an e-commerce product catalog. It combines:
- **Neo4j** as a graph database to store products, categories, brands, and their relationships
- **HuggingFace Embeddings** for semantic vector search
- **Groq (LLaMA-3)** to generate Cypher queries from natural language *and* to answer questions
- **Graph Traversal** to enrich RAG context with related nodes (brand ‚Üí products, category ‚Üí subcategories)

## Architecture
```
User Question
     ‚îÇ
     ‚îú‚îÄ‚îÄ‚ñ∫ Groq: Generate Cypher Query ‚îÄ‚îÄ‚ñ∫ Neo4j Graph Traversal ‚îÄ‚îÄ‚îê
     ‚îÇ                                                             ‚îú‚îÄ‚îÄ‚ñ∫ Merged Context ‚îÄ‚îÄ‚ñ∫ Groq: Final Answer
     ‚îî‚îÄ‚îÄ‚ñ∫ HuggingFace: Vector Search  ‚îÄ‚îÄ‚ñ∫ Semantic Results ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
```

## Setup Requirements
```
pip install langchain langchain-groq langchain-community neo4j sentence-transformers python-dotenv
```
You'll need:
- `GROQ_API_KEY` ‚Üí https://console.groq.com
- `NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD` ‚Üí https://neo4j.com/cloud/platform/aura-graph-database (free tier)

In [14]:
import os
import json
from typing import Any

from dotenv import load_dotenv
from neo4j import GraphDatabase
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Neo4jVector

load_dotenv()

# ‚îÄ‚îÄ Credentials (set these in a .env file or paste directly for testing) ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
GROQ_API_KEY    = groq_api_key
NEO4J_URI       = NEO4J_URI
NEO4J_USERNAME  = NEO4J_USERNAME
NEO4J_PASSWORD  = NEO4J_PASSWORD

os.environ["GROQ_API_KEY"] = GROQ_API_KEY
print("‚úÖ Config loaded")

‚úÖ Config loaded


## üì¶ Step 1 ‚Äî Build the E-Commerce Knowledge Graph

We model the domain with the following node types and relationships:

```
(Brand)-[:MAKES]->(Product)-[:BELONGS_TO]->(Category)
(Product)-[:SIMILAR_TO]->(Product)
(Product)-[:HAS_TAG]->(Tag)
(Category)-[:SUBCATEGORY_OF]->(Category)
```

In [15]:
# ‚îÄ‚îÄ Cell 3: Sample e-commerce dataset ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
PRODUCTS = [
    # Electronics
    {"id": "P001", "name": "UltraBook Pro 15",       "brand": "TechCore",   "category": "Laptops",       "parent_category": "Electronics", "price": 1299.99, "rating": 4.7, "stock": 45,  "tags": ["thin", "powerful", "business"],          "description": "A slim, high-performance laptop with 32GB RAM and 1TB SSD, perfect for professionals."},
    {"id": "P002", "name": "GamerX 17 RTX",          "brand": "TechCore",   "category": "Laptops",       "parent_category": "Electronics", "price": 1899.99, "rating": 4.8, "stock": 20,  "tags": ["gaming", "RTX4070", "144Hz"],            "description": "High-refresh-rate gaming laptop with RTX 4070 GPU and RGB keyboard."},
    {"id": "P003", "name": "SoundWave ANC 300",      "brand": "AudioMax",   "category": "Headphones",    "parent_category": "Electronics", "price": 249.99,  "rating": 4.5, "stock": 200, "tags": ["noise-cancelling", "wireless", "30hr"],    "description": "Premium over-ear headphones with 30-hour battery and active noise cancellation."},
    {"id": "P004", "name": "ClearBuds X2",           "brand": "AudioMax",   "category": "Headphones",    "parent_category": "Electronics", "price": 89.99,   "rating": 4.3, "stock": 350, "tags": ["earbuds", "wireless", "sport"],           "description": "Lightweight wireless earbuds with IPX5 water resistance, ideal for workouts."},
    {"id": "P005", "name": "PixelCam 4K Pro",        "brand": "VisionTech", "category": "Cameras",       "parent_category": "Electronics", "price": 799.99,  "rating": 4.6, "stock": 60,  "tags": ["4K", "mirrorless", "photography"],        "description": "Compact mirrorless camera with 4K video and 24MP sensor for creators."},
    # Home & Living
    {"id": "P006", "name": "ErgoChair Elite",        "brand": "ComfortPlus","category": "Chairs",        "parent_category": "Furniture",   "price": 499.99,  "rating": 4.9, "stock": 30,  "tags": ["ergonomic", "lumbar", "office"],          "description": "Award-winning ergonomic office chair with adjustable lumbar support and 4D armrests."},
    {"id": "P007", "name": "StandDesk Pro",          "brand": "ComfortPlus","category": "Desks",         "parent_category": "Furniture",   "price": 649.99,  "rating": 4.7, "stock": 15,  "tags": ["standing", "electric", "height-adjust"],  "description": "Electric height-adjustable standing desk with memory presets and cable management."},
    {"id": "P008", "name": "BrewMaster Pro 12",      "brand": "KitchenAce", "category": "Coffee Makers", "parent_category": "Appliances",  "price": 199.99,  "rating": 4.6, "stock": 80,  "tags": ["espresso", "programmable", "12-cup"],     "description": "Programmable 12-cup coffee maker with built-in grinder and espresso mode."},
    # Sports
    {"id": "P009", "name": "TrailBlazer Shoes X5",   "brand": "RunFast",    "category": "Running Shoes", "parent_category": "Sports",      "price": 129.99,  "rating": 4.4, "stock": 120, "tags": ["trail", "waterproof", "grip"],            "description": "Waterproof trail running shoes with enhanced grip and foam cushioning."},
    {"id": "P010", "name": "FitTrack Smart Watch",   "brand": "TechCore",   "category": "Wearables",     "parent_category": "Electronics", "price": 299.99,  "rating": 4.5, "stock": 90,  "tags": ["GPS", "heart-rate", "7-day-battery"],    "description": "GPS smartwatch with 7-day battery, health tracking, and 50m water resistance."},
]

# Explicit similarity edges (product pairs that are similar)
SIMILAR_PAIRS = [
    ("P001", "P002"),  # Both TechCore laptops
    ("P003", "P004"),  # Both AudioMax headphones
    ("P006", "P007"),  # Both ComfortPlus furniture
    ("P001", "P010"),  # Laptop + smartwatch ‚Äî productivity combo
    ("P002", "P010"),  # Gaming laptop + smartwatch
    ("P003", "P010"),  # Headphones + wearable ‚Äî audio/tech
]

print(f"‚úÖ Dataset ready: {len(PRODUCTS)} products, {len(SIMILAR_PAIRS)} similarity edges")

‚úÖ Dataset ready: 10 products, 6 similarity edges


In [16]:
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

def run_query(query: str, params: dict = {}) -> list:
    with driver.session() as session:
        result = session.run(query, params)
        return [dict(r) for r in result]

# Verify connection
info = run_query("RETURN 'Connected to Neo4j!' AS msg")
print(info[0]["msg"])

Connected to Neo4j!


In [17]:
# ‚îÄ‚îÄ Cell 5: Clear existing data & build the knowledge graph ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
def build_knowledge_graph(products: list, similar_pairs: list):
    print("üóëÔ∏è  Clearing existing data...")
    run_query("MATCH (n) DETACH DELETE n")

    print("üî® Creating constraints & indexes...")
    run_query("CREATE CONSTRAINT IF NOT EXISTS FOR (p:Product) REQUIRE p.id IS UNIQUE")
    run_query("CREATE CONSTRAINT IF NOT EXISTS FOR (b:Brand) REQUIRE b.name IS UNIQUE")
    run_query("CREATE CONSTRAINT IF NOT EXISTS FOR (c:Category) REQUIRE c.name IS UNIQUE")

    print("üì¶ Inserting products, brands, categories, tags...")
    for p in products:
        # Create Product node
        run_query("""
            MERGE (prod:Product {id: $id})
            SET prod.name        = $name,
                prod.price       = $price,
                prod.rating      = $rating,
                prod.stock       = $stock,
                prod.description = $description
        """, {k: p[k] for k in ["id","name","price","rating","stock","description"]})

        # Brand ‚Üí MAKES ‚Üí Product
        run_query("""
            MERGE (b:Brand {name: $brand})
            WITH b
            MATCH (prod:Product {id: $id})
            MERGE (b)-[:MAKES]->(prod)
        """, {"brand": p["brand"], "id": p["id"]})

        # Category hierarchy: ParentCategory ‚Üí PARENT_OF ‚Üí Category
        run_query("""
            MERGE (pc:Category {name: $parent_category})
            MERGE (c:Category  {name: $category})
            MERGE (pc)-[:PARENT_OF]->(c)
            WITH c
            MATCH (prod:Product {id: $id})
            MERGE (prod)-[:BELONGS_TO]->(c)
        """, {"parent_category": p["parent_category"], "category": p["category"], "id": p["id"]})

        # Tags
        for tag in p["tags"]:
            run_query("""
                MERGE (t:Tag {name: $tag})
                WITH t
                MATCH (prod:Product {id: $id})
                MERGE (prod)-[:HAS_TAG]->(t)
            """, {"tag": tag, "id": p["id"]})

    print("üîó Creating SIMILAR_TO relationships...")
    for p1, p2 in similar_pairs:
        run_query("""
            MATCH (a:Product {id: $p1}), (b:Product {id: $p2})
            MERGE (a)-[:SIMILAR_TO]->(b)
            MERGE (b)-[:SIMILAR_TO]->(a)
        """, {"p1": p1, "p2": p2})

    # Verify
    counts = run_query("""
        MATCH (n) RETURN labels(n)[0] AS label, count(n) AS count
        ORDER BY count DESC
    """)
    print("\nüìä Graph Summary:")
    for row in counts:
        print(f"   {row['label']:15s} ‚Üí {row['count']} nodes")

    rels = run_query("MATCH ()-[r]->() RETURN type(r) AS rel, count(r) AS count ORDER BY count DESC")
    print()
    for row in rels:
        print(f"   {row['rel']:20s} ‚Üí {row['count']} edges")

build_knowledge_graph(PRODUCTS, SIMILAR_PAIRS)

üóëÔ∏è  Clearing existing data...
üî® Creating constraints & indexes...
üì¶ Inserting products, brands, categories, tags...
üîó Creating SIMILAR_TO relationships...

üìä Graph Summary:
   Tag             ‚Üí 29 nodes
   Category        ‚Üí 12 nodes
   Product         ‚Üí 10 nodes
   Brand           ‚Üí 6 nodes

   HAS_TAG              ‚Üí 30 edges
   SIMILAR_TO           ‚Üí 12 edges
   BELONGS_TO           ‚Üí 10 edges
   MAKES                ‚Üí 10 edges
   PARENT_OF            ‚Üí 8 edges


In [18]:
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

# Quick sanity check
test_vec = embeddings.embed_query("gaming laptop with RTX")
print(f"‚úÖ Embedding model ready ‚Äî vector dim: {len(test_vec)}")

  embeddings = HuggingFaceEmbeddings(
Loading weights: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 103/103 [00:00<00:00, 1625.53it/s, Materializing param=pooler.dense.weight]                             
[1mBertModel LOAD REPORT[0m from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

[3mNotes:
- UNEXPECTED[3m	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.[0m


‚úÖ Embedding model ready ‚Äî vector dim: 384


In [19]:
# ‚îÄ‚îÄ Cell 7: Build Neo4j vector index from product descriptions ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
# Convert products to LangChain Documents
docs = [
    Document(
        page_content=p["description"],
        metadata={
            "id":       p["id"],
            "name":     p["name"],
            "brand":    p["brand"],
            "category": p["category"],
            "price":    p["price"],
            "rating":   p["rating"],
        }
    )
    for p in PRODUCTS
]

# Store embeddings inside Neo4j on the Product nodes


In [20]:
vector_store = Neo4jVector.from_existing_graph(
    embedding=embeddings,
    url=NEO4J_URI,
    username=NEO4J_USERNAME,
    password=NEO4J_PASSWORD,
    index_name="product_embeddings",
    node_label="Product",
    text_node_properties=["description"],  
    embedding_node_property="embedding",
)


## ü§ñ Step 3 ‚Äî Groq LLM: Natural Language ‚Üí Cypher Query Generation

We use `llama-3.1-8b-instant` on Groq to **translate user questions into Cypher queries**. This is the key differentiator of a Graph-RAG system ‚Äî structured graph traversal guided by natural language.

In [21]:
# to convert user query into cypher query 

# ‚îÄ‚îÄ Cell 8: Initialise Groq LLM ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0,          # Deterministic for Cypher generation
    api_key=GROQ_API_KEY,
)

print("‚úÖ Groq LLM ready ‚Äî model: llama-3.1-8b-instant")

‚úÖ Groq LLM ready ‚Äî model: llama-3.1-8b-instant


In [22]:
# ‚îÄ‚îÄ Cell 9: Cypher generation prompt & chain ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
CYPHER_GENERATION_PROMPT = """
You are an expert Neo4j Cypher query generator for an e-commerce product database.

## Graph Schema:
Nodes:
  (:Product  {{id, name, price, rating, stock, description}})
  (:Brand    {{name}})
  (:Category {{name}})
  (:Tag      {{name}})

Relationships:
  (:Brand)-[:MAKES]->(:Product)
  (:Product)-[:BELONGS_TO]->(:Category)
  (:Category)-[:PARENT_OF]->(:Category)          -- parent_category -> sub-category
  (:Product)-[:SIMILAR_TO]->(:Product)
  (:Product)-[:HAS_TAG]->(:Tag)

## Rules:
- ALWAYS return meaningful product fields: p.name, p.price, p.rating, p.description
- Use case-insensitive matching with toLower() for string filters
- For brand queries: match via (:Brand)-[:MAKES]->(:Product)
- For category queries: match via (:Product)-[:BELONGS_TO]->(:Category)
- For "similar products": use [:SIMILAR_TO] traversal
- For tag queries: use [:HAS_TAG]->(:Tag {{name: $tag}})
- Limit results to 5 unless asked for more
- Return ONLY the raw Cypher query. No explanation. No markdown. No backticks.

## Examples:
Q: Show me all products from TechCore
A: MATCH (b:Brand {{name: 'TechCore'}})-[:MAKES]->(p:Product) RETURN p.name, p.price, p.rating, p.description LIMIT 5

Q: What are the cheapest laptops under $500?
A: MATCH (p:Product)-[:BELONGS_TO]->(c:Category {{name: 'Laptops'}}) WHERE p.price < 500 RETURN p.name, p.price, p.rating, p.description ORDER BY p.price ASC LIMIT 5

Q: Find products similar to the GamerX 17 RTX
A: MATCH (p:Product {{name: 'GamerX 17 RTX'}})-[:SIMILAR_TO]->(s:Product) RETURN s.name, s.price, s.rating, s.description LIMIT 5

Q: What electronics do you have?
A: MATCH (parent:Category {{name: 'Electronics'}})-[:PARENT_OF]->(c:Category)<-[:BELONGS_TO]-(p:Product) RETURN p.name, p.price, p.rating, c.name AS category, p.description LIMIT 5

## User Question:
{question}

Cypher:"""

cypher_prompt = PromptTemplate(
    input_variables=["question"],
    template=CYPHER_GENERATION_PROMPT,
)

def generate_cypher(question: str) -> str:
    """Use Groq to turn a natural language question into a Cypher query."""
    chain = cypher_prompt | llm
    response = chain.invoke({"question": question})
    return response.content.strip()

# Test it
test_q = "Show me all products from TechCore"
cypher = generate_cypher(test_q)
print(f"Question : {test_q}")
print(f"Generated: {cypher}")

Question : Show me all products from TechCore
Generated: MATCH (b:Brand {name: toLower('TechCore')})-[:MAKES]->(p:Product) RETURN p.name, p.price, p.rating, p.description LIMIT 5


## üï∏Ô∏è Step 4 ‚Äî Graph Traversal Context Enrichment

After finding relevant products, we **traverse the graph** to pull in richer context:
- What brand makes this product?
- What category does it belong to?
- What other products are similar?
- What tags does it carry?

This enriched context is passed to the LLM for a much better answer than raw vector search alone.

In [23]:
# ‚îÄ‚îÄ Cell 10: Graph traversal enrichment ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
ENRICHMENT_QUERY = """
MATCH (p:Product {id: $product_id})

// Brand that makes this product
OPTIONAL MATCH (b:Brand)-[:MAKES]->(p)

// Category the product belongs to
OPTIONAL MATCH (p)-[:BELONGS_TO]->(c:Category)

// Parent category
OPTIONAL MATCH (pc:Category)-[:PARENT_OF]->(c)

// Similar products (1-hop traversal)
OPTIONAL MATCH (p)-[:SIMILAR_TO]->(sim:Product)

// Tags
OPTIONAL MATCH (p)-[:HAS_TAG]->(t:Tag)

RETURN
    p.name          AS product_name,
    p.price         AS price,
    p.rating        AS rating,
    p.stock         AS stock,
    p.description   AS description,
    b.name          AS brand,
    c.name          AS category,
    pc.name         AS parent_category,
    collect(DISTINCT sim.name)  AS similar_products,
    collect(DISTINCT t.name)    AS tags
"""

def enrich_product_context(product_id: str) -> dict:
    """Traverse the graph around a product to build rich context."""
    results = run_query(ENRICHMENT_QUERY, {"product_id": product_id})
    return results[0] if results else {}

def format_enriched_context(ctx: dict) -> str:
    """Format enriched graph context into a readable string for the LLM."""
    if not ctx:
        return "No additional context found."
    return (
        f"Product: {ctx['product_name']}\n"
        f"Brand: {ctx['brand']} | Category: {ctx['category']} ({ctx['parent_category']})\n"
        f"Price: ${ctx['price']:.2f} | Rating: {ctx['rating']}/5.0 | Stock: {ctx['stock']} units\n"
        f"Description: {ctx['description']}\n"
        f"Tags: {', '.join(ctx['tags']) if ctx['tags'] else 'none'}\n"
        f"Similar Products: {', '.join(ctx['similar_products']) if ctx['similar_products'] else 'none'}"
    )

# Test enrichment
ctx = enrich_product_context("P002")
print("üîç Enriched context for GamerX 17 RTX:")
print("‚îÄ" * 60)
print(format_enriched_context(ctx))

üîç Enriched context for GamerX 17 RTX:
‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
Product: GamerX 17 RTX
Brand: TechCore | Category: Laptops (Electronics)
Price: $1899.99 | Rating: 4.8/5.0 | Stock: 20 units
Description: High-refresh-rate gaming laptop with RTX 4070 GPU and RGB keyboard.
Tags: gaming, RTX4070, 144Hz
Similar Products: UltraBook Pro 15, FitTrack Smart Watch


## üöÄ Step 5 ‚Äî The GraphRAG Pipeline

Now we wire everything together into the full **GraphRAG retrieval loop**:

1. **Vector Search** ‚Üí semantic product matches from HuggingFace embeddings
2. **Cypher Generation** ‚Üí Groq translates question to graph query
3. **Graph Execution** ‚Üí run Cypher on Neo4j
4. **Context Enrichment** ‚Üí traverse graph for each result
5. **LLM Answer Generation** ‚Üí Groq answers using merged context

In [None]:
# ‚îÄ‚îÄ Cell 11: Full GraphRAG pipeline ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
ANSWER_PROMPT = PromptTemplate(
    input_variables=["question", "vector_context", "graph_context"],
    template="""
You are a concise e-commerce assistant.

## User Question:
{question}

## Semantic Search Results:
{vector_context}

## Graph Results:
{graph_context}

## Instructions:
- Be brief and direct ‚Äî 3 to 5 lines max
- List products with name, price, and rating only
- No explanations, no filler sentences, no alternatives unless asked
- If no relevant info, say so in one line

Answer:""")

def graphrag_query(question: str, top_k: int = 3, verbose: bool = True) -> str:
    """
    Full GraphRAG pipeline:
    1. Vector search (HuggingFace embeddings in Neo4j)
    2. Cypher generation (Groq) + graph execution
    3. Graph traversal enrichment
    4. LLM answer synthesis (Groq)
    """
    if verbose:
        print(f"\n{'‚ïê'*65}")
        print(f"‚ùì Question: {question}")
        print(f"{'‚ïê'*65}")

    # ‚îÄ‚îÄ STEP 1: Vector Search ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
    if verbose: print("\nüî¢ [1/4] Running vector similarity search...")
    vec_results = vector_store.similarity_search(question, k=top_k)
    vector_context_parts = []
    for doc in vec_results:
        m = doc.metadata
        vector_context_parts.append(
            f"- {m['name']} (${m['price']}, ‚≠ê{m['rating']}) ‚Äî {doc.page_content}"
        )
    vector_context = "\n".join(vector_context_parts) or "No semantic matches found."
    if verbose: print(f"   Found {len(vec_results)} semantic matches")

    # ‚îÄ‚îÄ STEP 2: Cypher Generation ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
    if verbose: print("\nü§ñ [2/4] Generating Cypher query with Groq...")
    cypher = generate_cypher(question)
    if verbose: print(f"   Generated: {cypher}")

    # ‚îÄ‚îÄ STEP 3: Execute Cypher + Enrich via Graph Traversal ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
    
    graph_context_parts = []
    enriched_ids = set()

    def resolve_ids_from_name(name: str):
        """Look up product ID in Neo4j by name."""
        rows = run_query(
            "MATCH (p:Product {name: $name}) RETURN p.id AS id",
            {"name": name}
        )
        return [r["id"] for r in rows if r.get("id")]

    try:
        cypher_results = run_query(cypher)

        # Extract product IDs from Cypher results via name lookup
        for row in cypher_results:
            if verbose: print(f"   Graph row: {row}")
            name_key = next((k for k in row if "name" in k.lower()), None)
            if name_key and row[name_key]:
                for pid in resolve_ids_from_name(row[name_key]):
                    enriched_ids.add(pid)

    except Exception as e:
        if verbose: print(f"   ‚ö†Ô∏è Cypher execution issue: {e}")

    # Always enrich top vector-search results (via name lookup, not metadata["id"])
    for doc in vec_results:
        for pid in resolve_ids_from_name(doc.metadata["name"]):
            enriched_ids.add(pid)

    # Traverse graph for each unique product ID
    for pid in list(enriched_ids)[:top_k + 2]:
        ctx = enrich_product_context(pid)
        if ctx:
            graph_context_parts.append(format_enriched_context(ctx))

    graph_context = ("\n" + "‚îÄ"*50 + "\n").join(graph_context_parts) or "No graph results found."
    if verbose: print(f"\n   Enriched {len(graph_context_parts)} products via graph traversal")

    # ‚îÄ‚îÄ STEP 4: LLM Answer Synthesis ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
    if verbose: print("\nüí¨ [4/4] Synthesizing answer with Groq...")
    answer_chain = ANSWER_PROMPT | llm
    response = answer_chain.invoke({
        "question": question,
        "vector_context": vector_context,
        "graph_context": graph_context,
    })

    answer = response.content.strip()
    if verbose:
        print(f"\n{'‚îÄ'*65}")
        print("‚úÖ FINAL ANSWER:")
        print(f"{'‚îÄ'*65}")
        print(answer)
    return answer

print("‚úÖ GraphRAG pipeline ready!")

‚úÖ GraphRAG pipeline ready!


In [44]:
answer1 = graphrag_query("What products does TechCore make and are they any good?")


‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê
‚ùì Question: What products does TechCore make and are they any good?
‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê

üî¢ [1/4] Running vector similarity search...
   Found 3 semantic matches

ü§ñ [2/4] Generating Cypher query with Groq...
   Generated: MATCH (b:Brand {name: toLower('TechCore')})-[:MAKES]->(p:Product) RETURN p.name, p.price, p.rating, p.description LIMIT 5

üï∏Ô∏è  [3/4] Running Cypher + enriching via graph traversal...


[#C7F8]  _: <CONNECTION> error: Failed to read from defunct connection IPv4Address(('p-65b05b5b-2b72-0001.production-orch-0894.neo4j.io', 7687)) (ResolvedIPv4Address(('35.200.158.141', 7687))): OSError('No data')


   ‚ö†Ô∏è Cypher execution issue: Failed to read from defunct connection IPv4Address(('p-65b05b5b-2b72-0001.production-orch-0894.neo4j.io', 7687)) (ResolvedIPv4Address(('35.200.158.141', 7687)))

   Enriched 3 products via graph traversal

üí¨ [4/4] Synthesizing answer with Groq...

‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
‚úÖ FINAL ANSWER:
‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
TechCore makes:

1. BrewMaster Pro 12 ($199.99, ‚≠ê4.6)
2. FitTrack Smart Watch ($299.99, ‚≠ê4.5)
3. UltraBook Pro 15 ($1299.99, ‚≠ê4.7)

No relevant information found on other products.


In [46]:
answer2 = graphrag_query("I'm looking for affordable smart watch under $1000, what do you have?")


‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê
‚ùì Question: I'm looking for affordable smart watch under $1000, what do you have?
‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê

üî¢ [1/4] Running vector similarity search...
   Found 3 semantic matches

ü§ñ [2/4] Generating Cypher query with Groq...
   Generated: MATCH (p:Product)-[:BELONGS_TO]->(c:Category {name: toLower('Smart Watch')}) WHERE p.price < 1000 RETURN p.name, p.price, p.rating, p.description LIMIT 5

üï∏Ô∏è  [3/4] Running Cypher + enriching via graph traversal...

   Enriched 3 products via graph traversal

üí¨ [4/4] Synthesizing answer with Groq...

‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î

In [47]:
answer3=graphrag_query("Can you recommend some good headphones for working out?")


‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê
‚ùì Question: Can you recommend some good headphones for working out?
‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê

üî¢ [1/4] Running vector similarity search...
   Found 3 semantic matches

ü§ñ [2/4] Generating Cypher query with Groq...
   Generated: MATCH (p:Product)-[:BELONGS_TO]->(c:Category {name: 'Headphones'})-[:PARENT_OF]->(sub:Category)<-[:BELONGS_TO]-(s:Product)-[:HAS_TAG]->(t:Tag {name: 'Workout'}) WHERE toLower(s.description) CONTAINS 'sweat' OR toLower(s.description) CONTAINS 'water' RETURN s.name, s.price, s.rating, s.description LIMIT 5

üï∏Ô∏è  [3/4] Running Cypher + enriching via graph traversal...

   Enriched 3 products vi

In [48]:
answer3=graphrag_query(" suggest Lightweight wireless earbuds ")


‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê
‚ùì Question:  suggest Lightweight wireless earbuds 
‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê

üî¢ [1/4] Running vector similarity search...
   Found 3 semantic matches

ü§ñ [2/4] Generating Cypher query with Groq...
   Generated: MATCH (p:Product)-[:BELONGS_TO]->(c:Category {name: 'Earbuds'})-[:PARENT_OF]->(sub:Category {name: 'Wireless'})-[:PARENT_OF]->(sub2:Category {name: 'Lightweight'})<-[:BELONGS_TO]-(p2:Product) WHERE toLower(p2.description) CONTAINS 'lightweight' AND toLower(p2.description) CONTAINS 'wireless' RETURN p2.name, p2.price, p2.rating, p2.description LIMIT 5

üï∏Ô∏è  [3/4] Running Cypher + enriching via graph traversal

In [49]:
answer4= graphrag_query("What electronics do you have?")


‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê
‚ùì Question: What electronics do you have?
‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê

üî¢ [1/4] Running vector similarity search...
   Found 3 semantic matches

ü§ñ [2/4] Generating Cypher query with Groq...
   Generated: MATCH (parent:Category {name: toLower('electronics')})-[:PARENT_OF]->(c:Category)<-[:BELONGS_TO]-(p:Product) RETURN p.name, p.price, p.rating, c.name AS category, p.description LIMIT 5

üï∏Ô∏è  [3/4] Running Cypher + enriching via graph traversal...

   Enriched 3 products via graph traversal

üí¨ [4/4] Synthesizing answer with Groq...

‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î

In [50]:
answer5=graphrag_query("what is the costliest product you have?")


‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê
‚ùì Question: what is the costliest product you have?
‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê

üî¢ [1/4] Running vector similarity search...
   Found 3 semantic matches

ü§ñ [2/4] Generating Cypher query with Groq...
   Generated: MATCH (p:Product) RETURN p.name, p.price, p.rating, p.description ORDER BY p.price DESC LIMIT 1

üï∏Ô∏è  [3/4] Running Cypher + enriching via graph traversal...
   Graph row: {'p.name': 'GamerX 17 RTX', 'p.price': 1899.99, 'p.rating': 4.8, 'p.description': 'High-refresh-rate gaming laptop with RTX 4070 GPU and RGB keyboard.'}

   Enriched 4 products via graph traversal

üí¨ [4/4] Synthesizing answer with Groq