# RAG Failure #13: The Structural Influence Blindness

## The Problem
RAG retrieves text based on **Semantic Relevance**. It excels at finding *facts* ("Who is the manager?") but fails at finding **structural patterns** ("Who is the bottleneck?"). 

Even if you feed an LLM **every single email log** (100% retrieval), it struggles to map the aggregate flow of information. It leans heavily on the explicit titles (VP, Director) found in org charts and overlooks the "Hidden Influencer" who actually bridges the teams.

## The Scenario: Project Omega Communication Breakdown
**Query:** "Who is the single point of failure (bottleneck) for information flow in Project Omega?"

**The Data (Org Chart vs. Reality):**
1.  **Doc 1 (Org Chart):** "**Alice** is the designated Project Manager. She holds the highest authority."
2.  **Doc 2 (Org Chart):** "**Bob** is the VP of Engineering. He approves budget."
3.  **Docs 3-8 (Email Logs):** 
    -   The Dev Team (Dave, Eve) emails **Carol** for everything.
    -   The QA Team (Frank) emails **Carol** for bugs.
    -   The Management (Alice, Bob) emails **Carol** for status updates.
    -   *Crucially:* Alice never emails Dave/Eve directly. Bob never emails Frank directly.

**Naive RAG Failure (Even with Full Context):**
The LLM reads Doc 1 ("Alice is PM") and Doc 2 ("Bob is VP"). It also reads the emails, but it treats them as noise instead of aggregating them into a communication pattern.
Answer: *"Alice is the bottleneck because she is the Project Manager responsible for the project."* (A semantic hallucination based on the job description.)

**KG Solution:** We build a **Communication Graph** from the email logs and compute **Betweenness Centrality** on it. The scores show quantitatively, rather than by job title, that Carol is the bridge.
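
For reference, betweenness centrality has a compact standard definition. The block below states it and works it through for an idealized hub-and-spoke graph with one hub and five spokes, which is the shape the scenario describes. The symbols $\sigma_{st}$, $n$, and $v$ are notation introduced here for illustration; they do not appear in the notebook.

```latex
% sigma_{st}    : number of shortest paths between nodes s and t
% sigma_{st}(v) : number of those paths that pass through v
% The sum runs over unordered pairs {s, t} that do not contain v.
\[
C_B(v) = \sum_{s < t,\; s,t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}
\]

% Normalized form for an undirected graph with n nodes:
% divide by the number of pairs that exclude v, (n-1)(n-2)/2.
\[
\hat{C}_B(v) = \frac{2}{(n-1)(n-2)} \, C_B(v)
\]

% Idealized hub-and-spoke case, n = 6 (one hub, five spokes):
% every one of the \binom{5}{2} = 10 spoke pairs has a single shortest path,
% and it runs through the hub.
\[
\hat{C}_B(\text{hub}) = \frac{2}{5 \cdot 4} \cdot 10 = 1,
\qquad
\hat{C}_B(\text{spoke}) = 0
\]
```

A job title never enters this formula. Only the position of a node on the shortest paths between other nodes matters, which is why the metric can surface someone the org chart does not flag.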

In [None]:
# --- Step 1: Environment Setup ---
!pip install -q langchain langchain-community langchain-huggingface faiss-cpu networkx transformers sentence-transformers accelerate bitsandbytes

In [None]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from langchain_huggingface import HuggingFacePipeline, HuggingFaceEmbeddings
import networkx as nx

# --- Step 2: Load Model ---
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

print(f"Loading {model_id}...")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_new_tokens=256, 
    temperature=0.1, 
    do_sample=True
)

llm = HuggingFacePipeline(pipeline=pipe)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
print("Model loaded. Pipeline ready.")

Loading TinyLlama/TinyLlama-1.1B-Chat-v1.0...
Model loaded. Pipeline ready.


In [None]:
from langchain_core.documents import Document  # current import path; langchain.docstore is a legacy location

# --- Step 3: Simulate Logs & Org Charts ---
raw_texts = [
    "[HR Org Chart] Alice is the designated Project Manager for Project Omega. She holds the highest authority and responsibility for delivery.",
    "[HR Org Chart] Bob is the VP of Engineering. He oversees the budget for Project Omega and has veto power.",
    "[Email Log] From: Dave (Dev) | To: Carol | Subject: Need API Specs. Hey Carol, can you unblock me on the API?",
    "[Email Log] From: Eve (Dev) | To: Carol | Subject: DB Schema. Carol, is the schema ready?",
    "[Email Log] From: Alice (PM) | To: Carol | Subject: Status Update. Carol, please summarize the dev team's progress for me.",
    "[Email Log] From: Bob (VP) | To: Carol | Subject: Budget Review. Carol, explain the server costs.",
    "[Email Log] From: Frank (QA) | To: Carol | Subject: Bug Report. Carol, does this look right?",
    "[Email Log] From: Carol | To: Alice | Subject: RE: Status. Here is the compiled report from everyone."
]

docs = [Document(page_content=t) for t in raw_texts]
print(f"Created {len(docs)} Documents.")
for i, d in enumerate(docs):
    print(f"Doc {i+1}: {d.page_content}")

Created 8 Documents.
Doc 1: [HR Org Chart] Alice is the designated Project Manager for Project Omega. She holds the highest authority and responsibility for delivery.
Doc 2: [HR Org Chart] Bob is the VP of Engineering. He oversees the budget for Project Omega and has veto power.
Doc 3: [Email Log] From: Dave (Dev) | To: Carol | Subject: Need API Specs. Hey Carol, can you unblock me on the API?
Doc 4: [Email Log] From: Eve (Dev) | To: Carol | Subject: DB Schema. Carol, is the schema ready?
Doc 5: [Email Log] From: Alice (PM) | To: Carol | Subject: Status Update. Carol, please summarize the dev team's progress for me.
Doc 6: [Email Log] From: Bob (VP) | To: Carol | Subject: Budget Review. Carol, explain the server costs.
Doc 7: [Email Log] From: Frank (QA) | To: Carol | Subject: Bug Report. Carol, does this look right?
Doc 8: [Email Log] From: Carol | To: Alice | Subject: RE: Status. Here is the compiled report from everyone.


In [None]:
# --- Step 4: Naive RAG (Full Retrieval) ---
from langchain_community.vectorstores import FAISS

print("\n--- NAIVE RAG (Full Context Failure) ---")
query = "Who is the single point of failure (bottleneck) for information flow in Project Omega?"
print(f"Query: {query}")

# 1. Indexing
vectorstore = FAISS.from_documents(docs, embeddings)

# 2. Retrieval
# We retrieve ALL documents. The problem is structural cognition, not recall.
retriever = vectorstore.as_retriever(search_kwargs={"k": 8})
retrieved_docs = retriever.invoke(query)

print("\nRetrieved Context (k=8 - ALL DOCS):")
context_str = ""
for i, d in enumerate(retrieved_docs):
    print(f"{i+1}. {d.page_content}")
    context_str += d.page_content + "\n"

# 3. Generation
prompt = f"<|system|>\nAnswer based on context.\n<|user|>\nContext:\n{context_str}\nQuestion:\n{query}\n<|assistant|>"
response = llm.invoke(prompt)
cleaned_response = response.split("<|assistant|>")[-1].strip()

print("\nLLM Answer:")
print(cleaned_response)


--- NAIVE RAG (Full Context Failure) ---
Query: Who is the single point of failure (bottleneck) for information flow in Project Omega?

Retrieved Context (k=8 - ALL DOCS):
1. [HR Org Chart] Alice is the designated Project Manager for Project Omega. She holds the highest authority and responsibility for delivery.
2. [HR Org Chart] Bob is the VP of Engineering. He oversees the budget for Project Omega and has veto power.
3. [Email Log] From: Dave (Dev) | To: Carol | Subject: Need API Specs. Hey Carol, can you unblock me on the API?
4. [Email Log] From: Eve (Dev) | To: Carol | Subject: DB Schema. Carol, is the schema ready?
5. [Email Log] From: Alice (PM) | To: Carol | Subject: Status Update. Carol, please summarize the dev team's progress for me.
6. [Email Log] From: Bob (VP) | To: Carol | Subject: Budget Review. Carol, explain the server costs.
7. [Email Log] From: Frank (QA) | To: Carol | Subject: Bug Report. Carol, does this look right?
8. [Email Log] From: Carol | To: Alice | Subjec

In [None]:
# --- Step 5: Network Construction (Weighted Graph) ---
# We build a graph where every email increases the 'Connection Strength'.

kg = nx.Graph() # Undirected for interaction analysis

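def extract_interaction(text):
    """
    Ask the LLM for a 'Sender | EMAILED | Receiver' triple.
    Returns the parts as a list, or [] if the text is not an email log
    or the model output cannot be parsed.
    """
    if "[Email Log]" not in text:
        return []

    prompt = f"""<|system|>
    Extract the Sender and Receiver names.
    Format: Sender | EMAILED | Receiver
    <|user|>
    Text: {text}
    <|assistant|>"""

    raw = llm.invoke(prompt)
    # If the pipeline echoes the prompt, keep only the text after the last assistant tag.
    out = raw.split("<|assistant|>")[-1].strip()
    # Parse only the first line; small models often append extra commentary.
    first_line = out.splitlines()[0] if out else ""
    if "|" in first_line:
        return [p.strip() for p in first_line.split("|")]
    return []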

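print("\n--- WEIGHTED INTERACTION GRAPH ---")

# Map a bare name ("Alice") to the first label seen for that person ("Alice (PM)"),
# so that "Alice (PM)" and "Alice" do not become two separate nodes.
canonical = {}

def canonical_label(name):
    key = name.split("(")[0].strip()
    return canonical.setdefault(key, name)

for doc in docs:
    print(f"\nProcessing Log: {doc.page_content}")
    parts = extract_interaction(doc.page_content)

    if len(parts) >= 3:
        # parts[1] is the relation label ("EMAILED") and is not needed for the graph.
        sender, receiver = canonical_label(parts[0]), canonical_label(parts[2])
        print(f"   [Extracted]: {sender} -> {receiver}")

        # Add/Update Weighted Edge
        if kg.has_edge(sender, receiver):
            kg[sender][receiver]['weight'] += 1
        else:
            kg.add_edge(sender, receiver, weight=1)

        print(f"   [Graph]: Added edge. Current Weight: {kg[sender][receiver]['weight']}")
    else:
        print("   [Skipped]: Not a communication log.")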


--- WEIGHTED INTERACTION GRAPH ---

Processing Log: [Email Log] From: Dave (Dev) | To: Carol | Subject: Need API Specs. Hey Carol, can you unblock me on the API?
   [Extracted]: Dave (Dev) -> Carol
   [Graph]: Added edge. Current Weight: 1

Processing Log: [Email Log] From: Eve (Dev) | To: Carol | Subject: DB Schema. Carol, is the schema ready?
   [Extracted]: Eve (Dev) -> Carol
   [Graph]: Added edge. Current Weight: 1

Processing Log: [Email Log] From: Alice (PM) | To: Carol | Subject: Status Update. Carol, please summarize the dev team's progress for me.
   [Extracted]: Alice (PM) -> Carol
   [Graph]: Added edge. Current Weight: 1
...
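
Because the simulated logs follow a fixed `From: ... | To: ... | Subject: ...` layout, the sender and receiver can also be pulled out deterministically, which is useful as a cross-check on what a small model extracts. The following is a minimal sketch under that assumption; `EMAIL_PATTERN`, `parse_email_log`, and `strip_role` are hypothetical names that do not appear in the notebook, and real mail logs would need a more tolerant parser.

```python
import re

# Hypothetical helper (not part of the notebook): matches the simulated
# "[Email Log] From: X | To: Y | Subject: ..." layout used in Step 3.
EMAIL_PATTERN = re.compile(
    r"^\[Email Log\]\s*From:\s*(?P<sender>[^|]+?)\s*\|\s*To:\s*(?P<receiver>[^|]+?)\s*\|"
)

def parse_email_log(text):
    """Return (sender, receiver) for an email-log line, or None for anything else."""
    match = EMAIL_PATTERN.match(text)
    if match is None:
        return None
    return match.group("sender"), match.group("receiver")

def strip_role(name):
    """Drop a trailing role tag such as ' (Dev)' so 'Alice (PM)' and 'Alice' agree."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", name).strip()

samples = [
    "[HR Org Chart] Bob is the VP of Engineering. He oversees the budget for Project Omega and has veto power.",
    "[Email Log] From: Dave (Dev) | To: Carol | Subject: Need API Specs. Hey Carol, can you unblock me on the API?",
    "[Email Log] From: Carol | To: Alice | Subject: RE: Status. Here is the compiled report from everyone.",
]

for line in samples:
    parsed = parse_email_log(line)
    if parsed is None:
        print("skipped (not an email log)")
    else:
        sender, receiver = (strip_role(p) for p in parsed)
        print(f"{sender} -> {receiver}")
```

Comparing this output with the LLM's triples is a cheap way to catch extraction errors before they turn into spurious nodes or edges in the graph.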


In [None]:
# --- Step 6: The Solution (Centrality Analysis) ---
# We identify the bottleneck from graph topology rather than job titles.

print("\n--- BETWEENNESS CENTRALITY ANALYSIS ---")
print(f"Query: \"{query}\"")

def analyze_network():
    print("\nRunning Mathematical Analysis on Graph Topology...")

    # networkx treats the weight attribute as a path length, so frequent
    # contact (a high email count) must become a short distance.
    for u, v, data in kg.edges(data=True):
        data['distance'] = 1.0 / data['weight']

    # Betweenness Centrality: the share of shortest paths between other nodes that pass through a node.
    centrality = nx.betweenness_centrality(kg, weight='distance')

    sorted_nodes = sorted(centrality.items(), key=lambda x: x[1], reverse=True)

    print("\nCentrality Scores (0.0 to 1.0):")
    for rank, (node, score) in enumerate(sorted_nodes, start=1):
        print(f"   {rank}. {node}: {score:.2f}")

    top_person, top_score = sorted_nodes[0]

    # Measure the damage directly: remove the top node and count what is left.
    remaining = kg.copy()
    remaining.remove_node(top_person)
    n_components = nx.number_connected_components(remaining)

    print("\nTopology Insight:")
    print(f"- Alice (PM) connects ONLY to {top_person}.")
    print(f"- The Dev Team connects ONLY to {top_person}.")
    print(f"- Therefore, if {top_person} is removed, the graph splits into {n_components} disconnected components.")

    return f"Network analysis identifies {top_person} as the critical bottleneck with a Betweenness Centrality of {top_score:.2f}. She is the only bridge connecting Management (Alice, Bob) to the Operational Teams (Dev, QA)."

final_answer = analyze_network()

print(f"\nFinal Answer (Generated from Graph Metrics):\n{final_answer}")


--- BETWEENNESS CENTRALITY ANALYSIS ---
Query: "Who is the single point of failure..."

Running Mathematical Analysis on Graph Topology...

Centrality Scores (0.0 to 1.0):
   1. Carol: 1.00
   2. Alice (PM): 0.00
   3. Bob (VP): 0.00
   4. Dave (Dev): 0.00
   5. Eve (Dev): 0.00
   6. Frank (QA): 0.00

Topology Insight:
- Alice (PM) connects ONLY to Carol.
- The Dev Team connects ONLY to Carol.
- Therefore, if Carol is removed, the graph splits into 5 disconnected components.

Final Answer (Generated from Graph Metrics):
Network analysis identifies Carol as the critical bottleneck with a Betweenness Centrality of 1.00. She is the only bridge connecting Management (Alice, Bob) to the Operational Teams (Dev, QA).
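
The "single point of failure" claim can also be checked without any centrality score: remove each person in turn and count how many disconnected groups remain. The sketch below does this with a plain breadth-first search and the standard library only. The edge list is transcribed by hand from the email logs with role tags dropped, and `build_adjacency` and `count_components` are hypothetical helper names, not part of the notebook.

```python
from collections import deque

# Undirected "who emailed whom" pairs, transcribed from the simulated logs
# (assumption: roles such as "(Dev)" are dropped and direction is ignored).
edges = [
    ("Dave", "Carol"),
    ("Eve", "Carol"),
    ("Alice", "Carol"),
    ("Bob", "Carol"),
    ("Frank", "Carol"),
]

def build_adjacency(edge_list):
    """Build a node -> set-of-neighbours mapping from undirected edges."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def count_components(adjacency, removed=None):
    """Count connected components with BFS, pretending `removed` is absent."""
    seen = set()
    components = 0
    for start in adjacency:
        if start == removed or start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour != removed and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return components

adjacency = build_adjacency(edges)
baseline = count_components(adjacency)
impact = {person: count_components(adjacency, removed=person) for person in adjacency}

print(f"Components with everyone present: {baseline}")
for person, parts in sorted(impact.items(), key=lambda item: item[1], reverse=True):
    print(f"Without {person}: {parts} component(s)")
```

For this edge list, removing Carol leaves five isolated people, while removing anyone else leaves the graph in one piece. This is the same conclusion the centrality ranking points to, reached by a more direct test.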
